- Truncating the dataset
- Univariate to multivariate format
- Starting values: the apocryphal Dutch method
AS Reml has not really been designed to perform data management (better use a database system - Access, SAS, Oracle, MySQL, your pick), but still allows you to perform a few variable conversions, subsetting, etc. Below you will find some (hopefully) useful examples.
Truncating the dataset
The presence of runts (individual with extremely poor growth) is common in progeny trials. Lets say that you want to test the effect of dropping these trees from the analysis, but you are not sure where is the limit of size to drop the trees.
Dropping runts tree !P # Tree IDs, V1 rep 5 # replicates V2 iblock 20 # incomplete block V3 dbh04 # Diameter at age 4 V4 bd newdbh !=V4 !>=$A !*V4 !M0 crctest1ped.txt crctest1dat.txt !dopart $B !part 1 newdbh ~ mu rep !r rep.iblock tree 0 !part 2 bd ~ mu rep !r rep.iblock tree 0 # … etc
To run the previous example, assuming the file is called testrunts.as, you would need to specify to specify two numbers in the command line: the first one referring to $A (minimum diameter to include in the analysis) and the second referring to $B (part of the analysis to be run), e.g.
> asreml testrunts 5 1
would keep all trees with more than 5 cm of diameter and run part 1. So, how does it work? First, variable number 4 (dbh04) is copied to newdbh (!=V4); then, ASReml compares its value with the value taken by $A (!>=$A, 5 cm in this case). If newdbh is greater than or equal to $A, it takes a value of 1 (TRUE), 0 otherwise (FALSE). Now newdbh contains either 1s or 0s. Later newdbh is multiplied by dbh04 (!*V4), taken the value of dbh04 for the 1s and 0 for the 0s. Finally, the 0s are converted to missing values (!M0).
Univariate to multivariate format
Sometimes the dataset is setup in a univariate way, but we are interested in using some multivariate structures that are easier to fit when the data appear as several traits. For example, we are interested in dbh in several sites, so for each record we have only one dbh in one site. It is possible to expand the dataset to have as many variables for dbh as there are sites.
Creating multiple variables from one variable tree !P # Tree IDs, V1 site 3 # Three sites with codes 1, 2 and 3 V2 rep 5 # replicates within sites V3 iblock 20 # incomplete blocks V4 dbh # Diameter V5 bd dbhS1 !=V2 !==1 !*V5 !M0 # dbh at site 1 dbhS2 !=V2 !==2 !*V5 !M0 # dbh at site 2 dbhS3 !=V2 !==3 !*V5 !M0 # dbh at site 3 crctest2ped.txt crctest2dat.txt !dopart $A !part 1 … … etc
In this case, dbhS1, dbhS2 and dbhS3 correspond to dbhs measures at sites 1, 2 and 3 respectively. How does it work? First the site codes are copied to, say, dbhS1 (!=V2). Then the value is compared to the site code (e.g. 1, !==1). If the result of the comparison is TRUE, the variable takes a value of 1, 0 otherwise. Now dbhS1, which contains 1s or 0s, is multiplied by the dbhs (!*V5) and the 0s are treated as missing values (!M 0). Same method is applied for other sites.
Starting values: the apocryphal Dutch method
How to get starting values is the second question (after how to define covariance structures) when starting to run multivariate analyses. It is usually framed as ‘Where do I get the bloody values from?’ In lots of cases we do not have an idea of the covariance, so a value like 0.1 is used, and in most bivariate cases will work well, thank you very much. In other cases, we have some knowledge of the correlation (for example, between growth and wood density, ~−0.1 to −0.3 is a good assumption. We can use that cov = r * sqrt(v1 * v2), where the variances v1 and v2 come from the univariate analyses and r from our assumption. Even better, we can use a CORR rather than US for most structures (except for the residuals) so it is possible to use the assumed correlation values directly.
The previous explanation is all good if we are putting together a bivariate analysis, but What happens when we want to run more than 2 traits? Lets say that we have 4 traits, we would need to run 6 bivariate analyses to get decent starting values to run the full multivariate (4 traits analysis). If you are doing this often, it becomes a real pain (wherever you prefer). Yes, there is hope and here is where the apocryphal Dutch method comes in.
Some time ago Greg Dutkowski and myself were involved with the Southern Tree Breeding Association people calculating genetic parameters to run Radiata Pine and Blue Gum national genetic evaluations. It was quite tedious, but Greg showed me this neat way of getting starting values which worked at least 50% of the time. He called it the ‘slash method’, but it was the end of a long day so I kept talking about the Dutch method.
The apocryphal Duth method makes heavy use of two ASReml features: the ability to pick up results from previous runs (using the -c flag at the command line) and the posibility to fit level specific variances for a factor (or a trait) using the slash (/) qualifier. The system works as follows:
- First, one runs univariate analyses for each of the traits (best using !dopart for each trait), so we can determine the appropriate model (on terms of significant fixed and random effects) for each trait. These analyses would be processed using something like: asreml CPtrial 5 1 (assuming that the command file is called CPtrial.as, the minimum dbh is 5 cm and we are running part 1). Analysis of E. globulus CP trial
tree !P family 57 !I rep 5 !I inblk 20 !I plot 264 !I dbh height pilo # dropping runts (trees with dbh less than A) # from the analysis newdbh !=V6 !>=$A !*V6 !M0 newhei !=V6 !>=$A !*V7 !M0 newpil !=V6 !>=$A !*V8 !M0 CPtrial.ped !ALPHA !GROUPS 10 CPtrial.dat !DOPART $B
- # Only the significant terms have been left in
# the models subraces are fitted as genetics # groups (see !GROUPS 10) !part 1 newdbh ~ mu rep !r rep.inblk plot tree family !part 2 newhei ~ mu rep !r rep.inblk tree family !part 3 newpil ~ mu rep !r rep.inblk tree family
- Second, we set a multivariate model including all traits BUT we define a covariance structure only for the residuals and, maybe, for the additive genetic factor (either tree or mother). All the other factors (e.g. replicate, plot, etc) are set using slash rather than dot, so ASReml is in effect fitting only variances (without any covariance) for those factors, without the need for defining covariance structures. Check also the at(Trait,1).plot that is fitting the plot factor only for trait 1 (newdbh). This is run as: asreml CPtrial 5 4.
# Multivariate analysis !part 4 newdbh newhei newpilo ~ Trait Trait.rep !r Trait/tree, Trait/rep.inblk Trait/family, at(Trait,1).plot
- # This line says that there is ONE error structure
1 2 0 0 Trait 0 US !+6 236 0.1 0.87 0.1 0.1 3.16 # The !+6 tells ASReml that there are 6 # covariance components coming in the next # lines (it allows readable formatting when # using lots of components)
- Once we run the multivariate run (and if we are lucky) the new model converges and we have a model with a full structure for the residuals and, maybe, a full structure for the additive genetic component. Now we can extend the model to another factor where we define the structure and use -c to continue running the model from the previous iterations (without the need to put the results from the previous step). Repeat this for each additional factor and cross your fingers. At the end of this process you should have the full model running. This is run as: asreml -c CPtrial 5 4.
- Multivariate analysis
part 4newdbh newhei newpilo ~ Trait Trait.rep !r Trait.tree,
Trait/rep.inblk Trait/family, at(Trait,1).plot
- # We are adding an additional structure for
# Trait.tree 1 2 1 0 Trait 0 US !+6 236 0.1 0.87 0.1 0.1 3.16 Trait.tree 2 Trait 0 CORR !+6 0.9 0.4 0.3 34.7 0.31 2.82 tree # When using CORR we first list the # correlations (first 3 numbers) followed by # the variances (next three).
and then, again, as: asreml -c CPtrial 5 4 for the following code:
# Multivariate analysis !part 4 newdbh newhei newpilo ~ Trait Trait.rep !r Trait.tree, Trait.rep.inblk Trait/family, at(Trait,1).plot # We added an additional structure # for Trait.rep.inblk 1 2 2 0 Trait 0 US !+6 236 0.1 0.87 0.1 0.1 3.16 Trait.tree 2 Trait 0 CORR 0.9 0.4 0.3 34.7 0.31 2.82 tree Trait.rep.inblk 2 Trait 0 CORR !+6 0.4 0.2 0.1 58.14 0.45 0.60 rep.inblk
and keep going for each additional structure (Trait.family in this case).
- A variant of this process (which is a good idea when there are some convergence problems) is to get the covariances and variances from the .asr file at each step and stick them in !part for the multivariate model in the .as file. This will allow you to tweak the model by parts and not loose any estimates when adding further terms in the model.
- There is neither explicit or implicit guarantee for the Dutch method, but if you are lucky it can save you a lot of time. If it doesn’t work you will need to run the bivariate analyses (Sorry).