ASReml · Multivariate Analysis

This page was last modified on 07 November 2005, at 14:34 NZST

Page contents:

Single site, multiple traits

By now, ASReml’s univariate notation should start making sense. After enjoying the speed and flexibility of the program for the univariate case, you are probably thinking of moving to more complex analyses. The next step will be to run two or more traits in one site.

Multiple traits one site
 tree_ID    !P
 mother    100 !I
 subrace     8 !A
 locality   30 !A
 rep         5 !I
 iblock     23 !I
 plot      100 !I
woodped.csv !CSV
woodtraits.csv !CSV !skip 1 !dopart $A

!part 1 # Basic Density
BD ~ mu subrace rep !r tree_ID

!part 2 # Cellulose Content
CC ~ mu subrace rep !r tree_ID

!part 3 # Bivariate analysis 
BD CC ~ Trait Trait.subrace Trait.rep !r ,
        Trait.tree_ID !f mv
1 2 1 
Trait 0 US 645 1 0.32 

Trait.tree_ID 2
Trait 0 CORR 0.8 516 1.51

Parts 1 and 2 of the previous example are just the univariate runs that will provide starting values for the multivariate analysis. Part 3 introduces some new notation. The left hand side (LHS) of the model equation now has two variables: BD and CC. Rather than mu we use the keyword Trait. Trait tells ASReml to hold a vector of results (rather than one overall mean -mu- we will have as many overall means as traits there are on the LHS. We also add an interaction between Trait and all other terms of the equation, so we get results for, say subrace, for each of the traits.

The 1 2 1 specifies (from left to right) that there is ONE independent error structure, which consists of the direct product of TWO matrices and that there is ONE additional covariance structure to define (which corresponds to Trait.tree_ID). The latter has to be defined because it is more complex than just the product of a design matrix by a scalar. The definition of the error structure comprises two matrices: an identity matrix of size number of observations (the 0 that tells ASReml to count the records) and a 2×2 covariance matrix (the size is specified by Trait; you could also use 2 rather than Trait here). This covariance matrix is unstructured (US) and has error variance of 645 for BD, 0.32 for CC (both values coming from the univariate analyses) and 1 for the error covariance between the traits (we are just guessing this number). The 0 between Trait and US is a place holder that we will use when dealing with spatial analysis.

Now we need to deal with the additional structure. We first have the name of the term for the structure (Trait.tree_ID) followed by the number of matrices to be multiplied (2). The first matrix is of dimension 2×2 (specified by Trait as in the case of the error structure) and is specified as a correlation matrix (CORR) in this case. You can use a US matrix as well, but correlation matrices tend to behave better and make convergence easier. The values are first the correlation between traits (just a guess) and the additive variances for BD (516) and for CC (1.51). The second matrix is the numerator relationship matrix, specified by tree_ID (which was defined as pedigree when using !P).

To make things a bit clearer concerning the US and CORR matrices, they are read as a triangular matrix row by row from left to right (US) and as a lower triangular matrix row by row from left to right, followed by one or more variances (CORR). Thus, if you have a covariance matrix A and a correlation matrix B, where v is variance, c is covariance and r is correlation, and the indices refer to the traits:

     v11 c12 c13 c14           v11 r12 r13 r14
     c21 v22 c23 c24           r21 v22 r23 r24
A =  c31 c32 v33 c34      B =  r31 r32 v33 r34
     c41 c42 c43 v44           r41 r42 r43 v44

Then the starting values of the US matrix would be specified in the following order: v11 c21 v22 c31 c32 v33 c41 c42 c43 v44; while the values for the CORR matrix would be in this order: r21 r31 r32 r41 r42 r43 v11 v22 v33 v44.

Another example is the analysis of a CP progeny trial using a parental model:

Bivariate analysis for CP trial
# don’t use !P because is not a tree model
 mom   12 !I  
 dad   12 !I
 fam  105 !I
 block 10
 # Skip first line of file
 # because it contains columns names
woodprops.txt !skip 1 !dopart $A
!part 1
# GCA and SCA model
# mom and(dad) create only one factor
BD ~ mu block !r mom and(dad) fam

!part 2
PY ~ mu block !r mom and(dad) fam

!part 3
# Bivariate GCA and SCA model
BD PY  ~ Trait Trait.block !r and(dad),
1 2 2
Trait 0 US 0.0038 0.0005 0.0027 !GP 2
Trait 0 CORR 0.9 0.0003 0.003 !GP

Trait.fam 2
Trait 0 CORR 0.2 0.0002 0.003 !GP

There are several elements in common with the previous example, so I will mention the few new issues. In the case of the Controlled Pollinated trial we know both parents, so we can get an estimate of dominance using the fam term, which refers to the family code. Because we are using a parental model rather than an individual model, we use mom and(dad) to overlay the design matrices and create a unique prediction for each parent, rather than one as male and one as male. This overlay creates a unique factor, something that you can see in part 3, where the structure refers to mom and(dad). A few points to highlight: the numbers after the model equation are now 1 2 2, because there is still only ONE error structure that is the product of TWO matrices, but now there are TWO additional covariance structures to be defined ( and Trait.fam). Finally, the use of !GP to enforce positive definite covariance matrices because, in this specific dataset, some of the parameters tended to be outside the parametric space (e.g. correlation over 1).

top of page

Two sites single trait treated as multivariate

In the univariate analysis page you can find the code for multiple site analysis considering either unique variances for additive variance and error variance or single additive variance and multiple error variances (one for each site). Another option is to consider the expressions of a trait in each site as a different trait (this goes back at least to 1952 in a paper by Doug Falconer). For example, diameter in site 1 would be considered as a different trait from diameter at site 2; thus, each trait is going to have its own additive and residual variances. The following example shows code for two sites, but it is easy to extend to any number of traits.

Multiple sites as multivariate analysis
 tree      !P       #V1
 site       2       #V2
 subrace   20       #V3
 rep        5       #V4
 iblock    20       #V5
 dbh_04             #V6
# These two new variables are created only
# to show an alternative way of setting up
# the problem
 dbh_site1 !=V2 !=1 !*V6 !M0    
 dbh_site2 !=V2 !=2 !*V6 !M0
multivarped.csv !csv
multivardat.csv !csv !dopart $A

# Univariate analysis of site 1 to get
# starting values
# We filter data to keep only site 1
!part 1          
!filter site !select 1
dbh_04 ~ mu subrace rep !r rep.iblock tree

# We now filter data to keep only site 2
!part 2     
!filter site !select 2
dbh_04 ~ mu subrace rep !r rep.iblock tree

# Multivariate analysis considering both sites
# No need to filter now
!part 3 
 dbh_04 ~ mu site site.subrace site.rep !r site.rep.iblock,
2 1 2
# The errors are independent between sites, so
# the covariance will be 0. Assuming 4000
# observations for site 1 and 3500 for site 2
4000 0 IDEN !S2=1335
3500 0 IDEN !S2=1102

# Incomplete block effects are independent
# between sites, so a diagonal structure is 
# ideal. It is not possible to use DIAG for
# the error, so there we use a covariance of 0
site.rep.iblock 2
site 0 DIAG 62 50   

# The additive performance on both sites is
# correlated, and we use CORR to get a direct
# estimate of that correlation
site.tree 2
site 0 CORR 0.85 752 600

# Equivalent multivariate analysis using 
# the new traits created
!part 4  
dbh_site1 dbh_site2 ~ Trait Trait.subrace Trait.rep,
                      !r Trait.rep.iblock Trait.tree
1 2 2
Trait 0 US 1335 0 1102

Trait.rep.iblock 2
Trait 0 DIAG 62 50

Trait.tree 2
Trait 0 CORR 0.85 752 600

A few explanations now. The previous example uses two alternative ways to code the problem. The first one (!part 3) uses site interacting with all the terms of the model equation, including tree. This creates results for each factor in both sites. The 2 1 2 after the model equation refers to TWO error structures, which are identity matrices of size number of observations (represented by the first value of the line). This is because the errors are independent between sites. The other TWO refers to the additional structures (site.rep.iblock and site.tree). site.rep.iblock is the variance of incomplete blocks in each site, and, because they are independent, we use a diagonal structure with the variances for each site as starting values. site.tree represents the tree (or additive genetic) variance in each site. Because the trees are related there is a correlation between the effects ib both sites. We can estimate the correlation directly if we use CORR (and it makes convergence easier). We could still use a US structure as, for example, site 0 US 752 1 600.

The second approach (!part 4) creates a trait for diameter in each site, so for observations in site 1 dbh_site1 = dbh_04, while dbh_site2 is a missing value. Note the use of Trait rather than site as well as the use of a comma at the end of the first line of the model equation, to indicate that continues in the next line. In the case of the additional structures, Trait 0 DIAG 62 50 represents a 2×2 diagonal matrix with diagonal elements 62 and 50, while rep.iblock represents the incomplete block design matrix. Trait 0 CORR 0.85 752 600 is a 2×2 correlation matrix, while tree represents the numerator relationship matrix.