ASReml · Single Page

This page was last modified on 07 November 2005, at 14:33 NZST

This page contains all the ASReml Cookbook pages in—obviously—a single page. Some people may prefer this format for printing.

Obtaining ASReml

A trial version (30 days) of ASReml can be obtained from VSN’s International web site. Licensed users can upgrade to newer versions from the specific update page. An alpha development ASReml 3 version is http://www.austatgen.org/asreml3alpha/ available here.

The latest version of the software can be used as a standalone program or be called from the S+ or R statistical packages. There are separate PDF User Guides (manuals) for standalone and S+/R use, which can be downloaded from http://www.vsni.co.uk/resources/doc/.

You can also download the User Guide from this site (PDF, 1 MB).

I need to confirm what is the current way to subscribe to the ASREML list

There is an ASReml users’ email list. Once subscribed, you can email the list using asreml-l@agric.nsw.gov.au. Some of the old messages can be found at http://www.arm.li.csiro.au/lists/asreml/. There is also a web forum.

Since 2002, ASReml users also need to acquire a license to use it. The current costs for single licenses are £195 (academic), £425 (not for profit) and £558 (commercial). Check here for current exchange rates: http://www.x-rates.com/. There are discounts available for multiple copies and site licenses. The payment system has then moved from a ‘purchase a license, pay a small maintenance, no academic pricing’ one to a ‘rent the software (like SAS), higher cost for commercial licenses, reasonably priced academic licenses’ one.

All queries concerning purchase/licensing of ASReml need to go to sales@vsn-intl.com. You can also visit the official ASReml site for additional information.

Setting up text editors

As I mentioned before, I recommend using the Context programming text editor. This editor supports a multiple document interface (so you can simultaneously look at the command file, output, etc), huge file sizes, syntax highlighting for several languages apart from ASReml (e.g., Python, Perl, C++). Once you download and install Context, go to the ‘Highlighters’ folder of the installed program and copy the file Attach:ASReml.chl (for ASReml v1.0) or Attach:ASReml2.chl (for ASReml v2.0, this one kindly provided by Mick Carrick from DPI Victoria, Australia). Just right click the appropriate link to save it to your computer.

Update: ASReml v2 comes with a Context highlighter as part of the standard distribution. The highlighter is based on my first highlighter, later extended by Greg Dutkowski and Arthur Gilmour.

Next time you restart Context and open a .as file, Context will highlight the ASReml code for you. This is very handy when writing long complex jobs. If you are writing a yet unsaved file, just click on Tools | Set Highlighter | ASReml. Ultraedit is a high quality commercial text editor for windows. Researchers at Skogforsk use it to prepare ASReml runs and G. Dutkowski — who was visiting them — sent me the instructions to setup the system:

  1. First, download the syntax highlighting file (ASREML.txt).
  2. Copy ASREML.txt to the c:\program files\ultraedit folder
  3. Go to Menu - Advanced - Configuration - File Types and name the extensions .AS,.ASC as ASReml Control Files.
  4. Go to Menu - Advanced - Configuration - Syntax Highlighting and Set Language as Language 2, and Browse to c:\program files\ultraedit\ASREML.txt.
  5. Restart Ultraedit.
  6. You are all set!

Job Anatomy

This page presents an introduction to the structure of an ASReml job:


Basic structure

An ASReml job is a set of commands in a text file (usually with extension .as) providing information about the data and the statistical models that we want to fit. The commands file has three parts:

Header
that is the first line of an ASReml job. It usually includes a succint description or title for the job. This header is used in the output file and titles for the plots produced by the program.
Data section
where the user specifies the structure of the dataset (including both factors and variables), where is the dataset (and any other additional files required for the analysis) located, and options for reading the data.
Model section
where the user specifies one or more equations and covariance structures, to fully describe the models being fit.

top of page

Describe your dataset

Factors
are the classification variables that you have in your dataset. Things like site (Hobart, Burnie), fertiliser (0, 100, 500), rep (1, 2, …, 50), etc. For each factor you need to specify the number of levels (2 for site, 3 for fertiliser, 4 for subrace, 50 for rep) and the type of coding that you are using. If you use consecutive integers starting from 1 (like rep) you do not need to specify coding, if you specify non consecutive integers (like fertiliser) you will use !I, for alphanumeric factors (like site) you will use !A. These factors would look like:
 site        2 !A
 fertiliser  3 !I
 rep        50
Covariates
include response variables as well as any other covariables you assessed. By default, ASReml considers ., * and NA as missing values. You can define any other character as missing values using the qualifier !M; for example, growth !M 0 considers 0 as a missing value. When setting the file containing your dataset is a good idea to explicitly define missing values (for example, always use a .). Any data transformation follows the variable names.
Data source
is where you use the name of the text file containing your dataset. If the file is not in the same directory of the .as file you will need to include the path to the file. There should not be any space between the beginning of the line and the file name. This is the line where you can also add information about the format of the file (for example !csv for a comma delimited file) and create subsets of data. For example:
 …
 growth
 density
d:\research\woodgrowth.csv !csv

Remember: Field names are case sensitive, so tree and Tree are not the same.
Always leave a space before a field name. ASReml needs the space to know the difference between the data section and the rest of the command file.

There is a large number of options that can be checked in Chapters 3 and 4 of the ASReml User’s Manual. You can also find extra hints in the “data tips” section of these notes.

top of page

Know thy models

ASReml notation aims to collect information about the model being fitted to the dataset. Hence, it is fundamental to understand the model before trying to fit it! A model comprises two parts i) the model equation and ii) the moments or distributional properties of the model. If the user provides only the model equation, ASReml will use the simplest Covariance Structures available, namely a variance pre- and post-multiplied by the design matrices. For example, with the following code for a trial with an incomplete block design:

growth ~ mu rep !r rep.iblock family

ASReml will assume that the the variance of the residuals will follow a distribution I σe2, while the variances of incomplete blocks and families will be Zb σb2 Zb and Zf σf2 Zf respectively. This is OK if you want to use these defaults, but maybe you want to fit different Covariance Structures.

Model equations are constructed as a linear combination of ASReml keywords (for example, mu), effects and covariates. You can specify simple factors, interactions, nesting, covariances, polynomial functions, splines, etc. By default all equations include an error term.

Some useful keywords:

mu
fits the overall mean.
Trait
is the multivariate version of mu. It creates a vector holding the overall means for each trait included in the analysis.
pol
generates polynomial functions based on a covariate. Usage: pol(var,n) generates a polynomial of n degree based on covariate var.

Some common models:

ASReml codeMeaning
var ~ mu FactorSingle fixed factor
var ~ mu Factor1 Factor2Two fixed factors
var ~ mu Factor1 Factor2 Factor1.Factor2Two fixed factors with interaction
var ~ mu !r FactorSingle random factor
var ~ mu Factor1 !r Factor2Mixed model with one fixed and one random effect
var ~ mu Factor1 !r Factor1.Factor2Mixed model where the second factor is nested within the first one
var ~ mu covariate FactorSingle fixed factor with one covariance

Univariate analysis

Page contents:

Some basic models

Lets start with the simplest design normally used in tree breeding programs: randomized complete blocks. You can run the different parts of the program using something like: > asreml partnumber (e.g. > asreml 2)

Complete block design
 tree !P
 rep     10      # replicates
 plot   500      # plot codes
 dbh
 bd
# Pedigree file
crctest1ped.txt
# Data file and option to run
# differentparts of the program
crctest1dat.txt !dopart $A

# Assuming fixed reps
!part 1 
dbh ~ mu rep !r plot tree

# Assuming random reps
!part 2 
dbh ~ mu !r rep plot tree

# Single tree plot
!part 3 
dbh ~ mu !r rep tree

In the case of incomplete block designs we treat replicates as fixed, and incomplete blocks (iblock) within each rep as random. Note that the nested effect is created defining rep as an effect in the model equation and then rep.iblock, without defining iblock by itself. If you wanted to have interaction rather than nesting you would use, for example: factor1 factor2 factor1.factor2 or, even shorter, factor1*factor2, which would be expanded to the previous notation.

Incomplete block design
 tree !P
 rep     5      # replicates
 iblock 25      # incomplete blocks
 plot           # plot codes
 dbh
 bd
crctest2ped.txt
crctest2dat.txt !dopart $A

# this part analyses dbh
!part 1 
dbh ~ mu rep !r rep.iblock plot tree

# while this part works on basic density (bd)
!part 2 
bd ~ mu rep !r rep.iblock plot tree

Equivalent models

In some situations, e.g. when you are only interested in predicting breeding values for the parents for backwards selection, you may prefer to use models that are equivalent and computationally less demanding (e.g. a family model). For example:

Incomplete block design using OP material
 tree !P
 motherID  200
 rep         5   # replicates
 iblock     20   # incomplete blocks
 plot     1000   # plot codes
 dbh
crctest3ped.txt
crctest3dat.txt !dopart $A

# Uses an individual tree model
# where var(tree) = additive variance
!part 1   
dbh ~ mu rep !r rep.iblock plot tree

# Uses a family model
# where var(motherID) = 1/4 additive variance
# (if there is no selfing, etc)
!part 2   
dbh ~ mu rep !r rep.iblock plot motherID

…or in the case of controlled pollinated material:

Incomplete block design using CP material
 tree !P
 motherID   20
 fatherID   20
 family    120
 rep         5     # replicates
 iblock     20     # incomplete blocks
 plot     1000     # plot codes
 dbh
crctest4ped.txt
crctest4dat.txt !dopart $A

# Uses an individual tree model
# where var(tree) = additive variance and
# var(family) = 1/4 dominance variance
!part 1   
dbh ~ mu rep !r rep.iblock plot tree family


# Uses a family model
!part 2   
dbh ~ mu rep !r rep.iblock plot motherID,
      and(fatherID) family

In the previous equation motherID and(fatherID) overlays the design matrices for males and females so you get only one prediction for each parent, in spite of some parents acting as both male and female (which is typical in crossing programs in trees). The variance of motherID will represent var(GCA).

If you face problems overlaying the matrices with and, please read Overlapping Design Matrices.

Diallels

The specifications of diallels is very straightforward in ASReml, and do not require the creation of many additional variables to hold extra factors.

Note: The specification of family code is in such a way that direction of cross does not matter (e.g., 55×96 = 96×55). In reciprocals code direction is important (e.g., 55×96 != 96×55).

Diallel in complete block design
 tree !P
 motherID   20
 fatherID   20
 family    120     # Family code
 recipro   200     # Reciprocals code
 rep        10     # replicates
 plot     1000     # plot codes
 dbh
crctest5ped.txt
crctest5dat.txt !dopart $A

# Uses an individual tree model
!part 1   
dbh ~ mu rep !r rep.iblock plot tree family,
      motherID recipro 

# Uses a family model
!part 2   
dbh ~ mu rep !r plot fatherID and(motherID),
      family motherID recipro

Clonal data

Clonal data can be seen as repeated observations of a genotype, thus their analysis is related to repeated measurements, although there is no ordering in time. The analysis of clonal data is straightforward in ASReml. In the data file all ramets of the same clone will have the same genotype ID, and each genotype will be only once in the pedigree file. If each genotype is repeated in the pedigree file, it will be necessary to include the !repeat keyword after the pedigree file name.

Brian Kennedy (in Animal Model BLUP. Erasmus Intensive Graduate Course. August 20–26 1989. University of Guelph. page 130) showed the mathematics behind using clonal data as repeated measurements, referring to the analysis of embryo splitting. I first ran code like this while working in longitudinal analysis in 1999–2000. However, João Costa e Silva provided me with a very good interpretation of the analyses at the end of 2003. For more details, check: Costa e Silva, J., Borralho, N.M.G. & Potts, B.M. 2004. Additive and non-additive genetic parameters from clonally replicated and seedling progenies of Eucalyptus globulus. Theoretical and Applied Genetics 108: 1113–1119.

Incomplete block design with clonal CP material
 genotypeID !P
 motherID   20
 fatherID   20
 family    120
 rep         5     # replicates
 iblock     20     # incomplete blocks
 dbh
crctest6ped.txt
crctest6dat.txt !dopart $A 

# Uses an individual tree model
# where var(genotypeID) = additive variance,
# var(family) = 0.25 dominance variance, and
# var(ide(genotypeID)) = 0.75 dominance + epistasis
!part 1
dbh ~ mu rep !r rep.iblock genotypeID fam ide(genotypeID)

The ide(genotypeID) part of the job, creates an additional matrix for genotypeID that ignores the pedigree relationships.

Multiple site, single trait

The traditional approach used in tree breeding to analyse progeny trials in multiple sites was to either assume a unique error variance (and then use the approach explained before) or to correct the data by the site specific error variance and then use the typical approach using the corrected data. Using ASReml it is possible to use alternative methods, either explicitly fitting a site specific error variance (but keeping a unique additive genetic variance) or fitting site specific additive and residual variances, in fact using a Multivariate Analysis approach, where the expression of a trait in each site is considered a different variable. In any case, any of the alternative methods needs a specification of Covariance Structures.

Multiple site single trait analysis
 tree !P
 family    120     # family code
 site        2
 rep         5     # replicates
 iblock     20     # incomplete blocks
 dbh
crctest8ped.txt
crctest8dat.txt !dopart $A

# Uses an individual tree model for site 1
!part 1   
! filter site !select 1
dbh ~ mu rep !r rep.iblock tree family

# Uses an individual tree model for site 2
!part 2   
! filter site !select 2
dbh ~ mu rep !r rep.iblock tree family

# Both sites, as single trait but different
# error variances
!part 3   
dbh ~ mu site site.rep !r site.rep.iblock,
      site.tree site.family

# there are two separate errors, with 
#one dimension
2 1 0 

# Error site 1
# 1500 observations in site 1, 0 is a 
# place holder (see spatial analysis),
# IDEN indicates an identity matrix, 
# !S2=25 is the starting value
# for residual variance (obtained from 
# running !part 1)
1500 0 IDEN !S2=25

# Error site 2
# Same explanation as before
1300 0 IDEN !S2=30

Covariance structures

When fitting simple models (as in many examples of Univariate Analysis) one needs to specify only the model equation (the bit like y ~ mu…) but nothing about the covariances that complete the model specification. This is because ASReml assumes that, in absence of any additional information, the covariance structure is the product of a scalar (a variance component) by a design matrix. For example, the residual covariance matrix in simple examples is R = I σe2, or the additive genetic variance matrix is G = A σa2 (where A is the numerator relationship matrix).

However, there are several situations when the analysis require a more complex covariance structure, usually a direct sum or direct product of two or more matrices. For example, an analysis of data from several sites might consider different error variances for each site, that is R = Σd Ri, where Σd represents a direct sum (see any matrix algebra book for an explanation) and Ri is the residual matrix for site i.

Other example of a more complex covariance structure is a Multivariate Analysis in one site, where both the residual and additive genetic covariance matrices are constructed as the product of two matrices. For example, R = I * R0, where I is an identity matrix of size number of observations, * is the direct product operation (do not confuse with a plain matrix multiplication) and R0 is the error covariance matrix for the traits involved in the analysis. Similarly, G = A * G0 where all the matrices are as previously defined and G0 is the additive covariance matrix for the traits.

You will see that the ASReml notation for this type of analysis closely resembles the matrix notation. Anyway, ASReml supports a large number of covariance structures (and I will present only a few of them), which are particularly useful for longitudinal and spatial analysis. The structures are easier to understand (at least for me) if we express a covariance matrix (M) as the product of a correlation matrix (C) pre- and postmultiplied by a diagonal matrix (D) containing standard deviations for each of the traits. That is:

    | v11 c12 c13 c14 |         |  1  r12 r13 r14 |
    | c21 v22 c23 c24 |         | r21  1  r23 r24 |
M = | c31 c32 v33 c34 |     C = | r31 r32  1  r34 |
    | c41 c42 c43 v44 |         | r41 r42 r43  1  |

    | s11  0   0   0  |
    |  0  s22  0   0  |
D = |  0   0  s33  0  |
    |  0   0   0  s44 |

M = D*C*D

where the v are variances, the r correlations and the s standard deviations.

If we do not impose any restriction on M, apart from being positive (p.d.) definite, we are talking about an unstructured matrix (US in ASReml parlance). Thus, M or C can take any value (as long as it is p.d.) as is usual when analyzing multiple trait problems.

There are cases when the order of assessment or the spatial location of the experimental units create patterns of variation, which are reflected by the covariance matrix. For example, the breeding value of an individual i observed at time j (aij) is a function of genes involved in expression at time j - k (aij-k), plus the effect of genes acting in the new measurement (αj), which are considered independent of the past measurement aij = ρjk aij-k + αj, where ρjk is the additive genetic correlation between measures j and k.

Rather than using a different correlation for each pair of ages, it is possible to postulate mechanisms which model the correlations. For example, an autoregressive model (AR in ASReml lingo), where the correlation between measurements j and k is r|j-k|. In this model M = D*C_AR_*D, where C_AR_ (for equally spaced assessments) is:

       |  1  r^1 r^2 r^3 |
       | r^1  1  r^1 r^2 |
C_AR = | r^2 r^1  1  r^1 |
       | r^3 r^2 r^1  1  |

A model including this structure will certainly be more parsimonious (economic on terms of number of parameters) than one using an unstructured approach. Looking at the previous pattern it is a lot easier to understand why they are called ‘structures’. A similar situation is considered in Spatial Analysis, where the ‘independent errors’ assumption of typical analyses is relaxed. A common spatial model will consider the presence of autocorrelated residuals in both directions (rows and columns). Here the level of autocorrelation will depend on distance between trees rather than on time.

Another structure, based on random regressions, is explained in the Longitudinal Analysis section of the cookbook. ASReml allows fitting many more different structures, so see section 4.9 (variance model specification) of the manual for more details.

Multivariate analysis

Page contents:


Single site, multiple traits

By now, ASReml’s univariate notation should start making sense. After enjoying the speed and flexibility of the program for the univariate case, you are probably thinking of moving to more complex analyses. The next step will be to run two or more traits in one site.

Multiple traits one site
 tree_ID    !P
 mother    100 !I
 subrace     8 !A
 locality   30 !A
 rep         5 !I
 iblock     23 !I
 plot      100 !I
 BD
 PPY
 FL
 MFA
 CC
woodped.csv !CSV
woodtraits.csv !CSV !skip 1 !dopart $A

!part 1 # Basic Density
BD ~ mu subrace rep !r tree_ID
0

!part 2 # Cellulose Content
CC ~ mu subrace rep !r tree_ID
0

!part 3 # Bivariate analysis 
BD CC ~ Trait Trait.subrace Trait.rep !r ,
        Trait.tree_ID !f mv
1 2 1 
0
Trait 0 US 645 1 0.32 

Trait.tree_ID 2
Trait 0 CORR 0.8 516 1.51
tree_ID

Parts 1 and 2 of the previous example are just the univariate runs that will provide starting values for the multivariate analysis. Part 3 introduces some new notation. The left hand side (LHS) of the model equation now has two variables: BD and CC. Rather than mu we use the keyword Trait. Trait tells ASReml to hold a vector of results (rather than one overall mean -mu- we will have as many overall means as traits there are on the LHS. We also add an interaction between Trait and all other terms of the equation, so we get results for, say subrace, for each of the traits.

The 1 2 1 specifies (from left to right) that there is ONE independent error structure, which consists of the direct product of TWO matrices and that there is ONE additional covariance structure to define (which corresponds to Trait.tree_ID). The latter has to be defined because it is more complex than just the product of a design matrix by a scalar. The definition of the error structure comprises two matrices: an identity matrix of size number of observations (the 0 that tells ASReml to count the records) and a 2×2 covariance matrix (the size is specified by Trait; you could also use 2 rather than Trait here). This covariance matrix is unstructured (US) and has error variance of 645 for BD, 0.32 for CC (both values coming from the univariate analyses) and 1 for the error covariance between the traits (we are just guessing this number). The 0 between Trait and US is a place holder that we will use when dealing with spatial analysis.

Now we need to deal with the additional structure. We first have the name of the term for the structure (Trait.tree_ID) followed by the number of matrices to be multiplied (2). The first matrix is of dimension 2×2 (specified by Trait as in the case of the error structure) and is specified as a correlation matrix (CORR) in this case. You can use a US matrix as well, but correlation matrices tend to behave better and make convergence easier. The values are first the correlation between traits (just a guess) and the additive variances for BD (516) and for CC (1.51). The second matrix is the numerator relationship matrix, specified by tree_ID (which was defined as pedigree when using !P).

To make things a bit clearer concerning the US and CORR matrices, they are read as a triangular matrix row by row from left to right (US) and as a lower triangular matrix row by row from left to right, followed by one or more variances (CORR). Thus, if you have a covariance matrix A and a correlation matrix B, where v is variance, c is covariance and r is correlation, and the indices refer to the traits:

     v11 c12 c13 c14           v11 r12 r13 r14
     c21 v22 c23 c24           r21 v22 r23 r24
A =  c31 c32 v33 c34      B =  r31 r32 v33 r34
     c41 c42 c43 v44           r41 r42 r43 v44

Then the starting values of the US matrix would be specified in the following order: v11 c21 v22 c31 c32 v33 c41 c42 c43 v44; while the values for the CORR matrix would be in this order: r21 r31 r32 r41 r42 r43 v11 v22 v33 v44.

Another example is the analysis of a CP progeny trial using a parental model:

Bivariate analysis for CP trial
# don’t use !P because is not a tree model
 treeID         
 mom   12 !I  
 dad   12 !I
 fam  105 !I
 block 10
 BD
 PY
 # Skip first line of file
 # because it contains columns names
woodprops.txt !skip 1 !dopart $A
!part 1
# GCA and SCA model
# mom and(dad) create only one factor
BD ~ mu block !r mom and(dad) fam
0

!part 2
PY ~ mu block !r mom and(dad) fam
0

!part 3
# Bivariate GCA and SCA model
BD PY  ~ Trait Trait.block !r Trait.mom and(dad),
         Trait.fam
1 2 2
0
Trait 0 US 0.0038 0.0005 0.0027 !GP

Trait.mom 2
Trait 0 CORR 0.9 0.0003 0.003 !GP
mom

Trait.fam 2
Trait 0 CORR 0.2 0.0002 0.003 !GP
fam

There are several elements in common with the previous example, so I will mention the few new issues. In the case of the Controlled Pollinated trial we know both parents, so we can get an estimate of dominance using the fam term, which refers to the family code. Because we are using a parental model rather than an individual model, we use mom and(dad) to overlay the design matrices and create a unique prediction for each parent, rather than one as male and one as male. This overlay creates a unique factor, something that you can see in part 3, where the structure Trait.mom refers to mom and(dad). A few points to highlight: the numbers after the model equation are now 1 2 2, because there is still only ONE error structure that is the product of TWO matrices, but now there are TWO additional covariance structures to be defined (Trait.mom and Trait.fam). Finally, the use of !GP to enforce positive definite covariance matrices because, in this specific dataset, some of the parameters tended to be outside the parametric space (e.g. correlation over 1).

top of page

Two sites single trait treated as multivariate

In the univariate analysis page you can find the code for multiple site analysis considering either unique variances for additive variance and error variance or single additive variance and multiple error variances (one for each site). Another option is to consider the expressions of a trait in each site as a different trait (this goes back at least to 1952 in a paper by Doug Falconer). For example, diameter in site 1 would be considered as a different trait from diameter at site 2; thus, each trait is going to have its own additive and residual variances. The following example shows code for two sites, but it is easy to extend to any number of traits.

Multiple sites as multivariate analysis
 tree      !P       #V1
 site       2       #V2
 subrace   20       #V3
 rep        5       #V4
 iblock    20       #V5
 dbh_04             #V6
# These two new variables are created only
# to show an alternative way of setting up
# the problem
 dbh_site1 !=V2 !=1 !*V6 !M0    
 dbh_site2 !=V2 !=2 !*V6 !M0
multivarped.csv !csv
multivardat.csv !csv !dopart $A

# Univariate analysis of site 1 to get
# starting values
# We filter data to keep only site 1
!part 1          
!filter site !select 1
dbh_04 ~ mu subrace rep !r rep.iblock tree
0

# We now filter data to keep only site 2
!part 2     
!filter site !select 2
dbh_04 ~ mu subrace rep !r rep.iblock tree
0 

# Multivariate analysis considering both sites
# No need to filter now
!part 3 
 dbh_04 ~ mu site site.subrace site.rep !r site.rep.iblock,
         site.tree
2 1 2
# The errors are independent between sites, so
# the covariance will be 0. Assuming 4000
# observations for site 1 and 3500 for site 2
4000 0 IDEN !S2=1335
3500 0 IDEN !S2=1102

# Incomplete block effects are independent
# between sites, so a diagonal structure is 
# ideal. It is not possible to use DIAG for
# the error, so there we use a covariance of 0
site.rep.iblock 2
site 0 DIAG 62 50   
rep.iblock

# The additive performance on both sites is
# correlated, and we use CORR to get a direct
# estimate of that correlation
site.tree 2
site 0 CORR 0.85 752 600
tree

# Equivalent multivariate analysis using 
# the new traits created
!part 4  
dbh_site1 dbh_site2 ~ Trait Trait.subrace Trait.rep,
                      !r Trait.rep.iblock Trait.tree
1 2 2
0
Trait 0 US 1335 0 1102

Trait.rep.iblock 2
Trait 0 DIAG 62 50
rep.iblock

Trait.tree 2
Trait 0 CORR 0.85 752 600
tree

A few explanations now. The previous example uses two alternative ways to code the problem. The first one (!part 3) uses site interacting with all the terms of the model equation, including tree. This creates results for each factor in both sites. The 2 1 2 after the model equation refers to TWO error structures, which are identity matrices of size number of observations (represented by the first value of the line). This is because the errors are independent between sites. The other TWO refers to the additional structures (site.rep.iblock and site.tree). site.rep.iblock is the variance of incomplete blocks in each site, and, because they are independent, we use a diagonal structure with the variances for each site as starting values. site.tree represents the tree (or additive genetic) variance in each site. Because the trees are related there is a correlation between the effects ib both sites. We can estimate the correlation directly if we use CORR (and it makes convergence easier). We could still use a US structure as, for example, site 0 US 752 1 600.

The second approach (!part 4) creates a trait for diameter in each site, so for observations in site 1 dbh_site1 = dbh_04, while dbh_site2 is a missing value. Note the use of Trait rather than site as well as the use of a comma at the end of the first line of the model equation, to indicate that continues in the next line. In the case of the additional structures, Trait 0 DIAG 62 50 represents a 2×2 diagonal matrix with diagonal elements 62 and 50, while rep.iblock represents the incomplete block design matrix. Trait 0 CORR 0.85 752 600 is a 2×2 correlation matrix, while tree represents the numerator relationship matrix.

Estimating heritability

ASReml has very comprehensive facilities for estimating functions of variance components. This is done through a ‘.pin file’, which is a text file—separate from the original job file—containing commands to combine the variance components reported in the .asr file. ASReml will automatically calculate the standard error for each function of variance components used in the .pin file.

There are three types of functions supported in a .pin file:

  1. Linear combinations (F): sums of variance components to create phenotypic variances, for example.
  2. Ratios (H): to calculate the ratio of two variance components as in heritabilities.
  3. Correlations ®: to calculate, well, the correlation between two traits.

A .pin file uses the position of the variance components in the .asr file to denote the operations. Thus, let’s assume that the .asr file presents the following results:

Source  Model  terms   Gamma    Component  Comp/SE  % C
tree     210    210   0.922524    1.02382    3.01   0 P
Variance 228    226   1.00000     1.10980    5.13   0 P

tree (representing additive variance) is in the first position and Variance (the residuals) is in the second position. A .pin file to calculate heritability and its standard error would look like:

F pheno 1 + 2
H herit 1 3

This tells ASReml to sum the first and second variance components (additive and residual variances) and store it in a variable called ‘pheno’, which is the third variance component now. Then ASReml calculates heritability as the ratio of the first variance component (additive variance) and the third (phenotypic variance). Piece of cake!

One normally creates a .pin file with the same names as the .as file. For example, if the job file is called treemodel.as, the .pin file is called treemodel.pin.

To run the .pin file one needs to first run the .as file as in asreml treemodel and achieve proper convergence for the run. Then one runs the pin file as asreml -p treemodel, which assumes that there is a treemodel.pin file. If you have named the .pin file differently, then one needs to specify the file name and extension as in asreml -p anothername.pin.

More coming soon.

Longitudinal analysis

Page contents:


Introduction

Now it is the time to start with problems that breeders face when dealing with multiple assessments of a trait. Longitudinal data arise when an individual is repeatedly assessed at different times. For example, tree height at ages 2, 5, 8 and 12 years, or data coming from increment cores.

Longitudinal data can be considered as a special case of multivariate analysis, because the ‘same trait’ is measured each time. However, longitudinal analysis differs from typical multivariate because there is an underlying continuum (‘time’) and the sequential nature of assessments (height at age 5 comes first than height at age 8) creates patterns of variation.

Longitudinal data allow us to model changes of heritability and genetic correlation with age. Furthermore, data from several assessments can be pooled to improve the prediction of breeding values at a given age. This type of datasets is frequent in breeding programs (especially in the ones where labour and other assessment costs are low). Despite of this, its analysis has mostly been reduced to univariate analysis by age or, in special cases, to bivariate analysis to obtain age-age correlations.

The steps for prediction of breeding values (or family effects) for longitudinal data are almost identical to those for univariate and multivariate analysis. Nevertheless, we can use the existence of patterns to simplify the analysis. The use of an unstructured (US) covariance matrix is not necessarily the best option. If there are m measurements, the covariance matrix involves m(m+1)/2 components. In other words, increasing the number of assessments by m increases the number of estimates by m(m+1)/2. Therefore it is easy to see that the amount of information to estimate parameters is somewhat reduced (and also the ‘quality’ of the estimates). It is possible to use other structures to reduce the number of parameters to estimate and still explain the observed patterns of correlation.

The above description is very simple and brief. If you require more detail on the description of the different models, their interrelationships and examples of application in forest genetics see my references [7] and [10] (pdf versions available for download). Yes, this is an example of shameless self promotion.

top of page

Longitudinal analysis in ASReml

There are basically two ways of setting up the datasets for longitudinal analysis, either as multiple traits for each record or as multiple records for a single trait. In any case, you will need to setup observations for all times for all individuals. Thus, if you do not have observations for some times it is necessary to specify the missing values (see below):

Data as a single trait
tree fixed_effects random_effects assessment_No age trait
1001      1              25              1       5    4.2
1001      1              25              2       8    7.9
1001      1              25              3      12   14.3
.
.
.

Data as multiple trait
tree fixed_effects random_effects  trait1 trait2 trait3
1001      1              25         4.2    7.9    14.3
.
.
.

You can have multiple fixed and multiple random effects in any of the setups. In the case of the single trait setup it is necessary to use the !repeat command in the pedigree file line, because the same tree ID is used several times (as many times as assessments). The example below assumes a single trait setup for the dataset and considers three assessments for 1526 trees in a single site with 8 reps (just to make it small). Replicates are considered a fixed effect.

Analysis of longitudinal data under different models
 tree   !P
 mum    50 !I
 dad     1
 rep     8
 assess  3 !I
 age
 height !m0
RRlongit.csv !REPEAT
RRlongit.csv !CSV !maxit 20 !DOPART $A

!PART 1  # Plain vanilla US
height ~ assess assess.rep !r assess.tree !f mv
1 2 1 !ASMV 3  !STEP 0.1
1526
assess 0 US !+6 !GP
         0.46
         0.39 0.97
         0.48 0.83 2.36

assess.tree 2
assess 0 US !+6 !GP
         0.13
         0.20 0.34
         0.24 0.41 0.51
tree

!PART 2  # Same as before but using CORR
height ~ assess assess.rep !r assess.tree !f mv
1 2 1 !ASMV 3  !STEP 0.1
1526
assess 0 US !+6 !GP
         0.46
         0.39 0.97
         0.48 0.83 2.36

assess.tree 2
assess 0 CORR !+6 !GP
         0.95
         0.78 0.92
         0.13 0.34 0.51
tree

!PART 3  # Autoregressive tree (using POW)
height ~ assess assess.rep !r assess.tree !f mv
1 2 1 !ASMV 3  !STEP 0.1
1526
assess 0 US !+6 !GP
         0.46
         0.39 0.97
         0.48 0.83 2.36

assess.tree 2       
assess 0 POW !+4 !GP
         0.96
         0.13 0.34 0.51
tree
5 8 12 # These are the ages

!PART 4  # Random regression, full fit
height ~ assess assess.rep !r pol(age,2).tree !f mv
1 2 1 !ASMV 3  !STEP 0.1
1526
assess 0 US !+6 !GP
         0.46
         0.39 0.97
         0.48 0.83 2.36

pol(age,2).tree 2
pol(age,2) 0 US !+6 !GP
             5.20
             1.80 1.10
             0.01 0.10 0.01
tree

!PART 5 # Random regression, reduced fit
height ~ assess assess.rep !r pol(age,1).tree !f mv
1 2 1 !ASMV 3  !STEP 0.1
1526
assess 0 US !+6 !GP
         0.46
         0.39 0.97
         0.48 0.83 2.36

pol(age,1).tree 2
pol(age,1) 0 US !+6 !GP
             5.20
             1.80 1.10
tree

!END

For the US, CORR and POW (autoregressive) structures the factor assess is used in the same way as site for the multiple site or Trait for the multivariate case. Trait is not used here because there is only a single trait. Parts 1 and 2 are equivalent to a full multivariate case when traits are different, and are included here to start from the most generic case. However, given that we are assessing the same trait at different times, it is expected to have a large degree of autocorrelation, which can be used to simplify the model.

Part 3 presents the use of an autoregressive structure (POW), where we can specify the ages of assessment, allowing for irregular lags between measurements. In some case we can improve fitting of the model specifying a different time scale, for example a square root or logarithmic transformation. Then, rather than using times 5 8 12, we would use 2.236 2.828 3.464 (square root) or 1.609 2.079 2.485. In the autoregressive case we often specify a single correlation (0.96) and different variances for each assessment (0.13 0.34 0.51), although it would be possible to specify a single variance when we have homogeneous variances (usually after some transformation to stabilize variances).

Parts 4 and 5 show examples of using full fit and reduced fit random regression models, which is explained in the next section. For all parts, the command !ASMV 3 tells ASReml to consider the univariate setup of each record as multivariate, with 3 records (measurements) for each individual).

top of page

Random regressions

The phenotypic trajectory of a trait (dependent on time) can be expressed through a mathematical function tractable in a mixed linear model framework, for example, using polynomial regression, growth models, or cubic splines. Each part of the model (e.g. replicate, plot, additive effects, residuals) can be modelled using a different function. For example, the additive genetic effect of tree i on time j (a_ij_) as a polynomial regression:

When using random regressions for modelling factors, rather than obtaining a covariance matrix of dimensions (numbers of observations) * (number of observations), we get a covariance matrix of the random regression parameters, that is of dimensions (number of coefficients) * (number of coefficients). As an example, for tree height assessed at 10 ages, the covariance matrix for the additive genetic matrix would be 10*10. However, if we fit a polynomial of third degree for the additive genetic values, the covariance matrix for the random regression parameters would be 4*4 (fitting intercept, x, x2 and x3).

Once we have the covariance matrix for the random parameters, it is very easy to obtain the full covariance matrix using: G = Z Q Z`, where Q refers to the covariance matrix of the random regression parameters and Z to the design matrices for times.

If we use a polynomial of degree (n-1), where n is the number of assessmenst, we will get a perfect (full) fit for the matrix under modelling. However, the interesting part is to find a good reduced fit (degree lower than n-1) more parsimonious model. Part 4 correspond to a full fit random regression, where for 3 assessments we are fitting a quadratic polynomial. The pol(d) ASReml function fits a an orthogonal polynomial of degree d, and here is interacted with tree, so there is a different polynomial for each tree. Given that tree is a random effect, the output will include the variances for each of the coefficients, as well as the covariances between coefficients. Part 5 represents a reduced fit random regressions model, where we are fitting a linear regression, reducing the number of coefficients needed in the model.

The following example shows how to calculate the full covariance matrix of additive genetic effects for 3 assessments, based on the output from a full fit quadratic polynomial (Part 4) and a reduced fit linear polynomial.

Example of modelling using Random regressions
# Full (quadratic) fit from
# Part 4 (6 parameters)
Design matrix for random regression coefficients
(in .res file)
    | 1.  −1.      1.    |
Z = | 1.  - .137  - .503 |
    | 1.   1.013    .981 |

Covariance matrix for the random regression coefficients 
(in .asr file)
    | 1.84386      .734105  - .040207 |
Q = |  .734105     .501793    .061487 |
    | -.040207     .061487    .037533 |

Reconstructed additive genetic covariance matrix (G, 
using  any matrix algebra package)
G = Z Q Z’
    |  .711588     1.0615705    1.3042244 |
G = | 1.0615705    1.7105521    2.3399636 |
    | 1.3042244    2.3399636    3.9255211 |

# Reduced (linear) fit from 
# Part 5 (3 parameters)
Design matrix for random regression coefficients 
(in .res file)
    |   1.  - 1.    |
Z = |   1.  -  .137 |
    |   1.    1.013 |

Covariance matrix for the random regression coefficients 
(in .asr file)
Q = | 1.78707      .757768 |
    |  .757768     .446248 |

Reconstructed additive genetic covariance matrix (G, 
using  any matrix algebra package)
G = Z Q Z’
    |  .717782      .9866238    1.3448718 |
G = |  .9866238    1.5878172    2.388944  |
    | 1.3448718    2.388944     3.7802338 |

# Compare to unstructured (US) result from 
# Part 1 (6 parameters)
    |  .71157   1.06150   1.30423 |
G = | 1.06150   1.71025   2.33960 |
    | 1.30423   2.33960   3.92577 |

In some cases (like in this example), it will be possible to have substantial reductions on the number of parameters to estimate, without big differences on the estimated covariance matrices.

Spatial analysis

Page contents:


Introduction

Forestry progeny tests may show large environmental variability, especially when they contain large numbers of entries, which increases the large of the replicates. The existence of environmental gradients, patchy environmental conditions and competition between individuals creates association between closely located individuals. In some cases this association is positive (e.g. environmental gradients), while in other cases is negative (e.g. competition). There is a large number of models that have been put forward to model the presence of spatial association, but here the presentation will be limited to separable autoregressive (AR) processes.

The Genetic Improvement Program at the CRC for Sustainable Production Forestry has been working in spatial analysis for a few years now. There are at least two good references for spatial analysis in forest genetics, where the analyses were conducted using AS Reml (papers from the Can J For Res are available here subject to these copyright and author rights notices):

  • Costa e Silva, J., Dutkowski, G.W. and Gilmour, A.R. 2001. Analysis of early tree height in forest genetic trials is enhanced by including a spatially correlated residual. Canadian Journal of Forest Research 31: 1887–1893. (PDF paper, Danger Will Robinson! 1.1 Mb file, not-so-fast server. If you can, just go to your library).
  • Dutkowski, G.W., Costa e Silva, J., Gilmour, A.R. and Lopez, G.A. 2002. Spatial analysis methods for forest genetic trials. Canadian Journal of Forest Research 32: 2201–2214. I won’t put a link to the 2.5 Mb file with nice colour plates.

top of page

Spatial analysis in ASReml

The following code is adapted from code borrowed from Greg Dutkowski (thanks Greg!). The example presents a ‘typical’ complete block design and then adds spatial components to it.

Examples for spatial analysis
 tree       !P
 family  50 !I
 row     19
 col     84
 rep      5
 iblock  10
 plot   400
 dbh
radiata.ped !make !alpha
radiata.csv !csv !dopart $A

# Analysis as random incomplete blocks
# includes additive (tree) and SCA (family) effects
!part 1
dbh ~ mu rep !r rep.iblock plot tree family !f mv
0

# Analysis as random incomplete blocks
# but adding spatial row and col autoregressive
# processes and independent error (units)
!part 2
dbh ~ mu rep !r rep.iblock plot tree family units !f mv
1 2 0
row row AR 0.8
col col AR 0.8

# Spatial analysis dropping non-significant design
# features (see text)
!part 3
dbh ~ mu rep !r tree family units !f mv
1 2 0
row row AR 0.8
col col AR 0.8

# Analysis as random complete blocks
# with spatial component. However, spatial is
# represented in the G matrix rather than the 
# R matrix
!part 4
dbh ~ mu !r row.col tree family
1 1 1
0 0 IDEN
row.col 2
row row AR1V 0.9 01
col col AR 0.9

Part 1 of the code is simply an incomplete block design with random plot, additive (tree) and SCA (Family) effects. You can see explanations for this code in the Univariate Analysis section of the cookbook. Part 2 introduces a few additional elements:

  • The 1 2 0 states there is ONE error structure that is the Kronecker product of TWO matrices (that are explained below).
  • Separate autoregressive (AR) errors for rows and columns. The first row states the factor that we are using to fit the AR residual. The second row specifies the ‘distance’ measure. Same thing for col. the 0.9 is the starting value for each of the autoregressive coefficients.
  • units refers to the uncorrelated (non spatial) residual. Thus, our model includes spatially associated residuals (explained in the previous two points) and spatially independent residuals. Do not forget to include units, because if it is not in the model you will probably get inflated estimates of heritability and think that spatial analysis is the best thing after sliced bread (and a long warm shower and …). Well, it is not. It can help you with patchy situations (for example disease scores) or to discover assessor effects, but in well designed experiments and standard traits (e.g. volume, basic density) will not make much of a difference.
  • You will also need to fit missing values as fixed effects (!f mv). Most likely you do not have a perfectly rectangular experiment, but you can create a rectangular matrix with missing values and then fitting the missing values in the model. This will speed up (or even make possible) the use of spatial analysis in ASReml. Remember: to use part 2 or part 3 (modelling spatial through the residual matrix R) you must have a rectangular distribution of trees.

Most of the time, when you fit spatial and independent residuals (Part 2), the experimental design features will become non significant and you will drop them from the model (the exception, if any, will probably be the plot effect). Part 3 shows the likely form of the model without incomplete block and plot effects. Part 4 is a bit more intriguing, although it is equivalent to Part 3. There, rather than modelling the spatial residuals in the R residual matrix, we are modelling them as additional random effects, that is in the G matrix. Thus,

  • 1 1 1 There is ONE residual structure, which is the product of ONE identity matrix multiplied by a scalar. There is also ONE additional covariance structure.
  • The additional covariance structure row.col is the Kronecker product of TWO matrices, which have the same structure as the autoregressive residual matrices of Parts 2 and 3.
  • Modelling spatial correlation through G does not require a rectangular distribution of trees. Part 4 is faster, but it may not run for large arrays.

You can revisit a longer explanation of Covariance Structures. You might also be interested in ENV, a program for plotting spatial residuals.

top of page

Spatial analysis for irregularly spaced trees

The above code works well with equally spaced individuals; however, there are examples when the distances between trees are variable (for example when using GPS measurements of position). In this case, the code can be changed to something like (Thanks to Arthur Gilmour):

# Scale the coordinates so a typical distance 
# is around 1.
 y  !/10     
 x  !/10

filename.dat !dopart $A
!part 1
dbh ~ mu
0

# With Nugget variance fitted as residual
!part 2  
dbh ~ mu !r fac(x,y)
0 0 1
fac(x,y) 1
fac(x,y) fac(x,y) IEXPV 0.2 1

# with Nugget variance in G structure, 
# spatial in residual
!part 3
dbh ~ mu x y !r units 1
1 1 0
0 910 EUC 0.2

In these jobs we are not including any genetic components, just to highlight the spatial syntax. Part 1 obviously fits only the overall mean. Part 2 creates a factor based on the combinations of two continuous variables, and allows for the spatial association in the G structure (in the fac(x,y) 1 … part). Part 3 fits the distances as covariates and allow for the spatial association at the R structure level. In the code 0 910 EUC 0.2, the 910 refers to variables 9 and 10 in the listing of read variables (x and y). Parts 2 and 3 are memory hogs, so try to stay clear of them if you can. An option would be to create an equally spaced grid and then assign each tree to the closest point in that grid. This would convert the problem to something that can be treated with the code used at the beginning of this page.

General resources

Page contents:


Other ASReml sites

  • Steve Kachman’s ASReml Workshop notes present examples of code, including .pin files for functions of variance components.
  • Julius van der Werf organises the Armidale Animal Breeding Summer Courses (University of New England, Australia). Module C of the 2005 course includes PowerPoint presentations by Arthur Gilmour. The page contains links to four sets of slides (around 3MB each set).
  • A simple example of fitting a model with asreml-R in the R wiki.
  • A PDF version of a presentation on using the asreml library in R by David Butler.

Race classification

One of the assumptions of genetics analyses is that all individuals come from a single population with common mean and variance. It is clear that in several species of Eucalyptus there are large population differences. Therefore, it is necessary to fit the population of origin in the model equation, either as a fixed effect (e.g. when you have first generation selections, so trees come from only one population) or using genetic groups (e.g. advanced generations, when there are crosses between populations). We are trying to standardise the populations used for the analyses of E. globulus using the subraces shown in the race classification map devised by Dutkowski & Potts (Australian Journal of Botany 47:237–263. 1999).

top of page

CRC-SPF general

If you are interested in research material generated by the CRC-SPF check the publications database, where you will find references and electronic copies of papers, reports and presentations. This database covers not only genetics but also all the other research areas of the CRC for Sustainable Production Forestry. Note: access to CRC resources requires login in the publications management system. Please login using username = guest, password = guest.

There is a large number of courses, presentations, seminars, meetings going on at the CRC for Sustainable Production Forestry. Check the calendar to see all public events.

ENV (freely available) is a program for plotting spatial residuals to monitor analyses using ASReml.

Blockit (freely available) is a DOS (windows terminal) program which creates complete rectangular arrays of data for spatial analysis for one or many blocks/sites.

top of page

References

If you are interested in genetic analysis of progeny tests, and don’t know where to start, browse a few books and sets of notes until you find the ones you feel more comfortable with. You will probably need to start with a quantitative genetics book, then with books or notes on breeding value prediction using BLUP, as well as some papers to have a look at what is going on in your area of research.

  • Books
    • Lynch, M. and Walsh, B. 1998. Genetics and analysis of quantitative traits. Sinauer Associates. 980 p (ISBN 0–87893–481–2). This is the ‘new classic’ for quantitative genetics, and a wonderful place to start learning about the topic. You can also visit the web site for the book, with plenty of references, software and drafts for the second volume (Amazon price).
    • Falconer, D.S. and MacKay, T.F.C. 1996. Introduction to quantitative genetics. Longman. 464 p (ISBN 0582–24302–5). The ‘old classic’ that has been around since 1960. A very good book, although I prefer Lynch and Walsh’s (Amazon price).
    • Mrode, R.A. 1996. Linear models for the prediction of animal breeding values. CAB International. 187 p (ISBN 0–85198–996–9). An excellent introduction to BLUP, with a large number of small examples with all the details of the calculations (Amazon price).
    • Henderson, C.R. 1984. Applications of linear models in animal breeding. University of Guelph. 423 p (ISBN 0–88955–030–1). This is more for consulting specific details, rather than a cover-to-cover book. Not for the faint hearted. It is good to have Searle’s notes as a companion volume if you want to get your hands dirty. To obtain this and the following book please contact Larry Schaeffer.
    • Searle, S.R. 1998. A mathematical supplement to C.R. Henderson’s book “Applications of linear models in animal breeding”. University of Guelph. 254 p.
  • Notes (some of my copies are photocopy of a photocopy)
    • Kennedy, B.W. 1989. Animal model BLUP. Erasmus intensive graduate course. Trinity College Dublin. 141 p. As most of Brian Kennedy’s stuff, it is extremely clear and a real pleasure to read.
    • Quaas, R.L., Anderson, R.D. and Gilmour, A.R. 1984. BLUP School Handbook. Animal Genetics and Breeding Unit, University of New England. 158 p. Nice notes and there is even a reference to autoregressive processes. I think this is just after Arthur finished his PhD at Massey.
  • Sites
    • Larry Schaeffer has lots of course material in BLUP and related topics. He also maintains some of Brian Kennedy’s notes. He has the dubious honour of having a web page where the menu looks like mixed model equations.
    • Is it a fixed or a random effect? Read a very short description at Bob Barker’s site.
    • The Virtual Library of Forest Genetics maintained by Matti Haapanen has a large number of links to forest geneticists, software, organizations, etc.
    • Ever wanted to have a look to Sir Ronald Fisher’s papers? Your library doesn’t hold the journals or books with his work? No worries, have a look at the Collected papers made available by the University of Adelaide.

When I have some time I will be adding more references to other pages and written material. If you know of any material that should be here, please let me know.

Generalised models (from discussion group)

The old version of the manual used to include an explanation for fitting generalised linear models, including binomial and poisson distributions. It is not in the current user manual, but the explanations are in the beta manual, which you can get from Obtaining AS Reml.

The following explanation is based on emails by Bruce Southey and Arthur Gilmour.

How does one calculate heritability for binomial or poisson models?

Just as you would a normally distributed trait. This is in the old manuals for the logistic models. Basically you use the variance of the link function as the residual variance. For example, the variance of the logit link is pi*pi/3 (i.e., approximately 3.29).

Are there any good references regarding generalized mixed models, particularly for genetic analysis?

There are papers that use ASReml, like Southey et al. 2003. Journal of Animal Science 81:1399-1405. Another good source is Steve Kachman’s work. He will give a course at Iowa State University on AS Reml in summer 2005.

It seems to be easier starting on binary data and David Collett’s book Modelling Binary Data is an excellent starting point. Then work up from that.

How good are ASReml results for GLMM?

Generalised Linear Mixed Models (GLMM) procedures in ASReml (which use the Schall method) may be seriously biased when applied to animal models (which is why they have not been described in the User Guide). The situation for Sire (Family) models seems to be a bit better. It is not easy to quantify the bias in any particular case without doing a simulation study.

Arthur’s explanation for this problem is that as the mean incidence becomes more extreme, the likelihood of fixed classes having all 0’s or all 1’s increases, and then there is no genetic information available from that class. However, dropping such classes from the analysis, is likely to change the overall mean, hence the overall model. There are strategies for handling this, like making contemporary groups RANDOM at the lowest level, or collapsing classes.

Even if all fixed effects have a welled defined mean (not 0 or 1), the animal model is still fitting an effect for every individual and the only constraint on that effect being ± infinity is that it has a correlation with other effects which may have the opposite sign. My own studies 20 years ago suggested that there was a tendency for the estimated variance component to be biased down (it is too small).

Other formulations of the GLMM model (e.g. using a linear predictor ignoring the animal effect) have slightly different behaviour but ASReml only has the one option in this regard.

As the writer of ASReml I am not prepared to say you can confidently accept answers you might get from ASReml when estimating heritability on the liability scale from BINARY data. Sometimes they will be fine but I am not able to give the rules when that is the case. Apparently Bruce has had no problems with the situations he has encountered which is encouraging.

Log likelihood is not suitable for comparing binomial models

The LogL value reported by ASReml is for the analysis of the working variable. As this working variable changes between iterations and especially between models, the LogL values are not comparable - even to confirm convergence of a particular model.

ASReml also calculates a ‘variance heterogenity factor’, which is the Deviance/DF (the deviance is calculated for the underlying maximum likelihood model). In a GLM (no random effects) this deviance can be used to compare models. However, for GLMM (with random effects) it does not represent the total variation, because it ignores the part related to the random effects. So it also is unsuitable as a basis for testing in GLMM models.

Real recipes

What is a cookbook without some food related recipes? Here there is a couple of things that you can prepare while analyses are running.

Pizza

Dough

T is tablespoon, t is teaspoon, C is cup

1 T dry baker’s yeast.
0.5 t salt.
1 T honey.
1 C warm water but not too warm!
1 T sunflower oil.
3 C flour (2 white plain flour and 1 whole wheat or rye).

Mix first five ingredients together and let sit until bubbly. Stir in flour and then knead on a well-floured board, adding more flour if needed. Knead until smooth and even. Placed in an oiled medium-sized bowl, cover and put in a warm place to rise until doubled in size (approximately one hour). Preheat oven to 200C (400 F). Pat dough onto a greased pizza pan and put in the oven for five minutes. Allow to cool slightly and then spread a thin film of oil on top of partially cooked crust. Top with sauce and desired ingredients. Cook another ten to fifteen minutes or until bubbly.

The dough is suitable for freezing. Just cook the pizza dough for around twelve minutes instead of five.

Examples of topping

  • Pesto, cottage cheese, olives and mozzarella cheese.
  • Tomato paste, mozzarella cheese, shaved smoked salmon, capers, avocado and cherry tomatoes.
  • Tomato paste, ham, pineapple and banana.

Beef with plum sauce and Chinese greens

Olive oil (around 2 T)
Sesame oil (around 1T)
Freshly ground black pepper
1 4cm think piece of rump steak (around 500g)
12 eshallots
1 T Worcestershire sauce
1 C red wine
Half a cup of plum sauce
1 can of plums cut in half and pitted (drained)
2 bunches of Chinese greens (e.g., Choy Sum, Buk Shoi, Chinese Cabbage, etc)

Steak

Massage both sides of the steak in olive oil and pepper. Heat a large pan and cook the steak on each side for around 4 minutes, remove, put in a plate and cover with foil to keep it warm.

Plum sauce

Clean the pan and then add peeled eshallots and cook them until golden. Add Worcestershire sauce and wine and reduce by half. Add plum sauce and plums, bring slowly to the boil and reduce to a simmer.

Greens

Rinse the Chinese greens and cut into 5cm lengths. Heat a large pan, add sesame oil and toss greens over a high heat until wilted.

Serving

Cut the steak into slices and place on top of the greens. Top with plum sauce.

The plum sauce can be prepared in advance and stored a few days. It tastes even better.