Forestry progeny tests may show large environmental variability, especially when they contain large numbers of entries, which increases the large of the replicates. The existence of environmental gradients, patchy environmental conditions and competition between individuals creates association between closely located individuals. In some cases this association is positive (e.g. environmental gradients), while in other cases is negative (e.g. competition). There is a large number of models that have been put forward to model the presence of spatial association, but here the presentation will be limited to separable autoregressive (AR) processes.
The Genetic Improvement Program at the CRC for Sustainable Production Forestry has been working in spatial analysis for a few years now. There are at least two good references for spatial analysis in forest genetics, where the analyses were conducted using AS Reml (papers from the Can J For Res are available here subject to these copyright and author rights notices):
- Costa e Silva, J., Dutkowski, G.W. and Gilmour, A.R. 2001. Analysis of early tree height in forest genetic trials is enhanced by including a spatially correlated residual. Canadian Journal of Forest Research 31: 1887–1893. (PDF paper, Danger Will Robinson! 1.1 Mb file, not-so-fast server. If you can, just go to your library).
- Dutkowski, G.W., Costa e Silva, J., Gilmour, A.R. and Lopez, G.A. 2002. Spatial analysis methods for forest genetic trials. Canadian Journal of Forest Research 32: 2201–2214. I won’t put a link to the 2.5 Mb file with nice colour plates.
Spatial analysis in ASReml
The following code is adapted from code borrowed from Greg Dutkowski (thanks Greg!). The example presents a ‘typical’ complete block design and then adds spatial components to it.
Examples for spatial analysis tree !P family 50 !I row 19 col 84 rep 5 iblock 10 plot 400 dbh radiata.ped !make !alpha radiata.csv !csv !dopart $A # Analysis as random incomplete blocks # includes additive (tree) and SCA (family) effects !part 1 dbh ~ mu rep !r rep.iblock plot tree family !f mv 0 # Analysis as random incomplete blocks # but adding spatial row and col autoregressive # processes and independent error (units) !part 2 dbh ~ mu rep !r rep.iblock plot tree family units !f mv 1 2 0 row row AR 0.8 col col AR 0.8 # Spatial analysis dropping non-significant design # features (see text) !part 3 dbh ~ mu rep !r tree family units !f mv 1 2 0 row row AR 0.8 col col AR 0.8 # Analysis as random complete blocks # with spatial component. However, spatial is # represented in the G matrix rather than the # R matrix !part 4 dbh ~ mu !r row.col tree family 1 1 1 0 0 IDEN row.col 2 row row AR1V 0.9 01 col col AR 0.9
Part 1 of the code is simply an incomplete block design with random plot, additive (tree) and SCA (Family) effects. You can see explanations for this code in the Univariate Analysis section of the cookbook. Part 2 introduces a few additional elements:
- The 1 2 0 states there is ONE error structure that is the Kronecker product of TWO matrices (that are explained below).
- Separate autoregressive (AR) errors for rows and columns. The first row states the factor that we are using to fit the AR residual. The second row specifies the ‘distance’ measure. Same thing for col. the 0.9 is the starting value for each of the autoregressive coefficients.
- units refers to the uncorrelated (non spatial) residual. Thus, our model includes spatially associated residuals (explained in the previous two points) and spatially independent residuals. Do not forget to include units, because if it is not in the model you will probably get inflated estimates of heritability and think that spatial analysis is the best thing after sliced bread (and a long warm shower and …). Well, it is not. It can help you with patchy situations (for example disease scores) or to discover assessor effects, but in well designed experiments and standard traits (e.g. volume, basic density) will not make much of a difference.
- You will also need to fit missing values as fixed effects (!f mv). Most likely you do not have a perfectly rectangular experiment, but you can create a rectangular matrix with missing values and then fitting the missing values in the model. This will speed up (or even make possible) the use of spatial analysis in ASReml. Remember: to use part 2 or part 3 (modelling spatial through the residual matrix R) you must have a rectangular distribution of trees.
Most of the time, when you fit spatial and independent residuals (Part 2), the experimental design features will become non significant and you will drop them from the model (the exception, if any, will probably be the plot effect). Part 3 shows the likely form of the model without incomplete block and plot effects. Part 4 is a bit more intriguing, although it is equivalent to Part 3. There, rather than modelling the spatial residuals in the R residual matrix, we are modelling them as additional random effects, that is in the G matrix. Thus,
- 1 1 1 There is ONE residual structure, which is the product of ONE identity matrix multiplied by a scalar. There is also ONE additional covariance structure.
- The additional covariance structure row.col is the Kronecker product of TWO matrices, which have the same structure as the autoregressive residual matrices of Parts 2 and 3.
- Modelling spatial correlation through G does not require a rectangular distribution of trees. Part 4 is faster, but it may not run for large arrays.
Spatial analysis for irregularly spaced trees
The above code works well with equally spaced individuals; however, there are examples when the distances between trees are variable (for example when using GPS measurements of position). In this case, the code can be changed to something like (Thanks to Arthur Gilmour):
… # Scale the coordinates so a typical distance # is around 1. y !/10 x !/10 filename.dat !dopart $A !part 1 dbh ~ mu 0 # With Nugget variance fitted as residual !part 2 dbh ~ mu !r fac(x,y) 0 0 1 fac(x,y) 1 fac(x,y) fac(x,y) IEXPV 0.2 1 # with Nugget variance in G structure, # spatial in residual !part 3 dbh ~ mu x y !r units 1 1 1 0 0 910 EUC 0.2
In these jobs we are not including any genetic components, just to highlight the spatial syntax. Part 1 obviously fits only the overall mean. Part 2 creates a factor based on the combinations of two continuous variables, and allows for the spatial association in the G structure (in the fac(x,y) 1 … part). Part 3 fits the distances as covariates and allow for the spatial association at the R structure level. In the code 0 910 EUC 0.2, the 910 refers to variables 9 and 10 in the listing of read variables (x and y). Parts 2 and 3 are memory hogs, so try to stay clear of them if you can. An option would be to create an equally spaced grid and then assign each tree to the closest point in that grid. This would convert the problem to something that can be treated with the code used at the beginning of this page.