+ - 0:00:00
Notes for current slide
Notes for next slide



Statistical Methods for Omics Assisted Breeding

Spatial Analysis of Crop Field Trials


Emi Tanaka
emi.tanaka@sydney.edu.au
School of Mathematics and Statisitcs

2018/11/14

These slides may take a while to render properly. You can find the pdf here.

Creative Commons LicenseCreative Commons LicenseCreative Commons License    This work by Emi Tanaka is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

1 / 21

Example: Wheat Yield in South Australia

Gilmour et al. (1997) Accounting for natural and extraneous variation in the analysis of field experiments. Journal of Agricultural, Biological, and Environmental Statistics 2 269-293.

data(gilmour.serpentine,
package="agridat")
  • A randomised complete block experiment of wheat in South Australia.
  • The numbers represent the genotype number.
  • The color represent the yield with darker green higher yield and yellow the lowest.
  • The plot is made using desplot. Try replicating it on your own!
  • Note row and column contain more than once of the same genotype.
2 / 21

Skimming the data

gilmour.serpentine %>% skimr::skim()
Skim summary statistics
n obs: 330
n variables: 5
── Variable type:factor ─────────────────────────────────────────────────────────────────────────────────────
variable missing n n_unique ordered
gen 0 330 107 FALSE
rep 0 330 3 FALSE
── Variable type:integer ────────────────────────────────────────────────────────────────────────────────────
variable missing n mean sd p0 p25 p50 p75 p100 hist
col 0 330 8 4.33 1 4 8 12 15 ▇▇▇▇▃▇▇▇
row 0 330 11.5 6.35 1 6 11.5 17 22 ▇▇▅▇▇▅▇▇
yield 0 330 591.77 154.19 194 469 617.5 713.5 925 ▁▂▅▅▆▇▅▁
dat <- gilmour.serpentine %>% mutate(rowf=factor(row), colf=factor(col))
3 / 21

Baseline Analysis for Randomised Complete Block Design

fit_base <- asreml(yield ~ gen, random=~rep, data=dat, trace=F)
effect component std.error z.ratio constr
rep!rep.var 12740 12860 0.99 P
R!variance 13360 1271 11 P

Spatial Modelling

fit_ar1xar1 <- asreml(yield ~ gen, random=~rep, data=dat, trace=F,
rcov=~ar1v(rowf):ar1(colf))
Error in asreml.chkOrd(Y, Var, dataFrame): Data frame order does not match that specified by the R model formula.
  • As the error message implies, the data.frame is not in expected order.
  • The rcov structure implies that col needs to be ordered within row.

Note Gilmour et al. (1997) did not include the rep effect but we include it here to respect the experimental design.

4 / 21

Spatial Modelling: Local Trend

# order data so col within row
dat <- dat %>% arrange(row, col)
# now no error message
fit_ar1xar1 <- asreml(yield ~ gen, random=~rep, data=dat, trace=F,
rcov=~ar1v(rowf):ar1(colf))
# see variance parameter estimates
lucid::vc(fit_ar1xar1)
effect component std.error z.ratio constr
rep!rep.var 0 NA NA B
R!variance 1 NA NA F
R!rowf.cor 0.904 0.018 49 U
R!rowf.var 19700 3797 5.2 P
R!colf.cor 0.425 0.064 6.6 U

General idea: plots that are geographically closer, should be more similar.

5 / 21

Why AR1 \(\times\) AR1?

rcov=~ ar1v(row):ar1(col)
  • Low additional number of parameters (three) to estimate but yet the structure is flexible and anisotropic.
  • For positive autocorrelation, the structure captures the general idea that the plots that are geographically close together are more similar.
  • The AR1 \(\times\) AR1 structure generally fits well in practice.
  • The distance of plots are typically measured by the row and column position with distance in the row and column direction thought to be equidistant respectively.
  • Anisotropic means correlation in all directions can be different.
  • For example, the correlation of plots that are two rows apart and three column is \(\rho_r^2\rho_c^3\).
6 / 21

Other separable processes

If you don't believe that plots are similar in say a column direction then you can model such that the plots are uncorrelated in column direction.

rcov=~ ar1v(row):id(col)

You can of course do other combinations:

rcov=~ id(row):ar1v(col)
rcov=~ idv(row):id(col)

Log residual likelihood

fit_base$loglik
[1] -1235.222
fit_ar1xar1$loglik
[1] -1103.19
lrt(fit_ar1xar1, fit_base)
Likelihood ratio test(s) assuming nested random models.
Chisq probability adjusted using Stram & Lee, 1994.
Df LR statistic Pr(Chisq)
fit_ar1xar1/fit_base 1 264.06 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Certainly the fit of AR1$\times$AR1 model increases

7 / 21

Residual Plot

dat %>%
mutate(resid=fit_ar1xar1$residuals) %>%
ggplot(aes(col, resid)) +
geom_point() +
geom_smooth(se=F) +
facet_wrap(~paste0("Row ", row), nrow=4) +
theme_bw(base_size=18) +
labs(x="Column",
y="Conditional Residual")

There is clearly a pattern in the column direction indicating there is some effect in relation to the column we have not removed.

8 / 21

Sample Variograms for Residuals

Sample variogram is made of:

  • Variogram is a popular method to detect spatial dependence.
  • The idea here is that if we have modelled the spatial dependence appropriately in the model, the residuals should not exhibit any spatial dependence.
  • In theory, variogram will start with (0,0) and if no spatial dependence is exhibited then it should flatten out.
  • In practice, there is high variability for sample variogram of plots furthest away due to less information so it may not be as flat as you like.
9 / 21

Sample Variograms for Residuals

vario_df <- variogram(fit_ar1xar1)
x <- 0:max(vario_df$colf)
y <- 0:max(vario_df$rowf)
z <- matrix(vario_df$gamma,
nrow=length(x),
ncol=length(y))
plotly::plot_ly(x=x, y=y, z=z,
type="surface")

You can also:

myf::metplot(fit_ar1xar1)

10 / 21

Smooth global trend

Smooth global trend is incorporated using cubic smoothing spline indexed by the column number.

dat <- dat %>% mutate(lcol=scale(col, scale=F)[,1])
fit_spl <- asreml(yield ~ lcol + gen,
random=~rep + spl(lcol),
data=dat, trace=F,
rcov=~ar1v(rowf):ar1(colf))
fit_spl <- update(fit_spl)
lucid::vc(fit_spl)
effect component std.error z.ratio constr
rep!rep.var 0 NA NA B
spl(lcol) 3275 3231 1 P
R!variance 1 NA NA F
R!rowf.cor 0.794 0.034 24 U
R!rowf.var 8697 1501 5.8 P
R!colf.cor 0.27 0.077 3.5 U

Sample Variogram

11 / 21

Extraneous Variation

We fit a random column effect.

fit_rcol <- asreml(yield ~ lcol + gen,
random=~rep + spl(lcol) + colf,
data=dat, trace=F,
rcov=~ar1v(rowf):ar1(colf))
fit_rcol <- update(fit_rcol)
lucid::vc(fit_rcol)
effect component std.error z.ratio constr
rep!rep.var 0 NA NA B
spl(lcol) 3068 3088 0.99 P
colf!colf.var 4167 1984 2.1 P
R!variance 1 NA NA F
R!rowf.cor 0.431 0.0688 6.3 U
R!rowf.var 3590 444.6 8.1 P
R!colf.cor 0.359 0.0692 5.2 U

Sample Variogram

12 / 21

Possible other extraneous variation?

Could management practice have impacted the yield? Each plot were sprayed with one of the four directions: , , and .

dat <- dat %>% mutate(colcode=case_when(
col%%4==1 ~ "updown",
col%%4==2 ~ "downdown",
col%%4==3 ~ "downup",
col%%4==0 ~ "upup"))

13 / 21

Extraneous Variation

We fit a fixed colcode effect.

fit_colcode <- asreml(yield ~ lcol + gen + colcode,
random=~rep + spl(lcol) + colf,
data=dat, trace=F,
rcov=~ar1v(rowf):ar1(colf))
fit_colcode <- update(fit_colcode)
lucid::vc(fit_colcode)
effect component std.error z.ratio constr
rep!rep.var 0 NA NA B
spl(lcol) 2236 2055 1.1 P
colf!colf.var 1382 909.9 1.5 P
R!variance 1 NA NA F
R!rowf.cor 0.433 0.0686 6.3 U
R!rowf.var 3597 446.2 8.1 P
R!colf.cor 0.36 0.0692 5.2 U

Sample Variogram

14 / 21

Extraneous Variation

We fit a random row effect.

fit_rrow <- asreml(yield ~ lcol + gen + colcode,
random=~rep + spl(lcol) + colf + rowf,
data=dat, trace=F,
rcov=~ar1v(rowf):ar1(colf))
fit_rrow <- update(fit_rrow)
lucid::vc(fit_rrow)
effect component std.error z.ratio constr
rep!rep.var 0 NA NA B
spl(lcol) 2250 2050 1.1 P
colf!colf.var 1326 906.8 1.5 P
rowf!rowf.var 531 265.1 2 P
R!variance 1 NA NA F
R!rowf.cor 0.4775 0.0694 6.9 U
R!rowf.var 3015 397.1 7.6 P
R!colf.cor 0.1956 0.0922 2.1 U

Sample Variogram

15 / 21

Possible more extraneous variation?

The tractor harvested 3 rows at a time. L indicates the left of the tractor; M is middle and R is right. If the driver misaligns the center, it may affect the observed yield.

dat <- dat %>% mutate(rowcode=case_when(
row%%3==0 ~ "M",
TRUE ~ "LR"))

16 / 21

Extraneous Variation

We fit a random row effect.

fit_rowcode <- asreml(yield ~ lcol + gen + colcode + rowcode,
random=~rep + spl(lcol) + colf + rowf,
data=dat, trace=F,
rcov=~ar1v(rowf):ar1(colf))
fit_rowcode <- update(fit_rowcode)
lucid::vc(fit_rowcode)
effect component std.error z.ratio constr
rep!rep.var 0 NA NA B
spl(lcol) 2235 2037 1.1 P
colf!colf.var 1319 914.7 1.4 P
rowf!rowf.var 235.8 180 1.3 P
R!variance 1 NA NA F
R!rowf.cor 0.5012 0.0676 7.4 U
R!rowf.var 3142 419.7 7.5 P
R!colf.cor 0.1994 0.0917 2.2 U

Sample Variogram

17 / 21

Extraneous Variation

We fit a nugget/measurement error effect.

fit_nugget <- asreml(yield ~ lcol + gen + colcode + rowcode,
random=~rep + spl(lcol) + colf + rowf + units,
data=dat, trace=F,
rcov=~ar1v(rowf):ar1(colf))
fit_nugget <- update(fit_nugget)
lucid::vc(fit_nugget)
effect component std.error z.ratio constr
rep!rep.var 0 NA NA B
spl(lcol) 2163 2131 1 P
colf!colf.var 811.6 1340 0.61 P
rowf!rowf.var 200.6 148.9 1.3 P
units!units.var 1315 259.1 5.1 P
R!variance 1 NA NA F
R!rowf.cor 0.9076 0.0851 11 U
R!rowf.var 3153 1936 1.6 P
R!colf.cor 0.3818 0.1706 2.2 U

Sample Variogram

18 / 21

All Models Fitted

  • Some models have different fixed effects and thus comparison of models based on residual likelihood will be inappropriate.
  • The models fitted are not entirely the same as Gilmour et al. (1997), e.g. I did not choose to fit a linear row effect as I felt that was an overkill but I chose to continue fitting the rep effect as this was part of the experimental design. The effect of rep is essentially zero so it does not make much difference.
19 / 21

Key References

  • Gilmour et al. (1997) Accounting for Natural and Extraneous Variation in the Analysis of Field Experiments. Journal of Agricultural, Biological, and Environmental Statistics 2(3) 269-293

Extra References

These were not covered in the workshop but statistical tools good to be aware of.

  • Stefanova et al. (2010) Enhanced diagnostics for the spatial analysis of field trials. Journal of Agricultural, Biological, and Environmental Statistics 14(4) 392-410
  • Velazco et al. (2017) Modelling spatial trends in sorghum breeding field trials using a two-dimensional P-spline mixed model. Theoretical and Applied Genetics 130(7) 1375-1392
  • Rodríguez-Álvarez et al. (2018) Correcting for spatial heterogeneity in plant breeding experiments with P-splines. Spatial Statisitcs 23 52-71

Notes

  • Again remember (because it's worth repeating):

    "All models are wrong but some are useful." -George Box

  • More complex models generally fit the data well but model selection seek to find a parsimonious model.
  • Model selection is a hard statistical problem especially for linear mixed models. There is no universal model selection tools that work for every data.
20 / 21



Slides

These slides were made using the R package xaringan with the ninja-themes and is available at bit.ly/UT-WS-spatial.

Your Turn

Download day3-session05-spatial-analysis-tutorial.Rmd here, open in RStudio, push the button "Run Document" on the top tab and work through the exercises. For workshop participants, contact Emi for the tutorials.

Creative Commons LicenseCreative Commons LicenseCreative Commons License    This work by Emi Tanaka is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

21 / 21

Example: Wheat Yield in South Australia

Gilmour et al. (1997) Accounting for natural and extraneous variation in the analysis of field experiments. Journal of Agricultural, Biological, and Environmental Statistics 2 269-293.

data(gilmour.serpentine,
package="agridat")
  • A randomised complete block experiment of wheat in South Australia.
  • The numbers represent the genotype number.
  • The color represent the yield with darker green higher yield and yellow the lowest.
  • The plot is made using desplot. Try replicating it on your own!
  • Note row and column contain more than once of the same genotype.
2 / 21
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow