class: split-60 title-slide2 with-border white background-image: url("images/bg4.jpg") background-size: cover .column.shade_main[.content[ <br><br> # Statistical Methods for Omics Assisted Breeding ## <b style='color:#FFEB3B'>Spatial Analysis of Crop Field Trials</b> ### <br>Emi Tanaka <br>emi.tanaka@sydney.edu.au <br>School of Mathematics and Statisitcs ### 2018/11/14 <br> ### <span style='font-size:14pt; color:pink!important'>These slides may take a while to render properly. You can find the pdf <a href="https://www.dropbox.com/s/vh0hi67uera2qr5/day3-session05-spatial-analysis.pdf?dl=1">here</a>.</span> <br><br> ### <a rel='license' href='http://creativecommons.org/licenses/by-sa/4.0/'><img alt='Creative Commons License' style='border-width:0; width:30pt' src='images/cc.svg' /><img alt='Creative Commons License' style='border-width:0; width:20pt' src='images/by.svg' /><img alt='Creative Commons License' style='border-width:0; width:20pt' src='images/sa.svg' /></a><span style='font-size:10pt'> This work by <span xmlns:cc='http://creativecommons.org/ns#' property='cc:attributionName'>Emi Tanaka</span> is licensed under a <a rel='license' href='http://creativecommons.org/licenses/by-sa/4.0/'>Creative Commons Attribution-ShareAlike 4.0 International License</a>.</span> ]] .column[.content[ ]] <img src="images/USydLogo-white.svg" style="position:absolute; top:80%; left:40%;width:200px"> --- class: split-70 with-border .column[.content[ # Example: Wheat Yield in South Australia ![](day3-session05-spatial-analysis_files/figure-html/unnamed-chunk-2-1.png)<!-- --> .bottom_abs.width100.font_small[ Gilmour et al. (1997) Accounting for natural and extraneous variation in the analysis of field experiments. *Journal of Agricultural, Biological, and Environmental Statistics* **2** 269-293. ]]] .column.bg-blue[.content.font_sm80[ ```r data(gilmour.serpentine, package="agridat") ``` * A randomised complete block experiment of wheat in South Australia. * The numbers represent the genotype number. * The color represent the yield with darker green higher yield and yellow the lowest. * The plot is made using `desplot`. Try replicating it on your own! * Note row and column contain more than once of the same genotype. ]] --- class: split-90 with-border .column[.content[ # Skimming the data ```r gilmour.serpentine %>% skimr::skim() ``` ``` Skim summary statistics n obs: 330 n variables: 5 ── Variable type:factor ───────────────────────────────────────────────────────────────────────────────────── variable missing n n_unique ordered gen 0 330 107 FALSE rep 0 330 3 FALSE ── Variable type:integer ──────────────────────────────────────────────────────────────────────────────────── variable missing n mean sd p0 p25 p50 p75 p100 hist col 0 330 8 4.33 1 4 8 12 15 ▇▇▇▇▃▇▇▇ row 0 330 11.5 6.35 1 6 11.5 17 22 ▇▇▅▇▇▅▇▇ yield 0 330 591.77 154.19 194 469 617.5 713.5 925 ▁▂▅▅▆▇▅▁ ``` ```r dat <- gilmour.serpentine %>% mutate(rowf=factor(row), colf=factor(col)) ``` ]] .column.bg-blue[.content[ ]] --- class: split-90 with-border .column[.content[ ### Baseline Analysis for Randomised Complete Block Design ```r fit_base <- asreml(yield ~ gen, random=~rep, data=dat, trace=F) ``` ``` effect component std.error z.ratio constr rep!rep.var 12740 12860 0.99 P R!variance 13360 1271 11 P ``` ### Spatial Modelling ```r fit_ar1xar1 <- asreml(yield ~ gen, random=~rep, data=dat, trace=F, rcov=~ar1v(rowf):ar1(colf)) ``` ``` Error in asreml.chkOrd(Y, Var, dataFrame): Data frame order does not match that specified by the R model formula. ``` * As the error message implies, the `data.frame` is not in expected order. * The `rcov` structure implies that `col` needs to be ordered within `row`. ]] .column.bg-blue.font_small[.content[ Note Gilmour et al. (1997) did not include the `rep` effect but we include it here to respect the experimental design. ]] --- class: split-90 with-border .column[.content[ # Spatial Modelling: Local Trend ```r # order data so col within row dat <- dat %>% arrange(row, col) # now no error message fit_ar1xar1 <- asreml(yield ~ gen, random=~rep, data=dat, trace=F, rcov=~ar1v(rowf):ar1(colf)) # see variance parameter estimates lucid::vc(fit_ar1xar1) ``` ``` effect component std.error z.ratio constr rep!rep.var 0 NA NA B R!variance 1 NA NA F R!rowf.cor 0.904 0.018 49 U R!rowf.var 19700 3797 5.2 P R!colf.cor 0.425 0.064 6.6 U ``` General idea: plots that are geographically closer, should be more similar. ]] .column.bg-blue[.content[ ]] --- class: split-70 with-border .column[.content[ # Why AR1 `\(\times\)` AR1? ```r rcov=~ ar1v(row):ar1(col) ``` <center> <img src="images/ar1xar1.png" width="80%"/> </center> * Low additional number of parameters (three) to estimate but yet the structure is flexible and .indigo[anisotropic]. * For positive autocorrelation, the structure captures the general idea that the plots that are geographically close together are more similar. * The AR1 `\(\times\)` AR1 structure generally fits well in practice. ]] .column.bg-blue[.content.bg-sm80[ * The distance of plots are typically measured by the row and column position with distance in the row and column direction thought to be equidistant respectively. * .yellow[Anisotropic] means correlation in all directions can be different. * For example, the correlation of plots that are two rows apart and three column is `\(\rho_r^2\rho_c^3\)`. ]] --- class: split-40 with-border .column[.content.font_sm80[ # Other separable processes If you don't believe that plots are similar in say a column direction then you can model such that the plots are uncorrelated in column direction. ```r rcov=~ ar1v(row):id(col) ``` <center> <img src="images/ar1xid.png" width="70%"/> </center> You can of course do other combinations: ```r rcov=~ id(row):ar1v(col) rcov=~ idv(row):id(col) ``` ]] .column.bg-blue[.content[ # Log residual likelihood ```r fit_base$loglik ``` ``` [1] -1235.222 ``` ```r fit_ar1xar1$loglik ``` ``` [1] -1103.19 ``` ```r lrt(fit_ar1xar1, fit_base) ``` ``` Likelihood ratio test(s) assuming nested random models. Chisq probability adjusted using Stram & Lee, 1994. Df LR statistic Pr(Chisq) fit_ar1xar1/fit_base 1 264.06 < 2.2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` Certainly the fit of AR1$\times$AR1 model increases ]] --- class: split-50 with-border .column[.content[ # Residual Plot ![](day3-session05-spatial-analysis_files/figure-html/ar1xar1_resid-1.png)<!-- --> ]] .column.bg-blue[.content[ ```r dat %>% mutate(resid=fit_ar1xar1$residuals) %>% ggplot(aes(col, resid)) + geom_point() + geom_smooth(se=F) + facet_wrap(~paste0("Row ", row), nrow=4) + theme_bw(base_size=18) + labs(x="Column", y="Conditional Residual") ``` There is clearly a pattern in the column direction indicating there is some effect in relation to the column we have not removed. ]] --- class: split-60 with-border .column[.content[ # Sample Variograms for Residuals <img src="images/vario.png" width="100%"/> Sample variogram is made of: <img src="images/samplevario.png" width="100%"/> ]] .column.bg-blue[.content.font_sm90[ * .yellow[Variogram] is a popular method to detect spatial dependence. * The idea here is that if we have modelled the spatial dependence appropriately in the model, the residuals should not exhibit any spatial dependence. * In theory, variogram will start with (0,0) and if no spatial dependence is exhibited then it should flatten out. * In practice, there is high variability for sample variogram of plots furthest away due to less information so it may not be as flat as you like. ]] --- class: split-60 with-border .column[.content[ # Sample Variograms for Residuals <iframe src="plotly3.html" width="100%" height="600" scrolling="yes" seamless="seamless" frameBorder="0"> </iframe> ]] .column.bg-blue[.content.font_small[ ```r vario_df <- variogram(fit_ar1xar1) x <- 0:max(vario_df$colf) y <- 0:max(vario_df$rowf) z <- matrix(vario_df$gamma, nrow=length(x), ncol=length(y)) plotly::plot_ly(x=x, y=y, z=z, type="surface") ``` You can also: ```r myf::metplot(fit_ar1xar1) ``` <img src="day3-session05-spatial-analysis_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" /> ]] --- class: split-70 with-border .column[.content[ # Smooth global trend Smooth global trend is incorporated using cubic smoothing spline indexed by the column number. ```r dat <- dat %>% mutate(lcol=scale(col, scale=F)[,1]) fit_spl <- asreml(yield ~ lcol + gen, random=~rep + spl(lcol), data=dat, trace=F, rcov=~ar1v(rowf):ar1(colf)) fit_spl <- update(fit_spl) lucid::vc(fit_spl) ``` ``` effect component std.error z.ratio constr rep!rep.var 0 NA NA B spl(lcol) 3275 3231 1 P R!variance 1 NA NA F R!rowf.cor 0.794 0.034 24 U R!rowf.var 8697 1501 5.8 P R!colf.cor 0.27 0.077 3.5 U ``` ]] .column.bg-white[.content[ ## Sample Variogram <iframe src="plotly4.html" width="100%" height="500" scrolling="yes" seamless="seamless" frameBorder="0"> </iframe> ]] --- class: split-70 with-border .column[.content[ # Extraneous Variation We fit a random column effect. ```r fit_rcol <- asreml(yield ~ lcol + gen, random=~rep + spl(lcol) + colf, data=dat, trace=F, rcov=~ar1v(rowf):ar1(colf)) fit_rcol <- update(fit_rcol) lucid::vc(fit_rcol) ``` ``` effect component std.error z.ratio constr rep!rep.var 0 NA NA B spl(lcol) 3068 3088 0.99 P colf!colf.var 4167 1984 2.1 P R!variance 1 NA NA F R!rowf.cor 0.431 0.0688 6.3 U R!rowf.var 3590 444.6 8.1 P R!colf.cor 0.359 0.0692 5.2 U ``` ]] .column.bg-white[.content[ ## Sample Variogram <iframe src="plotly5.html" width="100%" height="500" scrolling="yes" seamless="seamless" frameBorder="0"> </iframe> ]] --- class: split-50 with-border .column[.content.font_sm80[ # Possible other extraneous variation? ![](day3-session05-spatial-analysis_files/figure-html/unnamed-chunk-18-1.png)<!-- --> Could management practice have impacted the yield? Each plot were sprayed with one of the four directions: <img src="images/updown.png" style="height:1em"/>, <img src="images/downdown.png" style="height:1em"/>, <img src="images/downup.png" style="height:1em"/> and <img src="images/upup.png" style="height:1em"/>. ```r dat <- dat %>% mutate(colcode=case_when( col%%4==1 ~ "updown", col%%4==2 ~ "downdown", col%%4==3 ~ "downup", col%%4==0 ~ "upup")) ``` ]] .column[.content[ <img src="images/field_spray.gif" width="100%"/> ]] --- class: split-70 with-border .column[.content[ # Extraneous Variation We fit a fixed `colcode` effect. ```r fit_colcode <- asreml(yield ~ lcol + gen + colcode, random=~rep + spl(lcol) + colf, data=dat, trace=F, rcov=~ar1v(rowf):ar1(colf)) fit_colcode <- update(fit_colcode) lucid::vc(fit_colcode) ``` ``` effect component std.error z.ratio constr rep!rep.var 0 NA NA B spl(lcol) 2236 2055 1.1 P colf!colf.var 1382 909.9 1.5 P R!variance 1 NA NA F R!rowf.cor 0.433 0.0686 6.3 U R!rowf.var 3597 446.2 8.1 P R!colf.cor 0.36 0.0692 5.2 U ``` ]] .column.bg-white[.content[ ## Sample Variogram <iframe src="plotly6.html" width="100%" height="500" scrolling="yes" seamless="seamless" frameBorder="0"> </iframe> ]] --- class: split-70 with-border .column[.content[ # Extraneous Variation We fit a random row effect. ```r fit_rrow <- asreml(yield ~ lcol + gen + colcode, random=~rep + spl(lcol) + colf + rowf, data=dat, trace=F, rcov=~ar1v(rowf):ar1(colf)) fit_rrow <- update(fit_rrow) lucid::vc(fit_rrow) ``` ``` effect component std.error z.ratio constr rep!rep.var 0 NA NA B spl(lcol) 2250 2050 1.1 P colf!colf.var 1326 906.8 1.5 P rowf!rowf.var 531 265.1 2 P R!variance 1 NA NA F R!rowf.cor 0.4775 0.0694 6.9 U R!rowf.var 3015 397.1 7.6 P R!colf.cor 0.1956 0.0922 2.1 U ``` ]] .column.bg-white[.content[ ## Sample Variogram <iframe src="plotly5.html" width="100%" height="500" scrolling="yes" seamless="seamless" frameBorder="0"> </iframe> ]] --- class: split-50 with-border .column[.content.font_sm80[ # Possible more extraneous variation? ![](day3-session05-spatial-analysis_files/figure-html/unnamed-chunk-22-1.png)<!-- --> The tractor harvested 3 rows at a time. `L` indicates the left of the tractor; `M` is middle and `R` is right. If the driver misaligns the center, it may affect the observed yield. ```r dat <- dat %>% mutate(rowcode=case_when( row%%3==0 ~ "M", TRUE ~ "LR")) ``` ]] .column[.content[ <img src="images/field_harvest.gif" width="100%"/> ]] --- class: split-70 with-border .column[.content[ # Extraneous Variation We fit a random row effect. ```r fit_rowcode <- asreml(yield ~ lcol + gen + colcode + rowcode, random=~rep + spl(lcol) + colf + rowf, data=dat, trace=F, rcov=~ar1v(rowf):ar1(colf)) fit_rowcode <- update(fit_rowcode) lucid::vc(fit_rowcode) ``` ``` effect component std.error z.ratio constr rep!rep.var 0 NA NA B spl(lcol) 2235 2037 1.1 P colf!colf.var 1319 914.7 1.4 P rowf!rowf.var 235.8 180 1.3 P R!variance 1 NA NA F R!rowf.cor 0.5012 0.0676 7.4 U R!rowf.var 3142 419.7 7.5 P R!colf.cor 0.1994 0.0917 2.2 U ``` ]] .column.bg-white[.content[ ## Sample Variogram <iframe src="plotly7.html" width="100%" height="500" scrolling="yes" seamless="seamless" frameBorder="0"> </iframe> ]] --- class: split-70 with-border .column[.content[ # Extraneous Variation We fit a nugget/measurement error effect. ```r fit_nugget <- asreml(yield ~ lcol + gen + colcode + rowcode, random=~rep + spl(lcol) + colf + rowf + units, data=dat, trace=F, rcov=~ar1v(rowf):ar1(colf)) fit_nugget <- update(fit_nugget) lucid::vc(fit_nugget) ``` ``` effect component std.error z.ratio constr rep!rep.var 0 NA NA B spl(lcol) 2163 2131 1 P colf!colf.var 811.6 1340 0.61 P rowf!rowf.var 200.6 148.9 1.3 P units!units.var 1315 259.1 5.1 P R!variance 1 NA NA F R!rowf.cor 0.9076 0.0851 11 U R!rowf.var 3153 1936 1.6 P R!colf.cor 0.3818 0.1706 2.2 U ``` ]] .column.bg-white[.content[ ## Sample Variogram <iframe src="plotly8.html" width="100%" height="500" scrolling="yes" seamless="seamless" frameBorder="0"> </iframe> ]] --- class: split-90 with-border .column[.content[ # All Models Fitted <iframe src="html_extras/table15.html" width="100%" height="380" scrolling="yes" seamless="seamless" frameBorder="0"> </iframe> .font_sm80[ * Some models have different fixed effects and thus comparison of models based on residual likelihood will be inappropriate. * The models fitted are not entirely the same as Gilmour et al. (1997), e.g. I did not choose to fit a linear `row` effect as I felt that was an overkill but I chose to continue fitting the `rep` effect as this was part of the experimental design. The effect of `rep` is essentially zero so it does not make much difference. ]]] .column.bg-blue[.content[ ]] --- class: split-two .column.bg-yellow[.content.font_sm80[ # Key References * Gilmour et al. (1997) Accounting for Natural and Extraneous Variation in the Analysis of Field Experiments. *Journal of Agricultural, Biological, and Environmental Statistics* **2**(3) 269-293 ## Extra References These were not covered in the workshop but statistical tools good to be aware of. * Stefanova et al. (2010) Enhanced diagnostics for the spatial analysis of field trials. *Journal of Agricultural, Biological, and Environmental Statistics* **14**(4) 392-410 * Velazco et al. (2017) Modelling spatial trends in sorghum breeding field trials using a two-dimensional P-spline mixed model. *Theoretical and Applied Genetics* **130**(7) 1375-1392 * Rodríguez-Álvarez et al. (2018) Correcting for spatial heterogeneity in plant breeding experiments with P-splines. *Spatial Statisitcs* **23** 52-71 ]] .column.bg-main1.white[.content[ # Notes * Again remember (because it's worth repeating): .center[ "All models are wrong but some are useful." <span style="float:right">-George Box</span> ] * More complex models generally fit the data well but model selection seek to find a parsimonious model. * Model selection is a hard statistical problem especially for linear mixed models. There is no universal model selection tools that work for every data. ]] --- class: split-40 title-slide2 with-border white background-image: url("images/bg4.jpg") background-size: cover .column[.content[ ]] .column.shade_main[.content[ <br><br> # <u>Slides</u> These slides were made using the R package [`xaringan`](https://github.com/yihui/xaringan) with the [`ninja-themes`](https://github.com/emitanaka/ninja-theme) and is available at [`bit.ly/UT-WS-spatial`](http://bit.ly/UT-WS-spatial). # <u>Your Turn</u> <s>Download `day3-session05-spatial-analysis-tutorial.Rmd` here, open in RStudio, push the button "Run Document" on the top tab and work through the exercises.</s> For workshop participants, contact Emi for the tutorials. <br><br> ### <a rel='license' href='http://creativecommons.org/licenses/by-sa/4.0/'><img alt='Creative Commons License' style='border-width:0; width:30pt' src='images/cc.svg' /><img alt='Creative Commons License' style='border-width:0; width:20pt' src='images/by.svg' /><img alt='Creative Commons License' style='border-width:0; width:20pt' src='images/sa.svg' /></a><span style='font-size:10pt'> This work by <span xmlns:cc='http://creativecommons.org/ns#' property='cc:attributionName'>Emi Tanaka</span> is licensed under a <a rel='license' href='http://creativecommons.org/licenses/by-sa/4.0/'>Creative Commons Attribution-ShareAlike 4.0 International License</a>.</span> ]] <img src="images/USydLogo-white.svg" style="position:absolute; top:80%; left:80%;width:200px">