class: split-60 title-slide2 with-border white
background-image: url("images/bg4.jpg")
background-size: cover

.column.shade_main[.content[
<br><br>

# Statistical Methods for Omics Assisted Breeding

## <b style='color:#FFEB3B'>Spatial Analysis of Crop Field Trials</b>

### <br>Emi Tanaka <br>emi.tanaka@sydney.edu.au <br>School of Mathematics and Statisitcs

### 2018/11/14 <br>
### <span style='font-size:14pt; color:pink!important'>These slides may take a while to render properly. You can find the pdf <a href="https://www.dropbox.com/s/vh0hi67uera2qr5/day3-session05-spatial-analysis.pdf?dl=1">here</a>.</span> <br><br>
### <a rel='license' href='http://creativecommons.org/licenses/by-sa/4.0/'><img alt='Creative Commons License' style='border-width:0; width:30pt' src='images/cc.svg' /><img alt='Creative Commons License' style='border-width:0; width:20pt' src='images/by.svg' /><img alt='Creative Commons License' style='border-width:0; width:20pt' src='images/sa.svg' /></a><span style='font-size:10pt'>&nbsp;&nbsp;&nbsp;&nbsp;This work by <span xmlns:cc='http://creativecommons.org/ns#' property='cc:attributionName'>Emi Tanaka</span> is licensed under a <a rel='license' href='http://creativecommons.org/licenses/by-sa/4.0/'>Creative Commons Attribution-ShareAlike 4.0 International License</a>.</span>

]]
.column[.content[

]]
<img src="images/USydLogo-white.svg" style="position:absolute; top:80%; left:40%;width:200px">

---

.column[.content[
# Example: Wheat Yield in South Australia
![](day3-session05-spatial-analysis_files/figure-html/unnamed-chunk-2-1.png)
.bottom_abs.width100.font_small[
Gilmour et al. (1997) Accounting for natural and extraneous variation in the analysis of field experiments. *Journal of Agricultural, Biological, and Environmental Statistics* **2** 269-293.
]]]
.column.bg-blue[.content.font_sm80[

```r
data(gilmour.serpentine, 
     package="agridat")
```

* A randomised complete block experiment of wheat in South Australia.
* The numbers represent the genotype number.
* The color represent the yield with darker green higher yield and yellow the lowest. 
* The plot is made using `desplot`. Try replicating it on your own!
* Note row and column contain more than once of the same genotype.
]]

---

```r
gilmour.serpentine %>% skimr::skim()
```

```
  Skim summary statistics
   n obs: 330 
   n variables: 5 
  
  ── Variable type:factor ─────────────────────────────────────────────────────────────────────────────────────
   variable missing   n n_unique ordered
        gen       0 330      107   FALSE
        rep       0 330        3   FALSE
  
  ── Variable type:integer ────────────────────────────────────────────────────────────────────────────────────
   variable missing   n   mean     sd  p0 p25   p50   p75 p100     hist
        col       0 330   8      4.33   1   4   8    12     15 ▇▇▇▇▃▇▇▇
        row       0 330  11.5    6.35   1   6  11.5  17     22 ▇▇▅▇▇▅▇▇
      yield       0 330 591.77 154.19 194 469 617.5 713.5  925 ▁▂▅▅▆▇▅▁
```

```r
dat <- gilmour.serpentine %>% mutate(rowf=factor(row), colf=factor(col))
```
]]
.column.bg-blue[.content[
]]

---

```r
fit_base <- asreml(yield ~ gen, random=~rep, data=dat, trace=F)
```

```
        effect component std.error z.ratio constr
   rep!rep.var     12740     12860    0.99      P
    R!variance     13360      1271   11         P
```
### Spatial Modelling

```r
fit_ar1xar1 <- asreml(yield ~ gen, random=~rep, data=dat, trace=F, 
                     rcov=~ar1v(rowf):ar1(colf))
```

```
  Error in asreml.chkOrd(Y, Var, dataFrame): Data frame order does not match that specified by the R model formula.
```
* As the error message implies, the `data.frame` is not in expected order.
* The `rcov` structure implies that `col` needs to be ordered within `row`.
]]
.column.bg-blue.font_small[.content[
Note Gilmour et al. (1997) did not include the `rep` effect but we include it here to respect the experimental design.
]]

---

```r
# order data so col within row
dat <- dat %>% arrange(row, col)
# now no error message
fit_ar1xar1 <- asreml(yield ~ gen, random=~rep, data=dat, trace=F, 
                     rcov=~ar1v(rowf):ar1(colf))
# see variance parameter estimates
lucid::vc(fit_ar1xar1)
```

```
        effect component std.error z.ratio constr
   rep!rep.var     0            NA      NA      B
    R!variance     1            NA      NA      F
    R!rowf.cor     0.904     0.018    49        U
    R!rowf.var 19700      3797         5.2      P
    R!colf.cor     0.425     0.064     6.6      U
```
General idea: plots that are geographically closer, should be more similar.
]]
.column.bg-blue[.content[
]]

---

```r
rcov=~ ar1v(row):ar1(col)
```
<center>
<img src="images/ar1xar1.png" width="80%"/>
</center>
* Low additional number of parameters (three) to estimate but yet the structure is flexible and .indigo[anisotropic]. 
* For positive autocorrelation, the structure captures the general idea that the plots that are geographically close together are more similar. 
* The AR1 `$\times$` AR1 structure generally fits well in practice.
]]
.column.bg-blue[.content.bg-sm80[
* The distance of plots are typically measured by the row and column position with distance in the row and column direction thought to be equidistant respectively.
* .yellow[Anisotropic] means correlation in all directions can be different. 
* For example, the correlation of plots that are two rows apart and three column is `$\rho_r^2\rho_c^3$`. 
]]

---

.column[.content.font_sm80[
# Other separable processes
If you don't believe that plots are similar in say a column direction then you can model such that the plots are uncorrelated in column direction.

```r
rcov=~ ar1v(row):id(col)
```
<center>
<img src="images/ar1xid.png" width="70%"/>
</center>

You can of course do other combinations:

```r
rcov=~ id(row):ar1v(col)
rcov=~ idv(row):id(col)
```

]]
.column.bg-blue[.content[
# Log residual likelihood

```r
fit_base$loglik
```

```
  [1] -1235.222
```

```r
fit_ar1xar1$loglik
```

```
  [1] -1103.19
```

```r
lrt(fit_ar1xar1, fit_base)
```

```
  Likelihood ratio test(s) assuming nested random models.
  Chisq probability adjusted using Stram & Lee, 1994.
  
                       Df LR statistic Pr(Chisq)    
  fit_ar1xar1/fit_base  1       264.06 < 2.2e-16 ***
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
Certainly the fit of AR1$\times$AR1 model increases 
]]

---

.column[.content[
# Residual Plot 
![](day3-session05-spatial-analysis_files/figure-html/ar1xar1_resid-1.png)

]]
.column.bg-blue[.content[

```r
dat %>% 
 mutate(resid=fit_ar1xar1$residuals) %>% 
 ggplot(aes(col, resid)) + 
 geom_point() + 
 geom_smooth(se=F) + 
 facet_wrap(~paste0("Row ", row), nrow=4) + 
 theme_bw(base_size=18) + 
 labs(x="Column", 
       y="Conditional Residual")
```
There is clearly a pattern in the column direction indicating there is some effect in relation to the column we have not removed.
]]

---

Sample variogram is made of:

]]
.column.bg-blue[.content.font_sm90[
* .yellow[Variogram] is a popular method to detect spatial dependence. 
* The idea here is that if we have modelled the spatial dependence appropriately in the model, the residuals should not exhibit any spatial dependence. 
* In theory, variogram will start with (0,0) and if no spatial dependence is exhibited then it should flatten out. 
* In practice, there is high variability for sample variogram of plots furthest away due to less information so it may not be as flat as you like.  
]]

---

.column[.content[
# Sample Variograms for Residuals
<iframe src="plotly3.html" width="100%" height="600" scrolling="yes" seamless="seamless" frameBorder="0"> </iframe>

]]
.column.bg-blue[.content.font_small[

```r
vario_df <- variogram(fit_ar1xar1)
x <- 0:max(vario_df$colf)
y <- 0:max(vario_df$rowf)
z <- matrix(vario_df$gamma, 
            nrow=length(x), 
            ncol=length(y))
plotly::plot_ly(x=x, y=y, z=z, 
                type="surface") 
```
You can also:

```r
myf::metplot(fit_ar1xar1)
```

<img src="day3-session05-spatial-analysis_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" />
]]

---

.column[.content[
# Smooth global trend 
Smooth global trend is incorporated using cubic smoothing spline indexed by the column number.

```r
dat <- dat %>% mutate(lcol=scale(col, scale=F)[,1])
fit_spl <- asreml(yield ~ lcol + gen, 
               random=~rep + spl(lcol), 
               data=dat, trace=F,
               rcov=~ar1v(rowf):ar1(colf))
fit_spl <- update(fit_spl)
lucid::vc(fit_spl)
```

```
        effect component std.error z.ratio constr
   rep!rep.var     0            NA      NA      B
     spl(lcol)  3275      3231         1        P
    R!variance     1            NA      NA      F
    R!rowf.cor     0.794     0.034    24        U
    R!rowf.var  8697      1501         5.8      P
    R!colf.cor     0.27      0.077     3.5      U
```
]]
.column.bg-white[.content[
## Sample Variogram

]]

---

```r
fit_rcol <- asreml(yield ~ lcol + gen, 
               random=~rep + spl(lcol) + colf, 
               data=dat, trace=F,
               rcov=~ar1v(rowf):ar1(colf))
fit_rcol <- update(fit_rcol)
lucid::vc(fit_rcol)
```

```
          effect component std.error z.ratio constr
     rep!rep.var     0            NA      NA      B
       spl(lcol)  3068     3088         0.99      P
   colf!colf.var  4167     1984         2.1       P
      R!variance     1            NA      NA      F
      R!rowf.cor     0.431    0.0688    6.3       U
      R!rowf.var  3590      444.6       8.1       P
      R!colf.cor     0.359    0.0692    5.2       U
```
]]
.column.bg-white[.content[
## Sample Variogram

]]

---

.column[.content.font_sm80[
# Possible other extraneous variation? 
![](day3-session05-spatial-analysis_files/figure-html/unnamed-chunk-18-1.png)

Could management practice have impacted the yield? Each plot were sprayed with one of the four directions: <img src="images/updown.png" style="height:1em"/>, <img src="images/downdown.png" style="height:1em"/>, <img src="images/downup.png" style="height:1em"/> and <img src="images/upup.png" style="height:1em"/>.

```r
dat <- dat %>% mutate(colcode=case_when(
  col%%4==1 ~ "updown",
  col%%4==2 ~ "downdown",
  col%%4==3 ~ "downup",
  col%%4==0 ~ "upup"))
```

]]
.column[.content[

<img src="images/field_spray.gif" width="100%"/>
 
]]

---

```r
fit_colcode <- asreml(yield ~ lcol + gen + colcode, 
               random=~rep + spl(lcol) + colf, 
               data=dat, trace=F,
               rcov=~ar1v(rowf):ar1(colf))
fit_colcode <- update(fit_colcode)
lucid::vc(fit_colcode)
```

```
          effect component std.error z.ratio constr
     rep!rep.var     0            NA      NA      B
       spl(lcol)  2236     2055          1.1      P
   colf!colf.var  1382      909.9        1.5      P
      R!variance     1            NA      NA      F
      R!rowf.cor     0.433    0.0686     6.3      U
      R!rowf.var  3597      446.2        8.1      P
      R!colf.cor     0.36     0.0692     5.2      U
```
]]
.column.bg-white[.content[
## Sample Variogram
<iframe src="plotly6.html" width="100%" height="500" scrolling="yes" seamless="seamless" frameBorder="0"> </iframe>

]]

---

```r
fit_rrow <- asreml(yield ~ lcol + gen + colcode, 
               random=~rep + spl(lcol) + colf + rowf, 
               data=dat, trace=F,
               rcov=~ar1v(rowf):ar1(colf))
fit_rrow <- update(fit_rrow)
lucid::vc(fit_rrow)
```

```
          effect component std.error z.ratio constr
     rep!rep.var    0             NA      NA      B
       spl(lcol) 2250      2050          1.1      P
   colf!colf.var 1326       906.8        1.5      P
   rowf!rowf.var  531       265.1        2        P
      R!variance    1             NA      NA      F
      R!rowf.cor    0.4775    0.0694     6.9      U
      R!rowf.var 3015       397.1        7.6      P
      R!colf.cor    0.1956    0.0922     2.1      U
```
]]
.column.bg-white[.content[
## Sample Variogram

]]

---

.column[.content.font_sm80[
# Possible more extraneous variation? 
![](day3-session05-spatial-analysis_files/figure-html/unnamed-chunk-22-1.png)

The tractor harvested 3 rows at a time. `L` indicates the left of the tractor; `M` is middle and `R` is right. If the driver misaligns the center, it may affect the observed yield.

```r
dat <- dat %>% mutate(rowcode=case_when(
  row%%3==0 ~ "M",
  TRUE ~ "LR"))
```

]]
.column[.content[

<img src="images/field_harvest.gif" width="100%"/>
 
]]

---

```r
fit_rowcode <- asreml(yield ~ lcol + gen + colcode + rowcode, 
               random=~rep + spl(lcol) + colf + rowf, 
               data=dat, trace=F,
               rcov=~ar1v(rowf):ar1(colf))
fit_rowcode <- update(fit_rowcode)
lucid::vc(fit_rowcode)
```

```
          effect component std.error z.ratio constr
     rep!rep.var    0             NA      NA      B
       spl(lcol) 2235      2037          1.1      P
   colf!colf.var 1319       914.7        1.4      P
   rowf!rowf.var  235.8     180          1.3      P
      R!variance    1             NA      NA      F
      R!rowf.cor    0.5012    0.0676     7.4      U
      R!rowf.var 3142       419.7        7.5      P
      R!colf.cor    0.1994    0.0917     2.2      U
```
]]
.column.bg-white[.content[
## Sample Variogram

]]

---

```r
fit_nugget <- asreml(yield ~ lcol + gen + colcode + rowcode, 
        random=~rep + spl(lcol) + colf + rowf + units, 
        data=dat, trace=F,
        rcov=~ar1v(rowf):ar1(colf))
fit_nugget <- update(fit_nugget)
lucid::vc(fit_nugget)
```

```
            effect component std.error z.ratio constr
       rep!rep.var    0             NA      NA      B
         spl(lcol) 2163      2131         1         P
     colf!colf.var  811.6    1340         0.61      P
     rowf!rowf.var  200.6     148.9       1.3       P
   units!units.var 1315       259.1       5.1       P
        R!variance    1             NA      NA      F
        R!rowf.cor    0.9076    0.0851   11         U
        R!rowf.var 3153      1936         1.6       P
        R!colf.cor    0.3818    0.1706    2.2       U
```
]]
.column.bg-white[.content[
## Sample Variogram

]]

---

.column[.content[
# All Models Fitted
<iframe src="html_extras/table15.html" width="100%" height="380" scrolling="yes" seamless="seamless" frameBorder="0"> </iframe>
.font_sm80[
* Some models have different fixed effects and thus comparison of models based on residual likelihood will be inappropriate. 
* The models fitted are not entirely the same as Gilmour et al. (1997), e.g. I did not choose to fit a linear `row` effect as I felt that was an overkill but I chose to continue fitting the `rep` effect as this was part of the experimental design. The effect of `rep` is essentially zero so it does not make much difference. 
]]]
.column.bg-blue[.content[

]]

---

.column.bg-yellow[.content.font_sm80[
# Key References

* Gilmour et al. (1997) Accounting for Natural and Extraneous Variation in the Analysis of Field Experiments. *Journal of Agricultural, Biological, and Environmental Statistics* **2**(3) 269-293

## Extra References 
These were not covered in the workshop but statistical tools good to be aware of. 
* Stefanova et al. (2010) Enhanced diagnostics for the spatial analysis of field trials. *Journal of Agricultural, Biological, and Environmental Statistics* **14**(4) 392-410
* Velazco et al. (2017) Modelling spatial trends in sorghum breeding field trials using a two-dimensional P-spline mixed model. *Theoretical and Applied Genetics* **130**(7) 1375-1392
* Rodríguez-Álvarez et al. (2018) Correcting for spatial heterogeneity in plant breeding experiments with P-splines. *Spatial Statisitcs* **23** 52-71

]]
.column.bg-main1.white[.content[
# Notes

* Again remember (because it's worth repeating):
.center[
"All models are wrong but some are useful."
<span style="float:right">-George Box</span>
]
* More complex models generally fit the data well but model selection seek to find a parsimonious model. 
* Model selection is a hard statistical problem especially for linear mixed models. There is no universal model selection tools that work for every data.

]]

---

class: split-40 title-slide2 with-border white
background-image: url("images/bg4.jpg")
background-size: cover

]]
.column.shade_main[.content[
<br><br>

# <u>Slides</u>

These slides were made using the R package [`xaringan`](https://github.com/yihui/xaringan) with the [`ninja-themes`](https://github.com/emitanaka/ninja-theme) and is available at [`bit.ly/UT-WS-spatial`](http://bit.ly/UT-WS-spatial).

# <u>Your Turn</u>

<s>Download `day3-session05-spatial-analysis-tutorial.Rmd` here, open in RStudio, push the button "Run Document" on the top tab and work through the exercises.</s> 
For workshop participants, contact Emi for the tutorials.
<br><br>

### <a rel='license' href='http://creativecommons.org/licenses/by-sa/4.0/'><img alt='Creative Commons License' style='border-width:0; width:30pt' src='images/cc.svg' /><img alt='Creative Commons License' style='border-width:0; width:20pt' src='images/by.svg' /><img alt='Creative Commons License' style='border-width:0; width:20pt' src='images/sa.svg' /></a><span style='font-size:10pt'>&nbsp;&nbsp;&nbsp;&nbsp;This work by <span xmlns:cc='http://creativecommons.org/ns#' property='cc:attributionName'>Emi Tanaka</span> is licensed under a <a rel='license' href='http://creativecommons.org/licenses/by-sa/4.0/'>Creative Commons Attribution-ShareAlike 4.0 International License</a>.</span>

]]

Notes for current slide

Notes for next slide

Statistical Methods for Omics Assisted Breeding

Spatial Analysis of Crop Field Trials

Emi Tanaka
emi.tanaka@sydney.edu.au
School of Mathematics and Statisitcs

2018/11/14

These slides may take a while to render properly. You can find the pdf here.

This work by Emi Tanaka is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

1 / 21

Example: Wheat Yield in South Australia

Gilmour et al. (1997) Accounting for natural and extraneous variation in the analysis of field experiments. Journal of Agricultural, Biological, and Environmental Statistics 2 269-293.

data(gilmour.serpentine, 
     package="agridat")

A randomised complete block experiment of wheat in South Australia.
The numbers represent the genotype number.
The color represent the yield with darker green higher yield and yellow the lowest.
The plot is made using desplot. Try replicating it on your own!
Note row and column contain more than once of the same genotype.

2 / 21

Skimming the data
gilmour.serpentine %>% skimr::skim()

  Skim summary statistics
   n obs: 330 
   n variables: 5 
  ── Variable type:factor ─────────────────────────────────────────────────────────────────────────────────────
   variable missing   n n_unique ordered
        gen       0 330      107   FALSE
        rep       0 330        3   FALSE
  ── Variable type:integer ────────────────────────────────────────────────────────────────────────────────────
   variable missing   n   mean     sd  p0 p25   p50   p75 p100     hist
        col       0 330   8      4.33   1   4   8    12     15 ▇▇▇▇▃▇▇▇
        row       0 330  11.5    6.35   1   6  11.5  17     22 ▇▇▅▇▇▅▇▇
      yield       0 330 591.77 154.19 194 469 617.5 713.5  925 ▁▂▅▅▆▇▅▁
dat <- gilmour.serpentine %>% mutate(rowf=factor(row), colf=factor(col))

3 / 21

Baseline Analysis for Randomised Complete Block Design

fit_base <- asreml(yield ~ gen, random=~rep, data=dat, trace=F)

        effect component std.error z.ratio constr
   rep!rep.var     12740     12860    0.99      P
    R!variance     13360      1271   11         P

Spatial Modelling

fit_ar1xar1 <- asreml(yield ~ gen, random=~rep, data=dat, trace=F, 
                     rcov=~ar1v(rowf):ar1(colf))

  Error in asreml.chkOrd(Y, Var, dataFrame): Data frame order does not match that specified by the R model formula.

As the error message implies, the data.frame is not in expected order.
The rcov structure implies that col needs to be ordered within row.

Note Gilmour et al. (1997) did not include the rep effect but we include it here to respect the experimental design.

4 / 21

Spatial Modelling: Local Trend

# order data so col within row
dat <- dat %>% arrange(row, col)
# now no error message
fit_ar1xar1 <- asreml(yield ~ gen, random=~rep, data=dat, trace=F, 
                     rcov=~ar1v(rowf):ar1(colf))
# see variance parameter estimates
lucid::vc(fit_ar1xar1)

        effect component std.error z.ratio constr
   rep!rep.var     0            NA      NA      B
    R!variance     1            NA      NA      F
    R!rowf.cor     0.904     0.018    49        U
    R!rowf.var 19700      3797         5.2      P
    R!colf.cor     0.425     0.064     6.6      U

General idea: plots that are geographically closer, should be more similar.

5 / 21

Why AR1 $\times$ AR1?

rcov=~ ar1v(row):ar1(col)

Low additional number of parameters (three) to estimate but yet the structure is flexible and anisotropic.
For positive autocorrelation, the structure captures the general idea that the plots that are geographically close together are more similar.
The AR1 $\times$ AR1 structure generally fits well in practice.

The distance of plots are typically measured by the row and column position with distance in the row and column direction thought to be equidistant respectively.
Anisotropic means correlation in all directions can be different.
For example, the correlation of plots that are two rows apart and three column is $\rho_r^2\rho_c^3$.

6 / 21

Other separable processes

If you don't believe that plots are similar in say a column direction then you can model such that the plots are uncorrelated in column direction.

rcov=~ ar1v(row):id(col)

You can of course do other combinations:

rcov=~ id(row):ar1v(col)
rcov=~ idv(row):id(col)

Log residual likelihood

fit_base$loglik

  [1] -1235.222

fit_ar1xar1$loglik

  [1] -1103.19

lrt(fit_ar1xar1, fit_base)

  Likelihood ratio test(s) assuming nested random models.
  Chisq probability adjusted using Stram & Lee, 1994.
                       Df LR statistic Pr(Chisq)    
  fit_ar1xar1/fit_base  1       264.06 < 2.2e-16 ***
  ---
  Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Certainly the fit of AR1$\times$AR1 model increases

7 / 21

Residual Plot

dat %>% 
 mutate(resid=fit_ar1xar1$residuals) %>% 
 ggplot(aes(col, resid)) + 
 geom_point() + 
 geom_smooth(se=F) + 
 facet_wrap(~paste0("Row ", row), nrow=4) + 
 theme_bw(base_size=18) + 
 labs(x="Column", 
       y="Conditional Residual")

There is clearly a pattern in the column direction indicating there is some effect in relation to the column we have not removed.

8 / 21

Sample Variograms for Residuals

Sample variogram is made of:

Variogram is a popular method to detect spatial dependence.
The idea here is that if we have modelled the spatial dependence appropriately in the model, the residuals should not exhibit any spatial dependence.
In theory, variogram will start with (0,0) and if no spatial dependence is exhibited then it should flatten out.
In practice, there is high variability for sample variogram of plots furthest away due to less information so it may not be as flat as you like.

9 / 21

Sample Variograms for Residuals

vario_df <- variogram(fit_ar1xar1)
x <- 0:max(vario_df$colf)
y <- 0:max(vario_df$rowf)
z <- matrix(vario_df$gamma, 
            nrow=length(x), 
            ncol=length(y))
plotly::plot_ly(x=x, y=y, z=z, 
                type="surface")

You can also:

myf::metplot(fit_ar1xar1)

10 / 21

Smooth global trend

Smooth global trend is incorporated using cubic smoothing spline indexed by the column number.

dat <- dat %>% mutate(lcol=scale(col, scale=F)[,1])
fit_spl <- asreml(yield ~ lcol + gen, 
               random=~rep + spl(lcol), 
               data=dat, trace=F,
               rcov=~ar1v(rowf):ar1(colf))
fit_spl <- update(fit_spl)
lucid::vc(fit_spl)

        effect component std.error z.ratio constr
   rep!rep.var     0            NA      NA      B
     spl(lcol)  3275      3231         1        P
    R!variance     1            NA      NA      F
    R!rowf.cor     0.794     0.034    24        U
    R!rowf.var  8697      1501         5.8      P
    R!colf.cor     0.27      0.077     3.5      U

Sample Variogram

11 / 21

Extraneous Variation

We fit a random column effect.

fit_rcol <- asreml(yield ~ lcol + gen, 
               random=~rep + spl(lcol) + colf, 
               data=dat, trace=F,
               rcov=~ar1v(rowf):ar1(colf))
fit_rcol <- update(fit_rcol)
lucid::vc(fit_rcol)

          effect component std.error z.ratio constr
     rep!rep.var     0            NA      NA      B
       spl(lcol)  3068     3088         0.99      P
   colf!colf.var  4167     1984         2.1       P
      R!variance     1            NA      NA      F
      R!rowf.cor     0.431    0.0688    6.3       U
      R!rowf.var  3590      444.6       8.1       P
      R!colf.cor     0.359    0.0692    5.2       U

Sample Variogram

12 / 21

Possible other extraneous variation?

Could management practice have impacted the yield? Each plot were sprayed with one of the four directions: , , and .

dat <- dat %>% mutate(colcode=case_when(
  col%%4==1 ~ "updown",
  col%%4==2 ~ "downdown",
  col%%4==3 ~ "downup",
  col%%4==0 ~ "upup"))

13 / 21

Extraneous Variation

We fit a fixed colcode effect.

fit_colcode <- asreml(yield ~ lcol + gen + colcode, 
               random=~rep + spl(lcol) + colf, 
               data=dat, trace=F,
               rcov=~ar1v(rowf):ar1(colf))
fit_colcode <- update(fit_colcode)
lucid::vc(fit_colcode)

          effect component std.error z.ratio constr
     rep!rep.var     0            NA      NA      B
       spl(lcol)  2236     2055          1.1      P
   colf!colf.var  1382      909.9        1.5      P
      R!variance     1            NA      NA      F
      R!rowf.cor     0.433    0.0686     6.3      U
      R!rowf.var  3597      446.2        8.1      P
      R!colf.cor     0.36     0.0692     5.2      U

Sample Variogram

14 / 21

Extraneous Variation

We fit a random row effect.

fit_rrow <- asreml(yield ~ lcol + gen + colcode, 
               random=~rep + spl(lcol) + colf + rowf, 
               data=dat, trace=F,
               rcov=~ar1v(rowf):ar1(colf))
fit_rrow <- update(fit_rrow)
lucid::vc(fit_rrow)

          effect component std.error z.ratio constr
     rep!rep.var    0             NA      NA      B
       spl(lcol) 2250      2050          1.1      P
   colf!colf.var 1326       906.8        1.5      P
   rowf!rowf.var  531       265.1        2        P
      R!variance    1             NA      NA      F
      R!rowf.cor    0.4775    0.0694     6.9      U
      R!rowf.var 3015       397.1        7.6      P
      R!colf.cor    0.1956    0.0922     2.1      U

Sample Variogram

15 / 21

Possible more extraneous variation?

The tractor harvested 3 rows at a time. L indicates the left of the tractor; M is middle and R is right. If the driver misaligns the center, it may affect the observed yield.

dat <- dat %>% mutate(rowcode=case_when(
  row%%3==0 ~ "M",
  TRUE ~ "LR"))

16 / 21

Extraneous Variation

We fit a random row effect.

fit_rowcode <- asreml(yield ~ lcol + gen + colcode + rowcode, 
               random=~rep + spl(lcol) + colf + rowf, 
               data=dat, trace=F,
               rcov=~ar1v(rowf):ar1(colf))
fit_rowcode <- update(fit_rowcode)
lucid::vc(fit_rowcode)

          effect component std.error z.ratio constr
     rep!rep.var    0             NA      NA      B
       spl(lcol) 2235      2037          1.1      P
   colf!colf.var 1319       914.7        1.4      P
   rowf!rowf.var  235.8     180          1.3      P
      R!variance    1             NA      NA      F
      R!rowf.cor    0.5012    0.0676     7.4      U
      R!rowf.var 3142       419.7        7.5      P
      R!colf.cor    0.1994    0.0917     2.2      U

Sample Variogram

17 / 21

Extraneous Variation

We fit a nugget/measurement error effect.

fit_nugget <- asreml(yield ~ lcol + gen + colcode + rowcode, 
        random=~rep + spl(lcol) + colf + rowf + units, 
        data=dat, trace=F,
        rcov=~ar1v(rowf):ar1(colf))
fit_nugget <- update(fit_nugget)
lucid::vc(fit_nugget)

            effect component std.error z.ratio constr
       rep!rep.var    0             NA      NA      B
         spl(lcol) 2163      2131         1         P
     colf!colf.var  811.6    1340         0.61      P
     rowf!rowf.var  200.6     148.9       1.3       P
   units!units.var 1315       259.1       5.1       P
        R!variance    1             NA      NA      F
        R!rowf.cor    0.9076    0.0851   11         U
        R!rowf.var 3153      1936         1.6       P
        R!colf.cor    0.3818    0.1706    2.2       U

Sample Variogram

18 / 21

All Models Fitted

Some models have different fixed effects and thus comparison of models based on residual likelihood will be inappropriate.
The models fitted are not entirely the same as Gilmour et al. (1997), e.g. I did not choose to fit a linear row effect as I felt that was an overkill but I chose to continue fitting the rep effect as this was part of the experimental design. The effect of rep is essentially zero so it does not make much difference.

19 / 21

Key References

Gilmour et al. (1997) Accounting for Natural and Extraneous Variation in the Analysis of Field Experiments. Journal of Agricultural, Biological, and Environmental Statistics 2(3) 269-293

Extra References

These were not covered in the workshop but statistical tools good to be aware of.

Stefanova et al. (2010) Enhanced diagnostics for the spatial analysis of field trials. Journal of Agricultural, Biological, and Environmental Statistics 14(4) 392-410
Velazco et al. (2017) Modelling spatial trends in sorghum breeding field trials using a two-dimensional P-spline mixed model. Theoretical and Applied Genetics 130(7) 1375-1392
Rodríguez-Álvarez et al. (2018) Correcting for spatial heterogeneity in plant breeding experiments with P-splines. Spatial Statisitcs 23 52-71

Notes

Again remember (because it's worth repeating):
"All models are wrong but some are useful." -George Box
More complex models generally fit the data well but model selection seek to find a parsimonious model.
Model selection is a hard statistical problem especially for linear mixed models. There is no universal model selection tools that work for every data.

20 / 21

Slides

These slides were made using the R package xaringan with the ninja-themes and is available at bit.ly/UT-WS-spatial.

Your Turn

~~Download day3-session05-spatial-analysis-tutorial.Rmd here, open in RStudio, push the button "Run Document" on the top tab and work through the exercises.~~ For workshop participants, contact Emi for the tutorials.

This work by Emi Tanaka is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

21 / 21

Example: Wheat Yield in South Australia

Gilmour et al. (1997) Accounting for natural and extraneous variation in the analysis of field experiments. Journal of Agricultural, Biological, and Environmental Statistics 2 269-293.

data(gilmour.serpentine, 
     package="agridat")

A randomised complete block experiment of wheat in South Australia.
The numbers represent the genotype number.
The color represent the yield with darker green higher yield and yellow the lowest.
The plot is made using desplot. Try replicating it on your own!
Note row and column contain more than once of the same genotype.

2 / 21

Paused

Help

Keyboard shortcuts

↑, ←, Pg Up, k	Go to previous slide
↓, →, Pg Dn, Space, j	Go to next slide
Home	Go to first slide
End	Go to last slide
Number + Return	Go to specific slide
b / m / f	Toggle blackout / mirrored / fullscreen mode
c	Clone slideshow
p	Toggle presenter mode
t	Restart the presentation timer
?, h	Toggle this help

Esc	Back to slideshow

Statistical Methods for Omics Assisted Breeding

Spatial Analysis of Crop Field Trials

Emi Tanaka emi.tanaka@sydney.edu.au School of Mathematics and Statisitcs

2018/11/14

These slides may take a while to render properly. You can find the pdf here.

This work by Emi Tanaka is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Example: Wheat Yield in South Australia

Skimming the data

Baseline Analysis for Randomised Complete Block Design

Spatial Modelling

Spatial Modelling: Local Trend

Why AR1 \(\times\) AR1?

Other separable processes

Log residual likelihood

Residual Plot

Sample Variograms for Residuals

Sample Variograms for Residuals

Smooth global trend

Sample Variogram

Extraneous Variation

Sample Variogram

Possible other extraneous variation?

Extraneous Variation

Sample Variogram

Extraneous Variation

Sample Variogram

Possible more extraneous variation?

Extraneous Variation

Sample Variogram

Extraneous Variation

Sample Variogram

All Models Fitted

Key References

Extra References

Notes

Slides

Your Turn

This work by Emi Tanaka is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Example: Wheat Yield in South Australia

Help

Emi Tanaka
emi.tanaka@sydney.edu.au
School of Mathematics and Statisitcs