Processing math: 100%

background-color: #006DAE
class: middle center hide-slide-number

<div class="shade_black"  style="width:60%;right:0;bottom:0;padding:10px;border: dashed 4px white;margin: auto;">
<i class="fas fa-exclamation-circle"></i> These slides are viewed best by Chrome and occasionally need to be refreshed if elements did not load properly. See <a href=edibble-slides.pdf>here for PDF <i class="fas fa-file-pdf"></i></a>. 
</div>

<br>

---

background-image: url(images/bg1.jpg)
background-size: cover
class: hide-slide-number split-70 title-slide
count: false

.column.shade_black[.content[

<br>

# .monash-blue.outline-text[Advent of "Grammar"]

<br>

<h2 style="font-weight:900!important;">Bridging Statistics and Data Science for the Design of Experiments</h2>

.bottom_abs.width100[

**Emi Tanaka**

Department of Econometrics and Business Statistics

<span><i class="fas  fa-envelope faa-float animated "></i></span>  emi.tanaka@monash.edu
<span><i class="fab  fa-twitter faa-float animated "></i></span>  @statsgen

27th Nov 2020 @ Statistical Society of Australia

<br>
]

]]

---

<div class="bg-white" style="width:900px;margin:auto;color:black;border:2px solid black;border-radius: 5px;padding:3px;background:">
<div class="center" style="color:white;border-radius: 5px;background:#006DAE;font-size:35px;font-weight:900;">NOTICE</div>
<div style="padding:10px;">
<ul>
<li> <b>This talk is most relevant for people who code</b> <i class="fas fa-code"></i></li>
<li><b>Examples are all in R but philosophy holds for other programming languages</b></li>
</ul>
</div>
</div>
<div>

<br>

</div>

---

# .circle-big[1]

# Grammar of Graphics

<i class="fab fa-r-project blue"></i> `ggplot2` 📦

.footnote.monash-bg-blue[
Wickham (2016) ggplot2: Elegant Graphics for Data Analysis. *Springer-Verlag New York*

Wilkinson (2005) The Grammar of Graphics. *Statistics and Computing. Springer, 2nd edition.*
]

---

## A typical Australian academic is expected to do 40% teaching, 40% research and 20% administration

.font_small[and like 200% external service work, outreach, writing grants, reviewing things, sitting on panels, learning how to do online teaching,...]

---

# What are the differences between these plots?

]
.item[

![](images/unnamed-chunk-2-1.png)

]
]

<div style="position:absolute; bottom:10%;left:30%;">
How do you construct these plots?
</div>

---

# Using Base R

```r
df
```

```
##       duty perc
## 1 Teaching   40
## 2 Research   40
## 3    Admin   20
```
]
.item.border-right[

```r
barplot(as.matrix(df$perc),
        legend = df$duty)
```

![](images/barplot-1.png)

]
.item[

```r
pie(df$perc, labels = df$duty)
```

![](images/pie-1.png)

]
]

.footnote[
R Core Team (2020) R: A Language and Environment for Statistical Computing https://www.R-project.org/
]

<div class="show-on-hover" style="padding:10px;position:absolute;bottom: 25px; right: 10px;">
<a href="#edibble" style="font-size:12pt"><i class="fas fa-caret-right"></i> edibble</a>
</div>

<div class="corner-box" style="bottom:50px;">
<ul>
<li><b>Single purpose functions</b> to generate "named plots"</li>
<li><b>Input</b> varies, here it is vector or matrix</li>
</ul>
</div>

---

# Using `ggplot2`

```r
df
```

```
##       duty perc
## 1 Teaching   40
## 2 Research   40
## 3    Admin   20
```
]
.item.border-right[

```r
ggplot(df, aes(x = "", # dummy
               y = perc, 
               fill = duty)) + 
  geom_col()
```

![](images/ggbarplot-1.png)

]
.item[

]
]

.footnote[
Wilkinson (2005) The Grammar of Graphics. *Statistics and Computing. Springer, 2nd edition.*

Wickham (2008) Practical Tools for Exploring Data and Models. *PhD Thesis Chapter 3: A layered grammar of graphics*.

Wickham (2010) A Layered Grammar of Graphics, *Journal of Computational and Graphical Statistics, 19:1, 3-28*

]

```r
ggplot(df, aes(x = "", # dummy
               y = perc, 
               fill = duty)) + 
  geom_col() + 
* coord_polar(theta = "y")
```

![](images/ggpie-1.png)

.corner-box[
* `ggplot2` implements the **grammar of graphics**
* the difference between a **stacked barplot** and a **pie chart** is that the coordinate system have been transformed from **Cartesian coordinate** to **polar coordinate**
]

---

# Data <i class="fas fa-exchange-alt"></i> Plot

* `ggplot` uses **tidy data** as input so plot construction is enforced by consistent thinking in relation to tidy data
<details style="font-size:15pt;">
<summary>Tidy data principles</summary>
<ol>
<li>Each variable must have its own column.</li>
<li>Each observation must have its own row.</li>
<li>Each value must have its own cell.</li>
</ol>
</details>
* Variables are mapped to a plot aesthetic
* Plots are constructed from its components expressed by the **grammar of graphics**

]
.item[

]
]

]

---

## Do you like potatoes?

<br>

## .gray[Let's have a look at a potato field experiment to understand the] infection rate .gray[when it's treated with] sulfur at different season and amount.gray[?]

---

# <i class="fas fa-puzzle-piece"></i> How do we get these graphs?

```r
data(cochran.crd,
     package = "agridat")

cochran.crd
```

```
##    inf trt row col
## 1    9  F3   4   1
## 2   12   O   4   2
## 3   18  S6   4   3
## 4   10 F12   4   4
## 5   24  S6   4   5
## 6   17 S12   4   6
## 7   30  S3   4   7
## 8   16  F6   4   8
## 9   10   O   3   1
## 10   7  S3   3   2
## 11   4 F12   3   3
## 12  10  F6   3   4
## 13  21  S3   3   5
## 14  24   O   3   6
## 15  29   O   3   7
## 16  12  S6   3   8
## 17   9  F3   2   1
## 18   7 S12   2   2
## 19  18  F6   2   3
## 20  30   O   2   4
## 21  18  F6   2   5
## 22  16 S12   2   6
## 23  16  F3   2   7
## 24   4 F12   2   8
## 25   9  S3   1   1
## 26  18   O   1   2
## 27  17 S12   1   3
## 28  19  S6   1   4
## 29  32   O   1   5
## 30   5 F12   1   6
## 31  26   O   1   7
## 32   4  F3   1   8
```

]
.item[

![](images/field1-1.png)
![](images/field2-1.png)

]
.item[

]
]

.footnote[
Kevin Wright (2020). agridat: Agricultural Datasets. R
  package version 1.17.
  
W.G. Cochran and G. Cox, 1957. Experimental Designs, 2nd ed. John Wiley, New York.
]

```r
ggplot(cochran.crd, 
       aes(x = col, 
           y = row, 
*          fill = inf)) +
  geom_tile(color = "black", size = 1.2) + 
* scale_fill_gradient(low = "gold",
*                     high = "firebrick4") +
  theme(text = element_text(size = 20))
```

```r
ggplot(cochran.crd, 
       aes(x = col, 
           y = row, 
*          fill = trt)) +
  geom_tile(color = "black", size = 1.2) +
* scale_fill_viridis_d() +
  theme(text = element_text(size = 20))
```

---

# <i class="fas fa-chalkboard-teacher"></i> Lessons from the grammar of graphics<br> in `ggplot2`

<br>
<div class="info-box" style="margin-right:100px;margin-left:100px;text-align:left;">
<ul>
<li> <b>Modifiable</b>: <code>ggplot</code> object can be modified</li>
{{content}}
</ul>
</div>

<li> <b>Generalisable</b>: Commands may feel verbose (compared to single-graphics functions) but you're less limited in the plots you can construct without an exhaustive list of functions (compared to the number of plots you can create)</li>
{{content}}
--

<li> <b>Extensible</b>: the system can be extended to make specialised plots or add more features if the same "grammar" is adopted</li>

---

# .circle-big[2]

# Grammar of <br>Data Manipulation

<i class="fab fa-r-project blue"></i> `dplyr` 📦

.footnote.monash-bg-blue[
Wickham, François, Henry & Müller (2020) dplyr: A Grammar of Data Manipulation. R-package version 1.0.0.
]

---

# What does the code below do?

```r
# part 1
tapply(cochran.crd$inf, cochran.crd$trt, mean)
```

```
#    F12     F3     F6      O    S12     S3     S6 
#  5.750  9.500 15.500 22.625 14.250 16.750 18.250
```

```r
df <- cochran.crd

# part 2
df$season <- ifelse(grepl("F", df$trt), 
                    "Fall", 
                    "Spring")
df$season[df$trt=="O"] <- "N/A"

# part 3
df$sulfur <- sapply(df$trt, 
    function(x) switch(as.character(x), 
          F3 = 300, F6 = 600, F12 = 1200,
          S3 = 300, S6 = 600, S12 = 1200,
          O = 0)) # pounds/acre
```
]
.item[

```r
# part 4, move trt after col
df[, c("inf", "row", "col", "trt", 
       "season", "sulfur")]
```

```
#    inf row col trt season sulfur
# 1    9   4   1  F3   Fall    300
# 2   12   4   2   O    N/A      0
# 3   18   4   3  S6 Spring    600
# 4   10   4   4 F12   Fall   1200
# 5   24   4   5  S6 Spring    600
# 6   17   4   6 S12 Spring   1200
# 7   30   4   7  S3 Spring    300
# 8   16   4   8  F6   Fall    600
# 9   10   3   1   O    N/A      0
# 10   7   3   2  S3 Spring    300
# 11   4   3   3 F12   Fall   1200
# 12  10   3   4  F6   Fall    600
# 13  21   3   5  S3 Spring    300
# 14  24   3   6   O    N/A      0
# 15  29   3   7   O    N/A      0
# 16  12   3   8  S6 Spring    600
# 17   9   2   1  F3   Fall    300
# 18   7   2   2 S12 Spring   1200
# 19  18   2   3  F6   Fall    600
# 20  30   2   4   O    N/A      0
# 21  18   2   5  F6   Fall    600
# 22  16   2   6 S12 Spring   1200
# 23  16   2   7  F3   Fall    300
# 24   4   2   8 F12   Fall   1200
# 25   9   1   1  S3 Spring    300
# 26  18   1   2   O    N/A      0
# 27  17   1   3 S12 Spring   1200
# 28  19   1   4  S6 Spring    600
# 29  32   1   5   O    N/A      0
# 30   5   1   6 F12   Fall   1200
# 31  26   1   7   O    N/A      0
# 32   4   1   8  F3   Fall    300
```

]

---

# Which is easier to read for you? .font_small[Part 1]

```r
# part 1
tapply(cochran.crd$inf, cochran.crd$trt, mean)
```

```
#    F12     F3     F6      O    S12     S3     S6 
#  5.750  9.500 15.500 22.625 14.250 16.750 18.250
```

]
.item[

```r
# part 1
cochran.crd %>% 
  group_by(trt) %>% 
  summarise(inf_mean = mean(inf))
```

```
# # A tibble: 7 x 2
#   trt   inf_mean
#   <fct>    <dbl>
# 1 F12       5.75
# 2 F3        9.5 
# 3 F6       15.5 
# 4 O        22.6 
# 5 S12      14.2 
# 6 S3       16.8 
# 7 S6       18.2
```
]

]

--
* Can you guess what `tapply` does if you don't know about it?
{{content}}
--
* <i class="fas fa-info-circle animated flash"></i> Functions are less accessible to broader audiences if function names and argument names are designed so:
  * specialist knowledge is required,
  * approach is ad-hoc, and/or
  * unintuitively designed for either input and/or output

---

# Which is easier to read for you? .font_small[Part 2]

```r
df <- cochran.crd

# part 2
df$season <- ifelse(grepl("F", df$trt), 
                    "Fall", 
                    "Spring")
df$season[df$trt=="O"] <- "N/A"

# part 3
df$sulfur <- sapply(df$trt, 
    function(x) switch(as.character(x), 
          F3 = 300, F6 = 600, F12 = 1200,
          S3 = 300, S6 = 600, S12 = 1200,
          O = 0)) # pounds/acre

# part 4, move trt after col
df[, c("inf", "row", "col", "trt", 
       "season", "sulfur")]
```

]
.item[

```r
cochran.crd %>% 
  # part 2 & 3
  mutate(season = case_when(
                    str_detect(trt, "F") ~ "Fall",
                    str_detect(trt, "S") ~ "Spring",
                                    TRUE ~ "N/A"),
         sulfur = case_when(
                    str_detect(trt, "12") ~ 1200,
                    str_detect(trt, "6") ~ 600,
                    str_detect(trt, "3") ~ 300,
                                     TRUE ~ 0)) %>% 
  # part 4
  relocate(trt, .after = col) %>% 
  as_tibble() # optional
```

```
# # A tibble: 32 x 6
#      inf   row   col trt   season sulfur
#    <int> <int> <int> <fct> <chr>   <dbl>
#  1     9     4     1 F3    Fall      300
#  2    12     4     2 O     N/A         0
#  3    18     4     3 S6    Spring    600
#  4    10     4     4 F12   Fall     1200
#  5    24     4     5 S6    Spring    600
#  6    17     4     6 S12   Spring   1200
#  7    30     4     7 S3    Spring    300
#  8    16     4     8 F6    Fall      600
#  9    10     3     1 O     N/A         0
# 10     7     3     2 S3    Spring    300
# # … with 22 more rows
```
]
]

---

# Which is easier to read for you? .font_small[Part 3]

.center[
🎯 find the mean of `inf` by the combination of variables `season` and `sulfur`<br>
**How would you modify the code below?**
]

```r
tapply(df$inf, df$trt, mean)
```

```
##    F12     F3     F6      O    S12     S3     S6 
##  5.750  9.500 15.500 22.625 14.250 16.750 18.250
```

]
.item[

```r
df %>% 
  group_by(trt) %>% 
  summarise(inf_mean = mean(inf))
```

```
## # A tibble: 7 x 2
##   trt   inf_mean
##   <fct>    <dbl>
## 1 F12       5.75
## 2 F3        9.5 
## 3 F6       15.5 
## 4 O        22.6 
## 5 S12      14.2 
## 6 S3       16.8 
## 7 S6       18.2
```
]

]

---

# Which is easier to read for you? .font_small[Part 3]

.center[
🎯 find the mean of `inf` by the combination of variables `season` and `sulfur`<br>
**How would you modify the code below?**
]

```r
tapply(df$inf, list(df$season, df$sulfur), mean)
```

```
##             0   300   600  1200
## Fall       NA  9.50 15.50  5.75
## N/A    22.625    NA    NA    NA
## Spring     NA 16.75 18.25 14.25
```
Notice that the above output is `matrix`.
{{content}}

]
.item[

```r
df %>% 
  group_by(season, sulfur) %>% 
  summarise(inf_mean = mean(inf))
```

```
## # A tibble: 7 x 3
## # Groups:   season [3]
##   season sulfur inf_mean
##   <chr>   <dbl>    <dbl>
## 1 Fall      300     9.5 
## 2 Fall      600    15.5 
## 3 Fall     1200     5.75
## 4 N/A         0    22.6 
## 5 Spring    300    16.8 
## 6 Spring    600    18.2 
## 7 Spring   1200    14.2
```
]

]

This works also:

```r
tapply(df$inf, paste0(df$season, df$sulfur), mean)
```

```
##   Fall1200    Fall300    Fall600       N/A0 Spring1200 
##      5.750      9.500     15.500     22.625     14.250 
##  Spring300  Spring600 
##     16.750     18.250
```

---

# Reading code like English grammar

```r
df <- cochran.crd %>% 
  mutate(season = case_when(
                    str_detect(trt, "F") ~ "Fall",
                    str_detect(trt, "S") ~ "Spring",
                                    TRUE ~ "N/A"),
         sulfur = case_when(
                    str_detect(trt, "12") ~ 1200,
                    str_detect(trt, "6") ~ 600,
                    str_detect(trt, "3") ~ 300,
                                     TRUE ~ 0)) %>% 
  relocate(trt, .after = col) %>% 
  as_tibble() # optional

df %>% 
  group_by(season, sulfur) %>% 
  summarise(inf_mean = mean(inf))
```

]
.item[

* Read `%>%` as "and then"
* The code may read more familiar to an English user without much specialist knowledge
* Of course this is biased in favour of people who know English
* Bonus: no intermediate naming to think or worry about

]
]

---

# <i class="fas fa-chalkboard-teacher"></i> Lessons from grammar of data manipulation<br> in `dplyr`

<br>
<div class="info-box" style="margin-right:100px;margin-left:100px;text-align:left;">
<ul>
<li><b>Leveraging existing knowledge</b> of an everyday user or an "average" statistician for the choice of function names and argument names can make it more accessible</li>
{{content}}
</ul>
</div>
--

<li><b>Making it more accessible</b>, by making code more expressive or otherwise, can promote statistical literacy</li>
{{content}}

<li><b>Consistency in input and output</b> makes it easier for user expectation and helps to offload mental load</li>

---

## Can all this work be considered as statistics?

## Or data science?

<br>

<div class="info-box" style="margin-right:100px;margin-left:100px;text-align:left;font-size:18pt;">
Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms and systems to <b>extract knowledge and insights from many structural and unstructured data</b>. Data science is related to data mining, machine learning and big data.
<span style="float:right"><i class="fas fa-book"></i> <i>Wikipedia, accessed on 27th Nov 2020</i></span>
<br><br>
</div>

<br>

## Isn't this what statistics is about?

---

## There are two types of people...

<br>

---

## There are two types of people...

<br>

<br>

## But actually, it's hugely unbalanced

---

## Statistics by nature is inter-disciplinary and about extracting knowledge and insight from data 
.font_small[at least from my point of view]

## but communication largely became exclusive to selective set of statistically well-trained people governed by strict adherence to defintions or protocols in lieu of practicality
.font_small[which is not to say it is bad to do this]

---

## Far more people need to use statistics than just statistically well-trained people

<br>

.footnote[
Wickham (2015) Teaching Safe-Stats, Not Statistical Abstinence. *The American Statistician, Online Discussion*
]

## Statistics community need to go beyond talking among like-minded peers

---

# .circle-big[3]

# Grammar of <br>Experimental Design

<i class="fab fa-r-project blue"></i> `edibble` 📦

(Work-In-Progress)

---

# Typical course in experimental design <br>.font_small[(at least at University of Sydney in 2017-2019)]

Teach:

* Completely Randomised Design
* Randomised Complete Block Design
* Latin Square Design
* Balanced Incomplete Block Design
* Factorial Design
* <strike> 2</strike><sup>k</sup><strike> Factorial Design</strike> .font_small[(I removed this from 2018, I won't talk about this today)]
* Split-plot Design .font_small[(I added this from 2018 among other concepts)]

---

# Completely Randomised Design (CRD)

<br>

]

* `$t$` treatments randomised to `$n$` units

<br>

`$$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{treatment} + \texttt{error}$$`

]

---

# Randomised Complete Block Design (RCBD)

<br>

]

* `$b$` blocks of size `$t$`
* `$t$` treatments randomised to `$t$` units within each block

`$$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{treatment} + \texttt{block} + \texttt{error}$$`

]

---

# Latin Square Design (LSD)

<br>

]

* two orthogonal blocks of size `$t$`
* `$t$` treatments randomised to units such that every treatment appears exactly once in each block

`$$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{treatment} + \texttt{row} + \texttt{column} + \texttt{error}$$`

]

---

# Balanced Incomplete Block Design (BIBD)
 
.grid[

<br>

]

* `$b$` blocks of size `$k < t$`
* `$t$` treatments randomised to units within each block such that every pair of treatment appears the same number of times across blocks

`$$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{block} + \texttt{treatment} + \texttt{error}$$`

]

---

# Factorial Design

<br>

]

* `$ab = t$` treatments randomised to `$n$` units
* treatment is every combination of two factors A and B

`$$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{A} + \texttt{B} + \texttt{A:B} + \texttt{error}$$`

<center>
<img src="images/factorial-eg1-anova-top.png" width = "850px"/>
<details style="font-size:4pt"><summary></summary>
<img src="images/factorial-eg1-anova-middle.png" width = "850px"/>
</details>
<img src="images/factorial-eg1-anova-bottom.png" width = "850px"/>
</center>
]

]

---

# Split-plot Design

<br>

]

<ul>
<li> $n_1$ whole plots consisting of $b$ sub plots</li>
<li>in total there are $n$ sub plots</li>
<li>treatment factor A is randomised to whole plots</li>
<li>treatment factor B is randomised to sub plots within each whole plot</li>
</ul>

`$$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{A} + \texttt{WP} + \texttt{B} + \texttt{A:B} + \texttt{error}$$`

<center>
<img src="images/split-plot-eg1-anova.png" width = "850px"/>
</center>
]

</div>

]

---

# CRAN Task View of Design of Experiments

contains

# 📦 .yellow[229 R-packages ]

---

# Top 10 downloaded R-packages in 2018

![](images/top10_2018-1.png)

# `agricolae` is the most downloaded

---

# Top 10 downloaded R-packages in 2019

![](images/top10-1.png)

# Okay fine, results were different in 2019

---

# `agricolae::design.crd`

.blue[**Completely randomised design**] for `$t = 3$` treatments with `$2$` replicates each
<pre><code>
trt <- c("A", "B", "C")
agricolae::.bg-yellow[design.crd](trt = trt, r = 2) %>% glimpse()
</code></pre>

```
## List of 2
##  $ parameters:List of 7
##   ..$ design: chr "crd"
##   ..$ trt   : chr [1:3] "A" "B" "C"
##   ..$ r     : num [1:3] 2 2 2
##   ..$ serie : num 2
##   ..$ seed  : int 396269021
##   ..$ kinds : chr "Super-Duper"
##   ..$       : logi TRUE
##  $ book      :'data.frame':	6 obs. of  3 variables:
##   ..$ plots: num [1:6] 101 102 103 104 105 106
##   ..$ r    : int [1:6] 1 1 2 1 2 2
##   ..$ trt  : chr [1:6] "C" "A" "C" "B" ...
```
]

</div>

---

# `agricolae::design.rcbd`

<pre><code>
trt <- c("A", "B", "C")
agricolae::.bg-yellow[design.rcbd](trt = trt, r = 2) %>% glimpse()
</code></pre>

```
## List of 3
##  $ parameters:List of 7
##   ..$ design: chr "rcbd"
##   ..$ trt   : chr [1:3] "A" "B" "C"
##   ..$ r     : num 2
##   ..$ serie : num 2
##   ..$ seed  : int 973575053
##   ..$ kinds : chr "Super-Duper"
##   ..$       : logi TRUE
##  $ sketch    : chr [1:2, 1:3] "C" "A" "B" "C" ...
##  $ book      :'data.frame':	6 obs. of  3 variables:
##   ..$ plots: num [1:6] 101 102 103 201 202 203
##   ..$ block: Factor w/ 2 levels "1","2": 1 1 1 2 2 2
##   ..$ trt  : Factor w/ 3 levels "A","B","C": 3 2 1 1 3 2
```
]

<br>

</div>

---

# `agricolae::design.lsd()`

<pre><code>
trt <- c("A", "B", "C")
agricolae::.bg-yellow[design.lsd](trt = trt) %>% glimpse()
</code></pre>

```
## List of 3
##  $ parameters:List of 7
##   ..$ design: chr "lsd"
##   ..$ trt   : chr [1:3] "A" "B" "C"
##   ..$ r     : int 3
##   ..$ serie : num 2
##   ..$ seed  : int -1984440067
##   ..$ kinds : chr "Super-Duper"
##   ..$       : logi TRUE
##  $ sketch    : chr [1:3, 1:3] "B" "C" "A" "A" ...
##  $ book      :'data.frame':	9 obs. of  4 variables:
##   ..$ plots: num [1:9] 101 102 103 201 202 203 301 302 303
##   ..$ row  : Factor w/ 3 levels "1","2","3": 1 1 1 2 2 2 3 3 3
##   ..$ col  : Factor w/ 3 levels "1","2","3": 1 2 3 1 2 3 1 2 3
##   ..$ trt  : Factor w/ 3 levels "A","B","C": 2 1 3 3 2 1 1 3 2
```
]

<br>

</div>

---

# `agricolae::design.bib()`

<pre><code>
trt <- c("A", "B", "C")
agricolae::.bg-yellow[design.bib](trt = trt, k = 2) %>% glimpse()
</code></pre>

```
## [1] "No improvement over initial random design."
## 
## Parameters BIB
## ==============
## Lambda     : 1
## treatmeans : 3
## Block size : 2
## Blocks     : 3
## Replication: 2 
## 
## Efficiency factor 0.75 
## 
## <<< Book >>>
```

```
## List of 4
##  $ parameters:List of 6
##   ..$ design: chr "bib"
##   ..$ trt   : chr [1:3] "A" "B" "C"
##   ..$ k     : num 2
##   ..$ serie : num 2
##   ..$ seed  : int -509134623
##   ..$ kinds : chr "Super-Duper"
##  $ statistics:'data.frame':	1 obs. of  6 variables:
##   ..$ lambda    : num 1
##   ..$ treatmeans: int 3
##   ..$ blockSize : num 2
##   ..$ blocks    : int 3
##   ..$ r         : num 2
##   ..$ Efficiency: num 0.75
##  $ sketch    : chr [1:3, 1:2] "C" "B" "B" "A" ...
##  $ book      :'data.frame':	6 obs. of  3 variables:
##   ..$ plots: num [1:6] 101 102 201 202 301 302
##   ..$ block: Factor w/ 3 levels "1","2","3": 1 1 2 2 3 3
##   ..$ trt  : Factor w/ 3 levels "A","B","C": 3 1 2 1 2 3
```
]

<br>

</div>

---

# `agricolae::design.ab()`

.blue[**Factorial design**] for `$t = 3 \times 2$` treatments with `$2$` replication for each treatment

<pre><code>
agricolae::.bg-yellow[design.ab](trt = c(3, 2), r = 2, design = "crd") %>% glimpse()
</code></pre>

```
## List of 2
##  $ parameters:List of 8
##   ..$ design : chr "factorial"
##   ..$ trt    : chr [1:6] "1 1" "1 2" "2 1" "2 2" ...
##   ..$ r      : num [1:6] 2 2 2 2 2 2
##   ..$ serie  : num 2
##   ..$ seed   : int 801301585
##   ..$ kinds  : chr "Super-Duper"
##   ..$        : logi TRUE
##   ..$ applied: chr "crd"
##  $ book      :'data.frame':	12 obs. of  4 variables:
##   ..$ plots: num [1:12] 101 102 103 104 105 106 107 108 109 110 ...
##   ..$ r    : int [1:12] 1 1 1 2 1 1 1 2 2 2 ...
##   ..$ A    : chr [1:12] "2" "3" "2" "2" ...
##   ..$ B    : chr [1:12] "2" "2" "1" "2" ...
```
]

Note *not* A/B testing!

<br>

<br>

</div>

---

# `agricolae::design.split()`

.blue[**Split-plot design**] for `$t = 2 \times 4$` treatments with `$2$` replication for each treatment

<pre><code>
trt1 <- c("I", "R"); trt2 <- LETTERS[1:4]
agricolae::.bg-yellow[design.split](trt1 = trt1, trt2 = trt2, r = 2, design = "crd") %>% 
    glimpse()
</code></pre>

```
## List of 2
##  $ parameters:List of 8
##   ..$ design : chr "split"
##   ..$        : logi TRUE
##   ..$ trt1   : chr [1:2] "I" "R"
##   ..$ applied: chr "crd"
##   ..$ r      : num [1:2] 2 2
##   ..$ serie  : num 2
##   ..$ seed   : int -765020087
##   ..$ kinds  : chr "Super-Duper"
##  $ book      :'data.frame':	16 obs. of  5 variables:
##   ..$ plots : num [1:16] 101 101 101 101 102 102 102 102 103 103 ...
##   ..$ splots: Factor w/ 4 levels "1","2","3","4": 1 2 3 4 1 2 3 4 1 2 ...
##   ..$ r     : int [1:16] 1 1 1 1 1 1 1 1 2 2 ...
##   ..$ trt1  : chr [1:16] "I" "I" "I" "I" ...
##   ..$ trt2  : chr [1:16] "D" "A" "C" "B" ...
```
]

</div>

---

# Grammar of experimental design

* Remember Base R plots? <a href="#base-plot" style="font-size:12pt"><i class="fas fa-caret-right"></i> base plot</a>
--

* 🤔 "named experimental design" functions (`agricolae::design.crd`, etc.) are like "named statistical graphic" functions (`pie`, `barplot`)
--

* 💡 construction of experimental design needs to be made more accessible, modifiable, extensible and generalisable
--

* This is the concept for the development of the `edibble` R-package <br>https://github.com/emitanaka/edibble<br>(<i class="fas fa-wrench"></i> WIP, sorry code not ready yet, please enjoy prototype and live demo instead)
--

* `tibble` R-package is a modern reimagining of the `data.frame`
* `edibble` (WIP) creates experimental design tibbles
{{content}}

---

# Prototype for grammar of experimental design

* Consider a field experiment with 32 plots

]
.item[

```r
library(edibble)
initiate_design(seed = 2020) %>% 
* set_units(plot = 32)
```

```
## An edibble design
## └─plot (32 levels)
```
]
]

---

# Prototype Grammar of Experimental Design

* Consider a field experiment with 32 plots
* There are 7 different treatments

]
.item50[

```r
library(edibble)
initiate_design(seed = 2020) %>% 
  set_units(plot = 32) %>% 
* set_trts(trt = c("F12", "F3", "F6", "O",
*                  "S12", "S3", "S6"))
```

```
## An edibble design
## ├─plot (32 levels)
## └─trt (7 levels)
```

]
]

---

# Prototype Grammar of Experimental Design .circle[1]

* Consider a field experiment with 32 plots
* There are 7 different treatments
* Apply treatments to plots

]
.item50[

```r
library(edibble)
initiate_design(seed = 2020) %>% 
  set_units(plot = 32) %>% 
  set_trts(trt = c("F12", "F3", "F6", "O", 
                   "S12", "S3", "S6")) %>% 
* apply_trts(~ plot)
```

```
## An edibble design
## ├─plot (32 levels)
## └─trt (7 levels)
```

]
]

---

# Prototype Grammar of Experimental Design .circle[1]

* Consider a field experiment with 32 plots
* There are 7 different treatments
* Apply treatments to plots
* Randomise treatments to plots

]
.item50[

```r
library(edibble)
initiate_design(seed = 2020) %>% 
  set_units(plot = 32) %>% 
  set_trts(trt = c("F12", "F3", "F6", "O", 
                   "S12", "S3", "S6")) %>% 
  apply_trts(~ plot) %>% 
* randomise_trts()
```

```
## An edibble design
## ├─plot (32 levels)
## └─trt (7 levels)
```

]
]

---

# Prototype Grammar of Experimental Design .circle[1]

* Consider a field experiment with 32 plots
* There are 7 different treatments
* Apply treatments to plots
* Randomise treatments to plots
* The result is a **completely randomised design**

]
.item50[

```r
library(edibble)
initiate_design(seed = 2020) %>% 
  set_units(plot = 32) %>% 
  set_trts(trt = c("F12", "F3", "F6", "O", 
                   "S12", "S3", "S6")) %>% 
  apply_trts(~ plot) %>% 
  randomise_trts() %>% 
* serve_table()
```

```
## # An edibble: 32 x 2
##    plot   trt  
##    <fct>  <fct>
##  1 plot1  F12  
##  2 plot2  S3   
##  3 plot3  O    
##  4 plot4  S6   
##  5 plot5  F3   
##  6 plot6  S12  
##  7 plot7  F6   
##  8 plot8  S6   
##  9 plot9  F3   
## 10 plot10 S3   
## # … with 22 more rows
```

]
]

---

# Prototype Grammar of Experimental Design .circle[2]

* Consider a field experiment with 2 blocks each with 16 plots
* There are 7 different treatments
* Apply treatments to plots
* Randomise treatments to plots within blocks
* Resulting design is a **randomised block design**

<br>

Can you see how it differs from the previous design?

]
.item50[

```r
library(edibble)
initiate_design(seed = 2020) %>% 
* set_units(block = c("B1", "B2"),
*           plot = nested_in(block, 16)) %>%
  set_trts(trt = c("F12", "F3", "F6", "O", 
                   "S12", "S3", "S6")) %>% 
  apply_trts(~ plot) %>% 
  randomise_trts() %>% 
  serve_table() 
```

```
## # An edibble: 32 x 3
##    block plot   trt  
##    <fct> <fct>  <fct>
##  1 B1    plot1  F12  
##  2 B1    plot2  S3   
##  3 B1    plot3  S12  
##  4 B1    plot4  S6   
##  5 B1    plot5  F3   
##  6 B1    plot6  S3   
##  7 B1    plot7  F6   
##  8 B1    plot8  S6   
##  9 B1    plot9  F3   
## 10 B1    plot10 S6   
## # … with 22 more rows
```

]
]

---

.animated.flash.font_large[
# LIVE DEMO
]

---

# Limitations

.pad20[
* Generalising experimental designs is extremely hard
* A lot of domains have their own specialised design
{{content}}

]
--
* Some specialist designs that have certain properties cannot be easily replicated by `edibble`
* An ad-hoc approach may still be required to create design with specific properties
{{content}}
--
but the hope is that this can be achieved with replacement in one of the steps (e.g. replacing `randomise_trts`)
{{content}}
--
* _So why would anyone buy into this framework?_
{{content}}
--
* Hopefully, developers and users can see the benefit in being more expressive in code; and also a pay-off like easy visualisation of designs becomes worthwhile

---

# <i class="fas fa-clock"></i> Timeline

.grid[.item[
* By end of 2020, I intend to have `edibble` be able to construct textbook (mostly orthogonal) designs.
* From 2021, I plan to hold (free) workshops in experimental design with `edibble` to generate certain designs.
* The initial development for `edibble` is focussed on designs with "fixed" structures. 
* For survey and sampling designs you should check out [DeclareDesign](https://declaredesign.org/).

]
.item[
.center[
<br><br>

# <i class="fas fa-comments"></i>

Feedback is welcomed!

]
<br>

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<i class="fab fa-github"></i> <a href="https://github.com/emitanaka/edibble/issues" style="color:black!important">emitanaka/edibble/issues</a>
<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<i class="fas fa-envelope"></i> emi.tanaka@monash.edu

<br><br>
<div class="center">
<div style="border: 2px dotted #006DAE; width:400px;padding:10px;margin:auto;background:#D2F1FF;">
<i class="fas fa-link"></i> Slides can be found at

<a href="https://www.emitanaka.org/slides/SSA2020/" style="font-size:20pt">emitanaka.org/slides/SSA2020/</a>
</div>
</div>
]
]

---

# Acknowledgements

This slide was made using `xaringan` R-package and the following systems.

```r
sessioninfo::session_info()
```

```
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 4.0.1 (2020-06-06)
##  os       macOS Catalina 10.15.7      
##  system   x86_64, darwin17.0          
##  ui       X11                         
##  language (EN)                        
##  collate  en_AU.UTF-8                 
##  ctype    en_AU.UTF-8                 
##  tz       Australia/Melbourne         
##  date     2020-11-27                  
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package     * version    date       lib source                              
##  agricolae     1.3-3      2020-06-07 [1] CRAN (R 4.0.1)                      
##  AlgDesign     1.2.0      2019-11-29 [1] CRAN (R 4.0.1)                      
##  anicon        0.1.0      2020-06-21 [1] Github (emitanaka/anicon@0b756df)   
##  assertthat    0.2.1      2019-03-21 [2] CRAN (R 4.0.0)                      
##  backports     1.2.0      2020-11-02 [1] CRAN (R 4.0.2)                      
##  broom         0.7.2      2020-10-20 [1] CRAN (R 4.0.2)                      
##  callr         3.5.1      2020-10-13 [1] CRAN (R 4.0.2)                      
##  cellranger    1.1.0      2016-07-27 [2] CRAN (R 4.0.0)                      
##  cli           2.2.0      2020-11-20 [1] CRAN (R 4.0.1)                      
##  cluster       2.1.0      2019-06-19 [2] CRAN (R 4.0.1)                      
##  colorspace    2.0-0      2020-11-11 [1] CRAN (R 4.0.2)                      
##  combinat      0.0-8      2012-10-29 [1] CRAN (R 4.0.1)                      
##  countdown     0.3.5      2020-07-20 [1] Github (gadenbuie/countdown@a544fa4)
##  crayon        1.3.4      2017-09-16 [2] CRAN (R 4.0.0)                      
##  DBI           1.1.0      2019-12-15 [1] CRAN (R 4.0.2)                      
##  dbplyr        2.0.0      2020-11-03 [1] CRAN (R 4.0.2)                      
##  digest        0.6.27     2020-10-24 [1] CRAN (R 4.0.2)                      
##  dplyr       * 1.0.2      2020-08-18 [1] CRAN (R 4.0.2)                      
##  edibble     * 0.0.0.9000 2020-11-15 [1] local                               
##  ellipsis      0.3.1      2020-05-15 [2] CRAN (R 4.0.0)                      
##  emo           0.0.0.9000 2020-06-26 [1] Github (hadley/emo@3f03b11)         
##  evaluate      0.14       2019-05-28 [2] CRAN (R 4.0.0)                      
##  fansi         0.4.1      2020-01-08 [2] CRAN (R 4.0.0)                      
##  farver        2.0.3.9000 2020-07-24 [1] Github (thomasp85/farver@f1bcb56)   
##  fastmap       1.0.1      2019-10-08 [2] CRAN (R 4.0.0)                      
##  forcats     * 0.5.0      2020-03-01 [2] CRAN (R 4.0.0)                      
##  fs            1.5.0      2020-07-31 [1] CRAN (R 4.0.2)                      
##  generics      0.1.0      2020-10-31 [2] CRAN (R 4.0.2)                      
##  ggplot2     * 3.3.2      2020-06-19 [1] CRAN (R 4.0.2)                      
##  glue          1.4.2      2020-08-27 [1] CRAN (R 4.0.2)                      
##  gtable        0.3.0      2019-03-25 [2] CRAN (R 4.0.0)                      
##  haven         2.3.1      2020-06-01 [2] CRAN (R 4.0.0)                      
##  highr         0.8        2019-03-20 [2] CRAN (R 4.0.0)                      
##  hms           0.5.3      2020-01-08 [2] CRAN (R 4.0.0)                      
##  htmltools     0.5.0      2020-06-16 [1] CRAN (R 4.0.2)                      
##  httpuv        1.5.4      2020-06-06 [2] CRAN (R 4.0.1)                      
##  httr          1.4.2      2020-07-20 [1] CRAN (R 4.0.2)                      
##  icon          0.1.0      2020-06-21 [1] Github (emitanaka/icon@8458546)     
##  igraph        1.2.6      2020-10-06 [1] CRAN (R 4.0.2)                      
##  jsonlite      1.7.1      2020-09-07 [1] CRAN (R 4.0.2)                      
##  klaR          0.6-15     2020-02-19 [1] CRAN (R 4.0.1)                      
##  knitr         1.30       2020-09-22 [1] CRAN (R 4.0.2)                      
##  labeling      0.4.2      2020-10-20 [1] CRAN (R 4.0.2)                      
##  labelled      2.7.0      2020-09-21 [1] CRAN (R 4.0.2)                      
##  later         1.1.0.1    2020-06-05 [2] CRAN (R 4.0.1)                      
##  lattice       0.20-41    2020-04-02 [2] CRAN (R 4.0.1)                      
##  lifecycle     0.2.0      2020-03-06 [1] CRAN (R 4.0.0)                      
##  lubridate     1.7.9      2020-06-08 [2] CRAN (R 4.0.1)                      
##  magrittr      2.0.1      2020-11-17 [1] CRAN (R 4.0.2)                      
##  MASS          7.3-53     2020-09-09 [1] CRAN (R 4.0.2)                      
##  mime          0.9        2020-02-04 [2] CRAN (R 4.0.0)                      
##  miniUI        0.1.1.1    2018-05-18 [1] CRAN (R 4.0.0)                      
##  modelr        0.1.8      2020-05-19 [2] CRAN (R 4.0.0)                      
##  munsell       0.5.0      2018-06-12 [2] CRAN (R 4.0.0)                      
##  nlme          3.1-150    2020-10-24 [2] CRAN (R 4.0.2)                      
##  pillar        1.4.7      2020-11-20 [1] CRAN (R 4.0.1)                      
##  pkgconfig     2.0.3      2019-09-22 [2] CRAN (R 4.0.0)                      
##  processx      3.4.4      2020-09-03 [1] CRAN (R 4.0.2)                      
##  promises      1.1.1      2020-06-09 [1] CRAN (R 4.0.0)                      
##  ps            1.4.0      2020-10-07 [1] CRAN (R 4.0.2)                      
##  purrr       * 0.3.4      2020-04-17 [2] CRAN (R 4.0.0)                      
##  questionr     0.7.3      2020-10-05 [1] CRAN (R 4.0.2)                      
##  R6            2.5.0      2020-10-28 [1] CRAN (R 4.0.2)                      
##  Rcpp          1.0.5      2020-07-06 [1] CRAN (R 4.0.0)                      
##  readr       * 1.4.0      2020-10-05 [2] CRAN (R 4.0.2)                      
##  readxl        1.3.1      2019-03-13 [2] CRAN (R 4.0.0)                      
##  reprex        0.3.0.9001 2020-08-08 [1] Github (tidyverse/reprex@9594ee9)   
##  rlang         0.4.8      2020-10-08 [1] CRAN (R 4.0.2)                      
##  rmarkdown     2.5        2020-10-21 [1] CRAN (R 4.0.1)                      
##  rstudioapi    0.13       2020-11-12 [1] CRAN (R 4.0.1)                      
##  rvest         0.3.6      2020-07-25 [1] CRAN (R 4.0.2)                      
##  scales        1.1.1      2020-05-11 [2] CRAN (R 4.0.0)                      
##  sessioninfo   1.1.1      2018-11-05 [2] CRAN (R 4.0.0)                      
##  shiny         1.5.0      2020-06-23 [1] CRAN (R 4.0.2)                      
##  stringi       1.5.3      2020-09-09 [2] CRAN (R 4.0.2)                      
##  stringr     * 1.4.0      2019-02-10 [2] CRAN (R 4.0.0)                      
##  tibble      * 3.0.4.9000 2020-11-26 [1] Github (tidyverse/tibble@9eeef4d)   
##  tidyr       * 1.1.2      2020-08-27 [1] CRAN (R 4.0.2)                      
##  tidyselect    1.1.0      2020-05-11 [2] CRAN (R 4.0.0)                      
##  tidyverse   * 1.3.0      2019-11-21 [1] CRAN (R 4.0.2)                      
##  utf8          1.1.4      2018-05-24 [2] CRAN (R 4.0.0)                      
##  vctrs         0.3.5.9000 2020-11-26 [1] Github (r-lib/vctrs@957baf7)        
##  viridisLite   0.3.0      2018-02-01 [2] CRAN (R 4.0.0)                      
##  whisker       0.4        2019-08-28 [2] CRAN (R 4.0.0)                      
##  withr         2.3.0      2020-09-22 [1] CRAN (R 4.0.2)                      
##  xaringan      0.18       2020-10-21 [1] CRAN (R 4.0.2)                      
##  xfun          0.19       2020-10-30 [1] CRAN (R 4.0.2)                      
##  xml2          1.3.2      2020-04-23 [2] CRAN (R 4.0.0)                      
##  xtable        1.8-4      2019-04-21 [2] CRAN (R 4.0.0)                      
##  yaml          2.2.1      2020-02-01 [1] CRAN (R 4.0.2)                      
## 
## [1] /Users/etan0038/Library/R/4.0/library
## [2] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
```

]

(Scroll on html slide to see all)

Notes for current slide

Notes for next slide

These slides are viewed best by Chrome and occasionally need to be refreshed if elements did not load properly. See here for PDF .

Press the right arrow to progress to the next slide!

1/47

Advent of "Grammar"

Bridging Statistics and Data Science for the Design of Experiments

Emi Tanaka

Department of Econometrics and Business Statistics

emi.tanaka@monash.edu @statsgen

27th Nov 2020 @ Statistical Society of Australia

1/47

NOTICE

 This talk is most relevant for people who code 
Examples are all in R but philosophy holds for other programming languages

1  Grammar of Graphics

2  Grammar of Data Manipulation

3  Grammar of Experimental Design

2/47

1 Grammar of Graphics

ggplot2 📦

Wickham (2016) ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York

Wilkinson (2005) The Grammar of Graphics. Statistics and Computing. Springer, 2nd edition.

3/47

A typical Australian academic is expected to do 40% teaching, 40% research and 20% administration

and like 200% external service work, outreach, writing grants, reviewing things, sitting on panels, learning how to do online teaching,...

4/47

What are the differences between these plots?

5/47

What are the differences between these plots?

How do you construct these plots?

5/47

Using Base R

df

##       duty perc
## 1 Teaching   40
## 2 Research   40
## 3    Admin   20

barplot(as.matrix(df$perc),
        legend = df$duty)

pie(df$perc, labels = df$duty)

R Core Team (2020) R: A Language and Environment for Statistical Computing https://www.R-project.org/

edibble

6/47

Using Base R

df

##       duty perc
## 1 Teaching   40
## 2 Research   40
## 3    Admin   20

barplot(as.matrix(df$perc),
        legend = df$duty)

pie(df$perc, labels = df$duty)

R Core Team (2020) R: A Language and Environment for Statistical Computing https://www.R-project.org/

edibble

Single purpose functions to generate "named plots"
Input varies, here it is vector or matrix

6/47

Using `ggplot2`

df

##       duty perc
## 1 Teaching   40
## 2 Research   40
## 3    Admin   20

ggplot(df, aes(x = "", # dummy
               y = perc, 
               fill = duty)) + 
  geom_col()

Wilkinson (2005) The Grammar of Graphics. Statistics and Computing. Springer, 2nd edition.

Wickham (2008) Practical Tools for Exploring Data and Models. PhD Thesis Chapter 3: A layered grammar of graphics.

Wickham (2010) A Layered Grammar of Graphics, Journal of Computational and Graphical Statistics, 19:1, 3-28

7/47

Using `ggplot2`

df

##       duty perc
## 1 Teaching   40
## 2 Research   40
## 3    Admin   20

ggplot(df, aes(x = "", # dummy
               y = perc, 
               fill = duty)) + 
  geom_col()

ggplot(df, aes(x = "", # dummy
               y = perc, 
               fill = duty)) + 
  geom_col() + 
  coord_polar(theta = "y")

Wilkinson (2005) The Grammar of Graphics. Statistics and Computing. Springer, 2nd edition.

Wickham (2008) Practical Tools for Exploring Data and Models. PhD Thesis Chapter 3: A layered grammar of graphics.

Wickham (2010) A Layered Grammar of Graphics, Journal of Computational and Graphical Statistics, 19:1, 3-28

7/47

Using `ggplot2`

df

##       duty perc
## 1 Teaching   40
## 2 Research   40
## 3    Admin   20

ggplot(df, aes(x = "", # dummy
               y = perc, 
               fill = duty)) + 
  geom_col()

ggplot(df, aes(x = "", # dummy
               y = perc, 
               fill = duty)) + 
  geom_col() + 
  coord_polar(theta = "y")

Wilkinson (2005) The Grammar of Graphics. Statistics and Computing. Springer, 2nd edition.

Wickham (2008) Practical Tools for Exploring Data and Models. PhD Thesis Chapter 3: A layered grammar of graphics.

Wickham (2010) A Layered Grammar of Graphics, Journal of Computational and Graphical Statistics, 19:1, 3-28

ggplot2 implements the grammar of graphics
the difference between a stacked barplot and a pie chart is that the coordinate system have been transformed from Cartesian coordinate to polar coordinate

7/47

Data Plot

ggplot uses tidy data as input so plot construction is enforced by consistent thinking in relation to tidy data
Tidy data principles
1. Each variable must have its own column.
2. Each observation must have its own row.
3. Each value must have its own cell.
Variables are mapped to a plot aesthetic
Plots are constructed from its components expressed by the grammar of graphics

Wickham (2014) Tidy Data. Journal of Statistical Software, Articles 59 (10): 1–23.

8/47

Do you like potatoes?

9/47

Do you like potatoes?

Let's have a look at a potato field experiment to understand the infection rate when it's treated with sulfur at different season and amount?

9/47

How do we get these graphs?

data(cochran.crd,
     package = "agridat")
cochran.crd

##    inf trt row col
## 1    9  F3   4   1
## 2   12   O   4   2
## 3   18  S6   4   3
## 4   10 F12   4   4
## 5   24  S6   4   5
## 6   17 S12   4   6
## 7   30  S3   4   7
## 8   16  F6   4   8
## 9   10   O   3   1
## 10   7  S3   3   2
## 11   4 F12   3   3
## 12  10  F6   3   4
## 13  21  S3   3   5
## 14  24   O   3   6
## 15  29   O   3   7
## 16  12  S6   3   8
## 17   9  F3   2   1
## 18   7 S12   2   2
## 19  18  F6   2   3
## 20  30   O   2   4
## 21  18  F6   2   5
## 22  16 S12   2   6
## 23  16  F3   2   7
## 24   4 F12   2   8
## 25   9  S3   1   1
## 26  18   O   1   2
## 27  17 S12   1   3
## 28  19  S6   1   4
## 29  32   O   1   5
## 30   5 F12   1   6
## 31  26   O   1   7
## 32   4  F3   1   8

Kevin Wright (2020). agridat: Agricultural Datasets. R package version 1.17.

W.G. Cochran and G. Cox, 1957. Experimental Designs, 2nd ed. John Wiley, New York.

10/47

How do we get these graphs?

data(cochran.crd,
     package = "agridat")
cochran.crd

##    inf trt row col
## 1    9  F3   4   1
## 2   12   O   4   2
## 3   18  S6   4   3
## 4   10 F12   4   4
## 5   24  S6   4   5
## 6   17 S12   4   6
## 7   30  S3   4   7
## 8   16  F6   4   8
## 9   10   O   3   1
## 10   7  S3   3   2
## 11   4 F12   3   3
## 12  10  F6   3   4
## 13  21  S3   3   5
## 14  24   O   3   6
## 15  29   O   3   7
## 16  12  S6   3   8
## 17   9  F3   2   1
## 18   7 S12   2   2
## 19  18  F6   2   3
## 20  30   O   2   4
## 21  18  F6   2   5
## 22  16 S12   2   6
## 23  16  F3   2   7
## 24   4 F12   2   8
## 25   9  S3   1   1
## 26  18   O   1   2
## 27  17 S12   1   3
## 28  19  S6   1   4
## 29  32   O   1   5
## 30   5 F12   1   6
## 31  26   O   1   7
## 32   4  F3   1   8

ggplot(cochran.crd, 
       aes(x = col, 
           y = row, 
           fill = inf)) +
  geom_tile(color = "black", size = 1.2) + 
  scale_fill_gradient(low = "gold",
                      high = "firebrick4") +
  theme(text = element_text(size = 20))

ggplot(cochran.crd, 
       aes(x = col, 
           y = row, 
           fill = trt)) +
  geom_tile(color = "black", size = 1.2) +
  scale_fill_viridis_d() +
  theme(text = element_text(size = 20))

Kevin Wright (2020). agridat: Agricultural Datasets. R package version 1.17.

W.G. Cochran and G. Cox, 1957. Experimental Designs, 2nd ed. John Wiley, New York.

10/47

Lessons from the grammar of graphics
in `ggplot2`

Modifiable: ggplot object can be modified

11/47

Lessons from the grammar of graphics
in `ggplot2`

Modifiable: ggplot object can be modified
Generalisable: Commands may feel verbose (compared to single-graphics functions) but you're less limited in the plots you can construct without an exhaustive list of functions (compared to the number of plots you can create)

11/47

Lessons from the grammar of graphics
in `ggplot2`

Modifiable: ggplot object can be modified
Generalisable: Commands may feel verbose (compared to single-graphics functions) but you're less limited in the plots you can construct without an exhaustive list of functions (compared to the number of plots you can create)
Extensible: the system can be extended to make specialised plots or add more features if the same "grammar" is adopted

11/47

2 Grammar of
Data Manipulation

dplyr 📦

Wickham, François, Henry & Müller (2020) dplyr: A Grammar of Data Manipulation. R-package version 1.0.0.

12/47

What does the code below do?# part 1
tapply(cochran.crd$inf, cochran.crd$trt, mean)

#    F12     F3     F6      O    S12     S3     S6 
#  5.750  9.500 15.500 22.625 14.250 16.750 18.250
df <- cochran.crd
# part 2
df$season <- ifelse(grepl("F", df$trt), 
                    "Fall", 
                    "Spring")
df$season[df$trt=="O"] <- "N/A"
# part 3
df$sulfur <- sapply(df$trt, 
    function(x) switch(as.character(x), 
          F3 = 300, F6 = 600, F12 = 1200,
          S3 = 300, S6 = 600, S12 = 1200,
          O = 0)) # pounds/acre

# part 4, move trt after col
df[, c("inf", "row", "col", "trt", 
       "season", "sulfur")]

#    inf row col trt season sulfur
# 1    9   4   1  F3   Fall    300
# 2   12   4   2   O    N/A      0
# 3   18   4   3  S6 Spring    600
# 4   10   4   4 F12   Fall   1200
# 5   24   4   5  S6 Spring    600
# 6   17   4   6 S12 Spring   1200
# 7   30   4   7  S3 Spring    300
# 8   16   4   8  F6   Fall    600
# 9   10   3   1   O    N/A      0
# 10   7   3   2  S3 Spring    300
# 11   4   3   3 F12   Fall   1200
# 12  10   3   4  F6   Fall    600
# 13  21   3   5  S3 Spring    300
# 14  24   3   6   O    N/A      0
# 15  29   3   7   O    N/A      0
# 16  12   3   8  S6 Spring    600
# 17   9   2   1  F3   Fall    300
# 18   7   2   2 S12 Spring   1200
# 19  18   2   3  F6   Fall    600
# 20  30   2   4   O    N/A      0
# 21  18   2   5  F6   Fall    600
# 22  16   2   6 S12 Spring   1200
# 23  16   2   7  F3   Fall    300
# 24   4   2   8 F12   Fall   1200
# 25   9   1   1  S3 Spring    300
# 26  18   1   2   O    N/A      0
# 27  17   1   3 S12 Spring   1200
# 28  19   1   4  S6 Spring    600
# 29  32   1   5   O    N/A      0
# 30   5   1   6 F12   Fall   1200
# 31  26   1   7   O    N/A      0
# 32   4   1   8  F3   Fall    300

01:00
13/47

Which is easier to read for you? Part 1# part 1
tapply(cochran.crd$inf, cochran.crd$trt, mean)

#    F12     F3     F6      O    S12     S3     S6 
#  5.750  9.500 15.500 22.625 14.250 16.750 18.250
# part 1
cochran.crd %>% 
  group_by(trt) %>% 
  summarise(inf_mean = mean(inf))

# # A tibble: 7 x 2
#   trt   inf_mean
#   <fct>    <dbl>
# 1 F12       5.75
# 2 F3        9.5 
# 3 F6       15.5 
# 4 O        22.6 
# 5 S12      14.2 
# 6 S3       16.8 
# 7 S6       18.2
14/47

Which is easier to read for you? Part 1# part 1
tapply(cochran.crd$inf, cochran.crd$trt, mean)

#    F12     F3     F6      O    S12     S3     S6 
#  5.750  9.500 15.500 22.625 14.250 16.750 18.250
Can you guess what tapply does if you don't know about it?

# part 1
cochran.crd %>% 
  group_by(trt) %>% 
  summarise(inf_mean = mean(inf))

# # A tibble: 7 x 2
#   trt   inf_mean
#   <fct>    <dbl>
# 1 F12       5.75
# 2 F3        9.5 
# 3 F6       15.5 
# 4 O        22.6 
# 5 S12      14.2 
# 6 S3       16.8 
# 7 S6       18.2
14/47

Which is easier to read for you? Part 1# part 1
tapply(cochran.crd$inf, cochran.crd$trt, mean)

#    F12     F3     F6      O    S12     S3     S6 
#  5.750  9.500 15.500 22.625 14.250 16.750 18.250
Can you guess what tapply does if you don't know about it?
 Functions are less accessible to broader audiences if function names and argument names are designed so:specialist knowledge is required,
approach is ad-hoc, and/or
unintuitively designed for either input and/or output


# part 1
cochran.crd %>% 
  group_by(trt) %>% 
  summarise(inf_mean = mean(inf))

# # A tibble: 7 x 2
#   trt   inf_mean
#   <fct>    <dbl>
# 1 F12       5.75
# 2 F3        9.5 
# 3 F6       15.5 
# 4 O        22.6 
# 5 S12      14.2 
# 6 S3       16.8 
# 7 S6       18.2
14/47

Which is easier to read for you? Part 2df <- cochran.crd
# part 2
df$season <- ifelse(grepl("F", df$trt), 
                    "Fall", 
                    "Spring")
df$season[df$trt=="O"] <- "N/A"
# part 3
df$sulfur <- sapply(df$trt, 
    function(x) switch(as.character(x), 
          F3 = 300, F6 = 600, F12 = 1200,
          S3 = 300, S6 = 600, S12 = 1200,
          O = 0)) # pounds/acre
# part 4, move trt after col
df[, c("inf", "row", "col", "trt", 
       "season", "sulfur")]

#    inf row col trt season sulfur
# 1    9   4   1  F3   Fall    300
# 2   12   4   2   O    N/A      0
# 3   18   4   3  S6 Spring    600
# 4   10   4   4 F12   Fall   1200
# 5   24   4   5  S6 Spring    600
# 6   17   4   6 S12 Spring   1200
# 7   30   4   7  S3 Spring    300
# 8   16   4   8  F6   Fall    600
# 9   10   3   1   O    N/A      0
# 10   7   3   2  S3 Spring    300
# 11   4   3   3 F12   Fall   1200
# 12  10   3   4  F6   Fall    600
# 13  21   3   5  S3 Spring    300
# 14  24   3   6   O    N/A      0
# 15  29   3   7   O    N/A      0
# 16  12   3   8  S6 Spring    600
# 17   9   2   1  F3   Fall    300
# 18   7   2   2 S12 Spring   1200
# 19  18   2   3  F6   Fall    600
# 20  30   2   4   O    N/A      0
# 21  18   2   5  F6   Fall    600
# 22  16   2   6 S12 Spring   1200
# 23  16   2   7  F3   Fall    300
# 24   4   2   8 F12   Fall   1200
# 25   9   1   1  S3 Spring    300
# 26  18   1   2   O    N/A      0
# 27  17   1   3 S12 Spring   1200
# 28  19   1   4  S6 Spring    600
# 29  32   1   5   O    N/A      0
# 30   5   1   6 F12   Fall   1200
# 31  26   1   7   O    N/A      0
# 32   4   1   8  F3   Fall    300
cochran.crd %>% 
  # part 2 & 3
  mutate(season = case_when(
                    str_detect(trt, "F") ~ "Fall",
                    str_detect(trt, "S") ~ "Spring",
                                    TRUE ~ "N/A"),
         sulfur = case_when(
                    str_detect(trt, "12") ~ 1200,
                    str_detect(trt, "6") ~ 600,
                    str_detect(trt, "3") ~ 300,
                                     TRUE ~ 0)) %>% 
  # part 4
  relocate(trt, .after = col) %>% 
  as_tibble() # optional

# # A tibble: 32 x 6
#      inf   row   col trt   season sulfur
#    <int> <int> <int> <fct> <chr>   <dbl>
#  1     9     4     1 F3    Fall      300
#  2    12     4     2 O     N/A         0
#  3    18     4     3 S6    Spring    600
#  4    10     4     4 F12   Fall     1200
#  5    24     4     5 S6    Spring    600
#  6    17     4     6 S12   Spring   1200
#  7    30     4     7 S3    Spring    300
#  8    16     4     8 F6    Fall      600
#  9    10     3     1 O     N/A         0
# 10     7     3     2 S3    Spring    300
# # … with 22 more rows
15/47

Which is easier to read for you? Part 3

🎯 find the mean of inf by the combination of variables season and sulfur
How would you modify the code below?

tapply(df$inf, df$trt, mean)

##    F12     F3     F6      O    S12     S3     S6 
##  5.750  9.500 15.500 22.625 14.250 16.750 18.250

df %>% 
  group_by(trt) %>% 
  summarise(inf_mean = mean(inf))

## # A tibble: 7 x 2
##   trt   inf_mean
##   <fct>    <dbl>
## 1 F12       5.75
## 2 F3        9.5 
## 3 F6       15.5 
## 4 O        22.6 
## 5 S12      14.2 
## 6 S3       16.8 
## 7 S6       18.2

00:30

16/47

Which is easier to read for you? Part 3

🎯 find the mean of inf by the combination of variables season and sulfur
How would you modify the code below?

tapply(df$inf, list(df$season, df$sulfur), mean)

##             0   300   600  1200
## Fall       NA  9.50 15.50  5.75
## N/A    22.625    NA    NA    NA
## Spring     NA 16.75 18.25 14.25

Notice that the above output is matrix.

df %>% 
  group_by(season, sulfur) %>% 
  summarise(inf_mean = mean(inf))

## # A tibble: 7 x 3
## # Groups:   season [3]
##   season sulfur inf_mean
##   <chr>   <dbl>    <dbl>
## 1 Fall      300     9.5 
## 2 Fall      600    15.5 
## 3 Fall     1200     5.75
## 4 N/A         0    22.6 
## 5 Spring    300    16.8 
## 6 Spring    600    18.2 
## 7 Spring   1200    14.2

16/47

Which is easier to read for you? Part 3

🎯 find the mean of inf by the combination of variables season and sulfur
How would you modify the code below?

tapply(df$inf, list(df$season, df$sulfur), mean)

##             0   300   600  1200
## Fall       NA  9.50 15.50  5.75
## N/A    22.625    NA    NA    NA
## Spring     NA 16.75 18.25 14.25

Notice that the above output is matrix.

This works also:

tapply(df$inf, paste0(df$season, df$sulfur), mean)

##   Fall1200    Fall300    Fall600       N/A0 Spring1200 
##      5.750      9.500     15.500     22.625     14.250 
##  Spring300  Spring600 
##     16.750     18.250

df %>% 
  group_by(season, sulfur) %>% 
  summarise(inf_mean = mean(inf))

## # A tibble: 7 x 3
## # Groups:   season [3]
##   season sulfur inf_mean
##   <chr>   <dbl>    <dbl>
## 1 Fall      300     9.5 
## 2 Fall      600    15.5 
## 3 Fall     1200     5.75
## 4 N/A         0    22.6 
## 5 Spring    300    16.8 
## 6 Spring    600    18.2 
## 7 Spring   1200    14.2

16/47

Reading code like English grammardf <- cochran.crd %>% 
  mutate(season = case_when(
                    str_detect(trt, "F") ~ "Fall",
                    str_detect(trt, "S") ~ "Spring",
                                    TRUE ~ "N/A"),
         sulfur = case_when(
                    str_detect(trt, "12") ~ 1200,
                    str_detect(trt, "6") ~ 600,
                    str_detect(trt, "3") ~ 300,
                                     TRUE ~ 0)) %>% 
  relocate(trt, .after = col) %>% 
  as_tibble() # optional
df %>% 
  group_by(season, sulfur) %>% 
  summarise(inf_mean = mean(inf))

Read %>% as "and then"
The code may read more familiar to an English user without much specialist knowledge
Of course this is biased in favour of people who know English
Bonus: no intermediate naming to think or worry about

17/47

Lessons from grammar of data manipulation
in `dplyr`

Leveraging existing knowledge of an everyday user or an "average" statistician for the choice of function names and argument names can make it more accessible

18/47

Lessons from grammar of data manipulation
in `dplyr`

Leveraging existing knowledge of an everyday user or an "average" statistician for the choice of function names and argument names can make it more accessible
Making it more accessible, by making code more expressive or otherwise, can promote statistical literacy

18/47

Lessons from grammar of data manipulation
in `dplyr`

Leveraging existing knowledge of an everyday user or an "average" statistician for the choice of function names and argument names can make it more accessible
Making it more accessible, by making code more expressive or otherwise, can promote statistical literacy
Consistency in input and output makes it easier for user expectation and helps to offload mental load

18/47

Can all this work be considered as statistics?19/47

Can all this work be considered as statistics?

Or data science?

19/47

Can all this work be considered as statistics?

Or data science?

Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from many structural and unstructured data. Data science is related to data mining, machine learning and big data. Wikipedia, accessed on 27th Nov 2020

19/47

Can all this work be considered as statistics?

Or data science?

Isn't this what statistics is about?

19/47

There are two types of people...

20/47

There are two types of people...

But actually, it's hugely unbalanced

21/47

Statistics by nature is inter-disciplinary and about extracting knowledge and insight from data

at least from my point of view

22/47

Statistics by nature is inter-disciplinary and about extracting knowledge and insight from data

at least from my point of view

but communication largely became exclusive to selective set of statistically well-trained people governed by strict adherence to defintions or protocols in lieu of practicality

which is not to say it is bad to do this

22/47

Far more people need to use statistics than just statistically well-trained people

Wickham (2015) Teaching Safe-Stats, Not Statistical Abstinence. The American Statistician, Online Discussion

23/47

Far more people need to use statistics than just statistically well-trained people

Statistics community need to go beyond talking among like-minded peers

Wickham (2015) Teaching Safe-Stats, Not Statistical Abstinence. The American Statistician, Online Discussion

23/47

3 Grammar of
Experimental Design

edibble 📦

(Work-In-Progress)

24/47

Typical course in experimental design
(at least at University of Sydney in 2017-2019)

Teach:

Completely Randomised Design
Randomised Complete Block Design
Latin Square Design
Balanced Incomplete Block Design
Factorial Design
2^k ~~Factorial Design~~ (I removed this from 2018, I won't talk about this today)
Split-plot Design (I added this from 2018 among other concepts)

25/47

Completely Randomised Design (CRD)

$t$ treatments randomised to $n$ units

$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{treatment} + \texttt{error}$

(with constraints and distributional assumptions)

26/47

Randomised Complete Block Design (RCBD)

$b$ blocks of size $t$
$t$ treatments randomised to $t$ units within each block

$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{treatment} + \texttt{block} + \texttt{error}$

27/47

Latin Square Design (LSD)

two orthogonal blocks of size $t$
$t$ treatments randomised to units such that every treatment appears exactly once in each block

$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{treatment} + \texttt{row} + \texttt{column} + \texttt{error}$

28/47

Balanced Incomplete Block Design (BIBD)

$b$ blocks of size $k < t$
$t$ treatments randomised to units within each block such that every pair of treatment appears the same number of times across blocks

$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{block} + \texttt{treatment} + \texttt{error}$

29/47

Factorial Design

$ab = t$ treatments randomised to $n$ units
treatment is every combination of two factors A and B

$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{A} + \texttt{B} + \texttt{A:B} + \texttt{error}$

30/47

Split-plot Design

$n_1$ whole plots consisting of $b$ sub plots
in total there are $n$ sub plots
treatment factor A is randomised to whole plots
treatment factor B is randomised to sub plots within each whole plot

$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{A} + \texttt{WP} + \texttt{B} + \texttt{A:B} + \texttt{error}$

31/47

CRAN Task View of Design of Experiments

contains

📦 229 R-packages

as of 2020-07-14

(Please note that there may be some webscrapping error)

32/47

Top 10 downloaded R-packages in 2018

`agricolae` is the most downloaded

(data from cranlogs from 2018-01-01 to 2018-12-31)

33/47

Top 10 downloaded R-packages in 2019

Okay fine, results were different in 2019

(data from cranlogs from 2019-01-01 to 2019-12-31)

Please note agricolae imports AlgDesign

34/47

`agricolae::design.crd`

for $t = 3$ treatments with $2$ replicates each


trt <- c("A", "B", "C")
agricolae::design.crd(trt = trt, r = 2) %>% glimpse()

## List of 2
##  $ parameters:List of 7
##   ..$ design: chr "crd"
##   ..$ trt   : chr [1:3] "A" "B" "C"
##   ..$ r     : num [1:3] 2 2 2
##   ..$ serie : num 2
##   ..$ seed  : int 396269021
##   ..$ kinds : chr "Super-Duper"
##   ..$       : logi TRUE
##  $ book      :'data.frame':    6 obs. of  3 variables:
##   ..$ plots: num [1:6] 101 102 103 104 105 106
##   ..$ r    : int [1:6] 1 1 2 1 2 2
##   ..$ trt  : chr [1:6] "C" "A" "C" "B" ...

35/47

`agricolae::design.rcbd`

for $t =3$ treatments with $2$ blocks


trt <- c("A", "B", "C")
agricolae::design.rcbd(trt = trt, r = 2) %>% glimpse()

## List of 3
##  $ parameters:List of 7
##   ..$ design: chr "rcbd"
##   ..$ trt   : chr [1:3] "A" "B" "C"
##   ..$ r     : num 2
##   ..$ serie : num 2
##   ..$ seed  : int 973575053
##   ..$ kinds : chr "Super-Duper"
##   ..$       : logi TRUE
##  $ sketch    : chr [1:2, 1:3] "C" "A" "B" "C" ...
##  $ book      :'data.frame':    6 obs. of  3 variables:
##   ..$ plots: num [1:6] 101 102 103 201 202 203
##   ..$ block: Factor w/ 2 levels "1","2": 1 1 1 2 2 2
##   ..$ trt  : Factor w/ 3 levels "A","B","C": 3 2 1 1 3 2

36/47

`agricolae::design.lsd()`

for $t = 3$ treatments


trt <- c("A", "B", "C")
agricolae::design.lsd(trt = trt) %>% glimpse()

## List of 3
##  $ parameters:List of 7
##   ..$ design: chr "lsd"
##   ..$ trt   : chr [1:3] "A" "B" "C"
##   ..$ r     : int 3
##   ..$ serie : num 2
##   ..$ seed  : int -1984440067
##   ..$ kinds : chr "Super-Duper"
##   ..$       : logi TRUE
##  $ sketch    : chr [1:3, 1:3] "B" "C" "A" "A" ...
##  $ book      :'data.frame':    9 obs. of  4 variables:
##   ..$ plots: num [1:9] 101 102 103 201 202 203 301 302 303
##   ..$ row  : Factor w/ 3 levels "1","2","3": 1 1 1 2 2 2 3 3 3
##   ..$ col  : Factor w/ 3 levels "1","2","3": 1 2 3 1 2 3 1 2 3
##   ..$ trt  : Factor w/ 3 levels "A","B","C": 2 1 3 3 2 1 1 3 2

37/47

`agricolae::design.bib()`

for $t = 3$ treatments with block size of $2$


trt <- c("A", "B", "C")
agricolae::design.bib(trt = trt, k = 2) %>% glimpse()

## [1] "No improvement over initial random design."
## 
## Parameters BIB
## ==============
## Lambda     : 1
## treatmeans : 3
## Block size : 2
## Blocks     : 3
## Replication: 2 
## 
## Efficiency factor 0.75 
## 
## <<< Book >>>

## List of 4
##  $ parameters:List of 6
##   ..$ design: chr "bib"
##   ..$ trt   : chr [1:3] "A" "B" "C"
##   ..$ k     : num 2
##   ..$ serie : num 2
##   ..$ seed  : int -509134623
##   ..$ kinds : chr "Super-Duper"
##  $ statistics:'data.frame':    1 obs. of  6 variables:
##   ..$ lambda    : num 1
##   ..$ treatmeans: int 3
##   ..$ blockSize : num 2
##   ..$ blocks    : int 3
##   ..$ r         : num 2
##   ..$ Efficiency: num 0.75
##  $ sketch    : chr [1:3, 1:2] "C" "B" "B" "A" ...
##  $ book      :'data.frame':    6 obs. of  3 variables:
##   ..$ plots: num [1:6] 101 102 201 202 301 302
##   ..$ block: Factor w/ 3 levels "1","2","3": 1 1 2 2 3 3
##   ..$ trt  : Factor w/ 3 levels "A","B","C": 3 1 2 1 2 3

38/47

`agricolae::design.ab()`

for $t = 3 \times 2$ treatments with $2$ replication for each treatment


agricolae::design.ab(trt = c(3, 2), r = 2, design = "crd") %>% glimpse()

## List of 2
##  $ parameters:List of 8
##   ..$ design : chr "factorial"
##   ..$ trt    : chr [1:6] "1 1" "1 2" "2 1" "2 2" ...
##   ..$ r      : num [1:6] 2 2 2 2 2 2
##   ..$ serie  : num 2
##   ..$ seed   : int 801301585
##   ..$ kinds  : chr "Super-Duper"
##   ..$        : logi TRUE
##   ..$ applied: chr "crd"
##  $ book      :'data.frame':    12 obs. of  4 variables:
##   ..$ plots: num [1:12] 101 102 103 104 105 106 107 108 109 110 ...
##   ..$ r    : int [1:12] 1 1 1 2 1 1 1 2 2 2 ...
##   ..$ A    : chr [1:12] "2" "3" "2" "2" ...
##   ..$ B    : chr [1:12] "2" "2" "1" "2" ...

Note not A/B testing!

39/47

`agricolae::design.split()`

for $t = 2 \times 4$ treatments with $2$ replication for each treatment


trt1 <- c("I", "R"); trt2 <- LETTERS[1:4]
agricolae::design.split(trt1 = trt1, trt2 = trt2, r = 2, design = "crd") %>% 
    glimpse()

## List of 2
##  $ parameters:List of 8
##   ..$ design : chr "split"
##   ..$        : logi TRUE
##   ..$ trt1   : chr [1:2] "I" "R"
##   ..$ applied: chr "crd"
##   ..$ r      : num [1:2] 2 2
##   ..$ serie  : num 2
##   ..$ seed   : int -765020087
##   ..$ kinds  : chr "Super-Duper"
##  $ book      :'data.frame':    16 obs. of  5 variables:
##   ..$ plots : num [1:16] 101 101 101 101 102 102 102 102 103 103 ...
##   ..$ splots: Factor w/ 4 levels "1","2","3","4": 1 2 3 4 1 2 3 4 1 2 ...
##   ..$ r     : int [1:16] 1 1 1 1 1 1 1 1 2 2 ...
##   ..$ trt1  : chr [1:16] "I" "I" "I" "I" ...
##   ..$ trt2  : chr [1:16] "D" "A" "C" "B" ...

40/47

Grammar of experimental design

Remember Base R plots? base plot

41/47

Grammar of experimental design

Remember Base R plots? base plot
🤔 "named experimental design" functions (agricolae::design.crd, etc.) are like "named statistical graphic" functions (pie, barplot)

41/47

Grammar of experimental design

Remember Base R plots? base plot
🤔 "named experimental design" functions (agricolae::design.crd, etc.) are like "named statistical graphic" functions (pie, barplot)
💡 construction of experimental design needs to be made more accessible, modifiable, extensible and generalisable

41/47

Grammar of experimental design

Remember Base R plots? base plot
🤔 "named experimental design" functions (agricolae::design.crd, etc.) are like "named statistical graphic" functions (pie, barplot)
💡 construction of experimental design needs to be made more accessible, modifiable, extensible and generalisable
This is the concept for the development of the edibble R-package
https://github.com/emitanaka/edibble
( WIP, sorry code not ready yet, please enjoy prototype and live demo instead)

41/47

Grammar of experimental design

Remember Base R plots? base plot
🤔 "named experimental design" functions (agricolae::design.crd, etc.) are like "named statistical graphic" functions (pie, barplot)
💡 construction of experimental design needs to be made more accessible, modifiable, extensible and generalisable
This is the concept for the development of the edibble R-package
https://github.com/emitanaka/edibble
( WIP, sorry code not ready yet, please enjoy prototype and live demo instead)
tibble R-package is a modern reimagining of the data.frame
edibble (WIP) creates experimental design tibbles

Müller & Wickham (2020) tibble: Simple Data Frames. R package version 3.0.3

41/47

Prototype for grammar of experimental designConsider a field experiment with 32 plots

library(edibble)
initiate_design(seed = 2020) %>% 
  set_units(plot = 32)

## An edibble design
## └─plot (32 levels)
42/47

Prototype Grammar of Experimental DesignConsider a field experiment with 32 plots
There are 7 different treatments

library(edibble)
initiate_design(seed = 2020) %>% 
  set_units(plot = 32) %>% 
  set_trts(trt = c("F12", "F3", "F6", "O",
                   "S12", "S3", "S6"))

## An edibble design
## ├─plot (32 levels)
## └─trt (7 levels)
42/47

Prototype Grammar of Experimental Design 1Consider a field experiment with 32 plots
There are 7 different treatments
Apply treatments to plots

library(edibble)
initiate_design(seed = 2020) %>% 
  set_units(plot = 32) %>% 
  set_trts(trt = c("F12", "F3", "F6", "O", 
                   "S12", "S3", "S6")) %>% 
  apply_trts(~ plot)

## An edibble design
## ├─plot (32 levels)
## └─trt (7 levels)
42/47

Prototype Grammar of Experimental Design 1Consider a field experiment with 32 plots
There are 7 different treatments
Apply treatments to plots
Randomise treatments to plots

library(edibble)
initiate_design(seed = 2020) %>% 
  set_units(plot = 32) %>% 
  set_trts(trt = c("F12", "F3", "F6", "O", 
                   "S12", "S3", "S6")) %>% 
  apply_trts(~ plot) %>% 
  randomise_trts()

## An edibble design
## ├─plot (32 levels)
## └─trt (7 levels)
42/47

Prototype Grammar of Experimental Design 1Consider a field experiment with 32 plots
There are 7 different treatments
Apply treatments to plots
Randomise treatments to plots
The result is a completely randomised design

library(edibble)
initiate_design(seed = 2020) %>% 
  set_units(plot = 32) %>% 
  set_trts(trt = c("F12", "F3", "F6", "O", 
                   "S12", "S3", "S6")) %>% 
  apply_trts(~ plot) %>% 
  randomise_trts() %>% 
  serve_table()

## # An edibble: 32 x 2
##    plot   trt  
##    <fct>  <fct>
##  1 plot1  F12  
##  2 plot2  S3   
##  3 plot3  O    
##  4 plot4  S6   
##  5 plot5  F3   
##  6 plot6  S12  
##  7 plot7  F6   
##  8 plot8  S6   
##  9 plot9  F3   
## 10 plot10 S3   
## # … with 22 more rows
42/47

Prototype Grammar of Experimental Design 2

Consider a field experiment with 2 blocks each with 16 plots
There are 7 different treatments
Apply treatments to plots
Randomise treatments to plots within blocks
Resulting design is a randomised block design

Can you see how it differs from the previous design?

library(edibble)
initiate_design(seed = 2020) %>% 
  set_units(block = c("B1", "B2"),
            plot = nested_in(block, 16)) %>%
  set_trts(trt = c("F12", "F3", "F6", "O", 
                   "S12", "S3", "S6")) %>% 
  apply_trts(~ plot) %>% 
  randomise_trts() %>% 
  serve_table()

## # An edibble: 32 x 3
##    block plot   trt  
##    <fct> <fct>  <fct>
##  1 B1    plot1  F12  
##  2 B1    plot2  S3   
##  3 B1    plot3  S12  
##  4 B1    plot4  S6   
##  5 B1    plot5  F3   
##  6 B1    plot6  S3   
##  7 B1    plot7  F6   
##  8 B1    plot8  S6   
##  9 B1    plot9  F3   
## 10 B1    plot10 S6   
## # … with 22 more rows

43/47

LIVE DEMO
44/47

LimitationsGeneralising experimental designs is extremely hard
A lot of domains have their own specialised design

45/47

LimitationsGeneralising experimental designs is extremely hard
A lot of domains have their own specialised design
Some specialist designs that have certain properties cannot be easily replicated by edibble
An ad-hoc approach may still be required to create design with specific properties

45/47

LimitationsGeneralising experimental designs is extremely hard
A lot of domains have their own specialised design
Some specialist designs that have certain properties cannot be easily replicated by edibble
An ad-hoc approach may still be required to create design with specific properties
but the hope is that this can be achieved with replacement in one of the steps (e.g. replacing randomise_trts)

45/47

LimitationsGeneralising experimental designs is extremely hard
A lot of domains have their own specialised design
Some specialist designs that have certain properties cannot be easily replicated by edibble
An ad-hoc approach may still be required to create design with specific properties
but the hope is that this can be achieved with replacement in one of the steps (e.g. replacing randomise_trts)
So why would anyone buy into this framework?

45/47

LimitationsGeneralising experimental designs is extremely hard
A lot of domains have their own specialised design
Some specialist designs that have certain properties cannot be easily replicated by edibble
An ad-hoc approach may still be required to create design with specific properties
but the hope is that this can be achieved with replacement in one of the steps (e.g. replacing randomise_trts)
So why would anyone buy into this framework?
Hopefully, developers and users can see the benefit in being more expressive in code; and also a pay-off like easy visualisation of designs becomes worthwhile

45/47

Timeline

By end of 2020, I intend to have edibble be able to construct textbook (mostly orthogonal) designs.
From 2021, I plan to hold (free) workshops in experimental design with edibble to generate certain designs.
The initial development for edibble is focussed on designs with "fixed" structures.
For survey and sampling designs you should check out DeclareDesign.

Feedback is welcomed!

emitanaka/edibble/issues
emi.tanaka@monash.edu

Slides can be found at emitanaka.org/slides/SSA2020/

46/47

Acknowledgements

This slide was made using xaringan R-package and the following systems.

sessioninfo::session_info()

## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 4.0.1 (2020-06-06)
##  os       macOS Catalina 10.15.7      
##  system   x86_64, darwin17.0          
##  ui       X11                         
##  language (EN)                        
##  collate  en_AU.UTF-8                 
##  ctype    en_AU.UTF-8                 
##  tz       Australia/Melbourne         
##  date     2020-11-27                  
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package     * version    date       lib source                              
##  agricolae     1.3-3      2020-06-07 [1] CRAN (R 4.0.1)                      
##  AlgDesign     1.2.0      2019-11-29 [1] CRAN (R 4.0.1)                      
##  anicon        0.1.0      2020-06-21 [1] Github (emitanaka/anicon@0b756df)   
##  assertthat    0.2.1      2019-03-21 [2] CRAN (R 4.0.0)                      
##  backports     1.2.0      2020-11-02 [1] CRAN (R 4.0.2)                      
##  broom         0.7.2      2020-10-20 [1] CRAN (R 4.0.2)                      
##  callr         3.5.1      2020-10-13 [1] CRAN (R 4.0.2)                      
##  cellranger    1.1.0      2016-07-27 [2] CRAN (R 4.0.0)                      
##  cli           2.2.0      2020-11-20 [1] CRAN (R 4.0.1)                      
##  cluster       2.1.0      2019-06-19 [2] CRAN (R 4.0.1)                      
##  colorspace    2.0-0      2020-11-11 [1] CRAN (R 4.0.2)                      
##  combinat      0.0-8      2012-10-29 [1] CRAN (R 4.0.1)                      
##  countdown     0.3.5      2020-07-20 [1] Github (gadenbuie/countdown@a544fa4)
##  crayon        1.3.4      2017-09-16 [2] CRAN (R 4.0.0)                      
##  DBI           1.1.0      2019-12-15 [1] CRAN (R 4.0.2)                      
##  dbplyr        2.0.0      2020-11-03 [1] CRAN (R 4.0.2)                      
##  digest        0.6.27     2020-10-24 [1] CRAN (R 4.0.2)                      
##  dplyr       * 1.0.2      2020-08-18 [1] CRAN (R 4.0.2)                      
##  edibble     * 0.0.0.9000 2020-11-15 [1] local                               
##  ellipsis      0.3.1      2020-05-15 [2] CRAN (R 4.0.0)                      
##  emo           0.0.0.9000 2020-06-26 [1] Github (hadley/emo@3f03b11)         
##  evaluate      0.14       2019-05-28 [2] CRAN (R 4.0.0)                      
##  fansi         0.4.1      2020-01-08 [2] CRAN (R 4.0.0)                      
##  farver        2.0.3.9000 2020-07-24 [1] Github (thomasp85/farver@f1bcb56)   
##  fastmap       1.0.1      2019-10-08 [2] CRAN (R 4.0.0)                      
##  forcats     * 0.5.0      2020-03-01 [2] CRAN (R 4.0.0)                      
##  fs            1.5.0      2020-07-31 [1] CRAN (R 4.0.2)                      
##  generics      0.1.0      2020-10-31 [2] CRAN (R 4.0.2)                      
##  ggplot2     * 3.3.2      2020-06-19 [1] CRAN (R 4.0.2)                      
##  glue          1.4.2      2020-08-27 [1] CRAN (R 4.0.2)                      
##  gtable        0.3.0      2019-03-25 [2] CRAN (R 4.0.0)                      
##  haven         2.3.1      2020-06-01 [2] CRAN (R 4.0.0)                      
##  highr         0.8        2019-03-20 [2] CRAN (R 4.0.0)                      
##  hms           0.5.3      2020-01-08 [2] CRAN (R 4.0.0)                      
##  htmltools     0.5.0      2020-06-16 [1] CRAN (R 4.0.2)                      
##  httpuv        1.5.4      2020-06-06 [2] CRAN (R 4.0.1)                      
##  httr          1.4.2      2020-07-20 [1] CRAN (R 4.0.2)                      
##  icon          0.1.0      2020-06-21 [1] Github (emitanaka/icon@8458546)     
##  igraph        1.2.6      2020-10-06 [1] CRAN (R 4.0.2)                      
##  jsonlite      1.7.1      2020-09-07 [1] CRAN (R 4.0.2)                      
##  klaR          0.6-15     2020-02-19 [1] CRAN (R 4.0.1)                      
##  knitr         1.30       2020-09-22 [1] CRAN (R 4.0.2)                      
##  labeling      0.4.2      2020-10-20 [1] CRAN (R 4.0.2)                      
##  labelled      2.7.0      2020-09-21 [1] CRAN (R 4.0.2)                      
##  later         1.1.0.1    2020-06-05 [2] CRAN (R 4.0.1)                      
##  lattice       0.20-41    2020-04-02 [2] CRAN (R 4.0.1)                      
##  lifecycle     0.2.0      2020-03-06 [1] CRAN (R 4.0.0)                      
##  lubridate     1.7.9      2020-06-08 [2] CRAN (R 4.0.1)                      
##  magrittr      2.0.1      2020-11-17 [1] CRAN (R 4.0.2)                      
##  MASS          7.3-53     2020-09-09 [1] CRAN (R 4.0.2)                      
##  mime          0.9        2020-02-04 [2] CRAN (R 4.0.0)                      
##  miniUI        0.1.1.1    2018-05-18 [1] CRAN (R 4.0.0)                      
##  modelr        0.1.8      2020-05-19 [2] CRAN (R 4.0.0)                      
##  munsell       0.5.0      2018-06-12 [2] CRAN (R 4.0.0)                      
##  nlme          3.1-150    2020-10-24 [2] CRAN (R 4.0.2)                      
##  pillar        1.4.7      2020-11-20 [1] CRAN (R 4.0.1)                      
##  pkgconfig     2.0.3      2019-09-22 [2] CRAN (R 4.0.0)                      
##  processx      3.4.4      2020-09-03 [1] CRAN (R 4.0.2)                      
##  promises      1.1.1      2020-06-09 [1] CRAN (R 4.0.0)                      
##  ps            1.4.0      2020-10-07 [1] CRAN (R 4.0.2)                      
##  purrr       * 0.3.4      2020-04-17 [2] CRAN (R 4.0.0)                      
##  questionr     0.7.3      2020-10-05 [1] CRAN (R 4.0.2)                      
##  R6            2.5.0      2020-10-28 [1] CRAN (R 4.0.2)                      
##  Rcpp          1.0.5      2020-07-06 [1] CRAN (R 4.0.0)                      
##  readr       * 1.4.0      2020-10-05 [2] CRAN (R 4.0.2)                      
##  readxl        1.3.1      2019-03-13 [2] CRAN (R 4.0.0)                      
##  reprex        0.3.0.9001 2020-08-08 [1] Github (tidyverse/reprex@9594ee9)   
##  rlang         0.4.8      2020-10-08 [1] CRAN (R 4.0.2)                      
##  rmarkdown     2.5        2020-10-21 [1] CRAN (R 4.0.1)                      
##  rstudioapi    0.13       2020-11-12 [1] CRAN (R 4.0.1)                      
##  rvest         0.3.6      2020-07-25 [1] CRAN (R 4.0.2)                      
##  scales        1.1.1      2020-05-11 [2] CRAN (R 4.0.0)                      
##  sessioninfo   1.1.1      2018-11-05 [2] CRAN (R 4.0.0)                      
##  shiny         1.5.0      2020-06-23 [1] CRAN (R 4.0.2)                      
##  stringi       1.5.3      2020-09-09 [2] CRAN (R 4.0.2)                      
##  stringr     * 1.4.0      2019-02-10 [2] CRAN (R 4.0.0)                      
##  tibble      * 3.0.4.9000 2020-11-26 [1] Github (tidyverse/tibble@9eeef4d)   
##  tidyr       * 1.1.2      2020-08-27 [1] CRAN (R 4.0.2)                      
##  tidyselect    1.1.0      2020-05-11 [2] CRAN (R 4.0.0)                      
##  tidyverse   * 1.3.0      2019-11-21 [1] CRAN (R 4.0.2)                      
##  utf8          1.1.4      2018-05-24 [2] CRAN (R 4.0.0)                      
##  vctrs         0.3.5.9000 2020-11-26 [1] Github (r-lib/vctrs@957baf7)        
##  viridisLite   0.3.0      2018-02-01 [2] CRAN (R 4.0.0)                      
##  whisker       0.4        2019-08-28 [2] CRAN (R 4.0.0)                      
##  withr         2.3.0      2020-09-22 [1] CRAN (R 4.0.2)                      
##  xaringan      0.18       2020-10-21 [1] CRAN (R 4.0.2)                      
##  xfun          0.19       2020-10-30 [1] CRAN (R 4.0.2)                      
##  xml2          1.3.2      2020-04-23 [2] CRAN (R 4.0.0)                      
##  xtable        1.8-4      2019-04-21 [2] CRAN (R 4.0.0)                      
##  yaml          2.2.1      2020-02-01 [1] CRAN (R 4.0.2)                      
## 
## [1] /Users/etan0038/Library/R/4.0/library
## [2] /Library/Frameworks/R.framework/Versions/4.0/Resources/library

(Scroll on html slide to see all)

47/47

Advent of "Grammar"

Bridging Statistics and Data Science for the Design of Experiments

Emi Tanaka

Department of Econometrics and Business Statistics

emi.tanaka@monash.edu @statsgen

27th Nov 2020 @ Statistical Society of Australia

1/47

Paused

Help

Keyboard shortcuts

↑, ←, Pg Up, k	Go to previous slide
↓, →, Pg Dn, Space, j	Go to next slide
Home	Go to first slide
End	Go to last slide
Number + Return	Go to specific slide
b / m / f	Toggle blackout / mirrored / fullscreen mode
c	Clone slideshow
p	Toggle presenter mode
t	Restart the presentation timer
?, h	Toggle this help

Esc	Back to slideshow

Advent of "Grammar"

Bridging Statistics and Data Science for the Design of Experiments

1

Grammar of Graphics

A typical Australian academic is expected to do 40% teaching, 40% research and 20% administration

What are the differences between these plots?

What are the differences between these plots?

Using Base R

Using Base R

Using ggplot2

Using ggplot2

Using ggplot2

Data Plot

Do you like potatoes?

Do you like potatoes?

Let's have a look at a potato field experiment to understand the infection rate when it's treated with sulfur at different season and amount?

How do we get these graphs?

How do we get these graphs?

Lessons from the grammar of graphics in ggplot2

Lessons from the grammar of graphics in ggplot2

Lessons from the grammar of graphics in ggplot2

2

Grammar of Data Manipulation

What does the code below do?

Which is easier to read for you? Part 1

Which is easier to read for you? Part 1

Which is easier to read for you? Part 1

Which is easier to read for you? Part 2

Which is easier to read for you? Part 3

Which is easier to read for you? Part 3

Which is easier to read for you? Part 3

Reading code like English grammar

Lessons from grammar of data manipulation in dplyr

Lessons from grammar of data manipulation in dplyr

Lessons from grammar of data manipulation in dplyr

Can all this work be considered as statistics?

Can all this work be considered as statistics?

Or data science?

Can all this work be considered as statistics?

Or data science?

Can all this work be considered as statistics?

Or data science?

Isn't this what statistics is about?

There are two types of people...

There are two types of people...

But actually, it's hugely unbalanced

Statistics by nature is inter-disciplinary and about extracting knowledge and insight from data

Statistics by nature is inter-disciplinary and about extracting knowledge and insight from data

but communication largely became exclusive to selective set of statistically well-trained people governed by strict adherence to defintions or protocols in lieu of practicality

Far more people need to use statistics than just statistically well-trained people

Far more people need to use statistics than just statistically well-trained people

Statistics community need to go beyond talking among like-minded peers

3

Grammar of Experimental Design

Typical course in experimental design (at least at University of Sydney in 2017-2019)

Completely Randomised Design (CRD)

Randomised Complete Block Design (RCBD)

Latin Square Design (LSD)

Balanced Incomplete Block Design (BIBD)

Factorial Design

Split-plot Design

CRAN Task View of Design of Experiments

📦 229 R-packages

Top 10 downloaded R-packages in 2018

agricolae is the most downloaded

Top 10 downloaded R-packages in 2019

Okay fine, results were different in 2019

agricolae::design.crd

agricolae::design.rcbd

agricolae::design.lsd()

agricolae::design.bib()

agricolae::design.ab()

agricolae::design.split()

Grammar of experimental design

Grammar of experimental design

Grammar of experimental design

Grammar of experimental design

Grammar of experimental design

Prototype for grammar of experimental design

Prototype Grammar of Experimental Design

Using `ggplot2`

Using `ggplot2`

Using `ggplot2`

Lessons from the grammar of graphics
in `ggplot2`

Lessons from the grammar of graphics
in `ggplot2`

Lessons from the grammar of graphics
in `ggplot2`

Grammar of
Data Manipulation

Lessons from grammar of data manipulation
in `dplyr`

Lessons from grammar of data manipulation
in `dplyr`

Lessons from grammar of data manipulation
in `dplyr`

Grammar of
Experimental Design

Typical course in experimental design
(at least at University of Sydney in 2017-2019)

`agricolae` is the most downloaded

`agricolae::design.crd`

`agricolae::design.rcbd`

`agricolae::design.lsd()`

`agricolae::design.bib()`

`agricolae::design.ab()`

`agricolae::design.split()`