Prototyping the Grammar of Experimental Design

background-color: #98ebee
class: middle center

<div class="shade_black" style="width:calc(45%);right:0;bottom:0;padding-left:5px;border: dashed 4px white;margin: auto;">
 These slides are best viewed in Chrome and occasionally need to be refreshed if elements did not load properly. See here for the <a href="slides-edibble.pdf" style="color:black!important">PDF</a>.
</div>

---

count: false
background-image: url("images/bg1.png")
background-size: cover
class: hide-slide-number

:::::::::: { .grid grid: 1fr / 7fr 3fr;}

::: {.item .shade_black border-right-style: solid; border-right-color: white;}

# Prototyping the Grammar of Experimental Design

##

Presented by Emi Tanaka

School of Mathematics and Statistics 
 dr.emi.tanaka@gmail.com
 @statsgen

27th Nov 2019 @ NUMBAT | Melbourne, Australia

::: item

::::::::::

---

# Hi!

# Welcome to my experimental talk on

# .blue[Experimental Design]

where I pretend to know what I'm talking about

---

# Tidyverse packages and friends facilitate the data science project workflow

---

# For experimental studies, statistical work starts before data import

---

Good

# Experimental Design

in theory

# gives you more information

# without any extra cost!

---

# Good

Experimental Design

# in theory

gives you more information

without any extra cost!
 
# .pink[What does that mean??]

---

How does

# .blue[*Grammar*]
# of 
# Experimental Design

help?

---

# (Layered) Grammar of Graphics `ggplot2`

---

# Base Plots `graphics`

Single purpose functions to generate "named plots"

&#x2717;  Cannot modify the resulting graphic  
&#x2717;  Not really extensible or generalisable

::: grid

::: { .item border-right: dashed 3px black; }

```r
barplot(as.matrix(df$perc), 
        legend = df$what)
```

:::

::: item

```r
pie(df$perc, labels = df$what)
```

:::

---

Just as there are .font_large["named plots"],  
there are  
# "named experimental designs"

---

# Typical course in experimental design .font_small[(at least in USYD in 2017-2019)]

::: paddings

Teach:

* Completely Randomised Design
* Randomised Complete Block Design
* Latin Square Design
* Balanced Incomplete Block Design
* Factorial Design
* <strike>2<sup>k</sup> Factorial Design</strike> .font_small[(I removed this from 2018, I won't talk about this today)]
* Split-plot Design .font_small[(I added this from 2018 among other concepts)]

:::

---

# Completely Randomised Design (CRD)

::: grid

::: item

:::

::: item

* `$t$` treatments randomised to `$n$` units

`$$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{treatment} + \texttt{error}$$`


:::
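
A minimal base R sketch of the randomisation for a CRD, assuming equal replication `r` (the seed and the labels are arbitrary choices for illustration, not part of any package API):

```r
# Completely randomised design: t treatments, r replicates each, n = t * r units
set.seed(2019)                     # arbitrary seed, only for reproducibility
trt <- c("A", "B", "C")            # t = 3 treatments
r   <- 2                           # replicates per treatment (assumed equal)
n   <- length(trt) * r             # n = 6 units

crd <- data.frame(
  unit = seq_len(n),
  trt  = factor(sample(rep(trt, each = r)))  # shuffle the labels over all units
)
crd
table(crd$trt)                     # each treatment appears r times
```

The only restriction on the randomisation is the replication of each treatment; no grouping of units is involved.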

---

# Randomised Complete Block Design (RCBD)

::: grid

::: item

:::

::: item

* `$b$` blocks of size `$t$`
* `$t$` treatments randomised to `$t$` units within each block

`$$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{treatment} + \texttt{block} + \texttt{error}$$`


:::
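
A hedged base R sketch of the same idea with blocks: the only change from a CRD is that the shuffle is restricted to happen inside each block (names and seed are illustrative assumptions):

```r
# Randomised complete block design: b blocks, each holding all t treatments once
set.seed(2019)
trt <- c("A", "B", "C")
b   <- 2                                   # number of blocks

rcbd <- data.frame(
  block = factor(rep(seq_len(b), each = length(trt))),
  unit  = rep(seq_along(trt), times = b),
  # an independent permutation of all t treatments inside every block
  trt   = factor(unlist(lapply(seq_len(b), function(i) sample(trt))))
)
table(rcbd$block, rcbd$trt)                # every cell is 1
```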

---

# Latin Square Design (LSD)

::: grid

::: item

:::

::: item

* two orthogonal blocks of size `$t$`
* `$t$` treatments randomised to units such that every treatment appears exactly once in each block

`$$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{treatment} + \texttt{row} + \texttt{column} + \texttt{error}$$`


:::
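
One way to sketch a Latin square in base R is to start from a cyclic square and permute rows, columns and labels. This is an illustration only; it is not claimed to sample uniformly from all Latin squares for larger `$t$`:

```r
# Latin square: every treatment exactly once in each row and each column
set.seed(2019)
trt <- c("A", "B", "C")
nt  <- length(trt)

# cyclic square of indices: entry (i, j) is ((i + j - 2) mod nt) + 1
idx <- outer(seq_len(nt), seq_len(nt), function(i, j) (i + j - 2) %% nt + 1)
# permuting rows and columns keeps the Latin property
idx <- idx[sample(nt), sample(nt)]
# randomly relabel the treatments and reshape back into a square
lsd <- matrix(sample(trt)[idx], nrow = nt)
lsd
```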

---

# Balanced Incomplete Block Design (BIBD)
 
::: grid

::: item

:::

::: item

* `$b$` blocks of size `$k < t$`
* `$t$` treatments randomised to units within each block such that every pair of treatments appears together the same number of times across blocks

`$$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{block} + \texttt{treatment} + \texttt{error}$$`


:::
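
The balance condition can be checked by counting, for every pair of treatments, the number of blocks containing both. A small sketch, assuming the simplest construction where the blocks are all subsets of size `$k$`:

```r
# Balanced incomplete block design built from all size-k subsets of the treatments
trt <- c("A", "B", "C")
k   <- 2
blocks <- combn(trt, k, simplify = FALSE)      # here: AB, AC, BC

# concurrence: number of blocks in which each pair of treatments occurs together
pairs  <- combn(trt, 2, simplify = FALSE)
lambda <- vapply(
  pairs,
  function(p) sum(vapply(blocks, function(b) all(p %in% b), logical(1))),
  integer(1)
)
names(lambda) <- vapply(pairs, paste, character(1), collapse = "")
lambda                                         # equal counts mean the design is balanced
```

Randomising the order of treatments within each block (e.g. `lapply(blocks, sample)`) would complete the design.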

---

# Factorial Design

::: grid

::: item

:::

::: item

* `$ab = t$` treatments randomised to `$n$` units
* treatment is every combination of two factors A and B

`$$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{A} + \texttt{B} + \texttt{A:B} + \texttt{error}$$`

<center>
<img src="images/factorial-eg1-anova-top.png" width = "850px"/>
<details style="font-size:4pt"><summary></summary>
<img src="images/factorial-eg1-anova-middle.png" width = "850px"/>
</details>
<img src="images/factorial-eg1-anova-bottom.png" width = "850px"/>
</center>

:::
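
A sketch of a factorial treatment structure on a completely randomised unit structure, assuming 3 levels of A, 2 levels of B and 2 replicates (all values are illustrative):

```r
# Factorial design: treatments are all combinations of the levels of A and B
set.seed(2019)
trts <- expand.grid(A = factor(1:3), B = factor(1:2))   # t = ab = 6 treatments
r    <- 2                                               # replicates per treatment (assumed)

# repeat every treatment r times and shuffle over the n = 12 units
fac <- trts[sample(rep(seq_len(nrow(trts)), each = r)), ]
rownames(fac) <- NULL
fac$unit <- seq_len(nrow(fac))
table(fac$A, fac$B)                                     # every cell holds r units
```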

---

# Split-plot Design

::: grid

::: item

:::

::: {.item font-size: 0.85em; }

* `$n_1$` whole plots consisting of `$b$` sub plots
* in total there are `$n$` sub plots
* treatment factor A is randomised to whole plots
* treatment factor B is randomised to sub plots within each whole plot

`$$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{A} + \texttt{WP} + \texttt{B} + \texttt{A:B} + \texttt{error}$$`

<center>
<img src="images/split-plot-eg1-anova.png" width = "850px"/>
</center>

:::
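
The two-stage randomisation can be sketched in base R as follows, assuming factor A has levels `I` and `R`, factor B has four levels, and each level of A is given to 2 whole plots (illustrative values only):

```r
# Split-plot design: A randomised to whole plots, B randomised within each whole plot
set.seed(2019)
trt1 <- c("I", "R")             # factor A, applied to whole plots
trt2 <- LETTERS[1:4]            # factor B, applied to sub plots
r    <- 2                       # whole plots per level of A (assumed)
n1   <- length(trt1) * r        # 4 whole plots

wp_trt <- sample(rep(trt1, each = r))            # stage 1: A to whole plots
sp <- do.call(rbind, lapply(seq_len(n1), function(w) {
  data.frame(wholeplot = w,
             subplot   = seq_along(trt2),
             A         = wp_trt[w],
             B         = sample(trt2))           # stage 2: B within whole plot w
}))
table(sp$A, sp$B)               # each A:B combination appears r times
```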

---

# CRAN Task View of Design of Experiments

contains

# 📦 .blue[239 R-packages ]

---

# Top 10 downloaded R-packages in 2018

# `agricolae` is the most downloaded

---

# `agricolae::design.crd`

.blue[**Completely randomised design**] for `$t = 3$` treatments with `$2$` replicates each
<pre><code>
trt <- c("A", "B", "C")
agricolae::.bg-yellow[design.crd](trt = trt, r = 2) %>% glimpse()
</code></pre>

```
  List of 2
   $ parameters:List of 7
    ..$ design: chr "crd"
    ..$ trt   : chr [1:3] "A" "B" "C"
    ..$ r     : num [1:3] 2 2 2
    ..$ serie : num 2
    ..$ seed  : int 2114380113
    ..$ kinds : chr "Super-Duper"
    ..$       : logi TRUE
   $ book      :'data.frame':	6 obs. of  3 variables:
    ..$ plots: num [1:6] 101 102 103 104 105 106
    ..$ r    : int [1:6] 1 1 2 2 1 2
    ..$ trt  : Factor w/ 3 levels "A","B","C": 3 1 3 1 2 2
```

::: {.plot-box .pos top: 35%; right: 50px;}

:::

---

# `agricolae::design.rcbd`

<pre><code>
trt <- c("A", "B", "C")
agricolae::.bg-yellow[design.rcbd](trt = trt, r = 2) %>% glimpse()
</code></pre>

```
  List of 3
   $ parameters:List of 7
    ..$ design: chr "rcbd"
    ..$ trt   : chr [1:3] "A" "B" "C"
    ..$ r     : num 2
    ..$ serie : num 2
    ..$ seed  : int -997119203
    ..$ kinds : chr "Super-Duper"
    ..$       : logi TRUE
   $ sketch    : chr [1:2, 1:3] "C" "C" "A" "A" ...
   $ book      :'data.frame':	6 obs. of  3 variables:
    ..$ plots: num [1:6] 101 102 103 201 202 203
    ..$ block: Factor w/ 2 levels "1","2": 1 1 1 2 2 2
    ..$ trt  : Factor w/ 3 levels "A","B","C": 3 1 2 3 1 2
```

::: { .info-box .pos top: 35%; right: 50px; width: 500px;}

* The `r` format is different (probably because this is a *balanced* design)
* There is a `sketch` in the list

:::

::: {.plot-box .pos bottom: 10px; right: 50px;}

:::

---

# `agricolae::design.lsd()`

<pre><code>
trt <- c("A", "B", "C")
agricolae::.bg-yellow[design.lsd](trt = trt) %>% glimpse()
</code></pre>

```
  List of 3
   $ parameters:List of 7
    ..$ design: chr "lsd"
    ..$ trt   : chr [1:3] "A" "B" "C"
    ..$ r     : int 3
    ..$ serie : num 2
    ..$ seed  : int -205979691
    ..$ kinds : chr "Super-Duper"
    ..$       : logi TRUE
   $ sketch    : chr [1:3, 1:3] "C" "A" "B" "B" ...
   $ book      :'data.frame':	9 obs. of  4 variables:
    ..$ plots: num [1:9] 101 102 103 201 202 203 301 302 303
    ..$ row  : Factor w/ 3 levels "1","2","3": 1 1 1 2 2 2 3 3 3
    ..$ col  : Factor w/ 3 levels "1","2","3": 1 2 3 1 2 3 1 2 3
    ..$ trt  : Factor w/ 3 levels "A","B","C": 3 2 1 1 3 2 2 1 3
```

::: {.plot-box .pos top: 100px; right: 50px;}

:::

---

# `agricolae::design.bib()`

<pre><code>
trt <- c("A", "B", "C")
agricolae::.bg-yellow[design.bib](trt = trt, k = 2) %>% glimpse()
</code></pre>

```
 [1] "No improvement over initial random design."
 
 Parameters BIB
 ==============
 Lambda : 1
 treatmeans : 3
 Block size : 2
 Blocks : 3
 Replication: 2 
 
 Efficiency factor 0.75 
 
 <<< Book >>>
```

```
  List of 4
   $ parameters:List of 6
    ..$ design: chr "bib"
    ..$ trt   : chr [1:3] "A" "B" "C"
    ..$ k     : num 2
    ..$ serie : num 2
    ..$ seed  : int 1122489977
    ..$ kinds : chr "Super-Duper"
   $ statistics:'data.frame':	1 obs. of  6 variables:
    ..$ lambda    : num 1
    ..$ treatmeans: int 3
    ..$ blockSize : num 2
    ..$ blocks    : int 3
    ..$ r         : num 2
    ..$ Efficiency: num 0.75
   $ sketch    : chr [1:3, 1:2] "C" "A" "B" "A" ...
   $ book      :'data.frame':	6 obs. of  3 variables:
    ..$ plots: num [1:6] 101 102 201 202 301 302
    ..$ block: Factor w/ 3 levels "1","2","3": 1 1 2 2 3 3
    ..$ trt  : Factor w/ 3 levels "A","B","C": 3 1 1 2 2 3
```

::: {.plot-box .pos top: 200px; right: 50px;}

:::

---

# `agricolae::design.ab()`

.blue[**Factorial design**] for `$t = 3 \times 2$` treatments with `$2$` replicates of each treatment

<pre><code>
agricolae::.bg-yellow[design.ab](trt = c(3, 2), r = 2, design = "crd") %>% glimpse()
</code></pre>

```
  List of 2
   $ parameters:List of 8
    ..$ design : chr "factorial"
    ..$ trt    : chr [1:6] "1 1" "1 2" "2 1" "2 2" ...
    ..$ r      : num [1:6] 2 2 2 2 2 2
    ..$ serie  : num 2
    ..$ seed   : int 1588382117
    ..$ kinds  : chr "Super-Duper"
    ..$        : logi TRUE
    ..$ applied: chr "crd"
   $ book      :'data.frame':	12 obs. of  4 variables:
    ..$ plots: num [1:12] 101 102 103 104 105 106 107 108 109 110 ...
    ..$ r    : int [1:12] 1 1 1 1 2 2 2 1 1 2 ...
    ..$ A    : Factor w/ 3 levels "1","2","3": 3 2 2 3 2 3 3 1 1 1 ...
    ..$ B    : Factor w/ 2 levels "1","2": 2 2 1 1 1 1 2 1 2 1 ...
```

::: {.plot-box .pos top: 200px; right: 50px;}

:::

::: {.info-box .pos top: 60%; right: 350px; }

Note: this is NOT A/B testing!

:::

---

# `agricolae::design.split()`

.blue[**Split-plot design**] for `$t = 2 \times 4$` treatments with `$2$` replicates of each treatment

<pre><code>
trt1 <- c("I", "R"); trt2 <- LETTERS[1:4]
agricolae::.bg-yellow[design.split](trt1 = trt1, trt2 = trt2, r = 2, design = "crd") %>% 
 glimpse()
</code></pre>

```
  List of 2
   $ parameters:List of 8
    ..$ design : chr "split"
    ..$        : logi TRUE
    ..$ trt1   : chr [1:2] "I" "R"
    ..$ applied: chr "crd"
    ..$ r      : num [1:2] 2 2
    ..$ serie  : num 2
    ..$ seed   : int 1151770077
    ..$ kinds  : chr "Super-Duper"
   $ book      :'data.frame':	16 obs. of  5 variables:
    ..$ plots : num [1:16] 101 101 101 101 102 102 102 102 103 103 ...
    ..$ splots: Factor w/ 4 levels "1","2","3","4": 1 2 3 4 1 2 3 4 1 2 ...
    ..$ r     : int [1:16] 1 1 1 1 1 1 1 1 2 2 ...
    ..$ trt1  : Factor w/ 2 levels "I","R": 2 2 2 2 1 1 1 1 2 2 ...
    ..$ trt2  : Factor w/ 4 levels "B","C","D","A": 3 2 1 4 3 2 4 1 2 3 ...
```

::: {.plot-box .pos top: 300px; right: 50px;}

:::

---

A **modular approach** in teaching and software **deters**
# .blue[*statistical thinking*] 
of experimental designs

where connections between different experimental designs are lost

---

::: paddings

.font_large[.blue[**Experimental design should be adapted to resources**] *rather than* resources adapted to
the experimental design*]

E.g. scientist decides to remove some experimental units to fit a randomised complete block design

:::

::: { .bottom_abs .width100 font-size: 0.6em; }

*Exceptions apply for adaptive experiments or sampling experiments (where sample size is the main focus)

:::

---

Identify

# elements of experimental design

.footnote[
with inspiration from 
.font_medium[Bailey (2008) Design of Comparative Experiments. Cambridge University Press]
]

---

# Wise words of Rosemary Bailey

::: { padding-left: 5%; padding-right: 25%; font-size: 0.9em; }

* A .blue[**treatment**] is the entire description of what can be applied to an experimental unit
* An .blue[**experimental unit**] is the smallest unit to which a treatment can be applied
* An .blue[**observational unit**] is the smallest unit on which a response will be measured
* .blue[**Treatment structure**] means meaningful ways of dividing up the whole set of treatments, denoted by `$\mathcal{T}$`
* .blue[**Unit structure**] means meaningful ways of dividing up the set `$\Omega$` of units, ignoring the treatments
* .blue[**Replication**] is the number of times that each treatment is tested

:::

::: { .info-box .pos top: 40%; right: 5px; width: 230px; }

Observational unit is not the same as observation!

:::

---

# Definition of Design of Experiments

::: paddings

* The .blue[**design**] is the allocation of treatments to units

* Or mathematically, the design is the function or  .blue[**mapping**] `$T$` of unit to treatment, `$$T: \Omega_{\scriptsize\text{(units)}} \rightarrow \mathcal{T}_{\scriptsize\text{(treatments)}}$$`
so `$\forall \omega \in \Omega$`, `$T(\omega) = \alpha$` where `$\alpha \in \mathcal{T}$`
* `$T$` is surjective .font_small[(not that important for later but I realised this and thought it was cool)]
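
As a toy illustration of the design as a mapping, `$T$` can be stored as a named vector whose names are the units and whose values are the treatments. The unit and treatment labels below are made up for the sketch:

```r
# Omega: six units; script-T: three treatments (toy labels)
units <- paste0("u", 1:6)
trts  <- c("A", "B", "C")

set.seed(2019)
# the design T: names are the units (domain), values are treatments (codomain)
design <- setNames(sample(rep(trts, length.out = length(units))), units)

design["u4"]                              # T(omega) for omega = u4
is_surjective <- all(trts %in% design)    # every treatment is given to some unit
is_surjective
```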

---

# Example: Calf Feeding

::: { .grid grid: 1fr / 380px 1fr; }

::: { .item .bg-grey font-size: 0.7em; }

* Three feed treatments are compared on 24 calves
* The calves are kept in 6 pens with 4
calves per pen
* Each feed is applied to two whole pens
* Every calf is weighed individually

:::

::: item

::: info-box

* **Treatments** are the 3 feeds
* **Experimental units** are the 6 pens
* **Observational units** are the 24 calves
* **Treatment structure** is unstructured 
* **Unit structure** is such that 4 calves are grouped to each pen

:::

:::

---

# Replication vs Repetition

::: { .grid grid: 1fr / 380px 1fr; }

::: { .item .bg-grey font-size: 0.7em; }

* Three feed treatments are compared on 24 calves
* The calves are kept in 6 pens with 4
calves per pen
* Each feed is applied to two whole pens
* Every calf is weighed individually

:::

::: item

In this case we say that: 
* the **replication** of each treatment is 2, and 
* the .blue[**repetition**] of each treatment is 8

Note well the denominator of the `$F$` statistic and the degrees of freedom for the `$F$` distribution!

:::
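
The two counts can be read off a table of the units. A sketch of the calf-feeding layout, where the allocation of feeds to pens is a hypothetical (non-randomised) one chosen only to keep the example short:

```r
# calf feeding: 6 pens, 4 calves per pen, each feed applied to 2 whole pens
calves <- data.frame(
  pen  = factor(rep(1:6, each = 4)),
  calf = factor(1:24)
)
feed_of_pen <- c("F1", "F1", "F2", "F2", "F3", "F3")   # hypothetical allocation
calves$feed <- factor(feed_of_pen[as.integer(calves$pen)])

# replication counts experimental units (pens) per feed
replication <- tapply(calves$pen, calves$feed, function(p) length(unique(p)))
# repetition counts observational units (calves) per feed
repetition  <- tapply(calves$calf, calves$feed, length)
replication
repetition
```

With 6 pens there are 5 degrees of freedom between pens; 2 go to feed, so the residual for comparing feeds has 3 degrees of freedom, not the 21 that counting calves would suggest.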

---

# Example: Grafting on horses

::: { .grid grid: 1fr / 380px 1fr; }

::: { .item .bg-grey font-size: 0.4em; }

* A surgeon is going to use 9 horses in an experiment
* He wants to compare 3 methods of grafting
skin
* He intended to use 3 animals for each method
* After the graft was complete he would take a sample
of new skin from each horse
* He would then cut each sample into 20 (tiny) pieces
and use a precision instrument to measure the
thickness of each piece

:::

::: item

::: info-box

* **Treatments** are the 3 grafting methods
* **Experimental units** are the 9 horses
* **Observational units** are the 20 `$\times$` 9 skin pieces
* **Treatment structure** is unstructured 
* **Unit structure** is such that 20 skin pieces are grouped by the horse which the skin was taken from
* The **replication** of each treatment is 3

:::

:::

---

# Example: Grafting on horses - simulation

::: paddings5

::: { .code-box .font_smaller width: 1100px; }

```r
set.seed(1)
nani <- 9; ntrt <- 3; ncut <- 20; n <- nani * ncut
graft <- factor(rep(LETTERS[1:ntrt], each = nani / ntrt * ncut))
animal <- factor(rep(LETTERS[1:nani], each = ncut))
anidev <- rnorm(nani, 0, 20)
trt <- c(300, 300, 300)  # identical means: no true difference between grafting methods
y <- trt[as.numeric(graft)] + anidev[as.numeric(animal)] + rnorm(n, 0, 5)
```

:::

```r
anova(lm(y ~ graft + animal))
```

```
 Analysis of Variance Table
 
 Response: y
            Df Sum Sq Mean Sq F value    Pr(>F)    
 graft       2   6676  3338.1  106.31 < 2.2e-16 ***
 animal      6  54575  9095.8  289.68 < 2.2e-16 ***
 Residuals 171   5369    31.4                      
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

:::

::: {.info-box .pos right:10px; bottom:220px; width: 650px; }

* Grafting methods are compared against the observational unit (OU) residuals
* Grafting methods appear statistically significant even though, in the simulation, they all have the same effect

:::

---

# Example: Grafting on horses - appropriate analysis

::: paddings

Comparison should happen in the valid stratum

```r
summary(aov(y ~ graft + Error(animal)))
```

```
  
  Error: animal
            Df Sum Sq Mean Sq F value Pr(>F)
  graft      2   6676    3338   0.367  0.707
  Residuals  6  54575    9096               
  
  Error: Within
             Df Sum Sq Mean Sq F value Pr(>F)
  Residuals 171   5369    31.4
```

:::

::: {.info-box .pos right:10px; bottom:190px; width: 300px; }

This time the result is statistically non-significant

:::
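
An alternative way to see the same point, sketched under the assumption that the data are balanced as in the simulation: average the 20 pieces within each horse and run a one-way ANOVA on the 9 horse means. The grafting methods are then compared against horse-to-horse variation with 2 and 6 degrees of freedom.

```r
# same simulated data as in the grafting example
set.seed(1)
nani <- 9; ntrt <- 3; ncut <- 20; n <- nani * ncut
graft  <- factor(rep(LETTERS[1:ntrt], each = nani / ntrt * ncut))
animal <- factor(rep(LETTERS[1:nani], each = ncut))
anidev <- rnorm(nani, 0, 20)
trt    <- c(300, 300, 300)
y <- trt[as.numeric(graft)] + anidev[as.numeric(animal)] + rnorm(n, 0, 5)
dat <- data.frame(y, graft, animal)

# one mean per horse (the experimental unit)
horse <- aggregate(y ~ animal + graft, data = dat, FUN = mean)
# one-way ANOVA on the 9 horse means
fit <- anova(lm(y ~ graft, data = horse))
fit
```

For balanced data the `$F$` value for `graft` here should agree with the one in the `animal` stratum of the `aov()` output.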

---

# Pseudo-replication

::: paddings

* .blue[**Pseudo-replication**] or **false replication** describes a situation where multiple measurements, taken from the same experimental unit, are treated as replication

* **Technical replication** occurs when multiple measurements are taken from the same unit 
* Technical replication is always a pseudo-replication 
* **Biological replication** occurs when measurements are taken from independent biological subjects

:::

---

# Example: Chick weight

::: grid

::: { .item font-size: 0.38em; }

* An experiment was conducted on a prairie in Western Canada to find out if insecticides used to control grasshoppers affected the weight of young chicks of ring-necked pheasants, either by affecting the grass around the chicks or by affecting the
grasshoppers eaten by the chicks.
* Three insecticides were used, at low and high doses.
* The low dose was the highest dose recommended by the department of agriculture;
the high dose was four times as much as the recommended dose, to assess the effects
of mistakes.
* The experimental procedure took place in each of three consecutive weeks.
* On the first day of each week a number of newly-hatched female pheasant chicks
were placed in a brooder pen.
* On the third day, the chicks were randomly divided into twelve groups of six chicks
each.
* Each chick was given an identification tape and weighed.
* On the fourth day, a portion of the field was divided into three strips, each of which
was divided into two swathes.
* The two swathes within each strip were sprayed with the two doses of the same
insecticide.
* Two pens were erected on each swathe, and one group of pheasant chicks was put
into each pen.
* For the next 48 hours, the chicks were fed with grasshoppers which had been collected
locally.
* Half the grasshoppers were anaesthetized and sprayed with insecticide; the other half
were also anaesthetized and handled in every way like the first half except that they
were not sprayed.
* All grasshoppers were frozen.
* The experimenters maintained a supply of frozen grasshoppers to each pen, putting
them on small platforms so that they would not absorb further insecticide from the
grass.
* In each swathe, one pen had unsprayed grasshoppers while the other had grasshoppers sprayed by the insecticide which had been applied to that swathe.
* At the end of the 48 hours, the chicks were weighed again individually. .font_large[😵 😵 😵]

:::

::: item

::: {.scroll-350 height: 600px;}
<center>
<img src="images/eg4-chicks.png" width = "500px" />
</center>

:::

::: { .footnote font-size: 8pt; }

Experiment based on Martin et al. (1996) Effects of grasshopper-control insecticides on survival and brain acetylcholinesterase of pheasant (Phasianus colchicus) chicks.
Environmental Toxicology and Chemistry **15**(4) 518-524.

:::

---

# Example: Chick weight - skeleton ANOVA

::: grid

::: item

::: { .info-box font-size: 0.7em; }

* **Treatments** are the factorial combination of insecticide (3 types), dose (low or high), and grasshopper food (sprayed or not) 
* .pink[**Experimental units** are the pens?]
* **Observational units** are the chicks
* **Treatment structure** is such that:
   * different insecticide is applied to strips, 
   * different dosage level is applied to swathes,
   * sprayed or unsprayed grasshopper is left in pens
* **Unit structure** is given as `Week/Strips/Swathes/Pens/Chick` via Wilkinson & Rogers (1973) notation

:::

::: item

:::

::: { .footnote font-size: 8pt; }

Wilkinson and Rogers (1973) Symbolic Description of Factorial Models for Analysis of Variance. Journal of the Royal Statistical Society: Series C (Applied Statistics). **22**(3) 392-399

:::
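
The nesting in the unit structure fixes the degrees of freedom of each stratum in the skeleton ANOVA. A sketch that counts the units at every level, using the numbers from the description (3 weeks, 3 strips per week, 2 swathes per strip, 2 pens per swathe, 6 chicks per pen); the column names are illustrative:

```r
# nested unit structure Week/Strip/Swathe/Pen/Chick
units <- expand.grid(chick = 1:6, pen = 1:2, swathe = 1:2, strip = 1:3, week = 1:3)

# number of distinct units at each level of the nesting
levels_n <- c(
  week   = nrow(unique(units["week"])),
  strip  = nrow(unique(units[c("week", "strip")])),
  swathe = nrow(unique(units[c("week", "strip", "swathe")])),
  pen    = nrow(unique(units[c("week", "strip", "swathe", "pen")])),
  chick  = nrow(units)
)
levels_n

# stratum df: units at this level minus units one level up (1 for the grand mean)
strata_df <- diff(c(1, levels_n))
strata_df
```

The strata degrees of freedom add up to the number of chicks minus one.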

---

How to describe a situation where there are .font_large[**multiple response measurements**] (i.e. multi-variate or multi-trait data)?

Also, what about when measurements are taken over .font_large[**time**] (e.g. longitudinal data)?

---

# Refining Rosemary Bailey's definitions

::: paddings

"Each of you is perfect the way you are ... and you can use a little improvement."

Shunryu Suzuki

:::

---

# Treatment factor

::: paddings

* We need to distinguish between **treatment** and .blue[**treatment factor**]
.font_small[
* E.g. in the chick weight, `Dose`, `Insecticide` and `Food` are treatment factors
* Should `Food:Dose` etc be a treatment factor? Probably not if it can be distinguished as its own factor. Maybe something like "smallest division of factor such that it cannot be partitioned to other factors"
]
* The combination of treatment factors makes a treatment
.font_small[
* E.g. in the chick weight, `Dose:Insecticide:Food` specifies the treatment
]
* Not all combinations are required in the experiment 
.font_small[
* E.g. a fractional factorial experiment
]
* If all combinations are present then it is referred to as a .blue[**factorial**] combination

:::
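
A small sketch of the distinction, using the three treatment factors of the chick weight example (the level labels and the helper `is_factorial()` are made up for illustration):

```r
# treatment factors; a treatment is one combination of their levels
trts <- expand.grid(
  Insecticide = c("I1", "I2", "I3"),
  Dose        = c("low", "high"),
  Food        = c("sprayed", "unsprayed")
)
trts$treatment <- interaction(trts$Insecticide, trts$Dose, trts$Food, sep = ":")
nrow(trts)    # 3 * 2 * 2 treatments in the full factorial

# hypothetical helper: factorial means every combination of levels is present
is_factorial <- function(d, factors) {
  n_levels <- vapply(d[factors], function(x) length(unique(x)), integer(1))
  nrow(unique(d[factors])) == prod(n_levels)
}
f <- c("Insecticide", "Dose", "Food")
is_factorial(trts, f)          # all combinations present
is_factorial(trts[-1, ], f)    # one combination dropped
```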

---

# Multiple responses from observational unit .blue[traits]

::: paddings

* Multiple **traits** (usually of different .blue[**measurement units**]) may be measured from the same observational unit

:::

---

# Multiple responses from observational unit non-standard traits

::: paddings

* The response may be non-standard, e.g.:
   * images,
   * files,
   * big data (like genome), and so on
* These non-standard responses (at least I think) may require more specific questions or refinement of the experimental protocol to convert non-standard traits to standard traits

:::

---

# Multiple responses from observational unit .blue[duplication]

::: paddings

* Observations by the same measurement process/device from the same observational unit are referred to as .blue[**duplication**]
* The main use of **duplication** is quality control of the measurement process; it is peripheral to the main aim of the experiment
* **Duplication** is a technical replication

:::

---

# Multiple responses from observational unit .blue[measurement device]

::: { .paddings font-size: 0.8em; }

* Not all **technical replications** are **duplications**
* Observations from the same observational unit by different measurement processes/devices that are supposed to measure the same measurement unit are also **technical replications** but not **duplications**
* If different **measurement devices** are used then that should be recorded

:::

---

# Multiple responses from observational unit .blue[index]

::: { .paddings font-size: 0.75em; }

* Multiple observations of an observational unit at particular time points or periods
* The variable that has this inherent order from past to present is referred to as **index** .font_small[(definition in line with Wang, Cook & Hyndman, 2019)]
* There should only be one **index** .font_small[(i.e. it should not be separated into day, month, year, hour, minute, second, etc.)]
* The **index** should not be part of the treatment - if time is indeed a treatment, it should be coded as a treatment and be able to be randomised

:::

::: { .footnote font-size: 0.6em; }

Wang, Cook & Hyndman (2019) A new tidy data structure to support exploration and modeling of temporal data. https://pdf.earo.me/tsibble.pdf

:::
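
If time happens to be recorded in separate columns, a single index can be built in base R. A toy sketch with made-up values and column names:

```r
# time recorded in pieces (toy values)
obs <- data.frame(
  year = 2019, month = 11, day = c(27, 25, 26), hour = c(14, 9, 9),
  y = c(1.1, 1.2, 1.5)
)
# collapse the pieces into one date-time variable and order by it
obs$index <- ISOdatetime(obs$year, obs$month, obs$day, obs$hour, 0, 0, tz = "UTC")
obs <- obs[order(obs$index), c("index", "y")]
obs
```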

---

# Observational unit described by multiple variables

::: { .paddings }

* .blue[**Key**] is a set of variables that define observational units <strike>over time</strike> .font_small[(definition similar to Wang, Cook & Hyndman, 2019) 
E.g. in chick weight, the observational unit is indexed by week, strip, swathe, pen and chick.
]
* The set of variables that form the **key** must be factors

<center>
<img src="images/tsibble-semantics.png" width = "800px" /> 
.font_small[image from Wang, Cook & Hyndman (2019)]
</center>

:::

::: { .footnote font-size: 0.6em; }

Wang, Cook & Hyndman (2019) A new tidy data structure to support exploration and modeling of temporal data. https://pdf.earo.me/tsibble.pdf

:::

---

# Unit structure: .blue[Blocks]

::: paddings

* **Blocks** are groupings of similar units .font_small[Note: if the grouped units are heterogeneous, blocking may lower the power of the experiment (and may also lead to negative variance estimates!)]
* According to Bailey (2008), there are three types of blocks: 
   * Natural discrete divisions .font_small[E.g. in an experiment on people or animals, the two sexes make obvious blocks.]
   * Continuous gradients .font_small[Continuous underlying trends, e.g. time or space. The block boundaries may be arbitrarily chosen.]
   * Trial management .font_small[E.g. lab technician.]
* **Blocks** are **extraneous variables** .font_small[(next slide)]
* **Blocks** are factors and may sometimes be referred to as .blue[**blocking factors**]

:::

---

# .blue[Extraneous variables]

::: paddings

* Extraneous: "irrelevant or unrelated to the subject being dealt with" .font_small[source: Google dictionary]
* **Extraneous variables** are variables that may affect the response 
* These may be continuous, discrete, ordinal or factor 
* **Extraneous variables** are not necessarily **blocking factors**
* If numerical, the variable may be binned to create a new variable to be used as a **blocking factor**

:::
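
Binning a numerical extraneous variable can be sketched with `cut()`. The variable (initial weight of 12 animals) and the choice of three equal-sized blocks are assumptions made for the example:

```r
# hypothetical extraneous variable: initial weight (kg) of 12 animals
set.seed(2019)
weight <- round(rnorm(12, mean = 50, sd = 5), 1)

# rank the animals and cut the ranks into three blocks of four
block <- cut(rank(weight, ties.method = "first"),
             breaks = c(0, 4, 8, 12),
             labels = c("light", "medium", "heavy"))
table(block)
split(weight, block)    # weights grouped by the new blocking factor
```

Cutting the ranks rather than the raw values keeps the block sizes equal even when weights are tied.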

---

Building the

# grammar of experimental design

---

# Elements of experimental design .font_small[the prototype (very much experimental)]

::: { .paddings font-size: 0.9em; }

The key elements: .font_small[(elements must be nouns)]

* .blue[`units`] describes physical entities relevant in the experiment

::: { .paddings5 font-size: 0.6em; }

<details><summary>My thoughts</summary>

<ul>
<li> I decided not to distinguish experimental and observational units</li>
<li> experimental unit is specified in the mapping of the treatment to units</li>
<li> observational unit is specified in the measurement process</li>
<li> you can create as many units as needed</li>
<li> a set should contain similar features </li>
<li> <code>units</code> must be factors </li>
<li> an experiment must contain at least one <code>units</code> </li>
</ul>

</details>

:::

* .blue[`trts`] describes the treatment

::: { .paddings5 font-size: 0.6em; }

<details><summary>My thoughts</summary>
<ul>
<li><code>trts</code> can contain multiple factors or variables</li>
<li>It does not have to be factors but most likely it is</li>
<li>An experiment does not need to contain <code>trts</code></li>
</ul>
</details>

:::

* .blue[`mapping`] describes the mapping of `units` to `trts`
* .blue[`index`] for temporal variable (default value of `0`)
* .blue[`key`] specifies the units that identify observational units
* .blue[`block`] and .blue[`cluster`] are synonyms for grouped units
* .blue[`traits`] for observations 
* .blue[`vars`] for extraneous variables

:::

---

# Grammar of experimental design .font_small[the prototype (or rather a collection of some verbs that sound like they can fit the framework)]

::: paddings

* The elements are manipulated by a *verb* that precedes the element, separated by `_`.

`verb_element`


* Some verbs include .blue[`set`], .blue[`get`], .blue[`group`], .blue[`modify`], .blue[`split`],  .blue[`combine`], .blue[`apply`], .blue[`assign`], .blue[`measure`], .blue[`allocate`], .blue[`permute`], .blue[`cluster`].

:::

---

# `edibble` .font_small[(so not edible now - please wait more before it is ripe for consumption)]

::: { .paddings font-size: 0.8em; }

* The grammar will be implemented in the `edibble` package - to be thought of as .blue[`tibble` for experimental design]
* For example, a split-plot design will be generated as

::: paddings5

```r
design <- edibble() %>% 
  set_trts(trt1 = c("I", "R"), 
           trt2 = LETTERS[1:4]) %>% # factorial structure by default
  set_units(n = 16, name = "subplot") %>% 
  group_units(size = 4, name = "wholeplot") %>% 
  set_mapping(wholeplot ~ trt1,
              subplot ~ trt2) # randomise units to trt by default
```

:::

* And the output will be a table!
* Variables of the same type will be contiguous (e.g. all blocking units are contiguous)

:::

---

# Visualising the design

::: grid

::: { .item border-right: dashed 3px black; }

Then .font_small[(the plan is to have)] one function to do a quick plot:

```r
design %>% 
  plot_design()
```

:::

::: { .item  }

And possibly a step-by-step animation:

```r
design %>% 
  animate_design()
```

:::

---

# "Good" experimental design

::: paddings

* The aim of this work is to facilitate this workflow

* Usually many people are involved in the project - domain expert, statistician, technician, and so on

* A quick plot is helpful to ensure that everyone is on the same page

* A good experimental design isn't just about a statistically efficient design .font_small[(it requires communication to check that potential sources of variation are properly accounted for, and practicality in conducting the experiment)]

:::

---

# Design not in scope .font_small[at least not in near future]

and those that focus on sample size calculations


---

::: paddings

::: { text-align: left; font-size: 0.8em; }

In summary, this work-in-progress aims to

* facilitate **statistical thinking** about adapting experimental designs to different conditions, by using grammatical expressions to manipulate the core elements of an experimental design to generate a new design
* **lubricate the project flow** of generating experimental data (tabular output with similar variables to be contiguous, visualising design)

:::

Scratch notes here:

http://github.com/emitanaka/edibble

:::

---

background-image: url("images/bg1.jpg")
background-size: cover
class: hide-slide-number animated pulse fast

:::::::::: { .grid  .white grid: 1fr / 3fr 1fr;}

::: {.item .shade_black  border-right-style: solid; border-right-color: white;}

<h1>Thanks!</h1>

These slides are made using `xaringan` R-package and can be found at

Emi Tanaka

dr.emi.tanaka@gmail.com 
 @statsgen

:::

::::

---

# Session Information

:::: scroll-350

```
  ─ Session info ───────────────────────────────────────────────────────────────
   setting  value                       
   version  R version 3.6.0 (2019-04-26)
   os       macOS Mojave 10.14.6        
   system   x86_64, darwin15.6.0        
   ui       X11                         
   language (EN)                        
   collate  en_AU.UTF-8                 
   ctype    en_AU.UTF-8                 
   tz       Australia/Melbourne         
   date     2019-11-27                  
  
  ─ Packages ───────────────────────────────────────────────────────────────────
   package      * version    date       lib source                            
   agricolae      1.3-1      2019-04-04 [1] CRAN (R 3.6.0)                    
   AlgDesign      1.1-7.3    2014-10-15 [1] CRAN (R 3.6.0)                    
   anicon         0.1.0      2019-05-28 [1] Github (emitanaka/anicon@377aece) 
   assertthat     0.2.1      2019-03-21 [1] CRAN (R 3.6.0)                    
   backports      1.1.4      2019-04-10 [1] CRAN (R 3.6.0)                    
   boot           1.3-22     2019-04-02 [1] CRAN (R 3.6.0)                    
   broom          0.5.2      2019-04-07 [1] CRAN (R 3.6.0)                    
   callr          3.3.1      2019-07-18 [1] CRAN (R 3.6.0)                    
   cellranger     1.1.0      2016-07-27 [1] CRAN (R 3.6.0)                    
   class          7.3-15     2019-01-01 [1] CRAN (R 3.6.0)                    
   classInt       0.4-1      2019-08-06 [1] CRAN (R 3.6.0)                    
   cli            1.1.0      2019-03-19 [1] CRAN (R 3.6.0)                    
   cluster        2.0.8      2019-04-05 [1] CRAN (R 3.6.0)                    
   coda           0.19-3     2019-07-05 [1] CRAN (R 3.6.0)                    
   colorspace     1.4-1      2019-03-18 [1] CRAN (R 3.6.0)                    
   combinat       0.0-8      2012-10-29 [1] CRAN (R 3.6.0)                    
   cranlogs       2.1.1      2019-04-29 [1] CRAN (R 3.6.0)                    
   crayon         1.3.4      2017-09-16 [1] CRAN (R 3.6.0)                    
   curl           4.2        2019-09-24 [1] CRAN (R 3.6.0)                    
   DBI            1.0.0      2018-05-02 [1] CRAN (R 3.6.0)                    
   deldir         0.1-23     2019-07-31 [1] CRAN (R 3.6.0)                    
   desc           1.2.0      2018-05-01 [1] CRAN (R 3.6.0)                    
   devtools       2.0.2      2019-04-08 [1] CRAN (R 3.6.0)                    
   digest         0.6.22     2019-10-21 [1] CRAN (R 3.6.0)                    
   dplyr        * 0.8.3      2019-07-04 [1] CRAN (R 3.6.0)                    
   e1071          1.7-2      2019-06-05 [1] CRAN (R 3.6.0)                    
   ellipsis       0.2.0.9000 2019-08-03 [1] Github (r-lib/ellipsis@27e0846)   
   emo            0.0.0.9000 2019-06-03 [1] Github (hadley/emo@02a5206)       
   evaluate       0.14       2019-05-28 [1] CRAN (R 3.6.0)                    
   expm           0.999-4    2019-03-21 [1] CRAN (R 3.6.0)                    
   forcats      * 0.4.0      2019-02-17 [1] CRAN (R 3.6.0)                    
   fs             1.3.1      2019-05-06 [1] CRAN (R 3.6.0)                    
   gdata          2.18.0     2017-06-06 [1] CRAN (R 3.6.0)                    
   generics       0.0.2      2018-11-29 [1] CRAN (R 3.6.0)                    
   ggplot2      * 3.2.1      2019-08-10 [1] CRAN (R 3.6.0)                    
   glue           1.3.1.9000 2019-10-24 [1] Github (tidyverse/glue@71eeddf)   
   gmodels        2.18.1     2018-06-25 [1] CRAN (R 3.6.0)                    
   gtable         0.3.0      2019-03-25 [1] CRAN (R 3.6.0)                    
   gtools         3.8.1      2018-06-26 [1] CRAN (R 3.6.0)                    
   haven          2.1.0      2019-02-19 [1] CRAN (R 3.6.0)                    
   highr          0.8        2019-03-20 [1] CRAN (R 3.6.0)                    
   hms            0.5.1      2019-08-23 [1] CRAN (R 3.6.0)                    
   htmltools      0.4.0      2019-10-04 [1] CRAN (R 3.6.0)                    
   httpuv         1.5.2      2019-09-11 [1] CRAN (R 3.6.0)                    
   httr           1.4.1      2019-08-05 [1] CRAN (R 3.6.0)                    
   icon           0.1.0      2019-05-28 [1] Github (ropenscilabs/icon@a510f88)
   janeaustenr    0.1.5      2017-06-10 [1] CRAN (R 3.6.0)                    
   jsonlite       1.6        2018-12-07 [1] CRAN (R 3.6.0)                    
   KernSmooth     2.23-15    2015-06-29 [1] CRAN (R 3.6.0)                    
   klaR           0.6-14     2018-03-19 [1] CRAN (R 3.6.0)                    
   knitr          1.25       2019-09-18 [1] CRAN (R 3.6.0)                    
   labeling       0.3        2014-08-23 [1] CRAN (R 3.6.0)                    
   later          1.0.0      2019-10-04 [1] CRAN (R 3.6.0)                    
   lattice        0.20-38    2018-11-04 [1] CRAN (R 3.6.0)                    
   lazyeval       0.2.2      2019-03-15 [1] CRAN (R 3.6.0)                    
   LearnBayes     2.15.1     2018-03-18 [1] CRAN (R 3.6.0)                    
   lifecycle      0.1.0      2019-08-01 [1] CRAN (R 3.6.0)                    
   lubridate      1.7.4      2018-04-11 [1] CRAN (R 3.6.0)                    
   magrittr       1.5        2014-11-22 [1] CRAN (R 3.6.0)                    
   MASS           7.3-51.4   2019-03-31 [1] CRAN (R 3.6.0)                    
   Matrix         1.2-17     2019-03-22 [1] CRAN (R 3.6.0)                    
   memoise        1.1.0      2017-04-21 [1] CRAN (R 3.6.0)                    
   mime           0.7        2019-06-11 [1] CRAN (R 3.6.0)                    
   miniUI         0.1.1.1    2019-09-01 [1] Github (rstudio/miniUI@52f5854)   
   modelr         0.1.4      2019-02-18 [1] CRAN (R 3.6.0)                    
   munsell        0.5.0      2018-06-12 [1] CRAN (R 3.6.0)                    
   nlme           3.1-140    2019-05-12 [1] CRAN (R 3.6.0)                    
   pillar         1.4.2      2019-06-29 [1] CRAN (R 3.6.0)                    
   pkgbuild       1.0.3      2019-03-20 [1] CRAN (R 3.6.0)                    
   pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 3.6.0)                    
   pkgload        1.0.2      2018-10-29 [1] CRAN (R 3.6.0)                    
   plyr           1.8.4      2016-06-08 [1] CRAN (R 3.6.0)                    
   prettyunits    1.0.2      2015-07-13 [1] CRAN (R 3.6.0)                    
   processx       3.4.1      2019-07-18 [1] CRAN (R 3.6.0)                    
   promises       1.1.0      2019-10-04 [1] CRAN (R 3.6.0)                    
   ps             1.3.0      2018-12-21 [1] CRAN (R 3.6.0)                    
   purrr        * 0.3.2      2019-03-15 [1] CRAN (R 3.6.0)                    
   questionr      0.7.0      2018-11-26 [1] CRAN (R 3.6.0)                    
   quoter         0.1.0      2019-07-28 [1] local                             
   R6             2.4.0      2019-02-14 [1] CRAN (R 3.6.0)                    
   RColorBrewer   1.1-2      2014-12-07 [1] CRAN (R 3.6.0)                    
   Rcpp           1.0.2      2019-07-25 [1] CRAN (R 3.6.0)                    
   readr        * 1.3.1      2018-12-21 [1] CRAN (R 3.6.0)                    
   readxl         1.3.1      2019-03-13 [1] CRAN (R 3.6.0)                    
   remotes        2.0.4      2019-04-10 [1] CRAN (R 3.6.0)                    
   reshape2       1.4.3      2017-12-11 [1] CRAN (R 3.6.0)                    
   rlang          0.4.0.9000 2019-08-03 [1] Github (r-lib/rlang@b0905db)      
   rmarkdown      1.16       2019-10-01 [1] CRAN (R 3.6.0)                    
   rprojroot      1.3-2      2018-01-03 [1] CRAN (R 3.6.0)                    
   rstudioapi     0.10       2019-03-19 [1] CRAN (R 3.6.0)                    
   rvest          0.3.4      2019-05-15 [1] CRAN (R 3.6.0)                    
   scales         1.0.0      2018-08-09 [1] CRAN (R 3.6.0)                    
   sessioninfo    1.1.1      2018-11-05 [1] CRAN (R 3.6.0)                    
   sf             0.8-0      2019-09-17 [1] CRAN (R 3.6.0)                    
   shiny          1.3.2      2019-04-22 [1] CRAN (R 3.6.0)                    
   SnowballC      0.6.0      2019-01-15 [1] CRAN (R 3.6.0)                    
   sp             1.3-1      2018-06-05 [1] CRAN (R 3.6.0)                    
   spData         0.3.2      2019-09-19 [1] CRAN (R 3.6.0)                    
   spdep          1.1-3      2019-09-18 [1] CRAN (R 3.6.0)                    
   statquotes     0.2.2      2017-08-29 [1] CRAN (R 3.6.0)                    
   stringi        1.4.3      2019-03-12 [1] CRAN (R 3.6.0)                    
   stringr      * 1.4.0      2019-02-10 [1] CRAN (R 3.6.0)                    
   testthat       2.2.1      2019-07-25 [1] CRAN (R 3.6.0)                    
   tibble       * 2.1.3      2019-06-06 [1] CRAN (R 3.6.0)                    
   tidyr        * 1.0.0      2019-09-11 [1] CRAN (R 3.6.0)                    
   tidyselect     0.2.5      2018-10-11 [1] CRAN (R 3.6.0)                    
   tidytext       0.2.2      2019-07-29 [1] CRAN (R 3.6.0)                    
   tidyverse    * 1.2.1      2017-11-14 [1] CRAN (R 3.6.0)                    
   tokenizers     0.2.1      2018-03-29 [1] CRAN (R 3.6.0)                    
   units          0.6-3      2019-05-03 [1] CRAN (R 3.6.0)                    
   usethis        1.5.0      2019-04-07 [1] CRAN (R 3.6.0)                    
   vctrs          0.2.0.9000 2019-08-03 [1] Github (r-lib/vctrs@11c34ae)      
   withr          2.1.2      2018-03-15 [1] CRAN (R 3.6.0)                    
   wordcloud      2.6        2018-08-24 [1] CRAN (R 3.6.0)                    
   xaringan       0.9        2019-03-06 [1] CRAN (R 3.6.0)                    
   xfun           0.10       2019-10-01 [1] CRAN (R 3.6.0)                    
   xml2           1.2.0      2018-01-24 [1] CRAN (R 3.6.0)                    
   xtable         1.8-4      2019-04-21 [1] CRAN (R 3.6.0)                    
   yaml           2.2.0      2018-07-25 [1] CRAN (R 3.6.0)                    
   zeallot        0.1.0      2018-01-28 [1] CRAN (R 3.6.0)                    
  
  [1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library
```

:::

These slides are licensed under <center><a href="https://creativecommons.org/licenses/by-sa/3.0/au/"><img src="images/cc.svg" style="height:2em;"/><img src="images/by.svg" style="height:2em;"/><img src="images/sa.svg" style="height:2em;"/></a></center>