background-color: #98ebee
class: middle center

<div class="shade_black" style="width:calc(45%);right:0;bottom:0;padding-left:5px;border: dashed 4px white;margin: auto;">
<i class="fas fa-exclamation-circle"></i> These slides are best viewed in Chrome and occasionally need to be refreshed if elements do not load properly. See here for the <a href="slides-edibble.pdf" style="color:black!important">PDF <i class="fas fa-file-pdf"></i></a>.
</div>

---
count: false
background-image: url("images/bg1.png")
background-size: cover
class: hide-slide-number

:::::::::: { .grid grid: 1fr / 7fr 3fr;}

::: {.item .shade_black border-right-style: solid; border-right-color: white;}

<br><br>

# Prototyping the Grammar of Experimental Design

## <span style="color:"></span>

<br>
<br>
<br>
<br>

Presented by Emi Tanaka

School of Mathematics and Statistics<br>
<i class="fas fa-envelope faa-float animated "></i>
dr.emi.tanaka@gmail.com
<i class="fab fa-twitter faa-float animated faa-fast "></i>
@statsgen

27th Nov 2019 @ NUMBAT | Melbourne, Australia

::: item

::::::::::

<img src="assets/USydLogo-white.svg" style="position:absolute; bottom:20%; left:50%;width:200px">

---
class: transition

# Hi!

# Welcome to my experimental talk on

# .blue[Experimental Design]

where I pretend to know what I'm talking about

---
class: center

# Tidyverse packages and friends <br>facilitate data science project workflow

<br>
<img src="images/tidyverse-workflow.png" width = "1150px"/>

.footnote[
Wickham and Grolemund (2017) R for Data Science. O'Reilly Media
]

---
class: center

# For experimental studies, statistical work starts before data import

<br>
<img src="images/experiment-workflow.png" width = "1150px"/>

.font_large[This work is about facilitating workflow in diagram (A)]

---
class: transition center middle

Good

# Experimental Design in theory

# gives you more information

# without any extra cost!

---
class: transition center middle

# Good Experimental Design

# in theory gives you more information without any extra cost!

# .pink[What does that mean??]

---
class: transition middle center

How does

# .blue[*Grammar*]

# of

# Experimental Design help?
.font_small[(at least what's in the plan at the moment)]

---
class: font_small

# (Layered) Grammar of Graphics <br> `ggplot2`

<br>
<center>
<img src="images/ggplot-eg1.png" width = "730px"/>
</center>

---
class: center font_small

# Base Plots <br> `graphics`

Single purpose functions to generate "named plots"

✗ Cannot modify the resulting graphic

✗ Not really extensible or generalisable

<br>

::: grid

::: { .item border-right: dashed 3px black; }

```r
barplot(as.matrix(df$perc), legend = df$what)
```

<img src="figure/unnamed-chunk-2-1.svg" width="216" style="display: block; margin: auto;" />
<p></p>

:::

::: item

```r
pie(df$perc, labels = df$what)
```

<img src="figure/unnamed-chunk-3-1.svg" width="360" style="display: block; margin: auto;" />
<p></p>

:::

:::

---
class: transition center middle

Just as there are .font_large["named plots"], there are

# "named experimental designs"

---

# Typical course in experimental design <br>.font_small[(at least in USYD in 2017-2019)]

::: paddings

Teach:

* Completely Randomised Design
* Randomised Complete Block Design
* Latin Square Design
* Balanced Incomplete Block Design
* Factorial Design
* <strike> 2</strike><sup>k</sup><strike> Factorial Design</strike> .font_small[(I removed this from 2018, I won't talk about this today)]
* Split-plot Design .font_small[(I added this from 2018 among other concepts)]

:::

---

# Completely Randomised Design (CRD)

::: grid

::: item

<br>
<center>
<img src="images/crd-eg1.png" width = "300px"/>
</center>

:::

::: item

* `\(t\)` treatments randomised to `\(n\)` units

<br>

.model-box[
`$$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{treatment} + \texttt{error}$$`
.font_small[(with constraints and distributional assumptions)]
<br>
<center>
<img src="images/crd-eg1-anova.png" width = "700px"/>
</center>
]

:::

:::

---

# Randomised Complete Block Design (RCBD)

::: grid

::: item

<br>
<center>
<img src="images/rcbd-eg1.png" width = "300px"/>
</center>

:::

::: item

* `\(b\)` blocks of size
`\(t\)`
* `\(t\)` treatments randomised to `\(t\)` units within each block

.model-box[
`$$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{treatment} + \texttt{block} + \texttt{error}$$`
<center>
<img src="images/rcbd-eg1-anova.png" width = "850px"/>
</center>
]

:::

:::

---

# Latin Square Design (LSD)

::: grid

::: item

<br>
<center>
<img src="images/lsd-eg1.png" width = "300px"/>
</center>

:::

::: item

* two orthogonal blocks of size `\(t\)`
* `\(t\)` treatments randomised to units such that every treatment appears exactly once in each block

.model-box[
`$$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{treatment} + \texttt{row} + \texttt{column} + \texttt{error}$$`
<center>
<img src="images/lsd-eg1-anova.png" width = "850px"/>
</center>
]

:::

:::

---

# Balanced Incomplete Block Design (BIBD)

::: grid

::: item

<br>
<center>
<img src="images/bibd-eg1.png" width = "300px"/>
</center>

:::

::: item

* `\(b\)` blocks of size `\(k < t\)`
* `\(t\)` treatments randomised to units within each block such that every pair of treatments appears the same number of times across blocks

.model-box[
`$$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{block} + \texttt{treatment} + \texttt{error}$$`
<center>
<img src="images/bibd-eg1-anova.png" width = "850px"/>
</center>
]

:::

:::

---

# Factorial Design

::: grid

::: item

<br>
<center>
<img src="images/factorial-eg1.png" width = "300px"/>
</center>

:::

::: item

* `\(ab = t\)` treatments randomised to `\(n\)` units
* treatment is every combination of two factors A and B

.model-box[
`$$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{A} + \texttt{B} + \texttt{A:B} + \texttt{error}$$`
<center>
<img src="images/factorial-eg1-anova-top.png" width = "850px"/>
<details style="font-size:4pt"><summary></summary>
<img src="images/factorial-eg1-anova-middle.png" width = "850px"/>
</details>
<img src="images/factorial-eg1-anova-bottom.png" width = "850px"/>
</center>
]

:::

:::

---

# Split-plot Design

::: grid

::: item

<br>
<center>
<img src="images/split-plot-eg1.png" width = "300px"/>
</center>

:::

::: {.item font-size: 0.85em; }

* `\(n_1\)` whole plots consisting of `\(b\)` sub plots
* in total there are `\(n\)` sub plots
* treatment factor A is randomised to whole plots
* treatment factor B is randomised to sub plots within each whole plot

.model-box[
`$$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{A} + \texttt{WP} + \texttt{B} + \texttt{A:B} + \texttt{error}$$`
<center>
<img src="images/split-plot-eg1-anova.png" width = "850px"/>
</center>
]

:::

:::

---
class: transition center middle

# CRAN Task View of Design of Experiments contains

# 📦 .blue[239 R-packages ]

.font_small[as of 2019-11-23]

---
class: center

# Top 10 downloaded R-packages in 2018

<img src="figure/top10-1.svg" width="504" style="display: block; margin: auto;" />

# `agricolae` is the most downloaded

.font_small[(data from `cranlogs` from 2018-01-01 to 2018-12-31)]

---
class: font_small

# `agricolae::design.crd`

.blue[**Completely randomised design**] for `\(t = 3\)` treatments with `\(2\)` replicates each

<pre><code>
trt <- c("A", "B", "C")
agricolae::.bg-yellow[design.crd](trt = trt, r = 2) %>% glimpse()
</code></pre>

.scroll-350[
```
List of 2
 $ parameters:List of 7
  ..$ design: chr "crd"
  ..$ trt : chr [1:3] "A" "B" "C"
  ..$ r : num [1:3] 2 2 2
  ..$ serie : num 2
  ..$ seed : int 2114380113
  ..$ kinds : chr "Super-Duper"
  ..$ : logi TRUE
 $ book :'data.frame': 6 obs.
of 3 variables:
  ..$ plots: num [1:6] 101 102 103 104 105 106
  ..$ r : int [1:6] 1 1 2 2 1 2
  ..$ trt : Factor w/ 3 levels "A","B","C": 3 1 3 1 2 2
```
]

::: {.plot-box .pos top: 35%; right: 50px;}

<img src="figure/unnamed-chunk-5-1.svg" width="216" style="display: block; margin: auto;" />
<p></p>

:::

---
class: font_small

# `agricolae::design.rcbd`

.blue[**Randomised complete block design**] for `\(t =3\)` treatments with `\(2\)` blocks

<pre><code>
trt <- c("A", "B", "C")
agricolae::.bg-yellow[design.rcbd](trt = trt, r = 2) %>% glimpse()
</code></pre>

.scroll-350[
```
List of 3
 $ parameters:List of 7
  ..$ design: chr "rcbd"
  ..$ trt : chr [1:3] "A" "B" "C"
  ..$ r : num 2
  ..$ serie : num 2
  ..$ seed : int -997119203
  ..$ kinds : chr "Super-Duper"
  ..$ : logi TRUE
 $ sketch : chr [1:2, 1:3] "C" "C" "A" "A" ...
 $ book :'data.frame': 6 obs. of 3 variables:
  ..$ plots: num [1:6] 101 102 103 201 202 203
  ..$ block: Factor w/ 2 levels "1","2": 1 1 1 2 2 2
  ..$ trt : Factor w/ 3 levels "A","B","C": 3 1 2 3 1 2
```
]

::: { .info-box .pos top: 35%; right: 50px; width: 500px;}

* The `r` format is different (probably because this is a *balanced* design)
* There is a `sketch` in the list

:::

::: {.plot-box .pos bottom: 10px; right: 50px;}

<br>
<img src="figure/unnamed-chunk-7-1.svg" width="216" style="display: block; margin: auto;" />
<p></p>

:::

---
class: font_small

# `agricolae::design.lsd()`

.blue[**Latin square design**] for `\(t = 3\)` treatments

<pre><code>
trt <- c("A", "B", "C")
agricolae::.bg-yellow[design.lsd](trt = trt) %>% glimpse()
</code></pre>

.scroll-350[
```
List of 3
 $ parameters:List of 7
  ..$ design: chr "lsd"
  ..$ trt : chr [1:3] "A" "B" "C"
  ..$ r : int 3
  ..$ serie : num 2
  ..$ seed : int -205979691
  ..$ kinds : chr "Super-Duper"
  ..$ : logi TRUE
 $ sketch : chr [1:3, 1:3] "C" "A" "B" "B" ...
 $ book :'data.frame': 9 obs.
of 4 variables:
  ..$ plots: num [1:9] 101 102 103 201 202 203 301 302 303
  ..$ row : Factor w/ 3 levels "1","2","3": 1 1 1 2 2 2 3 3 3
  ..$ col : Factor w/ 3 levels "1","2","3": 1 2 3 1 2 3 1 2 3
  ..$ trt : Factor w/ 3 levels "A","B","C": 3 2 1 1 3 2 2 1 3
```
]

::: {.plot-box .pos top: 100px; right: 50px;}

<br>
<img src="figure/unnamed-chunk-9-1.svg" width="216" style="display: block; margin: auto;" />
<p></p>

:::

---
class: font_small

# `agricolae::design.bib()`

.blue[**Balanced incomplete block design**] for `\(t = 3\)` treatments with block size of `\(2\)`

<pre><code>
trt <- c("A", "B", "C")
agricolae::.bg-yellow[design.bib](trt = trt, k = 2) %>% glimpse()
</code></pre>

.scroll-350[
```
[1] "No improvement over initial random design."

Parameters BIB
==============
Lambda : 1
treatmeans : 3
Block size : 2
Blocks : 3
Replication: 2

Efficiency factor 0.75

<<< Book >>>
```

```
List of 4
 $ parameters:List of 6
  ..$ design: chr "bib"
  ..$ trt : chr [1:3] "A" "B" "C"
  ..$ k : num 2
  ..$ serie : num 2
  ..$ seed : int 1122489977
  ..$ kinds : chr "Super-Duper"
 $ statistics:'data.frame': 1 obs. of 6 variables:
  ..$ lambda : num 1
  ..$ treatmeans: int 3
  ..$ blockSize : num 2
  ..$ blocks : int 3
  ..$ r : num 2
  ..$ Efficiency: num 0.75
 $ sketch : chr [1:3, 1:2] "C" "A" "B" "A" ...
 $ book :'data.frame': 6 obs.
of 3 variables:
  ..$ plots: num [1:6] 101 102 201 202 301 302
  ..$ block: Factor w/ 3 levels "1","2","3": 1 1 2 2 3 3
  ..$ trt : Factor w/ 3 levels "A","B","C": 3 1 1 2 2 3
```
]

::: {.plot-box .pos top: 200px; right: 50px;}

<br>
<img src="figure/unnamed-chunk-11-1.svg" width="216" style="display: block; margin: auto;" />
<p></p>

:::

---
class: font_small

# `agricolae::design.ab()`

.blue[**Factorial design**] for `\(t = 3 \times 2\)` treatments with `\(2\)` replicates of each treatment

<pre><code>
agricolae::.bg-yellow[design.ab](trt = c(3, 2), r = 2, design = "crd") %>% glimpse()
</code></pre>

.scroll-350[
```
List of 2
 $ parameters:List of 8
  ..$ design : chr "factorial"
  ..$ trt : chr [1:6] "1 1" "1 2" "2 1" "2 2" ...
  ..$ r : num [1:6] 2 2 2 2 2 2
  ..$ serie : num 2
  ..$ seed : int 1588382117
  ..$ kinds : chr "Super-Duper"
  ..$ : logi TRUE
  ..$ applied: chr "crd"
 $ book :'data.frame': 12 obs. of 4 variables:
  ..$ plots: num [1:12] 101 102 103 104 105 106 107 108 109 110 ...
  ..$ r : int [1:12] 1 1 1 1 2 2 2 1 1 2 ...
  ..$ A : Factor w/ 3 levels "1","2","3": 3 2 2 3 2 3 3 1 1 1 ...
  ..$ B : Factor w/ 2 levels "1","2": 2 2 1 1 1 1 2 1 2 1 ...
```
]

::: {.plot-box .pos top: 200px; right: 50px;}

<br>
<img src="figure/unnamed-chunk-13-1.svg" width="216" style="display: block; margin: auto;" />
<p></p>
<br>

:::

::: {.info-box .pos top: 60%; right: 350px; }

Note: NOT A/B testing!

:::

---
class: font_small

# `agricolae::design.split()`

.blue[**Split-plot design**] for `\(t = 2 \times 4\)` treatments with `\(2\)` replicates of each treatment

<pre><code>
trt1 <- c("I", "R"); trt2 <- LETTERS[1:4]
agricolae::.bg-yellow[design.split](trt1 = trt1, trt2 = trt2, r = 2, design = "crd") %>% glimpse()
</code></pre>

.scroll-350[
```
List of 2
 $ parameters:List of 8
  ..$ design : chr "split"
  ..$ : logi TRUE
  ..$ trt1 : chr [1:2] "I" "R"
  ..$ applied: chr "crd"
  ..$ r : num [1:2] 2 2
  ..$ serie : num 2
  ..$ seed : int 1151770077
  ..$ kinds : chr "Super-Duper"
 $ book :'data.frame': 16 obs.
of 5 variables:
  ..$ plots : num [1:16] 101 101 101 101 102 102 102 102 103 103 ...
  ..$ splots: Factor w/ 4 levels "1","2","3","4": 1 2 3 4 1 2 3 4 1 2 ...
  ..$ r : int [1:16] 1 1 1 1 1 1 1 1 2 2 ...
  ..$ trt1 : Factor w/ 2 levels "I","R": 2 2 2 2 1 1 1 1 2 2 ...
  ..$ trt2 : Factor w/ 4 levels "B","C","D","A": 3 2 1 4 3 2 4 1 2 3 ...
```
]

::: {.plot-box .pos top: 300px; right: 50px;}

<img src="figure/split-plot-graph-1.svg" width="216" style="display: block; margin: auto;" />
<p></p>

:::

---
class: transition middle

A **modular approach** in teaching and software **deters**

# .blue[*statistical thinking*]

of experimental designs

<br>

where connections between different experimental designs are lost

---
class: transition middle

::: paddings

.font_large[.blue[**Experimental design should be adapted to resources**] *rather than* resources adapted to the experimental design<sup>*</sup>]

<br><br>

E.g. a scientist decides to remove some experimental units to fit a randomised complete block design

:::

::: { .bottom_abs .width100 font-size: 0.6em; }

<sup>*</sup>Exceptions apply for adaptive experiments or sampling experiments (where sample size is the main focus)

:::

---
class: transition

.font_large[Step 1]

<br>

Identify

# elements of experimental design

<br><br>

.footnote[
with inspiration from <br>
.font_medium[Bailey (2008) Design of Comparative Experiments.
Cambridge University Press]
]

---

# Wise words of Rosemary Bailey

::: { padding-left: 5%; padding-right: 25%; font-size: 0.9em; }

* A .blue[**treatment**] is the entire description of what can be applied to an experimental unit
* An .blue[**experimental unit**] is the smallest unit to which a treatment can be applied
* An .blue[**observational unit**] is the smallest unit on which a response will be measured
* .blue[**Treatment structure**] means meaningful ways of dividing up the whole set of treatments, denoted by `\(\mathcal{T}\)`
* .blue[**Unit structure**] means meaningful ways of dividing up the set `\(\Omega\)` of units, ignoring the treatments
* .blue[**Replication**] is the number of times that each treatment is tested

:::

::: { .info-box .pos top: 40%; right: 5px; width: 230px; }

Observational unit is not the same as observation!

:::

---

# Definition of Design of Experiments

::: paddings

* The .blue[**design**] is the allocation of treatments to units
* Or mathematically, the design is the function or .blue[**mapping**] `\(T\)` of unit to treatment,
`$$T: \Omega_{\scriptsize\text{(units)}} \rightarrow \mathcal{T}_{\scriptsize\text{(treatments)}}$$`
so `\(\forall \omega \in \Omega\)`, `\(T(\omega) = \alpha\)` where `\(\alpha \in \mathcal{T}\)`
* `\(T\)` is surjective .font_small[(not that important for later but I realised this and thought it was cool)]

<center>
<img src="images/mapping-surjective.png" width = "250px"/>
</center>

---

# Example: Calf Feeding

::: { .grid grid: 1fr / 380px 1fr; }

::: { .item .bg-grey font-size: 0.7em; }

<img src="images/eg2-calf-feeding.png" width = "350px"/>

* Three feed treatments are compared on 24 calves
* The calves are kept in 6 pens with 4 calves per pen
* Each feed is applied to two whole pens
* Every calf is weighed individually

:::

::: item

<br>

::: info-box

* **Treatments** are the 3 feeds
* **Experimental units** are the 6 pens
* **Observational units** are the 24 calves
* **Treatment structure** is unstructured
* **Unit structure** is such that 4 calves are grouped into each pen

:::

.pink[Is the replication of a treatment 2 or 8?]

<br>
<img src="images/eg2-calf-feeding-anova.png" width = "700px"/>

:::

:::

---

# Replication vs Repetition

::: { .grid grid: 1fr / 380px 1fr; }

::: { .item .bg-grey font-size: 0.7em; }

<img src="images/eg2-calf-feeding.png" width = "350px"/>

* Three feed treatments are compared on 24 calves
* The calves are kept in 6 pens with 4 calves per pen
* Each feed is applied to two whole pens
* Every calf is weighed individually

:::

::: item

In this case we say that:

* the **replication** of each treatment is 2, and
* the .blue[**repetition**] of each treatment is 8

<br>
<img src="images/eg2-calf-feeding-anova2.png" width = "700px"/>

Note well the denominator of the `\(f\)` statistic and the degrees of freedom for the `\(F\)` distribution!

:::

:::

---

# Example: Grafting on horses

::: { .grid grid: 1fr / 380px 1fr; }

::: { .item .bg-grey font-size: 0.4em; }

<img src="images/eg3-grafting.png" width = "350px"/>

* A surgeon is going to use 9 horses in an experiment
* He wants to compare 3 methods of grafting skin
* He intended to use 3 animals for each method
* After the graft was complete he would take a sample of new skin from each horse
* He would then cut each sample into 20 (tiny) pieces and use a precision instrument to measure the thickness of each piece

:::

::: item

<br>

::: info-box

* **Treatments** are the 3 grafting methods
* **Experimental units** are the 9 horses
* **Observational units** are the 20 `\(\times\)` 9 skin pieces
* **Treatment structure** is unstructured
* **Unit structure** is such that 20 skin pieces are grouped by the horse from which the skin was taken
* The **replication** of each treatment is 3

:::

<br>
<center>
<img src="images/eg3-grafting-anova.png" width = "350px"/>
</center>

:::

:::

---
class: font_smaller

# Example: Grafting on horses - simulation

::: paddings5

::: { .code-box .font_smaller width: 1100px; }

```r
set.seed(1)
nani <- 9;
ntrt <- 3; ncut <- 20; n <- nani * ncut
graft <- factor(rep(LETTERS[1:ntrt], each = nani / ntrt * ncut))
animal <- factor(rep(LETTERS[1:nani], each = ncut))
anidev <- rnorm(nani, 0, 20)
*trt <- c(300, 300, 300)
y <- trt[as.numeric(graft)] + anidev[as.numeric(animal)] + rnorm(n, 0, 5)
```

:::

```r
anova(lm(y ~ graft + animal))
```

```
Analysis of Variance Table

Response: y
           Df Sum Sq Mean Sq F value    Pr(>F)
graft       2   6676  3338.1  106.31 < 2.2e-16 ***
animal      6  54575  9095.8  289.68 < 2.2e-16 ***
Residuals 171   5369    31.4
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

:::

::: {.info-box .pos right:10px; bottom:220px; width: 650px; }

* Grafting methods are compared with (OU) residuals
* Grafting methods appear statistically significant even though, in the simulation, they all have the same effect

:::

---
class: font_small

# Example: Grafting on horses <br> appropriate analysis

<br>

::: paddings

Comparison should happen in the valid stratum

<br>

```r
summary(aov(y ~ graft + Error(animal)))
```

```
Error: animal
          Df Sum Sq Mean Sq F value Pr(>F)
graft      2   6676    3338   0.367  0.707
Residuals  6  54575    9096

Error: Within
           Df Sum Sq Mean Sq F value Pr(>F)
Residuals 171   5369    31.4
```

<p></p>

:::

::: {.info-box .pos right:10px; bottom:190px; width: 300px; }

This time the result is statistically non-significant

:::

---

# Pseudo-replication

::: paddings

* .blue[**Pseudo-replication**] or **false replication** describes a situation where multiple measurements, taken from the same experimental unit, are treated as replication <br>
* **Technical replication** occurs when multiple measurements are taken from the same unit
* Technical replication is always a pseudo-replication
* **Biological replication** occurs when measurements are taken from independent biological subjects

:::

---

# Example: Chick weight

::: grid

::: { .item font-size: 0.38em; }

* An experiment was conducted on a prairie in Western Canada to find out if insecticides used to control grasshoppers affected the weight
of young chicks of ring-necked pheasants, either by affecting the grass around the chicks or by affecting the grasshoppers eaten by the chicks.
* Three insecticides were used, at low and high doses.
* The low dose was the highest dose recommended by the department of agriculture; the high dose was four times as much as the recommended dose, to assess the effects of mistakes.
* The experimental procedure took place in each of three consecutive weeks.
* On the first day of each week a number of newly-hatched female pheasant chicks were placed in a brooder pen.
* On the third day, the chicks were randomly divided into twelve groups of six chicks each.
* Each chick was given an identification tape and weighed.
* On the fourth day, a portion of the field was divided into three strips, each of which was divided into two swathes.
* The two swathes within each strip were sprayed with the two doses of the same insecticide.
* Two pens were erected on each swathe, and one group of pheasant chicks was put into each pen.
* For the next 48 hours, the chicks were fed with grasshoppers which had been collected locally.
* Half the grasshoppers were anaesthetized and sprayed with insecticide; the other half were also anaesthetized and handled in every way like the first half except that they were not sprayed.
* All grasshoppers were frozen.
* The experimenters maintained a supply of frozen grasshoppers to each pen, putting them on small platforms so that they would not absorb further insecticide from the grass.
* In each swathe, one pen had unsprayed grasshoppers while the other had grasshoppers sprayed by the insecticide which had been applied to that swathe.
* At the end of the 48 hours, the chicks were weighed again individually.

.font_large[😵 😵 😵]

:::

::: item

::: {.scroll-350 height: 600px;}

<center>
<img src="images/eg4-chicks.png" width = "500px" />
</center>

:::

:::

:::

::: { .footnote font-size: 8pt; }

Experiment based on Martin et al.
(1996) Effects of grasshopper-control insecticides on survival and brain acetylcholinesterase of pheasant (Phasianus colchicus) chicks. Environmental Toxicology and Chemistry **15**(4) 518-524.

:::

---

# Example: Chick weight - skeleton ANOVA

::: grid

::: item

<br>

::: { .info-box font-size: 0.7em; }

* **Treatments** are the factorial combination of insecticide (3 types), dose (low or high), and grasshopper food (sprayed or not)
* .pink[**Experimental units** are the pens?]
* **Observational units** are the chicks
* **Treatment structure** is such that:
  * different insecticides are applied to strips,
  * different dose levels are applied to swathes,
  * sprayed or unsprayed grasshoppers are left in pens
* **Unit structure** is given as `Week/Strips/Swathes/Pens/Chick` in Wilkinson & Rogers (1973) notation

:::

:::

::: item

<br>
<center>
<img src="images/eg4-chicks-anova.png" width = "500px" />
</center>

:::

:::

::: { .footnote font-size: 8pt; }

Wilkinson and Rogers (1973) Symbolic Description of Factorial Models for Analysis of Variance. Journal of the Royal Statistical Society: Series C (Applied Statistics). **22**(3) 392-399

Experiment based on Martin et al. (1996) Effects of grasshopper-control insecticides on survival and brain acetylcholinesterase of pheasant (Phasianus colchicus) chicks. Environmental Toxicology and Chemistry **15**(4) 518-524.

:::

---
class: transition middle

How to describe a situation where there are .font_large[**multiple response measurements**] (i.e. multi-variate or multi-trait data)?

<br>

And what about when measurements are taken over .font_large[**time**] (e.g. longitudinal data)?

<br>

<!-- What about **multi-phase experiments** (e.g. field experiment to lab experiment)? -->

---
class: transition middle

# Refining Rosemary Bailey's definitions

::: paddings

"Each of you is perfect the way you are ... and you can use a little improvement."
Shunryu Suzuki

:::

---

# Treatment factor

::: paddings

* We need to distinguish between **treatment** and .blue[**treatment factor**]
.font_small[
  * E.g. in the chick weight example, `Dose`, `Insecticide` and `Food` are treatment factors
  * Should `Food:Dose` etc be a treatment factor? Probably not if it can be distinguished as its own factor. Maybe something like "smallest division of factor such that it cannot be partitioned to other factors"
]
* The combination of treatment factors makes a treatment
.font_small[
  * E.g. in the chick weight example, `Dose:Insecticide:Food` specifies the treatment
]
* Not all combinations are required in the experiment
.font_small[
  * fractional factorial experiment
]
* If all combinations are present then it is referred to as a .blue[**factorial**] combination

:::

:::

---

# Multiple responses from observational unit<br>.blue[traits]

::: paddings

* Multiple **traits** (usually of different .blue[**measurement units**]) may be measured from the same observational unit

<center>
<img src="images/multitrait.png" width = "400px" />
</center>

:::

---

# Multiple responses from observational unit<br>non-standard traits

::: paddings

* The response may be non-standard, e.g.:
  * images,
  * files,
  * big data (like genome), and so on
* These non-standard responses (at least I think) may require more specific questions or refinement of the experimental protocol to convert non-standard traits to standard traits

:::

---

# Multiple responses from observational unit<br>.blue[duplication]

::: paddings

* Observations by the same measurement process/device from the same observational unit are referred to as .blue[**duplication**]
* The main use of **duplication** is quality control of the measurement process; it is peripheral to the main aim of the experiment
* **Duplication** is a technical replication

<center>
<img src="images/duplication.png" width = "450px" />
</center>

:::

---

# Multiple responses from observational unit<br>.blue[measurement device]

::: { .paddings font-size: 0.8em; }

* Not all **technical replications** are **duplications**
* Observations from the same observational unit by different measurement processes/devices, which are supposed to measure the same measurement unit, are also a **technical replication** but not a **duplication**
* If different **measurement devices** are used then that should be recorded

<center>
<img src="images/technical-replication.png" width = "400px" />
</center>

:::

---

# Multiple responses from observational unit<br>.blue[index]

::: { .paddings font-size: 0.75em; }

* Multiple observations of an observational unit at particular time points or periods
* The variable that has this inherent order from past to present is referred to as **index** .font_small[(definition in line with Wang, Cook & Hyndman, 2019)]
* There should only be one **index** .font_small[(i.e. should not be separated into day, month, year, hour, minutes, second etc)]
* The **index** should not be part of treatment - if time is indeed a treatment, it should be coded as treatment and be able to be randomised

<center>
<img src="images/longitudinal.png" width = "400px" />
</center>

:::

::: { .footnote font-size: 0.6em; }

Wang, Cook & Hyndman (2019) A new tidy data structure to support exploration and modeling of temporal data. https://pdf.earo.me/tsibble.pdf

:::

---

# Observational unit described by <br>multiple variables

::: { .paddings }

* .blue[**Key**] is a set of variables that define observational units <strike>over time</strike><br>
.font_small[(definition similar to Wang, Cook & Hyndman, 2019)<br><br>
E.g. in the chick weight example, the observational unit is indexed by week, strip, swathe, pen and chick.
]
* The set of variables that form the **key** must be factors

<br>
<center>
<img src="images/tsibble-semantics.png" width = "800px" /><br>
.font_small[image from Wang, Cook & Hyndman (2019)]
</center>

:::

::: { .footnote font-size: 0.6em; }

Wang, Cook & Hyndman (2019) A new tidy data structure to support exploration and modeling of temporal data. https://pdf.earo.me/tsibble.pdf

:::

---

# Unit structure: .blue[Blocks]

::: paddings

* **Blocks** are groupings of similar units<br>.font_small[Note if grouped units are heterogeneous, it may lower the power of the experiment (and also lead to negative estimates of variances!)]
* According to Bailey (2008), there are three types of blocks:
  * Natural discrete divisions .font_small[E.g. in an experiment on people or animals, the two sexes make obvious blocks.]
  * Continuous gradients .font_small[Continuous underlying trends, e.g. time or space. The block boundaries may be arbitrarily chosen.]
  * Trial management .font_small[E.g. lab technician.]
* **Blocks** are **extraneous variables** .font_small[(next slide)]
* **Blocks** are factors and may sometimes be referred to as .blue[**blocking factors**]

:::

---

# .blue[Extraneous variables]

::: paddings

* Extraneous: "irrelevant or unrelated to the subject being dealt with" .font_small[source: Google dictionary]
* **Extraneous variables** are variables that may affect the response
* These may be continuous, discrete, ordinal or factor
* **Extraneous variables** are not necessarily **blocking factors**
* If numerical, the variable may be binned to create a new variable to be used as a **blocking factor**

:::

---
class: transition

.font_large[Step 2]

<br>

Building the

# grammar of experimental design

---

# Elements of experimental design <br> .font_small[the prototype (very much experimental)]

::: { .paddings font-size: 0.9em; }

The key elements: .font_small[(elements must be nouns)]

* .blue[`units`] describes physical entities relevant to the experiment

::: { .paddings5 font-size: 0.6em; }

<details><summary>My thoughts</summary>
<ul>
<li> I decided not to distinguish experimental and observational unit</li>
<li> experimental unit is specified in the mapping of the treatment to units</li>
<li> observational unit is specified in the measurement process</li>
<li> you can create as many units as needed</li>
<li> a set should contain similar features </li>
<li> <code>units</code> must be factors </li>
<li> an experiment must contain at least one <code>units</code> </li>
</ul>
</details>

:::

* .blue[`trts`] describes the treatment

::: { .paddings5 font-size: 0.6em; }

<details><summary>My thoughts</summary>
<ul>
<li><code>trts</code> can contain multiple factors or variables</li>
<li>It does <i>not</i> have to be factors but most likely it is</li>
<li>An experiment does not need to contain <code>trts</code></li>
</ul>
</details>

:::

* .blue[`mapping`] describes the mapping of `units` to `trts`
* .blue[`index`] for temporal variable (default value of `0`)
* .blue[`key`] specifies the units that identify observational units
* .blue[`block`] and .blue[`cluster`] are synonyms for grouped units
* .blue[`traits`] for observations
* .blue[`vars`] for extraneous variables

:::

---

# Grammar of experimental design <br> .font_small[the prototype (or more the collection of some verbs that sound like they can fit the framework)]

::: paddings

* Elements are manipulated by the *verb* that precedes the element, separated by `_`.

<br>
.center[
`verb_element`
]
<br>

* Some verbs include .blue[`set`], .blue[`get`], .blue[`group`], .blue[`modify`], .blue[`split`], .blue[`combine`], .blue[`apply`], .blue[`assign`], .blue[`measure`], .blue[`allocate`], .blue[`permute`], .blue[`cluster`].
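
To get a feel for how large the `verb_element` vocabulary could become, here is a minimal base R sketch that pastes every listed verb onto every listed element. This only enumerates candidate names; the vectors `verbs`, `elements` and `candidates` are made up for this illustration, and most of the resulting names are not (and may never be) real functions.

```r
# Assumption: these vectors just copy the element and verb names listed on
# the slides; nothing here calls or defines any package function.
elements <- c("units", "trts", "mapping", "index", "key",
              "block", "cluster", "traits", "vars")
verbs <- c("set", "get", "group", "modify", "split", "combine",
           "apply", "assign", "measure", "allocate", "permute", "cluster")

# Paste each verb with each element using the `verb_element` convention.
candidates <- as.vector(outer(verbs, elements, paste, sep = "_"))

# How many candidate names are there, and what do the first few look like?
length(candidates)
head(candidates, 4)

# Check that some specific verb/element pairings fall out of the convention.
c("set_trts", "set_units", "group_units", "set_mapping") %in% candidates
```

Most of these pairings are probably meaningless, so in practice the useful part of the grammar would be a much smaller subset.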
:::

---
class: font_smaller

# `edibble`<br>.font_small[(so not edible yet - please wait until it is ripe for consumption)]

::: { .paddings font-size: 0.8em; }

* The grammar will be implemented in the `edibble` package - to be thought of as .blue[`tibble` for experimental design]
* For example, a split-plot design will be generated as

::: paddings5

```r
design <- edibble() %>%
  set_trts(trt1 = c("I", "R"),
           trt2 = LETTERS[1:4]) %>% # factorial structure by default
  set_units(n = 16, name = "subplot") %>%
  group_units(size = 4, name = "wholeplot") %>%
  set_mapping(wholeplot ~ trt1,
              subplot ~ trt2) # randomise units to trt by default
```

:::

* And the output will be a table!
* Variables of the same type will be contiguous (e.g. all blocking units are contiguous)

:::

---
class: font_smaller

# Visualising the design

::: grid

::: { .item border-right: dashed 3px black; }

Then .font_small[(the plan is to have)] one function to do a quick plot:

```r
design %>% plot_design()
```

<img src="figure/split-plot-graph-1.svg" width="360" style="display: block; margin: auto;" />

:::

::: { .item }

And possibly a step-by-step animation:

```r
design %>% animate_design()
```

<center>
<img src="images/split-plot-anim.png" width = "600px"/>
</center>

:::

:::

---

# "Good" experimental design

::: paddings

* The aim of this work is to facilitate this workflow

<center>
<img src="images/design-workflow.png" width = "150px"/>
</center>

* Usually many people are involved in the project - domain expert, statistician, technician, and so on
* A quick plot is helpful to ensure that everyone is on the same page
* A good experimental design isn't just about a statistically efficient design .font_small[(it requires communication to ensure potential sources of variation are properly accounted for, and practicality in conducting the experiment)]

:::

---

# Design not in scope <br>.font_small[at least not in the near future]

<br>
<center>
<img src="images/design-workflow2.png" width = "300px"/>
<br>
and
those that focus on sample size calculations
</center>

---
class: transition middle

::: paddings

::: { text-align: left; font-size: 0.8em; }

In summary, this work-in-progress aims to

* facilitate **statistical thinking** about adapting experimental designs to different conditions, by using grammatical expressions to manipulate the core elements of an experimental design to generate a new design
* **lubricate the project flow** of generating experimental data (tabular output with similar variables kept contiguous, visualising the design)

:::

<br>
.font_large[**Feedback and thoughts welcome!**]
<br><br>
Scratch notes here: http://github.com/emitanaka/edibble

:::

---
background-image: url("images/bg1.jpg")
background-size: cover
class: hide-slide-number animated pulse fast

:::::::::: { .grid .white grid: 1fr / 3fr 1fr;}

::: {.item .shade_black border-right-style: solid; border-right-color: white;}

<br><br>
<h1>Thanks!</h1>

These slides are made using the `xaringan` R package and can be found at

.center[
emitanaka.org/slides/NUMBAT2019
]

<br><br>
Emi Tanaka
<i class="fas fa-envelope faa-float animated "></i>
dr.emi.tanaka@gmail.com<br>
<i class="fab fa-twitter faa-float animated faa-fast "></i>
@statsgen ::: <div class="transition monash-m-new delay-1s" style="clip-path:url(#swipe__clip-path);"> <div class="background-image" style="background-image:url('images/large.png');background-position: center;background-size:cover;margin-left:3px;"> <svg class="clip-svg absolute"> <defs> <clipPath id="swipe__clip-path" clipPathUnits="objectBoundingBox"> <polygon points="0.5745 0, 0.5 0.33, 0.42 0, 0 0, 0 1, 0.27 1, 0.27 0.59, 0.37 1, 0.634 1, 0.736 0.59, 0.736 1, 1 1, 1 0, 0.5745 0" /> </clipPath> </defs> </svg> </div> </div> :::: --- class: white background-color: #fee5e2 # Session Information :::: scroll-350 ``` ─ Session info ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── setting value version R version 3.6.0 (2019-04-26) os macOS Mojave 10.14.6 system x86_64, darwin15.6.0 ui X11 language (EN) collate en_AU.UTF-8 ctype en_AU.UTF-8 tz Australia/Melbourne date 2019-11-27 ─ Packages 
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── package * version date lib source agricolae 1.3-1 2019-04-04 [1] CRAN (R 3.6.0) AlgDesign 1.1-7.3 2014-10-15 [1] CRAN (R 3.6.0) anicon 0.1.0 2019-05-28 [1] Github (emitanaka/anicon@377aece) assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0) backports 1.1.4 2019-04-10 [1] CRAN (R 3.6.0) boot 1.3-22 2019-04-02 [1] CRAN (R 3.6.0) broom 0.5.2 2019-04-07 [1] CRAN (R 3.6.0) callr 3.3.1 2019-07-18 [1] CRAN (R 3.6.0) cellranger 1.1.0 2016-07-27 [1] CRAN (R 3.6.0) class 7.3-15 2019-01-01 [1] CRAN (R 3.6.0) classInt 0.4-1 2019-08-06 [1] CRAN (R 3.6.0) cli 1.1.0 2019-03-19 [1] CRAN (R 3.6.0) cluster 2.0.8 2019-04-05 [1] CRAN (R 3.6.0) coda 0.19-3 2019-07-05 [1] CRAN (R 3.6.0) colorspace 1.4-1 2019-03-18 [1] CRAN (R 3.6.0) combinat 0.0-8 2012-10-29 [1] CRAN (R 3.6.0) cranlogs 2.1.1 2019-04-29 [1] CRAN (R 3.6.0) crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0) curl 4.2 2019-09-24 [1] CRAN (R 3.6.0) DBI 1.0.0 2018-05-02 [1] CRAN (R 3.6.0) deldir 0.1-23 2019-07-31 [1] CRAN (R 3.6.0) desc 1.2.0 2018-05-01 [1] CRAN (R 
3.6.0) devtools 2.0.2 2019-04-08 [1] CRAN (R 3.6.0) digest 0.6.22 2019-10-21 [1] CRAN (R 3.6.0) dplyr * 0.8.3 2019-07-04 [1] CRAN (R 3.6.0) e1071 1.7-2 2019-06-05 [1] CRAN (R 3.6.0) ellipsis 0.2.0.9000 2019-08-03 [1] Github (r-lib/ellipsis@27e0846) emo 0.0.0.9000 2019-06-03 [1] Github (hadley/emo@02a5206) evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.0) expm 0.999-4 2019-03-21 [1] CRAN (R 3.6.0) forcats * 0.4.0 2019-02-17 [1] CRAN (R 3.6.0) fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.0) gdata 2.18.0 2017-06-06 [1] CRAN (R 3.6.0) generics 0.0.2 2018-11-29 [1] CRAN (R 3.6.0) ggplot2 * 3.2.1 2019-08-10 [1] CRAN (R 3.6.0) glue 1.3.1.9000 2019-10-24 [1] Github (tidyverse/glue@71eeddf) gmodels 2.18.1 2018-06-25 [1] CRAN (R 3.6.0) gtable 0.3.0 2019-03-25 [1] CRAN (R 3.6.0) gtools 3.8.1 2018-06-26 [1] CRAN (R 3.6.0) haven 2.1.0 2019-02-19 [1] CRAN (R 3.6.0) highr 0.8 2019-03-20 [1] CRAN (R 3.6.0) hms 0.5.1 2019-08-23 [1] CRAN (R 3.6.0) htmltools 0.4.0 2019-10-04 [1] CRAN (R 3.6.0) httpuv 1.5.2 2019-09-11 [1] CRAN (R 3.6.0) httr 1.4.1 2019-08-05 [1] CRAN (R 3.6.0) icon 0.1.0 2019-05-28 [1] Github (ropenscilabs/icon@a510f88) janeaustenr 0.1.5 2017-06-10 [1] CRAN (R 3.6.0) jsonlite 1.6 2018-12-07 [1] CRAN (R 3.6.0) KernSmooth 2.23-15 2015-06-29 [1] CRAN (R 3.6.0) klaR 0.6-14 2018-03-19 [1] CRAN (R 3.6.0) knitr 1.25 2019-09-18 [1] CRAN (R 3.6.0) labeling 0.3 2014-08-23 [1] CRAN (R 3.6.0) later 1.0.0 2019-10-04 [1] CRAN (R 3.6.0) lattice 0.20-38 2018-11-04 [1] CRAN (R 3.6.0) lazyeval 0.2.2 2019-03-15 [1] CRAN (R 3.6.0) LearnBayes 2.15.1 2018-03-18 [1] CRAN (R 3.6.0) lifecycle 0.1.0 2019-08-01 [1] CRAN (R 3.6.0) lubridate 1.7.4 2018-04-11 [1] CRAN (R 3.6.0) magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0) MASS 7.3-51.4 2019-03-31 [1] CRAN (R 3.6.0) Matrix 1.2-17 2019-03-22 [1] CRAN (R 3.6.0) memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.0) mime 0.7 2019-06-11 [1] CRAN (R 3.6.0) miniUI 0.1.1.1 2019-09-01 [1] Github (rstudio/miniUI@52f5854) modelr 0.1.4 2019-02-18 [1] CRAN (R 3.6.0) munsell 0.5.0 
2018-06-12 [1] CRAN (R 3.6.0) nlme 3.1-140 2019-05-12 [1] CRAN (R 3.6.0) pillar 1.4.2 2019-06-29 [1] CRAN (R 3.6.0) pkgbuild 1.0.3 2019-03-20 [1] CRAN (R 3.6.0) pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.0) pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.0) plyr 1.8.4 2016-06-08 [1] CRAN (R 3.6.0) prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.6.0) processx 3.4.1 2019-07-18 [1] CRAN (R 3.6.0) promises 1.1.0 2019-10-04 [1] CRAN (R 3.6.0) ps 1.3.0 2018-12-21 [1] CRAN (R 3.6.0) purrr * 0.3.2 2019-03-15 [1] CRAN (R 3.6.0) questionr 0.7.0 2018-11-26 [1] CRAN (R 3.6.0) quoter 0.1.0 2019-07-28 [1] local R6 2.4.0 2019-02-14 [1] CRAN (R 3.6.0) RColorBrewer 1.1-2 2014-12-07 [1] CRAN (R 3.6.0) Rcpp 1.0.2 2019-07-25 [1] CRAN (R 3.6.0) readr * 1.3.1 2018-12-21 [1] CRAN (R 3.6.0) readxl 1.3.1 2019-03-13 [1] CRAN (R 3.6.0) remotes 2.0.4 2019-04-10 [1] CRAN (R 3.6.0) reshape2 1.4.3 2017-12-11 [1] CRAN (R 3.6.0) rlang 0.4.0.9000 2019-08-03 [1] Github (r-lib/rlang@b0905db) rmarkdown 1.16 2019-10-01 [1] CRAN (R 3.6.0) rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.0) rstudioapi 0.10 2019-03-19 [1] CRAN (R 3.6.0) rvest 0.3.4 2019-05-15 [1] CRAN (R 3.6.0) scales 1.0.0 2018-08-09 [1] CRAN (R 3.6.0) sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0) sf 0.8-0 2019-09-17 [1] CRAN (R 3.6.0) shiny 1.3.2 2019-04-22 [1] CRAN (R 3.6.0) SnowballC 0.6.0 2019-01-15 [1] CRAN (R 3.6.0) sp 1.3-1 2018-06-05 [1] CRAN (R 3.6.0) spData 0.3.2 2019-09-19 [1] CRAN (R 3.6.0) spdep 1.1-3 2019-09-18 [1] CRAN (R 3.6.0) statquotes 0.2.2 2017-08-29 [1] CRAN (R 3.6.0) stringi 1.4.3 2019-03-12 [1] CRAN (R 3.6.0) stringr * 1.4.0 2019-02-10 [1] CRAN (R 3.6.0) testthat 2.2.1 2019-07-25 [1] CRAN (R 3.6.0) tibble * 2.1.3 2019-06-06 [1] CRAN (R 3.6.0) tidyr * 1.0.0 2019-09-11 [1] CRAN (R 3.6.0) tidyselect 0.2.5 2018-10-11 [1] CRAN (R 3.6.0) tidytext 0.2.2 2019-07-29 [1] CRAN (R 3.6.0) tidyverse * 1.2.1 2017-11-14 [1] CRAN (R 3.6.0) tokenizers 0.2.1 2018-03-29 [1] CRAN (R 3.6.0) units 0.6-3 2019-05-03 [1] CRAN (R 3.6.0) usethis 1.5.0 
2019-04-07 [1] CRAN (R 3.6.0) vctrs 0.2.0.9000 2019-08-03 [1] Github (r-lib/vctrs@11c34ae) withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.0) wordcloud 2.6 2018-08-24 [1] CRAN (R 3.6.0) xaringan 0.9 2019-03-06 [1] CRAN (R 3.6.0) xfun 0.10 2019-10-01 [1] CRAN (R 3.6.0) xml2 1.2.0 2018-01-24 [1] CRAN (R 3.6.0) xtable 1.8-4 2019-04-21 [1] CRAN (R 3.6.0) yaml 2.2.0 2018-07-25 [1] CRAN (R 3.6.0) zeallot 0.1.0 2018-01-28 [1] CRAN (R 3.6.0) [1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library ``` <p></p> ::: These slides are licensed under <br><center><a href="https://creativecommons.org/licenses/by-sa/3.0/au/"><img src="images/cc.svg" style="height:2em;"/><img src="images/by.svg" style="height:2em;"/><img src="images/sa.svg" style="height:2em;"/></a></center>