background-color: #006DAE class: middle center hide-slide-number <div class="shade_black" style="width:60%;right:0;bottom:0;padding:10px;border: dashed 4px white;margin: auto;"> <i class="fas fa-exclamation-circle"></i> These slides are viewed best by Chrome and occasionally need to be refreshed if elements did not load properly. See <a href=edibble-slides.pdf>here for PDF <i class="fas fa-file-pdf"></i></a>. </div> <br> .white[Press the **right arrow** to progress to the next slide!] --- background-image: url(images/bg1.jpg) background-size: cover class: hide-slide-number split-70 title-slide count: false .column.shade_black[.content[ <br> # .monash-blue.outline-text[Advent of "Grammar"] <h2 class="monash-blue2 outline-text" style="font-size: 30pt!important;"></h2> <br> <h2 style="font-weight:900!important;">Bridging Statistics and Data Science for the Design of Experiments</h2> .bottom_abs.width100[ **Emi Tanaka** Department of Econometrics and Business Statistics
<i class="fas fa-envelope faa-float animated "></i>
emi.tanaka@monash.edu
<i class="fab fa-twitter faa-float animated "></i>
@statsgen 27th Nov 2020 @ Statistical Society of Australia <br> ] ]] <div class="column transition monash-m-new delay-1s" style="clip-path:url(#swipe__clip-path);"> <div class="background-image" style="background-image:url('images/large.png');background-position: center;background-size:cover;margin-left:3px;"> <svg class="clip-svg absolute"> <defs> <clipPath id="swipe__clip-path" clipPathUnits="objectBoundingBox"> <polygon points="0.5745 0, 0.5 0.33, 0.42 0, 0 0, 0 1, 0.27 1, 0.27 0.59, 0.37 1, 0.634 1, 0.736 0.59, 0.736 1, 1 1, 1 0, 0.5745 0" /> </clipPath> </defs> </svg> </div> </div> --- class: monash-bg-blue middle white <div class="bg-white" style="width:900px;margin:auto;color:black;border:2px solid black;border-radius: 5px;padding:3px;background:"> <div class="center" style="color:white;border-radius: 5px;background:#006DAE;font-size:35px;font-weight:900;">NOTICE</div> <div style="padding:10px;"> <ul> <li> <b>This talk is most relevant for people who code</b> <i class="fas fa-code"></i></li> <li><b>Examples are all in R but philosophy holds for other programming languages</b></li> </ul> </div> </div> <div> <br> <div style="padding-left:300px;"> .circle[1] <b><i class="fas fa-chart-bar"></i> Grammar of Graphics</b><Br><Br> .circle[2] <b><i class="fas fa-wrench"></i> Grammar of Data Manipulation</b><Br><Br> .circle[3] <b><i class="fas fa-project-diagram"></i> Grammar of Experimental Design</b> </div> --- class: transition middle # .circle-big[1] # Grammar of Graphics <i class="fab fa-r-project blue"></i> `ggplot2` π¦ .footnote.monash-bg-blue[ Wickham (2016) ggplot2: Elegant Graphics for Data Analysis. *Springer-Verlag New York* Wilkinson (2005) The Grammar of Graphics. *Statistics and Computing. Springer, 2nd edition.* ] --- class: center middle <img src="images/female.png" width="200px"> ## A typical Australian academic is expected to do 40% teaching, 40% research and 20% administration .font_small[and like 200% external service work, outreach, writing grants, reviewing things, sitting on panels, learning how to do online teaching,...] --- # What are the differences between these plots? .grid[ .item[ ![](images/unnamed-chunk-1-1.png)<!-- --> ] .item[ ![](images/unnamed-chunk-2-1.png)<!-- --> ] ] -- <div style="position:absolute; bottom:10%;left:30%;"> How do you construct these plots? </div> --- name: base-plot class: font_smaller # Using Base R .grid[ .item.border-right[ ```r df ``` ``` ## duty perc ## 1 Teaching 40 ## 2 Research 40 ## 3 Admin 20 ``` ] .item.border-right[ ```r barplot(as.matrix(df$perc), legend = df$duty) ``` ![](images/barplot-1.png)<!-- --> ] .item[ ```r pie(df$perc, labels = df$duty) ``` ![](images/pie-1.png)<!-- --> ] ] .footnote[ R Core Team (2020) R: A Language and Environment for Statistical Computing https://www.R-project.org/ ] <div class="show-on-hover" style="padding:10px;position:absolute;bottom: 25px; right: 10px;"> <a href="#edibble" style="font-size:12pt"><i class="fas fa-caret-right"></i> edibble</a> </div> -- <div class="corner-box" style="bottom:50px;"> <ul> <li><b>Single purpose functions</b> to generate "named plots"</li> <li><b>Input</b> varies, here it is vector or matrix</li> </ul> </div> --- class: font_smaller # Using `ggplot2` .grid[ .item.border-right[ ```r df ``` ``` ## duty perc ## 1 Teaching 40 ## 2 Research 40 ## 3 Admin 20 ``` ] .item.border-right[ ```r ggplot(df, aes(x = "", # dummy y = perc, fill = duty)) + geom_col() ``` ![](images/ggbarplot-1.png)<!-- --> ] .item[ {{content}} ] ] .footnote[ Wilkinson (2005) The Grammar of Graphics. *Statistics and Computing. Springer, 2nd edition.* Wickham (2008) Practical Tools for Exploring Data and Models. *PhD Thesis Chapter 3: A layered grammar of graphics*. Wickham (2010) A Layered Grammar of Graphics, *Journal of Computational and Graphical Statistics, 19:1, 3-28* ] -- ```r ggplot(df, aes(x = "", # dummy y = perc, fill = duty)) + geom_col() + * coord_polar(theta = "y") ``` ![](images/ggpie-1.png)<!-- --> -- .corner-box[ * `ggplot2` implements the **grammar of graphics** * the difference between a **stacked barplot** and a **pie chart** is that the coordinate system have been transformed from **Cartesian coordinate** to **polar coordinate** ] --- # Data <i class="fas fa-exchange-alt"></i> Plot .grid[.item[ * `ggplot` uses **tidy data** as input so plot construction is enforced by consistent thinking in relation to tidy data <details style="font-size:15pt;"> <summary>Tidy data principles</summary> <ol> <li>Each variable must have its own column.</li> <li>Each observation must have its own row.</li> <li>Each value must have its own cell.</li> </ol> </details> * Variables are mapped to a plot aesthetic * Plots are constructed from its components expressed by the **grammar of graphics** ] .item[ <center style="padding:20px;"> <img src="images/ggplot-eg1.png" width = "700px"> </center> ] ] .footnote[ Wickham (2014) Tidy Data. *Journal of Statistical Software, Articles 59 (10): 1β23.* ] --- class: center middle <img src="images/potato.png" width = "200px"> ## Do you like potatoes? -- <br> ## .gray[Let's have a look at a potato field experiment to understand the] infection rate .gray[when it's treated with] sulfur at different season and amount.gray[?] --- class: font_smaller # <i class="fas fa-puzzle-piece"></i> How do we get these graphs? .grid[.item[ ```r data(cochran.crd, package = "agridat") cochran.crd ``` ``` ## inf trt row col ## 1 9 F3 4 1 ## 2 12 O 4 2 ## 3 18 S6 4 3 ## 4 10 F12 4 4 ## 5 24 S6 4 5 ## 6 17 S12 4 6 ## 7 30 S3 4 7 ## 8 16 F6 4 8 ## 9 10 O 3 1 ## 10 7 S3 3 2 ## 11 4 F12 3 3 ## 12 10 F6 3 4 ## 13 21 S3 3 5 ## 14 24 O 3 6 ## 15 29 O 3 7 ## 16 12 S6 3 8 ## 17 9 F3 2 1 ## 18 7 S12 2 2 ## 19 18 F6 2 3 ## 20 30 O 2 4 ## 21 18 F6 2 5 ## 22 16 S12 2 6 ## 23 16 F3 2 7 ## 24 4 F12 2 8 ## 25 9 S3 1 1 ## 26 18 O 1 2 ## 27 17 S12 1 3 ## 28 19 S6 1 4 ## 29 32 O 1 5 ## 30 5 F12 1 6 ## 31 26 O 1 7 ## 32 4 F3 1 8 ``` ] .item[ ![](images/field1-1.png)<!-- --> ![](images/field2-1.png)<!-- --> ] .item[ {{content}} ] ] .footnote[ Kevin Wright (2020). agridat: Agricultural Datasets. R package version 1.17. W.G. Cochran and G. Cox, 1957. Experimental Designs, 2nd ed. John Wiley, New York. ] -- ```r ggplot(cochran.crd, aes(x = col, y = row, * fill = inf)) + geom_tile(color = "black", size = 1.2) + * scale_fill_gradient(low = "gold", * high = "firebrick4") + theme(text = element_text(size = 20)) ``` ```r ggplot(cochran.crd, aes(x = col, y = row, * fill = trt)) + geom_tile(color = "black", size = 1.2) + * scale_fill_viridis_d() + theme(text = element_text(size = 20)) ``` --- # <i class="fas fa-chalkboard-teacher"></i> Lessons from the grammar of graphics<br> in `ggplot2` <br> <div class="info-box" style="margin-right:100px;margin-left:100px;text-align:left;"> <ul> <li> <b>Modifiable</b>: <code>ggplot</code> object can be modified</li> {{content}} </ul> </div> -- <li> <b>Generalisable</b>: Commands may feel verbose (compared to single-graphics functions) but you're less limited in the plots you can construct without an exhaustive list of functions (compared to the number of plots you can create)</li> {{content}} -- <li> <b>Extensible</b>: the system can be extended to make specialised plots or add more features if the same "grammar" is adopted</li> --- class: transition middle # .circle-big[2] # Grammar of <br>Data Manipulation <i class="fab fa-r-project blue"></i> `dplyr` π¦ .footnote.monash-bg-blue[ Wickham, FranΓ§ois, Henry & MΓΌller (2020) dplyr: A Grammar of Data Manipulation. R-package version 1.0.0. ] --- class: font_smaller # What does the code below do? .grid[ .item.border-right[ ```r # part 1 tapply(cochran.crd$inf, cochran.crd$trt, mean) ``` ``` # F12 F3 F6 O S12 S3 S6 # 5.750 9.500 15.500 22.625 14.250 16.750 18.250 ``` ```r df <- cochran.crd # part 2 df$season <- ifelse(grepl("F", df$trt), "Fall", "Spring") df$season[df$trt=="O"] <- "N/A" # part 3 df$sulfur <- sapply(df$trt, function(x) switch(as.character(x), F3 = 300, F6 = 600, F12 = 1200, S3 = 300, S6 = 600, S12 = 1200, O = 0)) # pounds/acre ``` ] .item[ ```r # part 4, move trt after col df[, c("inf", "row", "col", "trt", "season", "sulfur")] ``` ``` # inf row col trt season sulfur # 1 9 4 1 F3 Fall 300 # 2 12 4 2 O N/A 0 # 3 18 4 3 S6 Spring 600 # 4 10 4 4 F12 Fall 1200 # 5 24 4 5 S6 Spring 600 # 6 17 4 6 S12 Spring 1200 # 7 30 4 7 S3 Spring 300 # 8 16 4 8 F6 Fall 600 # 9 10 3 1 O N/A 0 # 10 7 3 2 S3 Spring 300 # 11 4 3 3 F12 Fall 1200 # 12 10 3 4 F6 Fall 600 # 13 21 3 5 S3 Spring 300 # 14 24 3 6 O N/A 0 # 15 29 3 7 O N/A 0 # 16 12 3 8 S6 Spring 600 # 17 9 2 1 F3 Fall 300 # 18 7 2 2 S12 Spring 1200 # 19 18 2 3 F6 Fall 600 # 20 30 2 4 O N/A 0 # 21 18 2 5 F6 Fall 600 # 22 16 2 6 S12 Spring 1200 # 23 16 2 7 F3 Fall 300 # 24 4 2 8 F12 Fall 1200 # 25 9 1 1 S3 Spring 300 # 26 18 1 2 O N/A 0 # 27 17 1 3 S12 Spring 1200 # 28 19 1 4 S6 Spring 600 # 29 32 1 5 O N/A 0 # 30 5 1 6 F12 Fall 1200 # 31 26 1 7 O N/A 0 # 32 4 1 8 F3 Fall 300 ``` ] ]
01
:
00
--- class: font_smaller # Which is easier to read for you? .font_small[Part 1] .grid[ .item.border-right[ ```r # part 1 tapply(cochran.crd$inf, cochran.crd$trt, mean) ``` ``` # F12 F3 F6 O S12 S3 S6 # 5.750 9.500 15.500 22.625 14.250 16.750 18.250 ``` {{content}} ] .item[ ```r # part 1 cochran.crd %>% group_by(trt) %>% summarise(inf_mean = mean(inf)) ``` ``` # # A tibble: 7 x 2 # trt inf_mean # <fct> <dbl> # 1 F12 5.75 # 2 F3 9.5 # 3 F6 15.5 # 4 O 22.6 # 5 S12 14.2 # 6 S3 16.8 # 7 S6 18.2 ``` ] ] -- * Can you guess what `tapply` does if you don't know about it? {{content}} -- * <i class="fas fa-info-circle animated flash"></i> Functions are less accessible to broader audiences if function names and argument names are designed so: * specialist knowledge is required, * approach is ad-hoc, and/or * unintuitively designed for either input and/or output --- class: font_smaller # Which is easier to read for you? .font_small[Part 2] .grid[ .item.border-right[ ```r df <- cochran.crd # part 2 df$season <- ifelse(grepl("F", df$trt), "Fall", "Spring") df$season[df$trt=="O"] <- "N/A" # part 3 df$sulfur <- sapply(df$trt, function(x) switch(as.character(x), F3 = 300, F6 = 600, F12 = 1200, S3 = 300, S6 = 600, S12 = 1200, O = 0)) # pounds/acre # part 4, move trt after col df[, c("inf", "row", "col", "trt", "season", "sulfur")] ``` ``` # inf row col trt season sulfur # 1 9 4 1 F3 Fall 300 # 2 12 4 2 O N/A 0 # 3 18 4 3 S6 Spring 600 # 4 10 4 4 F12 Fall 1200 # 5 24 4 5 S6 Spring 600 # 6 17 4 6 S12 Spring 1200 # 7 30 4 7 S3 Spring 300 # 8 16 4 8 F6 Fall 600 # 9 10 3 1 O N/A 0 # 10 7 3 2 S3 Spring 300 # 11 4 3 3 F12 Fall 1200 # 12 10 3 4 F6 Fall 600 # 13 21 3 5 S3 Spring 300 # 14 24 3 6 O N/A 0 # 15 29 3 7 O N/A 0 # 16 12 3 8 S6 Spring 600 # 17 9 2 1 F3 Fall 300 # 18 7 2 2 S12 Spring 1200 # 19 18 2 3 F6 Fall 600 # 20 30 2 4 O N/A 0 # 21 18 2 5 F6 Fall 600 # 22 16 2 6 S12 Spring 1200 # 23 16 2 7 F3 Fall 300 # 24 4 2 8 F12 Fall 1200 # 25 9 1 1 S3 Spring 300 # 26 18 1 2 O N/A 0 # 27 17 1 3 S12 Spring 1200 # 28 19 1 4 S6 Spring 600 # 29 32 1 5 O N/A 0 # 30 5 1 6 F12 Fall 1200 # 31 26 1 7 O N/A 0 # 32 4 1 8 F3 Fall 300 ``` ] .item[ ```r cochran.crd %>% # part 2 & 3 mutate(season = case_when( str_detect(trt, "F") ~ "Fall", str_detect(trt, "S") ~ "Spring", TRUE ~ "N/A"), sulfur = case_when( str_detect(trt, "12") ~ 1200, str_detect(trt, "6") ~ 600, str_detect(trt, "3") ~ 300, TRUE ~ 0)) %>% # part 4 relocate(trt, .after = col) %>% as_tibble() # optional ``` ``` # # A tibble: 32 x 6 # inf row col trt season sulfur # <int> <int> <int> <fct> <chr> <dbl> # 1 9 4 1 F3 Fall 300 # 2 12 4 2 O N/A 0 # 3 18 4 3 S6 Spring 600 # 4 10 4 4 F12 Fall 1200 # 5 24 4 5 S6 Spring 600 # 6 17 4 6 S12 Spring 1200 # 7 30 4 7 S3 Spring 300 # 8 16 4 8 F6 Fall 600 # 9 10 3 1 O N/A 0 # 10 7 3 2 S3 Spring 300 # # β¦ with 22 more rows ``` ] ] --- class: font_smaller # Which is easier to read for you? .font_small[Part 3] .center[ π― find the mean of `inf` by the combination of variables `season` and `sulfur`<br> **How would you modify the code below?** ] .grid[ .item.border-right[ ```r tapply(df$inf, df$trt, mean) ``` ``` ## F12 F3 F6 O S12 S3 S6 ## 5.750 9.500 15.500 22.625 14.250 16.750 18.250 ``` ] .item[ ```r df %>% group_by(trt) %>% summarise(inf_mean = mean(inf)) ``` ``` ## # A tibble: 7 x 2 ## trt inf_mean ## <fct> <dbl> ## 1 F12 5.75 ## 2 F3 9.5 ## 3 F6 15.5 ## 4 O 22.6 ## 5 S12 14.2 ## 6 S3 16.8 ## 7 S6 18.2 ``` ] ]
00
:
30
--- count: false class: font_smaller # Which is easier to read for you? .font_small[Part 3] .center[ π― find the mean of `inf` by the combination of variables `season` and `sulfur`<br> **How would you modify the code below?** ] .grid[ .item.border-right[ ```r tapply(df$inf, list(df$season, df$sulfur), mean) ``` ``` ## 0 300 600 1200 ## Fall NA 9.50 15.50 5.75 ## N/A 22.625 NA NA NA ## Spring NA 16.75 18.25 14.25 ``` Notice that the above output is `matrix`. {{content}} ] .item[ ```r df %>% group_by(season, sulfur) %>% summarise(inf_mean = mean(inf)) ``` ``` ## # A tibble: 7 x 3 ## # Groups: season [3] ## season sulfur inf_mean ## <chr> <dbl> <dbl> ## 1 Fall 300 9.5 ## 2 Fall 600 15.5 ## 3 Fall 1200 5.75 ## 4 N/A 0 22.6 ## 5 Spring 300 16.8 ## 6 Spring 600 18.2 ## 7 Spring 1200 14.2 ``` ] ] -- This works also: ```r tapply(df$inf, paste0(df$season, df$sulfur), mean) ``` ``` ## Fall1200 Fall300 Fall600 N/A0 Spring1200 ## 5.750 9.500 15.500 22.625 14.250 ## Spring300 Spring600 ## 16.750 18.250 ``` --- class: font_smaller # Reading code like English grammar .grid[ .item.border-right[ ```r df <- cochran.crd %>% mutate(season = case_when( str_detect(trt, "F") ~ "Fall", str_detect(trt, "S") ~ "Spring", TRUE ~ "N/A"), sulfur = case_when( str_detect(trt, "12") ~ 1200, str_detect(trt, "6") ~ 600, str_detect(trt, "3") ~ 300, TRUE ~ 0)) %>% relocate(trt, .after = col) %>% as_tibble() # optional df %>% group_by(season, sulfur) %>% summarise(inf_mean = mean(inf)) ``` ] .item[ * Read `%>%` as "and then" * The code may read more familiar to an English user without much specialist knowledge * Of course this is biased in favour of people who know English * Bonus: no intermediate naming to think or worry about ] ] --- # <i class="fas fa-chalkboard-teacher"></i> Lessons from grammar of data manipulation<br> in `dplyr` <br> <div class="info-box" style="margin-right:100px;margin-left:100px;text-align:left;"> <ul> <li><b>Leveraging existing knowledge</b> of an everyday user or an "average" statistician for the choice of function names and argument names can make it more accessible</li> {{content}} </ul> </div> -- <li><b>Making it more accessible</b>, by making code more expressive or otherwise, can promote statistical literacy</li> {{content}} -- <li><b>Consistency in input and output</b> makes it easier for user expectation and helps to offload mental load</li> --- class: center middle ## Can all this work be considered as statistics? -- ## Or data science? <br> -- <div class="info-box" style="margin-right:100px;margin-left:100px;text-align:left;font-size:18pt;"> Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms and systems to <b>extract knowledge and insights from many structural and unstructured data</b>. Data science is related to data mining, machine learning and big data. <span style="float:right"><i class="fas fa-book"></i> <i>Wikipedia, accessed on 27th Nov 2020</i></span> <br><br> </div> -- <br> ## Isn't this what statistics is about? --- class: middle center ## There are two types of people... <br> <img src="images/two-types.png"> --- class: middle center ## There are two types of people... <br> <img src="images/imbalance.png"> <br> ## But actually, it's hugely unbalanced --- class: middle center ## Statistics by nature is inter-disciplinary and about extracting knowledge and insight from data .font_small[at least from my point of view] -- ## but communication largely became exclusive to selective set of statistically well-trained people governed by strict adherence to defintions or protocols in lieu of practicality .font_small[which is not to say it is bad to do this] --- class: middle center ## Far more people need to use statistics than just statistically well-trained people <br> <img src="images/imbalance.png"> {{content}} .footnote[ Wickham (2015) Teaching Safe-Stats, Not Statistical Abstinence. *The American Statistician, Online Discussion* ] -- ## Statistics community need to go beyond talking among like-minded peers --- class: transition middle # .circle-big[3] # Grammar of <br>Experimental Design <i class="fab fa-r-project blue"></i> `edibble` π¦ <i class="fas fa-wrench"></i> (Work-In-Progress) --- # Typical course in experimental design <br>.font_small[(at least at University of Sydney in 2017-2019)] Teach: * Completely Randomised Design * Randomised Complete Block Design * Latin Square Design * Balanced Incomplete Block Design * Factorial Design * <strike> 2</strike><sup>k</sup><strike> Factorial Design</strike> .font_small[(I removed this from 2018, I won't talk about this today)] * Split-plot Design .font_small[(I added this from 2018 among other concepts)] --- # Completely Randomised Design (CRD) .grid[ .item[ <br> <center> <img src="images/crd-eg1.png" width = "300px"/> </center> ] .item[ * `\(t\)` treatments randomised to `\(n\)` units <br> .model-box[ `$$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{treatment} + \texttt{error}$$` .font_small[(with constraints and distributional assumptions)] <br> <center> <img src="images/crd-eg1-anova.png" width = "700px"/> </center> ] ] ] --- # Randomised Complete Block Design (RCBD) .grid[ .item[ <br> <center> <img src="images/rcbd-eg1.png" width = "300px"/> </center> ] .item[ * `\(b\)` blocks of size `\(t\)` * `\(t\)` treatments randomised to `\(t\)` units within each block .model-box[ `$$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{treatment} + \texttt{block} + \texttt{error}$$` <center> <img src="images/rcbd-eg1-anova.png" width = "850px"/> </center> ] ] ] --- # Latin Square Design (LSD) .grid[ .item[ <br> <center> <img src="images/lsd-eg1.png" width = "300px"/> </center> ] .item[ * two orthogonal blocks of size `\(t\)` * `\(t\)` treatments randomised to units such that every treatment appears exactly once in each block .model-box[ `$$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{treatment} + \texttt{row} + \texttt{column} + \texttt{error}$$` <center> <img src="images/lsd-eg1-anova.png" width = "850px"/> </center> ] ] ] --- # Balanced Incomplete Block Design (BIBD) .grid[ .item[ <br> <center> <img src="images/bibd-eg1.png" width = "300px"/> </center> ] .item[ * `\(b\)` blocks of size `\(k < t\)` * `\(t\)` treatments randomised to units within each block such that every pair of treatment appears the same number of times across blocks .model-box[ `$$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{block} + \texttt{treatment} + \texttt{error}$$` <center> <img src="images/bibd-eg1-anova.png" width = "850px"/> </center> ] ] ] --- # Factorial Design .grid[ .item[ <br> <center> <img src="images/factorial-eg1.png" width = "300px"/> </center> ] .item[ * `\(ab = t\)` treatments randomised to `\(n\)` units * treatment is every combination of two factors A and B .model-box[ `$$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{A} + \texttt{B} + \texttt{A:B} + \texttt{error}$$` <center> <img src="images/factorial-eg1-anova-top.png" width = "850px"/> <details style="font-size:4pt"><summary></summary> <img src="images/factorial-eg1-anova-middle.png" width = "850px"/> </details> <img src="images/factorial-eg1-anova-bottom.png" width = "850px"/> </center> ] ] ] --- # Split-plot Design .grid[ .item[ <br> <center> <img src="images/split-plot-eg1.png" width = "300px"/> </center> ] <div class="item" style="font-size: 0.85em"> <ul> <li> \(n_1\) whole plots consisting of \(b\) sub plots</li> <li>in total there are \(n\) sub plots</li> <li>treatment factor A is randomised to whole plots</li> <li>treatment factor B is randomised to sub plots within each whole plot</li> </ul> .model-box[ `$$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{A} + \texttt{WP} + \texttt{B} + \texttt{A:B} + \texttt{error}$$` <center> <img src="images/split-plot-eg1-anova.png" width = "850px"/> </center> ] </div> ] --- class: transition center middle # CRAN Task View of Design of Experiments contains # π¦ .yellow[229 R-packages ] .font_small[as of 2020-07-14] .font_small[(Please note that there may be some webscrapping error)] --- class: center # Top 10 downloaded R-packages in 2018 ![](images/top10_2018-1.png)<!-- --> # `agricolae` is the most downloaded .font_small[(data from `cranlogs` from 2018-01-01 to 2018-12-31)] --- class: center # Top 10 downloaded R-packages in 2019 ![](images/top10-1.png)<!-- --> # Okay fine, results were different in 2019 .font_small[(data from `cranlogs` from 2019-01-01 to 2019-12-31)] .font_small[Please note `agricolae` imports `AlgDesign`] --- class: font_small # `agricolae::design.crd` .blue[**Completely randomised design**] for `\(t = 3\)` treatments with `\(2\)` replicates each <pre><code> trt <- c("A", "B", "C") agricolae::.bg-yellow[design.crd](trt = trt, r = 2) %>% glimpse() </code></pre> .scroll-350[ ``` ## List of 2 ## $ parameters:List of 7 ## ..$ design: chr "crd" ## ..$ trt : chr [1:3] "A" "B" "C" ## ..$ r : num [1:3] 2 2 2 ## ..$ serie : num 2 ## ..$ seed : int 396269021 ## ..$ kinds : chr "Super-Duper" ## ..$ : logi TRUE ## $ book :'data.frame': 6 obs. of 3 variables: ## ..$ plots: num [1:6] 101 102 103 104 105 106 ## ..$ r : int [1:6] 1 1 2 1 2 2 ## ..$ trt : chr [1:6] "C" "A" "C" "B" ... ``` ] <div class="plot-box" style="position:absolute;top: 35%; right: 50px;"> <img src="images/ggcrd-1.png"> </div> --- class: font_small # `agricolae::design.rcbd` .blue[**Randomised complete block design**] for `\(t =3\)` treatments with `\(2\)` blocks <pre><code> trt <- c("A", "B", "C") agricolae::.bg-yellow[design.rcbd](trt = trt, r = 2) %>% glimpse() </code></pre> .scroll-350[ ``` ## List of 3 ## $ parameters:List of 7 ## ..$ design: chr "rcbd" ## ..$ trt : chr [1:3] "A" "B" "C" ## ..$ r : num 2 ## ..$ serie : num 2 ## ..$ seed : int 973575053 ## ..$ kinds : chr "Super-Duper" ## ..$ : logi TRUE ## $ sketch : chr [1:2, 1:3] "C" "A" "B" "C" ... ## $ book :'data.frame': 6 obs. of 3 variables: ## ..$ plots: num [1:6] 101 102 103 201 202 203 ## ..$ block: Factor w/ 2 levels "1","2": 1 1 1 2 2 2 ## ..$ trt : Factor w/ 3 levels "A","B","C": 3 2 1 1 3 2 ``` ] <div class="plot-box" style="position:absolute;bottom: 10px; right: 50px;"> <br> <img src="images/ggrcbd-1.png"> </div> --- class: font_small # `agricolae::design.lsd()` .blue[**Latin square design**] for `\(t = 3\)` treatments <pre><code> trt <- c("A", "B", "C") agricolae::.bg-yellow[design.lsd](trt = trt) %>% glimpse() </code></pre> .scroll-350[ ``` ## List of 3 ## $ parameters:List of 7 ## ..$ design: chr "lsd" ## ..$ trt : chr [1:3] "A" "B" "C" ## ..$ r : int 3 ## ..$ serie : num 2 ## ..$ seed : int -1984440067 ## ..$ kinds : chr "Super-Duper" ## ..$ : logi TRUE ## $ sketch : chr [1:3, 1:3] "B" "C" "A" "A" ... ## $ book :'data.frame': 9 obs. of 4 variables: ## ..$ plots: num [1:9] 101 102 103 201 202 203 301 302 303 ## ..$ row : Factor w/ 3 levels "1","2","3": 1 1 1 2 2 2 3 3 3 ## ..$ col : Factor w/ 3 levels "1","2","3": 1 2 3 1 2 3 1 2 3 ## ..$ trt : Factor w/ 3 levels "A","B","C": 2 1 3 3 2 1 1 3 2 ``` ] <div class="plot-box" style="position:absolute;top: 100px; right: 50px;"> <br> <img src="images/gglsd-1.png"> </div> --- class: font_small # `agricolae::design.bib()` .blue[**Balanced incomplete block design**] for `\(t = 3\)` treatments with block size of `\(2\)` <pre><code> trt <- c("A", "B", "C") agricolae::.bg-yellow[design.bib](trt = trt, k = 2) %>% glimpse() </code></pre> .scroll-350[ ``` ## [1] "No improvement over initial random design." ## ## Parameters BIB ## ============== ## Lambda : 1 ## treatmeans : 3 ## Block size : 2 ## Blocks : 3 ## Replication: 2 ## ## Efficiency factor 0.75 ## ## <<< Book >>> ``` ``` ## List of 4 ## $ parameters:List of 6 ## ..$ design: chr "bib" ## ..$ trt : chr [1:3] "A" "B" "C" ## ..$ k : num 2 ## ..$ serie : num 2 ## ..$ seed : int -509134623 ## ..$ kinds : chr "Super-Duper" ## $ statistics:'data.frame': 1 obs. of 6 variables: ## ..$ lambda : num 1 ## ..$ treatmeans: int 3 ## ..$ blockSize : num 2 ## ..$ blocks : int 3 ## ..$ r : num 2 ## ..$ Efficiency: num 0.75 ## $ sketch : chr [1:3, 1:2] "C" "B" "B" "A" ... ## $ book :'data.frame': 6 obs. of 3 variables: ## ..$ plots: num [1:6] 101 102 201 202 301 302 ## ..$ block: Factor w/ 3 levels "1","2","3": 1 1 2 2 3 3 ## ..$ trt : Factor w/ 3 levels "A","B","C": 3 1 2 1 2 3 ``` ] <div class="plot-box" style="position:absolute;top: 200px; right: 50px;"> <br> <img src="images/ggbibd-1.png"> </div> --- class: font_small # `agricolae::design.ab()` .blue[**Factorial design**] for `\(t = 3 \times 2\)` treatments with `\(2\)` replication for each treatment <pre><code> agricolae::.bg-yellow[design.ab](trt = c(3, 2), r = 2, design = "crd") %>% glimpse() </code></pre> .scroll-350[ ``` ## List of 2 ## $ parameters:List of 8 ## ..$ design : chr "factorial" ## ..$ trt : chr [1:6] "1 1" "1 2" "2 1" "2 2" ... ## ..$ r : num [1:6] 2 2 2 2 2 2 ## ..$ serie : num 2 ## ..$ seed : int 801301585 ## ..$ kinds : chr "Super-Duper" ## ..$ : logi TRUE ## ..$ applied: chr "crd" ## $ book :'data.frame': 12 obs. of 4 variables: ## ..$ plots: num [1:12] 101 102 103 104 105 106 107 108 109 110 ... ## ..$ r : int [1:12] 1 1 1 2 1 1 1 2 2 2 ... ## ..$ A : chr [1:12] "2" "3" "2" "2" ... ## ..$ B : chr [1:12] "2" "2" "1" "2" ... ``` ] Note *not* A/B testing! <div class="plot-box" style="position:absolute;top: 200px; right: 50px;"> <br> <img src="images/ggfac-1.png"> <br> </div> --- class: font_small # `agricolae::design.split()` .blue[**Split-plot design**] for `\(t = 2 \times 4\)` treatments with `\(2\)` replication for each treatment <pre><code> trt1 <- c("I", "R"); trt2 <- LETTERS[1:4] agricolae::.bg-yellow[design.split](trt1 = trt1, trt2 = trt2, r = 2, design = "crd") %>% glimpse() </code></pre> .scroll-350[ ``` ## List of 2 ## $ parameters:List of 8 ## ..$ design : chr "split" ## ..$ : logi TRUE ## ..$ trt1 : chr [1:2] "I" "R" ## ..$ applied: chr "crd" ## ..$ r : num [1:2] 2 2 ## ..$ serie : num 2 ## ..$ seed : int -765020087 ## ..$ kinds : chr "Super-Duper" ## $ book :'data.frame': 16 obs. of 5 variables: ## ..$ plots : num [1:16] 101 101 101 101 102 102 102 102 103 103 ... ## ..$ splots: Factor w/ 4 levels "1","2","3","4": 1 2 3 4 1 2 3 4 1 2 ... ## ..$ r : int [1:16] 1 1 1 1 1 1 1 1 2 2 ... ## ..$ trt1 : chr [1:16] "I" "I" "I" "I" ... ## ..$ trt2 : chr [1:16] "D" "A" "C" "B" ... ``` ] <div class="plot-box" style="position:absolute;top: 300px; right: 50px;"> <img src="images/split-plot-graph-1.png"> </div> --- name: edibble # Grammar of experimental design * Remember Base R plots? <a href="#base-plot" style="font-size:12pt"><i class="fas fa-caret-right"></i> base plot</a> -- * π€ "named experimental design" functions (`agricolae::design.crd`, etc.) are like "named statistical graphic" functions (`pie`, `barplot`) -- * π‘ construction of experimental design needs to be made more accessible, modifiable, extensible and generalisable -- * This is the concept for the development of the `edibble` R-package <br>https://github.com/emitanaka/edibble<br>(<i class="fas fa-wrench"></i> WIP, sorry code not ready yet, please enjoy prototype and live demo instead) -- * `tibble` R-package is a modern reimagining of the `data.frame` * `edibble` (WIP) creates experimental design tibbles {{content}} .footnote[ MΓΌller & Wickham (2020) tibble: Simple Data Frames. *R package version 3.0.3* ] --- class: font_smaller # Prototype for grammar of experimental design .grid[.item[ * Consider a field experiment with 32 plots ] .item[ ```r library(edibble) initiate_design(seed = 2020) %>% * set_units(plot = 32) ``` ``` ## An edibble design ## ββplot (32 levels) ``` ] ] --- class: font_smaller count: false # Prototype Grammar of Experimental Design .grid[.item[ * Consider a field experiment with 32 plots * There are 7 different treatments ] .item50[ ```r library(edibble) initiate_design(seed = 2020) %>% set_units(plot = 32) %>% * set_trts(trt = c("F12", "F3", "F6", "O", * "S12", "S3", "S6")) ``` ``` ## An edibble design ## ββplot (32 levels) ## ββtrt (7 levels) ``` ] ] --- class: font_smaller count: false # Prototype Grammar of Experimental Design .circle[1] .grid[.item[ * Consider a field experiment with 32 plots * There are 7 different treatments * Apply treatments to plots ] .item50[ ```r library(edibble) initiate_design(seed = 2020) %>% set_units(plot = 32) %>% set_trts(trt = c("F12", "F3", "F6", "O", "S12", "S3", "S6")) %>% * apply_trts(~ plot) ``` ``` ## An edibble design ## ββplot (32 levels) ## ββtrt (7 levels) ``` ] ] --- class: font_smaller count: false # Prototype Grammar of Experimental Design .circle[1] .grid[.item[ * Consider a field experiment with 32 plots * There are 7 different treatments * Apply treatments to plots * Randomise treatments to plots ] .item50[ ```r library(edibble) initiate_design(seed = 2020) %>% set_units(plot = 32) %>% set_trts(trt = c("F12", "F3", "F6", "O", "S12", "S3", "S6")) %>% apply_trts(~ plot) %>% * randomise_trts() ``` ``` ## An edibble design ## ββplot (32 levels) ## ββtrt (7 levels) ``` ] ] --- class: font_smaller count: false # Prototype Grammar of Experimental Design .circle[1] .grid[.item[ * Consider a field experiment with 32 plots * There are 7 different treatments * Apply treatments to plots * Randomise treatments to plots * The result is a **completely randomised design** ] .item50[ ```r library(edibble) initiate_design(seed = 2020) %>% set_units(plot = 32) %>% set_trts(trt = c("F12", "F3", "F6", "O", "S12", "S3", "S6")) %>% apply_trts(~ plot) %>% randomise_trts() %>% * serve_table() ``` ``` ## # An edibble: 32 x 2 ## plot trt ## <fct> <fct> ## 1 plot1 F12 ## 2 plot2 S3 ## 3 plot3 O ## 4 plot4 S6 ## 5 plot5 F3 ## 6 plot6 S12 ## 7 plot7 F6 ## 8 plot8 S6 ## 9 plot9 F3 ## 10 plot10 S3 ## # β¦ with 22 more rows ``` ] ] --- class: font_smaller # Prototype Grammar of Experimental Design .circle[2] .grid[.item[ * Consider a field experiment with 2 blocks each with 16 plots * There are 7 different treatments * Apply treatments to plots * Randomise treatments to plots within blocks * Resulting design is a **randomised block design** <br> Can you see how it differs from the previous design? ] .item50[ ```r library(edibble) initiate_design(seed = 2020) %>% * set_units(block = c("B1", "B2"), * plot = nested_in(block, 16)) %>% set_trts(trt = c("F12", "F3", "F6", "O", "S12", "S3", "S6")) %>% apply_trts(~ plot) %>% randomise_trts() %>% serve_table() ``` ``` ## # An edibble: 32 x 3 ## block plot trt ## <fct> <fct> <fct> ## 1 B1 plot1 F12 ## 2 B1 plot2 S3 ## 3 B1 plot3 S12 ## 4 B1 plot4 S6 ## 5 B1 plot5 F3 ## 6 B1 plot6 S3 ## 7 B1 plot7 F6 ## 8 B1 plot8 S6 ## 9 B1 plot9 F3 ## 10 B1 plot10 S6 ## # β¦ with 22 more rows ``` ] ] --- class: transition middle .animated.flash.font_large[ # LIVE DEMO ] --- # Limitations .pad20[ * Generalising experimental designs is extremely hard * A lot of domains have their own specialised design {{content}} ] -- * Some specialist designs that have certain properties cannot be easily replicated by `edibble` * An ad-hoc approach may still be required to create design with specific properties {{content}} -- but the hope is that this can be achieved with replacement in one of the steps (e.g. replacing `randomise_trts`) {{content}} -- * _So why would anyone buy into this framework?_ {{content}} -- * Hopefully, developers and users can see the benefit in being more expressive in code; and also a pay-off like easy visualisation of designs becomes worthwhile --- # <i class="fas fa-clock"></i> Timeline .grid[.item[ * By end of 2020, I intend to have `edibble` be able to construct textbook (mostly orthogonal) designs. * From 2021, I plan to hold (free) workshops in experimental design with `edibble` to generate certain designs. * The initial development for `edibble` is focussed on designs with "fixed" structures. * For survey and sampling designs you should check out [DeclareDesign](https://declaredesign.org/). ] .item[ .center[ <br><br> # <i class="fas fa-comments"></i> Feedback is welcomed! ] <br> <i class="fab fa-github"></i> <a href="https://github.com/emitanaka/edibble/issues" style="color:black!important">emitanaka/edibble/issues</a> <br> <i class="fas fa-envelope"></i> emi.tanaka@monash.edu <br><br> <div class="center"> <div style="border: 2px dotted #006DAE; width:400px;padding:10px;margin:auto;background:#D2F1FF;"> <i class="fas fa-link"></i> Slides can be found at <a href="https://www.emitanaka.org/slides/SSA2020/" style="font-size:20pt">emitanaka.org/slides/SSA2020/</a> </div> </div> ] ] --- class: font_smaller # Acknowledgements This slide was made using `xaringan` R-package and the following systems. .scroll-350[ ```r sessioninfo::session_info() ``` ``` ## β Session info βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ ## setting value ## version R version 4.0.1 (2020-06-06) ## os macOS Catalina 10.15.7 ## system x86_64, darwin17.0 ## ui X11 ## language (EN) ## collate en_AU.UTF-8 ## ctype en_AU.UTF-8 ## tz Australia/Melbourne ## date 2020-11-27 ## ## β Packages βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ ## package * version date lib source ## agricolae 1.3-3 2020-06-07 [1] CRAN (R 4.0.1) ## AlgDesign 1.2.0 2019-11-29 [1] CRAN (R 4.0.1) ## anicon 0.1.0 2020-06-21 [1] Github (emitanaka/anicon@0b756df) ## assertthat 0.2.1 2019-03-21 [2] CRAN (R 4.0.0) ## backports 1.2.0 2020-11-02 [1] CRAN (R 4.0.2) ## broom 0.7.2 2020-10-20 [1] CRAN (R 4.0.2) ## callr 3.5.1 2020-10-13 [1] CRAN (R 4.0.2) ## cellranger 1.1.0 2016-07-27 [2] CRAN (R 4.0.0) ## cli 2.2.0 2020-11-20 [1] CRAN (R 4.0.1) ## cluster 2.1.0 2019-06-19 [2] CRAN (R 4.0.1) ## colorspace 2.0-0 2020-11-11 [1] CRAN (R 4.0.2) ## combinat 0.0-8 2012-10-29 [1] CRAN (R 4.0.1) ## countdown 0.3.5 2020-07-20 [1] Github (gadenbuie/countdown@a544fa4) ## crayon 1.3.4 2017-09-16 [2] CRAN (R 4.0.0) ## DBI 1.1.0 2019-12-15 [1] CRAN (R 4.0.2) ## dbplyr 2.0.0 2020-11-03 [1] CRAN (R 4.0.2) ## digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.2) ## dplyr * 1.0.2 2020-08-18 [1] CRAN (R 4.0.2) ## edibble * 0.0.0.9000 2020-11-15 [1] local ## ellipsis 0.3.1 2020-05-15 [2] CRAN (R 4.0.0) ## emo 0.0.0.9000 2020-06-26 [1] Github (hadley/emo@3f03b11) ## evaluate 0.14 2019-05-28 [2] CRAN (R 4.0.0) ## fansi 0.4.1 2020-01-08 [2] CRAN (R 4.0.0) ## farver 2.0.3.9000 2020-07-24 [1] Github (thomasp85/farver@f1bcb56) ## fastmap 1.0.1 2019-10-08 [2] CRAN (R 4.0.0) ## forcats * 0.5.0 2020-03-01 [2] CRAN (R 4.0.0) ## fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2) ## generics 0.1.0 2020-10-31 [2] CRAN (R 4.0.2) ## ggplot2 * 3.3.2 2020-06-19 [1] CRAN (R 4.0.2) ## glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2) ## gtable 0.3.0 2019-03-25 [2] CRAN (R 4.0.0) ## haven 2.3.1 2020-06-01 [2] CRAN (R 4.0.0) ## highr 0.8 2019-03-20 [2] CRAN (R 4.0.0) ## hms 0.5.3 2020-01-08 [2] CRAN (R 4.0.0) ## htmltools 0.5.0 2020-06-16 [1] CRAN (R 4.0.2) ## httpuv 1.5.4 2020-06-06 [2] CRAN (R 4.0.1) ## httr 1.4.2 2020-07-20 [1] CRAN (R 4.0.2) ## icon 0.1.0 2020-06-21 [1] Github (emitanaka/icon@8458546) ## igraph 1.2.6 2020-10-06 [1] CRAN (R 4.0.2) ## jsonlite 1.7.1 2020-09-07 [1] CRAN (R 4.0.2) ## klaR 0.6-15 2020-02-19 [1] CRAN (R 4.0.1) ## knitr 1.30 2020-09-22 [1] CRAN (R 4.0.2) ## labeling 0.4.2 2020-10-20 [1] CRAN (R 4.0.2) ## labelled 2.7.0 2020-09-21 [1] CRAN (R 4.0.2) ## later 1.1.0.1 2020-06-05 [2] CRAN (R 4.0.1) ## lattice 0.20-41 2020-04-02 [2] CRAN (R 4.0.1) ## lifecycle 0.2.0 2020-03-06 [1] CRAN (R 4.0.0) ## lubridate 1.7.9 2020-06-08 [2] CRAN (R 4.0.1) ## magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.2) ## MASS 7.3-53 2020-09-09 [1] CRAN (R 4.0.2) ## mime 0.9 2020-02-04 [2] CRAN (R 4.0.0) ## miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.0.0) ## modelr 0.1.8 2020-05-19 [2] CRAN (R 4.0.0) ## munsell 0.5.0 2018-06-12 [2] CRAN (R 4.0.0) ## nlme 3.1-150 2020-10-24 [2] CRAN (R 4.0.2) ## pillar 1.4.7 2020-11-20 [1] CRAN (R 4.0.1) ## pkgconfig 2.0.3 2019-09-22 [2] CRAN (R 4.0.0) ## processx 3.4.4 2020-09-03 [1] CRAN (R 4.0.2) ## promises 1.1.1 2020-06-09 [1] CRAN (R 4.0.0) ## ps 1.4.0 2020-10-07 [1] CRAN (R 4.0.2) ## purrr * 0.3.4 2020-04-17 [2] CRAN (R 4.0.0) ## questionr 0.7.3 2020-10-05 [1] CRAN (R 4.0.2) ## R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.2) ## Rcpp 1.0.5 2020-07-06 [1] CRAN (R 4.0.0) ## readr * 1.4.0 2020-10-05 [2] CRAN (R 4.0.2) ## readxl 1.3.1 2019-03-13 [2] CRAN (R 4.0.0) ## reprex 0.3.0.9001 2020-08-08 [1] Github (tidyverse/reprex@9594ee9) ## rlang 0.4.8 2020-10-08 [1] CRAN (R 4.0.2) ## rmarkdown 2.5 2020-10-21 [1] CRAN (R 4.0.1) ## rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.0.1) ## rvest 0.3.6 2020-07-25 [1] CRAN (R 4.0.2) ## scales 1.1.1 2020-05-11 [2] CRAN (R 4.0.0) ## sessioninfo 1.1.1 2018-11-05 [2] CRAN (R 4.0.0) ## shiny 1.5.0 2020-06-23 [1] CRAN (R 4.0.2) ## stringi 1.5.3 2020-09-09 [2] CRAN (R 4.0.2) ## stringr * 1.4.0 2019-02-10 [2] CRAN (R 4.0.0) ## tibble * 3.0.4.9000 2020-11-26 [1] Github (tidyverse/tibble@9eeef4d) ## tidyr * 1.1.2 2020-08-27 [1] CRAN (R 4.0.2) ## tidyselect 1.1.0 2020-05-11 [2] CRAN (R 4.0.0) ## tidyverse * 1.3.0 2019-11-21 [1] CRAN (R 4.0.2) ## utf8 1.1.4 2018-05-24 [2] CRAN (R 4.0.0) ## vctrs 0.3.5.9000 2020-11-26 [1] Github (r-lib/vctrs@957baf7) ## viridisLite 0.3.0 2018-02-01 [2] CRAN (R 4.0.0) ## whisker 0.4 2019-08-28 [2] CRAN (R 4.0.0) ## withr 2.3.0 2020-09-22 [1] CRAN (R 4.0.2) ## xaringan 0.18 2020-10-21 [1] CRAN (R 4.0.2) ## xfun 0.19 2020-10-30 [1] CRAN (R 4.0.2) ## xml2 1.3.2 2020-04-23 [2] CRAN (R 4.0.0) ## xtable 1.8-4 2019-04-21 [2] CRAN (R 4.0.0) ## yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.2) ## ## [1] /Users/etan0038/Library/R/4.0/library ## [2] /Library/Frameworks/R.framework/Versions/4.0/Resources/library ``` ] (Scroll on html slide to see all)