background-color: #006DAE class: middle center hide-slide-number <div class="shade_black" style="width:60%;right:0;bottom:0;padding:10px;border: dashed 4px white;margin: auto;"> <i class="fas fa-exclamation-circle"></i> These slides are viewed best by Chrome and occasionally need to be refreshed if elements did not load properly. See <a href=edibble-slides.pdf>here for PDF <i class="fas fa-file-pdf"></i></a>. </div> <br> .white[Press the **right arrow** to progress to the next slide!] --- background-image: url(images/bg1.jpg) background-size: cover class: hide-slide-number split-70 title-slide count: false .column.shade_black[.content[ <br> # .monash-blue.outline-text[Advent of "Grammar"] <h2 class="monash-blue2 outline-text" style="font-size: 30pt!important;"></h2> <br> <h2 style="font-weight:900!important;">Bridging Statistics and Data Science for Experimental Design</h2> .bottom_abs.width100[ **Emi Tanaka** Department of Econometrics and Business Statistics
<i class="fas fa-envelope faa-float animated "></i>
emi.tanaka@monash.edu
<i class="fab fa-twitter faa-float animated "></i>
@statsgen 15th July 2020 @ Monash Informal Bioinfo Seminar <br> ] ]] <div class="column transition monash-m-new delay-1s" style="clip-path:url(#swipe__clip-path);"> <div class="background-image" style="background-image:url('images/large.png');background-position: center;background-size:cover;margin-left:3px;"> <svg class="clip-svg absolute"> <defs> <clipPath id="swipe__clip-path" clipPathUnits="objectBoundingBox"> <polygon points="0.5745 0, 0.5 0.33, 0.42 0, 0 0, 0 1, 0.27 1, 0.27 0.59, 0.37 1, 0.634 1, 0.736 0.59, 0.736 1, 1 1, 1 0, 0.5745 0" /> </clipPath> </defs> </svg> </div> </div> --- class: transition middle # .circle-big[1] # Grammar of Graphics <i class="fab fa-r-project blue"></i> `ggplot2` 📦 .footnote.monash-bg-blue[ Wickham (2016) ggplot2: Elegant Graphics for Data Analysis. *Springer-Verlag New York* ] --- # What are the differences between these plots? .grid[ .item[ ![](images/unnamed-chunk-1-1.png)<!-- --> ] .item[ ![](images/unnamed-chunk-2-1.png)<!-- --> ] ] -- <div style="position:absolute; bottom:10%;left:30%;"> How do you construct these plots? 
</div> --- class: font_smaller # Using R: `base` .grid[ .item.border-right[ ```r df ``` ``` ## duty perc ## 1 Teaching 40 ## 2 Research 40 ## 3 Admin 20 ``` ] .item.border-right[ ```r barplot(as.matrix(df$perc), legend = df$duty) ``` ![](images/barplot-1.png)<!-- --> ] .item[ ```r pie(df$perc, labels = df$duty) ``` ![](images/pie-1.png)<!-- --> ] ] .footnote[ R Core Team (2020) R: A Language and Environment for Statistical Computing https://www.R-project.org/ ] -- <div class="corner-box" style="bottom:50px;"> <ul> <li><b>Single purpose functions</b> to generate "named plots"</li> <li><b>Input</b> varies, here it is vector or matrix</li> </ul> </div> --- class: font_smaller # Using R: `ggplot2` .grid[ .item.border-right[ ```r df ``` ``` ## duty perc ## 1 Teaching 40 ## 2 Research 40 ## 3 Admin 20 ``` ] .item.border-right[ ```r ggplot(df, aes(x = "", # dummy y = perc, fill = duty)) + geom_col() ``` ![](images/ggbarplot-1.png)<!-- --> ] .item[ {{content}} ] ] .footnote[ Wilkinson (2005) The Grammar of graphics. *Statistics and Computing. Springer, 2nd edition.* Wickham (2008) Practical Tools for Exploring Data and Models. *PhD Thesis Chapter 3: A layered grammar of graphics*. 
Wickham (2010) A Layered Grammar of Graphics, *Journal of Computational and Graphical Statistics, 19:1, 3-28* ] -- ```r ggplot(df, aes(x = "", # dummy y = perc, fill = duty)) + geom_col() + * coord_polar(theta = "y") ``` ![](images/ggpie-1.png)<!-- --> -- .corner-box[ * `ggplot2` implements the **grammar of graphics** * the difference between a **stacked barplot** and a **pie chart** is that the coordinate system has been transformed from **Cartesian coordinates** to **polar coordinates** ] --- # Data <i class="fas fa-exchange-alt"></i> Plot .grid[.item[ * `ggplot` uses **tidy data** as input so plot construction is enforced by consistent thinking in relation to tidy data <details style="font-size:15pt;"> <summary>Tidy data principles</summary> <ol> <li>Each variable must have its own column.</li> <li>Each observation must have its own row.</li> <li>Each value must have its own cell.</li> </ol> </details> * Variables are mapped to a plot aesthetic * A plot is constructed from its components, as expressed by the **grammar of graphics** ] .item[ <center style="padding:20px;"> <img src="images/ggplot-eg1.png" width = "700px"> </center> ] ] .footnote[ Wickham (2014) Tidy Data. *Journal of Statistical Software, Articles 59 (10): 1–23.* ] --- class: font_smaller # <i class="fas fa-puzzle-piece"></i> What graph will this yield? .grid[.item.border-right[ ```r df2 ``` ``` ## # A tibble: 6 x 3 ## duty perc type ## <chr> <dbl> <chr> ## 1 Teaching 40 standard ## 2 Research 40 standard ## 3 Admin 20 standard ## 4 Teaching 80 teaching ## 5 Research 0 teaching ## 6 Admin 20 teaching ``` ] .item.border-right[ ```r g <- ggplot(df2, * aes(x = type, y = perc, fill = duty)) + geom_col() g ``` ] .item[ ```r *g + coord_polar("y") ``` ] ] --- count: false class: font_smaller # <i class="fas fa-puzzle-piece"></i> What graph will this yield? 
.grid[.item.border-right[ ```r df2 ``` ``` ## # A tibble: 6 x 3 ## duty perc type ## <chr> <dbl> <chr> ## 1 Teaching 40 standard ## 2 Research 40 standard ## 3 Admin 20 standard ## 4 Teaching 80 teaching ## 5 Research 0 teaching ## 6 Admin 20 teaching ``` ] .item.border-right[ ```r g <- ggplot(df2, * aes(x = type, y = perc, fill = duty)) + geom_col() g ``` ![](images/barplot2-1.png)<!-- --> ] .item[ ```r *g + coord_polar("y") ``` ] ] --- count: false class: font_smaller # <i class="fas fa-puzzle-piece"></i> What graph will this yield? .grid[.item.border-right[ ```r df2 ``` ``` ## # A tibble: 6 x 3 ## duty perc type ## <chr> <dbl> <chr> ## 1 Teaching 40 standard ## 2 Research 40 standard ## 3 Admin 20 standard ## 4 Teaching 80 teaching ## 5 Research 0 teaching ## 6 Admin 20 teaching ``` ] .item.border-right[ ```r g <- ggplot(df2, * aes(x = type, y = perc, fill = duty)) + geom_col() g ``` ![](images/barplot2-1.png)<!-- --> ] .item[ ```r *g + coord_polar("y") ``` ![](images/pie2-1.png)<!-- --> {{content}} ] ] -- ```r g + coord_polar("x") ``` {{content}} -- ![](images/pie2x-1.png)<!-- --> -- .corner-box[ * .yellow[**Modifiable**]: a `ggplot` object can be modified * .yellow[**Generalisable**]: `ggplot2` uses a cohesive and complex system under the hood to make many kinds of plots * .yellow[**Extensible**]: the system can be extended to make specialised plots or add more features if the same "grammar" is adopted ] --- class: transition middle # .circle-big[2] # Grammar of <br>Data Manipulation <i class="fab fa-r-project blue"></i> `dplyr` 📦 .footnote.monash-bg-blue[ Wickham, François, Henry & Müller (2020) dplyr: A Grammar of Data Manipulation. R-package version 1.0.0. 
] --- class: font_smaller # <i class="fas fa-ship"></i> Caught in the Wild: Wrangling Fisheries Data .scroll-fix[ ```r samples <- foreign::read.dbf("data/SEF_SAMP.DBF") samples <- samples[samples$GEAR_NAME=="McKenna trawl" & samples$GEAR_TYPE=="FISH TRAWL",] ## taxa data taxa<- foreign::read.dbf("data/SEF_TAXA.DBF") taxa <- taxa[taxa$SEF_SPCODE > 37000000 & taxa$SEF_SPCODE < 38000000 ,] ## clean sample data for(i in 1:dim(samples)[1]) if(!any(taxa$SEF_SPCODE==samples$SEF_SPCODE[i])) samples[i,] <- NA samples <- samples[!is.na(samples$SEF_SPCODE),] samp.cnt <- tapply(samples$SEF_SPCODE,samples$SEF_SPCODE,length) samp.cnt <- samp.cnt[samp.cnt>=10] ## taxa data for(i in 1:dim(taxa)[1]) if(!any(as.numeric(names(samp.cnt))==taxa$SEF_SPCODE[i])) taxa[i,] <- NA taxa <- na.omit(taxa) ``` ] .footnote[ Bax & Williams (2000). Habitat and Fisheries Production in the South East Fishery Ecosystem. *Technical report. 94/040.* ] -- .corner-box[ * Cleaning data is an important aspect of statistical work * Providing code for reproducibility is important * So... what are these lines of code doing? 
] --- class: font_smaller # Data Manipulation: `base` & `dplyr` .grid[ .item50.border-right[ .scroll-fix[ ```r samples <- foreign::read.dbf("data/SEF_SAMP.DBF") samples <- samples[samples$GEAR_NAME=="McKenna trawl" & samples$GEAR_TYPE=="FISH TRAWL",] ## taxa data taxa<- foreign::read.dbf("data/SEF_TAXA.DBF") taxa <- taxa[taxa$SEF_SPCODE > 37000000 & taxa$SEF_SPCODE < 38000000 ,] ## clean sample data for(i in 1:dim(samples)[1]) if(!any(taxa$SEF_SPCODE==samples$SEF_SPCODE[i])) samples[i,] <- NA samples <- samples[!is.na(samples$SEF_SPCODE),] samp.cnt <- tapply(samples$SEF_SPCODE,samples$SEF_SPCODE,length) samp.cnt <- samp.cnt[samp.cnt>=10] ## taxa data for(i in 1:dim(taxa)[1]) if(!any(as.numeric(names(samp.cnt))==taxa$SEF_SPCODE[i])) taxa[i,] <- NA taxa <- na.omit(taxa) ``` ] ] .item50[ ] ] --- count: false class: font_smaller # Data Manipulation: `base` & `dplyr` .grid[ .item50.border-right[ .scroll-fix[ ```r *samples <- foreign::read.dbf("data/SEF_SAMP.DBF") *samples <- samples[samples$GEAR_NAME=="McKenna trawl" & samples$GEAR_TYPE=="FISH TRAWL",] ## taxa data taxa<- foreign::read.dbf("data/SEF_TAXA.DBF") taxa <- taxa[taxa$SEF_SPCODE > 37000000 & taxa$SEF_SPCODE < 38000000 ,] ## clean sample data for(i in 1:dim(samples)[1]) if(!any(taxa$SEF_SPCODE==samples$SEF_SPCODE[i])) samples[i,] <- NA samples <- samples[!is.na(samples$SEF_SPCODE),] samp.cnt <- tapply(samples$SEF_SPCODE,samples$SEF_SPCODE,length) samp.cnt <- samp.cnt[samp.cnt>=10] ## taxa data for(i in 1:dim(taxa)[1]) if(!any(as.numeric(names(samp.cnt))==taxa$SEF_SPCODE[i])) taxa[i,] <- NA taxa <- na.omit(taxa) ``` ] ] .item50[ ```r samples <- foreign::read.dbf("data/SEF_SAMP.DBF") %>% filter(GEAR_NAME=="McKenna trawl" & GEAR_TYPE=="FISH TRAWL") ``` ] ] --- count: false class: font_smaller # Data Manipulation: `base` & `dplyr` .grid[ .item50.border-right[ .scroll-fix[ ```r samples <- foreign::read.dbf("data/SEF_SAMP.DBF") samples <- samples[samples$GEAR_NAME=="McKenna trawl" & samples$GEAR_TYPE=="FISH TRAWL",] 
## taxa data *taxa<- foreign::read.dbf("data/SEF_TAXA.DBF") *taxa <- taxa[taxa$SEF_SPCODE > 37000000 & taxa$SEF_SPCODE < 38000000 ,] ## clean sample data for(i in 1:dim(samples)[1]) if(!any(taxa$SEF_SPCODE==samples$SEF_SPCODE[i])) samples[i,] <- NA samples <- samples[!is.na(samples$SEF_SPCODE),] samp.cnt <- tapply(samples$SEF_SPCODE,samples$SEF_SPCODE,length) samp.cnt <- samp.cnt[samp.cnt>=10] ## taxa data for(i in 1:dim(taxa)[1]) if(!any(as.numeric(names(samp.cnt))==taxa$SEF_SPCODE[i])) taxa[i,] <- NA taxa <- na.omit(taxa) ``` ] ] .item50[ ```r samples <- foreign::read.dbf("data/SEF_SAMP.DBF") %>% filter(GEAR_NAME=="McKenna trawl" & GEAR_TYPE=="FISH TRAWL") *taxa <- foreign::read.dbf("data/SEF_TAXA.DBF") %>% * filter(SEF_SPCODE > 37000000 & SEF_SPCODE < 38000000) ``` ] ] --- count: false class: font_smaller # Data Manipulation: `base` & `dplyr` .grid[ .item50.border-right[ .scroll-fix[ ```r samples <- foreign::read.dbf("data/SEF_SAMP.DBF") samples <- samples[samples$GEAR_NAME=="McKenna trawl" & samples$GEAR_TYPE=="FISH TRAWL",] ## taxa data taxa<- foreign::read.dbf("data/SEF_TAXA.DBF") taxa <- taxa[taxa$SEF_SPCODE > 37000000 & taxa$SEF_SPCODE < 38000000 ,] ## clean sample data *for(i in 1:dim(samples)[1]) * if(!any(taxa$SEF_SPCODE==samples$SEF_SPCODE[i])) samples[i,] <- NA *samples <- samples[!is.na(samples$SEF_SPCODE),] samp.cnt <- tapply(samples$SEF_SPCODE,samples$SEF_SPCODE,length) samp.cnt <- samp.cnt[samp.cnt>=10] ## taxa data for(i in 1:dim(taxa)[1]) if(!any(as.numeric(names(samp.cnt))==taxa$SEF_SPCODE[i])) taxa[i,] <- NA taxa <- na.omit(taxa) ``` ] ] .item50[ ```r samples <- foreign::read.dbf("data/SEF_SAMP.DBF") %>% filter(GEAR_NAME=="McKenna trawl" & GEAR_TYPE=="FISH TRAWL") taxa <- foreign::read.dbf("data/SEF_TAXA.DBF") %>% filter(SEF_SPCODE > 37000000 & SEF_SPCODE < 38000000) ``` <br> .center[ ??? ] {{content}} ] ] -- The intention here is that if a `SEF_SPCODE` is not within the `taxa` of interest, then you want to remove the corresponding sample. 
<span style="font-size:18pt">(I did not make up this code. It is code used in practice; some code is simply harder for others to read.)</span> --- count: false class: font_smaller # Data Manipulation: `base` & `dplyr` .grid[ .item50.border-right[ .scroll-fix[ ```r samples <- foreign::read.dbf("data/SEF_SAMP.DBF") samples <- samples[samples$GEAR_NAME=="McKenna trawl" & samples$GEAR_TYPE=="FISH TRAWL",] ## taxa data taxa<- foreign::read.dbf("data/SEF_TAXA.DBF") taxa <- taxa[taxa$SEF_SPCODE > 37000000 & taxa$SEF_SPCODE < 38000000 ,] ## clean sample data *for(i in 1:dim(samples)[1]) * if(!any(taxa$SEF_SPCODE==samples$SEF_SPCODE[i])) samples[i,] <- NA *samples <- samples[!is.na(samples$SEF_SPCODE),] samp.cnt <- tapply(samples$SEF_SPCODE,samples$SEF_SPCODE,length) samp.cnt <- samp.cnt[samp.cnt>=10] ## taxa data for(i in 1:dim(taxa)[1]) if(!any(as.numeric(names(samp.cnt))==taxa$SEF_SPCODE[i])) taxa[i,] <- NA taxa <- na.omit(taxa) ``` ] ] .item50[ ```r taxa <- foreign::read.dbf("data/SEF_TAXA.DBF") %>% filter(SEF_SPCODE > 37000000 & SEF_SPCODE < 38000000) samples <- foreign::read.dbf("data/SEF_SAMP.DBF") %>% filter(GEAR_NAME=="McKenna trawl" & GEAR_TYPE=="FISH TRAWL") %>% * filter(SEF_SPCODE %in% taxa$SEF_SPCODE) ``` * Note: this is not to say that coding in `base` is bad! * In this example, you could write simpler `base` code too. 
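
As a minimal sketch of that last point, the same filtering can be written in `base` with `%in%` and logical subsetting. The toy data frames below are made up for illustration (they are not the fisheries data); only the `SEF_SPCODE` and `GEAR_NAME` column names are taken from the slide.

```r
# Toy stand-ins for the fisheries tables (hypothetical values)
taxa <- data.frame(SEF_SPCODE = c(37000001, 37000002, 39000001))
samples <- data.frame(
  SEF_SPCODE = c(37000001, 37000001, 38500000, 37000002),
  GEAR_NAME  = "McKenna trawl"
)

# Keep taxa whose species code lies in the range of interest
taxa <- taxa[taxa$SEF_SPCODE > 37000000 & taxa$SEF_SPCODE < 38000000, , drop = FALSE]

# Keep only samples whose species code is among the retained taxa
samples <- samples[samples$SEF_SPCODE %in% taxa$SEF_SPCODE, ]
```

No loop and no `NA` placeholders are needed: `%in%` returns one logical value per row, which is used directly for subsetting.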
] ] --- count: false class: font_smaller # Data Manipulation: `base` & `dplyr` .grid[ .item50.border-right[ .scroll-fix[ ```r samples <- foreign::read.dbf("data/SEF_SAMP.DBF") samples <- samples[samples$GEAR_NAME=="McKenna trawl" & samples$GEAR_TYPE=="FISH TRAWL",] ## taxa data taxa<- foreign::read.dbf("data/SEF_TAXA.DBF") taxa <- taxa[taxa$SEF_SPCODE > 37000000 & taxa$SEF_SPCODE < 38000000 ,] ## clean sample data for(i in 1:dim(samples)[1]) if(!any(taxa$SEF_SPCODE==samples$SEF_SPCODE[i])) samples[i,] <- NA samples <- samples[!is.na(samples$SEF_SPCODE),] *samp.cnt <- tapply(samples$SEF_SPCODE,samples$SEF_SPCODE,length) *samp.cnt <- samp.cnt[samp.cnt>=10] ## taxa data for(i in 1:dim(taxa)[1]) if(!any(as.numeric(names(samp.cnt))==taxa$SEF_SPCODE[i])) taxa[i,] <- NA taxa <- na.omit(taxa) ``` ] ] .item50[ ```r taxa <- foreign::read.dbf("data/SEF_TAXA.DBF") %>% filter(SEF_SPCODE > 37000000 & SEF_SPCODE < 38000000) samples <- foreign::read.dbf("data/SEF_SAMP.DBF") %>% filter(GEAR_NAME=="McKenna trawl" & GEAR_TYPE=="FISH TRAWL") %>% filter(SEF_SPCODE %in% taxa$SEF_SPCODE) *spcode_ge10_samples <- samples %>% * group_by(SEF_SPCODE) %>% * summarise(n = n()) %>% # or just `tally()` * filter(n >= 10) %>% * pull(SEF_SPCODE) ``` In my opinion, `group_by` + `summarise` is the most compelling reason to use `dplyr` over its `base` counterparts. 
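
To see what the `group_by` + `summarise` step does, here is a small sketch on made-up species codes (toy data, not the fisheries records), placed next to the `tapply` equivalent from the `base` version. It assumes the `dplyr` package is installed.

```r
library(dplyr)

# Toy sample table with hypothetical species codes
samples <- data.frame(SEF_SPCODE = c(rep(101, 12), rep(102, 3), rep(103, 10)))

# dplyr: count rows per species code, keep codes with at least 10 samples
spcode_ge10 <- samples %>%
  group_by(SEF_SPCODE) %>%
  summarise(n = n()) %>%
  filter(n >= 10) %>%
  pull(SEF_SPCODE)

# base: the same counts via tapply(); the codes end up in the names
samp.cnt <- tapply(samples$SEF_SPCODE, samples$SEF_SPCODE, length)
samp.cnt <- samp.cnt[samp.cnt >= 10]
```

In the `dplyr` version the species codes stay in a column with their original type, whereas `tapply` stores them as character names that have to be converted back with `as.numeric(names(samp.cnt))`.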
] ] --- count: false class: font_smaller # Data Manipulation: `base` & `dplyr` .grid[ .item50.border-right[ .scroll-fix[ ```r samples <- foreign::read.dbf("data/SEF_SAMP.DBF") samples <- samples[samples$GEAR_NAME=="McKenna trawl" & samples$GEAR_TYPE=="FISH TRAWL",] ## taxa data taxa<- foreign::read.dbf("data/SEF_TAXA.DBF") taxa <- taxa[taxa$SEF_SPCODE > 37000000 & taxa$SEF_SPCODE < 38000000 ,] ## clean sample data for(i in 1:dim(samples)[1]) if(!any(taxa$SEF_SPCODE==samples$SEF_SPCODE[i])) samples[i,] <- NA samples <- samples[!is.na(samples$SEF_SPCODE),] samp.cnt <- tapply(samples$SEF_SPCODE,samples$SEF_SPCODE,length) samp.cnt <- samp.cnt[samp.cnt>=10] ## taxa data *for(i in 1:dim(taxa)[1]) * if(!any(as.numeric(names(samp.cnt))==taxa$SEF_SPCODE[i])) taxa[i,] <- NA *taxa <- na.omit(taxa) ``` ] ] .item50[ ```r taxa <- foreign::read.dbf("data/SEF_TAXA.DBF") %>% filter(SEF_SPCODE > 37000000 & SEF_SPCODE < 38000000) samples <- foreign::read.dbf("data/SEF_SAMP.DBF") %>% filter(GEAR_NAME=="McKenna trawl" & GEAR_TYPE=="FISH TRAWL") %>% filter(SEF_SPCODE %in% taxa$SEF_SPCODE) spcode_ge10_samples <- samples %>% group_by(SEF_SPCODE) %>% summarise(n = n()) %>% # or just `tally()` filter(n >= 10) %>% pull(SEF_SPCODE) *taxa2 <- taxa %>% * filter(SEF_SPCODE %in% spcode_ge10_samples) ``` ] ] --- class: font_smaller # Reading code like English grammar .grid[ .item.border-right[ ```r taxa <- foreign::read.dbf("data/SEF_TAXA.DBF") %>% filter(SEF_SPCODE > 37000000 & SEF_SPCODE < 38000000) samples <- foreign::read.dbf("data/SEF_SAMP.DBF") %>% filter(GEAR_NAME=="McKenna trawl" & GEAR_TYPE=="FISH TRAWL") %>% filter(SEF_SPCODE %in% taxa$SEF_SPCODE) spcode_ge10_samples <- samples %>% group_by(SEF_SPCODE) %>% summarise(n = n()) %>% # or just `tally()` filter(n >= 10) %>% pull(SEF_SPCODE) taxa2 <- taxa %>% filter(SEF_SPCODE %in% spcode_ge10_samples) ``` No intermediate naming to think or worry about! ] .item[ * Think of `%>%` as "then". 
* The code may read as more familiar to an English speaker, even without much specialist knowledge. * Of course, this is biased in favour of people who know English! ] ] --- class: transition middle # .circle-big[3] # Grammar of <br>Genomic Data Transformation <i class="fab fa-r-project blue"></i> `plyranges` 📦 .footnote.monash-bg-blue[ Lee, Cook & Lawrence (2019) plyranges: a grammar of genomic data transformation. *Genome Biology 20:1*. ] --- class: font_smaller # <i class="fas fa-dna"></i> Genomic Data from `HelloRangesData` .grid[ .item50.border-right[ ```r exons ``` ``` ## GRanges object with 459752 ranges and 3 metadata columns: ## seqnames ranges strand | name score tx_id ## <Rle> <IRanges> <Rle> | <character> <numeric> <character> ## [1] chr1 11874-12227 + | NR_046018_exon_0_0_chr1_11874_f 0 NR_046018 ## [2] chr1 12613-12721 + | NR_046018_exon_1_0_chr1_12613_f 0 NR_046018 ## [3] chr1 13221-14409 + | NR_046018_exon_2_0_chr1_13221_f 0 NR_046018 ## [4] chr1 14362-14829 - | NR_024540_exon_0_0_chr1_14362_r 0 NR_024540 ## [5] chr1 14970-15038 - | NR_024540_exon_1_0_chr1_14970_r 0 NR_024540 ## ... ... ... ... . ... ... ... ## [459748] chrY 59338754-59338859 + | NM_002186_exon_6_0_chrY_59338754_f 0 NM_002186 ## [459749] chrY 59338754-59338859 + | NM_176786_exon_7_0_chrY_59338754_f 0 NM_176786 ## [459750] chrY 59340194-59340278 + | NM_002186_exon_7_0_chrY_59340194_f 0 NM_002186 ## [459751] chrY 59342487-59343488 + | NM_002186_exon_8_0_chrY_59342487_f 0 NM_002186 ## [459752] chrY 59342487-59343488 + | NM_176786_exon_8_0_chrY_59342487_f 0 NM_176786 ## ------- ## seqinfo: 93 sequences from an unspecified genome; no seqlengths ``` ] .item50[ ```r gwas ``` ``` ## GRanges object with 17680 ranges and 1 metadata column: ## seqnames ranges strand | name ## <Rle> <IRanges> <Rle> | <character> ## [1] chr1 1005806 * | rs3934834 ## [2] chr1 1079198 * | rs11260603 ## [3] chr1 1247494 * | rs12103 ## [4] chr1 2069172 * | rs425277 ## [5] chr1 2069681 * | rs3753242 ## ... ... ... ... . ... 
## [17676] chrX 154014107 * | rs5987027 ## [17677] chrX 154014107 * | rs5987027 ## [17678] chrX 154233774 * | rs17281398 ## [17679] chrY 940180 * | rs4129148 ## [17680] chrY 6477300 * | rs5941160 ## ------- ## seqinfo: 93 sequences from an unspecified genome; no seqlengths ``` ] ] --- class: font_smaller # <i class="fas fa-code"></i> `library(plyranges)` .grid[ .item50.border-right[ Find all GWAS SNPs that overlap exons: ```r join_overlap_inner(gwas, exons) ``` ``` ## GRanges object with 3439 ranges and 4 metadata columns: ## seqnames ranges strand | name.x name.y score tx_id ## <Rle> <IRanges> <Rle> | <character> <character> <numeric> <character> ## [1] chr1 1079198 * | rs11260603 NR_038869_exon_2_0_chr1_1078119_f 0 NR_038869 ## [2] chr1 1247494 * | rs12103 NM_001256456_exon_1_0_chr1_1247398_r 0 NM_001256456 ## [3] chr1 1247494 * | rs12103 NM_001256460_exon_1_0_chr1_1247398_r 0 NM_001256460 ## [4] chr1 1247494 * | rs12103 NM_001256462_exon_1_0_chr1_1247398_r 0 NM_001256462 ## [5] chr1 1247494 * | rs12103 NM_001256463_exon_1_0_chr1_1247398_r 0 NM_001256463 ## ... ... ... ... . ... ... ... ... 
## [3435] chrX 153764217 * | rs1050828 NM_001042351_exon_9_0_chrX_153764152_r 0 NM_001042351 ## [3436] chrX 153764217 * | rs1050828 NM_000402_exon_9_0_chrX_153764152_r 0 NM_000402 ## [3437] chrX 153764217 * | rs1050828 NM_001042351_exon_9_0_chrX_153764152_r 0 NM_001042351 ## [3438] chrX 153764217 * | rs1050828 NM_000402_exon_9_0_chrX_153764152_r 0 NM_000402 ## [3439] chrX 153764217 * | rs1050828 NM_001042351_exon_9_0_chrX_153764152_r 0 NM_001042351 ## ------- ## seqinfo: 93 sequences from an unspecified genome; no seqlengths ``` ] .item50[ Generate 2bp splice sites on either side of the exons: ```r interweave(flank_left(exons, 2L), flank_right(exons, 2L), .id = "side") ``` ``` ## GRanges object with 919504 ranges and 4 metadata columns: ## seqnames ranges strand | name score tx_id side ## <Rle> <IRanges> <Rle> | <character> <numeric> <character> <character> ## [1] chr1 11872-11873 + | NR_046018_exon_0_0_chr1_11874_f 0 NR_046018 left ## [2] chr1 12228-12229 + | NR_046018_exon_0_0_chr1_11874_f 0 NR_046018 right ## [3] chr1 12611-12612 + | NR_046018_exon_1_0_chr1_12613_f 0 NR_046018 left ## [4] chr1 12722-12723 + | NR_046018_exon_1_0_chr1_12613_f 0 NR_046018 right ## [5] chr1 13219-13220 + | NR_046018_exon_2_0_chr1_13221_f 0 NR_046018 left ## ... ... ... ... . ... ... ... ... ## [919500] chrY 59340279-59340280 + | NM_002186_exon_7_0_chrY_59340194_f 0 NM_002186 right ## [919501] chrY 59342485-59342486 + | NM_002186_exon_8_0_chrY_59342487_f 0 NM_002186 left ## [919502] chrY 59343489-59343490 + | NM_002186_exon_8_0_chrY_59342487_f 0 NM_002186 right ## [919503] chrY 59342485-59342486 + | NM_176786_exon_8_0_chrY_59342487_f 0 NM_176786 left ## [919504] chrY 59343489-59343490 + | NM_176786_exon_8_0_chrY_59342487_f 0 NM_176786 right ## ------- ## seqinfo: 93 sequences from an unspecified genome; no seqlengths ``` ] ] --- class: refresher .grid[.item.center[ <br><br> # Data Science <br><br> is ... 
<br><br> {{content}} ] .item[ <img src="images/data-science-network.png" width = "700px"> ] ] -- <h1>Statistics for People?</h1> --- count: false class: refresher .grid[.item.center[ <br><br> # Data Science <br><br> is ... <br><br> # Statistics for People? ] .item[ <img src="images/data-science-network2.png" width = "700px"> ] ] <div class="corner-box" style = "left:38%;width:57%;bottom:40px;"> <b>Reality:</b> <ul> <li>Most people who use statistics are <i>not</i> trained foremost as statisticians.</li> <li>A lot of people need to use statistics.</li> </ul> </div> .footnote.monash-bg-green2[ Wickham (2015) Teaching Safe-Stats, Not Statistical Abstinence. *The American Statistician, Online Discussion* ] --- count: false class: refresher .grid[.item.center[ <br><br> # Data Science <br><br> is ... <br><br> # Statistics for People? ] .item[ <img src="images/data-science-network3.png" width = "700px"> ] ] <div class="corner-box" style = "left:38%;width:57%;bottom:40px;"> <ul> If statistical tools <span class="yellow"><b>leverage the cognition</b></span> of an everyday person or an average statistics user and <span class="yellow"><b>provide a cohesive and consistent system</b></span>, rather than requiring users to learn new, specialised or ad-hoc methods, wouldn't that help in <span class="yellow"><b>making statistics accessible</b></span> and <span class="yellow"><b>promoting statistical literacy</b></span>? </ul> </div> .footnote.monash-bg-green2[ Wickham & Grolemund (2017) R for Data Science: Import, Tidy, Transform, Visualize, and Model. 
*O'Reilly Media, Inc.* ] --- class: transition middle # .circle-big[4] # Grammar of <br>Experimental Design <i class="fab fa-r-project blue"></i> `edibble` 📦 <i class="fas fa-wrench"></i> (Work-In-Progress) --- # Typical course in experimental design <br>.font_small[(at least at the University of Sydney in 2017-2019)] Teach: * Completely Randomised Design * Randomised Complete Block Design * Latin Square Design * Balanced Incomplete Block Design * Factorial Design * <strike> 2</strike><sup>k</sup><strike> Factorial Design</strike> .font_small[(I removed this from 2018 and won't talk about it today)] * Split-plot Design .font_small[(I added this from 2018, among other concepts)] --- # Completely Randomised Design (CRD) .grid[ .item[ <br> <center> <img src="images/crd-eg1.png" width = "300px"/> </center> ] .item[ * `\(t\)` treatments randomised to `\(n\)` units <br> .model-box[ `$$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{treatment} + \texttt{error}$$` .font_small[(with constraints and distributional assumptions)] <br> <center> <img src="images/crd-eg1-anova.png" width = "700px"/> </center> ] ] ] --- # Randomised Complete Block Design (RCBD) .grid[ .item[ <br> <center> <img src="images/rcbd-eg1.png" width = "300px"/> </center> ] .item[ * `\(b\)` blocks of size `\(t\)` * `\(t\)` treatments randomised to `\(t\)` units within each block .model-box[ `$$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{treatment} + \texttt{block} + \texttt{error}$$` <center> <img src="images/rcbd-eg1-anova.png" width = "850px"/> </center> ] ] ] --- # Latin Square Design (LSD) .grid[ .item[ <br> <center> <img src="images/lsd-eg1.png" width = "300px"/> </center> ] .item[ * two orthogonal blocking factors (rows and columns), each with `\(t\)` levels * `\(t\)` treatments randomised to units such that every treatment appears exactly once in each row and each column .model-box[ `$$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{treatment} + \texttt{row} + \texttt{column} + \texttt{error}$$` <center> <img 
src="images/lsd-eg1-anova.png" width = "850px"/> </center> ] ] ] --- # Balanced Incomplete Block Design (BIBD) .grid[ .item[ <br> <center> <img src="images/bibd-eg1.png" width = "300px"/> </center> ] .item[ * `\(b\)` blocks of size `\(k < t\)` * `\(t\)` treatments randomised to units within each block such that every pair of treatments appears together in the same number of blocks .model-box[ `$$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{block} + \texttt{treatment} + \texttt{error}$$` <center> <img src="images/bibd-eg1-anova.png" width = "850px"/> </center> ] ] ] --- # Factorial Design .grid[ .item[ <br> <center> <img src="images/factorial-eg1.png" width = "300px"/> </center> ] .item[ * `\(ab = t\)` treatments randomised to `\(n\)` units * the treatments are all combinations of the levels of two factors, A and B .model-box[ `$$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{A} + \texttt{B} + \texttt{A:B} + \texttt{error}$$` <center> <img src="images/factorial-eg1-anova-top.png" width = "850px"/> <details style="font-size:4pt"><summary></summary> <img src="images/factorial-eg1-anova-middle.png" width = "850px"/> </details> <img src="images/factorial-eg1-anova-bottom.png" width = "850px"/> </center> ] ] ] --- # Split-plot Design .grid[ .item[ <br> <center> <img src="images/split-plot-eg1.png" width = "300px"/> </center> ] <div class="item" style="font-size: 0.85em"> <ul> <li> \(n_1\) whole plots, each consisting of \(b\) sub plots</li> <li>in total there are \(n\) sub plots</li> <li>treatment factor A is randomised to whole plots</li> <li>treatment factor B is randomised to sub plots within each whole plot</li> </ul> .model-box[ `$$\scriptsize \texttt{observation} = \texttt{mean} + \texttt{A} + \texttt{WP} + \texttt{B} + \texttt{A:B} + \texttt{error}$$` <center> <img src="images/split-plot-eg1-anova.png" width = "850px"/> </center> ] </div> ] --- class: transition center middle # CRAN Task View of Design of Experiments contains # 📦 .yellow[229 R-packages ] 
.font_small[as of 2020-07-14] .font_small[(Please note that there may be some web-scraping errors)] --- class: center # Top 10 downloaded R-packages in 2018 ![](images/top10-1.png)<!-- --> # `agricolae` is the most downloaded .font_small[(data from `cranlogs` from 2018-01-01 to 2018-12-31)] .font_small[Please note that `agricolae` imports `AlgDesign`, and more recent data has `AlgDesign` on top, followed by `agricolae`. ] --- class: font_small # `agricolae::design.crd` .blue[**Completely randomised design**] for `\(t = 3\)` treatments with `\(2\)` replicates each <pre><code> trt <- c("A", "B", "C") agricolae::.bg-yellow[design.crd](trt = trt, r = 2) %>% glimpse() </code></pre> .scroll-350[ ``` ## List of 2 ## $ parameters:List of 7 ## ..$ design: chr "crd" ## ..$ trt : chr [1:3] "A" "B" "C" ## ..$ r : num [1:3] 2 2 2 ## ..$ serie : num 2 ## ..$ seed : int 396269021 ## ..$ kinds : chr "Super-Duper" ## ..$ : logi TRUE ## $ book :'data.frame': 6 obs. of 3 variables: ## ..$ plots: num [1:6] 101 102 103 104 105 106 ## ..$ r : int [1:6] 1 1 2 1 2 2 ## ..$ trt : chr [1:6] "C" "A" "C" "B" ... ``` ] <div class="plot-box" style="position:absolute;top: 35%; right: 50px;"> <img src="images/ggcrd-1.png"> </div> --- class: font_small # `agricolae::design.rcbd` .blue[**Randomised complete block design**] for `\(t = 3\)` treatments with `\(2\)` blocks <pre><code> trt <- c("A", "B", "C") agricolae::.bg-yellow[design.rcbd](trt = trt, r = 2) %>% glimpse() </code></pre> .scroll-350[ ``` ## List of 3 ## $ parameters:List of 7 ## ..$ design: chr "rcbd" ## ..$ trt : chr [1:3] "A" "B" "C" ## ..$ r : num 2 ## ..$ serie : num 2 ## ..$ seed : int 973575053 ## ..$ kinds : chr "Super-Duper" ## ..$ : logi TRUE ## $ sketch : chr [1:2, 1:3] "C" "A" "B" "C" ... ## $ book :'data.frame': 6 obs. 
of 3 variables: ## ..$ plots: num [1:6] 101 102 103 201 202 203 ## ..$ block: Factor w/ 2 levels "1","2": 1 1 1 2 2 2 ## ..$ trt : Factor w/ 3 levels "A","B","C": 3 2 1 1 3 2 ``` ] <div class="plot-box" style="position:absolute;bottom: 10px; right: 50px;"> <br> <img src="images/ggrcbd-1.png"> </div> --- class: font_small # `agricolae::design.lsd()` .blue[**Latin square design**] for `\(t = 3\)` treatments <pre><code> trt <- c("A", "B", "C") agricolae::.bg-yellow[design.lsd](trt = trt) %>% glimpse() </code></pre> .scroll-350[ ``` ## List of 3 ## $ parameters:List of 7 ## ..$ design: chr "lsd" ## ..$ trt : chr [1:3] "A" "B" "C" ## ..$ r : int 3 ## ..$ serie : num 2 ## ..$ seed : int -1984440067 ## ..$ kinds : chr "Super-Duper" ## ..$ : logi TRUE ## $ sketch : chr [1:3, 1:3] "B" "C" "A" "A" ... ## $ book :'data.frame': 9 obs. of 4 variables: ## ..$ plots: num [1:9] 101 102 103 201 202 203 301 302 303 ## ..$ row : Factor w/ 3 levels "1","2","3": 1 1 1 2 2 2 3 3 3 ## ..$ col : Factor w/ 3 levels "1","2","3": 1 2 3 1 2 3 1 2 3 ## ..$ trt : Factor w/ 3 levels "A","B","C": 2 1 3 3 2 1 1 3 2 ``` ] <div class="plot-box" style="position:absolute;top: 100px; right: 50px;"> <br> <img src="images/gglsd-1.png"> </div> --- class: font_small # `agricolae::design.bib()` .blue[**Balanced incomplete block design**] for `\(t = 3\)` treatments with block size of `\(2\)` <pre><code> trt <- c("A", "B", "C") agricolae::.bg-yellow[design.bib](trt = trt, k = 2) %>% glimpse() </code></pre> .scroll-350[ ``` ## [1] "No improvement over initial random design." ## ## Parameters BIB ## ============== ## Lambda : 1 ## treatmeans : 3 ## Block size : 2 ## Blocks : 3 ## Replication: 2 ## ## Efficiency factor 0.75 ## ## <<< Book >>> ``` ``` ## List of 4 ## $ parameters:List of 6 ## ..$ design: chr "bib" ## ..$ trt : chr [1:3] "A" "B" "C" ## ..$ k : num 2 ## ..$ serie : num 2 ## ..$ seed : int -509134623 ## ..$ kinds : chr "Super-Duper" ## $ statistics:'data.frame': 1 obs. 
of 6 variables: ## ..$ lambda : num 1 ## ..$ treatmeans: int 3 ## ..$ blockSize : num 2 ## ..$ blocks : int 3 ## ..$ r : num 2 ## ..$ Efficiency: num 0.75 ## $ sketch : chr [1:3, 1:2] "C" "B" "B" "A" ... ## $ book :'data.frame': 6 obs. of 3 variables: ## ..$ plots: num [1:6] 101 102 201 202 301 302 ## ..$ block: Factor w/ 3 levels "1","2","3": 1 1 2 2 3 3 ## ..$ trt : Factor w/ 3 levels "A","B","C": 3 1 2 1 2 3 ``` ] <div class="plot-box" style="position:absolute;top: 200px; right: 50px;"> <br> <img src="images/ggbibd-1.png"> </div> --- class: font_small # `agricolae::design.ab()` .blue[**Factorial design**] for `\(t = 3 \times 2\)` treatments with `\(2\)` replicates for each treatment <pre><code> agricolae::.bg-yellow[design.ab](trt = c(3, 2), r = 2, design = "crd") %>% glimpse() </code></pre> .scroll-350[ ``` ## List of 2 ## $ parameters:List of 8 ## ..$ design : chr "factorial" ## ..$ trt : chr [1:6] "1 1" "1 2" "2 1" "2 2" ... ## ..$ r : num [1:6] 2 2 2 2 2 2 ## ..$ serie : num 2 ## ..$ seed : int 801301585 ## ..$ kinds : chr "Super-Duper" ## ..$ : logi TRUE ## ..$ applied: chr "crd" ## $ book :'data.frame': 12 obs. of 4 variables: ## ..$ plots: num [1:12] 101 102 103 104 105 106 107 108 109 110 ... ## ..$ r : int [1:12] 1 1 1 2 1 1 1 2 2 2 ... ## ..$ A : chr [1:12] "2" "3" "2" "2" ... ## ..$ B : chr [1:12] "2" "2" "1" "2" ... ``` ] Note: this is *not* A/B testing! 
<div class="plot-box" style="position:absolute;top: 200px; right: 50px;">
<br>
<img src="images/ggfac-1.png">
<br>
</div>

---
class: font_small

# `agricolae::design.split()`

.blue[**Split-plot design**] for `\(t = 2 \times 4\)` treatments with `\(2\)` replicates of each treatment

<pre><code>
trt1 <- c("I", "R"); trt2 <- LETTERS[1:4]
agricolae::.bg-yellow[design.split](trt1 = trt1, trt2 = trt2, r = 2, design = "crd") %>%
  glimpse()
</code></pre>

.scroll-350[
```
## List of 2
## $ parameters:List of 8
## ..$ design : chr "split"
## ..$ : logi TRUE
## ..$ trt1 : chr [1:2] "I" "R"
## ..$ applied: chr "crd"
## ..$ r : num [1:2] 2 2
## ..$ serie : num 2
## ..$ seed : int -765020087
## ..$ kinds : chr "Super-Duper"
## $ book :'data.frame': 16 obs. of 5 variables:
## ..$ plots : num [1:16] 101 101 101 101 102 102 102 102 103 103 ...
## ..$ splots: Factor w/ 4 levels "1","2","3","4": 1 2 3 4 1 2 3 4 1 2 ...
## ..$ r : int [1:16] 1 1 1 1 1 1 1 1 2 2 ...
## ..$ trt1 : chr [1:16] "I" "I" "I" "I" ...
## ..$ trt2 : chr [1:16] "D" "A" "C" "B" ...
```
]

<div class="plot-box" style="position:absolute;top: 300px; right: 50px;">
<img src="images/split-plot-graph-1.png">
</div>

---

# `library(edibble)` <i class="fas fa-wrench"></i> WIP <br>.font_small[https://github.com/emitanaka/edibble<br>(sorry, the code is not ready for prime time yet; please enjoy the prototype demo instead)]

* The `tibble` R-package is a modern reimagining of the `data.frame`
* `edibble` (WIP) creates experimental design tibbles
{{content}}

.footnote[
Müller & Wickham (2020) tibble: Simple Data Frames. *R package version 3.0.3*
]

--

* 🤔 "named experimental design" functions (`agricolae::design.crd`, etc.) are like "named statistical graphic" functions (`pie`, `barplot`)
{{content}}

--

* 💡 construction of experimental designs needs to be made more accessible, modifiable, extensible and generalisable
{{content}}

--

* What are experimental designs composed of?
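
One way to approach this question is to separate a design into its units, its treatments, and the rule that allocates treatments to units. The base R sketch below does this for 60 varieties on 120 plots, with and without blocks. The names `allocate`, `unit`, `block` and `trt` are hypothetical and are not the `edibble` API.

```r
# Hypothetical helper: allocate treatments to units at random,
# optionally restricting the randomisation to within each block.
allocate <- function(units, trts, block = NULL, seed = 2020) {
  set.seed(seed)
  if (is.null(block)) block <- rep(1L, length(units))
  trt <- character(length(units))
  for (b in unique(block)) {
    idx <- which(block == b)
    # Recycle treatments to fill the block, then shuffle them
    trt[idx] <- sample(rep(trts, length.out = length(idx)))
  }
  data.frame(unit = units, block = block, trt = trt)
}

units <- sprintf("plot%03d", 1:120)
trts <- sprintf("var%02d", 1:60)

# Completely randomised design: one pool of 120 plots
crd <- allocate(units, trts)

# Randomised complete block design: 2 blocks of 60 plots each
rcbd <- allocate(units, trts, block = rep(c("B1", "B2"), each = 60))
table(rcbd$block, rcbd$trt)[, 1:5]  # counts for the first 5 varieties
```

The units and treatments are identical in both calls; only the restriction on the randomisation changes, and that alone turns one named design into another.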
---
class: font_smaller

# Grammar of Experimental Design

.grid[.item[

* Consider a field experiment with 120 plots

]
.item[

```r
library(edibble)
edibble(seed = 2020) %>%
  set_units(plot = 120)
```

```
## # An edibble: 120 x 1
## plot
## <unit>
## 1 plot001
## 2 plot002
## 3 plot003
## 4 plot004
## 5 plot005
## 6 plot006
## 7 plot007
## 8 plot008
## 9 plot009
## 10 plot010
## # … with 110 more rows
```

]
]

---
class: font_smaller
count: false

# Prototype Grammar of Experimental Design

.grid[.item[

* Consider a field experiment with 120 plots
* There are 60 wheat varieties to test

]
.item50[

```r
library(edibble)
edibble(seed = 2020) %>%
  set_units(plot = 120) %>%
  set_trts(var = 60)
```

```
## # An edibble: 0 x 2
## # … with 2 variables: plot <unit>, var <trt>
```

```
## Warning: `plot` and `var` have no connection with other variables.
```

]
]

---
class: font_smaller
count: false

# Prototype Grammar of Experimental Design .circle[1]

.grid[.item[

* Consider a field experiment with 120 plots.
* There are 60 wheat varieties to test.
* Completely randomise wheat varieties to plots.

<br><br>
The resulting design is a .blue[completely randomised design].

]
.item50[

```r
library(edibble)
edibble(seed = 2020) %>%
  set_units(plot = 120) %>%
  set_trts(var = 60) %>%
  randomise_trts(var ~ plot)
```

```
## # An edibble: 120 x 2
## plot var
## <unit> <trt>
## 1 plot001 var49
## 2 plot002 var28
## 3 plot003 var25
## 4 plot004 var33
## 5 plot005 var36
## 6 plot006 var54
## 7 plot007 var16
## 8 plot008 var52
## 9 plot009 var52
## 10 plot010 var10
## # … with 110 more rows
```

]
]

---
class: font_smaller

# Prototype Grammar of Experimental Design .circle[2]

.grid[.item[

* Consider a field experiment with 2 blocks, each with 60 plots.
* There are 60 wheat varieties to test.
* Completely randomise wheat varieties to plots within each block.

<br><br>
The resulting design is a .blue[randomised complete block design].
<br>
Can you see how it differs from the previous design?
]
.item50[

```r
library(edibble)
edibble(seed = 2020) %>%
* set_units(block = c("B1", "B2"), plot = within(block, 60)) %>%
  set_trts(var = 60) %>%
  randomise_trts(var ~ plot) %>%
* restrict_mapping(ed_nest(plot = block))
```

```
## # An edibble: 120 x 3
## block plot var
## <unit> <unit> <trt>
## 1 B1 plot001 var34
## 2 B1 plot002 var56
## 3 B1 plot003 var50
## 4 B1 plot004 var02
## 5 B1 plot005 var07
## 6 B1 plot006 var53
## 7 B1 plot007 var44
## 8 B1 plot008 var31
## 9 B1 plot009 var39
## 10 B1 plot010 var40
## # … with 110 more rows
```

]
]

---
class: font_smaller

# Visualising the Experimental Design

.grid[.item[

```r
des1 <- edibble(seed = 2020) %>%
  set_units(row = 6, col = 6, pot = ~ row * col) %>% # 36 pots in total
  set_trts(irrigation = c("Y", "N"),
           variety = c("V1", "V2", "V3")) %>%
  randomise_trts(irrigation * variety ~ pot)

des2 <- des1 %>%
  restrict_mapping(ed_nest(pot = c(col, row)))

des3 <- des1 %>%
  restrict_mapping(irrigation ~ col)
```

* `des1` is a completely randomised design
* `des2` is a Latin square design
* `des3` is a split-plot design

]
.item[
{{content}}
]
]

--

<img src="images/autoplots1.png" width = "200px">
{{content}}

--

<img src="images/autoplots2.png" width = "200px">
{{content}}

--

<img src="images/autoplots3.png" width = "200px">

--

.corner-box[
Small touches to help the user:

* The graphical object is a `ggplot`, so the same `ggplot2` grammar can be used to customise it.
* The figure keeps a fixed width-to-height (aspect) ratio for easy viewing.
* Factorial experiments: the treatment factor with the higher number of levels is mapped to the **hue of the color**, and the other treatment factor is mapped to the **shade of the color**.
]

---
class: font_smaller

# Unbalanced or non-orthogonal experiments .circle[1]

```r
edibble() %>%
  set_units(class = c("Maths" = 2, "Stats" = 4)) # 2 Maths classes and 4 Stats classes
# OR
edibble() %>%
  set_units(class = traits(labels = c("Maths", "Stats"),
                           replication = c(2, 4)))
# OR
edibble() %>%
  set_units(class = c("Maths", "Maths", "Stats", "Stats", "Stats", "Stats"))
```

* Under the hood, the units (and treatments) are all set by `traits`.
* Shorthand inputs for `set_units` and `set_trts` are (a) a single integer, (b) an unnamed vector, (c) a named vector and (d) a formula.
* To avoid ambiguity, the user can always use `traits` instead; e.g. (a) and (b) may not be distinguishable if the vector is of size 1.

---
class: font_smaller

# Unbalanced or non-orthogonal experiments .circle[2]

```r
edibble() %>%
  set_units(class = c("A", "B", "C", "D"),
            student = within(class,
                             "A" ~ 3, # 3 students in class "A"
                             "B" ~ 4, # 4 students in class "B"
                             . ~ 2)) # 2 students for the rest of the classes
```

--

What if students are shared between classes?
--

```r
edibble() %>%
  set_units(class = c("A", "B", "C", "D"),
            student = within(class,
                             "A" ~ c("Bob", "Mary", "Helen"),
                             "B" ~ c("Helen", "Robert", "Max", "Ana"),
                             "C" ~ c("Helen", "Max"),
                             "D" ~ c("Max", "Ana")))
```

---
class: font_smaller

# BONUS<br>Design of (made-up) single cell experiments

```r
edibble(seed = 2020) %>%
  set_units(patient = 48, # 48 patients
            # extract 200-400 cells from each patient
            cell = within(patient, oRanges(min = 200, max = 400)),
            batch = assemble(cell, 100)) # 100 cells per batch
```

.scroll-350[
```
## # An edibble: 9,600 x 3
## patient cell batch
## <unit> <unit> <unit>
## 1 patient01 cell001 batch1
## 2 patient01 cell002 batch2
## 3 patient01 cell003 batch3
## 4 patient01 cell004 batch4
## 5 patient01 cell005 batch5
## 6 patient01 cell006 batch6
## 7 patient01 cell007 batch7
## 8 patient01 cell008 batch8
## 9 patient01 cell009 batch9
## 10 patient01 cell010 batch10
## # … with 9,590 more rows
```

```
## Warning: `edibble` contructured assuming 200 cells for each patient
```
]

---

# <i class="fas fa-clock"></i> Timeline

.grid[.item[

* By the end of 2020, I intend for `edibble` to be able to construct textbook (mostly orthogonal) designs.
* From 2021, I will start to deploy it in practice for plant breeding experiments.
* Clinical trials, survey designs, adaptive designs, and other designs that require sample size calculation or have an undetermined number of units won't be tackled until the foundation has been built with agricultural experiments.

]
.item.center[

<br><br><br>

# <i class="fas fa-comments"></i> Feedback is welcome!

<br><br>
Slides can be found at <a href="https://www.emitanaka.org/slides/MonashBioinfo2020/" style="font-size:20pt">emitanaka.org/slides/MonashBioinfo2020/</a>

]
]

---
class: font_smaller

# Acknowledgements

These slides were made using the `xaringan` R-package and the following system.
.scroll-350[

```r
sessioninfo::session_info()
```

```
## ─ Session info ───────────────────────────────────────────────────────────────
## setting value
## version R version 4.0.1 (2020-06-06)
## os macOS Catalina 10.15.5
## system x86_64, darwin17.0
## ui X11
## language (EN)
## collate en_AU.UTF-8
## ctype en_AU.UTF-8
## tz Australia/Melbourne
## date 2020-07-15
##
## ─ Packages ───────────────────────────────────────────────────────────────────
## package * version date lib
## agricolae 1.3-3 2020-06-07 [1]
## AlgDesign 1.2.0 2019-11-29 [1]
## anicon 0.1.0 2020-06-21 [1]
## assertthat 0.2.1 2019-03-21 [2]
## backports 1.1.8 2020-06-17 [1]
## Biobase 2.48.0 2020-04-27 [1]
## BiocGenerics * 0.34.0 2020-04-27 [1]
## BiocParallel 1.22.0 2020-04-27 [1]
## Biostrings 2.56.0 2020-04-27 [1]
## bitops 1.0-6 2013-08-17 [1]
## blob 1.2.1 2020-01-20 [2]
## broom 0.7.0 2020-07-09 [1]
## cellranger 1.1.0 2016-07-27 [2]
## cli 2.0.2 2020-02-28 [2]
## cluster 2.1.0 2019-06-19 [2]
## colorspace 1.4-1 2019-03-18 [2]
## combinat 0.0-8 2012-10-29 [1]
## cranlogs 2.1.1 2019-04-29 [1]
## crayon 1.3.4 2017-09-16 [2]
## curl 4.3 2019-12-02 [2]
## DBI 1.1.0 2019-12-15 [2]
## dbplyr 1.4.4 2020-05-27 [2]
## DelayedArray 0.14.0 2020-04-27 [1]
## digest 0.6.25 2020-02-23 [2]
## dplyr * 1.0.0 2020-05-29 [2]
## edibble * 0.0.0.9000 2020-07-15 [1]
## ellipsis 0.3.1 2020-05-15 [2]
## emo 0.0.0.9000 2020-06-26 [1]
## evaluate 0.14 2019-05-28 [2]
## fansi 0.4.1 2020-01-08 [2]
## farver 2.0.3 2020-01-16 [2]
## fastmap 1.0.1 2019-10-08 [2]
## forcats * 0.5.0 2020-03-01 [2]
## foreign 0.8-80 2020-05-24 [1]
## fs 1.4.2 2020-06-30 [1]
## generics 0.0.2 2018-11-29 [2]
## GenomeInfoDb * 1.24.2 2020-06-15 [1]
## GenomeInfoDbData 1.2.3 2020-07-13 [1]
## GenomicAlignments 1.24.0 2020-04-27 [1]
## GenomicRanges * 1.40.0 2020-04-27 [1]
## ggplot2 * 3.3.2 2020-06-19 [1]
## glue 1.4.1 2020-05-13 [2]
## gtable 0.3.0 2019-03-25 [2]
## haven 2.3.1 2020-06-01 [2]
## highr 0.8 2019-03-20 [2]
## hms 0.5.3 2020-01-08 [2]
## htmltools 0.5.0 2020-06-16 [1]
## httpuv 1.5.4 2020-06-06 [2]
## httr 1.4.1 2019-08-05 [2]
## icon 0.1.0 2020-06-21 [1]
## IRanges * 2.22.2 2020-05-21 [1]
## jsonlite 1.7.0 2020-06-25 [1]
## klaR 0.6-15 2020-02-19 [1]
## knitr 1.29 2020-06-23 [1]
## labeling 0.3 2014-08-23 [2]
## labelled 2.5.0 2020-06-17 [1]
## later 1.1.0.1 2020-06-05 [2]
## lattice 0.20-41 2020-04-02 [2]
## lifecycle 0.2.0 2020-03-06 [1]
## lubridate 1.7.9 2020-06-08 [2]
## magrittr 1.5 2014-11-22 [2]
## MASS 7.3-51.6 2020-04-26 [2]
## Matrix 1.2-18 2019-11-27 [2]
## matrixStats 0.56.0 2020-03-13 [1]
## mime 0.9 2020-02-04 [2]
## miniUI 0.1.1.1 2018-05-18 [1]
## modelr 0.1.8 2020-05-19 [2]
## munsell 0.5.0 2018-06-12 [2]
## nlme 3.1-148 2020-05-24 [2]
## pillar 1.4.6 2020-07-10 [1]
## pkgconfig 2.0.3 2019-09-22 [2]
## plyranges * 1.9.3 2020-07-13 [1]
## promises 1.1.1 2020-06-09 [1]
## purrr * 0.3.4 2020-04-17 [2]
## questionr 0.7.1 2020-05-26 [1]
## R6 2.4.1 2019-11-12 [2]
## Rcpp 1.0.5 2020-07-06 [1]
## RCurl 1.98-1.2 2020-04-18 [1]
## readr * 1.3.1 2018-12-21 [2]
## readxl 1.3.1 2019-03-13 [2]
## reprex 0.3.0 2019-05-16 [2]
## rlang 0.4.7 2020-07-09 [1]
## rmarkdown 2.3 2020-06-18 [1]
## Rsamtools 2.4.0 2020-04-27 [1]
## rstudioapi 0.11 2020-02-07 [2]
## rtracklayer 1.47.0 2020-04-07 [1]
## rvest 0.3.5 2019-11-08 [2]
## S4Vectors * 0.26.1 2020-05-16 [1]
## scales 1.1.1 2020-05-11 [2]
## sessioninfo 1.1.1 2018-11-05 [2]
## shiny 1.5.0 2020-06-23 [1]
## stringi 1.4.6 2020-02-17 [2]
## stringr * 1.4.0 2019-02-10 [2]
## SummarizedExperiment 1.18.2 2020-07-09 [1]
## tibble * 3.0.3 2020-07-10 [1]
## tidyr * 1.1.0 2020-05-20 [2]
## tidyselect 1.1.0 2020-05-11 [2]
## tidyverse * 1.3.0 2019-11-21 [2]
## utf8 1.1.4 2018-05-24 [2]
## vctrs 0.3.1.9000 2020-07-10 [1]
## withr 2.2.0 2020-04-20 [2]
## xaringan 0.16 2020-03-31 [2]
## xfun 0.15 2020-06-21 [1]
## XML 3.99-0.4 2020-07-05 [1]
## xml2 1.3.2 2020-04-23 [2]
## xtable 1.8-4 2019-04-21 [2]
## XVector 0.28.0 2020-04-27 [1]
## yaml 2.2.1 2020-02-01 [1]
## zeallot 0.1.0 2018-01-28 [1]
## zlibbioc 1.34.0 2020-04-27 [1]
## source
## CRAN (R 4.0.1)
## CRAN (R 4.0.1)
## Github (emitanaka/anicon@0b756df)
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## Bioconductor
## Bioconductor
## Bioconductor
## Bioconductor
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## CRAN (R 4.0.1)
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## CRAN (R 4.0.1)
## CRAN (R 4.0.0)
## CRAN (R 4.0.1)
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## Bioconductor
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## local
## CRAN (R 4.0.0)
## Github (hadley/emo@3f03b11)
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## Bioconductor
## Bioconductor
## Bioconductor
## Bioconductor
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## CRAN (R 4.0.1)
## CRAN (R 4.0.0)
## Github (emitanaka/icon@8458546)
## Bioconductor
## CRAN (R 4.0.1)
## CRAN (R 4.0.1)
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## CRAN (R 4.0.1)
## CRAN (R 4.0.1)
## CRAN (R 4.0.1)
## CRAN (R 4.0.0)
## CRAN (R 4.0.1)
## CRAN (R 4.0.0)
## CRAN (R 4.0.1)
## CRAN (R 4.0.1)
## CRAN (R 4.0.1)
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## CRAN (R 4.0.1)
## CRAN (R 4.0.1)
## CRAN (R 4.0.0)
## Github (sa-lee/plyranges@0cf2e40)
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## CRAN (R 4.0.1)
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## CRAN (R 4.0.1)
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## CRAN (R 4.0.1)
## CRAN (R 4.0.1)
## Bioconductor
## CRAN (R 4.0.0)
## Bioconductor
## CRAN (R 4.0.0)
## Bioconductor
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## CRAN (R 4.0.1)
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## Bioconductor
## CRAN (R 4.0.1)
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## Github (r-lib/vctrs@edf507d)
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## CRAN (R 4.0.1)
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## Bioconductor
## CRAN (R 4.0.0)
## CRAN (R 4.0.0)
## Bioconductor
##
## [1] /Users/etan0038/Library/R/4.0/library
## [2] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
```

]

(Scroll on the HTML slide to see all)