class: monash-bg-blue center middle hide-slide-number

<div class="bg-black white" style="width:45%;right:0;bottom:0;padding-left:5px;border: solid 4px white;margin: auto;">
<i class="fas fa-exclamation-circle"></i> These slides are best viewed in Chrome and occasionally need to be refreshed if elements do not load properly. See here for <a href=part1-session2.pdf>PDF <i class="fas fa-file-pdf"></i></a>.
</div>

<br>

.white[Push the **right arrow key** to see the next slide.]

---
count: false
background-image: url(images/bg8.jpg)
background-size: cover
class: hide-slide-number title-slide

<div class="grid-row" style="grid: 1fr / 2fr;">

.item.center[

# <span style="text-shadow: 2px 2px 30px white;">Data Visualisation with R<br>Workshop Part 1</span>

<!-- ## <span style="color:;text-shadow: 2px 2px 30px black;">Multiple layers, facetting and tidying your data</span> -->

]

.center.shade_black.animated.bounceInUp.slower[

<br><br>

## Multiple layers, facetting and tidying your data

<br>

Presented by Emi Tanaka

Department of Econometrics and Business Statistics

<img src="images/monash-one-line-reversed.png" style="width:500px"><br>
<i class="fas fa-envelope faa-float animated "></i>
emi.tanaka@monash.edu
<i class="fab fa-twitter faa-float animated faa-fast "></i>
@statsgen

.bottom_abs.width100.bg-black[
6th Dec 2021 @ Statistical Society of Australia NSW Branch | Online
]

]

</div>

---

# Add multiple layers

<center>
<img src="images/ggplot-multiple-layers.png" width="68%"/>
</center>

.grid.font_smaller[
.item[

Each layer inherits mapping and data from `ggplot` by default.

```r
ggplot(penguins, aes(x = species, y = bill_length_mm)) +
  geom_violin() +
  geom_boxplot() +
  geom_point()
```

]
.item[

<img src="images/part1-session2/plot1-1.png" style="display: block; margin: auto;" />

]
]

---
class: font_smaller

# Order of the layers matters!

The order of the boxplot and violin layers is switched between the two plots.

<div class="grid" style="grid: 250px 250px / 1fr 1fr;">

.item[

```r
ggplot(penguins, aes(species, bill_length_mm)) +
* geom_violin() +
* geom_boxplot() +
  geom_point()
```

]
.item[

<img src="images/part1-session2/plot1a-1.png" style="display: block; margin: auto;" />

]
.item[

```r
ggplot(penguins, aes(species, bill_length_mm)) +
* geom_boxplot() +
* geom_violin() +
  geom_point()
```

]
.item[

<img src="images/part1-session2/plot1b-1.png" style="display: block; margin: auto;" />

]

</div>

---
class: font_smaller

# Layer-specific data and aesthetic mappings

<center>
<img src="images/ggplot-multiple-layers-data-mapping.png" width="60%"/>
</center>

.grid[
.item[

For each layer, the aesthetic mappings and/or the data can be overridden.

```r
ggplot(penguins, aes(species, bill_length_mm)) +
  geom_violin(aes(fill = species)) +
  geom_boxplot(data = filter(penguins, species=="Adelie")) +
  geom_point(data = filter(penguins, species=="Gentoo"),
             aes(y = bill_depth_mm))
```

]
.item[

<img src="images/part1-session2/plot2-1.png" style="display: block; margin: auto;" />

]
]

---
class: font_smaller

# Aesthetic or Attribute?
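Inside `aes()` a value is treated as data and passed through a scale; outside `aes()` it is used as is. A minimal sketch that checks this without looking at a plot, assuming only that `ggplot2` is installed. The data frame `toy` is made up for illustration and is not part of the workshop data; `layer_data()` returns the values a layer is drawn with.

```r
library(ggplot2)

# Hypothetical toy data, not from the workshop
toy <- data.frame(x = 1:3, y = c(2, 4, 3))

# Aesthetic: "blue" is a data value, mapped through the colour scale
p_aes <- ggplot(toy) +
  geom_point(aes(x, y, color = "blue"))

# Attribute: "blue" is used directly as the point colour
p_attr <- ggplot(toy) +
  geom_point(aes(x, y), color = "blue")

unique(layer_data(p_aes)$colour)   # a default palette colour, not "blue"
unique(layer_data(p_attr)$colour)  # "blue"
```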
.grid[.item[

Not what you want

```r
ggplot(penguins) +
  geom_point(aes(body_mass_g, bill_depth_mm, color = "blue"))
```

<img src="images/part1-session2/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

]
.item[

{{content}}

]
]

--

What you want

```r
ggplot(penguins) +
  geom_point(aes(body_mass_g, bill_depth_mm), color = "blue")
```

<img src="images/part1-session2/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />

{{content}}

--

Tip: wrapping the value in `I()` inside `aes()` yields the same result as above.

```r
ggplot(penguins) +
  geom_point(aes(body_mass_g, bill_depth_mm, color = I("blue")))
```

---
class: font_smaller

# `group` in `ggplot`

.grid[
.item[

```r
ggplot(penguins, aes(body_mass_g, bill_depth_mm)) +
  geom_point(aes(color = species)) +
  geom_smooth(method = "lm")
```

<img src="images/part1-session2/unnamed-chunk-5-1.png" style="display: block; margin: auto;" />

* This is an obvious case of Simpson's paradox.
* What if we wanted to draw the fit of a simple linear model for each cluster?
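
The reversal can also be checked numerically. The sketch below uses a small made-up data frame (`toy`, with hypothetical columns `grp`, `x` and `y`) rather than `penguins`, and assumes `ggplot2` is installed. The pooled slope is negative while the slope within each group is positive, and mapping `group` in `geom_smooth()` gives one fitted line per group.

```r
library(ggplot2)

# Hypothetical toy data: y increases with x within each group,
# but group "b" sits lower and further right than group "a"
toy <- data.frame(
  grp = rep(c("a", "b"), each = 5),
  x   = 1:10,
  y   = c(11.2, 11.9, 13, 14.1, 14.8,  # group a
          2.2, 2.9, 4, 5.1, 5.8)       # group b
)

# Slope of one linear model fitted to all the data
pooled <- coef(lm(y ~ x, data = toy))[["x"]]

# Slope of a separate linear model for each group
within <- sapply(split(toy, toy$grp),
                 function(d) coef(lm(y ~ x, data = d))[["x"]])

pooled  # negative
within  # positive for both groups

# One fitted line per group via the group aesthetic
p <- ggplot(toy, aes(x, y)) +
  geom_point(aes(color = grp)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE,
              aes(group = grp))
```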
]
.item[

{{content}}

]
]

--

```r
ggplot(penguins, aes(body_mass_g, bill_depth_mm)) +
  geom_point(aes(color = species)) +
  geom_smooth(method = "lm", aes(group = species=="Gentoo"))
```

<img src="images/part1-session2/unnamed-chunk-6-1.png" style="display: block; margin: auto;" />

---
class: font_smaller

# Facetting

```r
g <- ggplot(penguins, aes(bill_length_mm, bill_depth_mm, color = species)) +
  geom_point()
```

---

.grid[
.item[

```r
g
```

<img src="images/part1-session2/unnamed-chunk-8-1.png" style="display: block; margin: auto;" />

```r
g + facet_wrap(~sex)
```

<img src="images/part1-session2/plot3-1.png" style="display: block; margin: auto;" />

]
.item[

```r
g + facet_grid(island ~ sex)
```

<img src="images/part1-session2/unnamed-chunk-9-1.png" style="display: block; margin: auto;" />

]
]

---
class: font_small

# `facet_wrap` and `facet_grid`

.grid[
.item[

```r
g + facet_wrap( ~ sex)
```

<img src="images/part1-session2/unnamed-chunk-10-1.png" style="display: block; margin: auto;" />

```r
g + facet_wrap( ~ sex, ncol = 1)
```

<img src="images/part1-session2/unnamed-chunk-11-1.png" style="display: block; margin: auto;" />

]
.item[

```r
g + facet_grid(. ~ sex)
```

<img src="images/part1-session2/unnamed-chunk-12-1.png" style="display: block; margin: auto;" />

```r
g + facet_grid(sex ~ .)
```

<img src="images/part1-session2/unnamed-chunk-13-1.png" style="display: block; margin: auto;" />

]
]

---
background-color: #e5e5e5

<div class="grid" style="grid: 1fr / 3fr 1fr;">

.item[

<a href="https://github.com/rstudio/cheatsheets/blob/master/data-visualization-2.1.pdf"><img src="images/ggplot-cheatsheet.png" width = "100%" style = "border: solid 3px black;"/></a>

]

.item[

HELP!
* RStudio > Help > Cheatsheets
* [R4DS Community Slack](https://www.rfordatasci.com/)
* [Twitter with hashtag #rstats](https://twitter.com/search?q=%23rstats)
* [RStudio Community](https://community.rstudio.com/)
* [Stack Overflow](https://stackoverflow.com/questions/tagged/ggplot)

]

</div>

---
class: transition middle animated slideInLeft

# .circle-big[2]

# Tidying your data

.footnote.monash-bg-blue[
Wickham (2014) Tidy Data. *Journal of Statistical Software* 59 (10): 1–23.
]

---
class: font_small

# Weight gain in pigs for different treatments

The `crampton.pig` data is from the `agridat` 📦

```r
library(agridat)
glimpse(crampton.pig)
```

```
## Rows: 50
## Columns: 5
## $ treatment <fct> T1, T1, T1, T1, T1, T1, T1, T1, T1, T1, T2, T2, T2, T2, T2, …
## $ rep       <fct> R1, R2, R3, R4, R5, R6, R7, R8, R9, R10, R1, R2, R3, R4, R5,…
## $ weight1   <int> 30, 21, 21, 33, 27, 24, 20, 29, 28, 26, 26, 24, 20, 35, 25, …
## $ feed      <int> 674, 628, 661, 694, 713, 585, 575, 638, 632, 637, 699, 626, …
## $ weight2   <int> 195, 177, 180, 200, 197, 170, 150, 180, 192, 184, 194, 204, …
```

`weight1` is the initial weight and `weight2` is the final weight.

.footnote[
Wright (2018). agridat: Agricultural Datasets. R package version 1.16. https://CRAN.R-project.org/package=agridat

Crampton and Hopkins (1934). The Use of the Method of Partial Regression in the Analysis of Comparative Feeding Trial Data, Part II. *The Journal of Nutrition* 8 113-123.
]

---
class: font_smaller

<br>

```r
names(crampton.pig)
```

```
## [1] "treatment" "rep"       "weight1"   "feed"      "weight2"
```

<img src="images/part1-session2/pig-plot-1.png" style="display: block; margin: auto;" />

What are the mappings to get the above graph? 🤔

```r
ggplot(crampton.pig, aes(x = ???, y = ???)) +
  geom_point() +
  geom_line() +
  facet_grid(. ~ treatment)
```

.center[
🤨
]

---
class: font_smaller

# Getting the data in the right form

<img src="images/part1-session2/pig-plot-1.png" style="display: block; margin: auto;" />

.grid[.item.border-right[

* The x-axis is the time when the pig was weighed
* The y-axis is the weight
* The facetting is by treatment

```
## # A tibble: 100 × 5
##    treatment  feed id    when    weight
##    <fct>     <int> <chr> <fct>    <int>
##  1 T1          674 pig1  initial     30
##  2 T1          674 pig1  final      195
##  3 T1          628 pig2  initial     21
##  4 T1          628 pig2  final      177
##  5 T1          661 pig3  initial     21
##  6 T1          661 pig3  final      180
##  7 T1          694 pig4  initial     33
##  8 T1          694 pig4  final      200
##  9 T1          713 pig5  initial     27
## 10 T1          713 pig5  final      197
## # … with 90 more rows
```

]
.item[

{{content}}

]]

--

How I wrangled this data:

```r
pig_df <- crampton.pig %>%
  mutate(id = paste0("pig", 1:n())) %>%
  pivot_longer(c(weight1, weight2),
               names_to = "when",
               values_to = "weight") %>%
  mutate(when = factor(when,
                       levels = c("weight1", "weight2"),
                       labels = c("initial", "final")))
```

<div class="font_small">
(Note: data wrangling is not covered in this workshop; please see <a href="http://emitanaka.org/datawrangle-workshop-ssavic" target="_blank">here</a> if you want to learn more.)
</div>

---
class: font_small

# Putting it all together

```r
ggplot(pig_df, aes(when, weight)) +  # tidying your data for plotting
  geom_point(size = 3) +             # attribute not aesthetic
  geom_line(aes(group = id)) +       # grouping
  facet_grid(. ~ treatment) +        # facetting
  labs(x = "")                       # we'll learn this in the last session
```

<img src="images/part1-session2/pig-plot-1.png" style="display: block; margin: auto;" />

---
class: font_smaller

# Meaningfully order categorical variables

.grid[
.item.border-right[

```r
ggplot(crampton.pig, aes(treatment, weight2 - weight1)) +
  geom_point(size = 3)
```

<img src="images/part1-session2/unnamed-chunk-18-1.png" style="display: block; margin: auto;" />

* Treatments are ordered alphabetically by default
* It's better to order categorical variables meaningfully

]
.item[

{{content}}

]
]

--

Order factor levels by the mean of the weight difference.

```r
library(forcats) # for easy factor manipulation
crampton.pig2 <- crampton.pig %>%
  mutate(treatment = fct_reorder(treatment, weight2 - weight1, mean))

ggplot(crampton.pig2, aes(treatment, weight2 - weight1)) +
  geom_point(size = 3)
```

<img src="images/part1-session2/unnamed-chunk-19-1.png" style="display: block; margin: auto;" />

---
class: font_smaller

# Plotting auxiliary data

.grid[
.item[

Plot you may want:

<img src="images/part1-session2/diff-plot-1.png" style="display: block; margin: auto;" />

]
.item[

{{content}}

]
]

--

One way to do this:

```r
fig <- ggplot(crampton.pig2, aes(treatment, weight2 - weight1)) +
  geom_point(size = 3) +
  stat_summary(fun.data = mean_se, geom = "pointrange", fatten = 2,
               color = "#027EB6", size = 3) +
  stat_summary(fun = mean, geom = "line", group = 1,
               color = "#027EB6", size = 2)
```

---
class: font_smaller

# Plotting annotations

.grid[
.item.border-right[

```r
fig + geom_text(data = data.frame(treatment = 4.5, weight2 = 185, weight1 = 0),
                label = "Treatment\n means", size = 3, color = "#027EB6",
                fontface = "bold")
```

<img src="images/part1-session2/unnamed-chunk-20-1.png" style="display: block; margin: auto;" />

]
.item[

{{content}}

]
]

--

But it might be easier to just use `annotate()`:

```r
fig + annotate("text", x = 4.5, y = 185, label = "Treatment\n means",
               size = 3, color = "#027EB6", fontface = "bold")
```

<img src="images/part1-session2/unnamed-chunk-21-1.png" style="display: block; margin: auto;" />

---
class: exercise middle hide-slide-number

<i class="fas fa-users"></i>

# <i class="fas fa-code"></i> Open `part1-exercise-02.Rmd`

<center>
15:00
</center>

---
class: font_smaller
background-color: #e5e5e5

# Session Information

.scroll-350[

```r
devtools::session_info()
```

```
## ─ Session info 🍼 🏖️ 🦸🏽 ──────────────────────────────────────────────────
##  hash: baby bottle, beach with umbrella, superhero: medium skin tone
##
##  setting  value
##  version  R version 4.1.2 (2021-11-01)
##  os       macOS Big Sur 10.16
##  system   x86_64, darwin17.0
##  ui       X11
##  language (EN)
##  collate  en_AU.UTF-8
##  ctype    en_AU.UTF-8
##  tz       Australia/Melbourne
##  date     2021-12-06
##  pandoc   2.11.4 @ /Applications/RStudio.app/Contents/MacOS/pandoc/ (via rmarkdown)
##
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package        * version date (UTC) lib source
##  agridat        * 1.18    2021-01-12 [1] CRAN (R 4.1.0)
##  anicon           0.1.0   2021-11-30 [1] Github (emitanaka/anicon@0b756df)
##  assertthat       0.2.1   2019-03-21 [1] CRAN (R 4.1.0)
##  backports        1.3.0   2021-10-27 [1] CRAN (R 4.1.0)
##  broom            0.7.10  2021-10-31 [1] CRAN (R 4.1.0)
##  bslib            0.3.1   2021-10-06 [1] CRAN (R 4.1.0)
##  cachem           1.0.6   2021-08-19 [1] CRAN (R 4.1.0)
##  callr            3.7.0   2021-04-20 [1] CRAN (R 4.1.0)
##  cellranger       1.1.0   2016-07-27 [1] CRAN (R 4.1.0)
##  cli              3.1.0   2021-10-27 [1] CRAN (R 4.1.0)
##  colorspace       2.0-2   2021-06-24 [1] CRAN (R 4.1.0)
##  countdown        0.3.5   2021-11-30 [1] Github (gadenbuie/countdown@a544fa4)
##  crayon           1.4.2   2021-10-29 [1] CRAN (R 4.1.0)
##  DBI              1.1.1   2021-01-15 [1] CRAN (R 4.1.0)
##  dbplyr           2.1.1   2021-04-06 [1] CRAN (R 4.1.0)
##  desc             1.4.0   2021-09-28 [1] CRAN (R 4.1.0)
##  devtools         2.4.2   2021-06-07 [1] CRAN (R 4.1.0)
##  digest           0.6.28  2021-09-23 [1] CRAN (R 4.1.0)
##  dplyr          * 1.0.7   2021-06-18 [1] CRAN (R 4.1.0)
##  ellipsis         0.3.2   2021-04-29 [1] CRAN (R 4.1.0)
##  evaluate         0.14    2019-05-28 [1] CRAN (R 4.1.0)
##  fansi            0.5.0   2021-05-25 [1] CRAN (R 4.1.0)
##  farver           2.1.0   2021-02-28 [1] CRAN (R 4.1.0)
##  fastmap          1.1.0   2021-01-25 [1] CRAN (R 4.1.0)
##  forcats        * 0.5.1   2021-01-27 [1] CRAN (R 4.1.0)
##  fs               1.5.0   2020-07-31 [1] CRAN (R 4.1.0)
##  generics         0.1.1   2021-10-25 [1] CRAN (R 4.1.0)
##  ggplot2        * 3.3.5   2021-06-25 [1] CRAN (R 4.1.0)
##  glue             1.5.0   2021-11-07 [1] CRAN (R 4.1.0)
##  gtable           0.3.0   2019-03-25 [1] CRAN (R 4.1.0)
##  haven            2.4.3   2021-08-04 [1] CRAN (R 4.1.0)
##  highr            0.9     2021-04-16 [1] CRAN (R 4.1.0)
##  hms              1.1.1   2021-09-26 [1] CRAN (R 4.1.0)
##  htmltools        0.5.2   2021-08-25 [1] CRAN (R 4.1.0)
##  httr             1.4.2   2020-07-20 [1] CRAN (R 4.1.0)
##  icon             0.1.0   2021-11-30 [1] Github (emitanaka/icon@8458546)
##  jquerylib        0.1.4   2021-04-26 [1] CRAN (R 4.1.0)
##  jsonlite         1.7.2   2020-12-09 [1] CRAN (R 4.1.0)
##  knitr            1.36    2021-09-29 [1] CRAN (R 4.1.0)
##  labeling         0.4.2   2020-10-20 [1] CRAN (R 4.1.0)
##  lattice          0.20-45 2021-09-22 [1] CRAN (R 4.1.2)
##  lifecycle        1.0.1   2021-09-24 [1] CRAN (R 4.1.0)
##  lubridate        1.8.0   2021-10-07 [1] CRAN (R 4.1.0)
##  magrittr         2.0.1   2020-11-17 [1] CRAN (R 4.1.0)
##  Matrix           1.3-4   2021-06-01 [1] CRAN (R 4.1.2)
##  memoise          2.0.0   2021-01-26 [1] CRAN (R 4.1.0)
##  mgcv             1.8-38  2021-10-06 [1] CRAN (R 4.1.2)
##  modelr           0.1.8   2020-05-19 [1] CRAN (R 4.1.0)
##  munsell          0.5.0   2018-06-12 [1] CRAN (R 4.1.0)
##  nlme             3.1-153 2021-09-07 [1] CRAN (R 4.1.2)
##  palmerpenguins * 0.1.0   2020-07-23 [1] CRAN (R 4.1.0)
##  pillar           1.6.4   2021-10-18 [1] CRAN (R 4.1.0)
##  pkgbuild         1.2.0   2020-12-15 [1] CRAN (R 4.1.0)
##  pkgconfig        2.0.3   2019-09-22 [1] CRAN (R 4.1.0)
##  pkgload          1.2.3   2021-10-13 [1] CRAN (R 4.1.0)
##  prettyunits      1.1.1   2020-01-24 [1] CRAN (R 4.1.0)
##  processx         3.5.2   2021-04-30 [1] CRAN (R 4.1.0)
##  ps               1.6.0   2021-02-28 [1] CRAN (R 4.1.0)
##  purrr          * 0.3.4   2020-04-17 [1] CRAN (R 4.1.0)
##  R6               2.5.1   2021-08-19 [1] CRAN (R 4.1.0)
##  Rcpp             1.0.7   2021-07-07 [1] CRAN (R 4.1.0)
##  readr          * 2.1.0   2021-11-11 [1] CRAN (R 4.1.0)
##  readxl           1.3.1   2019-03-13 [1] CRAN (R 4.1.0)
##  remotes          2.4.1   2021-09-29 [1] CRAN (R 4.1.0)
##  reprex           2.0.1   2021-08-05 [1] CRAN (R 4.1.0)
##  rlang            0.4.12  2021-10-18 [1] CRAN (R 4.1.0)
##  rmarkdown        2.11    2021-09-14 [1] CRAN (R 4.1.0)
##  rprojroot        2.0.2   2020-11-15 [1] CRAN (R 4.1.0)
##  rstudioapi       0.13    2020-11-12 [1] CRAN (R 4.1.0)
##  rvest            1.0.2   2021-10-16 [1] CRAN (R 4.1.0)
##  sass             0.4.0   2021-05-12 [1] CRAN (R 4.1.0)
##  scales           1.1.1   2020-05-11 [1] CRAN (R 4.1.0)
##  sessioninfo      1.2.1   2021-11-02 [1] CRAN (R 4.1.0)
##  stringi          1.7.5   2021-10-04 [1] CRAN (R 4.1.0)
##  stringr        * 1.4.0   2019-02-10 [1] CRAN (R 4.1.0)
##  testthat         3.1.0   2021-10-04 [1] CRAN (R 4.1.0)
##  tibble         * 3.1.6   2021-11-07 [1] CRAN (R 4.1.0)
##  tidyr          * 1.1.4   2021-09-27 [1] CRAN (R 4.1.0)
##  tidyselect       1.1.1   2021-04-30 [1] CRAN (R 4.1.0)
##  tidyverse      * 1.3.1   2021-04-15 [1] CRAN (R 4.1.0)
##  tzdb             0.2.0   2021-10-27 [1] CRAN (R 4.1.0)
##  usethis          2.1.3   2021-10-27 [1] CRAN (R 4.1.0)
##  utf8             1.2.2   2021-07-24 [1] CRAN (R 4.1.0)
##  vctrs            0.3.8   2021-04-29 [1] CRAN (R 4.1.0)
##  whisker          0.4     2019-08-28 [1] CRAN (R 4.1.0)
##  withr            2.4.2   2021-04-18 [1] CRAN (R 4.1.0)
##  xaringan         0.22    2021-06-23 [1] CRAN (R 4.1.0)
##  xfun             0.28    2021-11-04 [1] CRAN (R 4.1.0)
##  xml2             1.3.2   2020-04-23 [1] CRAN (R 4.1.0)
##  yaml             2.2.1   2020-02-01 [1] CRAN (R 4.1.0)
##
##  [1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library
##
## ──────────────────────────────────────────────────────────────────────────────
```

]

These slides are licensed under <br><center><a href="https://creativecommons.org/licenses/by-sa/3.0/au/"><img src="images/cc.svg" style="height:2em;"/><img src="images/by.svg" style="height:2em;"/><img src="images/sa.svg" style="height:2em;"/></a></center>