class: monash-bg-blue center middle hide-slide-number <div class="bg-black white" style="width:45%;right:0;bottom:0;padding-left:5px;border: solid 4px white;margin: auto;"> <i class="fas fa-exclamation-circle"></i> These slides are viewed best by Chrome and occasionally need to be refreshed if elements did not load properly. See here for <a href=part2-session4.pdf>PDF <i class="fas fa-file-pdf"></i></a>. </div> .white[Push the **right arrow key** to see the next slide.] --- count: false background-image: url(images/d2bg4.jpg) background-size: cover class: hide-slide-number title-slide <div class="grid-row" style="grid: 1fr / 2fr;"> .item.center[ # <span style="text-shadow: 2px 2px 30px white;">Data Visualization with R <br> Workshop Part 2</span> <!-- ## <span style="color:yellow;text-shadow: 2px 2px 30px black;">Determining the best plot design</span> --> ] .center.shade_black.animated.bounceInUp.slower[ <br><br> ## Determining the best plot design <br> Presented by Di Cook Department of Econometrics and Business Statistics <img src="images/monash-one-line-reversed.png" style="width:500px"><br>
<i class="fas fa-envelope faa-float animated "></i>
dicook@monash.edu
<i class="fab fa-twitter faa-float animated faa-fast "></i>
@visnut .bottom_abs.width100.bg-black[ 6th Dec 2021 @ Statistical Society of Australia NSW Branch | Zoom ] ] </div> --- class: middle background-image: \url(images/who_wore_it_better.jpg) background-size: 40% background-position: 99% 50% .font_large[Let's play a game: <br>Which plot wears it better?] --- class: font_smaller2 On the next slide we have made **two different plots** of 2012 TB incidence in Australia, based on two variables:
```
## # A tibble: 5 × 3
##   sex   age_group count
##   <chr> <fct>     <dbl>
## 1 m     15-24        26
## 2 m     25-34        40
## 3 m     35-44        17
## 4 m     45-54        25
## 5 m     55-64        16
```
- In arrangement A, separate plots are made for age, and sex is mapped to the x axis.
- Conversely, in arrangement B, separate plots are made for sex, and age is mapped to the x axis.

If you were to answer the question: **At which age(s) are the counts for males and females relatively the same?** Which plot makes this easier? --- .pull-left[ <img src="images/day2-session4/focus on one year gender side-by-side bars of males/females-1.png" width="100%" style="display: block; margin: auto;" /> <img src="images/day2-session4/focus on one year age side-by-side bars of age group-1.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ We've got two different rearrangements of the same information. **At which age(s) are the counts for males and females relatively the same?** Which plot makes this easier?
<br> What do we learn? What is different about each? What's the focus of each? What's easy, what's harder? ] --- class: transition middle animated slideInLeft Arrangement A makes it easier to directly compare male and female counts, separately for each age group. Generally, male counts are higher than female counts. There is a big difference between counts in the 45-54 age group, and counts for the over-65 group are almost the same. Arrangement B makes it easier to directly compare counts by age group, separately for females and males. For females, incidence drops in the middle years. For males, it is pretty consistently high across age groups. --- class: font_smaller2 .pull-left[
Try to write out a question that would be easier to answer from arrangement B.
] .pull-right[ <img src="images/day2-session4/unnamed-chunk-6-1.png" width="100%" style="display: block; margin: auto;" /> <img src="images/day2-session4/unnamed-chunk-7-1.png" width="100%" style="display: block; margin: auto;" /> ]
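
The two arrangements compared on the previous slides could be coded roughly as follows. This is a minimal sketch, not the code behind the images: `tb_2012` is a hypothetical stand-in data frame, the male counts are the five rows printed earlier, and the female counts are made-up placeholders so that the example runs.

```r
library(ggplot2)

# Hypothetical stand-in for the 2012 data.
# Male counts are the five rows printed earlier in the slides;
# female counts are invented placeholders, NOT real TB data.
tb_2012 <- data.frame(
  sex = rep(c("m", "f"), each = 5),
  age_group = factor(rep(c("15-24", "25-34", "35-44", "45-54", "55-64"),
                         times = 2)),
  count = c(26, 40, 17, 25, 16,  # males, from the printed tibble
            20, 30, 15, 10, 12)  # females, placeholder values
)

# Arrangement A: one panel per age group, sex on the x axis,
# so the male and female bars sit next to each other.
arrangement_a <- ggplot(tb_2012, aes(x = sex, y = count, fill = sex)) +
  geom_col() +
  facet_wrap(~age_group, ncol = 5) +
  scale_fill_brewer(name = "", palette = "Dark2") +
  ggtitle("Arrangement A")

# Arrangement B: one panel per sex, age group on the x axis,
# so the bars for the age groups sit next to each other.
arrangement_b <- ggplot(tb_2012,
                        aes(x = age_group, y = count, fill = age_group)) +
  geom_col() +
  facet_wrap(~sex, ncol = 2) +
  scale_fill_brewer(name = "", palette = "Dark2") +
  ggtitle("Arrangement B")
```

Printing `arrangement_a` or `arrangement_b` draws the plot. The only change between the two is which variable goes to `facet_wrap()` and which goes to the x axis, and that choice decides which comparison is made within a panel.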
--- class: font_smaller2 On the next slide we have made **two different plots** of TB incidence in Australia, based on three variables:
```
## # A tibble: 5 × 4
##    year sex   age_group count
##   <dbl> <chr> <fct>     <dbl>
## 1  1997 m     15-24         8
## 2  1997 m     25-34        24
## 3  1997 m     35-44        18
## 4  1997 m     45-54        13
## 5  1997 m     55-64        17
```
- In plot type A, a line plot of counts is drawn separately by age and sex, and year is mapped to the x axis.
- Conversely, in plot type B, counts for males and females are stacked in a bar chart, drawn separately by age, and year is mapped to the x axis.

If you were to answer the question: **The trend in incidence over years for females is generally decreasing?** Which plot makes this easier? --- .pull-left[ <img src="images/day2-session4/use a line plot instead of bar-1.png" width="100%" style="display: block; margin: auto;" /> <img src="images/day2-session4/colour and axes fixes-1.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ Which type of plot makes it easier to answer: **The trend in incidence over years for females is generally flat?** What are the pros and cons of each way of displaying the same information? Should specific limits be set on the axes?
] --- class: transition middle animated slideInLeft Plot type A makes it easier to examine the trend for each group. This plot should probably have used 0 as the lower limit. Plot type B really only allows the overall trend in count to be examined separately by age. It is also possible to see the trend for males. The trend for females is buried because the bars start at irregular heights. The separated bars distract from digesting the overall count. --- .pull-left[ The following plots focus on the proportion of males vs females. Plot A computes the proportion and displays this as a line plot. Plot B uses a 100% chart of stacked bars for females and males. What are the strengths and weaknesses of each?
] .pull-right[ <img src="images/day2-session4/use a line plot for proportions-1.png" width="100%" style="display: block; margin: auto;" /> <img src="images/day2-session4/compare proportions of males/females-1.png" width="100%" style="display: block; margin: auto;" /> ] --- class: transition middle animated slideInLeft Plot A makes it easier to examine the trend in proportion. It is easy to miss that all the proportions are greater than 0.5, despite having a guideline (white) at 0.5. It could be argued that setting the vertical axis limits could alleviate this. The fluctuations from year to year are more visible. Adding a trend model might help to reduce this noise. Without colour it's less visually appealing. Plot B makes it easier to see that the proportion for males is almost always higher than for females. It also suggests that there is little temporal trend, because the small fluctuations between years are less visible. Having colour makes it more visually appealing. There is less data processing. --- # Perceptual principles - Hierarchy of mappings - Pre-attentive: some elements are noticed before you even realise it. - Color palettes: qualitative, sequential, diverging, *palindrome*. - Proximity: Place elements for primary comparison close together. - Change blindness: When focus is interrupted differences may not be noticed. 
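
As a small illustration of the first three palette types listed above, here is a sketch using the RColorBrewer package. The choice of `"Dark2"`, `"Blues"` and `"RdBu"` is an assumption made for the example (only `"Dark2"` appears in the plotting code of these slides), and the palindrome type is not shown.

```r
library(RColorBrewer)

# Qualitative: distinct hues with no implied order,
# suited to unordered categories such as sex
qualitative <- brewer.pal(n = 3, name = "Dark2")

# Sequential: light-to-dark steps of one hue,
# suited to values that run from low to high
sequential <- brewer.pal(n = 5, name = "Blues")

# Diverging: two hues that meet at a neutral midpoint,
# suited to values on either side of a reference point
diverging <- brewer.pal(n = 5, name = "RdBu")

# Each palette is a character vector of hex colour codes
qualitative
sequential
diverging
```

Calling `display.brewer.all()` draws every available palette, grouped by type, which is a quick way to browse the options before picking one.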
--- .grid[ .item.center[ <img src="images/day2-session4/task-position-common-scale-1.svg"><img src="images/day2-session4/task-position-non-aligned-scale-1.svg" style="padding-left:10px;"><img src="images/day2-session4/task-length-1.svg" style="padding-left:10px;"><img src="images/day2-session4/task-direction-1.svg" style="padding-left:10px;"><img src="images/day2-session4/task-angle-1.svg" style="padding-left:10px;"> <img src="images/day2-session4/task-area-1.svg"><img src="images/day2-session4/task-volume-1.svg" style="padding-left:10px;"><img src="images/day2-session4/task-curvature-1.svg" style="padding-left:10px;"><img src="images/day2-session4/task-texture.svg" style="padding-left:10px;" width="220pt" height="172pt"><img src="images/day2-session4/task-shape-1.svg" style="padding-left:10px;"> {{content}} ]] --- # Hierarchy of mappings .pull-left[ 1. Position - common scale (BEST) 2. Position - nonaligned scale 3. Length, direction, angle 4. Area 5. Volume, curvature 6. Shading, color (WORST) (Cleveland, 1984; Heer and Bostock, 2009)
Try to come up with a plot type for one of the mappings.
] -- .pull-right[ 1. scatterplot, barchart 2. side-by-side boxplot, stacked barchart 3. piechart, rose plot, gauge plot, donut, wind direction map, starplot 4. treemap, bubble chart, mosaicplot 5. chernoff face 6. choropleth map ] --- # Pre-attentive Can you find the odd one out? <img src="images/day2-session4/is shape preattentive-1.png" width="50%" style="display: block; margin: auto;" /> --- # Pre-attentive Is it easier now? <img src="images/day2-session4/is color preattentive-1.png" width="50%" style="display: block; margin: auto;" /> --- # Proximity Place elements that you want to compare close to each other. If there are multiple comparisons to make, you need to decide which one is most important. .font_small[ ```r ggplot(tb_oz, aes(x = year, y = count, colour = sex)) + geom_line() + geom_point() + facet_wrap(~age_group, ncol = 6) + ylim(c(0, 70)) + scale_colour_brewer(name = "", palette = "Dark2") + ggtitle("Arrangement A") ``` ```r ggplot(tb_oz, aes(x = year, y = count, colour = age_group)) + geom_line() + geom_point() + facet_wrap(~sex, ncol = 2) + ylim(c(0, 70)) + scale_colour_brewer(name = "", palette = "Dark2") + ggtitle("Arrangement B") ``` ] --- <img src="images/day2-session4/unnamed-chunk-12-1.png" width="70%" style="display: block; margin: auto;" /> <img src="images/day2-session4/unnamed-chunk-13-1.png" width="50%" style="display: block; margin: auto;" /> --- # Mapping and proximity .pull-left[ Same proximity is used, but different geoms. Is one better than the other to determine the relative ratios of males to females by age? ] .pull-right[ <img src="images/day2-session4/side-by-side bars of males/females-1.png" width="100%" style="display: block; margin: auto;" /> <img src="images/day2-session4/piecharts of males/females-1.png" width="100%" style="display: block; margin: auto;" /> ] --- # Mapping and proximity .pull-left[ Same proximity is used, but different geoms. Is one better than the other to determine the relative ratios of ages by sex? 
] .pull-right[ <img src="images/day2-session4/side-by-side bars of age-1.png" width="100%" style="display: block; margin: auto;" /> <img src="images/day2-session4/piecharts of age-1.png" width="100%" style="display: block; margin: auto;" /> ] --- # Change blindness Which has the steeper slope, 15-24 or 25-34 males? <img src="images/day2-session4/unnamed-chunk-14-1.png" width="70%" style="display: block; margin: auto;" /> --- # Change blindness Which has the steeper slope, 15-24 or 25-34 males? Making comparisons across plots requires the eye to jump from one focal point to another. It may result in not noticing differences. <img src="images/day2-session4/averlaying makes comparisons easier-1.png" width="60%" style="display: block; margin: auto;" /> --- background-image: \url(https://pbs.twimg.com/profile_images/1092451626781163523/0YzJMi-8_400x400.jpg) background-size: 20% background-position: 100% 50% class: transition middle animated slideInLeft Let's play one more game. --- # Which one is different? <img src="images/day2-session4/lineup 1-1.png" width="50%" style="display: block; margin: auto;" /> --- # Which one is different? <img src="images/day2-session4/lineup 2-1.png" width="50%" style="display: block; margin: auto;" /> --- # Testing infrastructure Both of these were quite easy. The testing procedure is called a lineup protocol: 1. Based on the grammar description of the plot, determine a null generating method (eg permute, simulate) 2. Generate many null plots, and embed your data plot randomly among them 3. Show to a good number of observers (two sample problem) and ask them to pick the plot that is different. (Crowd-sourcing can help.) 4. The plot type/style that has the larger proportion of observers detecting the data plot is the better design. --- # Resources - [Fundamentals of Data Visualization, Claus O. Wilke](https://serialmentor.com/dataviz/) - Hofmann, H., Follett, L., Majumder, M. and Cook, D. 
(2012) Graphical Tests for Power Comparison of Competing Designs, http://doi.ieeecomputersociety.org/10.1109/TVCG.2012.230. - Wickham, H., Cook, D., Hofmann, H. and Buja, A. (2010) Graphical Inference for Infovis, http://doi.ieeecomputersociety.org/10.1109/TVCG.2010.161. --- class: exercise middle hide-slide-number <i class="fas fa-users"></i> # <i class="fas fa-code"></i> Open `part2-exercise-04.Rmd` <center>
</center> --- class: font_smaller background-color: #e5e5e5 # Session Information .scroll-350[ ``` ## ─ Session info 🤘🏻 🈯 〰️ ───────────────────────────────────────────────── ## hash: sign of the horns: light skin tone, Japanese “reserved” button, wavy dash ## ## setting value ## version R version 4.1.2 (2021-11-01) ## os macOS Big Sur 10.16 ## system x86_64, darwin17.0 ## ui X11 ## language (EN) ## collate en_AU.UTF-8 ## ctype en_AU.UTF-8 ## tz Australia/Melbourne ## date 2021-11-30 ## pandoc 2.11.4 @ /Applications/RStudio.app/Contents/MacOS/pandoc/ (via rmarkdown) ## ## ─ Packages ─────────────────────────────────────────────────────────────────── ## package * version date (UTC) lib source ## anicon 0.1.0 2021-11-30 [1] Github (emitanaka/anicon@0b756df) ## assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.1.0) ## backports 1.3.0 2021-10-27 [1] CRAN (R 4.1.0) ## bit 4.0.4 2020-08-04 [1] CRAN (R 4.1.0) ## bit64 4.0.5 2020-08-30 [1] CRAN (R 4.1.0) ## broom 0.7.10 2021-10-31 [1] CRAN (R 4.1.0) ## bslib 0.3.1 2021-10-06 [1] CRAN (R 4.1.0) ## cachem 1.0.6 2021-08-19 [1] CRAN (R 4.1.0) ## callr 3.7.0 2021-04-20 [1] CRAN (R 4.1.0) ## cellranger 1.1.0 2016-07-27 [1] CRAN (R 4.1.0) ## class 7.3-19 2021-05-03 [1] CRAN (R 4.1.2) ## cli 3.1.0 2021-10-27 [1] CRAN (R 4.1.0) ## cluster 2.1.2 2021-04-17 [1] CRAN (R 4.1.2) ## colorspace * 2.0-2 2021-06-24 [1] CRAN (R 4.1.0) ## countdown 0.3.5 2021-11-30 [1] Github (gadenbuie/countdown@a544fa4) ## crayon 1.4.2 2021-10-29 [1] CRAN (R 4.1.0) ## DBI 1.1.1 2021-01-15 [1] CRAN (R 4.1.0) ## dbplyr 2.1.1 2021-04-06 [1] CRAN (R 4.1.0) ## DEoptimR 1.0-9 2021-05-24 [1] CRAN (R 4.1.0) ## desc 1.4.0 2021-09-28 [1] CRAN (R 4.1.0) ## devtools 2.4.2 2021-06-07 [1] CRAN (R 4.1.0) ## digest 0.6.28 2021-09-23 [1] CRAN (R 4.1.0) ## diptest 0.76-0 2021-05-04 [1] CRAN (R 4.1.0) ## dplyr * 1.0.7 2021-06-18 [1] CRAN (R 4.1.0) ## ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.0) ## evaluate 0.14 2019-05-28 [1] CRAN (R 4.1.0) ## fansi 0.5.0 2021-05-25 [1] CRAN (R 
4.1.0) ## farver 2.1.0 2021-02-28 [1] CRAN (R 4.1.0) ## fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.0) ## flexmix 2.3-17 2020-10-12 [1] CRAN (R 4.1.0) ## forcats * 0.5.1 2021-01-27 [1] CRAN (R 4.1.0) ## fpc 2.2-9 2020-12-06 [1] CRAN (R 4.1.0) ## fs 1.5.0 2020-07-31 [1] CRAN (R 4.1.0) ## generics 0.1.1 2021-10-25 [1] CRAN (R 4.1.0) ## ggplot2 * 3.3.5 2021-06-25 [1] CRAN (R 4.1.0) ## ggthemes * 4.2.4 2021-01-20 [1] CRAN (R 4.1.0) ## glue 1.5.0 2021-11-07 [1] CRAN (R 4.1.0) ## gridExtra 2.3 2017-09-09 [1] CRAN (R 4.1.0) ## gridSVG * 1.7-2 2020-04-28 [1] CRAN (R 4.1.0) ## gtable 0.3.0 2019-03-25 [1] CRAN (R 4.1.0) ## haven 2.4.3 2021-08-04 [1] CRAN (R 4.1.0) ## here 1.0.1 2020-12-13 [1] CRAN (R 4.1.0) ## highr 0.9 2021-04-16 [1] CRAN (R 4.1.0) ## hms 1.1.1 2021-09-26 [1] CRAN (R 4.1.0) ## htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.1.0) ## httr 1.4.2 2020-07-20 [1] CRAN (R 4.1.0) ## icon 0.1.0 2021-11-30 [1] Github (emitanaka/icon@8458546) ## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.1.0) ## jsonlite 1.7.2 2020-12-09 [1] CRAN (R 4.1.0) ## kernlab 0.9-29 2019-11-12 [1] CRAN (R 4.1.0) ## knitr 1.36 2021-09-29 [1] CRAN (R 4.1.0) ## labeling 0.4.2 2020-10-20 [1] CRAN (R 4.1.0) ## lattice 0.20-45 2021-09-22 [1] CRAN (R 4.1.2) ## lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.1.0) ## lmtest 0.9-39 2021-11-07 [1] CRAN (R 4.1.0) ## lubridate 1.8.0 2021-10-07 [1] CRAN (R 4.1.0) ## magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.1.0) ## MASS 7.3-54 2021-05-03 [1] CRAN (R 4.1.2) ## Matrix 1.3-4 2021-06-01 [1] CRAN (R 4.1.2) ## mclust 5.4.8 2021-11-05 [1] CRAN (R 4.1.0) ## memoise 2.0.0 2021-01-26 [1] CRAN (R 4.1.0) ## mgcv 1.8-38 2021-10-06 [1] CRAN (R 4.1.2) ## modelr 0.1.8 2020-05-19 [1] CRAN (R 4.1.0) ## modeltools 0.2-23 2020-03-05 [1] CRAN (R 4.1.0) ## moments 0.14 2015-01-05 [1] CRAN (R 4.1.0) ## munsell 0.5.0 2018-06-12 [1] CRAN (R 4.1.0) ## nlme 3.1-153 2021-09-07 [1] CRAN (R 4.1.2) ## nnet 7.3-16 2021-05-03 [1] CRAN (R 4.1.2) ## nullabor * 0.3.9 2020-02-25 [1] CRAN (R 4.1.0) ## 
palmerpenguins * 0.1.0 2020-07-23 [1] CRAN (R 4.1.0) ## pillar 1.6.4 2021-10-18 [1] CRAN (R 4.1.0) ## pkgbuild 1.2.0 2020-12-15 [1] CRAN (R 4.1.0) ## pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.0) ## pkgload 1.2.3 2021-10-13 [1] CRAN (R 4.1.0) ## prabclus 2.3-2 2020-01-08 [1] CRAN (R 4.1.0) ## prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.1.0) ## processx 3.5.2 2021-04-30 [1] CRAN (R 4.1.0) ## ps 1.6.0 2021-02-28 [1] CRAN (R 4.1.0) ## purrr * 0.3.4 2020-04-17 [1] CRAN (R 4.1.0) ## R6 2.5.1 2021-08-19 [1] CRAN (R 4.1.0) ## RColorBrewer * 1.1-2 2014-12-07 [1] CRAN (R 4.1.0) ## Rcpp 1.0.7 2021-07-07 [1] CRAN (R 4.1.0) ## readr * 2.1.0 2021-11-11 [1] CRAN (R 4.1.0) ## readxl 1.3.1 2019-03-13 [1] CRAN (R 4.1.0) ## remotes 2.4.1 2021-09-29 [1] CRAN (R 4.1.0) ## reprex 2.0.1 2021-08-05 [1] CRAN (R 4.1.0) ## rlang 0.4.12 2021-10-18 [1] CRAN (R 4.1.0) ## rmarkdown 2.11 2021-09-14 [1] CRAN (R 4.1.0) ## robustbase 0.93-9 2021-09-27 [1] CRAN (R 4.1.0) ## rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.1.0) ## rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.0) ## rvest 1.0.2 2021-10-16 [1] CRAN (R 4.1.0) ## sass 0.4.0 2021-05-12 [1] CRAN (R 4.1.0) ## scales 1.1.1 2020-05-11 [1] CRAN (R 4.1.0) ## sessioninfo 1.2.1 2021-11-02 [1] CRAN (R 4.1.0) ## stringi 1.7.5 2021-10-04 [1] CRAN (R 4.1.0) ## stringr * 1.4.0 2019-02-10 [1] CRAN (R 4.1.0) ## testthat 3.1.0 2021-10-04 [1] CRAN (R 4.1.0) ## tibble * 3.1.6 2021-11-07 [1] CRAN (R 4.1.0) ## tidyr * 1.1.4 2021-09-27 [1] CRAN (R 4.1.0) ## tidyselect 1.1.1 2021-04-30 [1] CRAN (R 4.1.0) ## tidyverse * 1.3.1 2021-04-15 [1] CRAN (R 4.1.0) ## tzdb 0.2.0 2021-10-27 [1] CRAN (R 4.1.0) ## usethis 2.1.3 2021-10-27 [1] CRAN (R 4.1.0) ## utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.0) ## vcd * 1.4-9 2021-10-18 [1] CRAN (R 4.1.0) ## vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.0) ## viridis * 0.6.2 2021-10-13 [1] CRAN (R 4.1.0) ## viridisLite * 0.4.0 2021-04-13 [1] CRAN (R 4.1.0) ## vroom 1.5.6 2021-11-10 [1] CRAN (R 4.1.0) ## whisker 0.4 2019-08-28 [1] CRAN (R 4.1.0) ## withr 
2.4.2 2021-04-18 [1] CRAN (R 4.1.0) ## xaringan 0.22 2021-06-23 [1] CRAN (R 4.1.0) ## xfun 0.28 2021-11-04 [1] CRAN (R 4.1.0) ## XML 3.99-0.8 2021-09-17 [1] CRAN (R 4.1.0) ## xml2 1.3.2 2020-04-23 [1] CRAN (R 4.1.0) ## yaml 2.2.1 2020-02-01 [1] CRAN (R 4.1.0) ## zoo 1.8-9 2021-03-09 [1] CRAN (R 4.1.0) ## ## [1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library ## ## ────────────────────────────────────────────────────────────────────────────── ``` ] These slides are licensed under <br><center><a href="https://creativecommons.org/licenses/by-sa/3.0/au/"><img src="images/cc.svg" style="height:2em;"/><img src="images/by.svg" style="height:2em;"/><img src="images/sa.svg" style="height:2em;"/></a></center>