A quick and flexible visualisation system for the designs of experiments

Emi Tanaka

Biological Data Science Institute
Australian National University
emi.tanaka@anu.edu.au

14th December 2023

What visualisation for experimental designs?

  • The type of visualisation we seek may be:

    • a flow chart,
    • aggregated summaries by experimental structure (before or after data collection), or
    • an experimental design layout.


Today will be about the static visualisation of the experimental design layout.

The functional aim of the experimental design layout

  • Visualisation of experimental design layout can help us:
    • communicate and confirm our understanding of the experimental design, and
    • reveal functional characteristics (e.g. whether treatment randomisation was done as expected) of the experimental design.
  • We want the visualisation system to be:
    • quick to draw the layout for rapid communication, and
    • flexible to make polished visualisations for publishing in formal outlets.



Talk title: A quick and flexible visualisation system for the designs of experiments







One visualisation system to rule them all?



Motivating examples

  • Visualisation constructed via the ggplot2 R package to illustrate the friction in the workflow
  • The broad idea extends to other visualisation systems and other languages

Completely randomised design (n = 20, t = 3)

  • 20 subjects
  • 3 treatments (labelled A, B, C)
crd20
     subject trt
1  subject01   C
2  subject02   A
3  subject03   A
4  subject04   B
5  subject05   B
6  subject06   C
7  subject07   C
8  subject08   B
9  subject09   A
10 subject10   A
11 subject11   B
12 subject12   B
13 subject13   C
14 subject14   B
15 subject15   A
16 subject16   B
17 subject17   A
18 subject18   A
19 subject19   C
20 subject20   C
  • Map the columns in the experimental design table to visual elements.
library(ggplot2)
ggplot(crd20, aes(x = subject, y = 1, color = trt)) +
  geom_point(size = 10)
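
The slides do not show how crd20 was generated. A minimal base R sketch of one way to build such a table, assuming treatments are replicated as evenly as possible and then shuffled (the name crd20_sketch and the seed are illustrative only):

```r
# Hypothetical reconstruction: not the code used to create crd20 in the slides.
set.seed(2023)  # arbitrary seed so the allocation is reproducible

n_subjects <- 20
treatments <- c("A", "B", "C")

crd20_sketch <- data.frame(
  # zero-padded labels: subject01, ..., subject20
  subject = sprintf("subject%02d", seq_len(n_subjects)),
  # repeat the treatments to length 20, then shuffle their order
  trt = sample(rep(treatments, length.out = n_subjects))
)

# replication per treatment: A = 7, B = 7, C = 6
table(crd20_sketch$trt)
```

The resulting data frame has the same two columns as crd20, so it can be passed to the ggplot() call above unchanged.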

Completely randomised design (n = 100, t = 3)

  • 100 subjects
  • 3 treatments (labelled A, B, C)
crd100
       subject trt
1   subject001   A
2   subject002   B
3   subject003   A
4   subject004   C
5   subject005   C
6   subject006   B
7   subject007   B
8   subject008   A
9   subject009   B
10  subject010   C
11  subject011   C
12  subject012   C
13  subject013   B
14  subject014   A
15  subject015   C
16  subject016   C
17  subject017   B
18  subject018   C
19  subject019   A
20  subject020   B
21  subject021   B
22  subject022   B
23  subject023   B
24  subject024   A
25  subject025   C
26  subject026   A
27  subject027   A
28  subject028   C
29  subject029   C
30  subject030   B
31  subject031   B
32  subject032   B
33  subject033   B
34  subject034   C
35  subject035   A
36  subject036   A
37  subject037   C
38  subject038   C
39  subject039   C
40  subject040   A
41  subject041   C
42  subject042   B
43  subject043   C
44  subject044   C
45  subject045   B
46  subject046   C
47  subject047   B
48  subject048   C
49  subject049   A
50  subject050   A
51  subject051   C
52  subject052   C
53  subject053   A
54  subject054   B
55  subject055   B
56  subject056   A
57  subject057   B
58  subject058   C
59  subject059   B
60  subject060   A
61  subject061   C
62  subject062   B
63  subject063   C
64  subject064   A
65  subject065   C
66  subject066   C
67  subject067   A
68  subject068   B
69  subject069   A
70  subject070   B
71  subject071   B
72  subject072   A
73  subject073   B
74  subject074   A
75  subject075   B
76  subject076   A
77  subject077   A
78  subject078   C
79  subject079   C
80  subject080   C
81  subject081   C
82  subject082   B
83  subject083   B
84  subject084   A
85  subject085   C
86  subject086   A
87  subject087   B
88  subject088   B
89  subject089   A
90  subject090   A
91  subject091   A
92  subject092   A
93  subject093   B
94  subject094   B
95  subject095   C
96  subject096   A
97  subject097   A
98  subject098   A
99  subject099   B
100 subject100   A
ggplot(crd100, aes(x = subject, y = 1, color = trt)) +
  geom_point(size = 10) 

Doesn’t scale well!

Wickham (2016) ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.

Completely randomised design (n = 5 × 20, t = 3)

  • Graphical displays are generally rectangular.
  • Make new coordinates of the units to fit the display better.

An idiosyncratic way to create new coordinates:

library(dplyr)
crd100d <- crd100 %>% 
  mutate(y = rep(1:5, each = 20),
         x = rep(1:20, times = 5)) 

crd100d
       subject trt y  x
1   subject001   A 1  1
2   subject002   B 1  2
3   subject003   A 1  3
4   subject004   C 1  4
5   subject005   C 1  5
6   subject006   B 1  6
7   subject007   B 1  7
8   subject008   A 1  8
9   subject009   B 1  9
10  subject010   C 1 10
11  subject011   C 1 11
12  subject012   C 1 12
13  subject013   B 1 13
14  subject014   A 1 14
15  subject015   C 1 15
16  subject016   C 1 16
17  subject017   B 1 17
18  subject018   C 1 18
19  subject019   A 1 19
20  subject020   B 1 20
21  subject021   B 2  1
22  subject022   B 2  2
23  subject023   B 2  3
24  subject024   A 2  4
25  subject025   C 2  5
26  subject026   A 2  6
27  subject027   A 2  7
28  subject028   C 2  8
29  subject029   C 2  9
30  subject030   B 2 10
31  subject031   B 2 11
32  subject032   B 2 12
33  subject033   B 2 13
34  subject034   C 2 14
35  subject035   A 2 15
36  subject036   A 2 16
37  subject037   C 2 17
38  subject038   C 2 18
39  subject039   C 2 19
40  subject040   A 2 20
41  subject041   C 3  1
42  subject042   B 3  2
43  subject043   C 3  3
44  subject044   C 3  4
45  subject045   B 3  5
46  subject046   C 3  6
47  subject047   B 3  7
48  subject048   C 3  8
49  subject049   A 3  9
50  subject050   A 3 10
51  subject051   C 3 11
52  subject052   C 3 12
53  subject053   A 3 13
54  subject054   B 3 14
55  subject055   B 3 15
56  subject056   A 3 16
57  subject057   B 3 17
58  subject058   C 3 18
59  subject059   B 3 19
60  subject060   A 3 20
61  subject061   C 4  1
62  subject062   B 4  2
63  subject063   C 4  3
64  subject064   A 4  4
65  subject065   C 4  5
66  subject066   C 4  6
67  subject067   A 4  7
68  subject068   B 4  8
69  subject069   A 4  9
70  subject070   B 4 10
71  subject071   B 4 11
72  subject072   A 4 12
73  subject073   B 4 13
74  subject074   A 4 14
75  subject075   B 4 15
76  subject076   A 4 16
77  subject077   A 4 17
78  subject078   C 4 18
79  subject079   C 4 19
80  subject080   C 4 20
81  subject081   C 5  1
82  subject082   B 5  2
83  subject083   B 5  3
84  subject084   A 5  4
85  subject085   C 5  5
86  subject086   A 5  6
87  subject087   B 5  7
88  subject088   B 5  8
89  subject089   A 5  9
90  subject090   A 5 10
91  subject091   A 5 11
92  subject092   A 5 12
93  subject093   B 5 13
94  subject094   B 5 14
95  subject095   C 5 15
96  subject096   A 5 16
97  subject097   A 5 17
98  subject098   A 5 18
99  subject099   B 5 19
100 subject100   A 5 20
ggplot(crd100d, aes(x, y, color = trt)) +
  geom_point(size = 10) 
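
The hand-written rep() calls above only work when n is exactly 5 × 20. As a rough sketch, the same row-wise layout can be computed for any n and any number of columns with integer division; grid_coords() is a hypothetical helper, not a function from ggplot2 or dplyr:

```r
# grid_coords() is a hypothetical helper written for illustration.
grid_coords <- function(n, ncol) {
  i <- seq_len(n) - 1    # zero-based position of each unit
  data.frame(
    x = i %% ncol + 1,   # column index: cycles 1, 2, ..., ncol
    y = i %/% ncol + 1   # row index: increases after every ncol units
  )
}

coords <- grid_coords(100, ncol = 20)
coords[c(1, 20, 21, 100), ]  # first and last units of rows 1 and 2, then unit 100
```

With the data from the slides this could be used as cbind(crd100, grid_coords(nrow(crd100), ncol = 20)). The number of columns still has to be chosen by hand, which is the kind of decision the talk argues should be automated.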

Order of units is unclear

Guides for the order of units

By a path:


crd100d <- crd100 %>% 
  mutate(
    y = rep(1:5, each = 20),
    x = rep(1:20, times = 5),
   xp = case_when(row_number() == 1   ~ x - 1.5,
                  row_number() == n() ~ x + 1.5,
                  .default = x)) 

ggplot(crd100d, aes(x, y)) +
  geom_path(aes(x = xp),
            linewidth = 2, 
            arrow = grid::arrow()) + 
  geom_point(aes(color = trt), size = 10) 

By text label:


crd100d %>% 
  mutate(id = row_number()) %>% 
  ggplot(aes(x, y)) +
  geom_point(aes(color = trt), size = 14) +
  geom_text(aes(label = id), size = 8)

You have to carefully engineer new coordinates for each design.
This does not make for a frictionless workflow.

Completely randomised design (n = 20, t = 10)

  • 20 subjects
  • 10 treatments
crdt10
     subject trt
1  subject01   G
2  subject02   I
3  subject03   J
4  subject04   A
5  subject05   D
6  subject06   F
7  subject07   G
8  subject08   H
9  subject09   I
10 subject10   A
11 subject11   H
12 subject12   B
13 subject13   B
14 subject14   C
15 subject15   E
16 subject16   J
17 subject17   E
18 subject18   D
19 subject19   C
20 subject20   F
  • Color doesn’t scale well with the number of treatments.
  • We struggle to discriminate between even 7 colors in a palette.
ggplot(crdt10, aes(x = subject, y = 1)) +
  geom_point(aes(color = trt), size = 10) +
  theme(legend.position = "bottom")

  • Text labels don’t take advantage of pre-attentive processing.
ggplot(crdt10, aes(x = subject, y = 1)) +
  geom_point(size = 14, color = "lightgrey") +
  geom_text(aes(label = trt), size = 10) 
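
The choice between the two encodings above can be made from the number of treatment levels. A minimal sketch of such a rule follows; pick_trt_encoding() is a hypothetical helper, and the cut-off of 7 is an assumption taken loosely from the bullet on color discrimination, not an established default:

```r
# pick_trt_encoding() is hypothetical; the threshold is an assumed default.
pick_trt_encoding <- function(trt, max_colors = 7) {
  n_levels <- length(unique(trt))  # number of distinct treatments
  if (n_levels <= max_colors) "color" else "label"
}

pick_trt_encoding(c("A", "B", "C", "A"))  # "color": only 3 treatments
pick_trt_encoding(LETTERS[1:10])          # "label": 10 treatments
```

A rule like this trades pre-attentive processing for legibility once the palette grows too large to tell apart.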

Factorial design (n = 20, t = 5 × 2)

  • 20 subjects
  • 5 drug types
  • 2 doses
fac5x2
     subject drug dose
1  subject01    A High
2  subject02    C  Low
3  subject03    D  Low
4  subject04    D  Low
5  subject05    A  Low
6  subject06    B High
7  subject07    B High
8  subject08    E High
9  subject09    B  Low
10 subject10    B  Low
11 subject11    C High
12 subject12    E High
13 subject13    D High
14 subject14    A  Low
15 subject15    A High
16 subject16    C  Low
17 subject17    E  Low
18 subject18    D High
19 subject19    E  Low
20 subject20    C High
  • drug is mapped to color
  • dose is mapped to shape
ggplot(fac5x2, aes(x = subject, y = 1)) +
  geom_point(aes(color = drug, shape = dose), size = 10) + 
  theme(legend.position = "bottom")

  • In this instance, each treatment factor has to be mapped to some visual property.
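
One alternative to giving each factor its own visual property is to collapse the factors into a single treatment label. A small base R sketch using interaction(), with a made-up table fac standing in for the treatment combinations:

```r
# fac is an illustrative table of the 5 x 2 treatment combinations.
fac <- expand.grid(drug = LETTERS[1:5], dose = c("Low", "High"))

# one combined factor with a level for every drug-dose pair
fac$trt <- interaction(fac$drug, fac$dose, sep = ":")

nlevels(fac$trt)  # 5 * 2 = 10 combined treatments
```

The combined factor needs only one mapping, but with 10 levels it runs into the same color discrimination problem as the previous example.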

Real experiments

Wheat experiment

Note: the terminology and description in Gurung et al. (2012) differ slightly.

Aim: study the effect of spraying and planting date on the development of foliar blight complex (a disease) in different wheat varieties.

  • Four wheat varieties (V1, V2, V3, V4) differing in genetic background were assigned randomly to the sub-columns.
  • Three planting dates (Nov 26, Dec 11, Dec 26) were assigned randomly to the whole plots.
  • Three statuses (control, sampled, sprayed) were assigned systematically to the sub-rows.
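
To make the three allocations concrete, here is a base R sketch of a layout with this structure: 4 × 3 whole plots, each split into 3 sub-rows × 4 sub-columns. It assumes equal replication of planting dates (4 plots each) and a separate random variety order within every plot; neither detail is stated above, and wheat_sketch is not the wheat data used in the slides.

```r
set.seed(1)  # arbitrary seed

# 4 rows x 3 columns of whole plots, each with 3 sub-rows x 4 sub-columns
wheat_sketch <- expand.grid(subcol = 1:4, subrow = 1:3, col = 1:3, row = 1:4)
wheat_sketch$plot <- (wheat_sketch$row - 1) * 3 + wheat_sketch$col

# planting date: randomised over the 12 whole plots (assumed 4 plots per date)
plot_dates <- sample(rep(c("date1", "date2", "date3"), each = 4))
wheat_sketch$planting_date <- plot_dates[wheat_sketch$plot]

# status: systematic, the same order down the sub-rows of every plot
wheat_sketch$status <- c("sprayed", "sampled", "control")[wheat_sketch$subrow]

# variety: a fresh random order over the 4 sub-columns of each plot (assumed)
variety_order <- replicate(12, sample(c("V1", "V2", "V3", "V4")))  # 4 x 12 matrix
wheat_sketch$variety <-
  variety_order[cbind(wheat_sketch$subcol, wheat_sketch$plot)]

nrow(wheat_sketch)  # 4 * 3 * 3 * 4 = 144 subplots
```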


Visualising the wheat experiment

head(wheat)
  subcol subrow col row  status planting_date variety
1      1      1   1   1 sprayed         date1      V3
2      2      1   1   1 sprayed         date1      V1
3      3      1   1   1 sprayed         date1      V2
4      4      1   1   1 sprayed         date1      V4
5      1      2   1   1 sampled         date1      V3
6      2      2   1   1 sampled         date1      V1

ggplot(wheat, aes(subcol, subrow)) +
  geom_tile(aes(fill = status, alpha = variety), color = "black", linewidth = 1.2) + 
  geom_text(aes(label = variety)) +
  geom_text(aes(label = planting_date, color = planting_date), 
            x = 2.5, y = 0, fontface = "bold", size = 6,
            data = ~dplyr::slice(., 1, .by = c(row, col))) +
  facet_grid(row ~ col) +
  coord_cartesian(clip = 'off') +
  scale_y_discrete(expand = c(0.8, 0)) +
  guides(alpha = "none", color = "none") + 
  scale_alpha_manual(values = c(0.4, 0.6, 0.8, 1))

Existing visualisation systems

  • Very little work on high-level visualisation systems of experimental design layouts.
  • Some notable examples are desplot and agricolaeplotr in R, and FielDHub as a Shiny web app (primarily for agricultural experiments!).
  • Common traits:
    • Unit factors occupy spatial coordinates in the plot.
    • Treatment factors are generally mapped to perceptual status (color, lightness, texture, shape, etc) or a text label.
  • Users typically specify the mappings manually; again, these are all tedious decisions that disrupt the workflow.

Visualisation workflow for experimental designs

Another system

Focusing the mental effort in experimental design

  • Identification of treatment, unit, and mapping.
library(edibble)
crd20 <- design("Completely Randomised Design") %>%
  set_units(subject = 20) %>%
  set_trts(trt = c("A", "B", "C")) %>%
  allot_table(trt ~ subject, order = "random")
crd20
# Completely Randomised Design 
# An edibble: 20 x 2
     subject    trt
     <U(20)> <T(3)>
       <chr>  <chr>
 1 subject01      C
 2 subject02      B
 3 subject03      C
 4 subject04      C
 5 subject05      B
 6 subject06      C
 7 subject07      A
 8 subject08      A
 9 subject09      C
10 subject10      B
11 subject11      A
12 subject12      A
13 subject13      A
14 subject14      B
15 subject15      B
16 subject16      A
17 subject17      B
18 subject18      B
19 subject19      C
20 subject20      C

  • The result is an experimental design table that records the roles (unit, treatment) and their relations.

Reducing the mental effort in visualising

library(deggust)
autoplot(crd20)

Still experimental!





deggust takes into account large n and large t

crd100 <- design("CRD") %>%
  set_units(subject = 100) %>%
  set_trts(trt = paste("Drug", LETTERS[1:20])) %>%
  allot_table(trt ~ subject, order = "systematic") 
  
autoplot(crd100, aspect_ratio = 4, random_fills = TRUE, nfill_max = 4)

Multiple color scales

fac5x2 <- design("Factorial design") %>%
  set_units(subject = 20) %>%
  set_trts(drug1 = LETTERS[1:5],
           drug2 = c("a", "b")) %>%
  allot_table(drug1:drug2 ~ subject) 
  
autoplot(fac5x2)

Revisiting the wheat experiment

Unit structure of the wheat experiment

wheat_unit <- set_units(
      row = 4,
      col = 3,
     plot = crossed_by(row, col)) 


Image from Gurung et al. (2012) Comparative analyses of spot blotch and tan spot epidemics on wheat under optimum and late sowing period in South Asia. European Journal of Plant Pathology

Unit structure of the wheat experiment

wheat_unit <- set_units(
      row = 4,
      col = 3,
     plot = crossed_by(row, col),
   subrow = nested_in(plot, 3),
   subcol = nested_in(plot, 4),
  subplot = nested_in(plot, 
                    crossed_by(subrow, subcol))) 
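
The number of levels implied by this unit structure can be checked with plain arithmetic. This is only a sketch of the counting; it does not call edibble:

```r
n_row <- 4
n_col <- 3

n_plot    <- n_row * n_col     # row crossed with col: 12 plots
n_subrow  <- n_plot * 3        # 3 sub-rows nested in each plot: 36
n_subcol  <- n_plot * 4        # 4 sub-columns nested in each plot: 48
n_subplot <- n_plot * (3 * 4)  # sub-rows crossed with sub-columns per plot: 144

c(plot = n_plot, subrow = n_subrow, subcol = n_subcol, subplot = n_subplot)
```

Nesting multiplies the number of child levels by the number of parents, which is why subrow has 36 levels rather than 3.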



Treatment structure of the wheat experiment

wheat_trt <- set_trts(
  variety = c("V1", "V2", "V3", "V4"),
  planting_date = 3,
  status = c("sprayed", "sampled", "control")) 

trts_table(wheat_trt)
# A tibble: 36 × 3
   variety planting_date  status 
   <chr>   <chr>          <chr>  
 1 V1      planting_date1 sprayed
 2 V2      planting_date1 sprayed
 3 V3      planting_date1 sprayed
 4 V4      planting_date1 sprayed
 5 V1      planting_date2 sprayed
 6 V2      planting_date2 sprayed
 7 V3      planting_date2 sprayed
 8 V4      planting_date2 sprayed
 9 V1      planting_date3 sprayed
10 V2      planting_date3 sprayed
# ℹ 26 more rows
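
The 36 rows are the full crossing of the three treatment factors (4 × 3 × 3). A base R sketch that reproduces the same combinations and row order with expand.grid(), without edibble (trts_sketch is an illustrative name):

```r
# Full factorial of the treatment factors; the first factor varies fastest.
trts_sketch <- expand.grid(
  variety = c("V1", "V2", "V3", "V4"),
  planting_date = paste0("planting_date", 1:3),
  status = c("sprayed", "sampled", "control"),
  stringsAsFactors = FALSE
)

nrow(trts_sketch)  # 4 * 3 * 3 = 36 treatment combinations
trts_sketch[5, ]   # V1, planting_date2, sprayed
```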



Mapping of the treatment factors to the unit factors

wheat_design <- (wheat_unit + wheat_trt) %>% 
  allot_table(planting_date ~ plot,
                    variety ~ subcol,
                     status ~ subrow,
              order = c("random", "random", "systematic"),
              label_nested = c(subcol, subrow, row, col))
  • The result is an experimental design tibble (edibble):
wheat_design
# An edibble: 144 x 9
      row    col    plot  subrow  subcol    subplot variety  planting_date
   <U(4)> <U(3)> <U(12)> <U(36)> <U(48)>   <U(144)>  <T(4)>         <T(3)>
    <chr>  <chr>   <chr>   <chr>   <chr>      <chr>   <chr>          <chr>
 1   row1   col1  plot01 subrow1 subcol1 subplot001      V3 planting_date1
 2   row1   col1  plot01 subrow2 subcol1 subplot002      V3 planting_date1
 3   row1   col1  plot01 subrow3 subcol1 subplot003      V3 planting_date1
 4   row1   col1  plot01 subrow1 subcol2 subplot004      V2 planting_date1
 5   row1   col1  plot01 subrow2 subcol2 subplot005      V2 planting_date1
 6   row1   col1  plot01 subrow3 subcol2 subplot006      V2 planting_date1
 7   row1   col1  plot01 subrow1 subcol3 subplot007      V1 planting_date1
 8   row1   col1  plot01 subrow2 subcol3 subplot008      V1 planting_date1
 9   row1   col1  plot01 subrow3 subcol3 subplot009      V1 planting_date1
10   row1   col1  plot01 subrow1 subcol4 subplot010      V4 planting_date1
# ℹ 134 more rows
# ℹ 1 more variable: status <T(3)>

Easy visualisation of the wheat experimental design

autoplot(wheat_design)

autoplot(wheat_design, 
         label_size = 4,
         shape = "square") +
  theme(strip.text = element_blank()) +
  scale_fill_viridis_d()

Key messages

  • Current visualisation systems for the experimental design layout require mental effort to map unit factors to (engineered) spatial coordinates, and treatment factors to perceptual status.
  • The deggust R package
    • relieves users from tedious decisions for quick visualisation by exploiting encoded structure from the edibble R package, and
    • is built on the ggplot2 R package so any customisation can be done using a familiar visualisation system (i.e. flexible).
  • Some future directions include:
    • conducting experiments to determine better visualisation defaults,
    • adding more robust testing in the package,
    • improving the documentation, making the visualisation interactive, …

Thanks for listening!

These slides are available at emitanaka.org/slides/ASC2023.

More details about “the grammar of experimental designs” and edibble in arXiv papers:

You can install the R packages as:

install.packages("edibble")
remotes::install_github("emitanaka/edibble") # latest development
remotes::install_github("emitanaka/deggust") # latest development

Get in touch at emi.tanaka@anu.edu.au!
Issues and feature requests at: