A quick and flexible visualisation system for the designs of experiments

Emi Tanaka

Biological Data Science Institute
Australian National University
emi.tanaka@anu.edu.au

14th December 2023

Completely randomised design $(n = 20, t = 3)$

20 subjects
3 treatments (labelled A, B, C)

crd20

     subject trt
1  subject01   C
2  subject02   A
3  subject03   A
4  subject04   B
5  subject05   B
6  subject06   C
7  subject07   C
8  subject08   B
9  subject09   A
10 subject10   A
11 subject11   B
12 subject12   B
13 subject13   C
14 subject14   B
15 subject15   A
16 subject16   B
17 subject17   A
18 subject18   A
19 subject19   C
20 subject20   C
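A table with the same shape as `crd20` can be generated in base R. The sketch below assumes a simple random permutation of near-equal replicates. `crd20` above was not necessarily produced this way, and a different seed gives a different layout. The name `crd20_sketch` is hypothetical.

```r
# Hypothetical sketch: a completely randomised design with 20 subjects and 3 treatments
set.seed(2023)  # arbitrary seed so the randomisation is reproducible
n <- 20
trts <- c("A", "B", "C")
crd20_sketch <- data.frame(
  # zero-padded labels so that subject10 sorts after subject09
  subject = sprintf("subject%02d", seq_len(n)),
  # replicate A, B, C as evenly as possible (7, 7, 6), then shuffle the order
  trt = sample(rep(trts, length.out = n))
)
table(crd20_sketch$trt)
```

Zero-padding matters for the plots that follow: ggplot2 orders a character `x` axis alphabetically, so unpadded labels would place `subject10` before `subject2`.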

Map the columns in the experimental design table to visual elements.

library(ggplot2)
ggplot(crd20, aes(x = subject, y = 1, color = trt)) +
  geom_point(size = 10)

Completely randomised design $(n = 100, t = 3)$

100 subjects
3 treatments (labelled A, B, C)

crd100

       subject trt
1   subject001   A
2   subject002   B
3   subject003   A
4   subject004   C
5   subject005   C
6   subject006   B
7   subject007   B
8   subject008   A
9   subject009   B
10  subject010   C
11  subject011   C
12  subject012   C
13  subject013   B
14  subject014   A
15  subject015   C
16  subject016   C
17  subject017   B
18  subject018   C
19  subject019   A
20  subject020   B
21  subject021   B
22  subject022   B
23  subject023   B
24  subject024   A
25  subject025   C
26  subject026   A
27  subject027   A
28  subject028   C
29  subject029   C
30  subject030   B
31  subject031   B
32  subject032   B
33  subject033   B
34  subject034   C
35  subject035   A
36  subject036   A
37  subject037   C
38  subject038   C
39  subject039   C
40  subject040   A
41  subject041   C
42  subject042   B
43  subject043   C
44  subject044   C
45  subject045   B
46  subject046   C
47  subject047   B
48  subject048   C
49  subject049   A
50  subject050   A
51  subject051   C
52  subject052   C
53  subject053   A
54  subject054   B
55  subject055   B
56  subject056   A
57  subject057   B
58  subject058   C
59  subject059   B
60  subject060   A
61  subject061   C
62  subject062   B
63  subject063   C
64  subject064   A
65  subject065   C
66  subject066   C
67  subject067   A
68  subject068   B
69  subject069   A
70  subject070   B
71  subject071   B
72  subject072   A
73  subject073   B
74  subject074   A
75  subject075   B
76  subject076   A
77  subject077   A
78  subject078   C
79  subject079   C
80  subject080   C
81  subject081   C
82  subject082   B
83  subject083   B
84  subject084   A
85  subject085   C
86  subject086   A
87  subject087   B
88  subject088   B
89  subject089   A
90  subject090   A
91  subject091   A
92  subject092   A
93  subject093   B
94  subject094   B
95  subject095   C
96  subject096   A
97  subject097   A
98  subject098   A
99  subject099   B
100 subject100   A

ggplot(crd100, aes(x = subject, y = 1, color = trt)) +
  geom_point(size = 10)

Doesn’t scale well! With 100 subjects on a single axis, the points overlap and the axis labels become unreadable.

Wickham (2016) ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.

Completely randomised design $(n = 5 \times 20, t = 3)$

Graphical displays are generally rectangular.
Make new coordinates of the units to fit the display better.

An idiosyncratic way to create new coordinates:

library(dplyr)
crd100d <- crd100 %>% 
  mutate(y = rep(1:5, each = 20),
         x = rep(1:20, times = 5)) 

crd100d

       subject trt y  x
1   subject001   A 1  1
2   subject002   B 1  2
3   subject003   A 1  3
4   subject004   C 1  4
5   subject005   C 1  5
6   subject006   B 1  6
7   subject007   B 1  7
8   subject008   A 1  8
9   subject009   B 1  9
10  subject010   C 1 10
11  subject011   C 1 11
12  subject012   C 1 12
13  subject013   B 1 13
14  subject014   A 1 14
15  subject015   C 1 15
16  subject016   C 1 16
17  subject017   B 1 17
18  subject018   C 1 18
19  subject019   A 1 19
20  subject020   B 1 20
21  subject021   B 2  1
22  subject022   B 2  2
23  subject023   B 2  3
24  subject024   A 2  4
25  subject025   C 2  5
26  subject026   A 2  6
27  subject027   A 2  7
28  subject028   C 2  8
29  subject029   C 2  9
30  subject030   B 2 10
31  subject031   B 2 11
32  subject032   B 2 12
33  subject033   B 2 13
34  subject034   C 2 14
35  subject035   A 2 15
36  subject036   A 2 16
37  subject037   C 2 17
38  subject038   C 2 18
39  subject039   C 2 19
40  subject040   A 2 20
41  subject041   C 3  1
42  subject042   B 3  2
43  subject043   C 3  3
44  subject044   C 3  4
45  subject045   B 3  5
46  subject046   C 3  6
47  subject047   B 3  7
48  subject048   C 3  8
49  subject049   A 3  9
50  subject050   A 3 10
51  subject051   C 3 11
52  subject052   C 3 12
53  subject053   A 3 13
54  subject054   B 3 14
55  subject055   B 3 15
56  subject056   A 3 16
57  subject057   B 3 17
58  subject058   C 3 18
59  subject059   B 3 19
60  subject060   A 3 20
61  subject061   C 4  1
62  subject062   B 4  2
63  subject063   C 4  3
64  subject064   A 4  4
65  subject065   C 4  5
66  subject066   C 4  6
67  subject067   A 4  7
68  subject068   B 4  8
69  subject069   A 4  9
70  subject070   B 4 10
71  subject071   B 4 11
72  subject072   A 4 12
73  subject073   B 4 13
74  subject074   A 4 14
75  subject075   B 4 15
76  subject076   A 4 16
77  subject077   A 4 17
78  subject078   C 4 18
79  subject079   C 4 19
80  subject080   C 4 20
81  subject081   C 5  1
82  subject082   B 5  2
83  subject083   B 5  3
84  subject084   A 5  4
85  subject085   C 5  5
86  subject086   A 5  6
87  subject087   B 5  7
88  subject088   B 5  8
89  subject089   A 5  9
90  subject090   A 5 10
91  subject091   A 5 11
92  subject092   A 5 12
93  subject093   B 5 13
94  subject094   B 5 14
95  subject095   C 5 15
96  subject096   A 5 16
97  subject097   A 5 17
98  subject098   A 5 18
99  subject099   B 5 19
100 subject100   A 5 20
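The `rep()` calls in the `mutate()` step above hard-code 5 rows of 20. A slightly more general sketch derives each unit's grid position from its index using integer division and remainder, so only the number of columns needs to be chosen. The helper `wrap_coords()` is hypothetical and is not part of dplyr or ggplot2.

```r
# Hypothetical helper: wrap n units row by row into a grid with `ncol` columns
wrap_coords <- function(n, ncol) {
  i <- seq_len(n) - 1        # zero-based unit index
  data.frame(
    y = i %/% ncol + 1,      # row number: how many full rows precede the unit
    x = i %% ncol + 1        # column number: position within its row
  )
}

coords <- wrap_coords(100, ncol = 20)
# Same layout as rep(1:5, each = 20) and rep(1:20, times = 5)
all(coords$y == rep(1:5, each = 20) & coords$x == rep(1:20, times = 5))
```

With the design table this could be used as `cbind(crd100, wrap_coords(nrow(crd100), ncol = 20))`. When `n` is not a multiple of `ncol`, the last row is simply left partly empty.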

ggplot(crd100d, aes(x, y, color = trt)) +
  geom_point(size = 10)

Order of units is unclear
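One way to make the order visible is to print each unit's position on its point and connect the units in table order with a thin line. This is an assumed illustration, not necessarily the approach used later in this talk. The data frame `units` is a hypothetical stand-in for `crd100d`, with its own random treatment allocation.

```r
library(ggplot2)

# Hypothetical stand-in for crd100d: 100 units wrapped into 5 rows of 20
set.seed(1)
units <- data.frame(
  unit = 1:100,
  y = rep(1:5, each = 20),
  x = rep(1:20, times = 5),
  trt = sample(rep(c("A", "B", "C"), length.out = 100))
)

p <- ggplot(units, aes(x, y)) +
  # thin line through the units in the order they appear in the table
  geom_path(colour = "grey70") +
  geom_point(aes(color = trt), size = 8) +
  # the unit's position in the design table, drawn on top of each point
  geom_text(aes(label = unit), size = 3) +
  # put row 1 at the top so the layout reads like the table
  scale_y_reverse()
p
```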

Completely randomised design $(n = 20, t = 10)$

20 subjects
10 treatments

crdt10

     subject trt
1  subject01   G
2  subject02   I
3  subject03   J
4  subject04   A
5  subject05   D
6  subject06   F
7  subject07   G
8  subject08   H
9  subject09   I
10 subject10   A
11 subject11   H
12 subject12   B
13 subject13   B
14 subject14   C
15 subject15   E
16 subject16   J
17 subject17   E
18 subject18   D
19 subject19   C
20 subject20   F

Color doesn’t scale well with the number of treatments.
We struggle to tell apart even 7 colors in one palette¹.


ggplot(crdt10, aes(x = subject, y = 1)) +
  geom_point(aes(color = trt), size = 10) +
  theme(legend.position = "bottom")

Text label doesn’t take advantage of pre-attentive processing².


ggplot(crdt10, aes(x = subject, y = 1)) +
  geom_point(size = 14, color = "lightgrey") +
  geom_text(aes(label = trt), size = 10)

Factorial design $(n = 20, t = 5 \times 2)$

20 subjects
5 drug types
2 doses

fac5x2

     subject drug dose
1  subject01    A High
2  subject02    C  Low
3  subject03    D  Low
4  subject04    D  Low
5  subject05    A  Low
6  subject06    B High
7  subject07    B High
8  subject08    E High
9  subject09    B  Low
10 subject10    B  Low
11 subject11    C High
12 subject12    E High
13 subject13    D High
14 subject14    A  Low
15 subject15    A High
16 subject16    C  Low
17 subject17    E  Low
18 subject18    D High
19 subject19    E  Low
20 subject20    C High

drug is mapped to color
dose is mapped to shape


ggplot(fac5x2, aes(x = subject, y = 1)) +
  geom_point(aes(color = drug, shape = dose), size = 10) + 
  theme(legend.position = "bottom")

In this instance, each treatment factor has to be mapped to some visual property.
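Folding the two factors into one, for example with base R's `interaction()`, would remove the need for a second visual property. However, the 5 × 2 combinations would then need 10 distinguishable colors, which runs into the limit noted for `crdt10`. The sketch below uses the first four rows of `fac5x2` as inline data. It is an illustration, not part of the original workflow.

```r
# Hypothetical illustration: fold drug and dose into a single treatment factor
drug <- factor(c("A", "C", "D", "D"), levels = LETTERS[1:5])
dose <- factor(c("High", "Low", "Low", "Low"), levels = c("High", "Low"))
# interaction() keeps every combination of levels by default (drop = FALSE)
combo <- interaction(drug, dose, sep = ":")
nlevels(combo)  # 5 drugs x 2 doses = 10 possible treatments
```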

`deggust` takes into account large $n$ and large $t$

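library(edibble)  # design(), set_units(), set_trts(), allot_table()
library(deggust)  # autoplot() method for edibble designs

crd100 <- design("CRD") %>%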
  set_units(subject = 100) %>%
  set_trts(trt = paste("Drug", LETTERS[1:20])) %>%
  allot_table(trt ~ subject, order = "systematic") 
  
autoplot(crd100, aspect_ratio = 4, random_fills = TRUE, nfill_max = 4)

Mappings of the unit factors to the treatment factors

wheat_design <- (wheat_unit + wheat_trt) %>% 
  allot_table(planting_date ~ plot,
                    variety ~ subcol,
                     status ~ subrow,
              order = c("random", "random", "systematic"),
              label_nested = c(subcol, subrow, row, col))

The result is an experimental design tibble (edibble):

wheat_design

# An edibble: 144 x 9
      row    col    plot  subrow  subcol    subplot variety  planting_date
   <U(4)> <U(3)> <U(12)> <U(36)> <U(48)>   <U(144)>  <T(4)>         <T(3)>
    <chr>  <chr>   <chr>   <chr>   <chr>      <chr>   <chr>          <chr>
 1   row1   col1  plot01 subrow1 subcol1 subplot001      V3 planting_date1
 2   row1   col1  plot01 subrow2 subcol1 subplot002      V3 planting_date1
 3   row1   col1  plot01 subrow3 subcol1 subplot003      V3 planting_date1
 4   row1   col1  plot01 subrow1 subcol2 subplot004      V2 planting_date1
 5   row1   col1  plot01 subrow2 subcol2 subplot005      V2 planting_date1
 6   row1   col1  plot01 subrow3 subcol2 subplot006      V2 planting_date1
 7   row1   col1  plot01 subrow1 subcol3 subplot007      V1 planting_date1
 8   row1   col1  plot01 subrow2 subcol3 subplot008      V1 planting_date1
 9   row1   col1  plot01 subrow3 subcol3 subplot009      V1 planting_date1
10   row1   col1  plot01 subrow1 subcol4 subplot010      V4 planting_date1
# ℹ 134 more rows
# ℹ 1 more variable: status <T(3)>

