A quick and flexible visualisation system for the designs of experiments
Emi Tanaka
Biological Data Science Institute Australian National University emi.tanaka@anu.edu.au
14th December 2023
What visualisation for experimental designs?
The type of visualisation we seek may be:
A flow chart
Aggregated summaries by experimental structure
Before or after data collection.
Experimental design layout
Today will be about the static visualisation of the experimental design layout.
The functional aim of the experimental design layout
Visualisation of experimental design layout can help us:
communicate and confirm our understanding of the experimental design, and
reveal functional characteristics (e.g. whether treatment randomisation was done as expected) of the experimental design.
We want the visualisation system to be:
quick to draw the layout for rapid communication, and
flexible to make polished visualisations for publishing in formal outlets.
Talk title: A quick and flexible visualisation system for the designs of experiments
One visualisation system to rule them all?
Motivating examples
Visualisation constructed via the ggplot2 R package to illustrate the friction in the workflow
The broad idea extends to other visualisation systems and other languages
Completely randomised design (n=20,t=3)
20 subjects
3 treatments (labelled A, B, C)
crd20
subject trt
1 subject01 C
2 subject02 A
3 subject03 A
4 subject04 B
5 subject05 B
6 subject06 C
7 subject07 C
8 subject08 B
9 subject09 A
10 subject10 A
11 subject11 B
12 subject12 B
13 subject13 C
14 subject14 B
15 subject15 A
16 subject16 B
17 subject17 A
18 subject18 A
19 subject19 C
20 subject20 C
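The talk does not show how `crd20` was produced. As a minimal sketch, a table of the same shape could be generated in base R as below. The seed, the object name `crd_sketch`, and the use of `sample()` are assumptions for illustration only, so the randomisation will not reproduce the exact allocation listed above, although the replication (7, 7, 6) is the same.

```r
# Sketch only: generate a completely randomised design table in base R.
set.seed(2023)  # arbitrary seed (assumption), only for reproducibility

n_subjects <- 20
treatments <- c("A", "B", "C")

# Repeat the treatment labels until every subject is covered, then shuffle.
trt <- sample(rep(treatments, length.out = n_subjects))

crd_sketch <- data.frame(
  subject = sprintf("subject%02d", seq_len(n_subjects)),
  trt = trt
)

head(crd_sketch)
table(crd_sketch$trt)  # replication per treatment
```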
Map the columns in the experimental design table to visual elements.
library(ggplot2)
ggplot(crd20, aes(x = subject, y = 1, color = trt)) +
  geom_point(size = 10)
Completely randomised design (n=100,t=3)
100 subjects
3 treatments (labelled A, B, C)
crd100
subject trt
1 subject001 A
2 subject002 B
3 subject003 A
4 subject004 C
5 subject005 C
6 subject006 B
7 subject007 B
8 subject008 A
9 subject009 B
10 subject010 C
11 subject011 C
12 subject012 C
13 subject013 B
14 subject014 A
15 subject015 C
16 subject016 C
17 subject017 B
18 subject018 C
19 subject019 A
20 subject020 B
21 subject021 B
22 subject022 B
23 subject023 B
24 subject024 A
25 subject025 C
26 subject026 A
27 subject027 A
28 subject028 C
29 subject029 C
30 subject030 B
31 subject031 B
32 subject032 B
33 subject033 B
34 subject034 C
35 subject035 A
36 subject036 A
37 subject037 C
38 subject038 C
39 subject039 C
40 subject040 A
41 subject041 C
42 subject042 B
43 subject043 C
44 subject044 C
45 subject045 B
46 subject046 C
47 subject047 B
48 subject048 C
49 subject049 A
50 subject050 A
51 subject051 C
52 subject052 C
53 subject053 A
54 subject054 B
55 subject055 B
56 subject056 A
57 subject057 B
58 subject058 C
59 subject059 B
60 subject060 A
61 subject061 C
62 subject062 B
63 subject063 C
64 subject064 A
65 subject065 C
66 subject066 C
67 subject067 A
68 subject068 B
69 subject069 A
70 subject070 B
71 subject071 B
72 subject072 A
73 subject073 B
74 subject074 A
75 subject075 B
76 subject076 A
77 subject077 A
78 subject078 C
79 subject079 C
80 subject080 C
81 subject081 C
82 subject082 B
83 subject083 B
84 subject084 A
85 subject085 C
86 subject086 A
87 subject087 B
88 subject088 B
89 subject089 A
90 subject090 A
91 subject091 A
92 subject092 A
93 subject093 B
94 subject094 B
95 subject095 C
96 subject096 A
97 subject097 A
98 subject098 A
99 subject099 B
100 subject100 A
ggplot(crd100, aes(x = subject, y = 1, color = trt)) +
  geom_point(size = 10)
Doesn’t scale well!
Wickham (2016) ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.
Completely randomised design (n=5×20,t=3)
Graphical displays are generally rectangular.
Make new coordinates of the units to fit the display better.
An idiosyncratic way to create new coordinates:
library(dplyr)
crd100d <- crd100 %>%
  mutate(y = rep(1:5, each = 20),
         x = rep(1:20, times = 5))
crd100d
subject trt y x
1 subject001 A 1 1
2 subject002 B 1 2
3 subject003 A 1 3
4 subject004 C 1 4
5 subject005 C 1 5
6 subject006 B 1 6
7 subject007 B 1 7
8 subject008 A 1 8
9 subject009 B 1 9
10 subject010 C 1 10
11 subject011 C 1 11
12 subject012 C 1 12
13 subject013 B 1 13
14 subject014 A 1 14
15 subject015 C 1 15
16 subject016 C 1 16
17 subject017 B 1 17
18 subject018 C 1 18
19 subject019 A 1 19
20 subject020 B 1 20
21 subject021 B 2 1
22 subject022 B 2 2
23 subject023 B 2 3
24 subject024 A 2 4
25 subject025 C 2 5
26 subject026 A 2 6
27 subject027 A 2 7
28 subject028 C 2 8
29 subject029 C 2 9
30 subject030 B 2 10
31 subject031 B 2 11
32 subject032 B 2 12
33 subject033 B 2 13
34 subject034 C 2 14
35 subject035 A 2 15
36 subject036 A 2 16
37 subject037 C 2 17
38 subject038 C 2 18
39 subject039 C 2 19
40 subject040 A 2 20
41 subject041 C 3 1
42 subject042 B 3 2
43 subject043 C 3 3
44 subject044 C 3 4
45 subject045 B 3 5
46 subject046 C 3 6
47 subject047 B 3 7
48 subject048 C 3 8
49 subject049 A 3 9
50 subject050 A 3 10
51 subject051 C 3 11
52 subject052 C 3 12
53 subject053 A 3 13
54 subject054 B 3 14
55 subject055 B 3 15
56 subject056 A 3 16
57 subject057 B 3 17
58 subject058 C 3 18
59 subject059 B 3 19
60 subject060 A 3 20
61 subject061 C 4 1
62 subject062 B 4 2
63 subject063 C 4 3
64 subject064 A 4 4
65 subject065 C 4 5
66 subject066 C 4 6
67 subject067 A 4 7
68 subject068 B 4 8
69 subject069 A 4 9
70 subject070 B 4 10
71 subject071 B 4 11
72 subject072 A 4 12
73 subject073 B 4 13
74 subject074 A 4 14
75 subject075 B 4 15
76 subject076 A 4 16
77 subject077 A 4 17
78 subject078 C 4 18
79 subject079 C 4 19
80 subject080 C 4 20
81 subject081 C 5 1
82 subject082 B 5 2
83 subject083 B 5 3
84 subject084 A 5 4
85 subject085 C 5 5
86 subject086 A 5 6
87 subject087 B 5 7
88 subject088 B 5 8
89 subject089 A 5 9
90 subject090 A 5 10
91 subject091 A 5 11
92 subject092 A 5 12
93 subject093 B 5 13
94 subject094 B 5 14
95 subject095 C 5 15
96 subject096 A 5 16
97 subject097 A 5 17
98 subject098 A 5 18
99 subject099 B 5 19
100 subject100 A 5 20
ggplot(crd100d, aes(x, y, color = trt)) +
  geom_point(size = 10)
Order of units is unclear
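The coordinate construction above hard-codes 5 rows of 20 units. As a rough sketch of how the same wrapping could be computed for any number of units, the hypothetical helper `wrap_coords()` below (not part of any package mentioned in this talk) derives the row and column from each unit's position using integer division and remainder.

```r
# Hypothetical helper: wrap n units into a grid with `ncol` units per row.
wrap_coords <- function(n, ncol) {
  i <- seq_len(n) - 1L      # zero-based position of each unit
  data.frame(
    y = i %/% ncol + 1,     # row index: increases after every `ncol` units
    x = i %% ncol + 1       # column index: cycles through 1..ncol
  )
}

# Agrees with the rep()-based coordinates for 100 units in 5 rows of 20.
coords <- wrap_coords(100, ncol = 20)
all(coords$y == rep(1:5, each = 20))
all(coords$x == rep(1:20, times = 5))

# Also works when n is not a multiple of ncol (last row is partly filled).
wrap_coords(7, ncol = 3)
```

This removes one hard-coded step, but the choice of `ncol` and the reading order are still decisions the user has to make for each design.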
Guides for the order of units
By a path:
crd100d <- crd100 %>%
  mutate(y = rep(1:5, each = 20),
         x = rep(1:20, times = 5),
         # extend the path slightly beyond the first and last units
         xp = case_when(row_number() == 1 ~ x - 1.5,
                        row_number() == n() ~ x + 1.5,
                        .default = x))
ggplot(crd100d, aes(x, y)) +
  # draw the path first so the points sit on top of it
  geom_path(aes(x = xp), linewidth = 2, arrow = grid::arrow()) +
  geom_point(aes(color = trt), size = 10)
You have to carefully engineer new coordinates for each design. This is not a frictionless workflow.
Completely randomised design (n=20,t=10)
20 subjects
10 treatments
crdt10
subject trt
1 subject01 G
2 subject02 I
3 subject03 J
4 subject04 A
5 subject05 D
6 subject06 F
7 subject07 G
8 subject08 H
9 subject09 I
10 subject10 A
11 subject11 H
12 subject12 B
13 subject13 B
14 subject14 C
15 subject15 E
16 subject16 J
17 subject17 E
18 subject18 D
19 subject19 C
20 subject20 F
Color doesn’t scale well with the number of treatments.
We struggle to discriminate among the colors in a palette of even 7 colors.¹
Text labels do not take advantage of pre-attentive processing.²
ggplot(crdt10, aes(x = subject, y = 1)) +
  geom_point(size = 14, color = "lightgrey") +
  geom_text(aes(label = trt), size = 10)
Factorial design (n=20,t=5×2)
20 subjects
5 drug types
2 doses
fac5x2
subject drug dose
1 subject01 A High
2 subject02 C Low
3 subject03 D Low
4 subject04 D Low
5 subject05 A Low
6 subject06 B High
7 subject07 B High
8 subject08 E High
9 subject09 B Low
10 subject10 B Low
11 subject11 C High
12 subject12 E High
13 subject13 D High
14 subject14 A Low
15 subject15 A High
16 subject16 C Low
17 subject17 E Low
18 subject18 D High
19 subject19 E Low
20 subject20 C High
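In the table above, each of the 10 drug-by-dose combinations appears twice. A minimal base R sketch of generating a table with that structure is given below. The seed and the object names (`combos`, `alloc`, `fac_sketch`) are assumptions for illustration, and the shuffled order will differ from `fac5x2`.

```r
# Sketch only: a 5 x 2 factorial treatment structure, 2 replicates each,
# randomly allocated to 20 subjects.
set.seed(1)  # arbitrary seed (assumption)

# All 10 combinations of drug and dose.
combos <- expand.grid(
  drug = c("A", "B", "C", "D", "E"),
  dose = c("Low", "High"),
  stringsAsFactors = FALSE
)

reps <- 2
# Replicate every combination `reps` times, then shuffle the rows.
alloc <- combos[rep(seq_len(nrow(combos)), times = reps), ]
alloc <- alloc[sample(nrow(alloc)), ]

fac_sketch <- data.frame(
  subject = sprintf("subject%02d", seq_len(nrow(alloc))),
  drug = alloc$drug,
  dose = alloc$dose
)

# Cross-tabulate to confirm every combination has 2 replicates.
table(fac_sketch$drug, fac_sketch$dose)
```

With two treatment factors, a plot has to assign each factor to a different visual channel (for example, color for one and shape or label for the other), which adds to the decisions described earlier.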
# Completely Randomised Design
# An edibble: 20 x 2
subject trt
<U(20)> <T(3)>
<chr> <chr>
1 subject01 C
2 subject02 B
3 subject03 C
4 subject04 C
5 subject05 B
6 subject06 C
7 subject07 A
8 subject08 A
9 subject09 C
10 subject10 B
11 subject11 A
12 subject12 A
13 subject13 A
14 subject14 B
15 subject15 B
16 subject16 A
17 subject17 B
18 subject18 B
19 subject19 C
20 subject20 C
An experimental design table with the roles (unit, treatment) and its relations.
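A table like the one printed above could be specified with the edibble package roughly as follows. This is a sketch under assumptions: the exact code used for the talk is not shown, the object name `crd` is hypothetical, the native pipe `|>` (R 4.1 or later) is used in place of `%>%`, and the random allocation will differ from the printed output.

```r
library(edibble)

# Declare the units and treatments, state that treatments are applied
# to subjects, randomise, and then return the design as a table.
crd <- design("Completely Randomised Design") |>
  set_units(subject = 20) |>
  set_trts(trt = c("A", "B", "C")) |>
  allot_trts(trt ~ subject) |>
  assign_trts("random") |>
  serve_table()

crd
```

The point of interest is that the roles (`subject` is a unit, `trt` is a treatment) and the relation `trt ~ subject` are recorded in the object, rather than living only in the analyst's head.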
Image from Gurung et al. (2012) Comparative analyses of spot blotch and tan spot epidemics on wheat under optimum and late sowing period in South Asia. European Journal of Plant Pathology
Current visualisation systems for the experimental design layout require mental effort to map unit factors to (engineered) spatial coordinates, and treatment factors to perceptual attributes.
The deggust R package
relieves users from tedious decisions for quick visualisation by exploiting encoded structure from the edibble R package, and
is built on the ggplot2 R package, so any customisation can be done using a familiar visualisation system (i.e. flexible).
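As a rough sketch of the two points above, the snippet below builds a small design with edibble and draws it with deggust. It assumes deggust is installed (it may need to be installed from the author's development repository) and that it supplies an `autoplot()` method for edibble tables that returns a ggplot object. The object names `crd` and `p` and the theme tweak are illustrative choices only.

```r
library(edibble)
library(ggplot2)
library(deggust)  # assumed to be installed

# Small design; the roles and relations are encoded by edibble.
crd <- design("Completely Randomised Design") |>
  set_units(subject = 20) |>
  set_trts(trt = c("A", "B", "C")) |>
  allot_trts(trt ~ subject) |>
  assign_trts("random") |>
  serve_table()

# Quick: no coordinates or aesthetic mappings are specified by hand.
p <- autoplot(crd)

# Flexible: the result can be modified with ordinary ggplot2 layers.
p + theme(legend.position = "bottom")
```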
Some future directions include:
conducting experiments to determine better visualisation defaults,
adding more robust testing in the package,
improving documentation, making visualisations interactive, …
More details about “the grammar of experimental designs” and edibble in arXiv papers:
Tanaka (2023) edibble: An R package to encapsulate elements of experimental designs for better planning, management and workflow. emitanaka.org/paper-edibble