Biological Data Science Institute
30th November 2023
Today is about the lone statistician ensuring their proposed experimental design is sensible.
Define a desirable future and then works backwards.
An example
Aim: Study production of glasshouse tomatoes under different air and soil temperatures.
edibble
R package will be used to encode the experimental design (then later simulate the experimental data).
tomato_assign1 <- tomato_mapping +
assign_trts(order = c("random", "random", "systematic-slowest"), seed = 1) +
serve_table(label_nested = c(plot, trough))
tomato_assign2 <- tomato_mapping +
assign_trts(order = c("random", "random", "blocksdesign"), seed = 1) +
serve_table(label_nested = c(plot, trough))
⚖️ Domain experts (or the literature) generally have some idea about sensible values or distribution of the experimental data.
compartment | trough | plot | tomato | air_temp | heated | soil_temp |
---|---|---|---|---|---|---|
compartment1 | trough1 | plot1 | tomato01 | 13°C | Yes | 21°C |
compartment1 | trough1 | plot1 | tomato02 | 13°C | Yes | 21°C |
compartment1 | trough1 | plot1 | tomato03 | 13°C | Yes | 21°C |
compartment1 | trough1 | plot2 | tomato04 | 13°C | No | Ambient |
compartment1 | trough1 | plot2 | tomato05 | 13°C | No | Ambient |
compartment1 | trough1 | plot2 | tomato06 | 13°C | No | Ambient |
compartment1 | trough2 | plot1 | tomato07 | 13°C | No | Ambient |
compartment1 | trough2 | plot1 | tomato08 | 13°C | No | Ambient |
compartment1 | trough2 | plot1 | tomato09 | 13°C | No | Ambient |
compartment1 | trough2 | plot2 | tomato10 | 13°C | Yes | 18°C |
compartment1 | trough2 | plot2 | tomato11 | 13°C | Yes | 18°C |
compartment1 | trough2 | plot2 | tomato12 | 13°C | Yes | 18°C |
compartment2 | trough1 | plot1 | tomato13 | 13°C | Yes | 18°C |
compartment2 | trough1 | plot1 | tomato14 | 13°C | Yes | 18°C |
compartment2 | trough1 | plot1 | tomato15 | 13°C | Yes | 18°C |
compartment2 | trough1 | plot2 | tomato16 | 13°C | No | Ambient |
compartment2 | trough1 | plot2 | tomato17 | 13°C | No | Ambient |
compartment2 | trough1 | plot2 | tomato18 | 13°C | No | Ambient |
compartment2 | trough2 | plot1 | tomato19 | 13°C | Yes | 21°C |
compartment2 | trough2 | plot1 | tomato20 | 13°C | Yes | 21°C |
compartment2 | trough2 | plot1 | tomato21 | 13°C | Yes | 21°C |
compartment2 | trough2 | plot2 | tomato22 | 13°C | No | Ambient |
compartment2 | trough2 | plot2 | tomato23 | 13°C | No | Ambient |
compartment2 | trough2 | plot2 | tomato24 | 13°C | No | Ambient |
compartment3 | trough1 | plot1 | tomato25 | 16°C | No | Ambient |
compartment3 | trough1 | plot1 | tomato26 | 16°C | No | Ambient |
compartment3 | trough1 | plot1 | tomato27 | 16°C | No | Ambient |
compartment3 | trough1 | plot2 | tomato28 | 16°C | Yes | 18°C |
compartment3 | trough1 | plot2 | tomato29 | 16°C | Yes | 18°C |
compartment3 | trough1 | plot2 | tomato30 | 16°C | Yes | 18°C |
compartment3 | trough2 | plot1 | tomato31 | 16°C | No | Ambient |
compartment3 | trough2 | plot1 | tomato32 | 16°C | No | Ambient |
compartment3 | trough2 | plot1 | tomato33 | 16°C | No | Ambient |
compartment3 | trough2 | plot2 | tomato34 | 16°C | Yes | 21°C |
compartment3 | trough2 | plot2 | tomato35 | 16°C | Yes | 21°C |
compartment3 | trough2 | plot2 | tomato36 | 16°C | Yes | 21°C |
compartment4 | trough1 | plot1 | tomato37 | 13°C | No | Ambient |
compartment4 | trough1 | plot1 | tomato38 | 13°C | No | Ambient |
compartment4 | trough1 | plot1 | tomato39 | 13°C | No | Ambient |
compartment4 | trough1 | plot2 | tomato40 | 13°C | Yes | 21°C |
compartment4 | trough1 | plot2 | tomato41 | 13°C | Yes | 21°C |
compartment4 | trough1 | plot2 | tomato42 | 13°C | Yes | 21°C |
compartment4 | trough2 | plot1 | tomato43 | 13°C | Yes | 18°C |
compartment4 | trough2 | plot1 | tomato44 | 13°C | Yes | 18°C |
compartment4 | trough2 | plot1 | tomato45 | 13°C | Yes | 18°C |
compartment4 | trough2 | plot2 | tomato46 | 13°C | No | Ambient |
compartment4 | trough2 | plot2 | tomato47 | 13°C | No | Ambient |
compartment4 | trough2 | plot2 | tomato48 | 13°C | No | Ambient |
compartment5 | trough1 | plot1 | tomato49 | 13°C | Yes | 21°C |
compartment5 | trough1 | plot1 | tomato50 | 13°C | Yes | 21°C |
compartment5 | trough1 | plot1 | tomato51 | 13°C | Yes | 21°C |
compartment5 | trough1 | plot2 | tomato52 | 13°C | No | Ambient |
compartment5 | trough1 | plot2 | tomato53 | 13°C | No | Ambient |
compartment5 | trough1 | plot2 | tomato54 | 13°C | No | Ambient |
compartment5 | trough2 | plot1 | tomato55 | 13°C | Yes | 18°C |
compartment5 | trough2 | plot1 | tomato56 | 13°C | Yes | 18°C |
compartment5 | trough2 | plot1 | tomato57 | 13°C | Yes | 18°C |
compartment5 | trough2 | plot2 | tomato58 | 13°C | No | Ambient |
compartment5 | trough2 | plot2 | tomato59 | 13°C | No | Ambient |
compartment5 | trough2 | plot2 | tomato60 | 13°C | No | Ambient |
compartment6 | trough1 | plot1 | tomato61 | 16°C | Yes | 18°C |
compartment6 | trough1 | plot1 | tomato62 | 16°C | Yes | 18°C |
compartment6 | trough1 | plot1 | tomato63 | 16°C | Yes | 18°C |
compartment6 | trough1 | plot2 | tomato64 | 16°C | No | Ambient |
compartment6 | trough1 | plot2 | tomato65 | 16°C | No | Ambient |
compartment6 | trough1 | plot2 | tomato66 | 16°C | No | Ambient |
compartment6 | trough2 | plot1 | tomato67 | 16°C | Yes | 21°C |
compartment6 | trough2 | plot1 | tomato68 | 16°C | Yes | 21°C |
compartment6 | trough2 | plot1 | tomato69 | 16°C | Yes | 21°C |
compartment6 | trough2 | plot2 | tomato70 | 16°C | No | Ambient |
compartment6 | trough2 | plot2 | tomato71 | 16°C | No | Ambient |
compartment6 | trough2 | plot2 | tomato72 | 16°C | No | Ambient |
compartment7 | trough1 | plot1 | tomato73 | 16°C | No | Ambient |
compartment7 | trough1 | plot1 | tomato74 | 16°C | No | Ambient |
compartment7 | trough1 | plot1 | tomato75 | 16°C | No | Ambient |
compartment7 | trough1 | plot2 | tomato76 | 16°C | Yes | 21°C |
compartment7 | trough1 | plot2 | tomato77 | 16°C | Yes | 21°C |
compartment7 | trough1 | plot2 | tomato78 | 16°C | Yes | 21°C |
compartment7 | trough2 | plot1 | tomato79 | 16°C | No | Ambient |
compartment7 | trough2 | plot1 | tomato80 | 16°C | No | Ambient |
compartment7 | trough2 | plot1 | tomato81 | 16°C | No | Ambient |
compartment7 | trough2 | plot2 | tomato82 | 16°C | Yes | 18°C |
compartment7 | trough2 | plot2 | tomato83 | 16°C | Yes | 18°C |
compartment7 | trough2 | plot2 | tomato84 | 16°C | Yes | 18°C |
compartment8 | trough1 | plot1 | tomato85 | 16°C | No | Ambient |
compartment8 | trough1 | plot1 | tomato86 | 16°C | No | Ambient |
compartment8 | trough1 | plot1 | tomato87 | 16°C | No | Ambient |
compartment8 | trough1 | plot2 | tomato88 | 16°C | Yes | 21°C |
compartment8 | trough1 | plot2 | tomato89 | 16°C | Yes | 21°C |
compartment8 | trough1 | plot2 | tomato90 | 16°C | Yes | 21°C |
compartment8 | trough2 | plot1 | tomato91 | 16°C | No | Ambient |
compartment8 | trough2 | plot1 | tomato92 | 16°C | No | Ambient |
compartment8 | trough2 | plot1 | tomato93 | 16°C | No | Ambient |
compartment8 | trough2 | plot2 | tomato94 | 16°C | Yes | 18°C |
compartment8 | trough2 | plot2 | tomato95 | 16°C | Yes | 18°C |
compartment8 | trough2 | plot2 | tomato96 | 16°C | Yes | 18°C |
Some bespoke approaches to simulating data:
library(dplyr)
sim_bespoke <- tomato_expt %>%
mutate(yield_same = 1) %>%
mutate(yield_null = rnorm(n(), 5, 1)) %>%
mutate(yield_trts = 5 + c("13°C" = -1.1, "16°C" = 0)[air_temp] +
c("18°C" = 1.3, "21°C" = 2.1, "Ambient" = 0)[soil_temp] +
rnorm(n(), 0, 1)) %>%
mutate(yield_sensible = 5 + yield_trts +
rnorm(nlevels(compartment), 0, 1)[index_levels(compartment)] +
rnorm(nlevels(trough), 0, 1)[index_levels(trough)] +
rnorm(nlevels(plot), 0, 1)[index_levels(plot)])
yield_same
: A quick dummy response data. yield_null
: Ignore influence of treatment and unit structures. yield_trts
: Considers the treatment effects (more realistic). yield_sensible
: Adds possible unit variations on top of the treatment effects (more sensible).
Error: compartment
Df Sum Sq Mean Sq F value Pr(>F)
air_temp 1 3.782e-30 3.782e-30 1 0.356
Residuals 6 2.269e-29 3.782e-30
Error: compartment:trough
Df Sum Sq Mean Sq F value Pr(>F)
soil_temp 1 3.782e-30 3.782e-30 1 0.356
air_temp:soil_temp 1 3.782e-30 3.782e-30 1 0.356
Residuals 6 2.269e-29 3.782e-30
Error: compartment:trough:plot
Df Sum Sq Mean Sq F value Pr(>F)
soil_temp 2 7.560e-30 3.782e-30 1 0.397
air_temp:soil_temp 2 7.560e-30 3.782e-30 1 0.397
Residuals 12 4.538e-29 3.782e-30
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 64 2.42e-28 3.782e-30
Error: compartment
Df Sum Sq Mean Sq F value Pr(>F)
air_temp 1 0.169 0.1691 0.093 0.771
Residuals 6 10.957 1.8261
Error: compartment:trough
Df Sum Sq Mean Sq F value Pr(>F)
soil_temp 1 0.432 0.4325 0.260 0.628
air_temp:soil_temp 1 0.080 0.0803 0.048 0.833
Residuals 6 9.989 1.6649
Error: compartment:trough:plot
Df Sum Sq Mean Sq F value Pr(>F)
soil_temp 2 0.618 0.3090 0.232 0.797
air_temp:soil_temp 2 1.391 0.6953 0.521 0.607
Residuals 12 16.004 1.3337
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 64 60.4 0.9437
Error: compartment
Df Sum Sq Mean Sq F value Pr(>F)
air_temp 1 15.873 15.873 20.07 0.00419 **
Residuals 6 4.745 0.791
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Error: compartment:trough
Df Sum Sq Mean Sq F value Pr(>F)
soil_temp 1 2.899 2.899 63.74 0.000206 ***
air_temp:soil_temp 1 3.815 3.815 83.87 9.54e-05 ***
Residuals 6 0.273 0.045
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Error: compartment:trough:plot
Df Sum Sq Mean Sq F value Pr(>F)
soil_temp 2 48.71 24.357 63.872 4.01e-07 ***
air_temp:soil_temp 2 1.08 0.542 1.422 0.279
Residuals 12 4.58 0.381
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 64 101.5 1.586
Error: compartment
Df Sum Sq Mean Sq F value Pr(>F)
air_temp 1 0.67 0.674 0.03 0.869
Residuals 6 136.41 22.735
Error: compartment:trough
Df Sum Sq Mean Sq F value Pr(>F)
soil_temp 1 29.33 29.331 7.447 0.0342 *
air_temp:soil_temp 1 29.63 29.630 7.523 0.0336 *
Residuals 6 23.63 3.938
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Error: compartment:trough:plot
Df Sum Sq Mean Sq F value Pr(>F)
soil_temp 2 82.23 41.11 11.275 0.00176 **
air_temp:soil_temp 2 3.74 1.87 0.514 0.61096
Residuals 12 43.76 3.65
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 64 101.5 1.586
edibble
tomato_sim <- tomato_expt %>%
simulate_process(
yield = function(air = -1.1, soil = c(1.3, 2.1)) {
air_effects <- c("13°C" = air, "16°C" = 0)
soil_effects <- c("18°C" = soil[1], "21°C" = soil[2], "Ambient" = 0)
compartment_noise <- rnorm(nlevels(compartment), 0, 1)
trough_noise <- rnorm(nlevels(trough), 0, 1)
plot_noise <- rnorm(nlevels(plot), 0, 1)
tomato_noise <- rnorm(nlevels(tomato), 0, 1)
5 + air_effects[air_temp] + soil_effects[soil_temp] +
compartment_noise[index_levels(compartment)] +
trough_noise[index_levels(trough)] +
plot_noise[index_levels(plot)] +
tomato_noise[index_levels(tomato)]
})
edibble
A tool to easily simulate valid experimental data
A tool to easily simulate valid experimental data
Why can’t we have an automated good-enough simulation?
edibble
# An edibble: 96 x 12
compartment trough plot tomato air_temp heated soil_temp shape weight ntomatoes yield .sim
<U(8)> <U(16)> <U(32)> <U(96)> <T(2)> <T(2)> <T(3)>
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <fct> <dbl> <int> <dbl> <int>
1 compartment1 trough1 plot1 tomato01 13°C Yes 21°C flat 304. 38 5.40 1
2 compartment1 trough1 plot1 tomato02 13°C Yes 21°C other 466. 38 5.40 1
3 compartment1 trough1 plot1 tomato03 13°C Yes 21°C flat 405. 38 5.40 1
4 compartment1 trough1 plot2 tomato04 13°C No Ambient flat 574. 18 3.96 1
5 compartment1 trough1 plot2 tomato05 13°C No Ambient flat 482. 18 3.96 1
6 compartment1 trough1 plot2 tomato06 13°C No Ambient other 299. 18 3.96 1
7 compartment1 trough2 plot1 tomato07 13°C No Ambient round 377. 51 2.82 1
8 compartment1 trough2 plot1 tomato08 13°C No Ambient round 488. 51 2.82 1
9 compartment1 trough2 plot1 tomato09 13°C No Ambient flat 613. 51 2.82 1
10 compartment1 trough2 plot2 tomato10 13°C Yes 18°C round 522. 22 0.795 1
# ℹ 86 more rows
Error: compartment
Df Sum Sq Mean Sq F value Pr(>F)
air_temp 1 7.37 7.372 0.853 0.391
Residuals 6 51.87 8.646
Error: compartment:trough
Df Sum Sq Mean Sq F value Pr(>F)
soil_temp 1 16.00 16.00 0.811 0.403
air_temp:soil_temp 1 59.52 59.52 3.017 0.133
Residuals 6 118.39 19.73
Error: compartment:trough:plot
Df Sum Sq Mean Sq F value Pr(>F)
soil_temp 2 84.94 42.47 3.956 0.0479 *
air_temp:soil_temp 2 80.90 40.45 3.768 0.0537 .
Residuals 12 128.83 10.74
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 64 1.355e-27 2.117e-29
:::
simulate_process(
yield = function ()
{
plot_effects <- rnorm(32, 0, 7.5)
air_temp_effects <- rnorm(2, 0, 4.7)
y <- plot_effects[index_levels(plot)] + air_temp_effects[index_levels(air_temp)]
rescale_values(y, lower = 0, upper = 10)
} )
$air_temp_effects
[1] 3.487784 0.693595
$plot_effects
[1] 2.63182298 -1.30910197 -4.43571353 -10.00520446 -8.22973876 15.27077707 -2.44867195 5.80503909 5.88754801 5.72434560 2.21106570 -9.39266943 -7.57127815 5.63543396 -9.81265134 3.95655073 -4.00154680 -2.98782011 -5.92177088 -1.72605852 6.57888631 3.40299883 -1.74348111 6.52504144 12.42002801 -0.04776697 3.52867090 2.08663987 -7.33427206 -6.94939607 14.39827847 6.60958341
$y
[1] 6.1196073 6.1196073 6.1196073 2.1786824 2.1786824 2.1786824 -0.9479292 -0.9479292 -0.9479292 -6.5174201 -6.5174201 -6.5174201 -4.7419544 -4.7419544 -4.7419544 18.7585614 18.7585614 18.7585614 1.0391124 1.0391124 1.0391124 9.2928234 9.2928234 9.2928234 6.5811430 6.5811430 6.5811430 6.4179406 6.4179406 6.4179406 2.9046607 2.9046607 2.9046607 -8.6990744 -8.6990744 -8.6990744 -4.0834938 -4.0834938 -4.0834938 9.1232183 9.1232183 9.1232183 -6.3248670 -6.3248670 -6.3248670 7.4443351 7.4443351 7.4443351 -0.5137625 -0.5137625 -0.5137625 0.4999642 0.4999642 0.4999642 -2.4339866 -2.4339866 -2.4339866 1.7617258 1.7617258 1.7617258 7.2724813 7.2724813 7.2724813 4.0965938 4.0965938 4.0965938 -1.0498861 -1.0498861 -1.0498861 7.2186364 7.2186364 7.2186364 13.1136230 13.1136230 13.1136230 0.6458280 0.6458280 0.6458280 4.2222659 4.2222659 4.2222659 2.7802349 2.7802349 2.7802349 -6.6406771 -6.6406771 -6.6406771 -6.2558011 -6.2558011 -6.2558011
[91] 15.0918735 15.0918735 15.0918735 7.3031784 7.3031784 7.3031784
edibble
R package encapsulates elements of experimental designs in a series of composable functions.edibble::autofill_rcrds()
provides a quick, lazy simulation for checking or revising the statistical analysis plan for free, if your experiment is designed using the edibble
R package.simulate_process()
+ simulate_rcrds()
that include censoring and aggregation options that respects the experimental structure and expectations.These slides are available at emitanaka.org/slides/biometrics2023.
More details also in arXiv papers:
edibble
R package is available on CRAN with the latest developments on GitHub at github.com/emitanaka/edibble
emi.tanaka@anu.edu.au
emitanaka.org/slides/biometrics2023