ETC5512: Wild Caught Data
Week 9
The language of designed experiments
Lecturer: Emi Tanaka
Department of Econometrics and Business Statistics
ETC5512.Clayton-x@monash.edu
20th May 2020
Press the right arrow to progress to the next slide!
Lecturer: Emi Tanaka
Department of Econometrics and Business Statistics
ETC5512.Clayton-x@monash.edu
20th May 2020
Source: Gilmour et al. (1997) Accounting for natural and extraneous variation in the analysis of field experiments. Journal of Agric Biol Env Statistics, 2, 269-293.
library(agridat) # data packagehead(gilmour.serpentine)
## col row rep gen yield## 1 1 1 R1 ANGAS 483## 2 1 2 R1 BT_SCHOMBURG 526## 3 1 3 R1 DGR/MNX-9-9e 557## 4 1 4 R1 EXCALIBUR 564## 5 1 5 R1 JANZ 498## 6 1 6 R1 MACHETE 510
ggplot(gilmour.serpentine) + geom_histogram(aes(x = yield))
Assuming the experiment is unstructured, we may propose a linear model: yield=mean+gen+error, where error∼N(0,σ2).
This model can be fitted in R as below.
fit1 <- lm(yield ~ 1 + gen, data = gilmour.serpentine)
1
(for the mean) is included by default and can be omitted.H0:gen1=...=gen107=0.
anova(fit1)
## Analysis of Variance Table## ## Response: yield## Df Sum Sq Mean Sq F value Pr(>F)## gen 106 2041055 19255 0.7428 0.9579## Residuals 223 5781054 25924
fit2 <- lm(yield ~ 1, data = gilmour.serpentine)anova(fit2, fit1)
## Analysis of Variance Table## ## Model 1: yield ~ 1## Model 2: yield ~ 1 + gen## Res.Df RSS Df Sum of Sq F Pr(>F)## 1 329 7822108 ## 2 223 5781054 106 2041055 0.7428 0.9579
VF655
, TINCURRIN
and WW1477
have a replication of 6, the remaining 104 varieties each have a replication of 3. set.seed
in the beginning of your script.set.seed(2020) # for reproducibility# first create the field arrayexpand_grid(col = 1:15, row = 1:22) %>% # create plot id (optional) mutate(plot = 1:n()) %>% # genotype 1-104 has 3 reps # genotype 105-107 has 6 reps mutate(gen = c(rep(1:104, each = 3), rep(105:107, each = 6))) %>% # now randomly permute the genotypes mutate(gen = sample(gen))
## # A tibble: 330 x 4## col row plot gen## <int> <int> <int> <int>## 1 1 1 1 79## 2 1 2 2 29## 3 1 3 3 8## 4 1 4 4 72## 5 1 5 5 106## 6 1 6 6 91## 7 1 7 7 55## 8 1 8 8 57## 9 1 9 9 66## 10 1 10 10 37## # … with 320 more rows
You can form blocks from:
rep
. set.seed(20052020)expand_grid(col = 1:15, row = 1:22) %>% mutate(plot = 1:n()) %>% # 3 blocks -> # block 1 is col 1-5, # block 2 is col 6-10, # block 3 is col 11-15 mutate(rep = case_when( col %in% 1:5 ~ "block1", col %in% 6:10 ~ "block2", col %in% 11:15 ~ "block3" )) %>% # every block contains: # - 1 replicate of gen 1-104 # - 2 replicates of gen 105-107 group_by(rep) %>% mutate(gen = c(1:107, 105:107)) %>% # randomise within `rep` mutate(gen = sample(gen))
## # A tibble: 330 x 5## # Groups: rep [3]## col row plot rep gen## <int> <int> <int> <chr> <int>## 1 1 1 1 block1 28## 2 1 2 2 block1 106## 3 1 3 3 block1 93## 4 1 4 4 block1 41## 5 1 5 5 block1 92## 6 1 6 6 block1 71## 7 1 7 7 block1 78## 8 1 8 8 block1 38## 9 1 9 9 block1 13## 10 1 10 10 block1 89## # … with 320 more rows
fitb <- lm(yield ~ 1 + rep + gen, data = gilmour.serpentine)
anova(fitb)
## Analysis of Variance Table## ## Response: yield## Df Sum Sq Mean Sq F value Pr(>F) ## rep 2 2828701 1414351 105.8720 < 2e-16 ***## gen 106 2041055 19255 1.4414 0.01235 * ## Residuals 221 2952352 13359 ## ---## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
gen
is small.broom::tidy(fitb) %>% select(term, estimate) %>% filter(str_detect(term, "gen")) %>% arrange(-estimate)
## # A tibble: 106 x 2## term estimate## <chr> <dbl>## 1 genVG878 52.3 ## 2 genRAC811 42.3 ## 3 gen(WqKPWmH*3Ag 24.3 ## 4 genVF508 11.7 ## 5 genRAC772 5.00## 6 genWI216 4.00## 7 genRAC779 3.67## 8 genRAC820 -1. ## 9 genVF519 -1. ## 10 genRAC798 -1.67## # … with 96 more rows
VG878
is performing the best according to the analysis.set.seed(1)# no difference between trtstrt <- c(0, 0, 0)# random deviation for horsehordev <- rnorm(9, 0, 20) # there are 9 horsesim_df <- tibble(horse = 1:9) %>% # 3 grafting with 3 reps mutate(graft = rep(1:3, 3)) %>% # cut each grafted skin to 20 pieces mutate(piece = list(1:20)) %>% # let each piece be one row unnest(piece) %>% # now simulate the response mutate(y = 300 + # mean trt[graft] + # trt effect hordev[horse] + # horse dev rnorm(n(), 0, 5)) %>% # OU dev mutate_if(is.integer, as.factor)
anova(lm(y ~ graft + horse, data = sim_df))
## Analysis of Variance Table## ## Response: y## Df Sum Sq Mean Sq F value Pr(>F) ## graft 2 9035 4517.4 205.86 < 2.2e-16 ***## horse 6 32225 5370.8 244.75 < 2.2e-16 ***## Residuals 171 3752 21.9 ## ---## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
graft
is small indicatingsummary(aov(y ~ graft + Error(horse/piece), data = sim_df))
## ## Error: horse## Df Sum Sq Mean Sq F value Pr(>F)## graft 2 9035 4517 0.841 0.476## Residuals 6 32225 5371 ## ## Error: horse:piece## Df Sum Sq Mean Sq F value Pr(>F)## Residuals 171 3752 21.94
In a controlled experiment, the investigators allocate the treatments to the units (that may be people, mice, plants, etc).
In an observational study, the investigators observe units without manipulation or intervention.
In a well-controlled experiment, the difference in response between treatment groups should be only due to the treatment.
Based on previous graphs, which statements are true?
Correlation does not imply causation.
Chief Medical Officer Brendan Murphy said there was no point in testing Australians simply because they had respiratory or cold and flu symptoms.
Other than a "small and controlled" cluster of community transmission in Sydney, cases were largely confined to returned travelers.
"If you're a returned traveler or you've been in contact with someone who has been a confirmed case, then you should be tested. But other Australians do not need testing and all they're doing is putting an unnecessary burden on the testing," he said.
Read the article here.
A control is an experimental unit that did not receive any treatment.
A control is an experimental unit that did not receive any treatment.
If there is already effective treatment that is applied as a standard, then testing should be compared with this standard treatment (as was the case for breeding trial).
Is "do-nothing" treatment wise comparison though?
A placebo is a medical treatment or procedure designed to have no therapeutic value.
A double-blind study is an experimental study that neither the participants nor the experimenters know who is receiving which treatment.
A confounding variable is a variable that is associated with the variable of interest (usually the treatment) and the response.
Source: Freedman, Pisani & Purves (2010) Statistics. 4th edition.
Vaccinate all grade 2 children whose parents would consent, leaving children in grades 1 and 3 as controls.
The rate is the number of polio cases per 100,000 in each group.
Group | Participants | Rate |
---|---|---|
Vaccinated | 200,745 | 28 |
Placebo | 201,229 | 71 |
Not Vaccination (no consent) |
338,778 | 46 |
Incomplete Vaccination | 8,484 | 24 |
Group | Participants | Rate |
---|---|---|
Vaccinated (Grade 2) | 221,998 | 25 |
Control (Grade 1 & 3) | 725,173 | 54 |
Not Vaccination (Grade 2, no consent) |
123,605 | 44 |
Incomplete Vaccination (Grade 2, incomplete) |
9,904 | 40 |
Francis (1955) An evaluation of the 1954 poliomyelitis vaccine trials - summary report. American Journal of Public Health 45 1-63.
Group | RCT Rate | NFIP Rate |
---|---|---|
Vaccinated | 28 | 25 |
Placebo/Control | 71 | 54 |
Not Vaccination (no consent) |
46 | 44 |
Incomplete Vaccination | 24 | 40 |
Group | RCT Rate | NFIP Rate |
---|---|---|
Vaccinated | 28 | 25 |
Placebo/Control | 71 | 54 |
Not Vaccination (no consent) |
46 | 44 |
Incomplete Vaccination | 24 | 40 |
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Lecturer: Emi Tanaka
Department of Econometrics and Business Statistics
ETC5512.Clayton-x@monash.edu
Lecturer: Emi Tanaka
Department of Econometrics and Business Statistics
ETC5512.Clayton-x@monash.edu
20th May 2020
Keyboard shortcuts
↑, ←, Pg Up, k | Go to previous slide |
↓, →, Pg Dn, Space, j | Go to next slide |
Home | Go to first slide |
End | Go to last slide |
Number + Return | Go to specific slide |
b / m / f | Toggle blackout / mirrored / fullscreen mode |
c | Clone slideshow |
p | Toggle presenter mode |
t | Restart the presentation timer |
?, h | Toggle this help |
Esc | Back to slideshow |