Lecturer: Emi Tanaka
ETC5521.Clayton-x@monash.edu
Week 3 - Session 2
library(tidyverse)
glimpse(cars)
## Rows: 50
## Columns: 2
## $ speed <dbl> 4, 4, 7, 7, 8, 9, 10, 10, 10, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 15, 15, 15, 16, 16, 17, 17, 17, 18, 18, 18, 18, 19, 19, 19, 20, 20, 20, 20, 20, 22, 23, 24, 24, 24…
## $ dist  <dbl> 2, 10, 4, 22, 16, 10, 18, 26, 34, 17, 28, 14, 20, 24, 28, 26, 34, 34, 46, 26, 36, 60, 80, 20, 26, 54, 32, 40, 32, 40, 50, 42, 56, 76, 84, 36, 46, 68, 32, 48, 52, 56, 64, 66, 54, 70, 92…
ggplot(cars, aes(speed, dist)) + geom_point() + geom_smooth(method = "lm", se = FALSE)
Fitting a linear model with the lm function:
lm(dist ~ speed, data = cars)
is the same as
lm(dist ~ 1 + speed, data = cars)
since lm includes an intercept by default.
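One way to see what the implicit intercept means is to compare the design matrices R builds from each formula. This is a small sketch using base R's model.matrix() on the same cars data; the object names X1, X2 and X0 are just illustrative.

```r
# Both formulas give the same design matrix: a column of 1s for the
# intercept plus the speed column.
X1 <- model.matrix(dist ~ speed, data = cars)
X2 <- model.matrix(dist ~ 1 + speed, data = cars)
all.equal(X1, X2)   # TRUE
colnames(X1)        # "(Intercept)" "speed"

# Writing 0 + (or - 1) in the formula drops the intercept column instead.
X0 <- model.matrix(dist ~ 0 + speed, data = cars)
colnames(X0)        # "speed"
```

So the "1 +" only makes explicit a column that lm would add anyway; removing the intercept has to be requested.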
fit <- lm(dist ~ 1 + speed, data = cars)
coef(fit)
## (Intercept)       speed 
##  -17.579095    3.932409 
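As a check on what lm() computes here: for a single predictor, the least squares estimates have a closed form, with slope cov(x, y)/var(x) and intercept equal to the mean of y minus the slope times the mean of x. A minimal base-R sketch (b0, b1, X and b are illustrative names) that should reproduce coef(fit):

```r
fit <- lm(dist ~ 1 + speed, data = cars)

x <- cars$speed
y <- cars$dist

# Closed-form least squares for one predictor
b1 <- cov(x, y) / var(x)        # slope
b0 <- mean(y) - b1 * mean(x)    # intercept
c(b0, b1)                       # should match coef(fit)

# The same estimates from the normal equations (X'X) b = X'y
X <- cbind(1, x)                # design matrix with an intercept column
b <- solve(crossprod(X), crossprod(X, y))
drop(b)
```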
price ~ 1 + carat ?
price ~ poly(carat, 2) ?
$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + e_i.$$
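A caveat when fitting this quadratic in R: poly(x, 2) uses orthogonal polynomials by default, so its coefficients are not the β0, β1, β2 of the equation above, although the fitted curve is identical. The sketch below uses the built-in cars data (dist and speed) as a stand-in for price and carat, so it runs without loading ggplot2 for diamonds; the fit_* and P names are illustrative.

```r
# Stand-in for price ~ poly(carat, 2), using the built-in cars data.
fit_orth <- lm(dist ~ poly(speed, 2), data = cars)               # orthogonal basis (default)
fit_raw  <- lm(dist ~ poly(speed, 2, raw = TRUE), data = cars)   # raw x and x^2
fit_I    <- lm(dist ~ 1 + speed + I(speed^2), data = cars)       # raw terms written out

# Same fitted values either way ...
all.equal(fitted(fit_orth), fitted(fit_raw))

# ... but only the raw fits estimate beta0, beta1, beta2 on the original scale.
coef(fit_raw)
coef(fit_I)
coef(fit_orth)

# The orthogonal basis columns are orthonormal: P'P is the 2 x 2 identity.
P <- poly(cars$speed, 2)
round(crossprod(P), 10)
```

The orthogonal basis is numerically more stable, which is why it is the default; use raw = TRUE (or I(x^2)) when the coefficients themselves need interpreting.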
Notice that we did no formal statistical inference while initially formulating the model.
The goal of the main analysis is to characterise the price of a diamond by its carat. This may involve:
In fact, there may be many, many models considered but discarded at the IDA stage.
These discarded models are hardly ever reported. Consequently, the majority of reported statistics give a distorted view, and it's important to remind yourself of what might not have been reported.
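To make the point about unreported models concrete, here is a small hypothetical simulation (not part of the original analysis): fit 20 candidate models to pure noise and "report" only the one with the smallest p-value. The seed, sample size and number of candidates are arbitrary choices.

```r
# Hypothetical illustration: every candidate model is fitted to noise.
set.seed(1)
n <- 50
k <- 20
y <- rnorm(n)                           # response unrelated to any predictor
X <- matrix(rnorm(n * k), nrow = n)     # k candidate predictors, all pure noise

# p-value of the slope in each single-predictor model y ~ 1 + x
pvals <- apply(X, 2, function(x) summary(lm(y ~ 1 + x))$coefficients[2, 4])

reported  <- min(pvals)   # only the "best" model gets written up
discarded <- k - 1        # these never appear in the report
reported
```

If the 20 tests were independent, the chance that at least one p-value falls below 0.05 by luck alone would be 1 - 0.95^20, roughly 0.64, so a lone "significant" reported model says little without knowing what was discarded.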
All models are approximate and tentative; approximate in the sense that no model is exactly true and tentative in that they may be modified in the light of further data
—Chatfield (1985)
All models are wrong but some are useful
—George Box
A wheat breeding trial to test 107 varieties (also called genotypes) is conducted in a field experiment laid out in a rectangular array with 22 rows and 15 columns.
data("gilmour.serpentine", package = "agridat")
skimr::skim(gilmour.serpentine)
## ── Data Summary ────────────────────────
##                            Values
## Name                       gilmour.serpentine
## Number of rows             330
## Number of columns          5
## _______________________
## Column type frequency:
##   factor                   2
##   numeric                  3
## ________________________
## Group variables            None
## 
## ── Variable type: factor ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
##   skim_variable n_missing complete_rate ordered n_unique top_counts
## 1 rep                   0             1 FALSE          3 R1: 110, R2: 110, R3: 110
## 2 gen                   0             1 FALSE        107 TIN: 6, VF6: 6, WW1: 6, (WW: 3
## 
## ── Variable type: numeric ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
##   skim_variable n_missing complete_rate  mean     sd  p0 p25  p50  p75 p100 hist
## 1 col                   0             1     8   4.33   1   4    8   12   15 ▇▇▇▇▇
## 2 row                   0             1  11.5   6.35   1   6 11.5   17   22 ▇▆▆▆▇
## 3 yield                 0             1  592.   154. 194 469 618. 714.  925 ▂▅▆▇▂
Gilmour, Cullis and Verbyla (1997). Accounting for natural and extraneous variation in the analysis of field experiments. Journal of Agricultural, Biological, and Environmental Statistics, 2, 269-293.
Experimental Design
[Figures: the experimental design, shown by gen and by rep.]
Analysis
yield = mean + block + treatment + error
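Written with symbols, one way to express this word equation (our own notation, indexing the 330 plots by k) is sketched below. Here r(k) is the replicate (block) and g(k) the genotype (treatment) of plot k, and the errors carry the usual independent normal assumption behind lm.

```latex
y_k = \mu + \rho_{r(k)} + \tau_{g(k)} + e_k,
\qquad e_k \overset{\text{iid}}{\sim} N(0, \sigma^2),
\qquad k = 1, \dots, 330.
```

Under R's default treatment contrasts, the effects of the first level of rep and of gen are set to zero, so the (Intercept) in the fit estimates the expected yield of the baseline genotype in replicate R1, and each repR2, repR3 and gen coefficient is a difference from that baseline.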
fit <- lm(yield ~ 1 + rep + gen, data = gilmour.serpentine)
summary(fit)
## 
## Call:
## lm(formula = yield ~ 1 + rep + gen, data = gilmour.serpentine)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max
## -245.070  -69.695   -1.182   71.427  250.652
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)      720.248     67.335  10.697  < 2e-16 ***
## repR2             96.100     15.585   6.166 3.29e-09 ***
## repR3           -129.845     15.585  -8.331 8.44e-15 ***
## gen(WqKPWmH*3Ag   24.333     94.372   0.258 0.796766
## genAMERY         -93.333     94.372  -0.989 0.323747
## genANGAS        -132.667     94.372  -1.406 0.161192
## genAROONA       -153.667     94.372  -1.628 0.104884
## genBATAVIA      -175.333     94.372  -1.858 0.064513 .
## genBD231         -70.333     94.372  -0.745 0.456895
## genBEULAH       -173.667     94.372  -1.840 0.067074 .
## genBLADE        -270.000     94.372  -2.861 0.004628 **
## genBT_SCHOMBURG  -49.000     94.372  -0.519 0.604125
## genCADOUX       -223.333     94.372  -2.367 0.018820 *
## genCONDOR       -124.333     94.372  -1.317 0.189041
## genCORRIGIN     -217.667     94.372  -2.306 0.022010 *
## genCUNNINGHAM   -254.667     94.372  -2.699 0.007502 **
## genDGR/MNX-9-9e  -47.667     94.372  -0.505 0.613996
## genDOLLARBIRD   -200.667     94.372  -2.126 0.034584 *
## genEXCALIBUR     -55.000     94.372  -0.583 0.560621
## genGOROKE       -141.667     94.372  -1.501 0.134743
## genHALBERD       -53.333     94.372  -0.565 0.572551
## genHOUTMAN      -209.333     94.372  -2.218 0.027560 *
## genJANZ         -214.667     94.372  -2.275 0.023884 *
## genK2011-5*      -87.333     94.372  -0.925 0.355758
## genKATUNGA      -110.333     94.372  -1.169 0.243609
## genKIATA        -165.667     94.372  -1.755 0.080565 .
## genKITE         -180.000     94.372  -1.907 0.057772 .
## genKULIN         -91.000     94.372  -0.964 0.335964
## genLARK         -336.333     94.372  -3.564 0.000448 ***
## genLOWAN        -152.333     94.372  -1.614 0.107915
## genM4997        -146.000     94.372  -1.547 0.123277
## genM5075        -194.667     94.372  -2.063 0.040304 *
## genM5097        -102.667     94.372  -1.088 0.277826
## genMACHETE      -231.333     94.372  -2.451 0.015010 *
## genMEERING      -247.667     94.372  -2.624 0.009286 **
## genMOLINEUX     -165.667     94.372  -1.755 0.080565 .
## genOSPREY       -162.000     94.372  -1.717 0.087451 .
## genOUYEN        -136.667     94.372  -1.448 0.148986
## genOXLEY        -221.667     94.372  -2.349 0.019713 *
## genPELSART      -200.333     94.372  -2.123 0.034882 *
## genPEROUSE      -283.667     94.372  -3.006 0.002955 **
## genRAC655       -112.667     94.372  -1.194 0.233813
## genRAC655'S'    -113.667     94.372  -1.204 0.229702
## genRAC696         -3.667     94.372  -0.039 0.969042
## genRAC710        -51.000     94.372  -0.540 0.589455
## genRAC750        -77.333     94.372  -0.819 0.413410
## genRAC759        -42.000     94.372  -0.445 0.656721
## genRAC772          5.000     94.372   0.053 0.957794
## genRAC777       -172.333     94.372  -1.826 0.069183 .
## genRAC779          3.667     94.372   0.039 0.969042
## genRAC787       -118.000     94.372  -1.250 0.212486
## genRAC791        -72.667     94.372  -0.770 0.442120
## genRAC792       -102.333     94.372  -1.084 0.279385
## genRAC798         -1.667     94.372  -0.018 0.985926
## genRAC804        -45.000     94.372  -0.477 0.633949
## genRAC805        -43.000     94.372  -0.456 0.649093
## genRAC806        -35.333     94.372  -0.374 0.708462
## genRAC807        -91.333     94.372  -0.968 0.334201
## genRAC808        -54.000     94.372  -0.572 0.567765
## genRAC809        -43.333     94.372  -0.459 0.646559
## genRAC810       -131.667     94.372  -1.395 0.164359
## genRAC811         42.333     94.372   0.449 0.654174
## genRAC812        -94.000     94.372  -0.996 0.320310
## genRAC813        -83.333     94.372  -0.883 0.378179
## genRAC814        -72.333     94.372  -0.766 0.444214
## genRAC815       -111.000     94.372  -1.176 0.240781
## genRAC816        -66.333     94.372  -0.703 0.482862
## genRAC817       -100.000     94.372  -1.060 0.290466
## genRAC818       -107.000     94.372  -1.134 0.258101
## genRAC819       -121.333     94.372  -1.286 0.199895
## genRAC820         -1.000     94.372  -0.011 0.991555
## genRAC821        -98.333     94.372  -1.042 0.298560
## genROSELLA      -184.333     94.372  -1.953 0.052050 .
## genSCHOMBURGK   -132.333     94.372  -1.402 0.162242
## genSHRIKE       -128.000     94.372  -1.356 0.176376
## genSPEAR        -254.667     94.372  -2.699 0.007502 **
## genSTILETTO     -157.000     94.372  -1.664 0.097603 .
## genSUNBRI       -218.333     94.372  -2.314 0.021612 *
## genSUNFIELD     -206.667     94.372  -2.190 0.029576 *
## genSUNLAND      -182.667     94.372  -1.936 0.054192 .
## genSWIFT        -197.000     94.372  -2.087 0.037990 *
## genTASMAN       -161.000     94.372  -1.706 0.089410 .
## genTATIARA       -64.333     94.372  -0.682 0.496142
## genTINCURRIN     -19.000     81.728  -0.232 0.816382
## genTRIDENT      -132.667     94.372  -1.406 0.161192
## genVF299         -66.333     94.372  -0.703 0.482862
## genVF300        -111.667     94.372  -1.183 0.237976
## genVF302        -108.333     94.372  -1.148 0.252234
## genVF508          11.667     94.372   0.124 0.901725
## genVF519          -1.000     94.372  -0.011 0.991555
## genVF655        -160.167     81.728  -1.960 0.051283 .
## genVF664        -106.667     94.372  -1.130 0.259583
## genVG127        -109.667     94.372  -1.162 0.246460
## genVG503         -43.000     94.372  -0.456 0.649093
## genVG506        -108.667     94.372  -1.151 0.250782
## genVG701         -19.333     94.372  -0.205 0.837867
## genVG714        -108.333     94.372  -1.148 0.252234
## genVG878          52.333     94.372   0.555 0.579767
## genWARBLER      -217.000     94.372  -2.299 0.022415 *
## genWI216           4.000     94.372   0.042 0.966230
## genWI221         -17.333     94.372  -0.184 0.854440
## genWI231        -218.333     94.372  -2.314 0.021612 *
## genWI232         -56.333     94.372  -0.597 0.551165
## genWILGOYNE     -131.000     94.372  -1.388 0.166496
## genWW1402       -117.333     94.372  -1.243 0.215071
## genWW1477       -185.667     81.728  -2.272 0.024064 *
## genWW1831        -86.667     94.372  -0.918 0.359435
## genWYUNA        -176.667     94.372  -1.872 0.062524 .
## genYARRALINKA   -245.000     94.372  -2.596 0.010061 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 115.6 on 221 degrees of freedom
## Multiple R-squared: 0.6226, Adjusted R-squared: 0.4381
## F-statistic: 3.375 on 108 and 221 DF, p-value: 1.081e-14
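The degrees of freedom in this output can be checked by hand. With n = 330 plots and treatment contrasts, the model has one intercept, 3 - 1 = 2 rep effects and 107 - 1 = 106 gen effects, so p = 109 coefficients:

```latex
\begin{aligned}
p &= 1 + (3 - 1) + (107 - 1) = 109,\\
\text{residual df} &= n - p = 330 - 109 = 221,\\
\text{F-test df} &= (p - 1,\; n - p) = (108,\; 221),\\
R^2_{\text{adj}} &= 1 - (1 - R^2)\,\frac{n - 1}{n - p}
  = 1 - (1 - 0.6226)\,\frac{329}{221} \approx 0.438.
\end{aligned}
```

The adjusted R-squared agrees with the printed 0.4381 up to the rounding of R-squared to four decimals.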
Do you notice anything below?
"Teaching of Statistics should provide a more balanced blend of IDA and inference"
Chatfield (1985)
Yet there is still very little emphasis on it in teaching, and at times in practice.
So don't forget to do IDA!
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.