Lecturer: Emi Tanaka
ETC5521.Clayton-x@monash.edu
Week 12 - Session 2
library(tidyverse)
library(nullabor)

set.seed(1)
df <- tibble(id = 1:200) %>%
  mutate(y = rgamma(n(), shape = 3, rate = 4))

set.seed(1)
# lineup of 20 histograms: 19 null plots simulated from an exponential
# distribution, with the data plot placed at position 15
g <- lineup(null_dist("y", dist = "exp", params = list(rate = 1 / mean(df$y))),
            true = df, n = 20, pos = 15) %>%
  ggplot(aes(y)) +
  geom_histogram(color = "white") +
  facet_wrap(~.sample) +
  theme(axis.text = element_blank(),
        axis.title = element_blank(),
        axis.ticks.length = unit(0, "pt"))
g
The statistical power of a binary hypothesis test is the probability that the test correctly rejects the null hypothesis (H_0) when a specific alternative hypothesis (H_1) is true.
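The definition above can be checked numerically. The following is a minimal base R sketch, not taken from the lecture: the values of `n`, `p0`, `p1` and `alpha` are illustrative assumptions. It estimates the power of a one-sided exact binomial test by simulating data under H_1 and counting how often H_0 is rejected, then compares that with the exact value.

```r
set.seed(1)
n     <- 50    # number of trials (assumed for illustration)
p0    <- 0.05  # success probability under H_0 (assumed)
p1    <- 0.20  # success probability under the specific H_1 (assumed)
alpha <- 0.05  # significance level (assumed)

# Simulation: generate data under H_1 and record how often H_0 is rejected
x_sim     <- rbinom(10000, n, p1)
pval_sim  <- pbinom(x_sim - 1, n, p0, lower.tail = FALSE)  # P(X >= x | H_0)
power_sim <- mean(pval_sim < alpha)

# Exact: find the smallest count that rejects H_0, then its probability under H_1
x_grid      <- 0:n
rejects     <- pbinom(x_grid - 1, n, p0, lower.tail = FALSE) < alpha
xmin        <- x_grid[which(rejects)[1]]
power_exact <- pbinom(xmin - 1, n, p1, lower.tail = FALSE)

c(simulated = power_sim, exact = power_exact)
```

The two numbers should agree up to simulation error, which illustrates that power is simply the rejection rate when H_1 holds.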
For a fixed power (1-\beta), the minimum sample size n we need depends on the detection probability p.
Generally, if p is larger, a smaller n is sufficient to achieve the same or greater power.
But we don't know what the true p is! (If we did, we wouldn't need to test for it.)
You will need to either make a guess from past experience or run a pilot test.
If, in the pilot test, x_p out of n_p participants detect the data plot, then an estimate of p is \hat{p} = x_p / n_p.
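As a small worked example of the pilot estimate, here is a sketch with made-up pilot numbers (`x_p = 6` and `n_p = 40` are hypothetical, not results from the lecture). The exact confidence interval from `binom.test()` is included only to show how uncertain an estimate from a small pilot can be.

```r
# Hypothetical pilot results (assumed for illustration)
x_p <- 6    # participants who detected the data plot
n_p <- 40   # participants in the pilot test

# Point estimate of the detection probability
p_hat <- x_p / n_p

# Exact (Clopper-Pearson) 95% confidence interval for p
ci <- binom.test(x_p, n_p)$conf.int

p_hat
ci
```

Since the required n is sensitive to p, it may be worth repeating the sample size calculation at the lower end of this interval as well as at \hat{p}.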
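p <- 0.1       # assumed detection probability
m <- 20        # number of plots in the lineup
d <- 0.95      # divisor applied to n below; presumably the expected fraction of usable responses
alpha <- 0.05  # significance level; not defined in the original snippet, 0.05 is assumed here

power_df <- tibble(n = 2:200) %>%
  mutate(power = map_dbl(n, function(n) {
    x <- 1:n
    # p-value for x detections out of n under H_0 (detection probability 1/m)
    pval <- map_dbl(x, ~ 1 - pbinom(.x - 1, n, 1 / m))
    # smallest number of detections that rejects H_0
    xmin <- x[which.max(pval < alpha)]
    # power: probability of at least xmin detections when the detection probability is p
    1 - pbinom(xmin - 1, n, p)
  }))

# smallest n with power above 0.8, inflated by 1/d
power_df %>%
  filter(power > 0.8) %>%
  pull(n) %>%
  min() %>%
  magrittr::divide_by(d) %>%
  ceiling()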
## [1] 178
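The calculation in the code above can be written out as follows. This is a sketch of what the code computes, assuming X counts the participants (out of n, responding independently) who detect the data plot, and \alpha is the significance level, which the snippet uses without defining.

```latex
\begin{align*}
  H_0 &: X \sim \mathrm{Binomial}(n, 1/m)
      && \text{(each participant picks the data plot by chance)} \\
  x_{\min}(n) &= \min\{x \in \{1, \dots, n\} : P(X \ge x \mid H_0) < \alpha\}
      && \text{(smallest count that rejects } H_0\text{)} \\
  \mathrm{power}(n) &= P\bigl(X \ge x_{\min}(n)\bigr), \quad X \sim \mathrm{Binomial}(n, p)
      && \text{(rejection probability at detection probability } p\text{)} \\
  n^{*} &= \left\lceil \frac{\min\{n : \mathrm{power}(n) > 0.8\}}{d} \right\rceil
      && \text{(inflate by } 1/d\text{)}
\end{align*}
```

Because X is discrete, power(n) is not monotone in n, so the code takes the first n at which the power exceeds 0.8.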
set.seed(1)
df1 <- tibble(id = 1:200) %>%
  mutate(x = runif(n(), 0, 5),
         y = 2 * x + 1 + rnorm(n()))

ggplot(df1, aes(x, y)) +
  geom_point()
set.seed(1)
df <- tibble(id = 1:200) %>%
  mutate(y = runif(n(), -4, 4))

set.seed(1)
ldf <- lineup(null_dist("y", dist = "norm",
                        params = list(mean = mean(df$y), sd = sd(df$y))),
              true = df, n = 20, pos = 4)

ggplot(ldf, aes(y)) +
  geom_histogram(color = "white") +
  facet_wrap(~.sample) +
  theme(axis.text = element_blank(),
        axis.title = element_blank(),
        axis.ticks.length = unit(0, "pt"))
We are testing H_0: Y \sim N(\mu, \sigma^2).
\mu is estimated by the sample mean, \hat{\mu} = \bar{Y}.
\sigma is estimated by the sample standard deviation, \hat{\sigma} = sd(Y).
Null data here are simply simulated from N(\hat{\mu}, \hat{\sigma}^2).
It is easier to compare distributions using a Q-Q plot.
Plot 4 is indeed the data plot.
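A Q-Q plot version of the lineup can be sketched as below. This is a minimal base R illustration under stated assumptions, not the lecture's code: it builds the lineup by hand with `rnorm()` instead of `nullabor::lineup()`, so the null samples will differ from those in the histogram lineup, and it reuses the lineup size (20) and data plot position (4) from above.

```r
set.seed(1)
y   <- runif(200, -4, 4)  # observed data, drawn from the same Uniform(-4, 4)
m   <- 20                 # number of plots in the lineup
pos <- 4                  # position of the data plot

# Null data: samples from N(mu_hat, sigma_hat^2) with estimates taken from y
samples <- replicate(m, rnorm(length(y), mean = mean(y), sd = sd(y)),
                     simplify = FALSE)
samples[[pos]] <- y       # place the observed data at position `pos`

# Draw the lineup as a 4 x 5 grid of normal Q-Q plots
op <- par(mfrow = c(4, 5), mar = c(1, 1, 2, 1))
for (i in seq_len(m)) {
  qqnorm(samples[[i]], main = as.character(i),
         axes = FALSE, xlab = "", ylab = "")
  qqline(samples[[i]])
  box()
}
par(op)
```

In the null plots the points should follow the reference line closely, whereas uniform data have lighter tails than a normal distribution and would be expected to bend away from the line at both ends.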
While today's focus was on data collection from visual inference surveys, concepts such as data quality checks and a sufficient sample size for drawing inference are applicable to other kinds of data collection.
There's always more to learn, but stay curious and make sure you plot your data before rushing off to fit models!
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.