Lecturer: Emi Tanaka
ETC5521.Clayton-x@monash.edu
Week 12 - Session 1
Today we are going to use surveys in visual inference as a way to think further about what can and cannot be inferred, more generally, given how the data were collected.
We will use the following notation for the remainder of the lecture:
pbinom(x, n, p) is $P(X \leq x)$, where $X \sim B(n, p)$.
Choices | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | Total |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Frequency | 11 | 2 | 3 | 4 | 7 | 8 | 5 | 4 | 6 | 3 | 6 | 8 | 6 | 2 | 4 | 8 | 4 | 7 | 6 | 8 | 112 |
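As a quick check of this notation, the sketch below confirms in base R that `pbinom(x, n, p)` matches the sum of binomial probabilities up to `x`, and shows how the upper tail $P(X \geq x)$ follows from it. The values `n = 112` and `p = 1/20` are borrowed from the table above (112 responses over 20 choices); `x = 11` is only an illustrative count, and `upper_tail()` is a helper name introduced here, not something from the lecture.

```r
# Cumulative probability P(X <= x) for X ~ B(n, p)
n <- 112      # number of responses (total in the table above)
p <- 1 / 20   # chance of picking any one of 20 choices at random
x <- 11       # illustrative count only

cdf_direct <- pbinom(x, n, p)
cdf_by_sum <- sum(dbinom(0:x, n, p))

# Upper tail P(X >= x) = 1 - P(X <= x - 1); the helper name is hypothetical
upper_tail <- function(x, n, p) {
  1 - pbinom(x - 1, n, p)
}

print(all.equal(cdf_direct, cdf_by_sum))
print(upper_tail(x, n, p))
```

Note the `x - 1` in the upper tail: because $X$ is discrete, $P(X \geq x) = 1 - P(X \leq x - 1)$, not $1 - P(X \leq x)$.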
Let's use last week's survey results to discuss data collection issues.
Poll 1
Choices | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Total |
---|---|---|---|---|---|---|---|---|---|---|---|
Frequency | 0 | 2 | 0 | 19 | 3 | 4 | 0 | 1 | 4 | 1 | 34 |
Poll 2
Choices | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Total |
---|---|---|---|---|---|---|---|---|---|---|---|
Frequency | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 21 | 23 |
Poll 3
Choices | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Total |
---|---|---|---|---|---|---|---|---|---|---|---|
Frequency | 0 | 3 | 21 | 0 | 0 | 1 | 0 | 4 | 0 | 1 | 30 |
Poll 4
Choices | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Total |
---|---|---|---|---|---|---|---|---|---|---|---|
Frequency | 3 | 21 | 0 | 0 | 0 | 1 | 6 | 0 | 0 | 0 | 31 |
Poll 5
Choices | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Total |
---|---|---|---|---|---|---|---|---|---|---|---|
Frequency | 0 | 1 | 1 | 24 | 1 | 4 | 0 | 0 | 0 | 0 | 31 |
Poll 6
Choices | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Total |
---|---|---|---|---|---|---|---|---|---|---|---|
Frequency | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 28 | 31 |
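One way to summarise each of the six poll tables above is the visual inference p-value: the probability that at least `x` of `n` participants pick the data plot when each picks one of `m = 10` plots at random. The sketch below assumes, purely for illustration, that the most frequently chosen plot in each poll is the data plot; the tables do not state which plot that was.

```r
# Frequencies from the six poll tables above (choices 1 to 10)
polls <- list(
  c(0, 2, 0, 19, 3, 4, 0, 1, 4, 1),
  c(0, 0, 0, 0, 0, 2, 0, 0, 0, 21),
  c(0, 3, 21, 0, 0, 1, 0, 4, 0, 1),
  c(3, 21, 0, 0, 0, 1, 6, 0, 0, 0),
  c(0, 1, 1, 24, 1, 4, 0, 0, 0, 0),
  c(0, 0, 1, 0, 0, 1, 0, 0, 1, 28)
)

m <- 10  # plots per lineup

summary_tab <- data.frame(
  poll = seq_along(polls),
  n    = sapply(polls, sum),
  # ASSUMPTION: the modal choice is the data plot
  x    = sapply(polls, max)
)

# P(X >= x) for X ~ B(n, 1/m), computed directly as an upper tail
summary_tab$p_value <- with(
  summary_tab,
  pbinom(x - 1, n, 1 / m, lower.tail = FALSE)
)
print(summary_tab)
```

Using `lower.tail = FALSE` rather than `1 - pbinom(...)` avoids losing precision when the p-value is very small.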
Poll | Choices |
---|---|
1 | 2, 4, 5, 6, 8, 9, 10 |
2 | 6, 10 |
3 | 2, 3, 6, 8, 10 |
4 | 1, 2, 6, 7 |
5 | 2, 3, 4, 5, 6 |
6 | 3, 6, 9, 10 |
The survey may be designed so it:
records the demographics of participants (e.g. gender, age and education),
records the choices of multiple lineups from each participant,
records the reaction time for selecting their choice, and
includes some lineups with an "obvious" data plot.
After data collection, you can then check:
whether the survey was representative of the population, using the demographic information,
whether a participant failed to detect the data plot in the "obvious" lineups, which suggests they may not be answering sincerely or did not understand the instructions, so you may want to remove their data,
whether a participant appears to be selecting too quickly, in which case they may not actually be processing the plots appropriately.
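The post-collection checks above could be sketched in base R roughly as follows. Everything here is hypothetical: the `responses` data, its column names, and the 2-second cut-off `min_secs` are invented for illustration and would need to be replaced by your own survey export and a cut-off justified by your data.

```r
# Hypothetical responses: one row per participant per lineup
responses <- data.frame(
  participant   = rep(c("p1", "p2", "p3"), each = 3),
  lineup        = rep(c("obvious", "a", "b"), times = 3),
  choice        = c(4, 2, 7,  1, 5, 5,  4, 3, 9),
  data_plot     = rep(c(4, 2, 9), times = 3),
  reaction_secs = c(6.1, 9.4, 8.0,  5.5, 7.2, 6.8,  0.6, 0.5, 0.7),
  stringsAsFactors = FALSE
)

min_secs <- 2  # ASSUMED cut-off for "too quick"

# Participants who missed the data plot in the "obvious" lineup
obvious <- responses[responses$lineup == "obvious", ]
failed_obvious <- obvious$participant[obvious$choice != obvious$data_plot]

# Participants whose median reaction time is below the cut-off
median_secs <- tapply(responses$reaction_secs, responses$participant, median)
too_quick <- names(median_secs)[median_secs < min_secs]

# Flag rather than silently drop, so exclusions can be reported
flagged <- union(failed_obvious, too_quick)
screened <- responses[!responses$participant %in% flagged, ]
print(flagged)
print(unique(screened$participant))
```

Keeping `flagged` as a separate object makes it easy to report how many participants were excluded and why.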
Choices | 1 | 2 | Total |
---|---|---|---|
Frequency | 38 | 67 | 105 |
Choices | 1 | 2 | 3 | 4 | Total |
---|---|---|---|---|---|
Frequency | 38 | 67 | 0 | 0 | 105 |
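The two tables above carry the same counts but differ in the number of plots in the lineup, which changes the null probability of picking any one plot. The sketch below assumes, for illustration only, that plot 2 (chosen 67 times) is the data plot, and compares the upper-tail probability when the lineup has 2 plots versus 4.

```r
n <- 105  # total responses
x <- 67   # ASSUMPTION: plot 2 is the data plot

# P(X >= x) when each participant picks one of m plots at random
p_value_m2 <- pbinom(x - 1, n, 1 / 2, lower.tail = FALSE)
p_value_m4 <- pbinom(x - 1, n, 1 / 4, lower.tail = FALSE)

print(c(m2 = p_value_m2, m4 = p_value_m4))
```

Under this assumption the same 67 detections count as stronger evidence in the 4-plot lineup, because a random pick lands on the data plot less often.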
Choices | 1 | 2 | 3 | 4 | Total |
---|---|---|---|---|---|
Experiment 1 | 52 | 53 | 0 | 0 | 105 |
Experiment 2 | 20 | 53 | 20 | 12 | 105 |
Experiment 3 | 4 | 53 | 36 | 12 | 105 |
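In all three experiments 53 of 105 participants chose plot 2. If plot 2 is taken to be the data plot (an assumption; the table does not say), the binomial p-value depends only on that count, so it is identical across experiments even though the remaining choices are spread very differently. The sketch below makes this concrete and adds a chi-squared goodness-of-fit test on the other three plots as one possible, non-authoritative way to see the difference.

```r
experiments <- rbind(
  `Experiment 1` = c(52, 53, 0, 0),
  `Experiment 2` = c(20, 53, 20, 12),
  `Experiment 3` = c(4, 53, 36, 12)
)
data_pos <- 2  # ASSUMPTION: plot 2 is the data plot
m <- ncol(experiments)
n <- rowSums(experiments)
x <- experiments[, data_pos]

# Same count and same total give the same P(X >= x) in every experiment
p_value <- pbinom(x - 1, n, 1 / m, lower.tail = FALSE)

# Are the picks among the remaining plots spread uniformly?
null_uniform_p <- apply(
  experiments[, -data_pos], 1,
  function(counts) chisq.test(counts)$p.value
)

print(cbind(p_value, null_uniform_p))
```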
library(tidyverse)   # tibble(), %>%, ggplot2
library(nullabor)    # lineup(), null_permute()
library(colorspace)  # scale_color_discrete_qualitative()

# Two samples of size 20 whose true means differ by 1
set.seed(1)
x1 <- rnorm(20, 0, 1)
x2 <- rnorm(20, 1, 1)
df3 <- tibble(x = c(x1, x2),
              group = rep(c("A", "B"), times = c(length(x1), length(x2))))

# Lineup of 4 plots with the data plot in position 3;
# the null plots permute the group labels
set.seed(29)
gcheat <- lineup(null_permute("group"), true = df3, n = 4, pos = 3) %>%
  ggplot(aes(group, x, color = group)) +
  ggbeeswarm::geom_quasirandom() +
  # geom_boxplot() +
  facet_wrap(~.sample) +
  theme(axis.text = element_blank(),
        axis.title = element_blank(),
        axis.ticks.length = unit(0, "pt")) +
  scale_color_discrete_qualitative() +
  guides(color = "none")
gcheat

Occasionally, null data show features that are more aligned with data generated from the alternative hypothesis.
For example, the data plot is Plot 3 in this lineup, but Plot 1 shows a bigger difference in the means (and a smaller standard deviation) of the two groups.
There is indeed a significant mean difference between the two groups for the data in Plot 3, but this can be overshadowed by the null data in Plot 1.
This case is extremely rare; in fact, I cheated by generating 100 lineups of the same dimension and taking the most extreme case.
While this case is rare, it can happen, so we need to be careful in generalising results based on a test of a single lineup.
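To get a feel for how rare such a lineup is, the sketch below regenerates the same two samples in base R and repeatedly draws three permuted null data sets (as in a lineup of 4 plots), recording how often at least one null shows a larger absolute mean difference than the actual data. The number of simulated lineups (`n_lineups`) and the use of the absolute mean difference as a stand-in for what a viewer sees are assumptions made for illustration; this does not reproduce the `nullabor` lineup itself.

```r
set.seed(1)
x1 <- rnorm(20, 0, 1)
x2 <- rnorm(20, 1, 1)
x <- c(x1, x2)
group <- rep(c("A", "B"), times = c(length(x1), length(x2)))

# Absolute difference in group means
mean_diff <- function(x, group) {
  abs(mean(x[group == "A"]) - mean(x[group == "B"]))
}
observed <- mean_diff(x, group)

n_lineups <- 1000  # ASSUMED number of simulated lineups
n_nulls <- 3       # a lineup of 4 plots has 3 null plots

# For each simulated lineup: does any null plot beat the data plot?
null_wins <- replicate(n_lineups, {
  null_diffs <- replicate(n_nulls, mean_diff(x, sample(group)))
  any(null_diffs > observed)
})

print(observed)
print(mean(null_wins))  # share of lineups where a null looks more extreme
```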
[Figure: four lineups, labelled Lineup 1 to Lineup 4]
Lineup 1
Choices | 1 | 2 | 3 | 4 | Total |
---|---|---|---|---|---|
Frequency | 15 | 4 | 10 | 1 | 30 |
Lineup 2
Choices | 1 | 2 | 3 | 4 | Total |
---|---|---|---|---|---|
Frequency | 1 | 3 | 15 | 6 | 25 |
Lineup 3
Choices | 1 | 2 | 3 | 4 | Total |
---|---|---|---|---|---|
Frequency | 2 | 4 | 15 | 4 | 25 |
Lineup 4
Choices | 1 | 2 | 3 | 4 | Total |
---|---|---|---|---|---|
Frequency | 4 | 3 | 15 | 3 | 25 |
All together
Choices | 1 | 2 | 3 | 4 | Total |
---|---|---|---|---|---|
Frequency | 22 | 14 | 55 | 14 | 105 |
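The tables above can be turned into per-lineup and pooled p-values. The sketch below assumes the data plot sits at position 3 in every lineup (consistent with `pos = 3` in the earlier code, but not stated for these four lineups) and that choices are independent across participants and lineups, which need not hold if the same people saw several lineups.

```r
lineups <- rbind(
  `Lineup 1` = c(15, 4, 10, 1),
  `Lineup 2` = c(1, 3, 15, 6),
  `Lineup 3` = c(2, 4, 15, 4),
  `Lineup 4` = c(4, 3, 15, 3)
)
data_pos <- 3  # ASSUMPTION: plot 3 is the data plot in every lineup
m <- ncol(lineups)

# Add the pooled row, then count detections of the data plot
counts <- rbind(lineups, `All together` = colSums(lineups))
n <- rowSums(counts)
x <- counts[, data_pos]

# P(X >= x) for X ~ B(n, 1/m)
p_value <- pbinom(x - 1, n, 1 / m, lower.tail = FALSE)
print(data.frame(n = n, x = x, p_value = signif(p_value, 3)))
```

Under these assumptions, a single lineup in which a null plot happens to look extreme can give a very different answer from the pooled result, which is the reason for evaluating more than one lineup.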
library(tidyverse)  # tibble(), %>%, mutate(), n()
library(nullabor)
set.seed(1)
# 10 million observations; group B's true mean is only 0.001 above group A's
sim <- tibble(id = 1:10000000) %>%
  mutate(y = c(rnorm(n()/2), rnorm(n()/2, 0.001)),
         group = rep(c("A", "B"), each = n()/2))
with(sim, mean(y[group=="A"]) - mean(y[group=="B"]))
## [1] -0.001443504
with(sim, t.test(y[group=="A"], y[group=="B"]))
##
##  Welch Two Sample t-test
##
## data:  y[group == "A"] and y[group == "B"]
## t = -2.2819, df = 1e+07, p-value = 0.0225
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.0026833804 -0.0002036271
## sample estimates:
##    mean of x    mean of y
## 0.0001819234 0.0016254271
For computational reasons, only 10,000 data points for each plot are used above.
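The t-test above flags a true mean difference of 0.001 as significant largely because the sample is enormous. The sketch below, in base R, shows how the standard error of the difference shrinks with sample size, which is what lets such a tiny difference become detectable. The sample sizes and the helper `compare_groups()` are chosen here for illustration and are smaller than the 10,000,000 rows simulated above to keep it quick.

```r
# Simulate two groups whose true means differ by `delta` and test them
compare_groups <- function(n_per_group, delta = 0.001) {
  a <- rnorm(n_per_group)
  b <- rnorm(n_per_group, delta)
  data.frame(
    n_per_group = n_per_group,
    diff        = mean(a) - mean(b),
    # Standard error of the difference in means (Welch form)
    std_error   = sqrt(var(a) / n_per_group + var(b) / n_per_group),
    p_value     = t.test(a, b)$p.value
  )
}

set.seed(1)
results <- do.call(rbind, lapply(c(1e3, 1e5, 1e6), compare_groups))
print(results)
```

The standard error falls roughly as $1/\sqrt{n}$, so statistical significance at very large $n$ says little about whether a difference is large enough to matter, or to be visible in a plot.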
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.