These slides are best viewed in Chrome or Firefox and occasionally need to be refreshed if elements did not load properly. See here for the PDF.
Press the right arrow to progress to the next slide!
Presenter: Emi Tanaka
Department of Econometrics and Business Statistics,
Monash University, Melbourne, Australia
emi.tanaka@monash.edu
27 Mar 2021 @ Fukuoka R
But before that...
Flip a coin 10 times.
Result:
7 heads and 3 tails.
Is the coin biased towards heads?
Flip a coin 100 times.
Result:
70 heads and 30 tails; again, 70% were heads.
Is the coin biased towards heads?
Hypotheses | H0: p = 0.5 vs. H1: p > 0.5
Assumptions | Each toss is independent with equal chance of getting a head.
Test statistic | X ∼ B(n, p) under H0.
P-value (or critical value or confidence interval) | P(X ≥ x), where x is the observed test statistic.
Conclusion | Reject the null hypothesis when the p-value is less than some significance level α. Usually α = 0.05.
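A quick sketch (my addition, not from the slides) of the corresponding one-sided binomial p-values in R:

# P(X >= 7) when X ~ B(10, 0.5): about 0.17, so no real evidence of bias
pbinom(6, size = 10, prob = 0.5, lower.tail = FALSE)

# P(X >= 70) when X ~ B(100, 0.5): on the order of 4e-05, strong evidence of bias
pbinom(69, size = 100, prob = 0.5, lower.tail = FALSE)

# binom.test(7, 10, p = 0.5, alternative = "greater") gives the same p-value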
library(ggplot2)
ggplot(trees, aes(log(Girth), log(Volume))) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
Consider the model:
log(Volume_i) = β0 + β1 log(Girth_i) + e_i, where e_i ∼ NID(0, σ²).
Note: NID means normal, independent and identically distributed.
library(magrittr) # for `%>%`
library(broom)    # for `augment`, `tidy`, `glance`
library(dplyr)    # for `mutate` (used below)
fit <- lm(log(Volume) ~ log(Girth), data = trees)
res <- augment(fit)
tidy(fit)
## # A tibble: 2 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)    -2.35    0.231      -10.2 4.18e-11
## 2 log(Girth)      2.20    0.0898      24.5 6.36e-21
res
## # A tibble: 31 x 7
##    `log(Volume)` `log(Girth)` .fitted .std.resid   .hat .sigma  .cooksd
##            <dbl>        <dbl>   <dbl>      <dbl>  <dbl>  <dbl>    <dbl>
##  1          2.33         2.12    2.30      0.281 0.151   0.117 0.00703
##  2          2.33         2.15    2.38     -0.452 0.133   0.117 0.0156
##  3          2.32         2.17    2.43     -1.01  0.122   0.115 0.0705
##  4          2.80         2.35    2.82     -0.200 0.0582  0.117 0.00124
##  5          2.93         2.37    2.86      0.650 0.0536  0.116 0.0120
##  6          2.98         2.38    2.88      0.884 0.0516  0.115 0.0213
##  7          2.75         2.40    2.92     -1.56  0.0478  0.112 0.0609
##  8          2.90         2.40    2.92     -0.183 0.0478  0.117 0.000842
##  9          3.12         2.41    2.94      1.57  0.0461  0.112 0.0594
## 10          2.99         2.42    2.96      0.259 0.0445  0.117 0.00156
## # … with 21 more rows
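A small sanity check (an added sketch, not in the original slides): the .fitted column above is just the linear predictor built from the two estimates reported by tidy(fit).

# model coefficients: intercept ≈ -2.35, slope ≈ 2.20, as shown by tidy(fit)
coef(fit)

# fitted value for the first tree, computed by hand;
# should be about 2.30, matching res$.fitted[1]
coef(fit)[1] + coef(fit)[2] * log(trees$Girth[1])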
If e_i ∼ NID(0, σ²) holds, then (theoretically) there should be no pattern in the residual plot and the points should lie on a straight line in the QQ-plot.
ggplot(res, aes(`log(Girth)`, .std.resid)) +
  geom_point() +
  geom_hline(yintercept = 0) +
  ggtitle("Residual plot")
ggplot(res, aes(sample = .std.resid)) +
  geom_qq() +
  geom_qq_line() +
  ggtitle("QQ plot")
So is there any pattern in this residual plot?
And is this line straight in the QQ-plot?
Surprisingly, even statisticians are a bit casual when assessing these plots.
So how should we assess them?
Hypotheses | H0: p = 0.5 vs. H1: p > 0.5
Assumptions | Each toss is independent with equal chance of getting a head.
Test statistic | X ∼ B(n, p) under H0.
P-value (or critical value or confidence interval) | P(X ≥ x), where x is the observed test statistic.
Conclusion | Reject the null hypothesis when the p-value is less than some significance level α. Usually α = 0.05.
Hypotheses | H0: e_i ∼ NID(0, σ²) vs. H1: not H0
Assumptions | Assume the method of moments estimate for σ² is good enough.
Test statistic | The residual plot and QQ-plot. But what is their distribution??
P-value | ???
Conclusion | ???
Under H0, we have e_i ∼ NID(0, σ²).
The estimate of σ, denoted σ̂, is given below:
sigma_hat <- glance(fit)$sigma
sigma_hat
## [1] 0.1149578
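For reference (a sketch I'm adding, assuming glance() returns the usual residual standard error): the same value can be computed by hand as sqrt(RSS / residual df).

# residual standard error computed by hand;
# should print 0.1149578, agreeing with glance(fit)$sigma above
sqrt(sum(residuals(fit)^2) / df.residual(fit))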
null_generator <- function() {
  rnorm(n = nrow(trees), mean = 0, sd = sigma_hat)
}
set.seed(2021) # for reproducibility
sample1 <- null_generator()
sample1
##  [1] -0.014077733  0.063509210  0.040079987  0.041342539  0.103238294 -0.221014401  0.030089561  0.105251514  0.001583192
## [10]  0.198872794 -0.124407911 -0.031363388  0.020921794  0.173418675  0.184446385 -0.211692022  0.186612202  0.015104195
## [19]  0.170266610  0.173967771 -0.108341224 -0.021345944 -0.126582888  0.138882296 -0.186799392  0.012114064 -0.167314594
## [28] -0.040696923 -0.010771552  0.126530466 -0.225757053
sample2 <- null_generator()
sample2
##  [1] -0.166452530  0.117192993 -0.163403008 -0.069495694 -0.182032707 -0.147827979 -0.167227403 -0.010009506  0.058023401
## [10]  0.013379792  0.202350333 -0.039673836  0.243710596 -0.003951961 -0.091064303  0.169622012 -0.083408475  0.035910414
## [19]  0.079546685 -0.057512339 -0.259329822  0.005028408 -0.042398523 -0.110385074  0.011928749  0.049120227 -0.019598186
## [28] -0.178085791 -0.173080487  0.001844330 -0.021309077
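To scale this up (a hedged sketch, not from the slides): the same generator can produce a whole batch of null residual sets in one call, which is what a lineup of plots will need.

# e.g. 19 null samples, one per column; together with the real residuals
# this would give a lineup of 20 plots
null_samples <- replicate(19, null_generator())
dim(null_samples)  # 31 rows (trees) x 19 null sets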
gres <- ggplot(res, aes(`log(Girth)`, .std.resid)) +
  geom_point() +
  geom_hline(yintercept = 0) +
  theme_void()
gres
Can you tell the difference from the null plots?
# `%+%` swaps new data into the existing ggplot object
gres %+% mutate(res, .std.resid = sample1)
gres %+% mutate(res, .std.resid = sample2)
gres <- ggplot(res, aes(sample = .std.resid)) +
  geom_qq() +
  geom_qq_line() +
  coord_equal() +
  theme_void()
gres
Can you tell the difference from the null plots?
gres %+% mutate(res, .std.resid = sample1)
gres %+% mutate(res, .std.resid = sample2)
Visual Inference
Data plots tend to be over-interpreted
Reading data plots requires calibration
That is why we need null plots.
Buja et al. (2008) “Statistical Inference for Exploratory Data Analysis and Model Diagnostics.” Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences 367 (1906): 4361–83.
ggplot(Orange) +
  aes(age, circumference, color = Tree) +
  geom_point(size = 3) +
  geom_smooth(method = "lm")
The fitted model:
fit <- lm(circumference ~ Tree + age:Tree, data = Orange)
nullabor
library(nullabor)
method <- null_lm(circumference ~ Tree + age:Tree, method = "pboot")
fit <- lm(circumference ~ Tree + age:Tree, data = Orange)
fit_df <- Orange %>% mutate(.resid = residuals(fit))
set.seed(2021)
line_df <- lineup(method, true = fit_df)
## decrypt("bhMq KJPJ 62 sSQ6P6S2 uD")
ggplot(line_df, aes(age, .resid)) +
  geom_point() +
  geom_hline(yintercept = 0) +
  facet_wrap(~.sample) +
  labs(x = "", y = "") +
  theme(axis.text = element_blank(),
        panel.border = element_rect(fill = "transparent"),
        axis.line = element_blank(),
        axis.ticks.length = unit(0, "mm"),
        strip.background = element_rect(fill = "red"),
        strip.text = element_text(color = "white", face = "bold", size = 18))
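lineup() also prints an encrypted message (the decrypt(...) line shown above). Running that printed call with nullabor's decrypt() reveals which panel of the lineup holds the real residuals; the idea is to do this only after you have committed to a pick.

# run only after choosing a panel: reveals the position of the data plot
decrypt("bhMq KJPJ 62 sSQ6P6S2 uD")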
🧑🏻⚖️🧑🏼⚖️🧑🏽⚖️🧑🏾⚖️🧑🏿⚖️👨🏻⚖️👨🏼⚖️👨🏽⚖️👨🏾⚖️👨🏿⚖️
Combining the evaluations of many observers
Let X be the number of observers out of K independent observers picking the data plot from the lineup.
Assume that X ∼ B(K, 1/m) under H0.
The p-value of a lineup of size m evaluated by K observers is given as P(X ≥ x), where x is the observed number of observers who picked the data plot.
Unlike conventional hypothesis testing, the visual inference p-value depends on the number of observers, K, and the lineup size, m.
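A worked sketch (the numbers here are hypothetical; only the formula comes from the slide): with a lineup of size m = 20, K = 10 independent observers, and x = 3 of them picking the data plot, the visual p-value is

# P(X >= 3) with X ~ B(10, 1/20): roughly 0.0115, so reject H0 at alpha = 0.05
m <- 20; K <- 10; x <- 3
pbinom(x - 1, size = K, prob = 1/m, lower.tail = FALSE)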
Majumder, M., Heike Hofmann, and Dianne Cook. 2013. “Validation of Visual Statistical Inference, Applied to Linear Models.” Journal of the American Statistical Association 108 (503): 942–56.
These slides were made using the xaringan R package and can be found at
Emi Tanaka
Department of Econometrics and Business Statistics,
Monash University, Melbourne, Australia
emi.tanaka@monash.edu