
STAT1003 – Statistical Techniques
Dr. Emi Tanaka
Australian National University
This lecture was partially adapted from materials by previous STAT1003 lecturers. Thank you folks!
Is the new program more effective than the standard program in reducing recovery time?





\[Z = \dfrac{(\bar{X}_{1} - \bar{X}_{2}) - (\mu_{1} - \mu_{2})}{\sqrt{\dfrac{\sigma_{1}^2}{n_{1}} + \dfrac{\sigma_{2}^2}{n_{2}}}}\]
\(Z \sim N(0, 1)\) under \(H_0: \mu_{1} = \mu_{2}\).
\[T = \dfrac{(\bar{X}_{1} - \bar{X}_{2}) - (\mu_{1} - \mu_{2})}{\sqrt{\dfrac{s_{1}^2}{n_{1}} + \dfrac{s_{2}^2}{n_{2}}}}\]
\(T \sim t_{\class{highlight mark-yellow}{\min(n_{1}, n_{2}) - 1}}\) under \(H_0: \mu_{1} = \mu_{2}\).
\[T = \dfrac{(\bar{X}_{1} - \bar{X}_{2}) - (\mu_{1} - \mu_{2})}{\sqrt{\class{highlight mark-yellow}{s_p^2}\left(\dfrac{1}{n_{1}} + \dfrac{1}{n_{2}}\right)}}\]
\(T \sim t_{\class{highlight mark-yellow}{n_{1} + n_{2} - 2}}\) under \(H_0: \mu_{1} = \mu_{2}\).
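As a rough illustration of the two T statistics above, the sketch below computes both under \(H_0: \mu_1 = \mu_2\). The summary numbers are hypothetical, and the pooled variance uses the standard definition \(s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\), which is assumed here rather than taken from the slides.

```python
from math import sqrt

def unpooled_t(xbar1, xbar2, s1, s2, n1, n2):
    """T statistic under H0: mu1 = mu2 using separate sample variances."""
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    return (xbar1 - xbar2) / se

def pooled_t(xbar1, xbar2, s1, s2, n1, n2):
    """T statistic under H0: mu1 = mu2 using the pooled variance s_p^2."""
    # Standard pooled variance estimator (assumed definition).
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (xbar1 - xbar2) / se

# Hypothetical summary statistics, for illustration only.
xbar1, xbar2, s1, s2, n1, n2 = 10.0, 8.0, 2.0, 2.0, 8, 8

t_unpooled = unpooled_t(xbar1, xbar2, s1, s2, n1, n2)
t_pooled = pooled_t(xbar1, xbar2, s1, s2, n1, n2)

# Degrees of freedom quoted on the slides for each version.
df_conservative = min(n1, n2) - 1
df_pooled = n1 + n2 - 2

print(t_unpooled, df_conservative)
print(t_pooled, df_pooled)
```

With equal sample sizes and equal sample standard deviations the two statistics coincide; only the degrees of freedom differ.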
\[\bar{X}_{1} - \bar{X}_{2} \pm z_{\alpha / 2} \sqrt{\dfrac{\sigma_{1}^2}{n_{1}} + \dfrac{\sigma_{2}^2}{n_{2}}}\]
\[\bar{X}_{1} - \bar{X}_{2} \pm t_{\text{df}, \alpha / 2} \sqrt{\dfrac{s_{1}^2}{n_{1}} + \dfrac{s_{2}^2}{n_{2}}}\]
where:










\(H_0: \mu_1 = \mu_2\)
\(z^* = \dfrac{\bar{x}_{1} - \bar{x}_{2}}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}\)
where \(Z \sim N(0, 1)\) under \(H_0\).
\(t^* = \dfrac{\bar{x}_{1} - \bar{x}_{2}}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}\)
where

P-values:

Growth rates of 14 unicellular alga Chlamydomonas after 1,000 generations of selection under High and Normal levels of carbon dioxide were examined.
Is there a difference in growth rates between the two carbon dioxide levels?
Assume that:
Note that:
When var.equal = FALSE, the two-sample t-test uses the Welch-Satterthwaite approximation for the degrees of freedom (derivation out of scope for this course).
When var.equal = TRUE, the two-sample t-test assumes that the population variances are equal and uses the pooled variance estimator.
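For reference, the Welch-Satterthwaite degrees of freedom can be sketched as below. The formula is the commonly quoted one and is an addition here, not something stated on the slides; the function name and the inputs are hypothetical.

```python
def welch_df(s1, s2, n1, n2):
    """Welch-Satterthwaite approximate degrees of freedom."""
    v1 = s1**2 / n1  # estimated variance of the first sample mean
    v2 = s2**2 / n2  # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Hypothetical sample standard deviations and sample sizes.
s1, s2, n1, n2 = 2.0, 4.0, 8, 8

df_welch = welch_df(s1, s2, n1, n2)
df_lower = min(n1, n2) - 1   # conservative choice
df_upper = n1 + n2 - 2       # pooled (equal variance) choice

print(round(df_welch, 3))
```

The approximate degrees of freedom are generally not an integer and fall between \(\min(n_1, n_2) - 1\) and \(n_1 + n_2 - 2\).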
\(H_0: \mu_1 = \mu_2\)
Test statistic under \(H_0\)
Variances known
\[Z = \dfrac{\bar{X}_{1} - \bar{X}_{2}}{\sqrt{\frac{\sigma_{1}^2}{n_{1}} + \frac{\sigma_{2}^2}{n_{2}}}} \sim N(0, 1)\]
Variances unknown
\[T = \dfrac{\bar{X}_{1} - \bar{X}_{2}}{\sqrt{\frac{s_{1}^2}{n_{1}} + \frac{s_{2}^2}{n_{2}}}} \sim t_{\text{df}}\]
where
Confidence interval for \(\mu_1 - \mu_2\)
Variances known
\[\bar{X}_{1} - \bar{X}_{2} \pm z_{\alpha / 2} \sqrt{\dfrac{\sigma_{1}^2}{n_{1}} + \dfrac{\sigma_{2}^2}{n_{2}}}\]
Variances unknown
\[\bar{X}_{1} - \bar{X}_{2} \pm t_{\text{df}, \alpha / 2} \sqrt{\dfrac{s_{1}^2}{n_{1}} + \dfrac{s_{2}^2}{n_{2}}}\]
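A minimal sketch of the known-variance interval, using only the Python standard library. All inputs are hypothetical. The unknown-variance interval has the same shape but needs a t quantile, which the standard library does not provide.

```python
from math import sqrt
from statistics import NormalDist

def z_interval_two_means(xbar1, xbar2, sigma1, sigma2, n1, n2, alpha=0.05):
    """(1 - alpha) confidence interval for mu1 - mu2 with known variances."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    diff = xbar1 - xbar2
    return diff - z_crit * se, diff + z_crit * se

# Hypothetical inputs: the standard error works out to exactly 1.
ci_low, ci_high = z_interval_two_means(10.0, 8.0, 2.0, 2.0, 8, 8)
print(round(ci_low, 3), round(ci_high, 3))
```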
P-value:
Variances known
| \(H_A\) | P-value |
|---|---|
| \(H_A: \mu_1 \neq \mu_2\) | \(P(|Z| \geq |z^*|)\) |
| \(H_A: \mu_1 > \mu_2\) | \(P(Z \geq z^*)\) |
| \(H_A: \mu_1 < \mu_2\) | \(P(Z \leq z^*)\) |
Variances unknown
| \(H_A\) | P-value |
|---|---|
| \(H_A: \mu_1 \neq \mu_2\) | \(P(|T| \geq |t^*|)\) |
| \(H_A: \mu_1 > \mu_2\) | \(P(T \geq t^*)\) |
| \(H_A: \mu_1 < \mu_2\) | \(P(T \leq t^*)\) |
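The known-variance table above can be sketched in code as follows, with a hypothetical observed statistic \(z^* = 2\). The unknown-variance case follows the same pattern with the t distribution's CDF in place of the normal one, which would need a library such as SciPy.

```python
from statistics import NormalDist

def z_p_values(z_star):
    """P-values for the three alternatives, given an observed z statistic."""
    std_normal = NormalDist()  # N(0, 1)
    return {
        "two_sided": 2 * (1 - std_normal.cdf(abs(z_star))),  # P(|Z| >= |z*|)
        "greater": 1 - std_normal.cdf(z_star),               # P(Z >= z*)
        "less": std_normal.cdf(z_star),                      # P(Z <= z*)
    }

# Hypothetical observed test statistic.
p_values = z_p_values(2.0)
for alternative, p in p_values.items():
    print(alternative, round(p, 4))
```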
There is a new blood thinner drug that is believed to improve the survival rate of patients who underwent cardiopulmonary resuscitation (CPR) for a heart attack. Does the new drug improve the survival rate?
| Group | Survived | Died | Total |
|---|---|---|---|
| Treatment | 11 | 39 | 50 |
| Control | 14 | 26 | 40 |
| Total | 25 | 65 | 90 |
\[\class{highlight mark-yellow}{\hat{p}_1 - \hat{p}_2 \sim N\left(p_1 - p_2, \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}\right)}\]
\[\class{highlight mark-yellow}{\hat{p}_{1} - \hat{p}_{2} \pm z_{\alpha / 2} \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_{1}} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_{2}}}}\]
| Group | Survived | Died | Total |
|---|---|---|---|
| Treatment | 11 | 39 | 50 |
| Control | 14 | 26 | 40 |
| Total | 25 | 65 | 90 |
A 90% confidence interval for \(p_1 - p_2\) is given by:
\[\frac{11}{50} - \frac{14}{40} \pm z_{0.05} \sqrt{\frac{\frac{11}{50}(1-\frac{11}{50})}{50} + \frac{\frac{14}{40}(1-\frac{14}{40})}{40}} = (-0.2871, 0.0271).\]
where \(z_{0.05} \approx 1.645\).
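The interval above can be checked numerically with a short sketch using the counts from the table. The variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Counts from the table: survivors and group sizes.
x1, n1 = 11, 50  # treatment
x2, n2 = 14, 40  # control

p1_hat = x1 / n1
p2_hat = x2 / n2

# Unpooled standard error, as used in the confidence interval.
se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

alpha = 0.10
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{0.05}, about 1.645

ci_low = (p1_hat - p2_hat) - z_crit * se
ci_high = (p1_hat - p2_hat) + z_crit * se
print(round(ci_low, 4), round(ci_high, 4))
```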
\(H_0: p_1 = p_2\)
\[\class{highlight mark-yellow}{Z = \dfrac{\hat{p}_{1} - \hat{p}_{2}}{\sqrt{\hat{p}_{p}(1-\hat{p}_{p})(\frac{1}{n_{1}} + \frac{1}{n_{2}})}}} \sim N(0, 1) \text{ under } H_0\]
where \(\hat{p}_p = \dfrac{X_1 + X_2}{n_1 + n_2}\) is the pooled sample proportion.
where \(z^* = \dfrac{\frac{x_1}{n_1} - \frac{x_2}{n_2}}{\sqrt{\frac{x_1 + x_2}{n_1 + n_2}\left(1-\frac{x_1 + x_2}{n_1 + n_2}\right)\left(\frac{1}{n_{1}} + \frac{1}{n_{2}}\right)}}\).
\[\begin{align*} z^* &= \dfrac{\frac{11}{50} - \frac{14}{40}}{\sqrt{\frac{11 + 14}{50 + 40}\left(1-\frac{11 + 14}{50 + 40}\right)\left(\frac{1}{50} + \frac{1}{40}\right)}}\\ &= -1.37 \end{align*}\]
P-value \(= P(Z \geq -1.37) \approx 0.9147\).
Since the P-value exceeds \(0.05\), there is insufficient evidence to support the claim that the new drug improves the survival rate.
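The test above can be reproduced with a short sketch. It assumes the one-sided alternative \(H_A: p_1 > p_2\) (treatment better than control) and rounds \(z^*\) to two decimals before computing the P-value, to match the rounded value used above.

```python
from math import sqrt
from statistics import NormalDist

# Counts from the table: survivors and group sizes.
x1, n1 = 11, 50  # treatment
x2, n2 = 14, 40  # control

# Pooled sample proportion, used because p1 = p2 under H0.
p_pooled = (x1 + x2) / (n1 + n2)
se_pooled = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

z_star = (x1 / n1 - x2 / n2) / se_pooled
z_rounded = round(z_star, 2)

# Upper-tail P-value P(Z >= z*), assuming H_A: p1 > p2.
p_value = 1 - NormalDist().cdf(z_rounded)
print(z_rounded, round(p_value, 4))
```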
A quadcopter company is considering a new manufacturer for rotor blades. The new manufacturer would be more expensive, but they claim their higher-quality blades are more reliable, with at least 3% more blades passing inspection than their competitor. Is there evidence to support the claim?
\(H_0: p_1 = p_2\)
\[n_1\hat{p}_1 \geq 10, n_1(1-\hat{p}_1) \geq 10, n_2\hat{p}_2 \geq 10\text{, and }n_2(1-\hat{p}_2) \geq 10\]
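Since \(n\hat{p}\) and \(n(1 - \hat{p})\) are simply the observed numbers of successes and failures, the condition above can be checked directly from counts. The sketch below does this with a hypothetical helper, using the CPR counts from earlier as an example.

```python
def large_sample_ok(x1, n1, x2, n2, threshold=10):
    """Check that each group has at least `threshold` successes and failures."""
    counts = [x1, n1 - x1, x2, n2 - x2]  # n*p_hat and n*(1 - p_hat) per group
    return all(count >= threshold for count in counts)

# CPR example: 11 of 50 survived (treatment), 14 of 40 survived (control).
cpr_ok = large_sample_ok(11, 50, 14, 40)

# Hypothetical counter-example: only 5 successes in the first group.
small_ok = large_sample_ok(5, 50, 14, 40)

print(cpr_ok, small_ok)
```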
Confidence interval for \(p_1 - p_2\)
\[\hat{p}_{1} - \hat{p}_{2} \pm z_{\alpha / 2} \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_{1}} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_{2}}}\]
\(\hat{p}_1 = \dfrac{X_1}{n_1}\), \(\hat{p}_2 = \dfrac{X_2}{n_2}\) and \(\hat{p}_p = \dfrac{X_1 + X_2}{n_1 + n_2}\)
Test statistic under \(H_0: p_1 = p_2\)
\[Z = \dfrac{\hat{p}_{1} - \hat{p}_{2}}{\sqrt{\hat{p}_{p}(1-\hat{p}_{p})(\frac{1}{n_{1}} + \frac{1}{n_{2}})}} \sim N(0, 1)\]
P-value:
| \(H_A\) | P-value |
|---|---|
| \(H_A: p_1 \neq p_2\) | \(P(|Z| \geq |z^*|)\) |
| \(H_A: p_1 > p_2\) | \(P(Z \geq z^*)\) |
| \(H_A: p_1 < p_2\) | \(P(Z \leq z^*)\) |
Are the prices of textbooks on Amazon close to those in the UCLA bookstore?
73 textbooks were identified as required for UCLA courses.
The price of each textbook on Amazon and at the university bookstore was recorded.
| Course | ISBN | UCLA price (USD) | Amazon price (USD) | Difference (USD) |
|---|---|---|---|---|
| C139 | 978-0520224759 | 18.75 | 20.21 | -1.46 |
| 104 | 978-0470419977 | 174.00 | 143.75 | 30.25 |
| C170 | 978-0803272620 | 27.67 | 27.95 | -0.28 |
| M422 | 978-0761918479 | 88.42 | 97.95 | -9.53 |
| 127B | 978-0324828641 | 214.50 | 173.56 | 40.94 |
| 10 | 978-0195181234 | 24.70 | 16.47 | 8.23 |
| M104D | 978-1412969666 | 89.71 | 85.45 | 4.26 |
| M132B | 978-0205753383 | 55.13 | 42.68 | 12.45 |
| M104C | 978-0582276024 | 16.00 | 11.67 | 4.33 |
| 124 | 978-0078136634 | 183.75 | 145.40 | 38.35 |
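The rows above suggest that the difference is computed as the UCLA price minus the Amazon price. A quick sketch to confirm this for the ten textbooks shown (not the full set of 73):

```python
# (UCLA price, Amazon price, listed difference) for the ten rows shown.
rows = [
    (18.75, 20.21, -1.46),
    (174.00, 143.75, 30.25),
    (27.67, 27.95, -0.28),
    (88.42, 97.95, -9.53),
    (214.50, 173.56, 40.94),
    (24.70, 16.47, 8.23),
    (89.71, 85.45, 4.26),
    (55.13, 42.68, 12.45),
    (16.00, 11.67, 4.33),
    (183.75, 145.40, 38.35),
]

# Compare UCLA - Amazon with the listed difference, allowing for rounding.
matches = [abs((ucla - amazon) - diff) < 0.005 for ucla, amazon, diff in rows]
all_match = all(matches)
print(all_match)
```

A positive difference therefore means the book costs more at the UCLA bookstore.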

| Source | n | Mean Price | Standard Deviation |
|---|---|---|---|
| UCLA | 73 | 72.22 | 59.66 |
| Amazon | 73 | 59.46 | 49.00 |
| Difference | 73 | 12.76 | 14.26 |
\(12.76 \pm t_{72, 0.025} \times \dfrac{14.26}{\sqrt{73}}\) \(\approx\) \((9.44, 16.09).\)
\(H_0: \mu_d = 0 \text{ vs. } H_A: \mu_d \neq 0\)
\[\class{highlight mark-yellow}{T = \dfrac{\bar{X}_d}{s_d / \sqrt{n}}} \sim t_{n-1} \text{ under } H_0\]
\(t^* = \dfrac{12.76}{14.26/\sqrt{73}} \approx 7.65\)
P-value \(= 2P(T \geq 7.65) < 0.0001\).
Since the P-value is below \(0.05\), there is evidence that textbook prices on Amazon differ, on average, from those in the UCLA bookstore.
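The test statistic can be reproduced from the summary table with a short sketch. It also illustrates that the mean of the paired differences equals the difference of the two means, whereas the standard deviation of the differences cannot be recovered from the two separate standard deviations.

```python
from math import sqrt

# Summary statistics from the table.
n = 73
mean_ucla, mean_amazon = 72.22, 59.46
mean_diff, sd_diff = 12.76, 14.26

# Mean of the differences equals the difference of the means.
diff_of_means = mean_ucla - mean_amazon

# Paired t statistic under H0: mu_d = 0, with n - 1 degrees of freedom.
se_diff = sd_diff / sqrt(n)
t_star = mean_diff / se_diff
df = n - 1

print(round(diff_of_means, 2), round(t_star, 2), df)
```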
\(H_0: \mu_d = 0\)
Confidence interval for \(\mu_d\)
Variance known
\[\bar{X}_d \pm z_{\alpha / 2} \frac{\sigma_d}{\sqrt{n}}\]
Variance unknown
\[\bar{X}_d \pm t_{n - 1, \alpha / 2} \frac{s_d}{\sqrt{n}}\]
P-value
Variance known
\(Z = \dfrac{\bar{X}_d}{\sigma_d / \sqrt{n}} \sim N(0, 1)\) under \(H_0\).
| \(H_A\) | P-value |
|---|---|
| \(H_A: \mu_d \neq 0\) | \(P(|Z| \geq |z^*|)\) |
| \(H_A: \mu_d > 0\) | \(P(Z \geq z^*)\) |
| \(H_A: \mu_d < 0\) | \(P(Z \leq z^*)\) |
Variance unknown
\(T = \dfrac{\bar{X}_d}{s_d / \sqrt{n}} \sim t_{n - 1}\) under \(H_0\).
| \(H_A\) | P-value |
|---|---|
| \(H_A: \mu_d \neq 0\) | \(P(|T| \geq |t^*|)\) |
| \(H_A: \mu_d > 0\) | \(P(T \geq t^*)\) |
| \(H_A: \mu_d < 0\) | \(P(T \leq t^*)\) |
