Comparing two population proportions

STAT1003 – Statistical Techniques

Dr. Emi Tanaka

Australian National University


Example: CPR study

There is a new blood thinner drug that is believed to improve the survival rate of patients who underwent cardiopulmonary resuscitation (CPR) for a heart attack. Does the new drug improve the survival rate?

An experiment was conducted where 90 patients who underwent CPR for a heart attack and were subsequently admitted to a hospital were randomly divided into:
- the treatment group where they received the blood thinner or
- the control group where they did not receive the blood thinner.
The outcome variable of interest was whether the patients survived for at least 24 hours.

Group	Survived	Died	Total
Treatment	11	39	50
Control	14	26	40
Total	25	65	90

Let \(p_1\) and \(p_2\) be the population proportions of patients who survived in the treatment and control groups, respectively.
We are interested in \(p_1 - p_2\).

Sampling distribution of the difference in sample proportions

Let \(X_{1}\) be the number of patients who survived out of \(n_1\) patients in the treatment group.
Let \(X_{2}\) be the number of patients who survived out of \(n_2\) patients in the control group.
The sample proportions are \(\hat{p}_1 = X_1/n_1\) and \(\hat{p}_2 = X_2/n_2\).
An estimator for \(p_1 - p_2\) is \(\hat{p}_1 - \hat{p}_2\).
Provided that:
- the samples are independent within and between the two groups, and
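- the success-failure condition for the central limit theorem (CLT) is satisfied:
  \(n_1\hat{p}_1 \geq 10\), \(n_1(1-\hat{p}_1) \geq 10\), \(n_2\hat{p}_2 \geq 10\), and \(n_2(1-\hat{p}_2) \geq 10\),

then \(\hat{p}_1 - \hat{p}_2\) is approximately normally distributed, where the second argument of \(N(\cdot, \cdot)\) is the standard error (standard deviation) of \(\hat{p}_1 - \hat{p}_2\):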

\[\class{highlight mark-yellow}{\hat{p}_1 - \hat{p}_2 \sim N\left(p_1 - p_2, \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}\right)}\]
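
The mean and standard error above follow from a short calculation, sketched here under the added assumption (implied by the independence condition, but not stated on the slide) that \(X_1 \sim \text{Binomial}(n_1, p_1)\) and \(X_2 \sim \text{Binomial}(n_2, p_2)\):

\[\begin{align*} E(\hat{p}_1 - \hat{p}_2) &= E(\hat{p}_1) - E(\hat{p}_2) = p_1 - p_2,\\ \text{Var}(\hat{p}_1 - \hat{p}_2) &= \text{Var}(\hat{p}_1) + \text{Var}(\hat{p}_2) = \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}. \end{align*}\]

The variances add, rather than subtract, because the two groups are independent. Taking the square root gives the standard error.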

\(100(1-\alpha)\%\) confidence interval for \(p_1 - p_2\)

\[\class{highlight mark-yellow}{\hat{p}_{1} - \hat{p}_{2} \pm z_{\alpha / 2} \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_{1}} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_{2}}}}\]

\(\hat{p}_1 = 11/50 = 0.22\) and \(\hat{p}_2 = 14/40 = 0.35\)
We have \(n_1\hat{p}_1 = 11\), \(n_1(1-\hat{p}_1) = 39\), \(n_2\hat{p}_2 = 14\), and \(n_2(1-\hat{p}_2) = 26\), so the success-failure condition is satisfied.
This is a randomized experiment, so we can assume samples are independent within and between the two groups.
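
The success-failure check can be sketched in a few lines of Python. The function name `success_failure_ok` and its `threshold` parameter are illustrative choices, not part of the course material. Note that \(n\hat{p}\) is simply the observed number of successes and \(n(1-\hat{p})\) the observed number of failures.

```python
def success_failure_ok(successes, n, threshold=10):
    """Return True if both the success count and the failure count
    are at least `threshold` (hypothetical helper for illustration)."""
    failures = n - successes
    return successes >= threshold and failures >= threshold


# CPR study: 11 of 50 survived (treatment), 14 of 40 survived (control).
treatment_ok = success_failure_ok(11, 50)
control_ok = success_failure_ok(14, 40)
print(treatment_ok, control_ok)
```

Both groups need to pass the check before the normal approximation is used.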

A 90% confidence interval for \(p_1 - p_2\) is given by:

\[\frac{11}{50} - \frac{14}{40} \pm z_{0.05} \sqrt{\frac{\frac{11}{50}(1-\frac{11}{50})}{50} + \frac{\frac{14}{40}(1-\frac{14}{40})}{40}} = (-0.2871, 0.0271).\]

where \(z_{0.05} \approx 1.645\).
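
As a rough check of the arithmetic, the interval can be reproduced with the Python standard library. This is a minimal sketch: the function name `two_prop_ci` and its arguments are assumptions made for illustration, and the critical value is taken from `statistics.NormalDist` rather than a table.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_ci(x1, n1, x2, n2, conf_level=0.90):
    """Normal-approximation confidence interval for p1 - p2
    (hypothetical helper; uses the unpooled standard error)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    alpha = 1 - conf_level
    # Upper alpha/2 quantile of the standard normal, e.g. about 1.645 for 90%.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z * se, diff + z * se


# CPR study: 11/50 survived in treatment, 14/40 survived in control.
lower, upper = two_prop_ci(11, 50, 14, 40, conf_level=0.90)
print(f"({lower:.4f}, {upper:.4f})")
```

The interval contains 0, so at the 90% confidence level the data are compatible with no difference in survival proportions.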

Hypothesis testing for comparing two population proportions

\(H_0: p_1 = p_2\)

\[\class{highlight mark-yellow}{Z = \dfrac{\hat{p}_{1} - \hat{p}_{2}}{\sqrt{\hat{p}_{p}(1-\hat{p}_{p})(\frac{1}{n_{1}} + \frac{1}{n_{2}})}}} \sim N(0, 1) \text{ under } H_0\]

where \(\hat{p}_p = \dfrac{X_1 + X_2}{n_1 + n_2}\) is the pooled sample proportion.

\(H_A: p_1 < p_2\) \(\rightarrow\) P-value \(= P(Z \leq z^*)\)
\(H_A: p_1 > p_2\) \(\rightarrow\) P-value \(= P(Z \geq z^*)\)
\(H_A: p_1 \neq p_2\) \(\rightarrow\) P-value \(= 2P(Z \geq |z^*|)\)

where \(z^* = \dfrac{\frac{x_1}{n_1} - \frac{x_2}{n_2}}{\sqrt{\frac{x_1 + x_2}{n_1 + n_2}\left(1-\frac{x_1 + x_2}{n_1 + n_2}\right)\left(\frac{1}{n_{1}} + \frac{1}{n_{2}}\right)}}\).

\[\begin{align*} z^* &= \dfrac{\frac{11}{50} - \frac{14}{40}}{\sqrt{\frac{11 + 14}{50 + 40}\left(1-\frac{11 + 14}{50 + 40}\right)\left(\frac{1}{50} + \frac{1}{40}\right)}}\\ &= -1.37 \end{align*}\]

The claim is that the drug improves the survival rate, so \(H_A: p_1 > p_2\) and the P-value \(= P(Z \geq -1.37) \approx 0.9147\).

Since P-value \(> 0.05\), there is no evidence to support the claim that the new drug improves the survival rate.
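
The pooled test for the CPR study can be sketched as follows, assuming the one-sided alternative \(H_A: p_1 > p_2\). The function name `pooled_z` is an illustrative choice. Because the code keeps \(z^*\) unrounded, the P-value may differ from the slide's value in the later decimal places.

```python
from math import sqrt
from statistics import NormalDist


def pooled_z(x1, n1, x2, n2):
    """Test statistic for H0: p1 = p2 using the pooled sample proportion
    (hypothetical helper for illustration)."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se


z_cpr = pooled_z(11, 50, 14, 40)
# Upper-tail P-value for H_A: p1 > p2.
p_cpr = 1 - NormalDist().cdf(z_cpr)
print(f"z* = {z_cpr:.2f}, P-value = {p_cpr:.3f}")
```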

Example: Quadcopter rotor blade manufacturer

A quadcopter company is considering a new manufacturer for rotor blades. The new manufacturer would be more expensive, but they claim their higher-quality blades are more reliable, with at least 3% more blades passing inspection than their competitor. Is there evidence to support the claim?

The quality control engineer examines 1000 blades from each company.
She finds 958 blades pass inspection from the prospective supplier, and
899 blades pass inspection from the current supplier.
Let \(p_1\) and \(p_2\) be the population proportions of blades that pass inspection for the prospective and current suppliers, respectively.

\(H_0: p_1 = p_2 + 0.03\) vs. \(H_A: p_1 > p_2 + 0.03\)
First, check the conditions:
- the sample is not necessarily random, so we must assume the blades are independent within and between the two groups, and
- the success-failure condition is satisfied for each company.
\(H_0\) is not \(p_1 = p_2\), so we cannot use the pooled sample proportion.
\(z^* = \dfrac{(0.958 - 0.899) - \class{highlight mark-green}{0.03}}{\sqrt{\class{highlight mark-green}{\frac{0.958(1 - 0.958)}{1000} + \frac{0.899(1 - 0.899)}{1000}}}} \approx 2.53\)
P-value \(= P(Z \geq 2.53) \approx 0.0057 < 0.05 \rightarrow\) there is evidence that the pass rate of the prospective supplier exceeds that of the current supplier by more than 3 percentage points.
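
When the null difference is not zero, the same calculation uses the unpooled standard error. The sketch below assumes a helper named `unpooled_z` with a `null_diff` argument. Both names are hypothetical and chosen only for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def unpooled_z(x1, n1, x2, n2, null_diff=0.0):
    """Test statistic for H0: p1 - p2 = null_diff using the unpooled
    standard error (hypothetical helper for illustration)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    return ((p1_hat - p2_hat) - null_diff) / se


# 958 of 1000 blades pass (prospective), 899 of 1000 pass (current).
z_blades = unpooled_z(958, 1000, 899, 1000, null_diff=0.03)
# Upper-tail P-value for H_A: p1 > p2 + 0.03.
p_blades = 1 - NormalDist().cdf(z_blades)
print(f"z* = {z_blades:.2f}, P-value = {p_blades:.4f}")
```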

Summary

\(H_0: p_1 = p_2\)

Observe \(n_1\) samples with \(X_1\) successes from population 1 with probability of success \(p_1\).
Observe \(n_2\) samples with \(X_2\) successes from population 2 with probability of success \(p_2\).
Samples from populations 1 and 2 should be independent.
Success-failure condition should be satisfied:

\[n_1\hat{p}_1 \geq 10, n_1(1-\hat{p}_1) \geq 10, n_2\hat{p}_2 \geq 10\text{, and }n_2(1-\hat{p}_2) \geq 10\]

Confidence interval for \(p_1 - p_2\)

\[\hat{p}_{1} - \hat{p}_{2} \pm z_{\alpha / 2} \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_{1}} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_{2}}}\]

\(\hat{p}_1 = \dfrac{X_1}{n_1}\), \(\hat{p}_2 = \dfrac{X_2}{n_2}\) and \(\hat{p}_p = \dfrac{X_1 + X_2}{n_1 + n_2}\)

Test statistic under \(H_0: p_1 = p_2\)

\[Z = \dfrac{\hat{p}_{1} - \hat{p}_{2}}{\sqrt{\hat{p}_{p}(1-\hat{p}_{p})(\frac{1}{n_{1}} + \frac{1}{n_{2}})}} \sim N(0, 1)\]

P-value:

\(H_A\)	P-value
\(H_A: p_1 \neq p_2\)	\(P(|Z| \geq |z^*|)\)
\(H_A: p_1 > p_2\)	\(P(Z \geq z^*)\)
\(H_A: p_1 < p_2\)	\(P(Z \leq z^*)\)
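
The P-value table can be expressed as a small Python function. This is a sketch under the assumption that the test statistic is standard normal under \(H_0\). The function name `p_value` and the labels `"less"`, `"greater"` and `"two-sided"` are illustrative choices, not notation from the course.

```python
from statistics import NormalDist


def p_value(z_star, alternative):
    """P-value for an observed z* under a standard normal null distribution
    (hypothetical helper; `alternative` labels are an assumption)."""
    std_normal = NormalDist()
    if alternative == "less":        # H_A: p1 < p2
        return std_normal.cdf(z_star)
    if alternative == "greater":     # H_A: p1 > p2
        return 1 - std_normal.cdf(z_star)
    if alternative == "two-sided":   # H_A: p1 != p2
        return 2 * (1 - std_normal.cdf(abs(z_star)))
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")


# CPR study with the rounded statistic z* = -1.37.
for alt in ("less", "greater", "two-sided"):
    print(alt, round(p_value(-1.37, alt), 4))
```

The two one-sided P-values always sum to 1, which is a quick sanity check when choosing the direction of the alternative.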