Hypothesis testing for population proportion

STAT1003 – Statistical Techniques

Dr. Emi Tanaka

Australian National University


Example: Is this coin biased towards heads?

Suppose I have a coin that I’m going to flip.
I have some suspicion that the coin has been tweaked to be biased towards heads.
Let \(p\) be the probability of getting a head.
If the coin is biased towards heads, then \(p > 0.5\); if it is fair, then \(p = 0.5\).
So how would we test whether the coin is biased towards heads or not?
We’ll collect some data.

I flipped the coin 10 times and this is the result:

The result is 7 heads and 3 tails, so 70% are heads.
Do you believe the coin is biased towards heads based on this data?

Example: Is this coin biased towards heads?

Suppose now I flip the coin 200 times and this is the outcome:

We observe 140 heads and 60 tails. So again 70% are heads.
Based on this data, do you think the coin is biased towards heads?
If the coin were fair, how many heads would you expect to see?
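A quick calculation answers this, assuming a fair coin (\(p = 0.5\)). The expected count is \(np\); as a rough yardstick for how far the observed count is from it, the binomial standard deviation is \(\sqrt{np(1-p)}\):

```latex
\begin{aligned}
n = 10: &\quad E(X) = 10 \times 0.5 = 5, \quad \mathrm{SD}(X) = \sqrt{10 \times 0.5 \times 0.5} \approx 1.58 \\
n = 200: &\quad E(X) = 200 \times 0.5 = 100, \quad \mathrm{SD}(X) = \sqrt{200 \times 0.5 \times 0.5} \approx 7.07
\end{aligned}
```

Seven heads is only about 1.3 standard deviations above 5, whereas 140 heads is more than 5 standard deviations above 100. That is why the same 70% is far more surprising in the larger experiment.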

A one-sided binomial test

Suppose \(X\) is the number of heads out of \(n\) independent tosses.
Let \(p\) be the probability of getting a head for this coin.

Hypotheses: \(H_0: p = 0.5\) vs. \(H_A: p > 0.5\)
Assumptions: The tosses are independent, and each toss has the same probability \(p\) of landing heads.
Test statistic: \(X \sim B(n, 0.5)\) under \(H_0\).
- Let \(x\) be the observed number of heads.
- A large value of \(x\) suggests that the coin is biased towards heads (\(p > 0.5\)).
P-value: The probability of observing a test statistic as extreme or more extreme than the one observed, assuming \(H_0\) is true. For \(H_A: p > 0.5\), p-value \(= P(X \geq x)\).
Conclusion: Select a significance level \(\alpha\) (e.g. \(\alpha = 0.05\)).
- If p-value < \(\alpha\), the observed data is unlikely to have occurred under \(H_0\).
- If p-value \(\geq \alpha\), the observed data is consistent with \(H_0\).
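The steps above can be sketched in a few lines of Python. This is an illustration only (the slides themselves use R); it relies on nothing but the standard library's `math.comb`, and the helper names `binom_pmf` and `upper_tail_test` are hypothetical.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail_test(x: int, n: int, p0: float = 0.5, alpha: float = 0.05):
    """One-sided binomial test of H0: p = p0 vs HA: p > p0.

    Returns the p-value P(X >= x) under H0 and whether H0 is rejected
    (p-value < alpha).
    """
    p_value = sum(binom_pmf(k, n, p0) for k in range(x, n + 1))
    return p_value, p_value < alpha


p_value, reject = upper_tail_test(7, 10)
print(p_value, reject)  # 0.171875 False
```

The decision rule is returned alongside the p-value only for convenience; reporting the p-value itself is usually more informative than the reject/do-not-reject flag.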

Example: calculating the p-value for \(H_A: p > 0.5\)

\(H_0: p = 0.5\) vs. \(H_A: p > 0.5\)

We observed \(x = 7\) heads out of \(n = 10\) tosses.

The p-value is \(P(X \geq x) = 1 - P(X \leq x - 1) = 1 - P(X \leq 6)\).
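Written out term by term, using \(\binom{10}{k}\) for the number of ways to get \(k\) heads and \(0.5^{10} = 1/1024\) for the probability of each sequence under \(H_0\):

```latex
\begin{aligned}
P(X \geq 7) &= 1 - P(X \leq 6) = \sum_{k=7}^{10} \binom{10}{k} \, 0.5^{10} \\
&= \frac{120 + 45 + 10 + 1}{1024} = \frac{176}{1024} \approx 0.172
\end{aligned}
```

Since \(0.172 \geq 0.05\), the 10-toss data are consistent with \(H_0\) at \(\alpha = 0.05\).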

We observed \(x = 140\) heads out of \(n = 200\) tosses.

The p-value is \(P(X \geq x) = 1 - P(X \leq x - 1) = 1 - P(X \leq 139)\).
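Summing 61 binomial terms by hand is impractical, so a computer does the work here. A minimal sketch in plain Python, assuming only `math.comb` from the standard library (`upper_tail` is a hypothetical helper name). It sums the upper tail directly rather than computing \(1 - P(X \leq 139)\), which avoids losing precision when the answer is tiny.

```python
from math import comb


def upper_tail(x: int, n: int, p0: float = 0.5) -> float:
    """P(X >= x) for X ~ B(n, p0), summed directly over k = x..n.

    Summing the upper tail avoids the rounding loss of 1 - P(X <= x - 1)
    when the p-value is very small.
    """
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(x, n + 1))


p_value = upper_tail(140, 200)
print(f"{p_value:.2e}")
```

The printed value is far below 0.05, so 140 heads in 200 tosses would be very unlikely if the coin were fair.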

Significance level \(\alpha\) and Type I error

The significance level \(\alpha\) is the threshold for rejecting the null hypothesis.
It is the probability of making a Type I error: rejecting the null hypothesis when it is actually true.
Common choices for \(\alpha\) are 0.05, 0.01, or 0.10, but the choice should depend on the context of the problem and the consequences of making a Type I error.
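If the data are generated under \(H_0\), the p-value is roughly uniformly distributed on [0, 1] (exactly uniform for a continuous test statistic). As a result, rejecting \(H_0\) whenever the p-value is below \(\alpha\) gives a Type I error rate of about \(\alpha\), or at most \(\alpha\) for a discrete test statistic such as a binomial count.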
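To see what "roughly" means for a discrete statistic, the sketch below computes, exactly rather than by simulation, how often the upper-tail test rejects \(H_0\) at \(\alpha = 0.05\) when \(H_0\) is true. The function name `type1_error_rate` is hypothetical. For \(n = 10\), only \(x = 9\) and \(x = 10\) give a p-value below 0.05, so the Type I error rate is \(11/1024 \approx 0.011\), well under the nominal 0.05.

```python
from math import comb


def type1_error_rate(n: int, p0: float = 0.5, alpha: float = 0.05) -> float:
    """Exact P(reject H0 | H0 true) for the upper-tail binomial test.

    The test rejects when the p-value P(X >= x) is below alpha, so we add up
    P(X = x) over every outcome x that would lead to rejection.
    """
    pmf = [comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(n + 1)]
    # upper[x] = P(X >= x), built as cumulative sums from the top end.
    upper = [0.0] * (n + 2)
    for k in range(n, -1, -1):
        upper[k] = upper[k + 1] + pmf[k]
    return sum(pmf[x] for x in range(n + 1) if upper[x] < alpha)


for n in (10, 200):
    print(n, type1_error_rate(n))
```

Because the p-value can only take finitely many values, the achieved Type I error rate stays at or below \(\alpha\) but need not equal it.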

A two-sided binomial test

Hypothesis tests can also be two-sided.
Hypotheses: \(H_0: p = 0.5\) vs. \(H_A: p \neq 0.5\)
Assumptions: The tosses are independent, and each toss has the same probability \(p\) of landing heads.
Test statistic: \(X \sim B(n, 0.5)\) under \(H_0\).
- Let \(x\) be the observed number of heads.
- Either high or low values of \(x\) suggest that the coin is biased.

P-value: the probability of observing a test statistic as extreme or more extreme than the one observed, assuming \(H_0\) is true. For a two-sided test, this is \[P(|X - E(X)| \geq |x - E(X)|),\] where \(E(X) = 0.5n\) under \(H_0\).

Conclusion: Select a significance level (false positive rate) \(\alpha\) (e.g. \(\alpha = 0.05\)).
- If p-value < \(\alpha\), the observed data is unlikely to have occurred under \(H_0\).
- If p-value \(\geq \alpha\), the observed data is consistent with \(H_0\).
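A sketch of this two-sided p-value in plain Python, under the same assumptions as before (standard library only; `two_sided_p_value` is a hypothetical name). It adds up \(P(X = k)\) over every outcome \(k\) at least as far from \(E(X) = np_0\) as the observed \(x\).

```python
from math import comb


def two_sided_p_value(x: int, n: int, p0: float = 0.5) -> float:
    """P(|X - n*p0| >= |x - n*p0|) for X ~ B(n, p0)."""
    mean = n * p0
    observed_dist = abs(x - mean)
    # A small tolerance keeps outcomes exactly as far from the mean as x
    # even if n * p0 is not exactly representable in floating point.
    tol = 1e-9
    return sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k)
        for k in range(n + 1)
        if abs(k - mean) >= observed_dist - tol
    )


print(two_sided_p_value(7, 10))  # 0.34375
```

With \(p_0 = 0.5\) and integer \(n\), the mean is exact and the distances are multiples of 0.5, so the tolerance never changes which outcomes are counted; it only matters for other choices of \(p_0\).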

Example: calculating the p-value for \(H_A: p \neq 0.5\)

\(H_0: p = 0.5\) vs. \(H_A: p \neq 0.5\)

We observed \(x = 7\) heads out of \(n = 10\) tosses.

The p-value is \(P(|X - E(X)| \geq |x - E(X)|) = P(|X - 5| \geq 2) = P(X \leq 3) + P(X \geq 7)\).
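The same calculation term by term, using \(0.5^{10} = 1/1024\):

```latex
\begin{aligned}
P(X \leq 3) + P(X \geq 7)
&= \frac{\binom{10}{0} + \binom{10}{1} + \binom{10}{2} + \binom{10}{3}}{1024}
 + \frac{\binom{10}{7} + \binom{10}{8} + \binom{10}{9} + \binom{10}{10}}{1024} \\
&= \frac{1 + 10 + 45 + 120}{1024} + \frac{120 + 45 + 10 + 1}{1024}
 = \frac{352}{1024} = 0.34375
\end{aligned}
```

Because \(B(10, 0.5)\) is symmetric about 5, the two tails are equal (each is \(176/1024\)), so the two-sided p-value is exactly twice the one-sided p-value \(P(X \geq 7)\).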

We observed \(x = 140\) heads out of \(n = 200\) tosses.

The p-value is \(P(|X - 100| \geq 40) = P(X \leq 60) + P(X \geq 140)\).
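The same symmetry argument applies here: under \(H_0\), \(X\) and \(200 - X\) have the same \(B(200, 0.5)\) distribution, so \(P(X \leq 60) = P(200 - X \geq 140) = P(X \geq 140)\). Hence

```latex
P(|X - 100| \geq 40) = P(X \leq 60) + P(X \geq 140) = 2\,P(X \geq 140) = 2\,[1 - P(X \leq 139)]
```

This is twice the one-sided p-value, and it is still far below 0.05.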

Binomial test (two-sided) with R

\(H_0: p = 0.5\) vs. \(H_A: p \neq 0.5\)

Binomial test (one-sided) with R

Note: if performing a one-sided test, the direction should be specified before the experiment is carried out, not chosen after seeing the data.

Upper-tail test: \(H_0: p = 0.5\) vs. \(H_A: p > 0.5\)

Lower-tail test: \(H_0: p = 0.5\) vs. \(H_A: p < 0.5\)
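As a hedged stand-in for the R output, the sketch below shows what an exact binomial test computes for each alternative, in plain Python. The function `binomial_test` is hypothetical. Its `alternative` values borrow the names used by R's `binom.test()` (`"two.sided"`, `"less"`, `"greater"`), but it is not R's implementation. In particular, the two-sided case uses the distance-from-\(np_0\) rule given earlier; to my knowledge, R's `binom.test()` instead sums outcomes that are no more likely than the observed one, which gives the same answer when \(p_0 = 0.5\) but can differ for other \(p_0\).

```python
from math import comb


def binomial_test(x: int, n: int, p0: float = 0.5, alternative: str = "two.sided") -> float:
    """Exact binomial test of H0: p = p0; returns the p-value.

    alternative: "greater" (HA: p > p0), "less" (HA: p < p0) or
    "two.sided" (HA: p != p0).
    """
    pmf = [comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(n + 1)]
    if alternative == "greater":
        return sum(pmf[x:])  # upper tail P(X >= x)
    if alternative == "less":
        return sum(pmf[: x + 1])  # lower tail P(X <= x)
    if alternative == "two.sided":
        mean = n * p0
        # P(|X - n*p0| >= |x - n*p0|), with a small tolerance for rounding.
        return sum(
            prob for k, prob in enumerate(pmf) if abs(k - mean) >= abs(x - mean) - 1e-9
        )
    raise ValueError(f"unknown alternative: {alternative!r}")


for alt in ("greater", "less", "two.sided"):
    print(alt, binomial_test(7, 10, alternative=alt))
```

For 7 heads in 10 tosses this gives 0.171875 (upper tail), 0.9453125 (lower tail) and 0.34375 (two-sided).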

Null hypothesis significance testing: Conclusion

Conclusion: Reject \(H_0\) when the p-value is less than some significance level \(\alpha\).
Usually \(\alpha = 0.05\), but some argue it should be even lower.
There has been so much misuse of p-values in scientific research that the American Statistical Association (ASA) issued a statement on p-values.

The ASA statement on p-values highlights the following six principles:

P-values can indicate how incompatible the data are with a specified statistical model.
P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
Proper inference requires full reporting and transparency.
A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.

Confidence interval for hypothesis testing

In light of misuses of and misconceptions concerning p-values, the statement notes that statisticians often supplement or even replace p-values with other approaches. These include methods “that emphasize estimation over testing such as confidence … intervals”

— ASA Statement on p-values

If \(H_0: p = p_0\) vs. \(H_A: p \neq p_0\) and the \(100(1-\alpha)\)% confidence interval contains \(p_0\), then we fail to reject the null hypothesis at the \(\alpha\) significance level; if the interval does not contain \(p_0\), we reject \(H_0\).

For the 200-toss data (\(x = 140\)), the 95% confidence interval does not contain \(p_0 = 0.5\), so the null hypothesis is rejected at the \(\alpha = 0.05\) significance level.
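A confidence interval for \(p\) can be computed in several ways. The sketch below uses the exact Clopper-Pearson interval, which (to my knowledge) is the interval R's `binom.test()` reports, and finds each endpoint by bisection using only the standard library. The function names are hypothetical. Because this interval is built from two one-sided tests at level \(\alpha/2\), its verdict can occasionally differ from the two-sided p-value in borderline cases when \(p_0 \neq 0.5\).

```python
from math import comb


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def bisect_increasing(f, target: float, iters: int = 100) -> float:
    """Find p in [0, 1] with f(p) = target, assuming f is increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid  # f(mid) too small: the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, conf: float = 0.95):
    """Exact (Clopper-Pearson) confidence interval for a binomial proportion."""
    a = (1 - conf) / 2
    # Lower limit solves P(X >= x | p) = a; this upper-tail probability increases with p.
    lower = 0.0 if x == 0 else bisect_increasing(lambda p: 1 - binom_cdf(x - 1, n, p), a)
    # Upper limit solves P(X <= x | p) = a; P(X <= x | p) decreases with p,
    # so bisect on its negative, which increases with p.
    upper = 1.0 if x == n else bisect_increasing(lambda p: -binom_cdf(x, n, p), -a)
    return lower, upper


print(clopper_pearson(7, 10))
print(clopper_pearson(140, 200))
```

For the 10-toss data the interval contains 0.5; for the 200-toss data its lower limit lies above 0.5, matching the conclusion above.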

Summary

Hypothesis testing is a systematic way to evaluate evidence against a null hypothesis. It involves:

Hypotheses: Formulating null \(H_0\) and alternative \(H_A\) hypotheses.
Assumptions: Specifying assumptions about the data and the test statistic.
Test statistic: Choosing an appropriate test statistic based on the hypotheses and assumptions.
P-value: The probability of observing a test statistic as extreme or more extreme than the one observed, assuming the null hypothesis is true.
Conclusion: Making a decision to reject (p-value \(< \alpha\)) or fail to reject (p-value \(\geq \alpha\)) the null hypothesis based on the p-value and a predetermined significance level \(\alpha\).

Binomial test \(H_0: p = p_0\)

Test statistic: \(X \sim B(n, p_0)\) under \(H_0\).
P-value:
- \(H_A: p > p_0\) (upper-tail test) \(\rightarrow\) p-value = \(P(X \geq x)\)
- \(H_A: p < p_0\) (lower-tail test) \(\rightarrow\) p-value = \(P(X \leq x)\)
- \(H_A: p \neq p_0\) (two-tail test) \(\rightarrow\) p-value = \(P(|X - np_0| \geq |x - np_0|)\)