Distribution of sample proportion

STAT1003 – Statistical Techniques

Dr. Emi Tanaka

Australian National University


Sampling distribution of the sample proportion

  • If \(X \sim B(n, p)\), then \(X\) is the number of successes in \(n\) independent trials, each with success probability \(p\).
  • We estimate \(p\) using the sample proportion \[\hat{p} = \frac{X}{n}.\]
  • \(\hat{p}\) is a random variable: its value changes from sample to sample.

The true (population) approval rate \(p\) of the prime minister is unknown. We could take a poll and use the sample approval rate \(\hat{p}\) as an estimate.

The true (population) proportion of people who prefer Coke to Pepsi is unknown. We could randomly select a sample of people and calculate the sample proportion \(\hat{p}\).
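
To see why \(\hat{p}\) is treated as a random variable, the following is a minimal simulation sketch. The population proportion `true_p` and the poll size `n` are hypothetical values chosen only for illustration; in practice \(p\) is unknown.

```python
import random


def simulate_p_hat(n, p, rng):
    """Draw one random sample of size n and return the sample proportion X / n."""
    successes = sum(1 for _ in range(n) if rng.random() < p)
    return successes / n


rng = random.Random(1003)  # fixed seed so the run is reproducible
true_p = 0.4  # hypothetical population proportion (an assumption)
n = 200       # hypothetical poll size (an assumption)

# Five independent polls from the same population.
estimates = [simulate_p_hat(n, true_p, rng) for _ in range(5)]
print(estimates)
```

Each poll gives its own estimate, and the estimates scatter around `true_p`.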

CLT for the sample proportion

\[ X = X_1 + \cdots + X_n, \quad X_i = \begin{cases} 1 & \text{success} \\ 0 & \text{failure} \end{cases} \]

\[ \hat{p} = \frac{X_1 + \cdots + X_n}{n} \]

  • \(\hat{p}\) is the sample mean of Bernoulli variables.
  • The CLT therefore applies to \(\hat{p}\) when the trials are independent and \(n\) is large.
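
As a worked step, the mean and variance of \(\hat{p}\) can be derived directly. This assumes the \(X_i\) are independent with \(P(X_i = 1) = p\), so that \(E(X_i) = p\) and \(\mathrm{Var}(X_i) = p(1-p)\):

\[ E(\hat{p}) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{np}{n} = p, \qquad \mathrm{Var}(\hat{p}) = \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{Var}(X_i) = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n}. \]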

For \(X \sim B(n, p)\),

\(X \overset{\text{approx.}}{\sim} N\left(np, np(1-p)\right)\qquad\) and \(\qquad\hat{p} \overset{\text{approx.}}{\sim} N\!\left(p, \dfrac{p(1-p)}{n}\right)\)

provided the sample size \(n\) is sufficiently large.

Success-failure condition: \(np \ge 10\) and \(n(1-p) \ge 10\).
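
The condition is straightforward to check numerically. Below is a small sketch; the function name `success_failure_ok` and the example pairs of \(n\) and \(p\) are illustrative choices, not part of the course material.

```python
def success_failure_ok(n, p, threshold=10):
    """Return True when both np and n(1 - p) reach the threshold."""
    expected_successes = n * p
    expected_failures = n * (1 - p)
    return expected_successes >= threshold and expected_failures >= threshold


# Hypothetical (n, p) pairs to illustrate the rule of thumb.
for n, p in [(400, 0.15), (20, 0.1), (50, 0.9)]:
    print(n, p, success_failure_ok(n, p))
```

The condition fails when either the expected number of successes or the expected number of failures is small, which is when the binomial distribution is too skewed for the normal approximation to work well.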

Example: Smoker

  • Approximately 15% of the US population smokes cigarettes.
  • A local government believed their community had a lower smoker rate and commissioned a survey of 400 randomly selected individuals.
  • The survey found that only 42 of the 400 participants smoke cigarettes.
  • If the true proportion of smokers in the community was really 15%, what is the probability of observing 42 or fewer smokers in a sample of 400 people?
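  • Assume the true proportion of smokers is \(p = 0.15\).
  • The sample size is \(n = 400\).
  • We want to calculate \(P(\hat{p} \le 0.105)\), where \(\frac{42}{400} = 0.105\) is the observed sample proportion.
  • Since \(np = 60 \ge 10\) and \(n(1-p) = 340 \ge 10\), \(\hat{p} \overset{\text{approx.}}{\sim} N\!\left(0.15, \dfrac{0.15 \times 0.85}{400}\right)\).
  • The standard error is \(\sqrt{0.15 \times 0.85 / 400} \approx 0.0179\), so \(P(\hat{p} \le 0.105) \approx P(Z \le -2.52) \approx 0.006\).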
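
The calculation for this example can be sketched in Python using only the standard library. The exact binomial probability \(P(X \le 42)\) is included as a cross-check; that comparison is an addition for illustration and is not part of the original slides.

```python
import math
from statistics import NormalDist

p = 0.15             # assumed true proportion of smokers
n = 400              # sample size
p_hat_obs = 42 / n   # observed sample proportion, 0.105

# Normal approximation: p_hat ~ N(p, p(1 - p) / n).
se = math.sqrt(p * (1 - p) / n)
z = (p_hat_obs - p) / se
prob_normal = NormalDist(mu=p, sigma=se).cdf(p_hat_obs)

# Exact binomial probability P(X <= 42), for comparison.
prob_exact = sum(
    math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(43)
)

print(f"standard error = {se:.4f}")
print(f"z-score        = {z:.2f}")
print(f"normal approx  = {prob_normal:.4f}")
print(f"exact binomial = {prob_exact:.4f}")
```

The normal approximation gives a probability of roughly 0.006, so observing 42 or fewer smokers would be unusual if the community rate were really 15%.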

Example: Women on Corporate Boards

Over the past few years there has been increased monitoring of the representation of women on corporate boards. Suppose that the true percentage of women on ASX 200 boards is now 24.6% and that a random sample of 220 board members is chosen.

  1. What is the probability that in the sample less than 24% of board members will be women?
  2. If a sample of 100 is taken instead, how does your answer to the previous question change?
  • The sample proportion \(\hat{p} \overset{\text{approx.}}{\sim} N\!\left(0.246, \dfrac{0.246 \times 0.754}{220}\right)\).
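  • The standard error is \(\sqrt{0.246 \times 0.754 / 220} \approx 0.029\), so \(P(\hat{p} < 0.24) \approx P(Z < -0.21) \approx 0.42\).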
  • If \(n = 100\), then \(\hat{p} \overset{\text{approx.}}{\sim} N\!\left(0.246, \dfrac{0.246 \times 0.754}{100}\right)\). The standard error grows to about 0.043, so \(P(\hat{p} < 0.24) \approx P(Z < -0.14) \approx 0.44\) is larger (closer to 0.5).
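
The two sample sizes can be compared numerically. This is a minimal sketch using the normal approximation; the helper name `prob_p_hat_below` is hypothetical.

```python
import math
from statistics import NormalDist


def prob_p_hat_below(cutoff, p, n):
    """Approximate P(p_hat < cutoff) using p_hat ~ N(p, p(1 - p) / n)."""
    se = math.sqrt(p * (1 - p) / n)
    return NormalDist(mu=p, sigma=se).cdf(cutoff)


p = 0.246  # assumed true proportion of women on ASX 200 boards
prob_220 = prob_p_hat_below(0.24, p, 220)
prob_100 = prob_p_hat_below(0.24, p, 100)

print(f"n = 220: {prob_220:.3f}")
print(f"n = 100: {prob_100:.3f}")
```

Under these assumptions the probability is near 0.42 for \(n = 220\) and near 0.44 for \(n = 100\). The smaller sample has a larger standard error, so more of the distribution of \(\hat{p}\) falls below 0.24.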