Central limit theorem

STAT1003 – Statistical Techniques

Dr. Emi Tanaka

Australian National University


Central limit theorem

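If a random variable is the mean of many independent, identically distributed random values with finite variance, then its distribution is approximately normal, regardless of how the individual terms are distributed. The approximation improves as the number of terms grows.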


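Or mathematically, let \(X_1, \ldots, X_n\) be a random sample of \(n\) independent and identically distributed observations from a population with mean \(\mu\) and finite variance \(\sigma^2\). Then, as \(n \to \infty\), the standardised sample mean converges in distribution to a standard normal, so for large \(n\) the sample mean is approximately normal:

\[\frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} N(0, 1), \qquad \text{so} \qquad \bar{X} \overset{\text{approx.}}{\sim} N\!\left(\mu,\; \frac{\sigma^2}{n}\right)\]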

  • Many distributions in nature appear to conform to a normal distribution (if you ignore the fact that some quantities can never be negative).
  • This may be because each observation is the sum of many small, independent random effects.

When can we approximate with a normal distribution?

  • Independence: observations must be independent.
  • Sample size \(n\):
    • If \(X\) is normally distributed, \(\bar{X}\) is exactly normal for all \(n\).
    • If \(X\) is not normally distributed, \(\bar{X}\) is approximately normal for large \(n\). As a rule of thumb:
      • \(n < 30\): the data should be nearly normal with no clear outliers.
      • \(n \ge 30\): the sampling distribution of \(\bar{X}\) is approximately normal unless there are extreme outliers.

Case 1: \(N(\mu, \sigma^2)\)

Case 2: \(U(a, b)\)

Case 3: \(B(m, p)\)

Case 4: \(\text{Poisson}(\lambda)\)
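The four cases above can be checked by simulation. The following is a minimal Python sketch, assuming illustrative parameter values that are not from the slides: \(N(5, 4)\), \(U(0, 1)\), \(B(10, 0.2)\) and \(\text{Poisson}(3)\), each with \(n = 30\). For each case it draws many sample means and compares their average, their variance and the share falling within \(\pm 1.96\) standard errors against \(N(\mu, \sigma^2/n)\). The helper names `sample_poisson`, `sample_binomial` and `check_clt` are hypothetical, written only for this sketch.

```python
import math
import random
import statistics


def sample_poisson(lam: float) -> int:
    """Draw one Poisson(lam) value with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, random.random()
    while prod > threshold:
        k += 1
        prod *= random.random()
    return k


def sample_binomial(m: int, p: float) -> int:
    """Draw one B(m, p) value as the number of successes in m Bernoulli(p) trials."""
    return sum(random.random() < p for _ in range(m))


# Hypothetical illustrative parameters: name -> (sampler, mu, sigma^2)
CASES = {
    "N(5, 4)": (lambda: random.gauss(5, 2), 5.0, 4.0),
    "U(0, 1)": (lambda: random.uniform(0, 1), 0.5, 1 / 12),
    "B(10, 0.2)": (lambda: sample_binomial(10, 0.2), 2.0, 10 * 0.2 * 0.8),
    "Poisson(3)": (lambda: sample_poisson(3.0), 3.0, 3.0),
}


def check_clt(sampler, mu, sigma2, n=30, reps=4000):
    """Simulate `reps` sample means of size n and compare them with N(mu, sigma2 / n)."""
    means = [statistics.fmean(sampler() for _ in range(n)) for _ in range(reps)]
    se = math.sqrt(sigma2 / n)
    # Share of standardised means inside +/-1.96; close to 0.95 if the normal approximation holds
    coverage = sum(abs((m - mu) / se) < 1.96 for m in means) / reps
    return statistics.fmean(means), statistics.variance(means), sigma2 / n, coverage


random.seed(1003)
results = {name: check_clt(*case) for name, case in CASES.items()}
for name, (mean_xbar, var_xbar, var_theory, cov) in results.items():
    print(f"{name:>10}: mean={mean_xbar:.3f}  var={var_xbar:.4f} "
          f"(theory {var_theory:.4f})  within 1.96 SE: {cov:.3f}")
```

Swapping in a more skewed sampler, or lowering `n`, is an easy way to see where the approximation starts to break down.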

Example: Physician Part 1

The time a family physician spends seeing a patient follows a right-skewed distribution with mean 15 minutes and standard deviation 11.6 minutes.

What is the probability that the physician spends less than 12 minutes seeing a patient?

  • Let \(X\) be consultation time.
  • We want \(P(X < 12)\), but the distribution of \(X\) is unknown.
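  • Without knowing the distribution of \(X\), we cannot calculate \(P(X < 12)\). The CLT does not help here because it describes \(\bar{X}\), not a single observation.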

Example: Physician Part 2

The time a family physician spends seeing a patient follows a right-skewed distribution with mean 15 minutes and standard deviation 11.6 minutes.

What is the probability that the doctor spends an average time of less than 12 minutes with her 30 patients?

  • Let \(\bar{X}\) be the mean consultation time.
  • By CLT, \[\bar{X} \overset{\text{approx.}}{\sim} N\!\left(15, \frac{11.6^2}{30}\right)\]
  • We want \(P(\bar{X} < 12)\).
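A minimal Python sketch of this calculation, assuming the standard library's `statistics.NormalDist` (Python 3.8+) as the tool for the normal CDF; the numbers come from the example, and any statistical software would give the same result.

```python
import math
from statistics import NormalDist

mu, sigma, n = 15, 11.6, 30          # minutes; values from the example
se = sigma / math.sqrt(n)            # standard error of the mean
xbar_dist = NormalDist(mu, se)       # CLT approximation for the mean of 30 consultations
p = xbar_dist.cdf(12)                # P(Xbar < 12)
print(f"SE = {se:.3f}, P(Xbar < 12) = {p:.4f}")
```

This gives a standard error of about 2.118 minutes and \(P(\bar{X} < 12) \approx 0.078\).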

Example: Physician Part 3

The time a family physician spends seeing a patient follows a right-skewed distribution with mean 15 minutes and standard deviation 11.6 minutes.

One day, 35 patients have appointments. What is the probability that the doctor's total consultation time exceeds 8 hours (480 minutes), so that she works overtime?

  • Let \(X_i\) be the consultation time for patient \(i\).
  • \(P\!\left(\sum_{i=1}^{35} X_i > 480\right) = P\left(\bar{X} > \frac{480}{35}\right)\)
  • Again by CLT, \[\bar{X} \overset{\text{approx.}}{\sim} N\!\left(15, \frac{11.6^2}{35}\right)\]
  • We want \(P\left(\bar{X} > \frac{480}{35}\right)\).
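The overtime probability is the upper tail \(P\left(\bar{X} > \frac{480}{35}\right)\) from the equivalence with the total time. A minimal Python sketch, again assuming `statistics.NormalDist` from the standard library to evaluate the normal CDF:

```python
import math
from statistics import NormalDist

mu, sigma, n = 15, 11.6, 35
threshold = 480 / n                          # 8 hours = 480 minutes spread over 35 patients
xbar_dist = NormalDist(mu, sigma / math.sqrt(n))
p_overtime = 1 - xbar_dist.cdf(threshold)    # upper tail: P(Xbar > 480/35)
print(f"480/35 = {threshold:.3f} min, P(overtime) = {p_overtime:.4f}")
```

Here \(480/35 \approx 13.71\) minutes lies below the mean of 15, so overtime is more likely than not, with probability roughly 0.744.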

Example: Passengers

The number of passengers passing through a large South East Asian airport is normally distributed with a mean of 110,000 persons per day and a standard deviation of 20,200 persons. If you select a random sample of 16 days:

  1. What is the probability that \(\bar{X}\) is between 102,000 and 104,500 passengers per day?
  2. Between which two values, placed symmetrically around the population mean, will \(\bar{X}\) fall with probability 60%?
  • Since the population is normal, \(\bar{X} \sim N\!\left(110000, \frac{20200^2}{16}\right)\) exactly, with standard error \(20200/\sqrt{16} = 5050\).
  • \(P(102000 < \bar{X} < 104500) = P(\bar{X} < 104500) - P(\bar{X} < 102000)\)
  • \(P(|\bar{X} - 110000| < q) = 0.6\) so \(P\left(|Z| < \frac{q}{20200 / \sqrt{16}}\right) = 0.6 \rightarrow P(Z < \frac{q}{20200 / \sqrt{16}}) = 0.8\).
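Both parts can be evaluated numerically. A minimal Python sketch, assuming `statistics.NormalDist` (Python 3.8+) for the normal CDF and its inverse; `inv_cdf(0.8)` returns the quantile \(z_{0.8}\) used in the last step above.

```python
import math
from statistics import NormalDist

mu, sigma, n = 110_000, 20_200, 16
xbar_dist = NormalDist(mu, sigma / math.sqrt(n))   # exact, since the population is normal; SE = 5050

# Part 1: P(102000 < Xbar < 104500)
p_between = xbar_dist.cdf(104_500) - xbar_dist.cdf(102_000)

# Part 2: symmetric interval with probability 0.6, so q = z_{0.8} * SE
q = NormalDist().inv_cdf(0.8) * xbar_dist.stdev
lower, upper = mu - q, mu + q
print(f"P(102000 < Xbar < 104500) = {p_between:.4f}")
print(f"60% interval: ({lower:.0f}, {upper:.0f})")
```

This gives a probability of about 0.081 for part 1 and an interval of roughly 105,750 to 114,250 passengers per day for part 2.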