Chi-squared Tests

STAT1003 – Statistical Techniques

Dr. Emi Tanaka

Australian National University


Example: blood type distribution

  • The global distribution of blood types (A, B, AB, O) is A: 40%, B: 25%, AB: 10%, O: 25%.
  • A scientist wants to know whether migration, population history, or selective factors have influenced the local blood type distribution.
  • The observed blood type in 200 randomly selected individuals from a local population is as follows: A: 85, B: 40, AB: 15, O: 60.
  • Does the observed distribution differ from the global distribution?
    • \(H_0\): The local population's blood type distribution is the same as the global distribution, vs.
    • \(H_A\): The local population's blood type distribution differs from the global distribution.
  • How should we test these hypotheses?

Chi-squared test for goodness-of-fit

  • Let \(O_i\) be the observed count for category \(i\), for \(i = 1, 2, \ldots, k\), and let \(E_i = n \times p_i\) be the expected count for category \(i\) under \(H_0\), where \(n\) is the total sample size and \(p_i\) is the hypothesised proportion for category \(i\).
  • Note that \(\sum_{i=1}^k O_i = \sum_{i=1}^k E_i = n\), since the \(p_i\) sum to 1.
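As a minimal sketch of the definitions above, the expected counts for the blood type example can be computed in R. The variable names (`p_global`, `observed`, `expected`) are illustrative choices, not part of the original slides.

```r
# Hypothesised proportions under H0 (global distribution from the example)
p_global <- c(A = 0.40, B = 0.25, AB = 0.10, O = 0.25)

# Observed counts O_i in the local sample
observed <- c(A = 85, B = 40, AB = 15, O = 60)

n <- sum(observed)        # total sample size, n = 200
expected <- n * p_global  # expected counts E_i = n * p_i

expected                           # A = 80, B = 50, AB = 20, O = 50
isTRUE(all.equal(sum(expected), n))  # the E_i also sum to n
```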

The chi-squared test statistic is defined as:

\[X^2 = \sum_{i=1}^k \frac{(O_i - E_i)^2}{E_i} = \sum_{i=1}^k \frac{O_i^2}{E_i} - n.\]
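The second equality follows from expanding the square and using \(\sum_{i=1}^k O_i = \sum_{i=1}^k E_i = n\). As a quick numerical check on the blood type example, both forms can be evaluated in R; the variable names are illustrative.

```r
# Observed and expected counts from the blood type example
observed <- c(A = 85, B = 40, AB = 15, O = 60)
expected <- c(A = 80, B = 50, AB = 20, O = 50)
n <- sum(observed)

# Definition: sum of squared differences scaled by the expected counts
X2_definition <- sum((observed - expected)^2 / expected)

# Shortcut form: sum of O_i^2 / E_i, minus n
X2_shortcut <- sum(observed^2 / expected) - n

X2_definition  # 5.5625
X2_shortcut    # 5.5625
```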

  • Recall that if \(Y \sim \text{Poisson}(\lambda)\), then \(E(Y) = \text{Var}(Y) = \lambda\).
  • Treating each \(O_i\) as approximately Poisson with mean \(E_i\), we assume that \(\frac{O_i - E_i}{\sqrt{E_i}} \overset{\text{approx.}}{\sim} N(0, 1)\) for each \(i\), provided \(E_i \geq 5\) and the cases are independent.
  • Thus, \(X^2 \overset{\text{approx.}}{\sim} \chi^2_{k-1}\) under \(H_0\). One degree of freedom is lost because the counts are constrained to sum to \(n\).
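The quality of this approximation can be explored by simulation. The sketch below is not from the original slides: it draws many samples of size 200 under \(H_0\) using the proportions from the blood type example, computes \(X^2\) for each, and compares the results with the \(\chi^2_3\) distribution. The seed and the number of simulations are arbitrary choices.

```r
set.seed(1003)  # arbitrary seed, only for reproducibility

p0 <- c(0.40, 0.25, 0.10, 0.25)  # proportions under H0
n <- 200                         # sample size
expected <- n * p0               # expected counts E_i
n_sim <- 10000                   # number of simulated samples (assumption)

# Each column of `counts` holds the k = 4 category counts of one sample
counts <- rmultinom(n_sim, size = n, prob = p0)

# X^2 for every simulated sample; `expected` is recycled down each column
X2_sim <- colSums((counts - expected)^2 / expected)

# The mean of a chi-squared variable with 3 df is 3
mean(X2_sim)

# Proportion of simulated statistics beyond the 95% quantile of chi^2_3,
# which should be close to 0.05 if the approximation is good
mean(X2_sim > qchisq(0.95, df = 3))
```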

Chi-squared test for goodness-of-fit in R

The p-value is calculated as \(P(\chi^2_{k-1} > X^2)\), where \(X^2\) is the observed value of the test statistic.
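A minimal sketch of this calculation uses `pchisq()` with `lower.tail = FALSE` to obtain the upper-tail probability. The counts are those of the blood type example and the variable names are illustrative.

```r
# Observed and expected counts from the blood type example
observed <- c(A = 85, B = 40, AB = 15, O = 60)
expected <- c(A = 80, B = 50, AB = 20, O = 50)

k <- length(observed)                          # number of categories
X2 <- sum((observed - expected)^2 / expected)  # observed test statistic

# Upper-tail probability P(chi^2_{k-1} > X2)
p_value <- pchisq(X2, df = k - 1, lower.tail = FALSE)
p_value  # roughly 0.135
```

Under these assumptions the p-value is above 0.05, so at the 5% significance level the data would not provide evidence against \(H_0\).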

  • The global distribution of blood types (A, B, AB, O) is
    A: 40%, B: 25%, AB: 10%, O: 25%.
  • The observed blood type in 200 randomly selected individuals from a local population is as follows:
    A: 85, B: 40, AB: 15, O: 60.


Category   A    B    AB   O
\(O_i\)    85   40   15   60
\(E_i\)    80   50   20   50
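One way to run the whole test in R is the built-in `chisq.test()` function, passing the observed counts as `x` and the hypothesised proportions as `p`. The sketch below is an illustration with assumed variable names (`observed`, `p_global`, `fit`), not necessarily the code used in the original slides.

```r
# Observed counts and hypothesised proportions from the example
observed <- c(A = 85, B = 40, AB = 15, O = 60)
p_global <- c(A = 0.40, B = 0.25, AB = 0.10, O = 0.25)

# Goodness-of-fit test of the observed counts against p_global
fit <- chisq.test(x = observed, p = p_global)

fit$expected   # expected counts E_i, matching the table above
fit$statistic  # X-squared = 5.5625
fit$parameter  # degrees of freedom, k - 1 = 3
fit$p.value    # P(chi^2_3 > 5.5625)
```

Printing `fit` itself shows the statistic, degrees of freedom, and p-value in a single summary.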