Chi-squared tests for independence

STAT1003 – Statistical Techniques

Dr. Emi Tanaka

Australian National University

Case study: iPod

  • Researchers recruited 219 participants in a study where they would sell a used iPod that was known to have frozen twice in the past.
  • The participants were incentivized to get as much money as they could for the iPod.
  • The researchers wanted to understand what types of questions would elicit the seller to disclose the freezing issue.
  • Unbeknownst to the sellers, the buyers were collaborating with the researchers to evaluate how different questions influenced the likelihood that a seller would disclose the iPod’s past issues.
  • The scripted buyers asked one of three questions:

    • General: What can you tell me about it?
    • Positive Assumption: It doesn’t have any problems, does it?
    • Negative Assumption: What problems does it have?

                   General   Positive   Negative   Total
Disclose Problem         2         23         36      61
Hide Problem            71         50         37     158
Total                   73         73         73     219
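
As a quick descriptive check before any formal test, the table can be summarized by the share of sellers who disclosed the problem under each question. The sketch below is illustrative only: it uses plain Python, and names such as `observed` and `disclosure_rate` are made up for this example.

```python
# Observed counts from the iPod study:
# rows are seller behavior, columns are the buyer's question.
questions = ["General", "Positive", "Negative"]
observed = {
    "Disclose": [2, 23, 36],
    "Hide": [71, 50, 37],
}

# Column totals: number of sellers who were asked each question.
col_totals = [sum(col) for col in zip(*observed.values())]

# Proportion of sellers who disclosed the problem within each question group.
disclosure_rate = {
    q: observed["Disclose"][j] / col_totals[j] for j, q in enumerate(questions)
}

for q, rate in disclosure_rate.items():
    print(f"{q}: {rate:.1%} disclosed")
```

The disclosure rates differ considerably across the three questions (roughly 3%, 32% and 49%), which motivates a formal test of independence.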

Hypotheses

  • \(H_0\): The buyer’s question and the seller’s behavior are independent.
  • \(H_A\): The buyer’s question and the seller’s behavior are not independent.
  • Recall that events \(A\) and \(B\) are independent if: \(P(A \cap B) = P(A)P(B)\).
  • Alternatively, we can write the hypotheses in terms of the probabilities:
  • \(H_0\):
    • \(P(\text{Disclose} \cap \text{General}) = P(\text{Disclose})P(\text{General})\)
    • \(P(\text{Disclose} \cap \text{Positive}) = P(\text{Disclose})P(\text{Positive})\)
    • \(P(\text{Disclose} \cap \text{Negative}) = P(\text{Disclose})P(\text{Negative})\)
    • \(P(\text{Hide} \cap \text{General}) = P(\text{Hide})P(\text{General})\)
    • \(P(\text{Hide} \cap \text{Positive}) = P(\text{Hide})P(\text{Positive})\)
    • \(P(\text{Hide} \cap \text{Negative}) = P(\text{Hide})P(\text{Negative})\)
  • \(H_A\): At least one of the equalities above does not hold.
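
To see what \(H_0\) asserts for a single cell, one can compare the sample estimate of a joint probability with the product of the estimated marginal probabilities. A minimal sketch in Python for the Disclose and General cell (illustrative only; the variable names are assumptions, not part of the slides):

```python
n = 219                    # total number of sellers
disclose_total = 61        # sellers who disclosed the problem
general_total = 73         # sellers who were asked the General question
disclose_and_general = 2   # sellers in both categories

# Sample estimate of the joint probability P(Disclose and General).
p_joint = disclose_and_general / n

# Product of the estimated marginal probabilities, which is what the
# joint probability would equal if the two variables were independent.
p_product = (disclose_total / n) * (general_total / n)

print(f"joint estimate:   {p_joint:.4f}")
print(f"product estimate: {p_product:.4f}")
```

The two estimates (about 0.009 versus 0.093) are far apart for this cell. A formal test assesses whether gaps like this, taken over all six cells, are larger than chance alone would produce.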

Expected counts under \(H_0\)

  • Under \(H_0\) we have \(E_{ij} = n \times P(\text{row } i) \times P(\text{column } j)\) for \(i = 1, 2\) and \(j = 1, 2, 3\).
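  • We estimate the probabilities by the sample proportions: \(\hat{P}(\text{row } i) = \dfrac{\text{row } i \text{ total}}{n}\) and \(\hat{P}(\text{column } j) = \dfrac{\text{column } j \text{ total}}{n}\), so that \(E_{ij} = \dfrac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{n}\).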
  • E.g. \(E_{11} = E_{12} = E_{13} = 219 \times \dfrac{61}{219} \times \dfrac{73}{219} = 20.33\) and \(E_{21} = E_{22} = E_{23} = 219 \times \dfrac{158}{219} \times \dfrac{73}{219} = 52.67\).
  • We can summarize the observed counts and the expected counts in a table:

                   General      Positive     Negative     Total
Disclose Problem    2 (20.33)   23 (20.33)   36 (20.33)      61
Hide Problem       71 (52.67)   50 (52.67)   37 (52.67)     158
Total              73           73           73             219
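
The expected counts in parentheses can be reproduced from the row and column totals alone. The following is a small sketch in plain Python; the list layout and the names `observed` and `expected` are choices made for this illustration.

```python
observed = [
    [2, 23, 36],   # Disclose Problem
    [71, 50, 37],  # Hide Problem
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# E_ij = n * (row i total / n) * (column j total / n)
#      = (row i total) * (column j total) / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 2) for e in row])
# [20.33, 20.33, 20.33]
# [52.67, 52.67, 52.67]
```

Because every question was asked of the same number of sellers (73), the expected counts are identical within each row.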

P-value for the chi-squared test for independence

  • For independence tests, we calculate the test statistic using \[X^2 = \sum_{i=1}^{R} \sum_{j=1}^{C} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}\] where \(R\) and \(C\) are the number of levels in the row variable and column variable.
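  • Under \(H_0\), \(X^2 \overset{\text{approx.}}{\sim} \chi^2_{(R-1)(C-1)}\) provided that \(E_{ij} \geq 5\) for all \(i\) and \(j\) and the cases are independent of one another.
  • In the iPod study \(R = 2\) and \(C = 3\), so there are \((2-1)(3-1) = 2\) degrees of freedom, and every expected count (20.33 or 52.67) is at least 5.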

\(X^2 = \frac{(2 - 20.33)^2}{20.33} + \frac{(23 - 20.33)^2}{20.33} + \frac{(36 - 20.33)^2}{20.33} + \frac{(71 - 52.67)^2}{52.67} + \frac{(50 - 52.67)^2}{52.67} + \frac{(37 - 52.67)^2}{52.67} = 40.13\)

  • The p-value is \(P(\chi^2_2 > 40.13)\) which is very small (less than 0.001).
  • We reject the null and conclude that the data provide convincing evidence that the question asked did affect a seller’s likelihood to tell the truth about problems with the iPod.
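
The whole calculation can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not a general-purpose routine: the variable names are invented here, and the p-value uses the fact that a chi-squared distribution with 2 degrees of freedom has upper-tail probability \(e^{-x/2}\), a shortcut that does not apply to other degrees of freedom.

```python
import math

observed = [
    [2, 23, 36],   # Disclose Problem
    [71, 50, 37],  # Hide Problem
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected counts under independence: row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# The chi-squared approximation asks for every expected count to be at least 5.
min_expected = min(min(row) for row in expected)

# Test statistic: sum over all cells of (O - E)^2 / E.
x2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (R - 1)(C - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Special case: with 2 degrees of freedom, P(chi-squared > x) = exp(-x / 2).
# This closed form is NOT valid for other degrees of freedom.
if df != 2:
    raise ValueError("closed-form tail used here only holds for df = 2")
p_value = math.exp(-x2 / 2)

print(f"X^2 = {x2:.2f}, df = {df}, smallest E = {min_expected:.2f}")
print(f"p-value = {p_value:.2e}")
```

The printed p-value is on the order of \(10^{-9}\), far below 0.001. For tables with other dimensions, the tail probability would need to come from statistical software rather than this closed form.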

Chi-squared test for independence in R