Chi-squared tests for independence

STAT1003 – Statistical Techniques

Dr. Emi Tanaka

Australian National University


Case study: iPod

Researchers recruited 219 participants in a study where they would sell a used iPod that was known to have frozen twice in the past.
The participants were incentivized to get as much money as they could for the iPod.
The researchers wanted to understand what types of questions would prompt a seller to disclose the freezing issue.
The participants, who acted as the sellers, did not know that the buyers were collaborating with the researchers to evaluate how different questions influence the likelihood that a seller discloses the iPod’s past issues.

The scripted buyers asked one of three questions:
- General: What can you tell me about it?
- Positive Assumption: It doesn’t have any problems, does it?
- Negative Assumption: What problems does it have?

	General	Positive	Negative	Total
Disclose Problem	2	23	36	61
Hide Problem	71	50	37	158
Total	73	73	73	219
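
As a minimal sketch, the observed counts in the table above could be entered in R as a matrix. The object name `observed` and the dimension labels `Behavior` and `Question` are illustrative choices, not part of the original study.

```r
# Observed counts from the table above: rows are the seller's behavior,
# columns are the buyer's question.
observed <- matrix(
  c( 2, 23, 36,
    71, 50, 37),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Behavior = c("Disclose Problem", "Hide Problem"),
    Question = c("General", "Positive", "Negative")
  )
)

# Append row and column totals to reproduce the margins of the table.
addmargins(observed)

# Proportion of sellers who disclosed the problem, within each question type.
prop.table(observed, margin = 2)["Disclose Problem", ]
```

Looking at the within-question proportions is only a descriptive first step; it does not replace the formal test.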

Hypotheses

\(H_0\): The buyer’s question and the seller’s behavior are independent.
\(H_A\): The buyer’s question and the seller’s behavior are not independent.

Recall that events \(A\) and \(B\) are independent if: \(P(A \cap B) = P(A)P(B)\).
Alternatively, we can write the hypotheses in terms of the probabilities:

\(H_0\):
- \(P(\text{Disclose} \cap \text{General}) = P(\text{Disclose})P(\text{General})\)
- \(P(\text{Disclose} \cap \text{Positive}) = P(\text{Disclose})P(\text{Positive})\)
- \(P(\text{Disclose} \cap \text{Negative}) = P(\text{Disclose})P(\text{Negative})\)
- \(P(\text{Hide} \cap \text{General}) = P(\text{Hide})P(\text{General})\)
- \(P(\text{Hide} \cap \text{Positive}) = P(\text{Hide})P(\text{Positive})\)
- \(P(\text{Hide} \cap \text{Negative}) = P(\text{Hide})P(\text{Negative})\)
\(H_A\): At least one of these equalities does not hold.
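
To make the six equalities in \(H_0\) concrete, the sketch below compares the sample joint proportions with the products of the sample marginal proportions. These are estimates from the iPod data, not the true probabilities, and the object names (`observed`, `joint`, `product`) are illustrative.

```r
# Observed counts: rows are seller behavior, columns are buyer question.
observed <- matrix(
  c(2, 23, 36, 71, 50, 37),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Disclose", "Hide"),
                  c("General", "Positive", "Negative"))
)
n <- sum(observed)

# Sample estimate of each joint probability, e.g. P(Disclose and General).
joint <- observed / n

# Sample estimates of the marginal probabilities.
p_row <- rowSums(observed) / n
p_col <- colSums(observed) / n

# Product of the marginals: what the joint probabilities would be
# if behavior and question were exactly independent.
product <- outer(p_row, p_col)

round(joint, 3)
round(product, 3)
round(joint - product, 3)  # departures from independence in the sample
```

Some difference between the two tables is expected from sampling variation alone; the test quantifies whether the differences are larger than chance would plausibly produce.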

Expected counts under \(H_0\)

Under \(H_0\), the expected count in row \(i\) and column \(j\) is \(E_{ij} = n \times P(\text{row } i) \times P(\text{column } j)\) for \(i = 1, 2\) and \(j = 1, 2, 3\).
We estimate the probabilities by the sample proportions: \(\hat{P}(\text{row } i) = \dfrac{\text{row } i \text{ total}}{n}\) and \(\hat{P}(\text{column } j) = \dfrac{\text{column } j \text{ total}}{n}\).
E.g. \(E_{11} = E_{12} = E_{13} = 219 \times \dfrac{61}{219} \times \dfrac{73}{219} \approx 20.33\) and \(E_{21} = E_{22} = E_{23} = 219 \times \dfrac{158}{219} \times \dfrac{73}{219} \approx 52.67\).
We can summarize the observed counts and the expected counts in a table:

	General	Positive	Negative	Total
Disclose Problem	2 (20.33)	23 (20.33)	36 (20.33)	61
Hide Problem	71 (52.67)	50 (52.67)	37 (52.67)	158
Total	73	73	73	219
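
The expected counts above can be reproduced with a short R sketch. It relies on the fact that \(n \times \frac{\text{row total}}{n} \times \frac{\text{column total}}{n}\) simplifies to (row total × column total) / \(n\); the object names are illustrative.

```r
# Observed counts: rows are seller behavior, columns are buyer question.
observed <- matrix(
  c(2, 23, 36, 71, 50, 37),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Disclose Problem", "Hide Problem"),
                  c("General", "Positive", "Negative"))
)
n <- sum(observed)

# E_ij = (row i total) * (column j total) / n for every cell at once.
expected <- outer(rowSums(observed), colSums(observed)) / n

round(expected, 2)
```

Because all three column totals equal 73, every cell in a given row has the same expected count.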

P-value for the chi-squared test for independence

For independence tests, we calculate the test statistic using \[X^2 = \sum_{i=1}^{R} \sum_{j=1}^{C} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}\] where \(O_{ij}\) and \(E_{ij}\) are the observed and expected counts in row \(i\) and column \(j\), and \(R\) and \(C\) are the numbers of levels in the row variable and column variable.
Under \(H_0\), \(X^2 \overset{\text{approx.}}{\sim} \chi^2_{(R-1)(C-1)}\) provided that \(E_{ij} \geq 5\) for all \(i\) and \(j\) and cases are independent.

\(X^2 = \frac{(2 - 20.33)^2}{20.33} + \frac{(23 - 20.33)^2}{20.33} + \frac{(36 - 20.33)^2}{20.33} + \frac{(71 - 52.67)^2}{52.67} + \frac{(50 - 52.67)^2}{52.67} + \frac{(37 - 52.67)^2}{52.67} \approx 40.13\)

The p-value is \(P(\chi^2_2 > 40.13)\), which is very small (less than 0.001).
We reject the null and conclude that the data provide convincing evidence that the question asked did affect a seller’s likelihood to tell the truth about problems with the iPod.
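
The hand calculation can be checked with the following sketch, which computes the test statistic cell by cell and obtains the upper-tail probability from `pchisq()`. The object names (`x2`, `df`, `p_value`) are illustrative.

```r
# Observed counts: rows are seller behavior, columns are buyer question.
observed <- matrix(
  c(2, 23, 36, 71, 50, 37),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Disclose Problem", "Hide Problem"),
                  c("General", "Positive", "Negative"))
)

# Expected counts under independence.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Condition for the chi-squared approximation: all expected counts >= 5.
all(expected >= 5)

# Test statistic: sum over all cells of (O - E)^2 / E.
x2 <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (R - 1)(C - 1).
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# p-value: upper-tail probability of the chi-squared distribution.
p_value <- pchisq(x2, df = df, lower.tail = FALSE)

x2
df
p_value
```

Using unrounded expected counts avoids the small rounding error that accumulates when 20.33 and 52.67 are used by hand.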

Chi-squared test for independence in R
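
The content of this slide is not included here, so what follows is only a sketch of one common approach: base R's `chisq.test()` applied to the matrix of observed counts from the iPod case study. The object names `observed` and `result` are illustrative.

```r
# Observed counts: rows are seller behavior, columns are buyer question.
observed <- matrix(
  c(2, 23, 36, 71, 50, 37),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Disclose Problem", "Hide Problem"),
                  c("General", "Positive", "Negative"))
)

# Pearson's chi-squared test of independence.
# (The continuity correction only applies to 2 x 2 tables, so it is not
# used for this 2 x 3 table.)
result <- chisq.test(observed)

result            # printed summary: statistic, df and p-value
result$expected   # expected counts under the null hypothesis
result$statistic  # test statistic X^2
result$parameter  # degrees of freedom
result$p.value    # p-value
```

Inspecting `result$expected` is a convenient way to check the condition that all expected counts are at least 5.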