Interval estimators

STAT1003 – Statistical Techniques

Dr. Emi Tanaka

Australian National University

Interval estimators

A point estimate will almost never equal the true parameter value exactly.
Instead of giving a single value, we can give a range of plausible values for the parameter.
This range is called an interval estimator.
We will focus on a specific type of interval estimator called a confidence interval.

Confidence intervals for a population mean

A 95% confidence interval for \(\mu\) when \(\sigma\) is known

Use the sample mean \(\bar{X}\) as the point estimator for \(\mu\).
By the CLT, \(\bar{X} \overset{\text{approx}}{\sim} N\!\left(\mu, \dfrac{\sigma^2}{n}\right)\) for large \(n\).

For a standard normal variable, \(P(-1.96 < Z < 1.96) \approx 0.95\).

\[\begin{align*} 0.95 &\approx P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right)\\ &= P\left(-1.96\frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < 1.96\frac{\sigma}{\sqrt{n}}\right) \\ &= P\left(\bar{X}-1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X}+1.96\frac{\sigma}{\sqrt{n}}\right) \end{align*}\]

Therefore, a 95% confidence interval for \(\mu\) is \[\left(\bar{X}-1.96\frac{\sigma}{\sqrt{n}},\;\bar{X}+1.96\frac{\sigma}{\sqrt{n}}\right).\]

A \(100(1 - \alpha)\%\) confidence interval for \(\mu\) when \(\sigma\) is known

\[\left(\bar{X}-z^*_{\alpha/2}\frac{\sigma}{\sqrt{n}},\;\bar{X}+z^*_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)\]

where \(z^*_{\alpha/2}\) is the critical value such that \[P(Z < z^*_{\alpha/2}) = 1 - \alpha/2\] for \(Z \sim N(0,1)\).
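The formula above can be sketched in Python using only the standard library. The function name `z_interval` and the input values (\(\bar{x} = 50\), \(\sigma = 10\), \(n = 100\)) are hypothetical, chosen purely for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf=0.95):
    """100*conf% confidence interval for mu when sigma is known."""
    alpha = 1 - conf
    # Critical value z*_{alpha/2}: P(Z < z*) = 1 - alpha/2 for Z ~ N(0, 1)
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_star * sigma / sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical inputs, not taken from the slides
lower, upper = z_interval(xbar=50, sigma=10, n=100, conf=0.95)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")  # prints: 95% CI: (48.04, 51.96)
```

With `conf=0.95` the critical value is about 1.96, matching the 95% interval derived earlier.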

Interpretation of confidence intervals

Suppose we repeat the experiment many times and construct a \(100(1 - \alpha)\%\) confidence interval from each sample. We expect that approximately \(100(1 - \alpha)\%\) of those intervals will contain the true parameter value.

The parameter \(\mu\) is fixed (in the frequentist paradigm).
The interval is random.
Once computed from data, a confidence interval is not a probability statement about the parameter lying in that particular interval.
It is a statement about the long-run performance of the method used to construct the interval.
The interval either contains the parameter or it does not.

Simulating confidence intervals when \(\sigma\) is known
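A minimal sketch of such a simulation in Python, assuming a normal population with hypothetical values \(\mu = 10\), \(\sigma = 2\) and \(n = 25\) (none of which come from the slides). It draws many samples, builds a 95% interval from each, and records how often the interval contains \(\mu\).

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage_known_sigma(mu, sigma, n, conf=0.95, reps=10_000, seed=1):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z_star * sigma / sqrt(n)  # identical for every sample: sigma is known
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        if xbar - margin < mu < xbar + margin:
            hits += 1
    return hits / reps

# Hypothetical population values, chosen for illustration only
coverage = coverage_known_sigma(mu=10, sigma=2, n=25)
print(f"Proportion of intervals containing mu: {coverage:.3f}")
```

The printed proportion should be close to 0.95, which is the long-run interpretation of the confidence level.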

Visualising confidence intervals when \(\sigma\) is known

Factors affecting the width of a confidence interval

Sample size: Larger \(n\) → narrower interval.
Confidence level: Higher confidence → wider interval.
Population variability: More variability → wider interval.

\[\left(\bar{X}-z^*_{\alpha/2}\frac{\sigma}{\sqrt{n}},\;\bar{X}+z^*_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)\]
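The width of this interval is \(2 z^*_{\alpha/2}\,\sigma/\sqrt{n}\), so each factor can be checked numerically. The sketch below uses a hypothetical helper `ci_width` and made-up values for \(\sigma\) and \(n\).

```python
from math import sqrt
from statistics import NormalDist

def ci_width(sigma, n, conf):
    """Width of the known-sigma interval: 2 * z*_{alpha/2} * sigma / sqrt(n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return 2 * z_star * sigma / sqrt(n)

# Hypothetical baseline, then one factor changed at a time
base = ci_width(sigma=10, n=100, conf=0.95)
larger_n = ci_width(sigma=10, n=400, conf=0.95)     # 4 times the sample size
higher_conf = ci_width(sigma=10, n=100, conf=0.99)  # higher confidence level
more_spread = ci_width(sigma=20, n=100, conf=0.95)  # twice the variability

print(f"baseline width:      {base:.2f}")
print(f"n = 400:             {larger_n:.2f}")
print(f"99% confidence:      {higher_conf:.2f}")
print(f"sigma doubled:       {more_spread:.2f}")
```

Because the width scales with \(1/\sqrt{n}\), quadrupling the sample size only halves the width.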

A \(100(1 - \alpha)\%\) confidence interval for \(\mu\) when \(\sigma\) is unknown

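The population standard deviation \(\sigma\) is usually unknown.
When \(\sigma\) is estimated with the sample standard deviation \(s\) and the data come from a normal population (or, approximately, when \(n\) is large):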

\[ \frac{\bar{X}-\mu}{s/\sqrt{n}} \sim t_{n-1} \]

\[ \left( \bar{X}-t^*_{n-1,\alpha/2}\frac{s}{\sqrt{n}},\; \bar{X}+t^*_{n-1,\alpha/2}\frac{s}{\sqrt{n}} \right) \]

where \(t^*_{n-1,\alpha/2}\) is the critical value such that \[P(T < t^*_{n-1,\alpha/2}) = 1 - \alpha/2\] for \(T \sim t_{n-1}\).

Simulating confidence intervals when \(\sigma\) is unknown

Does it really make a difference if you use the t-distribution instead of the normal distribution when \(\sigma\) is unknown? Let’s find out by simulating confidence intervals using both methods.
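One way to sketch this comparison in Python is shown below. It assumes a standard normal population and a deliberately small sample size \(n = 5\), both hypothetical choices. The t critical value \(t^*_{4,\,0.025} \approx 2.776\) is hard-coded from a t-table, because the Python standard library has no t quantile function.

```python
import random
from math import sqrt
from statistics import NormalDist

N = 5            # small sample size (assumption), where the difference is largest
T_STAR = 2.776   # t*_{4, 0.025}, hard-coded from a t-table (assumption)
Z_STAR = NormalDist().inv_cdf(0.975)  # z*_{0.025}, about 1.96

def coverage(crit, mu=0.0, sigma=1.0, n=N, reps=20_000, seed=2024):
    """Fraction of intervals xbar +/- crit * s / sqrt(n) that contain mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        # Sample standard deviation with the n - 1 divisor
        s = sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        margin = crit * s / sqrt(n)
        if xbar - margin < mu < xbar + margin:
            hits += 1
    return hits / reps

cover_z = coverage(Z_STAR)  # normal critical value with estimated sigma
cover_t = coverage(T_STAR)  # t critical value with estimated sigma
print(f"z-based coverage: {cover_z:.3f}")
print(f"t-based coverage: {cover_t:.3f}")
```

In this setup the z-based intervals should cover \(\mu\) noticeably less often than 95% of the time, while the t-based intervals should sit close to 95%. The gap shrinks as \(n\) grows.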

Example: electricity usage

A random sample of 30 households was selected as part of a study on electricity usage, and the number of kilowatt-hours (kWh) used in the March quarter of 2006 was recorded for each household in the sample. The average usage was found to be 375 kWh and the sample standard deviation was 91.5 kWh. Find a 99% confidence interval for the mean usage in the March quarter of 2006.

First, write what you know: \(n = 30\), \(\bar{x} = 375\), \(s = 91.5\), confidence level = 99% \(\Rightarrow \alpha = 0.01\).
Check conditions: sampling distribution of \(\bar{X}\) is approximately normal since \(n\) is large enough.
Since \(\sigma\) is unknown, the critical value is given by \(t^*_{29, 0.005}\) and the standard error is \(s/\sqrt{n}\).
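Carrying the calculation through, and taking \(t^*_{29,\,0.005} \approx 2.756\) from a t-table (a value not stated in the steps above), the interval works out to approximately

\[375 \pm 2.756 \times \frac{91.5}{\sqrt{30}} \approx 375 \pm 2.756 \times 16.71 \approx 375 \pm 46.0,\]

that is, roughly \((329, 421)\) kWh.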

Confidence intervals for a population proportion

Confidence interval for a population proportion

Recall from the CLT that if \(X \sim B(n, p)\), then \[\hat{p} = X/n \overset{\text{approx.}}{\sim} N\left(p, \frac{p(1-p)}{n}\right)\] when \(n\) is large enough.
Check the success-failure condition: \(np \geq 10\) and \(n(1-p) \geq 10\). In practice \(p\) is unknown, so the condition is checked with \(\hat{p}\) in its place.

A \(100(1 - \alpha)\%\) confidence interval for \(p\) is given by

\[\left(\hat{p} - z^*_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\;\hat{p} + z^*_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)\]

Example: vaccine

A random sample of 100 preschool children in Bruce revealed that only 62 had been vaccinated. Provide an approximate 90% confidence interval for the proportion vaccinated in that suburb.

We have \(n = 100\), \(\hat{p} = 0.62\), confidence level = 90% \(\Rightarrow \alpha = 0.1\).
Check conditions: \(n\hat{p} = 62 \geq 10\) and \(n(1-\hat{p}) = 38 \geq 10\).
The critical value is \(z^*_{0.05}\) and the standard error is \(\sqrt{\hat{p}(1-\hat{p})/n}\).
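The remaining arithmetic can be sketched in Python with the standard library. The variable names are illustrative; the numbers are those of the example.

```python
from math import sqrt
from statistics import NormalDist

n, successes, conf = 100, 62, 0.90
p_hat = successes / n                              # point estimate, 0.62
z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z*_{0.05}, about 1.645
se = sqrt(p_hat * (1 - p_hat) / n)                 # standard error of p_hat

lower, upper = p_hat - z_star * se, p_hat + z_star * se
print(f"90% CI for p: ({lower:.3f}, {upper:.3f})")  # prints: 90% CI for p: (0.540, 0.700)
```

So an approximate 90% confidence interval for the proportion vaccinated is about 54% to 70%.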

Summary

A \(100(1 - \alpha)\%\) confidence interval for a parameter (here the mean \(\mu\) or the proportion \(p\)) is of the form:

\[\text{Point Estimate} \pm \text{Critical Value} \times \text{Standard Error of Point Estimate}.\]

When \(\sigma\) is known: \(\bar{X} \pm z^*_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\)
- where \(z^*_{\alpha/2}\) is the critical value such that \(P(Z < z^*_{\alpha/2}) = 1 - \alpha/2\) for \(Z \sim N(0,1)\).
When \(\sigma\) is unknown: \(\bar{X} \pm t^*_{n-1,\alpha/2}\dfrac{s}{\sqrt{n}}\)
- where \(t^*_{n-1,\alpha/2}\) is the critical value such that \(P(T < t^*_{n-1,\alpha/2}) = 1 - \alpha/2\) for \(T \sim t_{n-1}\).
For a population proportion: \(\hat{p} \pm z^*_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\)