Interval estimators

  • A point estimate will almost never equal the true parameter value exactly.
  • Instead of giving a single value, we can give a range of plausible values for the parameter.
  • This range is called an interval estimator.
  • We will focus on a specific type of interval estimator called a confidence interval.

Confidence intervals for a population mean

A 95% confidence interval for \(\mu\) when \(\sigma\) is known

  • Use the sample mean \(\bar{X}\) as the point estimator for \(\mu\).
  • By the CLT, \(\bar{X} \overset{\text{approx}}{\sim} N\!\left(\mu, \dfrac{\sigma^2}{n}\right)\) for large \(n\).
  • For a standard normal variable, \(P(-1.96 < Z < 1.96) \approx 0.95\).
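The value 1.96 can be checked numerically. Here is a minimal sketch in Python, using the standard-library `statistics.NormalDist` (Python is our choice; the slides do not name a language). The printed probability should be very close to 0.95.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# P(-1.96 < Z < 1.96) = Phi(1.96) - Phi(-1.96)
prob = Z.cdf(1.96) - Z.cdf(-1.96)
print(f"P(-1.96 < Z < 1.96) = {prob:.4f}")
```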



\[\begin{align*} 0.95 &\approx P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right)\\ &= P\left(-1.96\frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < 1.96\frac{\sigma}{\sqrt{n}}\right) \\ &= P\left(\bar{X}-1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X}+1.96\frac{\sigma}{\sqrt{n}}\right) \end{align*}\]

Therefore, an approximate 95% confidence interval for \(\mu\) is \[\left(\bar{X}-1.96\frac{\sigma}{\sqrt{n}},\;\bar{X}+1.96\frac{\sigma}{\sqrt{n}}\right).\]

A \(100(1 - \alpha)\%\) confidence interval for \(\mu\) when \(\sigma\) is known


\[\left(\bar{X}-z^*_{\alpha/2}\frac{\sigma}{\sqrt{n}},\;\bar{X}+z^*_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)\]

where \(z^*_{\alpha/2}\) is the critical value such that \[P(Z < z^*_{\alpha/2}) = 1 - \alpha/2\] for \(Z \sim N(0,1)\).
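The critical value and the interval can be computed directly. A minimal Python sketch follows; the helper names `z_critical` and `z_interval` are our own, hypothetical names, and `NormalDist().inv_cdf` from the standard library plays the role of the normal quantile function.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(alpha: float) -> float:
    """Return z*_{alpha/2}, the value with P(Z < z*) = 1 - alpha/2."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def z_interval(xbar: float, sigma: float, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """100(1 - alpha)% confidence interval for mu when sigma is known."""
    margin = z_critical(alpha) * sigma / sqrt(n)
    return (xbar - margin, xbar + margin)


for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: z* = {z_critical(alpha):.3f}")
```

For example, with hypothetical values \(\bar{x} = 50\), \(\sigma = 10\) and \(n = 25\), `z_interval(50, 10, 25)` gives roughly (46.08, 53.92).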

Interpretation of confidence intervals

Suppose we repeat the experiment many times and construct a \(100(1 - \alpha)\%\) confidence interval from each sample. We expect that approximately \(100(1 - \alpha)\%\) of those intervals will contain the true parameter value.

  • The parameter \(\mu\) is fixed (in the frequentist paradigm).
  • The interval is random.
  • A computed confidence interval is not a probability statement about the parameter lying inside that interval.
  • It is a statement about the long-run performance of the method used to construct the interval.
  • Once computed, a particular interval either contains the parameter or it does not.

Simulating confidence intervals when \(\sigma\) is known
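A minimal Python simulation sketch of the repeated-sampling interpretation: draw many samples from a normal population, build a 95% interval from each, and count how often the intervals contain \(\mu\). The population parameters, sample size and number of repetitions below are hypothetical choices, not values from the slides. The printed proportion should be close to 0.95.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical simulation settings (not taken from the slides)
mu, sigma, n = 10.0, 2.0, 25
alpha, reps = 0.05, 10_000

z_star = NormalDist().inv_cdf(1 - alpha / 2)
margin = z_star * sigma / sqrt(n)  # same width for every sample because sigma is known

covered = 0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = fmean(sample)
    if xbar - margin < mu < xbar + margin:
        covered += 1

coverage = covered / reps
print(f"Proportion of intervals containing mu: {coverage:.3f}")
```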

Visualising confidence intervals when \(\sigma\) is known

Factors affecting the width of a confidence interval

  • Sample size: Larger \(n\) → narrower interval.
  • Confidence level: Higher confidence → wider interval.
  • Population variability: More variability → wider interval.

\[\left(\bar{X}-z^*_{\alpha/2}\frac{\sigma}{\sqrt{n}},\;\bar{X}+z^*_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)\]
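The interval's width is \(2 z^*_{\alpha/2}\sigma/\sqrt{n}\), so each of the three factors can be varied one at a time. A minimal Python sketch with hypothetical baseline values (\(\sigma = 10\), \(n = 25\), 95% confidence); `ci_width` is our own illustrative helper. Note that quadrupling \(n\) halves the width.

```python
from math import sqrt
from statistics import NormalDist


def ci_width(sigma: float, n: int, conf: float) -> float:
    """Width 2 * z*_{alpha/2} * sigma / sqrt(n) of a z-interval at confidence level conf."""
    alpha = 1 - conf
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * z_star * sigma / sqrt(n)


print(f"baseline:       {ci_width(10, 25, 0.95):.2f}")
print(f"n = 100:        {ci_width(10, 100, 0.95):.2f}")  # larger n, narrower
print(f"99% confidence: {ci_width(10, 25, 0.99):.2f}")   # higher confidence, wider
print(f"sigma = 20:     {ci_width(20, 25, 0.95):.2f}")   # more variability, wider
```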

A \(100(1 - \alpha)\%\) confidence interval for \(\mu\) when \(\sigma\) is unknown

  • The population standard deviation \(\sigma\) is usually unknown.
  • When \(\sigma\) is estimated by the sample standard deviation \(S\) (exactly when the population is normal; approximately for large \(n\) otherwise):

\[ \frac{\bar{X}-\mu}{S/\sqrt{n}} \sim t_{n-1} \]

\[ \left( \bar{X}-t^*_{n-1,\alpha/2}\frac{S}{\sqrt{n}},\; \bar{X}+t^*_{n-1,\alpha/2}\frac{S}{\sqrt{n}} \right) \]

where \(t^*_{n-1,\alpha/2}\) is the critical value such that \[P(T < t^*_{n-1,\alpha/2}) = 1 - \alpha/2\] for \(T \sim t_{n-1}\).
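Python's standard library has no \(t\) distribution; in practice a statistics package would be used (for example SciPy's `scipy.stats.t.ppf`). As a self-contained sketch only, the critical value can be found by integrating the \(t\) density numerically (Simpson's rule) and inverting the CDF by bisection. The helpers `t_pdf`, `t_cdf` and `t_critical` are hypothetical names, and the search bound of 1000 is an assumption. The printed values shrink towards \(z^*_{0.025} \approx 1.960\) as the degrees of freedom grow.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2)
    return exp(log_c) / sqrt(df * pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T < x): Simpson's rule on [0, |x|] plus symmetry of the t density about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 < T < |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(alpha: float, df: int) -> float:
    """t*_{df, alpha/2}: the value with P(T < t*) = 1 - alpha/2, found by bisection."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 1000.0  # assumes the critical value lies below 1000
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for df in (4, 29, 100):
    print(f"df = {df:3d}: t* = {t_critical(0.05, df):.3f}")
```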

Simulating confidence intervals when \(\sigma\) is unknown

Does it really make a difference if you use the t-distribution instead of the normal distribution when \(\sigma\) is unknown? Let’s find out by simulating confidence intervals using both methods.
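A minimal Python sketch of that comparison, under hypothetical settings: samples of size \(n = 5\) from a normal population, with each interval built from the sample standard deviation and either \(z^*_{0.025}\) or \(t^*_{4,0.025}\) (the latter taken from \(t\) tables as 2.776). With such a small \(n\), the \(z\)-based intervals should cover \(\mu\) noticeably less often than 95%, while the \(t\)-based ones should be close to 95%; the gap shrinks as \(n\) grows.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev

random.seed(2)  # fixed seed so the run is reproducible

# Hypothetical settings: a small sample from a normal population
mu, sigma, n = 10.0, 2.0, 5
reps = 10_000

z_star = NormalDist().inv_cdf(0.975)  # z*_{0.025}, about 1.960
t_star = 2.776  # t*_{4, 0.025} from t tables (df = n - 1 = 4)

z_hits = t_hits = 0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = fmean(sample)
    se = stdev(sample) / sqrt(n)  # stdev uses the n - 1 divisor
    z_hits += (xbar - z_star * se < mu < xbar + z_star * se)
    t_hits += (xbar - t_star * se < mu < xbar + t_star * se)

z_coverage, t_coverage = z_hits / reps, t_hits / reps
print(f"coverage using z*: {z_coverage:.3f}")
print(f"coverage using t*: {t_coverage:.3f}")
```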

Example: electricity usage

A random sample of 30 households was selected as part of a study on electricity usage, and the electricity usage in kilowatt-hours (kWh) was recorded for each household in the sample for the March quarter of 2006. The sample mean usage was 375 kWh and the sample standard deviation was 91.5 kWh. Find a 99% confidence interval for the mean household usage in the March quarter of 2006.

  • First, write what you know: \(n = 30\), \(\bar{x} = 375\), \(s = 91.5\), confidence level = 99% \(\Rightarrow \alpha = 0.01\).
  • Check conditions: by the CLT, the sampling distribution of \(\bar{X}\) is approximately normal since \(n = 30\) is reasonably large.
  • Since \(\sigma\) is unknown, the critical value is given by \(t^*_{29, 0.005}\) and the standard error is \(s/\sqrt{n}\).
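Putting these pieces together, here is a minimal Python sketch of the arithmetic. The critical value \(t^*_{29,0.005} \approx 2.756\) is taken from \(t\) tables rather than computed.

```python
from math import sqrt

n, xbar, s = 30, 375.0, 91.5
t_star = 2.756  # t*_{29, 0.005} from t tables

se = s / sqrt(n)  # standard error of the sample mean
margin = t_star * se
lower, upper = xbar - margin, xbar + margin
print(f"SE = {se:.2f} kWh, margin of error = {margin:.2f} kWh")
print(f"99% CI: ({lower:.1f}, {upper:.1f}) kWh")
```

This gives a 99% confidence interval of roughly (329.0, 421.0) kWh.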

Confidence intervals for a population proportion

Confidence interval for a population proportion

  • Recall from the CLT that if \(X \sim B(n, p)\), then \[\hat{p} = X/n \overset{\text{approx.}}{\sim} N\left(p, \frac{p(1-p)}{n}\right)\] when \(n\) is large enough.
  • Check the success-failure condition: \(n\hat{p} \geq 10\) and \(n(1-\hat{p}) \geq 10\) (since \(p\) is unknown, we substitute \(\hat{p}\)).

A \(100(1 - \alpha)\%\) confidence interval for \(p\) is given by

\[\left(\hat{p} - z^*_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\;\hat{p} + z^*_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)\]
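A minimal Python sketch of this interval, with the success-failure check applied using \(\hat{p}\) in place of the unknown \(p\). The helper name `proportion_interval` is hypothetical, and raising an error when the check fails is our own design choice.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """100(1 - alpha)% interval p_hat +/- z* sqrt(p_hat (1 - p_hat) / n)."""
    # Success-failure check: n * p_hat = successes, n * (1 - p_hat) = failures
    if successes < 10 or n - successes < 10:
        raise ValueError("success-failure condition not met; normal approximation doubtful")
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - margin, p_hat + margin)
```

For example, with a hypothetical 50 successes out of 100, \(\hat{p} = 0.5\) and `proportion_interval(50, 100)` gives roughly (0.402, 0.598).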

Example: vaccine

A random sample of 100 preschool children in Bruce revealed that only 62 had been vaccinated. Provide an approximate 90% confidence interval for the proportion vaccinated in that suburb.

  • We have \(n = 100\), \(\hat{p} = 0.62\), confidence level = 90% \(\Rightarrow \alpha = 0.1\).
  • Check conditions: \(n\hat{p} = 62 \geq 10\) and \(n(1-\hat{p}) = 38 \geq 10\).
  • The critical value is \(z^*_{0.05}\) and the standard error is \(\sqrt{\hat{p}(1-\hat{p})/n}\).
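A minimal Python sketch of the remaining arithmetic, using the standard-library `NormalDist` for \(z^*_{0.05}\):

```python
from math import sqrt
from statistics import NormalDist

n, x = 100, 62
p_hat = x / n
z_star = NormalDist().inv_cdf(0.95)  # z*_{0.05}, about 1.645
se = sqrt(p_hat * (1 - p_hat) / n)  # standard error of p_hat
lower, upper = p_hat - z_star * se, p_hat + z_star * se
print(f"90% CI for p: ({lower:.3f}, {upper:.3f})")
```

This gives an approximate 90% confidence interval of about (0.540, 0.700) for the proportion vaccinated.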

Summary

A \(100(1 - \alpha)\%\) confidence interval for a population mean \(\mu\) or a population proportion \(p\) has the form:

\[\text{Point Estimate} \pm \text{Critical Value} \times \text{Standard Error of Point Estimate}.\]

  • When \(\sigma\) is known: \(\bar{X} \pm z^*_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\)
    • where \(z^*_{\alpha/2}\) is the critical value such that \(P(Z < z^*_{\alpha/2}) = 1 - \alpha/2\) for \(Z \sim N(0,1)\).
  • When \(\sigma\) is unknown: \(\bar{X} \pm t^*_{n-1,\alpha/2}\dfrac{S}{\sqrt{n}}\)
    • where \(t^*_{n-1,\alpha/2}\) is the critical value such that \(P(T < t^*_{n-1,\alpha/2}) = 1 - \alpha/2\) for \(T \sim t_{n-1}\).
  • For a population proportion: \(\hat{p} \pm z^*_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}\)