Point estimators

STAT1003 – Statistical Techniques

Dr. Emi Tanaka

Australian National University

These slides are best viewed on a modern browser like Google Chrome on a desktop or laptop. Some interactive components may require some time to fully load.

Point estimator

  • We use sample statistics as estimators of population parameters.
  • Estimator: the formula or method, e.g., \(\bar{X}\) as an estimator for \(\mu\).
  • Estimate: the number you get when you apply the method to data, e.g., \(\bar{x} = 63.2\).

Let \(\theta\) be some population parameter and let \(\hat{\theta}\) denote a point estimator of \(\theta\).

Case study: a point estimator for \(\mu\)

  • Why should we use the sample mean \(\bar{X}\) as a point estimator for the population mean \(\mu\)?

Consider three different point estimators for \(\mu\):

  1. The sample mean: \(\hat{\theta}_1 = \bar{X}\)
  2. The sample median: \(\hat{\theta}_2 = \tilde{X}\)
  3. The average of the first two observations: \(\hat{\theta}_3 = \dfrac{1}{2}(X_1 + X_2)\)
  • Which estimator is “better”?
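As a quick sketch of how the three estimators produce different estimates from the same data (the sample values below are made up for illustration):

```python
import statistics

# A hypothetical sample of n = 6 observations (values made up for illustration).
sample = [61.0, 64.5, 62.2, 65.1, 63.0, 66.4]

theta_1 = statistics.mean(sample)        # sample mean, X-bar
theta_2 = statistics.median(sample)      # sample median, X-tilde
theta_3 = (sample[0] + sample[1]) / 2    # average of the first two observations

print(theta_1, theta_2, theta_3)  # 63.7, 63.75, 62.75 -- three estimates of mu
```

All three are legitimate point estimates of \(\mu\); the rest of the slides develop criteria for deciding which method is better.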

The error of a point estimate

  • In practice, the estimated value is almost never exactly equal to the unknown population parameter value.

The error of the estimate is the difference between the estimated value and the true parameter value. It is composed of two parts:

  • Bias describes a systematic tendency to over- or under-estimate the true population value.
  • Sampling error describes the variability of the estimator from one sample to another.

\[\underbrace{\hat{\theta}- \theta}_{\large \text{Error}} = \underbrace{\hat{\theta} - \text{E}(\hat{\theta})}_{\large \text{Sampling Error}} + \underbrace{\text{E}(\hat{\theta}) - \theta}_{\large \text{Bias}}\]
  • We evaluate a point estimator in these two aspects.

Bias of a point estimator

  • Bias of a point estimator describes its systematic tendency to over- or under-estimate the true population value.

The formal definition of the bias of a point estimator \(\hat{\theta}\) of a parameter \(\theta\) is

\[\text{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta\]

  • A point estimator is unbiased if \(\text{Bias}(\hat{\theta}) = 0\), i.e. if \(\text{E}(\hat{\theta}) = \theta\)

    • If the bias is positive, the estimator tends to over-estimate the parameter.

    • If the bias is negative, the estimator tends to under-estimate the parameter.
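The expectation \(\text{E}(\hat{\theta})\) can be approximated by simulation: draw many samples, compute the estimator on each, and average. A minimal sketch for \(\hat{\theta} = \bar{X}\), assuming a Uniform(0, 10) population so that \(\mu = 5\):

```python
import random
import statistics

random.seed(1)
mu = 5.0                 # true mean of a Uniform(0, 10) population (assumption)
n, reps = 5, 20_000

# Approximate E(theta-hat) by averaging the estimator over many samples.
estimates = [statistics.mean(random.uniform(0, 10) for _ in range(n))
             for _ in range(reps)]
bias_hat = statistics.mean(estimates) - mu
print(bias_hat)  # close to 0: the sample mean is unbiased
```

The Monte Carlo average is close to \(\mu\), consistent with \(\text{Bias}(\bar{X}) = 0\).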

Case study: bias of a point estimator

We have shown before that \(E(\bar{X}) = \mu\), so the sample mean \(\bar{X}\) is an unbiased estimator for \(\mu\).

The sample median, \(\tilde{X}\), would:

  • be an unbiased estimator only if the population distribution is symmetric,
  • have a negative bias if the population distribution is right-skewed (the population median lies below the mean), and
  • have a positive bias if the population distribution is left-skewed (the population median lies above the mean).

The average of the first two observations, \(\hat{\theta}_3 = \frac{1}{2}(X_1 + X_2)\), is also an unbiased estimator because

\[ E\left[\frac{1}{2}(X_{1} + X_{2})\right] = \frac{1}{2}\left[E(X_{1}) + E(X_{2})\right] = \frac{1}{2}(\mu + \mu) = \mu. \]
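These bias results can be checked by simulation. A sketch assuming a right-skewed Exponential(1) population (so \(\mu = 1\)): the sample mean and the first-two-observations average come out approximately unbiased, while the sample median shows a negative bias.

```python
import random
import statistics

random.seed(42)
mu = 1.0                     # mean of an Exponential(1) population (right-skewed)
n, reps = 15, 20_000

means, medians, pairs = [], [], []
for _ in range(reps):
    x = [random.expovariate(1.0) for _ in range(n)]
    means.append(statistics.mean(x))
    medians.append(statistics.median(x))
    pairs.append((x[0] + x[1]) / 2)

print(statistics.mean(means) - mu)    # approx 0: unbiased
print(statistics.mean(medians) - mu)  # negative: median under-estimates mu here
print(statistics.mean(pairs) - mu)    # approx 0: unbiased
```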

Variance of a point estimator

  • Sampling error, sometimes called sampling uncertainty, describes how much an estimate will tend to vary from one sample to the next.

  • It is measured by the variance of an estimator.

  • If \(\hat{\theta}\) is a point estimator of \(\theta\), the variance of \(\hat{\theta}\) is

\[\text{Var}(\hat{\theta}) = E\left((\hat{\theta} - E(\hat{\theta}))^2\right) = E\left(\hat{\theta}^2\right) - \left(E(\hat{\theta})\right)^2\]

  • Estimators with low variance are said to be efficient, i.e. for two unbiased estimators \(\hat{\theta}_1\) and \(\hat{\theta}_2\), if \(\text{Var}(\hat{\theta}_1) < \text{Var}(\hat{\theta}_2)\) then \(\hat{\theta}_1\) is considered more efficient.
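Efficiency can be illustrated by comparing the empirical variances of two unbiased estimators. A sketch assuming a standard Normal population with \(n = 10\), comparing \(\bar{X}\) against \(\frac{1}{2}(X_1 + X_2)\):

```python
import random
import statistics

random.seed(7)
mu, sigma = 0.0, 1.0         # assumed standard Normal population
n, reps = 10, 20_000

xbar, pair = [], []
for _ in range(reps):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    xbar.append(statistics.mean(x))
    pair.append((x[0] + x[1]) / 2)

# Both estimators are unbiased; the one with the smaller variance is more efficient.
print(statistics.pvariance(xbar))  # approx sigma^2 / n = 0.1
print(statistics.pvariance(pair))  # approx sigma^2 / 2 = 0.5
```

The sample mean has the smaller variance, so it is the more efficient of the two.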

Case study: variance of a point estimator

\(\text{Var}(\bar{X}) = \dfrac{\sigma^2}{n}\)

  • The variance decreases as the sample size increases.

  • The sample mean therefore becomes a more efficient estimator of the population mean as the sample size grows.

\(\text{Var}(\tilde{X}) \approx \dfrac{\pi\sigma^2}{2n}\) if the population distribution is normal (proof out of scope).

  • It is less efficient than the sample mean for the same sample size, since \(\pi/2 > 1\).

\(\text{Var}\left[\frac{1}{2}(X_1 + X_2)\right] = \frac{1}{4}\left[\text{Var}(X_1) + \text{Var}(X_2)\right] = \frac{1}{4}\left[\sigma^2 + \sigma^2\right] = \dfrac{\sigma^2}{2}\), using the independence of \(X_1\) and \(X_2\).

  • It is less efficient than the sample mean when \(n > 2\).

Case study: variance of a point estimator via simulation
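A sketch of such a simulation, assuming a Normal population with \(\sigma = 2\) and samples of size \(n = 25\). The empirical variances should be close to the theoretical values \(\sigma^2/n = 0.16\), \(\pi\sigma^2/(2n) \approx 0.251\), and \(\sigma^2/2 = 2\):

```python
import math
import random
import statistics

random.seed(2023)
mu, sigma = 10.0, 2.0        # assumed Normal population parameters
n, reps = 25, 20_000

means, medians, pairs = [], [], []
for _ in range(reps):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(statistics.mean(x))
    medians.append(statistics.median(x))
    pairs.append((x[0] + x[1]) / 2)

print(statistics.pvariance(means))    # approx sigma^2 / n       = 0.16
print(statistics.pvariance(medians))  # approx pi sigma^2 / (2n) = 0.251
print(statistics.pvariance(pairs))    # approx sigma^2 / 2       = 2.0
```

The simulated variances reproduce the ordering derived above: the sample mean is the most efficient of the three estimators.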

Mean squared error of a point estimator

  • To evaluate a point estimator, we need to consider both its bias and its variance.

The mean squared error (MSE) of a point estimator accounts for both. It is defined as

\[\text{MSE}(\hat{\theta}) = \text{E}\left[(\hat{\theta} - \theta)^2\right] = \left(\text{E}(\hat{\theta}) - \theta\right)^2 + E\left((\hat{\theta} - E(\hat{\theta}))^2\right) = \text{Bias}(\hat{\theta})^2 + \text{Var}(\hat{\theta})\]

  • MSE is often used as a criterion for comparing estimators.
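The decomposition \(\text{MSE} = \text{Bias}^2 + \text{Var}\) can be verified numerically. A sketch using the sample median as an estimator of \(\mu\) under an assumed Exponential(1) population, where it is biased; with the population-style variance (\(1/N\) denominator), the identity holds exactly for the simulated draws:

```python
import random
import statistics

random.seed(99)
mu = 1.0                     # mean of an Exponential(1) population (assumption)
n, reps = 15, 10_000

# Sample median as an estimator of mu: biased under this skewed population.
medians = [statistics.median(random.expovariate(1.0) for _ in range(n))
           for _ in range(reps)]

mse  = statistics.mean((m - mu) ** 2 for m in medians)
bias = statistics.mean(medians) - mu
var  = statistics.pvariance(medians)

print(mse, bias ** 2 + var)  # the two quantities agree
```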

Consistency of a point estimator

A point estimator is said to be consistent if the MSE of the estimator goes to zero as the sample size \(n\) increases:

\[\lim_{n \to \infty} \text{MSE}(\hat{\theta}) = 0\]
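For the sample mean, \(\text{MSE}(\bar{X}) = \text{Var}(\bar{X}) = \sigma^2/n\) (the bias is zero), so the MSE shrinks toward zero as \(n\) grows. A sketch assuming a Normal population:

```python
import random
import statistics

random.seed(0)
mu, sigma = 5.0, 1.0         # assumed Normal population parameters
reps = 5_000

def mse_of_sample_mean(n):
    """Monte Carlo estimate of MSE(X-bar) for samples of size n."""
    draws = [statistics.mean(random.gauss(mu, sigma) for _ in range(n))
             for _ in range(reps)]
    return statistics.mean((d - mu) ** 2 for d in draws)

mses = [mse_of_sample_mean(n) for n in (10, 100, 1000)]
print(mses)  # approx sigma^2/10, sigma^2/100, sigma^2/1000 -- shrinking to 0
```

The estimated MSEs decrease roughly in proportion to \(1/n\), consistent with \(\bar{X}\) being a consistent estimator of \(\mu\).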

Summary

  • Unbiasedness: The estimator’s expected value equals the true parameter: \(\text{E}(\hat{\theta}) = \theta\).
  • Efficiency: Among unbiased estimators, the one with the smallest variance is preferred: \(\text{Var}(\hat{\theta}) \le \text{Var}(\hat{\theta}')\) for any other unbiased estimator \(\hat{\theta}'\).
  • Consistency: As sample size increases, the estimator gets closer to the true parameter: \(\lim_{n \to \infty} \text{MSE}(\hat{\theta}) = 0\).