Sampling distribution

STAT1003 – Statistical Techniques

Dr. Emi Tanaka

Australian National University


Recall: statistical inference

Statistical inference: Infer parameters of a population from a sample drawn from that population.

  • Population parameters describe populations, e.g. population mean \(\mu\) and population variance \(\sigma^2\).
  • Population parameters are usually unknown.
  • To find out more about them, we take a random sample of data.
  • Calculate statistics from the sample, e.g. sample mean \(\bar{x}\) and sample variance \(s^2\).
  • Use \(\bar{x}\) and \(s^2\) to estimate \(\mu\) and \(\sigma^2\).

Sampling distribution

  • Sample statistics, like the sample mean (\(\bar{X}\)) and sample variance (\(S^2\)), are random variables.

Sampling distribution refers to the distribution of a statistic that would arise if we repeatedly took random samples from a population.
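The idea of "repeatedly taking random samples" can be mimicked by simulation. The sketch below is illustrative only: the population (faces of a fair six-sided die), the sample size, the number of repetitions, and the helper name `simulate_sample_means` are all assumptions chosen for the example, not part of the course material.

```python
import random
import statistics

def simulate_sample_means(population, n, reps, seed=0):
    """Draw `reps` random samples of size `n` (with replacement)
    from `population` and return the mean of each sample."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(reps)]

# Hypothetical population: faces of a fair six-sided die
population = [1, 2, 3, 4, 5, 6]
means = simulate_sample_means(population, n=5, reps=10_000)

# The collection of simulated means approximates the sampling distribution
print(statistics.fmean(means))  # centre of the simulated distribution
print(statistics.stdev(means))  # spread of the simulated distribution
```

A histogram of `means` would give an approximate picture of the sampling distribution of the sample mean for this assumed population.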

Sampling distribution of a sample mean

Expected value of a sample mean

  • Suppose \(X_1, X_2, \ldots, X_n\) are independent random variables drawn from the same population with mean \(\mu\) and finite variance \(\sigma^2\) (we refer to this as independent and identically distributed or i.i.d.).
  • Then the sample mean \(\bar{X}\) is given by:

\[ \bar{X} = \frac{1}{n}(X_1 + X_2 + \ldots + X_n) \]

  • The expected value of the sample mean \(\bar{X}\) is equal to the population mean \(\mu\):

\[ \begin{align*} \text{E}(\bar{X}) &= \text{E}\left(\frac{1}{n}(X_1 + X_2 + \ldots + X_n)\right) \\ &= \frac{1}{n}(\text{E}(X_1) + \text{E}(X_2) + \ldots + \text{E}(X_n)) \\ &= \frac{1}{n}(n\mu) = \mu \end{align*} \]

Standard error

The standard error (SE) of a statistic is the standard deviation of its sampling distribution.

For the sample mean \(\bar{X}\):

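\[ \begin{align*} \left[\text{SE}(\bar{X})\right]^2 &= \text{Var}\left(\frac{1}{n}(X_1 + X_2 + \ldots + X_n)\right) \\ &= \frac{1}{n^2}\text{Var}(X_1 + X_2 + \ldots + X_n) \\ &= \frac{1}{n^2}\left(\text{Var}(X_1) + \text{Var}(X_2) + \ldots + \text{Var}(X_n)\right) \quad \text{(by independence)} \\ &= \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n} \end{align*} \]

Hence \(\text{SE}(\bar{X}) = \dfrac{\sigma}{\sqrt{n}}\).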

But since \(\sigma\) is usually unknown, we use the sample standard deviation \(s\) to estimate it. Thus, we have \(\widehat{\text{SE}}(\bar{X}) = \dfrac{s}{\sqrt{n}}\).
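As a minimal sketch of the estimate \(s/\sqrt{n}\), the snippet below computes it for a small sample. The data values and the function name `estimated_standard_error` are made up for illustration.

```python
import math
import statistics

def estimated_standard_error(sample):
    """Return s / sqrt(n), the estimated standard error of the sample mean."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    return s / math.sqrt(n)

# Made-up data, for illustration only
sample = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6]
print(estimated_standard_error(sample))
```

Note that `statistics.stdev` uses the \(n - 1\) divisor, which matches the usual definition of the sample standard deviation \(s\).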

Exact sampling distribution 1

  • When a random experiment is fully specified in a purely theoretical setting, we can find the exact sampling distribution.
  • Suppose we roll a fair four-sided die and let \(X\) be the number that comes up.
  • The population can be thought of as an infinite number of rolls of the die.
  • Suppose we repeatedly draw samples of size 2, i.e., roll the die twice. We want the sampling distribution of the sample mean \(\bar{X}\).
  • The population distribution of \(X\) is:

\[ \begin{array}{c|cccc} x & 1 & 2 & 3 & 4 \\ \hline P(X = x) & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} \end{array} \]

  • \(\text{E}(X) = 2.5\) and \(\text{Var}(X) = 1.25\).
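
For reference, these two values follow directly from the population distribution, using the shortcut \(\text{Var}(X) = \text{E}(X^2) - [\text{E}(X)]^2\):

\[ \begin{align*} \text{E}(X) &= \tfrac{1}{4}(1 + 2 + 3 + 4) = 2.5 \\ \text{E}(X^2) &= \tfrac{1}{4}(1 + 4 + 9 + 16) = 7.5 \\ \text{Var}(X) &= \text{E}(X^2) - [\text{E}(X)]^2 = 7.5 - 6.25 = 1.25 \end{align*} \]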

Exact sampling distribution 2

  • How about the sampling distribution for sample mean \(\bar{X}\) when the sample size \(n = 2\)?
  • The possible values of \(\bar{X}\) are:
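
\[ \begin{array}{c|cccc} \text{First roll} \,\backslash\, \text{Second roll} & 1 & 2 & 3 & 4 \\ \hline 1 & 1.0 & 1.5 & 2.0 & 2.5 \\ 2 & 1.5 & 2.0 & 2.5 & 3.0 \\ 3 & 2.0 & 2.5 & 3.0 & 3.5 \\ 4 & 2.5 & 3.0 & 3.5 & 4.0 \end{array} \]
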
  • The (exact) sampling distribution of \(\bar{X}\) is:

\[ \begin{array}{c|ccccccc} \bar{x} & 1 & 1.5 & 2 & 2.5 & 3 & 3.5 & 4 \\ \hline P(\bar{X} = \bar{x}) & \frac{1}{16} & \frac{2}{16} & \frac{3}{16} & \frac{4}{16} & \frac{3}{16} & \frac{2}{16} & \frac{1}{16} \end{array} \]

  • \(\text{E}(\bar{X}) = 2.5\) and \(\text{Var}(\bar{X}) = 0.625\).
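
The enumeration above can be reproduced programmatically. The following is a sketch, not the course's own code: it lists all 16 equally likely ordered outcomes of two rolls and tallies the exact probability of each value of \(\bar{X}\), using exact fractions to avoid rounding. The variable names are chosen for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = [1, 2, 3, 4]  # fair four-sided die
n = 2                 # sample size (number of rolls)

# Every ordered outcome of n rolls is equally likely
outcomes = list(product(faces, repeat=n))
p_each = Fraction(1, len(outcomes))

# Exact sampling distribution: probability of each value of the sample mean
dist = Counter()
for rolls in outcomes:
    dist[Fraction(sum(rolls), n)] += p_each

# Mean and variance of the sampling distribution
mean = sum(x * p for x, p in dist.items())
var = sum((x - mean) ** 2 * p for x, p in dist.items())

for x in sorted(dist):
    print(float(x), dist[x])
print(float(mean), float(var))
```

The computed variance agrees with \(\sigma^2 / n = 1.25 / 2 = 0.625\), and the mean agrees with \(\mu = 2.5\). Changing `n` gives the exact sampling distribution for other sample sizes, though the number of outcomes grows as \(4^n\).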