Sampling distribution

STAT1003 – Statistical Techniques

Dr. Emi Tanaka

Australian National University

Recall: statistical inference

Statistical inference: Infer parameters of a population from a sample from that population.

Population parameters describe populations, e.g. population mean \(\mu\) and population variance \(\sigma^2\).
Population parameters are usually unknown.
To find out more about them, we take a random sample of data.
Calculate statistics from the sample, e.g. sample mean \(\bar{x}\) and sample variance \(s^2\).
Use \(\bar{x}\) and \(s^2\) to estimate \(\mu\) and \(\sigma^2\).

Sampling distribution

Sample statistics, like the sample mean (\(\bar{X}\)) and sample variance (\(S^2\)), are random variables.

The sampling distribution of a statistic is the distribution that would arise if we repeatedly took random samples of the same size from a population and computed the statistic for each sample.
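To make the idea of repeated sampling concrete, here is a minimal Python sketch that draws many random samples and records the sample mean of each one. The normal population with mean 10 and standard deviation 2, the sample size of 25, the number of repetitions, the seed and the function name `simulate_sample_means` are all illustrative assumptions, not values from these slides.

```python
import random
import statistics

# Assumed population for illustration only: Normal(mean=10, sd=2).
POP_MEAN, POP_SD = 10.0, 2.0

def simulate_sample_means(n, reps, seed=1):
    """Draw `reps` random samples of size `n` and return their sample means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    means = []
    for _ in range(reps):
        sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]
        means.append(statistics.mean(sample))
    return means

sample_means = simulate_sample_means(n=25, reps=5000)

# Each repetition gives a different value of the statistic; the collection
# of these values approximates the sampling distribution of the sample mean.
print(sample_means[:5])
print(min(sample_means), max(sample_means))
```

A histogram of `sample_means` would give a picture of the approximate sampling distribution under these assumptions.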

Sampling distribution of a sample mean

Expected value of a sample mean

Suppose \(X_1, X_2, \ldots, X_n\) are independent random variables drawn from the same population with mean \(\mu\) and finite variance \(\sigma^2\) (we refer to such variables as independent and identically distributed, or i.i.d.).
Then the sample mean \(\bar{X}\) is given by:

\[ \bar{X} = \frac{1}{n}(X_1 + X_2 + \ldots + X_n) \]

The expected value of the sample mean \(\bar{X}\) is equal to the population mean \(\mu\):

\[ \begin{align*} \text{E}(\bar{X}) &= \text{E}\left(\frac{1}{n}(X_1 + X_2 + \ldots + X_n)\right) \\ &= \frac{1}{n}(\text{E}(X_1) + \text{E}(X_2) + \ldots + \text{E}(X_n)) \\ &= \frac{1}{n}(n\mu) = \mu \end{align*} \]
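The derivation above does not depend on the shape of the population. As a rough numerical check, the sketch below averages many simulated sample means from a skewed population and compares the result with \(\mu\). The exponential population with mean 2, the sample size of 10, the number of repetitions and the seed are assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed for reproducibility

MU = 2.0      # assumed population mean (exponential population with rate 1/MU)
N = 10        # assumed sample size
REPS = 20000  # assumed number of repeated samples

# Sample mean of each of the REPS simulated samples.
sample_means = [
    statistics.mean([rng.expovariate(1 / MU) for _ in range(N)])
    for _ in range(REPS)
]

# The average of the sample means should be close to MU.
average_of_means = statistics.mean(sample_means)
print(average_of_means)
```

The match is only approximate because the simulation uses a finite number of repetitions.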

Standard error

The standard error (SE) of a statistic is the standard deviation of its sampling distribution.

For the sample mean \(\bar{X}\):

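\[ \begin{align*} \left[\text{SE}(\bar{X})\right]^2 &= \text{Var}\left(\frac{1}{n}(X_1 + X_2 + \ldots + X_n)\right) \\ &= \frac{1}{n^2}\text{Var}(X_1 + X_2 + \ldots + X_n) \\ &= \frac{1}{n^2}(\text{Var}(X_1) + \text{Var}(X_2) + \ldots + \text{Var}(X_n)) \quad \text{(by independence)} \\ &= \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n} \end{align*} \]

Hence \(\text{SE}(\bar{X}) = \dfrac{\sigma}{\sqrt{n}}\).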

But since \(\sigma\) is usually unknown, we use the sample standard deviation \(s\) to estimate it. Thus, we have \(\widehat{\text{SE}}(\bar{X}) = \dfrac{s}{\sqrt{n}}\).
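The estimated standard error can be computed directly from a single sample, as in the following sketch. The eight data values are hypothetical and made up purely for illustration.

```python
import math
import statistics

# Hypothetical sample of n = 8 measurements (made up for illustration).
sample = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6, 5.4, 4.9]

n = len(sample)
x_bar = statistics.mean(sample)   # sample mean
s = statistics.stdev(sample)      # sample standard deviation (divides by n - 1)
se_hat = s / math.sqrt(n)         # estimated standard error of the sample mean

print(f"x_bar = {x_bar:.3f}, s = {s:.3f}, SE-hat = {se_hat:.3f}")
```

Note that `statistics.stdev` uses the \(n - 1\) divisor, which matches the usual definition of the sample standard deviation \(s\).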

Exact sampling distribution 1

When the population distribution is fully specified, as in a well-defined random experiment, we can derive the exact sampling distribution.

Suppose we roll a fair four-sided die and let \(X\) be the number that comes up.
The population can be thought of as an infinite number of rolls of the die.
Suppose we repeatedly draw samples of size 2, i.e., roll the die twice. What is the sampling distribution of the sample mean \(\bar{X}\)?

The population distribution of \(X\) is:

\(x\)	\(1\)	\(2\)	\(3\)	\(4\)
\(P(X = x)\)	\(\frac{1}{4}\)	\(\frac{1}{4}\)	\(\frac{1}{4}\)	\(\frac{1}{4}\)

\(\text{E}(X) = 2.5\) and \(\text{Var}(X) = 1.25\).
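These two values follow directly from the population distribution in the table. A short sketch using exact fractions (the variable names are illustrative):

```python
from fractions import Fraction

faces = [1, 2, 3, 4]
prob = Fraction(1, len(faces))  # each face is equally likely for a fair die

mean_x = sum(x * prob for x in faces)                 # E(X)
var_x = sum((x - mean_x) ** 2 * prob for x in faces)  # Var(X) = E[(X - E(X))^2]

print(float(mean_x), float(var_x))  # prints 2.5 1.25
```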

Exact sampling distribution 2

How about the sampling distribution of the sample mean \(\bar{X}\) when the sample size is \(n = 2\)?
The possible values of \(\bar{X}\) are:

First roll \ Second roll	1	2	3	4
1	1.0	1.5	2.0	2.5
2	1.5	2.0	2.5	3.0
3	2.0	2.5	3.0	3.5
4	2.5	3.0	3.5	4.0

The (exact) sampling distribution of \(\bar{X}\) is:

\(\bar{x}\)	\(1\)	\(1.5\)	\(2\)	\(2.5\)	\(3\)	\(3.5\)	\(4\)
\(P(\bar{X} = \bar{x})\)	\(\frac{1}{16}\)	\(\frac{2}{16}\)	\(\frac{3}{16}\)	\(\frac{4}{16}\)	\(\frac{3}{16}\)	\(\frac{2}{16}\)	\(\frac{1}{16}\)

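\(\text{E}(\bar{X}) = 2.5\) and \(\text{Var}(\bar{X}) = 0.625\), which agree with \(\text{E}(\bar{X}) = \mu = 2.5\) and \(\text{Var}(\bar{X}) = \sigma^2/n = 1.25/2\).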
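The exact sampling distribution can also be obtained by enumerating all 16 equally likely pairs of rolls. The following Python sketch does this with exact fractions; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = [1, 2, 3, 4]
n = 2  # sample size: two rolls of the die

outcomes = list(product(faces, repeat=n))  # all (first roll, second roll) pairs
prob_each = Fraction(1, len(outcomes))     # each pair has probability 1/16

# Accumulate the probability of each possible value of the sample mean.
sampling_dist = Counter()
for rolls in outcomes:
    x_bar = Fraction(sum(rolls), n)
    sampling_dist[x_bar] += prob_each

mean_xbar = sum(x * p for x, p in sampling_dist.items())
var_xbar = sum((x - mean_xbar) ** 2 * p for x, p in sampling_dist.items())

for x_bar in sorted(sampling_dist):
    print(float(x_bar), sampling_dist[x_bar])  # probabilities in lowest terms
print(float(mean_xbar), float(var_xbar))       # prints 2.5 0.625
```

Changing `n` extends the same enumeration to larger samples, although the number of outcomes grows as \(4^n\).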