STAT1003 – Statistical Techniques
Dr. Emi Tanaka
Australian National University
This lecture was partially adapted from the previous STAT1003 lecturers. Thank you folks!
Statistical inference: Infer parameters of a population from a sample from that population.
Sampling distribution refers to the distribution of a statistic that would arise if we repeatedly took random samples from a population.
\[ \bar{X} = \frac{1}{n}(X_1 + X_2 + \ldots + X_n) \]
\[ \begin{align*} \text{E}(\bar{X}) &= \text{E}\left(\frac{1}{n}(X_1 + X_2 + \ldots + X_n)\right) \\ &= \frac{1}{n}(\text{E}(X_1) + \text{E}(X_2) + \ldots + \text{E}(X_n)) \\ &= \frac{1}{n}(n\mu) = \mu \end{align*} \]
The standard error (SE) of a statistic is the standard deviation of its sampling distribution.
For the sample mean \(\bar{X}\):
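\[ \begin{align*} \left[\text{SE}(\bar{X})\right]^2 &= \text{Var}\left(\frac{1}{n}(X_1 + X_2 + \ldots + X_n)\right) \\ &= \frac{1}{n^2}\text{Var}(X_1 + X_2 + \ldots + X_n) \\ &= \frac{1}{n^2}\left(\text{Var}(X_1) + \text{Var}(X_2) + \ldots + \text{Var}(X_n)\right) \quad \text{(by independence)} \\ &= \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n} \end{align*} \]
Hence \(\text{SE}(\bar{X}) = \dfrac{\sigma}{\sqrt{n}}\). Since \(\sigma\) is usually unknown, we estimate it with the sample standard deviation \(s\), which gives \(\widehat{\text{SE}}(\bar{X}) = \dfrac{s}{\sqrt{n}}\).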
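As a minimal sketch of the estimate \(s/\sqrt{n}\), the Python snippet below computes it for a small sample. The data values and the function name `estimated_se_of_mean` are made up for illustration and do not come from the lecture.

```python
import math
import statistics

def estimated_se_of_mean(sample):
    """Return s / sqrt(n), the estimated standard error of the sample mean."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations to compute s")
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    return s / math.sqrt(n)

# Hypothetical data, invented purely for illustration.
sample = [12.1, 9.8, 11.4, 10.7, 13.0, 10.2, 11.9, 9.5]
se_hat = estimated_se_of_mean(sample)
print(f"n = {len(sample)}, mean = {statistics.mean(sample):.3f}, "
      f"estimated SE = {se_hat:.3f}")
```

Note that `statistics.stdev` uses the \(n - 1\) divisor, which matches the usual definition of \(s\).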
| \(x\) | \(1\) | \(2\) | \(3\) | \(4\) |
|---|---|---|---|---|
| \(P(X = x)\) | \(\frac{1}{4}\) | \(\frac{1}{4}\) | \(\frac{1}{4}\) | \(\frac{1}{4}\) |
| First roll (rows) / Second roll (columns) | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| 1 | 1.0 | 1.5 | 2.0 | 2.5 |
| 2 | 1.5 | 2.0 | 2.5 | 3.0 |
| 3 | 2.0 | 2.5 | 3.0 | 3.5 |
| 4 | 2.5 | 3.0 | 3.5 | 4.0 |
| \(\bar{x}\) | \(1\) | \(1.5\) | \(2\) | \(2.5\) | \(3\) | \(3.5\) | \(4\) |
|---|---|---|---|---|---|---|---|
| \(P(\bar{X} = \bar{x})\) | \(\frac{1}{16}\) | \(\frac{2}{16}\) | \(\frac{3}{16}\) | \(\frac{4}{16}\) | \(\frac{3}{16}\) | \(\frac{2}{16}\) | \(\frac{1}{16}\) |
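The sampling distribution of the mean of two rolls can be reproduced by enumerating all 16 equally likely outcomes. The sketch below does this with exact fractions and also checks that \(\text{E}(\bar{X}) = \mu\) and \(\text{Var}(\bar{X}) = \sigma^2/n\) hold for this example. Variable names are illustrative choices.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = [1, 2, 3, 4]   # fair four-sided die, each face has probability 1/4
n = 2                  # number of rolls averaged

# Enumerate all 4**n equally likely outcomes and tally each sample mean.
counts = Counter(Fraction(sum(rolls), n) for rolls in product(faces, repeat=n))
total = len(faces) ** n
sampling_dist = {xbar: Fraction(c, total) for xbar, c in sorted(counts.items())}

# Population mean and variance of a single roll.
mu = Fraction(sum(faces), len(faces))
sigma2 = sum((x - mu) ** 2 for x in faces) / len(faces)

# Mean and variance of the sampling distribution of the sample mean.
mean_xbar = sum(xbar * p for xbar, p in sampling_dist.items())
var_xbar = sum((xbar - mean_xbar) ** 2 * p for xbar, p in sampling_dist.items())

for xbar in sampling_dist:
    print(f"P(Xbar = {float(xbar):.1f}) = {counts[xbar]}/{total}")
print("E(Xbar) equals mu:", mean_xbar == mu)
print("Var(Xbar) equals sigma^2 / n:", var_xbar == sigma2 / n)
```

Changing `n` to 3 or 4 enumerates the sampling distribution for more rolls, at the cost of \(4^n\) outcomes.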
Let’s assume that the population is:
What do you notice about the sampling distributions of the sample means?
If a random variable is the mean of a large number of independent, identically distributed random values with finite variance, then it approximately follows a normal distribution, regardless of how the individual values are distributed.
Or mathematically, let \(X_1, \ldots, X_n\) be a random sample of \(n\) independent observations from a population with mean \(\mu\) and finite variance \(\sigma^2\). Then, for sufficiently large \(n\),
\[\bar{X} \overset{\text{approx.}}{\sim} N\!\left(\mu,\; \frac{\sigma^2}{n}\right)\]
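A small simulation can make this result concrete. The sketch below assumes a right-skewed Exponential(1) population, for which \(\mu = 1\) and \(\sigma = 1\). The population, the seed, and the function name `simulate_sample_means` are illustrative choices, not part of the lecture. It compares the simulated standard deviation of \(\bar{X}\) with \(\sigma/\sqrt{n}\) and reports the share of sample means within 1.96 standard errors of \(\mu\), which should approach 0.95 as \(n\) grows.

```python
import math
import random
import statistics

def simulate_sample_means(n, reps=5000, rate=1.0, seed=1003):
    """Draw `reps` samples of size `n` from an Exponential(rate) population
    and return the list of their sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(rate) for _ in range(n)])
        for _ in range(reps)
    ]

for n in (2, 10, 50):
    means = simulate_sample_means(n)
    se_theory = 1.0 / math.sqrt(n)  # sigma / sqrt(n), with sigma = 1 when rate = 1
    within = sum(abs(m - 1.0) <= 1.96 * se_theory for m in means) / len(means)
    print(f"n = {n:2d}: mean of means = {statistics.fmean(means):.3f}, "
          f"sd of means = {statistics.stdev(means):.3f} "
          f"(theory {se_theory:.3f}), share within 1.96 SE = {within:.3f}")
```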
The time a family physician spends seeing a patient follows a right-skewed distribution with mean 15 minutes and standard deviation 11.6 minutes.
What is the probability that the physician spends less than 12 minutes seeing a patient?
The time a family physician spends seeing a patient follows a right-skewed distribution with mean 15 minutes and standard deviation 11.6 minutes.
What is the probability that the doctor spends an average time of less than 12 minutes with her 30 patients?
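One way to work this out, assuming the 30 visit times are independent and that \(n = 30\) is large enough for the normal approximation to be reasonable, is to treat \(\bar{X}\) as approximately \(N(15,\; 11.6^2/30)\). The sketch below uses Python's standard library. The result comes out at roughly 0.08.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 15.0, 11.6, 30    # minutes, minutes, patients (from the example)
se = sigma / sqrt(n)             # standard error of the sample mean
xbar_dist = NormalDist(mu, se)   # normal approximation for the mean of 30 visits
p_less_than_12 = xbar_dist.cdf(12)

print(f"SE = {se:.3f} minutes")
print(f"z  = {(12 - mu) / se:.3f}")
print(f"P(Xbar < 12) is approximately {p_less_than_12:.4f}")
```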
The time a family physician spends seeing a patient follows a right-skewed distribution with mean 15 minutes and standard deviation 11.6 minutes.
One day, 35 patients have appointments. What is the probability the doctor works overtime beyond 8 hours?
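A possible approach, assuming "overtime" means the total consultation time for the 35 patients exceeds \(8 \times 60 = 480\) minutes (ignoring breaks and other tasks) and that visit times are independent: the total \(T = n\bar{X}\) is then approximately \(N(n\mu,\; n\sigma^2)\). The sketch below computes \(P(T > 480)\) under those assumptions, which comes out at roughly 0.74.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 15.0, 11.6, 35   # minutes, minutes, patients (from the example)
limit = 8 * 60                  # assumed overtime threshold: 480 minutes in total

# The total time T = X_1 + ... + X_n is approximately N(n * mu, n * sigma^2).
total_dist = NormalDist(n * mu, sigma * sqrt(n))
p_overtime = 1 - total_dist.cdf(limit)

# Equivalent formulation: the average time per patient exceeds limit / n.
xbar_dist = NormalDist(mu, sigma / sqrt(n))
p_overtime_via_mean = 1 - xbar_dist.cdf(limit / n)

print(f"P(T > {limit}) is approximately {p_overtime:.4f}")
print(f"Same probability via the sample mean: {p_overtime_via_mean:.4f}")
```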
The number of passengers passing through a large South East Asian airport is normally distributed with a mean of 110,000 persons per day and a standard deviation of 20,200 persons. If you select a random sample of 16 days:
The true (population) approval rate \(p\) of the prime minister is unknown. We could take a poll and use the sample approval rate \(\hat{p}\) as an estimate.
The true (population) proportion of people who prefer Coke to Pepsi is unknown. We could randomly select a sample of people and calculate the sample proportion \(\hat{p}\).
\[ X = X_1 + \cdots + X_n, \quad X_i = \begin{cases} 1 & \text{success} \\ 0 & \text{failure} \end{cases} \]
\[ \hat{p} = \frac{X_1 + \cdots + X_n}{n} \]
For \(X \sim B(n, p)\),
\(X \overset{\text{approx.}}{\sim} N\left(np, np(1-p)\right)\qquad\) and \(\qquad\hat{p} \overset{\text{approx.}}{\sim} N\!\left(p, \dfrac{p(1-p)}{n}\right)\)
provided sample size is sufficiently large.
Success-failure condition: \(np \ge 10\) and \(n(1-p) \ge 10\).
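To see why the condition matters, the sketch below compares the exact binomial probability \(P(X \le k)\) with its normal approximation for two hypothetical settings, one that fails the condition and one that meets it. The values of \(n\) and \(p\) and the helper names are illustrative assumptions, and no continuity correction is applied.

```python
from math import comb, sqrt
from statistics import NormalDist

def success_failure_ok(n, p, threshold=10):
    """Check np >= threshold and n(1 - p) >= threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """P(X <= k) under the N(np, np(1-p)) approximation (no continuity correction)."""
    return NormalDist(n * p, sqrt(n * p * (1 - p))).cdf(k)

# Hypothetical settings chosen for illustration only.
for n, p in [(20, 0.1), (200, 0.3)]:
    k = round(n * p) + 2   # evaluate a little above the mean np
    print(f"n = {n}, p = {p}: condition met = {success_failure_ok(n, p)}, "
          f"exact P(X <= {k}) = {binom_cdf(k, n, p):.4f}, "
          f"normal approx = {normal_approx_cdf(k, n, p):.4f}")
```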
Over the past few years there has been increased monitoring of the representation of women on corporate boards. Suppose that the true percentage of women on ASX 200 boards is now 24.6% and that a random sample of 220 board members is chosen.
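As a sketch of how this scenario could be set up, the code below checks the success-failure condition and builds the approximate sampling distribution of \(\hat{p}\) for \(p = 0.246\) and \(n = 220\). The cutoff of 0.20 is a hypothetical value chosen only to show how a probability would be computed. It is not a question taken from the lecture.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.246, 220                   # true proportion and sample size (from the example)
expected_successes = n * p
expected_failures = n * (1 - p)
condition_ok = expected_successes >= 10 and expected_failures >= 10

se = sqrt(p * (1 - p) / n)          # standard error of the sample proportion
p_hat_dist = NormalDist(p, se)      # approximate sampling distribution of p-hat

cutoff = 0.20                       # hypothetical cutoff, for illustration only
prob_below_cutoff = p_hat_dist.cdf(cutoff)

print(f"np = {expected_successes:.2f}, n(1-p) = {expected_failures:.2f}, "
      f"condition met = {condition_ok}")
print(f"SE(p-hat) = {se:.4f}")
print(f"P(p-hat < {cutoff}) is approximately {prob_below_cutoff:.4f}")
```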
