STAT1003 – Statistical Techniques
Dr. Emi Tanaka
Australian National University
Statistics is defined as the science and technology of obtaining useful information from data, taking its variability into account.

The best thing about being a statistician is that you get to play in everyone’s backyard.
Starting point
❓ I have a question
I have a dataset
🤔 I have this question
What does the data tell me about my question?
🕵️ 🕵️‍♀️
Technical proficiency (understanding statistical methods and being skilled with statistical software for extracting and analyzing data) alone isn’t enough for practice. Think holistically.

How hard is a first year statistics course?
Types of variables include:
Data / variable may be captured as:
Subset of marks for STAT1003 students in 2025
quiz = quiz score out of 6
assignment = assignment score out of 100
exam = exam score out of 100
week2, week3, …, week12 = tutorial attendance in weeks 2 to 12 (1 = attended, 0 = absent)

| quiz | assignment | exam | week2 | week3 | week4 | week5 | week6 | week7 | week8 | week9 | week10 | week11 | week12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6.0 | 60 | 14 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5.0 | 75 | 79 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 |
| 5.5 | 90 | 97 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Always get to know the data first
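As a minimal sketch of getting to know the data, the three rows displayed in the table can be typed in by hand and summarised in Python. The names `columns`, `rows`, and `attended` are illustrative choices, not part of the course material, and only the three displayed students are included, so the summary describes those rows and not the whole class.

```python
# The three rows displayed in the marks table, typed in by hand.
weeks = [f"week{w}" for w in range(2, 13)]  # week2, ..., week12
columns = ["quiz", "assignment", "exam"] + weeks

raw = [
    [6.0, 60, 14, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [5.0, 75, 79, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
    [5.5, 90, 97, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
]

# One dictionary per student, keyed by column name.
rows = [dict(zip(columns, r)) for r in raw]

# Number of tutorials each student attended (attendance is coded 1/0).
attended = [sum(row[w] for w in weeks) for row in rows]

for row, n_att in zip(rows, attended):
    print(f"exam={row['exam']:>3}  tutorials attended={n_att}/{len(weeks)}")
```

Checking basic facts like the number of columns, the coding of each variable, and simple per-row totals is a quick way to spot data entry problems before any analysis.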
Populations have parameters. A parameter is a descriptive measure of a population that is usually unobservable and unknown.
Sample statistics are computed from sample data and used to make inferences about population parameters.
How hard is STAT1003 at ANU for a typical undergraduate student as measured by the average final grade earned by students in STAT1003?
But we only observe data from a sample of \(n\) students.
If \(x_i\) denotes the final grade of the \(i\)-th sampled student, then the sample consists of the values: \[x_1, x_2, \dots, x_n.\]
Sample size is usually much smaller than population size: \(n \ll N\)
Let \(\mu\) denote the population mean (average) final grade of all STAT1003 students. \[\begin{align*} \mu &= \frac{1}{N}(x_{1'} + x_{2'} + \dots + x_{N'}) = \frac{1}{N}\sum_{i=1}^{N} x_{i'}\\ &= {\tiny \frac{1}{14}(73 + 60 + 54 + 62 + 71 + 68 + 57 + 60 + 72 + 57 + 35 + 53 + 58 + 70)} \approx 60.7\\ \end{align*}\]
Let \(\bar{x}\) denote the sample mean (average) final grade of the sampled STAT1003 students. \[\begin{align*} \bar{x} &= \frac{1}{n}(x_1 + x_2 + \dots + x_n) = \frac{1}{n}\sum_{i=1}^{n} x_i\\ &= {\tiny \frac{1}{5}(54 + 71 + 57 + 70 + 53)} = 61\\ \end{align*}\]
\(\bar{x}\) is used to estimate \(\mu\).
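The two mean calculations can be reproduced in a few lines of Python. This is a sketch that uses the 14 population grades and the 5 sampled grades exactly as they appear in the formulas; the names `population`, `sample`, `mu`, and `x_bar` are illustrative.

```python
# Final grades of all N = 14 students (the population), as listed in the formula for mu.
population = [73, 60, 54, 62, 71, 68, 57, 60, 72, 57, 35, 53, 58, 70]

# Final grades of the n = 5 sampled students, as listed in the formula for x-bar.
sample = [54, 71, 57, 70, 53]

N = len(population)
n = len(sample)

mu = sum(population) / N   # population mean
x_bar = sum(sample) / n    # sample mean, used as an estimate of mu

print(f"N = {N}, mu = {mu:.1f}")
print(f"n = {n}, x_bar = {x_bar:.1f}")
```

The sample mean is close to, but not equal to, the population mean, and a different sample of five students would generally give a different estimate.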
Population parameters are typically denoted by Greek letters, e.g.
Population size is often denoted by \(N\).
Garbage in, garbage out (GIGO): the quality of the output is determined by the quality of the input.
Data collection methods include:
Suppose a study tracked sunscreen use and skin cancer, and it was found that the more sunscreen someone used, the more likely the person was to have skin cancer. Does this mean sunscreen causes skin cancer?
