STAT1003 – Statistical Techniques
Dr. Emi Tanaka
Australian National University
| Type | Tidy data | Relational model |
|---|---|---|
| variable | column | attribute/field |
| observation | row | tuple/record |
species_id is the primary key in the species table, plot_id is the primary key in the plots table, and record_id is the primary key in the surveys table.

ChickWeight has a compound key consisting of Time and Chick.

chickwts has no key.

plot_id and species_id are foreign keys in the surveys table, which link to the plots and species tables, respectively.

In dplyr, you can use by = NULL (the default) to perform a natural join, which matches rows on every column the two tables have in common.
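As a rough sketch of how these ideas can be checked in R, the code below tests whether Time and Chick uniquely identify the rows of the built-in ChickWeight data, then runs a natural join on two small tables. The tables plots_demo and surveys_demo and all of their values are hypothetical stand-ins invented for illustration; they are not the course data.

```r
library(dplyr)

# A set of columns is a key if no combination of its values occurs twice.
# as.data.frame() drops the extra grouping classes that ChickWeight carries.
dup_keys <- as.data.frame(ChickWeight) |>
  count(Time, Chick) |>
  filter(n > 1)
nrow(dup_keys)  # zero rows means (Time, Chick) identifies each row

# Hypothetical toy tables (values invented for illustration only).
plots_demo <- tibble(
  plot_id   = c(1, 2),
  plot_type = c("Control", "Exclosure")
)
surveys_demo <- tibble(
  record_id = 1:3,
  plot_id   = c(2, 1, 2)
)

# by = NULL (the default): dplyr joins on every column name shared by both
# tables, here plot_id, and prints a message saying which columns it used.
natural <- inner_join(surveys_demo, plots_demo)
natural
```

Passing by = join_by(plot_id) explicitly gives the same result here and silences the message, which is usually the safer habit once tables share more than one column name.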



In dplyr joins, the expected relationship between the keys of the two tables can be declared using the relationship argument, which can be set to “one-to-one”, “one-to-many”, “many-to-one”, or “many-to-many”.

| name | enrol |
|---|---|
| BEAN | 109557 |
| CANBERRA | 102196 |
| FENNER | 102576 |

| division | party | votes |
|---|---|---|
| Bean | AUP | 2722 |
| Bean | LP | 29223 |
| Bean | GAP | 929 |
| Bean | LDP | 2540 |
| Bean | GRN | 12168 |
| Bean | UAPP | 2227 |
| Bean | ALP | 35447 |
| Bean | | 5043 |
| Bean | IND | 7683 |
| Canberra | AUP | 1784 |
| Canberra | GRN | 20144 |
| Canberra | | 1904 |
| Canberra | UAPP | 1361 |
| Canberra | LP | 24063 |
| Canberra | ALP | 34989 |
| Canberra | IND | 4062 |
| Fenner | | 2669 |
| Fenner | ALP | 38864 |
| Fenner | UAPP | 3529 |
| Fenner | GRN | 12492 |
| Fenner | LP | 30025 |
| Fenner | AUP | 1723 |

| division | party | votes | enrol | perc |
|---|---|---|---|---|
| BEAN | AUP | 2722 | 109557 | 2.5 |
| BEAN | LP | 29223 | 109557 | 26.7 |
| BEAN | GAP | 929 | 109557 | 0.8 |
| BEAN | LDP | 2540 | 109557 | 2.3 |
| BEAN | GRN | 12168 | 109557 | 11.1 |
| BEAN | UAPP | 2227 | 109557 | 2.0 |
| BEAN | ALP | 35447 | 109557 | 32.4 |
| BEAN | | 5043 | 109557 | 4.6 |
| BEAN | IND | 7683 | 109557 | 7.0 |
| CANBERRA | AUP | 1784 | 102196 | 1.7 |
| CANBERRA | GRN | 20144 | 102196 | 19.7 |
| CANBERRA | | 1904 | 102196 | 1.9 |
| CANBERRA | UAPP | 1361 | 102196 | 1.3 |
| CANBERRA | LP | 24063 | 102196 | 23.5 |
| CANBERRA | ALP | 34989 | 102196 | 34.2 |
| CANBERRA | IND | 4062 | 102196 | 4.0 |
| FENNER | | 2669 | 102576 | 2.6 |
| FENNER | ALP | 38864 | 102576 | 37.9 |
| FENNER | UAPP | 3529 | 102576 | 3.4 |
| FENNER | GRN | 12492 | 102576 | 12.2 |
| FENNER | LP | 30025 | 102576 | 29.3 |
| FENNER | AUP | 1723 | 102576 | 1.7 |
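The combined table above can be reproduced with a join along the following lines. This is a sketch under assumptions: the object names enrol_tbl and votes_tbl are hypothetical, only four of the vote rows are typed in to keep it short, and the relationship argument requires dplyr 1.1.0 or later. Because the division names differ in case between the two tables, they are converted to upper case before joining.

```r
library(dplyr)

# Enrolment per division, copied from the first table.
enrol_tbl <- tibble(
  name  = c("BEAN", "CANBERRA", "FENNER"),
  enrol = c(109557, 102196, 102576)
)

# A subset of the vote counts, copied from the second table.
votes_tbl <- tibble(
  division = c("Bean", "Bean", "Canberra", "Fenner"),
  party    = c("ALP", "LP", "GRN", "ALP"),
  votes    = c(35447, 29223, 20144, 38864)
)

# Keys must match exactly, so align the case first.
votes_upper <- votes_tbl |>
  mutate(division = toupper(division))

# Many vote rows map to one enrolment row, hence "many-to-one".
result <- votes_upper |>
  left_join(enrol_tbl,
            by = join_by(division == name),
            relationship = "many-to-one") |>
  mutate(perc = round(votes / enrol * 100, 1))

# Declaring a relationship that does not hold raises an error rather than
# silently duplicating rows: BEAN matches more than one vote row.
wrong <- tryCatch(
  inner_join(enrol_tbl, votes_upper,
             by = join_by(name == division),
             relationship = "one-to-one"),
  error = function(e) "error"
)
```

Stating the relationship is optional, but it turns an unexpected duplicate key into an immediate error instead of a quietly inflated row count.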

| id | name | date |
|---|---|---|
| 1 | Alice | 2021-11-01 |
| 1 | Alice | 2022-01-01 |
| 2 | Bob | 2022-02-15 |
| 3 | Charlie | 2022-03-10 |

| patient_id | name | start | end |
|---|---|---|---|
| 1 | Aspirin | 2021-12-01 | 2022-01-31 |
| 1 | Tylenol | 2022-02-01 | 2022-02-28 |
| 2 | Ibuprofen | 2022-02-20 | 2022-03-15 |
| 3 | Amoxicillin | 2022-01-15 | 2022-02-28 |
| 3 | Vitamin C | 2022-04-01 | 2022-04-30 |
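One plausible use of these two tables is to find which medication, if any, was active on each date in the first table. The sketch below assumes that reading. The object names visits and medications are hypothetical, and the medication name column is renamed to drug here only to avoid a clash with the patient name. It uses the between() helper inside join_by(), available in dplyr 1.1.0 or later.

```r
library(dplyr)

# First table: one row per patient and date.
visits <- tibble(
  id   = c(1, 1, 2, 3),
  name = c("Alice", "Alice", "Bob", "Charlie"),
  date = as.Date(c("2021-11-01", "2022-01-01", "2022-02-15", "2022-03-10"))
)

# Second table: one row per medication course (name column renamed to drug).
medications <- tibble(
  patient_id = c(1, 1, 2, 3, 3),
  drug  = c("Aspirin", "Tylenol", "Ibuprofen", "Amoxicillin", "Vitamin C"),
  start = as.Date(c("2021-12-01", "2022-02-01", "2022-02-20",
                    "2022-01-15", "2022-04-01")),
  end   = as.Date(c("2022-01-31", "2022-02-28", "2022-03-15",
                    "2022-02-28", "2022-04-30"))
)

# Match on patient, and require the date to fall within [start, end].
# x$ and y$ make explicit which table each column comes from.
active <- left_join(
  visits, medications,
  by = join_by(id == patient_id, between(x$date, y$start, y$end))
)
active
```

With a left join every row of the first table is kept, and rows whose date falls outside all of that patient's medication courses get missing values in the medication columns.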
Genomic data often involve intervals, and we may want to join two datasets based on whether the intervals overlap.

| id | chr | start | end |
|---|---|---|---|
| 1 | chr1 | 140 | 150 |
| 2 | chr2 | 210 | 240 |
| 3 | chr2 | 380 | 415 |
| 4 | chr1 | 230 | 280 |

| id | chr | start | end |
|---|---|---|---|
| 1 | chr1 | 100 | 150 |
| 2 | chr1 | 200 | 250 |
| 3 | chr2 | 300 | 399 |
| 4 | chr2 | 415 | 450 |
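An overlap join on the two interval tables above might look like the following sketch. The object names segments and reference are assumptions, since the tables are not named here, and the overlaps() helper inside join_by() requires dplyr 1.1.0 or later. By default the interval bounds are treated as closed, so intervals that only touch at an endpoint count as overlapping.

```r
library(dplyr)

# First interval table.
segments <- tibble(
  id    = 1:4,
  chr   = c("chr1", "chr2", "chr2", "chr1"),
  start = c(140, 210, 380, 230),
  end   = c(150, 240, 415, 280)
)

# Second interval table.
reference <- tibble(
  id    = 1:4,
  chr   = c("chr1", "chr1", "chr2", "chr2"),
  start = c(100, 200, 300, 415),
  end   = c(150, 250, 399, 450)
)

# Same chromosome, and [x$start, x$end] overlaps [y$start, y$end].
# Columns present in both tables get the suffixes .x and .y in the result.
overlapping <- inner_join(
  segments, reference,
  by = join_by(chr, overlaps(x$start, x$end, y$start, y$end))
)
overlapping
```

Segment 2 has no overlapping reference interval and is dropped by the inner join, while segment 3 overlaps two reference intervals and so appears twice.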







