STAT1003 – Statistical Techniques
Dr. Emi Tanaka
Australian National University
To effectively communicate, we must realize that we are all different in the way we perceive the world and use this understanding as a guide to our communication with others.
– Anthony Robbins
The two words ‘information’ and ‘communication’ are often used interchangeably, but they signify quite different things. Information is giving out; communication is getting through.
– Sydney J. Harris
No one ever made a decision because of a number. They need a story.
– Daniel Kahneman

The Code of Practice for Statistics ensures that statistics are not just numbers, but reliable tools for understanding the world, providing insight to inform understanding and shape action.
The Code Principles
Trustworthiness
Quality
Value
“Table 1” is most commonly the first table in a scientific or medical research paper, providing a summary of the study population’s baseline characteristics (demographics, health metrics) stratified by group.
| | Adelie (N=152) | Chinstrap (N=68) | Gentoo (N=124) | Overall (N=344) |
|---|---|---|---|---|
| Body mass (g) | | | | |
| Mean (SD) | 3700 (459) | 3730 (384) | 5080 (504) | 4200 (802) |
| Median [Min, Max] | 3700 [2850, 4780] | 3700 [2700, 4800] | 5000 [3950, 6300] | 4050 [2700, 6300] |
| Missing | 1 (0.7%) | 0 (0%) | 1 (0.8%) | 2 (0.6%) |
| Sex | | | | |
| Female | 73 (48.0%) | 34 (50.0%) | 58 (46.8%) | 165 (48.0%) |
| Male | 73 (48.0%) | 34 (50.0%) | 61 (49.2%) | 168 (48.8%) |
| Missing | 6 (3.9%) | 0 (0%) | 5 (4.0%) | 11 (3.2%) |
What statistics to present depends on what you want to convey and your audience.
There are two key purposes of the table:
A contingency table (also called a cross-tabulation) is a table that displays the frequency distribution of variables. It is often used to explore the relationship between two or more categorical variables.
| Island/Species | Adelie | Chinstrap | Gentoo | Total |
|---|---|---|---|---|
| Biscoe | 12.8% (44) | 0.0% (0) | 36.0% (124) | 48.8% (168) |
| Dream | 16.3% (56) | 19.8% (68) | 0.0% (0) | 36.0% (124) |
| Torgersen | 15.1% (52) | 0.0% (0) | 0.0% (0) | 15.1% (52) |
| Total | 44.2% (152) | 19.8% (68) | 36.0% (124) | 100.0% (344) |
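A table like the one above can be sketched in base R. This is a minimal illustration, not how the slide's table was produced: the counts are typed in from the table itself, and the object names (`counts`, `with_totals`, `labelled`) are made up for this example.

```r
# Counts copied from the table above (rows: island, columns: species).
counts <- matrix(
  c(44,  0, 124,
    56, 68,   0,
    52,  0,   0),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("Biscoe", "Dream", "Torgersen"),
    c("Adelie", "Chinstrap", "Gentoo")
  )
)

# Append a "Total" row and then a "Total" column.
with_totals <- rbind(counts, Total = colSums(counts))
with_totals <- cbind(with_totals, Total = rowSums(with_totals))

# Express every cell as a percentage of the grand total (344 penguins).
pct <- 100 * with_totals / sum(counts)

# Combine percentage and count into one label per cell, e.g. "12.8% (44)".
cells <- sprintf("%.1f%% (%d)", pct, as.integer(with_totals))
labelled <- matrix(cells, nrow = nrow(with_totals),
                   dimnames = dimnames(with_totals))
print(labelled, quote = FALSE)
```

Here each percentage is a share of the grand total. Dividing by row or column totals instead answers a different question, so state which denominator the table uses.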
Select an appropriate precision for your goal and audience.
Average Body Mass (g)

| Species | 5 d.p. | 2 d.p. | 0 d.p. |
|---|---|---|---|
| Adelie | 3,700.66225 | 3700.66 | 3701 |
| Gentoo | 5,076.01626 | 5076.02 | 5076 |
| Chinstrap | 3,733.08824 | 3733.09 | 3733 |
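One way to control precision in base R is `formatC()` with `format = "f"`. The sketch below reuses the averages from the table above; the variable names are hypothetical.

```r
# Averages copied from the 5 d.p. column of the table above.
avg_mass <- c(Adelie = 3700.66225, Gentoo = 5076.01626, Chinstrap = 3733.08824)

# format = "f" fixes the number of digits shown after the decimal point.
dp5 <- formatC(avg_mass, format = "f", digits = 5, big.mark = ",")
dp2 <- formatC(avg_mass, format = "f", digits = 2)
dp0 <- formatC(avg_mass, format = "f", digits = 0)

data.frame(dp5, dp2, dp0)
```

Formatting returns character strings, so keep the unrounded numbers for any further calculation and format only at the display step.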
Display trailing zeroes to match the selected precision of the column.
Trailing zeroes

| Yes | No |
|---|---|
| 0.233 | 0.233 |
| 0.320 | 0.32 |
| 0.400 | 0.4 |
| 0.343 | 0.343 |
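The two columns above can be reproduced in base R. This is a small sketch using the same four values; `dropped` and `padded` are names chosen for this example.

```r
x <- c(0.233, 0.32, 0.4, 0.343)

# as.character() drops trailing zeroes, so the precision varies by row.
dropped <- as.character(x)

# formatC() pads every value to three decimal places.
padded <- formatC(x, format = "f", digits = 3)

data.frame(Yes = padded, No = dropped)
```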
Change and display units as appropriate.
Average body mass

| Species | (g) | (mg) | (mg, with thousands separator) | (tonnes) | (lbs) |
|---|---|---|---|---|---|
| Adelie | 3700.662 | 3700662 | 3,700,662 | 3.70e-03 | 8.2 |
| Gentoo | 5076.016 | 5076016 | 5,076,016 | 5.08e-03 | 11.2 |
| Chinstrap | 3733.088 | 3733088 | 3,733,088 | 3.73e-03 | 8.2 |
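The conversions in the table above can be sketched in base R as follows. The conversion factors are standard definitions; the variable names are assumptions for this example.

```r
# Average body mass in grams, copied from the table above.
mass_g <- c(Adelie = 3700.662, Gentoo = 5076.016, Chinstrap = 3733.088)

mass_mg     <- mass_g * 1000       # 1 g = 1000 mg
mass_tonnes <- mass_g / 1e6        # 1 tonne = 1,000,000 g
mass_lbs    <- mass_g / 453.59237  # 1 lb = 453.59237 g by definition

# Pick a display format that suits each unit's magnitude.
mg_label     <- formatC(mass_mg, format = "d", big.mark = ",")  # thousands separator
tonnes_label <- formatC(mass_tonnes, format = "e", digits = 2)  # scientific notation
lbs_label    <- formatC(mass_lbs, format = "f", digits = 1)     # one decimal place

data.frame(mg = mg_label, tonnes = tonnes_label, lbs = lbs_label)
```

A unit that forces scientific notation (tonnes here) is usually a sign that another unit would read better for the audience.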
1000000 is harder to read than 1,000,000.
To make it easier to read and compare values across rows:
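| Species (left) | Species (center) | Species (right) | Body mass in lbs (left) | Body mass in lbs (center) | Body mass in lbs (right) | Body mass in lbs (variable-width font) | Body mass in lbs (fixed-width font) |
|:---|:---:|---:|:---|:---:|---:|---|---|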
| Adelie | Adelie | Adelie | 8.159 | 8.159 | 8.159 | 8.159 | 8.159 |
| Gentoo | Gentoo | Gentoo | 11.191 | 11.191 | 11.191 | 11.191 | 11.191 |
| Chinstrap | Chinstrap | Chinstrap | 8.230 | 8.230 | 8.230 | 8.230 | 8.230 |
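Alignment can also be controlled when formatting plain-text output in base R. The sketch below left-aligns the text and shows the numbers both right- and left-aligned at a fixed precision. Treating left-aligned text and right-aligned numbers as the preferred combination is a common convention assumed here, not something the table above states.

```r
species  <- c("Adelie", "Gentoo", "Chinstrap")
mass_lbs <- c(8.159, 11.191, 8.230)

# Left-align text by padding on the right to a common width.
left_text <- format(species, justify = "left")

# Right-align numbers at a fixed precision so the decimal points line up.
right_num <- formatC(mass_lbs, format = "f", digits = 3, width = 6)

# flag = "-" left-aligns instead; the decimal points no longer line up.
left_num <- formatC(mass_lbs, format = "f", digits = 3, width = 6, flag = "-")

cat(paste(left_text, right_num, left_num, sep = " | "), sep = "\n")
```

The line-up only shows in a fixed-width font, which is the point of the last two columns in the table above.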

| Species | Average body mass (g) | Average flipper length (mm) |
|---|---|---|
| Adelie | 3700.7 | 190.0 |
| Gentoo | 5076.0 | 217.2 |
| Chinstrap | 3733.1 | 195.8 |

| Species | Average body mass | Average flipper length |
|---|---|---|
| Adelie | 3700.7 g | 190.0 mm |
| Gentoo | 5076.0 g | 217.2 mm |
| Chinstrap | 3733.1 g | 195.8 mm |
There are many packages that make tables in R, including ones that wrangle the data for you to produce specialised table output, for example kableExtra, formattable, gt, DT, pander, modelsummary, gtsummary, and gtExtras.
Some packages are designed to work with specific output formats (e.g. kableExtra works well with HTML and PDF output, but not with Word output). So you may want to choose a package that is compatible with your intended output format.
You can read the documentation for each package to make the table you want.
Let us know in the discussion forum if you have a favorite package for making tables!
Use a table when you want to show exact values or when the accuracy of the values is important to convey.
You can combine plots with tables!
Data visualization is part art and part science. The challenge is to get the art right without getting the science wrong and vice versa.
– Claus O. Wilke, Fundamentals of Data Visualization

Non-exhaustive

Which category has the largest percentage?


Pie charts are a very bad way of displaying information. The eye is good at judging linear measures and bad at judging relative areas. A bar chart or dot chart is a preferable way of displaying this type of data.
Non-exhaustive
Of the 10 elementary perception tasks, Cleveland & McGill (1984) found the accuracy ranked as follows…
Rank 1: position along a common scale
Rank 2: position along non-aligned scales
Rank 3: length, direction, angle
Rank 4: area
Rank 5: volume, curvature
Rank 6: shading, color saturation
Viewers can notice certain features are absent or present without focussing their attention on particular regions.
| Birth place | Count | % |
|---|---|---|
| Australia | 17,020,422 | 66.9 |
| Not Stated | 1,358,658 | 5.3 |
| England | 927,490 | 3.6 |
| Other | 759,173 | 3.0 |
| India | 673,352 | 2.6 |
| China | 549,618 | 2.2 |
| New Zealand | 530,492 | 2.1 |
| Philippines | 293,892 | 1.2 |
| Vietnam | 257,997 | 1.0 |
| South Africa | 189,207 | 0.7 |
| Malaysia | 165,616 | 0.7 |
| Italy | 163,326 | 0.6 |
| Sri Lanka | 131,904 | 0.5 |
| Nepal | 122,506 | 0.5 |
| Scotland | 118,496 | 0.5 |
| Korea South | 102,092 | 0.4 |
| United States America | 101,309 | 0.4 |
| Germany | 101,255 | 0.4 |
| Hong Kong | 100,148 | 0.4 |
| Iraq | 92,922 | 0.4 |
| Greece | 92,314 | 0.4 |
| Pakistan | 89,633 | 0.4 |
| Lebanon | 87,340 | 0.3 |
| Indonesia | 87,075 | 0.3 |
| Thailand | 83,779 | 0.3 |
| Ireland | 80,927 | 0.3 |
| Iran | 70,899 | 0.3 |
| Fiji | 68,947 | 0.3 |
| Netherlands | 66,481 | 0.3 |
| Singapore | 61,056 | 0.2 |
| Afghanistan | 59,797 | 0.2 |
| Bangladesh | 51,491 | 0.2 |
| Canada | 50,223 | 0.2 |
| Taiwan | 49,511 | 0.2 |
| Brazil | 46,720 | 0.2 |
| Poland | 45,884 | 0.2 |
| Japan | 45,267 | 0.2 |
| Croatia | 43,302 | 0.2 |
| Egypt | 43,213 | 0.2 |
| North Macedonia | 41,786 | 0.2 |
| Zimbabwe | 39,714 | 0.2 |
| Myanmar | 39,171 | 0.2 |
| Cambodia | 39,043 | 0.2 |
| Turkey | 38,568 | 0.2 |
| France | 36,019 | 0.1 |
| Malta | 35,413 | 0.1 |
| Papua New Guinea | 29,984 | 0.1 |
| Chile | 29,860 | 0.1 |
| Wales | 29,250 | 0.1 |
| Samoa | 28,107 | 0.1 |
| Bosnia Herzegov | 26,171 | 0.1 |
| Mauritius | 25,981 | 0.1 |

Which birth place is the third largest among people in Australia?

Can you read the labels without tilting your head?

What’s the data story?

The text on each bar shows the percentage of the 25,422,788 Australian residents who were born in the corresponding country.
5.3% of Australian residents did not state their birth place.
The most common birth place is Australia, where 66.9% of Australian residents were born.
Story from The Guardian.
Data Story
India has overtaken China and New Zealand to become the third largest country of birth for Australian residents, 2021 census data has found.
| Birth place | Count | % | Census Year |
|---|---|---|---|
| England | 907,570 | 3.9 | 2016 |
| New Zealand | 518,466 | 2.2 | 2016 |
| China | 509,555 | 2.2 | 2016 |
| India | 455,389 | 1.9 | 2016 |
| Philippines | 232,386 | 1.0 | 2016 |
| England | 927,490 | 3.6 | 2021 |
| India | 673,352 | 2.6 | 2021 |
| China | 549,618 | 2.2 | 2021 |
| New Zealand | 530,492 | 2.1 | 2021 |
| Philippines | 293,892 | 1.2 | 2021 |
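One way to check a claim about overtaking is to rank the birth places within each census year. The base R sketch below uses the counts from the table above. The ranks are among these five listed birth places only, so they are offset from the overall ranking, which also includes Australia. All object names are hypothetical.

```r
# Counts copied from the table above, listed in the same country order per year.
census <- data.frame(
  birth_place = rep(c("England", "New Zealand", "China", "India", "Philippines"), 2),
  count = c(907570, 518466, 509555, 455389, 232386,   # 2016
            927490, 530492, 549618, 673352, 293892),  # 2021
  year = rep(c(2016, 2021), each = 5)
)

# Rank within each census year (1 = largest count among the listed places).
census$rank <- ave(-census$count, census$year, FUN = rank)

# Put the two years side by side to see which places moved.
rank_by_year <- xtabs(rank ~ birth_place + year, data = census)
rank_by_year
```

Among these five, India moves from rank 4 in 2016 to rank 2 in 2021, passing both China and New Zealand.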

Does this show that India overtook China and New Zealand?

Should we show percentage instead of counts?

The legend order and the line order are different…

Maybe we can put the labels directly in the plot?

Colorblindness affects roughly 1 in 12 men (about 8%).
In ggplot2, colors are set with the scale_color_* functions (for points, lines, etc.) or the scale_fill_* functions (for bars, areas, etc.).
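As one possible approach, the sketch below takes the Okabe-Ito palette that ships with base R (version 4.0.0 or later) and passes it to a ggplot2 manual color scale. The built-in `iris` data stands in for the course data, and the ggplot2 part runs only if the package is installed. Both choices are assumptions made for this example.

```r
# Okabe-Ito is a palette designed to stay distinguishable under common forms
# of color vision deficiency. Take the first four colors and drop black.
okabe_ito <- unname(palette.colors(n = 4, palette = "Okabe-Ito")[-1])

# unname() matters: a named vector passed as `values` is matched against the
# data's levels by name, and the palette's color names would not match.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(iris, aes(Sepal.Length, Petal.Length, color = Species)) +
    geom_point() +
    scale_color_manual(values = okabe_ito)  # scale_fill_manual() for bars, areas
  # print(p) draws the plot.
}
```

ggplot2 also provides ready-made scales such as `scale_color_viridis_d()` and `scale_fill_viridis_d()`, which use the viridis palettes.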