Why data visualisation?

  • “A picture is worth a thousand words”
  • Data visualisation can make large, complex data more accessible, understandable and usable.

Data visualization is part art and part science. The challenge is to get the art right without getting the science wrong and vice versa.

– Claus O. Wilke, Fundamentals of Data Visualization

  • Effective data visualisation means designing a plot that makes good use of the human visual system, so that viewers can more easily grasp the targeted information in the data.

Data Visualisation Catalogue 🛒 What plot type to use?

Non-exhaustive

Why is a 3D pie chart considered a “bad plot”?

What about 2D pie charts?

Which category has the largest percentage?

help("pie")

Pie charts are a very bad way of displaying information. The eye is good at judging linear measures and bad at judging relative areas. A bar chart or dot chart is a preferable way of displaying this type of data.

  • This advice draws on the empirical research of Cleveland & McGill (1984), among others.
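To see the difference between judging angles and judging lengths, the same values can be drawn both ways. This is a minimal sketch in base R; the five shares are made-up illustrative numbers, not data from this lecture.

```r
# Hypothetical shares for five categories (illustrative values only)
shares <- c(A = 21, B = 19, C = 20, D = 22, E = 18)

op <- par(mfrow = c(1, 2))           # two panels side by side
pie(shares, main = "Pie chart")      # values encoded as angles and areas
barplot(sort(shares, decreasing = TRUE),
        main = "Bar chart",
        ylab = "Share (%)")          # values encoded as positions and lengths
par(op)                              # restore the previous graphics settings

# Name of the largest category, for checking your visual guess
largest <- names(which.max(shares))
print(largest)
```

With values this close together, the largest slice is hard to pick out in the pie, while the sorted bars make the order easy to read.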

Elementary perceptual tasks

Non-exhaustive

Retrieving information from graphs

Of the 10 elementary perceptual tasks, Cleveland & McGill (1984) ranked the accuracy as follows…

[Figures: one example plot for each rank, from Rank 1 (most accurate) to Rank 6 (least accurate)]

Preattentive processing

Viewers can notice whether certain features are present or absent without focussing their attention on particular regions.

  • Which plot helps you to distinguish the data points?
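A comparison of this kind can be sketched with ggplot2. The data below are randomly generated for illustration, and the column names `x`, `y` and `group` are hypothetical. One point is marked as the target, and it is encoded first by color and then by shape.

```r
library(ggplot2)

set.seed(1)  # reproducible random positions

# Hypothetical data: 50 points, exactly one of which is the "target"
points <- data.frame(
  x = runif(50),
  y = runif(50),
  group = c("target", rep("other", 49))
)

# Target distinguished by color: it tends to pop out without searching
p_color <- ggplot(points, aes(x, y, color = group)) +
  geom_point(size = 3)

# Target distinguished by shape only: it usually takes longer to find
p_shape <- ggplot(points, aes(x, y, shape = group)) +
  geom_point(size = 3)

print(p_color)
print(p_shape)
```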



Gestalt principles

  • “Gestalt” is German for form or shape.
  • Gestalt principles are a set of laws that describe the natural compulsion to find order in disorder by perceiving a series of individual elements as a whole.
  • Law of proximity: Elements that are close together are perceived as being related.
  • Law of similarity: Elements that share similar characteristics are perceived as belonging together.
  • Law of closure: Elements that are arranged in a way that suggests a complete form are perceived as a whole.

Case study: Birth place from the 2021 Australian Census

Birth place                Count      %
Australia             17,020,422   66.9
Not Stated             1,358,658    5.3
England                  927,490    3.6
Other                    759,173    3.0
India                    673,352    2.6
China                    549,618    2.2
New Zealand              530,492    2.1
Philippines              293,892    1.2
Vietnam                  257,997    1.0
South Africa             189,207    0.7
Malaysia                 165,616    0.7
Italy                    163,326    0.6
Sri Lanka                131,904    0.5
Nepal                    122,506    0.5
Scotland                 118,496    0.5
Korea South              102,092    0.4
United States America    101,309    0.4
Germany                  101,255    0.4
Hong Kong                100,148    0.4
Iraq                      92,922    0.4
Greece                    92,314    0.4
Pakistan                  89,633    0.4
Lebanon                   87,340    0.3
Indonesia                 87,075    0.3
Thailand                  83,779    0.3
Ireland                   80,927    0.3
Iran                      70,899    0.3
Fiji                      68,947    0.3
Netherlands               66,481    0.3
Singapore                 61,056    0.2
Afghanistan               59,797    0.2
Bangladesh                51,491    0.2
Canada                    50,223    0.2
Taiwan                    49,511    0.2
Brazil                    46,720    0.2
Poland                    45,884    0.2
Japan                     45,267    0.2
Croatia                   43,302    0.2
Egypt                     43,213    0.2
North Macedonia           41,786    0.2
Zimbabwe                  39,714    0.2
Myanmar                   39,171    0.2
Cambodia                  39,043    0.2
Turkey                    38,568    0.2
France                    36,019    0.1
Malta                     35,413    0.1
Papua New Guinea          29,984    0.1
Chile                     29,860    0.1
Wales                     29,250    0.1
Samoa                     28,107    0.1
Bosnia Herzegov           26,171    0.1
Mauritius                 25,981    0.1

Plot #1

Which birth place is the third largest among people in Australia?

Plot #2

Can you read the labels without tilting your head?

Plot #3



What’s the data story?

Plot #4



  • The text on each bar shows the percentage of the 25,422,788 Australian residents who were born in the corresponding country.

  • 5.3% of Australian residents did not state their birth place.

  • The top birth place is Australia: 66.9% of Australian residents were born there.
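A plot along these lines could be built roughly as follows. This is a sketch rather than the code behind Plot #4: it uses only the first seven rows of the census table, the data frame name `birth` is hypothetical, and it assumes ggplot2 3.3.0 or later (for horizontal bars without `coord_flip()`).

```r
library(ggplot2)

# Counts copied from the first seven rows of the census table
birth <- data.frame(
  place = c("Australia", "Not Stated", "England", "Other",
            "India", "China", "New Zealand"),
  count = c(17020422, 1358658, 927490, 759173, 673352, 549618, 530492)
)
total <- 25422788  # total number of residents, as quoted in the text
birth$pct <- round(100 * birth$count / total, 1)

# reorder() sorts the bars by count; putting place on the y axis gives
# horizontal bars, so the labels can be read without tilting your head
p <- ggplot(birth, aes(x = count, y = reorder(place, count))) +
  geom_col() +
  geom_text(aes(label = paste0(pct, "%")), hjust = -0.1) +
  scale_x_continuous(labels = scales::comma,
                     expand = expansion(mult = c(0, 0.15))) +
  labs(x = "Number of residents", y = "Birth place")
print(p)
```

Sorting the bars and printing the percentage on each one means the reader does not have to estimate values from the axis.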

Story from The Guardian.

Another data story

Data Story

India has overtaken China and New Zealand to become the third largest country of birth for Australian residents, 2021 census data has found.

– The Guardian

Birth place     Count     %   Census year
England       907,570   3.9   2016
New Zealand   518,466   2.2   2016
China         509,555   2.2   2016
India         455,389   1.9   2016
Philippines   232,386   1.0   2016
England       927,490   3.6   2021
India         673,352   2.6   2021
China         549,618   2.2   2021
New Zealand   530,492   2.1   2021
Philippines   293,892   1.2   2021

Plot #5

Does this show that India overtook China and New Zealand?

Plot #6

Should we show percentage instead of counts?

Plot #7

The legend order and the line order are different…

Plot #8

Maybe we can put the labels directly in the plot?

Plot #9
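One way to label the lines directly is sketched below with ggplot2. It is not necessarily how Plot #9 was made. The percentages are copied from the 2016 and 2021 table, and the names `census` and `latest` are hypothetical.

```r
library(ggplot2)

# Percentages copied from the 2016 and 2021 census table
census <- data.frame(
  place = rep(c("England", "New Zealand", "China", "India", "Philippines"), 2),
  pct   = c(3.9, 2.2, 2.2, 1.9, 1.0,   # 2016
            3.6, 2.1, 2.2, 2.6, 1.2),  # 2021
  year  = rep(c(2016, 2021), each = 5)
)

# Rows used to place one label at the right-hand end of each line
latest <- subset(census, year == 2021)

p <- ggplot(census, aes(x = year, y = pct, color = place)) +
  geom_line() +
  geom_point() +
  geom_text(data = latest, aes(label = place), hjust = 0, nudge_x = 0.2) +
  scale_x_continuous(breaks = c(2016, 2021),
                     expand = expansion(mult = c(0.05, 0.4))) +
  labs(x = "Census year", y = "Residents (%)") +
  theme(legend.position = "none")  # direct labels make the legend redundant
print(p)

# Birth places in 2021, ordered from largest to smallest percentage
ranked_2021 <- as.character(latest$place[order(-latest$pct)])
print(ranked_2021)
```

Labels for lines that end close together (here China and New Zealand) may overlap. Adjusting them by hand or using a label-repelling package such as ggrepel is one possible remedy.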

Color palettes

Qualitative palettes

  • Designed for nominal variables (no particular ordering)

Sequential palettes

  • Designed for ordered categorical variables or numbers going from low to high (or vice versa)

Diverging palettes

  • Designed for ordered categorical variables or numbers going from low to high (or vice versa) with a neutral value in between
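The three palette types can be previewed in base R with `hcl.colors()`, available from R 3.6.0. The palette names "Dark 3", "Blues 3" and "Blue-Red 3" are just one choice from `hcl.pals()`; this is an illustrative sketch, not the palettes used in the slides.

```r
# One example of each palette type from base R's HCL palettes
qualitative <- hcl.colors(5, palette = "Dark 3")      # distinct hues, no implied order
sequential  <- hcl.colors(5, palette = "Blues 3")     # lightness changes steadily across the palette
diverging   <- hcl.colors(5, palette = "Blue-Red 3")  # two hues meeting at a neutral midpoint

# Show each palette as a strip of five color swatches
op <- par(mfrow = c(3, 1), mar = c(1, 1, 2, 1))
image(matrix(1:5), col = qualitative, axes = FALSE, main = "Qualitative")
image(matrix(1:5), col = sequential,  axes = FALSE, main = "Sequential")
image(matrix(1:5), col = diverging,   axes = FALSE, main = "Diverging")
par(op)  # restore the previous graphics settings
```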

Colorblindness

Colorblindness affects roughly 1 in 12 men (about 8%).
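One widely used qualitative palette intended to remain distinguishable under color vision deficiency is Okabe-Ito. It ships with base R from version 4.0.0 via `palette.colors()`. The snippet below only previews it, and the variable name `okabe_ito` is hypothetical.

```r
# palette.colors() is part of base R (grDevices) from version 4.0.0
okabe_ito <- palette.colors(n = 8, palette = "Okabe-Ito")
print(okabe_ito)

# Draw one bar per color to preview the palette
barplot(rep(1, 8), col = okabe_ito, axes = FALSE, main = "Okabe-Ito")
```

In ggplot2, these colors could be supplied with `scale_color_manual(values = unname(okabe_ito))`.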

Using different color palettes in ggplot2

  • To change the color scale, use the ggplot2 functions whose names start with
    • scale_color_ (for points, lines, etc.) or
    • scale_fill_ (for bars, areas, etc.).
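As a small illustration of the two families, the sketch below applies one `scale_color_` function and one `scale_fill_` function. It uses the `mpg` data set that ships with ggplot2 purely as convenient example data. The palette choices ("Dark2" and viridis) are arbitrary examples.

```r
library(ggplot2)

# scale_color_*: changes the color of points (mapped with color = ...)
p_points <- ggplot(mpg, aes(displ, hwy, color = drv)) +
  geom_point() +
  scale_color_brewer(palette = "Dark2")  # qualitative ColorBrewer palette

# scale_fill_*: changes the fill of bars (mapped with fill = ...)
p_bars <- ggplot(mpg, aes(class, fill = drv)) +
  geom_bar() +
  scale_fill_viridis_d()                 # discrete viridis palette

print(p_points)
print(p_bars)
```

The function has to match the aesthetic used in `aes()`: a `scale_fill_` function has no effect on a plot that only maps `color`, and vice versa.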

Summary

  • Data visualisation can make large, complex data more accessible, understandable and usable.
  • Effective data visualisation means designing a plot that makes good use of the human visual system, so that viewers can more easily grasp the targeted information in the data.
  • Empirical research has ranked the elementary perceptual tasks by how accurately viewers retrieve information from graphs, from most to least accurate: position on a common scale > position on non-aligned scales > length > direction > angle > area > volume > curvature > color shade/saturation > color hue.
  • Preattentive processing allows viewers to notice whether certain features are present or absent without focussing their attention on particular regions.
  • Gestalt principles are a set of laws that describe the natural compulsion to find order in disorder by perceiving a series of individual elements as a whole.
  • Choosing the right plot type and design can help to effectively communicate the data story.
  • There are three types of color palettes: qualitative (for nominal variables), sequential (for ordered variables), and diverging (for ordered variables with a neutral value in between).
  • When choosing colors for data visualisation, it is important to consider colorblindness.