


Statistical Methods for Omics Assisted Breeding

Data Visualisation in R


Emi Tanaka
emi.tanaka@sydney.edu.au
School of Mathematics and Statistics

2018/11/12

These slides may take a while to render properly. You can find the pdf here.

This work by Emi Tanaka is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

1 / 44

Data Visualisation in R

  • R has many contributed packages that extend the standard base installation.

  • Today we will learn about the ggplot2 and desplot R packages, with a light touch on plotly.

Why Data Visualisation in R?

  • You can make publication quality graphs.

  • The graphs are easily reproducible.

ggplot2 is quite stable now but R packages are contributed and can change in future iterations.

2 / 44

ggplot

3 / 44

ggplot2 R package

  • ggplot2 is a powerful data visualisation R package with a large community following; it is built on the layered grammar of graphics by Wickham (2008).
  • One reason it is powerful is that it is easy to extend, which has resulted in many extension packages.
  • ggplot2 provides two functions for making graphics: qplot and ggplot.
  • qplot is useful for making quick graphs (especially when the data are not in a data.frame), but ggplot is advisable for most occasions.
  • We will only cover ggplot.
  • To get started, load the package:
library(ggplot2) # or library(tidyverse)

Wickham (2008) Practical tools for exploring data and models. PhD Thesis.

4 / 44

Layered Grammar of Graphics

  • Every ggplot2 object has three key components:
    1. Data,
    2. A set of aesthetic mappings between variables in the data and visual properties (e.g. colour, size), and
    3. At least one layer describing how to render each observation; layers are usually created with a geom function.
str(iris)
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
ggplot(data=iris) +
aes(x=Sepal.Length,
y=Sepal.Width) +
geom_point()

5 / 44

Every layer has:

  1. geom - the geometric object used to display the data, and stat - the statistical transformation to apply to the data for this layer.
  2. data and mapping (aesthetics), which are usually inherited from the ggplot() object.
  3. position - position in the coordinate system.
p <- ggplot(iris, aes(Sepal.Length, Sepal.Width))
p + geom_point() # blank + geom layer

which is a short-hand for:

p + layer(geom="point", stat="identity",
position="identity")

Every ggplot object has:

  1. Data
  2. Aesthetic mapping
  3. Layer(s)

Purpose of a layer is to display:

  • the raw data,
  • a statistical summary, or
  • additional metadata such as context, annotations, and references.
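
As a minimal sketch of these three purposes, the plot below stacks one layer of each kind on the iris data. The annotation position and label text are illustrative choices, not taken from the original slides.

```r
library(ggplot2)

p_layers <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point() +                              # raw data
  geom_smooth(method = "lm", se = FALSE) +    # statistical summary
  annotate("text", x = 7, y = 4.3,            # metadata: a free-text note
           label = "n = 150 flowers")

length(p_layers$layers)  # number of layers added to the plot
```
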
6 / 44

Some geom objects

p <- ggplot(iris,
aes(Species, Sepal.Width))
class(p)
## [1] "gg" "ggplot"
p + geom_blank()

p + geom_point()

p + geom_boxplot()

p + geom_violin()

7 / 44

Drawing lines

p <- ggplot(iris, aes(Petal.Length, Petal.Width)) + geom_point(colour="gray")
p + geom_abline(intercept=-0.4,slope=0.4)

p + geom_smooth(method="lm")

p + geom_hline(yintercept=0)

p + geom_vline(xintercept=0)

8 / 44

Distribution by group

p <- ggplot(iris, aes(Petal.Width, fill=Species))
p + geom_dotplot()

p + geom_histogram()

p + geom_density()

p + geom_freqpoly(aes(color=Species))

9 / 44
geom             Description
geom_abline      Reference lines: horizontal, vertical, and diagonal
geom_bar         Bar charts
geom_bin2d       Heatmap of 2d bin counts
geom_blank       Draw nothing
geom_boxplot     A box and whiskers plot (in the style of Tukey)
geom_contour     2d contours of a 3d surface
geom_count       Count overlapping points
geom_density     Smoothed density estimates
geom_density_2d  Contours of a 2d density estimate
geom_dotplot     Dot plot
geom_errorbarh   Horizontal error bars
geom_hex         Hexagonal heatmap of 2d bin counts
geom_freqpoly    Histograms and frequency polygons
geom_jitter      Jittered points
geom_crossbar    Vertical intervals: lines, crossbars & errorbars
geom_map         Polygons from a reference map
geom_path        Connect observations
geom_point       Points
geom_polygon     Polygons
geom_qq_line     A quantile-quantile plot
geom_quantile    Quantile regression
geom_ribbon      Ribbons and area plots
geom_rug         Rug plots in the margins
geom_segment     Line segments and curves
geom_smooth      Smoothed conditional means
geom_spoke       Line segments parameterised by location, direction and distance
geom_label       Text
geom_raster      Rectangles
geom_violin      Violin plot

10 / 44

Statistical Transformation

head(iris[, c("Petal.Width", "Species")]) # raw data
## Petal.Width Species
## 1 0.2 setosa
## 2 0.2 setosa
## 3 0.2 setosa
## 4 0.2 setosa
## 5 0.2 setosa
## 6 0.4 setosa

stat_bin(bins=7, mapping=aes(Petal.Width, fill=Species))

Under the hood, the raw data are transformed into statistics, which are then passed on to the geom; for stat_bin the default is geom="bar".

## fill y count x xmin xmax density ncount
## 1 #619CFF 0 0 0.0 -0.2 0.2 0.0 0.0000000
## 2 #00BA38 0 0 0.0 -0.2 0.2 0.0 0.0000000
## 3 #F8766D 34 34 0.0 -0.2 0.2 1.7 1.0000000
## 4 #619CFF 0 0 0.4 0.2 0.6 0.0 0.0000000
## 5 #00BA38 0 0 0.4 0.2 0.6 0.0 0.0000000
## 6 #F8766D 16 16 0.4 0.2 0.6 0.8 0.4705882
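
One way to inspect these computed statistics yourself is layer_data() from ggplot2, which returns the data frame that a layer hands to its geom. This is a sketch; the object names are illustrative.

```r
library(ggplot2)

p_bin <- ggplot(iris) +
  stat_bin(bins = 7, mapping = aes(Petal.Width, fill = Species))

# Data frame of binned statistics for the first (and only) layer
binned <- layer_data(p_bin, 1)
head(binned[, c("fill", "count", "x", "xmin", "xmax")])

# Every flower is counted exactly once across all bins and species
sum(binned$count)
```
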

11 / 44

Using stat with different geom object

p <- ggplot(iris, aes(Petal.Width, fill=Species))
p + stat_bin()

p + stat_bin(geom="bar")

p + stat_bin(geom="point")

p + stat_bin(geom="line")
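
The relationship also runs the other way: each geom has a default stat. As a sketch, geom_histogram() pairs the bar geom with stat_bin(), so the two calls below should yield the same binned counts (bins = 10 is an arbitrary choice here).

```r
library(ggplot2)

p_base <- ggplot(iris, aes(Petal.Width, fill = Species))

via_stat <- layer_data(p_base + stat_bin(geom = "bar", bins = 10))
via_geom <- layer_data(p_base + geom_histogram(bins = 10))

# Same bins and counts, reached from the stat side or the geom side
all.equal(via_stat$count, via_geom$count)
```
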

12 / 44
stat                 Description
stat_count           Bar charts
stat_bin_2d          Heatmap of 2d bin counts
stat_boxplot         A box and whiskers plot (in the style of Tukey)
stat_contour         2d contours of a 3d surface
stat_sum             Count overlapping points
stat_density         Smoothed density estimates
stat_density_2d      Contours of a 2d density estimate
stat_bin_hex         Hexagonal heatmap of 2d bin counts
stat_bin             Histograms and frequency polygons
stat_qq_line         A quantile-quantile plot
stat_quantile        Quantile regression
stat_smooth          Smoothed conditional means
stat_spoke           Line segments parameterised by location, direction and distance
stat_ydensity        Violin plot
stat_sf              Visualise sf objects
stat_ecdf            Compute empirical cumulative distribution
stat_ellipse         Compute normal confidence ellipses
stat_function        Compute function for each x value
stat_identity        Leave data as is
stat_sf_coordinates  Extract coordinates from 'sf' objects
stat_summary_bin     Summarise y values at unique/binned x
stat_summary_2d      Bin and summarise in 2d (rectangle & hexagons)
stat_unique          Remove duplicates

13 / 44

Customisation

14 / 44

Changing Color

There are many color palettes available, e.g.

library(RColorBrewer)
ggplot(iris, aes(Petal.Width,
fill=Species)) +
geom_dotplot() +
scale_fill_brewer(palette="Set3")

15 / 44

Grey-scale

ggplot(iris, aes(Petal.Width,
fill=Species)) +
geom_dotplot() +
scale_fill_grey()

Manual scale

ggplot(iris, aes(Petal.Width,
fill=Species)) +
geom_dotplot() +
scale_fill_manual(
values=c("red","blue", "green"),
labels=c("setosa", "versicolor", "virginica"))
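
A small variation, sketched here with a histogram instead of a dot plot: naming the elements of values ties each colour to a level explicitly, so the assignment does not depend on the order of the factor levels. The object names are illustrative.

```r
library(ggplot2)

species_cols <- c(setosa = "red", versicolor = "blue", virginica = "green")

p_manual <- ggplot(iris, aes(Petal.Width, fill = Species)) +
  geom_histogram(bins = 10) +
  scale_fill_manual(values = species_cols)

# Fill colours actually used in the drawn layer
unique(layer_data(p_manual)$fill)
```
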

16 / 44

Color variable is factor

ggplot(iris, aes(Petal.Width,
Petal.Length,
color=Species)) +
geom_point(size=2) +
scale_color_brewer(palette="Set1")

Color variable is continuous

ggplot(iris, aes(Petal.Width,
Petal.Length,
color=Sepal.Length)) +
geom_point(size=2) +
scale_color_distiller(palette="YlGnBu")

17 / 44

Counts of yellow/white and sweet/starchy maize kernels

Here I am massaging the data to get the counts for the 8th maize ear by observer 1 (plant pathologist):

library(agridat); library(dplyr)
(maize <- pearl.kernels %>%
filter(ear=="Ear08" & obs=="Obs01") %>%
select(ys, yt, ws, wt) %>%
tidyr::gather("Type", "Count", ys:wt) %>%
mutate(Color=case_when(
Type %in% c("ys", "yt") ~ "Yellow",
Type %in% c("ws", "wt") ~ "White"
),Kernel=case_when(
Type %in% c("ys", "ws") ~ "Starchy",
Type %in% c("yt", "wt") ~ "Sweet")))
## Type Count Color Kernel
## 1 ys 352 Yellow Starchy
## 2 yt 102 Yellow Sweet
## 3 ws 52 White Starchy
## 4 wt 26 White Sweet

Pearl, Raymond (1911). The Personal Equation in Breeding Experiments Involving Certain Characters of Maize. Biological Bulletin 21, 339-366.

18 / 44

Example: Observer 1 for Maize Ear 8

p <- ggplot(maize, aes(Kernel, Count, fill=Color)) +
geom_bar(stat="identity")

19 / 44

Example: Observer 1 for Maize Ear 8

p <- ggplot(maize, aes(Kernel, Count, fill=Color)) +
  geom_bar(stat="identity", color="black") +
  scale_fill_manual(values=c("white", "yellow"),
                    labels=c("White", "Yellow")) +
  guides(fill="none")

19 / 44

Position for geom_bar

p + geom_bar()

 

p + geom_bar(position="stack")

p + geom_bar(position="dodge")

p + geom_bar(position="fill")

Every geom_bar call on this slide also includes the arguments stat="identity" and color="black"; they are omitted above for brevity.
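
To see what the position argument does to the numbers, not just the picture, the sketch below rebuilds the four maize counts by hand (copied from the table printed earlier in these slides) and compares the bar tops under "stack" and "fill". The object names are illustrative.

```r
library(ggplot2)

# Counts copied from the maize table shown earlier in these slides
maize_counts <- data.frame(
  Type   = c("ys", "yt", "ws", "wt"),
  Count  = c(352, 102, 52, 26),
  Color  = c("Yellow", "Yellow", "White", "White"),
  Kernel = c("Starchy", "Sweet", "Starchy", "Sweet")
)

p_pos <- ggplot(maize_counts, aes(Kernel, Count, fill = Color))

stacked <- layer_data(p_pos + geom_bar(stat = "identity", position = "stack"))
filled  <- layer_data(p_pos + geom_bar(stat = "identity", position = "fill"))

max(stacked$ymax)  # tallest stacked bar: 352 + 52 starchy kernels
max(filled$ymax)   # "fill" rescales every bar to a total height of 1
```
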

20 / 44

Coordinate system

p2 <- ggplot(maize, aes(1, Count, fill=Type)) +
  guides(fill="none") + theme_void()
p2 + geom_bar()

p2 + geom_bar() + coord_polar(theta="y")

p2 + geom_bar() + coord_flip()

p2 + geom_bar() +
  coord_polar(theta="y", direction=-1)

Every geom_bar call on this slide also includes the arguments stat="identity" and color="black"; they are omitted above for brevity.
21 / 44

Overplotting

g <- ggplot(pearl.kernels, aes(ear, ys, color=ear, size=3)) + xlab(NULL) +
  guides(color="none", size="none") + ylab("No. of Yellow\n Starchy Kernel")
g + geom_point()

g + geom_point(position="jitter")

g + geom_point(alpha=1 / 3)

g + geom_point(alpha=1 / 6)
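
position="jitter" adds random noise in both directions by default. As a sketch on the iris data (the width and seed are arbitrary choices), position_jitter() lets you restrict the noise to the x-axis and fix the seed so that the plot is reproducible:

```r
library(ggplot2)

p_jit <- ggplot(iris, aes(Species, Sepal.Width)) +
  geom_point(
    alpha = 1 / 3,
    # jitter horizontally only; the seed makes the noise repeatable
    position = position_jitter(width = 0.2, height = 0, seed = 1)
  )

d1 <- layer_data(p_jit)
d2 <- layer_data(p_jit)
identical(d1$x, d2$x)  # same seed, same jittered positions
```

Keeping height = 0 leaves the plotted y values equal to the observed ones, which matters when the y-axis shows counts or measurements that readers may read off the graph.
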

22 / 44

Massaging data to tidy form

maize2 <- pearl.kernels %>%
tidyr::gather("Type", "Count", ys:wt) %>%
mutate(Color=ifelse(substr(Type, 1, 1)=="y",
"Yellow", "White"),
Kernel=ifelse(substr(Type, 2, 2)=="s",
"Starchy", "Sweet"),
obs=factor(as.integer(substring(obs, 4, 5))))

head(maize2)
ear obs Type Count Color Kernel
1 Ear08 1 ys 352 Yellow Starchy
2 Ear08 2 ys 322 Yellow Starchy
3 Ear08 3 ys 298 Yellow Starchy
4 Ear08 4 ys 332 Yellow Starchy
5 Ear08 5 ys 305 Yellow Starchy
6 Ear08 6 ys 313 Yellow Starchy
head(pearl.kernels)
ear obs ys yt ws wt
1 Ear08 Obs01 352 102 52 26
2 Ear08 Obs02 322 49 82 79
3 Ear08 Obs03 298 75 108 51
4 Ear08 Obs04 332 101 71 28
5 Ear08 Obs05 305 101 86 40
6 Ear08 Obs06 313 100 90 29
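
In recent versions of tidyr, gather() is superseded by pivot_longer(). The sketch below applies it to the first two rows of pearl.kernels as printed above (typed in by hand so that the example is self-contained); note that the rows come out in a different order than with gather().

```r
library(tidyr)

kernels_wide <- data.frame(
  ear = c("Ear08", "Ear08"),
  obs = c("Obs01", "Obs02"),
  ys  = c(352, 322),
  yt  = c(102, 49),
  ws  = c(52, 82),
  wt  = c(26, 79)
)

kernels_long <- pivot_longer(
  kernels_wide,
  cols      = ys:wt,      # columns to stack
  names_to  = "Type",     # former column names go here
  values_to = "Count"     # cell values go here
)
kernels_long
```
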
23 / 44

Faceting

ggplot(maize2,
aes(obs, Count, fill=Type)) +
geom_bar(stat="identity") +
xlab("Observer") +
facet_wrap(~ear)

 

ear8 <- maize2 %>% filter(ear=="Ear08") %>%
ggplot(aes(obs, Count, fill=Type)) +
geom_bar(stat="identity", show.legend=F) +
labs(tag="(A)", title="Ear 8", x="Observer") +
facet_grid(Color ~ Kernel)

24 / 44

Patching Plots Together

library(patchwork)
ear8 + ear9 + ear10 + ear11 + plot_layout(ncol = 2)
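
The objects ear9, ear10 and ear11 are not defined on these slides; presumably they are built in the same way as ear8. A possible sketch, assuming the agridat, dplyr, tidyr and patchwork packages are installed, uses a small helper (plot_ear() is a hypothetical name) together with wrap_plots() from patchwork:

```r
library(agridat)
library(dplyr)
library(ggplot2)
library(patchwork)

kernels_tidy <- pearl.kernels %>%
  tidyr::gather("Type", "Count", ys:wt)

# Hypothetical helper: one bar chart of counts per observer for a given ear
plot_ear <- function(ear_id) {
  kernels_tidy %>%
    filter(ear == ear_id) %>%
    ggplot(aes(obs, Count, fill = Type)) +
    geom_bar(stat = "identity", show.legend = FALSE) +
    labs(title = ear_id, x = "Observer")
}

ear_ids   <- levels(factor(pearl.kernels$ear))
ear_plots <- lapply(ear_ids, plot_ear)

# Arrange all ears in a two-column grid
combined <- wrap_plots(ear_plots, ncol = 2)
```
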

25 / 44

Example: Barley in Norway

Average height each year for 15 genotypes of barley in Norway from 1974-1982.

str(aastveit.barley.height) # from agridat
## 'data.frame': 135 obs. of 3 variables:
## $ year : int 1974 1975 1976 1977 1978 1979 1980 1981 1982 1974 ...
## $ gen : Factor w/ 15 levels "G01","G02","G03",..: 1 1 1 1 1 1 1 1 1 2 ...
## $ height: num 81 67.3 71.5 64.3 55.8 84.9 86.2 88 72 72.3 ...

The covariate information is found in aastveit.barley.covs.
Use the commands below to find information about the data:

?aastveit.barley.height
?aastveit.barley.covs

Aastveit, A. H. and Martens, H. (1986). ANOVA interactions interpreted by partial least squares regression. Biometrics 42 829–844.

barley <- aastveit.barley.height %>%
left_join(aastveit.barley.covs,
by="year")

26 / 44

Subset data for labels

(maxh_df <- barley %>%
select(year, height, gen, T4) %>%
group_by(year) %>%
filter(height==max(height)) %>%
arrange(year))
## # A tibble: 9 x 4
## # Groups: year [9]
## year height gen T4
## <int> <dbl> <fct> <dbl>
## 1 1974 97 G15 12.1
## 2 1975 83.3 G15 16.0
## 3 1976 86.8 G15 17.4
## 4 1977 80 G10 17.4
## 5 1978 75.5 G11 13.9
## 6 1979 97 G14 14.0
## 7 1980 106. G11 13.9
## 8 1981 106. G07 13.6
## 9 1982 90.3 G14 11.6

Labels with geom_label

g <- ggplot(barley,
aes(T4, height)) +
geom_point(size=4, aes(color=factor(year))) +
guides(color="none") +
xlab("Avg temp (Celsius) in the 4-th period") +
ylab("Height")
g + geom_label(data=maxh_df, size=4,
aes(T4, height, label=year))

27 / 44

Nudge labels + geom_text

g +
geom_text(
data=maxh_df, size=4,
nudge_y=10,
aes(T4, height, label=year))

ggrepel

library(ggrepel)
g +
geom_label_repel(
data=maxh_df, size=4,
aes(T4, height, label=year))

28 / 44

Annotation Text

g + annotate("text",
x=12, y=100,
label="1974", size=12)

Annotation Rectangle

g + annotate("rect", xmin=15, xmax=18,
ymin=-Inf, ymax=Inf,
alpha=0.2, fill="red")

29 / 44

Changing Labels

g <- ggplot(vargas.wheat1.traits,
aes(NGS, yield)) +
geom_point(size=3) +
geom_point(aes(colour=gen)) +
geom_smooth(se=F, method="lm") +
facet_wrap(~year) +
labs(colour="Genotype") +
# changes the label name for color legend
labs(x="Number of grains per spikelet") +
# same as xlab(..)
labs(y="Yield (kg/ha)") +
# same as ylab(..)
labs(title="Durum Wheat at Ciudad Obregon, Mexico 1990-1995") +
# same as ggtitle(..)
labs(subtitle="Source: Vargas et al. (1998) Interpreting Genotype x Environment Interaction in Wheat by Partial Least Squares Regression.")
# same as ggtitle(subtitle=..)
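
The separate labs() calls can also be combined into a single call. A sketch on the iris data (the label text is made up for illustration):

```r
library(ggplot2)

p_labs <- ggplot(iris, aes(Sepal.Length, Sepal.Width, colour = Species)) +
  geom_point() +
  labs(
    x        = "Sepal length (cm)",
    y        = "Sepal width (cm)",
    colour   = "Iris species",       # legend title
    title    = "Sepal dimensions",
    subtitle = "Edgar Anderson's iris data"
  )

p_labs$labels$title
```
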

30 / 44

Theme - customise the look

g

31 / 44

Theme - customise the look

g +
theme(legend.position="bottom",
plot.title=element_text(face="bold", size=15),
plot.subtitle=element_text(face="italic", size=8),
panel.background=element_rect(fill="white"),
panel.border=element_rect(colour="grey20", fill=NA),
panel.grid=element_line(colour="grey92"),
panel.grid.minor=element_line(size=rel(0.5)),
strip.background=element_rect(fill="grey85", colour="grey20"),
legend.key=element_rect(fill="white"))

or use a pre-defined theme:

g +
theme_bw() +
theme(legend.position="bottom",
plot.title=element_text(face="bold", size=14),
plot.subtitle=element_text(face="italic", size=8))
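
If several plots should share the same look, the theme settings can be stored in an object and added to each plot. A sketch reusing the settings from above (my_theme and p_themed are hypothetical names, and the iris plot is only a stand-in):

```r
library(ggplot2)

my_theme <- theme_bw() +
  theme(legend.position = "bottom",
        plot.title      = element_text(face = "bold", size = 14),
        plot.subtitle   = element_text(face = "italic", size = 8))

p_themed <- ggplot(iris, aes(Sepal.Length, Sepal.Width, colour = Species)) +
  geom_point() +
  labs(title = "Sepal dimensions") +
  my_theme

# Alternatively, theme_set(my_theme) makes it the default for later plots
```
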

32 / 44

More Pre-Defined Themes

g + theme_gray()

g + theme_classic()

 

g + theme_minimal()

g + theme_dark()

33 / 44

Even More Pre-Defined Themes

library(ggthemes)
g + theme_stata() + scale_color_stata()

g + theme_solarized() + scale_color_solarized()

34 / 44

Cheatsheet

  • There are even more features in ggplot2 and many extension packages.
  • Go to RStudio > Help > Cheatsheets > Data Visualisation with ggplot2.
  • There is a useful addin for modifying the theme: ggThemeAssist.
35 / 44

References

What graph to choose for your data?

How to get help

There is a vibrant, friendly and (overly) generous community of R users, which is another reason why using R is great.

36 / 44

plotly

37 / 44

plotly - interactive graphics

library(plotly)
g <- ggplot(iris,
aes(Sepal.Length, Sepal.Width, color=Species)) +
geom_point()
ggplotly(g)

38 / 44

Simple animation with plotly

g <- ggplot(vargas.wheat1.traits,
aes(NGS, yield, frame=year)) +
geom_point(aes(color=gen)) +
geom_smooth(method="lm")
ggplotly(g)

39 / 44

desplot

40 / 44

Yield of Oats in a Split-Plot Experiment

str(yates.oats) # from agridat
## 'data.frame': 72 obs. of 8 variables:
## $ row : int 16 12 3 14 8 5 15 11 3 14 ...
## $ col : int 3 4 3 1 2 2 4 4 4 2 ...
## $ yield: int 80 60 89 117 64 70 82 102 82 114 ...
## $ nitro: num 0 0 0 0 0 0 0.2 0.2 0.2 0.2 ...
## $ gen : Factor w/ 3 levels "GoldenRain","Marvellous",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ block: Factor w/ 6 levels "B1","B2","B3",..: 1 2 3 4 5 6 1 2 3 4 ...
## $ grain: num 20 15 22.2 29.2 16 ...
## $ straw: num 28 25 40.5 28.8 32 ...

Yates, Frank (1935). Complex experiments. Journal of the Royal Statistical Society Suppl. 2, 181-247.

41 / 44

desplot - visualising designs for field trials

library(desplot)
desplot(block ~ row + col, yates.oats,
col=nitro, text=gen, cex=1, aspect=176/620,
out1=block, out2=gen,
out2.gpar=list(col = "gray50", lwd = 1, lty = 1))

42 / 44

ggplot version of desplot in beta now!

Enable by using argument gg=TRUE in desplot.

This means that you can use theme and other ggplot features easily.

43 / 44



Slides

These slides were made using the R package xaringan with the ninja-themes and are available at bit.ly/UT-WS-DataVis.

Your Turn

Download day1-session02-datavis-tutorial.Rmd here, open it in RStudio, click the "Run Document" button in the top toolbar, and work through the exercises. For workshop participants, contact Emi for the tutorials.

44 / 44
