R has many contributed packages that extend the standard base installation.
Today we will learn about the ggplot2 and desplot R packages, with a light touch on plotly.
Why R?
You can make publication quality graphs.
The graphs are easily reproducible.
ggplot2 is quite stable now, but R packages are contributed and can change in future iterations.
The ggplot2 R package
ggplot2 is a powerful data visualisation R package with a large community following. It is built on the layered grammar of graphics by Wickham (2008).
ggplot2 uses qplot or ggplot to make graphics.
qplot is useful for making quick graphs (especially when the data is not in a data.frame), but ggplot is advisable for most occasions.
library(ggplot2) # or library(tidyverse)
Wickham (2008) Practical tools for exploring data and models. PhD Thesis.
A ggplot2 object has three key components, shown here as the data, the aesthetic mapping (aes) and a geom layer:
str(iris)
## 'data.frame':	150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
ggplot(data=iris) + aes(x=Sepal.Length, y=Sepal.Width) + geom_point()
Each layer of a ggplot() object has:
geom - the geometric object used to display the data,
stat - the statistical transformation to use on the data for this layer, and
position - the position in the coordinate system.
p <- ggplot(iris, aes(Sepal.Length, Sepal.Width))
p + geom_point() # blank + geom layer
which is a short-hand for:
p + layer(geom="point", stat="identity", position="identity")
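As a minimal sketch of this short-hand (assuming the installed ggplot2 lets you read a plot's layers from its layers element, which is an implementation detail rather than something stated in these slides), the layer made by geom_point() can be inspected to confirm its geom, stat and position:

```r
library(ggplot2)

p <- ggplot(iris, aes(Sepal.Length, Sepal.Width))
q <- p + geom_point()

# The first (and only) layer that geom_point() added to the plot
lyr <- q$layers[[1]]

# Each component is a ggproto object; the first class name identifies it
class(lyr$geom)[1]
class(lyr$stat)[1]
class(lyr$position)[1]
```

The three class names correspond to the geom="point", stat="identity" and position="identity" arguments in the explicit layer() call.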
Every ggplot object has:
Purpose of a layer is to display:
geom objects
p <- ggplot(iris, aes(Species, Sepal.Width))
class(p)
## [1] "gg" "ggplot"
p + geom_blank()
p + geom_point()
p + geom_boxplot()
p + geom_violin()
p <- ggplot(iris, aes(Petal.Length, Petal.Width)) + geom_point(colour="gray")
p + geom_abline(intercept=-0.4,slope=0.4)
p + geom_smooth(method="lm")
p + geom_hline(yintercept=0)
p + geom_vline(xintercept=0)
p <- ggplot(iris, aes(Petal.Width, fill=Species))
p + geom_dotplot()
p + geom_histogram()
p + geom_density()
p + geom_freqpoly(aes(color=Species))
geom | Description |
---|---|
geom_abline | Reference lines: horizontal, vertical, and diagonal |
geom_bar | Bar charts |
geom_bin2d | Heatmap of 2d bin counts |
geom_blank | Draw nothing |
geom_boxplot | A box and whiskers plot (in the style of Tukey) |
geom_contour | 2d contours of a 3d surface |
geom_count | Count overlapping points |
geom_density | Smoothed density estimates |
geom_density_2d | Contours of a 2d density estimate |
geom_dotplot | Dot plot |
geom_errorbarh | Horizontal error bars |
geom_hex | Hexagonal heatmap of 2d bin counts |
geom_freqpoly | Histograms and frequency polygons |
geom_jitter | Jittered points |
geom_crossbar | Vertical intervals: lines, crossbars & errorbars |
geom_map | Polygons from a reference map |
geom_path | Connect observations |
geom_point | Points |
geom_polygon | Polygons |
geom_qq_line | A quantile-quantile plot |
geom_quantile | Quantile regression |
geom_ribbon | Ribbons and area plots |
geom_rug | Rug plots in the margins |
geom_segment | Line segments and curves |
geom_smooth | Smoothed conditional means |
geom_spoke | Line segments parameterised by location, direction and distance |
geom_label | Text |
geom_raster | Rectangles |
geom_violin | Violin plot |
head(iris[, c("Petal.Width", "Species")]) # raw data
##   Petal.Width Species
## 1         0.2  setosa
## 2         0.2  setosa
## 3         0.2  setosa
## 4         0.2  setosa
## 5         0.2  setosa
## 6         0.4  setosa
stat_bin(bins=7, mapping=aes(Petal.Width, fill=Species))
Under the hood, the raw data is transformed into statistics, which are then passed on to the geom; here geom="bar" is the default.
##      fill  y count   x xmin xmax density    ncount
## 1 #619CFF  0     0 0.0 -0.2  0.2     0.0 0.0000000
## 2 #00BA38  0     0 0.0 -0.2  0.2     0.0 0.0000000
## 3 #F8766D 34    34 0.0 -0.2  0.2     1.7 1.0000000
## 4 #619CFF  0     0 0.4  0.2  0.6     0.0 0.0000000
## 5 #00BA38  0     0 0.4  0.2  0.6     0.0 0.0000000
## 6 #F8766D 16    16 0.4  0.2  0.6     0.8 0.4705882
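The slides do not show how the computed statistics were extracted. One way to look at them, sketched here on the assumption that layer_data() was (or could have been) used, is to build the plot and pull out the data for its first layer:

```r
library(ggplot2)

# Same stat as above: bin Petal.Width into 7 bins, one group per Species
p <- ggplot(iris, aes(Petal.Width, fill = Species)) +
  stat_bin(bins = 7)

# layer_data() returns the data frame that is handed to the geom,
# i.e. the statistics computed by stat_bin() plus the mapped aesthetics
computed <- layer_data(p, 1)
head(computed[, c("fill", "count", "x", "xmin", "xmax", "density")])
```

Every observation falls into exactly one bin, so the count column sums to the number of rows in iris.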
stat with different geom objects
p <- ggplot(iris, aes(Petal.Width, fill=Species))
p + stat_bin()
p + stat_bin(geom="bar")
p + stat_bin(geom="point")
p + stat_bin(geom="line")
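Because the stat does the computation and the geom only draws the result, a geom with a matching default stat should give the same numbers. The sketch below checks this for geom_histogram(), whose default stat is binning; bins = 7 is an arbitrary value chosen here only to make both calls explicit:

```r
library(ggplot2)

p <- ggplot(iris, aes(Petal.Width, fill = Species))

# Statistics computed when starting from the stat ...
via_stat <- layer_data(p + stat_bin(bins = 7))
# ... and when starting from the geom that uses stat_bin by default
via_geom <- layer_data(p + geom_histogram(bins = 7))

# The bin counts agree, so the two calls describe the same layer
all.equal(via_stat$count, via_geom$count)
```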
stat | Description |
---|---|
stat_count | Bar charts |
stat_bin_2d | Heatmap of 2d bin counts |
stat_boxplot | A box and whiskers plot (in the style of Tukey) |
stat_contour | 2d contours of a 3d surface |
stat_sum | Count overlapping points |
stat_density | Smoothed density estimates |
stat_density_2d | Contours of a 2d density estimate |
stat_bin_hex | Hexagonal heatmap of 2d bin counts |
stat_bin | Histograms and frequency polygons |
stat_qq_line | A quantile-quantile plot |
stat_quantile | Quantile regression |
stat_smooth | Smoothed conditional means |
stat_spoke | Line segments parameterised by location, direction and distance |
stat_ydensity | Violin plot |
stat_sf | Visualise sf objects |
stat_ecdf | Compute empirical cumulative distribution |
stat_ellipse | Compute normal confidence ellipses |
stat_function | Compute function for each x value |
stat_identity | Leave data as is |
stat_sf_coordinates | Extract coordinates from 'sf' objects |
stat_summary_bin | Summarise y values at unique/binned x |
stat_summary_2d | Bin and summarise in 2d (rectangle & hexagons) |
stat_unique | Remove duplicates |
There are many color palettes available, e.g.
library(RColorBrewer)
ggplot(iris, aes(Petal.Width, fill=Species)) + geom_dotplot() + scale_fill_brewer(palette="Set3")
ggplot(iris, aes(Petal.Width, fill=Species)) + geom_dotplot() + scale_fill_grey()
ggplot(iris, aes(Petal.Width, fill=Species)) + geom_dotplot() + scale_fill_manual( values=c("red","blue", "green"), labels=c("setosa", "versicolor", "virginica"))
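With scale_fill_manual(), an unnamed values vector is matched to the factor levels by position. A small variation, sketched here with a hypothetical species_cols vector and a histogram in place of the dot plot, is to name the colours so that the mapping does not depend on the order of the levels:

```r
library(ggplot2)

# Hypothetical named palette: names must match the levels of Species
species_cols <- c(setosa = "red", versicolor = "blue", virginica = "green")

p_manual <- ggplot(iris, aes(Petal.Width, fill = Species)) +
  geom_histogram(bins = 20) +
  scale_fill_manual(values = species_cols)
```

Printing p_manual draws the plot; reordering species_cols leaves the colours attached to the same species.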
factor
ggplot(iris, aes(Petal.Width, Petal.Length, color=Species)) + geom_point(size=2) + scale_color_brewer(palette="Set1")
ggplot(iris, aes(Petal.Width, Petal.Length, color=Sepal.Length)) + geom_point(size=2) + scale_color_distiller(palette="YlGnBu")
Here I am massaging the data to get the counts for the 8th maize ear by observer 1 (plant pathologist):
library(agridat); library(dplyr)
(maize <- pearl.kernels %>%
  filter(ear=="Ear08" & obs=="Obs01") %>%
  select(ys, yt, ws, wt) %>%
  tidyr::gather("Type", "Count", ys:wt) %>%
  mutate(Color=case_when(
           Type %in% c("ys", "yt") ~ "Yellow",
           Type %in% c("ws", "wt") ~ "White"),
         Kernel=case_when(
           Type %in% c("ys", "ws") ~ "Starchy",
           Type %in% c("yt", "wt") ~ "Sweet")))
##   Type Count  Color  Kernel
## 1   ys   352 Yellow Starchy
## 2   yt   102 Yellow   Sweet
## 3   ws    52  White Starchy
## 4   wt    26  White   Sweet
Pearl, Raymond (1911). The Personal Equation In Breeding Experiments Involving Certain Characters of Maize. Biological Bulletin 21, 339-366.
p <- ggplot(maize, aes(Kernel, Count, fill=Color)) + geom_bar(stat="identity")
Image Source:
https://agrifarmingtips.com/maize-cultivation-process/
p <- ggplot(maize, aes(Kernel, Count, fill=Color)) + geom_bar(stat="identity", color="black") + scale_fill_manual(values=c("white", "yellow"), labels=c("White", "Yellow")) + guides(fill="none") # "none" hides the legend; FALSE is deprecated in recent ggplot2
geom_bar
p + geom_bar()
p + geom_bar(position="stack")
p + geom_bar(position="dodge")
p + geom_bar(position="fill")
All geom_bar calls above include the arguments stat="identity" and color="black".
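To make the position adjustments concrete, here is a minimal sketch of what position="fill" computes, using the four counts printed earlier (typed in by hand so the example does not need agridat). The Prop column is a hypothetical helper added only for illustration:

```r
library(ggplot2)

# Counts for Ear08, Obs01, copied from the output shown earlier
maize <- data.frame(
  Type   = c("ys", "yt", "ws", "wt"),
  Count  = c(352, 102, 52, 26),
  Color  = c("Yellow", "Yellow", "White", "White"),
  Kernel = c("Starchy", "Sweet", "Starchy", "Sweet")
)

# position="fill" rescales each bar so that its segments sum to 1:
# each count is divided by the total count of its Kernel group
maize$Prop <- maize$Count / ave(maize$Count, maize$Kernel, FUN = sum)

# The same proportions are drawn when the bars are filled
p_fill <- ggplot(maize, aes(Kernel, Count, fill = Color)) +
  geom_bar(stat = "identity", position = "fill", color = "black")
```

By contrast, position="stack" keeps the raw counts and position="dodge" places the segments side by side.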
p2 <- ggplot(maize, aes(1, Count, fill=Type)) + guides(fill="none") + theme_void()
p2 + geom_bar()
p2 + geom_bar() + coord_polar(theta="y")
p2 + geom_bar() + coord_flip()
p2 + geom_bar() + coord_polar(theta="y", direction=-1)
All geom_bar calls above include the arguments stat="identity" and color="black".
g <- ggplot(pearl.kernels, aes(ear, ys, color=ear, size=3)) + xlab(NULL) + guides(color="none", size="none") + ylab("No. of Yellow\n Starchy Kernel")
g + geom_point()
g + geom_point(position="jitter")
g + geom_point(alpha=1 / 3)
g + geom_point(alpha=1 / 6)
maize2 <- pearl.kernels %>%
  tidyr::gather("Type", "Count", ys:wt) %>%
  mutate(Color=ifelse(substr(Type, 1, 1)=="y", "Yellow", "White"),
         Kernel=ifelse(substr(Type, 2, 2)=="s", "Starchy", "Sweet"),
         obs=factor(as.integer(substring(obs, 4, 5))))
head(maize2)
    ear obs Type Count  Color  Kernel
1 Ear08   1   ys   352 Yellow Starchy
2 Ear08   2   ys   322 Yellow Starchy
3 Ear08   3   ys   298 Yellow Starchy
4 Ear08   4   ys   332 Yellow Starchy
5 Ear08   5   ys   305 Yellow Starchy
6 Ear08   6   ys   313 Yellow Starchy
head(pearl.kernels)
    ear   obs  ys  yt  ws wt
1 Ear08 Obs01 352 102  52 26
2 Ear08 Obs02 322  49  82 79
3 Ear08 Obs03 298  75 108 51
4 Ear08 Obs04 332 101  71 28
5 Ear08 Obs05 305 101  86 40
6 Ear08 Obs06 313 100  90 29
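The reshaping from wide to long above uses tidyr::gather(). As an alternative sketch, assuming tidyr 1.0.0 or later is installed, pivot_longer() does the same job; the two-row pk data frame below is typed in from the first two rows of pearl.kernels so the example runs without agridat:

```r
library(tidyr)

# First two rows of pearl.kernels, copied from the output above
pk <- data.frame(
  ear = c("Ear08", "Ear08"),
  obs = c("Obs01", "Obs02"),
  ys  = c(352, 322),
  yt  = c(102, 49),
  ws  = c(52, 82),
  wt  = c(26, 79)
)

# One row per (ear, obs, Type) combination, with the count in Count
long <- pivot_longer(pk, cols = ys:wt, names_to = "Type", values_to = "Count")
long
```

The rows come out ordered by observation rather than by Type as with gather(), which makes no difference to the plots that follow.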
ggplot(maize2, aes(obs, Count, fill=Type)) + geom_bar(stat="identity") + xlab("Observer") + facet_wrap(~ear)
ear8 <- maize2 %>%
  filter(ear=="Ear08") %>%
  ggplot(aes(obs, Count, fill=Type)) +
  geom_bar(stat="identity", show.legend=FALSE) +
  labs(tag="(A)", title="Ear 8", x="Observer") +
  facet_grid(Color ~ Kernel)
# ear9, ear10 and ear11 are not defined on the slides; presumably they are built like ear8 for the other ears
library(patchwork)
ear8 + ear9 + ear10 + ear11 + plot_layout(ncol = 2)
Average height each year for 15 genotypes of barley in Norway from 1974-1982.
str(aastveit.barley.height) # from agridat
## 'data.frame':	135 obs. of  3 variables:
##  $ year  : int  1974 1975 1976 1977 1978 1979 1980 1981 1982 1974 ...
##  $ gen   : Factor w/ 15 levels "G01","G02","G03",..: 1 1 1 1 1 1 1 1 1 2 ...
##  $ height: num  81 67.3 71.5 64.3 55.8 84.9 86.2 88 72 72.3 ...
The covariate information is found in aastveit.barley.covs.
Use the commands below to find information about the data:
?aastveit.barley.height
?aastveit.barley.covs
Aastveit, A. H. and Martens, H. (1986). ANOVA interactions interpreted by partial least squares regression. Biometrics 42 829–844.
barley <- aastveit.barley.height %>% left_join(aastveit.barley.covs, by="year")
(maxh_df <- barley %>%
  select(year, height, gen, T4) %>%
  group_by(year) %>%
  filter(height==max(height)) %>%
  arrange(year))
## # A tibble: 9 x 4
## # Groups:   year [9]
##    year height gen      T4
##   <int>  <dbl> <fct> <dbl>
## 1  1974   97   G15    12.1
## 2  1975   83.3 G15    16.0
## 3  1976   86.8 G15    17.4
## 4  1977   80   G10    17.4
## 5  1978   75.5 G11    13.9
## 6  1979   97   G14    14.0
## 7  1980  106.  G11    13.9
## 8  1981  106.  G07    13.6
## 9  1982   90.3 G14    11.6
geom_label
g <- ggplot(barley, aes(T4, height)) + geom_point(size=4, aes(color=factor(year))) + guides(color="none") + xlab("Avg temp (Celsius) in the 4-th period") + ylab("Height")
g + geom_label(data=maxh_df, size=4, aes(T4, height, label=year))
geom_text
g + geom_text(data=maxh_df, size=4, nudge_y=10, aes(T4, height, label=year))
ggrepel
library(ggrepel)
g + geom_label_repel(data=maxh_df, size=4, aes(T4, height, label=year))
g + annotate("text", x=12, y=100, label="1974", size=12)
g + annotate("rect", xmin=15, xmax=18, ymin=-Inf, ymax=Inf, alpha=0.2, fill="red")
g <- ggplot(vargas.wheat1.traits, aes(NGS, yield)) +
  geom_point(size=3) +
  geom_point(aes(colour=gen)) +
  geom_smooth(se=FALSE, method="lm") +
  facet_wrap(~year) +
  labs(colour="Genotype") + # changes the label name for color legend
  labs(x="Number of grains per spikelet") + # same as xlab(..)
  labs(y="Yield (kg/ha)") + # same as ylab(..)
  labs(title="Durum Wheat at Ciudad Obregon, Mexico 1990-1995") + # same as ggtitle(..)
  labs(subtitle="Source: Vargas et al. (1998) Interpreting Genotype x Environment Interaction in Wheat by Partial Least Squares Regression.") # same as ggtitle(subtitle=..)
g
g + theme(legend.position="bottom")
g + theme(legend.position="bottom", plot.title=element_text(face="bold", size=15))
g + theme(legend.position="bottom", plot.title=element_text(face="bold", size=15),plot.subtitle=element_text(face="italic", size=8))
g + theme(legend.position="bottom", plot.title=element_text(face="bold", size=15),plot.subtitle=element_text(face="italic", size=8),panel.background=element_rect(fill="white"))
g + theme(legend.position="bottom", plot.title=element_text(face="bold", size=15),plot.subtitle=element_text(face="italic", size=8),panel.background=element_rect(fill="white"),panel.border=element_rect(colour="grey20", fill=NA))
g + theme(legend.position="bottom", plot.title=element_text(face="bold", size=15),plot.subtitle=element_text(face="italic", size=8),panel.background=element_rect(fill="white"),panel.border=element_rect(colour="grey20", fill=NA),panel.grid=element_line(colour="grey92"))
g + theme(legend.position="bottom", plot.title=element_text(face="bold", size=15),plot.subtitle=element_text(face="italic", size=8),panel.background=element_rect(fill="white"),panel.border=element_rect(colour="grey20", fill=NA),panel.grid=element_line(colour="grey92"),panel.grid.minor=element_line(size=rel(0.5)))
g + theme(legend.position="bottom", plot.title=element_text(face="bold", size=15),plot.subtitle=element_text(face="italic", size=8),panel.background=element_rect(fill="white"),panel.border=element_rect(colour="grey20", fill=NA),panel.grid=element_line(colour="grey92"),panel.grid.minor=element_line(size=rel(0.5)),strip.background=element_rect(fill="grey85", colour="grey20"))
g + theme(legend.position="bottom", plot.title=element_text(face="bold", size=15),plot.subtitle=element_text(face="italic", size=8),panel.background=element_rect(fill="white"),panel.border=element_rect(colour="grey20", fill=NA),panel.grid=element_line(colour="grey92"),panel.grid.minor=element_line(size=rel(0.5)),strip.background=element_rect(fill="grey85", colour="grey20"),legend.key=element_rect(fill="white"))
or use a pre-defined theme:
g + theme_bw() +theme(legend.position="bottom", plot.title=element_text(face="bold", size=14),plot.subtitle=element_text(face="italic", size=8))
g + theme_gray()
g + theme_classic()
g + theme_minimal()
g + theme_dark()
library(ggthemes)
g + theme_stata() + scale_color_stata()
g + theme_solarized() + scale_color_solarized()
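When the same theme settings are needed for several plots, they can be stored once and reused. The sketch below is one way to do that; theme_workshop is a hypothetical name, and the iris plot stands in for g so the example runs without agridat:

```r
library(ggplot2)

# Hypothetical reusable theme: a pre-defined theme plus a few overrides
theme_workshop <- theme_bw() +
  theme(legend.position = "bottom",
        plot.title = element_text(face = "bold", size = 14),
        plot.subtitle = element_text(face = "italic", size = 8))

# Add the stored theme to any plot like a built-in theme
p_themed <- ggplot(iris, aes(Sepal.Length, Sepal.Width, colour = Species)) +
  geom_point() +
  labs(title = "Iris sepals", subtitle = "Stored theme applied") +
  theme_workshop
```

Calling theme_set(theme_workshop) would instead make it the default for every plot drawn afterwards in the session.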
ggplot2 and many extension packages, for example ggThemeAssist.
There is a vibrant, friendly and (overly) generous community of users of R (which is another reason that makes using R great).
plotly - interactive graphics
library(plotly)
g <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color=Species)) + geom_point()
ggplotly(g)
plotly
g <- ggplot(vargas.wheat1.traits, aes(NGS, yield, frame=year)) + geom_point(aes(color=gen)) + geom_smooth(method="lm")
ggplotly(g)
desplot
str(yates.oats) # from agridat
## 'data.frame':	72 obs. of  8 variables:
##  $ row  : int  16 12 3 14 8 5 15 11 3 14 ...
##  $ col  : int  3 4 3 1 2 2 4 4 4 2 ...
##  $ yield: int  80 60 89 117 64 70 82 102 82 114 ...
##  $ nitro: num  0 0 0 0 0 0 0.2 0.2 0.2 0.2 ...
##  $ gen  : Factor w/ 3 levels "GoldenRain","Marvellous",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ block: Factor w/ 6 levels "B1","B2","B3",..: 1 2 3 4 5 6 1 2 3 4 ...
##  $ grain: num  20 15 22.2 29.2 16 ...
##  $ straw: num  28 25 40.5 28.8 32 ...
Yates, Frank (1935). Complex experiments. Journal of the Royal Statistical Society Suppl 2, 181-247.
desplot - visualising designs for field trials
library(desplot)
desplot(block ~ row + col, yates.oats, col=nitro, text=gen, cex=1, aspect=176/620, out1=block, out2=gen, out2.gpar=list(col = "gray50", lwd = 1, lty = 1))
A ggplot version of desplot is in beta now! Enable it by using the argument gg=TRUE in desplot.
This means that you can use theme and other ggplot features easily.
These slides were made using the R package xaringan with the ninja-themes and are available at bit.ly/UT-WS-DataVis.
Download
For workshop participants, contact Emi for the tutorials.
Download day1-session02-datavis-tutorial.Rmd here, open it in RStudio, push the "Run Document" button on the top tab and work through the exercises.