Lecturer: Emi Tanaka
ETC5521.Clayton-x@monash.edu
Week 4 - Session 2
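library(tidyverse)  # provides read_tsv(), ggplot(), %>%, pivot_longer(), fct_reorder()

data(bostonc, package = "DAAG")
# the first 9 lines of bostonc are descriptive text; the table starts at line 10
# (readr >= 2.0 may need the literal lines wrapped in I(), i.e. read_tsv(I(...)))
df3 <- read_tsv(bostonc[10:length(bostonc)])
skimr::skim(df3)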
## ── Data Summary ────────────────────────
##                            Values
## Name                       df3
## Number of rows             506
## Number of columns          21
## _______________________
## Column type frequency:
##   character                2
##   numeric                  19
## ________________________
## Group variables            None
##
## ── Variable type: character ────────────────────────
##   skim_variable n_missing complete_rate min max empty n_unique whitespace
## 1 TOWN                  0             1   4  23     0       92          0
## 2 TRACT                 0             1   4   4     0      506          0
##
## ── Variable type: numeric ──────────────────────────
##    skim_variable n_missing complete_rate    mean      sd        p0      p25     p50     p75    p100 hist
##  1 OBS.                  0             1 254.    146.      1       127.     254.    380.    506     ▇▇▇▇▇
##  2 TOWN#                 0             1  47.5    27.6     0        26.2     42      78      91     ▅▆▅▃▇
##  3 LON                   0             1 -71.1     0.0754 -71.3    -71.1    -71.1   -71.0   -70.8   ▁▂▇▂▁
##  4 LAT                   0             1  42.2     0.0618  42.0     42.2     42.2    42.3    42.4   ▁▃▇▃▁
##  5 MEDV                  0             1  22.5     9.20    5        17.0     21.2    25      50     ▂▇▅▁▁
##  6 CMEDV                 0             1  22.5     9.18    5        17.0     21.2    25      50     ▂▇▅▁▁
##  7 CRIM                  0             1   3.61    8.60    0.00632   0.0820   0.257   3.68   89.0   ▇▁▁▁▁
##  8 ZN                    0             1  11.4    23.3     0         0        0      12.5   100     ▇▁▁▁▁
##  9 INDUS                 0             1  11.1     6.86    0.46      5.19     9.69   18.1    27.7   ▇▆▁▇▁
## 10 CHAS                  0             1   0.0692  0.254   0         0        0       0       1     ▇▁▁▁▁
## 11 NOX                   0             1   0.555   0.116   0.385     0.449    0.538   0.624   0.871 ▇▇▆▅▁
## 12 RM                    0             1   6.28    0.703   3.56      5.89     6.21    6.62    8.78  ▁▂▇▂▁
## 13 AGE                   0             1  68.6    28.1     2.9      45.0     77.5    94.1   100     ▂▂▂▃▇
## 14 DIS                   0             1   3.80    2.11    1.13      2.10     3.21    5.19   12.1   ▇▅▂▁▁
## 15 RAD                   0             1   9.55    8.71    1         4        5      24      24     ▇▂▁▁▃
## 16 TAX                   0             1 408.    169.    187       279      330     666     711     ▇▇▃▁▇
## 17 PTRATIO               0             1  18.5     2.16   12.6      17.4     19.0    20.2    22     ▁▃▅▅▇
## 18 B                     0             1 357.     91.3     0.32    375.     391.    396.    397.    ▁▁▁▁▇
## 19 LSTAT                 0             1  12.7     7.14    1.73      6.95    11.4    17.0    38.0   ▇▇▅▂▁
ggplot(df3, aes(MEDV)) + geom_histogram(binwidth = 1, color = "black", fill = "#008A25") + labs(x = "Median housing value (US$1000)", y = "Frequency")
Harrison, David, and Daniel L. Rubinfeld (1978) Hedonic Housing Prices and the Demand for Clean Air, Journal of Environmental Economics and Management 5 81-102. Original data.
Gilley, O.W. and R. Kelley Pace (1996) On the Harrison and Rubinfeld Data. Journal of Environmental Economics and Management 31 403-405. Provided corrections and examined censoring.
Maindonald, John H. and Braun, W. John (2020). DAAG: Data Analysis and Graphics Data and Functions. R package version 1.24
# boxplot
ggplot(df3, aes(MEDV, y = "")) +
  geom_boxplot(fill = "#008A25") +
  labs(x = "Median housing value (US$1000)", y = "") +
  theme(axis.line.y = element_blank())

# jittered dot plot
ggplot(df3, aes(MEDV, y = "")) +
  geom_jitter() +
  labs(x = "Median housing value (US$1000)", y = "") +
  theme(axis.line.y = element_blank())

# density plot with rug
ggplot(df3, aes(MEDV)) +
  geom_density() +
  geom_rug() +
  labs(x = "Median housing value (US$1000)", y = "") +
  theme(axis.line.y = element_blank())
# default bins are left-open, i.e. (a, b]
ggplot(df3, aes(PTRATIO)) +
  geom_histogram(fill = "#9651A0", color = "black", binwidth = 0.2) +
  labs(x = "Pupil-teacher ratio by town", y = "",
       title = "Bin width = 0.2, Left-open")

ggplot(df3, aes(PTRATIO)) +
  geom_histogram(fill = "#9651A0", color = "black", binwidth = 0.5) +
  labs(x = "Pupil-teacher ratio by town", y = "",
       title = "Bin width = 0.5, Left-open")

ggplot(df3, aes(PTRATIO)) +
  geom_histogram(fill = "#9651A0", color = "black", bins = 30) +
  labs(x = "Pupil-teacher ratio by town", y = "",
       title = "Bin number = 30, Left-open")

# closed = "left" gives right-open bins, i.e. [a, b)
ggplot(df3, aes(PTRATIO)) +
  geom_histogram(fill = "#9651A0", color = "black", binwidth = 0.2, closed = "left") +
  labs(x = "Pupil-teacher ratio by town", y = "",
       title = "Bin width = 0.2, Right-open")

ggplot(df3, aes(PTRATIO)) +
  geom_histogram(fill = "#9651A0", color = "black", binwidth = 0.5, closed = "left") +
  labs(x = "Pupil-teacher ratio by town", y = "",
       title = "Bin width = 0.5, Right-open")

ggplot(df3, aes(PTRATIO)) +
  geom_histogram(fill = "#9651A0", color = "black", bins = 30, closed = "left") +
  labs(x = "Pupil-teacher ratio by town", y = "",
       title = "Bin number = 30, Right-open")
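The histograms above differ in whether bins are left-open, (a, b], or right-open, [a, b). A minimal base R sketch of that difference follows; the vector x and the breaks are made up for illustration and are not the Boston data.

```r
# Toy data and breaks (illustrative assumptions, not the lecture data)
x <- c(1, 2, 2, 3, 3, 3, 4)
breaks <- 0:5

# Left-open bins (a, b]: a value sitting on a break is counted in the bin to its left
left_open <- table(cut(x, breaks = breaks, right = TRUE))

# Right-open bins [a, b): the same value is counted in the bin to its right
right_open <- table(cut(x, breaks = breaks, right = FALSE))

print(left_open)
print(right_open)
```

Every value here sits exactly on a break, so all counts shift by one bin between the two conventions. Variables recorded to limited precision can behave similarly when the bin edges coincide with the recorded values.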
df3long <- df3 %>%
  pivot_longer(MEDV:LSTAT, names_to = "var", values_to = "value") %>%
  filter(!var %in% c("CHAS", "B", "ZN"))
skimr::skim(df3long)
## ── Data Summary ────────────────────────
##                            Values
## Name                       df3long
## Number of rows             6072
## Number of columns          8
## _______________________
## Column type frequency:
##   character                3
##   numeric                  5
## ________________________
## Group variables            None
##
## ── Variable type: character ────────────────────────
##   skim_variable n_missing complete_rate min max empty n_unique whitespace
## 1 TOWN                  0             1   4  23     0       92          0
## 2 TRACT                 0             1   4   4     0      506          0
## 3 var                   0             1   2   7     0       12          0
##
## ── Variable type: numeric ──────────────────────────
##   skim_variable n_missing complete_rate  mean      sd        p0    p25    p50    p75   p100 hist
## 1 OBS.                  0             1 254.  146.      1       127    254.   380    506    ▇▇▇▇▇
## 2 TOWN#                 0             1  47.5  27.5     0        26     42     78     91    ▅▆▅▃▇
## 3 LON                   0             1 -71.1   0.0753 -71.3    -71.1  -71.1  -71.0  -70.8  ▁▂▇▂▁
## 4 LAT                   0             1  42.2   0.0617  42.0     42.2   42.2   42.3   42.4  ▁▃▇▃▁
## 5 value                 0             1  49.0 120.      0.00632   4     12.3   23.4  711    ▇▁▁▁▁
ggplot(df3long, aes(value)) +
  geom_histogram(color = "white") +
  facet_wrap(~var, scales = "free") +  # free x and y axes per variable
  labs(x = "", y = "") +
  theme(axis.text = element_text(size = 12))
load(here::here("data/Hidalgo1872.rda"))
skimr::skim(Hidalgo1872)
## ── Data Summary ────────────────────────
##                            Values
## Name                       Hidalgo1872
## Number of rows             485
## Number of columns          3
## _______________________
## Column type frequency:
##   numeric                  3
## ________________________
## Group variables            None
##
## ── Variable type: numeric ──────────────────────────
##   skim_variable n_missing complete_rate   mean      sd    p0    p25   p50   p75  p100 hist
## 1 thickness             0         1     0.0860 0.0150  0.06  0.075  0.08  0.098 0.131 ▅▇▃▂▁
## 2 thicknessA          195         0.598 0.0922 0.0162  0.068 0.0772 0.092 0.105 0.131 ▇▃▆▃▂
## 3 thicknessB          289         0.404 0.0768 0.00508 0.06  0.072  0.078 0.08  0.097 ▁▃▇▁▁
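ggplot(Hidalgo1872, aes(thickness)) +
  # after_stat() replaces the deprecated stat(); it puts the histogram on the density scale
  geom_histogram(binwidth = 0.001, aes(y = after_stat(density))) +
  labs(x = "Thickness (0.001 mm)", y = "Density") +
  # default bandwidth (linewidth replaces size for lines in ggplot2 >= 3.4)
  geom_density(color = "#E16A86", linewidth = 2) +
  # Sheather-Jones bandwidth
  geom_density(color = "#00AD9A", linewidth = 2, bw = "SJ")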
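The two density curves above differ only in the bandwidth: the first uses the default rule ("nrd0") and the second the Sheather-Jones selector (bw = "SJ"). A small sketch of the same comparison with base R's density() follows. It uses the built-in faithful data as a stand-in, which is an assumption made only because the Hidalgo1872 file is local to the course.

```r
# Stand-in data shipped with R (assumption: used for illustration only)
x <- faithful$eruptions

d_default <- density(x)             # default bandwidth rule, bw = "nrd0"
d_sj      <- density(x, bw = "SJ")  # Sheather-Jones bandwidth

# Compare the selected bandwidths; a smaller bandwidth gives a wigglier curve
print(c(nrd0 = d_default$bw, SJ = d_sj$bw))

# Overlay the two estimates
plot(d_default, main = "Default vs Sheather-Jones bandwidth")
lines(d_sj, lty = 2)
```

Inspecting the selected bandwidth alongside the plot is one way to judge whether a mode in the curve reflects the data or is an artefact of the amount of smoothing.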
data(movies, package = "ggplot2movies")
skimr::skim(movies)
## ── Data Summary ────────────────────────
##                            Values
## Name                       movies
## Number of rows             58788
## Number of columns          24
## _______________________
## Column type frequency:
##   character                2
##   numeric                  22
## ________________________
## Group variables            None
##
## ── Variable type: character ────────────────────────
##   skim_variable n_missing complete_rate min max empty n_unique whitespace
## 1 title                 0             1   1 121     0    56007          0
## 2 mpaa                  0             1   0   5 53864        5          0
##
## ── Variable type: numeric ──────────────────────────
##    skim_variable n_missing complete_rate         mean           sd   p0    p25     p50      p75      p100 hist
##  1 year                  0        1          1976.          23.7    1893   1958    1983     1997      2005 ▁▁▃▃▇
##  2 length                0        1            82.3         44.3       1     74      90      100      5220 ▇▁▁▁▁
##  3 budget            53573        0.0887 13412513.    23350085.        0 250000 3000000 15000000 200000000 ▇▁▁▁▁
##  4 rating                0        1             5.93         1.55      1      5       6.1      7        10 ▁▃▇▆▁
##  5 votes                 0        1           632.        3830.        5     11      30      112    157608 ▇▁▁▁▁
##  6 r1                    0        1             7.01        10.9       0      0       4.5      4.5     100 ▇▁▁▁▁
##  7 r2                    0        1             4.02         5.96      0      0       4.5      4.5      84.5 ▇▁▁▁▁
##  8 r3                    0        1             4.72         6.45      0      0       4.5      4.5      84.5 ▇▁▁▁▁
##  9 r4                    0        1             6.37         7.59      0      0       4.5      4.5     100 ▇▁▁▁▁
## 10 r5                    0        1             9.80         9.73      0      4.5     4.5     14.5     100 ▇▁▁▁▁
## 11 r6                    0        1            13.0         11.0       0      4.5    14.5     14.5      84.5 ▇▂▁▁▁
## 12 r7                    0        1            15.5         11.6       0      4.5    14.5     24.5     100 ▇▃▁▁▁
## 13 r8                    0        1            13.9         11.3       0      4.5    14.5     24.5     100 ▇▃▁▁▁
## 14 r9                    0        1             8.95         9.44      0      4.5     4.5     14.5     100 ▇▁▁▁▁
## 15 r10                   0        1            16.9         15.7       0      4.5    14.5     24.5     100 ▇▃▁▁▁
## 16 Action                0        1             0.0797       0.271     0      0       0        0         1 ▇▁▁▁▁
## 17 Animation             0        1             0.0628       0.243     0      0       0        0         1 ▇▁▁▁▁
## 18 Comedy                0        1             0.294        0.455     0      0       0        1         1 ▇▁▁▁▃
## 19 Drama                 0        1             0.371        0.483     0      0       0        1         1 ▇▁▁▁▅
## 20 Documentary           0        1             0.0591       0.236     0      0       0        0         1 ▇▁▁▁▁
## 21 Romance               0        1             0.0807       0.272     0      0       0        0         1 ▇▁▁▁▁
## 22 Short                 0        1             0.161        0.367     0      0       0        0         1 ▇▁▁▁▂
# raw scale
ggplot(movies, aes(length)) +
  geom_histogram(color = "white") +
  labs(x = "Length of movie (minutes)", y = "Frequency")

# log scale on the x-axis
ggplot(movies, aes(length)) +
  geom_histogram(color = "white") +
  labs(x = "Length of movie (minutes)", y = "Frequency") +
  scale_x_log10()

# movies shorter than 180 minutes, one bin per minute
movies %>%
  filter(length < 180) %>%
  ggplot(aes(length)) +
  geom_histogram(binwidth = 1, fill = "#795549", color = "black") +
  labs(x = "Length of movie (minutes)", y = "Frequency")
This lecture is based on Chapter 4 of Unwin (2015) Graphical Data Analysis with R.
Nominal where there is no intrinsic ordering to the categories
E.g. blue, grey, black, white.
Ordinal where there is a clear order to the categories.
E.g. Strongly disagree, disagree, neutral, agree, strongly agree.
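In R, both kinds of categorical variable can be stored as factors, and an ordinal variable can additionally be marked as ordered. A minimal sketch using the example categories above; the object names colour and opinion and the three responses are made up for illustration.

```r
# Nominal: the levels have no intrinsic order
colour <- factor(c("blue", "grey", "black", "white"))

# Ordinal: state the order of the levels explicitly and mark the factor as ordered
opinion <- factor(
  c("agree", "neutral", "strongly agree"),
  levels = c("strongly disagree", "disagree", "neutral", "agree", "strongly agree"),
  ordered = TRUE
)

is.ordered(colour)   # FALSE
is.ordered(opinion)  # TRUE

# Comparisons are only meaningful for ordered factors
opinion < "agree"    # FALSE TRUE FALSE
```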
data <- c(2, 2, 1, 1, 3, 3, 3, 1)
factor(data)
## [1] 2 2 1 1 3 3 3 1
## Levels: 1 2 3
factor(data, labels = c("I", "II", "III"))
## [1] II  II  I   I   III III III I
## Levels: I II III
# numerical input is ordered in increasing order
factor(c(1, 3, 10))
## [1] 1  3  10
## Levels: 1 3 10
# character input is ordered alphabetically
factor(c("1", "3", "10"))
## [1] 1  3  10
## Levels: 1 10 3
# you can specify the order of the levels explicitly
factor(c("1", "3", "10"), levels = c("1", "3", "10"))
## [1] 1  3  10
## Levels: 1 3 10
x <- factor(c(10, 20, 30, 10, 20))
mean(x)
## Warning in mean.default(x): argument is not numeric or logical: returning NA
## [1] NA
The as.numeric() function returns the internal integer codes of the factor, not the values shown as its levels:
mean(as.numeric(x))
## [1] 1.8
You probably want to use:
mean(as.numeric(levels(x)[x]))
## [1] 18
mean(as.numeric(as.character(x)))
## [1] 18
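Both conversions work because a factor stores integer codes that index into its character levels. A short sketch that spells out the intermediate steps; the names codes, labs and values are introduced here only for illustration.

```r
x <- factor(c(10, 20, 30, 10, 20))

codes <- as.integer(x)  # internal codes: 1 2 3 1 2
labs  <- levels(x)      # levels as character: "10" "20" "30"

# Indexing the levels by the codes recovers the original values as text
values <- as.numeric(labs[codes])

mean(codes)   # 1.8, the mean of the codes
mean(values)  # 18, the mean of the recorded values
```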
df1 <- read_csv(here::here("data/HouseFirstPrefsByCandidateByVoteTypeDownload-24310.csv"),
                skip = 1,
                col_types = cols(
                  .default = col_character(),
                  OrdinaryVotes = col_double(),
                  AbsentVotes = col_double(),
                  ProvisionalVotes = col_double(),
                  PrePollVotes = col_double(),
                  PostalVotes = col_double(),
                  TotalVotes = col_double(),
                  Swing = col_double()))

# percentage of first preference votes for the Greens in each division
tdf3 <- df1 %>%
  group_by(DivisionID) %>%
  summarise(DivisionNm = unique(DivisionNm),
            State = unique(StateAb),
            votes_GRN = TotalVotes[which(PartyAb == "GRN")],
            votes_total = sum(TotalVotes)) %>%
  mutate(perc_GRN = votes_GRN / votes_total * 100)
# states in default (alphabetical) order
tdf3 %>%
  ggplot(aes(perc_GRN, State)) +
  ggbeeswarm::geom_quasirandom(groupOnX = FALSE, varwidth = TRUE) +
  labs(x = "Percentage of first preference votes per division",
       y = "State",
       title = "First preference votes for the Greens party")

# states reordered by perc_GRN (fct_reorder uses the median by default)
tdf3 %>%
  mutate(State = fct_reorder(State, perc_GRN)) %>%
  ggplot(aes(perc_GRN, State)) +
  ggbeeswarm::geom_quasirandom(groupOnX = FALSE, varwidth = TRUE) +
  labs(x = "Percentage of first preference votes per division",
       y = "State",
       title = "First preference votes for the Greens party")
Coding tip: use the functions below to easily change the order of factor levels:
stats::reorder(factor, value, mean)
forcats::fct_reorder(factor, value, median)
forcats::fct_reorder2(factor, value1, value2, func)
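As a small illustration of the first function, the sketch below reorders the levels of a factor by the group means of another variable. The data frame df and its columns are made up for this example.

```r
# Toy data (hypothetical): three groups with means a = 6, b = 2, c = 10
df <- data.frame(
  group = factor(c("a", "a", "b", "b", "c", "c")),
  value = c(5, 7, 1, 3, 9, 11)
)

levels(df$group)  # default, alphabetical: "a" "b" "c"

# Reorder the levels by the mean of value within each group (increasing)
by_mean <- reorder(df$group, df$value, FUN = mean)
levels(by_mean)   # "b" "a" "c"

# With forcats loaded, a similar call would be
# forcats::fct_reorder(df$group, df$value, .fun = mean)
```

Only the level order changes; the observations themselves are untouched, so plots and tables pick up the new order without any change to the data.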
data("Fleiss93", package = "meta")
df6 <- Fleiss93 %>%
  mutate(total = n.e + n.c)
skimr::skim(df6)
## ── Data Summary ────────────────────────
##                            Values
## Name                       df6
## Number of rows             7
## Number of columns          7
## _______________________
## Column type frequency:
##   character                1
##   numeric                  6
## ________________________
## Group variables            None
##
## ── Variable type: character ────────────────────────
##   skim_variable n_missing complete_rate min max empty n_unique whitespace
## 1 study                 0             1   3   6     0        7          0
##
## ── Variable type: numeric ──────────────────────────
##   skim_variable n_missing complete_rate  mean    sd   p0    p25  p50   p75  p100 hist
## 1 year                  0             1 1979.    4.39 1974 1978. 1979 1980   1988 ▇▇▇▁▃
## 2 event.e               0             1  304   563.     32   46.5   85  174   1570 ▇▁▁▁▁
## 3 n.e                   0             1 2027. 2959.    317  686.   810 1550.  8587 ▇▂▁▁▂
## 4 event.c               0             1  327.  618.     38   58     67  172.  1720 ▇▁▁▁▁
## 5 n.c                   0             1 1974. 2993.    309  515    771 1554.  8600 ▇▂▁▁▂
## 6 total                 0             1 4000. 5950.    626 1228.  1529 3103  17187 ▇▂▁▁▂
# one bar per study, largest first
df6 %>%
  mutate(study = fct_reorder(study, desc(total))) %>%
  ggplot(aes(study, total)) +
  geom_col() +
  labs(x = "", y = "Frequency") +
  guides(x = guide_axis(n.dodge = 2))

# studies with fewer than 2000 patients lumped into "Other"
df6 %>%
  mutate(study = ifelse(total < 2000, "Other", study),
         study = fct_reorder(study, desc(total))) %>%
  ggplot(aes(study, total)) +
  geom_col() +
  labs(x = "", y = "Frequency")
Fleiss JL (1993): The statistical basis of meta-analysis. Statistical Methods in Medical Research 2 121–145
Balduzzi S, Rücker G, Schwarzer G (2019), How to perform a meta-analysis with R: a practical tutorial, Evidence-Based Mental Health.
Coding tip: the following family of functions helps to lump factor levels together:
forcats::fct_lump()
forcats::fct_lump_lowfreq()
forcats::fct_lump_min()
forcats::fct_lump_n()
forcats::fct_lump_prop()
# if conditioned on another variable
ifelse(cond, "Other", factor)
dplyr::case_when(cond1 ~ "level1", cond2 ~ "level2", TRUE ~ "Other")
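The ifelse() form is the one used in the plot above, where small studies are lumped according to a second variable. A minimal base R sketch of that pattern; the study names and totals here are made up and are not the Fleiss93 values.

```r
# Toy data (hypothetical study names and sizes)
study <- c("A", "B", "C", "D")
total <- c(5000, 300, 2500, 120)

# Lump every study below the threshold into a single "Other" level
lumped <- ifelse(total < 2000, "Other", study)
lumped  # "A" "Other" "C" "Other"

# Totals per lumped level
sums <- tapply(total, lumped, sum)
print(sums)
```

Note that ifelse() returns a character vector, so the level order has to be set again afterwards, for example with fct_reorder() as in the plot code above.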
Treatment | Frequency |
---|---|
CBT | 29 |
Cont | 26 |
FT | 17 |
Table or Plot?
Why not a point or line?
data(anorexia, package = "MASS")
# frequency table of the treatment groups
df9tab <- table(anorexia$Treat) %>%
  as.data.frame() %>%
  rename(Treatment = Var1, Frequency = Freq)
skimr::skim(anorexia)
## ── Data Summary ────────────────────────
##                            Values
## Name                       anorexia
## Number of rows             72
## Number of columns          3
## _______________________
## Column type frequency:
##   factor                   1
##   numeric                  2
## ________________________
## Group variables            None
##
## ── Variable type: factor ───────────────────────────
##   skim_variable n_missing complete_rate ordered n_unique top_counts
## 1 Treat                 0             1 FALSE          3 CBT: 29, Con: 26, FT: 17
##
## ── Variable type: numeric ──────────────────────────
##   skim_variable n_missing complete_rate mean   sd   p0  p25  p50  p75  p100 hist
## 1 Prewt                 0             1 82.4 5.18 70   79.6 82.3 86    94.9 ▂▅▇▆▁
## 2 Postwt                0             1 85.2 8.04 71.3 79.3 84.1 91.6 104.  ▆▇▅▆▂
ggplot(anorexia, aes(Treat)) + geom_bar() + labs(x = "", y = "Frequency")
ggplot(anorexia, aes(Treat)) + stat_count(geom = "point", size = 4) + stat_count(geom = "line", group = 1) + labs(y = "Frequency", x = "")
Hand, D. J., Daly, F., McConway, K., Lunn, D. and Ostrowski, E. eds (1993) A Handbook of Small Data Sets. Chapman & Hall, Data set 285 (p. 229)
Venables, W. N. & Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth Edition. Springer, New York. ISBN 0-387-95457-0
What do the graphs for each categorical variable tell us?
df9 <- as_tibble(Titanic)
skimr::skim(df9)
## ── Data Summary ────────────────────────
##                            Values
## Name                       df9   
## Number of rows             32    
## Number of columns          5     
## _______________________          
## Column type frequency:           
##   character                4     
##   numeric                  1     
## ________________________         
## Group variables            None  
## 
## ── Variable type: character ────────────────────────
##   skim_variable n_missing complete_rate   min   max empty n_unique whitespace
## 1 Class                 0             1     3     4     0        4          0
## 2 Sex                   0             1     4     6     0        2          0
## 3 Age                   0             1     5     5     0        2          0
## 4 Survived              0             1     2     3     0        2          0
## 
## ── Variable type: numeric ────────────────────────
##   skim_variable n_missing complete_rate  mean    sd    p0   p25   p50   p75  p100 hist 
## 1 n                     0             1  68.8  136.     0  0.75  13.5    77   670 ▇▁▁▁▁
df9 %>%
  group_by(Class) %>%
  summarise(total = sum(n)) %>%
  ggplot(aes(Class, total)) +
  geom_col(fill = "#ee64a4") +
  labs(x = "", y = "Frequency")
df9 %>%
  group_by(Sex) %>%
  summarise(total = sum(n)) %>%
  ggplot(aes(Sex, total)) +
  geom_col(fill = "#746FB2") +
  labs(x = "", y = "Frequency")
df9 %>%
  group_by(Age) %>%
  summarise(total = sum(n)) %>%
  ggplot(aes(Age, total)) +
  geom_col(fill = "#C8008F") +
  labs(x = "", y = "Frequency")
df9 %>%
  group_by(Survived) %>%
  summarise(total = sum(n)) %>%
  ggplot(aes(Survived, total)) +
  geom_col(fill = "#795549") +
  labs(x = "Survived", y = "Frequency")
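Each of the four bar charts above shows one marginal distribution of the Titanic table. As a cross-check on the bar heights, the same totals can be computed in base R without dplyr. This is only a sketch: the object names (class_totals and so on) are chosen here for illustration and do not appear in the slides.

```r
# Titanic ships with base R (package 'datasets') as a 4-way contingency table
# with dimensions Class, Sex, Age and Survived, in that order.
# margin.table() sums the counts over every dimension except the one named.
class_totals    <- margin.table(Titanic, 1)  # totals by Class
sex_totals      <- margin.table(Titanic, 2)  # totals by Sex
age_totals      <- margin.table(Titanic, 3)  # totals by Age
survived_totals <- margin.table(Titanic, 4)  # totals by Survived

# Each set of marginal totals must add up to the same grand total.
grand_total <- sum(Titanic)
class_totals
grand_total
```

Because every margin sums to the same grand total, the four charts describe the same people split four different ways; none of them says anything about how the variables relate to each other.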
British Board of Trade (1990), Report on the Loss of the ‘Titanic’ (S.S.). British Board of Trade Inquiry Report (reprint). Gloucester, UK: Allan Sutton Publishing
df9 <- tibble(party = c("Fine Gael", "Labour", "Fianna Fail", "Sinn Fein", "Indeps", "Green", "Undecided"),
              nos = c(181, 51, 171, 119, 91, 4, 368))
df9v2 <- df9 %>% filter(party != "Undecided")
df9
## # A tibble: 7 x 2
##   party         nos
##   <chr>       <dbl>
## 1 Fine Gael     181
## 2 Labour         51
## 3 Fianna Fail   171
## 4 Sinn Fein     119
## 5 Indeps         91
## 6 Green           4
## 7 Undecided     368
g9 <- df9 %>%
  ggplot(aes("", nos, fill = party)) +
  geom_col(color = "black") +
  labs(y = "", x = "") +
  coord_polar("y") +
  theme(axis.line = element_blank(),
        axis.line.y = element_blank(),
        axis.text = element_blank(),
        panel.grid.major = element_blank()) +
  scale_fill_discrete_qualitative(name = "Party")
g9
g9 %+% df9v2 +
  # below is needed to keep the same color scheme as before
  scale_fill_manual(values = qualitative_hcl(7)[1:6])
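The two pie charts differ only in whether the "Undecided" respondents are included, which changes the denominator for every slice. A minimal sketch of that arithmetic, using the counts from df9; the names share_all and share_decided are illustrative and not part of the original slides.

```r
# Counts copied from the df9 tibble.
party <- c("Fine Gael", "Labour", "Fianna Fail", "Sinn Fein",
           "Indeps", "Green", "Undecided")
nos <- c(181, 51, 171, 119, 91, 4, 368)

# Percentage share of each party among all respondents.
share_all <- setNames(round(100 * nos / sum(nos), 1), party)

# Percentage share among respondents who named a party.
decided <- party != "Undecided"
share_decided <- setNames(round(100 * nos[decided] / sum(nos[decided]), 1),
                          party[decided])

share_all
share_decided
```

Dropping a category makes every remaining slice larger even though no count has changed, so a reader comparing the two pies needs to know which denominator each one uses.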
df <- data.frame(var = c("A", "B", "C"), perc = c(40, 40, 20))
g <- ggplot(df, aes("", perc, fill = var)) + geom_col()
g
g + coord_polar("y")
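With coord_polar("y"), the height of each segment in the stacked bar is mapped to an angle, so the pie chart is the same stacked bar wrapped around a circle. A small sketch of the implied slice angles, assuming the full stack spans 360 degrees; the angle column is added here for illustration only.

```r
# Same toy data as the stacked bar.
df <- data.frame(var = c("A", "B", "C"), perc = c(40, 40, 20))

# Each slice gets a fraction of the full circle equal to its share of the total.
df$angle <- 360 * df$perc / sum(df$perc)
df
```

A and B receive equal angles, yet equal angles are generally harder to compare by eye than equal bar heights on a common baseline.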
dummy <- data.frame(var = LETTERS[1:20], n = round(rexp(20, 1/100)))
g <- ggplot(dummy, aes(var, n)) + geom_col(fill = "pink", color = "black")
g
g + coord_polar("x") + theme_void()
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.