STAT1003 – Statistical Techniques
Dr. Emi Tanaka
Australian National University

library(tidyverse) is a shorthand for loading the 9 core tidyverse packages: ggplot2, dplyr, tidyr, readr, tibble, purrr, stringr, forcats, lubridate.
dplyr is a core package in tidyverse.
dplyr provides a more consistent and user-friendly interface for data manipulation tasks.
The ideas behind dplyr (first on CRAN on 2014-01-29) were earlier implemented in plyr (first on CRAN on 2008-10-08).
dplyr has been evolving, but dplyr v1.0.0 was released on CRAN on 2020-05-29, suggesting that functions in dplyr are maturing and thus the user interface is unlikely to change.
Functions in tidyverse packages are often labelled with a lifecycle badge that indicates their stage of development.
Lionel Henry (2020). lifecycle: Manage the Life Cycle of your Package Functions. R package version 0.2.0.
dplyr “verbs”
The main functions (“verbs”) in dplyr include: arrange, select, mutate, rename, filter, summarise.
Functions in dplyr generally have the form: verb(data, args)
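As a minimal sketch of the verb(data, args) form, the code below applies two verbs to a small hand-made data frame. The object tips_small is a hypothetical stand-in that only mimics a few columns of the tips data; it is not the actual dataset.

```r
library(dplyr)

# Hypothetical stand-in for a few rows of the tips data (not the real dataset)
tips_small <- data.frame(
  total_bill = c(16.99, 10.34, 21.01, 23.68),
  tip        = c(1.01, 1.66, 3.50, 3.31),
  sex        = c("Female", "Male", "Male", "Male"),
  day        = c("Sun", "Sun", "Sat", "Sat")
)

# verb(data, args): the data frame comes first, then the arguments
arranged <- arrange(tips_small, tip)               # sort rows by tip, ascending
renamed  <- rename(tips_small, bill = total_bill)  # new_name = old_name
```

Each verb returns a new data frame and leaves tips_small unchanged.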
The first argument, data, is a data.frame object.
We use the tips data from the GGally package to illustrate some of the dplyr functions.
Many tidyverse packages use the pipe operator %>% from the magrittr package.
Base R (version 4.1.0 or later) has a native pipe operator |> which is similar to %>% but with some differences.
x |> f(y) is the same as f(x, y).
x |> f(y) |> g(z) is the same as g(f(x, y), z).
dplyr uses the tidyselect package for column selection.
More on tidyselect can be found in its documentation.
tibble objects
A tibble is a modern reimagining of the data.frame object.
What’s the difference between these?
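The pipe equivalences, a tidyselect helper and one data.frame versus tibble difference can be sketched as follows. This assumes R 4.1.0 or later for the native pipe, and again uses tips_small, a hypothetical stand-in for the tips data.

```r
library(dplyr)

# Hypothetical stand-in for a few rows of the tips data (not the real dataset)
tips_small <- data.frame(
  total_bill = c(16.99, 10.34, 21.01, 23.68),
  tip        = c(1.01, 1.66, 3.50, 3.31),
  sex        = c("Female", "Male", "Male", "Male"),
  day        = c("Sun", "Sun", "Sat", "Sat")
)

# x |> f(y) |> g(z) is the same as g(f(x, y), z)
piped  <- tips_small |> filter(tip > 3) |> arrange(total_bill)
nested <- arrange(filter(tips_small, tip > 3), total_bill)

# The magrittr pipe gives the same result in this simple case
piped_magrittr <- tips_small %>% filter(tip > 3) %>% arrange(total_bill)

# tidyselect helper: pick the columns whose names start with "t"
t_cols <- select(tips_small, starts_with("t"))

# One difference between a data.frame and a tibble:
# single-column [ , ] subsetting drops to a vector for a data.frame,
# but stays a tibble for a tibble
tips_tbl <- tibble::as_tibble(tips_small)
df_col   <- tips_small[, "tip"]
tbl_col  <- tips_tbl[, "tip"]
```

The piped version reads left to right in the order the steps are applied, which is the main reason pipes are used with dplyr verbs.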
What is happening here?
See also filter_out() for filtering out rows by condition for dplyr v1.2.0 or greater.
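For versions of dplyr before v1.2.0, a rough sketch of keeping and dropping rows by condition uses filter() with a negated condition. The data below is a hypothetical stand-in with no missing values; how rows with missing values are treated may differ between filter() with a negation and filter_out(), so that case is not shown here.

```r
library(dplyr)

# Hypothetical stand-in for a few rows of the tips data (not the real dataset)
tips_small <- data.frame(
  total_bill = c(16.99, 10.34, 21.01, 23.68),
  tip        = c(1.01, 1.66, 3.50, 3.31),
  day        = c("Sun", "Sun", "Sat", "Sat")
)

# Keep the rows where the condition is TRUE
kept <- filter(tips_small, tip > 3)

# Drop those rows instead by negating the condition
dropped <- filter(tips_small, !(tip > 3))
```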
Add or modify columns with mutate().
Conditional values can be computed with ifelse() or case_when().
Also see recode_values(), replace_values(), and replace_when() for dplyr v1.2.0 or greater.
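A minimal sketch of mutate() with ifelse() and case_when() follows. The data frame and the cut-offs for tip_size are made up for illustration only.

```r
library(dplyr)

# Hypothetical stand-in for a few rows of the tips data (not the real dataset)
tips_small <- data.frame(
  total_bill = c(16.99, 10.34, 21.01, 23.68),
  tip        = c(1.01, 1.66, 3.50, 3.31),
  day        = c("Sun", "Sun", "Sat", "Sat")
)

result <- tips_small |>
  mutate(
    # new column computed from existing columns
    tip_pct = 100 * tip / total_bill,
    # two-way condition
    is_sunday = ifelse(day == "Sun", "yes", "no"),
    # multi-way condition; the first matching case wins
    tip_size = case_when(
      tip < 2   ~ "small",
      tip < 3.4 ~ "medium",
      TRUE      ~ "large"
    )
  )
```

ifelse() suits a single yes/no condition, while case_when() avoids nesting several ifelse() calls when there are more than two outcomes.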
Use select() along with everything() to reorder columns.
Use relocate() to move columns around.
The summarise() function allows you to calculate statistical summaries by group.
The n() function is a special function that counts the number of observations in each group.
Use .by (or group_by()) for group operations.
The across() function allows you to apply functions to multiple columns.
dplyr provides a grammar of data manipulation.
The main functions in dplyr are verbs that take a data.frame (or tibble) as the first argument.
See also the dplyr cheatsheet.
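The column-reordering and grouped-summary functions mentioned above can be sketched as follows. This assumes dplyr v1.1.0 or later for the .by argument, and uses a hypothetical stand-in for the tips data.

```r
library(dplyr)

# Hypothetical stand-in for a few rows of the tips data (not the real dataset)
tips_small <- data.frame(
  total_bill = c(16.99, 10.34, 21.01, 23.68),
  tip        = c(1.01, 1.66, 3.50, 3.31),
  sex        = c("Female", "Male", "Male", "Male"),
  day        = c("Sun", "Sun", "Sat", "Sat")
)

# Put day first, then every remaining column in its current order
reordered <- select(tips_small, day, everything())

# Move sex in front of total_bill
moved <- relocate(tips_small, sex, .before = total_bill)

# Per-day count and per-day mean of several columns
by_day <- tips_small |>
  summarise(
    n = n(),                          # number of rows in each group
    across(c(total_bill, tip), mean), # same function applied to both columns
    .by = day
  )
```

With older versions of dplyr, the same grouped summary could be written by piping through group_by(day) before summarise().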

