Base R and Tidyverse syntax comparison for data wrangling
Author: Emi Tanaka
Affiliation: Australian National University
Published: January 12, 2025
Modified: January 12, 2025
1 Introduction
Tidyverse is a popular collection of R packages that provides a consistent interface for data science tasks, including data wrangling for tabular data, strings, factors, dates and times, and lists (Wickham et al. 2019). This living document compares Base R and Tidyverse syntax for data wrangling. It covers many, but not all, data wrangling tasks and will be continuously updated. The Base R approach is shown on the left and the Tidyverse approach is shown on the right.
For the following comparisons, each code cell is executed independently, so a previous code cell does not affect the code cells that follow (see Footnotes).
library(tidyverse)
Output
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
2 Tabular data
We make the syntax comparison using a subset of the penguins data from the palmerpenguins package (Horst, Hill, and Gorman 2020), named penguins_small, as shown in Table 1. The penguins_small data is also converted to a tibble and a data frame, named tbl and df, respectively.
As an aside, we also show a comparison of the syntax for similar operations using the data.table package (Barrett et al. 2024). The penguins_small data is converted to a data.table object named dt.
Warning in `[.data.table`(dt, sex == "female", `:=`(mass, mass/2)): 2437.500000
(type 'double') at RHS position 8 out-of-range(NA) or truncated (precision
lost) when assigning to type 'integer' (column 5 named 'mass')
2.15 Update specific cells in a column with multiple cases
Create a new column size based on the following conditions: if sex is female and mass is greater than 4000, then size is large; if sex is male and mass is greater than 4100, then size is large; otherwise, size is small.
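The rule above can be sketched in base R with a single logical vector and ifelse(). This is a minimal sketch rather than the document's own solution: df is rebuilt here from the rows printed in the join outputs of this document, and the helper name is_large is an assumption made for illustration.

```r
# Rebuild df from the rows printed in the join outputs of this document.
df <- data.frame(
  species = rep(c("Adelie", "Gentoo"), times = c(6, 4)),
  sex = c("male", rep("female", 9)),
  bill_length_mm = c(39.1, 39.5, 35.7, 35.7, 36.2, 36.0, 48.2, 43.2, 47.5, 47.2),
  bill_depth_mm = c(18.7, 16.7, 16.9, 18.0, 17.2, 17.1, 14.3, 14.5, 14.0, 13.7),
  mass = c(3750L, 3250L, 3150L, 3550L, 3150L, 3700L, 4600L, 4450L, 4875L, 4925L)
)

# TRUE where either sex-specific threshold is exceeded
# (is_large is a hypothetical helper name).
is_large <- (df$sex == "female" & df$mass > 4000) |
  (df$sex == "male" & df$mass > 4100)

# Every row that matches neither condition falls through to "small".
df$size <- ifelse(is_large, "large", "small")
table(df$size)
```

With dplyr, the same rule is commonly written with case_when() inside mutate().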
We also create a second data table dt2 that contains the sex, species, and name columns.
dt2 <- as.data.table(df2)
2.16.1 Left join
Left join df and df2 by species and sex.
merge(df, df2, by = c("species", "sex"), all.x = TRUE)
Output
species sex bill_length_mm bill_depth_mm mass name
1 Adelie female 39.5 16.7 3250 A
2 Adelie female 35.7 16.9 3150 A
3 Adelie female 35.7 18.0 3550 A
4 Adelie female 36.2 17.2 3150 A
5 Adelie female 36.0 17.1 3700 A
6 Adelie male 39.1 18.7 3750 B
7 Gentoo female 47.5 14.0 4875 <NA>
8 Gentoo female 47.2 13.7 4925 <NA>
9 Gentoo female 48.2 14.3 4600 <NA>
10 Gentoo female 43.2 14.5 4450 <NA>
left_join(df, df2, join_by(species, sex))
Output
species sex bill_length_mm bill_depth_mm mass name
1 Adelie male 39.1 18.7 3750 B
2 Adelie female 39.5 16.7 3250 A
3 Adelie female 35.7 16.9 3150 A
4 Adelie female 35.7 18.0 3550 A
5 Adelie female 36.2 17.2 3150 A
6 Adelie female 36.0 17.1 3700 A
7 Gentoo female 48.2 14.3 4600 <NA>
8 Gentoo female 43.2 14.5 4450 <NA>
9 Gentoo female 47.5 14.0 4875 <NA>
10 Gentoo female 47.2 13.7 4925 <NA>
merge(dt, dt2, by = c("species", "sex"), all.x = TRUE)
Output
Key: <species, sex>
species sex bill_length_mm bill_depth_mm mass name
<char> <char> <num> <num> <int> <char>
1: Adelie female 39.5 16.7 3250 A
2: Adelie female 35.7 16.9 3150 A
3: Adelie female 35.7 18.0 3550 A
4: Adelie female 36.2 17.2 3150 A
5: Adelie female 36.0 17.1 3700 A
6: Adelie male 39.1 18.7 3750 B
7: Gentoo female 48.2 14.3 4600 <NA>
8: Gentoo female 43.2 14.5 4450 <NA>
9: Gentoo female 47.5 14.0 4875 <NA>
10: Gentoo female 47.2 13.7 4925 <NA>
2.16.2 Right join
Right join df and df2 by species and sex.
merge(df, df2, by = c("species", "sex"), all.y = TRUE)
Output
species sex bill_length_mm bill_depth_mm mass name
1 Adelie female 39.5 16.7 3250 A
2 Adelie female 35.7 16.9 3150 A
3 Adelie female 35.7 18.0 3550 A
4 Adelie female 36.2 17.2 3150 A
5 Adelie female 36.0 17.1 3700 A
6 Adelie male 39.1 18.7 3750 B
7 Chinstrap female NA NA NA C
8 Chinstrap male NA NA NA D
right_join(df, df2, join_by(species, sex))
Output
species sex bill_length_mm bill_depth_mm mass name
1 Adelie male 39.1 18.7 3750 B
2 Adelie female 39.5 16.7 3250 A
3 Adelie female 35.7 16.9 3150 A
4 Adelie female 35.7 18.0 3550 A
5 Adelie female 36.2 17.2 3150 A
6 Adelie female 36.0 17.1 3700 A
7 Chinstrap female NA NA NA C
8 Chinstrap male NA NA NA D
merge(dt, dt2, by = c("species", "sex"), all.y = TRUE)
Output
Key: <species, sex>
species sex bill_length_mm bill_depth_mm mass name
<char> <char> <num> <num> <int> <char>
1: Adelie female 39.5 16.7 3250 A
2: Adelie female 35.7 16.9 3150 A
3: Adelie female 35.7 18.0 3550 A
4: Adelie female 36.2 17.2 3150 A
5: Adelie female 36.0 17.1 3700 A
6: Adelie male 39.1 18.7 3750 B
7: Chinstrap female NA NA NA C
8: Chinstrap male NA NA NA D
2.16.3 Full join
Full join df and df2 by species and sex.
merge(df, df2, by = c("species", "sex"), all = TRUE)
Output
species sex bill_length_mm bill_depth_mm mass name
1 Adelie female 39.5 16.7 3250 A
2 Adelie female 35.7 16.9 3150 A
3 Adelie female 35.7 18.0 3550 A
4 Adelie female 36.2 17.2 3150 A
5 Adelie female 36.0 17.1 3700 A
6 Adelie male 39.1 18.7 3750 B
7 Chinstrap female NA NA NA C
8 Chinstrap male NA NA NA D
9 Gentoo female 47.5 14.0 4875 <NA>
10 Gentoo female 47.2 13.7 4925 <NA>
11 Gentoo female 48.2 14.3 4600 <NA>
12 Gentoo female 43.2 14.5 4450 <NA>
full_join(df, df2, join_by(species, sex))
Output
species sex bill_length_mm bill_depth_mm mass name
1 Adelie male 39.1 18.7 3750 B
2 Adelie female 39.5 16.7 3250 A
3 Adelie female 35.7 16.9 3150 A
4 Adelie female 35.7 18.0 3550 A
5 Adelie female 36.2 17.2 3150 A
6 Adelie female 36.0 17.1 3700 A
7 Gentoo female 48.2 14.3 4600 <NA>
8 Gentoo female 43.2 14.5 4450 <NA>
9 Gentoo female 47.5 14.0 4875 <NA>
10 Gentoo female 47.2 13.7 4925 <NA>
11 Chinstrap female NA NA NA C
12 Chinstrap male NA NA NA D
merge(dt, dt2, by = c("species", "sex"), all = TRUE)
Output
Key: <species, sex>
species sex bill_length_mm bill_depth_mm mass name
<char> <char> <num> <num> <int> <char>
1: Adelie female 39.5 16.7 3250 A
2: Adelie female 35.7 16.9 3150 A
3: Adelie female 35.7 18.0 3550 A
4: Adelie female 36.2 17.2 3150 A
5: Adelie female 36.0 17.1 3700 A
6: Adelie male 39.1 18.7 3750 B
7: Chinstrap female NA NA NA C
8: Chinstrap male NA NA NA D
9: Gentoo female 48.2 14.3 4600 <NA>
10: Gentoo female 43.2 14.5 4450 <NA>
11: Gentoo female 47.5 14.0 4875 <NA>
12: Gentoo female 47.2 13.7 4925 <NA>
2.16.4 Inner join
Inner join df and df2 by species and sex.
merge(df, df2, by = c("species", "sex"))
Output
species sex bill_length_mm bill_depth_mm mass name
1 Adelie female 39.5 16.7 3250 A
2 Adelie female 35.7 16.9 3150 A
3 Adelie female 35.7 18.0 3550 A
4 Adelie female 36.2 17.2 3150 A
5 Adelie female 36.0 17.1 3700 A
6 Adelie male 39.1 18.7 3750 B
inner_join(df, df2, join_by(species, sex))
Output
species sex bill_length_mm bill_depth_mm mass name
1 Adelie male 39.1 18.7 3750 B
2 Adelie female 39.5 16.7 3250 A
3 Adelie female 35.7 16.9 3150 A
4 Adelie female 35.7 18.0 3550 A
5 Adelie female 36.2 17.2 3150 A
6 Adelie female 36.0 17.1 3700 A
merge(dt, dt2, by = c("species", "sex"))
Output
Key: <species, sex>
species sex bill_length_mm bill_depth_mm mass name
<char> <char> <num> <num> <int> <char>
1: Adelie female 39.5 16.7 3250 A
2: Adelie female 35.7 16.9 3150 A
3: Adelie female 35.7 18.0 3550 A
4: Adelie female 36.2 17.2 3150 A
5: Adelie female 36.0 17.1 3700 A
6: Adelie male 39.1 18.7 3750 B
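The four keyed joins above differ only in which unmatched rows they keep, which is easiest to see from the row counts. The following is a minimal base R sketch, assuming a reduced copy of df (three columns only) and a df2 rebuilt from the rows printed in this document; the names keys and n_rows are introduced here for illustration.

```r
# Reduced copy of df (keys plus mass), rebuilt from the printed outputs.
df <- data.frame(
  species = rep(c("Adelie", "Gentoo"), times = c(6, 4)),
  sex = c("male", rep("female", 9)),
  mass = c(3750L, 3250L, 3150L, 3550L, 3150L, 3700L, 4600L, 4450L, 4875L, 4925L)
)

# df2 as it appears in the cross join output: sex, species, name.
df2 <- data.frame(
  sex = rep(c("female", "male"), times = 2),
  species = rep(c("Adelie", "Chinstrap"), each = 2),
  name = c("A", "B", "C", "D")
)

keys <- c("species", "sex")

# all.x keeps unmatched rows of df, all.y keeps unmatched rows of df2,
# all keeps both, and the default keeps neither.
n_rows <- c(
  left  = nrow(merge(df, df2, by = keys, all.x = TRUE)),
  right = nrow(merge(df, df2, by = keys, all.y = TRUE)),
  full  = nrow(merge(df, df2, by = keys, all = TRUE)),
  inner = nrow(merge(df, df2, by = keys))
)
n_rows
```

The counts agree with the printed outputs: the four Gentoo rows have no match in df2, and the two Chinstrap rows have no match in df.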
2.16.5 Cross join
Cross join df and df2.
merge(df, df2, by = NULL)
Output
species.x sex.x bill_length_mm bill_depth_mm mass sex.y species.y name
1 Adelie male 39.1 18.7 3750 female Adelie A
2 Adelie female 39.5 16.7 3250 female Adelie A
3 Adelie female 35.7 16.9 3150 female Adelie A
4 Adelie female 35.7 18.0 3550 female Adelie A
5 Adelie female 36.2 17.2 3150 female Adelie A
6 Adelie female 36.0 17.1 3700 female Adelie A
7 Gentoo female 48.2 14.3 4600 female Adelie A
8 Gentoo female 43.2 14.5 4450 female Adelie A
9 Gentoo female 47.5 14.0 4875 female Adelie A
10 Gentoo female 47.2 13.7 4925 female Adelie A
11 Adelie male 39.1 18.7 3750 male Adelie B
12 Adelie female 39.5 16.7 3250 male Adelie B
13 Adelie female 35.7 16.9 3150 male Adelie B
14 Adelie female 35.7 18.0 3550 male Adelie B
15 Adelie female 36.2 17.2 3150 male Adelie B
16 Adelie female 36.0 17.1 3700 male Adelie B
17 Gentoo female 48.2 14.3 4600 male Adelie B
18 Gentoo female 43.2 14.5 4450 male Adelie B
19 Gentoo female 47.5 14.0 4875 male Adelie B
20 Gentoo female 47.2 13.7 4925 male Adelie B
21 Adelie male 39.1 18.7 3750 female Chinstrap C
22 Adelie female 39.5 16.7 3250 female Chinstrap C
23 Adelie female 35.7 16.9 3150 female Chinstrap C
24 Adelie female 35.7 18.0 3550 female Chinstrap C
25 Adelie female 36.2 17.2 3150 female Chinstrap C
26 Adelie female 36.0 17.1 3700 female Chinstrap C
27 Gentoo female 48.2 14.3 4600 female Chinstrap C
28 Gentoo female 43.2 14.5 4450 female Chinstrap C
29 Gentoo female 47.5 14.0 4875 female Chinstrap C
30 Gentoo female 47.2 13.7 4925 female Chinstrap C
31 Adelie male 39.1 18.7 3750 male Chinstrap D
32 Adelie female 39.5 16.7 3250 male Chinstrap D
33 Adelie female 35.7 16.9 3150 male Chinstrap D
34 Adelie female 35.7 18.0 3550 male Chinstrap D
35 Adelie female 36.2 17.2 3150 male Chinstrap D
36 Adelie female 36.0 17.1 3700 male Chinstrap D
37 Gentoo female 48.2 14.3 4600 male Chinstrap D
38 Gentoo female 43.2 14.5 4450 male Chinstrap D
39 Gentoo female 47.5 14.0 4875 male Chinstrap D
40 Gentoo female 47.2 13.7 4925 male Chinstrap D
cross_join(df, df2)
Output
species.x sex.x bill_length_mm bill_depth_mm mass sex.y species.y name
1 Adelie male 39.1 18.7 3750 female Adelie A
2 Adelie male 39.1 18.7 3750 male Adelie B
3 Adelie male 39.1 18.7 3750 female Chinstrap C
4 Adelie male 39.1 18.7 3750 male Chinstrap D
5 Adelie female 39.5 16.7 3250 female Adelie A
6 Adelie female 39.5 16.7 3250 male Adelie B
7 Adelie female 39.5 16.7 3250 female Chinstrap C
8 Adelie female 39.5 16.7 3250 male Chinstrap D
9 Adelie female 35.7 16.9 3150 female Adelie A
10 Adelie female 35.7 16.9 3150 male Adelie B
11 Adelie female 35.7 16.9 3150 female Chinstrap C
12 Adelie female 35.7 16.9 3150 male Chinstrap D
13 Adelie female 35.7 18.0 3550 female Adelie A
14 Adelie female 35.7 18.0 3550 male Adelie B
15 Adelie female 35.7 18.0 3550 female Chinstrap C
16 Adelie female 35.7 18.0 3550 male Chinstrap D
17 Adelie female 36.2 17.2 3150 female Adelie A
18 Adelie female 36.2 17.2 3150 male Adelie B
19 Adelie female 36.2 17.2 3150 female Chinstrap C
20 Adelie female 36.2 17.2 3150 male Chinstrap D
21 Adelie female 36.0 17.1 3700 female Adelie A
22 Adelie female 36.0 17.1 3700 male Adelie B
23 Adelie female 36.0 17.1 3700 female Chinstrap C
24 Adelie female 36.0 17.1 3700 male Chinstrap D
25 Gentoo female 48.2 14.3 4600 female Adelie A
26 Gentoo female 48.2 14.3 4600 male Adelie B
27 Gentoo female 48.2 14.3 4600 female Chinstrap C
28 Gentoo female 48.2 14.3 4600 male Chinstrap D
29 Gentoo female 43.2 14.5 4450 female Adelie A
30 Gentoo female 43.2 14.5 4450 male Adelie B
31 Gentoo female 43.2 14.5 4450 female Chinstrap C
32 Gentoo female 43.2 14.5 4450 male Chinstrap D
33 Gentoo female 47.5 14.0 4875 female Adelie A
34 Gentoo female 47.5 14.0 4875 male Adelie B
35 Gentoo female 47.5 14.0 4875 female Chinstrap C
36 Gentoo female 47.5 14.0 4875 male Chinstrap D
37 Gentoo female 47.2 13.7 4925 female Adelie A
38 Gentoo female 47.2 13.7 4925 male Adelie B
39 Gentoo female 47.2 13.7 4925 female Chinstrap C
40 Gentoo female 47.2 13.7 4925 male Chinstrap D
merge(dt, dt2, by = NULL)
Output
Key: <species, sex>
species sex bill_length_mm bill_depth_mm mass name
<char> <char> <num> <num> <int> <char>
1: Adelie female 39.5 16.7 3250 A
2: Adelie female 35.7 16.9 3150 A
3: Adelie female 35.7 18.0 3550 A
4: Adelie female 36.2 17.2 3150 A
5: Adelie female 36.0 17.1 3700 A
6: Adelie male 39.1 18.7 3750 B
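A cross join pairs every row of df with every row of df2, so with 10 and 4 rows the result should have 40 rows, as in the base R and dplyr outputs. The data.table output above has only 6 rows, the same as the inner join, which suggests that merge() on data.table objects falls back to the shared columns when by is NULL. The sketch below uses base R only to check the expected size; it assumes a reduced three-column copy of df and a df2 rebuilt from the printed rows.

```r
# Reduced copy of df and df2, rebuilt from the printed outputs.
df <- data.frame(
  species = rep(c("Adelie", "Gentoo"), times = c(6, 4)),
  sex = c("male", rep("female", 9)),
  mass = c(3750L, 3250L, 3150L, 3550L, 3150L, 3700L, 4600L, 4450L, 4875L, 4925L)
)
df2 <- data.frame(
  sex = rep(c("female", "male"), times = 2),
  species = rep(c("Adelie", "Chinstrap"), each = 2),
  name = c("A", "B", "C", "D")
)

# For data frames, by = NULL requests the Cartesian product.
cross <- merge(df, df2, by = NULL)

# The row count is the product of the two inputs' row counts.
nrow(cross) == nrow(df) * nrow(df2)

# Columns shared by both inputs are disambiguated with .x and .y suffixes.
names(cross)
```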
species sex bill_length_mm bill_depth_mm mass name
1 Adelie male 39.1 18.7 3750 <NA>
2 Adelie female 39.5 16.7 3250 <NA>
3 Adelie female 35.7 16.9 3150 <NA>
4 Adelie female 35.7 18.0 3550 <NA>
5 Adelie female 36.2 17.2 3150 <NA>
6 Adelie female 36.0 17.1 3700 <NA>
7 Gentoo female 48.2 14.3 4600 <NA>
8 Gentoo female 43.2 14.5 4450 <NA>
9 Gentoo female 47.5 14.0 4875 <NA>
10 Gentoo female 47.2 13.7 4925 <NA>
11 Adelie female NA NA NA A
12 Adelie male NA NA NA B
13 Chinstrap female NA NA NA C
14 Chinstrap male NA NA NA D
bind_rows(tbl, tbl2)
Output
# A tibble: 14 × 6
species sex bill_length_mm bill_depth_mm mass name
<chr> <chr> <dbl> <dbl> <int> <chr>
1 Adelie male 39.1 18.7 3750 <NA>
2 Adelie female 39.5 16.7 3250 <NA>
3 Adelie female 35.7 16.9 3150 <NA>
4 Adelie female 35.7 18 3550 <NA>
5 Adelie female 36.2 17.2 3150 <NA>
6 Adelie female 36 17.1 3700 <NA>
7 Gentoo female 48.2 14.3 4600 <NA>
8 Gentoo female 43.2 14.5 4450 <NA>
9 Gentoo female 47.5 14 4875 <NA>
10 Gentoo female 47.2 13.7 4925 <NA>
11 Adelie female NA NA NA A
12 Adelie male NA NA NA B
13 Chinstrap female NA NA NA C
14 Chinstrap male NA NA NA D
rbindlist(list(dt, dt2), fill = TRUE)
Output
species sex bill_length_mm bill_depth_mm mass name
<fctr> <fctr> <num> <num> <int> <char>
1: Adelie male 39.1 18.7 3750 <NA>
2: Adelie female 39.5 16.7 3250 <NA>
3: Adelie female 35.7 16.9 3150 <NA>
4: Adelie female 35.7 18.0 3550 <NA>
5: Adelie female 36.2 17.2 3150 <NA>
6: Adelie female 36.0 17.1 3700 <NA>
7: Gentoo female 48.2 14.3 4600 <NA>
8: Gentoo female 43.2 14.5 4450 <NA>
9: Gentoo female 47.5 14.0 4875 <NA>
10: Gentoo female 47.2 13.7 4925 <NA>
11: Adelie female NA NA NA A
12: Adelie male NA NA NA B
13: Chinstrap female NA NA NA C
14: Chinstrap male NA NA NA D
2.17 Reshape data to longer format
Reshape the df data frame to a longer format where the bill_length_mm and bill_depth_mm columns are stacked into a single column named bill and a new column named type is created to indicate the type of measurement.
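One way to sketch this longer reshape in base R is to stack the two bill columns by hand. This is an illustration under stated assumptions, not the document's own code: df is rebuilt from the rows printed in the join outputs, and the names idx and long are introduced here.

```r
# Rebuild df from the rows printed in the join outputs of this document.
df <- data.frame(
  species = rep(c("Adelie", "Gentoo"), times = c(6, 4)),
  sex = c("male", rep("female", 9)),
  bill_length_mm = c(39.1, 39.5, 35.7, 35.7, 36.2, 36.0, 48.2, 43.2, 47.5, 47.2),
  bill_depth_mm = c(18.7, 16.7, 16.9, 18.0, 17.2, 17.1, 14.3, 14.5, 14.0, 13.7),
  mass = c(3750L, 3250L, 3150L, 3550L, 3150L, 3700L, 4600L, 4450L, 4875L, 4925L)
)

# Repeat the non-bill columns once per measurement type.
idx <- rep(seq_len(nrow(df)), times = 2)
long <- df[idx, c("species", "sex", "mass")]

# type records which column each value came from; bill holds the value.
long$type <- rep(c("bill_length_mm", "bill_depth_mm"), each = nrow(df))
long$bill <- c(df$bill_length_mm, df$bill_depth_mm)
rownames(long) <- NULL

head(long, 3)
```

The tidyr counterpart is pivot_longer() with cols = c(bill_length_mm, bill_depth_mm), names_to = "type" and values_to = "bill", which orders the rows differently (the two measurements of each penguin are adjacent).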
Reshape the first two rows of the df data frame to a wider format where the sex column is used as the id variable and the species column is used as the time variable.
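The id and time terminology comes from base R's reshape(). A minimal sketch of the call, assuming the first two rows of df are the ones printed first in the dplyr join outputs (df_head and wide are names introduced here):

```r
# First two rows of df, copied from the printed outputs.
df_head <- data.frame(
  species = c("Adelie", "Adelie"),
  sex = c("male", "female"),
  bill_length_mm = c(39.1, 39.5),
  bill_depth_mm = c(18.7, 16.7),
  mass = c(3750L, 3250L)
)

# idvar identifies the rows of the wide result; each value of timevar
# becomes a suffix on the remaining (measurement) columns.
wide <- reshape(df_head, idvar = "sex", timevar = "species", direction = "wide")
names(wide)
```

Because both rows are Adelie, each measurement column gets a single .Adelie suffix and the species column itself is dropped.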
[1] Other Other Other Other Other Other Other Other Other Other Other Other
[13] Other Other Other Other Other Other Other Other Other Other Other Other
[25] Other Other Other Other Other Other Other Other Other Other Other Other
[37] Other Other Other Other Other Other Other Other Other Other Other Other
[49] Other Other Other Other Other Other Other Other Other Other Other Other
[61] Other Other Other Other Other Other Other Other Other Other Other Other
[73] Other Other Other Other Other Other Other Other Other Other Other Other
[85] Other Other Other Other Other Other Other Other Other Other Other Other
[97] Other Other Other Other Other Other Other Other Other Other Other Other
[109] Other Other Other Other Other I I I I I I I
[121] I I I I I I I I I I I I
[133] I I I I I I I I I I I J
[145] J J J J J J J J J J J J
[157] J J J J J J J J J J J J
[169] J J J J J J
Levels: Other I J
fct_lump_n(f, n = 2)
Output
[1] A C A Other <NA> A C
Levels: A C Other
n <- min(ceiling(0.2 * length(f0)), length(f0))
levels(f0)[lvls %in% setdiff(lvls, lvls_order[1:n])] <- "Other"
f0
Output
[1] A B B B C C C C C D D D D D D D D D D D D D D D D E E E E E E E E E E E E
[38] E E E E E E F F F F F F F F F F F F F F F F F F F F G G G G G G G G G G G
[75] G G G G G G G G G G G G H H H H H H H H H H H H H H H H H H H H H H H H H
[112] H H I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I J J J J J
[149] J J J J J J J J J J J J J J J J J J J J J J J J J J
Levels: A B C D E F G H I J
fct_lump_prop(f0, prop = 0.2)
Output
[1] Other Other Other Other Other Other Other Other Other Other Other Other
[13] Other Other Other Other Other Other Other Other Other Other Other Other
[25] Other Other Other Other Other Other Other Other Other Other Other Other
[37] Other Other Other Other Other Other Other Other Other Other Other Other
[49] Other Other Other Other Other Other Other Other Other Other Other Other
[61] Other Other Other Other Other Other Other Other Other Other Other Other
[73] Other Other Other Other Other Other Other Other Other Other Other Other
[85] Other Other Other Other Other Other Other Other Other Other Other Other
[97] Other Other Other Other Other Other Other Other Other Other Other Other
[109] Other Other Other Other Other Other Other Other Other Other Other Other
[121] Other Other Other Other Other Other Other Other Other Other Other Other
[133] Other Other Other Other Other Other Other Other Other Other Other Other
[145] Other Other Other Other Other Other Other Other Other Other Other Other
[157] Other Other Other Other Other Other Other Other Other Other Other Other
[169] Other Other Other Other Other Other
Levels: Other
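The base R output above keeps all ten levels, whereas fct_lump_prop(f0, prop = 0.2) collapses every level into Other: the most frequent level, J, covers 31 of the 174 observations (about 18%), so no level exceeds 20%. The following is a minimal base R sketch of lumping by proportion that reproduces the forcats result on this data. It rests on assumptions: f0 is rebuilt from the level counts visible in the printed output, lump_prop_base is a hypothetical helper, and treating a proportion exactly equal to prop as rare is a guess at the boundary behaviour.

```r
# Rebuild f0 from the level counts visible in the printed output (174 values).
f0 <- factor(rep(LETTERS[1:10], times = c(1, 3, 5, 16, 18, 20, 23, 27, 30, 31)))

# Hypothetical helper: relabel every level whose relative frequency
# is at or below prop as a single "other" level.
lump_prop_base <- function(f, prop, other = "Other") {
  freq <- table(f) / sum(table(f))
  rare <- names(freq)[as.vector(freq) <= prop]
  # Assigning the same label to several levels merges them.
  levels(f)[levels(f) %in% rare] <- other
  f
}

levels(lump_prop_base(f0, prop = 0.2))
levels(lump_prop_base(f0, prop = 0.15))
```

With prop = 0.15 only H, I and J are kept. Unlike forcats, this sketch leaves the merged level at the position of the first lumped level rather than moving it to the end.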
levels(f0)[lvls %in% names(tt[tt < 2])] <- "Other"
f0
Output
[1] Other B B B C C C C C D D D
[13] D D D D D D D D D D D D
[25] D E E E E E E E E E E E
[37] E E E E E E E F F F F F
[49] F F F F F F F F F F F F
[61] F F F G G G G G G G G G
[73] G G G G G G G G G G G G
[85] G G H H H H H H H H H H
[97] H H H H H H H H H H H H
[109] H H H H H I I I I I I I
[121] I I I I I I I I I I I I
[133] I I I I I I I I I I I J
[145] J J J J J J J J J J J J
[157] J J J J J J J J J J J J
[169] J J J J J J
Levels: Other B C D E F G H I J
fct_lump_min(f0, min = 2)
Output
[1] Other B B B C C C C C D D D
[13] D D D D D D D D D D D D
[25] D E E E E E E E E E E E
[37] E E E E E E E F F F F F
[49] F F F F F F F F F F F F
[61] F F F G G G G G G G G G
[73] G G G G G G G G G G G G
[85] G G H H H H H H H H H H
[97] H H H H H H H H H H H H
[109] H H H H H I I I I I I I
[121] I I I I I I I I I I I I
[133] I I I I I I I I I I I J
[145] J J J J J J J J J J J J
[157] J J J J J J J J J J J J
[169] J J J J J J
Levels: B C D E F G H I J Other
[1] Other Other Other Other Other Other Other Other Other D D D
[13] D D D D D D D D D D D D
[25] D E E E E E E E E E E E
[37] E E E E E E E F F F F F
[49] F F F F F F F F F F F F
[61] F F F G G G G G G G G G
[73] G G G G G G G G G G G G
[85] G G H H H H H H H H H H
[97] H H H H H H H H H H H H
[109] H H H H H I I I I I I I
[121] I I I I I I I I I I I I
[133] I I I I I I I I I I I J
[145] J J J J J J J J J J J J
[157] J J J J J J J J J J J J
[169] J J J J J J
Levels: Other D E F G H I J
fct_lump_lowfreq(f0)
Output
[1] Other Other Other Other Other Other Other Other Other D D D
[13] D D D D D D D D D D D D
[25] D E E E E E E E E E E E
[37] E E E E E E E F F F F F
[49] F F F F F F F F F F F F
[61] F F F G G G G G G G G G
[73] G G G G G G G G G G G G
[85] G G H H H H H H H H H H
[97] H H H H H H H H H H H H
[109] H H H H H I I I I I I I
[121] I I I I I I I I I I I I
[133] I I I I I I I I I I I J
[145] J J J J J J J J J J J J
[157] J J J J J J J J J J J J
[169] J J J J J J
Levels: D E F G H I J Other
4.12 Match levels
if ("A" %in% levels(f)) { f %in% "A" }
Output
[1] TRUE FALSE TRUE FALSE FALSE TRUE FALSE
fct_match(f, "A")
Output
[1] TRUE FALSE TRUE FALSE FALSE TRUE FALSE
4.13 Recode levels
factor(sapply(f, \(x) {
  x <- as.character(x)
  switch(x, A = "apple", B = "banana", x)
}))
Output
[1] apple C apple banana <NA> apple C
Levels: apple banana C
fct_recode(f, apple = "A", banana = "B")
Output
[1] apple C apple banana <NA> apple C
Levels: apple banana C
4.14 Relabel levels
levels(f) <- paste0(levels(f), "1")
f
Output
[1] A1 C1 A1 B1 <NA> A1 C1
Levels: A1 B1 C1
fct_relabel(f, ~str_c(., "1"))
Output
[1] A1 C1 A1 B1 <NA> A1 C1
Levels: A1 B1 C1
4.15 Relevel
relevel(f, ref = "B")
Output
[1] A C A B <NA> A C
Levels: B A C
fct_relevel(f, "B")
Output
[1] A C A B <NA> A C
Levels: B A C
4.16 Reorder
reorder(f, x1, mean)
Output
[1] A C A B <NA> A C
attr(,"scores")
A B C
0.5789172 0.9082078 0.6583996
Levels: A C B
fct_reorder(f, x1, mean)
Output
[1] A C A B <NA> A C
Levels: A C B
reorder(f, seq_along(f), \(i) sum(x1[i] * x2[i]))
Output
[1] A C A B <NA> A C
attr(,"scores")
A B C
0.8280563 0.1870677 0.5969617
Levels: B C A
Barrett, Tyson, Matt Dowle, Arun Srinivasan, Jan Gorecki, Michael Chirico, Toby Hocking, and Benjamin Schwendinger. 2024. Data.table: Extension of ‘Data.frame’. https://CRAN.R-project.org/package=data.table.
Horst, Allison Marie, Alison Presmanes Hill, and Kristen B Gorman. 2020. Palmerpenguins: Palmer Archipelago (Antarctica) Penguin Data. https://doi.org/10.5281/zenodo.3960218.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the Tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Footnotes
In practice, the data is just reset if it was modified.