Functional programming

STAT1003 – Statistical Techniques

Dr. Emi Tanaka

Australian National University

Repetitive tasks

  • 🎯 Let’s calculate the number of distinct values for each column and store it in n
  • Anything you notice from above?
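
As a minimal sketch of the kind of repetition in question, the code below counts distinct values one column at a time. The built-in mtcars data and the three chosen columns are assumptions made for illustration, not the data set used on the original slide.

```r
# Assumption: mtcars stands in for the slide's data set.
df <- mtcars[, c("cyl", "gear", "carb")]

# Count distinct values column by column, copy-paste style.
n <- c(
  cyl  = length(unique(df$cyl)),
  gear = length(unique(df$gear)),
  carb = length(unique(df$carb))
)
n
```

The same expression is written three times with only the column name changing, which is easy to get wrong and tedious to extend.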

Using for loops

  • But R has a reputation for being slow when using for loops
  • The across function in dplyr can do this without for loops
  • Note that the result is a data.frame
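
The two approaches above could look like the following sketch. It assumes the built-in mtcars data as a stand-in for the slide's data, and that dplyr is installed.

```r
library(dplyr)

# Assumption: mtcars stands in for the slide's data set.
df <- mtcars[, c("cyl", "gear", "carb")]

# for loop: preallocate the result, then fill it in column by column.
n <- vector("integer", length = ncol(df))
names(n) <- names(df)
for (i in seq_along(df)) {
  n[i] <- n_distinct(df[[i]])
}
n

# across(): the same calculation without an explicit loop.
n_df <- df |> summarise(across(everything(), n_distinct))
n_df
```

Here n is a named integer vector, while n_df is a one-row data.frame with one column per input column.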

Functions in R

  • Functions can be broken into three components:

    • formals(), the list of arguments,
    • body(), the code inside the function, and
    • environment(), the environment in which the function was created.
  • Functions in R are created with function() and bound to a name with <- or =
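
The three components can be inspected directly. The function add_by below is a hypothetical example written only for this sketch.

```r
# A hypothetical function used only for illustration.
add_by <- function(x, by = 1) {
  x + by
}

formals(add_by)      # the arguments: x (no default) and by = 1
body(add_by)         # the code inside the function
environment(add_by)  # where the function was created
```

When run at the top level of an R session, the last call should report the global environment.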

Function body

  • The body of a function can be a single expression or a block of code enclosed in {}.
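
A short sketch of both forms follows; the function names square and half_range are made up for illustration.

```r
# Single expression: no braces needed.
square <- function(x) x^2

# Block of code: braces group several expressions,
# and the value of the last expression is returned.
half_range <- function(x) {
  r <- range(x)
  (r[2] - r[1]) / 2
}

square(3)
half_range(c(2, 4, 10))
```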

Functional programming with purrr

  • purrr is part of the core tidyverse packages.
  • It contains a series of map and walk functions.
  • The related functions in purrr have been designed so that their inputs are consistent.
  • The user has to decide on the expected output type before seeing the output.

map functions in purrr

  • map(.x, .f, ...) returns a list
  • map_chr(.x, .f, ...) returns a character vector
  • map_dbl(.x, .f, ...) returns a numeric (double) vector
  • map_int(.x, .f, ...) returns an integer vector
  • map_lgl(.x, .f, ...) returns a logical vector
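
The sketch below applies each variant to the columns of a data frame. The mtcars data and the helper n_unique are assumptions introduced for illustration.

```r
library(purrr)

# Assumption: mtcars stands in for the slide's data set.
df <- mtcars[, c("cyl", "gear", "carb")]

# Hypothetical helper: number of distinct values, as an integer.
n_unique <- function(x) length(unique(x))

res_list <- map(df, class)         # list
res_chr  <- map_chr(df, class)     # character vector
res_dbl  <- map_dbl(df, mean)      # double vector
res_int  <- map_int(df, n_unique)  # integer vector
res_lgl  <- map_lgl(df, is.numeric) # logical vector
```

Each typed variant raises an error if .f returns something that does not fit the promised type, which is what makes the output predictable.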

Conditional maps in purrr

  • map_if(.x, .p, .f, ...) uses .p to determine if .f will be applied to .x
  • map_at(.x, .at, .f, ...) applies .f to .x at .at (name or position)
  • map_depth(.x, .depth, .f, ...) applies .f to .x at a specific depth level of a nested vector
  • The return object is always a list
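
A minimal sketch of the three conditional maps, using small made-up lists:

```r
library(purrr)

x <- list(a = 1:3, b = letters[1:3], c = 4:6)

res_if <- map_if(x, is.numeric, sum)  # sum only the numeric elements
res_at <- map_at(x, "b", toupper)     # apply toupper only to element "b"

nested <- list(list(1:3, 4:6), list(7:9))
res_depth <- map_depth(nested, 2, sum) # apply sum two levels down
```

Elements that do not satisfy the condition are passed through unchanged.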

Functional programming in Base R

  • lapply, Map, mapply, sapply, tapply, apply, and vapply are the Base R functionals, i.e. functions that take another function as input
  • Some function outputs in Base R are more predictable than others:
    • purrr::map is a variant of lapply (which always returns list)
    • purrr::pmap is a variant of Map (which takes more than one input)
  • sapply doesn’t require users to specify the output type; instead it tries to figure out what looks best for the user. This is great for interactive use but requires great caution in programming
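
The difference in predictability can be seen in this sketch, which assumes the built-in mtcars data for illustration.

```r
# Assumption: mtcars stands in for the slide's data set.
df <- mtcars[, c("cyl", "gear", "carb")]

res_lapply <- lapply(df, range)    # always a list

# sapply simplifies when it can: a matrix here...
res_matrix <- sapply(df, range)
# ...but a plain list when the pieces have different lengths.
res_ragged <- sapply(df, unique)

# vapply states the expected type and length, and errors otherwise.
res_vapply <- vapply(df, function(x) length(unique(x)), FUN.VALUE = integer(1))
```

The same sapply call shape yields a matrix in one case and a list in the other, which is why vapply or the typed purrr variants are safer inside functions and scripts.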

Anonymous functions

  • Anonymous functions, also called lambda expressions in computer programming, are functions without names.
  • Since R version 4.1.0, we can use the shorthand \(x) to define anonymous functions
  • Tidyverse employs a special shorthand using a formula and .x as a special placeholder for input
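
The three ways of writing an anonymous function can be compared side by side. The input list is made up for illustration.

```r
library(purrr)

x <- list(1:3, 4:6)

res_classic <- map_dbl(x, function(v) mean(v) + 1)  # classic anonymous function
res_short   <- map_dbl(x, \(v) mean(v) + 1)         # shorthand, R >= 4.1.0
res_formula <- map_dbl(x, ~ mean(.x) + 1)           # tidyverse formula shorthand
```

All three calls compute the same result; only the syntax for defining the function differs.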

Formula anonymous function in Tidyverse

  • Formula anonymous functions are not just for purrr functions:
  • Most tidyverse functions that take a function as input support formula anonymous functions, but functions outside that ecosystem likely do not unless their developers adopt the same system.
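
As one example outside purrr, dplyr's across() accepts a formula anonymous function. This sketch assumes the built-in mtcars data and that dplyr is installed.

```r
library(dplyr)

# Formula anonymous function inside dplyr::across().
n_df <- mtcars |>
  summarise(across(c(cyl, gear, carb), ~ n_distinct(.x)))
n_df
```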

Functions with two inputs

  • For functions with two inputs, you can use the map2 variants in purrr
  • For anonymous functions with two inputs, the first input is .x (as before) and the second is .y
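
A minimal sketch with two made-up numeric vectors:

```r
library(purrr)

x <- c(1, 2, 3)
y <- c(10, 20, 30)

res_formula <- map2_dbl(x, y, ~ .x + .y)      # formula shorthand with .x and .y
res_lambda  <- map2_dbl(x, y, \(a, b) a + b)  # same thing, R >= 4.1.0 shorthand
```

The function is called once per pair of elements, so both results have the same length as x and y.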

Functions with more than two inputs

  • What if there are more than two inputs?
  • You can use pmap variants in purrr
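
With the pmap variants, the inputs are passed together as a single list. The values below are made up for illustration.

```r
library(purrr)

# The inputs go in as one list; names are matched to the function's arguments.
inputs <- list(
  x = c(1, 2, 3),
  y = c(10, 20, 30),
  z = c(100, 200, 300)
)

res <- pmap_dbl(inputs, function(x, y, z) x + y + z)
res
```

Because a data frame is a list of columns, the same pattern can be used to apply a function row by row.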

Other functions in purrr

Using names of input

  • The imap(x, ...) variants are shorthand for map2(x, names(x), ...) when x has names, and for map2(x, seq_along(x), ...) when it does not
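
The equivalence can be checked on a small named vector made up for this sketch:

```r
library(purrr)

x <- c(a = 1, b = 2)

# The function receives the value first and the name second.
res_imap <- imap_chr(x, \(value, name) paste0(name, " = ", value))
res_map2 <- map2_chr(x, names(x), \(value, name) paste0(name, " = ", value))
```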

Expecting no return object

  • If you want a side effect rather than a return value, you can use the walk variants
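
In this sketch, printing is the side effect; the input vector is made up for illustration.

```r
library(purrr)

x <- c("a", "b", "c")

# walk() calls the function for its side effect (printing here)
# and returns its input invisibly.
out <- walk(x, \(v) cat("Processing", v, "\n"))
```

Since the input is returned invisibly, walk can sit in the middle of a pipeline without changing the data flowing through it.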

reduce function in purrr

  • reduce(.x, .f) applies a function .f cumulatively to the elements of .x, from left to right.
  • E.g. reduce(c(1, 2, 3, 4), sum) is equivalent to sum(sum(sum(1, 2), 3), 4)
  • accumulate(.x, .f) is similar to reduce, but it returns the intermediate results as well.
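
The two functions can be compared on the same input as the example above:

```r
library(purrr)

x <- c(1, 2, 3, 4)

total   <- reduce(x, sum)      # only the final result
running <- accumulate(x, sum)  # every intermediate result as well
```

Here total is 10, and running holds the running totals 1, 3, 6 and 10.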

purrr cheatsheet