Towards better reproducible practice

STAT1003 – Statistical Techniques

Dr. Emi Tanaka

Australian National University


Non-robust workflow

What should have been submitted:

A robust, reproducible workflow

  • Using a robust, reproducible workflow means:
    • you avoid manual, repetitive tasks
    • your results are computationally reproducible
  • Using a robust, reproducible workflow doesn’t mean you won’t make mistakes, but it will help you minimise mistakes.
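
As a rough illustration of what "computationally reproducible" can look like in practice, the sketch below fixes the seed of a random procedure so that rerunning the analysis gives the same number. The function name `bootstrap_mean`, the seed value and the data are all made up for this example; it is one possible approach under those assumptions, not a prescribed method.

```python
import random
import statistics


def bootstrap_mean(values, n_resamples=1000, seed=2024):
    """Average of bootstrap resample means, computed with a fixed seed."""
    # A local generator keeps the result independent of global random state.
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        resample = rng.choices(values, k=len(values))  # sample with replacement
        means.append(statistics.fmean(resample))
    return statistics.fmean(means)


heights = [152.0, 160.5, 171.2, 168.4, 158.9]  # made-up illustrative data

first_run = bootstrap_mean(heights)
second_run = bootstrap_mean(heights)
print(first_run == second_run)  # True: same seed and data give the same result
```

Because the whole calculation lives in a script, rerunning it replaces any manual, repetitive steps, and anyone with the same code and data should be able to obtain the same value.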

Literate programming is a programming paradigm, introduced by Donald Knuth, that emphasises writing code for humans (i.e. intertwining code with natural language explanations).

  • Literate programming includes documentation (detailed explanations, comments and annotations to provide context, rationale and insight into the program’s design and functionality).

Analysis framework

Tidy data

  1. Each column is a variable.
  2. Each row is an observation.
  3. Each cell is a single value.
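
The three rules above can be sketched with a small reshaping example. It assumes, purely for illustration, that each quiz score counts as one observation, so a table with one column per quiz is not yet tidy. The data and the helper `to_tidy` are hypothetical.

```python
# Hypothetical "wide" table: one row per student, one column per quiz.
wide = [
    {"student": "A", "quiz1": 7, "quiz2": 9},
    {"student": "B", "quiz1": 5, "quiz2": 8},
]


def to_tidy(rows, id_key, variable_name, value_name):
    """Reshape wide rows so that each output row is a single observation."""
    tidy_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_key:
                continue  # the identifier is repeated on every output row
            tidy_rows.append(
                {id_key: row[id_key], variable_name: key, value_name: value}
            )
    return tidy_rows


tidy = to_tidy(wide, id_key="student", variable_name="quiz", value_name="score")
for row in tidy:
    print(row)
```

After reshaping, every column (`student`, `quiz`, `score`) is a variable, every row is one score, and every cell holds a single value.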

Tools

  • Git/GitHub for version control and collaboration
  • Open-source programming languages (e.g. R and Python) for coding
  • Quarto with markdown syntax for interoperable reproducible reports

Statistical value chain

… a statistical value chain is constructed by defining a number of meaningful intermediate data products, for which a chosen set of quality attributes are well described …

– van der Loo & de Jonge (2018)

Folder structure

A suggested folder structure for data projects:

    project-root-folder/  # Root of the project folder
    │
    ├── README.md         # README file
    │
    ├── data/             # Raw and derived data
    │   ├── data-raw/     # Read-only files
    │   ├── data-input/   # Extracted and coerced from raw data
    │   ├── data-valid/   # Edited and imputed from input data
    │   └── data-stats/   # Analysed results (R objects, .csv, etc.)
    │
    ├── analysis/         # Scripts (not functions) to run analysis
    │
    ├── figures/          # Figures (.png, .pdf, etc.)
    │
    ├── misc/             # Misc
    │
    └── report.qmd        # Report, paper, or thesis output
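
One way to avoid creating these folders by hand is a short script. The sketch below uses only the Python standard library and takes the folder and file names from the structure above; the function name `create_project` is an assumption made for this example, and the demonstration runs in a temporary directory so nothing is written to a real project.

```python
from pathlib import Path
import tempfile

# Folder and file names copied from the suggested structure.
FOLDERS = [
    "data/data-raw",
    "data/data-input",
    "data/data-valid",
    "data/data-stats",
    "analysis",
    "figures",
    "misc",
]
FILES = ["README.md", "report.qmd"]


def create_project(root):
    """Create the suggested folders and empty placeholder files under root."""
    root = Path(root)
    for folder in FOLDERS:
        (root / folder).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        (root / name).touch(exist_ok=True)
    return root


# Demonstrate in a throwaway temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    project = create_project(Path(tmp) / "project-root-folder")
    created = sorted(p.relative_to(project).as_posix() for p in project.rglob("*"))
    print("\n".join(created))
```

Running the function again on an existing project is harmless, since existing folders and files are left as they are.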

Sharing your documents

via Quarto Pubs

  • Make sure you are logged in to your Quarto Pub account.
  • Then run the following command in the Terminal:

    quarto publish quarto-pub /path/to/your/quarto-document.qmd


Self-contained HTML document


    format:
      html:
        embed-resources: true


With this option set in the document's YAML header, the rendered HTML file has no external dependencies and can be shared as a single file.

Happy writing and sharing 😊