Towards better reproducible practice

STAT1003 – Statistical Techniques

Dr. Emi Tanaka

Australian National University


Non-robust workflow

What should have been submitted:

A robust, reproducible workflow

Using a robust, reproducible workflow means:
- you avoid manual, repetitive tasks
- your results are computationally reproducible
A robust, reproducible workflow won’t stop you from making mistakes, but it helps you minimise them.

Literate programming is a programming paradigm, introduced by Donald Knuth, that emphasises writing code for humans, i.e. interweaving code with natural language explanations.

Literate programming includes documentation: detailed explanations, comments and annotations that provide context, rationale and insight into the program’s design and functionality.

Analysis framework

Tidy data

Each column is a variable.
Each row is an observation.
Each cell is a single value.
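To illustrate these three rules, the Python sketch below reshapes a small, made-up "wide" table, where the year variable is hidden in the column headers, into tidy form. The data and the helper `to_tidy()` are hypothetical; in R the same reshaping is usually done with `tidyr::pivot_longer()`, whose `names_to` and `values_to` arguments the helper's parameters are named after.

```python
# A hypothetical "wide" table (made-up numbers): the year variable is spread
# across the column headers, so each row holds several observations.
wide = [
    {"country": "A", "2020": 10, "2021": 12},
    {"country": "B", "2020": 7, "2021": 9},
]


def to_tidy(rows, id_col, names_to, values_to):
    """Reshape wide rows into tidy (long) rows.

    Every non-id column becomes its own row, so in the result each column is
    a variable, each row is an observation and each cell is a single value.
    """
    tidy = []
    for row in rows:
        for key, value in row.items():
            if key == id_col:
                continue
            tidy.append({id_col: row[id_col], names_to: key, values_to: value})
    return tidy


tidy = to_tidy(wide, id_col="country", names_to="year", values_to="cases")
for record in tidy:
    print(record)
```

The tidy version has one row per country-year pair, with `country`, `year` and `cases` as its three variables.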

Tools

Git/GitHub for version control and collaboration
Open-source programming languages (e.g. R and Python) for coding
Quarto with markdown syntax for interoperable reproducible reports

Statistical value chain

… a statistical value chain is constructed by defining a number of meaningful intermediate data products, for which a chosen set of quality attributes are well described …

– van der Loo & de Jonge (2018)
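One way to picture a statistical value chain is as a series of small steps, each producing an intermediate data product that can be checked before the next step runs. The sketch below is a minimal Python illustration under several assumptions: the raw data, the plausible height range and the median imputation are all made up, and the stage names simply mirror the data-raw/, data-input/, data-valid/ and data-stats/ folders of the suggested folder structure.

```python
import statistics

# Hypothetical raw records, as text lines might arrive in data-raw/.
RAW = ["id,height_cm", "1,172", "2,", "3,1650", "4,158"]


def extract(raw_lines):
    """data-raw -> data-input: parse the text and coerce types (blank -> None)."""
    _header, *body = raw_lines
    rows = []
    for line in body:
        id_, height = line.split(",")
        rows.append({"id": int(id_), "height_cm": float(height) if height else None})
    return rows


def validate(rows, low=50.0, high=250.0):
    """data-input -> data-valid: edit implausible values, then impute missing ones.

    The plausible range and median imputation are illustrative assumptions.
    """
    checked = []
    for r in rows:
        h = r["height_cm"]
        # Treat values outside the plausible range as missing.
        if h is not None and not (low <= h <= high):
            h = None
        checked.append(dict(r, height_cm=h))
    observed = [r["height_cm"] for r in checked if r["height_cm"] is not None]
    fill = statistics.median(observed)
    return [dict(r, height_cm=fill if r["height_cm"] is None else r["height_cm"])
            for r in checked]


def analyse(rows):
    """data-valid -> data-stats: compute summary statistics."""
    heights = [r["height_cm"] for r in rows]
    return {"n": len(heights), "mean_height_cm": statistics.mean(heights)}


# Each intermediate product is checked against a quality attribute
# before it is passed to the next step.
input_data = extract(RAW)
assert len(input_data) == 4, "every raw record should be extracted"
valid_data = validate(input_data)
assert all(r["height_cm"] is not None for r in valid_data), "no missing values remain"
stats = analyse(valid_data)
print(stats)
```

Saving each intermediate product (for example, to the matching data folder) makes it possible to inspect or re-run any single step on its own.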

Folder structure

A suggested folder structure for data projects:

    project-root-folder/  # Root of the project folder
    │
    ├── README.md         # README file
    │
    ├── data/             # Raw and derived data
    │   ├── data-raw/     # Read-only files
    │   ├── data-input/   # Extracted and coerced from raw data
    │   ├── data-valid/   # Edited and imputed from input data
    │   └── data-stats/   # Analysed results (R objects, .csv, etc.)
    │
    ├── analysis/         # Scripts (not functions) to run analysis
    │
    ├── figures/          # Figures (.png, .pdf, etc.)
    │
    ├── misc/             # Misc
    │
    └── report.qmd        # Report, paper, or thesis output
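Rather than creating these folders by hand, a short script can set them up. The following is a minimal Python sketch using only `pathlib` from the standard library; `create_project()` is a hypothetical helper, not an official tool, and its folder list is copied from the structure above.

```python
import tempfile
from pathlib import Path

# Sub-folders from the suggested structure, relative to project-root-folder/.
FOLDERS = [
    "data/data-raw",
    "data/data-input",
    "data/data-valid",
    "data/data-stats",
    "analysis",
    "figures",
    "misc",
]
FILES = ["README.md", "report.qmd"]


def create_project(root):
    """Create the suggested folder skeleton under `root`."""
    root = Path(root)
    for folder in FOLDERS:
        # parents=True also creates data/; exist_ok=True makes re-runs safe.
        (root / folder).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        # touch() creates an empty file; an existing file keeps its content.
        (root / name).touch(exist_ok=True)
    return root


# Demo in a throwaway temporary folder; pass your own path in practice.
project = create_project(Path(tempfile.mkdtemp()) / "project-root-folder")
print(sorted(p.relative_to(project).as_posix() for p in project.rglob("*")))
```

Because `mkdir(exist_ok=True)` and `touch(exist_ok=True)` do not fail on existing paths, re-running the script on an existing project leaves its files in place.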

Sharing your documents

via Quarto Pub

Make sure you are logged in to your Quarto Pub account.
Then run the following command in the Terminal:

    quarto publish quarto-pub /path/to/your/quarto-document.qmd

Self-contained HTML document

    format:
      html:
        embed-resources: true

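With this option in the document's YAML header, Quarto embeds the resources the page needs (images, stylesheets and scripts) into the rendered HTML file, so you can share that single file with no external dependencies.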

Happy writing and sharing 😊