Designing a simple simulation study

STAT1003 – Statistical Techniques

Dr. Emi Tanaka

Australian National University

Simulation study

  • Simulation is a powerful tool to understand the behaviour of a system when we cannot easily derive the answer mathematically.
  • In a simulation study, we need:
    • a data generating process (DGP) that mimics the real-world process we are interested in, and
    • a statistic that we want to study the behaviour of.

Simulation design

  • We can consider various scenarios or factors that may affect the behaviour of the statistic.
  • We can then use simulation to explore how the statistic behaves under different scenarios.
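
The design above can be sketched in a few lines. This is only an illustration under assumed choices that are not from the slides: the DGP is rolling a fair die `n` times, the statistic is the sample mean, and the single factor varied across scenarios is the sample size `n`. The function names `generate_data` and `run_scenario` are hypothetical.

```python
import random
import statistics


def generate_data(n, rng):
    """Assumed DGP: n independent rolls of a fair six-sided die."""
    return rng.choices(range(1, 7), k=n)


def run_scenario(n, n_reps, rng):
    """Repeat the DGP n_reps times and record the statistic (the sample mean)."""
    return [statistics.fmean(generate_data(n, rng)) for _ in range(n_reps)]


rng = random.Random(1003)  # fixed seed so the run is reproducible

# One factor (sample size) with three levels; each level is a scenario.
scenarios = [5, 50, 500]
results = {n: run_scenario(n, n_reps=2000, rng=rng) for n in scenarios}

for n, means in results.items():
    print(f"n = {n:3d}: mean = {statistics.fmean(means):.3f}, "
          f"sd = {statistics.stdev(means):.3f}")
```

Comparing the spread of the statistic across the scenarios shows how the factor affects its behaviour; here the sample mean should become less variable as `n` grows.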

Rolling 10 dice

  • Let \(X\) be the total of rolling 10 dice.
  • We can get the exact distribution of \(X\) by enumerating all possible outcomes of rolling 10 dice, and counting the number of outcomes that give each possible total.
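
One possible way to do this counting, not necessarily the one used in the original slides, is to add one die at a time and keep a running tally of how many outcomes give each total. The sketch below assumes fair six-sided dice; the function name `exact_total_counts` is hypothetical.

```python
from collections import Counter


def exact_total_counts(n_dice, sides=6):
    """Count how many of the sides**n_dice outcomes give each possible total."""
    counts = Counter({0: 1})  # zero dice: one way to get a total of 0
    for _ in range(n_dice):
        new_counts = Counter()
        for total, ways in counts.items():
            # Each face of the next die extends every existing outcome.
            for face in range(1, sides + 1):
                new_counts[total + face] += ways
        counts = new_counts
    return counts


counts = exact_total_counts(10)
n_outcomes = 6 ** 10

# Exact probability of each total from 10 to 60.
pmf = {total: ways / n_outcomes for total, ways in sorted(counts.items())}
```

Because only the tally of totals is kept, the individual outcomes never need to be stored.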

Alternative approach

The alternative approach builds the full table of outcomes first and then tabulates the totals. It requires more memory, as it needs to store all possible outcomes of rolling 10 dice, which is \(6^{10} = 60{,}466{,}176\) outcomes. However, it is more straightforward to implement and understand.
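
A minimal sketch of this store-everything approach is shown below. To keep it quick to run, it assumes 4 dice rather than 10; setting `n_dice = 10` would build a list of all \(6^{10}\) outcomes, which is slow and memory-hungry in pure Python. The variable names are illustrative.

```python
from collections import Counter
from itertools import product

n_dice = 4  # assumed small value for illustration; 10 dice gives 6**10 rows

# Store every possible outcome as a tuple of faces, e.g. (1, 1, 1, 1).
outcomes = list(product(range(1, 7), repeat=n_dice))

# Tabulate how many outcomes give each total.
counts = Counter(sum(outcome) for outcome in outcomes)

# Exact probability of each total.
pmf = {total: ways / len(outcomes) for total, ways in sorted(counts.items())}
```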

Simulate rolling 10 dice

  • We can also simulate rolling 10 dice by generating random numbers from a (discrete) uniform distribution and summing them up.
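
As a rough sketch, the simulation could look like the following. The number of simulations (`n_sims`) and the seed are arbitrary choices made here, not values from the slides.

```python
import random
from collections import Counter

rng = random.Random(1003)  # fixed seed so the run is reproducible
n_sims = 100_000

# Each simulation draws 10 faces uniformly from 1..6 and sums them.
totals = [sum(rng.choices(range(1, 7), k=10)) for _ in range(n_sims)]

# Relative frequencies approximate the distribution of the total X.
freq = Counter(totals)
est_pmf = {total: count / n_sims for total, count in sorted(freq.items())}

est_mean = sum(totals) / n_sims
print(f"Estimated E(X) = {est_mean:.2f}")
```

The estimated probabilities in `est_pmf` are approximations and will vary from run to run unless the seed is fixed.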

Rolling 10,000 dice

  • Let \(X\) be the total of rolling 10,000 dice.
  • Suppose we want to know the probability that \(X\) is greater than 35,000.
  • In principle we could use the exact distribution approach, but enumerating all \(6^{10000}\) possible outcomes of rolling 10,000 dice is computationally infeasible.
  • However, we can easily simulate rolling 10,000 dice and estimate the probability that \(X\) is greater than 35,000.
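
A minimal sketch of such an estimate follows, taking the threshold of 35,000 from the question posed above. The number of simulations (`n_sims = 500`) is an arbitrary choice to keep the run short, and the helper `roll_total` is a hypothetical name.

```python
import random


def roll_total(n_dice, rng):
    """Total of n_dice independent rolls of a fair six-sided die."""
    return sum(rng.choices(range(1, 7), k=n_dice))


rng = random.Random(1003)  # fixed seed so the run is reproducible
n_dice = 10_000
threshold = 35_000
n_sims = 500  # assumed; more simulations give a more precise estimate

totals = [roll_total(n_dice, rng) for _ in range(n_sims)]

# Proportion of simulated totals that exceed the threshold.
p_hat = sum(t > threshold for t in totals) / n_sims

# Monte Carlo standard error of the estimated proportion.
se = (p_hat * (1 - p_hat) / n_sims) ** 0.5

print(f"Estimated P(X > {threshold}) = {p_hat:.3f} (SE {se:.3f})")
```

The standard error indicates how much the estimate would vary between simulation runs; it shrinks as `n_sims` increases.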

Bootstrapping

  • Bootstrapping is frequently used to estimate the sampling distribution of a statistic when the underlying population distribution is unknown or when the sample size is small.
  • The bootstrap method involves repeatedly resampling with replacement from the observed data and calculating the statistic of interest for each resample.
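
The resampling step can be sketched as follows. Everything specific here is an assumption for illustration: the "observed" sample is itself simulated (30 draws from an exponential distribution), the statistic is the sample median, and the interval is a simple percentile interval. The function name `bootstrap` is hypothetical.

```python
import random
import statistics

rng = random.Random(1003)  # fixed seed so the run is reproducible

# Stand-in for real observed data: 30 simulated values (illustration only).
observed = [rng.expovariate(1 / 10) for _ in range(30)]


def bootstrap(data, stat, n_boot, rng):
    """Resample data with replacement n_boot times and apply stat to each."""
    n = len(data)
    return [stat(rng.choices(data, k=n)) for _ in range(n_boot)]


boot_medians = bootstrap(observed, statistics.median, n_boot=2000, rng=rng)

# Bootstrap estimate of the standard error of the sample median.
boot_se = statistics.stdev(boot_medians)

# Simple 95% percentile interval from the sorted bootstrap statistics.
ordered = sorted(boot_medians)
lower = ordered[int(0.025 * len(ordered))]
upper = ordered[int(0.975 * len(ordered)) - 1]

print(f"median = {statistics.median(observed):.2f}, "
      f"SE = {boot_se:.2f}, 95% interval = ({lower:.2f}, {upper:.2f})")
```

Each resample has the same size as the observed data, so the spread of `boot_medians` approximates the sampling variability of the median.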

Summary

  • Simulation is a powerful tool to understand the behaviour of a system when we cannot easily derive the answer mathematically.
  • In a simulation study, we need a data generating process (DGP) that mimics the real-world process we are interested in, and a statistic that we want to study the behaviour of.
  • We can consider various scenarios or factors that may affect the behaviour of the statistic, and use simulation to explore how the statistic behaves under different scenarios.
  • Simulation can be used to estimate probabilities or other characteristics of a statistic when the exact distribution is computationally infeasible to derive.
  • Bootstrapping is a resampling method that can be used to estimate the sampling distribution of a statistic when the underlying population distribution is unknown or when the sample size is small.