Acknowledgement

Some components of this lecture were adapted from previous STAT1003 lecturers. Thank you folks!

Random variable

Random variables

A random variable is a variable whose possible values are numerical outcomes of a random phenomenon.

There are two main types of random variables:

  • Discrete random variables: possible values are a countable number of distinct values
    • E.g., number of heads in 10 coin tosses.
  • Continuous random variables: any values within a given range
    • E.g., the height of students in a class.

Notation

  • Random variables are usually represented by uppercase letters (e.g., \(X\) or \(Y\)).
  • Specific outcomes (realised values) are represented by corresponding lowercase letters (e.g., \(x\) or \(y\)).
  • For example:
    • Let \(X\) be the number of heads in 3 coin tosses.
    • Then \(x = 0\) represents the outcome of getting 0 heads.

Probability distribution

A probability distribution describes how probabilities are assigned to the possible values of a random variable.

  • For a discrete random variable, the distribution is described by a probability mass function (pmf).
  • For a continuous random variable, the distribution is described by a probability density function (pdf).

By convention we denote:

  • the pmf of a discrete random variable \(X\) as \(P(X = x)\) or \(p_X(x)\), and
  • the pdf of a continuous random variable \(X\) as \(f_X(x)\).
  • A probability distribution summarises the characteristics of the population.
  • Bar plots (for discrete data) and histograms or density plots (for continuous data) are visualisations of estimates of the probability distribution of the underlying random variable.

Parametric distributions

  • Parametric distributions are defined by just a handful of parameters.

Examples include:

  • Bernoulli distribution
  • Binomial distribution
  • Poisson distribution
  • Negative binomial distribution
  • Normal distribution
  • \(t\) distribution
  • Uniform distribution
  • Gamma distribution

Discrete random variable

Probability mass function

  • Suppose that \(X\) is a discrete random variable with \(k\) distinct possible values: \(x_1, x_2, \ldots, x_k\).
  • The pmf, denoted as \(p_X(x) = P(X = x)\), gives the probability that the random variable \(X\) is exactly equal to some value \(x\).

Properties of a pmf:

  • \(0 \leq p_X(x) \leq 1\) for all \(x\)
  • \(\sum_{i = 1}^k p_X(x_i) = 1\)

If \(X\) is the number of heads in 2 tosses of a fair coin, then the pmf is given by:

\(x\) \(p_X(x)\)
\(0\) \(0.25\)
\(1\) \(0.50\)
\(2\) \(0.25\)

Note: this is similar to a relative frequency table, but the probabilities are theoretical values based on the assumption of a fair coin, rather than empirical estimates from data.

Expected value of a discrete random variable

The expected value (or mean) of a discrete random variable \(X\) is the long-run average value of repetitions of the experiment:

\[E(X) = \mu = \sum_{i=1}^k x_i \, p_X(x_i).\]

The expected value of \(X\), the number of heads in 2 tosses of a fair coin, is:

\[E(X) = 0 \times 0.25 + 1 \times 0.50 + 2 \times 0.25 = 1.\]

Variance of a discrete random variable

The variance measures how spread out the values of \(X\) are around the mean:

\[\text{Var}(X) = \sigma^2 = \sum_{i=1}^k (x_i - \mu)^2 \,p_X(x_i)\]

The variance of \(X\), the number of heads in 2 tosses of a fair coin, is:

\[\begin{align*} \text{Var}(X) &= (0 - 1)^2 \times 0.25 +\\&\qquad (1 - 1)^2 \times 0.50 +\\&\qquad (2 - 1)^2 \times 0.25\\ &= 0.5. \end{align*}\]

Expected value of a function

  • Sometimes we are interested in a function of a random variable, such as \(Y = g(X)\).

The expected value of a transformation is given by: \[E\left(g(X)\right) = \sum_{i = 1}^k g(x_i) \cdot p_X(x_i)\]

for a discrete random variable \(X\).

If \(X\) is the number of heads in 2 tosses of a fair coin, and we define \(Y = X^2\), then the expected value of \(Y\) is:

\[\begin{align*} E(Y) &= E(X^2)\\ &= 0^2 \times 0.25 + 1^2 \times 0.50 + 2^2 \times 0.25\\ &= 1.5. \end{align*}\]
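The calculations above for \(E(X)\), \(\text{Var}(X)\), and \(E(X^2)\) can be reproduced directly from the pmf table; a minimal R sketch (variable names are illustrative):

```r
# pmf of X = number of heads in 2 tosses of a fair coin
x  <- c(0, 1, 2)
px <- c(0.25, 0.50, 0.25)

mu   <- sum(x * px)           # E(X)   = 1
sig2 <- sum((x - mu)^2 * px)  # Var(X) = 0.5
ex2  <- sum(x^2 * px)         # E(X^2) = E(g(X)) with g(x) = x^2, = 1.5
c(mean = mu, var = sig2, EX2 = ex2)
```

Note that \(\text{Var}(X) = E(X^2) - E(X)^2 = 1.5 - 1^2 = 0.5\), consistent with the direct calculation.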

Bivariate distributions

  • A bivariate distribution describes the probability behavior of two random variables, say \(X\) and \(Y\), simultaneously.

  • The joint probability distribution specifies \(P(X = x, Y = y)\) for all possible \((x, y)\) combinations and must satisfy:

    • \(0 \le p_{X,Y}(x,y) \le 1\) for all \(x, y\)
    • \(\sum_x \sum_y p_{X,Y}(x,y) = 1\)

Joint probability table:

\(X \backslash Y\) \(1\) \(2\)
\(0\) \(0.1\) \(0.2\)
\(1\) \(0.3\) \(0.4\)
  • Each entry shows \[P(X = x, Y = y).\]
  • The total of all probabilities in the table must be \(1\).

Marginal distributions

The marginal distribution describes the probability distribution of one variable in a bivariate (joint) distribution, regardless of the value of the other variable.

  • For random variables \(X\) and \(Y\) with joint probabilities \(p_{X,Y}(x, y)\):

    • The marginal distribution of \(X\): \[p_X(x) = \sum_y p_{X,Y}(x, y).\]
    • The marginal distribution of \(Y\): \[p_Y(y) = \sum_x p_{X,Y}(x, y).\]

If the joint probability table is:

\(X \backslash Y\) \(1\) \(2\)
\(0\) \(0.1\) \(0.2\)
\(1\) \(0.3\) \(0.4\)
  • Marginal for \(X = 0\): \(P(X=0) = 0.1 + 0.2 = 0.3\)
  • Marginal for \(Y = 2\): \(P(Y=2) = 0.2 + 0.4 = 0.6\)

Independence of random variables

  • Two random variables \(X\) and \(Y\) are independent if knowing the value of one does not provide any information about the other.
  • Mathematically, \(X\) and \(Y\) are independent if, for all values of \(x\) and \(y\): \[p_{XY}(x,y) = p_X(x) \cdot p_Y(y).\]
  • Independence is often a convenient (and frequently realistic) assumption that greatly simplifies analysis.

If the joint probability table is:

\(X \backslash Y\) \(1\) \(2\)
\(0\) \(0.1\) \(0.2\)
\(1\) \(0.3\) \(0.4\)
  • \(p_{XY}(0, 1) = 0.1\)
  • \(p_X(0) = 0.1 + 0.2 = 0.3\)
  • \(p_Y(1) = 0.1 + 0.3 = 0.4\)
  • So \(p_X(0) \cdot p_Y(1) = 0.3 \cdot 0.4 = 0.12\).
  • Since \(p_{XY}(0, 1) \neq p_X(0) \cdot p_Y(1)\),
    \(X\) and \(Y\) are not independent.
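The marginal and independence calculations can be checked numerically; a minimal R sketch using the joint table above:

```r
# joint pmf of (X, Y); rows = X in {0, 1}, columns = Y in {1, 2}
joint <- matrix(c(0.1, 0.2,
                  0.3, 0.4),
                nrow = 2, byrow = TRUE,
                dimnames = list(X = c(0, 1), Y = c(1, 2)))

px <- rowSums(joint)  # marginal of X: 0.3, 0.7
py <- colSums(joint)  # marginal of Y: 0.4, 0.6

# independence would require joint = outer(px, py) in every cell
all(joint == outer(px, py))  # FALSE: X and Y are not independent
```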

Covariance

The covariance is defined as: \[\mathrm{Cov}(X, Y) = E\left((X - E(X)) (Y - E(Y))\right)\]

Alternatively, \[\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y)\]

  • If \(X\) and \(Y\) are independent, then \(\mathrm{Cov}(X, Y) = 0\).
  • However, \(\mathrm{Cov}(X, Y) = 0\) does not necessarily imply that \(X\) and \(Y\) are independent!
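Applying the second formula to the joint table from the previous slide (all numbers follow from that table):

```r
joint <- matrix(c(0.1, 0.2, 0.3, 0.4), nrow = 2, byrow = TRUE)
x <- c(0, 1)  # values of X (rows)
y <- c(1, 2)  # values of Y (columns)

ex  <- sum(x * rowSums(joint))   # E(X)  = 0.7
ey  <- sum(y * colSums(joint))   # E(Y)  = 1.6
exy <- sum(outer(x, y) * joint)  # E(XY) = 1.1
cov_xy <- exy - ex * ey          # Cov(X, Y) = 1.1 - 1.12 = -0.02
```

The small negative covariance is consistent with the earlier finding that \(X\) and \(Y\) are not independent.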

Laws of expected value and variance

For random variables \(X\) and \(Y\), and constants \(a\) and \(b\), the following properties hold:

  • \(E(a) = a\)
  • \(E(aX) = aE(X)\)
  • \(E(aX + bY) = aE(X) + bE(Y)\)
  • If \(X\) and \(Y\) are independent:
    \(E(XY) = E(X)E(Y)\)
  • \(\mathrm{Var}(a) = 0\)
  • \(\mathrm{Var}(X) = E(X^2) - E(X)^2\)
  • \(\mathrm{Var}(aX) = a^2\mathrm{Var}(X)\)
  • \(\mathrm{Var}(X + a) = \mathrm{Var}(X)\)
  • \(\mathrm{Var}(aX + bY) = a^2\mathrm{Var}(X) + b^2\mathrm{Var}(Y) + 2ab\mathrm{Cov}(X, Y)\)
  • If \(X\) and \(Y\) are independent:
    \(\mathrm{Var}(aX + bY) = a^2\mathrm{Var}(X) + b^2\mathrm{Var}(Y)\)
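These identities can be sanity-checked by simulation; a sketch with independent \(X\) and \(Y\) (the distributions are chosen arbitrarily for illustration):

```r
set.seed(1)
n <- 1e5
x <- rnorm(n, mean = 2, sd = 1)  # X ~ N(2, 1)
y <- rpois(n, lambda = 3)        # Y ~ Poisson(3), independent of X
a <- 2; b <- -1

z <- a * x + b * y
mean(z)  # close to a*E(X) + b*E(Y) = 2*2 - 1*3 = 1
var(z)   # close to a^2*Var(X) + b^2*Var(Y) = 4*1 + 1*3 = 7
```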

Summary

  • A discrete random variable \(X\) takes on a countable number of distinct values.

  • The probability mass function (pmf), \(p_X(x)\), gives the probability that a discrete random variable is exactly equal to \(x\).

  • The expected value \(E(X) = \sum_x x\cdot p_X(x)\).

  • The variance \(\mathrm{Var}(X) = E\left[(X - E(X))^2\right] = E(X^2) - E(X)^2\).

  • The expected value of a transformation \(E(g(X)) = \sum_x g(x) \cdot p_X(x)\).

  • The random variables \(X\) and \(Y\) are independent if \(p_{XY}(x,y) = p_X(x) \cdot p_Y(y)\) for all \(x\) and \(y\).

  • For random variables \(X\) and \(Y\) and constants \(a\) and \(b\):

    • \(E(aX + bY) = aE(X) + bE(Y)\) and
    • \(\mathrm{Var}(aX + bY) = a^2\mathrm{Var}(X) + b^2\mathrm{Var}(Y) + 2 a b \mathrm{Cov}(X, Y)\).

Binomial distribution

Binary events in the wild


  • Coin toss: possible outcomes are (A) tail or (B) head. For an unbiased coin, the probability of each outcome is 0.5.
  • Birth of a child: possible outcomes are (A) a baby girl 👧 or (B) a baby boy 👦 (ignoring miscarriages, irregularities, intersex, etc). The probability of each outcome is approximately 0.5.
  • Election: possible outcomes are (A) Labor party 🔴 or (B) Coalition party 🔵 (ignoring other parties and the formation of majority or minority government). Probability of the Labor party winning??
  • Game: possible outcomes are (A) Win 🏆 or (B) Lose ❌. The probability of winning depends on the skill level of the players.

Bernoulli distribution

  • If \(X\) is a random variable representing the outcome of a single trial with two possible outcomes (0 = failure or 1 = success), we can model \(X\) using a Bernoulli distribution.
  • We write \(X \sim \mathrm{Bernoulli}(p)\) where \(p\) is the probability of success.
  • Probability mass function is given by: \(P(X = 1) = p\) and \(P(X = 0) = 1 - p\).
  • Expected value: \(E(X) = p\).
  • Variance: \(\mathrm{Var}(X) = p(1-p)\).

Binomial distribution

  • The number of “successes” out of \(n\) independent Bernoulli trials with probability of success, \(p\), follows a binomial distribution with parameters \(n\) and \(p\).
  • Equivalently, such a random variable can be constructed (or simulated) as a sum of \(n\) independent Bernoulli random variables.
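A sketch of the two equivalent simulation routes in R: summing Bernoulli draws, or drawing the binomial count directly with rbinom:

```r
set.seed(1)
n <- 10; p <- 0.5

x1 <- sum(rbinom(n, size = 1, prob = p))  # route 1: sum of n Bernoulli(p) draws
x2 <- rbinom(1, size = n, prob = p)       # route 2: one Binomial(n, p) draw
c(x1, x2)  # both are counts between 0 and n
```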

A Binomial random variable

\[X = X_1 + X_2 + \cdots + X_n \sim B(n, p)\]

  • \(X_i \sim \text{Bernoulli}(p)\) where \(p\) is the probability of success,
  • \(X_i = 1\) if \(i\)-th trial is a success, otherwise \(X_i = 0\),
  • all the trials are independent and \(p\) is constant for all trials,
  • \(X \in \{0, 1, \ldots, n\}\) is the number of successes out of \(n\) trials.
  • A Bernoulli random variable is a special case of a Binomial random variable with \(n = 1\).
  • Expected value: \(E(X) = np\)
  • Variance: \(\text{Var}(X) = np(1-p)\)
  • Standard deviation: \(\text{SD}(X) = \sqrt{np(1-p)}\)

Probability mass function:

\[P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}\]

Binomial probability and distribution function

Suppose I flip an unbiased coin 10 times (so \(n = 10, p = 0.5\)).

  • What is the probability that exactly 3 are heads?
  • What is the probability that there are 3 or less heads?
  • What is the probability that there are 3 or more heads?
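These three probabilities can be computed with R's binomial functions (dbinom for the pmf, pbinom for the cdf):

```r
n <- 10; p <- 0.5

dbinom(3, size = n, prob = p)      # P(X = 3)  = choose(10, 3) / 2^10 ≈ 0.117
pbinom(3, size = n, prob = p)      # P(X <= 3) ≈ 0.172
1 - pbinom(2, size = n, prob = p)  # P(X >= 3) = 1 - P(X <= 2) ≈ 0.945
```

Note the third probability uses the complement: "3 or more" means everything except 0, 1, or 2 heads.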

Simulating Binomial random variables


Simulating coin flips and counting the number of heads

Use in-silico experiments to understand statistics

  • Computer-based simulations are “cheap”.
  • Understand how statistics behave under known data-generating process.
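As an in-silico experiment, we can repeat the 10-coin-flip experiment many times and compare the simulated mean and variance of the head count with the theoretical values \(np = 5\) and \(np(1-p) = 2.5\):

```r
set.seed(2024)
heads <- rbinom(10000, size = 10, prob = 0.5)  # 10,000 replicates of 10 flips

mean(heads)          # close to np = 5
var(heads)           # close to np(1-p) = 2.5
table(heads) / 10000 # empirical pmf; compare with dbinom(0:10, 10, 0.5)
```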

Summary

\[X \sim B(n, p)\]

  • \(X\) is the number of successes in \(n\) independent Bernoulli trials with probability of success, \(p\).
  • Expected value: \(E(X) = np\).
  • Variance: \(\text{Var}(X) = np(1-p)\).
  • Probability mass function: \(P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}\) for \(x = 0, 1, \ldots, n\).

Poisson distribution

Poisson distribution

  • If \(X \sim B(n, p)\) and \(n \to \infty\), \(p \to 0\) such that \(np = \lambda\) (a constant), then

    \[ P(X = k) = \lim_{n \to \infty} \binom{n}{k} p^k (1 - p)^{n - k} = \frac{e^{-\lambda} \lambda^k}{k!}. \]

  • This limiting distribution is called the Poisson distribution with parameter \(\lambda\) written as \(X \sim \mathrm{Poisson}(\lambda)\).

  • \(E(X) = \lambda\) and \(\mathrm{Var}(X) = \lambda\).

Poisson distribution in practice

  • The Poisson distribution is often used to model the number of times a discrete event occurs within a given, fixed interval of time, distance, area, or volume.
  • For example, it can be used to model:
    • the number of emails received per hour, or
    • the number of accidents at an intersection per month.
  • The parameter \(\lambda\) is the expected number of occurrences in the interval.

If a coffee shop has an average of 75 customers per hour then assume the number of customers \(X \sim \mathrm{Poisson}(75)\). Then probability of exactly 70 customers arriving in an hour is \[ P(X=70) = \frac{75^{70}e^{-75}}{70!} \approx 0.0402. \]

Poisson distribution in R

  • Suppose \(X \sim \mathrm{Poisson}(75)\).
  • Then \(P(X = 70)\):
  • Then \(P(X \leq 70)\):
  • To simulate 1000 random variables from \(X \sim \mathrm{Poisson}(75)\):
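The three bullets above correspond to dpois, ppois, and rpois respectively; a minimal sketch:

```r
dpois(70, lambda = 75)  # P(X = 70), ≈ 0.0402 (matches the hand calculation)
ppois(70, lambda = 75)  # P(X <= 70), the cumulative probability

set.seed(1)
x <- rpois(1000, lambda = 75)  # 1000 simulated values of X
mean(x)                        # close to E(X) = lambda = 75
var(x)                         # close to Var(X) = lambda = 75
```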

Summary

\[X \sim \mathrm{Poisson}(\lambda)\]

  • \(P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}\) for \(k = 0, 1, 2, \ldots\).
  • \(E(X) = \lambda\) and \(\mathrm{Var}(X) = \lambda\).
  • Poisson distribution is the limiting distribution of the Binomial distribution when \(n \to \infty\), \(p \to 0\) such that \(np = \lambda\) (a constant).
  • Poisson distribution is often used to model the number of times a discrete event occurs within a given, fixed interval of time, distance, area, or volume.