Conditional probability

STAT1003 – Statistical Techniques

Dr. Emi Tanaka

Australian National University


SMS spam classification

  • A large language model (LLM) was used to classify whether an SMS message is spam or ham (legitimate).

Example spam SMS

Last chance 2 claim ur £150 worth of discount vouchers-Text YES to 85023 now!SavaMob-member offers mobile T Cs 08717898035. £3.00 Sub. 16 . Remove txt X or STOP

Example SMS (not spam)

I dun believe u. I thk u told him.

  • The results are summarized in the contingency table (also referred to as a confusion matrix in this case) below, where rows correspond to the prediction and columns to the truth.
                  Truth
Prediction    ham      spam     Total
ham           2,280    368      2,648
spam          1,843    282      2,125
Total         4,123    650      4,773
  • What is the probability that a randomly selected SMS message was classified as spam by the LLM?
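The answer can be read off the contingency table; as a quick check, here is a short computation from the raw counts (a minimal sketch in Python, using only the counts shown above):

```python
# Counts from the contingency table (rows = Prediction, columns = Truth)
counts = {
    ("ham", "ham"): 2280,   # predicted ham, truly ham
    ("ham", "spam"): 368,   # predicted ham, truly spam
    ("spam", "ham"): 1843,  # predicted spam, truly ham
    ("spam", "spam"): 282,  # predicted spam, truly spam
}

total = sum(counts.values())  # 4,773 messages in all

# P(predicted as spam) = (number of messages predicted spam) / total
p_pred_spam = (counts[("spam", "ham")] + counts[("spam", "spam")]) / total
print(round(p_pred_spam, 3))  # → 0.445 (i.e. 2,125 / 4,773)
```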

Relative frequency table

  • Relative to what?


Table 1

                  Truth
Prediction    ham      spam     Total
ham           0.861    0.139    1.000
spam          0.867    0.133    1.000

Table 2

                  Truth
Prediction    ham      spam
ham           0.553    0.566
spam          0.447    0.434
Total         1.000    1.000

Table 3

                  Truth
Prediction    ham      spam     Total
ham           0.478    0.077    0.555
spam          0.386    0.059    0.445
Total         0.864    0.136    1.000
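The three tables differ only in what the counts are divided by. They can be reproduced by normalising the raw counts by row totals, column totals, or the grand total respectively (a minimal sketch, assuming the counts from the contingency table above):

```python
counts = [[2280, 368],   # row: predicted ham;  columns: truly ham, truly spam
          [1843, 282]]   # row: predicted spam

grand = sum(sum(row) for row in counts)                     # 4,773
col_totals = [sum(row[j] for row in counts) for j in range(2)]

# Table 1: each row divided by its row total
table1 = [[x / sum(row) for x in row] for row in counts]
# Table 2: each column divided by its column total
table2 = [[row[j] / col_totals[j] for j in range(2)] for row in counts]
# Table 3: everything divided by the grand total
table3 = [[x / grand for x in row] for row in counts]

print([[round(x, 3) for x in row] for row in table1])
# → [[0.861, 0.139], [0.867, 0.133]]
```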

Joint distribution table

A joint probability of events \(A\) and \(B\) is the probability that both events occur together, denoted by \(P(A \cap B)\).

  • Suppose that we randomly select an SMS message from the dataset.
  • What are the probabilities:
    • \(P(\text{SMS is predicted as ham} \cap \text{SMS is a ham})\)
    • \(P(\text{SMS is predicted as ham} \cap \text{SMS is a spam})\)
    • \(P(\text{SMS is predicted as spam} \cap \text{SMS is a ham})\)
    • \(P(\text{SMS is predicted as spam} \cap \text{SMS is a spam})\)
                  Truth
Prediction    ham      spam     Total
ham           0.478    0.077    0.555
spam          0.386    0.059    0.445
Total         0.864    0.136    1.000

Marginal probability

A marginal probability is the probability of a single event occurring, regardless of the outcomes of other variables.

  • Marginal probabilities are calculated by either:
    • adding across the rows of the table, or
    • adding down the columns of the table.
                  Truth
Prediction    ham      spam     Total
ham           0.478    0.077    0.555
spam          0.386    0.059    0.445
Total         0.864    0.136    1.000

\[P(\underbrace{\text{predicted as ham}}_{\large A})=P(A \cap \underbrace{\text{is a ham}}_{\large B})+P(A \cap \underbrace{\text{is a spam}}_{\large B^c})\]

  • Note here \(B \cup B^c = S\) (sample space).

Law of total probability

Suppose that the events \(B_{1}, B_{2}, \ldots, B_{n}\) are a partition of the sample space. That is:

  • \(B_{1}, B_{2}, \ldots, B_{n}\) are mutually exclusive
  • \(B_{1} \cup B_{2} \cup \ldots \cup B_{n}=S\).

Then for any event \(A\), the following is true: \[P(A)=\sum_{i=1}^{n} P\left(A \cap B_{i}\right).\]

This is referred to as the law of total probability.
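As a quick numerical check, the partition {truly ham, truly spam} applied to the joint table above recovers the marginal \(P(\text{predicted as ham})\) (a sketch using the rounded joint probabilities):

```python
# Joint probabilities from the table (rows = Prediction, columns = Truth)
p_joint = {("ham", "ham"): 0.478, ("ham", "spam"): 0.077,
           ("spam", "ham"): 0.386, ("spam", "spam"): 0.059}

# Law of total probability: sum the joint probabilities over the partition by Truth
p_pred_ham = p_joint[("ham", "ham")] + p_joint[("ham", "spam")]
print(round(p_pred_ham, 3))  # → 0.555
```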

Conditional probability

For two events \(A\) and \(B\), the conditional probability of \(A\) given that \(B\) has occurred, denoted by \(P(A \mid B)\), is defined to be:

\[P(A \mid B)=\frac{P(A \cap B)}{P(B)}\]

\[P(\text{Truth} \mid \text{Prediction})\]

                  Truth
Prediction    ham      spam     Total
ham           0.861    0.139    1.000
spam          0.867    0.133    1.000

\[P(\text{Prediction} \mid \text{Truth})\]

                  Truth
Prediction    ham      spam
ham           0.553    0.566
spam          0.447    0.434
Total         1.000    1.000

Note that \(P(A \mid B) \ne P(B \mid A)\) in general.
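The asymmetry is easy to see numerically: the same joint probability divided by two different marginals gives two different answers (a sketch using the rounded values from the joint table):

```python
p_joint = 0.059          # P(predicted spam AND truly spam)
p_pred_spam = 0.445      # marginal P(predicted spam), a row total
p_is_spam = 0.136        # marginal P(truly spam), a column total

# P(truly spam | predicted spam) vs P(predicted spam | truly spam)
print(round(p_joint / p_pred_spam, 3))  # → 0.133
print(round(p_joint / p_is_spam, 3))    # → 0.434
```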

Challenge

If an SMS is predicted to be spam, what is the probability that the prediction is correct?

You are rolling a die. You are told you rolled an even number.

  • What is the probability that you rolled a 2?
  • What is the probability that you rolled a big number (4,5,6)?
                  Truth
Prediction    ham      spam     Total
ham           0.478    0.077    0.555
spam          0.386    0.059    0.445
Total         0.864    0.136    1.000

Solution:

  • \(P(\text{is a spam} \mid \text{predicted as spam}) = \dfrac{P(\text{is a spam} \cap \text{predicted as spam})}{P(\text{predicted as spam})} = \dfrac{0.059}{0.445} \approx 0.133.\)
  • \(P(\text{is a 2} \mid \text{is even}) = \dfrac{P(\text{is a 2} \cap \text{is even})}{P(\text{is even})} = \dfrac{\frac{1}{6}}{\frac{1}{2}} = \dfrac{1}{3}.\)
  • \(P(\text{is a big number} \mid \text{is even}) = \dfrac{P(\text{is a big number} \cap \text{is even})}{P(\text{is even})} = \dfrac{\frac{2}{6}}{\frac{1}{2}} = \dfrac{2}{3}.\)
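The die answers can be verified by enumerating the six equally likely outcomes (a minimal sketch; the helper `p_given` is just for illustration):

```python
from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]  # faces of a fair die, all equally likely
even = {2, 4, 6}
big = {4, 5, 6}

def p_given(event, given):
    """Conditional probability by counting equally likely outcomes."""
    both = sum(1 for o in outcomes if o in event and o in given)
    return Fraction(both, sum(1 for o in outcomes if o in given))

print(p_given({2}, even))  # → 1/3
print(p_given(big, even))  # → 2/3
```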

Independent events

Two events \(A\) and \(B\) are independent if and only if:

\[P(A \cap B)=P(A) \times P(B).\]

  • Notice if \(A\) and \(B\) are independent: \[\begin{equation*} \begin{array}{l} P(A \mid B)=\frac{P(A \cap B)}{P(B)}=\frac{P(A) \times P(B)}{P(B)}=P(A) \\ P(B \mid A)=\frac{P(B \cap A)}{P(A)}=\frac{P(B) \times P(A)}{P(A)}=P(B) \end{array} \end{equation*}\]

  • That is, two events are independent if the probability of one event occurring is not affected by the occurrence of the other event.

Challenge

Consider rolling two dice, what is the probability of getting two 1s?

If we shuffle up a deck of cards and draw one, is the event that the card is a heart independent of the event that the card is an ace?

A streaming service reports that 25% of its users watch documentaries and 60% of its users watch movies. If 15% of users watch both documentaries and movies, are the events “a user watches documentaries” and “a user watches movies” independent?
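Both the card and the streaming-service questions reduce to checking whether \(P(A\cap B) = P(A)\times P(B)\) (a sketch with the probabilities implied by the questions above):

```python
from fractions import Fraction

# Cards: P(heart) = 13/52, P(ace) = 4/52, P(heart AND ace) = 1/52
p_heart, p_ace, p_heart_ace = Fraction(13, 52), Fraction(4, 52), Fraction(1, 52)
print(p_heart_ace == p_heart * p_ace)   # → True, so the events are independent

# Streaming: P(documentaries) = 0.25, P(movies) = 0.60, P(both) = 0.15
print(abs(0.15 - 0.25 * 0.60) < 1e-12)  # → True, also independent
```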

Multiplication rule

  • Using the definition of conditional probability we have \[P(A \mid B)=\frac{P(A \cap B)}{P(B)}\]

For two events \(A\) and \(B\), the multiplication rule states that \[\begin{align*} P(A\cap B) &= P(A\mid B)\times P(B)\\ &= P(B\mid A)\times P(A) \end{align*}\]

  • When \(A\) and \(B\) are independent, the multiplication rule is simply \[P(A\cap B) = P(A) \times P(B).\]
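The rule can be checked against the SMS table: multiplying the conditional probability from the earlier challenge by the marginal recovers the joint probability (a sketch using the rounded table values):

```python
# Multiplication rule with the SMS table values:
# P(predicted spam AND truly spam) = P(truly spam | predicted spam) * P(predicted spam)
p_spam_given_pred = 0.133  # from the challenge solution
p_pred_spam = 0.445        # marginal from the joint table
print(round(p_spam_given_pred * p_pred_spam, 3))  # → 0.059, the joint probability
```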

Smallpox in Boston, 1721

  • The smallpox data set provides a sample of 6,224 individuals from the year 1721 who were exposed to smallpox in Boston.
  • Doctors at the time believed that inoculation, which involves exposing a person to the disease in a controlled form, could reduce the likelihood of death.
  • The counts of the result (died or lived) by inoculation status (yes or no) are shown in the contingency table below.
                Inoculated
Result        no       yes      Total
died          844      6        850
lived         5,136    238      5,374
Total         5,980    244      6,224

What is the probability that a randomly selected person who was not inoculated died from smallpox?

\[P(\text{died}\mid \text{not inoculated}) = \frac{P(\text{died}\cap \text{not inoculated})}{P(\text{not inoculated})} = \frac{844/6224}{5980/6224} = 0.1411.\]

What is the probability that an inoculated person died from smallpox?

\[P(\text{died}\mid \text{inoculated}) = \frac{P(\text{died}\cap \text{inoculated})}{P(\text{inoculated})} = \frac{6/6224}{244/6224} = 0.0246.\]
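Both conditional probabilities can be computed directly from the raw counts, since the common factor of 6,224 cancels (a minimal sketch):

```python
# Smallpox counts (rows = Result, columns = Inoculated)
died_no, died_yes = 844, 6
lived_no, lived_yes = 5136, 238

total_no = died_no + lived_no     # 5,980 not inoculated
total_yes = died_yes + lived_yes  # 244 inoculated

print(round(died_no / total_no, 4))    # → 0.1411, P(died | not inoculated)
print(round(died_yes / total_yes, 4))  # → 0.0246, P(died | inoculated)
```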

Sum of conditional probabilities

Let \(A_{1}, \ldots, A_{n}\) represent a set of disjoint and exhaustive events, i.e.,

  • \(A_{1} \cup A_{2} \cup \ldots \cup A_{n}=S\), and
  • \(A_i \cap A_j = \emptyset\) for \(i\neq j\).

Then for event \(B\),

\[ P\left(A_{1} \mid B\right)+\cdots+P\left(A_{n} \mid B\right) = 1 \]

  • One special case: \[ P(A \mid B)=1-P\left(A^{c} \mid B\right) \]
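This can be verified with the SMS joint table: conditioned on "predicted spam", the truth must be either ham or spam, so the two conditional probabilities sum to one (a sketch using the rounded table values):

```python
p_pred_spam = 0.445  # marginal P(predicted spam) from the joint table
p_spam_given_pred = 0.059 / p_pred_spam  # P(truly spam | predicted spam)
p_ham_given_pred = 0.386 / p_pred_spam   # P(truly ham  | predicted spam)
print(round(p_spam_given_pred + p_ham_given_pred, 10))  # → 1.0
```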

Bayes’ theorem

For two events \(A\) and \(B\), Bayes’ theorem states \[P(A\mid B) = \frac{P(B\mid A)P(A)}{P(B)}.\]

Heroin test example

  • Joe is a randomly chosen member of a large population in which 3% are heroin users.
  • Joe tests positive for heroin in a drug test that correctly identifies users 95% of the time and correctly identifies non-users 90% of the time.
  • Determine the probability that Joe uses heroin given the positive test result.
  • First write down the given information as probabilities:
    • \(P(\text{User}) = 0.03\)
    • \(P(\text{Positive Test}\mid \text{User}) = 0.95\)
    • \(P(\text{Negative Test}\mid \text{User}^c) = 0.9\)
  • We want \(P(\text{User} \mid \text{Positive Test})\).
  • By Bayes’ theorem, \(P(\text{User}\mid \text{Positive Test})\)

\[\begin{align*} &= \frac{P(\text{User}\cap \text{Positive Test})}{P(\text{Positive Test})}\\ &= \frac{P(\text{Positive Test}\mid \text{User})P(\text{User})}{P(\text{Positive Test}\mid \text{User})P(\text{User}) + P(\text{Positive Test}\mid \text{User}^c)P(\text{User}^c)}\\ &= \frac{0.95\times 0.03}{0.95\times 0.03+(1-0.9)\times (1-0.03)}\\ &= 0.2271 \end{align*}\]
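The calculation above can be checked numerically (a minimal sketch with the rates stated in the example):

```python
p_user = 0.03
p_pos_given_user = 0.95     # test correctly identifies users
p_neg_given_nonuser = 0.90  # test correctly identifies non-users

# Law of total probability for the denominator P(Positive Test)
p_pos = (p_pos_given_user * p_user
         + (1 - p_neg_given_nonuser) * (1 - p_user))

# Bayes' theorem: P(User | Positive Test)
p_user_given_pos = p_pos_given_user * p_user / p_pos
print(round(p_user_given_pos, 4))  # → 0.2271
```

Despite the positive result, the probability Joe is a user is only about 23%, because non-users vastly outnumber users in the population.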

Summary

  • Joint probability, \(P(A\cap B)\), describes the probability of two events, \(A\) and \(B\), occurring together.
  • Marginal probability, \(P(A)\), describes the probability of \(A\) occurring, regardless of \(B\).
  • Conditional probability, \(P(A\mid B) = \dfrac{P(A\cap B)}{P(B)}\), describes the probability of an event occurring given that another event has occurred.
  • Events \(A\) and \(B\) are independent if and only if \(P(A\cap B) = P(A)P(B)\).
  • Law of total probability: if \(B_1, B_2, \ldots, B_n\) are a partition of the sample space, then \(P(A) = \sum_{i=1}^n P(A\cap B_i)\).
  • Bayes’ theorem: \[P(A\mid B) = \frac{P(B\mid A)P(A)}{P(B)}\]