Probability¶
Fundamentals¶
Probability theory provides the formal framework for reasoning under uncertainty.
Key notation:
\(P(X)\) — prior probability of event \(X\)
\(P(X|Y)\) — conditional probability of \(X\) given \(Y\)
\(P(\neg X) = 1 - P(X)\) — complement rule
Negation of conditionals:
The negation applies only to the query variable. You cannot negate the conditioning variable and assume probabilities sum to 1: \(P(X|Y) + P(X|\neg Y) \neq 1\) in general.
Independence¶
Two events \(X, Y\) are independent iff:
Equivalently, \(P(X|Y) = P(X)\).
Total Probability¶
For any random variable \(Y\) and a partition \(\{X_i\}\):
This is the primary tool for computing marginal probabilities when direct computation is intractable.
Bayes Rule¶
The central identity for diagnostic reasoning:
Components:
\(P(A)\) — prior (belief before evidence)
\(P(B|A)\) — likelihood (probability of evidence given cause)
\(P(B)\) — marginal likelihood (normalizer)
\(P(A|B)\) — posterior (updated belief after evidence)
Bayes rule converts diagnostic reasoning (evidence → cause) into causal reasoning (cause → evidence), corrected by the prior and normalized by the evidence.
The normalizer is typically expanded via total probability:
Conditional Probability Example¶
Given:
P(X) = 0.03
P(¬Y | X) = 0.01
P(Y | ¬X) = 0.1
P(X | Y) = ?
P(Y | X) = 1 - P(¬Y | X) = 1 - 0.01 = 0.99
P(Y) = P(Y|X)·P(X) + P(Y|¬X)·P(¬X)
= 0.99 × 0.03 + 0.1 × 0.97
= 0.1267
P(X | Y) = (0.99 × 0.03) / 0.1267
= 0.2344
Cancer Test Example¶
Setup: \(P(C) = 0.01\), \(P(+|C) = 0.9\), \(P(+|\neg C) = 0.2\).
P(C | +) = P(+ | C) · P(C) / P(+)
= (0.9 × 0.01) / (0.009 + 0.198)
= 0.009 / 0.207
= 0.0435
Despite a 90% sensitive test, the posterior \(P(C|+) \approx 0.043\) because the prior \(P(C) = 0.01\) is very small — the base rate dominates.
Bayes Networks Overview¶
A Bayes network is a compact representation of a joint probability distribution over many variables. It encodes conditional independence relationships as a directed acyclic graph (DAG).
Nodes represent random variables
Edges represent direct probabilistic dependencies
Each node stores a conditional probability table (CPT) given its parents
Given observed evidence variables, we can compute posterior distributions over unobserved query variables using the network structure.