Probability Foundations: From Counting to Continuous Distributions
Distribution Reference
Discrete

| Name | Range | P(X = k) | Mean | Variance |
| --- | --- | --- | --- | --- |
| Bernoulli(p) | {0,1} | P(1) = p; P(0) = 1−p | p | p(1−p) |
| Binomial(n,p) | {0,1,…,n} | C(n,k) p^k (1−p)^(n−k) | np | np(1−p) |
| Hypergeometric(n,N,G) | {0,…,n} | C(G,k) C(N−G, n−k) / C(N,n) | nG/N | n(G/N)(1−G/N)(N−n)/(N−1) |
| Geometric(p), counting trials | {1,2,3,…} | (1−p)^(k−1) p | 1/p | (1−p)/p² |
| Geometric(p), counting failures | {0,1,2,…} | (1−p)^k p | (1−p)/p | (1−p)/p² |
| Neg. Binomial(r,p) | {0,1,2,…} | C(k+r−1, r−1) p^r (1−p)^k | r(1−p)/p | r(1−p)/p² |
| Poisson(μ) | {0,1,2,…} | e^(−μ) μ^k / k! | μ | μ |
Continuous

| Name | Range | Density f(x) | CDF F(x) | Mean | Variance |
| --- | --- | --- | --- | --- | --- |
| Uniform(a,b) | (a,b) | 1/(b−a) | (x−a)/(b−a) | (a+b)/2 | (b−a)²/12 |
| Normal(μ,σ²) | (−∞,∞) | (2πσ²)^(−1/2) e^(−(x−μ)²/(2σ²)) | Φ((x−μ)/σ) | μ | σ² |
| Exponential(λ) | (0,∞) | λ e^(−λx) | 1 − e^(−λx) | 1/λ | 1/λ² |
| Gamma(r,λ), integer r | (0,∞) | λ^r x^(r−1) e^(−λx) / Γ(r) | 1 − e^(−λx) ∑_{k=0}^{r−1} (λx)^k / k! | r/λ | r/λ² |
1. Foundations of Probability
1.1 Core Rules
Complement rule. P(A^c) = 1 − P(A). Any “at least one” problem is best attacked via the complement.
Inclusion-exclusion. For two events:
P(A∪B)=P(A)+P(B)−P(A∩B).
For three events:
P(A∪B∪C) = P(A) + P(B) + P(C) − P(A∩B) − P(A∩C) − P(B∩C) + P(A∩B∩C).
Independence. Events A,B are independent iff P(A∩B)=P(A)P(B). Independence and disjointness are entirely different concepts: disjoint events with positive probability are never independent.
1.2 Conditional Probability and Bayes’ Rule
P(A∣B) = P(A∩B)/P(B),   and Bayes’ rule: P(A∣B) = P(B∣A) P(A) / P(B).
Law of total probability. If {Ak} partitions the sample space:
P(B) = ∑_k P(B∣Ak) P(Ak).
A useful distinction from the review lectures: a forward conditional (cause given, find effect) needs no Bayes’ rule, while a backward conditional (effect given, find cause) does.
1.3 Counting Techniques
Multinomial coefficient. The number of ways to partition n objects into groups of sizes n1,…,nm:
C(n; n1, n2, …, nm) = n! / (n1! n2! ⋯ nm!),   where ∑ ni = n.
Slot method. To count arrangements with adjacency constraints, fix one type first, then place the other into the resulting gaps. Example: 8 coin tosses known to contain exactly 3 heads. Placing the 5 tails creates 6 slots (gaps and ends); choose 3 of them for the heads. Probability that no two heads are consecutive:
C(6,3) / C(8,3) = 20/56 = 5/14.
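The slot-method count can be checked by brute-force enumeration (a quick sketch of the 8-toss, 3-head example above):

```python
from itertools import combinations

# Enumerate all placements of 3 heads among 8 tosses and count those
# with no two heads adjacent; the slot method predicts C(6,3)/C(8,3).
total = 0
non_adjacent = 0
for heads in combinations(range(8), 3):
    total += 1
    if all(b - a > 1 for a, b in zip(heads, heads[1:])):
        non_adjacent += 1

prob = non_adjacent / total
print(non_adjacent, total, prob)  # 20 of 56 placements, probability 5/14
```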
Conditional sample space. When conditioning restricts the outcome space, recompute as if the restricted space is the full space. Example: 20-sided die given no face exceeds 12 becomes a uniform 12-sided die.
1.4 Conditional Distributions
Poisson conditional on sum. If X1∼Pois(μ1), X2∼Pois(μ2) are independent, then:
X1 ∣ (X1 + X2 = m) ∼ Bin(m, μ1/(μ1 + μ2)).
Interpretation: given m total arrivals in a Poisson process with two types, the count of type 1 is Binomial with success probability equal to the rate proportion.
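The identity can be verified term by term from the pmfs (a numeric sketch; the values μ1 = 2, μ2 = 3, m = 7 are arbitrary):

```python
from math import comb, exp, factorial, isclose

def pois_pmf(mu, k):
    return exp(-mu) * mu**k / factorial(k)

mu1, mu2, m = 2.0, 3.0, 7
p = mu1 / (mu1 + mu2)

# P(X1 = k | X1 + X2 = m) from the joint pmf must equal the
# Binomial(m, p) pmf for every k.
for k in range(m + 1):
    lhs = pois_pmf(mu1, k) * pois_pmf(mu2, m - k) / pois_pmf(mu1 + mu2, m)
    rhs = comb(m, k) * p**k * (1 - p)**(m - k)
    assert isclose(lhs, rhs)
print("conditional pmf matches Binomial(m, mu1/(mu1+mu2))")
```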
2. Discrete Distribution Patterns
2.1 Distribution Selection
| Scenario | Distribution |
| --- | --- |
| Fixed n independent trials, count successes | Binomial(n,p) |
| Without replacement from finite population, count successes | Hypergeometric(n,N,G) |
| Independent trials until first success | Geometric(p) |
| Independent trials until rth success (count failures) | Neg. Binomial(r,p) |
| Count of events in a Poisson process interval | Poisson(λt) |
| n large, p small, np moderate | Poisson(μ=np) approx to Binomial |
| N ≫ n | Binomial(n, G/N) approx to Hypergeometric |
2.2 Approximation Methods
Normal approximation to Binomial. When np and n(1−p) are both large:
X∼Bin(n,p)≈N(np,np(1−p)).
Continuity correction for integer-valued X:
P(X = k) ≈ Φ((k + 0.5 − μ)/σ) − Φ((k − 0.5 − μ)/σ).
Poisson approximation to Binomial. When n is large, p is small, and μ=np is moderate:
Bin(n,p)≈Pois(μ).
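Both approximations can be compared against exact Binomial probabilities (a sketch; the parameter choices here are illustrative, not from the text):

```python
from math import comb, erf, exp, factorial, sqrt

def binom_pmf(n, p, k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def phi(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1 + erf(z / sqrt(2)))

# Poisson regime: n large, p small, mu = np moderate.
mu = 100 * 0.03
exact_pois_regime = binom_pmf(100, 0.03, 2)
pois_approx = exp(-mu) * mu**2 / factorial(2)

# Normal regime with continuity correction: np and n(1-p) both large.
m, s = 50 * 0.4, sqrt(50 * 0.4 * 0.6)
exact_norm_regime = binom_pmf(50, 0.4, 20)
norm_approx = phi((20 + 0.5 - m) / s) - phi((20 - 0.5 - m) / s)

print(exact_pois_regime, pois_approx)   # both near 0.224
print(exact_norm_regime, norm_approx)   # both near 0.115
```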
2.3 Coupon Collector Pattern
Collecting n distinct types drawn uniformly at random. After i types collected, the wait for type i+1 is:
Wi ∼ Geom((n − i)/n),   i = 0, 1, …, n−1.
The waiting times W0,W1,…,Wn−1 are mutually independent, so:
Var(∑Wi)=∑Var(Wi).
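The mean and variance of the total collection time follow directly from these geometric waits (a sketch; n = 6, collecting all faces of a die, is an assumed example):

```python
# Total time to collect all n types: sum of independent Wi ~ Geom((n-i)/n),
# so the mean is the sum of 1/p_i and the variance the sum of (1-p_i)/p_i^2.
def collector_stats(n):
    ps = [(n - i) / n for i in range(n)]
    mean = sum(1 / p for p in ps)
    var = sum((1 - p) / p**2 for p in ps)
    return mean, var

mean, var = collector_stats(6)
print(mean, var)  # mean = 6*(1 + 1/2 + ... + 1/6) = 14.7
```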
3. Expectation and Variance
3.1 Expectation
E[X] = ∑_x x P(X = x).
LOTUS (Law of the Unconscious Statistician):
E[g(X)] = ∑_x g(x) P(X = x).
Note that E[g(X)] ≠ g(E[X]) in general.
Linearity. For any random variables (no independence required):
E[∑_{i=1}^{n} ai Xi + b] = ∑_{i=1}^{n} ai E[Xi] + b.
3.2 LOTUS with Poisson Series Manipulation
For X ∼ Pois(μ) and g(x) = 1/((x+1)(x+2)⋯(x+n)), find E[g(X)].
Step 1. Apply LOTUS:
E[g(X)] = ∑_{k=0}^{∞} 1/((k+1)(k+2)⋯(k+n)) · e^(−μ) μ^k / k!.
Step 2. Combine denominators: (k+1)(k+2)⋯(k+n) · k! = (k+n)!, so:
= ∑_{k=0}^{∞} e^(−μ) μ^k / (k+n)!.
Step 3. Multiply and divide by μ^n, then substitute j = k+n:
= (1/μ^n) ∑_{j=n}^{∞} e^(−μ) μ^j / j! = (1/μ^n) P(X ≥ n).
Result:
E[g(X)] = (1/μ^n) [1 − ∑_{j=0}^{n−1} e^(−μ) μ^j / j!].
The key recognition: ∑_{j=n}^{∞} e^(−μ) μ^j / j! is the upper tail probability P(X ≥ n) of the Poisson distribution.
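The identity can be confirmed numerically by truncating the LOTUS series (a sketch; μ = 2.5 and n = 3 are arbitrary choices):

```python
from math import exp, factorial

mu, n = 2.5, 3

# Left side: E[1/((X+1)...(X+n))] by LOTUS, truncated where terms are negligible.
# Note (k+1)(k+2)...(k+n) = (k+n)!/k!.
lhs = sum(
    exp(-mu) * mu**k / factorial(k) / (factorial(k + n) // factorial(k))
    for k in range(80)
)

# Right side: P(X >= n) / mu^n, with the tail written via the complement.
rhs = (1 - sum(exp(-mu) * mu**j / factorial(j) for j in range(n))) / mu**n
print(lhs, rhs)  # the two sides agree
```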
3.3 Indicator Method
If X counts how many of the events A1,…,An occur, decompose X = I1 + ⋯ + In with Ij = 1(Aj). Then:
E[X] = ∑_{j=1}^{n} P(Aj).
No independence is required. This is the single most common technique on exams.
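A standard instance is the matching problem (a sketch; exhaustive enumeration for n = 6 is an assumed illustration):

```python
from itertools import permutations

# X = number of fixed points of a uniform random permutation.
# Indicators give E[X] = sum_j P(position j matches) = n * (1/n) = 1,
# even though the indicators are dependent.
n = 6
perms = list(permutations(range(n)))
mean_matches = sum(sum(p[i] == i for i in range(n)) for p in perms) / len(perms)
print(mean_matches)  # 1.0 exactly, for any n
```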
3.4 Variance
Var(X) = E[X²] − (E[X])²,   Var(aX + b) = a² Var(X).
For independent X,Y: Var(X+Y)=Var(X)+Var(Y).
For general (possibly dependent) X,Y: Var(X+Y)=Var(X)+Var(Y)+2Cov(X,Y).
For an indicator sum X = I1 + ⋯ + In, E[X²] = ∑_i P(Ai) + ∑_{i≠j} P(Ai ∩ Aj), since Ii² = Ii and Ii Ij = 1(Ai ∩ Aj). The cross terms P(Ai ∩ Aj) may differ depending on the structural relationship between indicators (e.g., adjacent vs. non-adjacent pairs).
4. Probability Inequalities
4.1 Markov’s Inequality
For X≥0:
P(X ≥ a) ≤ E[X]/a.
When X takes values in [c,d], apply Markov to X−c≥0:
P(X ≥ a) = P(X − c ≥ a − c) ≤ (E[X] − c)/(a − c).
4.2 Chebyshev’s Inequality
For any random variable X with finite variance:
P(|X − μ| ≥ kσ) ≤ 1/k².
Equivalently: P(|X − μ| ≥ a) ≤ Var(X)/a².
One-sided bound. {X ≥ a} ⊆ {|X − μ| ≥ a − μ} for a > μ, so:
P(X ≥ a) ≤ Var(X)/(a − μ)².
Tighter bound with side information. Chebyshev bounds both tails simultaneously:
P(X ≥ μ + kσ) + P(X ≤ μ − kσ) ≤ 1/k²,
so if a lower bound on P(X ≤ μ − kσ) is known, it can be subtracted to sharpen the bound on the other tail.
Symmetry refinement. If X is symmetric about μ, then P(X≥μ+a)=P(X≤μ−a), and the Chebyshev bound can be halved:
P(X ≥ μ + a) ≤ (1/2) · Var(X)/a².
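To see how loose these bounds can be, compare against an exact Binomial tail (a sketch; Bin(100, 0.5) and threshold a = 65 are assumed for illustration):

```python
from math import comb

n, p = 100, 0.5
mu, var = n * p, n * p * (1 - p)   # mean 50, variance 25

a = 65
# Exact upper tail of Bin(100, 0.5), computed in exact integer arithmetic.
exact_tail = sum(comb(n, k) for k in range(a, n + 1)) / 2**n
# One-sided bound from {X >= a} subset of {|X - mu| >= a - mu}.
one_sided = var / (a - mu) ** 2
print(exact_tail, one_sided)       # the exact tail is far below the bound
```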
5. Continuous Distributions
5.1 PDF and CDF
For a continuous random variable X with density f and CDF F:
P(a ≤ X ≤ b) = ∫_a^b f(x) dx = F(b) − F(a),   f(x) = F′(x).
Note that f(x) is a density, not a probability. Its units are 1/(units of X). Also, P(X=a)=0 for any specific value a.
5.2 Exponential Distribution
The exponential distribution Exp(λ) has the survival function:
P(X > t) = e^(−λt),   t > 0.
Memoryless property:
P(X>s+t∣X>s)=P(X>t).
This is the defining characteristic: the conditional distribution of remaining lifetime, given survival to time s, is the same as the original distribution.
5.3 Minimum and Maximum of Independent Variables
Minimum. M = min(X1,…,Xn), all independent:
P(M > t) = ∏_{i=1}^{n} P(Xi > t).
Competing exponentials. If Xi∼Exp(λi) independently:
min(X1,…,Xn)∼Exp(λ1+⋯+λn).
Maximum. Y = max(X1,…,Xn), all independent:
P(Y ≤ t) = ∏_{i=1}^{n} P(Xi ≤ t).
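The competing-exponentials fact can be spot-checked by simulation (a sketch; the rates and seed are arbitrary):

```python
import random

random.seed(0)
rates = [0.5, 1.0, 2.5]
n_trials = 100_000

# Simulate the min of independent exponentials; theory says the minimum
# is Exp(sum of rates), so its mean should be 1 / (0.5 + 1.0 + 2.5) = 0.25.
mins = [min(random.expovariate(r) for r in rates) for _ in range(n_trials)]
sample_mean = sum(mins) / n_trials
print(sample_mean, 1 / sum(rates))
```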
6. Poisson Process and Gamma Distribution
6.1 Poisson Process
For a Poisson process with rate λ:
N(t) ∼ Pois(λt),   Tr ∼ Gamma(r, λ),
where N(t) is the count in [0,t] and Tr is the rth arrival time.
Month indexing convention. January =[0,1], February =[1,2], …, so month k occupies [k−1,k].
6.2 Gamma-Poisson Duality
The Gamma CDF equals the Poisson upper tail:
P(Tr ≤ t) = P(N(t) ≥ r) = 1 − ∑_{k=0}^{r−1} e^(−λt) (λt)^k / k!.
This identity converts between arrival time probabilities and count probabilities.
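The duality can be checked by integrating the Gamma density numerically (a sketch; λ = 2, r = 3, t = 1.5 are arbitrary, and a trapezoid rule stands in for a library CDF):

```python
from math import exp, factorial

lam, r, t = 2.0, 3, 1.5

# Poisson-tail form: P(T_r <= t) = P(N(t) >= r).
tail = 1 - sum(exp(-lam * t) * (lam * t)**k / factorial(k) for k in range(r))

# Direct form: integrate the Gamma(r, lam) density over [0, t].
# For integer r, Gamma(r) = (r-1)!.
def gamma_density(x):
    return lam**r * x**(r - 1) * exp(-lam * x) / factorial(r - 1)

steps = 50_000
h = t / steps
integral = h * (sum(gamma_density(i * h) for i in range(1, steps))
                + (gamma_density(0) + gamma_density(t)) / 2)
print(tail, integral)  # the two forms agree
```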
6.3 Memoryless Property in Poisson Processes
After observing the process up to time s, the remaining time until the next arrival is Exp(λ), independent of the history. This follows from the memoryless property of the exponential and the independent increments of the Poisson process.
6.4 Poisson Superposition and Thinning
If X1∼Pois(μ1) and X2∼Pois(μ2) are independent:
X1+X2∼Pois(μ1+μ2).
Conversely, in a Poisson process with rate λ where each arrival is independently classified as type 1 with probability p, the type 1 arrivals form a Poisson process with rate λp.
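Superposition can be verified directly by convolving the two pmfs (a sketch; μ1 = 1.2 and μ2 = 0.8 are arbitrary):

```python
from math import exp, factorial, isclose

def pois_pmf(mu, k):
    return exp(-mu) * mu**k / factorial(k)

mu1, mu2 = 1.2, 0.8

# The convolution of the two pmfs must equal the Pois(mu1 + mu2) pmf.
for m in range(12):
    conv = sum(pois_pmf(mu1, k) * pois_pmf(mu2, m - k) for k in range(m + 1))
    assert isclose(conv, pois_pmf(mu1 + mu2, m))
print("Pois(1.2) + Pois(0.8) has the Pois(2.0) pmf")
```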
Pattern Recognition Table
| Signal | Technique |
| --- | --- |
| “Upper/lower bound”, “at least/most” with limited info | Markov or Chebyshev |
| “E[W]” counting pairs, matches, or occurrences | Indicator decomposition + linearity |
| “Var(X)” with indicator structure | E[X²] via ∑ E[Ii²] + ∑_{i≠j} E[Ii Ij] |
| “With replacement” + count | Binomial |
| “Without replacement” + count | Hypergeometric |
| Sequential random experiment (urn → draw) | Law of total probability |
| “First to achieve” alternating game | Geometric series |
| Poisson process, arrival time of rth event | Gamma-Poisson duality |
| Min of independent lifetimes | Multiply survival functions |
| g(x) with factorial-like denominator, X ∼ Pois | LOTUS → combine denominators → Poisson tail |
| n large, p moderate | Normal approximation + continuity correction |
| n large, p small, np moderate | Poisson approximation |
References
[1] J. Pitman, Probability, corrected 6th printing. New York, NY: Springer, 1997.
[2] A. Lucas, STAT 134 Spring 2026 Lecture Notes 1—22, UC Berkeley.