Conditional Probability, Bayes’ Rule, Binomial, Poisson & Normal Approximations, Hypergeometric Sampling

Section 1.1: Equally Likely Outcomes

1. Outcome Space

The outcome space $\Omega$ (omega) is the set of all possible outcomes of an experiment.

$$\Omega = \{1, 2, 3, 4, 5, 6\} \quad \text{(rolling a die)}$$

An event $A$ is any subset of $\Omega$. Example: “rolling an even number”

$$A = \{2, 4, 6\}.$$

When all outcomes in a finite $\Omega$ are equally likely:

$$P(A) = \frac{|A|}{|\Omega|}.$$

Key boundary values: $P(\Omega) = 1$, $P(\emptyset) = 0$.

Quick example: draw a ticket from a box of 100 tickets labeled $1, \ldots, 100$. The event “number has one digit” is $A = \{1, \ldots, 9\}$, so $P(A) = 9/100$.
3. Counting with Pairs (Two Dice)

Rolling two dice: each outcome is an ordered pair $(i, j)$ where $i, j \in \{1, \ldots, 6\}$.

$$|\Omega| = 6 \cdot 6 = 36.$$

Example: “sum is 5”

$$A = \{(1,4), (2,3), (3,2), (4,1)\} \implies P(A) = \frac{4}{36} = \frac{1}{9}.$$

For a general $n$-sided die, $|\Omega| = n^2$. The number of pairs in which the second number exceeds the first (above the diagonal) is

$$|\text{above}| = 1 + 2 + \cdots + (n-1) = \frac{n(n-1)}{2},$$

so

$$P(\text{second} > \text{first}) = \frac{n(n-1)/2}{n^2} = \frac{1}{2}\left(1 - \frac{1}{n}\right).$$
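These counts are easy to verify by brute force. Below is a minimal Python sketch (not part of the original notes) that enumerates the 36 equally likely pairs and checks both the sum-is-5 probability and the closed form for $P(\text{second} > \text{first})$:

```python
from fractions import Fraction
from itertools import product

n = 6
omega = list(product(range(1, n + 1), repeat=2))  # all 36 ordered pairs (i, j)

# P(sum is 5): favorable outcomes over |Omega|
sum_is_5 = sum(1 for i, j in omega if i + j == 5)
print(Fraction(sum_is_5, len(omega)))             # 1/9

# P(second > first) versus the closed form (1/2)(1 - 1/n)
above = sum(1 for i, j in omega if j > i)
print(Fraction(above, len(omega)))                # 5/12
print(Fraction(1, 2) * (1 - Fraction(1, n)))      # 5/12
```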
4. Odds

Odds in favor of $A$:

$$\text{Odds in favor of } A = \frac{P(A)}{1 - P(A)}.$$

Odds against $A$ are the reciprocal.

Example: $P(\text{red at roulette}) = 18/38$, so the odds against red are $20:18$ (or $10:9$).
5. Fair Odds Rule & House Percentage

Fair Odds Rule: in a fair bet, payoff odds = chance odds.

If you bet one dollar on event $A$ at payoff odds $r_{\text{pay}}$ to 1 against, the total amount returned when $A$ occurs is $(r_{\text{pay}} + 1)$ dollars (stake plus winnings), so the fair value of the bet is $P(A)(r_{\text{pay}} + 1)$. House percentage:

$$\text{House \%} = \left[1 - P(A)(r_{\text{pay}} + 1)\right] \cdot 100\%.$$

Example (straight play at roulette): $P(A) = 1/38$, $r_{\text{pay}} = 35$.

$$\text{House \%} = \left[1 - \frac{1}{38} \cdot 36\right] \cdot 100\% = \frac{2}{38} \cdot 100\% \approx 5.26\%.$$

Interpretation: for every dollar bet, the house keeps about 5.26 cents on average.
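As an illustration (this helper is not from the original notes), the house-percentage formula in code, applied to the two roulette bets discussed above; the red bet is assumed to pay 1 to 1:

```python
def house_percentage(p_win: float, payoff_odds: float) -> float:
    """House edge in %, for a one-dollar bet paying `payoff_odds` to 1 against."""
    return (1 - p_win * (payoff_odds + 1)) * 100

print(house_percentage(1 / 38, 35))   # straight play: ~5.26
print(house_percentage(18 / 38, 1))   # bet on red, paying even money: ~5.26
```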
Section 1.3: Distributions

1. Events as Sets

An outcome space $\Omega$ is the set of all possible outcomes. Every event is a subset of $\Omega$.

| Event | Set notation |
| --- | --- |
| not $A$ | $A^c$ |
| $A$ or $B$ (or both) | $A \cup B$ |
| both $A$ and $B$ | $A \cap B$ (or $AB$) |
| $A$, $B$ mutually exclusive | $AB = \emptyset$ |
2. Partition

Event $B$ is partitioned into $B_1, \ldots, B_n$ if

$$B = B_1 \cup \cdots \cup B_n, \qquad B_i \cap B_j = \emptyset \ \text{for } i \ne j.$$

Every outcome in $B$ belongs to exactly one $B_i$.
3. Three Axioms of Probability

A distribution on $\Omega$ is a function $P$ satisfying:

1. $P(B) \ge 0$.
2. If $B_1, \ldots, B_n$ partition $B$, then $P(B) = P(B_1) + \cdots + P(B_n)$.
3. $P(\Omega) = 1$.

4. Derived Rules

Complement Rule

$$P(A^c) = 1 - P(A).$$

(Implies $P(\emptyset) = 0$ and $0 \le P(A) \le 1$.)
Difference Rule. If $A \subseteq B$, then

$$P(B \cap A^c) = P(B) - P(A),$$

since $A$ and $B \cap A^c$ partition $B$.

Inclusion–Exclusion (2 events)

$$P(A \cup B) = P(A) + P(B) - P(AB).$$

If $A, B$ are mutually exclusive, then $P(AB) = 0$ and this reduces to $P(A \cup B) = P(A) + P(B)$.

Inclusion–Exclusion (3 events)

$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(AB) - P(AC) - P(BC) + P(ABC).$$

Quick example: 10% rich, 5% famous, 3% both $\Rightarrow$ $P(\text{rich or famous}) = 10\% + 5\% - 3\% = 12\%$.
5. Named Distributions

Bernoulli($p$): distribution on $\{0, 1\}$ with $P(1) = p$, $P(0) = 1 - p$. (Indicator of an event $A$ with $p = P(A)$.)

Uniform on a finite set: if the outcomes in $\Omega = \{1, 2, \ldots, n\}$ are equally likely, then $P(i) = 1/n$ and

$$P(B) = \frac{|B|}{n}.$$

Uniform($a, b$): a point picked at random from $(a, b)$; probability is proportional to length, so for a subinterval $(x, y)$,

$$P((x, y)) = \frac{y - x}{b - a} \qquad (a < x < y < b).$$

6. Independence (Preview)

Two events $A, B$ are independent if
$$P(AB) = P(A)P(B).$$

Section 1.4: Conditional Probability and Independence

1. Conditional Probability

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B) > 0.$$

Equally likely outcomes version:

$$P(A \mid B) = \frac{|A \cap B|}{|B|}.$$

Quick example: 3 fair coin tosses. Let $A = \{\text{2+ heads}\}$ and $H = \{\text{first toss is heads}\}$. Then $H = \{hhh, hht, hth, htt\}$ and $A \cap H = \{hhh, hht, hth\}$, so $P(A \mid H) = 3/4$.

2. Multiplication Rule

$$P(A \cap B) = P(A \mid B) \, P(B) = P(B \mid A) \, P(A).$$

3. Rule of Average Conditional Probabilities (Law of Total Probability)

If $B_1, \ldots, B_n$ partition $\Omega$, then

$$P(A) = \sum_{i=1}^{n} P(A \mid B_i) \, P(B_i).$$

Quick example: find $P(\text{second card black})$ when dealing from a 52-card deck; condition on the color of the first card:

$$P(\text{2nd black}) = \frac{25}{51} \cdot \frac{1}{2} + \frac{26}{51} \cdot \frac{1}{2} = \frac{1}{2}.$$
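A brute-force check of this answer (a sketch, not from the original notes): enumerate all ordered pairs of distinct cards from a deck in which cards 0–25 are black, and count how often the second card is black.

```python
from fractions import Fraction
from itertools import permutations

deck = range(52)                              # cards 0..51; call 0..25 "black"
pairs = list(permutations(deck, 2))           # 52 * 51 ordered (first, second) draws
second_black = sum(1 for first, second in pairs if second < 26)
print(Fraction(second_black, len(pairs)))     # 1/2
```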
4. Independence

$A$ and $B$ are independent iff

$$P(A \cap B) = P(A)P(B).$$

Equivalent statements (when the probabilities involved are positive):

- $P(A \mid B) = P(A)$
- $P(B \mid A) = P(B)$
- $P(A \mid B) = P(A \mid B^c)$

Key facts: if $A \perp B$, then also $A^c \perp B$, $A \perp B^c$, and $A^c \perp B^c$.

Mutual independence (3 events)

For events $A, B, C$:

$$\text{mutual independence} \iff
\begin{cases}
P(AB) = P(A)P(B), \quad P(AC) = P(A)P(C), \quad P(BC) = P(B)P(C), \\
P(ABC) = P(A)P(B)P(C).
\end{cases}$$

i.i.d. symmetry lemma

If $X, Y$ are i.i.d. (independent and identically distributed), then

$$P(X > Y) = P(Y > X) = \frac{1 - P(X = Y)}{2}.$$
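A minimal check of the lemma, using two independent rolls of a fair die as the (illustrative) i.i.d. pair:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # (X, Y): two i.i.d. fair die rolls
p_x_greater = Fraction(sum(1 for x, y in outcomes if x > y), len(outcomes))
p_equal = Fraction(sum(1 for x, y in outcomes if x == y), len(outcomes))
print(p_x_greater)            # 5/12
print((1 - p_equal) / 2)      # 5/12, matching the lemma
```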
Series system (both must work; assume independence):

$$P(W_1 \cap W_2) = P(W_1)P(W_2).$$

Parallel system (at least one works; assume independence):

$$P(W_1 \cup W_2) = 1 - P(F_1)P(F_2).$$
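These two formulas translate directly into code; a sketch with made-up component reliabilities (0.9 and 0.8 are illustrative values, not from the notes):

```python
def series_reliability(p1: float, p2: float) -> float:
    """P(both components work), assuming independent components."""
    return p1 * p2

def parallel_reliability(p1: float, p2: float) -> float:
    """P(at least one works) = 1 - P(both fail), assuming independence."""
    return 1 - (1 - p1) * (1 - p2)

p1, p2 = 0.9, 0.8                       # hypothetical component reliabilities
print(series_reliability(p1, p2))       # ~0.72
print(parallel_reliability(p1, p2))     # ~0.98
```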
Section 1.5: Bayes’ Rule

What is Bayes’ Rule? Bayes’ Rule reverses conditional probability: from $P(A \mid B_i)$ to $P(B_i \mid A)$.

For a partition $B_1, \ldots, B_n$:

$$P(B_i \mid A) = \frac{P(A \mid B_i) \, P(B_i)}{\sum_{j=1}^{n} P(A \mid B_j) \, P(B_j)}.$$

Key Terminology

- Prior: $P(B_i)$
- Likelihood: $P(A \mid B_i)$
- Posterior: $P(B_i \mid A)$

How to Derive It (3 steps)

1. Multiplication rule: $P(B_i \cap A) = P(A \mid B_i) \, P(B_i)$
2. Total probability: $P(A) = \sum_{i=1}^{n} P(A \mid B_i) \, P(B_i)$
3. Conditional probability: $P(B_i \mid A) = \dfrac{P(B_i \cap A)}{P(A)}$

Bayes’ Rule for Odds (shortcut)

For two hypotheses $B_1, B_2$:

$$\text{posterior odds} = \text{prior odds} \times \text{likelihood ratio}, \qquad
\frac{P(B_1 \mid A)}{P(B_2 \mid A)} = \frac{P(B_1)}{P(B_2)} \cdot \frac{P(A \mid B_1)}{P(A \mid B_2)}.$$

Quick Example

Prevalence $P(D) = 0.01$. Test: $P(+ \mid D) = 0.95$, $P(+ \mid D^c) = 0.02$.

$$P(D \mid +) = \frac{(0.95)(0.01)}{(0.95)(0.01) + (0.02)(0.99)} = \frac{0.0095}{0.0293} = \frac{95}{293} \approx 32\%.$$

Diagnostic Testing Template

Let

$$P(D) = \pi, \quad P(+ \mid D) = s, \quad P(+ \mid D^c) = f.$$

Then

$$P(D \mid +) = \frac{s\pi}{s\pi + f(1 - \pi)}.$$
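The template drops straight into a small function; a sketch (the parameter names mirror the notation above) applied to the numbers from the quick example:

```python
def posterior_given_positive(pi: float, s: float, f: float) -> float:
    """P(D | +) from prevalence pi, sensitivity s = P(+|D), false-positive rate f = P(+|D^c)."""
    return (s * pi) / (s * pi + f * (1 - pi))

print(posterior_given_positive(pi=0.01, s=0.95, f=0.02))   # ~0.324, i.e. about 32%
```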
Appendix 1: Counting

This appendix covers the fundamental counting rules and the standard formulas for sequences, permutations, and combinations. These are the tools behind $\binom{n}{k}$ and thus the Binomial and Hypergeometric distributions.

Three Basic Rules

1. Correspondence Rule. If there exists a bijection $f: B \to C$, then $|B| = |C|$.

2. Addition Rule. If $B_1, \ldots, B_m$ are pairwise disjoint and $B = \bigcup_{i=1}^{m} B_i$, then

$$|B| = \sum_{i=1}^{m} |B_i|.$$

3. Multiplication Rule. If an outcome is generated by $k$ successive choices with $n_j$ available options at stage $j$ (independent of earlier choices), then the total number of outcomes is

$$n_1 \cdot n_2 \cdots n_k = \prod_{j=1}^{k} n_j.$$

Think of it as counting paths through a decision tree.

Selection Types

| Selection type | Order matters? | Repetition allowed? | Count |
| --- | --- | --- | --- |
| Sequences | Yes | Yes | $n^k$ |
| Permutations / Orderings | Yes | No | $(n)_k$ |
| Combinations | No | No | $\binom{n}{k}$ |
Sequences (order matters, repetition allowed)

Number of sequences of length $k$ from $n$ elements:

$$n^k.$$

Example: the number of 5-letter “words” from 26 letters is $26^5$.

Permutations / Orderings (order matters, no repetition)

Number of orderings of $k$ distinct elements chosen from $n$ distinct elements:

$$(n)_k = n(n-1)\cdots(n-k+1) = \frac{n!}{(n-k)!}.$$

Conventions: $(n)_0 = 1$, $0! = 1$.

Special case: when $k = n$, $(n)_n = n!$ (the number of ways to arrange all $n$ objects in a row).

Example (birthday problem): the number of ways $k$ people can all have different birthdays is $(365)_k$.
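As a concrete use of the falling factorial, a sketch (not from the original notes) that computes the probability all $k$ birthdays are distinct, $(365)_k / 365^k$, combining the permutation count above with the sequence count $365^k$:

```python
def falling_factorial(n: int, k: int) -> int:
    """(n)_k = n (n-1) ... (n-k+1), with (n)_0 = 1."""
    result = 1
    for j in range(k):
        result *= n - j
    return result

def all_birthdays_distinct(k: int) -> float:
    return falling_factorial(365, k) / 365**k

print(all_birthdays_distinct(23))   # ~0.493, so P(some shared birthday) is ~0.507
```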
Combinations (order doesn’t matter, no repetition)

Number of ways to choose a subset of size $k$ from $n$ elements:

$$\binom{n}{k} = \frac{(n)_k}{k!} = \frac{n!}{k!(n-k)!} = \frac{n(n-1)\cdots(n-k+1)}{k(k-1)\cdots 1}.$$

Why divide by $k!$? Each subset of $k$ elements corresponds to $k!$ orderings, so

$$(n)_k = \binom{n}{k} \cdot k!.$$

Dividing by $k!$ removes the effect of ordering.

Equivalent Interpretations of $\binom{n}{k}$

- Choose positions: the number of ways to choose $k$ positions out of $n$ positions.
- 0–1 sequences: the number of binary sequences of length $n$ with exactly $k$ ones.
- $p$/$q$ placements: the number of ways to place $k$ symbols $p$ and $n - k$ symbols $q$ in a row.

These interpretations explain why $\binom{n}{k}$ appears in the Binomial distribution: it counts the number of ways to choose which $k$ trials are successes.
Useful Identities

Symmetry

$$\binom{n}{k} = \binom{n}{n-k}.$$

Pascal’s Rule

$$\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}.$$

Sum of all subsets

$$\sum_{k=0}^{n} \binom{n}{k} = 2^n.$$

Binomial Theorem

$$(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}.$$

Multinomial Coefficient

If $k_0 + k_1 + \cdots + k_m = n$, the number of sequences of length $n$ containing exactly $k_i$ copies of symbol $i$ is

$$\frac{n!}{k_0! \, k_1! \cdots k_m!}.$$

Example: MISSISSIPPI has 11 letters (1 M, 4 I, 4 S, 2 P), so the number of distinct rearrangements is

$$\frac{11!}{1! \cdot 4! \cdot 4! \cdot 2!} = 34{,}650.$$
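A quick check of the MISSISSIPPI count with a small multinomial-coefficient helper (standard library only; the helper name is ours):

```python
from math import factorial

def multinomial(*counts: int) -> int:
    """n! / (k_0! k_1! ... k_m!), where n = sum of the counts."""
    result = factorial(sum(counts))
    for k in counts:
        result //= factorial(k)
    return result

print(multinomial(1, 4, 4, 2))   # 34650 distinct rearrangements of MISSISSIPPI
```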
Section 2.1: The Binomial Distribution

What is this about? Repeat the same experiment $n$ times independently. Each trial is a success with probability (w.p.) $p$ and a failure w.p. $q = 1 - p$.

$$P(k \text{ successes in } n \text{ trials}) = \binom{n}{k} p^k q^{\,n-k},$$

where

$$\binom{n}{k} = \frac{n!}{k!(n-k)!} = \frac{n(n-1)\cdots(n-k+1)}{k(k-1)\cdots 1},$$

and $k \in \{0, 1, \ldots, n\}$.

Quick example: exactly 2 sixes in 9 die rolls ($p = 1/6$, $q = 5/6$):

$$\binom{9}{2}\left(\frac{1}{6}\right)^2\left(\frac{5}{6}\right)^7 = 36 \cdot \frac{5^7}{6^9} \approx 0.279.$$

Useful Properties

Binomial expansion (sum to 1)

$$\sum_{k=0}^{n} \binom{n}{k} p^k q^{\,n-k} = (p + q)^n = 1.$$

Fair coin special case ($p = q = 1/2$):

$$P(k \text{ heads in } n \text{ tosses}) = \frac{\binom{n}{k}}{2^n}.$$

Consecutive Odds Ratio

$$R(k) = \frac{P(k)}{P(k-1)} = \frac{n - k + 1}{k} \cdot \frac{p}{q}.$$

Start with $P(0) = q^n$, then $P(1) = P(0) \cdot R(1)$, etc.

Mode

$$m = \lfloor np + p \rfloor = \lfloor (n+1)p \rfloor.$$

Probabilities increase up to $m$, then decrease after it. If $(n+1)p \in \mathbb{Z}$, there are two modes: $m$ and $m - 1$.
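A sketch that builds the whole Binomial$(n, p)$ distribution from $P(0) = q^n$ via the consecutive-odds-ratio recursion and checks the mode formula, using $n = 9$, $p = 1/6$ from the die-roll example above:

```python
from math import comb, floor

def binomial_pmf(n, p):
    """P(0..n) built from P(0) = q^n via R(k) = ((n - k + 1) / k) * (p / q)."""
    q = 1 - p
    probs = [q ** n]
    for k in range(1, n + 1):
        probs.append(probs[-1] * (n - k + 1) / k * (p / q))
    return probs

n, p = 9, 1 / 6
probs = binomial_pmf(n, p)
print(round(probs[2], 3))                            # 0.279, matches the quick example
print(round(comb(9, 2) * (1/6)**2 * (5/6)**7, 3))    # 0.279, direct formula
print(probs.index(max(probs)), floor((n + 1) * p))   # mode: 1 1
```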
Mean

$$\mu = np.$$

Best-of-$(2n-1)$ series

If Team A wins each game independently with probability $p$, then

$$P(\text{A wins best-of-}(2n-1)) = \sum_{k=n}^{2n-1} \binom{k-1}{n-1} p^{\,n}(1-p)^{k-n}.$$
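A sketch evaluating this sum for a best-of-7 series ($n = 4$) with an illustrative per-game win probability $p = 0.6$, cross-checked against the equivalent count “A wins at least $n$ of all $2n - 1$ games” (playing out every game does not change who reaches $n$ wins first):

```python
from math import comb

def p_wins_series(n, p):
    """Sum over k = the game at which A clinches its n-th win."""
    return sum(comb(k - 1, n - 1) * p**n * (1 - p)**(k - n) for k in range(n, 2 * n))

def p_wins_majority(n, p):
    """Same event as 'A wins at least n of 2n - 1 games'."""
    total = 2 * n - 1
    return sum(comb(total, k) * p**k * (1 - p)**(total - k) for k in range(n, total + 1))

print(round(p_wins_series(4, 0.6), 4))     # 0.7102
print(round(p_wins_majority(4, 0.6), 4))   # 0.7102
```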
Section 2.4: Poisson Approximation

When to Use It

The normal approximation to the binomial is poor when $n$ is large but $p$ is very small (or very close to 1). The Poisson approximation depends mainly on $\mu = np$:

$$P(k \text{ successes}) \approx e^{-\mu} \frac{\mu^k}{k!}, \qquad k = 0, 1, 2, \ldots,$$

where $\mu = np$. Conditions: $n$ large, $p$ small, $\mu = np$ moderate.

Why It Works

$$P(0) = (1 - p)^n \approx (e^{-p})^n = e^{-np} = e^{-\mu}.$$

Consecutive odds ratio:

$$R(k) = \frac{P(k)}{P(k-1)} = \frac{n - k + 1}{k} \cdot \frac{p}{1 - p} \approx \frac{\mu}{k}.$$

Thus

$$P(k) = P(0) \prod_{j=1}^{k} R(j) \approx e^{-\mu} \cdot \frac{\mu^k}{k!}.$$

The Poisson($\mu$) Distribution

$$P_\mu(k) = e^{-\mu} \frac{\mu^k}{k!}, \qquad k = 0, 1, 2, \ldots,$$

and $\displaystyle\sum_{k=0}^{\infty} P_\mu(k) = 1$.

Quick example: 200 items, 1% defective. Find $P(\ge 2)$. Here $\mu = np = 200(0.01) = 2$, so

$$P(\ge 2) = 1 - P(0) - P(1) = 1 - e^{-2}\frac{2^0}{0!} - e^{-2}\frac{2^1}{1!} = 1 - 3e^{-2} \approx 0.594.$$
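A sketch comparing this Poisson answer with the exact Binomial$(200, 0.01)$ probability:

```python
from math import exp

n, p = 200, 0.01
mu = n * p                                             # 2.0

exact = 1 - (1 - p)**n - n * p * (1 - p)**(n - 1)      # exact binomial P(at least 2)
approx = 1 - exp(-mu) * (1 + mu)                       # Poisson: 1 - P(0) - P(1)
print(round(exact, 4), round(approx, 4))               # ~0.5954 ~0.594
```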
Section 2.2: Normal Approximation Method

1. The Normal Curve

Normal density with mean $\mu$ and standard deviation $\sigma$:

$$y = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right), \qquad -\infty < x < \infty.$$

$\mu$ controls the center; $\sigma$ controls the spread; the total area under the curve is 1.

2. Standard Units and the Standard Normal

Convert $X \sim N(\mu, \sigma^2)$ to $Z \sim N(0,1)$ via

$$z = \frac{x - \mu}{\sigma}.$$

Standard normal density:

$$\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}.$$

3. The Standard Normal CDF $\Phi(z)$

$$\Phi(z) = \int_{-\infty}^{z} \phi(y) \, dy = P(Z \le z).$$

Symmetry:

$$\Phi(-z) = 1 - \Phi(z).$$

Interval probability (notation):

$$\Phi(a, b) = \Phi(b) - \Phi(a).$$

Symmetric interval:

$$\Phi(-z, z) = 2\Phi(z) - 1.$$

Three values to memorize:

| Interval | Probability |
| --- | --- |
| $\Phi(-1, 1)$ | $\approx 68\%$ |
| $\Phi(-2, 2)$ | $\approx 95\%$ |
| $\Phi(-3, 3)$ | $\approx 99.7\%$ |
4. Normal Approximation to the Binomial

For $S_n \sim \text{Binomial}(n, p)$, when $\sqrt{npq}$ is large enough:

$$\mu = np, \qquad \sigma = \sqrt{npq}, \qquad q = 1 - p.$$

Continuity correction:

$$P(a \le S_n \le b) \approx \Phi\!\left(\frac{b + \tfrac{1}{2} - \mu}{\sigma}\right) - \Phi\!\left(\frac{a - \tfrac{1}{2} - \mu}{\sigma}\right).$$

Quick example: $P(S_{100} = 50)$ for fair coin tosses. Here $\mu = 50$, $\sigma = 5$, $a = b = 50$:

$$P(50) \approx \Phi\!\left(\frac{50.5 - 50}{5}\right) - \Phi\!\left(\frac{49.5 - 50}{5}\right) = \Phi(0.1) - \Phi(-0.1) = 2\Phi(0.1) - 1 \approx 0.0796.$$
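The same calculation in code, a sketch using `math.erf` for the standard normal CDF and comparing against the exact value $\binom{100}{50}/2^{100}$:

```python
from math import comb, erf, sqrt

def std_normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 100, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))       # 50, 5

approx = std_normal_cdf((50.5 - mu) / sigma) - std_normal_cdf((49.5 - mu) / sigma)
exact = comb(100, 50) / 2**100
print(round(approx, 4), round(exact, 4))       # 0.0797 0.0796
```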
5. Square Root Law & Confidence Intervals

The success count fluctuates around $np$ on the scale $\sqrt{n}$; the proportion fluctuates around $p$ on the scale $1/\sqrt{n}$.

Conservative 99.99% CI for an unknown $p$, given the observed proportion $\hat{p}$ from $n$ trials:

$$\hat{p} \pm \frac{2}{\sqrt{n}}.$$

Why 99.99%? The worst-case SD of $\hat{p}$ is $\sigma_{\max} = \frac{1}{2\sqrt{n}}$ (at $p = 1/2$). So $\frac{2}{\sqrt{n}} = 4\sigma_{\max}$, and $\pm 4\sigma$ under the normal approximation covers $\approx 99.99\%$.

To halve the CI width, you must quadruple $n$.
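A sketch of the conservative interval with hypothetical data (580 successes in 1000 trials), plus a check that quadrupling $n$ halves the width:

```python
from math import sqrt

def conservative_ci(successes, n):
    """Conservative ~99.99% interval: p_hat +/- 2/sqrt(n)."""
    p_hat = successes / n
    half_width = 2 / sqrt(n)
    return p_hat - half_width, p_hat + half_width

print(conservative_ci(580, 1000))       # approximately (0.517, 0.643)
print(2 / sqrt(1000), 2 / sqrt(4000))   # half-width ~0.063 vs ~0.032: 4x the n, half the width
```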
Section 2.5: Random Sampling

1. Setup

A population of size $N$ contains $G$ good and $B$ bad elements, with $G + B = N$. The sample size is $n = g + b$, where $g$ = number of good drawn and $b$ = number of bad drawn.

2. Sampling WITH Replacement

Draws are independent, with $p = G/N$. The number of good drawn follows $\text{Binomial}(n, p)$:

$$P(g \text{ good and } b \text{ bad}) = \binom{n}{g} \frac{G^g B^b}{N^n} = \binom{n}{g} p^g q^b, \quad q = \frac{B}{N}.$$

3. Sampling WITHOUT Replacement

Draws are dependent. Hypergeometric:

$$P(g \text{ good and } b \text{ bad}) = \frac{\binom{G}{g}\binom{B}{b}}{\binom{N}{n}}.$$

Equivalent ordered form:

$$P(g \text{ good and } b \text{ bad}) = \binom{n}{g} \frac{(G)_g \, (B)_b}{(N)_n},$$

where $(M)_k = M(M-1)\cdots(M-k+1)$ (falling factorial, $k$ factors).

4. Binomial Approximation to Hypergeometric

When $N$, $G$, $B$ are large relative to $n$, sampling without replacement $\approx$ sampling with replacement:

$$\frac{(N)_n}{N^n} \to 1 \quad \text{as } N \to \infty,$$

so the hypergeometric probability $\approx$ the binomial probability.
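A sketch comparing the exact hypergeometric probability with its binomial approximation, for a hypothetical population of $N = 10{,}000$ with $G = 3000$ good items and a sample of $n = 10$:

```python
from math import comb

def hypergeometric(g, b, G, B):
    """P(g good and b bad) when sampling n = g + b without replacement."""
    return comb(G, g) * comb(B, b) / comb(G + B, g + b)

def binomial_approx(g, b, p):
    """Sampling-with-replacement approximation, p = G/N."""
    return comb(g + b, g) * p**g * (1 - p)**b

G, B = 3000, 7000
print(hypergeometric(3, 7, G, B))           # ~0.267
print(binomial_approx(3, 7, G / (G + B)))   # ~0.267 -- nearly identical when N >> n
```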
5. Confidence Intervals (brief)

For large $n$, the sample proportion $\hat{p}$ satisfies

$$P\!\left(p - \frac{1}{\sqrt{n}} < \hat{p} < p + \frac{1}{\sqrt{n}}\right) \ge 95\%,$$

so $\hat{p} \pm \dfrac{1}{\sqrt{n}}$ is an approximate 95% CI for the unknown population proportion $p$.