Probability Space: think of a probability space as a gray-scale picture whose pixel intensities sum to 1. The probability function extracts the intensity of one pixel, and an event is just a group of pixels.
Sample Space: \Omega is a set containing all possible experimental outcomes
Probability Function: \text{Pr}: \mathcal{P}(\Omega) \rightarrow [0, 1]
Uniform Distribution: (\forall l \in \Omega)(\text{Pr}[l] = \frac{1}{|\Omega|})
Event: E \subseteq \Omega
\bar{E} = \Omega - E
Disjoint events: A \cap B = \emptyset
// Exercise (Practice with events) // Exercise (Basic facts about probability)
Conditional Probability: zero out the probability of every outcome that cannot happen given the event, then normalize the remaining probabilities so they sum to 1
For an outcome l: \text{Pr}[l | B] = \begin{cases} 0 &\text{if } l \notin B\\ \frac{\text{Pr}[l]}{\text{Pr}[B]} &\text{if } l \in B\\ \end{cases}
For an event A: \text{Pr}[A | B] = \frac{\text{Pr}[A \cap B]}{\text{Pr}[B]} = \frac{\text{Pr}[A] \cdot \text{Pr}[B | A]}{\text{Pr}[B]}
Union Bound: \text{Pr}[A_1 \cup A_2 \cup ... \cup A_n] \leq \text{Pr}[A_1] + \text{Pr}[A_2] + ... + \text{Pr}[A_n]
// Exercise (Conditional probability practice)
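The definitions above can be checked on a small concrete sample space. This sketch (not from the notes) uses two fair dice; the events A and B are made up for illustration:

```python
import itertools
from fractions import Fraction

# Two fair dice, uniform sample space.
omega = list(itertools.product(range(1, 7), repeat=2))

def pr(event):
    """Pr[E] under the uniform distribution on omega."""
    return Fraction(sum(1 for l in omega if event(l)), len(omega))

A = lambda l: l[0] + l[1] == 8   # "the sum is 8"
B = lambda l: l[0] % 2 == 0      # "the first die is even"

# Pr[A | B] = Pr[A ∩ B] / Pr[B]
pr_A_given_B = pr(lambda l: A(l) and B(l)) / pr(B)
```

Using `Fraction` keeps every probability exact, so the quotient formula can be verified by pure counting.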
Probability of More Events:
\text{Pr}[A \cap B] = \text{Pr}[A] \cdot \text{Pr}[B | A]
\text{Pr}[A \cap B \cap C] = \text{Pr}[A] \cdot \text{Pr}[B | A] \cdot \text{Pr}[C | A \cap B]
If A_1, A_2, ..., A_n partition \Omega, then \text{Pr}[E] = \text{Pr}[A_1] \cdot \text{Pr}[E | A_1] + ... + \text{Pr}[A_n] \cdot \text{Pr}[E | A_n] (law of total probability)
// Exercise (Practice with chain rule)
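A hedged example of the total-probability formula above; the two-urn setup and its numbers are invented for illustration (urn 1: 2 red / 1 blue, urn 2: 1 red / 3 blue, urn chosen uniformly):

```python
from fractions import Fraction

# Pr[E] = sum_i Pr[A_i] * Pr[E | A_i], where A_i = "urn i chosen" partition Omega.
pr_urn = {1: Fraction(1, 2), 2: Fraction(1, 2)}            # Pr[A_i]
pr_red_given_urn = {1: Fraction(2, 3), 2: Fraction(1, 4)}  # Pr[E | A_i]

pr_red = sum(pr_urn[i] * pr_red_given_urn[i] for i in pr_urn)
```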
Independent: events A and B are independent if \text{Pr}[A \cap B] = \text{Pr}[A] \cdot \text{Pr}[B]. Philosophically we cannot identify any true independence in the world, which makes empirical definitions circular (this can be fixed in code, where independence holds by construction)
Random Variable: a function X : \Omega \rightarrow \mathbb{R} mapping the sample space to real numbers (think of it as an enum whose variants carry real values)
\text{Pr}[X = x] = \text{Pr}[\{l \in \Omega | X(l) = x\}]
with a random variable as a mapping, we can define expected value and describe discrete things numerically
Probability mass function (PMF) of X : \Omega \rightarrow \mathbb{R}: a function p_X : \mathbb{R} \rightarrow [0, 1] such that (\forall x \in \mathbb{R})(p_X(x) = \text{Pr}[X = x])
\sum_{x \in \text{range}(X)} p_X(x) = 1
(\forall S \subseteq \mathbb{R})(\text{Pr}[X \in S] = \sum_{x \in S}p_X(x))
we often "define" random variable by probability mass function (no mention of the underlying sample space)
// Exercise (Practice with random variables)
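As a sketch of the two PMF properties above, one can derive p_X for X = sum of two fair dice directly from the sample space (the example and the set S = {7, 11} are illustrative choices):

```python
import itertools
from collections import Counter
from fractions import Fraction

# PMF of X = sum of two fair dice, derived from the sample space.
omega = list(itertools.product(range(1, 7), repeat=2))
counts = Counter(a + b for a, b in omega)
p_X = {x: Fraction(c, len(omega)) for x, c in counts.items()}

total = sum(p_X.values())                 # property 1: PMF sums to 1
pr_X_in_S = sum(p_X[x] for x in {7, 11})  # property 2: Pr[X in S]
```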
Expected Value: E[X] = \sum_{l \in \Omega} \text{Pr}[l] \cdot X(l) = \sum_{x \in \text{range}(X)} x \cdot \text{Pr}[X = x] where \text{range}(X) = \{X(l) | l \in \Omega\}
linearity of expectation: E[c_1X_1 + c_2X_2 + ... + c_nX_n] = c_1E[X_1] + c_2E[X_2] + ... + c_nE[X_n] (the X_i need not be independent)
product of expectations: if X_1, ..., X_n are mutually independent, then E[X_1X_2...X_n] = E[X_1] \cdot E[X_2] \cdot ... \cdot E[X_n]
// Exercise (Equivalence of expected value definitions) // Exercise (Practice with expected value) // Exercise (Practice with linearity of expectation)
Indicator Random Variable: maps event to random variable I_A(l) = \begin{cases} 1 & \text{if } l \in A\\ 0 & \text{otherwise}\\ \end{cases} (where A \subseteq \Omega is an event)
Practical Calculation:
// Exercise (Practice with linearity of expectation and indicators)
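A classic illustration of indicators plus linearity (not from the notes): the expected number of fixed points of a uniformly random permutation is 1, since with I_i = 1 iff \pi(i) = i we get E[X] = \sum_i \text{Pr}[\pi(i) = i] = n \cdot \frac{1}{n} = 1. Verified by brute force for n = 5:

```python
import itertools
from fractions import Fraction

# X = number of fixed points of a uniformly random permutation of [n].
n = 5
perms = list(itertools.permutations(range(n)))
total_fixed = sum(sum(1 for i in range(n) if p[i] == i) for p in perms)
avg_fixed = Fraction(total_fixed, len(perms))  # E[X] by exhaustive averaging
```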
Markov's Inequality: let X be a non-negative random variable. Then (\forall \epsilon \geq 1)(\text{Pr}[X \geq \epsilon \cdot E[X]] \leq \frac{1}{\epsilon})
// Exercise (Practice with Markov’s inequality)
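A small check of Markov's inequality on a made-up non-negative random variable given by an explicit PMF:

```python
from fractions import Fraction

# A non-negative random variable with an invented PMF.
pmf = {0: Fraction(1, 2), 1: Fraction(1, 4), 4: Fraction(1, 4)}
EX = sum(x * p for x, p in pmf.items())  # E[X] = 5/4

eps = 2
# Markov: Pr[X >= eps * E[X]] <= 1/eps.
tail = sum(p for x, p in pmf.items() if x >= eps * EX)
```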
Bernoulli: X equals 1 with probability p. X \sim \text{Bernoulli}(p) such that \begin{align*} \text{Pr}[X = 1] &= p\\ \text{Pr}[X = 0] &= 1 - p\\ \end{align*}
Binomial: the number of successes in n independent trials, each succeeding with probability p. X \sim \text{Binomial}(n, p) such that X = X_1 + X_2 + ... + X_n where the X_i are independent and (\forall i \in [n])(X_i \sim \text{Bernoulli}(p))
observe \text{Pr}[X = i] = \begin{pmatrix}n\\ i\\\end{pmatrix}p^i(1-p)^{n-i}
observe \text{range}(X) = \{0, 1, ..., n\} and E[X] = np
// Exercise (Expectation of a Binomial random variable) // Exercise (Practice with Binomial random variable)
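The PMF and expectation formulas above can be verified exactly with rational arithmetic; n = 10 and p = 3/10 are arbitrary choices:

```python
from fractions import Fraction
from math import comb

# Binomial(n, p) PMF built from the formula Pr[X = i] = C(n, i) p^i (1-p)^(n-i).
n, p = 10, Fraction(3, 10)
pmf = {i: comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)}

total = sum(pmf.values())                # should be exactly 1
EX = sum(i * q for i, q in pmf.items())  # should be exactly n * p
```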
Geometric: the number of trials until the first success (e.g. how many pulls to get an SSR) X \sim \text{Geometric}(p)
\text{range}(X) = \{1, 2, 3, ...\}
\text{Pr}[X = i] = (1-p)^{i-1}p
\sum_{i = 1}^\infty \text{Pr}[X = i] = 1
E[X] = \frac{1}{p}
X = 1
while (Bernoulli(p) == 0) {
    X += 1
}
// Exercise (PMF of a geometric random variable) // Exercise (Practice with geometric random variable) // Exercise (Expectation of a geometric random variable)
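The loop above can be made runnable; this sketch samples Geometric(p) and compares the empirical mean against E[X] = 1/p (the value of p, sample size, and seed are arbitrary):

```python
import random

def geometric(p, rng):
    """Sample Geometric(p): number of trials until the first success."""
    x = 1
    while rng.random() >= p:  # Bernoulli(p) trial failed
        x += 1
    return x

rng = random.Random(0)        # fixed seed for reproducibility
p = 0.25
trials = 100_000
mean = sum(geometric(p, rng) for _ in range(trials)) / trials  # ~ 1/p = 4
```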
Some general tips:
Upper-bound \text{Pr}(A): find A \subseteq B and bound \text{Pr}(B) when event A implies event B.
Lower-bound \text{Pr}(A): find B \subseteq A and bound \text{Pr}(B).
Upper-bound \text{Pr}(A): try lower bound \text{Pr}[\bar{A}] since \text{Pr}[A] = 1 - \text{Pr}[\bar{A}]
Lower-bound \text{Pr}(A): try upper bound \text{Pr}[\bar{A}] since \text{Pr}[A] = 1 - \text{Pr}[\bar{A}]
Calculate \text{Pr}[A_1 \cap ... \cap A_n]: chain rule. If independent: \text{Pr}[A_1 \cap ... \cap A_n] = \text{Pr}[A_1]...\text{Pr}[A_n]
Upper-bound \text{Pr}[A_1 \cup ... \cup A_n]: union bound
(\forall i \in [n])(A_i) = A_1 \cap ... \cap A_n
(\exists i \in [n])(A_i) = A_1 \cup ... \cup A_n
Calculate \mathbb{E}[X]: using definition
Calculate \mathbb{E}[X]: write X as the sum of indicator random variables, and then using linearity of expectation.
// TODO: Check Your Understanding
Random in Computation
statistics via random sampling
distributed computing: breaking deadlocks by symmetry breaking
break the prisoner's dilemma, achieve Nash equilibria in games
cryptography: encrypt so ciphertexts carry no real information
error-correcting code
communication complexity: a checksum does not guarantee equality, but a random checksum is correct with very high probability
quantum computation
GAN
Random Variables
Math Definition: a function from sample space \Omega to \mathbb{R}
CS Definition: a numerical variable in randomized code
Randomized Algorithms:
input chosen randomly: average-case analysis
algorithm makes random choices: randomized algorithm
Deterministic Algorithm A computes f : \Sigma^* \rightarrow \Sigma^* in T(n)
correctness: (\forall x \in \Sigma^*)(A(x) = f(x))
running time: (\forall x \in \Sigma^*)(A(x) \text{ takes } \leq T(|x|) \text{ steps})
Classifying Randomized Algorithms
Monte Carlo Algorithm A computes f : \Sigma^* \rightarrow \Sigma^* in T(n)
correctness: (\forall x \in \Sigma^*)(\text{Pr}[A(x) \neq f(x)] \leq \epsilon)
running time: (\forall x \in \Sigma^*)(\text{Pr}[A(x) \text{ takes } \leq T(|x|) \text{ steps}] = 1)
Las Vegas Algorithm A computes f : \Sigma^* \rightarrow \Sigma^* in T(n)
correctness: (\forall x \in \Sigma^*)(\text{Pr}[A(x) = f(x)] = 1)
running time: (\forall x \in \Sigma^*)(\mathbb{E}[\text{\# of steps } A(x) \text{ takes}] \leq T(|x|))
// Exercise (Las Vegas to Monte Carlo) // Exercise (Monte Carlo to Las Vegas)
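A toy Las Vegas algorithm (invented for illustration, not from the notes): find the index of a 1 in a bit-array that is half ones by random probing. The answer is always correct; only the number of probes is random, with expectation at most 2:

```python
import random

def find_one(bits, rng):
    """Las Vegas: always returns the index of a 1; only the time is random."""
    while True:
        i = rng.randrange(len(bits))
        if bits[i] == 1:
            return i

rng = random.Random(1)
bits = [0, 1] * 50   # half the entries are 1 => each probe succeeds w.p. 1/2
idx = find_one(bits, rng)
```

Truncating the probing loop after a fixed number of iterations (returning "fail" otherwise) turns this into a Monte Carlo algorithm, which is the spirit of the exercise above.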
Contraction algorithm: iteratively pick a uniformly random edge and contract it, merging its two endpoints into one vertex while keeping all other edges (discarding self-loops)
(n-2) iterations
If you guess a cut uniformly at random, the probability of hitting a minimum cut can be exponentially small, on the order of 2^{-n}
this algorithm has \text{Pr}[\text{output is a min cut}] \geq \frac{1}{n^2}; by repeating it polynomially many times (e.g. n^3 runs) we can boost the success probability to 1 - \frac{1}{e^n}
// Exercise (Boosting for one-sided error) // Exercise (Boosting for two-sided error)
// Exercise (Maximum number of minimum cuts) // Exercise (Contracting two random vertices)
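A sketch of the contraction algorithm as described above, using union-find to merge endpoints. The example graph (two triangles joined by a single bridge, so the minimum cut is 1) and the 100-trial repetition count are illustrative choices:

```python
import random

def contract(edges, n, rng):
    """One run of the contraction algorithm; returns the size of the found cut."""
    parent = list(range(n))   # union-find over supernodes

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    remaining = n
    while remaining > 2:
        u, v = rng.choice(edges)          # uniformly random surviving edge
        parent[find(u)] = find(v)         # contract: merge its two endpoints
        remaining -= 1
        edges = [(a, b) for a, b in edges if find(a) != find(b)]  # drop self-loops
    return len(edges)                     # edges between the last 2 supernodes

rng = random.Random(0)
# Two triangles joined by the single bridge (2, 3): minimum cut size is 1.
graph = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
best = min(contract(list(graph), 6, rng) for _ in range(100))
```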
Randomness is a computational resource; however, it is extremely hard to get true randomness.
true random: unpredictable even given complete knowledge of the initial conditions (nowadays you can buy hardware that uses quantum physics in circuitry to generate true randomness) - but since testing whether a sequence is random is undecidable, we can't prove a given sequence is random
fake random: deterministic chaos
Defining Random:
density: D(x, n) = \frac{\sum_{i=1}^n x_i}{n}
limiting density: D(\alpha) = \lim_{n \rightarrow \infty} D(\alpha, n) (we would like it to be \frac{1}{2})
a bad definition: limiting density alone (e.g. 010101... has limiting density \frac{1}{2} but is clearly not random)
a good definition: Per Martin-Löf's randomness tests (but passing all of them is undecidable)
Approximation: pseudorandom numbers via a lagged linear recurrence x_n = a_1x_{n-1} + a_2x_{n-2} + ... + a_kx_{n-k} \mod m
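With k = 1 this recurrence is a plain linear congruential generator; the sketch below plugs in the classic Park-Miller constants (a = 16807, m = 2^{31} - 1) purely as an example:

```python
def lcg(seed, a=16807, m=2**31 - 1):
    """k = 1 case of the recurrence: x_n = a * x_{n-1} mod m (Park-Miller)."""
    x = seed
    while True:
        x = (a * x) % m
        yield x

gen = lcg(42)
sample = [next(gen) for _ in range(3)]  # deterministic, yet "looks" random
```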
sample space: \Omega
elementary (atomic) event: a single outcome l \in \Omega
compound event: subset of \Omega
probability: mapping Pr : \mathfrak{P}(\Omega) \rightarrow [0, 1]
Probability:
Boole's Inequality (Union Bound): Pr[\cup A_i] \leq Pr[A_1] + Pr[A_2] + ... + Pr[A_n]
Bonferroni's Inequality: Pr[\cap A_i] \geq Pr[A_1] + Pr[A_2] + ... + Pr[A_n] - (n - 1)
Inclusion-Exclusion: Pr[\cup_{i \in [n]} A_i] = \sum_{\emptyset \neq I \subseteq [n]} (-1)^{|I| + 1} Pr[\cap_{i \in I} A_i]
when n = 2, Pr[A_1 \cup A_2] = Pr[A_1] + Pr[A_2] - Pr[A_1 \cap A_2] for I = \{1\}, \{2\}, \{1, 2\}
Variance: Var[X] = E[(X - \mu)^2] = E[X^2] - E[X]^2
Var[aX + b] = a^2 Var[X] (b only shifts the mean)
If X, Y independent: E[XY] = E[X] \cdot E[Y] and Var[X + Y] = Var[X] + Var[Y]
Uniform Distribution:
pdf: f(x) = \begin{cases} 1 / (b - a) & \text{if } a \leq x \leq b\\ 0 & \text{otherwise} \end{cases}
expectation: E[X] = (a + b) / 2
variance: Var[X] = (b - a)^2 / 12
Binomial: let X_i be indicator variables, then X = X_1 + X_2 + ... + X_n, assuming (\forall i)(Pr[X_i = 1] = p)
Pr[X = k] = \begin{pmatrix} n\\ k\\ \end{pmatrix} p^k (1 - p)^{n - k}
expectation: E[X] = np
variance: Var[X] = np(1 - p)
SELECTION: given an unsorted list, find the k-th element of the corresponding sorted list.
PZT (Polynomial Zero Testing): decide whether a polynomial with integer coefficients is identically 0 (essentially, whether all coefficients are 0 once the polynomial is expanded and simplified)
Schwartz-Zippel Lemma: let P \in \mathbb{F}[x_1, ..., x_n] (the ring of n-variable polynomials over \mathbb{F}) be of total degree d and S \subseteq \mathbb{F} a set of cardinality s. If P is not identically zero, then P has at most ds^{n-1} roots in S^n. (Hence Pr[P(x) = 0] \leq d/s where x is a uniformly random point of S^n.)
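A sketch of how the lemma gives a PZT algorithm: evaluate the polynomial at random points of S^n and declare it zero only if every evaluation vanishes. P (a true identity) and Q (a nonzero polynomial) are made-up test cases:

```python
import random

def is_probably_zero(P, n_vars, S, trials, rng):
    """Declare P identically zero iff it vanishes at `trials` random points of S^n."""
    for _ in range(trials):
        point = [rng.choice(S) for _ in range(n_vars)]
        if P(*point) != 0:
            return False   # a nonzero value certifies P is not identically 0
    return True            # each trial errs w.p. <= d/|S| by Schwartz-Zippel

rng = random.Random(0)
S = list(range(1, 1001))   # |S| = 1000 >> degree 2
P = lambda x, y: (x + y) ** 2 - (x * x + 2 * x * y + y * y)  # identically zero
Q = lambda x, y: (x + y) ** 2 - (x * x + y * y)              # = 2xy, nonzero
```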
Primality Testing