# Lecture 013

### Probability Theory

#### Basic Definitions and Properties

Probability Space: think of a probability space as a gray-scale picture whose pixel intensities sum to 1. The probability function extracts the intensity of one pixel, and an event is just a group of pixels.

• Sample Space: $\Omega$ is a set containing all possible experimental outcomes

• Probability Function: $\text{Pr}: \mathcal{P}(\Omega) \rightarrow [0, 1]$

• $\sum_{l \in \Omega} \text{Pr}[l] = 1$
• $\text{Pr}[\Omega] = 1$
• $\text{Pr}[\emptyset] = 0$

Uniform Distribution: $(\forall l \in \Omega)(\text{Pr}[l] = \frac{1}{|\Omega|})$

Event: $E \subseteq \Omega$

• $\bar{E} = \Omega - E$

• Disjoint events: $A \cap B = \emptyset$

// Exercise (Practice with events) // Exercise (Basic facts about probability)

Conditional Probability: set the probability of every outcome that cannot happen given the event to zero, then normalize the remaining non-zero probabilities so they sum to 1

• $\text{Pr}[A | B] = \begin{cases} 0 &\text{if } A \cap B = \emptyset\\ \frac{\text{Pr}[A]}{\text{Pr}[B]} &\text{if } A \subseteq B\\ \end{cases}$; in general $\text{Pr}[A | B] = \frac{\text{Pr}[A \cap B]}{\text{Pr}[B]} = \frac{\text{Pr}[A] \cdot \text{Pr}[B | A]}{\text{Pr}[B]}$ (undefined when $\text{Pr}[B] = 0$)

• Union Bound: $\text{Pr}[A_1 \cup A_2 \cup ... \cup A_n] \leq \text{Pr}[A_1] + \text{Pr}[A_2] + ... + \text{Pr}[A_n]$
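
Both facts above are easy to sanity-check by brute-force enumeration. A minimal sketch, using two fair dice as an illustrative sample space (my own example, not from the lecture):

```python
from fractions import Fraction
from itertools import product

# Illustrative finite probability space: two fair six-sided dice, uniform Pr.
omega = list(product(range(1, 7), range(1, 7)))   # 36 outcomes

def pr(event):
    # probability of an event (a subset of omega) under the uniform distribution
    return Fraction(len(event), len(omega))

A = {l for l in omega if l[0] + l[1] == 7}   # dice sum to 7
B = {l for l in omega if l[0] == 3}          # first die shows 3

# conditional probability: Pr[A | B] = Pr[A ∩ B] / Pr[B]
pr_A_given_B = pr(A & B) / pr(B)
```

Given the first die is 3, only (3, 4) sums to 7, so the conditional probability is 1/6; the union bound holds since $|A \cup B| \leq |A| + |B|$.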

// Exercise (Conditional probability practice)

Probability of More Events:

• $\text{Pr}[A \cap B] = \text{Pr}[A] \cdot \text{Pr}[B | A]$

• $\text{Pr}[A \cap B \cap C] = \text{Pr}[A] \cdot \text{Pr}[B | A] \cdot \text{Pr}[C | A \cap B]$

• If $A_1, A_2, ..., A_n$ partition $\Omega$, then $\text{Pr}[E] = \text{Pr}[A_1] \cdot \text{Pr}[E | A_1] + ... + \text{Pr}[A_n] \cdot \text{Pr}[E | A_n]$
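
The law of total probability can be verified by enumeration; the two-dice space and the partition by the first die below are illustrative choices of mine:

```python
from fractions import Fraction
from itertools import product

# Check Pr[E] = sum_i Pr[A_i] * Pr[E | A_i] on two fair dice (uniform Pr).
omega = list(product(range(1, 7), range(1, 7)))

def pr(event):
    return Fraction(len(event), len(omega))

E = {l for l in omega if l[0] + l[1] == 7}                      # dice sum to 7
parts = [{l for l in omega if l[0] == i} for i in range(1, 7)]  # partition by first die

# law of total probability, with Pr[E | A] computed as Pr[E ∩ A] / Pr[A]
total = sum(pr(A) * (pr(E & A) / pr(A)) for A in parts)
```

Each conditional probability is 1/6 (exactly one second-die value completes the sum), so the total is 1/6, matching the direct count.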

// Exercise (Practice with chain rule)

Independent: we cannot empirically identify any true independence in the world, and defining independence as "one event does not influence the other" is circular; the fix is the purely mathematical definition below.

\begin{align*} \text{Pr}[A | B] &= \text{Pr}[A] \tag{Undefined when $\text{Pr}[B] = 0$}\\ \text{Pr}[B | A] &= \text{Pr}[B] \tag{Undefined when $\text{Pr}[A] = 0$}\\ \text{Pr}[A \cap B] &= \text{Pr}[A] \cdot \text{Pr}[B] \tag{Definition}\\ \text{Pr}[\bigcap_{i \in S}A_i] &= \prod_{i \in S}\text{Pr}[A_i] \tag{for every $S \subseteq [n]$ when $A_1, ..., A_n$ are mutually independent}\\ \end{align*}
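
The product definition can be checked by enumeration on an illustrative two-dice space of my own; whether a pair of events is independent falls out of the arithmetic:

```python
from fractions import Fraction
from itertools import product

# Independence check on two fair dice with the uniform distribution.
omega = list(product(range(1, 7), range(1, 7)))

def pr(event):
    return Fraction(len(event), len(omega))

A = {l for l in omega if l[0] % 2 == 0}      # first die is even
B = {l for l in omega if l[1] == 3}          # second die shows 3
C = {l for l in omega if l[0] + l[1] >= 11}  # sum is at least 11

independent_AB = pr(A & B) == pr(A) * pr(B)  # dice don't interact: independent
independent_AC = pr(A & C) == pr(A) * pr(C)  # C constrains the first die: dependent
```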

#### Random Variables Basics

Random Variable: a function $X : \Omega \rightarrow \mathbb{R}$ that maps each outcome in the sample space to a real number (think of it as an enum with real values)

• $\text{Pr}[X = x] = \text{Pr}[\{l \in \Omega \mid X(l) = x\}]$

• with a random variable as a mapping, we can define expected values and describe discrete things numerically

Probability mass function (PMF) of $X : \Omega \rightarrow \mathbb{R}$: a function $p_X : \mathbb{R} \rightarrow [0, 1]$ such that $(\forall x \in \mathbb{R})(p_X(x) = \text{Pr}[X = x])$

• $\sum_{x \in \text{range}(X)} p_X(x) = 1$

• $(\forall S \subseteq \mathbb{R})(\text{Pr}[X \in S] = \sum_{x \in S}p_X(x))$

• we often "define" a random variable by its probability mass function, with no mention of the underlying sample space
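
A PMF can be built concretely from a sample space. A sketch for $X$ = sum of two fair dice (an illustrative example of mine):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# PMF of X = sum of two fair dice, built from the underlying sample space.
omega = list(product(range(1, 7), range(1, 7)))
counts = Counter(a + b for a, b in omega)
p_X = {x: Fraction(c, len(omega)) for x, c in counts.items()}

def pr_X_in(S):
    # Pr[X in S] = sum of the PMF over S
    return sum(p_X.get(x, Fraction(0)) for x in S)
```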

// Exercise (Practice with random variables)

Expected Value: $E[X] = \sum_{l \in \Omega} \text{Pr}[l] \cdot X(l) = \sum_{x \in \text{range}(X)} x \cdot \text{Pr}[X = x]$ where $\text{range}(X) = \{X(l) | l \in \Omega\}$

• linearity of expectation: $E[c_1X_1 + c_2X_2 + ... + c_nX_n] = c_1E[X_1] + c_2E[X_2] + ... + c_nE[X_n]$ (the $X_i$ need not be independent)

• product of expectations: $E[X_1X_2...X_n] = E[X_1] \cdot E[X_2] \cdot ... \cdot E[X_n]$ (requires $X_1, ..., X_n$ to be independent)
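
Both identities can be confirmed exactly on a small example; the two-dice setup below is my own illustration:

```python
from fractions import Fraction
from itertools import product

# Check linearity of expectation and the product rule on two fair dice.
omega = list(product(range(1, 7), range(1, 7)))

def ev(X):
    # E[X] = sum over outcomes l of Pr[l] * X(l), with uniform Pr[l] = 1/36
    return sum(Fraction(X(l), len(omega)) for l in omega)

X1 = lambda l: l[0]   # value of the first die
X2 = lambda l: l[1]   # value of the second die
```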

// Exercise (Equivalence of expected value definitions) // Exercise (Practice with expected value) // Exercise (Practice with linearity of expectation)

Indicator Random Variable: maps an event to a random variable $I_A(l) = \begin{cases} 1 & \text{if } l \in A\\ 0 & \text{otherwise}\\ \end{cases}$ (where $A \subseteq \Omega$ is an event)

• $E[I_A] = \text{Pr}[I_A = 1] = \text{Pr}[A]$

Practical Calculation:

\begin{align*} &\mathbb{E}[X]\\ = & \mathbb{E}[\sum_j I_{E_j}] \tag{let $X = \sum_j I_{E_j}$ where $I_{E_j}$ are indicator random variables}\\ = & \sum_j \mathbb{E}[I_{E_j}]\\ = & \sum_j \text{Pr}[E_j] \tag{by $\mathbb{E}[I_{E_j}] = \text{Pr}[E_j]$}\\ \end{align*}
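
As a worked instance of this pattern (my own example, not from the lecture): the expected number of fixed points of a uniformly random permutation is $\sum_j \text{Pr}[\text{position } j \text{ is fixed}] = n \cdot \frac{1}{n} = 1$, which brute-force enumeration confirms for small $n$:

```python
from fractions import Fraction
from itertools import permutations

n = 4
perms = list(permutations(range(n)))  # uniform over all n! permutations

# direct computation of E[X], where X = number of fixed points
exact = Fraction(sum(sum(1 for i in range(n) if s[i] == i) for s in perms),
                 len(perms))

# indicator computation: E[X] = sum_j Pr[E_j], with Pr[position j fixed] = 1/n
via_indicators = sum(Fraction(1, n) for _ in range(n))
```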

// Exercise (Practice with linearity of expectation and indicators)

Markov's Inequality: Let $X$ be a non-negative random variable. Then $(\forall \epsilon \geq 1)(\text{Pr}[X \geq \epsilon \cdot E[X]] \leq \frac{1}{\epsilon})$
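
A quick check of the inequality on an illustrative distribution of mine (the sum of two fair dice, with $E[X] = 7$):

```python
from fractions import Fraction
from itertools import product

# Verify Pr[X >= eps * E[X]] <= 1/eps for X = sum of two fair dice.
omega = list(product(range(1, 7), range(1, 7)))
EX = Fraction(7)   # E[X] for the sum of two fair dice

def tail(t):
    # Pr[X >= t]
    return Fraction(sum(1 for a, b in omega if a + b >= t), len(omega))

holds = all(tail(eps * EX) <= 1 / eps
            for eps in [Fraction(1), Fraction(3, 2), Fraction(12, 7)])
```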

// Exercise (Practice with Markov’s inequality)

### Three Popular Random Variables

Bernoulli: $X$ equals $1$ with probability $p$; write $X \sim \text{Bernoulli}(p)$ such that \begin{align*} \text{Pr}[X = 1] &= p\\ \text{Pr}[X = 0] &= 1 - p\\ \end{align*}

• observe $\text{range}(X) = \{0, 1\}$ and $E[X] = p$

Binomial: the number of successes in $n$ independent trials that each succeed with probability $p$; write $X \sim \text{Binomial}(n, p)$ such that $X = X_1 + X_2 + ... + X_n$ where $X_1, ..., X_n \sim \text{Bernoulli}(p)$ are independent

• observe $\text{Pr}[X = i] = \binom{n}{i}p^i(1-p)^{n-i}$

• observe $\text{range}(X) = \{0, 1, ..., n\}$ and $E[X] = np$
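
These observations can be verified exactly with rational arithmetic; $n$ and $p$ below are illustrative choices of mine:

```python
from fractions import Fraction
from math import comb

# Exact Binomial(n, p) PMF and mean.
n, p = 10, Fraction(1, 4)
pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
mean = sum(i * pmf[i] for i in range(n + 1))
```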

// Exercise (Expectation of a Binomial random variable) // Exercise (Practice with Binomial random variable)

Geometric: the number of independent trials until the first success (e.g. how many pulls to get an SSR); write $X \sim \text{Geometric}(p)$

• $\text{range}(X) = \{1, 2, 3, ...\}$

• $\text{Pr}[X = i] = (1-p)^{i-1}p$

• $\sum_{i = 1}^\infty \text{Pr}[X = i] = 1$

• $E[X] = \frac{1}{p}$

```python
import random

def geometric(p):
    # sample X ~ Geometric(p): count Bernoulli(p) trials until the first success
    x = 1
    while random.random() >= p:  # the Bernoulli(p) trial came up 0
        x += 1
    return x
```


// Exercise (PMF of a geometric random variable) // Exercise (Practice with geometric random variable) // Exercise (Expectation of a geometric random variable)

Some general tips:

• Upper-bound $\text{Pr}(A)$: find $B \supseteq A$ (i.e. event $A$ implies event $B$) and bound $\text{Pr}(B)$.

• Lower-bound $\text{Pr}(A)$: find $B \subseteq A$ and bound $\text{Pr}(B)$.

• Upper-bound $\text{Pr}(A)$: try lower bound $\text{Pr}[\bar{A}]$ since $\text{Pr}[A] = 1 - \text{Pr}[\bar{A}]$

• Lower-bound $\text{Pr}(A)$: try upper bound $\text{Pr}[\bar{A}]$ since $\text{Pr}[A] = 1 - \text{Pr}[\bar{A}]$

• Calculate $\text{Pr}[A_1 \cap ... \cap A_n]$: chain rule. If independent: $\text{Pr}[A_1 \cap ... \cap A_n] = \text{Pr}[A_1]...\text{Pr}[A_n]$

• Upper-bound $\text{Pr}[A_1 \cup ... \cup A_n]$: union bound

• $(\forall i \in [n])(A_i) = A_1 \cap ... \cap A_n$

• $(\exists i \in [n])(A_i) = A_1 \cup ... \cup A_n$

• Calculate $\mathbb{E}[X]$: use the definition

• Calculate $\mathbb{E}[X]$: write $X$ as a sum of indicator random variables, then use linearity of expectation.

### Randomized Computation

Randomness in Computation

• statistics via random sampling

• break the prisoner's dilemma, achieve Nash equilibria in games

• Theorem: every game has a Nash equilibrium, provided players can pick probabilistic (mixed) strategies
• cryptography: encodings that reveal no real information

• error-correcting code

• communication complexity: a checksum does not guarantee equality, but gets very close

• quantum computation

• GAN

Random Variables

• Math Definition: a function from sample space $\Omega$ to $\mathbb{R}$

• CS Definition: a numerical variable in randomized code

Randomized Algorithms:

• input chosen randomly: average-case analysis

• algorithm makes random choices: randomized algorithm

• $\text{RandInt}(n)$: returns each of $1$ to $n$ with equal probability
• $\text{Bernoulli}(p)$: returns $1$ with probability $p$, $0$ with probability $1-p$

Deterministic Algorithm $A$ computes $f : \Sigma^* \rightarrow \Sigma^*$ in $T(n)$

• correctness: $(\forall x \in \Sigma^*)(A(x) = f(x))$

• running time: $(\forall x \in \Sigma^*)(A(x) \text{ takes } \leq T(|x|) \text{ steps})$

Classifying Randomized Algorithms

Monte Carlo Algorithm $A$ computes $f : \Sigma^* \rightarrow \Sigma^*$ in $T(n)$

• correctness: $(\forall x \in \Sigma^*)(\text{Pr}[A(x) \neq f(x)] \leq \epsilon)$

• running time: $(\forall x \in \Sigma^*)(\text{Pr}[A(x) \text{ takes} \leq T(|x|) \text{ steps}] = 1)$

Las Vegas Algorithm $A$ computes $f : \Sigma^* \rightarrow \Sigma^*$ in $T(n)$

• correctness: $(\forall x \in \Sigma^*)(\text{Pr}[A(x) = f(x)] = 1)$

• running time: $(\forall x \in \Sigma^*)(\mathbb{E}[\text{\# of steps } A(x) \text{ takes}] \leq T(|x|))$

// Exercise (Las Vegas to Monte Carlo) // Exercise (Monte Carlo to Las Vegas)

### Randomized Approximation

#### Randomized Approximation for MIN-CUT

Contraction algorithm: iteratively merge two adjacent vertices into one, keeping all remaining edges (parallel edges stay; self-loops between the merged pair are discarded)

• $(n-2)$ iterations

• If you guess a cut uniformly at random instead, the probability of hitting a minimum cut can be exponentially small, on the order of $2^{-n}$

• this algorithm achieves $\text{Pr}[\text{output is a min cut}] \geq \frac{1}{n^2}$; with polynomially many independent repetitions we can boost the success probability to $1 - \frac{1}{e^n}$
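
A minimal sketch of the contraction algorithm, assuming the input multigraph is an edge list over vertices $0, ..., n-1$; the union-find bookkeeping and the repetition count are my own choices, not from the lecture:

```python
import random

def contract_once(edges, n):
    """One run of random contraction; returns the size of the cut it produces."""
    parent = list(range(n))          # union-find over (super)vertices

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    components = n
    while components > 2:            # (n - 2) contractions in total
        u, v = random.choice(edges)  # random edge; self-loops are skipped below
        ru, rv = find(u), find(v)
        if ru != rv:                 # merge its endpoints, keep parallel edges
            parent[ru] = rv
            components -= 1
    # edges still crossing the two remaining supervertices form a cut
    return sum(1 for u, v in edges if find(u) != find(v))

def min_cut(edges, n, runs=300):
    # boost the >= 1/n^2 per-run success probability by independent repetition
    return min(contract_once(edges, n) for _ in range(runs))
```

Picking a random edge and skipping it when its endpoints are already merged is equivalent to contracting a uniformly random edge of the current multigraph; keeping the smallest cut over many runs is the boosting step.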

// Exercise (Boosting for one-sided error) // Exercise (Boosting for two-sided error)

// Exercise (Maximum number of minimum cuts) // Exercise (Contracting two random vertices)

## Sutner's Lecture

### Random Algorithm

Randomness is a computational resource; however, it is extremely hard to get true randomness.

• true random: remains random even given the initial conditions (nowadays you can buy hardware that uses quantum physics in its circuitry to generate true randomness) - but since randomness of a sequence is undecidable, we can't prove a given sequence is random

• fake random: deterministic chaos

Defining Random:

• density: $D(\alpha, n) = \frac{\sum_{i \leq n} \alpha_i}{n}$ for a bit sequence $\alpha$

• limiting density: $D(\alpha) = \lim_{n \rightarrow \infty} D(\alpha, n)$ (we would like it to be $\frac{1}{2}$)

1. the density of the whole sequence is $\frac{1}{2}$.
2. every (effectively selected) infinite subsequence also has density $\frac{1}{2}$.
• a good definition: Per Martin-Löf's randomness tests (but these are undecidable)

Approximation (pseudorandom generation by a linear recurrence): $x_n = a_1x_{n-1} + a_2x_{n-2} + ... + a_kx_{n-k} \mod m$
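
A sketch of this recurrence as a stream of pseudorandom values; the coefficients, seed, and modulus in the usage below are illustrative, not a recommended generator:

```python
def linear_recurrence(coeffs, seed, m):
    # x_n = a_1*x_{n-1} + a_2*x_{n-2} + ... + a_k*x_{n-k} mod m
    state = list(seed)                       # holds x_{n-k}, ..., x_{n-1}
    while True:
        x = sum(a * s for a, s in zip(coeffs, reversed(state))) % m
        yield x
        state = state[1:] + [x]              # slide the window forward
```

With $k = 2$, $a_1 = a_2 = 1$, $m = 10$, this produces the Fibonacci sequence modulo 10.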

• sample space: $\Omega$

• elementary (atomic) event: a single outcome $a \in \Omega$

• compound event: subset of $\Omega$

• probability: mapping $Pr : \mathfrak{P}(\Omega) \rightarrow \mathbb{R}$

• $Pr[A] \geq 0$
• $Pr[\Omega] = 1$
• $A \cap B = \emptyset \implies Pr[A \cup B] = Pr[A] + Pr[B]$
• mutually exclusive: $A \cap B = \emptyset$

Probability:

• by additivity over disjoint atomic events: $Pr[A] = \sum_{a \in A} Pr[\{a\}]$
• uniform probability: assert the probability of each atomic event is $Pr[\{a\}] = \frac{1}{|\Omega|}$ (as an axiom)
• problem 1: the probability of an elementary event in a continuous space is $0$
• problem 2: the sum is uncountable
• frequency: $Pr[\{a\}] = \frac{\#\text{ occurrences}}{\#\text{ trials}}$

Boole's Inequality (Union Bound): $Pr[\cup A_i] \leq Pr[A_1] + Pr[A_2] + ... + Pr[A_n]$

Bonferroni's Inequality: $Pr[\cap A_i] \geq Pr[A_1] + Pr[A_2] + ... + Pr[A_n] - (n - 1)$

Inclusion-Exclusion: $Pr[\bigcup_{i \in [n]} A_i] = \sum_{\emptyset \neq I \subseteq [n]} (-1)^{|I| + 1} Pr[\bigcap_{i \in I} A_i]$

• when $n = 2$: $Pr[A_1 \cup A_2] = Pr[A_1] + Pr[A_2] - Pr[A_1 \cap A_2]$, from $I = \{1\}, \{2\}, \{1, 2\}$
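
The $n = 2$ case is easy to confirm by enumeration (a two-dice example of my own):

```python
from fractions import Fraction
from itertools import product

# Brute-force check of inclusion-exclusion for n = 2 on two fair dice.
omega = list(product(range(1, 7), range(1, 7)))

def pr(event):
    return Fraction(len(event), len(omega))

A1 = {l for l in omega if l[0] % 2 == 0}     # first die is even
A2 = {l for l in omega if l[0] + l[1] == 7}  # dice sum to 7

lhs = pr(A1 | A2)
rhs = pr(A1) + pr(A2) - pr(A1 & A2)
```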

Variance: $Var[X] = E[(X - \mu)^2] = E[X^2] - E[X]^2$

• $Var[aX + b] = a^2 Var[X]$ ($b$ only shifts the average)

• Assume $X, Y$ independent:

• $Var[X + Y] = Var[X] + Var[Y]$
• $Var[X \cdot Y] = E[X^2] \cdot E[Y^2] - E[X]^2 \cdot E[Y]^2$ (note: not $Var[X] \cdot Var[Y]$)
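
These identities can be checked exactly on two independent fair dice (my own example); note that the variance of a product of independent variables involves second moments, $E[X^2]E[Y^2] - (E[X]E[Y])^2$, rather than a product of variances:

```python
from fractions import Fraction
from itertools import product

# Check the variance identities on two independent fair dice (uniform Pr).
omega = list(product(range(1, 7), range(1, 7)))

def ev(X):
    return sum(Fraction(X(l), len(omega)) for l in omega)

def var(X):
    # Var[X] = E[X^2] - E[X]^2
    return ev(lambda l: X(l) ** 2) - ev(X) ** 2

X = lambda l: l[0]   # first die
Y = lambda l: l[1]   # second die
```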

Uniform Distribution:

• pdf: $f(x) = \begin{cases} 1 / (b - a) & \text{if } a \leq x \leq b\\ 0 & \text{otherwise} \end{cases}$

• expectation: $E[X] = (a + b) / 2$

• variance: $Var[X] = (b - a)^2 / 12$

Binomial: let $X_i$ be indicator variables, then $X = X_1 + X_2 + ... + X_n$, assuming $(\forall i)(Pr[X_i = 1] = p)$

• $Pr[X = k] = \binom{n}{k} p^k (1- p)^{n - k}$

• expectation: $E[X] = np$

• variance: $Var[X] = np(1 - p)$

SELECTION: given an unsorted list, find the $k$-th element of the corresponding sorted list.

• Theorem: SELECTION can be done in linear time, with at most $5.4305 n$ comparisons. (Blum, Floyd, Pratt, Rivest, Tarjan, 1973)

PZT (Polynomial Zero Testing): decide whether a polynomial with integer coefficients is identically zero (essentially checking whether all coefficients are 0 once the polynomial is simplified)

Schwartz-Zippel Lemma: let $P \in \mathbb{F}[x_1, ..., x_n]$ be a polynomial of total degree $d$ that is not identically zero, and let $S \subseteq \mathbb{F}$ be a set of cardinality $s$. Then $P$ has at most $ds^{n-1}$ roots in $S^n$, so $Pr[P(x) = 0] \leq d/s$ for a uniformly random point $x \in S^n$.

• so you can test PZT by plugging a random input into the polynomial and checking whether it evaluates to zero
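
A sketch of that test, assuming the polynomial is given as a black-box function we can evaluate (the helper name and example polynomials are my own):

```python
import random

# Randomized polynomial identity testing via Schwartz-Zippel: evaluate at
# uniformly random points of S^n; a nonzero value certifies P is not zero.
def probably_identically_zero(P, n_vars, S, trials=30):
    for _ in range(trials):
        point = [random.choice(S) for _ in range(n_vars)]
        if P(*point) != 0:
            return False   # witness found: P is definitely not zero
    return True            # wrong with probability at most (d/|S|)^trials

zero_poly = lambda x, y: (x + y) ** 2 - (x * x + 2 * x * y + y * y)  # identically 0
nonzero_poly = lambda x, y: x * y - 1                                # not identically 0
```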

Primality Testing

• Quadratic Residues: $a \in \mathbb{Z}_m^*$ is a quadratic residue modulo $m$ iff $x^2 \equiv a \pmod{m}$ has a solution over $\mathbb{Z}_m^*$ (so $\mathbb{Z}_m^*$ can be partitioned into quadratic residues $QR_m$ and quadratic non-residues $QNR_m$)
