Lecture 013

Ada's Lecture

Probability Theory

Basic Definitions and Properties

Probability Space: Think of a probability space as a gray-scale picture whose pixel intensities sum to 1. The probability function reads off the intensity of a single pixel, and an event is just a group of pixels.

Uniform Distribution: (\forall l \in \Omega)(\text{Pr}[l] = \frac{1}{|\Omega|})

Event: E \subseteq \Omega
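
As a minimal Python sketch of these definitions (the names omega, pr, and prob are illustrative, not from the lecture): a finite probability space is a map from outcomes to probabilities that sum to 1, and the probability of an event is the sum over its outcomes.

from fractions import Fraction

# A finite probability space: outcomes mapped to probabilities summing to 1.
# Here: the uniform distribution on a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}
pr = {outcome: Fraction(1, len(omega)) for outcome in omega}

def prob(event):
    """Probability of an event, i.e., a subset of the sample space."""
    return sum(pr[outcome] for outcome in event)

assert sum(pr.values()) == 1              # the "pixel intensities" sum to 1
assert prob({2, 4, 6}) == Fraction(1, 2)  # Pr[the roll is even]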

// Exercise (Practice with events) // Exercise (Basic facts about probability)

Building Probability Tree from Code

Probability Tree Example

Conditional Probability: given an event, set the probability of every outcome that cannot happen to 0, then normalize the remaining non-zero probabilities so they sum to 1. Formally, \text{Pr}[A | B] = \frac{\text{Pr}[A \cap B]}{\text{Pr}[B]} (defined only when \text{Pr}[B] > 0).
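
A hedged sketch of this zero-out-and-renormalize view, reusing the illustrative dictionary representation from above (condition is a made-up helper name):

from fractions import Fraction

def condition(pr, b):
    """Condition a distribution on event B: drop outcomes outside B,
    then rescale the surviving probabilities so they sum to 1."""
    pr_b = sum(p for outcome, p in pr.items() if outcome in b)
    assert pr_b > 0, "conditioning on a probability-0 event is undefined"
    return {outcome: p / pr_b for outcome, p in pr.items() if outcome in b}

die = {outcome: Fraction(1, 6) for outcome in range(1, 7)}  # fair die
assert condition(die, {2, 4, 6})[6] == Fraction(1, 3)       # Pr[roll = 6 | roll is even]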

// Exercise (Conditional probability practice)

Probability of More Events (Chain Rule): \text{Pr}[A_1 \cap A_2 \cap \cdots \cap A_n] = \text{Pr}[A_1] \cdot \text{Pr}[A_2 | A_1] \cdot \text{Pr}[A_3 | A_1 \cap A_2] \cdots \text{Pr}[A_n | A_1 \cap \cdots \cap A_{n-1}]

// Exercise (Practice with chain rule)

Independence: we cannot conclusively identify any true independence in the world, which makes an empirical definition circular (this can be fixed by looking at agreement in the code that generates the outcomes). Formally:

\begin{align*} \text{Pr}[A | B] &= \text{Pr}[A] \tag{Undefined when $\text{Pr}[B] = 0$}\\ \text{Pr}[B | A] &= \text{Pr}[B] \tag{Undefined when $\text{Pr}[A] = 0$}\\ \text{Pr}[A \cap B] &= \text{Pr}[A] \cdot \text{Pr}[B] \tag{Definition}\\ \text{Pr}[\bigcap_{i \in S}A_i] &= \prod_{i \in S}\text{Pr}[A_i] \tag{for every $S \subseteq [n]$, when $A_1, ..., A_n$ are mutually independent}\\ \end{align*}
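
A quick numeric check of the product rule on a toy sample space, two independent fair coin flips (the event names a and b are illustrative):

from fractions import Fraction
from itertools import product

# All four outcomes of two fair coin flips, each with probability 1/4.
omega = set(product("HT", repeat=2))
pr = {outcome: Fraction(1, 4) for outcome in omega}

def prob(event):
    return sum(pr[outcome] for outcome in event)

a = {o for o in omega if o[0] == "H"}    # first flip is heads
b = {o for o in omega if o[1] == "H"}    # second flip is heads
assert prob(a & b) == prob(a) * prob(b)  # product rule for independent events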

Random Variables Basics

Random Variable: a function X : \Omega \rightarrow \mathbb{R} that maps each outcome of the sample space to a real number (think of it as an enum with real values).

Probability mass function (PMF) of X : \Omega \rightarrow \mathbb{R}: a function p_X : \mathbb{R} \rightarrow [0, 1] such that (\forall x \in \mathbb{R})(p_X(x) = \text{Pr}[X = x])

// Exercise (Practice with random variables)

Expected Value: E[X] = \sum_{l \in \Omega} \text{Pr}[l] \cdot X(l) = \sum_{x \in \text{range}(X)} x \cdot \text{Pr}[X = x] where \text{range}(X) = \{X(l) | l \in \Omega\}
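
For example (a worked case added here, not from the notes), for one roll of a fair six-sided die both forms give the same answer:

\begin{align*} E[X] &= \sum_{l \in \Omega} \frac{1}{6} \cdot l = \sum_{x \in \{1, ..., 6\}} x \cdot \frac{1}{6} = \frac{21}{6} = 3.5\\ \end{align*}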

// Exercise (Equivalence of expected value definitions) // Exercise (Practice with expected value) // Exercise (Practice with linearity of expectation)

Indicator Random Variable: maps an event to a random variable, I_A(l) = \begin{cases} 1 & \text{if } l \in A\\ 0 & \text{otherwise}\\ \end{cases} (where A \subseteq \Omega is an event)

Practical Calculation:

\begin{align*} &\mathbb{E}[X]\\ = & \mathbb{E}[\sum_j I_{E_j}] \tag{let $X = \sum_j I_{E_j}$ where $I_{E_j}$ are indicator random variables}\\ = & \sum_j \mathbb{E}[I_{E_j}] \tag{by linearity of expectation}\\ = & \sum_j \text{Pr}[E_j] \tag{by $\mathbb{E}[I_{E_j}] = \text{Pr}[E_j]$}\\ \end{align*}
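
As a worked instance of this pattern (an example added here, not from the notes): the expected number of fixed points of a uniformly random permutation of [n]. Let E_j be the event that position j is a fixed point, so \text{Pr}[E_j] = \frac{1}{n}, and therefore:

\begin{align*} \mathbb{E}[\text{number of fixed points}] &= \sum_{j \in [n]} \text{Pr}[E_j] = n \cdot \frac{1}{n} = 1\\ \end{align*}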

// Exercise (Practice with linearity of expectation and indicators)

Markov's Inequality: Let X be a non-negative random variable. Then (\forall \epsilon \geq 1)(\text{Pr}[X \geq \epsilon \cdot E[X]] \leq \frac{1}{\epsilon})

// Exercise (Practice with Markov’s inequality)

Three Popular Random Variables

Bernoulli: X equals 1 with probability p and 0 otherwise. X \sim \text{Bernoulli}(p) such that \begin{align*} \text{Pr}[X = 1] &= p\\ \text{Pr}[X = 0] &= 1 - p\\ \end{align*}

Binomial: the number of successes in n independent trials that each succeed with probability p. X \sim \text{Binomial}(n, p) such that X = X_1 + X_2 + ... + X_n where X_1, ..., X_n are independent and (\forall i \in [n])(X_i \sim \text{Bernoulli}(p))
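
A minimal sampling sketch of this sum-of-Bernoullis view (bernoulli and binomial are illustrative helper names):

import random

def bernoulli(p):
    """One Bernoulli(p) trial: 1 with probability p, else 0."""
    return 1 if random.random() < p else 0

def binomial(n, p):
    """One Binomial(n, p) sample: the number of successes in n independent trials."""
    return sum(bernoulli(p) for _ in range(n))

samples = [binomial(10, 0.3) for _ in range(10_000)]
print(sum(samples) / len(samples))   # empirically close to n * p = 3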

// Exercise (Expectation of a Binomial random variable) // Exercise (Practice with Binomial random variable)

Geometric: the number of independent Bernoulli(p) trials needed to see the first success (e.g., how many pulls it takes to get the SSR). X \sim \text{Geometric}(p), generated by the loop below, where bernoulli(p) is a single Bernoulli(p) trial:

X = 1                        # the successful trial itself is counted
while bernoulli(p) == 0:     # one Bernoulli(p) trial per iteration
    X += 1                   # each failure means one more trial is needed

// Exercise (PMF of a geometric random variable) // Exercise (Practice with geometric random variable) // Exercise (Expectation of a geometric random variable)

Some general tips:

// TODO: Check Your Understanding

Randomized Computation

Randomness in Computation

Random Variables

Randomized Algorithms:

Deterministic Algorithm

Deterministic algorithm A computes f : \Sigma^* \rightarrow \Sigma^* in time T(n) if for every input x \in \Sigma^*, A(x) outputs f(x) after at most T(|x|) steps.

Classifying Randomized Algorithms

A randomized algorithm gambles with correctness (Monte Carlo) and/or with running time (Las Vegas).

Example of a Randomized Algorithm

Monte Carlo Algorithm

Monte Carlo algorithm A computes f : \Sigma^* \rightarrow \Sigma^* in time T(n) with error probability \epsilon if for every input x \in \Sigma^*, A(x) halts within T(|x|) steps and \text{Pr}[A(x) \neq f(x)] \leq \epsilon.

Las Vegas Algorithm

Las Vegas algorithm A computes f : \Sigma^* \rightarrow \Sigma^* in expected time T(n) if for every input x \in \Sigma^*, A(x) always outputs f(x) and the expected number of steps of A on x is at most T(|x|).

// Exercise (Las Vegas to Monte Carlo) // Exercise (Monte Carlo to Las Vegas)

Randomized Approximation

Randomized Approximation for MAX-CUT

Expected value of Random Variables

0.88 approximation for MAX-CUT

Expected Cut

Randomized Approximation for MIN-CUT

Contraction algorithm (Karger): repeatedly pick a random edge and merge its two endpoints into a single vertex, keeping all other edges (including parallel edges), until only two vertices remain; the surviving edges between the two vertices form a cut.
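
A hedged Python sketch of the contraction algorithm, assuming the graph is connected and given as a list of undirected edges (the function and variable names are illustrative); one run returns some cut, and the best of many independent runs is likely a minimum cut:

import random

def contract(edges):
    """One run of the contraction algorithm on a connected multigraph given as
    a list of undirected edges (u, v). Returns the size of the cut it finds."""
    parent = {v: v for e in edges for v in e}   # each vertex starts in its own group

    def find(v):                                # which merged group v belongs to
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    groups = len(parent)
    while groups > 2:
        u, v = random.choice(edges)             # pick a uniformly random edge
        ru, rv = find(u), find(v)
        if ru == rv:
            continue                            # endpoints already merged: skip self-loop
        parent[ru] = rv                         # contract: merge the two groups
        groups -= 1

    # Edges whose endpoints ended up in different groups cross the final cut.
    return sum(1 for u, v in edges if find(u) != find(v))

cycle = [(1, 2), (2, 3), (3, 4), (4, 1)]        # a 4-cycle: minimum cut is 2
print(min(contract(cycle) for _ in range(20)))  # repeated runs find a minimum cut

On a graph with n vertices, a single run outputs a fixed minimum cut with probability at least \frac{2}{n(n-1)}, which is why the algorithm is repeated many times and the smallest cut found is kept.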

BPP: decision problems solvable in polynomial time by randomized algorithms with two-sided error at most 1/3

// Exercise (Boosting for one-sided error) // Exercise (Boosting for two-sided error)

// Exercise (Maximum number of minimum cuts) // Exercise (Contracting two random vertices)

Sutner's Lecture

Randomized Algorithms

Randomness is a computational resource; however, true randomness is extremely hard to obtain.

Defining Randomness:

Approximation (pseudorandomness): generate a stream from a linear recurrence, x_n = (a_1x_{n-1} + a_2x_{n-2} + ... + a_kx_{n-k}) \mod m
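
A minimal Python sketch of such a generator (the modulus, coefficients, and seed below are arbitrary illustrative values, not ones from the lecture):

def lagged_recurrence(seed, coeffs, m):
    """Pseudorandom stream x_n = (a_1*x_{n-1} + ... + a_k*x_{n-k}) mod m,
    where seed supplies the first k values and coeffs = [a_1, ..., a_k]."""
    history = list(seed)
    while True:
        x = sum(a * v for a, v in zip(coeffs, reversed(history))) % m
        history = history[1:] + [x]
        yield x

gen = lagged_recurrence(seed=[1, 2, 3], coeffs=[5, 7, 11], m=101)
print([next(gen) for _ in range(10)])   # deterministic, but "looks" random

The output is only pseudorandom: the whole sequence is determined by the seed, which is exactly the sense in which this is an approximation of true randomness.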

Probability:

Boole's Inequality (Union Bound): Pr[\cup A_i] \leq Pr[A_1] + Pr[A_2] + ... + Pr[A_n]

Bonferroni's Inequality: Pr[\cap A_i] \geq Pr[A_1] + Pr[A_2] + ... + Pr[A_n] - (n - 1)

Inclusion-Exclusion: Pr[\cup_{i \in [n]} A_i] = \sum_{\emptyset \neq I \subseteq [n]} (-1)^{|I|+1} Pr[\cap_{i \in I} A_i]

Variance: Var[X] = E[(X - \mu)^2] = E[X^2] - E[X]^2 where \mu = E[X]
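
For example (a worked case added here, not from the notes), for X \sim \text{Bernoulli}(p) we have E[X] = p and E[X^2] = p since X^2 = X, so:

\begin{align*} Var[X] &= E[X^2] - E[X]^2 = p - p^2 = p(1 - p)\\ \end{align*}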

Uniform Distribution:

SELECTION: given an unsorted list, find the k-th element of the corresponding sorted list.
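
A hedged sketch of the standard randomized approach to SELECTION, quickselect, which partitions around a uniformly random pivot and recurses into one side (k is 1-indexed here):

import random

def quickselect(lst, k):
    """Return the k-th smallest element (1-indexed) of an unsorted list in
    expected linear time by recursing on one side of a random pivot."""
    pivot = random.choice(lst)
    smaller = [x for x in lst if x < pivot]
    equal = [x for x in lst if x == pivot]
    if k <= len(smaller):
        return quickselect(smaller, k)
    if k <= len(smaller) + len(equal):
        return pivot
    larger = [x for x in lst if x > pivot]
    return quickselect(larger, k - len(smaller) - len(equal))

print(quickselect([7, 1, 5, 3, 9], 2))   # 3, the 2nd smallest element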

PZT (Polynomial Zero Testing): decide whether a polynomial with integer coefficients is identically 0 (essentially, whether all coefficients are 0 once the polynomial is simplified)

Schwartz-Zippel Lemma: let P \in \mathbb{F}[x_1, ..., x_n] be a nonzero polynomial of total degree d, and let S \subseteq \mathbb{F} be a finite set of cardinality s. Then P has at most d \cdot s^{n-1} roots in S^n; equivalently, Pr[P(r) = 0] \leq d/s when r is drawn uniformly at random from S^n.
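
A hedged Monte Carlo sketch of PZT based on this lemma: evaluate the polynomial (treated as a black box) at random points of {1, ..., s}^n; a nonzero polynomial of degree d is wrongly declared zero with probability at most d/s per trial. The function name, trial count, and range s are illustrative choices.

import random

def probably_zero(poly, n_vars, degree, trials=20, s=10**6):
    """Monte Carlo zero test: evaluate a black-box polynomial at random points
    of {1, ..., s}^n. Each trial errs with probability at most degree / s."""
    for _ in range(trials):
        point = [random.randint(1, s) for _ in range(n_vars)]
        if poly(*point) != 0:
            return False     # a nonzero evaluation is a definite witness
    return True              # zero at every sampled point: probably identically zero

print(probably_zero(lambda x, y: (x + y)**2 - (x*x + 2*x*y + y*y), 2, 2))  # True: identically zero
print(probably_zero(lambda x, y: x*y - y*x + x, 2, 2))                     # False: equals x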

Primality Testing
