Lecture 015

Randomized Algorithm: Las Vegas

Random Algorithm

Deterministic Algorithm

Las Vegas: always produces the correct answer, but the runtime is random.

Monte Carlo: runs in deterministic time, but the answer might be incorrect.

Quicksort

Deterministic Quicksort

Deterministic Quicksort: given list l

  1. pick l[0] as the pivot, then partition the rest of the list around it (smaller elements to the left, larger to the right)
  2. recursively quicksort the left and right parts
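The two steps above can be sketched in Python (a non-in-place version for clarity; the function name and list-comprehension partition are my own choices):

```python
def quicksort(l):
    """Deterministic quicksort: always uses l[0] as the pivot."""
    if len(l) <= 1:
        return l
    pivot = l[0]                                   # step 1: pick l[0] as pivot
    left = [x for x in l[1:] if x < pivot]         # partition around the pivot
    right = [x for x in l[1:] if x >= pivot]
    return quicksort(left) + [pivot] + quicksort(right)  # step 2: recurse on both sides
```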

Worst Case of Deterministic Quicksort: O(n^2)

\begin{align*} C(n) =& n - 1 + C(n - 1)\\ =& (n - 1) + (n - 2) + ... + 1\\ \in& O(n^2)\\ \end{align*}

Best Case of Deterministic Quicksort: O(n \lg n)

\begin{align*} C(n) =& n - 1 + 2C(\frac{n}{2})\\ =& (n - 1) + 2(n/2 - 1 + 2C(n/4))\\ =& (n - 1) + (n - 2) + (n - 4) + ... + (n - 2^{\lg n - 1})\\ \in& O(n\lg n)\\ \end{align*}

If we pick the pivot deterministically, an adversary can always construct a worst-case input (for the l[0] pivot, an already-sorted list suffices).

Random Quicksort

Theorem: for every input, E[C(n)] \in O(n \lg n), where C(n) is the number of comparisons.

Let s_i denote the i-th smallest element of the input. Define X_{ij} = \begin{cases} 1 & \text{if } s_i \text{ is ever compared with } s_j\\ 0 & \text{otherwise}\\ \end{cases}

\begin{align*} E[C(n)] =& E[\sum_{i = 1}^n \sum_{j = i+1}^n X_{ij}]\\ =& \sum_{i = 1}^n \sum_{j = i+1}^n E[X_{ij}]\\ =& \sum_{i = 1}^n \sum_{j = i+1}^n Pr\{s_i \text{ ever compared with } s_j\}\\ =& \sum_{i = 1}^n \sum_{j = i+1}^n Pr\{\text{the first pivot chosen among } s_i, ..., s_j \text{ is } s_i \text{ or } s_j\}\\ =& \sum_{i = 1}^n \sum_{j = i+1}^n \frac{2}{j - i + 1}\\ =& 2\sum_{i = 1}^{n - 1} \sum_{k = 2}^{n - i + 1} \frac{1}{k} \tag{let $k = j - i + 1$}\\ \leq& 2 \sum_{i = 1}^n \sum_{k = 2}^n \frac{1}{k}\\ \leq& 2 \sum_{i = 1}^n \ln n\\ =& 2n \ln n\\ \in& O(n \lg n)\\ \end{align*}
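The randomized variant differs from the deterministic one only in the pivot choice; a sketch (again non-in-place, with a three-way partition to handle duplicates; these naming and partitioning choices are my own):

```python
import random

def random_quicksort(l):
    """Quicksort with a uniformly random pivot: expected O(n lg n) comparisons
    on every input, since no fixed input is worst-case for a random pivot."""
    if len(l) <= 1:
        return l
    pivot = l[random.randrange(len(l))]   # the only change: random pivot
    left = [x for x in l if x < pivot]
    mid = [x for x in l if x == pivot]
    right = [x for x in l if x > pivot]
    return random_quicksort(left) + mid + random_quicksort(right)
```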

Find k-th Smallest Element

Idea: partition as in quicksort, but recurse into only one side.

  1. choose a pivot uniformly at random
  2. partition the list around the pivot; let i be the pivot's position after partitioning
  3. if k < i, throw away the right side and recurse on the left
  4. if k > i, throw away the left side and recurse on the right, now looking for rank k - i
  5. if k = i, we found it!
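The steps above can be sketched in Python (the three-way partition for duplicates and the rank adjustment when recursing right are implementation details I added):

```python
import random

def quickselect(l, k):
    """Return the k-th smallest element of l (1-indexed), expected O(n) time."""
    pivot = l[random.randrange(len(l))]           # step 1: random pivot
    left = [x for x in l if x < pivot]            # step 2: partition
    mid = [x for x in l if x == pivot]
    right = [x for x in l if x > pivot]
    if k <= len(left):                            # step 3: answer is on the left
        return quickselect(left, k)
    if k > len(left) + len(mid):                  # step 4: answer is on the right
        return quickselect(right, k - len(left) - len(mid))
    return pivot                                  # step 5: pivot is the answer
```

With k = (n + 1) // 2 this is exactly the randomized median-select mentioned below the analysis.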

Best case bound:

\begin{align*} C(n) <& (n - 1) + C(n/2) \tag{ignore $k = i$ case}\\ =& (n - 1) + (n / 2 - 1) + C(n / 4)\\ <& n + n/2 + n/4 + n/8 + ... + 1\\ \leq& 2n \end{align*}

Theorem: the expected number of comparisons is bounded by 8n.

\begin{align*} E[C(n)] \leq& (n - 1) + \sum_{i = 1}^n Pr\{\text{pivot is }s_i\} \cdot E[C(\max(i - 1, n - i))]\\ =& (n - 1) + \sum_{i = 1}^n \frac{1}{n} \cdot E[C(\max(i - 1, n - i))]\\ \leq& (n - 1) + \frac{2}{n}\sum_{i = \lfloor\frac{n}{2}\rfloor}^{n - 1} E[C(i)] \tag{each size $\lfloor n/2 \rfloor \leq i \leq n - 1$ appears at most twice in the sum}\\ \end{align*}

We proceed by induction. Base case: E[C(1)] = 0 \leq c \cdot 1. Assume E[C(i)] \leq c \cdot i for all i < n, for some constant c \geq 1; then:

\begin{align*} E[C(n)] \leq& (n - 1) + \frac{2}{n}\sum_{i = \lfloor\frac{n}{2}\rfloor}^{n - 1} E[C(i)]\\ \leq& (n - 1) + \frac{2}{n} \sum_{i = \lfloor\frac{n}{2}\rfloor}^{n - 1} c \cdot i \tag{induction hypothesis}\\ \leq& (n - 1) + \frac{2c}{n} \cdot \frac{(n - 1)+\lfloor\frac{n}{2}\rfloor}{2} \cdot (n - 1 - \lfloor\frac{n}{2}\rfloor + 1)\\ \leq& (n - 1) + \frac{2c}{n} \cdot \frac{(n - 1)+\frac{n}{2}}{2} \cdot (n - \frac{n - 1}{2})\\ =& (n - 1) + \frac{3cn}{4} + \frac{c}{4} - \frac{2c}{4n}\\ =& 7n + 1 - \frac{4}{n} \tag{plug in $c = 8$}\\ \leq& 7n + 1\\ \leq& 8n \tag{for $n \geq 1$}\\ \end{align*}

When k = (n + 1)/2 (assuming n is odd), this is the Randomized Median-Select algorithm.

Randomized Algorithm: Monte Carlo

Randomized Matrix-Multiplication Checking

Freivalds' Matrix Multiplication Checking Algorithm: checking A \cdot B =^? C.

  1. Choose random vector \vec{r} = \langle{r_1, r_2, ..., r_n}\rangle where r_i \in \{0, 1\}.
  2. if A(B\vec{r}) \neq C\vec{r} return False, else return True
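The two steps, plus the k-round boosting discussed later, can be sketched in Python (matrices as lists of lists; the function names and the `k` parameter are my own):

```python
import random

def freivalds(A, B, C, k=1):
    """Freivalds' check of A*B = C using k independent rounds, O(k n^2) time.
    If A*B != C, each round misses the mismatch with probability <= 1/2,
    so a wrong True is returned with probability <= (1/2)^k."""
    n = len(A)

    def matvec(M, v):
        # O(n^2) matrix-vector product; we never form A*B itself (O(n^3))
        return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]

    for _ in range(k):
        r = [random.randint(0, 1) for _ in range(n)]  # step 1: random 0/1 vector
        if matvec(A, matvec(B, r)) != matvec(C, r):   # step 2: compare A(Br) with Cr
            return False
    return True
```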

Theorem: for A \cdot B \neq C, the error probability is:

Pr\{A \cdot B \cdot \vec{r} = C \cdot \vec{r}\} \leq \frac{1}{2}

If we choose \vec{r} such that A \cdot B \cdot \vec{r} \neq C \cdot \vec{r} when in fact A \cdot B \neq C, then we say \vec{r} witnesses the fact that A \cdot B \neq C.

Proof: given D := AB - C \neq 0, what is the probability that D\vec{r} = AB\vec{r} - C\vec{r} = \vec{0}? Since D \neq 0, assume without loss of generality that d_{1, 1} is a non-zero entry of D.

\begin{align*} &D\vec{r} = \vec{0}\\ \implies& \sum_{j = 1}^n (d_{1, j} r_j) = 0\\ \implies& d_{1, 1}r_1 + \sum_{j = 2}^n (d_{1, j} r_j) = 0\\ \implies& r_1 = -\frac{1}{d_{1, 1}} \sum_{j = 2}^n (d_{1, j} r_j)\\ \end{align*}

So to pass the test despite AB \neq C, r_1 is fully determined once r_2, r_3, ..., r_n are fixed: for every choice of r_2, r_3, ..., r_n there is at most one value of r_1 that makes the check pass. Since r_1 \in \{0, 1\} is chosen uniformly, the probability that our choice of r_1 happens to equal -\frac{1}{d_{1, 1}} \sum_{j = 2}^n (d_{1, j} r_j) is \leq \frac{1}{|\{0, 1\}|}.

Note the \leq above: it can happen that no choice of r_1 passes the test, since -\frac{1}{d_{1, 1}} \sum_{j = 2}^n (d_{1, j} r_j) may not lie in \{0, 1\} at all.

The algorithm runs in \Theta(n^2) (three matrix-vector products), versus \Theta(n^3) for recomputing AB naively. We can boost it by running it k times, with total complexity \Theta(kn^2), achieving error rate \leq \left(\frac{1}{|\{0, 1\}|}\right)^k. We can also boost it by choosing \vec{r} \in \{0, 1, 2\}^n rather than \vec{r} \in \{0, 1\}^n, lowering the per-round error to \leq \frac{1}{3}.

What if some of our random vectors \vec{r} repeat during boosting? This turns out not to be a problem, as long as the vectors are chosen independently. What if we have an algorithm with 2-sided error? Then we can use majority voting over the k runs.

Randomized Polynomial Checking

We would like to know whether a given polynomial F(x) := x^3 - x^2 - 41x + 105 is equal to G(x) := (x - 3)(x - 5)(x + 7) without expanding and simplifying them (which is expensive, \in O(d^2)).

Let d be the degree of the polynomials (so each has at most d roots).

Simple Random Checker:

  1. pick a value r uniformly at random out of a fixed set of n \cdot d possible values (e.g. \{1, 2, ..., nd\})
  2. if F(r) = G(r) return True, otherwise return False.

The above algorithm is \in \Theta(d): evaluating each polynomial at a single point (e.g. via Horner's rule) takes \Theta(d) operations.
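A sketch of the checker in Python, using the two polynomials from above (the function name, the `trials` parameter for boosting, and the sampling set \{0, ..., nd - 1\} are my own choices):

```python
import random

def poly_check(F, G, d, n, trials=1):
    """Return True if F and G agree on `trials` random points drawn from
    {0, ..., n*d - 1}. If F != G, their difference is a non-zero polynomial
    of degree <= d with at most d roots, so each trial errs w.p. <= 1/n."""
    for _ in range(trials):
        r = random.randrange(n * d)
        if F(r) != G(r):
            return False
    return True

# The two polynomials from the text, G evaluated without expanding the product.
F = lambda x: x**3 - x**2 - 41*x + 105
G = lambda x: (x - 3) * (x - 5) * (x + 7)
```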

Proof: let H(x) := F(x) - G(x). If F \neq G, then H is a non-zero polynomial of degree \leq d, so H has at most d roots.

Given F(x) \neq G(x):

\begin{align*} &Pr\{F(r) = G(r)\}\\ =& Pr\{H(r) = 0\}\\ \leq& \frac{d}{nd} \tag{at most $d$ of the $nd$ choices for $r$ are roots of $H$}\\ =& \frac{1}{n}\\ \end{align*}

Note the \leq above: it may be that none of the nd choices is a root of H(x).

Again, we can boost it to achieve error rate \leq \frac{1}{n^k} by running it k times with complexity \Theta(kd).

Randomized Min-Cut

Cut-set: a set of edges whose removal will break the graph into two or more connected components.

The algorithm (Karger's contraction algorithm): given graph G = \langle{V, E}\rangle, let |V| = n, let C be the edge set of one particular min-cut, and let |C| = k.

  1. In each iteration, pick an edge e = \langle{v_1, v_2}\rangle \in E uniformly at random and contract its endpoints v_1, v_2 into a single vertex (removing self-loops, keeping parallel edges)
  2. Stop when only two vertices are left
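The contraction loop, plus the repeated-runs boosting discussed below, can be sketched in Python. This version processes a random permutation of the edges, which is equivalent to contracting a uniformly random remaining edge each round; the union-find bookkeeping and all names are my own:

```python
import random

def karger_min_cut(edges, n):
    """One run of the contraction algorithm on a multigraph with vertices
    0..n-1 given as a list of (u, v) edges; returns the size of some cut."""
    parent = list(range(n))          # union-find to track contracted vertices

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    edges = edges[:]
    random.shuffle(edges)            # random edge order = random contractions
    components = n
    for (u, v) in edges:             # contract until 2 super-vertices remain
        if components == 2:
            break
        ru, rv = find(u), find(v)
        if ru != rv:                 # skip self-loops inside a super-vertex
            parent[ru] = rv
            components -= 1
    # edges still crossing the two super-vertices form the output cut
    return sum(1 for (u, v) in edges if find(u) != find(v))

def min_cut(edges, n, runs):
    """Boosted version: keep the smallest cut over `runs` independent runs."""
    return min(karger_min_cut(edges, n) for _ in range(runs))
```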

Since there are n vertices, we need n - 2 iterations. The graph has at least \frac{nk}{2} edges: since k is the size of the smallest cut, every vertex has degree \geq k (otherwise that vertex's edges alone would form a smaller cut), and summing degrees counts each edge twice.

Let E_i denote the event "no edge of C is selected in the i-th round". We bound Pr\{E_1\} and Pr\{E_2 | E_1\}:

Pr\{E_1\} \geq \frac{nk/2 - k}{nk/2} = \frac{n - 2}{n}
Pr\{E_2 | E_1\} \geq \frac{(n - 1)k/2 - k}{(n - 1)k/2} = \frac{n - 3}{n - 1}
\begin{align*} Pr\{\text{none of } C \text{ contracted}\} =& Pr\{E_1\} Pr\{E_2 | E_1\} Pr\{E_3 | E_1 \cap E_2\} ... Pr\{E_{n - 2} | E_1 \cap E_2 \cap ... \cap E_{n - 3}\}\tag{by $n-2$ many contractions}\\ \geq& \frac{n - 2}{n} \frac{n - 3}{n - 1} \frac{n - 4}{n - 2} ...\frac{3}{5} \frac{2}{4} \frac{1}{3}\\ =& \frac{2}{n (n - 1)} \tag{telescoping product}\\ \end{align*}

Therefore the probability that a single run fails to output this specific min-cut is \leq 1 - \frac{2}{n(n - 1)}. A success probability of only \frac{2}{n(n-1)} is far from high: a single run is unlikely to produce the min-cut.

To fix this, we run the algorithm \Theta(n^2\ln n) times and keep the smallest cut found; the probability that every run misses the min-cut is:

\begin{align*} &\left(1 - \frac{2}{n(n - 1)}\right)^{n^2 \ln n}\\ \leq& \left[\left(1 - \frac{2}{n(n - 1)}\right)^{n(n - 1)}\right]^{\ln n}\\ \leq& [e^{-2}]^{\ln n} \tag{by $1 - x \leq e^{-x}$}\\ =& \frac{1}{n^2}\\ \end{align*}
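As a quick numeric sanity check of this amplification bound (the specific values of n are arbitrary):

```python
import math

# After n^2 * ln(n) independent runs, the probability that every run
# misses a fixed min-cut should be at most 1/n^2.
for n in (10, 100, 1000):
    miss_one_run = 1 - 2 / (n * (n - 1))
    miss_all = miss_one_run ** (n * n * math.log(n))
    assert miss_all <= 1 / n**2
```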
