Lecture 015

Randomized Algorithm: Las Vegas

Random Algorithm

Deterministic Algorithm

Las Vegas: always produces the correct answer, but the runtime is random.

Monte Carlo: runs in deterministic time, but the answer might be incorrect.

Quicksort

Deterministic Quicksort

Deterministic Quicksort: given list l

  1. pick l[0] as the pivot, then partition the rest of the list around it (smaller elements to the left, larger to the right)
  2. recursively quicksort the left and right parts
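The two steps above can be sketched in Python (a non-in-place version for clarity; the function name and list-comprehension partition are my own choices):

```python
def quicksort(l):
    """Deterministic quicksort: always uses l[0] as the pivot."""
    if len(l) <= 1:
        return l
    pivot = l[0]                                   # step 1: pick l[0] as pivot
    left = [x for x in l[1:] if x < pivot]         # partition around the pivot
    right = [x for x in l[1:] if x >= pivot]
    return quicksort(left) + [pivot] + quicksort(right)  # step 2: recurse on both sides
```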

Worst Case of Deterministic Quicksort: O(n^2)

\begin{align*} C(n) =& n - 1 + C(n - 1)\\ =& (n - 1) + (n - 2) + ... + 1\\ \in& O(n^2)\\ \end{align*}

Best Case of Deterministic Quicksort: O(n \lg n)

\begin{align*} C(n) =& n - 1 + 2C(\frac{n}{2})\\ =& (n - 1) + 2(n/2 - 1 + 2C(n/4))\\ =& (n - 1) + (n - 2) + (n - 4) + ... + (n - 2^{\lg n - 1})\\ \in& O(n\lg n)\\ \end{align*}

If we pick the pivot deterministically, an adversary can always construct a worst-case input (for the l[0] pivot, an already-sorted list suffices).

Random Quicksort

Theorem: for every input, E[C(n)] \in O(n \lg n), where C(n) is the number of comparisons.

Let s_i denote the i-th smallest element of the input. Define X_{ij} = \begin{cases} 1 & \text{if } s_i \text{ is ever compared with } s_j\\ 0 & \text{otherwise}\\ \end{cases}

\begin{align*} E[C(n)] =& E[\sum_{i = 1}^n \sum_{j = i+1}^n X_{ij}]\\ =& \sum_{i = 1}^n \sum_{j = i+1}^n E[X_{ij}]\\ =& \sum_{i = 1}^n \sum_{j = i+1}^n Pr\{s_i \text{ ever compared with } s_j\}\\ =& \sum_{i = 1}^n \sum_{j = i+1}^n Pr\{\text{the first pivot chosen among } s_i, ..., s_j \text{ is } s_i \text{ or } s_j\}\\ =& \sum_{i = 1}^n \sum_{j = i+1}^n \frac{2}{j - i + 1}\\ =& 2\sum_{i = 1}^{n - 1} \sum_{k = 2}^{n - i + 1} \frac{1}{k} \tag{let $k = j - i + 1$}\\ \leq& 2 \sum_{i = 1}^n \sum_{k = 2}^n \frac{1}{k}\\ \leq& 2 \sum_{i = 1}^n \ln n\\ =& 2n \ln n\\ \in& O(n \lg n)\\ \end{align*}
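The randomized variant differs from the deterministic one only in the pivot choice; a sketch (again non-in-place, with a three-way partition to handle duplicates; these naming and partitioning choices are my own):

```python
import random

def random_quicksort(l):
    """Quicksort with a uniformly random pivot: expected O(n lg n) comparisons
    on every input, since no fixed input is worst-case for a random pivot."""
    if len(l) <= 1:
        return l
    pivot = l[random.randrange(len(l))]   # the only change: random pivot
    left = [x for x in l if x < pivot]
    mid = [x for x in l if x == pivot]
    right = [x for x in l if x > pivot]
    return random_quicksort(left) + mid + random_quicksort(right)
```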

Find k-th Smallest Element

Idea: partition as in quicksort, but recurse into only one side.

  1. choose a pivot uniformly at random
  2. partition the list around the pivot; let i be the pivot's position after partitioning
  3. if k < i, throw away the right side and recurse on the left
  4. if k > i, throw away the left side and recurse on the right, now looking for rank k - i
  5. if k = i, we found it!
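The steps above can be sketched in Python (the three-way partition for duplicates and the rank adjustment when recursing right are implementation details I added):

```python
import random

def quickselect(l, k):
    """Return the k-th smallest element of l (1-indexed), expected O(n) time."""
    pivot = l[random.randrange(len(l))]           # step 1: random pivot
    left = [x for x in l if x < pivot]            # step 2: partition
    mid = [x for x in l if x == pivot]
    right = [x for x in l if x > pivot]
    if k <= len(left):                            # step 3: answer is on the left
        return quickselect(left, k)
    if k > len(left) + len(mid):                  # step 4: answer is on the right
        return quickselect(right, k - len(left) - len(mid))
    return pivot                                  # step 5: pivot is the answer
```

With k = (n + 1) // 2 this is exactly the randomized median-select mentioned below the analysis.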

Best case bound:

\begin{align*} C(n) <& (n - 1) + C(n/2) \tag{ignore $k = i$ case}\\ =& (n - 1) + (n / 2 - 1) + C(n / 4)\\ <& n + n/2 + n/4 + n/8 + ... + 1\\ \leq& 2n \end{align*}

Theorem: the expected number of comparisons is bounded by 8n.

\begin{align*} E[C(n)] \leq& (n - 1) + \sum_{i = 1}^n Pr\{\text{pivot is }s_i\} \cdot E[C(\max(i - 1, n - i))]\\ =& (n - 1) + \sum_{i = 1}^n \frac{1}{n} \cdot E[C(\max(i - 1, n - i))]\\ \leq& (n - 1) + \frac{2}{n}\sum_{i = \lfloor\frac{n}{2}\rfloor}^{n - 1} E[C(i)] \tag{each size $\lfloor n/2 \rfloor \leq i \leq n - 1$ appears at most twice in the sum}\\ \end{align*}

We proceed by induction. Base case: E[C(1)] = 0 \leq c \cdot 1. Assume E[C(i)] \leq c \cdot i for all i < n, for some constant c \geq 1; then:

\begin{align*} E[C(n)] \leq& (n - 1) + \frac{2}{n}\sum_{i = \lfloor\frac{n}{2}\rfloor}^{n - 1} E[C(i)]\\ \leq& (n - 1) + \frac{2}{n} \sum_{i = \lfloor\frac{n}{2}\rfloor}^{n - 1} c \cdot i \tag{induction hypothesis}\\ \leq& (n - 1) + \frac{2c}{n} \cdot \frac{(n - 1)+\lfloor\frac{n}{2}\rfloor}{2} \cdot (n - 1 - \lfloor\frac{n}{2}\rfloor + 1)\\ \leq& (n - 1) + \frac{2c}{n} \cdot \frac{(n - 1)+\frac{n}{2}}{2} \cdot (n - \frac{n - 1}{2})\\ =& (n - 1) + \frac{3cn}{4} + \frac{c}{4} - \frac{2c}{4n}\\ =& 7n + 1 - \frac{4}{n} \tag{plug in $c = 8$}\\ \leq& 7n + 1\\ \leq& 8n \tag{for $n \geq 1$}\\ \end{align*}

When k = (n + 1)/2 (assuming n is odd), this is the Randomized Median-Select algorithm.

Randomized Algorithm: Monte Carlo

Randomized Matrix-Multiplication Checking

Freivalds' Matrix Multiplication Checking Algorithm: checking A \cdot B =^? C.

  1. Choose random vector \vec{r} = \langle{r_1, r_2, ..., r_n}\rangle where r_i \in \{0, 1\}.
  2. if A(B\vec{r}) \neq C\vec{r} return False, else return True
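The two steps, plus the k-round boosting discussed later, can be sketched in Python (matrices as lists of lists; the function names and the `k` parameter are my own):

```python
import random

def freivalds(A, B, C, k=1):
    """Freivalds' check of A*B = C using k independent rounds, O(k n^2) time.
    If A*B != C, each round misses the mismatch with probability <= 1/2,
    so a wrong True is returned with probability <= (1/2)^k."""
    n = len(A)

    def matvec(M, v):
        # O(n^2) matrix-vector product; we never form A*B itself (O(n^3))
        return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]

    for _ in range(k):
        r = [random.randint(0, 1) for _ in range(n)]  # step 1: random 0/1 vector
        if matvec(A, matvec(B, r)) != matvec(C, r):   # step 2: compare A(Br) with Cr
            return False
    return True
```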

Theorem: for A \cdot B \neq C, the error probability is:

Pr\{A \cdot B \cdot \vec{r} = C \cdot \vec{r}\} \leq \frac{1}{2}

If we choose \vec{r} such that A \cdot B \cdot \vec{r} \neq C \cdot \vec{r} when in fact A \cdot B \neq C, then we say \vec{r} witnesses the fact that A \cdot B \neq C.

Proof: given D := AB - C \neq 0, what is the probability that D\vec{r} = AB\vec{r} - C\vec{r} = \vec{0}? Since D \neq 0, assume without loss of generality that d_{1, 1} is a non-zero entry of D.

\begin{align*} &D\vec{r} = \vec{0}\\ \implies& \sum_{j = 1}^n (d_{1, j} r_j) = 0\\ \implies& d_{1, 1}r_1 + \sum_{j = 2}^n (d_{1, j} r_j) = 0\\ \implies& r_1 = -\frac{1}{d_{1, 1}} \sum_{j = 2}^n (d_{1, j} r_j)\\ \end{align*}

So to pass the test despite AB \neq C, r_1 is fully determined once r_2, r_3, ..., r_n are fixed: for every choice of r_2, r_3, ..., r_n there is at most one value of r_1 that makes the check pass. Since r_1 \in \{0, 1\} is chosen uniformly, the probability that our choice of r_1 happens to equal -\frac{1}{d_{1, 1}} \sum_{j = 2}^n (d_{1, j} r_j) is \leq \frac{1}{|\{0, 1\}|}.

Note the \leq above: it can happen that no choice of r_1 passes the test, since -\frac{1}{d_{1, 1}} \sum_{j = 2}^n (d_{1, j} r_j) may not lie in \{0, 1\} at all.

The algorithm runs in \Theta(n^2) (three matrix-vector products), versus \Theta(n^3) for recomputing AB naively. We can boost it by running it k times, with total complexity \Theta(kn^2), achieving error rate \leq \left(\frac{1}{|\{0, 1\}|}\right)^k. We can also boost it by choosing \vec{r} \in \{0, 1, 2\}^n rather than \vec{r} \in \{0, 1\}^n, lowering the per-round error to \leq \frac{1}{3}.

What if some of our random vectors \vec{r} repeat during boosting? This turns out not to be a problem, as long as the vectors are chosen independently. What if we have an algorithm with 2-sided error? Then we can use majority voting over the k runs.

Randomized Polynomial Checking

We would like to know whether a given polynomial F(x) := x^3 - x^2 - 41x + 105 is equal to G(x) := (x - 3)(x - 5)(x + 7) without expanding and simplifying them (which is expensive, \in O(d^2)).

Let d be the degree of the polynomials (so each has at most d roots).

Simple Random Checker:

  1. pick a value r uniformly at random out of a fixed set of n \cdot d possible values (e.g. \{1, 2, ..., nd\})
  2. if F(r) = G(r) return True, otherwise return False.

The above algorithm is \in \Theta(d): evaluating each polynomial at a single point (e.g. via Horner's rule) takes \Theta(d) operations.
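A sketch of the checker in Python, using the two polynomials from above (the function name, the `trials` parameter for boosting, and the sampling set \{0, ..., nd - 1\} are my own choices):

```python
import random

def poly_check(F, G, d, n, trials=1):
    """Return True if F and G agree on `trials` random points drawn from
    {0, ..., n*d - 1}. If F != G, their difference is a non-zero polynomial
    of degree <= d with at most d roots, so each trial errs w.p. <= 1/n."""
    for _ in range(trials):
        r = random.randrange(n * d)
        if F(r) != G(r):
            return False
    return True

# The two polynomials from the text, G evaluated without expanding the product.
F = lambda x: x**3 - x**2 - 41*x + 105
G = lambda x: (x - 3) * (x - 5) * (x + 7)
```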

Proof: let H(x) := F(x) - G(x). If F \neq G, then H is a non-zero polynomial of degree \leq d, so H has at most d roots.

Given F(x) \neq G(x):

\begin{align*} &Pr\{F(r) = G(r)\}\\ =& Pr\{H(r) = 0\}\\ \leq& \frac{d}{nd} \tag{at most $d$ of the $nd$ choices for $r$ are roots of $H$}\\ =& \frac{1}{n}\\ \end{align*}

Note the \leq above: it may be that none of the nd choices is a root of H(x).

Again, we can boost it to achieve error rate \leq \frac{1}{n^k} by running it k times with complexity \Theta(kd).

Randomized Min-Cut

Cut-set: a set of edges whose removal will break the graph into two or more connected components.

The algorithm (Karger's contraction algorithm): given graph G = \langle{V, E}\rangle, let |V| = n, let C be the edge set of one particular min-cut, and let |C| = k.

  1. In each iteration, pick an edge e = \langle{v_1, v_2}\rangle \in E uniformly at random and contract its endpoints v_1, v_2 into a single vertex (removing self-loops, keeping parallel edges)
  2. Stop when only two vertices are left
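The contraction loop, plus the repeated-runs boosting discussed below, can be sketched in Python. This version processes a random permutation of the edges, which is equivalent to contracting a uniformly random remaining edge each round; the union-find bookkeeping and all names are my own:

```python
import random

def karger_min_cut(edges, n):
    """One run of the contraction algorithm on a multigraph with vertices
    0..n-1 given as a list of (u, v) edges; returns the size of some cut."""
    parent = list(range(n))          # union-find to track contracted vertices

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    edges = edges[:]
    random.shuffle(edges)            # random edge order = random contractions
    components = n
    for (u, v) in edges:             # contract until 2 super-vertices remain
        if components == 2:
            break
        ru, rv = find(u), find(v)
        if ru != rv:                 # skip self-loops inside a super-vertex
            parent[ru] = rv
            components -= 1
    # edges still crossing the two super-vertices form the output cut
    return sum(1 for (u, v) in edges if find(u) != find(v))

def min_cut(edges, n, runs):
    """Boosted version: keep the smallest cut over `runs` independent runs."""
    return min(karger_min_cut(edges, n) for _ in range(runs))
```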

Since there are n vertices, we need n - 2 iterations. The graph has at least \frac{nk}{2} edges: since k is the size of the smallest cut, every vertex has degree \geq k (otherwise that vertex's edges alone would form a smaller cut), and summing degrees counts each edge twice.

Let E_i denote the event "no edge of C is selected in the i-th round". We bound Pr\{E_1\} and Pr\{E_2 | E_1\}:

Pr\{E_1\} \geq \frac{nk/2 - k}{nk/2} = \frac{n - 2}{n}
Pr\{E_2 | E_1\} \geq \frac{(n - 1)k/2 - k}{(n - 1)k/2} = \frac{n - 3}{n - 1}
\begin{align*} Pr\{\text{none of } C \text{ contracted}\} =& Pr\{E_1\} Pr\{E_2 | E_1\} Pr\{E_3 | E_1 \cap E_2\} ... Pr\{E_{n - 2} | E_1 \cap E_2 \cap ... \cap E_{n - 3}\}\tag{by $n-2$ many contractions}\\ \geq& \frac{n - 2}{n} \frac{n - 3}{n - 1} \frac{n - 4}{n - 2} ...\frac{3}{5} \frac{2}{4} \frac{1}{3}\\ =& \frac{2}{n (n - 1)} \tag{telescoping product}\\ \end{align*}

Therefore the probability that a single run fails to output this specific min-cut is \leq 1 - \frac{2}{n(n - 1)}. A success probability of only \frac{2}{n(n-1)} is far from high: a single run is unlikely to produce the min-cut.

To fix this, we run the algorithm \Theta(n^2\ln n) times and keep the smallest cut found; the probability that every run misses the min-cut is:

\begin{align*} &\left(1 - \frac{2}{n(n - 1)}\right)^{n^2 \ln n}\\ \leq& \left[\left(1 - \frac{2}{n(n - 1)}\right)^{n(n - 1)}\right]^{\ln n}\\ \leq& [e^{-2}]^{\ln n} \tag{by $1 - x \leq e^{-x}$}\\ =& \frac{1}{n^2}\\ \end{align*}
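As a quick numeric sanity check of this amplification bound (the specific values of n are arbitrary):

```python
import math

# After n^2 * ln(n) independent runs, the probability that every run
# misses a fixed min-cut should be at most 1/n^2.
for n in (10, 100, 1000):
    miss_one_run = 1 - 2 / (n * (n - 1))
    miss_all = miss_one_run ** (n * n * math.log(n))
    assert miss_all <= 1 / n**2
```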
