# Lecture 008 - Randomized Quicksort

## Randomized Algorithm

• Monte Carlo algorithms: run within a fixed time bound, but the answer might be wrong.

• Las Vegas algorithms: the answer is always correct, but the running time is random (they might be slow).

### Quick Select

Order Statistics: find the $k$th smallest item in a sequence.

Quick Select:

1. given an array of $n$ elements
2. find the $k$th smallest element
3. with everything less than it ending up on its left
4. and everything greater than it ending up on its right

Complexity:

• Average complexity: $O(n)$ work with high probability, $O(\log^2 n)$ span with high probability

• Worst-case complexity: $O(n^2)$ work

• Worst case occurs when, e.g.:

• we repeatedly pick the smallest element as the pivot but ask for the biggest element
• we repeatedly pick the biggest element as the pivot but ask for the smallest element
• Anti-adversarial defense: choose the pivot at random

Algorithm:

1. pick a pivot (the first element, or better, a random one)
2. partition the array around the pivot; the pivot lands at index $p$
3. if the pivot is at the $k$th position ($p = k$), return it
4. if $k < p$, recursively find the $k$th element in the left part
5. if $k > p$, recursively find the $(k - (p + 1))$th element in the right part

Pseudo Code // TODO
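Until the pseudocode is filled in, the steps above can be sketched in Python (an illustrative sketch, not the course's SML; it uses a random pivot per the anti-adversarial note, and out-of-place partitioning for clarity):

```python
import random

def quickselect(xs, k):
    """Return the k-th smallest element of xs (0-indexed)."""
    pivot = random.choice(xs)                # random pivot defeats adversarial inputs
    left = [x for x in xs if x < pivot]      # everything less than the pivot
    right = [x for x in xs if x > pivot]     # everything greater than the pivot
    dups = len(xs) - len(left) - len(right)  # copies of the pivot itself
    if k < len(left):
        return quickselect(left, k)          # answer lies in the left part
    elif k < len(left) + dups:
        return pivot                         # lucky: the pivot is the answer
    else:
        return quickselect(right, k - len(left) - dups)
```

For example, `quickselect([5, 1, 4, 2, 3], 2)` returns `3`, the third-smallest element.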

#### Expected Complexity of Quick Select

We want to bound the expected input length at each level:

• Let $Y_d = \text{input length at level d}$

• Let $Z_d = \text{pivot index at level d}$

\begin{align*}
E[Y_{d+1}] &= \sum_y \sum_z \Pr\{Y_d = y \cap Z_d = z\} \cdot f(y, z)\\
&= \sum_y \sum_z \Pr\{Y_d = y \cap Z_d = z\} \cdot \max(0, z, y - z - 1)\tag{either left part, right part, or get lucky}\\
&= \sum_y \sum_z \Pr\{Y_d = y\} \cdot \Pr\{Z_d = z \mid Y_d = y\} \cdot \max(0, z, y - z - 1)\\
&= \sum_y \sum_z \Pr\{Y_d = y\} \cdot \frac{1}{y} \cdot \max(0, z, y - z - 1)\tag{pivot index is uniform}\\
&= \sum_y \Pr\{Y_d = y\} \cdot \frac{1}{y} \cdot \Big(\sum_z \max(0, z, y - z - 1)\Big)\\
&\leq \sum_y \Pr\{Y_d = y\} \cdot \frac{1}{y} \cdot \Big(2 \sum_{z = y/2}^{y - 1} z\Big)\\
&= \sum_y \Pr\{Y_d = y\} \cdot \frac{1}{y} \cdot \Big(2 \cdot \frac{1}{2}\big(y/2 + y - 1\big)\big(y - 1 - y/2 + 1\big)\Big)\tag{arithmetic series}\\
&\leq \sum_y \Pr\{Y_d = y\} \cdot \frac{1}{y} \cdot \frac{3}{4} y^2\\
&= \frac{3}{4} \sum_y y \Pr\{Y_d = y\}\\
&= \frac{3}{4} E[Y_d]
\end{align*}

Unrolling this recurrence with $Y_0 = n$ gives $E[Y_d] \leq n \cdot \left(\frac{3}{4}\right)^d$.
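The factor-$\frac{3}{4}$ contraction can be sanity-checked numerically. Below is a small Python check (illustrative, not part of the lecture) that computes $\frac{1}{y}\sum_z \max(0, z, y - z - 1)$ exactly for each $y$ and confirms it never exceeds $\frac{3}{4} y$:

```python
def expected_next_length(y):
    """E[max(0, z, y - z - 1)] for a pivot index z uniform on {0, ..., y-1}:
    the bound used on the input length at the next recursion level."""
    return sum(max(0, z, y - z - 1) for z in range(y)) / y

# The conditional expectation shrinks by at least a factor of 3/4 at every level.
assert all(expected_next_length(y) <= 0.75 * y for y in range(1, 1000))
```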

Therefore, the expected work is:

\begin{align*}
E[W] &= \sum_d E[W_d]\\
&= \sum_d E[Y_d]\tag{linear work per level}\\
&\leq \sum_d n \cdot \left(\tfrac{3}{4}\right)^d\\
&= n \sum_d \left(\tfrac{3}{4}\right)^d\\
&\in O(n)\tag{geometric series}
\end{align*}

The expected span: assuming we have $O(\log n)$ levels with high probability, and each level performs a filter with $O(\log n)$ span, then

\begin{align*}
E[S] &= \#\text{levels} \cdot E[S_d]\\
&= \#\text{levels} \cdot O(\log n)\tag{by filter}\\
&= O(\log n) \cdot O(\log n)\tag{w.h.p.}\\
&= O(\log^2 n)\tag{w.h.p.}
\end{align*}

### Quick Sort

Pseudo Code // TODO
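Until the pseudocode is filled in, a minimal Python sketch (again illustrative and out-of-place, with a random pivot; note the two recursive calls are independent and could run in parallel):

```python
import random

def quicksort(xs):
    """Randomized quicksort. The two recursive calls touch disjoint data,
    so in a parallel setting they could execute simultaneously."""
    if len(xs) <= 1:
        return xs
    pivot = random.choice(xs)
    return (quicksort([x for x in xs if x < pivot])
            + [x for x in xs if x == pivot]
            + quicksort([x for x in xs if x > pivot]))
```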

The probability of comparing indices $i, j$ (assuming $j > i$, taken as ranks in the sorted order) is:

$$E[X_{ij}] = \Pr\{i \text{ compared to } j\} = \frac{2}{j - i + 1}$$

This is because:

• $i$ and $j$ are compared only if one of them is chosen as a pivot (the $2$ in the numerator)

• eventually $i$ and $j$ end up in different partitions

• choosing a pivot with rank less than $i$ or greater than $j$ makes no progress on this pair

• choosing a pivot among the $j - i + 1$ elements with ranks in $[i, j]$ separates them (the denominator): the pair is compared exactly when that pivot is $i$ or $j$ itself

The expected total number of comparisons is

\begin{align*}
E[W] &= \sum_{i < j} \frac{2}{j - i + 1}\\
&= 2 \sum_{i = 0}^{n - 1} \sum_{j = i + 1}^{n - 1} \frac{1}{j - i + 1}\\
&= 2 \sum_{i = 0}^{n - 1} \sum_{k = 1}^{n - i - 1} \frac{1}{k + 1}\tag{let $k = j - i$}\\
&= 2 \sum_{i = 0}^{n - 1} (H_{n - i} - 1)\\
&\leq 2 \sum_{i = 0}^{n - 1} (1 + \ln(n - i) - 1)\tag{$H_m \leq 1 + \ln m$}\\
&= 2 \sum_{i = 0}^{n - 1} \ln(n - i)\\
&\in O(n \log n)
\end{align*}
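The $O(n \log n)$ bound can also be checked empirically. The sketch below (an illustration, not course code) counts element-vs-pivot comparisons made by randomized quicksort and compares the average against $2n(1 + \ln n)$:

```python
import math
import random

def comparisons(xs):
    """Number of element-vs-pivot comparisons made by randomized quicksort."""
    if len(xs) <= 1:
        return 0
    pivot = random.choice(xs)
    left = [x for x in xs if x < pivot]
    right = [x for x in xs if x > pivot]
    return (len(xs) - 1) + comparisons(left) + comparisons(right)

random.seed(0)
n = 2000
avg = sum(comparisons(list(range(n))) for _ in range(20)) / 20
assert avg <= 2 * n * (1 + math.log(n))   # consistent with the O(n log n) analysis
```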

For the span analysis we use high-probability bounds rather than expectations. This is because span is the maximum over parallel recursive calls: expectation does not interact nicely with max, but the max of two high-probability bounds is usually the same high-probability bound.

### Binary Search Tree

```sml
(* 15-210 Fall 2022 *)
(* Parametric implementation of binary search trees *)
(* INCOMPLETE AND UNTESTED *)
(* Live-coded in Lecture 11, Wed Oct 5, 2022 *)
(* Frank Pfenning + students *)

signature KEY =
sig
  type t
  val compare : t * t -> order
end

structure K :> KEY =
struct
  type t = int
  val compare = Int.compare
end

signature ParmBST =
sig
  type T               (* abstract *)

  datatype E = Leaf | Node of T * K.t * T

  val size : T -> int

  val expose : T -> E  (* exposes structure, not internal info *)
  val joinMid : E -> T (* rebalance *)
end

structure P :> ParmBST =
struct
  datatype T = TLeaf | TNode of T * K.t * int * T
  datatype E = Leaf | Node of T * K.t * T

  fun size TLeaf = 0
    | size (TNode (L, k, s, R)) = s

  fun expose T =
      case T
       of TLeaf => Leaf
        | TNode (L, k, s, R) => Node (L, k, R)

  fun joinMid E =
      case E
       of Leaf => TLeaf
        | Node (L, k, R) => TNode (L, k, size L + size R + 1, R)
end

signature BST =
sig
  type T  (* abstract *)
  val empty : T
  val find : T -> K.t -> bool

  val insert : T -> K.t -> T
  val delete : T -> K.t -> T

  val union : T * T -> T
  val intersection : T * T -> T
  (* more... *)
end

functor Simple (structure P : ParmBST) :> BST =
struct
  type T = P.T
  val empty = P.joinMid P.Leaf

  fun split T k =
      case P.expose T
       of P.Leaf => (empty, false, empty)
        | P.Node (L, k', R) =>
          case K.compare (k, k') of
              LESS => let val (LL, b, LR) = split L k  (* LL < k < LR *)
                      in (LL, b, P.joinMid (P.Node (LR, k', R))) end
            | EQUAL => (L, true, R)
            | GREATER => let val (RL, b, RR) = split R k
                         in (P.joinMid (P.Node (L, k', RL)), b, RR) end

  fun insert T k =
      let val (L, _, R) = split T k
      in P.joinMid (P.Node (L, k, R)) end

  fun find T k =
      case P.expose T
       of P.Leaf => false
        | P.Node (L, k', R) =>
          case K.compare (k, k') of
              LESS => find L k
            | EQUAL => true
            | GREATER => find R k
end
```
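The split/joinMid pattern is easier to experiment with in a quick transliteration. The Python sketch below mirrors the SML `split`/`insert`/`find` (the names and structure are mine, not course code; like the SML above, it maintains subtree sizes as `joinMid` does but performs no rebalancing):

```python
class Node:
    """BST node; the size field plays the role of joinMid's bookkeeping."""
    def __init__(self, left, key, right):
        self.left, self.key, self.right = left, key, right
        self.size = size(left) + 1 + size(right)

def size(t):
    return 0 if t is None else t.size

def split(t, k):
    """Split t into (keys < k, was k present?, keys > k)."""
    if t is None:
        return None, False, None
    if k < t.key:
        ll, found, lr = split(t.left, k)
        return ll, found, Node(lr, t.key, t.right)
    elif k == t.key:
        return t.left, True, t.right
    else:
        rl, found, rr = split(t.right, k)
        return Node(t.left, t.key, rl), found, rr

def insert(t, k):
    """Insert by splitting at k and rejoining with k at the root."""
    left, _, right = split(t, k)
    return Node(left, k, right)

def find(t, k):
    if t is None:
        return False
    if k < t.key:
        return find(t.left, k)
    elif k == t.key:
        return True
    else:
        return find(t.right, k)
```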



### Expected Span

Consider a game in which we draw some number of tasks at random such that a task has length $n$ with probability $1/n$ and length $1$ otherwise. The expected length of a single task is therefore $\frac{1}{n} \cdot n + \left(1 - \frac{1}{n}\right) \cdot 1 = 2 - \frac{1}{n} \leq 2$. Now draw $n$ tasks and wait for all of them to complete, assuming each task proceeds in parallel independently of the others. Prove that the expected completion time is not constant.

The completion time is the maximum task length, which is $1$ exactly when every task is short:

\begin{align*}
E[T] &= \left(\frac{n - 1}{n}\right)^n \cdot 1 + \left(1 - \left(\frac{n - 1}{n}\right)^n\right) \cdot n\\
&\to \frac{1}{e} + \left(1 - \frac{1}{e}\right) n\tag{take the limit}\\
&\in \Theta(n)
\end{align*}

Repeat the same exercise with slightly different probabilities: a randomly chosen task has length $n$ with probability $1/n^3$ and length $1$ otherwise. Prove now that the expected completion time is bounded by a constant.

\begin{align*}
E[T] &= \left(\frac{n^3 - 1}{n^3}\right)^n \cdot 1 + \left(1 - \left(\frac{n^3 - 1}{n^3}\right)^n\right) \cdot n\\
&\leq 1 + \frac{n}{n^3} \cdot n\tag{union bound: $1 - (1 - 1/n^3)^n \leq n/n^3$}\\
&= 1 + \frac{1}{n}\\
&< 2\\
&\in O(1)
\end{align*}
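Plugging concrete values of $n$ into both formulas makes the contrast visible. A short Python evaluation of the two expressions above (an illustration, not new analysis):

```python
def expected_completion(n, p_long):
    """E[max task length] when each of n tasks independently has length n
    with probability p_long and length 1 otherwise. The max is 1 exactly
    when all n tasks are short, so the expectation has a closed form."""
    p_all_short = (1 - p_long) ** n
    return p_all_short * 1 + (1 - p_all_short) * n

# p = 1/n: expected completion time grows linearly with n.
linear = [expected_completion(n, 1 / n) for n in (10, 100, 1000)]
# p = 1/n^3: expected completion time stays below 2.
constant = [expected_completion(n, 1 / n ** 3) for n in (10, 100, 1000)]
assert all(t < 2 for t in constant)
```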
