Lecture 008 - Randomized Quicksort

Randomized Algorithm

Monte Carlo algorithms: always fast, but the answer might be wrong (correct only with some probability).

Las Vegas algorithms: always return the correct answer, but the running time is a random variable (fast only in expectation or with high probability). Randomized quickselect and quicksort below are Las Vegas algorithms.

Quick Select

Order Statistics: find the kth smallest item in a sequence.

Quick Select:

  1. given an array of n elements
  2. find the kth element
  3. put everything less than the kth element on the left
  4. put everything greater than the kth element on the right

Complexity: O(n) expected work and O(\log^2 n) span with high probability (derived below).

Algorithm:

  1. pick a pivot uniformly at random from the array (the simplest variant just takes the first element, but the analysis below assumes a random pivot)
  2. partition the array around the pivot, giving the pivot's (0-indexed) position p
  3. if p = k, return the pivot
  4. otherwise, if k < p, recursively find the kth element in the left subarray
  5. otherwise (k > p), recursively find the (k - (p + 1))th element in the right subarray

Pseudo Code // TODO
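
A minimal list-based sketch of this routine (illustrative, not the lecture's code; it takes the head as the pivot for brevity, whereas the randomized version analyzed below would pick the pivot uniformly at random; k is 0-indexed):

(* select (xs, k) returns the kth smallest element of xs *)
fun select (x :: xs, k) =
    let
      val left  = List.filter (fn y => y < x) xs    (* strictly less than the pivot *)
      val right = List.filter (fn y => y >= x) xs   (* greater than or equal to the pivot *)
      val p = List.length left                      (* the pivot's rank within this call *)
    in
      if k = p then x
      else if k < p then select (left, k)
      else select (right, k - (p + 1))
    end
  | select ([], _) = raise Fail "k out of range"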

Expected Complexity of Quick Select

Let Y_d be the input size at level d of the recursion (so Y_0 = n) and Z_d the pivot's 0-indexed position within that input; a uniformly random pivot gives Pr\{Z_d = z | Y_d = y\} = 1/y. We want to bound the expected input size at each level:

\begin{align*} E[Y_{d+1}] =& \sum_y \sum_z Pr\{Y_d=y \cap Z_d = z\} \cdot f(y, z)\\ =& \sum_y \sum_z Pr\{Y_d=y \cap Z_d = z\} \cdot \max(0, z, y-z-1)\tag{either left half, right half, or get lucky}\\ =& \sum_y \sum_z Pr\{Y_d=y\} \cdot Pr\{Z_d = z | Y_d = y\} \cdot \max(0, z, y-z-1)\\ =& \sum_y \sum_z Pr\{Y_d=y\} \cdot \frac{1}{y} \cdot \max(0, z, y-z-1)\tag{pivot uniform over the $y$ positions}\\ =& \sum_y Pr\{Y_d=y\} \cdot \frac{1}{y} \cdot (\sum_z\max(0, z, y-z-1))\\ \leq& \sum_y Pr\{Y_d=y\} \cdot \frac{1}{y} \cdot (2\sum_{z=y/2}^{y-1}z)\tag{by symmetry of $z$ and $y-z-1$}\\ =& \sum_y Pr\{Y_d=y\} \cdot \frac{1}{y} \cdot (2\cdot \frac{1}{2}(y/2+y-1)(y-1-y/2+1))\\ \leq& \sum_y Pr\{Y_d=y\} \cdot \frac{1}{y} \cdot \frac{3}{4}y^2\\ =& \frac{3}{4} \sum_y yPr\{Y_d=y\}\\ =& \frac{3}{4} E[Y_d]\\ E[Y_d] \leq& n \cdot (\frac{3}{4})^d \tag{unrolling the recurrence, $Y_0 = n$} \end{align*}
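
This bound also justifies the "O(\log n) levels with high probability" claim used for the span below: the recursion can only still be active at depth d if Y_d \geq 1, so by Markov's inequality, for d = c \log_{4/3} n (a sketch of the standard argument, with the constant c chosen as needed),

\begin{align*} Pr\{Y_d \geq 1\} \leq& E[Y_d] \leq n \cdot \left(\frac{3}{4}\right)^{c \log_{4/3} n} = n \cdot n^{-c} = n^{1-c}\\ \end{align*}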

Since the work at level d is proportional to the input size Y_d at that level, the expected total work is:

\begin{align*} E[W] =& \sum_d E[W_d]\\ =& \sum_d E[Y_d]\tag{work at level $d$ is proportional to $Y_d$}\\ \leq& \sum_d n \cdot (\frac{3}{4})^d\\ =& n\sum_d (\frac{3}{4})^d\\ \in& O(n) \tag{by Geometric Series}\\ \end{align*}
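
Concretely, the geometric series converges to 1/(1 - 3/4) = 4, so in these units the expected work is at most 4n:

\begin{align*} E[W] \leq& n \sum_{d \geq 0} \left(\frac{3}{4}\right)^d = n \cdot \frac{1}{1 - \frac{3}{4}} = 4n\\ \end{align*}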

For the span: assume there are O(\log n) levels of recursion with high probability (as sketched above); each level's span is dominated by the filter used to partition, which has O(\log n) span. Then

\begin{align*} E[S] =& \#\text{levels} \cdot E[S_d]\\ =& \#\text{levels} \cdot O(\log n) \tag{span of filter at each level}\\ =& O(\log n) \cdot O(\log n) \tag{$O(\log n)$ levels w.h.p.}\\ =& O(\log^2 n) \tag{w.h.p.}\\ \end{align*}

Quick Sort

Pseudo Code // TODO
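
A minimal list-based sketch (again illustrative, not the lecture's code; the head is used as the pivot for brevity, while the analysis below assumes a uniformly random pivot):

(* quicksort on int lists; the two recursive calls run in parallel in the sequence-based version *)
fun quicksort [] = []
  | quicksort (x :: xs) =
    let
      val left  = List.filter (fn y => y < x) xs    (* elements below the pivot *)
      val right = List.filter (fn y => y >= x) xs   (* elements at or above the pivot *)
    in
      quicksort left @ (x :: quicksort right)
    end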

Let X_{ij} be the indicator that the elements of rank i and rank j in the sorted order (with j > i) are ever compared. Then

E[X_{ij}] = Pr\{\text{rank } i \text{ compared to rank } j\} = \frac{2}{j - i + 1}

This is because ranks i and j are compared if and only if one of them is the first among the ranks i, i+1, \dots, j to be chosen as a pivot: each of the j - i + 1 candidates is equally likely to come first, and exactly 2 of them (i and j themselves) lead to a comparison. For example, adjacent ranks (j = i + 1) are always compared, while the minimum and maximum are compared with probability 2/n.

The expected total number of comparisons is therefore

\begin{align*} E[W] =& \sum_{i < j} \frac{2}{j - i + 1}\\ =& 2\sum_{i = 0}^{n - 1} \sum_{j = i+1}^{n - 1} \frac{1}{j - i + 1}\\ =& 2\sum_{i = 0}^{n - 1} \sum_{k = 1}^{n - i - 1} \frac{1}{k + 1} \tag{let $k = j - i$}\\ =& 2\sum_{i = 0}^{n - 1} (H_{n - i} - 1)\\ \leq& 2\sum_{i = 0}^{n - 1} (1 + \log(n - i) - 1) \tag{$H_m \leq 1 + \log m$}\\ =& 2\sum_{i = 0}^{n - 1} \log(n - i)\\ \in& O(n \log n)\\ \end{align*}

For the span analysis we use high-probability bounds rather than plain expectations. The span of two parallel recursive calls is the maximum of their spans, and the maximum of two quantities that each hold with high probability still holds with (essentially the same) high probability, whereas the expectation of a maximum is not simply the maximum of the expectations.
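
For example, if each of two parallel recursive calls has span at most c\log^2 n except with probability n^{-k}, then a union bound gives the same kind of bound for their maximum:

\begin{align*} Pr\{\max(S_1, S_2) > c\log^2 n\} \leq& Pr\{S_1 > c\log^2 n\} + Pr\{S_2 > c\log^2 n\} \leq 2n^{-k}\\ \end{align*}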

Binary Search Tree

(* 15-210 Fall 2022 *)
(* Parametric implementation of binary search trees *)
(* INCOMPLETE AND UNTESTED *)
(* Live-coded in Lecture 11, Wed Oct 5, 2022 *)
(* Frank Pfenning + students *)

signature KEY =
sig
    type t
    val compare : t * t -> order
end

structure K :> KEY where type t = int =  (* reveal t = int so clients can supply integer keys *)
struct
  type t = int
  val compare = Int.compare
end

signature ParmBST =
sig
    type T          (* abstract *)

    datatype E = Leaf | Node of T * K.t * T

    val size : T -> int

    val expose : T -> E  (* exposes structure, not internal info *)
    val joinMid : E -> T (* rebalance *)
end

structure P :> ParmBST =
struct

datatype T = TLeaf | TNode of T * K.t * int * T  (* the int field caches the subtree size *)
datatype E = Leaf | Node of T * K.t * T

fun size TLeaf = 0
  | size (TNode (L, k, s, R)) = s

fun expose T = case T
                of TLeaf => Leaf
                 | TNode (L, k, s, R) => Node (L, k, R)
fun joinMid E = case E
                 of Leaf => TLeaf
                  | Node (L, k, R) => TNode (L, k, size L + size R + 1, R)

end

signature BST =
sig

    type T  (* abstract *)
    val empty : T
    val find : T -> K.t -> bool

    val insert : T -> K.t -> T
    val delete : T -> K.t -> T

    val union : T * T -> T
    val intersection : T * T -> T
    (* more... *)
end

functor Simple (structure P : ParmBST) :> BST =
struct

type T = P.T
val empty = P.joinMid (P.Leaf)

fun split T k = case P.expose T of
   P.Leaf => (empty, false, empty)
 | P.Node (L, k', R) => case K.compare (k, k') of
     LESS => let val (LL, b, LR) = split L k  (* LL < k < LR *)
              in (LL, b, P.joinMid(P.Node(LR, k', R))) end
   | EQUAL => (L, true, R)
   | GREATER => let val (RL, b, RR) = split R k
                in (P.joinMid(P.Node(L, k', RL)), b, RR) end

fun insert T k =
    let val (L, _, R) = split T k
    in P.joinMid(P.Node(L, k, R)) end

fun find T k = case P.expose T of
   P.Leaf => false
 | P.Node(L, k', R) => case K.compare (k, k') of
       LESS => find L k
     | EQUAL => true
     | GREATER => find R k


end
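
As written, Simple does not yet implement delete, union, or intersection required by BST, so the opaque ascription will not check until those are added (or the signature is trimmed). Assuming that is done, a hypothetical usage sketch (not from the lecture) might look like:

(* hypothetical client code, assuming K.t is visibly int as defined above *)
structure S = Simple (structure P = P)

val t = S.insert (S.insert (S.insert S.empty 4) 2) 7
val found   = S.find t 2   (* true  *)
val missing = S.find t 5   (* false *)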

Expected Span

Consider a game in which we draw some number of tasks at random such that a task has length n with probability 1/n and length 1 otherwise. The expected length of a single task is therefore bounded by 2. Now draw n tasks and wait for all of them to complete, assuming each task runs in parallel, independently of the others. Prove that the expected completion time is not constant. The completion time is 1 if every task has length 1, which happens with probability ((n-1)/n)^n, and n otherwise, so

\begin{align*} E[T] =& \left(\frac{n-1}{n}\right)^{n}\cdot1+\left(1-\left(\frac{n-1}{n}\right)^{n}\right)\cdot n\\ \geq& \left(1-\left(\frac{n-1}{n}\right)^{n}\right)\cdot n\\ \geq& \left(1-\frac{1}{e}\right)n\tag{since $(1-1/n)^n \leq 1/e$}\\ \in& \Omega(n)\\ \end{align*}

Repeat the same exercise with slightly different probabilities: a randomly chosen task has length n with probability 1/n^3 and length 1 otherwise. Prove now that the expected completion time is bounded by a constant. As before, the completion time is 1 if all n tasks are short and n otherwise, so

\begin{align*} E[T] =& \left(\frac{n^{3}-1}{n^{3}}\right)^{n}\cdot1+\left(1-\left(\frac{n^{3}-1}{n^{3}}\right)^{n}\right)\cdot n\\ \leq& 1+\left(1-\left(1-\frac{n}{n^{3}}\right)\right)\cdot n\tag{Bernoulli: $(1-1/n^3)^n \geq 1 - n/n^3$}\\ =& 1+\frac{1}{n}\\ \leq& 2\\ \in& O(1) \end{align*}
