Lecture 008 - Randomized Quicksort

Randomized Algorithm

Monte Carlo algorithms: always fast, but the answer might be wrong (correct only with some probability).

Las Vegas algorithms: always return the correct answer, but the running time is a random variable (fast only in expectation or with high probability). Randomized quickselect and quicksort below are Las Vegas algorithms.

Quick Select

Order Statistics: find the kth smallest item in a sequence.

Quick Select:

  1. given an array of n elements
  2. find the kth element
  3. put everything less than the kth element on the left
  4. put everything greater than the kth element on the right

Complexity: O(n) expected work and O(\log^2 n) span with high probability (derived below).

Algorithm:

  1. pick a pivot uniformly at random from the array (the simplest variant just takes the first element, but the analysis below assumes a random pivot)
  2. partition the array around the pivot, giving the pivot's (0-indexed) position p
  3. if p = k, return the pivot
  4. otherwise, if k < p, recursively find the kth element in the left subarray
  5. otherwise (k > p), recursively find the (k - (p + 1))th element in the right subarray

Pseudo Code // TODO
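
A minimal list-based sketch of this routine (illustrative, not the lecture's code; it takes the head as the pivot for brevity, whereas the randomized version analyzed below would pick the pivot uniformly at random; k is 0-indexed):

(* select (xs, k) returns the kth smallest element of xs *)
fun select (x :: xs, k) =
    let
      val left  = List.filter (fn y => y < x) xs    (* strictly less than the pivot *)
      val right = List.filter (fn y => y >= x) xs   (* greater than or equal to the pivot *)
      val p = List.length left                      (* the pivot's rank within this call *)
    in
      if k = p then x
      else if k < p then select (left, k)
      else select (right, k - (p + 1))
    end
  | select ([], _) = raise Fail "k out of range"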

Expected Complexity of Quick Select

Let Y_d be the input size at level d of the recursion (so Y_0 = n) and Z_d the pivot's 0-indexed position within that input; a uniformly random pivot gives Pr\{Z_d = z | Y_d = y\} = 1/y. We want to bound the expected input size at each level:

\begin{align*} E[Y_{d+1}] =& \sum_y \sum_z Pr\{Y_d=y \cap Z_d = z\} \cdot f(y, z)\\ =& \sum_y \sum_z Pr\{Y_d=y \cap Z_d = z\} \cdot \max(0, z, y-z-1)\tag{either left half, right half, or get lucky}\\ =& \sum_y \sum_z Pr\{Y_d=y\} \cdot Pr\{Z_d = z | Y_d = y\} \cdot \max(0, z, y-z-1)\\ =& \sum_y \sum_z Pr\{Y_d=y\} \cdot \frac{1}{y} \cdot \max(0, z, y-z-1)\tag{pivot uniform over the $y$ positions}\\ =& \sum_y Pr\{Y_d=y\} \cdot \frac{1}{y} \cdot (\sum_z\max(0, z, y-z-1))\\ \leq& \sum_y Pr\{Y_d=y\} \cdot \frac{1}{y} \cdot (2\sum_{z=y/2}^{y-1}z)\tag{by symmetry of $z$ and $y-z-1$}\\ =& \sum_y Pr\{Y_d=y\} \cdot \frac{1}{y} \cdot (2\cdot \frac{1}{2}(y/2+y-1)(y-1-y/2+1))\\ \leq& \sum_y Pr\{Y_d=y\} \cdot \frac{1}{y} \cdot \frac{3}{4}y^2\\ =& \frac{3}{4} \sum_y yPr\{Y_d=y\}\\ =& \frac{3}{4} E[Y_d]\\ E[Y_d] \leq& n \cdot (\frac{3}{4})^d \tag{unrolling the recurrence, $Y_0 = n$} \end{align*}
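
This bound also justifies the "O(\log n) levels with high probability" claim used for the span below: the recursion can only still be active at depth d if Y_d \geq 1, so by Markov's inequality, for d = c \log_{4/3} n (a sketch of the standard argument, with the constant c chosen as needed),

\begin{align*} Pr\{Y_d \geq 1\} \leq& E[Y_d] \leq n \cdot \left(\frac{3}{4}\right)^{c \log_{4/3} n} = n \cdot n^{-c} = n^{1-c}\\ \end{align*}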

Since the work at level d is proportional to the input size Y_d at that level, the expected total work is:

\begin{align*} E[W] =& \sum_d E[W_d]\\ =& \sum_d E[Y_d]\tag{work at level $d$ is proportional to $Y_d$}\\ \leq& \sum_d n \cdot (\frac{3}{4})^d\\ =& n\sum_d (\frac{3}{4})^d\\ \in& O(n) \tag{by Geometric Series}\\ \end{align*}
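
Concretely, the geometric series converges to 1/(1 - 3/4) = 4, so in these units the expected work is at most 4n:

\begin{align*} E[W] \leq& n \sum_{d \geq 0} \left(\frac{3}{4}\right)^d = n \cdot \frac{1}{1 - \frac{3}{4}} = 4n\\ \end{align*}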

For the span: assume there are O(\log n) levels of recursion with high probability (as sketched above); each level's span is dominated by the filter used to partition, which has O(\log n) span. Then

\begin{align*} E[S] =& \#\text{levels} \cdot E[S_d]\\ =& \#\text{levels} \cdot O(\log n) \tag{span of filter at each level}\\ =& O(\log n) \cdot O(\log n) \tag{$O(\log n)$ levels w.h.p.}\\ =& O(\log^2 n) \tag{w.h.p.}\\ \end{align*}

Quick Sort

Pseudo Code // TODO
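
A minimal list-based sketch (again illustrative, not the lecture's code; the head is used as the pivot for brevity, while the analysis below assumes a uniformly random pivot):

(* quicksort on int lists; the two recursive calls run in parallel in the sequence-based version *)
fun quicksort [] = []
  | quicksort (x :: xs) =
    let
      val left  = List.filter (fn y => y < x) xs    (* elements below the pivot *)
      val right = List.filter (fn y => y >= x) xs   (* elements at or above the pivot *)
    in
      quicksort left @ (x :: quicksort right)
    end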

Let X_{ij} be the indicator that the elements of rank i and rank j in the sorted order (with j > i) are ever compared. Then

E[X_{ij}] = Pr\{\text{rank } i \text{ compared to rank } j\} = \frac{2}{j - i + 1}

This is because ranks i and j are compared if and only if one of them is the first among the ranks i, i+1, \dots, j to be chosen as a pivot: each of the j - i + 1 candidates is equally likely to come first, and exactly 2 of them (i and j themselves) lead to a comparison. For example, adjacent ranks (j = i + 1) are always compared, while the minimum and maximum are compared with probability 2/n.

The expected total number of comparisons is therefore

\begin{align*} E[W] =& \sum_{i < j} \frac{2}{j - i + 1}\\ =& 2\sum_{i = 0}^{n - 1} \sum_{j = i+1}^{n - 1} \frac{1}{j - i + 1}\\ =& 2\sum_{i = 0}^{n - 1} \sum_{k = 1}^{n - i - 1} \frac{1}{k + 1} \tag{let $k = j - i$}\\ =& 2\sum_{i = 0}^{n - 1} (H_{n - i} - 1)\\ \leq& 2\sum_{i = 0}^{n - 1} (1 + \log(n - i) - 1) \tag{$H_m \leq 1 + \log m$}\\ =& 2\sum_{i = 0}^{n - 1} \log(n - i)\\ \in& O(n \log n)\\ \end{align*}

For the span analysis we use high-probability bounds rather than plain expectations. The span of two parallel recursive calls is the maximum of their spans, and the maximum of two quantities that each hold with high probability still holds with (essentially the same) high probability, whereas the expectation of a maximum is not simply the maximum of the expectations.
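
For example, if each of two parallel recursive calls has span at most c\log^2 n except with probability n^{-k}, then a union bound gives the same kind of bound for their maximum:

\begin{align*} Pr\{\max(S_1, S_2) > c\log^2 n\} \leq& Pr\{S_1 > c\log^2 n\} + Pr\{S_2 > c\log^2 n\} \leq 2n^{-k}\\ \end{align*}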

Binary Search Tree

(* 15-210 Fall 2022 *)
(* Parametric implementation of binary search trees *)
(* INCOMPLETE AND UNTESTED *)
(* Live-coded in Lecture 11, Wed Oct 5, 2022 *)
(* Frank Pfenning + students *)

signature KEY =
sig
    type t
    val compare : t * t -> order
end

structure K :> KEY where type t = int =  (* reveal t = int so clients can supply integer keys *)
struct
  type t = int
  val compare = Int.compare
end

signature ParmBST =
sig
    type T          (* abstract *)

    datatype E = Leaf | Node of T * K.t * T

    val size : T -> int

    val expose : T -> E  (* exposes structure, not internal info *)
    val joinMid : E -> T (* rebalance *)
end

structure P :> ParmBST =
struct

datatype T = TLeaf | TNode of T * K.t * int * T  (* the int field caches the subtree size *)
datatype E = Leaf | Node of T * K.t * T

fun size TLeaf = 0
  | size (TNode (L, k, s, R)) = s

fun expose T = case T
                of TLeaf => Leaf
                 | TNode (L, k, s, R) => Node (L, k, R)
fun joinMid E = case E
                 of Leaf => TLeaf
                  | Node (L, k, R) => TNode (L, k, size L + size R + 1, R)

end

signature BST =
sig

    type T  (* abstract *)
    val empty : T
    val find : T -> K.t -> bool

    val insert : T -> K.t -> T
    val delete : T -> K.t -> T

    val union : T * T -> T
    val intersection : T * T -> T
    (* more... *)
end

functor Simple (structure P : ParmBST) :> BST =
struct

type T = P.T
val empty = P.joinMid (P.Leaf)

fun split T k = case P.expose T of
   P.Leaf => (empty, false, empty)
 | P.Node (L, k', R) => case K.compare (k, k') of
     LESS => let val (LL, b, LR) = split L k  (* LL < k < LR *)
              in (LL, b, P.joinMid(P.Node(LR, k', R))) end
   | EQUAL => (L, true, R)
   | GREATER => let val (RL, b, RR) = split R k
                in (P.joinMid(P.Node(L, k', RL)), b, RR) end

fun insert T k =
    let val (L, _, R) = split T k
    in P.joinMid(P.Node(L, k, R)) end

fun find T k = case P.expose T of
   P.Leaf => false
 | P.Node(L, k', R) => case K.compare (k, k') of
       LESS => find L k
     | EQUAL => true
     | GREATER => find R k


end
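
As written, Simple does not yet implement delete, union, or intersection required by BST, so the opaque ascription will not check until those are added (or the signature is trimmed). Assuming that is done, a hypothetical usage sketch (not from the lecture) might look like:

(* hypothetical client code, assuming K.t is visibly int as defined above *)
structure S = Simple (structure P = P)

val t = S.insert (S.insert (S.insert S.empty 4) 2) 7
val found   = S.find t 2   (* true  *)
val missing = S.find t 5   (* false *)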

Expected Span

Consider a game in which we draw some number of tasks at random such that a task has length n with probability 1/n and length 1 otherwise. The expected length of a single task is therefore bounded by 2. Now draw n tasks and wait for all of them to complete, assuming each task runs in parallel, independently of the others. Prove that the expected completion time is not constant. The completion time is 1 if every task has length 1, which happens with probability ((n-1)/n)^n, and n otherwise, so

\begin{align*} E[T] =& \left(\frac{n-1}{n}\right)^{n}\cdot1+\left(1-\left(\frac{n-1}{n}\right)^{n}\right)\cdot n\\ \geq& \left(1-\left(\frac{n-1}{n}\right)^{n}\right)\cdot n\\ \geq& \left(1-\frac{1}{e}\right)n\tag{since $(1-1/n)^n \leq 1/e$}\\ \in& \Omega(n)\\ \end{align*}

Repeat the same exercise with slightly different probabilities: a randomly chosen task has length n with probability 1/n^3 and length 1 otherwise. Prove now that the expected completion time is bounded by a constant. As before, the completion time is 1 if all n tasks are short and n otherwise, so

\begin{align*} E[T] =& \left(\frac{n^{3}-1}{n^{3}}\right)^{n}\cdot1+\left(1-\left(\frac{n^{3}-1}{n^{3}}\right)^{n}\right)\cdot n\\ \leq& 1+\left(1-\left(1-\frac{n}{n^{3}}\right)\right)\cdot n\tag{Bernoulli: $(1-1/n^3)^n \geq 1 - n/n^3$}\\ =& 1+\frac{1}{n}\\ \leq& 2\\ \in& O(1) \end{align*}
