Lecture 010 - Sets and Tables

Sets

Set Model

Sets: Georg Cantor's mathematical concept used to formalize mathematics.

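For a universe $\mathbb{U}$ of elements that supports equality (a universe of functions does not qualify, since equality of functions is not decidable), the set datatype $\mathbb{S}$ consists of all finite subsets of $\mathbb{U}$, that is, the finite members of the power set of $\mathbb{U}$.
Sets do not require a total order on their elements, so the specification allows toSeq to return the elements in arbitrary order.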

We care about sets of elements over a fixed universe and about their cost models. We implement sets with binary search trees.

size : S -> N
toSeq : S -> Seq

empty : S
singleton : U -> S
fromSeq : Seq -> S

filter : (U -> B) -> S -> S
intersection : S -> S -> S
difference : S -> S -> S
union : S -> S -> S

find : S -> U -> B
delete : S -> U -> S
insert : S -> U -> S

where $S$ is the type of sets, $N$ is the natural numbers, and $B$ is the booleans.

intersection, difference, and union are bulk updates. iterate and reduce can be done by first converting the set to a sequence with toSeq.

// QUESTION: why is it bad to allow map to return a singleton?

// QUESTION: the textbook says we can't create an infinite set with finite work; I disagree.

Cost of Sets

Approach: the cost of set operations depends on the representation, which in turn depends on what the elements support beyond an equality function.

If the elements are natural numbers: array-based boolean sequence
If the elements are hashable: hash table
If the elements have a total order: binary search tree

Some bounds, for example the one for size, can be improved.

Boolean Sequence: let the universe be $\mathbb{U} = \{0, 1, \dots, u-1\}$ for some $u \in \mathbb{N}$. We then represent a set as a boolean sequence of length $u$ whose entry $i$ is $1$ if $i$ is in the set and $0$ otherwise.
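The boolean-sequence representation can be sketched as follows. This is a minimal illustration, assuming Python lists stand in for sequences; the function names (from_elements, to_seq, and so on) are chosen here for illustration and are not part of the lecture's interface.

```python
from typing import Iterable, List

def from_elements(u: int, elems: Iterable[int]) -> List[bool]:
    """Build a length-u boolean sequence with True at each member's index."""
    bits = [False] * u
    for x in elems:
        bits[x] = True
    return bits

def find(s: List[bool], x: int) -> bool:
    """Membership is a single index lookup."""
    return s[x]

def size(s: List[bool]) -> int:
    """Count the True entries; this scans all u positions."""
    return sum(1 for p in s if p)

def to_seq(s: List[bool]) -> List[int]:
    """List the members; here they happen to come out in increasing order."""
    return [i for i, p in enumerate(s) if p]

# Bulk operations are elementwise over the two length-u sequences.
def union(a: List[bool], b: List[bool]) -> List[bool]:
    return [p or q for p, q in zip(a, b)]

def intersection(a: List[bool], b: List[bool]) -> List[bool]:
    return [p and q for p, q in zip(a, b)]

def difference(a: List[bool], b: List[bool]) -> List[bool]:
    return [p and not q for p, q in zip(a, b)]
```

Every bulk operation touches all $u$ positions regardless of how many elements the sets contain, which is why this representation only pays off when $u$ is small relative to the set sizes.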

Note that '' in the cost table (image) means "same as the entry above". Here $n = \max(|a|, |b|)$ and $m = \min(|a|, |b|)$.

Tree Representation: we use a balanced BST to implement sets.

Also consider the cost of intersection, union, and difference, whose work is $O(m \log(1 + \frac{n}{m}))$. For $m = 1$ the work becomes $O(\log n)$, matching a single insert or delete. For $m = n$ the work becomes $O(n)$, matching a linear merge.

If we instead implement intersection, union, and difference by inserting or deleting one element at a time, the result is not work-efficient: the work is $O(m \log n)$, which for $m = n$ is $O(n \log n)$ rather than $O(n)$.

fromSeq a = Seq.iterate Set.insert empty a is work-efficient but sequential.
fromSeq a = Seq.reduce Set.union empty <{x} : x in a> is work-efficient and parallel.
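The two definitions of fromSeq can be compared in a small sketch. It assumes Python's frozenset as a stand-in for the set type (it does not have the tree-based cost bounds) and a hypothetical helper tree_reduce that mimics the balanced shape of Seq.reduce; the code itself runs sequentially and only shows where the independent subproblems are.

```python
from functools import reduce

def from_seq_iterate(a):
    """Sequential version: insert the elements one at a time, left to right.
    Each step depends on the set produced by the previous step."""
    return reduce(lambda s, x: s | {x}, a, frozenset())

def tree_reduce(f, identity, xs):
    """Balanced divide-and-conquer reduce (hypothetical helper).
    The two recursive calls are independent, so a parallel runtime
    could evaluate them at the same time."""
    if len(xs) == 0:
        return identity
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    left = tree_reduce(f, identity, xs[:mid])
    right = tree_reduce(f, identity, xs[mid:])
    return f(left, right)

def from_seq_reduce(a):
    """Parallel-shaped version: turn every element into a singleton set,
    then combine the singletons with union in a balanced tree."""
    singletons = [frozenset([x]) for x in a]
    return tree_reduce(lambda s, t: s | t, frozenset(), singletons)
```

Both functions return the same set; they differ only in the shape of the computation, a chain of dependent inserts versus a balanced tree of unions.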

Tables

Table: the universe of keys $\mathbb{K}$ must support equality. A table is a finite subset of $\mathbb{K} \times \mathbb{V}$ in which each key appears at most once; the table datatype is the collection of all such subsets.

Tables are similar to sets: they extend sets so that each key now carries a value. Their cost specification and implementations are also similar.

// TODO: copy

where $\mathbb{S}$ is the power set of keys, $\mathbb{N}$ is the natural numbers, and $\mathbb{B}$ is the booleans.

tabulate builds a table in bulk by applying a function to every key in a set of keys.
map applies a function to every value and keeps the keys unchanged.
filter keeps only the key-value pairs that satisfy a predicate and deletes the rest.
intersection keeps the keys present in both tables and combines the two values for each key into a single value through a function.
difference deletes from the first table every key-value pair whose key appears in the second table.
union keeps the keys of both tables; when a key appears in both, it combines the two values through a function, as intersection does.
find looks up a key and returns its value, or bottom $\perp$ if the key is absent.
delete removes a key and its value.
insert adds a key-value pair.
restrict takes a domain set and restricts the table to the keys in that set (similar to intersection, or a bulk version of find).
subtract is the complement of restrict: it bulk-deletes the keys in a domain set.
collect converts a sequence of key-value pairs into a table that maps each key to the sequence of values paired with it.
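A few of these table operations can be sketched with Python dicts. This is only an illustration of the semantics, assuming dicts stand in for tables and None stands in for $\perp$; the argument order of each function is an assumption made here, and the costs are those of hash tables rather than of the tree-based implementation.

```python
def tabulate(f, keys):
    """Build a table by applying f to every key in a set of keys."""
    return {k: f(k) for k in keys}

def find(t, k):
    """Return the value for k, or None (standing in for bottom) if absent."""
    return t.get(k)

def restrict(t, keys):
    """Keep only the entries whose key is in the domain set."""
    return {k: v for k, v in t.items() if k in keys}

def subtract(t, keys):
    """Complement of restrict: drop the entries whose key is in the set."""
    return {k: v for k, v in t.items() if k not in keys}

def intersection(f, a, b):
    """Keep keys present in both tables, combining the two values with f."""
    return {k: f(a[k], b[k]) for k in a if k in b}

def union(f, a, b):
    """Keep keys of both tables; combine with f only where a key is shared."""
    out = dict(a)
    for k, v in b.items():
        out[k] = f(out[k], v) if k in out else v
    return out

def collect(pairs):
    """Group a sequence of (key, value) pairs into key -> list of values,
    preserving the order in which the values appear."""
    out = {}
    for k, v in pairs:
        out.setdefault(k, []).append(v)
    return out
```

For example, collect([("a", 1), ("b", 2), ("a", 3)]) groups the two values paired with "a" into one sequence, and restrict followed by find on a missing key yields the bottom stand-in.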

// TODO: short hand

Ordered Sets

Ordered Sets: assume the elements have a total order.

// TODO: Data type

We assume that max or min of an empty set returns $\perp$.

// TODO: example

We can implement ordered sets efficiently with BSTs. getRange is two calls to split. rank, select, and splitRank can be implemented by augmenting each node with the size of its subtree.
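Size augmentation can be sketched as follows. This is a minimal illustration, not the lecture's implementation: it uses a plain unbalanced BST for brevity (so the logarithmic bounds hold only if the tree happens to be balanced), and it assumes rank counts the keys strictly smaller than the argument and select is 0-based.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    size: int = 1  # number of keys in the subtree rooted here

def size(t: Optional[Node]) -> int:
    return t.size if t is not None else 0

def insert(t: Optional[Node], k: int) -> Node:
    """Unbalanced BST insert that keeps the size field up to date."""
    if t is None:
        return Node(k)
    if k < t.key:
        t.left = insert(t.left, k)
    elif k > t.key:
        t.right = insert(t.right, k)
    t.size = 1 + size(t.left) + size(t.right)
    return t

def rank(t: Optional[Node], k: int) -> int:
    """Number of keys strictly less than k (assumed definition)."""
    if t is None:
        return 0
    if k <= t.key:
        return rank(t.left, k)
    # The whole left subtree and the root are smaller than k.
    return size(t.left) + 1 + rank(t.right, k)

def select(t: Optional[Node], i: int) -> Optional[int]:
    """Key with exactly i smaller keys, or None if i is out of range."""
    if t is None:
        return None
    ls = size(t.left)
    if i < ls:
        return select(t.left, i)
    if i == ls:
        return t.key
    return select(t.right, i - ls - 1)
```

Both rank and select follow a single root-to-leaf path and use the stored sizes to skip entire subtrees, so their work is proportional to the height of the tree.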

The cost of the ordered set/table functions is the same as for (tree-based) sets/tables.

Augmented Table

Augmented Table: assume the keys have a total order.

With an augmented table, given an associative function f : V * V -> V and its identity I, we can implement reduceVal : T -> V in $O(1)$ work.
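One way to obtain the $O(1)$ bound is to store, at every tree node, the reduction of all values in that node's subtree, so that reduceVal only reads the root. The sketch below illustrates this idea under stated assumptions: it uses an unbalanced, persistent BST, and the names ANode, make, and reduce_val are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class ANode:
    key: Any
    value: Any
    left: Optional["ANode"]
    right: Optional["ANode"]
    reduced: Any  # f over all values in this subtree, taken in key order

def reduce_val(t, identity):
    """O(1): the answer is already stored at the root."""
    return t.reduced if t is not None else identity

def make(f, identity, left, key, value, right):
    """Build a node and recompute its reduced value from its children."""
    r = f(f(reduce_val(left, identity), value), reduce_val(right, identity))
    return ANode(key, value, left, right, r)

def insert(f, identity, t, key, value):
    """Persistent insert; only nodes on the search path are rebuilt,
    and each rebuilt node refreshes its reduced value."""
    if t is None:
        return make(f, identity, None, key, value, None)
    if key < t.key:
        new_left = insert(f, identity, t.left, key, value)
        return make(f, identity, new_left, t.key, t.value, t.right)
    if key > t.key:
        new_right = insert(f, identity, t.right, key, value)
        return make(f, identity, t.left, t.key, t.value, new_right)
    # Same key: replace the value.
    return make(f, identity, t.left, key, value, t.right)
```

Associativity of f is what makes this sound: the stored value at a node equals the reduction of its subtree's values in key order no matter how the tree is shaped, so rebalancing never changes the result.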

// TODO: recitation
