Sets: Georg Cantor's mathematical concept used to formalize mathematics.
For a universe of elements \mathbb{U} that supports equality (for a universe of functions, equality is not decidable), a set is a finite-size member of the power set of \mathbb{U}, that is, a finite subset of \mathbb{U}.
Sets do not require a total order (therefore the order of toSeq can be arbitrary in the specification).
We care about sets of elements over a fixed domain and about their cost models. We implement sets with binary search trees.
size : S -> N
toSeq : S -> Seq
empty : S
singleton : U -> S
fromSeq : Seq -> S
filter : (U -> B) -> S -> S
intersection : S -> S -> S
difference : S -> S -> S
union : S -> S -> S
find : S -> U -> B
delete : S -> U -> S
insert : S -> U -> S
where S is the type of sets, U is the universe of elements, Seq is the type of sequences, N is the natural numbers, and B is the booleans.
intersection, difference, and union are bulk updates. iterate and reduce can be done by first converting the set to a sequence.
// QUESTION: why is it bad to allow map to return a singleton?
// QUESTION: textbook says we can't create infinite set with finite work, I disagree
Approach: assuming an equality function on elements exists, the representation depends on what else the elements support.
If elements are natural numbers: array-based boolean sequence
If elements are hashable: hash table
If elements are totally ordered: binary search tree
// TODO: image on cost
Some bounds, for example the one for size, can be improved.
Boolean Sequence: let our universe U be defined as \{0, 1, ..., u-1\} for u \in \mathbb{N}. Then we can construct a boolean sequence of length u in which position i holds 1 if element i is in the set and 0 otherwise.
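A minimal Python sketch of the boolean-sequence representation described above, assuming the universe is \{0, ..., u-1\}. The class name BoolSeqSet and its method names are illustrative, not taken from the textbook, and the costs in the comments assume a plain copy on every update.

```python
class BoolSeqSet:
    """Set over the universe {0, ..., u-1}, stored as a length-u boolean list."""

    def __init__(self, u, elements=()):
        self.bits = [False] * u          # bits[i] is True iff i is in the set
        for x in elements:
            self.bits[x] = True

    def find(self, x):
        return self.bits[x]              # one index lookup: O(1) work

    def insert(self, x):
        # Purely functional insert: copying the sequence costs O(u) work.
        out = BoolSeqSet(len(self.bits))
        out.bits = self.bits.copy()
        out.bits[x] = True
        return out

    def union(self, other):
        # Element-wise OR over the whole universe: O(u) work.
        out = BoolSeqSet(len(self.bits))
        out.bits = [a or b for a, b in zip(self.bits, other.bits)]
        return out

    def intersection(self, other):
        # Element-wise AND over the whole universe: O(u) work.
        out = BoolSeqSet(len(self.bits))
        out.bits = [a and b for a, b in zip(self.bits, other.bits)]
        return out

    def size(self):
        return sum(self.bits)            # counts the True entries: O(u) work

    def to_seq(self):
        return [i for i, b in enumerate(self.bits) if b]
```

Note that every bulk operation touches all u positions, regardless of how few elements the sets actually hold.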
// TODO: image on cost
Note that '' (a ditto mark) means "the same as above" in the image, and that n = \max(|a|, |b|) and m = \min(|a|, |b|).
Tree Representation: we use a balanced BST to implement sets.
Also consider the cost of intersection, union, and difference, whose work on balanced BSTs is O(m \log(1 + n/m)). Let m = 1 and observe that the work becomes O(\log n). Let m = n and observe that the work becomes O(n).
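The two limiting cases can be checked by direct substitution, assuming the tree-based work bound W(m, n) = O(m \log(1 + n/m)) for intersection, union, and difference (the usual bound for balanced-BST sets; the symbol W is introduced here only for illustration):

```latex
\begin{align*}
W(m, n) &= O\left(m \log\left(1 + \frac{n}{m}\right)\right) \\
m = 1:\quad W(1, n) &= O(\log(1 + n)) = O(\log n) \\
m = n:\quad W(n, n) &= O(n \log 2) = O(n)
\end{align*}
```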
If we implement intersection, union, and difference by iterating over singletons (one insert or delete per element), the result is not work-efficient: the cost would be O(n \log n) for an update of size n.
fromSeq a = Seq.iterate Set.insert empty a is work-efficient but sequential.
fromSeq a = Seq.reduce Set.union empty <{x} : x in a> is work-efficient and parallel.
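The two fromSeq strategies can be sketched in Python as follows. This only illustrates the shape of the computation: Python's frozenset is hash-based rather than tree-based, so the costs differ from the BST bounds, and the function names from_seq_iterate and from_seq_reduce are made up for this example.

```python
def from_seq_iterate(a):
    """Sequential: insert elements one at a time into an accumulating set."""
    s = frozenset()
    for x in a:
        s = s | {x}                      # stands in for Set.insert
    return s


def from_seq_reduce(a):
    """Divide and conquer: union singleton sets pairwise in a balanced tree."""
    def go(lo, hi):
        if hi - lo == 0:
            return frozenset()           # the identity for union
        if hi - lo == 1:
            return frozenset([a[lo]])    # the singleton {x}
        mid = (lo + hi) // 2
        # The two recursive calls are independent and could run in parallel.
        return go(lo, mid) | go(mid, hi)
    return go(0, len(a))
```

The iterate version has a chain of n dependent inserts, while the reduce version has a recursion tree of depth about \log n, which is where the parallelism comes from.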
Table: the universe of keys \mathbb{K} must support equality. A table is a finite subset of \mathbb{K} \times \mathbb{V} in which each key appears at most once.
Tables are similar to sets: they extend sets so that each key now carries a value. Their cost specification and implementations are also similar.
// TODO: copy
where \mathbb{S} is the power set of keys, \mathbb{N} is the natural numbers, and \mathbb{B} is the booleans.
tabulate: bulk-converts a set of keys to a table by applying a function to each key to obtain its value.
map: converts each value to a new value but keeps the keys the same.
filter: deletes the key-value pairs that do not satisfy a predicate.
intersection: intersects the keys and combines the two values of each shared key into a single value through a function.
difference: deletes from the first table the key-value pairs whose keys appear in the second table.
union: keeps the keys of both tables and, for a key that appears in both, combines the two values through a function, as intersection does.
find: accesses the table by key and returns the value, or bottom \perp if the key is absent.
delete: deletes a key.
insert: inserts a key-value pair.
restrict: takes a domain set and restricts the domain of a table to that set (similar to intersection, or a bulk version of find).
subtract: the complement of restrict; a bulk delete according to a domain set.
collect: converts a sequence of key-value pairs into a table whose values are sequences, gathering all values that share a key.
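The behaviour of several of these table functions can be sketched with Python dicts standing in for tables. This is only an illustration of the semantics, not of the tree-based implementation or its costs; the argument order is an assumption, and None plays the role of \perp.

```python
def tabulate(f, keys):
    """Build a table from a set of keys by computing each value with f."""
    return {k: f(k) for k in keys}


def intersection(f, a, b):
    """Keep keys present in both tables; combine their two values with f."""
    return {k: f(a[k], b[k]) for k in a if k in b}


def union(f, a, b):
    """Keep keys of either table; combine values with f where keys collide."""
    out = dict(a)
    for k, v in b.items():
        out[k] = f(a[k], v) if k in a else v
    return out


def restrict(a, keys):
    """Keep only the entries whose key is in the domain set."""
    return {k: v for k, v in a.items() if k in keys}


def subtract(a, keys):
    """Drop the entries whose key is in the domain set."""
    return {k: v for k, v in a.items() if k not in keys}


def collect(pairs):
    """Group a sequence of key-value pairs into a table of value sequences."""
    out = {}
    for k, v in pairs:
        out.setdefault(k, []).append(v)
    return out


def find(a, k):
    """Return the value for k, or None (standing in for bottom) if absent."""
    return a.get(k)
```

For example, collect([("a", 1), ("b", 2), ("a", 3)]) gathers both values of "a" into one sequence, in their original order.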
// TODO: short hand
// TODO: cost specification
Ordered Sets: assume the elements have a total order
// TODO: Data type
We assume that max or min of the empty set returns \perp.
// TODO: example
We can implement ordered sets efficiently with BSTs. getRange is two calls to split. rank, select, and splitRank can be implemented by augmenting each node with the size of its subtree.
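A minimal Python sketch of these ideas on a size-augmented BST. The Node layout and helper names are assumptions made for illustration; split here does not rebalance, and get_range returns a list rather than a set to keep the example short.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    left: Optional["Node"]
    key: int
    right: Optional["Node"]
    size: int                            # number of keys in this subtree


def size(t):
    return 0 if t is None else t.size


def node(l, k, r):
    return Node(l, k, r, size(l) + 1 + size(r))


def build(keys):
    """Balanced tree from a sorted list of distinct keys."""
    if not keys:
        return None
    m = len(keys) // 2
    return node(build(keys[:m]), keys[m], build(keys[m + 1:]))


def split(t, k):
    """Return (keys < k, whether k is present, keys > k)."""
    if t is None:
        return None, False, None
    if k < t.key:
        l, found, r = split(t.left, k)
        return l, found, node(r, t.key, t.right)
    if k > t.key:
        l, found, r = split(t.right, k)
        return node(t.left, t.key, l), found, r
    return t.left, True, t.right


def rank(t, k):
    """Number of keys strictly less than k, using the stored sizes."""
    if t is None:
        return 0
    if k <= t.key:
        return rank(t.left, k)
    return size(t.left) + 1 + rank(t.right, k)


def select(t, i):
    """Key with rank i (0-based), or None if i is out of range."""
    if t is None:
        return None
    if i < size(t.left):
        return select(t.left, i)
    if i == size(t.left):
        return t.key
    return select(t.right, i - size(t.left) - 1)


def to_list(t):
    return [] if t is None else to_list(t.left) + [t.key] + to_list(t.right)


def get_range(t, lo, hi):
    """Keys k with lo <= k <= hi via two calls to split (assumes lo <= hi)."""
    _, has_lo, above = split(t, lo)
    between, has_hi, _ = split(above, hi)
    return ([lo] if has_lo else []) + to_list(between) + ([hi] if has_hi else [])
```

Both rank and select follow a single root-to-leaf path, so on a balanced tree they take time proportional to the height.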
The cost of the ordered set/table functions is the same as for (tree-based) sets/tables.
Augmented Table: assume keys have a total order
With an augmented table we can implement reduceVal : T -> V, given an associative function f : V * V -> V and identity I, in O(1) work.
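One way to see why reduceVal can take O(1) work: each tree node caches the reduction of all values in its subtree, so the answer is read off the root. The following Python sketch assumes an unbalanced BST for brevity; the names ANode and make_table_ops are hypothetical.

```python
from typing import Any, NamedTuple, Optional


class ANode(NamedTuple):
    left: Optional["ANode"]
    key: Any
    value: Any
    right: Optional["ANode"]
    reduced: Any                         # f applied over all values in this subtree


def make_table_ops(f, identity):
    """Return (insert, reduce_val) for a table augmented with f and identity."""

    def reduce_val(t):
        # Read the cached reduction at the root: O(1) work.
        return identity if t is None else t.reduced

    def node(l, k, v, r):
        # Combine in key order so that non-commutative f also works.
        return ANode(l, k, v, r, f(f(reduce_val(l), v), reduce_val(r)))

    def insert(t, k, v):
        # Rebuilds only the nodes on the search path, refreshing their caches.
        if t is None:
            return node(None, k, v, None)
        if k < t.key:
            return node(insert(t.left, k, v), t.key, t.value, t.right)
        if k > t.key:
            return node(t.left, t.key, t.value, insert(t.right, k, v))
        return node(t.left, k, v, t.right)   # same key: replace the value

    return insert, reduce_val
```

The price of the O(1) reduceVal is that every update must recompute the cached value along its search path, which adds one application of f per node visited.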
// TODO: recitation