Advantage: O(1) average for search, insert, delete, and O(m) space.
There are 3 types of Hashing Algorithms:
Note that there are many advanced hashing schemes, including: Bloom Filters, Cuckoo Hashing, Consistent Hashing, ...
Definition: bucket hash function
U: the space of key Universe
K \subset U: the actual set of keys that we use. m = |K| \ll |U|.
B: the space of buckets. n = |B|. Let B_i denote the number of keys that hash to bucket i; under the uniformity assumption below, B_i \sim \text{Binomial}(m, \frac{1}{n}).
k \in U: single key
h : U \rightarrow B: hashing function
\alpha = \frac{m}{n}: load factor, how many keys per bucket on average
Simple Uniform Hashing Assumption (SUHA): h satisfies SUHA if each key k \in K has probability \frac{1}{|B|} of mapping to any bucket b \in B ((\forall k_i \in K, b_i \in B)(Pr\{h(k_i) = b_i\} = \frac{1}{|B|})). Moreover, the hash values of different keys are independent: Pr\{h(k_1) = b_1 \cap h(k_2) = b_2 \cap ... \cap h(k_i) = b_i\} = \frac{1}{n^i}.
Note that h(k) is deterministic, not probabilistic (so for a fixed h, Pr\{h(k) = b\} = \frac{1}{n} cannot literally hold). To resolve this issue, we use a universal family of hashing functions H = \{h_1, h_2, ..., h_n\}: a random hashing function h_i \in H is chosen for each specific instance of the hash table.
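As an illustration, here is a minimal sketch of one standard universal family (the Carter-Wegman construction h_{a,b}(k) = ((ak + b) \mod p) \mod n), assuming integer keys smaller than a fixed prime p; the choice of prime and all names below are mine, not from the notes:

```python
import random

# Carter-Wegman universal family: h_{a,b}(k) = ((a*k + b) mod p) mod n,
# where p is a prime larger than any key, a in [1, p-1], b in [0, p-1].
P = (1 << 61) - 1  # a Mersenne prime; assumes keys are integers below P

def random_hash(n):
    """Draw one hash function h : U -> {0, ..., n-1} from the family."""
    a = random.randrange(1, P)
    b = random.randrange(0, P)
    return lambda k: ((a * k + b) % P) % n

h = random_hash(n=16)
# For any fixed pair of distinct keys, Pr{h(k1) = h(k2)} <= 1/16
# over the random draw of h.
print(h(42), h(43))
```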
What is E[B_i]? Let I_k be the indicator random variable that key k maps to bucket i. Then E[B_i] = E[\sum_{k \in K} I_k] = \sum_{k \in K} E[I_k] = m \cdot \frac{1}{n} = \alpha.
You can also conclude this by knowing that B_i \sim \text{Binomial}(m, \frac{1}{n})
When m is large and p = \frac{1}{n} is small, B_i \sim \text{Binomial}(m, \frac{1}{n}) \simeq \text{Poisson}(mp) = \text{Poisson}(\alpha), with E[B_i] \simeq \alpha and Var(B_i) \simeq \alpha.
// TODO: exit exercise 4.4 to include binomial's relation with Poisson
When \alpha is high, B_i \sim \text{Binomial}(m, \frac{1}{n}) \simeq \text{Poisson}(\alpha) \simeq \text{Normal}(\alpha, \alpha)
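A quick empirical check of the Poisson approximation (a sketch; the values of m and n, and the use of Python's built-in hash over random string keys, are my choices):

```python
import math, random, string
from collections import Counter

# Hash m random string keys into n buckets and compare the empirical
# distribution of bucket loads B_i against Poisson(alpha).
m, n = 100_000, 20_000  # alpha = m/n = 5
keys = {''.join(random.choices(string.ascii_letters, k=12)) for _ in range(m)}
loads = Counter(hash(k) % n for k in keys)
load_counts = Counter(loads.get(i, 0) for i in range(n))

alpha = len(keys) / n
for j in range(10):
    empirical = load_counts[j] / n
    poisson = math.exp(-alpha) * alpha**j / math.factorial(j)
    print(f"Pr{{B_i = {j}}}: empirical {empirical:.4f}, Poisson {poisson:.4f}")
```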
If the number of buckets n equals the number of keys m, then as we showed in the last section, with high probability, \max_i B_i \in O(\frac{\ln n}{\ln \ln n}).
If the number of buckets n is smaller than the number of keys m, and if m \geq 2n\ln n, then with high probability (Pr \geq 1 - \frac{1}{n}), (\forall i)(B_i < e\alpha).
Proof: Here is what we want to show: Pr\{(\exists i)(B_i \geq e\alpha)\} \leq \frac{1}{n}.
Notice \alpha = \frac{m}{n} \geq 2\ln n > 1, therefore \alpha \in \Omega(\ln n) (though not necessarily \alpha \in \Theta(n)).
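One standard way to finish (my reconstruction; the notes do not spell out the proof): apply the Chernoff upper-tail bound for a \text{Binomial}(m, \frac{1}{n}) variable with mean \alpha, Pr\{B_i \geq t\} \leq e^{-\alpha}(\frac{e\alpha}{t})^t, at t = e\alpha:

Pr\{B_i \geq e\alpha\} \leq e^{-\alpha}\left(\frac{e\alpha}{e\alpha}\right)^{e\alpha} = e^{-\alpha} \leq e^{-2\ln n} = \frac{1}{n^2}

Pr\{(\exists i)(B_i \geq e\alpha)\} \leq n \cdot \frac{1}{n^2} = \frac{1}{n} (union bound over the n buckets)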
Disadvantage of Separate Chaining:
It requires pointer storage overhead
Memory allocation is scattered all over the memory space (not cache friendly)
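For concreteness, a minimal separate-chaining table (a sketch, not the notes' implementation; Python lists stand in for the linked lists, which hides exactly the pointer overhead discussed above):

```python
class ChainedHashTable:
    """Separate chaining: each bucket holds a list of (key, value) pairs."""

    def __init__(self, n=8):
        self.buckets = [[] for _ in range(n)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        chain = self._bucket(key)
        for i, (k, _) in enumerate(chain):
            if k == key:            # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))  # expected O(1 + alpha) work under SUHA

    def search(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self._bucket(key)
        chain[:] = [(k, v) for (k, v) in chain if k != key]
```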
probing: searching through alternative locations in the array (the probe sequence) until either the target record is found, or an unused array slot is found.
In the case of linear probing, our probing sequence for a particular key k is \langle h(k), h(k) + 1, h(k) + 2, ..., h(k) + n - 1 \rangle \pmod{n}.
We need n > m to use linear probing. Typically, we use n > 2m in implementation.
Insert: Given a key k, follow the probe sequence until an unused slot is found, and store k there.
Search / Delete: Given a key k, follow the probe sequence until k is found or an unused slot is reached (in which case k is not in the table). Delete cannot simply empty the slot, or later searches would stop early; instead the slot is marked with a special "deleted" tombstone.
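A minimal sketch of the three operations with tombstones (my illustration, assuming hashable keys; Python's built-in hash stands in for h):

```python
class LinearProbingTable:
    """Open addressing with linear probing; deleted slots hold a tombstone."""
    _EMPTY, _TOMBSTONE = object(), object()

    def __init__(self, n=16):
        # keep n comfortably above the number of keys (e.g. n > 2m)
        self.slots = [self._EMPTY] * n

    def _probe(self, key):
        """Yield the probe sequence h(k), h(k)+1, ..., mod n."""
        n = len(self.slots)
        start = hash(key) % n
        for i in range(n):
            yield (start + i) % n

    def insert(self, key):
        first_free = None                  # first tombstone seen, reusable
        for i in self._probe(key):
            s = self.slots[i]
            if s is self._TOMBSTONE:
                if first_free is None:
                    first_free = i
            elif s is self._EMPTY:
                self.slots[i if first_free is None else first_free] = key
                return
            elif s == key:                 # already present
                return
        if first_free is not None:
            self.slots[first_free] = key
            return
        raise RuntimeError("table full")

    def search(self, key):
        for i in self._probe(key):
            s = self.slots[i]
            if s is self._EMPTY:           # unused slot ends the sequence
                return False
            if s is not self._TOMBSTONE and s == key:
                return True
        return False

    def delete(self, key):
        for i in self._probe(key):
            s = self.slots[i]
            if s is self._EMPTY:
                return
            if s is not self._TOMBSTONE and s == key:
                self.slots[i] = self._TOMBSTONE
                return
```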
Disadvantage of Linear Probing: primary clustering, where long runs of occupied cells build up and any key that hashes anywhere into a run extends it, degrading probe lengths as the table fills.
Recall that in linear probing, the probing sequence for a particular key k is the fixed sequence \langle h(k), h(k) + 1, h(k) + 2, ... \rangle \pmod{n}, so nearby keys share long stretches of their probe sequences.
In the ideal world, the probe sequence for each key is equally likely to be any one of the n! permutations of \langle 0, 1, 2, ..., n - 1 \rangle (the uniform hashing assumption). This randomizes the probe sequence with respect to each key k.
Let A_i denote the event "the ith cell that we look at is occupied"; we are interested in the search cost X, the number of probes performed.
Now we want to calculate E[X] using Pr\{X > i\}, via the tail-sum formula E[X] = \sum_{i=0}^{\infty} Pr\{X > i\}.
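The notes stop here; one standard completion of the calculation (my reconstruction, under the uniform hashing assumption above):

Pr\{X > i\} = Pr\{A_1 \cap A_2 \cap ... \cap A_i\} = \frac{m}{n} \cdot \frac{m-1}{n-1} \cdots \frac{m-i+1}{n-i+1} \leq \left(\frac{m}{n}\right)^i = \alpha^i

E[X] = \sum_{i=0}^{\infty} Pr\{X > i\} \leq \sum_{i=0}^{\infty} \alpha^i = \frac{1}{1-\alpha}

So an unsuccessful search makes at most \frac{1}{1-\alpha} probes in expectation, e.g. 2 probes at \alpha = \frac{1}{2}.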
Definition: cryptographic signature
U: the space of key Universe
K \subset U: the actual set of keys that we use. m = |K| \ll |U|.
B: the space of all possible signatures. n = |B|.
k \in U: single key
h : U \rightarrow B: hashing function
Hash Collision: when two different keys k_i, k_j have the same hash value (h(k_i) = h(k_j) given k_i \neq k_j)
Given that we hash |K| = m keys, how large should |B| = n be to achieve a low probability of collision? Equivalently: given that we hash m keys, what is the probability that none of their hashes collide?
Let A denote the event "no collision". Let A_i denote the event that key i has a different signature from all of the first i - 1 keys.
Therefore, the probability that there is no collision (given m, n) is:

Pr\{A\} = \prod_{i=1}^{m} Pr\{A_i \mid A_1, ..., A_{i-1}\} = \prod_{i=1}^{m} \left(1 - \frac{i-1}{n}\right) \leq \prod_{i=1}^{m} e^{-\frac{i-1}{n}} = e^{-\frac{m(m-1)}{2n}} \simeq e^{-\frac{m^2}{2n}}

(using 1 - x \leq e^{-x})
Note that for n \gg m, the upper bound is close to equality.
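For example, a worked instance of the bound with n = 365 and m = 23 (the classic birthday paradox):

e^{-\frac{m^2}{2n}} = e^{-\frac{529}{730}} \simeq e^{-0.72} \simeq 0.48

so with just 23 keys and 365 possible signatures, the probability of no collision already drops below one half.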
The above bound implies we need m \in O(\sqrt{n}) to ensure the probability of no collision is high. In fact, in expectation, we can insert 1 + \sqrt{\frac{\pi n}{2}} keys before a collision.
If we want e^{-\frac{m^2}{2n}} \simeq 1, then we need \frac{m^2}{2n} \simeq 0, i.e. m \ll \sqrt{2n}.
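To sanity-check the \sqrt{\frac{\pi n}{2}} claim above, a small simulation (a sketch; n and the trial count are arbitrary choices of mine):

```python
import math, random

# Estimate the expected number of uniformly random signatures drawn
# until the first collision, and compare to 1 + sqrt(pi * n / 2).
n, trials = 10_000, 2_000
total = 0
for _ in range(trials):
    seen, count = set(), 0
    while True:
        count += 1
        sig = random.randrange(n)
        if sig in seen:  # the count-th draw collides
            break
        seen.add(sig)
    total += count
print("simulated:", total / trials)                  # roughly 126
print("predicted:", 1 + math.sqrt(math.pi * n / 2))  # about 126.3
```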
// TODO: include alternative way to calculate E[X]