Hash Tables
What if |U| is large, but the set of keys actually stored, K, is small? That is, |U| ≫ |K|.
We can build a hash table T with m = Θ(|K|) slots and reduce searching to O(1) time, on average. With a hash function h we store an element x with k = key[x] in slot h(k), where h : U → {0, 1, ..., m − 1}. Since |U| ≫ m, we expect collisions.
We say k hashes to slot h(k), and h(k) is the hash value of key k.


Chained-Hash-Insert(T, x): insert x at the head of list T[h(key[x])]. O(1) (assuming x is not already in the table).
Chained-Hash-Search(T, k): search for an element with key k in list T[h(k)]. O(r), where r is the number of elements in T[h(k)].
Chained-Hash-Delete(T, x): delete x from the list T[h(key[x])]. O(1) with doubly linked lists.
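As a concrete illustration, here is a minimal sketch of these operations (not from the notes: it assumes Python's built-in hash() as h and a Python list per slot as the chain; all names are illustrative):

    class ChainedHashTable:
        def __init__(self, m):
            self.m = m                           # number of slots
            self.T = [[] for _ in range(m)]      # T[j] is the chain for slot j

        def _h(self, key):
            return hash(key) % self.m            # h: U -> {0, 1, ..., m-1}

        def insert(self, key, value):
            # Conceptually O(1): the new element goes at the head of T[h(key)].
            # (Python lists shift on insert(0, ...); a linked list would give
            # true worst-case O(1), but the idea is the same.)
            self.T[self._h(key)].insert(0, (key, value))

        def search(self, key):
            # Walk the chain T[h(key)]: O(r), r = length of the chain.
            for k, v in self.T[self._h(key)]:
                if k == key:
                    return v
            return None                          # unsuccessful search

        def delete(self, key):
            # O(r) here; O(1) is possible with doubly linked lists and a
            # pointer directly to the element being deleted.
            chain = self.T[self._h(key)]
            for i, (k, _) in enumerate(chain):
                if k == key:
                    del chain[i]
                    return

For example, t = ChainedHashTable(8); t.insert("x", 1); t.search("x") returns 1.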
Clearly, collisions are minimized if h is a "randomizing" function. Given a hash table T with m slots storing n elements, α = n/m is the load factor (often considered fixed as n, m → ∞).
The worst-case behavior of hashing with chaining is Θ(n) plus the time to compute h(k); this occurs when all n elements hash to a single slot.
Assume this doesn't happen. Simple uniform hashing: a given key k is equally likely to hash to any of the m slots (values of h(k)).
Assume h(k) can be computed in O(1) time. Then searching for an element with key k takes time proportional to the length of T[h(k)]. We must analyze this length in two cases, unsuccessful search and successful search. (Assume simple uniform hashing.)
Unsuccessful: In a table T with collision resolution through chaining, search time is Θ(1 + α) in the average case.
Proof: Any key k is equally likely to hash into ANY of the m slots. Thus the time for an unsuccessful search is the time to search through one of the m lists, whose average length is n/m = α. With the h(k) computation costing O(1) plus O(α) for searching, we get Θ(1 + α).
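As a quick empirical check (a sketch with arbitrary parameters, not from the notes), the following hashes n random string keys into m slots, using Python's built-in hash as a stand-in for a "randomizing" h, and measures the average cost of unsuccessful searches:

    import random, string

    def rand_key(length=12):
        return ''.join(random.choices(string.ascii_letters, k=length))

    n, m = 50_000, 10_000                     # load factor alpha = 5
    chains = [0] * m
    for _ in range(n):
        chains[hash(rand_key()) % m] += 1     # insert n random keys

    # An unsuccessful search scans one whole chain, so its cost is the
    # chain length at a fresh key's slot: about alpha on average.
    trials = 10_000
    avg = sum(chains[hash(rand_key()) % m] for _ in range(trials)) / trials
    print(avg)                                # close to 5.0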
Successful: A successful search in a hash table with chaining also takes expected time Θ(1 + α). Assume Chained-Hash-Insert adds new elements to the end of the list. Then the number of elements examined when searching for the element with key k is one more (for the element itself) than the number examined when that element was inserted. The expected length of the list into which the i-th element is inserted is (i − 1)/m, so the expected number of elements examined in a successful search is:
(1/n) Σ_{i=1}^{n} (1 + (i − 1)/m)
    = 1 + (1/(nm)) Σ_{i=1}^{n} (i − 1)
    = 1 + (1/(nm)) · (n(n − 1)/2)
    = 1 + (n(n − 1))/(2nm)
    = 1 + n/(2m) − 1/(2m)
    = 1 + α/2 − 1/(2m)
Running time = 2 + α/2 − 1/(2m) = Θ(1 + α), where the extra 1 counts the O(1) computation of h(k). For example, with n = m (α = 1), a successful search examines about 1.5 elements on average.
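The derivation can be checked by simulating simple uniform hashing directly (a sketch; parameters are arbitrary). Keys are appended to random chains, and for each key we record how many elements a later successful search for it would examine:

    import random

    def avg_successful_probes(n, m, trials=20):
        total = 0.0
        for _ in range(trials):
            chains = [0] * m             # current length of each chain
            probes = 0
            for _ in range(n):
                s = random.randrange(m)  # slot of the next inserted key
                probes += chains[s] + 1  # elements examined to find it later
                chains[s] += 1           # append to the end of the chain
            total += probes / n
        return total / trials

    n, m = 100_000, 50_000               # alpha = 2
    # The formula predicts 1 + alpha/2 - 1/(2m) = 1.99999
    print(avg_successful_probes(n, m))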
Note: if α = n/m = O(1), i.e., n = O(m), then all of these operations run in expected O(1) time.
Why do you need to hash?
Discrete log problem (Diffie-Hellman). Let a generate a group G; thus for any element e ∈ G, e = a^i for some i with 0 ≤ i < |G|. (An example is G = the multiplicative group of integers mod p, with a a primitive root mod p.)
Discrete log problem in G: given a and e, find i. A general solution to this problem is the giant-step / baby-step algorithm:

(Figure: building a "kangaroo trap" for solving the discrete log problem.)
On each giant step, check whether you have landed in the "baby-step" region. This takes O(√|G|) steps, on average; this is the "Birthday Paradox". With |G| = 2^64, √|G| = 2^32, so the baby steps must be stored in a hash table to keep each lookup O(1).
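Here is a sketch of giant-step / baby-step for the multiplicative group of integers mod p, with a Python dict serving as the hash table (the numbers in the example are small and purely illustrative):

    from math import isqrt

    def discrete_log(a, e, p):
        # Find i with a^i = e (mod p), assuming a generates the group.
        s = isqrt(p - 1) + 1                   # about sqrt(|G|)
        # Baby steps: hash a^j -> j for j = 0 .. s-1 (the "trap").
        baby = {pow(a, j, p): j for j in range(s)}
        factor = pow(a, -s, p)                 # a^(-s) mod p (Python 3.8+)
        g = e
        for t in range(s):                     # giant steps
            if g in baby:                      # landed in the baby-step region
                return t * s + baby[g]
            g = g * factor % p
        return None

    # 5 is a primitive root mod 23, and 5^6 = 8 (mod 23).
    print(discrete_log(5, 8, 23))              # prints 6

With |G| = 2^64 the table holds about 2^32 baby steps, so each giant step's membership test must be an O(1) hashed lookup; a sorted-array alternative would add a log factor.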