Open Addressing
Store all elements in T without chaining for collision
resolution. Instead use empty spaces in T. Consequences:
-
a ( load factor ) can never
be bigger than one!
-
Must deterministically search for new spaces when there
is a collision
Collision resolution is handled by probing the hash table
(with m slots). Augment h:
U x {0, ..., m-1} ® {0, ...,
m-1}
We probe < h (k, 0), h ( k, 1), ..., h ( k, m-1)>
until a slot/element is found. Note < h (k, 0), h ( k, 1), ..., h ( k,
m-1)> must be a permutation of {0,...,m-1} for each k. (must be able to
get everywhere from everywhere)
Hash-Insert (T, k)
-
i ¬ 0
-
repeat j ¬ h (k, i)
-
if T[j] = NIL
-
then T[j] ¬
k
-
return j
-
else i ¬ i
+ 1
-
until i = m
-
error "hash table overflow"
Hash-Search (T, k)
-
i ¬ 0
-
repeat j ¬ h (k, i)
-
if T[j] = k
-
then return j
-
i ¬ i + 1
-
until T[j] = NIL or i = m
-
return NIL
Use NIL / end of table as marker to stop search
Problem: When deleting, cannot leave a NIL
behind or the elements afterward, with key k would become inaccessible. Solution:
don't delete with open addressing!
Methods of probing
Linear Probing:
h (k, i) = ( h' (k) + i ) (mod m)
k ® h' (k) ®
| ¯ 0
® h' (k) ® |
¯ 1
® h' (k) ® |
¯ 2
® h' (k) ® |
¯ 3
this eventually wraps.
One problem: primary clustering. One develops long
strings of occupied spaces in T. Note: h ( k, i) = ( h' (k) + c i )
(mod m) does not help primary clustering: Clusters develop from many
different keys initially hashing close together. This still has constant
jumps.
What about: h ( k, i ) = h' (k), i = 0
= h ' (k) + ai ) ( mod m ) otherwise
jumps will no longer be the same size, thus defeating
primary clustering. Question: What are good choices of a? a's
with the property that <a0 ,a1 ,a2
(mod m),a3 (mod m), ... am-1 (mod m)> is a
permutation of <0,1,2, ..., m-1>. Such an a is called a primitive
root modulo m.
Quadratic Probing:
h ( k, i ) = ( h' (k) + c1 i + c2
i2) ( mod m )
To get the "full period" c1, c2,
m are constrained. This does not have the same jump each time, and so it
does not have primary clustering.
If h (k1, 0) = h (k2,
0), then h (k1, i) = h (k2, i) and they mirror one
another. This leads to secondary clustering.
Note: h ( k, i ) = (
h' (k) + ai ) (mod m ) would also suffer from secondary
clustering.
Double Hashing:
h (k, i) = ( h1 (k) + i h2 (k) ) (
mod m )
The jump depends on k as well. To get the "full
period" we must have gcd ( m, h2 (k) ) = 1.
If gcd (
m, h2
(k) ) = d then after m/d steps you would wrap!! This means you could only
access 1/d of T.
Choices:
-
h1 (k) arbitrary "hash"
-
m = 2p h2 (k) always produces
odd numbers ( gcd ( 2p, odd ) = 1 )
- h1 (k) arbitrary "hash"
- m = prime h2 (k) return an integer modulo m
Example: h1 (k) = k ( mod m) , h2 (k) = 1
+ ( k mod m' )
Note: There are m! different permutations you can
get.
Linear and quadratic probing give you just one ( neglecting h' (k) ).
Double hashing gives you m more for total Q (
m ) possible permutations. This leads double hashing to giving close to
SUH performance.
Analysis of Open-Address Hashing:
a = n/m < 1 is our figure of merit.
Analyze performance as n, m ® ¥, a
constant.
Assumption: the probe
sequence < h (k, 0), h ( k, 1), ..., h ( k, m-1)> is a
permutation of <0,...,m-1> for each k. Assume it is equally likely
to be any of the m! permutations.
Theorem: With open-address
hashing with a = n/m < 1 the expected
number of probes in an unsuccessful search is at most 1/ (1 - a)
> 1 .
Proof: When unsuccessful. each probe accesses
a full
slot except the last.
Let pi = Prob[ exactly i probes access occupied
slots.]
Expected number of probes:
to find the empty slot.
Let qi = Prob[ at least i probes access occupied
slots.]
q1 = n/m = a
q2 = ( n/m) ( (n-1)/(m-1) )
qi = ( n/m) ( (n-1)/(m-1) ) ... ( (n - i + 1)/(m
- i + 1) )
n/m > ? < (n-1) / (m-1)
n ( m-1 ) = n m - n
m ( n - 1 ) = n m - m, n < m
n ( m - 1 ) > m ( n - 1 )
n/m > ( n - 1 ) / ( m - 1 ), so
qi = ( n/m) ( (n-1)/(m-1) ) ... ( (n - i + 1)/(m - i +
1) ) < ( n/m)i = ai
= 1 + a + a2
... = 1 / ( 1 - a ) , a
< 1.
Corollary: Inserting an element into an open-address table with a
requires at most 1 / ( 1 - a ) probes.
Proof: Insertion requires, an unsuccessful search then an insertion.
Theorem:
Given an open-address T with a < 1, the
expected number of probes in a successful search is: 1/a
ln 1 / ( 1 - a ) + 1 / a
.
Proof: Searching for k follows the
same probe sequence as inserting it.
If k is the ( i + 1 )st
key inserted into T then:
1 / ( 1 - i/m ) = m
/ ( m- i )
is the expected number of probes for the search. Averaging over all
keys:
1/n |
n-1
å
i = 0 |
m / ( m- i ) = m/n |
|
n-1
å
i = 0 |
1 / ( m - i ) |
|
= 1/a ( 1/m + 1 / ( m-1 ) + ... + 1
/ ( m - n + 1 ))
= 1/a ( Hm
- Hm-n )
Hi = |
i
å
j = 1 |
1/j, ith Harmonic
number. |
|
Recall ln i £ Hi £
ln i + 1
1/a ( Hm
- Hm-n ) £ 1/a
( ln (m) + 1 - ln ( m-n ) )
= 1/a ln m / ( m-n ) + 1/a
m / ( m-n ) = (m/n) / ( m/m - n/m ) = 1 / ( 1 - a
)
= 1/a ln 1/ ( 1 - a )
+ 1/a
Example:
a =
0.5 1/2 ln 2 + 1/2 »
3.387
a =
0.9 10/9 ln 10 + 10/9 »
3.670
a = 0.99 100/99 ln
100 + 100/99 » 5.662
a = 0.999 1000/999 ln
1000 + 1000/999 » 7.915
a » 1 ®
ln 1 / ( 1-a ) + 1
|