This is hard to use as P (k)'s are not usually known (even
approximately.) If the k's are U[0,1) and independent then h (k) = ë
k m û (m fixed) satisfies SUH.
Hash functions are usually chosen to be as "independent"
of patterns that may exist in the k's. Note: SUH is powerful, but often hash
functions need to do "better than" just SUH. Sometimes
"mixing" and "separation" are required, so that a close
k1 k2 should be far apart in their hash value.
Aside: Why such "separating" hash function? Cryptographic
verification of signatures.
What if my brother could change my will slightly (add the word
"not") and not change the digest? Bad news for me! Even
though keys may not be integers, lets consider them to be integers.
With a fixed hash function there are keys that will hash poorly. We
previously solved bad worst case behavior by randomizing into average
cases: choose your hash functions randomly! This is called Universal
Hashing, to do this we must construct a family of hash functions to choose
from. H is such a family.
h Î H h: U ®
{ 0,..., m-1 }
if for each pair x, y Î
U # h ' h (x) = h (y) is | H | / m .
This means that h Î H randomly chosen will
give h (x) = h (y) (collision) with probability 1/m. This means that on
the average (with regard to the functions in H) we get SUH.
Theorem: Let h Î H. We hash n keys
into a table of size m, n £ m. Then the number
of expected collisions for a key x is less than one.
Proof: cyz
= { 1 if h (y) = h (z), 0 otherwise }
E [ cyz ] = 1/m (because
h was chosen randomly).
cx = total number of collisions with x
in T of size m with n keys.
E [ ( cx) ] = |
å
y Î T |
E ( ( cxy) ] = ( n-1 ) (
1/m ) |
|
assumptions: y ¹ x and ( n- 1 ) ( 1/m ) < 1
since n £ m.
How can we design H a universal class?
.
Example: |T| = m, m prime. x = <x0, x1, x2,
...,xr> Bytes, Max value Byte < m
<a0, a1, a2, ...,ar>
ai randomly chosen from {0, 1, 2, ..., m-1}
h (x) = |
r
å
i = 0 |
ai xi
( mod m ) |
|
|
H = Ua {ha}, has mr+1 members.
Theorem: The class H is a universal class.
Proof: Consider
x, y, can assume x0 ¹ y0.
With {a0, ...,ar} given:
a0 (x0
- y0 ) º |
r
-- å
i = 1 |
ai (xi - yi
) ( mod m ) |
|
Has only one a0 that solves it. (write down h (x) = h (y) for a0
).
Since m is prime:
a0 º |
r
-å
i = 1 |
ai (xi - yi
) (x0 - y0 )-1
(mod m) |
|
This means a0 can be found to cause a
collision each time. There are thus mr different collisions
here, one for each of the mr choices of <a1, a2,
a3, ...,ar>.
Since there are mr+1 ,
<a0, a1, a2, ...,ar>'s,
x and y collide with probability mr/mr+1 = 1/m Þ
H is Universal.
An aside on modular inversion: a-1( mod m ) is the integer
that solves:
a a-1 = 1 ( mod m )
For a to have an inverse ( mod m ) it must be that : gcd ( a, m ) = 1,
i.e. a and m have no common factor.
One computes gcd ( a, m ) via the Euclidean
algorithm ( will analyze this later this term.) A variant called the Extended
Euclidean Algorithm: given a, m produces gcd ( a, m ) = xa + ym. If
gcd ( a, m ) = 1 the x = a-1 !
Method 2: If m is prime, then am-1 º
1( mod m ) for any a. (This fact is the basis for probabilistic
primality testing.) thus a am-2 º 1
( mod m ) and a-1 º am-2
( mod m ). If modular multiplication is Q
(1) , then what is the cost of modular exponentiation?
a3 = a11 =
(a2 ) a ( s - m )
a4 = a100
= (a2 )2
ss
a5 = a101
= (a2 )2 a s (
s - m )
a10110 = a22 = (((a2
)2 a)2 a)2
So starting from the next to the m.s. bit and working towards the l.s.
bit of the exponent, when you come to a '0' Þ
square, and when you come to a '1' Þ square-multiply.
Why does this work?
Induction:
a1 = a
a10 = a2
a11 = a3
Assume ap is correct
q = 2 p + 1 ®
square-multiply
= 2 p ®
square
Cost: Q ( lg p ) operations.
Note:
this is the "giant-step" algorithm.