Tables

Tables, like lists, are ubiquitous in everyday life and are also fundamental tools in computing. In computing, table, dictionary, and map are essentially synonymous, except possibly in particular documented implementation environments. The C++ STL uses the term "map". The use of "map" for ADT table has nothing to do with cartography, but rather is an abbreviation for "mapping", which in mathematical terms is a function from one data type to another. Recalling that a function can be defined as a set of ordered pairs in which no first component is repeated, it is easy to see that ADT table is a mapping from one type (KeyType) to another (DataType).

Associative arrays are just tables with a special overload of the bracket operator. We will return to this special concept later. For now, it suffices to emphasize that pointer arithmetic is not supported by this bracket operator.

Tables and associative arrays use unimodal semantics, which means that duplicate key entries are not allowed. An insert(key,data) operation has a dual personality: if the key is not in the table, then the pair is inserted. If the key is already in the table, then the data that is in the table is replaced (overwritten) with the incoming data.

Table ADT

The abstract data type table is defined in this slide. A table stores associations between a "key", intended to be the search-for/access data item, and "data", intended to be the data item being looked for. For example, in a classic dictionary, the "key" is a word and the "data" is all of the stored information about the word, such as definitions, usage, origin, and pronunciation. Such a dictionary could be implemented using ADT table.

The operations for ADT table consist of Insert(), Remove(), Includes(), Empty(), and Size(). We discuss these individually.

void Insert (KeyType k, DataType d)
Preconditions: none
Postconditions: k is in the table and is associated with data d
Thus, if k is not already in the table, the pair (k, d) is inserted. If k is already in the table, the data associated with k is overwritten by d.

void Remove (KeyType k)
Preconditions: none
Postconditions: k is not in the table

bool Includes (KeyType k, DataType& d) const
Preconditions: none
Postconditions: state of table is not changed
Return value: if k is in the table, returns true and sets value of d to value associated with k; if k is not in table, returns false and value of d is unspecified

bool Empty() const
Preconditions: none
Postconditions: state of table is not changed
Return value: true iff table contains no keys

unsigned int Size() const
Preconditions: none
Postconditions: state of table is not changed
Return value: number of keys in table

Note: In actual implementations, it is recommended to replace parameters passed as values with parameters passed as constant references, for efficiency. Another common enhancement is to have Insert() and/or Remove() return a bool value indicating whether the key existed in the table prior to the call.

A set of axioms for ADT table is also shown. These are for illustration purposes. We will not formally derive uniqueness of ADT table from axioms.

Exploring Implementation Possibilities - 1,2

It is possible to design implementations of ADT table using existing containers. Here we explore three possibilities.

List < pair < KeyType , DataType > >
This plan would use a TList<pair> object L as a basic container and use sequential search to implement Insert(), Remove(), and Includes(). For example, Insert() could consist of sequential search using a list iterator I, then either overwriting the data stored at *I with d if found or L.Insert(I, (k,d)) if not found. Sorted order in L could be maintained, or not. In either case, Insert(), Remove(), Includes() runtimes are O(size), because sequential search is part of each operation.

Because of the slow runtimes, this plan is practical only for very small applications.
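A minimal sketch of this plan is given here, assuming std::list and std::pair in place of the course containers; the class name ListTable and the helper Find() are hypothetical. The list is left unsorted, so new pairs are simply appended.

```cpp
#include <cstddef>
#include <list>
#include <utility>

// Hypothetical list-based table: every operation begins with a sequential search.
template <typename K, typename D>
class ListTable
{
  typedef std::list<std::pair<K, D> > List;
  List list_;

  // Sequential search, O(size). Returns list_.end() if k is absent.
  typename List::iterator Find(const K& k)
  {
    typename List::iterator i = list_.begin();
    while (i != list_.end() && !(i->first == k))
      ++i;
    return i;
  }

public:
  void Insert(const K& k, const D& d)
  {
    typename List::iterator i = Find(k);
    if (i != list_.end())
      i->second = d;                          // found: overwrite the data
    else
      list_.push_back(std::make_pair(k, d));  // not found: append the pair
  }

  void Remove(const K& k)
  {
    typename List::iterator i = Find(k);
    if (i != list_.end())
      list_.erase(i);
  }

  bool Includes(const K& k, D& d) const
  {
    for (typename List::const_iterator i = list_.begin(); i != list_.end(); ++i)
      if (i->first == k)
      {
        d = i->second;
        return true;
      }
    return false;
  }

  bool        Empty() const { return list_.empty(); }
  std::size_t Size()  const { return list_.size(); }
};
```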

Sorted vector < pair < KeyType , DataType > >
This plan would use a TVector<pair> object V (or a TDeque<>) as a basic container, maintaining sorted order by key values. Binary search would then replace sequential search. This would speed up Includes() to O(log size), but both Insert() and Remove() would remain O(size) due to the inefficiency of insertion into (or removal from) the middle of a vector (or deque) range.

In applications where Includes() is the only operation whose runtime efficiency is important (such as the Password Server example), this implementation may be appropriate.
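The sorted-vector plan might look like the following sketch. It assumes std::vector and std::lower_bound rather than TVector and a course binary search, and the names SortedVectorTable and KeyLess are hypothetical. Only operator < on keys is required.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sorted-vector table: binary search locates the key in O(log size),
// but insertion and removal still shift elements, which costs O(size).
template <typename K, typename D>
class SortedVectorTable
{
  typedef std::pair<K, D>    Entry;
  typedef std::vector<Entry> Vec;
  Vec v_;                                   // kept sorted by key

  static bool KeyLess(const Entry& e, const K& k) { return e.first < k; }

public:
  void Insert(const K& k, const D& d)
  {
    typename Vec::iterator i = std::lower_bound(v_.begin(), v_.end(), k, KeyLess);
    if (i != v_.end() && !(k < i->first))
      i->second = d;                        // key present: overwrite
    else
      v_.insert(i, Entry(k, d));            // key absent: insert in sorted position
  }

  void Remove(const K& k)
  {
    typename Vec::iterator i = std::lower_bound(v_.begin(), v_.end(), k, KeyLess);
    if (i != v_.end() && !(k < i->first))
      v_.erase(i);
  }

  bool Includes(const K& k, D& d) const
  {
    typename Vec::const_iterator i = std::lower_bound(v_.begin(), v_.end(), k, KeyLess);
    if (i == v_.end() || k < i->first)
      return false;
    d = i->second;
    return true;
  }

  std::size_t Size() const { return v_.size(); }
};
```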

Vector < DataType >
If the KeyType is (convertible to) unsigned int, then we could use a vector in a different way: index directly on the key using a TVector V as the basic container. Then we would have random access to elements using the vector bracket operator to facilitate implementation of the table operations. For example, Insert() would be essentially "V[k] = d" and Includes() essentially "d = V[k]". Remove() might be a technical problem, because we cannot remove a vector index, but we could get around that by putting a "null" datum at each "removed" key. All of the table operations except Size() would run in time O(1). (Size() would require iterating through V, counting the number of "null" data items, and subtracting this value from V.Size(): an O(V.Size()) endeavor.) The problem with this plan is that we must use an entire range of key values for the vector index.

For example, suppose we wanted to keep employee records using social security number (SSN) as primary key. The range of SSN is [0, 999999999] = [0, 1,000,000,000), so we would require a vector of size one billion, even if we had only a few hundred employee records. This would make the Size() method very slow, and worse, it would be an enormous waste of storage, because space for one billion employee records would be reserved, even though all but a few of them would be "null" at any given time. This inefficiency, together with the constraint on KeyType, makes this option workable for only a few exceptional cases.
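The direct-index plan can be sketched as below. Two assumptions are made for illustration: the "null" datum is modeled by a parallel vector of flags rather than a special DataType value, and the class name DirectTable is hypothetical. Keys must be smaller than the range given to the constructor.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical direct-index table. Storage is proportional to the key range,
// not to the number of keys actually stored.
template <typename D>
class DirectTable
{
  std::vector<D>    data_;
  std::vector<bool> used_;   // used_[k] == false plays the role of the "null" datum

public:
  explicit DirectTable(std::size_t keyRange)
    : data_(keyRange), used_(keyRange, false)
  {}

  // Precondition for all three operations: k < keyRange
  void Insert(std::size_t k, const D& d) { data_[k] = d; used_[k] = true; }
  void Remove(std::size_t k)             { used_[k] = false; }

  bool Includes(std::size_t k, D& d) const
  {
    if (!used_[k])
      return false;
    d = data_[k];
    return true;
  }

  // O(key range): every slot must be inspected to count the stored keys.
  std::size_t Size() const
  {
    std::size_t n = 0;
    for (std::size_t i = 0; i < used_.size(); ++i)
      if (used_[i])
        ++n;
    return n;
  }
};
```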

Table Performance Goals

Table operations are fundamental to virtually all data management systems, including data base systems. It is therefore imperative that table implementations be as generally applicable as possible and use resources efficiently, particularly time and space. In particular,

The three methods Insert(), Remove(), Includes() should have excellent runtime performance, no worse than O(log size)
Space overhead should be modest, no worse than O(size)
There should be no unreasonable restrictions on KeyType.

Each of our three straw-man implementation possibilities discussed above fails at least one of these requirements. We shall have to do considerably more work to achieve these goals. Indeed, much of the theory and practice of data structures, algorithms, and data bases is centered around finding solutions to this requirements list.

Hash Tables - 1

One classic solution to the tables requirements problem is the so-called hash table. Hash tables present the fastest access performance of any known data structure that satisfies requirements 2 and 3 above. They are found in applications where speed is of utmost importance but a hardware solution is not practical. Examples include symbol tables for runtime environments (such as those of Ada, C, C++, and Java) and route tables for internet routers.

Hash tables are a hybrid of the vector and list "straw-men", facilitated by the use of a hash function to convert KeyType to unsigned integer. The basic idea is to use a vector v whose elements are lists of (key, data) pairs. The vector index is the hash value of the key. Thus v[n] is a list of all table entries (key, data) with n = HashFunction(key). (These vector elements are called buckets in the context of hash tables.) Let's begin an example, using a very simplified hash function and easily understood data.

Suppose we want to store <String, int> pairs in a table, with String = KeyType and int = DataType. Suppose also that our String hash function is given by

unsigned int HashFunction (const String& S)
{
  unsigned int hval(0), i;
  for (i = 0; i < S.Size(); ++i)
    hval += S[i] - 'a';
  return hval;
}

(This is the simple hash function discussed in Slide 2 of Chapter Hashing, Hash Functions, and their Uses.) To set up a hash table, we first select a size for the vector v, say 10. (In practice, we will use prime numbers for vector size. We use 10 here to make the arithmetic transparently convenient.) We modify the hash function by taking the remainder when divided by 10, to guarantee that hash values coincide with the vector index range. Then initially the hash table could be represented something like:

v[0]:
v[1]:
v[2]:
v[3]:
v[4]:
v[5]:
v[6]:
v[7]:
v[8]:
v[9]:

showing that all 10 buckets are empty. Now

Insert(ab,15);

First note that HashFunction(ab) == 0 + 1 == 1. Therefore this pair will be stored in the bucket (list) at vector index 1, that is, in v[1]. The table representation is now

v[0]:
v[1]:  (ab,15)
v[2]:
v[3]:
v[4]:
v[5]:
v[6]:
v[7]:
v[8]:
v[9]:

Now

Insert(bz,12);
HashFunction(bz) = 1 + 25 = 26

This hash value is out of range of the vector index, so we divide by the size of the vector and use the remainder as the index: 26 % 10 = 6, so we store this pair in bucket v[6]:

v[0]:
v[1]:  (ab,15)
v[2]:
v[3]:
v[4]:
v[5]:
v[6]:  (bz,12)
v[7]:
v[8]:
v[9]:

Continuing, we insert the following pairs (with hash value of key and index shown)

(et, 20) h = 23 n = 3
(ds, 20) h = 21 n = 1
(aa, 20) h = 0  n = 0
(vf, 20) h = 26 n = 6
(sg, 20) h = 24 n = 4
(bv, 20) h = 22 n = 2
(hd, 20) h = 10 n = 0
(ek, 20) h = 14 n = 4
(kr, 20) h = 27 n = 7
(ez, 20) h = 29 n = 9

which results in the following representation:

v[0]:  (aa,20)  (hd,20)
v[1]:  (ab,15)  (ds,20)
v[2]:  (bv,20)
v[3]:  (et,20)
v[4]:  (sg,20)  (ek,20)
v[5]:
v[6]:  (bz,12)  (vf,20)
v[7]:  (kr,20)
v[8]:
v[9]:  (ez,20)

Note that we have now inserted 12 items in the table, with no restriction on the keys. The buckets at each index expand as needed when two keys end up with the same vector index (a collision). These insert operations add only a list item's worth of memory to the table. Insert requires a sequential search of the bucket to overwrite the data associated with the key (if found) or to invoke PushBack(k,d) (if not found).

Now let's look up an item in the table:

Includes(vf, data);

First we compute the hash value and vector index from vf: h = 26 n = 6. Next we access the bucket v[6]. Finally, we search the bucket for the key vf (sequential search), find it, and retrieve the data, setting data = 20. A similar process would implement Remove(). Note that each step in this process runs in constant time, except for the sequential search of the bucket, which runs in O(bucket size). If we can keep the size of buckets small, then the runtime of all of the table operations will be small.

We can formalize this process as the following

Hash Table Search Algorithm for pair (key, data)

   1: compute hash value hval for key (modulo vector size)    O(1)

   2: access bucket at that vector index: v[hval] O(1)

   3: search this bucket O(v[hval].Size()) [worst case]

In general, all three table operations could use this search algorithm.
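The three steps can be sketched in code as follows, assuming std::vector for v and std::list for the buckets. The names BucketTable, LetterSumHash, and BucketSize() are hypothetical; the hash object reproduces the simple additive hash of the running example and therefore assumes lowercase keys. The number of buckets must be greater than zero.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// The additive hash of the running example, packaged as a function object.
struct LetterSumHash
{
  std::size_t operator()(const std::string& s) const
  {
    std::size_t hval = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
      hval += static_cast<std::size_t>(s[i] - 'a');
    return hval;
  }
};

// Hypothetical vector-of-lists hash table.
template <typename K, typename D, class H>
class BucketTable
{
  typedef std::list<std::pair<K, D> > Bucket;
  std::vector<Bucket> v_;
  H                   hash_;
  std::size_t         size_;

public:
  explicit BucketTable(std::size_t numBuckets)
    : v_(numBuckets), hash_(), size_(0)
  {}

  void Insert(const K& k, const D& d)
  {
    Bucket& b = v_[hash_(k) % v_.size()];                             // steps 1 and 2
    for (typename Bucket::iterator i = b.begin(); i != b.end(); ++i)  // step 3
      if (i->first == k) { i->second = d; return; }                   // found: overwrite
    b.push_back(std::make_pair(k, d));                                // not found: append
    ++size_;
  }

  bool Includes(const K& k, D& d) const
  {
    const Bucket& b = v_[hash_(k) % v_.size()];
    for (typename Bucket::const_iterator i = b.begin(); i != b.end(); ++i)
      if (i->first == k) { d = i->second; return true; }
    return false;
  }

  void Remove(const K& k)
  {
    Bucket& b = v_[hash_(k) % v_.size()];
    for (typename Bucket::iterator i = b.begin(); i != b.end(); ++i)
      if (i->first == k) { b.erase(i); --size_; return; }
  }

  std::size_t Size() const { return size_; }
  std::size_t BucketSize(std::size_t n) const { return v_[n].size(); }
};
```

With ten buckets and the twelve pairs of the example inserted, BucketSize(6) is 2 (the pairs for bz and vf) and BucketSize(5) is 0, matching the table representation shown earlier.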

Hash Tables - 2, 3, 4

We have just seen that the runtime efficiency of the hash table operations depends on keeping the size of buckets small. In fact, our goal is to keep the size of buckets O(1), so that the runtime efficiency of all hash table operations is O(1). Clearly the best we can do is to keep the bucket size close to the average bucket size, which is

average bucket size = (table size) / (number of buckets)

If our hash function has good pseudo-randomness, then it will distribute the hash values uniformly and without key bias in the vector index range, resulting in a small variation in bucket size. (It is always good practice to improve pseudo-randomness by using a prime number of buckets.) In any case, the amortized runtime for the search algorithm will be

amortized search time = O( average size of non-empty buckets )
                      = O( average bucket size )
                      = O( (table size) / (number of buckets) )
                      = O( Table.Size() / v.Size() )

which is O(1) provided O(Table.Size()) = O(v.Size()). With this analysis we can now describe design specifications for a hash table container class.
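The quantities in this analysis are easy to compute from a list of bucket sizes. The sketch below is illustrative only; the names BucketStats and ComputeBucketStats are hypothetical and are not part of the course library.

```cpp
#include <cstddef>
#include <vector>

// Summary of a bucket-size distribution.
struct BucketStats
{
  double      average;          // (table size) / (number of buckets)
  double      averageNonEmpty;  // (table size) / (number of non-empty buckets)
  std::size_t largest;          // size of the largest bucket
};

// Computes the statistics from the size of each bucket; all fields are zero
// when there are no buckets or no stored items.
BucketStats ComputeBucketStats(const std::vector<std::size_t>& bucketSizes)
{
  BucketStats s = {0.0, 0.0, 0};
  std::size_t tableSize = 0, nonEmpty = 0;
  for (std::size_t i = 0; i < bucketSizes.size(); ++i)
  {
    tableSize += bucketSizes[i];
    if (bucketSizes[i] > 0)
      ++nonEmpty;
    if (bucketSizes[i] > s.largest)
      s.largest = bucketSizes[i];
  }
  if (!bucketSizes.empty())
    s.average = static_cast<double>(tableSize) / bucketSizes.size();
  if (nonEmpty > 0)
    s.averageNonEmpty = static_cast<double>(tableSize) / nonEmpty;
  return s;
}
```

For the ten-bucket example of the previous section the bucket sizes are 2, 2, 1, 1, 2, 0, 2, 1, 0, 1, which gives an average bucket size of 12/10 = 1.2, an average over non-empty buckets of 12/8 = 1.5, and a largest bucket of size 2.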

Hash Table Class Design
We would like to make this class as general as possible, using template parameters. Of course we need to implement the ADT table protocol. Some kind of iterator support will be needed. We will also require the Clear() and Dump() methods, with their usual meaning. Due to the special nature of the constructor requirements for hash tables, we will not allow copies to be made, and there will be no default constructor.

Clearly we need to make KeyType and DataType template parameters. In case the client has a special hash function for KeyType, it will be helpful to pass hash function objects as parameters, so we make HashClass a template parameter. Finally, because it may be that clients want to use specific containers for their buckets, we make BucketType a template parameter.

The hash table constructor presents a problem for both the class design and later the client. In order to achieve the goal of O(1) bucket size, we need to ensure that the number of buckets is approximately the size of the table. (Actually, we need to give the client the ability to ensure this, since only the client can have knowledge of the table size.) We also need to ensure that the number of buckets is a prime number, which will add significantly to the pseudo-randomness of the hashing. Thus we need to have a constructor parameter that allows the client to estimate the table size, and then our constructor must find a prime number that is "near" this input parameter and then make the number of buckets that prime number.

There are two functions in cpp/primes.h, cpp/primes.cpp:

unsigned int PrimeBelow (unsigned int n);  // returns largest prime <= n
unsigned int PrimeAbove (unsigned int n);  // returns smallest prime >= n

that can be used to find prime numbers. (These use the Sieve of Eratosthenes algorithm.) The hash table constructor should take an unsigned int parameter, convert it to a prime using one of these functions, and then instantiate the bucket vector.
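For readers without access to the distribution files, stand-ins with the same prototypes can be sketched as follows. Note that this is not the code in cpp/primes.cpp: it uses simple trial division rather than the Sieve of Eratosthenes, the helper IsPrime() is hypothetical, and the return value 0 from PrimeBelow() when no prime exists is an assumption.

```cpp
// Trial-division primality test (hypothetical helper).
bool IsPrime(unsigned int n)
{
  if (n < 2)
    return false;
  for (unsigned int d = 2; d <= n / d; ++d)   // d <= n / d avoids overflow of d * d
    if (n % d == 0)
      return false;
  return true;
}

// Returns the largest prime <= n, or 0 if n < 2 (assumed convention).
unsigned int PrimeBelow(unsigned int n)
{
  while (n >= 2 && !IsPrime(n))
    --n;
  return n >= 2 ? n : 0;
}

// Returns the smallest prime >= n.
// No guard is made against n exceeding the largest prime representable
// in unsigned int.
unsigned int PrimeAbove(unsigned int n)
{
  if (n < 2)
    return 2;
  while (!IsPrime(n))
    ++n;
  return n;
}
```

For example, PrimeBelow(75) is 73 and PrimeAbove(75) is 79, so a request for approximately 75 buckets would yield 73 or 79 buckets depending on which function the constructor calls.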

In summary:

Template parameters: KeyType, DataType, HashClass, BucketType
Table protocol
Clear(), Dump()
Iterator support
Prevent copying of tables by privatizing copy constructor and assignment operator
No default constructor
Constructor parameter unsigned int approximateSize
Bucket vector size is either PrimeBelow(approximateSize) or PrimeAbove(approximateSize)
Constructor needs to instantiate the hash object
Implement as adaptor class

Distribution files:

primes.h, primes.cpp // functions PrimeBelow() and PrimeAbove()
thash.h              // THash < KeyType > (hash function objects)
fchtbl.cpp           // test program for CHashTable < K, D, H, B >
ranfile.cpp          // creates files of <string, int> data

Exercise 1: Hash table bucket sizes. Log in to your course account and go to the course library. You should have the following files in this directory:
area51/fListHashTable.x
area51/fSortedListHashTable.x
area51/fBSTHashTable.x
tests/tables/table1
tests/tables/table2
tests/tables/table3
tests/tables/table4
tests/tables/table1.bad
tests/tables/table2.bad
tests/tables/table3.bad
tests/rantable.cpp

Start the executable fListHashTable.x.
Enter "75" for the approximate number of buckets
Enter "F0" to see the table contents displayed to screen; note the table is empty
Enter "D0" to dump internal structure to screen; how many actual buckets do you have?
Enter "L table1"
Enter "S" to get size; what is the size of this table?
Enter "F0" to see the table contents displayed to screen; note the table is not empty.
Enter "D0" to dump internal structure to screen (alternatively, enter "Dx1" to dump internal structure to file x1), and answer the following by examining the output:

What is the largest bucket size?
What is the smallest non-empty bucket size?
What is the average bucket size, excluding empty buckets?
What is the range [smallest, largest] of sequential search items to examine in accessing elements in this table?

Exercise 2: Hash table data updating

(Continue using fListHashTable.x)
Enter "C" to clear the table data (or re-start the program).
Enter "L table2", and examine the new table.
Enter "L table3"; what has changed?

Exercise 3: Hash table bucket sizes, part 2

(Continue using fListHashTable.x)
Enter "C" to clear the table data (or re-start the program).
Enter "L table4" to load table 4; note the size of this new table
Enter "Q" to exit the program, and re-start the program, with a new estimate for the number of buckets appropriate for table 4.
Repeat questions 1-4 of Exercise 1.

Exercise 4: Hash table bucket structure

Experimenting with the executables fListHashTable.x, fSortedListHashTable.x, and fBSTHashTable.x, describe the differences in structure you can detect among the three table test programs using the interface menu.

Exercise 5: Experimenting with random tables

Compile the program rantable.cpp. Use the executable to create a table with 1000 elements with string length 3.
Use one of the hash table executables (with approximately 1000 buckets) to read this file. What is the size of the table?
How do you explain the discrepancy between the size of the input file (1000) and the size of the table?
Repeat questions 1-4 of Exercise 1.

Class TEntry < KeyType, DataType>

template <typename K, typename D>
class TEntry
{
  public:
    typedef K KeyType;
    typedef D DataType;
    const   KeyType  key_;
            DataType data_;

    // no default constructor, because key is const
    TEntry  (K k);
    TEntry  (K k, D d);
    TEntry  (const TEntry& e);
    TEntry& operator =  (const TEntry& e);
    int     operator == (const TEntry e2) const;
    int     operator != (const TEntry e2) const;
    int     operator <= (const TEntry e2) const;
    int     operator >= (const TEntry e2) const;
    int     operator >  (const TEntry e2) const;
    int     operator <  (const TEntry e2) const;
} ;

// one stand alone operator

template <typename K, typename D>
std::ostream& operator << (std::ostream& os, const TEntry<K,D>& e)
{
  os << e.key_ << ':'<< e.data_;
  return os;
}

// hash function class template
template <typename K, typename D>
class THashEntry
{
public:
  unsigned int operator ()(const TEntry <K,D>& e) const
  {
    return fsu::HashFunction (e.key_);
  }
};

template <typename K, typename D>
TEntry<K,D>::TEntry(K k) : key_(k)
{}

template <typename K, typename D>
TEntry<K,D>::TEntry(K k, D d) : key_(k), data_(d)
{}

template <typename K, typename D>
TEntry<K,D>::TEntry(const TEntry<K,D>& e) :   key_(e.key_), data_(e.data_)
{}

template <typename K, typename D>
TEntry<K,D>& TEntry<K,D>::operator = (const TEntry<K,D>& e)
{
  if (key_ != e.key_)
    std::cerr << "** Entry error: cannot assign entries with different keys\n";
  else
    data_ = e.data_;
  return *this;
}

template <typename K, typename D>
int TEntry<K,D>::operator == (const TEntry<K,D> e2) const
{
  return (key_ == e2.key_);
}

template <typename K, typename D>
int TEntry<K,D>::operator != (const TEntry<K,D> e2) const
{
  return (key_ != e2.key_);
}

template <typename K, typename D>
int TEntry<K,D>::operator <= (const TEntry<K,D> e2) const
{
  return (key_ <= e2.key_);
}

template <typename K, typename D>
int TEntry<K,D>::operator >= (const TEntry<K,D> e2) const
{
  return (key_ >= e2.key_);
}

template <typename K, typename D>
int TEntry<K,D>::operator > (const TEntry<K,D> e2) const
{
  return (key_ > e2.key_);
}

template <typename K, typename D>
int TEntry<K,D>::operator < (const TEntry<K,D> e2) const
{
  return (key_ < e2.key_);
}

// less and greater function classes

template <typename K, typename D>
class TEntryLessThan
{
public:
  int operator () (const TEntry<K,D>& e1, const TEntry<K,D>& e2) const
  {
    return e1.key_ < e2.key_;
  }
} ;

template <typename K, typename D>
class TEntryGreaterThan
{
public:
  int operator () (const TEntry<K,D>& e1, const TEntry<K,D>& e2) const
  {
    return e2.key_ < e1.key_;
  }
} ;

Hash Table Adaptor Class

The hash table adaptor class needs the following methods in the underlying container C, where T is C::ValueType:

C::Iterator Includes (const T& t);                 // returns location for t
C::Iterator Insert   (const T& t);                 // inserts t and returns location
bool        Insert   (C::Iterator& I, const T& t); // places copy of t at I
bool        Remove   (C::Iterator I);              // removes item at I
bool        Empty    ();                           // true iff Size() returns zero
size_t      Size     ();                           // returns the number of elements in C
void        Clear    ();                           // makes C empty
C::Iterator Begin    ();                           // returns iterator to first element of C
C::Iterator End      ();                           // returns iterator past the last element of C

and the class C::Iterator needs to be a forward iterator. No particular behavior of C::Includes(t) is assumed when t is not in C, except that the C::Iterator points to a suitable location for insertion of t. Note that this allows C to "make the decision" as to how to search efficiently and where items should be inserted based on the specific design of C. The following are implicit, but legitimate, assumptions on C:

All of the C:: methods used have constant runtime except for C::Includes(), C::Clear(), and C::Size().
C::Includes() and C::Size() can be assumed to be optimized for performance relative to the design of C.

The following containers meet all of the requirements and assumptions stated above:

list
ordered_list
binary_search_tree
set

Here is part of the file structure containing the hash table adaptor class (as usual, namespace has been omitted):

/*  chashtbl.h
    
    Defining the classes CHashTable <K, D, H, C>
                     and CHashTable <K, D, H, C> :: Iterator

    <K,D> = ValueType
    H     = HashType
    C     = BucketType (a container class)

    Iterator category = forward iterator

    Note: a possible point of confusion is that 
          C           :: ValueType is TEntry<K,D>, while 
          TEntry<K,D> :: DataType  is D.
*/

// directives, namespace, and declarations omitted

template <typename K, typename D, class H, class C>
class CHashTable
{
  friend class CHashTableIterator <K,D,H,C>;
  public:
    typedef K                           KeyType;
    typedef C                           BucketType;
    typedef H                           HashType;
    typedef typename C::ValueType       ValueType;
    typedef CHashTableIterator<K,D,H,C> Iterator;

    Iterator   Insert       (const K& key, const D& data);
    int        Remove       (const K& key);
    int        Includes     (const K& key, D& data) const;
    Iterator   Includes     (const K& key)   const;

    void       Clear        ();
    size_t     Size         () const;
    int        Empty        () const;

    Iterator   Begin        () const;
    Iterator   End          () const;

    explicit   CHashTable   (size_t numBuckets);               // uses default hash object
               CHashTable   (size_t numBuckets, H hashObject); // user supplies hash object
               ~CHashTable  ();

    void       Dump         (std::ostream& os, int c1 = 0, int c2 = 0)  const;

  protected:
    size_t         numBuckets_;
    TVector < C >  bucketVector_;
    H              hashObject_;
    size_t         Index  (const K& key) const;

  private:
    CHashTable             (const CHashTable<K,D,H,C>&);
    CHashTable& operator = (const CHashTable&);
} ;

template <typename K, typename D, class H, class C>
CHashTable <K,D,H,C>::CHashTable (size_t n, H hashObject)
  :  numBuckets_(n), bucketVector_(0), hashObject_(hashObject)
{
  numBuckets_ = PrimeBelow(numBuckets_);
  bucketVector_.SetSize(numBuckets_);
}

template <typename K, typename D, class H, class C>
CHashTable <K,D,H,C>::~CHashTable ()
{
  Clear();
}

template <typename K, typename D, class H, class C>
size_t CHashTable <K,D,H,C>::Index (const K& k) const
{
  return hashObject_ (k) % numBuckets_;
}

template <typename K, typename D, class H, class C>
void CHashTable<K,D,H,C>::Clear ()
{
  // clear each bucket
}

template <typename K, typename D, class H, class C>
CHashTableIterator<K,D,H,C> CHashTable<K,D,H,C>::Begin () const
{
  // see CHashTable<K,D,H,C>::Iterator::operator ++(); 
}

template <typename K, typename D, class H, class C>
CHashTableIterator<K,D,H,C> CHashTable<K,D,H,C>::End () const
{
  // see CHashTable<K,D,H,C>::Iterator::operator ++(); 
}

template <typename K, typename D, class H, class C>
int CHashTable<K,D,H,C>::Includes (const K& k, D& d) const
{
  // create a pair
  // perform the hash search algorithm, calling an appropriate C:: method
  // if key is found, place the stored data in d
  // return 1 for success, 0 for failure
}

template <typename K, typename D, class H, class C>
CHashTableIterator<K,D,H,C> CHashTable<K,D,H,C>::Includes (const K& k) const
{
  // create a pair
  // create an iterator
  // perform the hash search algorithm, calling an appropriate C:: method
  // return the iterator pointing to the found pair
}

template <typename K, typename D, class H, class C>
typename CHashTable<K,D,H,C>::Iterator CHashTable<K,D,H,C>::Insert (const K& k, const D& d)
{
  // create a pair
  // create an iterator
  // perform the hash search algorithm, calling an appropriate C:: method
  // if k is found, overwrite the stored data with d, otherwise insert the pair
  // return the iterator pointing to the pair in the table
}

template <typename K, typename D, class H, class C>
int CHashTable<K,D,H,C>::Remove (const K& k)
{
  // create a pair
  // perform the hash search algorithm, calling an appropriate C:: method
  // remove the item if found
  // return 1 for success, 0 for failure
}

template <typename K, typename D, class H, class C>
size_t CHashTable<K,D,H,C>::Size () const
{
  // TBS
}

template <typename K, typename D, class H, class C>
int CHashTable<K,D,H,C>::Empty () const
{
  // TBS
}

template <typename K, typename D, class H, class C>
void CHashTable<K,D,H,C>::Dump (std::ostream& os, int c1, int c2) const
{
  typename BucketType::Iterator I;
  for (size_t i = 0; i < numBuckets_; ++i)
  {
    os << "b[" << i << "]:";
    for (I = bucketVector_[i].Begin(); I != bucketVector_[i].End(); ++I)
      os << '\t' << std::setw(c1) << (*I).key_ << ':' << std::setw(c2) << (*I).data_;
    os << '\n';
  }
}

Hash Table Iterator Adaptor Class

template <typename K, typename D, class H, class C>
class CHashTableIterator
{
  friend class CHashTable <K,D,H,C>;
  public:
    typedef K                           KeyType;
    typedef C                           BucketType;
    typedef H                           HashType;
    typedef typename C::ValueType       ValueType;
    typedef CHashTableIterator<K,D,H,C> Iterator;
    CHashTableIterator ();
    CHashTableIterator (const CHashTableIterator<K,D,H,C>& i);
    int Valid          () const;
    CHashTableIterator <K,D,H,C>& operator =  (const CHashTableIterator <K,D,H,C>& i);
    CHashTableIterator <K,D,H,C>& operator ++ ();
    CHashTableIterator <K,D,H,C>  operator ++ (int);
    TEntry <K,D>&                 operator *  ();
    const TEntry <K,D>&           operator *  () const;
    int operator == (const CHashTableIterator<K,D,H,C>& i2) const;
    int operator != (const CHashTableIterator<K,D,H,C>& i2) const;

  protected:
    const CHashTable <K,D,H,C> * tablePtr_;
    size_t                       bucketNum_;
    typename C::Iterator         bucketItr_;
} ;

template <typename K, typename D, class H, class C>
CHashTableIterator<K,D,H,C>::CHashTableIterator () 
  :  tablePtr_(0), bucketNum_(0), bucketItr_()
{}

template <typename K, typename D, class H, class C>
CHashTableIterator<K,D,H,C>
::CHashTableIterator (const CHashTableIterator<K,D,H,C>& I)
  :  tablePtr_(I.tablePtr_), bucketNum_(I.bucketNum_), bucketItr_(I.bucketItr_)
{}

template <typename K, typename D, class H, class C>
int CHashTableIterator<K,D,H,C>::Valid () const
{
  if (tablePtr_ == 0)
    return 0;
  if (bucketNum_ >= tablePtr_->numBuckets_)
    return 0;
  return bucketItr_.Valid();
}

template <typename K, typename D, class H, class C>
CHashTableIterator <K,D,H,C>&
  CHashTableIterator<K,D,H,C>::operator = 
  (const CHashTableIterator <K,D,H,C>& I)
{
  // TBS
}

template <typename K, typename D, class H, class C>
CHashTableIterator <K,D,H,C>& CHashTableIterator<K,D,H,C>::operator ++ ()
{
  // increment the bucket itr
  // if bucket itr is not at the end of the bucket, return itr
  // if bucket itr is at end of bucket, 
  //   start at beginning of next non-empty bucket and return itr
  // if a non-empty bucket is not found, return the end itr
}

template <typename K, typename D, class H, class C>
CHashTableIterator <K,D,H,C> CHashTableIterator<K,D,H,C>::operator ++ (int)
{
  // TBS
}

template <typename K, typename D, class H, class C>
const TEntry<K,D>& CHashTableIterator<K,D,H,C>::operator * () const
{
  // if itr is valid, return the pair to which it points
}

template <typename K, typename D, class H, class C>
int CHashTableIterator<K,D,H,C>
::operator == (const CHashTableIterator<K,D,H,C>& I2) const
{
  if (!Valid() && !I2.Valid())
    return 1;
  if (Valid() && !I2.Valid())
    return 0;
  if (!Valid() && I2.Valid())
    return 0;

  // now both are valid
  if (tablePtr_ != I2.tablePtr_)
    return 0;
  if (bucketNum_ != I2.bucketNum_)
    return 0;
  if (bucketItr_ != I2.bucketItr_)
    return 0;
  return 1;
}

template <typename K, typename D, class H, class C>
int CHashTableIterator<K,D,H,C>
::operator != (const CHashTableIterator<K,D,H,C>& I2) const
{
  // TBS
}
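The logic outlined for operator ++ (skip forward to the beginning of the next non-empty bucket, or to the end if there is none) can be tried out in isolation. The sketch below is not the CHashTableIterator implementation: it assumes std::vector and std::list with int elements, and the names BucketCursor, SeekFrom, AtEnd, Advance, and Traverse are hypothetical.

```cpp
#include <cstddef>
#include <list>
#include <vector>

typedef std::list<int>      Bucket;
typedef std::vector<Bucket> BucketVector;

// Plays the role of (tablePtr_, bucketNum_, bucketItr_) in the iterator class.
struct BucketCursor
{
  const BucketVector*    table;
  std::size_t            bucketNum;
  Bucket::const_iterator bucketItr;
};

// Positions c at the first element of the first non-empty bucket with
// index >= start; if there is none, bucketNum becomes table->size() (the end).
void SeekFrom(BucketCursor& c, std::size_t start)
{
  c.bucketNum = start;
  while (c.bucketNum < c.table->size() && (*c.table)[c.bucketNum].empty())
    ++c.bucketNum;
  if (c.bucketNum < c.table->size())
    c.bucketItr = (*c.table)[c.bucketNum].begin();
}

bool AtEnd(const BucketCursor& c)
{
  return c.bucketNum >= c.table->size();
}

// Precondition: !AtEnd(c)
void Advance(BucketCursor& c)
{
  ++c.bucketItr;                                        // increment the bucket iterator
  if (c.bucketItr == (*c.table)[c.bucketNum].end())     // ran off the end of this bucket
    SeekFrom(c, c.bucketNum + 1);                       // move to next non-empty bucket
}

// Collects every element, bucket by bucket, in traversal order.
std::vector<int> Traverse(const BucketVector& v)
{
  std::vector<int> out;
  BucketCursor c;
  c.table = &v;
  for (SeekFrom(c, 0); !AtEnd(c); Advance(c))
    out.push_back(*c.bucketItr);
  return out;
}
```

In this sketch SeekFrom(c, 0) corresponds to what Begin() must compute, and the state with bucketNum equal to the number of buckets corresponds to End().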

Associative Arrays

An associative array is a table (with or without the Insert() method) with the addition of a special bracket operator prototyped as:

DataType& operator [] (const KeyType& k);

with the same kind of "failsafe" dual semantics as that of Table::Insert(). For example, suppose we declared an associative array

AssociativeArray <KeyType, DataType> aa;

Then the first time a given key value is accessed, as in

aa[k];

the key k is inserted into aa. In the first and all subsequent accesses a reference to the data associated with k is retrieved. Thus the following sequence of statements

aa[k] = x;
aa[k] = y;

would first insert the pair (k,x) into aa and then change the data associated with k from x to y.

Hash Associative Array Adaptor Class

template <typename K, typename D, class H, class C>
class CHashMap
{
  friend class CHashMapIterator <K,D,H,C>;
  public:
    typedef K                           KeyType;
    typedef C                           BucketType;
    typedef H                           HashType;
    typedef typename C::ValueType       ValueType;
    typedef CHashMapIterator<K,D,H,C>   Iterator;

    D&         operator []  (const K& k); // associative array operator

    int        Remove    (const K& k);
    int        Includes  (const K& k, D& d) const;
    void       Clear     ();
    size_t     Size      () const;
    int        Empty     () const;

    Iterator   Includes  (const K& k)   const;
    Iterator   Begin     () const;
    Iterator   End       () const;

    explicit   CHashMap  (size_t numbuckets);
               CHashMap  (size_t numbuckets, H ho);
               ~CHashMap ();

    void       Dump      (std::ostream& os, int c1 = 0, int c2 = 0)  const;

  protected:
    size_t         numBuckets_;
    TVector < C >  bucketVector_;
    H              hashObject_;
    size_t         Index  (const K& key) const;

  private:
    CHashMap             (const CHashMap<K,D,H,C>&);
    CHashMap& operator = (const CHashMap&);
} ;

template <typename K, typename D, class H, class C>
D& CHashMap<K,D,H,C>::operator [] (const K& k)
{
  // perform the hash search algorithm, calling an appropriate C:: method
  // if key is not found, insert a pair (k,?) into the table
  // return the data associated with the key in the table
}

The associative array bracket operator may be used as either an Lvalue or an Rvalue. Here is sample code:

// there is no default constructor; 100 is an arbitrary estimate of the table size
CHashMap<KeyType, DataType, HashType, BucketType> m(100);
DataType d1, d2;
KeyType  k1, k2;

cin >> k1 >> d1;
m[k1] = d1; // associates data d1 with key k1 in table, inserting if necessary

cin >> k2;
d2 = m[k2]; // retrieves data associated with key k2, inserting k2 if necessary
cout << d2; // outputs data associated with key k2

// traversal done with iterator, not key range:
CHashMap<KeyType, DataType, HashType, BucketType>::Iterator I;
for (I = m.Begin(); I != m.End(); ++I)
  cout << (*I).key_ << '\t' << (*I).data_ << '\n';

Note that the requirements and assumptions placed on the container C and C::Iterator are the same as for the hash table adaptor.
