Main Page | See live article | Alphabetical index

Hash table

In computer science, a hash table is a data structure that implements an associative array. Like any associative array a hash table is used to store many key => value associations (this is a many to one relationship as the hash table is almost universally smaller than the number of keys).

A hash table maintains two arrays, one for keys, one for values (or possibly one array of (key, value) pairs - it doesn't really matter). The elements of these arrays are referred to as buckets. When required to find the associated value for a given key, the key is fed through a hash function to yield an integer (called the hash value). This integer is then the index to the associated value. Commonly the hash value is calculated in the field Z/pZ (with the hash table having a prime size p) to reduce hash collisions.

Hash tables offer insertions, lookups, and deletions in O(1) amortized time, with the caveat that hash collisions need to be dealt with. A collision is when two or more different keys hash to the same integer (they have the same hash value). Techniques for dealing with collisions include probing, chaining, and rehashing. Note that probing and chaining are generally mutually exclusive.

Probing is, upon finding a collision, moving to the next free bucket in the table (or incrementing by some number other than 1). Chaining involves using each bucket as a pointer to another structure, such as an array, a linked list, or even another hash (preferably with a different size and/or hashing function).

Both probing and chaining can be problematic because, as more and more elements are added to the hash, the O(1) property is lost! Rehashing is a technique to minimize this effect: by increasing the size of the table and recomputing the hash values with respect to the new table size, the O(1) property can be maintained. Hash tables work faster the fewer occupied entries they have because there will be fewer collisions. If the ratio of occupied entries to total entries is kept below some fixed constant then the performance of hash tables will be reasonable.

Widely useful, hash tables are found in a wide variety of programs.

See also: Distributed hash tables