Associative array

In computing, an associative array, also known as a map or table, is an abstract data type very closely related to the mathematical concept of a function. Conceptually, an associative array is composed of a collection of keys and a collection of values, and each key is associated with one value. The operation of finding the value associated with a key is called a lookup or indexing, and this is the most important operation supported by an associative array. The relationship between a key and its value is sometimes called a mapping or binding. For example, if the value associated with the key "bob" is 7, we see say that our array maps "bob" to 7.

The operations that are usually defined for an associative array are:

Add: Bind a new key to a new value
Reassign: Bind an old key to a new value
Remove: Unbind a key from a value and remove it from the key set
Lookup: Find the value (if any) that is bound to a key

Table of contents

1 Examples
2 Data Structures for Associative Arrays

2.1 Efficient Representations
2.2 Association Lists
2.3 Specialized Representations

3 Language Support

3.4 In Smalltalk
3.5 In C++
3.6 In Lisp

4 External links

Examples

One can think of a telephone book as an example of an associative array, where names are the keys and phone numbers are the values. Another example would be a dictionary where words are the keys and definitions are the values. A database is a sort of generalized associative array.

Need more examples

Data Structures for Associative Arrays

Associative arrays are usually used when lookup is the most frequent operation. For this reason, implementations are usually designed to allow speedy lookup, at the expense of slower insertion and a larger storage footprint than other data structures (such as association lists).

Efficient Representations

There are two main efficient data structures used to represent associative arrays, the hash table and the self-balancing binary search tree. Relative advantages and disadvantages include:

Hash tables have faster average lookup and insertion time (O(1)), while balanced binary trees have faster worst-case lookup and insertion time (O(log n) instead of O(n)). These make trees more useful in real-time and interactive systems, and hash tables more useful for very large arrays.
Hash tables have more compact storage for small value types, especially when the values are bits.
There are simple persistent versions of balanced binary trees, which are especially prominent in functional languages.
Building a hash table requires a good hash function for the key type, which can be difficult to write, while balanced binary trees only require a less-than operator on the keys.
Balanced binary trees allow one to efficiently iterate over the keys in order, a highly expensive feat for a hash table.
Balanced binary trees can be easily adapted to efficiently assign a single value to a large ordered range of keys.

Association Lists

A simple but generally inefficient type of association map is an association list, which simply stores a list of key/value pairs, and each lookup scans through the list looking for a key match. Strong advantages of association lists include:

No knowledge is needed about the keys, such as an order or a hash function.
For small associative arrays, common in some applications, association lists can take less time and space than other data structures.
Insertions are worst-case constant time.

The disadvantage, however, is that remove and lookup operations take O(n) worst-case and average time. In particular, looking up a key which is not present will scan the entire list. This particular case can be mitigated by storing the list in sorted order, but this destroys constant-time insertion.

Specialized Representations

If the keys have a specific type, one can often use specialized data structures to gain performance. For example, integer-keyed maps can be implemented using Patricia trees or Judy arrays, and are useful space-saving replacements for sparse arrays. Because this type of data structure can perform longest-prefix matching, they're particularly useful in applications where a single value is assigned to most of a large range of keys with a common prefix except for a few exceptions, such as in routing tables.

String-keyed maps can avoid extra comparisons during lookups by using tries.

Language Support

Associative arrays are known by many names in different programming languages. In Smalltalk and Python they are called dictionaries; in Perl they are called hashes; in Java they are called hashmaps [1] and in Common Lisp they are called hash tables. "Hash table" is also the name of the most common data structure used to store an associative array. In the scripting language Lua, associative arrays, called tables, are used as the primitive building block for all data structures, even arrays.

Associative arrays can be implemented in any programming language, and one or more implementations of them is typically found either built into the language or the standard library distributed with it (C is a noticeable exception, as neither the language nor the standard library directly provide one. It is not difficult to write one in C, though).

In Smalltalk

In Smalltalk a dictionary is used:

phonebook := Dictionary new. phonebook at: 'Sally Smart' put: '555-9999'. phonebook at: 'John Doe' put: '555-1212'. phonebook at: 'J. Random Hacker' put: '553-1337'.

To access an entry the message #at: is sent to the dictionary object.

 phonebook at: 'Sally Smart'

gives

'555-9999'

In C++

C++ also has a form of associative array called std::map. One could create a map with the same information as above using C++ with the following code:

#include 
#include 
int main()
{
   std::map phone_book;
   phone_book["Sally Smart"] = "555-9999";
   phone_book["John Doe"] = "555-1212";
   phone_book["J. Random Hacker"] = "553-1337";
   return 0;
}

In C++, std::map allows keys and values to be different data types, but all of the keys in a particular map must be of the same base type. The same must be true for all of the values. Although std::map is typically implemented using a self-balancing binary search tree, the SGI STL also provides a std::hash_map which has the algorithmic benefits of a hash table.

In Lisp

In Lisp and Scheme, association lists are commonly used, as in the following S-expression:

'(("Sally Smart" . "555-9999")
  ("John Doe" . "555-1212")
  ("J. Random Hacker" . "553-1337"))

The syntax (x . y) is used to indicate a pair. Keys and values need not be the same type within an alist. Lisp and Scheme provide operations to manipulate alists in ways similar to associative arrays.