Common Lisp Extensions. Node: Hash Tables

Chapter 12: Hash Tables

A hash table is a data structure that maps ``keys'' onto ``values.'' Keys and values can be arbitrary Lisp data objects. Hash tables have the property that the time to search for a given key is roughly constant; simpler data structures like association lists take time proportional to the number of entries in the list.

Function: make-hash-table &key :test :size

This function creates and returns a hash-table object whose function for comparing elements is :test (eql by default), and which is allocated to fit about :size elements. The :size argument is purely advisory; the table will stretch automatically if you store more elements in it. If :size is omitted, a reasonable default is used.

Common Lisp allows only eq, eql, equal, and equalp as legal values for the :test argument. In this package, any reasonable predicate function will work, though if you use something else you should check the details of the hashing function described below to make sure it is suitable for your predicate.

Some versions of Emacs (like Lucid Emacs 19) include a built-in hash table type; in these versions, make-hash-table with a test of eq will use these built-in hash tables. In all other cases, it will return a hash-table object which takes the form of a list with an identifying ``tag'' symbol at the front. All of the hash table functions in this package can operate on both types of hash table; normally you will never know which type is being used.

This function accepts the additional Common Lisp keywords :rehash-size and :rehash-threshold, but it ignores their values.

Function: gethash key table &optional default: This function looks up key in table. If key exists in the table, in the sense that it matches any of the existing keys according to the table's test function, then the associated value is returned. Otherwise, default (or nil) is returned.
To store new data in the hash table, use setf on a call to gethash. If key already exists in the table, the corresponding value is changed to the stored value. If key does not already exist, a new entry is added to the table and the table is reallocated to a larger size if necessary. The default argument is allowed but ignored in this case. The situation is exactly analogous to that of get*; see Property Lists.

Function: remhash key table: This function removes the entry for key from table. If an entry was removed, it returns t. If key does not appear in the table, it does nothing and returns nil.

Function: clrhash table: This function removes all the entries from table, leaving an empty hash table.

Function: maphash function table: This function calls function for each entry in table. It passes two arguments to function, the key and the value of the given entry. The return value of function is ignored; maphash itself returns nil. See Loop Facility, for an alternate way of iterating over hash tables.

Function: hash-table-count table: This function returns the number of entries in table. Warning: The current implementation of Lucid Emacs 19 hash-tables does not decrement the stored count when remhash removes an entry. Therefore, the return value of this function is not dependable if you have used remhash on the table and the table's test is eq. A slower, but reliable, way to count the entries is (loop for x being the hash-keys of table count t).

Function: hash-table-p object: This function returns t if object is a hash table, nil otherwise. It recognizes both types of hash tables (both Lucid Emacs built-in tables and tables implemented with special lists.)

Sometimes when dealing with hash tables it is useful to know the exact ``hash function'' that is used. This package implements hash tables using Emacs Lisp ``obarrays,'' which are the same data structure that Emacs Lisp uses to keep track of symbols. Each hash table includes an embedded obarray. Key values given to gethash are converted by various means into strings, which are then looked up in the obarray using intern and intern-soft. The symbol, or ``bucket,'' corresponding to a given key string includes as its symbol-value an association list of all key-value pairs which hash to that string. Depending on the test function, it is possible for many entries to hash to the same bucket. For example, if the test is eql, then the symbol foo and two separately built strings "foo" will create three entries in the same bucket. Search time is linear within buckets, so hash tables will be most effective if you arrange not to store too many things that hash the same.

The following algorithm is used to convert Lisp objects to hash strings:

Strings are used directly as hash strings. (However, if the test function is equalp, strings are downcased first.)
Symbols are hashed according to their symbol-name.
Integers are hashed into one of 16 buckets depending on their value modulo 16. Floating-point numbers are truncated to integers and hashed modulo 16.
Cons cells are hashed according to their cars; nonempty vectors are hashed according to their first element.
All other types of objects hash into a single bucket named "*".

Thus, for example, searching among many buffer objects in a hash table will devolve to a (still fairly fast) linear-time search through a single bucket, whereas searching for different symbols will be very fast since each symbol will, in general, hash into its own bucket.

The size of the obarray in a hash table is automatically adjusted as the number of elements increases.

As a special case, make-hash-table with a :size argument of 0 or 1 will create a hash-table object that uses a single association list rather than an obarray of many lists. For very small tables this structure will be more efficient since lookup does not require converting the key to a string or looking it up in an obarray. However, such tables are guaranteed to take time proportional to their size to do a search.

Lists

Top

Structures