13. Beyond Lists: Other Uses for Cons Cells - Lookup Tables: Alists and Plists - 《Practical Common Lisp》

Lookup Tables: Alists and Plists

In addition to trees and sets, you can build tables that map keys to values out of cons cells. Two flavors of cons-based lookup tables are commonly used, both of which I’ve mentioned in passing in previous chapters. They’re association lists, also called alists, and property lists, also known as plists. While you wouldn’t use either alists or plists for large tables—for that you’d use a hash table—it’s worth knowing how to work with them both because for small tables they can be more efficient than hash tables and because they have some useful properties of their own.

An alist is a data structure that maps keys to values and also supports reverse lookups, finding the key when given a value. Alists also support adding key/value mappings that shadow existing mappings in such a way that the shadowing mapping can later be removed and the original mappings exposed again.

Under the covers, an alist is essentially a list whose elements are themselves cons cells. Each element can be thought of as a key/value pair with the key in the cons cell’s **CAR** and the value in the **CDR**. For instance, the following is a box-and-arrow diagram of an alist mapping the symbol A to the number 1, B to 2, and C to 3:

Unless the value in the **CDR** is a list, cons cells representing the key/value pairs will be dotted pairs in s-expression notation. The alist diagramed in the previous figure, for instance, is printed like this:

((A . 1) (B . 2) (C . 3))

The main lookup function for alists is **ASSOC**, which takes a key and an alist and returns the first cons cell whose **CAR** matches the key or **NIL** if no match is found.

CL-USER> (assoc 'a '((a . 1) (b . 2) (c . 3)))
(A . 1)
CL-USER> (assoc 'c '((a . 1) (b . 2) (c . 3)))
(C . 3)
CL-USER> (assoc 'd '((a . 1) (b . 2) (c . 3)))
NIL

To get the value corresponding to a given key, you simply pass the result of **ASSOC** to **CDR**.

CL-USER> (cdr (assoc 'a '((a . 1) (b . 2) (c . 3))))
1
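One wrinkle with this idiom is that `(cdr nil)` is also **NIL**, so a missing key and a key whose value is **NIL** look the same. The following is a minimal sketch of one way to tell them apart; ALIST-VALUE is a hypothetical helper written for this illustration, not a standard function.

```lisp
;; ALIST-VALUE is a hypothetical helper, not part of Common Lisp.
(defun alist-value (key alist)
  "Return the value for KEY in ALIST and, as a second value, whether KEY was found."
  (let ((pair (assoc key alist)))
    (if pair
        (values (cdr pair) t)      ; key present, even if its value is NIL
        (values nil nil))))        ; key absent

(alist-value 'b '((a . 1) (b . nil)))   ; => NIL, T
(alist-value 'd '((a . 1) (b . nil)))   ; => NIL, NIL
```

Returning a second value mirrors what **GETHASH** does for hash tables, which is why it's used here.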

By default the key given is compared to the keys in the alist using **EQL**, but you can change that with the standard combination of :key and :test keyword arguments. For instance, if you wanted to use string keys, you might write this:

CL-USER> (assoc "a" '(("a" . 1) ("b" . 2) ("c" . 3)) :test #'string=)
("a" . 1)

Without specifying :test to be **STRING=**, that **ASSOC** would probably return **NIL** because two strings with the same contents aren’t necessarily **EQL**.

CL-USER> (assoc "a" '(("a" . 1) ("b" . 2) ("c" . 3)))
NIL
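The :key argument works alongside :test: the :key function is applied to the **CAR** of each pair before the comparison is made. As a rough sketch, assuming an alist keyed by keyword symbols that you want to search with a string (the data here is invented for illustration), you might write something like this:

```lisp
;; The :key function SYMBOL-NAME turns each keyword key into a string such
;; as "B"; STRING-EQUAL then compares it to the item, ignoring case.
(assoc "b" '((:a . 1) (:b . 2) (:c . 3))
       :key #'symbol-name
       :test #'string-equal)
;; => (:B . 2)
```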

Because **ASSOC** searches the list by scanning from the front of the list, one key/value pair in an alist can shadow other pairs with the same key later in the list.

CL-USER> (assoc 'a '((a . 10) (a . 1) (b . 2) (c . 3)))
(A . 10)

You can add a pair to the front of an alist with **CONS** like this:

(cons (cons 'new-key 'new-value) alist)

However, as a convenience, Common Lisp provides the function **ACONS**, which lets you write this:

(acons 'new-key 'new-value alist)

Like **CONS**, **ACONS** is a function and thus can’t modify the place holding the alist it’s passed. If you want to modify an alist, you need to write either this:

(setf alist (acons 'new-key 'new-value alist))

or this:

(push (cons 'new-key 'new-value) alist)
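To see the difference between building a new alist and updating a place, and to see how a shadowing pair can later be removed, consider the following sketch. The variable `*table*` is a hypothetical name used only here.

```lisp
;; *TABLE* is a hypothetical variable used only for this illustration.
(defparameter *table* (list (cons 'a 1) (cons 'b 2)))

;; ACONS returns a new list; the value of *TABLE* is unchanged.
(acons 'a 10 *table*)        ; => ((A . 10) (A . 1) (B . 2))
(assoc 'a *table*)           ; => (A . 1)

;; PUSH updates the place, so the new pair now shadows the old one.
(push (cons 'a 10) *table*)
(assoc 'a *table*)           ; => (A . 10)

;; POP removes the shadowing pair and exposes the original mapping again.
(pop *table*)                ; => (A . 10)
(assoc 'a *table*)           ; => (A . 1)
```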

The time it takes to search an alist with **ASSOC** depends on how deep in the list the matching pair is found. In the worst case, determining that no pair matches requires **ASSOC** to scan every element of the alist. However, since the basic mechanism for alists is so lightweight, for small tables an alist can outperform a hash table. Also, alists give you more flexibility in how you do the lookup. I already mentioned that **ASSOC** takes :key and :test keyword arguments. When those don’t suit your needs, you may be able to use the **ASSOC-IF** and **ASSOC-IF-NOT** functions, which return the first key/value pair whose **CAR** satisfies (or not, in the case of **ASSOC-IF-NOT**) the test function passed in the place of a specific item. And three functions, **RASSOC**, **RASSOC-IF**, and **RASSOC-IF-NOT**, work just like the corresponding **ASSOC** functions except they use the value in the **CDR** of each element as the key, performing a reverse lookup.
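A few calls may make these variants more concrete. The alist `*ages*` and its contents are invented for this illustration.

```lisp
;; *AGES* is hypothetical sample data.
(defparameter *ages* '((alice . 31) (bob . 25) (carol . 42)))

;; Reverse lookup: find the first pair whose CDR is 25.
(rassoc 25 *ages*)                       ; => (BOB . 25)

;; First pair whose value satisfies a predicate.
(rassoc-if #'evenp *ages*)               ; => (CAROL . 42)

;; First pair whose key satisfies a predicate.
(assoc-if #'oddp '((2 . two) (3 . three) (5 . five)))   ; => (3 . THREE)
```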

The function **COPY-ALIST** is similar to **COPY-TREE** except, instead of copying the whole tree structure, it copies only the cons cells that make up the list structure, plus the cons cells directly referenced from the **CAR**s of those cells. In other words, the original alist and the copy will both contain the same objects as the keys and values, even if those keys or values happen to be made up of cons cells.
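The practical consequence can be sketched as follows, assuming an alist whose values are themselves lists. The variable names are hypothetical.

```lisp
;; *ORIGINAL* and *COPY* are hypothetical names used only for this illustration.
(defparameter *original* (list (cons 'a (list 1 2)) (cons 'b (list 3 4))))
(defparameter *copy* (copy-alist *original*))

;; The pair cells are fresh, so the two alists don't share them...
(eq (first *original*) (first *copy*))               ; => NIL
;; ...but the value lists are the same objects in both.
(eq (cdr (first *original*)) (cdr (first *copy*)))   ; => T

;; Replacing a value in the copy leaves the original alist alone.
(setf (cdr (assoc 'b *copy*)) 99)
(cdr (assoc 'b *original*))                          ; => (3 4)

;; Mutating a shared value is visible through both alists.
(setf (first (cdr (assoc 'a *copy*))) 100)
(cdr (assoc 'a *original*))                          ; => (100 2)
```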

Finally, you can build an alist from two separate lists of keys and values with the function **PAIRLIS**. The resulting alist may contain the pairs either in the same order as the original lists or in reverse order. For example, you may get this result:

CL-USER> (pairlis '(a b c) '(1 2 3))
((C . 3) (B . 2) (A . 1))

Or you could just as well get this:

CL-USER> (pairlis '(a b c) '(1 2 3))
((A . 1) (B . 2) (C . 3))
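Because the order isn't specified, code that uses **PAIRLIS** shouldn't depend on it. As long as the keys are distinct, lookups give the same answers either way, as this small sketch suggests:

```lisp
;; The lookups below don't depend on which order PAIRLIS chose.
(let ((table (pairlis '(a b c) '(1 2 3))))
  (list (cdr (assoc 'a table))
        (cdr (assoc 'b table))
        (cdr (assoc 'c table))))
;; => (1 2 3)
```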

The other kind of lookup table is the property list, or plist, which you used to represent the rows in the database in Chapter 3. Structurally a plist is just a regular list with the keys and values as alternating values. For instance, a plist mapping A, B, and C to 1, 2, and 3 is simply the list (A 1 B 2 C 3). In boxes-and-arrows form, it looks like this:

However, plists are less flexible than alists. In fact, plists support only one fundamental lookup operation, the function **GETF**, which takes a plist and a key and returns the associated value or **NIL** if the key isn’t found. **GETF** also takes an optional third argument, which will be returned in place of **NIL** if the key isn’t found.
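A short sketch of **GETF** with and without the default argument follows. The plist `*options*` is a hypothetical example.

```lisp
;; *OPTIONS* is hypothetical sample data.
(defparameter *options* (list :width 80 :verbose nil))

(getf *options* :width)        ; => 80
(getf *options* :height)       ; => NIL, since :HEIGHT isn't in the plist
(getf *options* :height 24)    ; => 24, the supplied default
(getf *options* :verbose t)    ; => NIL, since the key is present with value NIL
```

The last call shows that the default is used only when the key is absent, not when the stored value happens to be **NIL**.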

Unlike **ASSOC**, which uses **EQL** as its default test and allows a different test function to be supplied with a :test argument, **GETF** always uses **EQ** to test whether the provided key matches the keys in the plist. Consequently, you should never use numbers or characters as keys in a plist; as you saw in Chapter 4, the behavior of **EQ** for those types is essentially undefined. Practically speaking, the keys in a plist are almost always symbols, which makes sense since plists were first invented to implement symbolic “properties,” arbitrary mappings between names and values.

You can use **SETF** with **GETF** to set the value associated with a given key. **SETF** also treats **GETF** a bit specially in that the first argument to **GETF** is treated as the place to modify. Thus, you can use **SETF** of **GETF** to add a new key/value pair to an existing plist.

CL-USER> (defparameter *plist* ())
*PLIST*
CL-USER> *plist*
NIL
CL-USER> (setf (getf *plist* :a) 1)
1
CL-USER> *plist*
(:A 1)
CL-USER> (setf (getf *plist* :a) 2)
2
CL-USER> *plist*
(:A 2)

To remove a key/value pair from a plist, you use the macro **REMF**, which sets the place given as its first argument to a plist containing all the key/value pairs except the one specified. It returns true if the given key was actually found.

CL-USER> (remf *plist* :a)
T
CL-USER> *plist*
NIL

Like **GETF**, **REMF** always uses **EQ** to compare the given key to the keys in the plist.

Since plists are often used in situations where you want to extract several properties from the same plist, Common Lisp provides a function, **GET-PROPERTIES**, that makes it more efficient to extract multiple values from a single plist. It takes a plist and a list of keys to search for and returns, as multiple values, the first key found, the corresponding value, and the head of the list starting with the found key. This allows you to process a property list, extracting the desired properties, without continually rescanning from the front of the list. For instance, the following function efficiently processes all the key/value pairs in a plist for a given list of keys, using the hypothetical function process-property:

(defun process-properties (plist keys)
  (loop while plist do
       (multiple-value-bind (key value tail) (get-properties plist keys)
         (when key (process-property key value))
         (setf plist (cddr tail)))))
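To try the same loop without defining process-property, here is a rough variant that gathers the matches into an alist instead. COLLECT-PROPERTIES is a hypothetical name, and it tests the returned tail rather than the key, which is an assumption made so that a key of **NIL** would still be collected.

```lisp
;; A single call returns the first matching key, its value, and the tail
;; of the plist beginning at that key.
(get-properties '(:a 1 :b 2 :c 3) '(:b :c))
;; => :B, 2, (:B 2 :C 3)

;; COLLECT-PROPERTIES is a hypothetical helper, not a standard function.
(defun collect-properties (plist keys)
  "Return an alist of the pairs in PLIST whose keys appear in KEYS."
  (let ((result '()))
    (loop while plist do
      (multiple-value-bind (key value tail) (get-properties plist keys)
        ;; TAIL is NIL when none of KEYS remain in PLIST.
        (when tail (push (cons key value) result))
        ;; Resume the search just past the pair that was found.
        (setf plist (cddr tail))))
    (nreverse result)))

(collect-properties '(:a 1 :b 2 :c 3 :d 4) '(:b :d))
;; => ((:B . 2) (:D . 4))
```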

The last special thing about plists is the relationship they have with symbols: every symbol object has an associated plist that can be used to store information about the symbol. The plist can be obtained via the function **SYMBOL-PLIST**. However, you rarely care about the whole plist; more often you’ll use the function **GET**, which takes a symbol and a key and is shorthand for a **GETF** of the same key in the symbol’s **SYMBOL-PLIST**.

(get 'symbol 'key) === (getf (symbol-plist 'symbol) 'key)

Like **GETF**, **GET** is **SETF**able, so you can attach arbitrary information to a symbol like this:

(setf (get 'some-symbol 'my-key) "information")

To remove a property from a symbol’s plist, you can use either **REMF** of **SYMBOL-PLIST** or the convenience function **REMPROP**.

(remprop 'symbol 'key) === (remf (symbol-plist 'symbol) 'key)
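Putting **GET**, **SETF**, and **REMPROP** together, a session might look roughly like this. The symbols SOME-FRUIT, COLOR, and WEIGHT are hypothetical names chosen for the illustration.

```lisp
;; SOME-FRUIT, COLOR, and WEIGHT are hypothetical names.
(setf (get 'some-fruit 'color) 'red)

(get 'some-fruit 'color)        ; => RED
(get 'some-fruit 'weight 0)     ; => 0, since GET accepts a default, like GETF

;; REMPROP returns true when the property was present.
(remprop 'some-fruit 'color)
(get 'some-fruit 'color)        ; => NIL
```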

Being able to attach arbitrary information to names is quite handy when doing any kind of symbolic programming. For instance, one of the macros you’ll write in Chapter 24 will attach information to names that other instances of the same macros will extract and use when generating their expansions.