Pairs and alists

Associative lists, or alists for shorts, are associative data structures: they map keys to values. This principles has numerous applications. For example, you might define an alist that maps names of composers to their birth dates. LilyPond has an alist that maps language names (like "english" in \language "english") to their note names. These note names are an alist mapping names to pitches. There is another alist to map alterations to glyphs in the music font. In short, there are alists everywhere.

A word on pairs

Alists are built from pairs. These are compound objects which contain two elements exactly, unlike lists, which contain as many of them as wanted. Just like with lists, there are two methods to create a pair:

(cons element1 element2)

or

'(element1 . element2)

The first way is to call the cons function, of which the name is short for construct. Naturally, it expects two arguments. The second way is a form of quoting. The dot between the two elements indicates that the expression is a pair rather than a list. The display of pairs is again similar to how they are quoted: they are printed like lists, but with a dot in the middle.

(cons 1 2)
 (1 . 2)

The spaces around the dot are important. '(1 . 2) is a pair, whereas '(1.2) is a one-element list containing the number “one point two”.

Quasiquoting can also be used for pairs.

(define birth 1756)
`(age . ,(- 2021 birth))
 (age . 265)

The two elements of a pair are accessed using the functions car and cdr.

(car '(1 . 2))
 1
(cdr '(1 . 2))
 2

It is regrettable that the names “car” and “cdr” have not been chosen to be more explicit. They are actually technical acronyms dating back to the first implementations of Lisp (they mean “Contents of the Address part of the Register” and “Contents of the Decrement part of the Register”).

Alists

An alist is simply a list of pairs. The car of each pair is a key, and the cdr is the associated value. Keys are normally unique, i.e., no two pairs in the alist have the same car. However, there are legitimate cases where this rule of thumb can be waived.

Here is a first alist:

(define birth-dates
  '(("Mozart" . 1756)
    ("Beethoven" . 1778)
    ("Schumann" . 1810)
    ("Hába" . 1893)
    ("Weill" . 1900)
    ("Ligeti" . 1923)))

Guile (reminder: the Scheme implementaton used by LilyPond) has an array of functions devoted to alists.

Retrieving a value

First, here are those to retrieve a value. There is not just one, but six, all with differences! They are grouped in two families:

  • assq, assv, assoc,

  • assq-ref, assv-ref, assoc-ref.

Functions from the first family are called like this (using assoc as an example):

(assoc key alist)

The return value is the pair (key . value), if it exists in the alists, and the boolean #f otherwise.

Annoyingly, functions from the second family are called with these arguments swapped:

(assoc-ref alist key)

These only return the value associated with the key, rather than the pair with the key and the value.

The difference between functions from the same family is how they compare keys. All of these functions use the same algorithm. They start with the first pair in the alist, compare its car to the key being looked up, and return the cdr if the test is successful, else, they continue with the second pair, etc. As you can notice, a function is needed to compare the wanted key to keys from the alist. The assq and assq-ref functions use eq?. assoc and assoc-ref use equal?. As for assv and assv-ref, they use a third predicate, eqv?, which is much less common; it is similar to eq? but compares numbers and characters by value (though it still compares strings by identity).

Therefore, if the keys in the alist are symbols, you should use assq or assq-ref. In the general case, better use assoc and assoc-ref.

Our alist of birth dates has strings for the keys, so we are going to use assoc and assoc-ref.

(assoc "Schumann" birth-dates)
 ("Schumann" . 1810)
(assoc-ref birth-dates "Schumann")
 1810
(assoc-ref birth-dates "Bartók") ; not in the alist
 #f

Clearly, assoc-ref is the most practical. With assoc, you need to explicitly take the cdr of the returned pair. So why does Scheme have assoc? The necessity becomes clear when considering alists where the values can be #f. Let us consider the same set of composers, and build the alist which tells, for every of them, whether one (at least) of his parents was a musician.

(define parents-musicians
  '(("Mozart" . #t)
    ("Beethoven" . #t)
    ("Schumann" . #f)
    ("Weill" . #t)))
(assoc-ref parents-musicians "Mozart")
 #t
(assoc-ref parents-musicians "Bach")
 #f

assoc-ref returns #f for unfound keys. This does not allow to distinguish whether Bach’s parents were not musicians, or Bach is simply not in the alist. assoc allows to draw this difference.

(assoc "Mozart" parents-musicians)
 ("Mozart" . #t)
(assoc "Bach" parents-musicians)
 #f

If Bach had been in the alist, the result of assoc would have been a pair. This time, #f is unambiguous.

With assoc-ref, or functions of the same family, it is common to utilize the fact that any value is considered true except #f to give default values thanks to or (cf. On universal truth).

(or (assoc-ref birth-dates "Bach")
    "birth date unknown")
 "birth date unknown"

With assoc, you need to store the pair in a variable.

(let ((pair (assoc "Bath" parents-musicians)))
  (if pair
      (cdr pair)
      "no entry for this composer"))
 "no entry for this composer"

Be aware that LilyPond defines yet another functions, assoc-get. Like assoc and assoc-ref, it uses equal? to test equality of keys. Like assoc-ref, it returns the value associated with a key, not the (key . value) pair. However, its arguments are in the same order as assoc, namely the key first, then the alist. Finally, assoc-get has an optional third parameter specifying the value returned if the key is not found. This parameter defaults to #f, in line with the standard alist functions.

(assoc-get "Bach" parents-musicians "no entry for this composer")
 "no entry for this composer"

Modifying values

Again, there is not a single function to modify alists. There are three, assq-set!, assv-set! and assoc-set!. Like assq-ref, assv-ref and assoc-ref, they use the predicates eq?, eqv? and equal?, respectively. They are called as

(assxxx-set! alist key value)

They return a new alist.

(define birth-dates
  '(("Mozart" . 1756)
    ("Beethoven" . 1778)
    ("Schumann" . 1810)
    ("Hába" . 1893)
    ("Weill" . 1900)
    ("Ligeti" . 1923)))
(assoc-set! birth-dates "Schönberg" 1874)
 (("Schönberg" . 1874) ("Mozart" . 1756) ("Beethoven" . 1778) ("Schumann" . 1810) ("Hába" . 1893) ("Weill" . 1900) ("Ligeti" . 1923))
(assoc-set! birth-dates "Weill" 1901) ; let's lie on history
 (("Mozart" . 1756) ("Beethoven" . 1778) ("Schumann" . 1810) ("Hába" . 1893) ("Weill" . 1901) ("Ligeti" . 1923))

The exclamation mark at the end of the names of these functions means that they are allowed to modify the alist contents in place, by mutating the alist. In this example, if they do so, it means that birth-dates after the first assoc-set! contains a new key. However, they are not guaranteed to do so! You must always save the return value of assxxx-set! and work on that.

There is also acons, which is designed to add an entry to an alist which does not already contain the key, or for cases where repeated keys are not a problem. acons just adds a new pair at the start of the alist. The order of arguments is different from the assxxx-set! functions:

(acons key value alist)

For example:

(acons "Weill" 1901 birth-dates)
 (("Weill" . 1901) ("Mozart" . 1756) ("Beethoven" . 1778) ("Schumann" . 1810) ("Hába" . 1893) ("Weill" . 1900) ("Ligeti" . 1923))