Quoting and symbols

Understanding quoting

Quoting is a concept that frequently confuses beginners in languages from the Lisp family. Yet, it is an essential tool in practice. This part takes a slightly more theoretical approach than before in order to dissect this feature.

In the part about lists, we already met the basic syntax of quoting:

'(0 1 2 3 "Hello")

This expression is (almost, see below) equivalent to (list 0 1 2 3 "Hello").

However, it is not all that similar to the list function:

(list (+ 1 1) (+ 2 3))
⇒ (2 5)
'((+ 1 1) (+ 2 3))
⇒ ((+ 1 1) (+ 2 3))

What happened?

To understand this, we have to dive into fundamental aspects of the language that have remained more or less hidden or implicit up to now.

Let’s start with an observation. The Scheme interpreter displays a list in a format that is very similar to how Scheme expressions are formed. In both cases, there are elements separated by spaces inside a pair of parentheses.

(+ 1 2 3 4) ; computation
(1 2 3 4 5) ; list

This resemblance between notations is no coincidence. Like all languages from the Lisp family, Scheme obeys a core principle: there is only one syntax, which works for both programs and data, because programs and data are actually the same thing. Concretely, the expression (+ 1 2 3 4) is not just written in a format that makes it look like a list. It is a list!

To understand better, let us review the different forms that a Scheme expression can take (i.e., what you can put after a # in LilyPond).

  • A number: -5.5 (in LilyPond, #-5.5; remember that you need a # to switch to Scheme mode),

  • A string: "abcd" (in LilyPond: #"abcd"),

  • A boolean: #t or #f (in LilyPond: ##t or ##f),

  • A function call: (display "foo") (in LilyPond: #(display "foo")),

  • A variable: my-variable (in LilyPond: #my-variable).

The first three forms should already look familiar. When you write a literal number, like -5.5, the value of the expression is this same number. This is simple, but not inconsequential: you have to keep in mind the distinction between a Scheme expression (like (+ 1 2)) and the value it evaluates to (in this example, 3). What makes Scheme special is that expressions are values themselves, built with very basic data types. Therefore, the number -5.5 is a valid expression. Because it is a number, and numbers are self-evaluating (they evaluate to themselves), the expression -5.5 evaluates to -5.5. The same happens with strings and booleans.

Now, let us turn our attention to the fourth point, the function call. It may be useful to think of it by comparison with another language, for example Python. In Python, a program is represented using an “abstract syntax tree”, which can be inspected using the ast module. For the example 1+2, this gives:

>>> import ast
>>> ast.dump(ast.parse("1+2", mode="eval"))
'Expression(body=BinOp(left=Constant(value=1), op=Add(), right=Constant(value=2)))'

The expression 1+2 is thus represented by the Python interpreter in a relatively complex way, using an object of type Expression whose body attribute is an object of type BinOp with attributes left, op and right.

None of this exists in Scheme. The representation of expressions is a lot simpler: it is just a list. This explains why (+ 1 2) resembles a list. It is, in fact, a list, of which the elements are +, 1 and 2. Unlike numbers, lists are not self-evaluating. When a list is evaluated, its first element is evaluated and interpreted as a function, while the remaining elements are evaluated and interpreted as arguments, and the function is called on the arguments.
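As a minimal sketch of this idea, the expression (+ 1 2) can be built by hand as an ordinary list and taken apart with the usual list functions. The variable name expr is hypothetical, chosen only for this illustration. The last line imitates by hand what evaluation does: it calls the function + on the remaining elements (1 and 2 are self-evaluating, so they need no further work).

```scheme
;; Build the expression (+ 1 2) as a plain list.
;; `expr` is a hypothetical name used only for this example.
(define expr (list '+ 1 2))

(list? expr)          ; ⇒ #t
(length expr)         ; ⇒ 3
(cdr expr)            ; ⇒ (1 2), the arguments

;; Imitate evaluation by hand: call + on the remaining elements.
(apply + (cdr expr))  ; ⇒ 3
```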

It remains to see how variables are represented. They use a dedicated type of object called a symbol (more on those below). A symbol is similar to a string, but it is not self-evaluating. Instead, it evaluates to the value of the variable it names.

"my-variable" ; string, evaluates to itself
⇒ "my-variable"

my-variable ; symbol
⇒ error because my-variable is not defined

(define my-variable "foo") ; let's define it
my-variable ; now the symbol can be evaluated without error
⇒ "foo"

In the Scheme syntax, symbols are written as simple sequences of letters, without quotes. The expression my-variable is a symbol, which evaluates to the value of the variable called my-variable. Similarly, in (+ 1 2), + is a symbol. It is predefined by the Scheme interpreter as the addition function.
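The standard predicates give a quick way to check these statements. The short sketch below uses only standard Scheme functions; quoting is used to get hold of the symbols themselves rather than the values of the variables they name.

```scheme
(symbol? 'my-variable)         ; ⇒ #t
(string? 'my-variable)         ; ⇒ #f, a symbol is not a string
(symbol->string 'my-variable)  ; ⇒ "my-variable"

;; The first element of the list (+ 1 2) is the symbol +
(symbol? (car '(+ 1 2)))       ; ⇒ #t

;; Evaluating the symbol + gives the addition function
(procedure? +)                 ; ⇒ #t
```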

To summarize, the Scheme syntax lets you enter five fundamental data types, namely numbers, strings, booleans, lists and symbols (there exist some other, less fundamental data types). The central idea of the language is that there is no need for more syntax than that, because these five types (and the few others) are enough to encode all programs. Thus, a Scheme program is a Scheme data structure made by combining these five (or more) data types. Of these five, numbers, strings and booleans are self-evaluating, while symbols take the value of a variable and lists are evaluated by calling a function.
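To make the summary concrete, here is a small sketch that classifies each of the five example expressions listed earlier by its data type. The helper kind is a hypothetical function written for this illustration, not something predefined by Scheme or LilyPond.

```scheme
;; `kind` is a hypothetical helper: it names the data type of its argument.
(define (kind x)
  (cond ((number? x) 'number)
        ((string? x) 'string)
        ((boolean? x) 'boolean)
        ((symbol? x) 'symbol)
        ((list? x) 'list)
        (else 'other)))

;; The five example expressions, quoted so that they are treated as data
(map kind '(-5.5 "abcd" #t my-variable (display "foo")))
; ⇒ (number string boolean symbol list)
```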

Once this concept is understood, quoting becomes quite simple. It is a way to prevent evaluation of an expression. Normally, when the Scheme interpreter sees a data structure, it automatically considers it to be a program. For example, the list (+ 1 2) is viewed as a program and evaluated, which yields the value 3. On the other hand, if the same expression is prefixed with a single quote ', then the list is returned as-is, without being evaluated.

(+ 1 2) ; list evaluated through a function call
⇒ 3

'(+ 1 2) ; list is not evaluated but returned as-is
⇒ (+ 1 2)

This is a very practical way of entering a list.

This discussion also explains the oddity noticed at the beginning:

(list (+ 1 1) (+ 2 3))
⇒ (2 5)
'((+ 1 1) (+ 2 3))
⇒ ((+ 1 1) (+ 2 3))

Indeed, (list ...) is an expression that evaluates through a function call, and to make this function call, the interpreter first evaluates the arguments. The symbol list evaluates to the predefined Scheme function that builds a list, the expression (+ 1 1) evaluates to 2, and (+ 2 3) evaluates to 5; then the list function is called on 2 and 5, which yields the list (2 5). On the other hand, in the second case, quoting prevents evaluation of the whole expression, including any subexpressions that it contains, which yields a list of two lists.
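The quoted version can be inspected to confirm that its elements are themselves unevaluated lists. The sketch below then evaluates each sub-list explicitly with eval, which recovers the result of the list call. It assumes a Guile session, where (interaction-environment) returns the current environment; the name quoted is hypothetical.

```scheme
;; `quoted` is a hypothetical name for the quoted expression.
(define quoted '((+ 1 1) (+ 2 3)))

(length quoted)     ; ⇒ 2
(car quoted)        ; ⇒ (+ 1 1), still a list, not the number 2
(car (car quoted))  ; ⇒ +, a symbol

;; Evaluate each sub-list as a program to get the values back
(map (lambda (e) (eval e (interaction-environment))) quoted)
; ⇒ (2 5)
```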

Not only is quoting useful to enter lists, but it is the main way to obtain a symbol (without evaluating it). This syntax is so common that it will look familiar to you if you use LilyPond frequently:

\tag #'edition
\override NoteHead.style = #'cross

A more explicit syntax for quoting is available. Writing a single quote before an expression is actually a shorthand for wrapping that expression in (quote ...). These are strictly equivalent:

'(1 2.4 "Hello")

and

(quote (1 2.4 "Hello"))
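The equivalence can be checked directly. As a small sketch, quoting an already quoted symbol makes the long form visible: the result is a two-element list whose first element is the symbol quote.

```scheme
;; Both notations produce the same list
(equal? '(1 2.4 "Hello") (quote (1 2.4 "Hello")))  ; ⇒ #t

;; ''a is read as (quote (quote a)), so its value is the list (quote a)
(car ''a)                 ; ⇒ quote
(equal? ''a '(quote a))   ; ⇒ #t
```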

Quasiquoting

Quasiquoting syntax is a way to evaluate selected subexpressions within a quoted expression. For this to work, quote needs to be replaced with quasiquote. Expressions to be evaluated are then wrapped in unquote.

(quasiquote (0 1 2 (unquote (+ 1 2)) "Hello"))
⇒ (0 1 2 3 "Hello")

Just like there is a shorthand ' (single quote) for quote, there is a shorthand ` (backtick) for quasiquote, and , (comma) for unquote.

`(0 1 2 ,(+ 1 2) "Hello")
⇒ (0 1 2 3 "Hello")

This syntax is frequently used to create lists that contain symbols. Rather than

(list 'moveto x y)

you would often write

`(moveto ,x ,y)
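A short sketch with concrete values shows that the two forms give the same result. The values 3 and 4 for x and y are arbitrary, chosen for the illustration. Note that anything not preceded by a comma stays unevaluated, as in the last line, where y remains a symbol.

```scheme
;; Arbitrary example values
(define x 3)
(define y 4)

`(moveto ,x ,y)                              ; ⇒ (moveto 3 4)
(equal? `(moveto ,x ,y) (list 'moveto x y))  ; ⇒ #t

;; Only unquoted parts are evaluated; here y is left as a symbol
`(moveto ,(* 2 x) y)                         ; ⇒ (moveto 6 y)
```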

Identity of symbols

At first glance, symbols and strings are similar, the difference being that strings are self-evaluating while symbols evaluate to the value of a variable. There is another essential difference, however. When you create two strings, even equal strings, memory is allocated in the computer for both strings independently, and they are stored in different places. Symbols, by contrast, are unique. A symbol is never stored twice in two different places. When a symbol already known to the interpreter is read, the already allocated symbol is reused automatically.

With equal?, this makes no difference at all, since equal? tests structural equality of objects. For two strings, equal? tests whether they have exactly the same characters. There is another test, called eq?, to determine whether two objects are actually the same object in memory. equal? is an equality test, whereas eq? is an identity test.

(equal? "hello" "hello")
⇒ #t
(eq? "hello" "hello")
⇒ #f
(equal? 'hello 'hello)
⇒ #t
(eq? 'hello 'hello)
⇒ #t

The advantage of eq? is that it’s very fast. It suffices to test whether the addresses of the two objects are equal. Unlike eq?, equal? can take significant time; the larger the objects it compares, the costlier it is. Thus, the lesson to remember is that symbols can be compared with eq?, and it’s a good idea to always compare them with eq?, even though it’s not really a problem to use equal?.
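The following sketch makes the difference explicit. It uses string-copy to force the allocation of two distinct strings, then converts both to symbols with string->symbol, which yields one and the same symbol. The names s1 and s2 are hypothetical. The last line uses the standard function memq, which searches a list using eq? and is therefore a natural fit for lists of symbols.

```scheme
;; Two separately allocated strings with the same characters
(define s1 (string-copy "hello"))
(define s2 (string-copy "hello"))

(equal? s1 s2)  ; ⇒ #t, same characters
(eq? s1 s2)     ; ⇒ #f, two different objects in memory

;; Converting them to symbols gives the same, unique symbol
(eq? (string->symbol s1) (string->symbol s2))  ; ⇒ #t
(eq? (string->symbol s1) 'hello)               ; ⇒ #t

;; memq compares with eq?; it returns the rest of the list from the match on
(memq 'cross '(default cross harmonic))        ; ⇒ (cross harmonic)
```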