Character Escaping

The first bit of the foundation you’ll need to lay is the code that knows how to escape characters with a special meaning in HTML. There are three such characters, and they must not appear in the text of an element or in an attribute value; they are <, >, and &. In element text or attribute values, these characters must be replaced with the character reference entities &lt;, &gt;, and &amp;. Similarly, in attribute values, the quotation marks used to delimit the value must be escaped, ' with &apos; and " with &quot;. Additionally, any character can be represented by a numeric character reference entity consisting of an ampersand, followed by a sharp sign (#), followed by the character’s numeric code as a base-10 integer, followed by a semicolon. These numeric escapes are sometimes used to embed non-ASCII characters in HTML.
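To make the numeric form concrete, here is a minimal sketch. The name numeric-char-reference is hypothetical and not part of FOO, and the sketch assumes your Lisp's character codes agree with ASCII and Unicode, as they do in the common implementations.

```lisp
;; Hypothetical helper, for illustration only: build a numeric
;; character reference from a character's code.
(defun numeric-char-reference (char)
  ;; ~d prints the code as a base-10 integer.
  (format nil "&#~d;" (char-code char)))

;; Assuming ASCII/Unicode character codes:
(numeric-char-reference #\A)             ; → "&#65;"
(numeric-char-reference (code-char 233)) ; → "&#233;" (e with acute accent)
```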

The Package

Since FOO is a low-level library, the package you develop it in doesn’t rely on much external code—just the usual dependency on names from the COMMON-LISP package and, almost as usual, on the names of the macro-writing macros from COM.GIGAMONKEYS.MACRO-UTILITIES. On the other hand, the package needs to export all the names needed by code that uses FOO. Here’s the **DEFPACKAGE** from the source that you can download from the book’s Web site:

    (defpackage :com.gigamonkeys.html
      (:use :common-lisp :com.gigamonkeys.macro-utilities)
      (:export :with-html-output
               :in-html-style
               :define-html-macro
               :html
               :emit-html
               :&attributes))

The following function accepts a single character and returns a string containing a character reference entity for that character:

    (defun escape-char (char)
      (case char
        (#\& "&amp;")
        (#\< "&lt;")
        (#\> "&gt;")
        (#\' "&apos;")
        (#\" "&quot;")
        (t (format nil "&#~d;" (char-code char)))))

You can use this function as the basis for a function, escape, that takes a string and a sequence of characters and returns a copy of the first argument with all occurrences of the characters in the second argument replaced with the corresponding character entity returned by escape-char.

    (defun escape (in to-escape)
      (flet ((needs-escape-p (char) (find char to-escape)))
        (with-output-to-string (out)
          (loop for start = 0 then (1+ pos)
                for pos = (position-if #'needs-escape-p in :start start)
                ;; Copy the run of ordinary characters up to POS (or to the end).
                do (write-sequence in out :start start :end pos)
                ;; Then emit the entity for the character at POS, if there is one.
                when pos do (write-sequence (escape-char (char in pos)) out)
                while pos))))
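If the way the loop steps through the string isn't obvious, it may help to look at just the start and pos values it computes. The following sketch uses the same loop skeleton as escape but collects those values instead of writing output. The name escape-spans is hypothetical and exists only for illustration.

```lisp
;; Hypothetical tracing helper: same iteration scheme as ESCAPE, but it
;; collects each (start pos) pair rather than writing any output.
(defun escape-spans (in to-escape)
  (flet ((needs-escape-p (char) (find char to-escape)))
    (loop for start = 0 then (1+ pos)
          for pos = (position-if #'needs-escape-p in :start start)
          collect (list start pos)
          while pos)))

(escape-spans "a<b>c" "<>&") ; → ((0 1) (2 3) (4 NIL))
```

Each pair marks one run of ordinary characters that can be copied in bulk, from start up to pos, where pos is the index of the next character needing an escape. In the final pair pos is NIL, so the run extends to the end of the string and the loop stops.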

You can also define two parameters: *element-escapes*, which contains the characters you need to escape in normal element data, and *attribute-escapes*, which contains the set of characters to be escaped in attribute values.

    (defparameter *element-escapes* "<>&")
    (defparameter *attribute-escapes* "<>&\"'")

Here are some examples:

    HTML> (escape "foo & bar" *element-escapes*)
    "foo &amp; bar"
    HTML> (escape "foo & 'bar'" *element-escapes*)
    "foo &amp; 'bar'"
    HTML> (escape "foo & 'bar'" *attribute-escapes*)
    "foo &amp; &apos;bar&apos;"

Finally, you’ll need a variable, *escapes*, that will be bound to the set of characters that need to be escaped. It’s initially set to the value of *element-escapes*, but when generating attributes, it will, as you’ll see, be rebound to the value of *attribute-escapes*.

    (defvar *escapes* *element-escapes*)
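To see what that rebinding looks like in practice, here is a small sketch. It repeats the three variable definitions so it can be loaded on its own, and it adds a hypothetical function, current-escapes, that stands in for any code reading *escapes* further down the call stack. It isn't how FOO itself does the rebinding, just an illustration of the dynamic binding involved.

```lisp
(defparameter *element-escapes* "<>&")
(defparameter *attribute-escapes* "<>&\"'")
(defvar *escapes* *element-escapes*)

;; Hypothetical stand-in for code that consults *ESCAPES* when it runs.
(defun current-escapes ()
  *escapes*)

(list (current-escapes)                       ; global value
      (let ((*escapes* *attribute-escapes*))  ; rebound within the LET
        (current-escapes))
      (current-escapes))                      ; global value again
;; → ("<>&" "<>&\"'" "<>&")
```

Because *escapes* is a special variable, the LET binding is visible to every function called while the LET is active and is undone automatically when it exits.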