The Database

The main problem with the database in Chapter 3 is that there’s only one table, the list stored in the variable *db*. Another is that the code doesn’t know anything about what type of values are stored in different columns. In Chapter 3 you got away with that by using the fairly general-purpose **EQUAL** method to compare column values when selecting rows from the database, but you would’ve been in trouble if you had wanted to store values that couldn’t be compared with **EQUAL** or if you had wanted to sort the rows in the database since there’s no ordering function that’s as general as **EQUAL**.

This time you’ll solve both problems by defining a class, table, to represent individual database tables. Each table instance will consist of two slots—one to hold the table’s data and another to hold information about the columns in the table that database operations will be able to use. The class looks like this:

  1. (defclass table ()
  2. ((rows :accessor rows :initarg :rows :initform (make-rows))
  3. (schema :accessor schema :initarg :schema)))

As in Chapter 3, you can represent the individual rows with plists, but this time around you’ll create an abstraction that will make that an implementation detail you can change later without too much trouble. And this time you’ll store the rows in a vector rather than a list since certain operations that you’ll want to support, such as random access to rows by a numeric index and the ability to sort a table, can be more efficiently implemented with vectors.

The function make-rows used to initialize the rows slot can be a simple wrapper around **MAKE-ARRAY** that builds an empty, adjustable,vector with a fill pointer.

The Package

The package for the code you’ll develop in this chapter looks like this:

  1. (defpackage :com.gigamonkeys.mp3-database
  2. (:use :common-lisp
  3. :com.gigamonkeys.pathnames
  4. :com.gigamonkeys.macro-utilities
  5. :com.gigamonkeys.id3v2)
  6. (:export :*default-table-size*
  7. :*mp3-schema*
  8. :*mp3s*
  9. :column
  10. :column-value
  11. :delete-all-rows
  12. :delete-rows
  13. :do-rows
  14. :extract-schema
  15. :in
  16. :insert-row
  17. :load-database
  18. :make-column
  19. :make-schema
  20. :map-rows
  21. :matching
  22. :not-nullable
  23. :nth-row
  24. :random-selection
  25. :schema
  26. :select
  27. :shuffle-table
  28. :sort-rows
  29. :table
  30. :table-size
  31. :with-column-values))

The :use section gives you access to the functions and macros whose names are exported from the packages defined in Chapter 15, 8, and 25 and the :export section exports the API this library will provide, which you’ll use in Chapter 29.

  1. (defparameter *default-table-size* 100)
  2. (defun make-rows (&optional (size *default-table-size*))
  3. (make-array size :adjustable t :fill-pointer 0))

To represent a table’s schema, you need to define another class, column, each instance of which will contain information about one column in the table: its name, how to compare values in the column for equality and ordering, a default value, and a function that will be used to normalize the column’s values when inserting data into the table and when querying the table. The schema slot will hold a list of column objects. The class definition looks like this:

  1. (defclass column ()
  2. ((name
  3. :reader name
  4. :initarg :name)
  5. (equality-predicate
  6. :reader equality-predicate
  7. :initarg :equality-predicate)
  8. (comparator
  9. :reader comparator
  10. :initarg :comparator)
  11. (default-value
  12. :reader default-value
  13. :initarg :default-value
  14. :initform nil)
  15. (value-normalizer
  16. :reader value-normalizer
  17. :initarg :value-normalizer
  18. :initform #'(lambda (v column) (declare (ignore column)) v))))

The equality-predicate and comparator slots of a column object hold functions used to compare values from the given column for equivalence and ordering. Thus, a column containing string values might have **STRING=** as its equality-predicate and **STRING<** as its comparator, while a column containing numbers might have **=** and **<**.

The default-value and value-normalizer slots are used when inserting rows into the database and, in the case of value-normalizer, when querying the database. When you insert a row into the database, if no value is provided for a particular column, you can use the value stored in the column‘s default-value slot. Then the value—defaulted or otherwise—is normalized by passing it and the column object to the function stored in the value-normalizer slot. You pass the column in case the value-normalizer function needs to use some data associated with the column object. (You’ll see an example of this in the next section.) You should also normalize values passed in queries before comparing them with values in the database.

Thus, the value-normalizer‘s responsibility is primarily to return a value that can be safely and correctly passed to the equality-predicate and comparator functions. If the value-normalizer can’t figure out an appropriate value to return, it can signal an error.

The other reason to normalize values before you store them in the database is to save both memory and CPU cycles. For instance, if you have a column that’s going to contain string values but the number of distinct strings that will be stored in the column is small—for instance, the genre column in the MP3 database—you can save space and speed by using the value-normalizer to intern the strings (translate all **STRING=** values to a single string object). Thus, you’ll need only as many strings as there are distinct values, regardless of how many rows are in the table, and you can use **EQL** to compare column values rather than the slower **STRING=**.1