14. Files and File I/O - Other Kinds of I/O - 《Practical Common Lisp》

Other Kinds of I/O

In addition to file streams, Common Lisp supports other kinds of streams, which can also be used with the various reading, writing, and printing I/O functions. For instance, you can read data from, or write data to, a string using **STRING-STREAM**s, which you can create with the functions **MAKE-STRING-INPUT-STREAM** and **MAKE-STRING-OUTPUT-STREAM**.

**MAKE-STRING-INPUT-STREAM** takes a string and optional start and end indices to bound the area of the string from which data should be read and returns a character stream that you can pass to any of the character-based input functions such as **READ-CHAR**, **READ-LINE**, or **READ**. For example, if you have a string containing a floating-point literal in Common Lisp’s syntax, you can convert it to a float like this:

(let ((s (make-string-input-stream "1.23")))
  (unwind-protect (read s)
    (close s)))
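The optional start and end indices can be illustrated with a small sketch. The helper name `read-bounded-float` and the sample string are made up for this example; only **MAKE-STRING-INPUT-STREAM**, **READ**, and **CLOSE** come from the language itself.

```lisp
;; Hypothetical helper: READ one object from the part of STRING
;; bounded by START (inclusive) and END (exclusive).
(defun read-bounded-float (string start end)
  (let ((s (make-string-input-stream string start end)))
    (unwind-protect (read s)
      (close s))))

;; Characters 4 through 7 of the sample string spell "4.56",
;; so only that region is visible to the reader.
(read-bounded-float "x = 4.56 meters" 4 8)
```

Because the stream is bounded, the reader never sees the text before or after the number.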

Similarly, **MAKE-STRING-OUTPUT-STREAM** creates a stream you can use with **FORMAT**, **PRINT**, **WRITE-CHAR**, **WRITE-LINE**, and so on. It takes no required arguments. Whatever you write to a string output stream is accumulated into a string that you can then obtain with the function **GET-OUTPUT-STREAM-STRING**. Each time you call **GET-OUTPUT-STREAM-STRING**, the stream’s internal string is cleared so you can reuse an existing string output stream.
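As a rough sketch of that accumulate-and-clear behavior, the following function (its name, `demo-string-output-stream`, is invented for illustration) writes to one string output stream, collects the result, and then reuses the same stream:

```lisp
(defun demo-string-output-stream ()
  (let ((out (make-string-output-stream)))
    (unwind-protect
         (progn
           (write-string "hello" out)
           (format out ", ~a" "world")
           ;; Fetching the accumulated string also clears the stream.
           (let ((first-chunk (get-output-stream-string out)))
             (write-char #\! out)
             ;; The second call sees only what was written after the first.
             (list first-chunk (get-output-stream-string out))))
      (close out))))

(demo-string-output-stream) ; ==> ("hello, world" "!")
```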

However, you’ll rarely use these functions directly, because the macros **WITH-INPUT-FROM-STRING** and **WITH-OUTPUT-TO-STRING** provide a more convenient interface. **WITH-INPUT-FROM-STRING** is similar to **WITH-OPEN-FILE**: it creates a string input stream from a given string and then executes the forms in its body with the stream bound to the variable you provide. For instance, instead of the **LET** form with the explicit **UNWIND-PROTECT**, you’d probably write this:

(with-input-from-string (s "1.23")
  (read s))

The **WITH-OUTPUT-TO-STRING** macro is similar: it binds a newly created string output stream to a variable you name and then executes its body. After all the body forms have been executed, **WITH-OUTPUT-TO-STRING** returns the value that would be returned by **GET-OUTPUT-STREAM-STRING**.

CL-USER> (with-output-to-string (out)
            (format out "hello, world ")
            (format out "~s" (list 1 2 3)))
"hello, world (1 2 3)"

The other kinds of streams defined in the language standard provide various kinds of stream “plumbing,” allowing you to plug together streams in almost any configuration. A **BROADCAST-STREAM** is an output stream that sends any data written to it to a set of output streams provided as arguments to its constructor function, **MAKE-BROADCAST-STREAM**.14 Conversely, a **CONCATENATED-STREAM** is an input stream that takes its input from a set of input streams, moving from stream to stream as it hits the end of each stream. **CONCATENATED-STREAM**s are constructed with the function **MAKE-CONCATENATED-STREAM**, which takes any number of input streams as arguments.
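A minimal sketch of both kinds of plumbing, using string streams as stand-ins for whatever real streams you might connect. The function names `demo-broadcast` and `demo-concatenated` are invented for this example.

```lisp
;; One write to the broadcast stream reaches both component streams.
(defun demo-broadcast ()
  (let* ((a (make-string-output-stream))
         (b (make-string-output-stream))
         (both (make-broadcast-stream a b)))
    (write-string "log entry" both)
    (finish-output both)
    (list (get-output-stream-string a)
          (get-output-stream-string b))))

;; Reading from the concatenated stream drains the first input
;; stream and then continues with the second.
(defun demo-concatenated ()
  (with-input-from-string (first-part "hello, ")
    (with-input-from-string (second-part "world")
      (let ((in (make-concatenated-stream first-part second-part)))
        (read-line in)))))

(demo-broadcast)    ; ==> ("log entry" "log entry")
(demo-concatenated) ; primary value ==> "hello, world"
```

**READ-LINE** crosses the boundary between the two component streams without any visible seam, which is the point of a **CONCATENATED-STREAM**.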

Two kinds of bidirectional streams that can plug together streams in a couple of ways are **TWO-WAY-STREAM** and **ECHO-STREAM**. Their constructor functions, **MAKE-TWO-WAY-STREAM** and **MAKE-ECHO-STREAM**, both take two arguments, an input stream and an output stream, and return a stream of the appropriate type, which you can use with both input and output functions.

In a **TWO-WAY-STREAM** every read you perform will return data read from the underlying input stream, and every write will send data to the underlying output stream. An **ECHO-STREAM** works essentially the same way except that all the data read from the underlying input stream is also echoed to the output stream. Thus, the output stream of an **ECHO-STREAM** will contain a transcript of both sides of the conversation.
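The difference between the two is easiest to see side by side. In this sketch, string streams stand in for the underlying input and output streams, and the function names `demo-two-way` and `demo-echo` are invented for illustration.

```lisp
;; Reads come from IN, writes go to OUT; nothing is echoed.
(defun demo-two-way ()
  (with-input-from-string (in "42")
    (let* ((out (make-string-output-stream))
           (io (make-two-way-stream in out)))
      (format io "answer: ")
      (let ((value (read io)))
        (format io "~a" (* 2 value))
        (list value (get-output-stream-string out))))))

;; Same wiring, but everything read from IN is also copied to OUT,
;; so OUT ends up holding both sides of the exchange.
(defun demo-echo ()
  (with-input-from-string (in "ping")
    (let* ((out (make-string-output-stream))
           (io (make-echo-stream in out)))
      (let ((line (read-line io)))
        (format io " -> pong")
        (list line (get-output-stream-string out))))))

(demo-two-way) ; ==> (42 "answer: 84")
(demo-echo)    ; ==> ("ping" "ping -> pong")
```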

Using these five kinds of streams, you can build almost any topology of stream plumbing you want.

Finally, although the Common Lisp standard doesn’t say anything about networking APIs, most implementations support socket programming and typically implement sockets as another kind of stream, so you can use all the regular I/O functions with them.15

Now you’re ready to move on to building a library that smooths over some of the differences between how the basic pathname functions behave in different Common Lisp implementations.

1Note, however, that while the Lisp reader knows how to skip comments, it completely skips them. Thus, if you use **READ** to read in a configuration file containing comments and then use **PRINT** to save changes to the data, you’ll lose the comments.

2By default **OPEN** uses the default character encoding for the operating system, but it also accepts a keyword parameter, :external-format, whose implementation-defined values specify a different encoding. Character streams also translate the platform-specific end-of-line sequence to the single character #\Newline.

3The type (unsigned-byte 8) indicates an 8-bit byte; Common Lisp “byte” types aren’t a fixed size since Lisp has run at various times on architectures with byte sizes from 6 to 9 bits, to say nothing of the PDP-10, which had individually addressable variable-length bit fields of 1 to 36 bits.

4In general, a stream is either a character stream or a binary stream, so you can’t mix calls to **READ-BYTE** and **READ-CHAR** or other character-based read functions. However, some implementations, such as Allegro, support so-called bivalent streams, which support both character and binary I/O.

5Some folks expect this wouldn’t be a problem in a garbage-collected language such as Lisp. It is the case in most Lisp implementations that a stream that becomes garbage will automatically be closed. However, this isn’t something to rely on—the problem is that garbage collectors usually run only when memory is low; they don’t know about other scarce resources such as file handles. If there’s plenty of memory available, it’s easy to run out of file handles long before the garbage collector runs.

6Another reason the pathname system is considered somewhat baroque is the inclusion of logical pathnames. However, you can use the rest of the pathname system perfectly well without knowing anything more about logical pathnames than that you can safely ignore them. Briefly, logical pathnames allow Common Lisp programs to contain references to pathnames without naming specific files. When the program is installed, logical pathnames can be mapped to specific locations in an actual file system by defining a “logical pathname translation” that translates logical pathnames matching certain wildcards to pathnames representing files in the file system, so-called physical pathnames. They have their uses in certain situations, but you can get pretty far without worrying about them.

7Many Unix-based implementations give special treatment to filenames whose last element starts with a dot and doesn’t contain any other dots, putting the whole element, with the dot, in the name component and leaving the type component **NIL**.

(pathname-name (pathname "/foo/.emacs")) ==> ".emacs"
(pathname-type (pathname "/foo/.emacs")) ==> NIL

However, not all implementations follow this convention; some will create a pathname with “” as the name and emacs as the type.

8The name returned by **FILE-NAMESTRING** also includes the version component on file systems that use it.

9The host component may not default to **NIL**, but if not, it will be an opaque implementation-defined value.

10For absolutely maximum portability, you should really write this:

(make-pathname :type "html" :version :newest :defaults input-file)

Without the :version argument, on a file system with built-in versioning, the output pathname would inherit its version number from the input file which isn’t likely to be right—if the input file has been saved many times it will have a much higher version number than the generated HTML file. On implementations without file versioning, the :version argument should be ignored. It’s up to you if you care that much about portability.

11See Chapter 19 for more on handling errors.

12For applications that need access to other file attributes on a particular operating system or file system, libraries provide bindings to underlying C system calls. The Osicat library at http://common-lisp.net/project/osicat/ provides a simple API built using the Universal Foreign Function Interface (UFFI), which should run on most Common Lisps that run on a POSIX operating system.

13The number of bytes and characters in a file can differ even if you’re not using a multibyte character encoding. Because character streams also translate platform-specific line endings to a single #\Newline character, on Windows (which uses CRLF as its line ending) the number of characters will typically be smaller than the number of bytes. If you really have to know the number of characters in a file, you have to bite the bullet and write something like this:

(with-open-file (in filename)
  (loop while (read-char in nil) count t))

or maybe something more efficient like this:

(with-open-file (in filename)
  (let ((scratch (make-string 4096)))
    (loop for read = (read-sequence scratch in)
          while (plusp read) sum read)))

14You can make a data black hole by calling **MAKE-BROADCAST-STREAM** with no arguments: a broadcast stream with no component streams discards everything written to it.

15The biggest missing piece in Common Lisp’s standard I/O facilities is a way for users to define new stream classes. There are, however, two de facto standards for user-defined streams. During the Common Lisp standardization, David Gray of Texas Instruments wrote a draft proposal for an API to allow users to define new stream classes. Unfortunately, there wasn’t time to work out all the issues raised by his draft to include it in the language standard. However, many implementations support some form of so-called Gray Streams, basing their API on Gray’s draft proposal. Another, newer API, called Simple Streams, has been developed by Franz and included in Allegro Common Lisp. It was designed to improve the performance of user-defined streams relative to Gray Streams and has been adopted by some of the open-source Common Lisp implementations.