Copyright © 2003-2005, Peter Seibel

19. Beyond Exception Handling: Conditions and Restarts

One of Lisp’s great features is its condition system. It serves a similar purpose to the exception handling systems in Java, Python, and C++ but is more flexible. In fact, its flexibility extends beyond error handling—conditions are more general than exceptions in that a condition can represent any occurrence during a program’s execution that may be of interest to code at different levels on the call stack. For example, in the section “Other Uses for Conditions,” you’ll see that conditions can be used to emit warnings without disrupting execution of the code that emits the warning while allowing code higher on the call stack to control whether the warning message is printed. For the time being, however, I’ll focus on error handling.

The condition system is more flexible than exception systems because instead of providing a two-part division between the code that signals an error1 and the code that handles it,2 the condition system splits the responsibilities into three parts—signaling a condition, handling it, and restarting. In this chapter, I’ll describe how you could use conditions in part of a hypothetical application for analyzing log files. You’ll see how you could use the condition system to allow a low-level function to detect a problem while parsing a log file and signal an error, to allow mid-level code to provide several possible ways of recovering from such an error, and to allow code at the highest level of the application to define a policy for choosing which recovery strategy to use.

To start, I’ll introduce some terminology: errors, as I’ll use the term, are the consequences of Murphy’s law. If something can go wrong, it will: a file that your program needs to read will be missing, a disk that you need to write to will be full, the server you’re talking to will crash, or the network will go down. If any of these things happen, it may stop a piece of code from doing what you want. But there’s no bug; there’s no place in the code that you can fix to make the nonexistent file exist or the disk not be full. However, if the rest of the program is depending on the actions that were going to be taken, then you’d better deal with the error somehow or you will have introduced a bug. So, errors aren’t caused by bugs, but neglecting to handle an error is almost certainly a bug.

So, what does it mean to handle an error? In a well-written program, each function is a black box hiding its inner workings. Programs are then built out of layers of functions: high-level functions are built on top of the lower-level functions, and so on. This hierarchy of functionality manifests itself at runtime in the form of the call stack: if high calls medium, which calls low, when the flow of control is in low, it’s also still in medium and high, that is, they’re still on the call stack.

Because each function is a black box, function boundaries are an excellent place to deal with errors. Each function—low, for example—has a job to do. Its direct caller—medium in this case—is counting on it to do its job. However, an error that prevents it from doing its job puts all its callers at risk: medium called low because it needs the work done that low does; if that work doesn’t get done, medium is in trouble. But this means that medium‘s caller, high, is also in trouble—and so on up the call stack to the very top of the program. On the other hand, because each function is a black box, if any of the functions in the call stack can somehow do their job despite underlying errors, then none of the functions above it needs to know there was a problem—all those functions care about is that the function they called somehow did the work expected of it.

In most languages, errors are handled by returning from a failing function and giving the caller the choice of either recovering or failing itself. Some languages use the normal function return mechanism, while languages with exceptions return control by throwing or raising an exception. Exceptions are a vast improvement over using normal function returns, but both schemes suffer from a common flaw: while searching for a function that can recover, the stack unwinds, which means code that might recover has to do so without the context of what the lower-level code was trying to do when the error actually occurred.

Consider the hypothetical call chain of high, medium, low. If low fails and medium can’t recover, the ball is in high‘s court. For high to handle the error, it must either do its job without any help from medium or somehow change things so calling medium will work and call it again. The first option is theoretically clean but implies a lot of extra code—a whole extra implementation of whatever it was medium was supposed to do. And the further the stack unwinds, the more work that needs to be redone. The second option—patching things up and retrying—is tricky; for high to be able to change the state of the world so a second call into medium won’t end up causing an error in low, it’d need an unseemly knowledge of the inner workings of both medium and low, contrary to the notion that each function is a black box.