Error vs. Exception

There has been confusion about the distinction between errors and exceptions for a long time, as shown by repeated threads on Haskell-Cafe and a growing number of packages that handle errors, exceptions, or something in between. Although both terms are related and sometimes hard to tell apart, it is important to distinguish them carefully. The situation resembles the confusion between parallelism and concurrency.

The first problem is that "exception" seems to me to be the historically younger term. Before, there were only "errors", regardless of whether they were programming, I/O or user errors. In this article we use the term exception for expected but irregular situations at runtime and the term error for mistakes in the running program that can be resolved only by fixing the program. We do not want to distinguish between different ways of representing exceptions: Maybe, Either, exceptions in the IO monad, and return codes all represent exceptions and are worth considering for exception handling.

The history may have led to the identifiers we find today in the Haskell language and standard Haskell modules.

Exceptions: Prelude.catch, Control.Exception.catch, Control.Exception.try, IOError, Control.Monad.Error
Errors: error, assert, Control.Exception.catch, Debug.Trace.trace

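Note that the catch function from the Prelude handles exceptions exclusively, whereas its counterpart from Control.Exception also catches certain kinds of undefined values. (Prelude.catch belongs to Haskell 98; since base 4.6 GHC's Prelude no longer exports it, and System.IO.Error.catchIOError provides the same behaviour.)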

Prelude> catch (error "bla") (\msg -> putStrLn $ "caught " ++ show msg)
*** Exception: bla

Prelude> Control.Exception.catch (error "bla") (\msg -> putStrLn $ "caught " ++ show (msg::Control.Exception.SomeException))
caught bla

This is unsafe, since Haskell's error is just sugar for undefined, which is meant to help spot a programming error. A program should work as well when all errors and undefineds are replaced by infinite loops. However, infinite loops in general cannot be caught, whereas calls to sugared functions like error can.

The Java programming language added to the confusion by using the term "exceptions" for programming errors like the NullPointerException and by introducing the distinction between checked and unchecked exceptions.

Examples

Let's give some examples for explaining the difference between errors and exceptions and why the distinction is important.

First, consider a compiler like GHC. If you feed it a program that contains invalid syntax or inconsistent types, it emits a description of the problem. Such occurrences are considered to be exceptions. GHC anticipates bad syntax and mismatched types and handles them by generating useful messages for the user. However, if GHC spits out a message like "Panic! This should not happen: ... Send a bug report to ghc@haskell.org", then you've encountered a situation which indicates a flaw in GHC. This would be considered an error. It cannot be handled by GHC or by the user. The error message, "Panic!...", is only useful to the GHC developers in fixing the problem.

Ok, these are possible reactions to user input. Now a more difficult question: How should GHC handle corruption in the files it has generated itself, like the interface (.hi) and object files (.o)? Such corruption can be introduced easily by the user editing the files in a simple text editor, by network problems, or by exchanging files between operating systems or different GHC versions. Thus GHC must be prepared for it, which means it must generate and handle exceptions here. The program must at least tell the user that there is some problem with the file. A more obscure case is files modified by malicious software. Next question: Must GHC also be prepared for corrupt memory or damage in the CPU? According to the above definition corrupt memory is an exception, not an error. However, GHC cannot do much to resolve such situations. Checking hardware is an OS responsibility.

Now we proceed with two examples that show what happens if you try to treat errors like exceptions:
I was involved in the development of a library that was written in C++. One of the developers told me that the developers were divided into those who like exceptions and those who prefer return codes. As it seemed to me, the friends of return codes won. However, I got the impression that they debated the wrong point: Exceptions and return codes are equally expressive; they should, however, not be used to describe errors. Actually, the return codes contained definitions like ARRAY_INDEX_OUT_OF_RANGE. But I wondered: How shall my function react when it gets this return code from a subroutine? Shall it send a mail to its programmer? It could return this code to its caller in turn, but the caller will not know how to cope with it either. Even worse, since I cannot make assumptions about the implementation of a function, I have to expect an ARRAY_INDEX_OUT_OF_RANGE from every subroutine. My conclusion is that ARRAY_INDEX_OUT_OF_RANGE is a (programming) error. It cannot be handled or fixed at runtime; it can only be fixed by its developer. Thus there should be no corresponding return code, but instead there should be asserts.

The second example is a library for advanced arithmetic in Modula-3. I decided to use exceptions for signalling problems. One of the exceptions was VectorSizeMismatch, which was raised whenever two vectors of different sizes were to be added or combined in a scalar product. However, I quickly found that almost every function in the library could potentially raise this exception, and Modula-3 urges you to declare all potential exceptions. (However, ignoring potential exceptions only yields a compiler warning, which can even be suppressed.) I also noticed that, due to the way I generated and combined the vectors and matrices, the sizes would always match. Thus a mismatch means there is a problem not with user input but with my program. Consequently, I removed this exception and replaced the checks by ASSERT. These ASSERTs can be disabled by a compiler switch for efficiency. A correct program fulfils all ASSERTs, and thus it does not make a difference whether they are present in the compiled program or not. In a faulty program the presence (or lack) of ASSERTs controls the way a program fails: either by continuing execution and giving wrong results, or by immediate termination due to a failed assertion.
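
The same decision can be sketched in Haskell. This is only an illustration transposed from the Modula-3 story, not the library's actual code: the Vector type and the name addVectors are hypothetical, while assert is the function from Control.Exception.

```haskell
import Control.Exception (assert)

-- Hypothetical vector type for illustration: a plain list of Doubles.
type Vector = [Double]

-- Precondition: both vectors have the same length.
-- A violation is a programming error, so it is checked with 'assert'
-- instead of being reported through the result type.
addVectors :: Vector -> Vector -> Vector
addVectors xs ys =
  assert (length xs == length ys) (zipWith (+) xs ys)
```

GHC ignores assertions when compiling with -O or -fignore-asserts. In that case a size mismatch is not detected and zipWith silently truncates to the shorter vector, which is the "continuing execution and giving wrong results" failure mode.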

With the new handling of vector size compatibility, if the operands of a vector addition originate from user input, then you have to check that their sizes match before you call vector addition. However, this is a cheap check. Thus if you want another criterion for distinguishing errors and exceptions: Errors can be prevented by (cheap) checks in advance, whereas exceptions can only be handled after a risky action has been run. You can easily check that array indices are within array bounds, that pointers are not NULL, and that divisors are not zero before calling the corresponding functions. In many cases you will not need those checks, because e.g. you have a loop traversing all valid indices of an array, and consequently you know that every index is allowed. You do not need to check for exceptions afterwards. In contrast to that, memory full, disk full, file not existing, file without write permission and even overflows are clearly exceptions. Even if you check that there is enough memory available before allocating, the required chunk of memory might be allocated by someone else between your memory check and your allocation. The file permission might be changed between checking the permission and writing to the file. Permissions might even change while you write. Overflows are deterministic, but in order to prevent an overflow, say for a multiplication, you have to reimplement the multiplication in an overflow-proof way. This will be slower than the actual multiplication. (Processors always show overflows by flags, but almost none of the popular high-level languages allows you to query this information.)
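
A minimal sketch of the two styles in Haskell, under the assumption that the names averageOf and readFileSafely are made up for this illustration. The first function prevents a division by zero with a cheap check in advance; the second runs the risky action and inspects the outcome afterwards.

```haskell
import Control.Exception (IOException, try)

-- Cheap check in advance: the empty list is ruled out before 'div'
-- is called, so the division can never see a zero divisor.
averageOf :: [Int] -> Maybe Int
averageOf [] = Nothing
averageOf xs = Just (sum xs `div` length xs)

-- No reliable check in advance exists for a file: it may vanish or
-- change permissions at any time. So we attempt the action and look
-- at the result. Note that 'readFile' is lazy, so this sketch only
-- covers failures that occur while opening the file.
readFileSafely :: FilePath -> IO (Either IOException String)
readFileSafely path = try (readFile path)
```

A caller of readFileSafely has to pattern match on Left and Right, which makes the exceptional case visible in the program.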

My conclusion is that (programming) errors can only be handled by the programmer, not by the running program. Thus the term "error handling" sounds contradictory to me. However, supporting programmers in finding errors (bugs) in their programs is a good thing. I just wouldn't call it "error handling" but "debugging". An important example in Haskell is the module Debug.Trace. It provides the function trace, which looks like a non-I/O function but actually outputs something on the console. It is natural that debugging functions employ hacks. For finding a programming error it would be inappropriate to transform the program code to allow I/O in a set of functions that do not need it otherwise. The change would only persist until the bug is detected and fixed. In summary, hacks in debugging functions are necessary for quickly finding problems without large restructuring of the program, and they are not problematic, because they only exist until the bug is removed.
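
As a small illustration, here is how trace can be dropped into a pure function without changing its type. The function tracedGcd is a hypothetical example; trace itself is the function from Debug.Trace and writes its message to stderr.

```haskell
import Debug.Trace (trace)

-- Euclid's algorithm with a temporary debugging aid. The type stays
-- pure; 'trace' prints its message when the expression is evaluated.
tracedGcd :: Int -> Int -> Int
tracedGcd a 0 = a
tracedGcd a b =
  trace ("tracedGcd " ++ show a ++ " " ++ show b) $
    tracedGcd b (a `mod` b)
```

Once the bug is found, the call to trace is deleted again and no signature in the program has to change.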

In contrast, exceptions are things you cannot fix in advance. You will always have to live with files that cannot be found and user input that is malformed. You can insist that the user does not hit the X key, but your program has to be prepared to receive an "X key pressed" message nonetheless. Thus exceptions belong to the program, and the program must be adapted to treat exceptional values where they can occur. No hacks can be accepted for exception handling.

When exceptions become errors

Another issue that makes the distinction between exceptions and errors difficult is that sometimes one gets converted into the other.

It is an error not to handle an exception. If a file cannot be opened you must respect that result. You can proceed as if the file could be opened, though. If you do so, you might crash the machine, or the runtime system might terminate your program. All of these effects are possible consequences of a (programming) error. Again, it does not matter whether the exceptional situation is signaled by a return code that you ignore or by an IO exception for which you did not run a catch.

When errors become exceptions

The distinction between errors and exceptions is often criticized because there are software architectures where even programming errors in one part shall not crash a larger piece of software. Typical examples are: A process in an operating system shall not crash the whole system if it crashes itself. A buggy browser plugin shall not terminate the browser. A corrupt CGI script shall not bring down the web server it runs on.

In these cases errors are handled like exceptions. But this is no reason to dismiss the distinction between errors and exceptions altogether. Obviously there are levels, and when crossing level boundaries it is ok to turn an error into an exception. The part that contains an error cannot do anything to recover from it. The next higher level cannot fix it either, but it can restrict the damage. Within one encapsulated part of an architecture, errors and exceptions shall be strictly separated. (Or put differently: If at one place you think you have to handle an error like an exception, why not divide the program into two parts at this position? :-) )

There is an important reason to not simply catch an error and proceed as if it were only an exception: The error might have corrupted some data and there is no general way to recover from that. Say, after detecting an error you might want to close a file that you were working on. But since you are coping with an error, something you did not foresee, you cannot know whether the file was already closed again or never opened. So it is better to just abort the program.

The next higher level, the shell calling your program or the browser calling your plugin, shall have registered what has been opened and allocated and can reliably free those resources.
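
Such a level boundary might look as follows in Haskell. This is a sketch under stated assumptions: buggyPlugin and runPlugin are hypothetical names, and the plugin is modelled as a pure function whose Int result is fully evaluated by evaluate. The host does not try to repair the plugin; it only restricts the damage by turning the plugin's error into a value it can report.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Hypothetical plugin with a programming error for one input.
buggyPlugin :: Int -> Int
buggyPlugin 0 = error "buggyPlugin: unexpected zero"
buggyPlugin n = 100 `div` n

-- The host runs the plugin behind a boundary. From the host's point
-- of view a crashing plugin is an expected, irregular situation,
-- that is an exception, and it is reported as 'Left'.
runPlugin :: (Int -> Int) -> Int -> IO (Either String Int)
runPlugin plugin input = do
  result <- try (evaluate (plugin input))
  pure $ case result of
    Left e  -> Left ("plugin failed: " ++ show (e :: SomeException))
    Right v -> Right v
```

Catching SomeException is acceptable here only because it happens at the boundary; note that it also intercepts asynchronous exceptions, which a real host would want to treat separately.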

Errors and type system

It is generally considered that errors in a program indicate a deficiency in the type system. If the type system were strong enough and the programmers were patient enough to work out the proofs imposed by library functions, then there would be no errors in programs at all, only exceptions.

An alternative to extending the type system to a dependent type system that allows for a wide range of proofs is Extended Static Checking. For example:

{-# CONTRACT head :: { xs | not (null xs) } -> Ok #-}
head :: [a] -> a
head []    = error "head: empty list"
head (x:_) = x

When there is a pre-condition (or a contract) like here, it is a programming error to give an empty list to head. This means that checking whether the list is empty must be done before the call. It has to be statically deducible from the call site.

If you write a function and cannot prove that you will not call head on the empty list, then either you check before calling, or you use a safe-head function like viewL :: [a] -> Maybe (a, [a]) or a case xs of x:_ -> doSomethingWithHead x; [] -> doSomethingElse, or you add a pre-condition to your function.
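
A minimal sketch of the safe-head alternative. The type of viewL is the one given in the text; its definition and the caller firstOr are assumptions made for this illustration.

```haskell
-- Splits a list into head and tail; the empty case is part of the
-- result type instead of being a programming error.
viewL :: [a] -> Maybe (a, [a])
viewL []       = Nothing
viewL (x : xs) = Just (x, xs)

-- Hypothetical caller: the type forces it to say what happens for
-- the empty list, so no pre-condition is needed.
firstOr :: a -> [a] -> a
firstOr def xs =
  case viewL xs of
    Just (x, _) -> x
    Nothing     -> def
```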

These contracts look somewhat like exception declarations, but they specify something about preconditions, not about possible results. It would make no sense to give the contracts names in order to handle different ways of violating them after the function has been called with inappropriate arguments.

Call stacks

For both errors and exceptions, some kind of call stack might be helpful to report to the programmer or user, respectively. However, the call stacks for programmers (for debugging) differ noticeably from those generated for users as the result of an exception.

For errors we might prefer something like:

 Prelude.head:42:23: empty list
 when calling recursively MyModule.scan.go:2009:12 and MyModule.scan.view:2009:7
 when calling MyGUI.promptString:1234:321
 ... many more details ...

whereas users would certainly prefer to see

 Program could not be started,
 because Config file could not be read
 because Config file does not exist in dir0, dir1, dir2

but the exception handler may also decide to use a default configuration instead or ask the user how to proceed.
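
A handler of that kind could be sketched like this. All names (defaultConfig, readFirst, describeFailure, loadConfig) and the configuration format are hypothetical; only try, IOException, readFile and intercalate come from base.

```haskell
import Control.Exception (IOException, try)
import Data.List (intercalate)

-- Hypothetical fallback used when no configuration file is readable.
defaultConfig :: String
defaultConfig = "verbosity=1"

-- Try each candidate path in order. Returns the contents of the first
-- readable file, or the list of all paths that were tried in vain.
readFirst :: [FilePath] -> IO (Either [FilePath] String)
readFirst = go []
  where
    go failed []       = pure (Left (reverse failed))
    go failed (p : ps) = do
      r <- try (readFile p) :: IO (Either IOException String)
      case r of
        Right contents -> pure (Right contents)
        Left _         -> go (p : failed) ps

-- A message phrased for the user, not for the programmer.
describeFailure :: [FilePath] -> String
describeFailure paths =
  "Config file could not be read from: " ++ intercalate ", " paths

-- The handler decides: here it reports the problem and falls back to
-- the default configuration instead of aborting.
loadConfig :: [FilePath] -> IO String
loadConfig paths = do
  r <- readFirst paths
  case r of
    Right contents -> pure contents
    Left tried     -> do
      putStrLn (describeFailure tried ++ "; using defaults")
      pure defaultConfig
```

The information collected along the way (which paths were tried) is what ends up in the user-facing message, in contrast to the source positions a programmer would want for an error.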

Escaping from control structures

In imperative languages we often find statements that escape from a control structure. These escapers are referred to as exceptions as well. E.g. in C/C++/Java break escapes for loops and return escapes functions and methods. Analogously, in Modula-3 EXIT escapes LOOPs and RETURN escapes PROCEDUREs. The analogy between these statements and the explicit exception handling mechanism of the particular language is also helpful for describing the interaction between these statements and the handling of regular exceptions. E.g. what exception handlers and resource deallocators shall be run when you leave a loop or function using break? Analogously, exceptions can also be used to escape from custom control structures (yeah, higher order functions are also possible in imperative languages) or deep recursive searches. In imperative languages exceptions are often implemented in a way that is especially efficient when deep recursions have to be aborted.

You might debate intensively about whether using exceptions for escaping control structures is abuse of exceptions or not. At least escaping from control structures is more exception than error. Escaping from a control structure is just the irregular case with respect to the regular case of looping/descending in recursion. In Haskell, when you use exception monads like Control.Monad.Exception.Synchronous or Control.Monad.Error, exceptions are just an automated handling of return codes.
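
To illustrate the last point without depending on any of the packages named above, here is a sketch that uses the plain Either monad from base. The exception type ParseException and the functions parseAge and sumAges are hypothetical. Each Left plays the role of a return code, and the monad checks it after every step, so the traversal is left as soon as the first exception occurs.

```haskell
-- Hypothetical exception type: expected, irregular outcomes of
-- parsing user input.
data ParseException
  = EmptyInput
  | NotANumber String
  | Negative Int
  deriving (Eq, Show)

-- Parses a non-negative integer or reports why that was not possible.
parseAge :: String -> Either ParseException Int
parseAge "" = Left EmptyInput
parseAge s = case reads s of
  [(n, "")]
    | n < 0     -> Left (Negative n)
    | otherwise -> Right n
  _ -> Left (NotANumber s)

-- 'mapM' in the Either monad stops at the first 'Left'; the remaining
-- inputs are not parsed. No explicit check of a return code appears.
sumAges :: [String] -> Either ParseException Int
sumAges inputs = do
  ages <- mapM parseAge inputs
  pure (sum ages)
```

For the input ["1", "x", "-2"] the result is Left (NotANumber "x"): the third element is never inspected, which is the escape from the control structure.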