Performance/Laziness

Haskell Performance Resource Constructs: Data Types - Functions Overloading - FFI - Arrays Strings - Integers - I/O Floating point - Concurrency Modules - Monads Techniques: Strictness - Laziness Avoiding space leaks Accumulating parameter Implementation-Specific: GHC - nhc98 - Hugs Yhc - JHC

Laziness: Procrastinating for Fun & Profit

To look at how laziness works in Haskell, and how to make it do efficient work, we'll implement a merge sort function. It will have the type:

 merge_sort :: (Ord a) => [a] -> [a]

We'll also need a function to split the list in two, I'll call this cleaving, and it will look like this:

 cleave :: [a] -> ([a],[a])

Let's start by implementing the cleaving function. The conventional way to split a list in merge sort is to take the first half elements off the front, and the remaining half elements after that. The problem is that finding the length of a list means additional traversing of the whole list. So instead, we'll take pairs of elements off the front. Define two functions:

 evens []  = []
 evens [x] = [x]
 evens (x:_:xs) = x : evens xs

 odds []  = []
 odds [x] = []
 odds (_:x:xs) = x : odds xs

and use them to implement cleave:

 cleave xs = (evens xs, odds xs)

Experience in a strictly evaluation language like SML or Objective CAML may lead you to write alternate versions using an accumulating parameter. Assuming that reversing the order of the elements doesn't matter, you could use this function to split the list into even and odd elements and implement the cleave function as follows:

 cleave = cleave' ([],[]) where
     cleave' (eacc,oacc) [] = (eacc,oacc)
     cleave' (eacc,oacc) [x] = (x:eacc,oacc)
     cleave' (eacc,oacc) (x:x':xs) = cleave' (x:eacc,x':oacc) xs

This initially appears to be a better implementation. It's tail recursive, and by either strictness analysis or explicitly making the accumulating parameters strict, it won't blow the stack up... Believe it or not, our first implementation was better.

Why? In order to produce the first element of either list, the second version needs to process the entire list. In a non-strict language, we could encounter an infinite list, and we'd like our function to work nicely on them. Consider the effect of:

 head . fst $ cleave [0..10000000]

With our first definition, we'll get $0$ in constant time. But with our second, we'll get it in $O (n)$ time, and our calculation will diverge on an infinite list like [0..].

Let's look at how evens works and how lists are represented in Haskell. Lists are represented as either an empty list, or a "cons" cell that consists of an element and the remaining list. In pseudo-Haskell, we might write:

 data [a] = [] | a : [a]

In a lazy language, an expression is only evaluated when needed, and (:), a.k.a. "cons", is a lazy data constructor. The machinery used to implement this is called a thunk. It's essentially a value with two possible states: either a computed value, or the process to compute that value. When we assign a value in Haskell, we create a thunk with the instructions to compute the value of the expression we've assigned. When this thunk is forced, these instructions are used to compute a value which is stored in the thunk. The next time the value is required, this computed value is retrieved. Lazyness can be implemented in languages like SML using this method together with mutable references.

So in the recursive case of evens, we produce a thunk that contains a list cons cell. This cons cell contains two thunks, one of the element value, and one of the rest of the list. The thunk for the element is taken from the list the function is operating on, and the thunk for the rest of the list consists of instructions to compute the rest of the list using evens. We'd say that evens and odds are lazy in their input: they consume only enough value to produce the value. As an example of how lazy functions work, consider:

 head ( 5 : undefined )
 tail [undefined, 5]
 evens (5 : undefined : 3 : undefined : 1 : undefined : [])
 take 3 $ evens (5 : undefined : 3 : undefined : 1 : undefined : undefined)

Despite how all the inputs contain partially undefined values, all the values of the function applications are valid values. A lazy function will only diverge when a required value for its computation diverges. If you're wondering why we have two undefineds at the end of the list, recall how evens was implemented. We need two undefineds to make sure the third case is selected: the one with two elements followed by a remainder list (though undefined, but needed because of the take 3 call). Having only one undefined means that pattern matching (1:undefined) against either (x:[]) or (x:(_:xs)) (to choose between the 2nd and 3rd cases) will fail.

Now let's look at what happens with lazy evaluation and diverging values. Consider:

 tail (5 : undefined)
 head [undefined, 5]
 odds (5 : undefined : 3 : undefined : 1 : undefined : [])
 take 4 $ evens ( 5 : undefined : 3 : undefined : 1 : undefined : undefined)

So the application of evens to a non-trivial list results in a thunk being returned immediately. And when we ask for the first element of the list evens produces, we only evaluate the value thunk as much as needed for it to produce its first cons cell, of which the tail is itself a thunk too. This is why we can apply evens or odds (and cleave for that matter) to an infinite list. We'll implement merge_sort using cleave:

 merge_sort []  = []
 merge_sort [x] = [x]
 merge_sort lst = let (e,o) = cleave lst 
                  in merge (merge_sort e) (merge_sort o) where
     merge :: (Ord a) => [a] -> [a] -> [a]
     merge xs [] = xs
     merge [] ys = ys
     merge xs@(x:t) ys@(y:u)
         | x <= y    = x : merge t ys
         | otherwise = y : merge xs u

You can see that this function isn't lazy. It begins by cleaving the list recursively until it is left with trivial lists, i.e. the lists with zero or one elements. These are obviously already sorted. It then uses the nested function merge, which combines two ordered lists while preserves the order. (this function doesn't have to be nested as it does not refer to anything in the outside scope) The act of partitioning the list into trivial lists before the reassembly can begin means that the entire list needs to be accessed before we can begin merging and assembling the output lists on each recursion level. In this case, we can't make a lazier solution, one that would work on an infinite list. This shouldn't be surprising, as in sorting a list, the first element out should be the least (or greatest) of all. We say that merge_sort is strict in the array to be sorted: if an infinite list is supplied, the computation will diverge - no output will ever be produced. There are some operations that cannot be done lazily, for instance, sorting a list.

But lazyness still comes into play with the definitions like min xs = head . merge_sort $ xs. In finding the minimal element this way only the necessary amount of comparisons between elements will be performed ( $O (n)$ a.o.t. $O (n \log n)$ comparisons needed to fully sort the whole list).

We've seen the difference between a lazy function and a strict function. Lazy computing has two major appeals. The first is that only the necessary work is done to compute a value. The second is that we can operate in the presence of infinite and undefined data structures, as long as we don't examine the undefined parts or try to process the infinity of values.