Lemming: link to lazy pattern match

2012-09-07T07:39:04Z

Lemming: show five implementations of partitionEithers and their pros and cons

2012-09-07T07:31:22Z

Traversing a list is sometimes more difficult
than it seems at first glance.
By "traversal" I mean consuming one or more lists
and producing one or more new ones.
Our goal is to do this efficiently and lazily.

As a running example I use the <hask>partitionEithers</hask> function,
which has been part of the <hask>Data.Either</hask> module
since <code>base-4.0</code>.

Its type signature is
<haskell>
partitionEithers :: [Either a b] -> ([a], [b])
</haskell>
and it does what you expect:
<haskell>
Prelude Data.Either> partitionEithers [Left 'a', Right False, Left 'z']
("az",[False])
Prelude Data.Either> take 100 $ snd $ partitionEithers $ cycle [Left 'a', Right (0 :: Int)]
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
</haskell>

The second example is especially important
because it shows that the input can be infinitely long
and so can the output.
It demonstrates that the implementation is lazy:
the first 100 elements of the second list are produced
without consuming the whole input.
We will use this example as a test for our implementations below.
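To run the same check against every implementation, the second example can be wrapped in a small helper. This is only a sketch: the name <hask>lazinessTest</hask> is hypothetical and not part of <code>base</code>; only <hask>partitionEithers</hask> itself comes from <hask>Data.Either</hask>.

```haskell
import Data.Either (partitionEithers)

-- Hypothetical helper (not from any library): feed an infinite input to a
-- partitionEithers-like function and demand only the first 100 elements of
-- the second result list. It terminates only for a sufficiently lazy
-- implementation.
lazinessTest :: ([Either Char Int] -> ([Char], [Int])) -> [Int]
lazinessTest f = take 100 $ snd $ f $ cycle [Left 'a', Right 0]
```

For a lazy implementation <hask>lazinessTest partitionEithers</hask> evaluates to a list of 100 zeros; for a non-lazy one it never returns.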

== First attempt - quadratic runtime, not lazy ==

In our first attempt we maintain a state of two lists
that we extend step by step until they become the result lists.

<haskell>
partitionEithers2 :: [Either a b] -> ([a], [b])
partitionEithers2 =
   -- each (++) walks over the whole list accumulated so far
   let aux ab [] = ab
       aux (as, bs) (Left a : es) = aux (as ++ [a], bs) es
       aux (as, bs) (Right b : es) = aux (as, bs ++ [b]) es
   in  aux ([], [])
</haskell>

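This implementation works for finite lists
but never returns anything for infinite ones,
because <hask>aux</hask> delivers its result only after reaching the end of the input.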
You will also notice that it is quite slow.
The reason is that appending something to a list like <hask>as</hask>
requires <hask>length as</hask> steps
in order to reach the end of <hask>as</hask>.
Since we do this repeatedly we end up with quadratic runtime.
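To make the quadratic bound concrete, suppose the input consists of ''n'' <hask>Left</hask> values (so everything ends up in one result list) and that appending to a list of length ''k'' takes ''k'' steps. The ''k''-th call of <hask>++</hask> then walks over the ''k''-1 elements collected so far, which adds up to:

:<math>\sum_{k=1}^{n} (k-1) = \frac{n(n-1)}{2} \in O(n^2)</math>

For ''n'' = 1000 this is already 499500 traversal steps.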

== Second attempt - linear runtime, still not lazy ==

We have learned that appending something to a list is expensive.
However, prepending a single element is very cheap:
it needs only a constant number of operations.
Thus we will implement the following idea:
We prepend new elements to the result list
and since this reverses the order of elements,
we reverse the result lists in the end.
<haskell>
partitionEithers1 :: [Either a b] -> ([a], [b])
partitionEithers1 xs =
   -- accumulate in reverse order, then reverse both lists once at the end
   let aux ab [] = ab
       aux (as, bs) (Left a : es) = aux (a : as, bs) es
       aux (as, bs) (Right b : es) = aux (as, b : bs) es
       (ys, zs) = aux ([], []) xs
   in  (reverse ys, reverse zs)
</haskell>
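This implementation is much faster than the first one,
but it cannot be lazy:
<hask>aux</hask> returns only after it has consumed the whole input,
and <hask>reverse</hask> cannot emit the first element of its result
before it has reached the end of its argument.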
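The missing laziness can also be observed without an infinite loop, using an input whose tail is <hask>undefined</hask>: a lazy implementation can return the first <hask>Left</hask> value without looking at the tail, whereas <hask>partitionEithers1</hask> has to reach the end of the list first. The sketch below repeats <hask>partitionEithers1</hask> from above; the helper <hask>firstLeft</hask> is hypothetical and relies on <hask>try</hask> and <hask>evaluate</hask> from <hask>Control.Exception</hask>.

```haskell
import Control.Exception (SomeException, evaluate, try)
import Data.Either (partitionEithers)
import Data.Maybe (listToMaybe)

-- Second attempt, copied from above.
partitionEithers1 :: [Either a b] -> ([a], [b])
partitionEithers1 xs =
   let aux ab [] = ab
       aux (as, bs) (Left a : es) = aux (a : as, bs) es
       aux (as, bs) (Right b : es) = aux (as, b : bs) es
       (ys, zs) = aux ([], []) xs
   in  (reverse ys, reverse zs)

-- Hypothetical helper: the first element of the first result list for an
-- input whose tail is undefined, or Nothing if evaluation throws.
firstLeft :: ([Either Char Int] -> ([Char], [Int])) -> IO (Maybe Char)
firstLeft f = do
   r <- try (evaluate (listToMaybe (fst (f (Left 'a' : undefined)))))
          :: IO (Either SomeException (Maybe Char))
   return (either (const Nothing) id r)
```

With this, <hask>firstLeft partitionEithers</hask> should yield <hask>Just 'a'</hask>, while <hask>firstLeft partitionEithers1</hask> yields <hask>Nothing</hask> because <hask>aux</hask> runs into the <hask>undefined</hask> tail.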

== Third attempt - linear runtime and full laziness ==

In order to get linear runtime and full laziness
we must produce the result lists in the same order as the input.
However, we must avoid appending to the end of a list.
Instead we must prepend elements to lists that become known only in the future.
We must be very careful that the leading elements of the result lists
can be generated without touching the subsequent elements of the input.
Here is the solution:
<haskell>
partitionEithers :: [Either a b] -> ([a], [b])
partitionEithers [] = ([], [])
-- the let binds the pair lazily, so the recursive call runs only on demand
partitionEithers (Left a : es) =
   let (as, bs) = partitionEithers es
   in  (a:as, bs)
partitionEithers (Right b : es) =
   let (as, bs) = partitionEithers es
   in  (as, b:bs)
</haskell>
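As a check, the third attempt can be fed the infinite input from the introduction, this time demanding prefixes of both result lists. The sketch copies the definition from above; <hask>infiniteInput</hask>, <hask>firstRights</hask> and <hask>firstLefts</hask> are hypothetical names.

```haskell
-- Third attempt, copied from above.
partitionEithers :: [Either a b] -> ([a], [b])
partitionEithers [] = ([], [])
partitionEithers (Left a : es) =
   let (as, bs) = partitionEithers es
   in  (a:as, bs)
partitionEithers (Right b : es) =
   let (as, bs) = partitionEithers es
   in  (as, b:bs)

-- Infinite input from the introduction; only finite prefixes are demanded.
infiniteInput :: [Either Char Int]
infiniteInput = cycle [Left 'a', Right 0]

firstRights :: [Int]
firstRights = take 100 (snd (partitionEithers infiniteInput))

firstLefts :: String
firstLefts = take 5 (fst (partitionEithers infiniteInput))
```

Both <hask>firstRights</hask> and <hask>firstLefts</hask> terminate, giving 100 zeros and <hask>"aaaaa"</hask>.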
It is crucial to know that a <hask>let</hask> binding
matches the top-most data constructor lazily.
The following expressions would match strictly and thus would not work on infinite input:
<haskell>
(\(as,bs) -> (a:as, bs)) $ partitionEithers es
</haskell>
<haskell>
case partitionEithers es of (as,bs) -> (a:as, bs)
</haskell>
Matching the pair constructor strictly means
that the recursive call to <hask>partitionEithers</hask> is triggered
before the pair constructor of the result is generated.
This starts a cascade that forces all recursive calls
until the end of the input list.
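The difference between the strict and the lazy match can be made visible with an input whose tail is <hask>undefined</hask>. This sketch is for illustration only: <hask>partitionEithersStrict</hask>, <hask>partitionEithersLet</hask> and the helper <hask>firstLeft</hask> are hypothetical names.

```haskell
import Control.Exception (SomeException, evaluate, try)
import Data.Maybe (listToMaybe)

-- Matches the pair strictly with 'case', as in the failing expressions above.
partitionEithersStrict :: [Either a b] -> ([a], [b])
partitionEithersStrict [] = ([], [])
partitionEithersStrict (Left a : es) =
   case partitionEithersStrict es of (as, bs) -> (a:as, bs)
partitionEithersStrict (Right b : es) =
   case partitionEithersStrict es of (as, bs) -> (as, b:bs)

-- Matches the pair lazily with 'let', as in the third attempt.
partitionEithersLet :: [Either a b] -> ([a], [b])
partitionEithersLet [] = ([], [])
partitionEithersLet (Left a : es) =
   let (as, bs) = partitionEithersLet es in (a:as, bs)
partitionEithersLet (Right b : es) =
   let (as, bs) = partitionEithersLet es in (as, b:bs)

-- Hypothetical helper: the first element of the first result list for an
-- input whose tail is undefined, or Nothing if evaluation throws.
firstLeft :: ([Either Char Int] -> ([Char], [Int])) -> IO (Maybe Char)
firstLeft f = do
   r <- try (evaluate (listToMaybe (fst (f (Left 'a' : undefined)))))
          :: IO (Either SomeException (Maybe Char))
   return (either (const Nothing) id r)
```

<hask>firstLeft partitionEithersLet</hask> should give <hask>Just 'a'</hask>, whereas <hask>firstLeft partitionEithersStrict</hask> gives <hask>Nothing</hask>, because the strict <hask>case</hask> evaluates the recursive call on the <hask>undefined</hask> tail before producing any pair.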

This is different for [[lazy pattern match]]es.
The above <hask>let</hask> is equivalent to each of the following:
<haskell>
let ~(as,bs) = partitionEithers es
in (a:as, bs)
</haskell>
<haskell>
(\ ~(as,bs) -> (a:as, bs)) $ partitionEithers es
</haskell>
<haskell>
case partitionEithers es of ~(as,bs) -> (a:as, bs)
</haskell>
or, without the syntactic sugar of the tilde:
<haskell>
case partitionEithers es of ab -> (a : fst ab, snd ab)
</haskell>
Of course, both <hask>fst</hask> and <hask>snd</hask>
contain strict pattern matches on the pair constructor,
but the key difference from the strict versions above is
that these matches happen inside the pair constructor of
<hask>(a : fst ab, snd ab)</hask>.
That is, the outer pair constructor can be generated
before the evaluation of <hask>ab</hask> is started.
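The desugared form is enough for a complete implementation that needs no tilde at all. The name <hask>partitionEithersFstSnd</hask> is hypothetical; the sketch should behave like the third attempt, including on infinite input.

```haskell
-- Hypothetical variant of the third attempt using the desugared form:
-- a 'case' with a plain variable pattern does not force the recursive call,
-- and fst/snd inspect the pair only when an element is demanded.
partitionEithersFstSnd :: [Either a b] -> ([a], [b])
partitionEithersFstSnd [] = ([], [])
partitionEithersFstSnd (Left a : es) =
   case partitionEithersFstSnd es of ab -> (a : fst ab, snd ab)
partitionEithersFstSnd (Right b : es) =
   case partitionEithersFstSnd es of ab -> (fst ab, b : snd ab)
```

Applied to <hask>cycle [Left 'a', Right (0 :: Int)]</hask>, taking 100 elements of the second list terminates just as it does for the library function.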

== Fourth attempt - expert solution ==

Now real experts would not recurse manually
but would let <hask>foldr</hask> do this job.
This allows for [[fusion]].
Additionally, real experts would add the line
<hask>(\ ~(as,bs) -> (as,bs))</hask>
in order to generate the pair constructor of the result
completely independently of the input.
This yields maximum laziness:
the result is a pair even if the input list is undefined.
<haskell>
partitionEithersFoldr :: [Either a b] -> ([a], [b])
partitionEithersFoldr =
   -- the outer lazy match creates the result pair without inspecting the input
   (\ ~(as,bs) -> (as,bs)) .
   foldr
      (\e ~(as,bs) ->
         case e of
            Left  a -> (a:as, bs)
            Right b -> (as, b:bs))
      ([], [])
</haskell>
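The effect of the extra <hask>(\ ~(as,bs) -> (as,bs))</hask> step can be checked by applying a function to an entirely undefined input and only asking whether the result is a pair. The sketch below copies <hask>partitionEithersFoldr</hask> and compares it with the same <hask>foldr</hask> without that step; <hask>partitionEithersNoWrap</hask> and <hask>isPairWithoutInput</hask> are hypothetical names.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Fourth attempt, copied from above.
partitionEithersFoldr :: [Either a b] -> ([a], [b])
partitionEithersFoldr =
   (\ ~(as, bs) -> (as, bs)) .
   foldr
      (\e ~(as, bs) ->
         case e of
            Left  a -> (a:as, bs)
            Right b -> (as, b:bs))
      ([], [])

-- Hypothetical comparison: the same fold without the outer lazy match.
partitionEithersNoWrap :: [Either a b] -> ([a], [b])
partitionEithersNoWrap =
   foldr
      (\e ~(as, bs) ->
         case e of
            Left  a -> (a:as, bs)
            Right b -> (as, b:bs))
      ([], [])

-- Hypothetical helper: True if the result is a pair constructor even when
-- the whole input list is undefined, False if evaluation throws.
isPairWithoutInput :: ([Either Char Int] -> ([Char], [Int])) -> IO Bool
isPairWithoutInput f = do
   r <- try (evaluate (case f undefined of (_, _) -> ()))
          :: IO (Either SomeException ())
   return (either (const False) (const True) r)
```

<hask>isPairWithoutInput partitionEithersFoldr</hask> should be <hask>True</hask>, while <hask>isPairWithoutInput partitionEithersNoWrap</hask> is <hask>False</hask>, because <hask>foldr</hask> has to inspect the input list before it returns anything.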

== Fifth attempt - your solution ==

If you are tired of all these corner cases
that we need to respect in order to get full laziness,
then you might prefer to solve the problem
by just combining functions that are known to be lazy.
It is good style anyway to avoid explicit recursion.
Of course, when combining lazy functions
you must still take care that the combinators maintain laziness.
Thus my exercise for you at the end of this article
is to implement <hask>partitionEithers</hask> using standard functions,
say, from <code>base</code> before version 4.
A small hint: the module <hask>Data.Maybe</hask>
turns out to be very useful.

[[Category:Idioms]]
