Jump to content
Main menu
Main menu
move to sidebar
hide
Navigation
Haskell
Wiki community
Recent changes
Random page
HaskellWiki
Search
Search
Create account
Log in
Personal tools
Create account
Log in
Pages for logged out editors
learn more
Contributions
Talk
Editing
Maintaining laziness
(section)
Page
Discussion
English
Read
Edit
View history
Tools
Tools
move to sidebar
hide
Actions
Read
Edit
View history
General
What links here
Related changes
Special pages
Page information
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
== Laziness breakers == === Maybe, Either, Exceptions === Some laziness breakers are visible in type signatures: <haskell> decodeUTF8 :: [Word8] -> Either Message String </haskell> The <hask>Either</hask> type signals that the function marks decoding-failure by using the <hask>Left</hask> constructor of <hask>Either</hask>. This function cannot be lazy, because when you access the first character of the result, it must already be computed, whether the result is <hask>Left</hask> or <hask>Right</hask>. For this decision, the complete input must be decoded. A better type signature is <haskell> decodeUTF8 :: [Word8] -> (Maybe Message, String) </haskell> where the <hask>String</hask> contains as much characters as could be decoded and <hask>Maybe Message</hask> gives the reason for the stop of the decoding. <hask>Nothing</hask> means the input was completely read, <hask>Just msg</hask> means the decoding was aborted for the reason described in <hask>msg</hask>. If you touch the first element of the pair, the complete decodings is triggered, thus laziness is broken. This means you should first process the <hask>String</hask> and look at <hask>Maybe Message</hask> afterwards. Instead of the unspecific pair type you should use the special type for asynchronous exceptions as found in the [http://hackage.haskell.org/cgi-bin/hackage-scripts/package/explicit-exception explicit exception] package. Especially in parsers you may find a function, called Wadler's force function. It works as follows: <haskell> force y = let Just x = y in Just x </haskell> It looks like a complicated expression for <hask>y</hask> with an added danger of failing unrecoverably when <hask>y</hask> is not <hask>Just</hask>. Its purpose is to use the lazy pattern matching of <hask>let</hask> and to show to the runtime system, that we expect that <hask>y</hask> is always a <hask>Just</hask>. Then the runtime system does not need to wait until it can determine the right constructor but it can proceed immediately. This way, a function can be made lazy, also if it returns <hask>Maybe</hask>. It can however fail, if later it turns out, that <hask>y</hask> is actually <hask>Nothing</hask>. <!-- fail how? To be lazy? Or it is some hideous failure like 'head []'? --> Using force-like functions is sometimes necessary, but should be avoided for data types with more than one constructor. It is better to use an interim data type with one constructor and lift to the multi-constructor datatype when needed. Consider parsers of type <hask>StateT [Word8] Maybe a</hask>. Now consider the parser combinator <haskell>many :: StateT [Word8] Maybe a -> StateT [Word8] Maybe [a]</haskell> which parses as many elements of type <hask>a</hask> as possible. It shall be lazy and thus must be infallible and must not use the <hask>Maybe</hask>. It shall just return an empty list, if parsing of one element fails. A quick hack would be to define <hask>many</hask> using a <hask>force</hask> function. It would be better to show by the type, that <hask>many</hask> cannot fail: <haskell>many :: StateT [Word8] Maybe a -> StateT [Word8] Identity [a]</haskell>. === Early decision === ==== List construction ==== Be aware that the following two expressions are not equivalent. <haskell> -- less lazy if b then f x else f y -- more lazy f (if b then x else y) </haskell> It is <hask>if undefined then f x else f y</hask> is <hask>undefined</hask>, whereas <hask>f (if b then x else y)</hask> is <hask>f undefined</hask>, which is a difference in [[non-strict semantics]]. Consider e.g. <hask>if b then 'a':x else 'a':y</hask>. It is common source of too much strictness to make decisions too early and thus duplicate code in the decision branches. Intuitively spoken, the bad thing about [[code duplication]] (stylistic questions put aside) is, that the run-time system cannot see that in the branches, some things are equal and do it in common before the critical decision. Actually, the compiler and run-time system could be "improved" to do so, but in order to keep things predictable, they do not do so. Even more, this behaviour is required by theory, since by pushing decisions to the inner of an expression you change the semantics of the expression. So we return to the question, what the programmer actually wants. Now, do you think this expression <haskell> if b then [x] else y:ys </haskell> is maximally lazy? It seems so, but actually it is not. In both branches we create non-empty lists, but the run-time system cannot see this. It is <hask>null (if undefined then [x] else y:ys)</hask> again <hask>undefined</hask>, but we like to have it evaluated to <hask>False</hask>. Here we need lazy pattern matching as provided by <hask>let</hask>. <haskell> let z:zs = if b then [x] else y:ys in z:zs </haskell> This expression always returns the constructor <hask>(:)</hask> and thus <hask>null</hask> knows that the list is not empty. However, this is a little bit unsafe, because the <hask>let z:zs</hask> may fail if in the branches of <hask>if</hask> there is an empty list. This error can only caught at run-time which is bad. We can avoid it using the single constructor pair type. <haskell> let (z,zs) = if b then (x,[]) else (y,ys) in z:zs </haskell> which can be abbreviated to <haskell> uncurry (:) (if b then (x,[]) else (y,ys)) </haskell> Another example is the <hask>inits</hask> function. In the Haskell 98 report the implementation <haskell> inits :: [a] -> [[a]] inits [] = [[]] inits (x:xs) = [[]] ++ map (x:) (inits xs) </haskell> is suggested. However you find that <hask>inits undefined</hask> is undefined, although <hask>inits</hask> always should return the empty list as first element. The following implementation does exactly this: <haskell> inits :: [a] -> [[a]] inits xt = [] : case xt of [] -> [] x:xs -> map (x:) (inits xs) </haskell> See also the article on [[base cases and identities]]. ==== Reader-Writer-State monad ==== I do not know whether the following example can be simplified. In this form it occurred in a real application, namely the HTTP package. Consider the following action of the <hask>Control.Monad.RWS</hask> which fetches a certain number of elements from a list. The state of the monad is the input list we fetch the elements from. The reader part provides an element which means that the input is consumed. It is returned as singleton when the caller tries to read from a completely read input. The writer allows to log some information, however the considered action does not output anything to the log. <haskell> getN :: Int -> RWS a [Int] [a] [a] getN n = do input <- get if null input then asks (:[]) else let (fetched,rest) = splitAt n input in put rest >> return fetched </haskell> As we learned as good imperative programmers, we only call <hask>splitAt</hask> when the input is non-empty, that is, only if there is something to fetch. This works in even more many corner cases, but not in the following one. Although <hask>getN</hask> does obviously not log something (i.e. it does not call <hask>tell</hask>), it requires to read the input in order to find out, that nothing was logged: <haskell> *Test> (\(_a,_s,w) -> w) $ runRWS (getN 5) '\n' undefined *** Exception: Prelude.undefined </haskell> The problem is again, that <hask>if</hask> checks the emptiness of the input, which is undefined, since the input is undefined. Thus we must ensure, that the invoked monadic actions are run independent from the input. Only this way, the run-time system can see that the logging stream is never touched. We start refactoring by calling <hask>put</hask> independently from <hask>input</hask>'s content. It works as well for empty lists, since <hask>splitAt</hask> will just return empty lists in this case. <haskell> getN :: Int -> RWS a [Int] [a] [a] getN n = do input <- get let (fetched,rest) = splitAt n input put rest if null input then asks (:[]) else return fetched </haskell> This doesn't resolve the problem. There is still a choice between <hask>asks</hask> and <hask>return</hask>. We have to pull out <hask>ask</hask> as well. <haskell> getN :: Int -> RWS a [Int] [a] [a] getN n = do input <- get let (fetched,rest) = splitAt n input put rest endOfInput <- ask return $ if null input then [endOfInput] else fetched </haskell> Now things work as expected: <haskell> *Test> (\(_a,_s,w) -> w) $ runRWS (getN 5) '\n' undefined [] </haskell> We learn from this example, that sometimes in Haskell it is more efficient to call functions that are not needed under some circumstances. Always remind, that the [[Do notation considered harmful|do notation]] looks only imperative, but it is not imperative. E.g., <hask>endOfInput</hask> is only evaluated if the end of the input is really reached. Thus, the call <hask>ask</hask> does not mean that there is actually an action performed between <hask>put</hask> and <hask>return</hask>. === Strict pattern matching in a recursion === Consider the <hask>partition</hask> function which sorts elements, that match a predicate, into one list and the non-matching elements into another list. This function should also work on infinite lists, but the implementation shipped with GHC up to 6.2 [http://www.haskell.org/pipermail/libraries/2004-October/002645.html failed on infinite lists]. What happened? The reason was that pattern matching was too strict. Let's first consider the following correct implementation: <haskell> partition :: (a -> Bool) -> [a] -> ([a], [a]) partition p = foldr (\x ~(y,z) -> if p x then (x : y, z) else (y, x : z)) ([],[]) </haskell> The usage of <hask>foldr</hask> seems to be reserved for advanced programmers. Formally <hask>foldr</hask> runs from the end to the start of the list. However, how can this work if there is a list without an end? That can be seen when applying the definition of <hask>foldr</hask>. <haskell> foldr :: (a -> b -> b) -> b -> [a] -> b foldr _ b [] = b foldr f b (a:as) = f a (foldr f b as) </haskell> Now we expand this once for an infinite input list, we get <haskell> partition p (a:as) = (\ ~(y,z) -> if p a then (a:y, z) else (y, a:z)) (foldr ... ([],[]) as) </haskell> We see that the whether <hask>a</hask> is prepended to the first or the second list, does only depend on <hask>p a</hask>, and neither on <hask>y</hask> nor on <hask>z</hask>. The laziness annotation <hask>~</hask> is crucial, since it tells, intuitively spoken, that we can rely on the recursive call of <hask>foldr</hask> to return a pair and not <hask>undefined</hask>. Omitting it, would require the evaluation of the whole input list before the first output element can be determined. This fails for infinite lists and is inefficient for finite lists, and that was the bug in former implementations of <hask>partition</hask>. Btw. by the expansion you also see, that it would not help to omit the tilde and apply the above 'force' trick to the 'if-then-else' expression. However, there still remains a small laziness break: There is an unnecessary decision between the pair constructor of the initial accumulator value <hask>([],[])</hask> and the pair constructors within the <hask>if</hask>. This can only be avoided by applying a <hask>force</hask> function to the result of <hask>foldr</hask>: <haskell> partition :: (a -> Bool) -> [a] -> ([a], [a]) partition p = (\ ~(ys,zs) -> (ys,zs)) . foldr (\x ~(y,z) -> if p x then (x : y, z) else (y, x : z)) ([],[]) </haskell> === List reversal === Any use of the list function <hask>reverse</hask> should alert you, since when you access the first element of a reversed list, then all nodes of the input list must be evaluated and stored in memory. Think twice whether it is really needed. The articles on [[Infinity and efficiency]] and [[List traversal]] show how to avoid list reversal.
Summary:
Please note that all contributions to HaskellWiki are considered to be released under simple permissive license (see
HaskellWiki:Copyrights
for details). If you don't want your writing to be edited mercilessly and redistributed at will, then don't submit it here.
You are also promising us that you wrote this yourself, or copied it from a public domain or similar free resource.
DO NOT SUBMIT COPYRIGHTED WORK WITHOUT PERMISSION!
Cancel
Editing help
(opens in new window)
Toggle limited content width