https://wiki.haskell.org/api.php?action=feedcontributions&user=Dylex&feedformat=atomHaskellWiki - User contributions [en]2021-03-01T13:57:39ZUser contributionsMediaWiki 1.27.4https://wiki.haskell.org/index.php?title=Stack_overflow&diff=16789Stack overflow2007-11-11T14:14:58Z<p>Dylex: avoid some unintended links</p>
<hr />
<div>[[Category:Tutorials]]<br />
<br />
== Folds ==<br />
<br />
First, read [[Performance/Accumulating parameter]]. If you are not writing your code tail-recursively, then that is why you are getting stack overflows. However, as the bottom of that page suggests, making code tail-recursive in a lazy language is not quite the same as in a eager language. This page is more geared to the latter case using foldr/l as the prime culprit/example. As such [[Fold]] may be helpful, but isn't too critical. Also knowing what <hask>seq</hask> and <hask>($!)</hask> do as briefly covered in ForcingEagerEvaluation or in the [http://haskell.org/onlinereport/ Haskell Report] is necessary.<br />
<br />
The definitions of the three folds we'll be looking at are as follows,<br />
<haskell><br />
foldr f z [] = z<br />
foldr f z (x:xs) = f x (foldr f z xs)<br />
<br />
foldl f z [] = z<br />
foldl f z (x:xs) = foldl f (f z x) xs<br />
<br />
foldl' f z [] = z<br />
foldl' f z (x:xs) = (foldl' f $! f z x) xs<br />
<br />
foldl' (found in e.g. Data.List) is just a stricter version of foldl.<br />
</haskell><br />
The one-line summary for folds: if the binary operation is strict use foldl' otherwise use foldr.<br />
<br />
----<br />
<br />
Common newbie stack overflowing code:<br />
<haskell><br />
mysum :: [Integer] -> Integer<br />
mysum = foldr (+) 0<br />
<br />
main = print (mysum [1..1000000])<br />
</haskell><br />
If you've read [[Performance/Accumulating parameter]], you should immediately see the problem from the definition of foldr above. Quite simply, foldr isn't tail-recursive! But,<br />
<haskell><br />
concat xss = foldr (++) [[]] xss<br />
</haskell><br />
This is from the Haskell Report. Surely they know what they are doing! And sure enough,<br />
<haskell><br />
main = print (length (concat [[x] | x <- [1..1000000]]))<br />
</haskell><br />
works fine.<br />
<br />
Less common newbie stack overflowing code:<br />
<haskell><br />
mysum :: [Integer] -> Integer<br />
mysum = foldl (+) 0<br />
<br />
main = print (mysum [1..1000000])<br />
</haskell><br />
So what's going on here. Looking at the code for foldl, it looks tail-recursive. Well, much like you can see the problem with a non-tail-recursive factorial by unfolding a few iterations, let's do the same for our foldl definition of sum, but making sure to use a call-by-name/need evaluation order. Here is the unfolding,<br />
<haskell><br />
mysum [1..10] -><br />
foldl (+) 0 (1:[2..10]) -><br />
foldl (+) (0+1) (2:[3..10]) -><br />
foldl (+) (0+1+2) (3:[4..10]) -><br />
foldl (+) (0+1+2+3) (4:[5..10]) -> <nowiki>...</nowiki><br />
</haskell><br />
I think you get the idea. The problem is that we are building up a chain of thunks that will evaluate the sum instead of just maintaining a running sum. What we need to do is to force the addition before recursing. This is exactly what foldl' does.<br />
<br />
Just to check,<br />
<haskell><br />
mysum :: [Integer] -> Integer<br />
mysum = foldl' (+) 0<br />
<br />
main = print (mysum [1..1000000])<br />
</haskell><br />
works fine.<br />
<br />
Now let's go back to the foldr sum and concat. What's the difference between sum and concat that makes the sum definition wrong, but the concat definition right. Again, let's evaluate each by hand.<br />
<haskell><br />
mysum (+) 0 [1..10] -><br />
foldr (+) 0 (1:[2..10]) -><br />
1+foldr (+) 0 (2:[3..10]) -><br />
1+(2+foldr (+) 0 (3:[4..10])) -> <nowiki>...</nowiki><br />
</haskell><br />
Okay, no surprise there.<br />
<haskell><br />
concat [[1],[2],[3],<nowiki>...</nowiki>] -><br />
foldr (++) [[]] ([1]:[[2],[3],<nowiki>...</nowiki>]) -><br />
(1:[])++foldr (++) [[]] [[2],[3],<nowiki>...</nowiki>] -><br />
1:([]++foldr (++) [[]] [[2],[3],<nowiki>...</nowiki>])<br />
</haskell><br />
Notice that there is no '-> ...' at the end. That was the complete evaluation. There is no reason to do anything more unless we look at the more of the result. We may well GC the 1 before we look at the tail, and GC the first cons cell before we look at the second. So, concat runs in a constant amount of stack and further can handle infinite lists (as a note, it's immediately obvious foldl(') can never work on infinite lists because we'll always be in the (:) case and that always immediately recurses). The differentiator between mysum and concat is that (++) is not strict* in its second argument; we don't have to evaluate the rest of the foldr to know the beginning of concat. In mysum, since (+) is strict in its second argument we need the results of the whole foldr before we can compute the final result.<br />
<br />
So, we arrive at the one-line summary: A function strict in its second argument will always* require linear stack space with foldr, so foldl' should be used instead in that case. If the function is lazy/non-strict in its second argument we should use foldr to 1) support infinite lists and 2) to allow a streaming use of the input list where only part of it needs to be in memory at a time.<br />
<br />
Okay, both here and in the one-line summary, there is no mention of foldl. When should foldl be used? The pragmatic answer is: by and far it shouldn't be used. A case where it makes a difference is if the function is conditionally strict in its first argument depending on its second, where I use conditionally strict to mean a function that is strict or not in one argument depending on another argument(s). For an example, consider a definition of <hask>(*)</hask> that builds up ASTs of arithmetic expressions and incorporates a simplification <hask>(a*0 = 0 and then 0*a = 0)</hask>; then if <hask>product</hask> is defined by <hask>foldl (*) 1</hask>, <hask>product [</hask>&perp;<hask>,0]</hask> will terminate with 0 while a definition in terms of <hask>foldl'</hask> wouldn't. However, I can't think of a really convincing example. In most cases, foldl' is what you want.<br />
<br />
* A strict function is a function, f such that f &perp; = &perp;. Typically, we think of a function "being strict" in an argument as a function that "forces" its argument, but the above definition of strict should immediately suggest another function that is strict and doesn't "force" it's argument in the intuitive sense, namely id. ([]++) = id and therefore is a strict function. Sure enough, if you were to evaluate (concat (repeat [])) it would not terminate. As such (++) is a conditionally strict function. This also makes the "always" slightly imprecise, a function that is strict because it just returns it's argument will not use up stack space (but is, as I mentioned, still an issue for infinitely long lists).<br />
<br />
== Scans ==<br />
<br />
A subtle stack-overflow surprise comes when<br />
<haskell><br />
print (scanl (+) 0 [1..1000000])<br />
</haskell><br />
completes successfully but<br />
<haskell><br />
print (last (scanl (+) 0 [1..1000000]))<br />
</haskell><br />
causes a stack overflow.<br />
<br />
The latter stack overflow is explained exactly as before, namely,<br />
<haskell><br />
last (scanl (+) 0 [1..5]) -><br />
<nowiki>... several steps ...</nowiki> -><br />
((((0+1)+2)+3)+4)+5<br />
</haskell><br />
This is exactly like <hask>foldl</hask>, building a deep thunk, then evaluating, needing much stack.<br />
<br />
Most puzzling is why the former succeeds without a stack overflow. This is caused by a combination of two factors:<br />
* thunks in the list produced by <hask>scanl</hask> enjoy sharing: late thunks build upon early thunks<br />
* printing a list of numbers evaluates early thunks and then late thunks<br />
To exemplify, here is an abridged progression. I use this pseudo format to depict sharing of thunks<br />
<haskell><br />
expr where var=expr, var=expr<br />
</haskell><br />
although in reality it is more like a pointer graph.<br />
<haskell><br />
print (scanl (+) 0 [1..1000000]) -><br />
print (a : case [1..1000000] of <nowiki>...</nowiki> x:xs -> scanl (+) (a+x) xs) where a=0 -><br />
<nowiki>... evaluate a to 0 for printing, I/O, some more steps ...</nowiki> -><br />
<br />
print (scanl (+) (a+1) [2..1000000]) where a=0 -><br />
print (b : case [2..1000000] of <nowiki>...</nowiki> x:xs -> scanl (+) (b+x) xs) where a=0, b=a+1 -><br />
<nowiki>... evaluate b to 1 for printing, I/O, some more steps ...</nowiki> -><br />
<br />
print (scanl (+) (b+2) [3..1000000]) where b=1 -><br />
print (c : case [3..1000000] of <nowiki>...</nowiki> x:xs -> scanl (+) (c+x) xs) where b=1, c=b+2 -><br />
<nowiki>... evaluate c to 3 for printing, I/O, some more steps ...</nowiki> -><br />
<br />
print (scanl (+) (c+3) [4..1000000]) where c=3 -><br />
print (d : case [4..1000000] of <nowiki>...</nowiki> x:xs -> scanl (+) (d+x) xs) where c=3, d=c+3 -><br />
<nowiki>... evaluate d to 6 for printing, I/O, some more steps ...</nowiki> -><br />
<br />
print (scanl (+) (d+4) [5..1000000]) where d=6 -> etc.<br />
</haskell><br />
The important thing to watch is the life cycle of intermediate thunks, e.g., <hask>c</hask> is created at some point as a 1-level deep addition, then almost immediately reduced to a number out of necessity, before a later thunk <hask>d</hask> builds upon it. Therefore there is no growth and no stack overflow.<br />
<br />
In contrast, again, <hask>last (scanl (+) 0 [1..1000000])</hask> skips over to the last thunk right away. Since early items are not reduced yet, the last item remains a huge chain and causes overflow.</div>Dylexhttps://wiki.haskell.org/index.php?title=Performance/JHC&diff=16786Performance/JHC2007-11-11T13:46:45Z<p>Dylex: </p>
<hr />
<div>#REDIRECT [[Jhc]]</div>Dylexhttps://wiki.haskell.org/index.php?title=Performance&diff=16785Performance2007-11-11T13:29:56Z<p>Dylex: </p>
<hr />
<div>{{Performance infobox}}<br />
Welcome to the '''Haskell Performance Resource''', the collected wisdom on how to make your Haskell programs go faster. <br />
<br />
== Introduction ==<br />
<br />
In most cases it is possible to write a Haskell program that performs as well as, or better than, the same program written in [''insert language here'']. There's a big caveat though: you may have to modify your code significantly in order to improve its performance. Compilers such as GHC are good at eliminating layers of abstraction, but they aren't perfect, and often need some help. <br />
<br />
There are many non-invasive techniques: compiler options, for example. Then there are techniques that require adding some small amounts of performance cruft to your program: strictness annotations, for example. If you still don't get the best performance, though, it might be necessary to resort to larger refactorings.<br />
<br />
Sometimes the code tweaks required to get the best performance are non-portable, perhaps because they require language extensions that aren't implemented in all compilers (e.g. unboxing), or because they require using platform-specific features or libraries. This might not be acceptable in your setting.<br />
<br />
If the worst comes to the worst, you can always write your critical code in C and use the FFI to call it. Beware of the boundaries though - marshaling data across the FFI can be expensive, and multi-language memory management can be complex and error-prone. It's usually better to stick to Haskell if possible.<br />
<br />
== Basic techniques ==<br />
<br />
The key tool to use in making your Haskell program run faster is ''profiling''. Profiling is provided by [[GHC]] and [[nhc98]]. There is ''no substitute'' for finding where your program's time/space is ''really'' going, as opposed to where you imagine it is going.<br />
<br />
Another point to bear in mind: By far the best way to improve a program's performance ''dramatically'' is to use better algorithms. Once profiling has thrown the spotlight on the guilty time-consumer(s), it may be better to re-think your program than to try all the tweaks listed below.<br />
<br />
Another extremely efficient way to make your program snappy is to use library code that has been Seriously Tuned By Someone Else. You ''might'' be able to write a better sorting function than the one in <tt>Data.List</tt>, but it will take you much longer than typing <tt>import Data.List</tt>.<br />
<br />
We have chosen to organise the rest of this resource first by Haskell construct (data types, pattern matching, integers), and then within each category to describe techniques that apply across implementations, and also techniques that are specific to a certain Haskell implementation (e.g. GHC). There are some implementation-specific techniques that apply in general - those are linked from the [[Haskell Performance Resource#General_Implementation-Specific_Techniques | General Implementation-Specific Techniques]] section below.<br />
<br />
== Haskell constructs ==<br />
<br />
* [[/Data Types/]]<br />
* [[/Functions/]]<br />
* [[/Overloading/]]<br />
* [[/FFI/]]<br />
* [[/Arrays/]]<br />
* [[/Strings/]]<br />
* [[/Integers/]]<br />
* [[/IO | I/O ]]<br />
* [[/Floating Point/]]<br />
* [[/Concurrency/]]<br />
* [[/Modules/]]<br />
* [[/Monads/]]<br />
<br />
== General techniques ==<br />
<br />
* [[/Strictness/]]<br />
* [[/Laziness/]]<br />
* [[/Space | Avoiding space leaks]]<br />
* [[/Accumulating parameter|Accumulating parameters]]<br />
* [[Stack_overflow|Avoiding stack overflow]]<br />
<br />
== Compiler specific techniques ==<br />
<br />
* [[/GHC/]]<br />
* [[/NHC98| nhc98]]<br />
* [[/Hugs/]]<br />
* [[/Yhc/]]<br />
* [[/JHC/]]<br />
<br />
== More information ==<br />
<br />
* There are plenty of good examples of Haskell code written for performance in the [http://shootout.alioth.debian.org/ The Computer Language Shootout Benchmarks]<br />
* And many alternatives, with discussion, on the [http://web.archive.org/web/20060209215702/http://haskell.org/hawiki/ShootoutEntry old Haskell wiki]<br />
<br />
== Specific comparisons of data structures ==<br />
* Data.Sequence vs. lists<br />
<br />
Data.Sequence has complexity O(log(min(i,n-i))) for access, insertion and update to position i of a sequence of length n.<br />
<br />
List has complexity O(i).<br />
<br />
List is a non-trivial constant-factor faster for operations at the head (cons and head), making it a more efficient choice for stack-like and stream-like access patterns. Data.Sequence is faster for every other access pattern, such as queue and random access.<br />
<br />
== Additional Tips ==<br />
<br />
* Use strict returns ( return $! ...) unless you absolutely need them lazy.<br />
* foldl' over foldr unless you have to use foldr.<br />
* Profile, profile, profile - understand who is hanging on to the memory (+RTS -hc) and how it's being used (+RTS -hb).<br />
* Use +RTS -p to understand who's doing all the allocations and where your time is being spent.<br />
* Approach profiling like a science experiment - make one change, observe if anything is different, rollback and make another change - observer the change. Keep notes!<br />
<br />
<br />
[[Category:Idioms]]<br />
[[Category:Language]]<br />
[[Category:Performance|*]]</div>Dylexhttps://wiki.haskell.org/index.php?title=Record_access&diff=16560Record access2007-11-06T04:52:03Z<p>Dylex: added link to partial implementation</p>
<hr />
<div>Here some proposal for desugared fine functional record field access for HaskellTwo and above.<br />
<haskell><br />
{- |<br />
In Haskell 98 the name of a record field<br />
is automatically also the name of a function which gets the value<br />
of the according field.<br />
E.g. if we have<br />
@<br />
data Pair a b = Pair {first :: a, second :: b}<br />
@<br />
then<br />
@<br />
first :: Pair a b -> a<br />
second :: Pair a b -> b<br />
@<br />
However for setting or modifying a field value<br />
we need to use some syntactic sugar, which is often clumsy.<br />
@<br />
modifyFirst :: (a -> a) -> (Pair a b -> Pair a b)<br />
modifyFirst f r@(Pair {first=a}) = r{first = f a}<br />
@<br />
<br />
We propose to extend the meaning of the record field names<br />
to a function which allows setting, getting and modifying values easily.<br />
-}<br />
module RecordAccess where<br />
<br />
import Control.Monad.State (MonadState, StateT)<br />
import qualified Control.Monad.State as State<br />
import Data.Char (ord)<br />
<br />
{- |<br />
The access functions we propose, look very similar to those<br />
needed for List.mapAccumL (but parameter order is swapped) and State monad.<br />
They get the new value of the field and the record<br />
and return the old value of the field and the record with the updated field.<br />
-}<br />
type Accessor r a = a -> r -> (a, r)<br />
<br />
{- *<br />
Access helper functions,<br />
these are similar to State methods and should be in Prelude<br />
-}<br />
<br />
{- | Set the value of a field. -}<br />
set :: Accessor r a -> a -> r -> r<br />
set f x = snd . f x<br />
<br />
{- | Set many fields at once.<br />
<br />
This function could also be used for initialisation of record,<br />
if record value with undefined fields is provided.<br />
<br />
Drawback:<br />
Since all types in a list must have the same type,<br />
you can set only values of the same type.<br />
-}<br />
setMany :: [r -> (a, r)] -> r -> r<br />
setMany = flip (foldl (\x f -> snd (f x)))<br />
<br />
{- |<br />
This is a general function,<br />
but it is especially useful for setting many values of different type at once.<br />
-}<br />
compose :: [r -> r] -> r -> r<br />
compose = flip (foldl (flip id))<br />
<br />
{- | Get the value of a field. -}<br />
get :: Accessor r a -> r -> a<br />
get f = fst . f undefined<br />
<br />
{- | Transform the value of a field by a function. -}<br />
modify :: Accessor r a -> (a -> a) -> (r -> r)<br />
modify f g rOld =<br />
let (a,rNew) = f (g a) rOld<br />
in rNew<br />
<br />
<br />
{- *<br />
Access helper functions in a State monad.<br />
-}<br />
<br />
setState :: MonadState r m => Accessor r a -> a -> m ()<br />
setState f x = State.modify (set f x)<br />
<br />
getState :: MonadState r m => Accessor r a -> m a<br />
getState f = State.gets (get f)<br />
<br />
modifyState :: MonadState r m => Accessor r a -> (a -> a) -> m ()<br />
modifyState f g = State.modify (modify f g)<br />
<br />
<br />
<br />
{- * Reading records from streams -}<br />
<br />
class ReadBin a where<br />
readBin :: String -> Maybe (a, String)<br />
<br />
instance ReadBin Char where<br />
readBin (c:cs) = Just (c,cs)<br />
readBin _ = Nothing<br />
<br />
instance ReadBin Int where<br />
readBin (c0:c1:c2:c3:cs) =<br />
Just (foldl1 (\acc d -> acc*256+d) (map ord [c0,c1,c2,c3]), cs)<br />
readBin _ = Nothing<br />
<br />
type Parser r = (r, String) -> Maybe (r, String)<br />
<br />
readField :: ReadBin a => Accessor r a -> Parser r<br />
readField f (r,s) =<br />
do (x,s') <- readBin s<br />
return (set f x r, s')<br />
<br />
readRecord :: [Parser r] -> Parser r<br />
readRecord ps = flip (foldl (>>=)) ps . Just<br />
<br />
<br />
<br />
{- * Example accessors for the pair type -}<br />
<br />
{- | Access to the first value of a pair. -}<br />
first :: Accessor (a,b) a<br />
first xNew (xOld,y) = (xOld, (xNew,y))<br />
<br />
{- | Access to the second value of a pair. -}<br />
second :: Accessor (a,b) b<br />
second yNew (x,yOld) = (yOld, (x,yNew))<br />
<br />
<br />
<br />
{- * Example accesses -}<br />
<br />
{- | Example of using 'set', 'get', 'modify'. -}<br />
example :: Int<br />
example =<br />
get second $<br />
modify second succ $<br />
set first 'a' $<br />
('b',7)<br />
<br />
exampleState :: State.State (Char,Int) Int<br />
exampleState =<br />
do setState first 'a'<br />
modifyState second succ<br />
getState second<br />
<br />
exampleInit :: (Char,Int)<br />
exampleInit =<br />
compose [set first 'b', modify first succ, set second 7]<br />
(undefined,undefined)<br />
-- setMany [first 'b', second 7] (undefined,undefined)<br />
<br />
exampleRead :: Maybe ((Char,Int), String)<br />
exampleRead =<br />
readRecord<br />
[readField first, readField second]<br />
((undefined,undefined), "c\059\154\202\000")<br />
</haskell><br />
<br />
There is a partial [https://dylex.net:9947/~dylan/src/RecordAccess.hs implementation] of this using [[Template Haskell]] for [[GHC]].<br />
<br />
[[Category:Proposals]]</div>Dylex