Haskell programming tips/Discussion
About
This page is meant for discussions about ThingsToAvoid, as consensus seems to be difficult to reach, and it'd be nice if newbies wouldn't bump into all kinds of Holy Wars during their first day of using Haskell ;)
You may want to add your name to your comments to make it easier to refer to your words and ridicule you in public.
Other Suggestions
This article is about elegance, so could we please inject some elegance into the article itself? Why do many of the functions have no type declaration? It took me quite some time to figure out the type declaration on foreach2: Monad m => [a] -> [b] -> (a -> b -> m c) -> m ()
These functions should actually be tested. The way linify is currently defined, it produces 4 GHC warnings. I do not even know how to get rid of 2 of them.
Readability could be considerably improved. At the moment, many sections start out fine, but then they suffer from a long list of additions which have not been properly integrated, so it reads something like this: Do this. Oh, but there is this too, and there is this caveat, but you could also do this, and performance is sometimes better if you do this, ...
May I also suggest that periods be left off the end of a sentence whose last word is a code section? Currently, the article is formatted such that I have this: Words words words, see my code here:
\n
[ Code section
Code section
Code section]
\n
.
\n
That is ridiculous. Just remove the period and the entailing massive whitespace.
Flame Away
Avoid recursion
Many times explicit recursion is the fastest way to implement a loop. e.g.
loop 0 _ acc = acc
loop i v acc = ...
Using HOFs is more elegant, but makes it harder to reason about space usage, also explicit recursion does not make the code hard to read - just explicit about what it is doing.
-- EinarKarttunen
I disagree with this. Sometimes explicit recursion is simpler to design, but I don't see how it makes space usage any easier to reason about and can see how it makes it harder. By using combinators you only have to know the properties of the combinator to know how it behaves, whereas I have to reanalyze each explicitly implemented function. StackOverflow gives a good example of this for stack usage and folds. As far as being "faster" I have no idea what the basis for that is; most likely GHC would inline into the recursive version anyways, and using higher-order list combinators makes deforesting easier. At any rate, if using combinators makes it easier to correctly implement the function, then that should be the overriding concern.
-- DerekElkins
I read lots of code with recursion -- and it was hard to read, because it is hard to retrieve the data flow from it. -- HenningThielemann
IMO explicit recursion usually does make code harder to read, as it's trying to do two things at once: Recursing and performing the actual work it's supposed to do. Phrases like OnceAndOnlyOnce and SeparationOfConcerns come to the mind. However, the concern about efficiency is (partly) justified. HOFs defined for certain recursion patterns often need additional care to achieve the same performance as functions using explicit recursion. As an example, in the following code, two sum functions are defined using two equivalent left folds, but only one of the folds is exported. Due to various peculiarities of GHCs strictness analyzer, simplifier etc, the call from main to mysum_2 works, yet the call to mysum_1 fails with a stack-overflow.
module Foo (myfoldl_1, mysum_1, mysum_2) where
-- exported
myfoldl_1 f z xs = fold z xs
where
fold z [] = z
fold z (x:xs) = fold (f z x) xs
-- not exported
myfoldl_2 f z xs = fold z xs
where
fold z [] = z
fold z (x:xs) = fold (f z x) xs
mysum_1 = myfoldl_1 (+) 0
mysum_2 = myfoldl_2 (+) 0
module Main where
import Foo
xs = [1..1000*1000]
main = do
print (mysum_2 xs)
print (mysum_1 xs)
(Results might differ for your particular GHC version, of course...) -- RemiTurk
GHC made "broken" code work. As covered in StackOverflow, foldl is simply not tail-recursive in a non-strict language. Writing out mysum would still be broken. The problem here isn't the use of a HOF, but simply the use of non-tail-recursive function. The only "care" needed here is not relying on compiler optimizations (the code doesn't work in my version of GHC) or the care needed when relying on compiler optimizations. Heck, the potential failure of inlining (and subsequent optimizations following from it) could be handled by restating recursion combinator definitions in each module that uses them; this would still be better than explicit recursion which essentially restates the definition for each expression that uses it.
-- DerekElkins
Here is a demonstration of the problem - with the classic sum as the problem. Of course microbenchmarking has little sense, but it tells us a little bit which combinator should be used.
import Data.List
import System
sum' :: Int -> Int -> Int
sum' 0 n = sum [1..n]
sum' 1 n = foldl (\a e -> a+e) 0 [1..n]
sum' 2 n = foldl (\a e -> let v = a+e in v `seq` v) 0 [1..n]
sum' 3 n = foldr (\a e -> a+e) 0 [1..n]
sum' 4 n = foldr (\a e -> let v = a+e in v `seq` v) 0 [1..n]
sum' 5 n = foldl' (\a e -> a+e) 0 [1..n]
sum' 6 n = foldl' (\a e -> let v = a+e in v `seq` v) 0 [1..n]
sum' 7 n = loop n 0
where loop 0 acc = acc
loop n acc = loop (n-1) (n+acc)
sum' 8 n = loop n 0
where loop 0 acc = acc
loop n acc = loop (n-1) $! n+acc
main = do [v,n] <- getArgs
print $ sum' (read v) (read n)
When executing with n = 1000000 it produces the following results:
* seq does not affect performance - as excepted. * foldr overflows stack - as excepted. * explicit loop takes 0.006s * foldl takes 0.040s * foldl' takes 0.080s
In this case the "correct" choice would be foldl' - ten times slower than explicit recursion. This is not to say that using a fold would not be better for most code. Just that it can have subtle evil effects in inner loops.
-- EinarKarttunen
This is ridiculous. The "explicit recursion" version is not the explicit recursion version of the foldl' version. Here is another set of programs and the results I get:
import Data.List
import System
paraNat :: (Int -> a -> a) -> a -> Int -> a
paraNat s = fold
where fold z 0 = z
fold z n = (fold $! s n z) (n-1)
localFoldl' c = fold
where fold n [] = n
fold n (x:xs) = (fold $! c n x) xs
sumFoldl' :: Int -> Int
sumFoldl' n = foldl' (+) 0 [1..n]
sumLocalFoldl' :: Int -> Int
sumLocalFoldl' n = localFoldl' (+) 0 [1..n]
sumParaNat :: Int -> Int
sumParaNat n = paraNat (+) 0 n
sumRecursionNat :: Int -> Int
sumRecursionNat n = loop n 0
where loop 0 acc = acc
loop n acc = loop (n-1) $! n+acc
sumRecursionList :: Int -> Int
sumRecursionList n = loop [1..n] 0
where loop [] acc = acc
loop (n:ns) acc = loop ns $! n+acc
main = do
[v,n] <- getArgs
case v of
"1" -> print (sumFoldl' (read n))
"2" -> print (sumLocalFoldl' (read n))
"3" -> print (sumParaNat (read n))
"4" -> print (sumRecursionNat (read n))
"5" -> print (sumRecursionList (read n))
(best but typical real times according to time of a few trials each)
sumFoldl' takes 2.872s
sumLocalFoldl' takes 1.683s
sumParaNat takes 0.212s
sumRecursionNat takes 0.213s
sumRecursionList takes 1.669s
sumLocalFoldl' and sumRecursionList were practically identical in performance and sumParaNat and sumRecursionNat were practically identical in performance. All that's demonstrated is the cost of detouring through lists (and the cost of module boundaries I guess).
-- DerekElkins
n+k patterns
n+k patterns are similar to the definition of infix functions, thus they make it harder to understand patterns. http://www.dcs.gla.ac.uk/mail-www/haskell/msg01131.html (Why I hate n+k)
So far I have seen only one rule for Good Coding Practice in Haskell: Do Not Use n+k Patterns. I hope someone can give some directions, how to avoid known pitfalls (especially Space Leaks). -- On the haskell mailing list
The most natural definition of many functions on the natural numbers is by induction, a fact that can very nicely be expressed with the (n+1)-pattern notation. Also, (n+k)-patterns are unlikely to produce space leaks, since if anything, they make the function stricter. The possible ambiguities don't seem to appear in real code. --ThomasJäger
If natural numbers would be defined by PeanoNumbers then pattern matching on successors would be straightforward. This would be fairly slow and space consuming, that's why natural numbers are not implemented this way. They are implemented using binary numbers and it is not even tried to simulate the behaviour of Natural
(e.g. laziness). Thus I wouldn't state, that 3
matches the pattern 2+1
. -- HenningThielemann
Lazyness/Strictness isn't really an argument in this situation, since when using a strict natural type, e.g.
data Nat = Zero | Succ !Nat
pattern matching on Nat behaves exactly like n+1 patterns. -- ThomasJaeger
n+k patterns also apply to negative numbers - don't they? Yes, I see the analogy but in the current implementation it's nothing than sugar. -- HenningThielemann
No, they don't. `let f (n+2) = n in f 1` is a runtime error. -- DerekElkins
But translating it into pattern matching is impossible, thus it must be a static error. -- HenningThielemann
Use syntactic sugar wisely
I have to say, i strongly disagree with most of what is said in this section. First of all the claim
Syntactic extensions make source code processors complicated and error prone. But they don't help to make programs safer (like type checks and contracts) or easier to maintain (like modularization and scoping).
is obviously wrong. There certainly are applications of syntatic sugar that make programs easier to read, therefore easier to understand, easier to maintain, and safer, as you are more likely to spot bugs.
- My statement was: Don't use syntactic sugar by default because you believe it makes your program more readable automatically (I've read lots of code of programmers who seem to believe that), but use syntactic sugar if (and only if) it makes the program more readable. Syntactic sugar is only a matter of readability not of safety in the sense of scoping and type checking. If I accidentally introduce inconsistencies into my code, the name resolver or the type checker will report problems, but not the de-sugarizer. -- HenningThielemann
ad. right sections are evil
I can't believe someone is seriously advocating to replace (&$ x)
with the less readable flip (&$) x
just because that's the way one specific haskell implementation is reporting errors. Most of the time, I don't even care to read type errors because once I know which line the error occured in, it is usually immediately clear what the error was. Even if you have to read the error message, there should be no difficulty to see that (+1) and flip (+) 1 is the same thing, especially when used in a context.
- Nobody advocated for replacing
($ x)
byflip ($) x
this was just to demonstrate the problems arising with syntactic sugar. I believe that many people don't take that into account when requesting more sugar (such as parallel list comprehension). -- HenningThielemann
Infix notation is problematic for both human readers and source code formatters.
No, infix notation isn't problematic for human readers, it enables them to read the code faster in many cases.
- ... if he knows the precedences, which is only true for (some) of the predefined operators. I guess you don't know all of the precedences of the Prelude operators. I also use infix operations like (+) and (:) but I'm very concerned with introducing lots of new operator symbols and `f` notation. -- HenningThielemann
- Introducing new operators should definitely not be done careless (then again, one shouldn't be careless in programming anyway), and operator percedences might be better defined as a partial order. (e.g. there is an order between (+) and (*), and between (&&) and (||), but not between (+) and (&&)). Other proposals for replacing the current left/right associative + precedence system do exist. However, doing away with infix operators entirely appears to me to practically render combinator libraries unusable, which would make Haskell a lot less attractive. -- RemiTurk
- The nice thing about precedences in Haskell is that it's often not necessary to know them exactly in order
to use them. If the types of you operators are sufficiently general, or sufficiently distinct, only the sight way to parse them will lead to type-checking code, so you can just give it a shot without parenthesis and hopefully remember the precedence the next time you're in a similar situation. -- ThomasJaeger
(+1) `fmap` m
expresses that the fmap
is just a fancy function application, when writing interpreter-like code Var "f" `App` Var "x" `App` "y"
is certainly more readable and can be written much faster than App (App (Var "f") (Var "x")) (Var "y"))
and our brains have been trained to parse + and * with different precendences since elementary school.
- If you make it for readability, I agree, if you make it for fancyness, I disagree. In the case of
App
it looks like the list can become longer, so it's worth of thinking about usingfoldl
,App
and a list - though then I would certainly also use some syntactic sugar to enter the list["f", "x", "y"]
. Btw. even the regular list notation is disputable since in the infix notation("f":"x":"y":[])
it is easier move elements around. So I end up with infix notation, ok. Even more,foldl
shows clearly the left associativity, whereas`App`
does not. - In the case of
`on`
andcompose2
, I disagree.on
is less informative thancompose2
. And yes,compose2
is a kind of generalization of.
, though it is not the most popular one. -- HenningThielemann
- You do indeed have a point there: it's indeed an extension of
(.)
, which I incorrectly denied. However, as it's not the extension, and AFAIK not even the most used extension, I consider the namecompose2
to be slightly misleading. In addition, I think "group by equality on isAlpha" (groupBy ((==) `on` isAlpha)
) is just too cute too resist. -- RemiTurk
- So, do you do it for readability or for fanciness? I find an infix function application hard to read, since it looks like just another argument. That is,
((==) `on` isAlpha)
reads very similar to((==) on isAlpha)
. In your example you really switch the prefered usage of the identifiers, you use the infix symbol==
in prefix notation and the prefix function nameon
in infix notation. -- HenningThielemann
- So, do you do it for readability or for fanciness? I find an infix function application hard to read, since it looks like just another argument. That is,
- Any kind of syntax highlighting should make the difference between
((==) `on` isAlpha)
and((==) on isAlpha)
obvious. Another argument for using an infixon
here is that it explains the order of the elements: Deciding ifon (==) isAlpha
oron isAlpha (==)
is better and trying to remember which way the implementor choose is certainly more difficult than realizing thatisAlpha `on` (==)
makes no sense (There are better examples for this such as`elem`
, or think about the confusion between the order of arguments inData.FiniteMap
vs.Data.Map
). -- ThomasJaeger
- Any kind of syntax highlighting should make the difference between
- Btw. the function
comparing
has found its way to the moduleData.Ord
(http://www.haskell.org/ghc/dist/current/docs/libraries/base/Data-Ord.html). It is a composition ofcompare
andon
/compose2
. However it does not satisfyingly help to implement extensions of*By
functions, because the key for sorting/grouping/maximising must be re-computed for each comparison if you write, saysortBy (comparing length)
. -- HenningThielemann
- Btw. the function
- Of course,
foldl App
only works ifApp
models application in an untyped language. Using GADTs,App
could be of typeExpr (a -> b) -> Expr a -> Expr b
, also, many combinator that works on different types can't be "sugared" using functions. -- ThomasJaeger
Finally, there is no reason why one should expect a tool that processes haskell code not to be aware of Haskell 98's syntax. All mentioned syntactic extensions (list comprehension, guards, sections, infix stuff) fall under this category and can be used without any bad conscience.
Sorry for having become so "religous" -- ThomasJaeger
I agree. -- CaleGibbard
- If you want a good example for unnecessary sugar, take this one:
tuples :: Int -> [a] -> [[a]]
tuples 0 _ = return []
tuples (r+1) xs = do
y:ys <- tails xs
(y:) `fmap` tuples r ys
- Why is infix
`fmap`
more readable than prefixfmap
? Where is the analogy tomap
? Why don't you usemap
at all? I seemap
as an operator which lifts scalar functions to list functions, this is perfectly expressed by prefix notation. What is the readability benefit of(r+1)
pattern and why isdo
more readable than explicitinit (tails xs) >>= (\(y:ys) -> map (y:) (tuples (r-1) ys))
here? (Mostly because this is not the correct translation and the correct translation is unreadable -- DerekElkins) It's even worse than[(y:) `fmap` tuples r ys | (y:ys) <- tails xs]
. You rewrote my code just with sugar but the structure which must be understood remained the same. Sorry, I don't find it easier to understand. Maybe people who believe a common notation rather than to try to understand the meaning are happy with the sugar. -- HenningThielemann
- The pattern
m >>= \x -> f x
is exactly the reason the do-notation was introduced, so each time I write something like this, I replace it with a do notation for the following reason: It is definitely the more common style (nobody is usingm >>= \x -> \n
-style these days), so much more likely to be understood faster (at least for myself), the do notation expresses nicely that monadic (in this case notdeterminstic) effects are taking place, and finally it is much easier to make changes to the code if it's in do-form (e.g. add additional guards). Of course you CAN do the same changes in>>=
-style, too, after all there is a straightforward translation (although complicated by the fact that you have to check if pattern matchings are exhaustive), but I'm not the kind of guy who does all kinds of verbose translation in his head just because he wants to stay away from syntactic sugar.
- I disagree with arguments like "nobody is using ...". What does it tell about the quality of a technique? I write here to give reasons against too much syntactic sugar rather than to record today's habits of some programmers. -- HenningThielemann
- You are further critizing that I am using
fmap
instead of the more specialmap
. I find it natural to usefmap
in monadic code to abstract from lists. If it weren't fortails
, the code would even have type(MonadPlus m, Functor m) => Int -> [a] -> m [a]
, increasing usability.liftM
would also be acceptable (and for some strange reason even slightly faster), but it feels awfully wrong to me to useliftM
, so that I'm willing to live with additionalFunctor
constraints. This is also the reason while your list comprehension solution is clearly inferior to a monadic one.
- One more thing about pattern match failure vs.
init
. Though it doesn't matter match in this simple example, the version exploiting pattern match failure is closer to the conceptional algorithm, because it doesn't rely on the "low-level-property" oftails
that the empty list is the last element of the returned list (I can never remember this kind of things, even though one of the possible behaviors makes much more sense).
- The function
tuples
is defined by recursion on an Int and only uses the case of the predecessor, so this is a classical example for (n+1)-patterns. Note that the LHSs in your implementation are overlapped, so a reader might need more time to figure out what is going on in your implementation (I admit the effect is small, but this is a very tiny example).
- Using
fmap
infix is a personal habit of mine, but when you think about it, it makes a lot of sense. As we can't overload a space, it's the closest thing to application syntax we can get. I know you preferApp (App f x) y
-style, which seems more difficult to understand for most people. This just is a matter of personal style. Please do not mistake your own personal style for the only sensible one.
- If no one else objects, I'd like to put my implementation back on the main page, possibly with a less controversial comment. --ThomasJaeger
- My argument is that the syntactic sugared version may be readable like the unsugared version, but it does not improve the readability, thus it should be avoided. Sugar shouldn't be the default, it shouldn't used just because it exists. That's the basic opinion where we disagree. Btw. I find the
do
notation in connection with theList
monad very confusing because it looks imperative and it suggests that first something is chosen from the list then it is returned. -- HenningThielemann
- My argument is that the syntactic sugared version may be readable like the unsugared version, but it does not improve the readability, thus it should be avoided. Sugar shouldn't be the default, it shouldn't used just because it exists. That's the basic opinion where we disagree. Btw. I find the
- While it may not be more readable for you, it is for me, for the reasons I'm getting tired of stating. Also, your opinions on the do-notation seem very strange to me. If we have monads - a way to unify different notions of effects - why make the artificial distinction between IO/State effects and more general ones again? The do-notation expresses in which order the effects are happening - that's the same for a list and an IO monad. However, a distinction between commutative and non-commutative monads would make sense, but unfortunately, there's no way to prove the commutativity of a monad statically in Haskell.
There are still issues that aren't implemented in GHC which belong to the Haskell 98 standard and which are of more importance, I think, such as mutual recursive modules and some special case of polymorphic mutual function recursion. So I don't vote for wasting the time with syntactic sugar when real enhancements are deferred by it. If I would write a Haskell code processor I would certainly prevent me from the trouble of supporting guards and (n+k) patterns. I'm also fed up with the similar situation in HTML with its tons of extensions and the buggy code which is accepted by most browsers (which is also a sort of inofficial extension) - there is simply no fun in processing HTML code. Not to mention C++.
By the way I'd like to have a real function if
instead of the sugarized version with then
and else
in Haskell 98. Then I could use it in connection with zipWith3
. See ["Case"] for another application. -- HenningThielemann
- Agrees with that. I'm not using
if
all that often, and could easily add a few braces. And, it would freethen
andelse
for normal identifier use. -- RemiTurk
Personally, I like the explicit `then` and `else` and find that they help when reading code to separate where the break is between the sections. It's not that I necessarily disagree with the inclusion of such a function, it is an easy one to write in any case, but I think that some sugar in the form of a few extra words to mark important points in common structures is useful. Human languages have many such words, and their presence makes reading or listening much easier. - CaleGibbard
- Other people seem to have problems with this special syntax, too. And they propose even more special syntax to solve the problem. http://prime.haskell.org/wiki/DoAndIfThenElse -- HenningThielemann