Difference between revisions of "Haskell programming tips"

From HaskellWiki
Jump to navigation Jump to search
(pointer to reference)
m (Fixed a relative reference within the text.)
 
(55 intermediate revisions by 22 users not shown)
Line 1: Line 1:
= Preface =
+
==Preface==
   
This page shows several examples of how code can be improved. We try to derive general rules from them, though they can not be applied deterministicly and are a matter of taste. We all know that, please don't add "this is disputable" to each item!
+
This page shows several examples of how code can be improved. We try to derive general rules from them, though they cannot be applied deterministically and are a matter of taste. We all know this, please don't add "this is disputable" to each item!
   
 
Instead, you can now add "this is disputable" on [[/Discussion]] and change this page only when some sort of consensus is reached.
 
Instead, you can now add "this is disputable" on [[/Discussion]] and change this page only when some sort of consensus is reached.
   
= Be concise =
+
==Be concise==
   
== Don't reinvent the wheel ==
+
===Don't reinvent the wheel===
   
The standard libraries are full of useful functions, possibly too full. If you rewrite an existing function, the reader wonders what the difference to the standard function is. But if you use a standard function, the reader may learn something new and useful.
+
The standard libraries are full of useful, well-tuned functions. If you rewrite an existing library function, the reader of your code might spend a minute trying to figure out why you've done that. But if you use a standard function, the reader will either immediately understand what you've done, or can learn something new.
If you have problems finding an appropriate list function, try this guide:
 
http://www.cs.chalmers.se/Cs/Grundutb/Kurser/d1pt/d1pta/ListDoc/
 
   
== Avoid explicit recursion ==
+
===Avoid explicit recursion===
   
Explicit recursion is not generally bad, but you should spend some time on trying to find a more declarative implementation using higher order functions.
+
Explicit recursion is not generally bad, but you should spend some time trying to find a more declarative implementation using higher order functions.
   
 
Don't define
 
Don't define
Line 23: Line 21:
 
raise x (y:ys) = x+y : raise x ys
 
raise x (y:ys) = x+y : raise x ys
 
</haskell>
 
</haskell>
because it is hard for the reader to find out,
+
because it is hard for the reader to find out
 
how much of the list is processed and
 
how much of the list is processed and
 
on which values the elements of the output list depend.
 
on which values the elements of the output list depend.
Line 36: Line 34:
 
and the reader knows that the complete list is processed and that each output element depends only on the corresponding input element.
 
and the reader knows that the complete list is processed and that each output element depends only on the corresponding input element.
   
If you don't find appropriate functions in the standard library, extract a general function. This helps you and others understanding the program. Haskell is '''very''' good at factoring out parts of the code.
+
If you don't find appropriate functions in the standard library, extract a general function.
  +
This helps you and others understand the program.
If you find it very general, put it in a separate module and re-use it. It may appear in the standard libraries later, or you may later find that it is already there.
 
  +
Thanks to [[higher order function]]s Haskell gives you '''very''' many opportunities to factor out parts of the code.
  +
If you find the function very general, put it in a separate module and re-use it. It may appear in the standard libraries later, or you may later find that it is already there in an even more general way.
   
Decomposing a problem this way has also the advantage that you can debug easier. If the last implementation of <hask>raise</hask> does not show the expected behaviour, you can inspect <hask>map</hask> (I hope it is correct :-) ) and the invoked instance of <hask>(+)</hask> separately.
+
Decomposing a problem this way also has the advantage that you can debug more easily. If the last implementation of <hask>raise</hask> does not show the expected behaviour, you can inspect <hask>map</hask> (I hope it is correct :-) ) and the invoked instance of <hask>(+)</hask> separately.
   
   
''Could this be stated more generally? It seems to me this is a special case of the general principle of separating concerns: iteration over a collection vs operating on elements of a collection should apply. If you can write the loop over a data structure (list, tree, whatever) once and debug it, then you don't need to duplicate that code over and over (at least in haskell), so your code can follow the principle of Wiki:OnceAndOnlyOnce ; Wiki:OnceAndOnlyOnce is a lot harder in languages that don't provide a certain level of functional programming support (i.e. Java requires copy and paste programming, the delegate C# syntax is clumsy but workable - using it is almost Wiki:GoldPlating).''
+
''This is a special case of the general principle of [[separation of concerns|separating concerns]]. If you can write the loop over a data structure once and debug it, then there's no need to duplicate that code.''
   
   
  +
Another example: the function <hask>count</hask> counts the number of elements which fulfill a certain property, i.e. the elements for which the predicate <hask>p</hask> is <hask>True</hask>.
Another example:
 
The function <hask>count</hask> counts the number of elements
 
which fulfill a certain property,
 
i.e. the elements for which the predicate <hask>p</hask> is <hask>True</hask>.
 
   
 
I found the following code (but convoluted in a more specific function) in a Haskell program
 
I found the following code (but convoluted in a more specific function) in a Haskell program
Line 58: Line 55:
 
| otherwise = count p xs
 
| otherwise = count p xs
 
</haskell>
 
</haskell>
which you won't like any longer if you become aware of
+
which you won't like after you become aware of
 
<haskell>
 
<haskell>
 
count p = length . filter p
 
count p = length . filter p
Line 64: Line 61:
 
.
 
.
   
  +
===Only introduce identifiers you need===
 
== Only introduce identifiers you need ==
 
   
 
Here is some advice that is useful for every language, including scientific prose
 
Here is some advice that is useful for every language, including scientific prose
 
(http://www.cs.utexas.edu/users/EWD/transcriptions/EWD09xx/EWD993.html):
 
(http://www.cs.utexas.edu/users/EWD/transcriptions/EWD09xx/EWD993.html):
 
Introduce only identifiers you use.
 
Introduce only identifiers you use.
The compiler will check that you if you pass an option like <code>-Wall</code> for GHC.
+
The compiler will check this for you if you pass an option like <code>-Wall</code> to GHC.
   
 
In an expression like
 
In an expression like
Line 84: Line 80:
   
   
== Remember the zero ==
+
===Remember the zero===
   
Don't forget that zero is a natural number. Recursive definitions become more complicated if the recursion anchor is not chosen properly. As an example I have chosen the function <hask>tupel</hask> presented in ''DMV-Mitteilungen 2004/12-3, Jürgen Bokowski: Haskell, ein gutes Werkzeug der Diskreten Mathematik'' (Haskell, a good tool for discrete mathematics). It is also a good example of how to avoid guards.
+
Don't forget that zero is a natural number. Recursive definitions become more complicated if the recursion anchor is not chosen properly. For example the function <hask>tupel</hask> presented in ''DMV-Mitteilungen 2004/12-3, Jürgen Bokowski: Haskell, ein gutes Werkzeug der Diskreten Mathematik'' (Haskell, a good tool for discrete mathematics). This is also a good example of how to avoid guards.
 
<haskell>
 
<haskell>
 
tuples :: Int -> [a] -> [[a]]
 
tuples :: Int -> [a] -> [[a]]
Line 140: Line 136:
 
</haskell>
 
</haskell>
   
We have learnt above that <hask>length</hask> is evil! What about
+
We will learn below that <hask>length</hask> can be evil! What about
 
<haskell>
 
<haskell>
 
tuples :: Int -> [a] -> [[a]]
 
tuples :: Int -> [a] -> [[a]]
Line 150: Line 146:
 
</haskell>
 
</haskell>
 
? It is no longer necessary to compute the length of <hask>l</hask> again and again. The code is easier to read and it covers all special cases, including <hask>tuples (-1) [1,2,3]</hask>!
 
? It is no longer necessary to compute the length of <hask>l</hask> again and again. The code is easier to read and it covers all special cases, including <hask>tuples (-1) [1,2,3]</hask>!
  +
  +
''Eliminating the <hask>length</hask> test can worsen performance dramatically in some cases, like <hask>tuples 24 [1..25]</hask>. We could also use <hask>null (drop (r-1) l)</hask> instead of <hask>length l < r</hask>, which works for infinite lists. See also [[#Don't ask for the length of a list, if you don't need it|below]].''
   
 
You can even save one direction of recursion
 
You can even save one direction of recursion
Line 162: Line 160:
 
</haskell>
 
</haskell>
   
Since (=<<) in the list monad is concatMap, we can also write this as follows.
+
Since <hask>(=<<)</hask> in the list monad is <hask>concatMap</hask>, we can also write this as follows.
 
Where in the previous version the pattern <hask>y:ys</hask> filtered out the last empty suffix
 
Where in the previous version the pattern <hask>y:ys</hask> filtered out the last empty suffix
 
we have to do this manually now with <hask>init</hask>.
 
we have to do this manually now with <hask>init</hask>.
Line 177: Line 175:
   
   
''More generally, BaseCasesAndIdentities''
+
''More generally, [[Base cases and identities]]''
   
== Don't overuse lambdas ==
+
===Don't overuse lambdas===
   
 
Like explicit recursion, using explicit lambdas isn't a universally bad idea, but a better solution often exists.
 
Like explicit recursion, using explicit lambdas isn't a universally bad idea, but a better solution often exists.
Line 210: Line 208:
 
which is both clearer and re-usable.
 
which is both clearer and re-usable.
 
Actually, starting with GHC-6.6 you do not need to define <hask>comparing</hask>, since it is already in module <hask>Data.Ord</hask>.
 
Actually, starting with GHC-6.6 you do not need to define <hask>comparing</hask>, since it is already in module <hask>Data.Ord</hask>.
http://www.haskell.org/ghc/dist/current/docs/libraries/base/Data-Ord.html
+
http://hackage.haskell.org/packages/archive/base/latest/doc/html/Data-Ord.html
   
 
(Just a remark for this special example:
 
(Just a remark for this special example:
We can avoid multiple evaluations of the conversions.
+
We can avoid multiple evaluations of the conversions
  +
with a function that is present in GHC.Exts of GHC 6.10:
 
<haskell>
 
<haskell>
sortKey :: (Ord b) => (a -> b) -> [a] -> [a]
+
sortWith :: (Ord b) => (a -> b) -> [a] -> [a]
sortKey f x = map snd (sortBy (comparing fst) (zip (map f x) x))
+
sortWith f x = map snd (sortBy (comparing fst) (zip (map f x) x))
 
</haskell>
 
</haskell>
 
)
 
)
Line 232: Line 231:
 
</haskell>
 
</haskell>
   
  +
===<hask>Bool</hask> is a regular type===
 
== Bool is a regular type ==
 
   
 
Logic expressions are not restricted to guards and <hask>if</hask> statements.
 
Logic expressions are not restricted to guards and <hask>if</hask> statements.
Line 248: Line 246:
 
.
 
.
   
  +
The definitions
  +
<haskell>
  +
hasSpace (a:as)
  +
| isSpace a = True
  +
| otherwise = hasSpace as
  +
</haskell>
  +
and
  +
<haskell>
  +
hasSpace (a:as) = if isSpace a then True else hasSpace as
  +
</haskell>
  +
can be shortened to
  +
<haskell>
  +
hasSpace (a:as) = isSpace a || hasSpace as
  +
</haskell>
  +
(I just wanted to show the logic transform.
  +
In the particular example you would write <hask>any isSpace</hask>, of course.)
  +
The same way
  +
<haskell>
  +
allPrintable (a:as)
  +
| isSpace a = False
  +
| otherwise = allPrintable as
  +
</haskell>
  +
and
  +
<haskell>
  +
allPrintable (a:as) = if isSpace a then False else allPrintable as
  +
</haskell>
  +
can be shortened to
  +
<haskell>
  +
allPrintable (a:as) = not (isSpace a) && allPrintable as
  +
</haskell>
  +
(but <hask>all (not . isSpace)</hask> is even better in this particular example.)
   
   
   
= Use syntactic sugar wisely =
+
==Use syntactic sugar wisely==
   
People who employ SyntacticSugar extensively
+
People who employ [[syntactic sugar]] extensively
argue that their code becomes more readable by it.
+
argue that it makes their code more readable.
 
The following sections show several examples
 
The following sections show several examples
 
where less syntactic sugar is more readable.
 
where less syntactic sugar is more readable.
Line 266: Line 295:
   
   
== list comprehension ==
+
===List comprehension===
   
List comprehension let you remain in imperative thinking, that is it let you think in variables rather than transformations. Open your mind, discover the flavour of the PointFreeStyle!
+
List comprehension lets you remain in imperative thinking, that is it lets you think in variables rather than transformations. Open your mind, discover the flavour of the [[pointfree]] style!
   
 
Instead of
 
Instead of
Line 286: Line 315:
 
</haskell>
 
</haskell>
 
where it takes some time for the reader
 
where it takes some time for the reader
to find out which value depends on what other value
+
to discover which value depends on what other value
 
and it is not so clear how many times
 
and it is not so clear how many times
 
the interim values <hask>s</hask> and <hask>c</hask> are used.
 
the interim values <hask>s</hask> and <hask>c</hask> are used.
Line 297: Line 326:
   
   
When using higher order functions you can switch easier to
+
When using higher order functions you can switch more easily from <hask>List</hask> to other data structures.
data structures different from <hask>List</hask>.
 
   
 
Compare
 
Compare
Line 309: Line 337:
 
</haskell>
 
</haskell>
 
.
 
.
If there would be a standard instance for the <hask>Functor</hask> class
+
If there were a standard instance for the <hask>Functor</hask> class
 
you could use the code
 
you could use the code
 
<haskell>
 
<haskell>
Line 317: Line 345:
   
 
If you are not used to higher order functions for list processing
 
If you are not used to higher order functions for list processing
you feel like needing parallel list comprehension.
+
you may feel you need parallel list comprehension.
 
This is unfortunately supported by GHC now,
 
This is unfortunately supported by GHC now,
but somehow superfluous since various flavours of <hask>zip</hask> already do a great job.
+
but it is arguably superfluous since various flavours of <hask>zip</hask> already do a great job.
   
   
   
   
== do notation ==
+
===<hask>do</hask> notation===
   
Do notation is useful to express the imperative nature (e.g. a hidden state or an order of execution) of a piece of code.
+
[[Do notation considered harmful|do notation]] is useful to express the imperative nature (e.g. a hidden state or an order of execution) of a piece of code.
 
Nevertheless it's sometimes useful to remember that the <hask>do</hask> notation is explained in terms of functions.
 
Nevertheless it's sometimes useful to remember that the <hask>do</hask> notation is explained in terms of functions.
   
Line 366: Line 394:
 
</haskell>
 
</haskell>
 
.
 
.
Btw. in the case of <hask>IO</hask> monad the <hask>Functor</hask> class method <hask>fmap</hask> and the <hask>Monad</hask> based function <hask>liftM</hask> are the same.
+
By the way, the <hask>Functor</hask> class method <hask>fmap</hask> and the <hask>Monad</hask> based function <hask>liftM</hask> are the same (as long as both are defined, as they should be).
   
 
''Be aware that "more complicated" does not imply "worse". If your do-expression was longer than this, then mixing do-notation and <hask>fmap</hask> might be precisely the wrong thing to do, because it adds one more thing to think about. Be natural. Only change it if you gain something by changing it. -- AndrewBromage''
 
''Be aware that "more complicated" does not imply "worse". If your do-expression was longer than this, then mixing do-notation and <hask>fmap</hask> might be precisely the wrong thing to do, because it adds one more thing to think about. Be natural. Only change it if you gain something by changing it. -- AndrewBromage''
   
== Guards ==
+
===Guards===
  +
  +
''Disclaimer: This section is NOT advising you to avoid guards. It is advising you to prefer pattern matching to guards when both are appropriate. -- AndrewBromage''
   
 
Guards look like
 
Guards look like
Line 381: Line 411:
 
which implements a factorial function. This example, like a lot of uses of guards, has a number of problems.
 
which implements a factorial function. This example, like a lot of uses of guards, has a number of problems.
   
The first problem is that it's nearly impossible for the compiler to check if guards like this are exhaustive, as the guard conditions may be arbitrarily complex (Ghc will warn you if you use the <code>-Wall</code> option). To avoid this problem and potential bugs through non exhaustive patterns you should use an <hask>otherwise</hask> guard, that will match for all remaining cases:
+
The first problem is that it's nearly impossible for the compiler to check whether guards like this are exhaustive, as the guard conditions may be arbitrarily complex (GHC will warn you if you use the <code>-Wall</code> option). To avoid this problem and potential bugs through non exhaustive patterns you should use an <hask>otherwise</hask> guard, that will match for all remaining cases:
   
 
<haskell>
 
<haskell>
Line 400: Line 430:
 
else n * fac (n-1)
 
else n * fac (n-1)
 
</haskell>
 
</haskell>
Note that <hask>if</hask> has its own set of problems, for example in connection with the layout rule or that nested <hask>if</hask>s are difficult to read. See ["Case"] how to avoid nested <hask>if</hask>s.
+
Note that <hask>if</hask> has its own set of [[If-then-else|problems]], for example in connection with the layout rule or that nested <hask>if</hask>s are difficult to read. See [[Case]] how to avoid nested <hask>if</hask>s.
   
 
But in this special case, the same can be done even more easily with pattern matching:
 
But in this special case, the same can be done even more easily with pattern matching:
Line 418: Line 448:
 
This may also be more efficient as <hask>product</hask> might be optimized by the library-writer... In GHC, when compiling with optimizations turned on, this version runs in O(1) stack-space, whereas the previous versions run in O(n) stack-space.
 
This may also be more efficient as <hask>product</hask> might be optimized by the library-writer... In GHC, when compiling with optimizations turned on, this version runs in O(1) stack-space, whereas the previous versions run in O(n) stack-space.
   
Note however, that there is a difference between this version and the previous ones: When given a negative number, the previous versions do not terminate (until StackOverflow-time), while the last implemenation returns 1.
+
Note however, that there is a difference between this version and the previous ones: When given a negative number, the previous versions do not terminate (until StackOverflow-time), while the last implementation returns 1.
   
   
Line 431: Line 461:
 
</haskell>
 
</haskell>
   
or compare the following example using the advanced PatternGuards
+
or compare the following example using the advanced [[pattern guard]]s
(http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#PATTERN-GUARDS)
 
 
<haskell>
 
<haskell>
 
parseCmd ln
 
parseCmd ln
Line 440: Line 469:
 
= x
 
= x
 
</haskell>
 
</haskell>
with this one with NoPatternGuards:
+
with this one with [[no pattern guard]]s:
 
<haskell>
 
<haskell>
 
parseCmd ln = case parse cmd "Commands" ln of
 
parseCmd ln = case parse cmd "Commands" ln of
Line 453: Line 482:
   
   
By the way, a compiler has also problems with numerical patterns. E.g. the pattern <hask>0</hask> in fact means <hask>fromInteger 0</hask>, thus it involves a computation, which is uncommon for function parameter patterns. To illustrate this, consider the following example:
+
Incidentally, compilers often also have problems with numerical patterns. For example, the pattern <hask>0</hask> in fact means <hask>fromInteger 0</hask>; thus it involves a computation, which is uncommon for function parameter patterns. To illustrate this, consider the following example:
 
<haskell>
 
<haskell>
 
data Foo = Foo deriving (Eq, Show)
 
data Foo = Foo deriving (Eq, Show)
Line 470: Line 499:
 
</haskell>
 
</haskell>
   
Only use guards if you need to, in general you should stick to pattern matching whenever possible.
+
Only use guards when you need to. In general, you should stick to pattern matching whenever possible.
   
== n+k patterns ==
+
===<hask>n+k</hask> patterns===
 
In order to allow pattern matching against numerical types, Haskell 98 provides so-called n+k patterns, as in
 
In order to allow pattern matching against numerical types, Haskell 98 provides so-called n+k patterns, as in
 
<haskell>
 
<haskell>
Line 479: Line 508:
 
take _ _ = []
 
take _ _ = []
 
</haskell>
 
</haskell>
However, they are often critizised for hiding computational complexity and producing ambiguties, see ["ThingsToAvoid/Discussion"] for details. They are subsumed by the more general ["Views"] proposal, which was unfortunately never implemented despite being around for quite some time now.
+
However, they are often criticized for hiding computational complexity and producing ambiguities, see [[/Discussion]] for details. They are subsumed by the more general [[Views]] proposal, which has unfortunately never been implemented despite being around for quite some time now.
   
  +
The <hask>n+k</hask> patterns are not in the [[Haskell 2010]] standard.
   
= Efficiency and infinity =
+
==Efficiency and infinity==
   
 
A rule of thumb is:
 
A rule of thumb is:
If a function makes sense for an infinite data structure but the implementation at hand fails for an infinite amount of data, then the implementation is probably inefficient also for finite data.
+
If a function makes sense for an infinite data structure but the implementation at hand fails for an infinite amount of data, then the implementation is probably also inefficient for finite data.
   
== Don't ask for the length of a list, if you don't need it ==
+
===Don't ask for the length of a list when you don't need it===
   
 
Don't write
 
Don't write
Line 502: Line 532:
 
is faster but it requires the list <hask>x</hask> to be of type <hask>[a]</hask> where <hask>a</hask> is a type of class <hask>Eq</hask>.
 
is faster but it requires the list <hask>x</hask> to be of type <hask>[a]</hask> where <hask>a</hask> is a type of class <hask>Eq</hask>.
   
The best to do is
+
The best thing to do is
 
<haskell>
 
<haskell>
 
null x
 
null x
 
</haskell>
 
</haskell>
   
Additionally, many uses of the length function can be replaced with an <hask>atLeast</hask> function that only checks to see that a list is greater than the required minimum length.
+
Additionally, many uses of the length function are overspecifying the problem: one may only need to check that a list is ''at least'' a certain length, and not a specific length. Thus use of <haskell>length</haskell> could be replaced with an <hask>atLeast</hask> function that only checks to see that a list is greater than the required minimum length.
 
<haskell>
 
<haskell>
 
atLeast :: Int -> [a] -> Bool
 
atLeast :: Int -> [a] -> Bool
Line 538: Line 568:
 
take (length x) y
 
take (length x) y
 
</haskell>
 
</haskell>
since this is inefficient for large lists <hask>x</hask> and fails for infinite ones.
+
since this is inefficient for large lists <hask>x</hask> and fails for infinite ones. But this can be useful to extract a finite prefix from an infinite list.
But this can be useful to extract a finite prefix from an infinite list.
 
 
So, instead
 
So, instead
 
<haskell>
 
<haskell>
Line 546: Line 575:
 
works well.
 
works well.
   
It should be noted that <hask>length</hask>, <hask>take</hask> and others wouldn't cause headache if they would count using PeanoNumbers as shown below.
+
It should be noted that <hask>length</hask>, <hask>take</hask>
  +
can be replaced by <hask>genericLength</hask>, <hask>genericTake</hask> et.al.,
  +
which allow the usage of [[Peano numbers]].
   
== Don't ask for the minimum if you don't need it ==
+
===Don't ask for the minimum when you don't need it===
   
 
The function <hask>isLowerLimit</hask> checks if a number is a lower limit to a sequence.
 
The function <hask>isLowerLimit</hask> checks if a number is a lower limit to a sequence.
Line 555: Line 586:
 
isLowerLimit x ys = x <= minimum ys
 
isLowerLimit x ys = x <= minimum ys
 
</haskell>
 
</haskell>
It fails definitely if <hask>ys</hask> is infinite. Is this a problem?
+
It certainly fails if <hask>ys</hask> is infinite. Is this a problem?
   
 
Compare it with
 
Compare it with
Line 561: Line 592:
 
isLowerLimit x = all (x<=)
 
isLowerLimit x = all (x<=)
 
</haskell>
 
</haskell>
This definition terminates for infinite lists, if <hask>x</hask> is not a lower limit. It aborts immediately if an element is found which is below <hask>x</hask>. Thus it is also faster for finite lists. Even more: It works also for empty lists.
+
This definition terminates for infinite lists, if <hask>x</hask> is not a lower limit. It aborts immediately if an element is found which is below <hask>x</hask>. Thus it is also faster for finite lists. Even more: It also works for empty lists.
   
   
  +
===Use sharing===
== Choose the right fold ==
 
   
  +
If you want a list of lists with increasing length and constant content, don't write
See StackOverflow for advice on which fold is appropriate for your situation.
 
  +
<haskell>
  +
map (flip replicate x) [0..]
  +
</haskell>
  +
because this needs quadratic space and run-time. If you code
  +
<haskell>
  +
iterate (x:) []
  +
</haskell>
  +
then the lists will share their suffixes and thus need only linear space and run-time for creation.
   
   
= Choose types properly =
+
===Choose the appropriate fold===
   
  +
See "[[Stack overflow]]" or "[[Foldr Foldl Foldl']]" for advice on which fold is appropriate for your situation.
== Lists are not good for everything ==
 
   
  +
=== Lists are not arrays ===
 
  +
==Choose types properly==
  +
  +
===Lists are not good for everything===
  +
  +
====Lists are not arrays====
   
 
Lists are not arrays, so don't treat them as such.
 
Lists are not arrays, so don't treat them as such.
 
Frequent use of <hask>(!!)</hask> should alarm you.
 
Frequent use of <hask>(!!)</hask> should alarm you.
 
Accessing the <hask>n</hask>th list element
 
Accessing the <hask>n</hask>th list element
requires to traverse through the first <hask>n</hask> nodes of the list.
+
involves traversing through the first <hask>n</hask> nodes of the list.
 
This is very inefficient.
 
This is very inefficient.
   
If you access the elements progressively like in
+
If you access the elements progressively, as in
 
<haskell>
 
<haskell>
 
[x !! i - i | i <- [0..n]]
 
[x !! i - i | i <- [0..n]]
 
</haskell>
 
</haskell>
you should try to get rid of indexing like in
+
you should try to get rid of indexing, as in
 
<haskell>
 
<haskell>
 
zipWith (-) x [0..n]
 
zipWith (-) x [0..n]
Line 591: Line 635:
 
.
 
.
   
If you really need random access like in the Fourier Transform
+
If you really need random access, as in the Fourier Transform,
you should switch to <hask>Array</hask>s.
+
you should switch to [[Arrays]].
   
   
=== Lists are not sets ===
+
====Lists are not sets====
   
 
If you manage data sets where each object can occur only once
 
If you manage data sets where each object can occur only once
Line 604: Line 648:
 
you should think about switching to sets.
 
you should think about switching to sets.
 
If you need multi-sets,
 
If you need multi-sets,
i.e. data sets with irrelevant order but multiple occurence of an object
+
i.e. data sets with irrelevant order but multiple occurrences of objects,
you can use a <hask>FiniteMap a Int</hask>.
+
you can use a <hask>Data.Map.Map a Int</hask>.
 
   
=== Lists are not finite maps ===
 
   
Similarly, lists are not finite maps, as mentioned on EfficiencyHints.
+
====Lists are not finite maps====
   
  +
Similarly, lists are not finite maps, as mentioned in [[Efficiency_hints#Data.Sequence_vs._lists | efficiency hints]].
   
== Reduce type class constraints ==
+
===Reduce type class constraints===
   
=== Eq type class ===
+
====Eq type class====
   
 
When using functions like <hask>delete</hask>, <hask>(\\)</hask>, <hask>nub</hask>, and so on you should be aware that they need types of the <hask>Eq</hask> class. There are two problems: The routines might not work as expected if a processed list contains multiple equal elements and the element type of the list may not be comparable, like functions.
 
When using functions like <hask>delete</hask>, <hask>(\\)</hask>, <hask>nub</hask>, and so on you should be aware that they need types of the <hask>Eq</hask> class. There are two problems: The routines might not work as expected if a processed list contains multiple equal elements and the element type of the list may not be comparable, like functions.
Line 635: Line 678:
   
   
== Don't use Int if you don't consider integers ==
+
===Don't use <hask>Int</hask> when you don't consider integers===
   
 
Before using integers for each and everything (C style)
 
Before using integers for each and everything (C style)
Line 641: Line 684:
 
If only the values <hask>0</hask> and <hask>1</hask> are of interest,
 
If only the values <hask>0</hask> and <hask>1</hask> are of interest,
 
try the type <hask>Bool</hask> instead.
 
try the type <hask>Bool</hask> instead.
If there are more choices and numeric operations aren't needed
+
If there are more but predefined choices and numeric operations aren't needed try an enumeration.
try an enumeration.
 
If an enumeration is not appropriate
 
you can define a <hask>newtype</hask> carrying the type that is closest to what you need.
 
   
 
Instead of
 
Instead of
Line 662: Line 702:
 
</haskell>
 
</haskell>
   
It allows all sensible operations like <hask>==</hask>, <hask>&lt;</hask>, <hask>succ</hask> and
+
It allows all sensible operations like <hask>==</hask>, <hask> < </hask>, <hask>succ</hask> and
 
forbids all nonsensical ones like <hask>+</hask>, <hask>*</hask>.
 
forbids all nonsensical ones like <hask>+</hask>, <hask>*</hask>.
 
You cannot accidentally mix up weekdays with numbers and
 
You cannot accidentally mix up weekdays with numbers and
 
the signature of a function with weekday parameter clearly states what kind of data is expected.
 
the signature of a function with weekday parameter clearly states what kind of data is expected.
   
  +
If an enumeration is not appropriate
  +
you can define a <hask>newtype</hask> carrying the type that is closest to what you need.
  +
E.g. if you want to associate objects with a unique identifier,
  +
you may want to choose the type <hask>Int</hask>.
  +
But you don't need arithmetic and you can make this type distinct from real <hask>Int</hask>s by defining
  +
<haskell>
  +
newtype Identifier = Identifier Int deriving Eq
  +
</haskell>
   
= Miscellaneous =
 
   
== Separate IO and data processing ==
+
===Avoid redundancy in data types===
  +
  +
I often see data types with redundant fields, e.g.
  +
<haskell>
  +
data XML =
  +
Element Position Name [Attribute] [XML]
  +
| Comment Position String
  +
| Text Position String
  +
</haskell>
  +
  +
I suggest to factor out the common field <hask>Position</hask>
  +
since this lets you handle the text position the same way for all XML parts.
  +
<haskell>
  +
data XML = XML Position Part
  +
  +
data Part =
  +
Element Name [Attribute] [XML]
  +
| Comment String
  +
| Text String
  +
</haskell>
  +
  +
==Miscellaneous==
  +
  +
===Separate IO and data processing===
   
 
It's not good to use the IO Monad everywhere,
 
It's not good to use the IO Monad everywhere,
Line 678: Line 748:
 
that is you don't have to specify an order of execution
 
that is you don't have to specify an order of execution
 
and you don't have to worry about what computations are actually necessary.
 
and you don't have to worry about what computations are actually necessary.
  +
Useful techniques are described in [[Avoiding IO]].
You can easily benefit from lazy evaluation
 
if you process data purely functionally
 
and output it by a short IO interaction.
 
   
  +
===Forget about quot and rem===
  +
  +
They complicate handling of negative dividends. <hask>div</hask> and <hask>mod</hask> are almost always the better choice. If <hask>b > 0</hask> then it always holds
 
<haskell>
 
<haskell>
  +
a == b * div a b + mod a b
-- import Control.Monad (replicateM_)
 
  +
mod a b < b
replicateM_ 10 (putStr "foo")
 
  +
mod a b >= 0
</haskell>
 
is certainly worse than
 
<haskell>
 
putStr (concat $ replicate 10 "foo")
 
 
</haskell>
 
</haskell>
  +
The first equation is true also for <hask>quot</hask> and <hask>rem</hask>,
  +
but the two others are true only for <hask>mod</hask>, but not for <hask>rem</hask>. That is, <hask>mod a b</hask> always wraps <hask>a</hask> to an element from <hask>[0..(b-1)]</hask>, whereas the sign of <hask>rem a b</hask> depends on the sign of <hask>a</hask>.
   
  +
This seems to be more an issue of experience rather than one of a superior reason. You might argue, that the sign of the dividend is more important for you, than that of the divisor. However, I have never seen such an application, but many uses of <hask>quot</hask> and <hask>rem</hask> where <hask>div</hask> and <hask>mod</hask> were clearly superior.
  +
  +
Examples:
  +
  +
* Conversion from a continuously counted tone pitch to the pitch class, like C, D, E etc.: <hask>mod p 12</hask>
  +
  +
* Pad a list <hask>xs</hask> to a multiple of <hask>m</hask> number of elements: <hask>xs ++ replicate (mod (- length xs) m) pad</hask>
  +
  +
* Conversion from a day counter to a week day: <hask>mod n 7</hask>
  +
  +
* Pacman runs out of the screen and re-appears at the opposite border: <hask>mod x screenWidth</hask>
  +
  +
See
  +
* Daan Leijen: [http://www.cs.uu.nl/~daan/download/papers/divmodnote-letter.pdf Division and Modulus for Computer Scientists]
  +
* Haskell-Cafe: [http://www.haskell.org/pipermail/haskell-cafe/2007-August/030394.html default for quotRem in terms of divMod?]
  +
  +
===Partial functions like <hask>fromJust</hask> and <hask>head</hask>===
  +
  +
Avoid functions that fail for certain input values like <hask>fromJust</hask> and <hask>head</hask>.
  +
They raise errors that can only be detected at runtime.
  +
Think about how they can be avoided by different program organization
  +
or by choosing more specific types.
  +
  +
Instead of
 
<haskell>
 
<haskell>
  +
if i == Nothing then deflt else fromJust i
do
 
h <- openFile "foo" WriteMode
 
replicateM_ 10 (hPutStr h "bar")
 
hClose h
 
 
</haskell>
 
</haskell>
  +
write
can be shortened to
 
 
<haskell>
 
<haskell>
  +
fromMaybe deflt i
writeFile "foo" (concat $ replicate 10 "bar")
 
 
</haskell>
 
</haskell>
which also safes you from proper closing of the handle <hask>h</hask>
+
Please note, that <hask>(==)</hask> also requires an <hask>Eq</hask> class instance for the type of <hask>i</hask>,
  +
which <hask>fromMaybe</hask> does not require because it employs [[pattern matching]].
in case of failure.
 
  +
See also [[#Reduce type class constraints]].
   
  +
If it is not possible to avoid <hask>fromJust</hask> this way,
A function which computes a random value
 
  +
then use <hask>fromMaybe</hask> anyway
with respect to a custom distribution
 
  +
and document with an <hask>error</hask> why you think that the value must be always <hask>Just</hask> in your situation.
(<hask>distInv</hask> is the inverse of the distribution function)
 
can be defined via IO
 
<haskell>
 
randomDist :: (Random a, Num a) => (a -> a) -> IO a
 
randomDist distInv = liftM distInv (randomRIO (0,1))
 
</haskell>
 
but there is no need to do so.
 
You don't need the state of the whole world
 
just for remembering the state of a random number generator.
 
What about
 
 
<haskell>
 
<haskell>
  +
fromMaybe (error "Function bla: The list does always contains the searched value")
randomDist :: (RandomGen g, Random a, Num a) => (a -> a) -> State g a
 
  +
(lookup key dict)
randomDist distInv = liftM distInv (State (randomR (0,1)))
 
 
</haskell>
 
</haskell>
?
 
   
  +
The function <hask>head</hask> can be avoided by [[Non-empty list|checking with types]], that it is never empty.
== Forget about quot and rem ==
 
  +
There is also a function which returns an existing first list element in terms of <hask>Maybe</hask>:
  +
<hask>maybeToList</hask>
  +
(See [http://www.haskell.org/pipermail/haskell-cafe/2007-March/023391.html remark].
  +
See also "[[Avoiding partial functions]]".)
   
  +
== Related Links ==
They complicate handling of negative dividends.
 
<hask>div</hask> and <hask>mod</hask> are almost always the better choice.
 
If <hask>b&gt;0</hask> then it always holds
 
<haskell>
 
a == b * div a b + mod a b
 
mod a b < b
 
mod a b >= 0
 
</haskell>
 
The first equation is true also for <hask>quot</hask> and <hask>rem</hask>,
 
but the two others are true only for <hask>mod</hask>, but not for <hask>rem</hask>.
 
That is, <hask>mod a b</hask> always wraps <hask>a</hask> to an element from <hask>[0..(b-1)]</hask>,
 
whereas the sign of <hask>rem a b</hask> depends on the sign of <hask>a</hask>.
 
   
  +
=== Common Mistakes and Incorrect Beliefs By Haskell Beginners ===
This seems to be more an issue of experience rather than one of a superior reason.
 
You might argue, that the sign of the dividend is more important for you, than that of the divisor.
 
However, I have never seen such an application,
 
but many uses of <hask>quot</hask> and <hask>rem</hask> where <hask>div</hask> and <hask>mod</hask> were clearly superior.
 
   
  +
* [[Common Misunderstandings]]
Examples:
 
   
  +
=== Some Common (and not so common!) Hugs Errors ===
* Conversion from a continuously counted tone pitch to the pitch class, like C, D, E etc.: <hask>mod p 12</hask>
 
   
  +
* [http://www.cs.kent.ac.uk/people/staff/sjt/craft2e/errors/allErrors.html Some common Hugs error messages]
* Conversion from a day counter to a week day: <hask>mod n 7</hask>
 
   
* Pacman runs out of the screen and re-appears at the opposite border: <hask>mod x screenWidth</hask>
 
   
 
[[Category:Style]]
 
[[Category:Style]]

Latest revision as of 12:36, 4 May 2021

Preface

This page shows several examples of how code can be improved. We try to derive general rules from them, though they cannot be applied deterministically and are a matter of taste. We all know this, please don't add "this is disputable" to each item!

Instead, you can now add "this is disputable" on /Discussion and change this page only when some sort of consensus is reached.

Be concise

Don't reinvent the wheel

The standard libraries are full of useful, well-tuned functions. If you rewrite an existing library function, the reader of your code might spend a minute trying to figure out why you've done that. But if you use a standard function, the reader will either immediately understand what you've done, or can learn something new.

Avoid explicit recursion

Explicit recursion is not generally bad, but you should spend some time trying to find a more declarative implementation using higher order functions.

Don't define

raise :: Num a => a -> [a] -> [a]
raise _ [] = []
raise x (y:ys) = x+y : raise x ys

because it is hard for the reader to find out how much of the list is processed and on which values the elements of the output list depend. Just write

raise x ys = map (x+) ys

or even

raise x = map (x+)

and the reader knows that the complete list is processed and that each output element depends only on the corresponding input element.

If you don't find appropriate functions in the standard library, extract a general function. This helps you and others understand the program. Thanks to higher order functions Haskell gives you very many opportunities to factor out parts of the code. If you find the function very general, put it in a separate module and re-use it. It may appear in the standard libraries later, or you may later find that it is already there in an even more general way.

Decomposing a problem this way also has the advantage that you can debug more easily. If the last implementation of raise does not show the expected behaviour, you can inspect map (I hope it is correct :-) ) and the invoked instance of (+) separately.


This is a special case of the general principle of separating concerns. If you can write the loop over a data structure once and debug it, then there's no need to duplicate that code.


Another example: the function count counts the number of elements which fulfill a certain property, i.e. the elements for which the predicate p is True.

I found the following code (but convoluted in a more specific function) in a Haskell program

count :: (a -> Bool) -> [a] -> Int
count _ [] = 0
count p (x:xs)
   | p x       = 1 + count p xs
   | otherwise =     count p xs

which you won't like after you become aware of

count p = length . filter p

.

Only introduce identifiers you need

Here is some advice that is useful for every language, including scientific prose (http://www.cs.utexas.edu/users/EWD/transcriptions/EWD09xx/EWD993.html): Introduce only identifiers you use. The compiler will check this for you if you pass an option like -Wall to GHC.

In an expression like

[a | i <- [1..m]]

where a might be a horrible complex expression it is not easy to see, that a really does not depend on i.

replicate m a

is certainly better here.


Remember the zero

Don't forget that zero is a natural number. Recursive definitions become more complicated if the recursion anchor is not chosen properly. For example the function tupel presented in DMV-Mitteilungen 2004/12-3, Jürgen Bokowski: Haskell, ein gutes Werkzeug der Diskreten Mathematik (Haskell, a good tool for discrete mathematics). This is also a good example of how to avoid guards.

tuples :: Int -> [a] -> [[a]]
tuples r l
   | r == 1        = [[el] | el <- l]
   | length l == r = [l]
   | otherwise     = (map ([head l] ++) (tuples (r-1) (tail l)))
                                    ++   tuples  r    (tail l)

Do you have an idea what it does?

Let's strip the guards and forget about list comprehension.

tuples :: Int -> [a] -> [[a]]
tuples 1 l = map (:[]) l
tuples r l =
  if r == length l
    then [l]
    else
      let t = tail l
      in  map (head l :) (tuples (r-1) t)
                     ++   tuples  r    t

What about tuples with zero elements? We can add the pattern

tuples 0 _ = [[]]

but then we can also omit the pattern for 1-tuples.

tuples :: Int -> [a] -> [[a]]
tuples 0 _ = [[]]
tuples r l =
  if r == length l
    then [l]
    else
      let t = tail l
      in  map (head l :) (tuples (r-1) t)
                     ++   tuples  r    t

What about the case r > length l? Sure, no reason to let head fail - in that case there is no tuple, thus we return an empty list. Again, this saves us one special case.

tuples :: Int -> [a] -> [[a]]
tuples 0 _ = [[]]
tuples r l =
  if r > length l
    then []
    else
      let t = tail l
      in  map (head l :) (tuples (r-1) t)
                     ++   tuples  r    t

We will learn below that length can be evil! What about

tuples :: Int -> [a] -> [[a]]
tuples 0 _  = [[]]
tuples _ [] = []
tuples r (x:xs) =
   map (x :) (tuples (r-1) xs)
         ++   tuples  r    xs

? It is no longer necessary to compute the length of l again and again. The code is easier to read and it covers all special cases, including tuples (-1) [1,2,3]!

Eliminating the length test can worsen performance dramatically in some cases, like tuples 24 [1..25]. We could also use null (drop (r-1) l) instead of length l < r, which works for infinite lists. See also below.

You can even save one direction of recursion by explicit computation of the list of all suffixes provided by tails. You can do this with do notation

tuples :: Int -> [a] -> [[a]]
tuples 0 _  = [[]]
tuples r xs = do
  y:ys <- tails xs
  map (y:) (tuples (r-1) ys)

Since (=<<) in the list monad is concatMap, we can also write this as follows. Where in the previous version the pattern y:ys filtered out the last empty suffix we have to do this manually now with init.

tuples :: Int -> [a] -> [[a]]
tuples 0 _  = [[]]
tuples r xs =
   concatMap (\(y:ys) -> map (y:) (tuples (r-1) ys))
             (init (tails xs))

The list of all suffixes could be generated with iterate tail but this ends with a "Prelude.tail: empty list". tails generates the suffixes in the same order but aborts properly.


More generally, Base cases and identities

Don't overuse lambdas

Like explicit recursion, using explicit lambdas isn't a universally bad idea, but a better solution often exists. For example, Haskell is quite good at currying. Don't write

zipWith (\x y -> f x y)

map (\x -> x + 42)

instead, write

zipWith f

map (+42)

also, instead of writing

-- sort a list of strings case insensitively
sortBy (\x y -> compare (map toLower x) (map toLower y))

write

comparing p x y = compare (p x) (p y)

sortBy (comparing (map toLower))

which is both clearer and re-usable. Actually, starting with GHC-6.6 you do not need to define comparing, since it is already in module Data.Ord. http://hackage.haskell.org/packages/archive/base/latest/doc/html/Data-Ord.html

(Just a remark for this special example: We can avoid multiple evaluations of the conversions with a function that is present in GHC.Exts of GHC 6.10:

sortWith :: (Ord b) => (a -> b) -> [a] -> [a]
sortWith f x = map snd (sortBy (comparing fst) (zip (map f x) x))

)

As a rule of thumb, once your expression becomes too long to easily be point-freed, it probably deserves a name anyway. Lambdas are occasionally appropriate however, e.g. for control structures in monadic code (in this example, a control-structure "foreach2" which most languages don't even support.):

foreach2 xs ys f = zipWithM_ f xs ys

linify :: [String] -> IO ()
linify lines
        = foreach2 [1..] lines $ \lineNr line -> do
            unless (null line) $
                putStrLn $ shows lineNr $ showString ": " $ show line

Bool is a regular type

Logic expressions are not restricted to guards and if statements. Avoid verbosity like in

isEven n
  | mod n 2 == 0  =  True
  | otherwise     =  False

since it is the same as

isEven n  =  mod n 2 == 0

.

The definitions

hasSpace (a:as)
  | isSpace a = True
  | otherwise = hasSpace as

and

hasSpace (a:as) = if isSpace a then True else hasSpace as

can be shortened to

hasSpace (a:as) = isSpace a || hasSpace as

(I just wanted to show the logic transform. In the particular example you would write any isSpace, of course.) The same way

allPrintable (a:as)
  | isSpace a = False
  | otherwise = allPrintable as

and

allPrintable (a:as) = if isSpace a then False else allPrintable as

can be shortened to

allPrintable (a:as) = not (isSpace a) && allPrintable as

(but all (not . isSpace) is even better in this particular example.)


Use syntactic sugar wisely

People who employ syntactic sugar extensively argue that it makes their code more readable. The following sections show several examples where less syntactic sugar is more readable.

It is argued that a special notation is often more intuitive than a purely functional expression. But the term "intuitive notation" is always a matter of habit. You can also develop an intuition for analytic expressions that don't match your habits at the first glance. So why not making a habit of less sugar sometimes?


List comprehension

List comprehension lets you remain in imperative thinking, that is it lets you think in variables rather than transformations. Open your mind, discover the flavour of the pointfree style!

Instead of

[toUpper c | c <- s]

write

map toUpper s

.


Consider

[toUpper c | s <- strings, c <- s]

where it takes some time for the reader to discover which value depends on what other value and it is not so clear how many times the interim values s and c are used. In contrast to that

map toUpper (concat strings)

can't be clearer.


When using higher order functions you can switch more easily from List to other data structures.

Compare

map (1+) list

and

mapSet (1+) set

. If there were a standard instance for the Functor class you could use the code

fmap (1+) pool

for both choices.

If you are not used to higher order functions for list processing you may feel you need parallel list comprehension. This is unfortunately supported by GHC now, but it is arguably superfluous since various flavours of zip already do a great job.



do notation

do notation is useful to express the imperative nature (e.g. a hidden state or an order of execution) of a piece of code. Nevertheless it's sometimes useful to remember that the do notation is explained in terms of functions.

Instead of

do
  text <- readFile "foo"
  writeFile "bar" text

one can write

readFile "foo" >>= writeFile "bar"

.


The code

do
  text <- readFile "foo"
  return text

can be simplified to

readFile "foo"

by a law that each Monad must fulfill.


You certainly also agree that

do
  text <- readFile "foobar"
  return (lines text)

is more complicated than

liftM lines (readFile "foobar")

. By the way, the Functor class method fmap and the Monad based function liftM are the same (as long as both are defined, as they should be).

Be aware that "more complicated" does not imply "worse". If your do-expression was longer than this, then mixing do-notation and fmap might be precisely the wrong thing to do, because it adds one more thing to think about. Be natural. Only change it if you gain something by changing it. -- AndrewBromage

Guards

Disclaimer: This section is NOT advising you to avoid guards. It is advising you to prefer pattern matching to guards when both are appropriate. -- AndrewBromage

Guards look like

-- Bad implementation:
fac :: Integer -> Integer
fac n | n == 0 = 1
      | n /= 0 = n * fac (n-1)

which implements a factorial function. This example, like a lot of uses of guards, has a number of problems.

The first problem is that it's nearly impossible for the compiler to check whether guards like this are exhaustive, as the guard conditions may be arbitrarily complex (GHC will warn you if you use the -Wall option). To avoid this problem and potential bugs through non exhaustive patterns you should use an otherwise guard, that will match for all remaining cases:

-- Slightly improved implementation:
fac :: Integer -> Integer
fac n | n == 0    = 1
      | otherwise = n * fac (n-1)

Another reason to prefer this one is its greater readability for humans and optimizability for compilers. Though it may not matter much in a simple case like this, when seeing an otherwise it's immediately clear that it's used whenever the previous guard fails, which isn't true if the "negation of the previous test" is spelled out. The same applies to the compiler: It probably will be able to optimize an otherwise (which is a synonym for True) away but cannot do that for most expressions.

This can be done with even less sugar using if,

-- Less sugar (though the verbosity of if-then-else can also be considered as sugar :-)
fac :: Integer -> Integer
fac n = if n == 0
          then 1
          else n * fac (n-1)

Note that if has its own set of problems, for example in connection with the layout rule or that nested ifs are difficult to read. See Case how to avoid nested ifs.

But in this special case, the same can be done even more easily with pattern matching:

-- Good implementation:
fac :: Integer -> Integer
fac 0 = 1
fac n = n * fac (n-1)

Actually, in this case there is an even more easier to read version, which (see above) doesn't use Explicit Recursion:

-- Excellent implementation:
fac :: Integer -> Integer
fac n = product [1..n]

This may also be more efficient as product might be optimized by the library-writer... In GHC, when compiling with optimizations turned on, this version runs in O(1) stack-space, whereas the previous versions run in O(n) stack-space.

Note however, that there is a difference between this version and the previous ones: When given a negative number, the previous versions do not terminate (until StackOverflow-time), while the last implementation returns 1.


Guards don't always make code clearer. Compare

foo xs | not (null xs) = bar (head xs)

and

foo (x:_) = bar x

or compare the following example using the advanced pattern guards

parseCmd ln
   | Left err <- parse cmd "Commands" ln
     = BadCmd $ unwords $ lines $ show err
   | Right x <- parse cmd "Commands" ln
     = x

with this one with no pattern guards:

parseCmd ln = case parse cmd "Commands" ln of
   Left err -> BadCmd $ unwords $ lines $ show err
   Right x  -> x

or, if you expect your readers to be familiar with the either function:

parseCmd :: -- add an explicit type signature, as this is now a pattern binding
parseCmd = either (BadCmd . unwords . lines . show) id . parse cmd "Commands"


Incidentally, compilers often also have problems with numerical patterns. For example, the pattern 0 in fact means fromInteger 0; thus it involves a computation, which is uncommon for function parameter patterns. To illustrate this, consider the following example:

data Foo = Foo deriving (Eq, Show)

instance Num Foo where
    fromInteger = error "forget it"

f       :: Foo -> Bool
f 42    = True
f _     = False
*Main> f 42
*** Exception: forget it

Only use guards when you need to. In general, you should stick to pattern matching whenever possible.

n+k patterns

In order to allow pattern matching against numerical types, Haskell 98 provides so-called n+k patterns, as in

take :: Int -> [a] -> [a]
take (n+1) (x:xs) = x: take n xs
take _     _      = []

However, they are often criticized for hiding computational complexity and producing ambiguities, see /Discussion for details. They are subsumed by the more general Views proposal, which has unfortunately never been implemented despite being around for quite some time now.

The n+k patterns are not in the Haskell 2010 standard.

Efficiency and infinity

A rule of thumb is: If a function makes sense for an infinite data structure but the implementation at hand fails for an infinite amount of data, then the implementation is probably also inefficient for finite data.

Don't ask for the length of a list when you don't need it

Don't write

length x == 0

to find out if the list x is empty. If you write it, you force Haskell to create all list nodes. It fails on an infinite list although the expression should be evaluated to False in this case. (Nevertheless the content of the list elements may not be evaluated.)

In contrast

x == []

is faster but it requires the list x to be of type [a] where a is a type of class Eq.

The best thing to do is

null x
Additionally, many uses of the length function are overspecifying the problem: one may only need to check that a list is at least a certain length, and not a specific length. Thus use of
length
could be replaced with an atLeast function that only checks to see that a list is greater than the required minimum length.
atLeast :: Int -> [a] -> Bool
atLeast 0 _      = True
atLeast _ []     = False
atLeast n (_:ys) = atLeast (n-1) ys

or non-recursive, but less efficient because both length and take must count

atLeast :: Int -> [a] -> Bool
atLeast n x = n == length (take n x)

or non-recursive but fairly efficient

atLeast :: Int -> [a] -> Bool
atLeast n =
  if n>0
    then not . null . drop (n-1)
    else const True

or

atLeast :: Int -> [a] -> Bool
atLeast 0 = const True
atLeast n = not . null . drop (n-1)

The same problem arises if you want to shorten a list to the length of another one by

take (length x) y

since this is inefficient for large lists x and fails for infinite ones. But this can be useful to extract a finite prefix from an infinite list. So, instead

zipWith const y x

works well.

It should be noted that length, take can be replaced by genericLength, genericTake et.al., which allow the usage of Peano numbers.

Don't ask for the minimum when you don't need it

The function isLowerLimit checks if a number is a lower limit to a sequence.

isLowerLimit :: Ord a => a -> [a] -> Bool
isLowerLimit x ys = x <= minimum ys

It certainly fails if ys is infinite. Is this a problem?

Compare it with

isLowerLimit x = all (x<=)

This definition terminates for infinite lists, if x is not a lower limit. It aborts immediately if an element is found which is below x. Thus it is also faster for finite lists. Even more: It also works for empty lists.


Use sharing

If you want a list of lists with increasing length and constant content, don't write

map (flip replicate x) [0..]

because this needs quadratic space and run-time. If you code

iterate (x:) []

then the lists will share their suffixes and thus need only linear space and run-time for creation.


Choose the appropriate fold

See "Stack overflow" or "Foldr Foldl Foldl'" for advice on which fold is appropriate for your situation.


Choose types properly

Lists are not good for everything

Lists are not arrays

Lists are not arrays, so don't treat them as such. Frequent use of (!!) should alarm you. Accessing the nth list element involves traversing through the first n nodes of the list. This is very inefficient.

If you access the elements progressively, as in

[x !! i - i | i <- [0..n]]

you should try to get rid of indexing, as in

zipWith (-) x [0..n]

.

If you really need random access, as in the Fourier Transform, you should switch to Arrays.


Lists are not sets

If you manage data sets where each object can occur only once and the order is irrelevant, if you use list functions like sort, nub, union, elem, delete, (\\) frequently, you should think about switching to sets. If you need multi-sets, i.e. data sets with irrelevant order but multiple occurrences of objects, you can use a Data.Map.Map a Int.


Lists are not finite maps

Similarly, lists are not finite maps, as mentioned in efficiency hints.

Reduce type class constraints

Eq type class

When using functions like delete, (\\), nub, and so on you should be aware that they need types of the Eq class. There are two problems: The routines might not work as expected if a processed list contains multiple equal elements and the element type of the list may not be comparable, like functions.

Example: The following function takes the input list xs and removes each element of xs once from xs. Clear what it does? No? The code is probably more understandable

removeEach :: (Eq a) => [a] -> [[a]]
removeEach xs = map (flip List.delete xs) xs

but it should be replaced by

removeEach :: [a] -> [[a]]
removeEach xs =
   zipWith (++) (List.inits xs) (tail (List.tails xs))

since this works perfectly for function types a and for equal elements in xs.


Don't use Int when you don't consider integers

Before using integers for each and everything (C style) think of more specialised types. If only the values 0 and 1 are of interest, try the type Bool instead. If there are more but predefined choices and numeric operations aren't needed try an enumeration.

Instead of

type Weekday = Int

write

data Weekday = Monday
             | Tuesday
             | Wednesday
             | Thursday
             | Friday
             | Saturday
             | Sunday
  deriving (Eq, Ord, Enum)

It allows all sensible operations like ==, <, succ and forbids all nonsensical ones like +, *. You cannot accidentally mix up weekdays with numbers and the signature of a function with weekday parameter clearly states what kind of data is expected.

If an enumeration is not appropriate you can define a newtype carrying the type that is closest to what you need. E.g. if you want to associate objects with a unique identifier, you may want to choose the type Int. But you don't need arithmetic and you can make this type distinct from real Ints by defining

newtype Identifier = Identifier Int deriving Eq


Avoid redundancy in data types

I often see data types with redundant fields, e.g.

data XML =
     Element Position Name [Attribute] [XML]
   | Comment Position String
   | Text Position String

I suggest to factor out the common field Position since this lets you handle the text position the same way for all XML parts.

data XML = XML Position Part

data Part =
     Element Name [Attribute] [XML]
   | Comment String
   | Text String

Miscellaneous

Separate IO and data processing

It's not good to use the IO Monad everywhere, much of the data processing can be done without IO interaction. You should separate data processing and IO because pure data processing can be done purely functionally, that is you don't have to specify an order of execution and you don't have to worry about what computations are actually necessary. Useful techniques are described in Avoiding IO.

Forget about quot and rem

They complicate handling of negative dividends. div and mod are almost always the better choice. If b > 0 then it always holds

a == b * div a b + mod a b
mod a b < b
mod a b >= 0

The first equation is true also for quot and rem, but the two others are true only for mod, but not for rem. That is, mod a b always wraps a to an element from [0..(b-1)], whereas the sign of rem a b depends on the sign of a.

This seems to be more an issue of experience rather than one of a superior reason. You might argue, that the sign of the dividend is more important for you, than that of the divisor. However, I have never seen such an application, but many uses of quot and rem where div and mod were clearly superior.

Examples:

  • Conversion from a continuously counted tone pitch to the pitch class, like C, D, E etc.: mod p 12
  • Pad a list xs to a multiple of m number of elements: xs ++ replicate (mod (- length xs) m) pad
  • Conversion from a day counter to a week day: mod n 7
  • Pacman runs out of the screen and re-appears at the opposite border: mod x screenWidth

See

Partial functions like fromJust and head

Avoid functions that fail for certain input values like fromJust and head. They raise errors that can only be detected at runtime. Think about how they can be avoided by different program organization or by choosing more specific types.

Instead of

if i == Nothing then deflt else fromJust i

write

fromMaybe deflt i

Please note, that (==) also requires an Eq class instance for the type of i, which fromMaybe does not require because it employs pattern matching. See also #Reduce type class constraints.

If it is not possible to avoid fromJust this way, then use fromMaybe anyway and document with an error why you think that the value must be always Just in your situation.

fromMaybe (error "Function bla: The list does always contains the searched value")
          (lookup key dict)

The function head can be avoided by checking with types, that it is never empty. There is also a function which returns an existing first list element in terms of Maybe: maybeToList (See remark. See also "Avoiding partial functions".)

Related Links

Common Mistakes and Incorrect Beliefs By Haskell Beginners

Some Common (and not so common!) Hugs Errors