Haskell programming tips: Difference between revisions
(plain copy from Hawiki) |
(ported from hawiki) |
||
Line 18: | Line 18: | ||
Don't define | Don't define | ||
<haskell> | |||
raise :: Num a => a -> [a] -> [a] | raise :: Num a => a -> [a] -> [a] | ||
raise _ [] = [] | raise _ [] = [] | ||
raise x (y:ys) = x+y : raise x ys | raise x (y:ys) = x+y : raise x ys | ||
</haskell> | |||
because it is hard for the reader to find out, | because it is hard for the reader to find out, | ||
how much of the list is processed and | how much of the list is processed and | ||
on which values the elements of the output list depend. | on which values the elements of the output list depend. | ||
Just write | Just write | ||
<haskell> | |||
raise x ys = map (x+) ys | raise x ys = map (x+) ys | ||
</haskell> | |||
or even | or even | ||
<haskell> | |||
raise x = map (x+) | raise x = map (x+) | ||
</haskell> | |||
and the reader knows that the complete list is processed and that each output element depends only on the corresponding input element. | and the reader knows that the complete list is processed and that each output element depends only on the corresponding input element. | ||
Line 39: | Line 39: | ||
If you find it very general, put it in a separate module and re-use it. It may appear in the standard libraries later, or you may later find that it is already there. | If you find it very general, put it in a separate module and re-use it. It may appear in the standard libraries later, or you may later find that it is already there. | ||
Decomposing a problem this way has also the advantage that you can debug easier. If the last implementation of | Decomposing a problem this way has also the advantage that you can debug easier. If the last implementation of <hask>raise</hask> does not show the expected behaviour, you can inspect <hask>map</hask> (I hope it is correct :-) ) and the invoked instance of <hask>(+)</hask> separately. | ||
Line 46: | Line 46: | ||
Another example: | Another example: | ||
The function | The function <hask>count</hask> counts the number of elements | ||
which fulfill a certain property, | which fulfill a certain property, | ||
i.e. the elements for which the predicate | i.e. the elements for which the predicate <hask>p</hask> is <hask>True</hask>. | ||
I found the following code (but convoluted in a more specific function) in a Haskell program | I found the following code (but convoluted in a more specific function) in a Haskell program | ||
<haskell> | |||
count :: (a -> Bool) -> [a] -> Int | count :: (a -> Bool) -> [a] -> Int | ||
count _ [] = 0 | count _ [] = 0 | ||
Line 57: | Line 57: | ||
| p x = 1 + count p xs | | p x = 1 + count p xs | ||
| otherwise = count p xs | | otherwise = count p xs | ||
</haskell> | |||
which you won't like any longer if you become aware of | which you won't like any longer if you become aware of | ||
<haskell> | |||
count p = length . filter p | count p = length . filter p | ||
</haskell> | |||
. | . | ||
Line 70: | Line 70: | ||
(http://www.cs.utexas.edu/users/EWD/transcriptions/EWD09xx/EWD993.html): | (http://www.cs.utexas.edu/users/EWD/transcriptions/EWD09xx/EWD993.html): | ||
Introduce only identifiers you use. | Introduce only identifiers you use. | ||
The compiler will check that you if you pass an option like | The compiler will check that you if you pass an option like <code>-Wall</code> for GHC. | ||
In an expression like | In an expression like | ||
<haskell> | |||
[a | i <- [1..m]] | [a | i <- [1..m]] | ||
</haskell> | |||
where | where <hask>a</hask> might be a horrible complex expression it is not easy to see, | ||
that | that <hask>a</hask> really does not depend on <hask>i</hask>. | ||
<haskell> | |||
replicate m a | replicate m a | ||
</haskell> | |||
is certainly better here. | is certainly better here. | ||
Line 86: | Line 86: | ||
== Remember the zero == | == Remember the zero == | ||
Don't forget that zero is a natural number. Recursive definitions become more complicated if the recursion anchor is not chosen properly. As an example I have chosen the function | Don't forget that zero is a natural number. Recursive definitions become more complicated if the recursion anchor is not chosen properly. As an example I have chosen the function <hask>tupel</hask> presented in ''DMV-Mitteilungen 2004/12-3, Jürgen Bokowski: Haskell, ein gutes Werkzeug der Diskreten Mathematik'' (Haskell, a good tool for discrete mathematics). It is also a good example of how to avoid guards. | ||
<haskell> | |||
tuples :: Int -> [a] -> [[a]] | tuples :: Int -> [a] -> [[a]] | ||
tuples r l | tuples r l | ||
Line 94: | Line 94: | ||
| otherwise = (map ([head l] ++) (tuples (r-1) (tail l))) | | otherwise = (map ([head l] ++) (tuples (r-1) (tail l))) | ||
++ tuples r (tail l) | ++ tuples r (tail l) | ||
</haskell> | |||
Do you have an idea what it does? | Do you have an idea what it does? | ||
Let's strip the guards and forget about list comprehension. | Let's strip the guards and forget about list comprehension. | ||
<haskell> | |||
tuples :: Int -> [a] -> [[a]] | tuples :: Int -> [a] -> [[a]] | ||
tuples 1 l = map (:[]) l | tuples 1 l = map (:[]) l | ||
Line 108: | Line 108: | ||
in map (head l :) (tuples (r-1) t) | in map (head l :) (tuples (r-1) t) | ||
++ tuples r t | ++ tuples r t | ||
</haskell> | |||
What about tuples with zero elements? We can add the pattern | What about tuples with zero elements? We can add the pattern | ||
<haskell> | |||
tuples 0 _ = [[]] | tuples 0 _ = [[]] | ||
</haskell> | |||
but then we can also omit the pattern for 1-tuples. | but then we can also omit the pattern for 1-tuples. | ||
<haskell> | |||
tuples :: Int -> [a] -> [[a]] | tuples :: Int -> [a] -> [[a]] | ||
tuples 0 _ = [[]] | tuples 0 _ = [[]] | ||
Line 126: | Line 126: | ||
in map (head l :) (tuples (r-1) t) | in map (head l :) (tuples (r-1) t) | ||
++ tuples r t | ++ tuples r t | ||
</haskell> | |||
What about the case | What about the case <hask>r > length l</hask>? Sure, no reason to let <hask>head</hask> fail - in that case there is no tuple, thus we return an empty list. Again, this saves us one special case. | ||
<haskell> | |||
tuples :: Int -> [a] -> [[a]] | tuples :: Int -> [a] -> [[a]] | ||
tuples 0 _ = [[]] | tuples 0 _ = [[]] | ||
Line 138: | Line 138: | ||
in map (head l :) (tuples (r-1) t) | in map (head l :) (tuples (r-1) t) | ||
++ tuples r t | ++ tuples r t | ||
</haskell> | |||
We have learnt above that | We have learnt above that <hask>length</hask> is evil! What about | ||
<haskell> | |||
tuples :: Int -> [a] -> [[a]] | tuples :: Int -> [a] -> [[a]] | ||
tuples 0 _ = [[]] | tuples 0 _ = [[]] | ||
Line 148: | Line 148: | ||
map (x :) (tuples (r-1) xs) | map (x :) (tuples (r-1) xs) | ||
++ tuples r xs | ++ tuples r xs | ||
</haskell> | |||
? It is no longer necessary to compute the length of | ? It is no longer necessary to compute the length of <hask>l</hask> again and again. The code is easier to read and it covers all special cases, including <hask>tuples (-1) [1,2,3]</hask>! | ||
You can even save one direction of recursion | You can even save one direction of recursion | ||
by explicit computation of the list of all suffixes provided by | by explicit computation of the list of all suffixes provided by <hask>tails</hask>. | ||
You can do this with do notation | You can do this with do notation | ||
<haskell> | |||
tuples :: Int -> [a] -> [[a]] | tuples :: Int -> [a] -> [[a]] | ||
tuples 0 _ = [[]] | tuples 0 _ = [[]] | ||
Line 160: | Line 160: | ||
y:ys <- tails xs | y:ys <- tails xs | ||
map (y:) (tuples (r-1) ys) | map (y:) (tuples (r-1) ys) | ||
</haskell> | |||
Since (=<<) in the list monad is concatMap, we can also write this as follows. | Since (=<<) in the list monad is concatMap, we can also write this as follows. | ||
Where in the previous version the pattern | Where in the previous version the pattern <hask>y:ys</hask> filtered out the last empty suffix | ||
we have to do this manually now with | we have to do this manually now with <hask>init</hask>. | ||
<haskell> | |||
tuples :: Int -> [a] -> [[a]] | tuples :: Int -> [a] -> [[a]] | ||
tuples 0 _ = [[]] | tuples 0 _ = [[]] | ||
Line 171: | Line 171: | ||
concatMap (\(y:ys) -> map (y:) (tuples (r-1) ys)) | concatMap (\(y:ys) -> map (y:) (tuples (r-1) ys)) | ||
(init (tails xs)) | (init (tails xs)) | ||
</haskell> | |||
The list of all suffixes could be generated with | The list of all suffixes could be generated with <hask>iterate tail</hask> | ||
but this ends with a "Prelude.tail: empty list". | but this ends with a "Prelude.tail: empty list". | ||
<hask>tails</hask> generates the suffixes in the same order but aborts properly. | |||
Line 183: | Line 183: | ||
Like explicit recursion, using explicit lambdas isn't a universally bad idea, but a better solution often exists. | Like explicit recursion, using explicit lambdas isn't a universally bad idea, but a better solution often exists. | ||
For example, Haskell is quite good at currying. Don't write | For example, Haskell is quite good at currying. Don't write | ||
<haskell> | |||
zipWith (\x y -> f x y) | zipWith (\x y -> f x y) | ||
map (\x -> x + 42) | map (\x -> x + 42) | ||
</haskell> | |||
instead, write | instead, write | ||
<haskell> | |||
zipWith f | zipWith f | ||
map (+42) | map (+42) | ||
</haskell> | |||
also, instead of writing | also, instead of writing | ||
<haskell> | |||
-- sort a list of strings case insensitively | -- sort a list of strings case insensitively | ||
sortBy (\x y -> compare (map toLower x) (map toLower y)) | sortBy (\x y -> compare (map toLower x) (map toLower y)) | ||
</haskell> | |||
write | write | ||
<haskell> | |||
comparing p x y = compare (p x) (p y) | comparing p x y = compare (p x) (p y) | ||
sortBy (comparing (map toLower)) | sortBy (comparing (map toLower)) | ||
</haskell> | |||
which is both clearer and re-usable. | which is both clearer and re-usable. | ||
Actually, in a future version of GHC you may not even have to define | Actually, in a future version of GHC you may not even have to define <hask>comparing</hask>, as it's already defined in Data.Ord in the CVS version. | ||
(Just a remark for this special example: | (Just a remark for this special example: | ||
We can avoid multiple evaluations of the conversions. | We can avoid multiple evaluations of the conversions. | ||
<haskell> | |||
sortKey :: (Ord b) => (a -> b) -> [a] -> [a] | sortKey :: (Ord b) => (a -> b) -> [a] -> [a] | ||
sortKey f x = map snd (sortBy (comparing fst) (zip (map f x) x)) | sortKey f x = map snd (sortBy (comparing fst) (zip (map f x) x)) | ||
</haskell> | |||
) | ) | ||
As a rule of thumb, once your expression becomes too long to easily be point-freed, it probably deserves a name anyway. | As a rule of thumb, once your expression becomes too long to easily be point-freed, it probably deserves a name anyway. | ||
Lambdas are occasionally appropriate however, e.g. for control structures in monadic code (in this example, a control-structure "foreach2" which most languages don't even support.): | Lambdas are occasionally appropriate however, e.g. for control structures in monadic code (in this example, a control-structure "foreach2" which most languages don't even support.): | ||
<haskell> | |||
foreach2 xs ys f = zipWithM_ f xs ys | foreach2 xs ys f = zipWithM_ f xs ys | ||
Line 229: | Line 229: | ||
unless (null line) $ | unless (null line) $ | ||
putStrLn $ shows lineNr $ showString ": " $ show line | putStrLn $ shows lineNr $ showString ": " $ show line | ||
</haskell> | |||
== Bool is a regular type == | == Bool is a regular type == | ||
Logic expressions are not restricted to guards and | Logic expressions are not restricted to guards and <hask>if</hask> statements. | ||
Avoid verbosity like in | Avoid verbosity like in | ||
<haskell> | |||
isEven n | isEven n | ||
| mod n 2 == 0 = True | | mod n 2 == 0 = True | ||
| otherwise = False | | otherwise = False | ||
</haskell> | |||
since it is the same as | since it is the same as | ||
<haskell> | |||
isEven n = mod n 2 == 0 | isEven n = mod n 2 == 0 | ||
</haskell> | |||
. | . | ||
Line 270: | Line 270: | ||
Instead of | Instead of | ||
<haskell> | |||
[toUpper c | c <- s] | [toUpper c | c <- s] | ||
</haskell> | |||
write | write | ||
<haskell> | |||
map toUpper s | map toUpper s | ||
</haskell> | |||
. | . | ||
Consider | Consider | ||
<haskell> | |||
[toUpper c | s <- strings, c <- s] | [toUpper c | s <- strings, c <- s] | ||
</haskell> | |||
where it takes some time for the reader | where it takes some time for the reader | ||
to find out which value depends on what other value | to find out which value depends on what other value | ||
and it is not so clear how many times | and it is not so clear how many times | ||
the interim values | the interim values <hask>s</hask> and <hask>c</hask> are used. | ||
In contrast to that | In contrast to that | ||
<haskell> | |||
map toUpper (concat strings) | map toUpper (concat strings) | ||
</haskell> | |||
can't be clearer. | can't be clearer. | ||
Line 297: | Line 297: | ||
When using higher order functions you can switch easier to | When using higher order functions you can switch easier to | ||
data structures different from | data structures different from <hask>List</hask>. | ||
Compare | Compare | ||
<haskell> | |||
map (1+) list | map (1+) list | ||
</haskell> | |||
and | and | ||
<haskell> | |||
mapSet (1+) set | mapSet (1+) set | ||
</haskell> | |||
. | . | ||
If there would be a standard instance for the | If there would be a standard instance for the <hask>Functor</hask> class | ||
you could use the code | you could use the code | ||
<haskell> | |||
fmap (1+) pool | fmap (1+) pool | ||
</haskell> | |||
for both choices. | for both choices. | ||
Line 318: | Line 318: | ||
you feel like needing parallel list comprehension. | you feel like needing parallel list comprehension. | ||
This is unfortunately supported by GHC now, | This is unfortunately supported by GHC now, | ||
but somehow superfluous since various flavours of | but somehow superfluous since various flavours of <hask>zip</hask> already do a great job. | ||
Line 326: | Line 326: | ||
Do notation is useful to express the imperative nature (e.g. a hidden state or an order of execution) of a piece of code. | Do notation is useful to express the imperative nature (e.g. a hidden state or an order of execution) of a piece of code. | ||
Nevertheless it's sometimes useful to remember that the | Nevertheless it's sometimes useful to remember that the <hask>do</hask> notation is explained in terms of functions. | ||
Instead of | Instead of | ||
<haskell> | |||
do | do | ||
text <- readFile "foo" | text <- readFile "foo" | ||
writeFile "bar" text | writeFile "bar" text | ||
</haskell> | |||
one can write | one can write | ||
<haskell> | |||
readFile "foo" >>= writeFile "bar" | readFile "foo" >>= writeFile "bar" | ||
</haskell> | |||
. | . | ||
The code | The code | ||
<haskell> | |||
do | do | ||
text <- readFile "foo" | text <- readFile "foo" | ||
return text | return text | ||
</haskell> | |||
can be simplified to | can be simplified to | ||
<haskell> | |||
readFile "foo" | readFile "foo" | ||
</haskell> | |||
by a law that each Monad must fulfill. | by a law that each Monad must fulfill. | ||
You certainly also agree that | You certainly also agree that | ||
<haskell> | |||
do | do | ||
text <- readFile "foobar" | text <- readFile "foobar" | ||
return (lines text) | return (lines text) | ||
</haskell> | |||
is more complicated than | is more complicated than | ||
<haskell> | |||
liftM lines (readFile "foobar") | liftM lines (readFile "foobar") | ||
</haskell> | |||
. | . | ||
Btw. in the case of | Btw. in the case of <hask>IO</hask> monad the <hask>Functor</hask> class method <hask>fmap</hask> and the <hask>Monad</hask> based function <hask>liftM</hask> are the same. | ||
''Be aware that "more complicated" does not imply "worse". If your do-expression was longer than this, then mixing do-notation and | ''Be aware that "more complicated" does not imply "worse". If your do-expression was longer than this, then mixing do-notation and <hask>fmap</hask> might be precisely the wrong thing to do, because it adds one more thing to think about. Be natural. Only change it if you gain something by changing it. -- AndrewBromage'' | ||
== Guards == | == Guards == | ||
Guards look like | Guards look like | ||
<haskell> | |||
-- Bad implementation: | -- Bad implementation: | ||
fac :: Integer -> Integer | fac :: Integer -> Integer | ||
fac n | n == 0 = 1 | fac n | n == 0 = 1 | ||
| n /= 0 = n * fac (n-1) | | n /= 0 = n * fac (n-1) | ||
</haskell> | |||
which implements a factorial function. This example, like a lot of uses of guards, has a number of problems. | which implements a factorial function. This example, like a lot of uses of guards, has a number of problems. | ||
The first problem is that it's nearly impossible for the compiler to check if guards like this are exhaustive, as the guard conditions may be arbitrarily complex (Ghc will warn you if you use the | The first problem is that it's nearly impossible for the compiler to check if guards like this are exhaustive, as the guard conditions may be arbitrarily complex (Ghc will warn you if you use the <code>-Wall</code> option). To avoid this problem and potential bugs through non exhaustive patterns you should use an <hask>otherwise</hask> guard, that will match for all remaining cases: | ||
<haskell> | |||
-- Slightly improved implementation: | -- Slightly improved implementation: | ||
fac :: Integer -> Integer | fac :: Integer -> Integer | ||
fac n | n == 0 = 1 | fac n | n == 0 = 1 | ||
| otherwise = n * fac (n-1) | | otherwise = n * fac (n-1) | ||
</haskell> | |||
Another reason to prefer this one is its greater readability for humans and optimizability for compilers. Though it may not matter much in a simple case like this, when seeing an | Another reason to prefer this one is its greater readability for humans and optimizability for compilers. Though it may not matter much in a simple case like this, when seeing an <hask>otherwise</hask> it's immediately clear that it's used whenever the previous guard fails, which isn't true if the "negation of the previous test" is spelled out. The same applies to the compiler: It probably will be able to optimize an <hask>otherwise</hask> (which is a synonym for <hask>True</hask>) away but cannot do that for most expressions. | ||
This can be done with even less sugar using | This can be done with even less sugar using <hask>if</hask>, | ||
<haskell> | |||
-- Less sugar (though the verbosity of if-then-else can also be considered as sugar :-) | -- Less sugar (though the verbosity of if-then-else can also be considered as sugar :-) | ||
fac :: Integer -> Integer | fac :: Integer -> Integer | ||
Line 398: | Line 398: | ||
then 1 | then 1 | ||
else n * fac (n-1) | else n * fac (n-1) | ||
</haskell> | |||
Note that | Note that <hask>if</hask> has its own set of problems, for example in connection with the layout rule or that nested <hask>if</hask>s are difficult to read. See ["Case"] how to avoid nested <hask>if</hask>s. | ||
But in this special case, the same can be done even more easily with pattern matching: | But in this special case, the same can be done even more easily with pattern matching: | ||
<haskell> | |||
-- Good implementation: | -- Good implementation: | ||
fac :: Integer -> Integer | fac :: Integer -> Integer | ||
fac 0 = 1 | fac 0 = 1 | ||
fac n = n * fac (n-1) | fac n = n * fac (n-1) | ||
</haskell> | |||
Actually, in this case there is an even more easier to read version, which (see above) doesn't use Explicit Recursion: | Actually, in this case there is an even more easier to read version, which (see above) doesn't use Explicit Recursion: | ||
<haskell> | |||
-- Excellent implementation: | -- Excellent implementation: | ||
fac :: Integer -> Integer | fac :: Integer -> Integer | ||
fac n = product [1..n] | fac n = product [1..n] | ||
</haskell> | |||
This may also be more efficient as | This may also be more efficient as <hask>product</hask> might be optimized by the library-writer... In GHC, when compiling with optimizations turned on, this version runs in O(1) stack-space, whereas the previous versions run in O(n) stack-space. | ||
Note however, that there is a difference between this version and the previous ones: When given a negative number, the previous versions do not terminate (until StackOverflow-time), while the last implemenation returns 1. | Note however, that there is a difference between this version and the previous ones: When given a negative number, the previous versions do not terminate (until StackOverflow-time), while the last implemenation returns 1. | ||
Line 422: | Line 422: | ||
Guards don't always make code clearer. | Guards don't always make code clearer. | ||
Compare | Compare | ||
<haskell> | |||
foo xs | not (null xs) = bar (head xs) | foo xs | not (null xs) = bar (head xs) | ||
</haskell> | |||
and | and | ||
<haskell> | |||
foo (x:_) = bar x | foo (x:_) = bar x | ||
</haskell> | |||
or compare the following example using the advanced PatternGuards | or compare the following example using the advanced PatternGuards | ||
(http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#PATTERN-GUARDS) | (http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#PATTERN-GUARDS) | ||
<haskell> | |||
parseCmd ln | parseCmd ln | ||
| Left err <- parse cmd "Commands" ln | | Left err <- parse cmd "Commands" ln | ||
Line 438: | Line 438: | ||
| Right x <- parse cmd "Commands" ln | | Right x <- parse cmd "Commands" ln | ||
= x | = x | ||
</haskell> | |||
with this one with NoPatternGuards: | with this one with NoPatternGuards: | ||
<haskell> | |||
parseCmd ln = case parse cmd "Commands" ln of | parseCmd ln = case parse cmd "Commands" ln of | ||
Left err -> BadCmd $ unwords $ lines $ show err | Left err -> BadCmd $ unwords $ lines $ show err | ||
Right x -> x | Right x -> x | ||
</haskell> | |||
or, if you expect your readers to be familiar with the | or, if you expect your readers to be familiar with the <hask>either</hask> function: | ||
<haskell> | |||
parseCmd :: -- add an explicit type signature, as this is now a pattern binding | parseCmd :: -- add an explicit type signature, as this is now a pattern binding | ||
parseCmd = either (BadCmd . unwords . lines . show) id . parse cmd "Commands" | parseCmd = either (BadCmd . unwords . lines . show) id . parse cmd "Commands" | ||
</haskell> | |||
By the way, a compiler has also problems with numerical patterns. E.g. the pattern | By the way, a compiler has also problems with numerical patterns. E.g. the pattern <hask>0</hask> in fact means <hask>fromInteger 0</hask>, thus it involves a computation, which is uncommon for function parameter patterns. To illustrate this, consider the following example: | ||
<haskell> | |||
data Foo = Foo deriving (Eq, Show) | data Foo = Foo deriving (Eq, Show) | ||
Line 462: | Line 462: | ||
f 42 = True | f 42 = True | ||
f _ = False | f _ = False | ||
</haskell> | |||
<haskell> | |||
*Main> f 42 | *Main> f 42 | ||
*** Exception: forget it | *** Exception: forget it | ||
</haskell> | |||
Only use guards if you need to, in general you should stick to pattern matching whenever possible. | Only use guards if you need to, in general you should stick to pattern matching whenever possible. | ||
Line 473: | Line 473: | ||
== n+k patterns == | == n+k patterns == | ||
In order to allow pattern matching against numerical types, Haskell 98 provides so-called n+k patterns, as in | In order to allow pattern matching against numerical types, Haskell 98 provides so-called n+k patterns, as in | ||
<haskell> | |||
take :: Int -> [a] -> [a] | take :: Int -> [a] -> [a] | ||
take (n+1) (x:xs) = x: take n xs | take (n+1) (x:xs) = x: take n xs | ||
take _ _ = [] | take _ _ = [] | ||
</haskell> | |||
However, they are often critizised for hiding computational complexity and producing ambiguties, see ["ThingsToAvoid/Discussion"] for details. They are subsumed by the more general ["Views"] proposal, which was unfortunately never implemented despite being around for quite some time now. | However, they are often critizised for hiding computational complexity and producing ambiguties, see ["ThingsToAvoid/Discussion"] for details. They are subsumed by the more general ["Views"] proposal, which was unfortunately never implemented despite being around for quite some time now. | ||
Line 489: | Line 489: | ||
Don't write | Don't write | ||
<haskell> | |||
length x == 0 | length x == 0 | ||
</haskell> | |||
to find out if the list | to find out if the list <hask>x</hask> is empty. | ||
If you write it, you force Haskell to create all list nodes. It fails on an infinite list although the expression should be evaluated to | If you write it, you force Haskell to create all list nodes. It fails on an infinite list although the expression should be evaluated to <hask>False</hask> in this case. (Nevertheless the content of the list elements may not be evaluated.) | ||
In contrast | In contrast | ||
<haskell> | |||
x == [] | x == [] | ||
</haskell> | |||
is faster but it requires the list | is faster but it requires the list <hask>x</hask> to be of type <hask>[a]</hask> where <hask>a</hask> is a type of class <hask>Eq</hask>. | ||
The best to do is | The best to do is | ||
<haskell> | |||
null x | null x | ||
</haskell> | |||
Additionally, many uses of the length function can be replaced with an | Additionally, many uses of the length function can be replaced with an <hask>atLeast</hask> function that only checks to see that a list is greater than the required minimum length. | ||
<haskell> | |||
atLeast :: Int -> [a] -> Bool | atLeast :: Int -> [a] -> Bool | ||
atLeast 0 _ = True | atLeast 0 _ = True | ||
atLeast _ [] = False | atLeast _ [] = False | ||
atLeast n (_:ys) = atLeast (n-1) ys | atLeast n (_:ys) = atLeast (n-1) ys | ||
</haskell> | |||
or non-recursive, but less efficient because both | or non-recursive, but less efficient because both <hask>length</hask> and <hask>take</hask> must count | ||
<haskell> | |||
atLeast :: Int -> [a] -> Bool | atLeast :: Int -> [a] -> Bool | ||
atLeast n x = n == length (take n x) | atLeast n x = n == length (take n x) | ||
</haskell> | |||
or non-recursive but fairly efficient | or non-recursive but fairly efficient | ||
<haskell> | |||
atLeast :: Int -> [a] -> Bool | atLeast :: Int -> [a] -> Bool | ||
atLeast n = | atLeast n = | ||
Line 525: | Line 525: | ||
then not . null . drop (n-1) | then not . null . drop (n-1) | ||
else const True | else const True | ||
</haskell> | |||
or | or | ||
<haskell> | |||
atLeast :: Int -> [a] -> Bool | atLeast :: Int -> [a] -> Bool | ||
atLeast 0 = const True | atLeast 0 = const True | ||
atLeast n = not . null . drop (n-1) | atLeast n = not . null . drop (n-1) | ||
</haskell> | |||
The same problem arises if you want to shorten a list to the length of another one by | The same problem arises if you want to shorten a list to the length of another one by | ||
<haskell> | |||
take (length x) y | take (length x) y | ||
</haskell> | |||
since this is inefficient for large lists | since this is inefficient for large lists <hask>x</hask> and fails for infinite ones. | ||
But this can be useful to extract a finite prefix from an infinite list. | But this can be useful to extract a finite prefix from an infinite list. | ||
So, instead | So, instead | ||
<haskell> | |||
zipWith const y x | zipWith const y x | ||
</haskell> | |||
works well. | works well. | ||
It should be noted that | It should be noted that <hask>length</hask>, <hask>take</hask> and others wouldn't cause headache if they would count using PeanoNumbers as shown below. | ||
== Don't ask for the minimum if you don't need it == | == Don't ask for the minimum if you don't need it == | ||
The function | The function <hask>isLowerLimit</hask> checks if a number is a lower limit to a sequence. | ||
<haskell> | |||
isLowerLimit :: Ord a => a -> [a] -> Bool | isLowerLimit :: Ord a => a -> [a] -> Bool | ||
isLowerLimit x ys = x <= minimum ys | isLowerLimit x ys = x <= minimum ys | ||
</haskell> | |||
It fails definitely if | It fails definitely if <hask>ys</hask> is infinite. Is this a problem? | ||
Compare it with | Compare it with | ||
<haskell> | |||
isLowerLimit x = all (x<=) | isLowerLimit x = all (x<=) | ||
</haskell> | |||
This definition terminates for infinite lists, if | This definition terminates for infinite lists, if <hask>x</hask> is not a lower limit. It aborts immediately if an element is found which is below <hask>x</hask>. Thus it is also faster for finite lists. Even more: It works also for empty lists. | ||
Line 575: | Line 575: | ||
Lists are not arrays, so don't treat them as such. | Lists are not arrays, so don't treat them as such. | ||
Frequent use of | Frequent use of <hask>(!!)</hask> should alarm you. | ||
Accessing the | Accessing the <hask>n</hask>th list element | ||
requires to traverse through the first | requires to traverse through the first <hask>n</hask> nodes of the list. | ||
This is very inefficient. | This is very inefficient. | ||
If you access the elements progressively like in | If you access the elements progressively like in | ||
<haskell> | |||
[x !! i - i | i <- [0..n]] | [x !! i - i | i <- [0..n]] | ||
</haskell> | |||
you should try to get rid of indexing like in | you should try to get rid of indexing like in | ||
<haskell> | |||
zipWith (-) x [0..n] | zipWith (-) x [0..n] | ||
</haskell> | |||
. | . | ||
If you really need random access like in the Fourier Transform | If you really need random access like in the Fourier Transform | ||
you should switch to | you should switch to <hask>Array</hask>s. | ||
Line 599: | Line 599: | ||
and the order is irrelevant, | and the order is irrelevant, | ||
if you use list functions like | if you use list functions like | ||
<hask>sort</hask>, <hask>nub</hask>, <hask>union</hask>, <hask>elem</hask>, <hask>delete</hask>, <hask>(\\)</hask> | |||
frequently, | frequently, | ||
you should think about switching to sets. | you should think about switching to sets. | ||
If you need multi-sets, | If you need multi-sets, | ||
i.e. data sets with irrelevant order but multiple occurence of an object | i.e. data sets with irrelevant order but multiple occurence of an object | ||
you can use a | you can use a <hask>FiniteMap a Int</hask>. | ||
Line 616: | Line 616: | ||
=== Eq type class === | === Eq type class === | ||
When using functions like | When using functions like <hask>delete</hask>, <hask>(\\)</hask>, <hask>nub</hask>, and so on you should be aware that they need types of the <hask>Eq</hask> class. There are two problems: The routines might not work as expected if a processed list contains multiple equal elements and the element type of the list may not be comparable, like functions. | ||
Example: | Example: | ||
The following function takes the input list | The following function takes the input list <hask>xs</hask> and removes each element of <hask>xs</hask> once from <hask>xs</hask>. | ||
Clear what it does? No? The code is probably more understandable | Clear what it does? No? The code is probably more understandable | ||
<haskell> | |||
removeEach :: (Eq a) => [a] -> [[a]] | removeEach :: (Eq a) => [a] -> [[a]] | ||
removeEach xs = map (flip List.delete xs) xs | removeEach xs = map (flip List.delete xs) xs | ||
</haskell> | |||
but it should be replaced by | but it should be replaced by | ||
<haskell> | |||
removeEach :: [a] -> [[a]] | removeEach :: [a] -> [[a]] | ||
removeEach xs = | removeEach xs = | ||
zipWith (++) (List.inits xs) (tail (List.tails xs)) | zipWith (++) (List.inits xs) (tail (List.tails xs)) | ||
</haskell> | |||
since this works perfectly for function types | since this works perfectly for function types <hask>a</hask> and for equal elements in <hask>xs</hask>. | ||
Line 638: | Line 638: | ||
Before using integers for each and everything (C style) | Before using integers for each and everything (C style) | ||
think of more specialised types. | think of more specialised types. | ||
If only the values | If only the values <hask>0</hask> and <hask>1</hask> are of interest, | ||
try the type | try the type <hask>Bool</hask> instead. | ||
If there are more choices and numeric operations aren't needed | If there are more choices and numeric operations aren't needed | ||
try an enumeration. | try an enumeration. | ||
If an enumeration is not appropriate | If an enumeration is not appropriate | ||
you can define a | you can define a <hask>newtype</hask> carrying the type that is closest to what you need. | ||
Instead of | Instead of | ||
<haskell> | |||
type Weekday = Int | type Weekday = Int | ||
</haskell> | |||
write | write | ||
<haskell> | |||
data Weekday = Monday | data Weekday = Monday | ||
| Tuesday | | Tuesday | ||
Line 659: | Line 659: | ||
| Sunday | | Sunday | ||
deriving (Eq, Ord, Enum) | deriving (Eq, Ord, Enum) | ||
</haskell> | |||
It allows all sensible operations like | It allows all sensible operations like <hask>==</hask>, <hask><</hask>, <hask>succ</hask> and | ||
forbids all nonsensical ones like | forbids all nonsensical ones like <hask>+</hask>, <hask>*</hask>. | ||
You cannot accidentally mix up weekdays with numbers and | You cannot accidentally mix up weekdays with numbers and | ||
the signature of a function with weekday parameter clearly states what kind of data is expected. | the signature of a function with weekday parameter clearly states what kind of data is expected. | ||
Line 681: | Line 681: | ||
and output it by a short IO interaction. | and output it by a short IO interaction. | ||
<haskell> | |||
-- import Control.Monad (replicateM_) | -- import Control.Monad (replicateM_) | ||
replicateM_ 10 (putStr "foo") | replicateM_ 10 (putStr "foo") | ||
</haskell> | |||
is certainly worse than | is certainly worse than | ||
<haskell> | |||
putStr (concat $ replicate 10 "foo") | putStr (concat $ replicate 10 "foo") | ||
</haskell> | |||
<haskell> | |||
do | do | ||
h <- openFile "foo" WriteMode | h <- openFile "foo" WriteMode | ||
replicateM_ 10 (hPutStr h "bar") | replicateM_ 10 (hPutStr h "bar") | ||
hClose h | hClose h | ||
</haskell> | |||
can be shortened to | can be shortened to | ||
<haskell> | |||
writeFile "foo" (concat $ replicate 10 "bar") | writeFile "foo" (concat $ replicate 10 "bar") | ||
</haskell> | |||
which also safes you from proper closing of the handle | which also safes you from proper closing of the handle <hask>h</hask> | ||
in case of failure. | in case of failure. | ||
A function which computes a random value | A function which computes a random value | ||
with respect to a custom distribution | with respect to a custom distribution | ||
( | (<hask>distInv</hask> is the inverse of the distribution function) | ||
can be defined via IO | can be defined via IO | ||
<haskell> | |||
randomDist :: (Random a, Num a) => (a -> a) -> IO a | randomDist :: (Random a, Num a) => (a -> a) -> IO a | ||
randomDist distInv = liftM distInv (randomRIO (0,1)) | randomDist distInv = liftM distInv (randomRIO (0,1)) | ||
</haskell> | |||
but there is no need to do so. | but there is no need to do so. | ||
You don't need the state of the whole world | You don't need the state of the whole world | ||
just for remembering the state of a random number generator. | just for remembering the state of a random number generator. | ||
What about | What about | ||
<haskell> | |||
randomDist :: (RandomGen g, Random a, Num a) => (a -> a) -> State g a | randomDist :: (RandomGen g, Random a, Num a) => (a -> a) -> State g a | ||
randomDist distInv = liftM distInv (State (randomR (0,1))) | randomDist distInv = liftM distInv (State (randomR (0,1))) | ||
</haskell> | |||
? | ? | ||
Line 724: | Line 724: | ||
They complicate handling of negative dividends. | They complicate handling of negative dividends. | ||
<hask>div</hask> and <hask>mod</hask> are almost always the better choice. | |||
If | If <hask>b>0</hask> then it always holds | ||
<haskell> | |||
a == b * div a b + mod a b | a == b * div a b + mod a b | ||
mod a b < b | mod a b < b | ||
mod a b >= 0 | mod a b >= 0 | ||
</haskell> | |||
The first equation is true also for | The first equation is true also for <hask>quot</hask> and <hask>rem</hask>, | ||
but the two others are true only for | but the two others are true only for <hask>mod</hask>, but not for <hask>rem</hask>. | ||
That is, | That is, <hask>mod a b</hask> always wraps <hask>a</hask> to an element from <hask>[0..(b-1)]</hask>, | ||
whereas the sign of | whereas the sign of <hask>rem a b</hask> depends on the sign of <hask>a</hask>. | ||
This seems to be more an issue of experience rather than one of a superior reason. | This seems to be more an issue of experience rather than one of a superior reason. | ||
You might argue, that the sign of the dividend is more important for you, than that of the divisor. | You might argue, that the sign of the dividend is more important for you, than that of the divisor. | ||
However, I have never seen such an application, | However, I have never seen such an application, | ||
but many uses of | but many uses of <hask>quot</hask> and <hask>rem</hask> where <hask>div</hask> and <hask>mod</hask> were clearly superior. | ||
Examples: | Examples: | ||
* Conversion from a continuously counted tone pitch to the pitch class, like C, D, E etc.: | * Conversion from a continuously counted tone pitch to the pitch class, like C, D, E etc.: <hask>mod p 12</hask> | ||
* Conversion from a day counter to a week day: | * Conversion from a day counter to a week day: <hask>mod n 7</hask> | ||
* Pacman runs out of the screen and re-appears at the opposite border: | * Pacman runs out of the screen and re-appears at the opposite border: <hask>mod x screenWidth</hask> | ||
[[Category: | [[Category:Style]] |
Revision as of 13:12, 13 October 2006
Preface
This page shows several examples of how code can be improved. We try to derive general rules from them, though they can not be applied deterministicly and are a matter of taste. We all know that, please don't add "this is disputable" to each item!
Instead, you can now add "this is disputable" on ["ThingsToAvoid/Discussion"] and change this page only when some sort of consensus is reached.
Be concise
Don't reinvent the wheel
The standard libraries are full of useful functions, possibly too full. If you rewrite an existing function, the reader wonders what the difference to the standard function is. But if you use a standard function, the reader may learn something new and useful. If you have problems finding an appropriate list function, try this guide:
http://www.cs.chalmers.se/Cs/Grundutb/Kurser/d1pt/d1pta/ListDoc/
Avoid explicit recursion
Explicit recursion is not generally bad, but you should spend some time on trying to find a more declarative implementation using higher order functions.
Don't define
raise :: Num a => a -> [a] -> [a]
raise _ [] = []
raise x (y:ys) = x+y : raise x ys
because it is hard for the reader to find out, how much of the list is processed and on which values the elements of the output list depend. Just write
raise x ys = map (x+) ys
or even
raise x = map (x+)
and the reader knows that the complete list is processed and that each output element depends only on the corresponding input element.
If you don't find appropriate functions in the standard library, extract a general function. This helps you and others understanding the program. Haskell is very good at factoring out parts of the code. If you find it very general, put it in a separate module and re-use it. It may appear in the standard libraries later, or you may later find that it is already there.
Decomposing a problem this way has also the advantage that you can debug easier. If the last implementation of raise
does not show the expected behaviour, you can inspect map
(I hope it is correct :-) ) and the invoked instance of (+)
separately.
Could this be stated more generally? It seems to me this is a special case of the general principle of separating concerns: iteration over a collection vs operating on elements of a collection should apply. If you can write the loop over a data structure (list, tree, whatever) once and debug it, then you don't need to duplicate that code over and over (at least in haskell), so your code can follow the principle of Wiki:OnceAndOnlyOnce ; Wiki:OnceAndOnlyOnce is a lot harder in languages that don't provide a certain level of functional programming support (i.e. Java requires copy and paste programming, the delegate C# syntax is clumsy but workable - using it is almost Wiki:GoldPlating).
Another example:
The function count
counts the number of elements
which fulfill a certain property,
i.e. the elements for which the predicate p
is True
.
I found the following code (but convoluted in a more specific function) in a Haskell program
count :: (a -> Bool) -> [a] -> Int
count _ [] = 0
count p (x:xs)
| p x = 1 + count p xs
| otherwise = count p xs
which you won't like any longer if you become aware of
count p = length . filter p
.
Only introduce identifiers you need
Here is some advice that is useful for every language, including scientific prose
(http://www.cs.utexas.edu/users/EWD/transcriptions/EWD09xx/EWD993.html):
Introduce only identifiers you use.
The compiler will check that you if you pass an option like -Wall
for GHC.
In an expression like
[a | i <- [1..m]]
where a
might be a horrible complex expression it is not easy to see,
that a
really does not depend on i
.
replicate m a
is certainly better here.
Remember the zero
Don't forget that zero is a natural number. Recursive definitions become more complicated if the recursion anchor is not chosen properly. As an example I have chosen the function tupel
presented in DMV-Mitteilungen 2004/12-3, Jürgen Bokowski: Haskell, ein gutes Werkzeug der Diskreten Mathematik (Haskell, a good tool for discrete mathematics). It is also a good example of how to avoid guards.
tuples :: Int -> [a] -> [[a]]
tuples r l
| r == 1 = [[el] | el <- l]
| length l == r = [l]
| otherwise = (map ([head l] ++) (tuples (r-1) (tail l)))
++ tuples r (tail l)
Do you have an idea what it does?
Let's strip the guards and forget about list comprehension.
tuples :: Int -> [a] -> [[a]]
tuples 1 l = map (:[]) l
tuples r l =
if r == length l
then [l]
else
let t = tail l
in map (head l :) (tuples (r-1) t)
++ tuples r t
What about tuples with zero elements? We can add the pattern
tuples 0 _ = [[]]
but then we can also omit the pattern for 1-tuples.
tuples :: Int -> [a] -> [[a]]
tuples 0 _ = [[]]
tuples r l =
if r == length l
then [l]
else
let t = tail l
in map (head l :) (tuples (r-1) t)
++ tuples r t
What about the case r > length l
? Sure, no reason to let head
fail - in that case there is no tuple, thus we return an empty list. Again, this saves us one special case.
tuples :: Int -> [a] -> [[a]]
tuples 0 _ = [[]]
tuples r l =
if r > length l
then []
else
let t = tail l
in map (head l :) (tuples (r-1) t)
++ tuples r t
We have learnt above that length
is evil! What about
tuples :: Int -> [a] -> [[a]]
tuples 0 _ = [[]]
tuples _ [] = []
tuples r (x:xs) =
map (x :) (tuples (r-1) xs)
++ tuples r xs
? It is no longer necessary to compute the length of l
again and again. The code is easier to read and it covers all special cases, including tuples (-1) [1,2,3]
!
You can even save one direction of recursion
by explicit computation of the list of all suffixes provided by tails
.
You can do this with do notation
tuples :: Int -> [a] -> [[a]]
tuples 0 _ = [[]]
tuples r xs = do
y:ys <- tails xs
map (y:) (tuples (r-1) ys)
Since (=<<) in the list monad is concatMap, we can also write this as follows.
Where in the previous version the pattern y:ys
filtered out the last empty suffix
we have to do this manually now with init
.
tuples :: Int -> [a] -> [[a]]
tuples 0 _ = [[]]
tuples r xs =
concatMap (\(y:ys) -> map (y:) (tuples (r-1) ys))
(init (tails xs))
The list of all suffixes could be generated with iterate tail
but this ends with a "Prelude.tail: empty list".
tails
generates the suffixes in the same order but aborts properly.
More generally, BaseCasesAndIdentities
Don't overuse lambdas
Like explicit recursion, using explicit lambdas isn't a universally bad idea, but a better solution often exists. For example, Haskell is quite good at currying. Don't write
zipWith (\x y -> f x y)
map (\x -> x + 42)
instead, write
zipWith f
map (+42)
also, instead of writing
-- sort a list of strings case insensitively
sortBy (\x y -> compare (map toLower x) (map toLower y))
write
comparing p x y = compare (p x) (p y)
sortBy (comparing (map toLower))
which is both clearer and re-usable.
Actually, in a future version of GHC you may not even have to define comparing
, as it's already defined in Data.Ord in the CVS version.
(Just a remark for this special example: We can avoid multiple evaluations of the conversions.
sortKey :: (Ord b) => (a -> b) -> [a] -> [a]
sortKey f x = map snd (sortBy (comparing fst) (zip (map f x) x))
)
As a rule of thumb, once your expression becomes too long to easily be point-freed, it probably deserves a name anyway. Lambdas are occasionally appropriate however, e.g. for control structures in monadic code (in this example, a control-structure "foreach2" which most languages don't even support.):
foreach2 xs ys f = zipWithM_ f xs ys
linify :: [String] -> IO ()
linify lines
= foreach2 [1..] lines $ \lineNr line -> do
unless (null line) $
putStrLn $ shows lineNr $ showString ": " $ show line
Bool is a regular type
Logic expressions are not restricted to guards and if
statements.
Avoid verbosity like in
isEven n
| mod n 2 == 0 = True
| otherwise = False
since it is the same as
isEven n = mod n 2 == 0
.
Use syntactic sugar wisely
People who employ SyntacticSugar extensively argue that their code becomes more readable by it. The following sections show several examples where less syntactic sugar is more readable.
It is argued that a special notation is often more intuitive than a purely functional expression. But the term "intuitive notation" is always a matter of habit. You can also develop an intuition for analytic expressions that don't match your habits at the first glance. So why not making a habit of less sugar sometimes?
list comprehension
List comprehension let you remain in imperative thinking, that is it let you think in variables rather than transformations. Open your mind, discover the flavour of the PointFreeStyle!
Instead of
[toUpper c | c <- s]
write
map toUpper s
.
Consider
[toUpper c | s <- strings, c <- s]
where it takes some time for the reader
to find out which value depends on what other value
and it is not so clear how many times
the interim values s
and c
are used.
In contrast to that
map toUpper (concat strings)
can't be clearer.
When using higher order functions you can switch easier to
data structures different from List
.
Compare
map (1+) list
and
mapSet (1+) set
.
If there would be a standard instance for the Functor
class
you could use the code
fmap (1+) pool
for both choices.
If you are not used to higher order functions for list processing
you feel like needing parallel list comprehension.
This is unfortunately supported by GHC now,
but somehow superfluous since various flavours of zip
already do a great job.
do notation
Do notation is useful to express the imperative nature (e.g. a hidden state or an order of execution) of a piece of code.
Nevertheless it's sometimes useful to remember that the do
notation is explained in terms of functions.
Instead of
do
text <- readFile "foo"
writeFile "bar" text
one can write
readFile "foo" >>= writeFile "bar"
.
The code
do
text <- readFile "foo"
return text
can be simplified to
readFile "foo"
by a law that each Monad must fulfill.
You certainly also agree that
do
text <- readFile "foobar"
return (lines text)
is more complicated than
liftM lines (readFile "foobar")
.
Btw. in the case of IO
monad the Functor
class method fmap
and the Monad
based function liftM
are the same.
Be aware that "more complicated" does not imply "worse". If your do-expression was longer than this, then mixing do-notation and fmap
might be precisely the wrong thing to do, because it adds one more thing to think about. Be natural. Only change it if you gain something by changing it. -- AndrewBromage
Guards
Guards look like
-- Bad implementation:
fac :: Integer -> Integer
fac n | n == 0 = 1
| n /= 0 = n * fac (n-1)
which implements a factorial function. This example, like a lot of uses of guards, has a number of problems.
The first problem is that it's nearly impossible for the compiler to check if guards like this are exhaustive, as the guard conditions may be arbitrarily complex (Ghc will warn you if you use the -Wall
option). To avoid this problem and potential bugs through non exhaustive patterns you should use an otherwise
guard, that will match for all remaining cases:
-- Slightly improved implementation:
fac :: Integer -> Integer
fac n | n == 0 = 1
| otherwise = n * fac (n-1)
Another reason to prefer this one is its greater readability for humans and optimizability for compilers. Though it may not matter much in a simple case like this, when seeing an otherwise
it's immediately clear that it's used whenever the previous guard fails, which isn't true if the "negation of the previous test" is spelled out. The same applies to the compiler: It probably will be able to optimize an otherwise
(which is a synonym for True
) away but cannot do that for most expressions.
This can be done with even less sugar using if
,
-- Less sugar (though the verbosity of if-then-else can also be considered as sugar :-)
fac :: Integer -> Integer
fac n = if n == 0
then 1
else n * fac (n-1)
Note that if
has its own set of problems, for example in connection with the layout rule or that nested if
s are difficult to read. See ["Case"] how to avoid nested if
s.
But in this special case, the same can be done even more easily with pattern matching:
-- Good implementation:
fac :: Integer -> Integer
fac 0 = 1
fac n = n * fac (n-1)
Actually, in this case there is an even more easier to read version, which (see above) doesn't use Explicit Recursion:
-- Excellent implementation:
fac :: Integer -> Integer
fac n = product [1..n]
This may also be more efficient as product
might be optimized by the library-writer... In GHC, when compiling with optimizations turned on, this version runs in O(1) stack-space, whereas the previous versions run in O(n) stack-space.
Note however, that there is a difference between this version and the previous ones: When given a negative number, the previous versions do not terminate (until StackOverflow-time), while the last implemenation returns 1.
Guards don't always make code clearer.
Compare
foo xs | not (null xs) = bar (head xs)
and
foo (x:_) = bar x
or compare the following example using the advanced PatternGuards (http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#PATTERN-GUARDS)
parseCmd ln
| Left err <- parse cmd "Commands" ln
= BadCmd $ unwords $ lines $ show err
| Right x <- parse cmd "Commands" ln
= x
with this one with NoPatternGuards:
parseCmd ln = case parse cmd "Commands" ln of
Left err -> BadCmd $ unwords $ lines $ show err
Right x -> x
or, if you expect your readers to be familiar with the either
function:
parseCmd :: -- add an explicit type signature, as this is now a pattern binding
parseCmd = either (BadCmd . unwords . lines . show) id . parse cmd "Commands"
By the way, a compiler has also problems with numerical patterns. E.g. the pattern 0
in fact means fromInteger 0
, thus it involves a computation, which is uncommon for function parameter patterns. To illustrate this, consider the following example:
data Foo = Foo deriving (Eq, Show)
instance Num Foo where
fromInteger = error "forget it"
f :: Foo -> Bool
f 42 = True
f _ = False
*Main> f 42
*** Exception: forget it
Only use guards if you need to, in general you should stick to pattern matching whenever possible.
n+k patterns
In order to allow pattern matching against numerical types, Haskell 98 provides so-called n+k patterns, as in
take :: Int -> [a] -> [a]
take (n+1) (x:xs) = x: take n xs
take _ _ = []
However, they are often critizised for hiding computational complexity and producing ambiguties, see ["ThingsToAvoid/Discussion"] for details. They are subsumed by the more general ["Views"] proposal, which was unfortunately never implemented despite being around for quite some time now.
Efficiency and infinity
A rule of thumb is: If a function makes sense for an infinite data structure but the implementation at hand fails for an infinite amount of data, then the implementation is probably inefficient also for finite data.
Don't ask for the length of a list, if you don't need it
Don't write
length x == 0
to find out if the list x
is empty.
If you write it, you force Haskell to create all list nodes. It fails on an infinite list although the expression should be evaluated to False
in this case. (Nevertheless the content of the list elements may not be evaluated.)
In contrast
x == []
is faster but it requires the list x
to be of type [a]
where a
is a type of class Eq
.
The best to do is
null x
Additionally, many uses of the length function can be replaced with an atLeast
function that only checks to see that a list is greater than the required minimum length.
atLeast :: Int -> [a] -> Bool
atLeast 0 _ = True
atLeast _ [] = False
atLeast n (_:ys) = atLeast (n-1) ys
or non-recursive, but less efficient because both length
and take
must count
atLeast :: Int -> [a] -> Bool
atLeast n x = n == length (take n x)
or non-recursive but fairly efficient
atLeast :: Int -> [a] -> Bool
atLeast n =
if n>0
then not . null . drop (n-1)
else const True
or
atLeast :: Int -> [a] -> Bool
atLeast 0 = const True
atLeast n = not . null . drop (n-1)
The same problem arises if you want to shorten a list to the length of another one by
take (length x) y
since this is inefficient for large lists x
and fails for infinite ones.
But this can be useful to extract a finite prefix from an infinite list.
So, instead
zipWith const y x
works well.
It should be noted that length
, take
and others wouldn't cause headache if they would count using PeanoNumbers as shown below.
Don't ask for the minimum if you don't need it
The function isLowerLimit
checks if a number is a lower limit to a sequence.
isLowerLimit :: Ord a => a -> [a] -> Bool
isLowerLimit x ys = x <= minimum ys
It fails definitely if ys
is infinite. Is this a problem?
Compare it with
isLowerLimit x = all (x<=)
This definition terminates for infinite lists, if x
is not a lower limit. It aborts immediately if an element is found which is below x
. Thus it is also faster for finite lists. Even more: It works also for empty lists.
Choose the right fold
See StackOverflow for advice on which fold is appropriate for your situation.
Choose types properly
Lists are not good for everything
Lists are not arrays
Lists are not arrays, so don't treat them as such.
Frequent use of (!!)
should alarm you.
Accessing the n
th list element
requires to traverse through the first n
nodes of the list.
This is very inefficient.
If you access the elements progressively like in
[x !! i - i | i <- [0..n]]
you should try to get rid of indexing like in
zipWith (-) x [0..n]
.
If you really need random access like in the Fourier Transform
you should switch to Array
s.
Lists are not sets
If you manage data sets where each object can occur only once
and the order is irrelevant,
if you use list functions like
sort
, nub
, union
, elem
, delete
, (\\)
frequently,
you should think about switching to sets.
If you need multi-sets,
i.e. data sets with irrelevant order but multiple occurence of an object
you can use a FiniteMap a Int
.
Lists are not finite maps
Similarly, lists are not finite maps, as mentioned on EfficiencyHints.
Reduce type class constraints
Eq type class
When using functions like delete
, (\\)
, nub
, and so on you should be aware that they need types of the Eq
class. There are two problems: The routines might not work as expected if a processed list contains multiple equal elements and the element type of the list may not be comparable, like functions.
Example:
The following function takes the input list xs
and removes each element of xs
once from xs
.
Clear what it does? No? The code is probably more understandable
removeEach :: (Eq a) => [a] -> [[a]]
removeEach xs = map (flip List.delete xs) xs
but it should be replaced by
removeEach :: [a] -> [[a]]
removeEach xs =
zipWith (++) (List.inits xs) (tail (List.tails xs))
since this works perfectly for function types a
and for equal elements in xs
.
Don't use Int if you don't consider integers
Before using integers for each and everything (C style)
think of more specialised types.
If only the values 0
and 1
are of interest,
try the type Bool
instead.
If there are more choices and numeric operations aren't needed
try an enumeration.
If an enumeration is not appropriate
you can define a newtype
carrying the type that is closest to what you need.
Instead of
type Weekday = Int
write
data Weekday = Monday
| Tuesday
| Wednesday
| Thursday
| Friday
| Saturday
| Sunday
deriving (Eq, Ord, Enum)
It allows all sensible operations like ==
, <
, succ
and
forbids all nonsensical ones like +
, *
.
You cannot accidentally mix up weekdays with numbers and
the signature of a function with weekday parameter clearly states what kind of data is expected.
Miscellaneous
Separate IO and data processing
It's not good to use the IO Monad everywhere, much of the data processing can be done without IO interaction. You should separate data processing and IO because pure data processing can be done purely functionally, that is you don't have to specify an order of execution and you don't have to worry about what computations are actually necessary. You can easily benefit from lazy evaluation if you process data purely functionally and output it by a short IO interaction.
-- import Control.Monad (replicateM_)
replicateM_ 10 (putStr "foo")
is certainly worse than
putStr (concat $ replicate 10 "foo")
do
h <- openFile "foo" WriteMode
replicateM_ 10 (hPutStr h "bar")
hClose h
can be shortened to
writeFile "foo" (concat $ replicate 10 "bar")
which also safes you from proper closing of the handle h
in case of failure.
A function which computes a random value
with respect to a custom distribution
(distInv
is the inverse of the distribution function)
can be defined via IO
randomDist :: (Random a, Num a) => (a -> a) -> IO a
randomDist distInv = liftM distInv (randomRIO (0,1))
but there is no need to do so. You don't need the state of the whole world just for remembering the state of a random number generator. What about
randomDist :: (RandomGen g, Random a, Num a) => (a -> a) -> State g a
randomDist distInv = liftM distInv (State (randomR (0,1)))
?
Forget about quot and rem
They complicate handling of negative dividends.
div
and mod
are almost always the better choice.
If b>0
then it always holds
a == b * div a b + mod a b
mod a b < b
mod a b >= 0
The first equation is true also for quot
and rem
,
but the two others are true only for mod
, but not for rem
.
That is, mod a b
always wraps a
to an element from [0..(b-1)]
,
whereas the sign of rem a b
depends on the sign of a
.
This seems to be more an issue of experience rather than one of a superior reason.
You might argue, that the sign of the dividend is more important for you, than that of the divisor.
However, I have never seen such an application,
but many uses of quot
and rem
where div
and mod
were clearly superior.
Examples:
- Conversion from a continuously counted tone pitch to the pitch class, like C, D, E etc.:
mod p 12
- Conversion from a day counter to a week day:
mod n 7
- Pacman runs out of the screen and re-appears at the opposite border:
mod x screenWidth