Difference between revisions of "Haskell programming tips/Discussion"

Revision as of 00:55, 11 November 2008

About

This page is meant for discussions about ThingsToAvoid, as concensus seems to be difficult to reach, and it'd be nice if newbies wouldn't bump into all kinds of Holy Wars during their first day of using Haskell ;)

You may want to add your name to your comments to make it easier to refer to your words and ridicule you in public.

Flame Away

Avoid recursion

Many times explicit recursion is the fastest way to implement a loop. e.g.

loop 0 _ acc = acc
loop i v acc = ...

Using HOFs is more elegant, but makes it harder to reason about space usage, also explicit recursion does not make the code hard to read - just explicit about what it is doing.

-- EinarKarttunen

I disagree with this. Sometimes explicit recursion is simpler to design, but I don't see how it makes space usage any easier to reason about and can see how it makes it harder. By using combinators you only have to know the properties of the combinator to know how it behaves, whereas I have to reanalyze each explicitly implemented function. StackOverflow gives a good example of this for stack usage and folds. As far as being "faster" I have no idea what the basis for that is; most likely GHC would inline into the recursive version anyways, and using higher-order list combinators makes deforesting easier. At any rate, if using combinators makes it easier to correctly implement the function, then that should be the overriding concern.

-- DerekElkins

I read lots of code with recursion -- and it was hard to read, because it is hard to retrieve the data flow from it. -- HenningThielemann

IMO explicit recursion usually does make code harder to read, as it's trying to do two things at once: Recursing and performing the actual work it's supposed to do. Phrases like OnceAndOnlyOnce and SeparationOfConcerns come to the mind. However, the concern about efficiency is (partly) justified. HOFs defined for certain recursion patterns often need additional care to achieve the same performance as functions using explicit recursion. As an example, in the following code, two sum functions are defined using two equivalent left folds, but only one of the folds is exported. Due to various peculiarities of GHCs strictness analyzer, simplifier etc, the call from main to mysum_2 works, yet the call to mysum_1 fails with a stack-overflow.

module Foo (myfoldl_1, mysum_1, mysum_2) where

-- exported
myfoldl_1 f z xs = fold z xs
    where
        fold z []       = z
        fold z (x:xs)   = fold (f z x) xs

-- not exported
myfoldl_2 f z xs = fold z xs
    where
        fold z []       = z
        fold z (x:xs)   = fold (f z x) xs

mysum_1 = myfoldl_1 (+) 0
mysum_2 = myfoldl_2 (+) 0

module Main where

import Foo

xs = [1..1000*1000]

main = do
    print (mysum_2 xs)
    print (mysum_1 xs)

(Results might differ for your particular GHC version, of course...) -- RemiTurk

GHC made "broken" code work. As covered in StackOverflow, foldl is simply not tail-recursive in a non-strict language. Writing out mysum would still be broken. The problem here isn't the use of a HOF, but simply the use of non-tail-recursive function. The only "care" needed here is not relying on compiler optimizations (the code doesn't work in my version of GHC) or the care needed when relying on compiler optimizations. Heck, the potential failure of inlining (and subsequent optimizations following from it) could be handled by restating recursion combinator definitions in each module that uses them; this would still be better than explicit recursion which essentially restates the definition for each expression that uses it.

-- DerekElkins

Here is a demonstration of the problem - with the classic sum as the problem. Of course microbenchmarking has little sense, but it tells us a little bit which combinator should be used.

import Data.List
import System

sum' :: Int -> Int -> Int

sum' 0 n = sum [1..n]
sum' 1 n = foldl (\a e -> a+e) 0 [1..n]
sum' 2 n = foldl (\a e -> let v = a+e in v `seq` v) 0 [1..n]
sum' 3 n = foldr (\a e -> a+e) 0 [1..n]
sum' 4 n = foldr (\a e -> let v = a+e in v `seq` v) 0 [1..n]
sum' 5 n = foldl' (\a e -> a+e) 0 [1..n]
sum' 6 n = foldl' (\a e -> let v = a+e in v `seq` v) 0 [1..n]
sum' 7 n = loop n 0
    where loop 0 acc = acc
          loop n acc = loop (n-1) (n+acc)
sum' 8 n = loop n 0
    where loop 0 acc = acc
          loop n acc = loop (n-1) $! n+acc


main = do [v,n] <- getArgs
          print $ sum' (read v) (read n)

When executing with n = 1000000 it produces the following results:

* seq does not affect performance - as excepted.
* foldr overflows stack - as excepted.
* explicit loop takes 0.006s
* foldl takes 0.040s
* foldl' takes 0.080s

In this case the "correct" choice would be foldl' - ten times slower than explicit recursion. This is not to say that using a fold would not be better for most code. Just that it can have subtle evil effects in inner loops.

-- EinarKarttunen

This is ridiculous. The "explicit recursion" version is not the explicit recursion version of the foldl' version. Here is another set of programs and the results I get:

import Data.List
import System

paraNat :: (Int -> a -> a) -> a -> Int -> a
paraNat s = fold
    where fold z 0 = z
          fold z n = (fold $! s n z) (n-1)

localFoldl' c = fold
    where fold n [] = n
          fold n (x:xs) = (fold $! c n x) xs

sumFoldl' :: Int -> Int
sumFoldl' n = foldl' (+) 0 [1..n]

sumLocalFoldl' :: Int -> Int
sumLocalFoldl' n = localFoldl' (+) 0 [1..n]

sumParaNat :: Int -> Int
sumParaNat n = paraNat (+) 0 n

sumRecursionNat :: Int -> Int
sumRecursionNat n = loop n 0
    where loop 0 acc = acc
          loop n acc = loop (n-1) $! n+acc

sumRecursionList :: Int -> Int
sumRecursionList n = loop [1..n] 0
    where loop [] acc = acc
          loop (n:ns) acc = loop ns $! n+acc

main = do
    [v,n] <- getArgs
    case v of
        "1" -> print (sumFoldl' (read n))
        "2" -> print (sumLocalFoldl' (read n))
        "3" -> print (sumParaNat (read n))
        "4" -> print (sumRecursionNat (read n))
        "5" -> print (sumRecursionList (read n))

(best but typical real times according to time of a few trials each)

sumFoldl' takes 2.872s
sumLocalFoldl' takes 1.683s
sumParaNat takes 0.212s
sumRecursionNat takes 0.213s
sumRecursionList takes 1.669s

sumLocalFoldl' and sumRecursionList were practically identical in performance and sumParaNat and sumRecursionNat were practically identical in performance. All that's demonstrated is the cost of detouring through lists (and the cost of module boundaries I guess).

-- DerekElkins

n+k patterns

n+k patterns are similar to the definition of infix functions, thus they make it harder to understand patterns. http://www.dcs.gla.ac.uk/mail-www/haskell/msg01131.html (Why I hate n+k)

So far I have seen only one rule for Good Coding Practice in Haskell: Do Not Use n+k Patterns. I hope someone can give some directions, how to avoid known pitfalls (especially Space Leaks). -- On the haskell mailing list

The most natural definition of many functions on the natural numbers is by induction, a fact that can very nicely be expressed with the (n+1)-pattern notation. Also, (n+k)-patterns are unlikely to produce space leaks, since if anything, they make the function stricter. The possible ambiguities don't seem to appear in real code. --ThomasJäger

If natural numbers would be defined by PeanoNumbers then pattern matching on successors would be straightforward. This would be fairly slow and space consuming, that's why natural numbers are not implemented this way. They are implemented using binary numbers and it is not even tried to simulate the behaviour of Natural (e.g. laziness). Thus I wouldn't state, that 3 matches the pattern 2+1. -- HenningThielemann

Lazyness/Strictness isn't really an argument in this situation, since when using a strict natural type, e.g.

data Nat = Zero | Succ !Nat

pattern matching on Nat behaves exactly like n+1 patterns. -- ThomasJaeger

n+k patterns also apply to negative numbers - don't they? Yes, I see the analogy but in the current implementation it's nothing than sugar. -- HenningThielemann

No, they don't. `let f (n+2) = n in f 1` is a runtime error. -- DerekElkins

But translating it into pattern matching is impossible, thus it must be a static error. -- HenningThielemann

Use syntactic sugar wisely

I have to say, i strongly disagree with most of what is said in this section. First of all the claim

Syntactic extensions make source code processors complicated and error prone. But they don't help to make programs safer (like type checks and contracts) or easier to maintain (like modularization and scoping).

is obviously wrong. There certainly are applications of syntatic sugar that make programs easier to read, therefore easier to understand, easier to maintain, and safer, as you are more likely to spot bugs.

My statement was: Don't use syntactic sugar by default because you believe it makes your program more readable automatically (I've read lots of code of programmers who seem to believe that), but use syntactic sugar if (and only if) it makes the program more readable. Syntactic sugar is only a matter of readability not of safety in the sense of scoping and type checking. If I accidentally introduce inconsistencies into my code, the name resolver or the type checker will report problems, but not the de-sugarizer. -- HenningThielemann

ad. right sections are evil

I can't believe someone is seriously advocating to replace (&$ x) with the less readable flip (&$) x just because that's the way one specific haskell implementation is reporting errors. Most of the time, I don't even care to read type errors because once I know which line the error occured in, it is usually immediately clear what the error was. Even if you have to read the error message, there should be no difficulty to see that (+1) and flip (+) 1 is the same thing, especially when used in a context.

Nobody advocated for replacing ($ x) by flip ($) x this was just to demonstrate the problems arising with syntactic sugar. I believe that many people don't take that into account when requesting more sugar (such as parallel list comprehension). -- HenningThielemann

Infix notation is problematic for both human readers and source code formatters.

No, infix notation isn't problematic for human readers, it enables them to read the code faster in many cases.

... if he knows the precedences, which is only true for (some) of the predefined operators. I guess you don't know all of the precedences of the Prelude operators. I also use infix operations like (+) and (:) but I'm very concerned with introducing lots of new operator symbols and `f` notation. -- HenningThielemann

Introducing new operators should definitely not be done careless (then again, one shouldn't be careless in programming anyway), and operator percedences might be better defined as a partial order. (e.g. there is an order between (+) and (*), and between (&&) and (||), but not between (+) and (&&)). Other proposals for replacing the current left/right associative + precedence system do exist. However, doing away with infix operators entirely appears to me to practically render combinator libraries unusable, which would make Haskell a lot less attractive. -- RemiTurk

The nice thing about precedences in Haskell is that it's often not necessary to know them exactly in order

to use them. If the types of you operators are sufficiently general, or sufficiently distinct, only the sight way to parse them will lead to type-checking code, so you can just give it a shot without parenthesis and hopefully remember the precedence the next time you're in a similar situation. -- ThomasJaeger

(+1) `fmap` m expresses that the fmap is just a fancy function application, when writing interpreter-like code Var "f" `App` Var "x" `App` "y" is certainly more readable and can be written much faster than App (App (Var "f") (Var "x")) (Var "y")) and our brains have been trained to parse + and * with different precendences since elementary school.

If you make it for readability, I agree, if you make it for fancyness, I disagree. In the case of App it looks like the list can become longer, so it's worth of thinking about using foldl, App and a list - though then I would certainly also use some syntactic sugar to enter the list ["f", "x", "y"]. Btw. even the regular list notation is disputable since in the infix notation ("f":"x":"y":[]) it is easier move elements around. So I end up with infix notation, ok. Even more, foldl shows clearly the left associativity, whereas `App` does not.

In the case of `on` and compose2, I disagree. on is less informative than compose2. And yes, compose2 is a kind of generalization of ., though it is not the most popular one. -- HenningThielemann

You do indeed have a point there: it's indeed an extension of (.), which I incorrectly denied. However, as it's not the extension, and AFAIK not even the most used extension, I consider the name compose2 to be slightly misleading. In addition, I think "group by equality on isAlpha" (groupBy ((==) `on` isAlpha)) is just too cute too resist. -- RemiTurk

So, do you do it for readability or for fanciness? I find an infix function application hard to read, since it looks like just another argument. That is, ((==) `on` isAlpha) reads very similar to ((==) on isAlpha). In your example you really switch the prefered usage of the identifiers, you use the infix symbol == in prefix notation and the prefix function name on in infix notation. -- HenningThielemann

Any kind of syntax highlighting should make the difference between ((==) `on` isAlpha) and ((==) on isAlpha) obvious. Another argument for using an infix on here is that it explains the order of the elements: Deciding if on (==) isAlpha or on isAlpha (==) is better and trying to remember which way the implementor choose is certainly more difficult than realizing that isAlpha `on` (==) makes no sense (There are better examples for this such as `elem`, or think about the confusion between the order of arguments in Data.FiniteMap vs. Data.Map). -- ThomasJaeger

Btw. the function comparing has found its way to the module Data.Ord (http://www.haskell.org/ghc/dist/current/docs/libraries/base/Data-Ord.html). It is a composition of compare and on/compose2. However it does not satisfyingly help to implement extensions of *By functions, because the key for sorting/grouping/maximising must be re-computed for each comparison if you write, say sortBy (comparing length). -- HenningThielemann

Of course, foldl App only works if App models application in an untyped language. Using GADTs, App could be of type Expr (a -> b) -> Expr a -> Expr b, also, many combinator that works on different types can't be "sugared" using functions. -- ThomasJaeger

Finally, there is no reason why one should expect a tool that processes haskell code not to be aware of Haskell 98's syntax. All mentioned syntactic extensions (list comprehension, guards, sections, infix stuff) fall under this category and can be used without any bad conscience.

Sorry for having become so "religous" -- ThomasJaeger

I agree. -- CaleGibbard

If you want a good example for unnecessary sugar, take this one:

tuples :: Int -> [a] -> [[a]]
tuples 0     _  = return []
tuples (r+1) xs = do
   y:ys <- tails xs
   (y:) `fmap` tuples r ys

Why is infix `fmap` more readable than prefix fmap? Where is the analogy to map? Why don't you use map at all? I see map as an operator which lifts scalar functions to list functions, this is perfectly expressed by prefix notation. What is the readability benefit of (r+1) pattern and why is do more readable than explicit init (tails xs) >>= (\(y:ys) -> map (y:) (tuples (r-1) ys)) here? (Mostly because this is not the correct translation and the correct translation is unreadable -- DerekElkins) It's even worse than [(y:) `fmap` tuples r ys | (y:ys) <- tails xs]. You rewrote my code just with sugar but the structure which must be understood remained the same. Sorry, I don't find it easier to understand. Maybe people who believe a common notation rather than to try to understand the meaning are happy with the sugar. -- HenningThielemann

The pattern m >>= \x -> f x is exactly the reason the do-notation was introduced, so each time I write something like this, I replace it with a do notation for the following reason: It is definitely the more common style (nobody is using m >>= \x -> \n-style these days), so much more likely to be understood faster (at least for myself), the do notation expresses nicely that monadic (in this case notdeterminstic) effects are taking place, and finally it is much easier to make changes to the code if it's in do-form (e.g. add additional guards). Of course you CAN do the same changes in >>=-style, too, after all there is a straightforward translation (although complicated by the fact that you have to check if pattern matchings are exhaustive), but I'm not the kind of guy who does all kinds of verbose translation in his head just because he wants to stay away from syntactic sugar.

I disagree with arguments like "nobody is using ...". What does it tell about the quality of a technique? I write here to give reasons against too much syntactic sugar rather than to record today's habits of some programmers. -- HenningThielemann

You are further critizing that I am using fmap instead of the more special map. I find it natural to use fmap in monadic code to abstract from lists. If it weren't for tails, the code would even have type (MonadPlus m, Functor m) => Int -> [a] -> m [a], increasing usability. liftM would also be acceptable (and for some strange reason even slightly faster), but it feels awfully wrong to me to use liftM, so that I'm willing to live with additional Functor constraints. This is also the reason while your list comprehension solution is clearly inferior to a monadic one.

One more thing about pattern match failure vs. init. Though it doesn't matter match in this simple example, the version exploiting pattern match failure is closer to the conceptional algorithm, because it doesn't rely on the "low-level-property" of tails that the empty list is the last element of the returned list (I can never remember this kind of things, even though one of the possible behaviors makes much more sense).

The function tuples is defined by recursion on an Int and only uses the case of the predecessor, so this is a classical example for (n+1)-patterns. Note that the LHSs in your implementation are overlapped, so a reader might need more time to figure out what is going on in your implementation (I admit the effect is small, but this is a very tiny example).

Using fmap infix is a personal habit of mine, but when you think about it, it makes a lot of sense. As we can't overload a space, it's the closest thing to application syntax we can get. I know you prefer App (App f x) y-style, which seems more difficult to understand for most people. This just is a matter of personal style. Please do not mistake your own personal style for the only sensible one.

If no one else objects, I'd like to put my implementation back on the main page, possibly with a less controversial comment. --ThomasJaeger

My argument is that the syntactic sugared version may be readable like the unsugared version, but it does not improve the readability, thus it should be avoided. Sugar shouldn't be the default, it shouldn't used just because it exists. That's the basic opinion where we disagree. Btw. I find the do notation in connection with the List monad very confusing because it looks imperative and it suggests that first something is chosen from the list then it is returned. -- HenningThielemann

While it may not be more readable for you, it is for me, for the reasons I'm getting tired of stating. Also, your opinions on the do-notation seem very strange to me. If we have monads - a way to unify different notions of effects - why make the artificial distinction between IO/State effects and more general ones again? The do-notation expresses in which order the effects are happening - that's the same for a list and an IO monad. However, a distinction between commutative and non-commutative monads would make sense, but unfortunately, there's no way to prove the commutativity of a monad statically in Haskell.

There are still issues that aren't implemented in GHC which belong to the Haskell 98 standard and which are of more importance, I think, such as mutual recursive modules and some special case of polymorphic mutual function recursion. So I don't vote for wasting the time with syntactic sugar when real enhancements are deferred by it. If I would write a Haskell code processor I would certainly prevent me from the trouble of supporting guards and (n+k) patterns. I'm also fed up with the similar situation in HTML with its tons of extensions and the buggy code which is accepted by most browsers (which is also a sort of inofficial extension) - there is simply no fun in processing HTML code. Not to mention C++.

By the way I'd like to have a real function if instead of the sugarized version with then and else in Haskell 98. Then I could use it in connection with zipWith3. See ["Case"] for another application. -- HenningThielemann

Agrees with that. I'm not using if all that often, and could easily add a few braces. And, it would free then and else for normal identifier use. -- RemiTurk

Personally, I like the explicit `then` and `else` and find that they help when reading code to separate where the break is between the sections. It's not that I necessarily disagree with the inclusion of such a function, it is an easy one to write in any case, but I think that some sugar in the form of a few extra words to mark important points in common structures is useful. Human languages have many such words, and their presence makes reading or listening much easier. - CaleGibbard

Other people seem to have problems with this special syntax, too. And they propose even more special syntax to solve the problem. http://hackage.haskell.org/trac/haskell-prime/wiki/DoAndIfThenElse -- HenningThielemann

Difference between revisions of "Haskell programming tips/Discussion"

Revision as of 00:55, 11 November 2008

Contents

About

Flame Away

Avoid recursion

n+k patterns

Use syntactic sugar wisely

Navigation menu

Search