Difference between revisions of "Do notation considered harmful"

Revision as of 20:22, 24 August 2015

Criticism

Haskell's do notation is popular and ubiquitous. However, we should not ignore that it has several problems. Here we would like to shed some light on aspects you may not have thought about so far.

Didactics

The do notation hides functional details. This is intentional, in order to simplify writing imperative-style code fragments. The downsides are that:

Since do notation is used almost everywhere IO takes place, newcomers quickly believe that the do notation is necessary for doing IO,
Newcomers might think that IO is somehow special and non-functional, in contrast to the advertisement for Haskell being purely functional,
Newcomers might think that the order of statements determines the order of execution.

These misunderstandings lead people to write clumsy code like

do putStrLn "text"

instead of

putStrLn "text"

or

do text <- getLine
   return text

instead of

getLine

or

do
  text <- readFile "foo"
  writeFile "bar" text

instead of

readFile "foo" >>= writeFile "bar"

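The equivalences above follow from the way a do block desugars into (>>=). The following is a minimal sketch that uses Maybe instead of IO, so that the results can be compared directly. The names shoutDo, shoutBind and shoutFmap are made up for illustration.

```haskell
import Data.Char (toUpper)

-- Written with do notation.
shoutDo :: Maybe String -> Maybe String
shoutDo source = do
  text <- source
  return (map toUpper text)

-- The same function after desugaring the do block by hand.
shoutBind :: Maybe String -> Maybe String
shoutBind source = source >>= \text -> return (map toUpper text)

-- The block only post-processes the result, so fmap is sufficient.
shoutFmap :: Maybe String -> Maybe String
shoutFmap = fmap (map toUpper)

main :: IO ()
main = mapM_ print
  [ shoutDo (Just "text")
  , shoutBind (Just "text")
  , shoutFmap Nothing
  ]
```

All three definitions denote the same function; the do block adds nothing but syntax.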

The order of statements is also not what determines the evaluation order. Here too, only the data dependencies count. See for instance

do x <- Just (3+5)
   y <- Just (5*7)
   return (x-y)

where 3+5 and 5*7 can be evaluated in any order, even in parallel. Or consider

do x <- Just (3+5)
   y <- Nothing
   return (x-y)

where 3+5 is probably not evaluated at all, because its result is not needed to find out that the entire do block describes a Nothing.
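One way to observe this is to replace the arithmetic with a call to error, which would abort the program if it were ever evaluated. This is only a sketch; the names shortCircuit and unevaluated are hypothetical.

```haskell
import Data.Maybe (isJust, isNothing)

-- The Nothing in the second statement decides the result;
-- the argument of the first Just is never looked at.
shortCircuit :: Maybe Int
shortCircuit = do
  x <- Just (error "x was evaluated")
  y <- Nothing
  return (x - y)

-- Even when both statements succeed, asking only whether the
-- result is a Just does not evaluate x.
unevaluated :: Maybe Int
unevaluated = do
  x <- Just (error "x was evaluated")
  y <- Just (5 * 7)
  return (x - y)

main :: IO ()
main = mapM_ print [isNothing shortCircuit, isJust unevaluated]
```

Both tests succeed without triggering the error, although the statement containing it comes first in each block.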

Library design

Unfortunately, the do notation is so popular that people write more things with monads than necessary. See for instance the Binary package. It contains the Put monad, which in principle has nothing to do with a monad. All "put" operations have the monadic result (). In fact it is a Writer monad using the Builder type, and all you need is just the Builder monoid. Even more unfortunately, applicative functors were introduced to Haskell's standard libraries only after monads and arrows, thus many types are instances of the Monad and Arrow classes, but not as many are instances of Applicative. There is no special syntax for applicative functors because it is hardly necessary. You just write

  data Header = Header Char Int Bool

  readHeader :: Get Header
  readHeader = liftA3 Header get get get

or

  readHeader = Header <$> get <*> get <*> get
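The same style works for any applicative functor. As a sketch that needs nothing beyond base, the following uses Maybe and readMaybe in place of the Get functor; parseHeader and parseHeader' are hypothetical names and not part of the Binary package.

```haskell
import Control.Applicative (liftA3)
import Text.Read (readMaybe)

data Header = Header Char Int Bool
  deriving (Eq, Show)

-- Parse three textual fields; if any field fails to parse,
-- the whole result is Nothing.
parseHeader :: String -> String -> String -> Maybe Header
parseHeader c n b = Header <$> readMaybe c <*> readMaybe n <*> readMaybe b

-- The same function written with liftA3.
parseHeader' :: String -> String -> String -> Maybe Header
parseHeader' c n b = liftA3 Header (readMaybe c) (readMaybe n) (readMaybe b)

main :: IO ()
main = mapM_ print
  [ parseHeader "'x'" "42" "True"
  , parseHeader' "'x'" "42" "True"
  , parseHeader "'x'" "forty-two" "True"
  ]
```

No field depends on the value of an earlier one, which is exactly the situation where Applicative suffices and Monad is not needed.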

Not using monads, along with the do notation, can have advantages. Consider a generator of unique identifiers. First you might think of a State monad which increments a counter each time an identifier is requested.

run :: State Int a -> a
run m = evalState m 0

newId :: State Int Int
newId =
   do n <- get
      modify succ
      return n

example :: (Int -> Int -> a) -> a
example f =
   run $
      do x <- newId
         y <- newId
         return (f x y)

If you are confident that you will not need the counter state at the end and that you will not combine blocks of code using the counter (where the second block needs the state at the end of the first block), you can enforce a stricter scheme of usage. The following is like a Reader monad, where we call local on an incremented counter for each generated identifier. Alternatively you can view it as a continuation monad.

newtype T a = T (Int -> a)

run :: T a -> a
run (T f) = f 0

newId :: (Int -> T a) -> T a
newId f = T $ \i -> case f i of T g -> g (succ i)

example :: (Int -> Int -> T a) -> a
example f =
   run $
   newId $ \a ->
   newId $ \b ->
   f a b

This way users cannot accidentally place a return somewhere in a do block where it has no effect.
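To try the continuation-style generator, a computation has to end in some value of type T a. The sketch below repeats the definitions and adds a hypothetical helper done for that purpose; done and twoIds are assumptions made for this illustration.

```haskell
newtype T a = T (Int -> a)

run :: T a -> a
run (T f) = f 0

newId :: (Int -> T a) -> T a
newId f = T $ \i -> case f i of T g -> g (succ i)

-- Hypothetical helper: finish a computation with a plain value,
-- ignoring the current counter.
done :: a -> T a
done x = T (const x)

-- Request two identifiers and return them as a pair.
twoIds :: (Int, Int)
twoIds =
   run $
   newId $ \a ->
   newId $ \b ->
   done (a, b)

main :: IO ()
main = print twoIds  -- prints (0,1)
```

Since each newId takes the rest of the computation as its argument, there is no position in the chain where a stray intermediate result could be placed.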

Safety

This page addresses an aspect of Haskell style, which is to some extent a matter of taste. Just pick what you find appropriate for you and ignore the rest.

With do notation we have kept alive a dark side of the C programming language: the silent neglect of return values of functions. In an imperative language it is common to return an error code and perform the real work through side effects. In Haskell this cannot happen, because functions have no side effects. If you ignore the result of a Haskell function, the function will not even be evaluated. The situation is different for IO: while processing the IO, you might still ignore the contained return value.

You can write

do getLine
   putStrLn "text"

and thus silently ignore the result of getLine. The same applies to

do System.Cmd.system "echo foo >bar"

where you ignore the ExitCode. Is this behaviour wanted?

Safety-oriented languages offer ways to ignore return values explicitly (e.g. EVAL in Modula-3). Haskell does not need this, because you can already write

do _ <- System.Cmd.system "echo foo >bar"
   return ()

Writing _ <- should always make you consider whether ignoring the result is the right thing to do. The possibility of silently ignoring monadic return values is not entirely the fault of the do notation. It would suffice to restrict the type of the (>>) combinator to

(>>) :: m () -> m a -> m a

This way, you can omit _ <- only if the monadic return value has type ().

New developments:

GHC since version 6.12 emits a warning when you silently ignore a return value
There is a new function called void that makes ignoring of return values explicit: GHC ticket 3292
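The restricted sequencing operator and the use of void can be tried without changing the Monad class. The following sketch defines the restricted variant under the hypothetical name (>>.); the operator name and its fixity are assumptions, only void comes from the standard library.

```haskell
import Control.Monad (void)

infixl 1 >>.

-- Hypothetical restricted variant of (>>):
-- the first action must return ().
(>>.) :: Monad m => m () -> m a -> m a
(>>.) = (>>)

-- Accepted: putStrLn returns ().
greet :: IO ()
greet = putStrLn "first" >>. putStrLn "second"

-- Rejected by the type checker, because getLine returns a String:
--   getLine >>. putStrLn "text"

-- Accepted again once the result is discarded explicitly with void.
skipLine :: IO ()
skipLine = void getLine >>. putStrLn "text"

main :: IO ()
main = greet
```

With such an operator every discarded result is visible in the source, either as void or as _ <-.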

Happy with less sugar

Additional combinators

Using the infix combinators for writing functions simplifies the addition of new combinators. Consider for instance a monad for random distributions. This monad cannot be an instance of MonadPlus, because there is no mzero (it would be an empty list of events, but their probabilities do not sum up to 1) and mplus is not associative because we have to normalize the sum of probabilities to 1. Thus we cannot use the standard guard for this monad. However, we would like to write the following:

do f <- family
   guard (existsBoy f)
   return f

Given a custom combinator which performs a filtering with subsequent normalization called (>>=?) :: Distribution a -> (a -> Bool) -> Distribution a we can rewrite this easily:

family >>=? existsBoy

Note that the (>>=?) combinator introduces the risk of returning an invalid distribution (empty list of events), but it seems that we have to live with that problem.
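A minimal sketch of such a combinator follows, assuming that a distribution is represented as a list of events paired with Rational probabilities. The representation and the helpers uniform and probability are assumptions made for this example, not the API of an existing library.

```haskell
import Data.Ratio ((%))

-- A distribution as a list of events with their probabilities.
newtype Distribution a = Distribution { events :: [(a, Rational)] }

-- All listed events are equally likely.
uniform :: [a] -> Distribution a
uniform xs = Distribution [(x, 1 % fromIntegral (length xs)) | x <- xs]

infixl 1 >>=?

-- Keep the events that satisfy the predicate and rescale their
-- probabilities so that they sum up to 1 again. If no event is
-- kept, the result is the invalid empty distribution.
(>>=?) :: Distribution a -> (a -> Bool) -> Distribution a
Distribution es >>=? p = Distribution [(x, w / total) | (x, w) <- kept]
  where
    kept  = filter (p . fst) es
    total = sum (map snd kept)

-- Total probability of all events that satisfy the predicate.
probability :: (a -> Bool) -> Distribution a -> Rational
probability p (Distribution es) = sum [w | (x, w) <- es, p x]

data Child = Boy | Girl deriving (Eq, Show)

type Family = (Child, Child)

family :: Distribution Family
family = uniform [(a, b) | a <- [Boy, Girl], b <- [Boy, Girl]]

existsBoy :: Family -> Bool
existsBoy (a, b) = a == Boy || b == Boy

main :: IO ()
main = print (probability (== (Boy, Boy)) (family >>=? existsBoy))
```

Three of the four equally likely families contain a boy, so after filtering and normalization the family with two boys has probability 1/3.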

Alternative combinators

If you are used to writing monadic functions using the infix combinators (>>) and (>>=), you can easily switch to a different set of combinators. This is useful when a monadic structure does not fit into the current Monad type constructor class, in which the monadic result type cannot be constrained. An example is the Set data type, where the element type must have a total order.
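For Set, such an alternative pair of combinators might look as follows. This is a sketch assuming Data.Set from the containers package; the names (>>=*), returnSet and withSquare are hypothetical.

```haskell
import Data.Set (Set)
import qualified Data.Set as Set

infixl 1 >>=*

-- Bind for sets. The Ord constraint on the result type is what
-- prevents Set from being an instance of the Monad class.
(>>=*) :: Ord b => Set a -> (a -> Set b) -> Set b
s >>=* f = Set.unions (map f (Set.toList s))

-- The counterpart of return.
returnSet :: a -> Set a
returnSet = Set.singleton

-- Each number together with its square.
withSquare :: Int -> Set Int
withSquare x = Set.fromList [x, x * x]

main :: IO ()
main = print (Set.toList (Set.fromList [1, 2, 3] >>=* withSquare))
```

Code written with infix combinators can adopt (>>=*) by exchanging the operator, whereas a do block is tied to the methods of the Monad class.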

Useful applications

It should be mentioned that the do notation sometimes relieves you of writing boring things. E.g. in

getRight :: Either a b -> Maybe b
getRight y =
   do Right x <- return y
      return x

a case on y is included, which calls fail if y is not a Right (i.e. Left), and thus returns Nothing in this case.
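For comparison, here is the pattern-matching do block next to the case expression it saves you from writing. This is a sketch with hypothetical names getRightDo and getRightCase; it assumes that the argument is lifted into Maybe with return so that the block typechecks.

```haskell
-- A failed pattern match in the do block calls fail,
-- which is Nothing in the Maybe monad.
getRightDo :: Either a b -> Maybe b
getRightDo y = do
  Right x <- return y
  return x

-- The same function with the case analysis spelled out.
getRightCase :: Either a b -> Maybe b
getRightCase y = case y of
  Right x -> Just x
  Left _  -> Nothing

main :: IO ()
main = mapM_ print
  [ getRightDo (Right 3 :: Either String Int)
  , getRightDo (Left "no value" :: Either String Int)
  ]
```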

Also the mdo notation proves useful, since it maintains a set of variables for you in a safe manner. Compare

mdo x <- f x y z
    y <- g x y z
    z <- h x y z
    return (x+y+z)

and

mfix
   (\ ~( ~(x,y,z), _) ->
      do x <- f x y z
         y <- g x y z
         z <- h x y z
         return ((x,y,z),x+y+z))

