Difference between revisions of "Do notation considered harmful"
(some draft) 
(New "Alternatives" section; extra suggestion for alternate "do"notation) 

(25 intermediate revisions by 11 users not shown)  
Line 1:  Line 1:  
−  <! 

−  
== Criticism == 
== Criticism == 

−  Haskell's 
+  Haskell's [[Keywords#do  do notation]] is popular and ubiquitous. 
However we shall not ignore that there are several problems. 
However we shall not ignore that there are several problems. 

Here we like to shed some light on aspects you may not have thought about, so far. 
Here we like to shed some light on aspects you may not have thought about, so far. 

−  > 

+  === Didactics === 

+  
+  The <hask>do</hask> notation hides functional details. 

+  This is wanted in order to simplify writing imperative style code fragments. 

+  The downsides are that: 

+  * Since <hask>do</hask> notation is used almost everywhere <hask>IO</hask> takes place, newcomers quickly believe that the <hask>do</hask> notation is necessary for doing <hask>IO</hask>, 

+  * Newcomers might think that <hask>IO</hask> is somehow special and nonfunctional, in contrast to the advertisement for Haskell being purely functional, 

+  * Newcomers might think that the order of statements determines the order of execution. 

+  
+  These misunderstandings let people write clumsy code like 

+  <haskell> 

+  do putStrLn "text" 

+  </haskell> 

+  instead of 

+  <haskell> 

+  putStrLn "text" 

+  </haskell> 

+  or 

+  <haskell> 

+  do text < getLine 

+  return text 

+  </haskell> 

+  instead of 

+  <haskell> 

+  getLine 

+  </haskell> 

+  or 

+  <haskell> 

+  do 

+  text < readFile "foo" 

+  writeFile "bar" text 

+  </haskell> 

+  instead of 

+  <haskell> 

+  readFile "foo" >>= writeFile "bar" 

+  </haskell> 

+  . 

+  
+  The order of statements is also not the criterion for the evaluation order. 

+  Also here only the data dependencies count. 

+  See for instance 

+  <haskell> 

+  do x < Just (3+5) 

+  y < Just (5*7) 

+  return (xy) 

+  </haskell> 

+  where <hask>3+5</hask> and <hask>5*7</hask> can be evaluated in any order, also in parallel. 

+  Or consider 

+  <haskell> 

+  do x < Just (3+5) 

+  y < Nothing 

+  return (xy) 

+  </haskell> 

+  where <hask>3+5</hask> is probably not evaluated at all, because its result is not necessary to find out that the entire <hask>do</hask> describes a <hask>Nothing</hask>. 

+  
+  
+  === Library design === 

+  
+  Unfortunately, the <hask>do</hask> notation is so popular that people write more things with monads than necessary. See for instance the [http://hackage.haskell.org/cgibin/hackagescripts/package/binary0.4.1 Binary] package. It contains the <hask>Put</hask> monad, which in principle [http://www.haskell.org/pipermail/haskellcafe/2009January/053317.html has nothing to do with a monad]. 

+  All "put" operations have the monadic result <hask>()</hask>. 

+  In fact it is a <hask>Writer</hask> monad using the <hask>Builder</hask> type, and all you need is just the <hask>Builder</hask> monoid. 

+  Even more unfortunate, the [[applicative functor]]s were introduced to Haskell's standard libraries only after [[monad]]s and [[arrow]]s, thus many types were instances of the <hask>Monad</hask> and <hask>Arrow</hask> classes, but not instances of <hask>Applicative</hask>. Fortunately, since GHC 7.10 the [[FunctorApplicativeMonad Proposal]] is implemented and now <hask>Applicative</hask> ''is'' a superclass of <hask>Monad</hask>. There is no special syntax for applicative functors because it is hardly necessary. 

+  You just write 

+  <haskell> 

+  data Header = Header Char Int Bool 

+  
+  readHeader :: Get Header 

+  readHeader = liftA3 Header get get get 

+  </haskell> 

+  or 

+  <haskell> 

+  readHeader = Header <$> get <*> get <*> get 

+  </haskell> 

+  
+  <br>Not using monads, along with the <hask>do</hask> notation, can have advantages. 

+  Consider a generator of unique identifiers. First you might think of a <hask>State</hask> monad which increments a counter each time an identifier is requested. 

+  
+  <haskell> 

+  run :: State Int a > a 

+  run m = evalState m 0 

+  
+  newId :: State Int Int 

+  newId = 

+  do n < get 

+  modify succ 

+  return n 

+  
+  example :: (Int > Int > a) > a 

+  example f = 

+  run $ 

+  do x < newId 

+  y < newId 

+  return (f x y) 

+  </haskell> 

+  
+  If you are confident that you will not need the counter state at the end and 

+  that you will not combine blocks of code using the counter 

+  (where the second block needs the state at the end of the first block), 

+  you can enforce a more strict scheme of usage. The following is like a <hask>Reader</hask> monad, where we call <hask>local</hask> on an incremented counter for each generated identifier. Alternatively you can view it as [[Continuation]] monad. 

+  
+  <haskell> 

+  newtype T a = T (Int > a) 

+  
+  run :: T a > a 

+  run (T f) = f 0 

+  
+  newId :: (Int > T a) > T a 

+  newId f = T $ \i > case f i of T g > g (succ i) 

+  
+  example :: (Int > Int > T a) > a 

+  example f = 

+  run $ 

+  newId $ \a > 

+  newId $ \b > 

+  f a b 

+  </haskell> 

+  
+  This way users cannot accidentally place a <hask>return</hask> 

+  somewhere in a <hask>do</hask> block where it has no effect. 

+  
+  === Safety === 

+  {{essay}} 

+  
+  With <hask>do</hask> notation we have kept alive a dark side of the C programming language: 

+  The silent neglect of return values of functions. 

+  In an imperative language it is common to return an error code and provide the real work by side effects. 

+  In Haskell this cannot happen, because functions have no side effects. 

+  If you ignore the result of a Haskell function, the function will not even be evaluated. 

+  The situation is different for <hask>IO</hask>: 

+  While processing the <hask>IO</hask>, you might still ignore the contained return value. 

+  
+  You can write 

+  <haskell> 

+  do getLine 

+  putStrLn "text" 

+  </haskell> 

+  and thus silently ignore the result of <hask>getLine</hask>. 

+  The same applies to 

+  <haskell> 

+  do System.Cmd.system "echo foo >bar" 

+  </haskell> 

+  where you ignore the <hask>ExitCode</hask>. 

+  Is this behaviour wanted? 

+  
+  There are possibilities to explicitly ignore return values in safety oriented languages 

+  (e.g. <code>EVAL</code> in [http://www.modula3.org/ Modula3]). 

+  Haskell does not need this, because you can already write 

+  <haskell> 

+  do _ < System.Cmd.system "echo foo >bar" 

+  return () 

+  </haskell> 

+  Writing <hask> _ < </hask> should always make you cautious whether ignoring the result is the right thing to do. 

+  The possibility for silently ignoring monadic return values is not entirely the fault of the <hask>do</hask> notation. 

+  It would suffice to restrict the type of the <hask>(>>)</hask> combinator to 

+  <haskell> 

+  (>>) :: m () > m a > m a 

+  </haskell> 

+  This way, you can omit <hask> _ < </hask> only if the monadic return value has type <hask>()</hask>. 

+  
+  New developments: 

+  * GHC since version 6.12 emits a warning when you silently ignore a return value 

+  * There is a new function called <hask>void</hask> that makes ignoring of return values explicit: [http://hackage.haskell.org/trac/ghc/ticket/3292 GHC ticket 3292] 

+  <! related is the problem on inefficient void (mapM f xs) vs. (mapM_ f xs) > 

+  
+  == Alternatives == 

+  === Use less sugar === 

+  
+  *<u>Additional combinators</u> 

+  :{ 

+  
+  Using the infix combinators for writing functions simplifies the addition of new combinators. 

+  Consider for instance a monad for [http://darcs.haskell.org/probability/src/Numeric/Probability.hs random distributions]. 

+  This monad cannot be an instance of <hask>MonadPlus</hask>, 

+  because there is no <hask>mzero</hask> (it would be an empty list of events, but their probabilities do not sum up to 1) 

+  and <hask>mplus</hask> is not associative because we have to normalize the sum of probabilities to 1. 

+  Thus we cannot use standard <hask>guard</hask> for this monad. 

+  However we would like to write the following: 

+  <haskell> 

+  do f < family 

+  guard (existsBoy f) 

+  return f 

+  </haskell> 

+  
+  Given a custom combinator which performs a filtering with subsequent normalization called <hask>(>>=?) :: Distribution a > (a > Bool) > Distribution a</hask> 

+  we can rewrite this easily: 

+  <haskell> 

+  family >>=? existsBoy 

+  </haskell> 

+  Note that the <hask>(>>=?)</hask> combinator introduces the risk of returning an invalid distribution (empty list of events), 

+  but it seems that we have to live with that problem. 

+  } 

+  
+  *<u>Alternative combinators</u> 

+  :{ 

+  If you are used to writing monadic functions using infix combinators <hask>(>>)</hask> and <hask>(>>=)</hask> 

+  you can easily switch to a different set of combinators. 

+  This is useful when there is a monadic structure that does not fit into the current <hask>Monad</hask> type constructor class, where the monadic result type cannot be constrained. 

+  This is e.g. useful for the [http://www.randomhacks.net/articles/2007/03/15/datasetmonadhaskellmacros Set data type], where the element type must have a total order. 

+  } 

+  
+  === Use more existing sugar === 

+  
+  Another option is to bring <code>do</code>notation closer to the syntax of listcomprehensions. 

+  
+  Instead of: 

+  <haskell> 

+  do f < family 

+  guard (existsBoy f) 

+  return f 

+  </haskell> 

+  
+  one can write: 

+  <haskell> 

+  do f  f < family, 

+  existsBoy f 

+  </haskell> 

+  
+  or: 

+  <haskell> 

+  do { f  f < family, existsBoy f } 

+  </haskell> 

+  
+  This also preserves the use of the extant listcomprehension syntax solely for lists, reducing the temptation (and subsequent confusion for Haskell beginners!) to extend it to all monadic types. 

+  
+  == Useful applications == 

+  
+  It shall be mentioned that the <hask>do</hask> sometimes takes the burden away from you of writing boring things. 

+  E.g. in 

+  <haskell> 

+  getRight :: Either a b > Maybe b 

+  getRight y = 

+  do Right x < y 

+  return x 

+  </haskell> 

+  a <hask>case</hask> on <hask>y</hask> is included, which calls <hask>fail</hask> if <hask>y</hask> is not a <hask>Right</hask> (i.e. <hask>Left</hask>), and thus returns <hask>Nothing</hask> in this case. 

+  
+  Also the <hask>mdo</hask> notation proves useful, since it maintains a set of variables for you in a safe manner. 

+  Compare 

+  <haskell> 

+  mdo x < f x y z 

+  y < g x y z 

+  z < h x y z 

+  return (x+y+z) 

+  </haskell> 

+  and 

+  <haskell> 

+  mfix 

+  (\ ~( ~(x,y,z), _) > 

+  do x < f x y z 

+  y < g x y z 

+  z < h x y z 

+  return ((x,y,z),x+y+z)) 

+  </haskell> 

== See also == 
== See also == 

Line 13:  Line 263:  
* Paul Hudak in HaskellCafe: [http://www.haskell.org/pipermail/haskellcafe/2007August/030178.html A regressive view of support for imperative programming in Haskell] 
* Paul Hudak in HaskellCafe: [http://www.haskell.org/pipermail/haskellcafe/2007August/030178.html A regressive view of support for imperative programming in Haskell] 

* Data.Syntaxfree on Wordpress: [http://syntaxfree.wordpress.com/2006/12/12/donotationconsideredharmful/ Donotation considered harmful] 
* Data.Syntaxfree on Wordpress: [http://syntaxfree.wordpress.com/2006/12/12/donotationconsideredharmful/ Donotation considered harmful] 

+  * [[Things to avoid#do notation]] 

[[Category:Syntax]] 
[[Category:Syntax]] 
Latest revision as of 01:08, 27 September 2021
Contents
Criticism
Haskell's do notation is popular and ubiquitous. However we shall not ignore that there are several problems. Here we like to shed some light on aspects you may not have thought about, so far.
Didactics
The do
notation hides functional details.
This is wanted in order to simplify writing imperative style code fragments.
The downsides are that:
 Since
do
notation is used almost everywhereIO
takes place, newcomers quickly believe that thedo
notation is necessary for doingIO
,  Newcomers might think that
IO
is somehow special and nonfunctional, in contrast to the advertisement for Haskell being purely functional,  Newcomers might think that the order of statements determines the order of execution.
These misunderstandings let people write clumsy code like
do putStrLn "text"
instead of
putStrLn "text"
or
do text < getLine
return text
instead of
getLine
or
do
text < readFile "foo"
writeFile "bar" text
instead of
readFile "foo" >>= writeFile "bar"
.
The order of statements is also not the criterion for the evaluation order. Also here only the data dependencies count. See for instance
do x < Just (3+5)
y < Just (5*7)
return (xy)
where 3+5
and 5*7
can be evaluated in any order, also in parallel.
Or consider
do x < Just (3+5)
y < Nothing
return (xy)
where 3+5
is probably not evaluated at all, because its result is not necessary to find out that the entire do
describes a Nothing
.
Library design
Unfortunately, the do
notation is so popular that people write more things with monads than necessary. See for instance the Binary package. It contains the Put
monad, which in principle has nothing to do with a monad.
All "put" operations have the monadic result ()
.
In fact it is a Writer
monad using the Builder
type, and all you need is just the Builder
monoid.
Even more unfortunate, the applicative functors were introduced to Haskell's standard libraries only after monads and arrows, thus many types were instances of the Monad
and Arrow
classes, but not instances of Applicative
. Fortunately, since GHC 7.10 the FunctorApplicativeMonad Proposal is implemented and now Applicative
is a superclass of Monad
. There is no special syntax for applicative functors because it is hardly necessary.
You just write
data Header = Header Char Int Bool
readHeader :: Get Header
readHeader = liftA3 Header get get get
or
readHeader = Header <$> get <*> get <*> get
Not using monads, along with the do
notation, can have advantages.
Consider a generator of unique identifiers. First you might think of a State
monad which increments a counter each time an identifier is requested.
run :: State Int a > a
run m = evalState m 0
newId :: State Int Int
newId =
do n < get
modify succ
return n
example :: (Int > Int > a) > a
example f =
run $
do x < newId
y < newId
return (f x y)
If you are confident that you will not need the counter state at the end and
that you will not combine blocks of code using the counter
(where the second block needs the state at the end of the first block),
you can enforce a more strict scheme of usage. The following is like a Reader
monad, where we call local
on an incremented counter for each generated identifier. Alternatively you can view it as Continuation monad.
newtype T a = T (Int > a)
run :: T a > a
run (T f) = f 0
newId :: (Int > T a) > T a
newId f = T $ \i > case f i of T g > g (succ i)
example :: (Int > Int > T a) > a
example f =
run $
newId $ \a >
newId $ \b >
f a b
This way users cannot accidentally place a return
somewhere in a do
block where it has no effect.
Safety
 This page addresses an aspect of Haskell style, which is to some extent a matter of taste. Just pick what you find appropriate for you and ignore the rest.
With do
notation we have kept alive a dark side of the C programming language:
The silent neglect of return values of functions.
In an imperative language it is common to return an error code and provide the real work by side effects.
In Haskell this cannot happen, because functions have no side effects.
If you ignore the result of a Haskell function, the function will not even be evaluated.
The situation is different for IO
:
While processing the IO
, you might still ignore the contained return value.
You can write
do getLine
putStrLn "text"
and thus silently ignore the result of getLine
.
The same applies to
do System.Cmd.system "echo foo >bar"
where you ignore the ExitCode
.
Is this behaviour wanted?
There are possibilities to explicitly ignore return values in safety oriented languages
(e.g. EVAL
in Modula3).
Haskell does not need this, because you can already write
do _ < System.Cmd.system "echo foo >bar"
return ()
Writing _ <
should always make you cautious whether ignoring the result is the right thing to do.
The possibility for silently ignoring monadic return values is not entirely the fault of the do
notation.
It would suffice to restrict the type of the (>>)
combinator to
(>>) :: m () > m a > m a
This way, you can omit _ <
only if the monadic return value has type ()
.
New developments:
 GHC since version 6.12 emits a warning when you silently ignore a return value
 There is a new function called
void
that makes ignoring of return values explicit: GHC ticket 3292
Alternatives
Use less sugar
 Additional combinators

Using the infix combinators for writing functions simplifies the addition of new combinators.
Consider for instance a monad for random distributions.
This monad cannot be an instance of
MonadPlus
, because there is nomzero
(it would be an empty list of events, but their probabilities do not sum up to 1) andmplus
is not associative because we have to normalize the sum of probabilities to 1. Thus we cannot use standardguard
for this monad. However we would like to write the following:do f < family guard (existsBoy f) return f
Given a custom combinator which performs a filtering with subsequent normalization called
(>>=?) :: Distribution a > (a > Bool) > Distribution a
we can rewrite this easily:family >>=? existsBoy
Note that the
(>>=?)
combinator introduces the risk of returning an invalid distribution (empty list of events), but it seems that we have to live with that problem.
 Alternative combinators

If you are used to writing monadic functions using infix combinators
(>>)
and(>>=)
you can easily switch to a different set of combinators. This is useful when there is a monadic structure that does not fit into the currentMonad
type constructor class, where the monadic result type cannot be constrained. This is e.g. useful for the Set data type, where the element type must have a total order.
Use more existing sugar
Another option is to bring do
notation closer to the syntax of listcomprehensions.
Instead of:
do f < family
guard (existsBoy f)
return f
one can write:
do f  f < family,
existsBoy f
or:
do { f  f < family, existsBoy f }
This also preserves the use of the extant listcomprehension syntax solely for lists, reducing the temptation (and subsequent confusion for Haskell beginners!) to extend it to all monadic types.
Useful applications
It shall be mentioned that the do
sometimes takes the burden away from you of writing boring things.
E.g. in
getRight :: Either a b > Maybe b
getRight y =
do Right x < y
return x
a case
on y
is included, which calls fail
if y
is not a Right
(i.e. Left
), and thus returns Nothing
in this case.
Also the mdo
notation proves useful, since it maintains a set of variables for you in a safe manner.
Compare
mdo x < f x y z
y < g x y z
z < h x y z
return (x+y+z)
and
mfix
(\ ~( ~(x,y,z), _) >
do x < f x y z
y < g x y z
z < h x y z
return ((x,y,z),x+y+z))
See also
 Paul Hudak in HaskellCafe: A regressive view of support for imperative programming in Haskell
 Data.Syntaxfree on Wordpress: Donotation considered harmful
 Things to avoid#do notation