Difference between revisions of "Enumerator and iteratee"
(Oleg's talk) 
m (Fix code highlighting) 

(11 intermediate revisions by 8 users not shown)  
Line 1:  Line 1:  
−  An enumerator is something that knows how to 
+  An enumerator is something that knows how to generate a list and an iteratee is something that does one step in processing another piece of the big list. E.g. to sum up all elements of a list, we do 
<haskell> 
<haskell> 

−  +  foldl (+) 0 xs 

</haskell> 
</haskell> 

+  Then <hask>foldl</hask> is the enumerator and <hask>((+),0)</hask> is the iteratee. 

−  to sum up all elements of a set we do 

+  Clearly the function that sums the current element with the accumulator, <hask>(+)</hask>, doesn't know or care from which collection the elements are coming from. The initial seed, <hask>0</hask>, is again unaware of the collection. That achieves the 

+  [[separation of concerns]]: fold (aka, enumerator) has the intimate knowledge 

+  of the collection and how to get to the next element; iteratee knows 

+  what to do with the current element. 

+  
+  == Definition == 

+  
+  Do not rely on the <hask>foldl</hask> analogy too firmly, it is misleading. <hask>((+),0)</hask> is an [[Falgebra]] and <hask>foldl (+) 0</hask> is a [[catamorphism]]. But iteratee is different, it is an [[automaton]]. From this point of view, the enumerator sends elements of a list sequentially, from head to tail, as input messages to the iteratee. If the iteratee finishes, it outputs an accumulator. If the iteratee continues, it outputs nothing (i.e., <hask>()</hask>). 

+  
+  So, a set of states of iteratee is divided into subsets "Done" and "Next". Donestate means that automaton finished consuming a list, i.e., the automaton is dead. Nextstate means that you can give an input message and obtain the same automaton in a '''new''' state. 

<haskell> 
<haskell> 

−  Set.fold (+) 0 st 

+  data Iteratee i o 

+  = Done o 

+   Next (i > Iteratee i o) 

</haskell> 
</haskell> 

−  Then <hask>fold</hask> is the enumerator and <hask>(+)</hask> and <hask>0</hask> are the iteratee. 

+  <hask>i</hask> is the type of the iteratee's input messages (or list elements) and <hask>o</hask> is a type of the output message (an accumulator). Precisely speaking, <hask>Iteratee</hask> stores not an automaton, but an automaton in some state, an automaton with distinguished state. As you see, if an <hask>Iteratee</hask> is in the <hask>Next</hask> state, then we have a function that takes an input message and returns a new <hask>Iteratee</hask>. 

−  Ditto for any other foldable data structure. Clearly the function that 

+  The distinct feature of iteratee is that it can say after which list element an iteratee finishes. An iteratee says this by sending "Done" to an enumerator. Then the enumerator can, for example, close a file or a socket (a stream) where a list of characters is read from. [[Lazy I/O]], which uses lazy lists, closes a stream only when the stream is exhausted. 

−  sums the current element with the accumulator, (+), doesn't know or 

−  care from which collection the elements are coming from. The initial 

−  seed, 0, is again unaware of the collection. 

−  Iteratee is indeed the function that you pass to fold (combined with 

+  The drawback is that an enumerator can not tell an iteratee that an input is exhausted — an <hask>Iteratee</hask> consumes only infinite lists. You can remedy this by assuming 

−  the seed for practical reasons). One may conceptually consider 

+  <haskell> 

−  iteratee to be a pair of the function to feed to fold, and the initial 

+  i == Maybe i' 

−  seed (the accumulator in the above example). That achieves the 

+  </haskell> 

−  separation of concerns: fold (aka, enumerator) has the intimate knowledge 

+  where <hask>i'</hask> is a type of list elements. <hask>Nothing</hask> given to iteratee signals that the list is exhausted. 

−  of the collection and how to get to the next element; iteratee knows 

+  
−  what to do with the current element. 

+  Here is a sample enumerator that takes input messages from a file: 

+  <haskell> 

+  enumerator :: FilePath > Iteratee (Maybe Char) o > IO o 

+  enumerator file it = withFile file ReadMode 

+  $ \h > fix (\rc it > case it of 

+  Done o > return o 

+  Next f > do 

+  eof < hIsEOF h 

+  case eof of 

+  False > do 

+  c < hGetChar h 

+  rc (f (Just c)) 

+  True > rc (f Nothing) 

+  ) it 

+  </haskell> 

+  
+  == Functions == 

+  
+  You can compose iteratees sequentially in time. This is done by <hask>(>>)</hask>. <hask>it0 >> it1</hask> means that when <hask>it0</hask> finishes, <hask>it1</hask> starts. Generally speaking, <hask>Iteratee i</hask> is a <hask>Monad</hask>, and it works exactly like a [[monadic parser]]. 

+  <haskell> 

+  { s = state } 

+  instance Functor (Iteratee input) where 

+  fmap f = fix $ \rc s > case s of 

+  Done o > Done (f o) 

+  Next g > Next (rc . g) 

+  instance Monad (Iteratee input) where 

+  return = Done 

+  it0 >>= it1 = fix (\rc s > case s of 

+  Done o > it1 o 

+  Next g > Next (rc . g) 

+  ) it0 

+  </haskell> 

+  
+  You can also compose iteratees sequentially in space. <hask>it0</hask>'s output messages become <hask>it1</hask>'s input messages, so <hask>it0</hask> and <hask>it1</hask> work in parallel. Their composition is denoted <hask>it1 . it0</hask>. If <hask>it0</hask> finishes, it is resurrected to its original state. If <hask>it1</hask> finishes, <hask>it1 . it0</hask> finishes — The main feature here is that <hask>it0</hask> is restarted, as this is used for repetitive parsing. 

+  <haskell> 

+  arr0 f = Next $ \i > Done (f i) 

+  instance Category Iteratee where 

+  id = arr0 id 

+  it1 . it0 = fix (\rc1 it1 > case it1 of 

+  Done c > Done c 

+  Next f1 > fix (\rc0 it0 > case it0 of 

+  Done b > rc1 (f1 b) 

+  Next f0 > Next (rc0 . f0) 

+  ) it0 

+  ) it1 

+  </haskell> 

+  
+  == Generalization == 

+  You may note that <hask>Iteratee</hask> is a [[final coalgebra]]. Other kinds of automata can be described with other [[Fcoalgebra]]s. In practice such automata can handle network protocols or interactive user input. See for example [http://www.cs.ru.nl/~bart/PAPERS/index.html papers] by Bart Jacobs for theoretical discussion. 

== See also == 
== See also == 

+  * Oleg Kiselyov: "[http://okmij.org/ftp/Haskell/Iteratee/describe.pdf Iteratees]"  FLOPS 2012 paper 

+  * [http://www.mew.org/~kazu/proj/enumerator/ A tutorial on the enumerator library] 

* HaskellCafe on [http://www.haskell.org/pipermail/haskellcafe/2008December/052181.html understanding enumerator/iteratee] 
* HaskellCafe on [http://www.haskell.org/pipermail/haskellcafe/2008December/052181.html understanding enumerator/iteratee] 

−  * Oleg Kiselyov: "[http://okmij.org/ftp/Haskell/Iteratee/DEFUN08talknotes.pdf Incremental multilevel input processing with leftfold enumerator]  predictable, highperformance, safe, and elegant" 

+  * HaskellCafe on [http://www.haskell.org/pipermail/haskellcafe/2009February/056816.html Left fold enumerator  a real pearl overlooked?] 

+  * John Lato's cabalized [http://inmachina.net/~jwlato/haskell/iteratee/ package] of Oleg's code 

+  * [[Iteratee I/O]] 

+  * The Yesod book's [http://www.yesodweb.com/book/enumerator appendix on the Enumerator package] 

[[Category:Idioms]] 
[[Category:Idioms]] 
Latest revision as of 02:34, 13 July 2019
An enumerator is something that knows how to generate a list and an iteratee is something that does one step in processing another piece of the big list. E.g. to sum up all elements of a list, we do
foldl (+) 0 xs
Then foldl
is the enumerator and ((+),0)
is the iteratee.
Clearly the function that sums the current element with the accumulator, (+)
, doesn't know or care from which collection the elements are coming from. The initial seed, 0
, is again unaware of the collection. That achieves the
separation of concerns: fold (aka, enumerator) has the intimate knowledge
of the collection and how to get to the next element; iteratee knows
what to do with the current element.
Definition
Do not rely on the foldl
analogy too firmly, it is misleading. ((+),0)
is an Falgebra and foldl (+) 0
is a catamorphism. But iteratee is different, it is an automaton. From this point of view, the enumerator sends elements of a list sequentially, from head to tail, as input messages to the iteratee. If the iteratee finishes, it outputs an accumulator. If the iteratee continues, it outputs nothing (i.e., ()
).
So, a set of states of iteratee is divided into subsets "Done" and "Next". Donestate means that automaton finished consuming a list, i.e., the automaton is dead. Nextstate means that you can give an input message and obtain the same automaton in a new state.
data Iteratee i o
= Done o
 Next (i > Iteratee i o)
i
is the type of the iteratee's input messages (or list elements) and o
is a type of the output message (an accumulator). Precisely speaking, Iteratee
stores not an automaton, but an automaton in some state, an automaton with distinguished state. As you see, if an Iteratee
is in the Next
state, then we have a function that takes an input message and returns a new Iteratee
.
The distinct feature of iteratee is that it can say after which list element an iteratee finishes. An iteratee says this by sending "Done" to an enumerator. Then the enumerator can, for example, close a file or a socket (a stream) where a list of characters is read from. Lazy I/O, which uses lazy lists, closes a stream only when the stream is exhausted.
The drawback is that an enumerator can not tell an iteratee that an input is exhausted — an Iteratee
consumes only infinite lists. You can remedy this by assuming
i == Maybe i'
where i'
is a type of list elements. Nothing
given to iteratee signals that the list is exhausted.
Here is a sample enumerator that takes input messages from a file:
enumerator :: FilePath > Iteratee (Maybe Char) o > IO o
enumerator file it = withFile file ReadMode
$ \h > fix (\rc it > case it of
Done o > return o
Next f > do
eof < hIsEOF h
case eof of
False > do
c < hGetChar h
rc (f (Just c))
True > rc (f Nothing)
) it
Functions
You can compose iteratees sequentially in time. This is done by (>>)
. it0 >> it1
means that when it0
finishes, it1
starts. Generally speaking, Iteratee i
is a Monad
, and it works exactly like a monadic parser.
{ s = state }
instance Functor (Iteratee input) where
fmap f = fix $ \rc s > case s of
Done o > Done (f o)
Next g > Next (rc . g)
instance Monad (Iteratee input) where
return = Done
it0 >>= it1 = fix (\rc s > case s of
Done o > it1 o
Next g > Next (rc . g)
) it0
You can also compose iteratees sequentially in space. it0
's output messages become it1
's input messages, so it0
and it1
work in parallel. Their composition is denoted it1 . it0
. If it0
finishes, it is resurrected to its original state. If it1
finishes, it1 . it0
finishes — The main feature here is that it0
is restarted, as this is used for repetitive parsing.
arr0 f = Next $ \i > Done (f i)
instance Category Iteratee where
id = arr0 id
it1 . it0 = fix (\rc1 it1 > case it1 of
Done c > Done c
Next f1 > fix (\rc0 it0 > case it0 of
Done b > rc1 (f1 b)
Next f0 > Next (rc0 . f0)
) it0
) it1
Generalization
You may note that Iteratee
is a final coalgebra. Other kinds of automata can be described with other Fcoalgebras. In practice such automata can handle network protocols or interactive user input. See for example papers by Bart Jacobs for theoretical discussion.
See also
 Oleg Kiselyov: "Iteratees"  FLOPS 2012 paper
 A tutorial on the enumerator library
 HaskellCafe on understanding enumerator/iteratee
 HaskellCafe on Left fold enumerator  a real pearl overlooked?
 John Lato's cabalized package of Oleg's code
 Iteratee I/O
 The Yesod book's appendix on the Enumerator package