Difference between revisions of "Enumerator and iteratee"

From HaskellWiki
Jump to navigation Jump to search
(link to an independent article)
(automaton)
Line 1: Line 1:
An enumerator is something that knows how to process a data structure and an iteratee is something that does one step in processing another piece of the big data structure. E.g. to sum up all elements of Data.Map, we do
+
An enumerator is something that knows how to generate a list and an iteratee is something that does one step in processing another piece of the big list. E.g. to sum up all elements of a list, we do
<haskell>
+
<pre-haskell>
Map.fold (+) 0 mp
+
foldl (+) 0 xs
</haskell>
+
</pre-haskell>
 
Then <code-haskell>foldl</code-haskell> is the enumerator and <code-haskell>((+),0)</code-haskell> is the iteratee.
   
  +
Clearly the function that sums the current element with the accumulator, <code-haskell>(+)</code-haskell>, doesn't know or care from which collection the elements are coming from. The initial seed, <code-haskell>0</code-haskell>, is again unaware of the collection. That achieves the
to sum up all elements of a set we do
 
<haskell>
 
Set.fold (+) 0 st
 
</haskell>
 
 
Then <hask>fold</hask> is the enumerator and <hask>(+)</hask> and <hask>0</hask> are the iteratee.
 
 
Ditto for any other foldable data structure. Clearly the function that
 
sums the current element with the accumulator, (+), doesn't know or
 
care from which collection the elements are coming from. The initial
 
seed, 0, is again unaware of the collection.
 
 
Iteratee is indeed the function that you pass to fold (combined with
 
the seed for practical reasons). One may conceptually consider
 
iteratee to be a pair of the function to feed to fold, and the initial
 
seed (the accumulator in the above example). That achieves the
 
 
[[separation of concerns]]: fold (aka, enumerator) has the intimate knowledge
 
[[separation of concerns]]: fold (aka, enumerator) has the intimate knowledge
 
of the collection and how to get to the next element; iteratee knows
 
of the collection and how to get to the next element; iteratee knows
 
what to do with the current element.
 
what to do with the current element.
   
  +
== Definition ==
  +
  +
Do not rely on the <code-haskell>foldl</code-haskell> analogy too firmly, it is misleading. <code-haskell>((+),0)</code-haskell> is [[F-algebra]] and <code-haskell>foldl (+) 0</code-haskell> is [[catamorphism]]. But iteratee is different, it is [[automaton]]. From this point of view, enumerator sends elements of a list from head to tail sequentially as input messages to iteratee. If iteratee finishes, it outputs an accumulator. If iteratee continues, it outputs nothing (i.e. <code-haskell>()</code-haskell>).
  +
  +
So, a set of states of iteratee is divided into subsets "Done" and "Next". Done-state means that automaton finished consuming a list, i.e. automaton is dead. Next-state means that you can give an input message and obtain the same automaton in a '''new''' state.
 
<pre-haskell>
  +
data Iteratee i o
  +
= Done o
  +
| Next (i -> Iteratee i o)
 
</pre-haskell>
  +
  +
<code-haskell>i</code-haskell> is a type of input messages (or list elements) and <code-haskell>o</code-haskell> is a type of the output message (an accumulator). Precisely speaking, <code-haskell>Iteratee</code-haskell> stores not an automaton, but an automaton in some state, an automaton with distinguished state. As you see, if <code-haskell>Iteratee</code-haskell> is <code-haskell>Next</code-haskell>, than we have a function that takes an input message and returns a new <code-haskell>Iteratee</code-haskell>.
  +
  +
The distinct feature of iteratee is that it can say after which list element an iteratee finishes. An iteratee says this by sending "Done" to an enumerator. Than the enumerator can close a file or a socket (a stream) where a list of characters is read from. [[Lazy I/O]], which uses lazy lists, close a stream only when the stream is exhausted.
  +
  +
The drawback is that enumerator can not tell iteratee that a list is exhausted. An <code-haskell>Iteratee</code-haskell> consumes only infinite lists. You can remedy this by assuming
  +
<pre-haskell>
  +
i == Maybe i'
  +
</pre-haskell>
  +
where <code-haskell>i'</code-haskell> is a type of list elements. <code-haskell>Nothing</code-haskell> given to iteratee signals that the list is exhausted.
  +
  +
Here is a sample enumerator that takes input messages from a file:
  +
<pre-haskell>
  +
enumerator :: FilePath -> Iteratee (Maybe Char) o -> IO o
  +
enumerator file it = withFile file ReadMode
  +
$ \h -> fix (\rc it -> case it of
  +
Done o -> return o
  +
Next f -> do
  +
eof <- hIsEOF h
  +
case eof of
  +
False -> do
  +
c <- hGetChar h
  +
rc (f (Just c))
  +
True -> rc (f Nothing)
  +
) it
  +
</pre-haskell>
  +
  +
== Functions ==
  +
  +
You can compose iteratees sequentially in time. This is done by <code-haskell>(>>)</code-haskell>. <code-haskell>it0 >> it1</code-haskell> means that when <code-haskell>it0</code-haskell> finishes, <code-haskell>it1</code-haskell> starts. Generally speaking, <code-haskell>Iteratee i</code-haskell> is a <code-haskell>Monad</code-haskell>. And it works exactly as [[monadic parser]].
  +
<pre-haskell>
  +
{- s = state -}
  +
instance Functor (Iteratee input) where
  +
fmap f = fix $ \rc s -> case s of
  +
Done o -> Done (f o)
  +
Next g -> Next (rc . g)
  +
instance Monad (Iteratee input) where
  +
return = Done
  +
it0 >>= it1 = fix (\rc s -> case s of
  +
Done o -> it1 o
  +
Next g -> Next (rc . g)
  +
) it0
  +
</pre-haskell>
  +
  +
You can compose iteratees sequentially in space. <code-haskell>it0</code-haskell>'s output messages become <code-haskell>it1</code-haskell>'s input messages. So <code-haskell>it0</code-haskell> and <code-haskell>it1</code-haskell> works in parallel. Their composition is denoted <code-haskell>it1 . it0</code-haskell>. If <code-haskell>it0</code-haskell> finishes, it is resurrected to its original state. If <code-haskell>it1</code-haskell> finishes, <code-haskell>it1 . it0</code-haskell> finishes. The main feature here is that <code-haskell>it0</code-haskell> is restarted. This is used for repetitive parsing.
  +
<pre-haskell>
  +
arr0 f = Next $ \i -> Done (f i)
  +
instance Category Iteratee where
  +
id = arr0 id
  +
it1 . it0 = fix (\rc1 it1 -> case it1 of
  +
Done c -> Done c
  +
Next f1 -> fix (\rc0 it0 -> case it0 of
  +
Done b -> rc1 (f1 b)
  +
Next f0 -> Next (rc0 . f0)
  +
) it0
  +
) it1
  +
</pre-haskell>
  +
  +
== Generalization ==
  +
  +
You may note that <code-haskell>Iteratee</code-haskell> is [[final coalgebra]]. Other kinds of automata can be described with other [[F-coalgebra]]s. In practice such automata can handle network protocols or interactive user input. See for example [http://www.cs.ru.nl/~bart/PAPERS/index.html papers] by Bart Jacobs for theoretical discussion.
   
 
== See also ==
 
== See also ==

Revision as of 16:36, 16 June 2010

An enumerator is something that knows how to generate a list and an iteratee is something that does one step in processing another piece of the big list. E.g. to sum up all elements of a list, we do <pre-haskell> foldl (+) 0 xs </pre-haskell> Then <code-haskell>foldl</code-haskell> is the enumerator and <code-haskell>((+),0)</code-haskell> is the iteratee.

Clearly the function that sums the current element with the accumulator, <code-haskell>(+)</code-haskell>, doesn't know or care from which collection the elements are coming from. The initial seed, <code-haskell>0</code-haskell>, is again unaware of the collection. That achieves the separation of concerns: fold (aka, enumerator) has the intimate knowledge of the collection and how to get to the next element; iteratee knows what to do with the current element.

Definition

Do not rely on the <code-haskell>foldl</code-haskell> analogy too firmly, it is misleading. <code-haskell>((+),0)</code-haskell> is F-algebra and <code-haskell>foldl (+) 0</code-haskell> is catamorphism. But iteratee is different, it is automaton. From this point of view, enumerator sends elements of a list from head to tail sequentially as input messages to iteratee. If iteratee finishes, it outputs an accumulator. If iteratee continues, it outputs nothing (i.e. <code-haskell>()</code-haskell>).

So, a set of states of iteratee is divided into subsets "Done" and "Next". Done-state means that automaton finished consuming a list, i.e. automaton is dead. Next-state means that you can give an input message and obtain the same automaton in a new state. <pre-haskell> data Iteratee i o

 = Done o
 | Next (i -> Iteratee i o)

</pre-haskell>

<code-haskell>i</code-haskell> is a type of input messages (or list elements) and <code-haskell>o</code-haskell> is a type of the output message (an accumulator). Precisely speaking, <code-haskell>Iteratee</code-haskell> stores not an automaton, but an automaton in some state, an automaton with distinguished state. As you see, if <code-haskell>Iteratee</code-haskell> is <code-haskell>Next</code-haskell>, than we have a function that takes an input message and returns a new <code-haskell>Iteratee</code-haskell>.

The distinct feature of iteratee is that it can say after which list element an iteratee finishes. An iteratee says this by sending "Done" to an enumerator. Than the enumerator can close a file or a socket (a stream) where a list of characters is read from. Lazy I/O, which uses lazy lists, close a stream only when the stream is exhausted.

The drawback is that enumerator can not tell iteratee that a list is exhausted. An <code-haskell>Iteratee</code-haskell> consumes only infinite lists. You can remedy this by assuming <pre-haskell> i == Maybe i' </pre-haskell> where <code-haskell>i'</code-haskell> is a type of list elements. <code-haskell>Nothing</code-haskell> given to iteratee signals that the list is exhausted.

Here is a sample enumerator that takes input messages from a file: <pre-haskell> enumerator :: FilePath -> Iteratee (Maybe Char) o -> IO o enumerator file it = withFile file ReadMode

 $ \h -> fix (\rc it -> case it of
   Done o -> return o
   Next f -> do
     eof <- hIsEOF h
     case eof of
       False -> do
         c <- hGetChar h
         rc (f (Just c))
       True -> rc (f Nothing)
   ) it

</pre-haskell>

Functions

You can compose iteratees sequentially in time. This is done by <code-haskell>(>>)</code-haskell>. <code-haskell>it0 >> it1</code-haskell> means that when <code-haskell>it0</code-haskell> finishes, <code-haskell>it1</code-haskell> starts. Generally speaking, <code-haskell>Iteratee i</code-haskell> is a <code-haskell>Monad</code-haskell>. And it works exactly as monadic parser. <pre-haskell> {- s = state -} instance Functor (Iteratee input) where

 fmap f = fix $ \rc s -> case s of
   Done o -> Done (f o)
   Next g -> Next (rc . g)

instance Monad (Iteratee input) where

 return = Done
 it0 >>= it1 = fix (\rc s -> case s of
   Done o -> it1 o
   Next g -> Next (rc . g)
   ) it0

</pre-haskell>

You can compose iteratees sequentially in space. <code-haskell>it0</code-haskell>'s output messages become <code-haskell>it1</code-haskell>'s input messages. So <code-haskell>it0</code-haskell> and <code-haskell>it1</code-haskell> works in parallel. Their composition is denoted <code-haskell>it1 . it0</code-haskell>. If <code-haskell>it0</code-haskell> finishes, it is resurrected to its original state. If <code-haskell>it1</code-haskell> finishes, <code-haskell>it1 . it0</code-haskell> finishes. The main feature here is that <code-haskell>it0</code-haskell> is restarted. This is used for repetitive parsing. <pre-haskell> arr0 f = Next $ \i -> Done (f i) instance Category Iteratee where

 id = arr0 id
 it1 . it0 = fix (\rc1 it1 -> case it1 of
   Done c -> Done c
   Next f1 -> fix (\rc0 it0 -> case it0 of
     Done b -> rc1 (f1 b)
     Next f0 -> Next (rc0 . f0)
     ) it0
   ) it1

</pre-haskell>

Generalization

You may note that <code-haskell>Iteratee</code-haskell> is final coalgebra. Other kinds of automata can be described with other F-coalgebras. In practice such automata can handle network protocols or interactive user input. See for example papers by Bart Jacobs for theoretical discussion.

See also