Difference between revisions of "Enumerator and iteratee"

From HaskellWiki
Jump to navigation Jump to search
m (Replaced the reference to the old talk with the reference to a new paper.)
m (Fix code highlighting)
 
Line 1: Line 1:
 
An enumerator is something that knows how to generate a list and an iteratee is something that does one step in processing another piece of the big list. E.g. to sum up all elements of a list, we do
 
An enumerator is something that knows how to generate a list and an iteratee is something that does one step in processing another piece of the big list. E.g. to sum up all elements of a list, we do
<pre-haskell>
+
<haskell>
 
foldl (+) 0 xs
 
foldl (+) 0 xs
</pre-haskell>
+
</haskell>
Then <code-haskell>foldl</code-haskell> is the enumerator and <code-haskell>((+),0)</code-haskell> is the iteratee.
+
Then <hask>foldl</hask> is the enumerator and <hask>((+),0)</hask> is the iteratee.
   
Clearly the function that sums the current element with the accumulator, <code-haskell>(+)</code-haskell>, doesn't know or care from which collection the elements are coming from. The initial seed, <code-haskell>0</code-haskell>, is again unaware of the collection. That achieves the
+
Clearly the function that sums the current element with the accumulator, <hask>(+)</hask>, doesn't know or care from which collection the elements are coming from. The initial seed, <hask>0</hask>, is again unaware of the collection. That achieves the
 
[[separation of concerns]]: fold (aka, enumerator) has the intimate knowledge
 
[[separation of concerns]]: fold (aka, enumerator) has the intimate knowledge
 
of the collection and how to get to the next element; iteratee knows
 
of the collection and how to get to the next element; iteratee knows
Line 12: Line 12:
 
== Definition ==
 
== Definition ==
   
Do not rely on the <code-haskell>foldl</code-haskell> analogy too firmly, it is misleading. <code-haskell>((+),0)</code-haskell> is an [[F-algebra]] and <code-haskell>foldl (+) 0</code-haskell> is a [[catamorphism]]. But iteratee is different, it is an [[automaton]]. From this point of view, the enumerator sends elements of a list sequentially, from head to tail, as input messages to the iteratee. If the iteratee finishes, it outputs an accumulator. If the iteratee continues, it outputs nothing (i.e., <code-haskell>()</code-haskell>).
+
Do not rely on the <hask>foldl</hask> analogy too firmly, it is misleading. <hask>((+),0)</hask> is an [[F-algebra]] and <hask>foldl (+) 0</hask> is a [[catamorphism]]. But iteratee is different, it is an [[automaton]]. From this point of view, the enumerator sends elements of a list sequentially, from head to tail, as input messages to the iteratee. If the iteratee finishes, it outputs an accumulator. If the iteratee continues, it outputs nothing (i.e., <hask>()</hask>).
   
 
So, a set of states of iteratee is divided into subsets "Done" and "Next". Done-state means that automaton finished consuming a list, i.e., the automaton is dead. Next-state means that you can give an input message and obtain the same automaton in a '''new''' state.
 
So, a set of states of iteratee is divided into subsets "Done" and "Next". Done-state means that automaton finished consuming a list, i.e., the automaton is dead. Next-state means that you can give an input message and obtain the same automaton in a '''new''' state.
<pre-haskell>
+
<haskell>
 
data Iteratee i o
 
data Iteratee i o
 
= Done o
 
= Done o
 
| Next (i -> Iteratee i o)
 
| Next (i -> Iteratee i o)
</pre-haskell>
+
</haskell>
   
<code-haskell>i</code-haskell> is the type of the iteratee's input messages (or list elements) and <code-haskell>o</code-haskell> is a type of the output message (an accumulator). Precisely speaking, <code-haskell>Iteratee</code-haskell> stores not an automaton, but an automaton in some state, an automaton with distinguished state. As you see, if an <code-haskell>Iteratee</code-haskell> is in the <code-haskell>Next</code-haskell> state, then we have a function that takes an input message and returns a new <code-haskell>Iteratee</code-haskell>.
+
<hask>i</hask> is the type of the iteratee's input messages (or list elements) and <hask>o</hask> is a type of the output message (an accumulator). Precisely speaking, <hask>Iteratee</hask> stores not an automaton, but an automaton in some state, an automaton with distinguished state. As you see, if an <hask>Iteratee</hask> is in the <hask>Next</hask> state, then we have a function that takes an input message and returns a new <hask>Iteratee</hask>.
   
 
The distinct feature of iteratee is that it can say after which list element an iteratee finishes. An iteratee says this by sending "Done" to an enumerator. Then the enumerator can, for example, close a file or a socket (a stream) where a list of characters is read from. [[Lazy I/O]], which uses lazy lists, closes a stream only when the stream is exhausted.
 
The distinct feature of iteratee is that it can say after which list element an iteratee finishes. An iteratee says this by sending "Done" to an enumerator. Then the enumerator can, for example, close a file or a socket (a stream) where a list of characters is read from. [[Lazy I/O]], which uses lazy lists, closes a stream only when the stream is exhausted.
   
The drawback is that an enumerator can not tell an iteratee that an input is exhausted — an <code-haskell>Iteratee</code-haskell> consumes only infinite lists. You can remedy this by assuming
+
The drawback is that an enumerator can not tell an iteratee that an input is exhausted — an <hask>Iteratee</hask> consumes only infinite lists. You can remedy this by assuming
<pre-haskell>
+
<haskell>
 
i == Maybe i'
 
i == Maybe i'
</pre-haskell>
+
</haskell>
where <code-haskell>i'</code-haskell> is a type of list elements. <code-haskell>Nothing</code-haskell> given to iteratee signals that the list is exhausted.
+
where <hask>i'</hask> is a type of list elements. <hask>Nothing</hask> given to iteratee signals that the list is exhausted.
   
 
Here is a sample enumerator that takes input messages from a file:
 
Here is a sample enumerator that takes input messages from a file:
<pre-haskell>
+
<haskell>
 
enumerator :: FilePath -> Iteratee (Maybe Char) o -> IO o
 
enumerator :: FilePath -> Iteratee (Maybe Char) o -> IO o
 
enumerator file it = withFile file ReadMode
 
enumerator file it = withFile file ReadMode
Line 45: Line 45:
 
True -> rc (f Nothing)
 
True -> rc (f Nothing)
 
) it
 
) it
</pre-haskell>
+
</haskell>
   
 
== Functions ==
 
== Functions ==
   
You can compose iteratees sequentially in time. This is done by <code-haskell>(>>)</code-haskell>. <code-haskell>it0 >> it1</code-haskell> means that when <code-haskell>it0</code-haskell> finishes, <code-haskell>it1</code-haskell> starts. Generally speaking, <code-haskell>Iteratee i</code-haskell> is a <code-haskell>Monad</code-haskell>, and it works exactly like a [[monadic parser]].
+
You can compose iteratees sequentially in time. This is done by <hask>(>>)</hask>. <hask>it0 >> it1</hask> means that when <hask>it0</hask> finishes, <hask>it1</hask> starts. Generally speaking, <hask>Iteratee i</hask> is a <hask>Monad</hask>, and it works exactly like a [[monadic parser]].
<pre-haskell>
+
<haskell>
 
{- s = state -}
 
{- s = state -}
 
instance Functor (Iteratee input) where
 
instance Functor (Iteratee input) where
Line 62: Line 62:
 
Next g -> Next (rc . g)
 
Next g -> Next (rc . g)
 
) it0
 
) it0
</pre-haskell>
+
</haskell>
   
You can also compose iteratees sequentially in space. <code-haskell>it0</code-haskell>'s output messages become <code-haskell>it1</code-haskell>'s input messages, so <code-haskell>it0</code-haskell> and <code-haskell>it1</code-haskell> work in parallel. Their composition is denoted <code-haskell>it1 . it0</code-haskell>. If <code-haskell>it0</code-haskell> finishes, it is resurrected to its original state. If <code-haskell>it1</code-haskell> finishes, <code-haskell>it1 . it0</code-haskell> finishes — The main feature here is that <code-haskell>it0</code-haskell> is restarted, as this is used for repetitive parsing.
+
You can also compose iteratees sequentially in space. <hask>it0</hask>'s output messages become <hask>it1</hask>'s input messages, so <hask>it0</hask> and <hask>it1</hask> work in parallel. Their composition is denoted <hask>it1 . it0</hask>. If <hask>it0</hask> finishes, it is resurrected to its original state. If <hask>it1</hask> finishes, <hask>it1 . it0</hask> finishes — The main feature here is that <hask>it0</hask> is restarted, as this is used for repetitive parsing.
<pre-haskell>
+
<haskell>
 
arr0 f = Next $ \i -> Done (f i)
 
arr0 f = Next $ \i -> Done (f i)
 
instance Category Iteratee where
 
instance Category Iteratee where
Line 76: Line 76:
 
) it0
 
) it0
 
) it1
 
) it1
</pre-haskell>
+
</haskell>
   
 
== Generalization ==
 
== Generalization ==
   
You may note that <code-haskell>Iteratee</code-haskell> is a [[final coalgebra]]. Other kinds of automata can be described with other [[F-coalgebra]]s. In practice such automata can handle network protocols or interactive user input. See for example [http://www.cs.ru.nl/~bart/PAPERS/index.html papers] by Bart Jacobs for theoretical discussion.
+
You may note that <hask>Iteratee</hask> is a [[final coalgebra]]. Other kinds of automata can be described with other [[F-coalgebra]]s. In practice such automata can handle network protocols or interactive user input. See for example [http://www.cs.ru.nl/~bart/PAPERS/index.html papers] by Bart Jacobs for theoretical discussion.
   
 
== See also ==
 
== See also ==

Latest revision as of 02:34, 13 July 2019

An enumerator is something that knows how to generate a list and an iteratee is something that does one step in processing another piece of the big list. E.g. to sum up all elements of a list, we do

foldl (+) 0 xs

Then foldl is the enumerator and ((+),0) is the iteratee.

Clearly the function that sums the current element with the accumulator, (+), doesn't know or care from which collection the elements are coming from. The initial seed, 0, is again unaware of the collection. That achieves the separation of concerns: fold (aka, enumerator) has the intimate knowledge of the collection and how to get to the next element; iteratee knows what to do with the current element.

Definition

Do not rely on the foldl analogy too firmly, it is misleading. ((+),0) is an F-algebra and foldl (+) 0 is a catamorphism. But iteratee is different, it is an automaton. From this point of view, the enumerator sends elements of a list sequentially, from head to tail, as input messages to the iteratee. If the iteratee finishes, it outputs an accumulator. If the iteratee continues, it outputs nothing (i.e., ()).

So, a set of states of iteratee is divided into subsets "Done" and "Next". Done-state means that automaton finished consuming a list, i.e., the automaton is dead. Next-state means that you can give an input message and obtain the same automaton in a new state.

data Iteratee i o
  = Done o
  | Next (i -> Iteratee i o)

i is the type of the iteratee's input messages (or list elements) and o is a type of the output message (an accumulator). Precisely speaking, Iteratee stores not an automaton, but an automaton in some state, an automaton with distinguished state. As you see, if an Iteratee is in the Next state, then we have a function that takes an input message and returns a new Iteratee.

The distinct feature of iteratee is that it can say after which list element an iteratee finishes. An iteratee says this by sending "Done" to an enumerator. Then the enumerator can, for example, close a file or a socket (a stream) where a list of characters is read from. Lazy I/O, which uses lazy lists, closes a stream only when the stream is exhausted.

The drawback is that an enumerator can not tell an iteratee that an input is exhausted — an Iteratee consumes only infinite lists. You can remedy this by assuming

i == Maybe i'

where i' is a type of list elements. Nothing given to iteratee signals that the list is exhausted.

Here is a sample enumerator that takes input messages from a file:

enumerator :: FilePath -> Iteratee (Maybe Char) o -> IO o
enumerator file it = withFile file ReadMode
  $ \h -> fix (\rc it -> case it of
    Done o -> return o
    Next f -> do
      eof <- hIsEOF h
      case eof of
        False -> do
          c <- hGetChar h
          rc (f (Just c))
        True -> rc (f Nothing)
    ) it

Functions

You can compose iteratees sequentially in time. This is done by (>>). it0 >> it1 means that when it0 finishes, it1 starts. Generally speaking, Iteratee i is a Monad, and it works exactly like a monadic parser.

{- s = state -}
instance Functor (Iteratee input) where
  fmap f = fix $ \rc s -> case s of
    Done o -> Done (f o)
    Next g -> Next (rc . g)
instance Monad (Iteratee input) where
  return = Done
  it0 >>= it1 = fix (\rc s -> case s of
    Done o -> it1 o
    Next g -> Next (rc . g)
    ) it0

You can also compose iteratees sequentially in space. it0's output messages become it1's input messages, so it0 and it1 work in parallel. Their composition is denoted it1 . it0. If it0 finishes, it is resurrected to its original state. If it1 finishes, it1 . it0 finishes — The main feature here is that it0 is restarted, as this is used for repetitive parsing.

arr0 f = Next $ \i -> Done (f i)
instance Category Iteratee where
  id = arr0 id
  it1 . it0 = fix (\rc1 it1 -> case it1 of
    Done c -> Done c
    Next f1 -> fix (\rc0 it0 -> case it0 of
      Done b -> rc1 (f1 b)
      Next f0 -> Next (rc0 . f0)
      ) it0
    ) it1

Generalization

You may note that Iteratee is a final coalgebra. Other kinds of automata can be described with other F-coalgebras. In practice such automata can handle network protocols or interactive user input. See for example papers by Bart Jacobs for theoretical discussion.

See also