== Abstract ==

Warning: The Haddock docs are not ready yet. I'm trying to get a working Haddock 2.0 running (on my Windows machine).

DataDriven is a library for functional events and time-varying values ("sources"). The ideas and interface come mainly from functional reactive programming (FRP). Most FRP implementations I'm aware of have a demand-driven implementation, while the implementation of DataDriven is data-driven (surprise). This library is a resurrection of some ideas from an old, incomplete Fran reimplementation that also became the basis of Meurig Sage's FranTk. This time around, I've been particularly interested in using standard classes as much as possible, most centrally Applicative and Monoid.

Besides this wiki page, here are more ways to find out about DataDriven:

Please leave comments at the Talk page.

== Events ==

=== Background ===

The heart of the library is a notion of functional, composable events, with a data-driven implementation. Most of the ideas and vocabulary are borrowed from Fran, from the time when Fran's events came to mean multiple occurrences (see ''Declarative Event-Oriented Programming'', rather than the initial ICFP '97 publication). As in Fran, you can think of an event as a stream of "occurrences", each of which has a time and a value. The implementation, however, is radically different from Fran's, being data-driven rather than demand-driven. And in some cases, the functions are not pure. There are also several event-related functions for creating time-varying values.

=== A first look at the interface ===

Some of the useful event operations come through standard classes.

* Functor: fmap f e is the event that occurs whenever e occurs, but whose occurrence values come from applying f to the values from e. (Fran's (==>).)
* Monoid: mempty is the event that never occurs, and e `mappend` e' is the event that combines occurrences from e and e'. (Fran's neverE and (.|.).)
* Monad: return a is an event with a single occurrence. This one doesn't quite fit the original semantics, as the occurrence is delivered immediately on "listening" to an event (discussed later). In e >>= f, each occurrence of e leads, through f, to a new event. Similarly for join ee, which is somehow simpler for me to think about. The occurrences of e >>= f (or join ee) correspond to the union of the occurrences of all such events. For example, suppose we're playing Asteroids and tracking collisions. Each collision can break an asteroid into more of them, each of which has to be tracked for more collisions. Another example: a chat room has an "enter" event, whose occurrences contain new events like "speak". (A sketch of the chat-room example appears just after this list.)
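
To make the Monad instance more concrete, here is a small sketch of the chat-room example. The Person record, its field names, and allSpeech are made up for illustration; they are not part of DataDriven.

<haskell>
-- Hypothetical types, for illustration only: each person who enters the
-- room comes with their own "speak" event.
data Person = Person { name :: String, speakE :: Event String }

-- All speech from everyone who ever enters the room: each "enter"
-- occurrence contributes that person's event of utterances, and (>>=)
-- merges the occurrences of all of those events.
allSpeech :: Event Person -> Event String
allSpeech enterE = enterE >>= speakE
</haskell>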

As a simple example, the following function transforms and combines two events:

<haskell>
show2 :: (Show a, Show b) => Event a -> Event b -> Event String
show2 ea eb = showE ea `mappend` showE eb
 where
   showE e = fmap show e
</haskell>

=== Events as continuations ===

The Event type is not actually a new type, but merely a specialization of the familiar type of continuation-based computations, Cont:

<haskell>
newtype Cont o a = Cont { runCont :: (a -> o) -> o }
</haskell>

The Functor and Monad instances come from Cont. The Monoid instance for Cont is missing (as of 2007-09-08), so it is defined in this module (and thus is an "orphan") simply by deriving. The more specialized event type is simply

<haskell>
type Event = Cont (IO ())
</haskell>

Why does it make sense to think of continuation-based computations as events? Because an event is something that one can subscribe to. Subscription provides a "listener" (a continuation) to be invoked on every occurrence of the event. If the occurrence value has type a, and the result of the listener and of registration has type o, then subscribing has type (a -> o) -> o, which is the type wrapped by Cont.
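
As a small illustration, with the Event specialization above, subscribing is literally just runCont. The name "subscribe" here is mine, not the library's.

<haskell>
-- Illustration only: "subscribe" is not a DataDriven function, just
-- runCont read at the Event = Cont (IO ()) specialization.
subscribe :: Event a -> (a -> IO ()) -> IO ()
subscribe = runCont
</haskell>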

The Monoid, Functor, and Monad operations are simple. Given a listener l :: a -> o,

* Subscribing l to mempty has no effect, since mempty is guaranteed never to occur.
* Subscribing l to ea `mappend` eb subscribes l to each of ea and eb.
* Subscribing l to fmap f e subscribes l . f to e.
* Subscribing l to return a immediately invokes l a.
* Subscribing l to join e subscribes to e a listener that subscribes l to every event generated by e. (Similarly for e >>= f == join (fmap f e).)
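
The Monoid behaviour in particular can be written out directly against the Cont newtype above. The names neverE and mergeE below are illustrative (they echo Fran's vocabulary); this is a sketch of the behaviour just described, not the library's actual instance.

<haskell>
-- Sketch of the Monoid behaviour over the Cont newtype shown earlier.
-- neverE never invokes a listener; mergeE subscribes a listener to both
-- of its arguments and combines the registration results.
neverE :: Monoid o => Cont o a
neverE = Cont (\_ -> mempty)

mergeE :: Monoid o => Cont o a -> Cont o a -> Cont o a
mergeE (Cont f) (Cont g) = Cont (\l -> f l `mappend` g l)
</haskell>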

The functions in the Event module operate on this general notion of events (Cont) or something more specialized. I expect the most common use to be the IO-based Event specialization, and the types are often much easier to read at that specialization. General functions are given general signatures, with the Event specializations as comments.

== Sources ==

Sources are time-varying values, akin to Fran's "behaviors". They are built up mainly from constant values and application (via the Applicative interface), as well as reaction to events.

=== Composing Sources ===

Like events, sources have a more general, and surprisingly simple, form:

<haskell>
type SourceG change m = ((,) change) `O` m
</haskell>

The change type parameter provides a description of everything that can affect a source (cause it to change). The m parameter is a way to sample the value when changed. Here g `O` h means the composition of two type constructors, and is defined in TypeCompose. Without the fancy type constructors,

<haskell>
type SourceG' change m a = (change, m a)
</haskell>

One of the delightful properties of functors and of applicative functors is that they compose. That is, two functors compose to form a functor, and two AFs compose to form an AF. For any monoid o, ((,) o) is an AF (corresponding to the writer monad). So, when change is a monoid and m is an AF, SourceG change m is an AF.
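
Spelled out for the pair case, that writer-style applicative behaves as in the following sketch. The names pureW and apW are just illustrations of the behaviour described above.

<haskell>
-- Sketch of the writer-style applicative for pairs, for illustration:
-- "pure" contributes no change information, and application combines the
-- change information from both sides with mappend.
pureW :: Monoid o => a -> (o, a)
pureW a = (mempty, a)

apW :: Monoid o => (o, a -> b) -> (o, a) -> (o, b)
apW (o1, f) (o2, a) = (o1 `mappend` o2, f a)
</haskell>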

There are many possible monoid choices for change. One especially useful one is a continuation/event:

<haskell>
type SourceC o m = SourceG (Cont (m o) ()) m
</haskell>

Still more specifically

<haskell>
type Source = SourceC () IO
</haskell>

A source, then, is simply a change event together with an IO action for sampling. Given an AF application f <*> a for AFs f and a, the change event combines (mappend) the change events of f and a, and sampling just applies a sampling of f to a sampling of a.
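
In terms of the unwrapped SourceG' form above, the combination just described looks roughly like this. It is a sketch under that assumption; apSource is not a DataDriven function.

<haskell>
-- Sketch over the unwrapped form SourceG' change m a = (change, m a):
-- application merges the two change descriptions and applies one sampler
-- to the other.
apSource :: (Monoid change, Applicative m)
         => (change, m (a -> b)) -> (change, m a) -> (change, m b)
apSource (cf, sf) (ca, sa) = (cf `mappend` ca, sf <*> sa)
</haskell>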

=== Sources and events ===

As an example of event-based sources, the following function makes a source that has an initial value and changes at every occurrence of an event. The resulting source remembers the event's most recent occurrence value.

<haskell>
mkStepper :: a -> Event a -> IO (Source a)
</haskell>

The result of mkStepper a e is in IO because the source starts reacting to occurrences of e only after the action is executed. The semantic difference is clearer with the following function, which accumulates event occurrence values:

<haskell>
mkAccumS :: a -> Event (a -> a) -> IO (Source a)
</haskell>
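
As a hedged usage sketch, a click counter might be built as follows. The clickE argument and the mkCounter name are hypothetical, not part of DataDriven.

<haskell>
-- Illustration only: count the occurrences of an event. Each occurrence
-- of clickE contributes the function (+ 1), and mkAccumS folds those
-- functions into a running Int source starting at 0.
mkCounter :: Event () -> IO (Source Int)
mkCounter clickE = mkAccumS 0 (fmap (const (+ 1)) clickE)
</haskell>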

Sources are also used to make events. For instance, the snapshot function samples a source whenever an event occurs, and pairs the occurrence and source values.

<haskell>
snapshot :: Event a -> Source b -> Event (a,b)
</haskell>
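
Continuing the hypothetical click counter above, snapshot can label each click with the count at the moment it happens.

<haskell>
-- Illustration only, reusing the hypothetical mkCounter from above:
-- each click occurrence is paired with the counter's current value.
labelledClicks :: Event () -> IO (Event ((), Int))
labelledClicks clickE =
  do countS <- mkCounter clickE
     return (clickE `snapshot` countS)
</haskell>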

== Ephemeral values ==

=== GC favors demand-driven computation ===

The purpose of garbage collection is to keep services alive as long as they are useful to clients and then free up the services' computational resources (effort and memory). Conventional garbage collection works very well for demand-driven (pull-based) computation, but not for data-driven (push-based) computation.

Consider a piece of information supplied by a service and used by a client. In a demand-driven scenario, the client has a pointer to the service and uses that pointer to get more of the information. The client keeps the service alive. When the client gets GC'd, its pointer to the service goes away. If there are no more pointers to the service, then it will also get GC'd. Both the computational effort and the memory are freed up for other uses. GC did its job.

The situation is reversed for data-driven computation. Here, the service pushes information to the client, so the service has a pointer to the client. This pointer means that the service keeps the client alive and keeps computing even when the client is no longer of any use. GC fails to satisfy its purpose.

=== Caching and weak pointers ===

Various forms of ''caching'' have this same problem. Suppose we use a hash table to memoize an expensive function. Even though the hash table is in service to the function's arguments, the table's key/value entries keep the key values (function arguments) from ever getting reclaimed. Ideally, the situation would be reversed: the key would keep its table entry alive, and when the key was GC'd, the entry would shortly follow. Unfortunately, the direction of pointers, from entry to key, means that the entries keep the key values alive, wasting space and slowing down the search for useful entries.

The classic solution to this problem is to use "weak" pointers, i.e., pointers that the GC disregards in its decision about whether to retain a previously allocated object. For more discussion of these issues, see the [http://www.haskell.org/ghc/docs/6.8.1/html/libraries/base-3.0.0.0/System-Mem-Weak.html <hask>System.Mem.Weak</hask> documentation] and "[http://citeseer.ist.psu.edu/peytonjones99stretching.html Stretching the storage manager]". (These ideas were a motivating application for the "Stretching" paper, but the description got cut for lack of space.)
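
For reference, the core of the System.Mem.Weak interface looks like this in use. The weakly helper below is only an illustration of that API; it is not related to DataDriven's internals.

<haskell>
import System.Mem.Weak (mkWeakPtr, deRefWeak)

-- Illustration of the weak-pointer API: hold a value only weakly, and get
-- back an action that yields Nothing once the GC has reclaimed the value.
weakly :: a -> IO (IO (Maybe a))
weakly x =
  do w <- mkWeakPtr x Nothing   -- no finalizer
     return (deRefWeak w)
</haskell>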

=== Ephemeral listeners ===

What does all this have to do with events and sources? Recall that an event is simply a means of registering a "listener" (continuation) to be invoked on every occurrence. The event must have some kind of reference to the listener in order to invoke it. It must not keep the listener alive, however, since the event is the service and the listener is the client. The solution is for events to hold their clients weakly, i.e., point to them via weak references. Once a listener's strong pointers all disappear, the GC nulls out ("tombstones") the event's weak pointer. The next time the event occurs, it finds that it no longer has a live pointer, so it stops notifying the listener.

One of the (currently four) functions in Data.Ephemeral converts a value into an ephemeral one:

<haskell>
ephemeral :: (WeakM m, Monoid o) => o -> m o
</haskell>

WeakM refers to monads having weak pointers, currently just IO. The result of ephemeral o is an "ephemeral" monadic version mo'. Initially, mo' just returns o. Once o is GC'd, mo' instead returns mempty. The ephemeral function is a special case of the more general ephemeralWith, which drops the Monoid constraint and takes an explicit fall-back value.

The implementation of DataDriven takes care of ephemerality automatically, so client code doesn't have to worry about it. The only sign of this issue is the WeakM monad constraint in the most general forms of Event and Source functions. In fact, in the implementation of DataDriven, only one primitive worries about ephemerality, and all of the others inherit the benefits.