From HaskellWiki
Revision as of 14:58, 11 April 2008 by JohanTibell (talk | contribs)


Note: This page is currently being written and is in an intermediate state.


This document specifies a proposed standard interface between web servers and Haskell web applications or frameworks, to promote web application portability across a variety of web servers.

Specification Details

A Data Type for Representing Bytes

To represent an HTTP message we need a type for bytes. Although some parts of a message can be represented by other types, such as integers or strings, other parts are properly viewed as a sequence of bytes (e.g. the message body). Haskell has three different types that could be, and are, used for this purpose:

  • String - Used to represent both bytes (e.g. in the Socket API) and text. It has an inefficient in-memory representation (a linked list of characters), which gives it a large memory footprint and poor cache behavior. Using it to store bytes is also considered wrong, since its purpose is to represent Unicode code points.
  • [Word8] - Has the same properties as String, except that it is explicitly intended to contain only binary data.
  • ByteString - A fast and memory-efficient representation. It is used in this proposal because it can easily be converted to the two types above, while converting in the opposite direction is not possible without a performance penalty.

In this specification we use the strict, Word8 flavor:

import Data.ByteString (ByteString)
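As a small illustration of the conversions mentioned above, the following sketch uses pack and unpack from Data.ByteString and pack from Data.ByteString.Char8. The helper names (fromBytes, toBytes, fromAscii, demo) are made up for this example and are not part of the proposal.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.ByteString (ByteString)
import Data.Word (Word8)

-- Hypothetical helper: build a strict ByteString from a list of raw bytes.
fromBytes :: [Word8] -> ByteString
fromBytes = B.pack

-- Hypothetical helper: view a ByteString as a list of bytes again.
toBytes :: ByteString -> [Word8]
toBytes = B.unpack

-- Hypothetical helper: the Char8 pack truncates every Char to 8 bits,
-- so it is only safe for ASCII / Latin-1 text, not arbitrary Unicode.
fromAscii :: String -> ByteString
fromAscii = BC.pack

-- Run in GHCi, or set main = demo.
demo :: IO ()
demo = do
  let method = fromAscii "GET"
  print (toBytes method)    -- [71,69,84]
  print (B.length method)   -- 3
```

Each conversion copies the data, so in this sketch the list and String forms are best kept at the edges of a program, with ByteString used in between.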

The Enumerator Type

The web server needs to provide the web application with the data in the request body, and the web application needs to provide a response body (e.g. an HTML page) to the web server. They could do so using bytestrings. However, if the amount of data to be sent is large (e.g. a big file), all of it would have to be kept in memory, leading to unnecessarily high memory usage. A way to stream data between the web server and the web application is needed.

Streams can be represented in Haskell using lists, as they are lazy, or using a more optimized data type such as lazy bytestrings. However, either option is problematic in a web server serving hundreds or even thousands of requests per second, for the following reason. When the web application opens a file (or some other resource) to be sent to the client through the web server, it needs to close this resource when it is no longer needed. Stream I/O using lists or lazy bytestrings relies on unsafeInterleaveIO together with a finalizer that is run by the garbage collector to free the resource (i.e. the file) when it is no longer needed. But since the garbage collector runs at some unpredictable time in the future, the server might run out of resources before the finalizer runs, causing it to crash or become unresponsive.

To avoid this problem resources need to be freed as soon as they are no longer needed. There are (at least) two different ways to achieve this. The first is to use a stream type that provides an explicit close of the underlying resource:

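-- The class is parameterised over the stream type s, so that each
-- operation knows which stream it acts on.
class InputStream s where
    read  :: s -> IO Word8
    readN :: s -> Int -> IO ByteString  -- ^ efficient block read
    close :: s -> IO ()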
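As a rough illustration of the explicit-close style, the sketch below uses a standard Handle as the stream: B.hGet plays the role of a block read and hClose the role of close. The function countBytes, the block size of 4096 and the file name in demo are assumptions made for this example only. Wrapping the work in bracket ensures that the handle is closed as soon as the loop finishes or throws, rather than whenever the garbage collector happens to run.

```haskell
import Control.Exception (bracket)
import qualified Data.ByteString as B
import System.IO (Handle, IOMode (ReadMode), hClose, openBinaryFile)

-- Hypothetical example: count the bytes in a file, reading it in blocks.
-- bracket guarantees that hClose runs when the loop returns or throws.
countBytes :: FilePath -> IO Int
countBytes path =
  bracket (openBinaryFile path ReadMode) hClose (go 0)
  where
    go :: Int -> Handle -> IO Int
    go n h = do
      chunk <- B.hGet h 4096          -- block read; empty at end of file
      if B.null chunk
        then return n
        else let n' = n + B.length chunk
             in n' `seq` go n' h      -- force the running total

-- Run in GHCi, or set main = demo. Writes a small scratch file first.
demo :: IO ()
demo = do
  B.writeFile "wai-demo.bin" (B.replicate 10000 0)
  total <- countBytes "wai-demo.bin"
  print total   -- 10000
```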

This is the solution used in most imperative languages like Python. The other option is to use inversion of control and have the server drive the iteration through the resource and close it once it is no longer needed. Oleg showed how this can be done using essentially a left fold. He called his solution a left fold enumerator.

type Enumerator = forall a. (a -> ByteString -> IO (Either a a)) -> a -> IO a
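To make the type above more concrete, here is a minimal sketch of two enumerators and two consumers. It assumes that the iteratee returns Left to stop early and Right to continue, which is the usual convention for left fold enumerators but is not spelled out in this section. The names enumChunks, enumFile, totalLength, firstChunk and demo are hypothetical, and the RankNTypes extension is needed for the forall in the type synonym.

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Exception (bracket)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Data.ByteString (ByteString)
import System.IO (IOMode (ReadMode), hClose, openBinaryFile)

type Enumerator = forall a. (a -> ByteString -> IO (Either a a)) -> a -> IO a

-- Hypothetical enumerator over chunks that are already in memory.
-- Assumption: Left means "stop now", Right means "feed me the next chunk".
enumChunks :: [ByteString] -> Enumerator
enumChunks []       _ acc = return acc
enumChunks (c : cs) f acc = do
  r <- f acc c
  case r of
    Left done  -> return done
    Right next -> enumChunks cs f next

-- Hypothetical enumerator over a file. The enumerator owns the handle,
-- so it is closed as soon as the fold finishes, stops early, or throws.
enumFile :: FilePath -> Enumerator
enumFile path f z =
  bracket (openBinaryFile path ReadMode) hClose (loop z)
  where
    loop acc h = do
      chunk <- B.hGet h 4096
      if B.null chunk
        then return acc
        else do
          r <- f acc chunk
          case r of
            Left done  -> return done
            Right next -> loop next h

-- Consumer that visits every chunk and sums the lengths.
totalLength :: Enumerator -> IO Int
totalLength enum = enum step 0
  where
    step n chunk = return (Right (n + B.length chunk))

-- Consumer that stops after the first chunk by returning Left.
firstChunk :: Enumerator -> IO (Maybe ByteString)
firstChunk enum = enum (\_ chunk -> return (Left (Just chunk))) Nothing

-- Run in GHCi, or set main = demo.
demo :: IO ()
demo = do
  let chunks = [BC.pack "hello, ", BC.pack "world"]
  n <- totalLength (enumChunks chunks)
  print n   -- 12
  c <- firstChunk (enumChunks chunks)
  print c   -- Just "hello, "
```

The consumer never sees a handle or a lazy stream, only one strict chunk at a time, so the producer decides when the underlying resource is released.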

The Environment Type

The Environment type provides information about the request.

data Environment = Environment
    { requestMethod   :: Method
    , scriptName      :: ByteString
    , pathInfo        :: ByteString
    , queryString     :: Maybe ByteString
    , requestProtocol :: (Int, Int)
    , headers         :: Headers
    , input           :: Enumerator
    , errors          :: String -> IO ()