I'm not currently doing much web development in Haskell, but I've done a lot in Ruby over the last few years, including writing a couple of frameworks, so I have some general comments.
1. Send the original, unmodified, unparsed request! This minimizes the chance that bugs on the web server side will cause pain on the application server side, and it also gives you the best chance of recovering from those bugs. For example, one of the major problems with CGI and related protocols is that every web server seems to handle variables like REQUEST_PATH differently. Switching web servers, or even changing a web server's configuration, can therefore break your application's dispatch. In CGI there is no reliable way to recover the URL that was originally used to access the resource.
Feel free to supply a library for parsing cookie headers or whatever, but make it part of the Haskell side. That minimizes the work the web server implementor has to do, and it lets us fix the bugs in one place rather than several.
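Here is a minimal sketch of the kind of helper such a Haskell-side library might offer. It works directly on the raw request text the server forwarded. The names `requestHeaders` and `parseCookies` are hypothetical. The sketch assumes an HTTP/1.x request held as a `String`, with no header folding and no quoted cookie values. A real library would work on bytes and follow the cookie specification more closely.

```haskell
import Data.Char (isSpace, toLower)
import Data.List (dropWhileEnd)

-- Header (name, value) pairs from a raw HTTP/1.x request. The request
-- line is skipped, CRLF line endings are tolerated, and parsing stops at
-- the blank line before the body. Names are lower-cased.
requestHeaders :: String -> [(String, String)]
requestHeaders raw =
  [ (map toLower (trim name), trim (drop 1 rest))
  | line <- takeWhile (not . null) (drop 1 (map stripCR (lines raw)))
  , let (name, rest) = break (== ':') line
  , not (null rest)                   -- ignore lines without a colon
  ]
  where
    stripCR l
      | not (null l) && last l == '\r' = init l
      | otherwise                      = l

-- (name, value) pairs from every Cookie header, e.g. "a=1; b=2".
parseCookies :: String -> [(String, String)]
parseCookies raw =
  [ (trim k, trim (drop 1 v))
  | ("cookie", hdr) <- requestHeaders raw
  , pair <- splitOn ';' hdr
  , let (k, v) = break (== '=') pair
  , not (null (trim k))
  ]

-- Split a string on every occurrence of a separator character.
splitOn :: Char -> String -> [String]
splitOn c s = case break (== c) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn c rest

trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace
```

Because every server forwards the same raw bytes, a bug in a helper like this gets fixed once in the library, not once per web server.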
The web server may have further configuration information to send, such as the CGI "prefix path" if it has such a thing, or things like REQUEST_PATH if we want to generate them for backward compatibility. In that case it's reasonable to send an environment (presumably as a dictionary) along with each request. But this should be kept separate from the request itself.
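The following is a minimal sketch of that separation. It assumes a hypothetical `Request` record whose `reqEnv` field is a `Map` of strings and whose `reqRaw` field holds the request exactly as the client sent it. The environment key `prefix_path` and the helper `dispatchPath` are illustrative only. Because the raw request line survives, the application can always recover the original request-target. It can still use the server's prefix path for dispatch if it wants to.

```haskell
import Data.List (isPrefixOf)
import qualified Data.Map.Strict as Map

-- Hypothetical message shape: the server's environment travels beside
-- the request and is never merged into it.
data Request = Request
  { reqEnv :: Map.Map String String  -- server-supplied extras, e.g. a prefix path
  , reqRaw :: String                 -- the request exactly as the client sent it
  } deriving Show

-- The request-target from the untouched request line, i.e. the URL the
-- client actually used, whatever the server's configuration.
originalTarget :: Request -> Maybe String
originalTarget r =
  case words (takeWhile (`notElem` "\r\n") (reqRaw r)) of
    [_method, target, _version] -> Just target
    _                           -> Nothing

-- Hypothetical dispatch helper: drop the query string, then strip the
-- optional "prefix_path" supplied in the environment (naive prefix check).
dispatchPath :: Request -> Maybe String
dispatchPath r = do
  target <- originalTarget r
  let path   = takeWhile (/= '?') target
      prefix = Map.findWithDefault "" "prefix_path" (reqEnv r)
  pure (if prefix `isPrefixOf` path then drop (length prefix) path else path)
```

The prefix check is deliberately naive: a prefix of `/app` would also match `/application`. It only shows where that logic would live, on the application side rather than in each web server.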
2. Consider how the protocol will work over a network, for those that need that functionality. Issues include:
- How do you send back error messages to be logged by the web server?
- How do you set up and tear down connections? (One hopes not one per request!)
- Do you multiplex requests asynchronously over a single connection? If so, do you force every client to deal with multiplexed requests, or do you specify that multiplexing can be disabled?
Incidentally, if this turns out to be a decent protocol, I may well contribute a set of Ruby bindings. I'm soon going to have to rewrite the FastCGI library I use in Ruby from scratch, and even that won't eliminate all the problems I'm having.
Further to the above, here's a rough sketch of a protocol that I think would deal with many of the problems I've seen over the last few years.
A web server opens one or more TCP connections to one or more backends and is expected to round-robin requests across all of them. Ideally this should be configurable with a weight for each connection, but that's a web-server-specific issue. The initial message on a connection includes a protocol version, and the backend replies with an appropriate response accepting the connection.
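This sketch covers the two connection-level pieces of that paragraph, under assumptions of my own. The opening message is a hypothetical `Hello` carrying an integer version, and the backend answers `Accept` or `Reject`. The server's weighted round-robin is modeled as an infinite schedule of connections. The weighting is really a web-server-specific choice, so `weightedRoundRobin` is only one possibility.

```haskell
-- Hypothetical opening exchange on a new connection.
newtype Hello = Hello Int                  -- protocol version the server speaks
  deriving (Eq, Show)

data HelloReply = Accept Int | Reject String
  deriving (Eq, Show)

-- Versions this (hypothetical) backend understands.
supportedVersions :: [Int]
supportedVersions = [1]

answerHello :: Hello -> HelloReply
answerHello (Hello v)
  | v `elem` supportedVersions = Accept v
  | otherwise = Reject ("unsupported protocol version " ++ show v)

-- Weighted round-robin as an infinite schedule: each connection appears
-- `weight` times per cycle, and the server takes the next element per request.
weightedRoundRobin :: [(conn, Int)] -> [conn]
weightedRoundRobin conns =
  case concatMap (\(c, w) -> replicate w c) conns of
    []       -> []                 -- no usable connections
    schedule -> cycle schedule
```

This simple schedule sends runs of requests to the same connection (`a, a, b, a, a, b, ...`). A server could interleave them more smoothly without any change to the protocol.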
The protocol should probably be packetized, so that the web server can send the body chunk by chunk, with a length prefix on each chunk, and then send an end indication when there's no further data from the client. This is needed to deal with, e.g., a TCP disconnect from the client before it has sent the complete request (i.e., you can't trust Content-Length).
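Here is one possible framing, sketched with the `bytestring` package. Each frame starts with a tag byte: `0` marks end-of-body, and `1` marks a data chunk followed by a 4-byte big-endian length and the payload. The tag values, the 4-byte length, and the function names are assumptions for illustration. The case that matters is the truncated stream: if the bytes stop before an `End` frame arrives, the backend knows the body is incomplete without consulting Content-Length.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC   -- handy for building example payloads

-- Hypothetical frame format:
--   End        = tag byte 0
--   Chunk body = tag byte 1, 4-byte big-endian length, then the bytes
data Frame = Chunk BS.ByteString | End
  deriving (Eq, Show)

encodeFrame :: Frame -> BS.ByteString
encodeFrame End        = BS.singleton 0
encodeFrame (Chunk bs) = BS.concat [BS.singleton 1, be32 (BS.length bs), bs]

-- Int to 4 big-endian bytes (assumes 0 <= n < 2^32).
be32 :: Int -> BS.ByteString
be32 n = BS.pack [fromIntegral ((n `shiftR` s) .&. 0xff) | s <- [24, 16, 8, 0]]

-- One frame off the front of the input. Nothing means the input is
-- incomplete (or has an unknown tag).
decodeFrame :: BS.ByteString -> Maybe (Frame, BS.ByteString)
decodeFrame input = case BS.uncons input of
  Just (0, rest) -> Just (End, rest)
  Just (1, rest)
    | BS.length rest >= 4 ->
        let (lenBytes, afterLen) = BS.splitAt 4 rest
            len = BS.foldl' (\acc b -> (acc `shiftL` 8) .|. fromIntegral b) 0 lenBytes
        in if BS.length afterLen >= len
             then let (payload, rest') = BS.splitAt len afterLen
                  in Just (Chunk payload, rest')
             else Nothing
  _ -> Nothing

-- Reassemble a request body. Left means no End frame arrived, e.g. the
-- client disconnected mid-upload, whatever Content-Length claimed.
readBody :: BS.ByteString -> Either String BS.ByteString
readBody = go []
  where
    go acc bytes = case decodeFrame bytes of
      Just (End, _)        -> Right (BS.concat (reverse acc))
      Just (Chunk c, rest) -> go (c : acc) rest
      Nothing              -> Left "stream ended without an End frame"
```

Because the end marker is a distinct tag, a zero-length chunk is still legal and is never confused with the end of the body.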
When the web server sends a request to a backend, it sends a dictionary containing whatever environment information the server wants to send, followed by the request itself. The request should preferably be streamed as it's received from the client, so that the backend can start processing before the body, if any, has been entirely received. This helps with processing uploaded files, especially with displaying upload-progress indicators via Ajax.
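This sketch illustrates why streaming helps with upload status. It takes the backend's view of one request as a lazy list of hypothetical messages. An `Environment` dictionary comes first, then the raw request bytes in the pieces the server received them, then `EndOfRequest`. A running byte count is produced after each piece, so a progress indicator could be updated before the upload finishes. The message names are assumptions and are not part of any existing protocol.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC   -- handy for building example messages
import qualified Data.Map.Strict as Map

-- Hypothetical messages for one request, in order: one Environment, then
-- the raw request in whatever pieces the server received, then EndOfRequest.
data ToBackend
  = Environment (Map.Map String String)
  | RequestBytes BS.ByteString
  | EndOfRequest
  deriving Show

-- Running total of request bytes seen after each piece. The input is
-- consumed lazily, so each count is available as soon as its piece is,
-- which is what an upload-progress display needs.
bytesReceived :: [ToBackend] -> [Int]
bytesReceived msgs =
  scanl1 (+) [BS.length b | RequestBytes b <- takeWhile notEnd msgs]
  where
    notEnd EndOfRequest = False
    notEnd _            = True
```

For example, with `env` bound to any `Environment` message, `take 2 (bytesReceived (env : RequestBytes (BC.pack "abc") : RequestBytes (BC.pack "de") : undefined))` evaluates to `[3,5]`. It never touches the `undefined` tail, which stands in for data that has not arrived yet.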
A response goes back the same way, though with optional interspersed messages to be written to the error log. I don't see any particular need for the web server to do much validation of the response; make the application responsible for doing HTTP correctly. At worst, the web server just ends up cutting off a persistent connection.
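The following sketches the web server's side of this. It assumes hypothetical message constructors `ResponseBytes`, `LogLine` and `EndOfResponse`. The server does no HTTP validation: it concatenates the response bytes destined for the client and collects the log lines separately. It is written as a pure function for clarity; a real server would forward each piece as soon as it arrived.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC   -- handy for building example messages

-- Hypothetical backend-to-server messages for one response.
data FromBackend
  = ResponseBytes BS.ByteString   -- raw HTTP response bytes, not validated
  | LogLine String                -- a line for the web server's error log
  | EndOfResponse
  deriving (Eq, Show)

-- Split a response stream into the bytes owed to the client and the
-- lines owed to the error log. Anything after EndOfResponse is ignored.
route :: [FromBackend] -> (BS.ByteString, [String])
route msgs =
  ( BS.concat [b | ResponseBytes b <- body]
  , [l | LogLine l <- body]
  )
  where
    body = takeWhile (/= EndOfResponse) msgs
```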