NIO

From HaskellWiki

Latest revision as of 21:13, 10 August 2008

New I/O

This is a new I/O library for Haskell, intended to provide a high-performance API that makes good use of advanced operating system facilities for I/O.

Rationale and Goals

Haskell 98 specifies a number of I/O actions, all of which accept and return Strings. However, a String is a linked list of characters, so it has poor cache locality and takes up far more memory per character than a compact representation such as ByteString. String is also conceptually the wrong type for many operations. For example, sockets send and receive bytes while file I/O often deals in terms of characters, yet both use String; sockets should instead use a data type that represents binary data, such as ByteString.

Furthermore, the existing I/O actions do not make use of efficient operating system APIs for asynchronous I/O such as epoll.

Background Study

To get a good idea of the possible trade-offs in designing an I/O library, here is an overview of what I/O libraries look like in other programming languages.

Java

While Java's first I/O library was built on streams, the new I/O library, dubbed NIO, uses a similar concept called channels. The two basic channel interfaces, ReadableByteChannel and WritableByteChannel, are very narrow: each provides only a single read or a single write method, and both operate on ByteBuffers. A ByteBuffer is a mutable buffer that keeps track of the next position available for reading or writing. Since a buffer can be allocated in a memory region used by the operating system for its native I/O operations (a so-called direct buffer), additional copying can be avoided and the CPU might not have to be involved in the data transfer at all.
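The channel-and-buffer idea can also be sketched in Haskell. The following is a minimal illustration only. Buffer, newBuffer, readInto, clear, contents and demo are hypothetical names, not part of Java or of any existing Haskell library. The sketch assumes that System.IO's hGetBuf is the primitive that copies bytes from a Handle directly into raw memory, without building an intermediate String.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Ptr (plusPtr)
import Foreign.Storable (peekByteOff)
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO (Handle, SeekMode (..), hClose, hGetBuf, hPutStr, hSeek,
                  openBinaryTempFile)

-- Hypothetical buffer loosely modelled on Java's ByteBuffer: a fixed block
-- of raw memory plus a position marking where the next read stores data.
data Buffer = Buffer
  { bufMem      :: ForeignPtr Word8
  , bufCapacity :: Int
  , bufPosition :: IORef Int
  }

newBuffer :: Int -> IO Buffer
newBuffer cap = Buffer <$> mallocForeignPtrBytes cap <*> pure cap <*> newIORef 0

-- Like ReadableByteChannel.read: fill the space between the position and the
-- capacity straight from the handle, then advance the position.
-- Returns the number of bytes read; 0 means end of file or a full buffer.
readInto :: Handle -> Buffer -> IO Int
readInto h buf = do
  pos <- readIORef (bufPosition buf)
  n <- withForeignPtr (bufMem buf) $ \p ->
         hGetBuf h (p `plusPtr` pos) (bufCapacity buf - pos)
  modifyIORef' (bufPosition buf) (+ n)
  return n

-- Like ByteBuffer.clear: make the whole buffer available for writing again.
clear :: Buffer -> IO ()
clear buf = writeIORef (bufPosition buf) 0

-- The bytes stored between index 0 and the current position.
contents :: Buffer -> IO [Word8]
contents buf = do
  pos <- readIORef (bufPosition buf)
  withForeignPtr (bufMem buf) $ \p -> mapM (peekByteOff p) [0 .. pos - 1]

-- Read a small temporary file through a 4-byte buffer, one chunk at a time.
demo :: IO [String]
demo = do
  tmp <- getTemporaryDirectory
  (path, h) <- openBinaryTempFile tmp "nio-demo.bin"
  hPutStr h "hello, nio"
  hSeek h AbsoluteSeek 0
  buf <- newBuffer 4
  let loop acc = do
        clear buf
        n <- readInto h buf
        if n == 0
          then return (reverse acc)
          else do
            bytes <- contents buf
            loop (map (toEnum . fromIntegral) bytes : acc)
  chunks <- loop []
  hClose h
  removeFile path
  return chunks

main :: IO ()
main = demo >>= print
```

Java's read returns -1 at end of stream. By contrast, readInto returns 0 both at end of file and when the buffer is already full, so a real design would probably want to distinguish the two cases.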

Available OS APIs for asynchronous I/O

epoll

Since version 2.5.44, Linux has provided epoll, a more efficient alternative to the older poll API. The epoll(7) man page describes it as follows:

"An epoll set is connected to a file descriptor created by epoll_create(2). Interest for certain file descriptors is then registered via epoll_ctl(2). Finally, the actual wait is started by epoll_wait(2)."

The API provides the following functions:

#include <sys/epoll.h>
 
int epoll_create(int size);
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
int epoll_wait(int epfd, struct epoll_event *events,
               int maxevents, int timeout);
int epoll_pwait(int epfd, struct epoll_event *events,
                int maxevents, int timeout,
                const sigset_t *sigmask);
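The call sequence from the man page (create an epoll set, register interest, then wait) can be exercised directly from Haskell through the FFI. This is a minimal, Linux-only sketch that watches the read end of a pipe. Note the following assumptions:

- demo is a hypothetical name.
- Error checking is omitted.
- Only the 32-bit events field at offset 0 of struct epoll_event is touched. That field is laid out the same way on every Linux ABI, while the size and padding of the rest of the struct vary between architectures.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
-- Linux only: calls the epoll functions listed above through the FFI.
import Data.Word (Word32, Word8)
import Foreign.C.Types (CInt (..), CSize (..))
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Marshal.Array (allocaArray, peekArray)
import Foreign.Marshal.Utils (fillBytes, with)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekByteOff, pokeByteOff)
import System.Posix.Types (CSsize (..))

foreign import ccall unsafe "sys/epoll.h epoll_create"
  c_epoll_create :: CInt -> IO CInt
foreign import ccall unsafe "sys/epoll.h epoll_ctl"
  c_epoll_ctl :: CInt -> CInt -> CInt -> Ptr () -> IO CInt
foreign import ccall safe "sys/epoll.h epoll_wait"
  c_epoll_wait :: CInt -> Ptr () -> CInt -> CInt -> IO CInt
foreign import ccall unsafe "unistd.h pipe"
  c_pipe :: Ptr CInt -> IO CInt
foreign import ccall unsafe "unistd.h write"
  c_write :: CInt -> Ptr Word8 -> CSize -> IO CSsize
foreign import ccall unsafe "unistd.h close"
  c_close :: CInt -> IO CInt

-- Constants from <sys/epoll.h>.
epollCtlAdd :: CInt
epollCtlAdd = 1

epollIn :: Word32
epollIn = 0x001

-- Enough room for one struct epoll_event on any Linux architecture
-- (12 bytes on x86-64, 16 on some others).
eventSize :: Int
eventSize = 16

-- Returns the number of ready descriptors and the event mask of the first.
demo :: IO (CInt, Word32)
demo = allocaArray 2 $ \fds -> do
  _ <- c_pipe fds
  [readEnd, writeEnd] <- peekArray 2 fds
  epfd <- c_epoll_create 1                  -- size is only a hint, must be > 0
  allocaBytes eventSize $ \ev -> do
    fillBytes ev 0 eventSize
    pokeByteOff ev 0 epollIn                -- events = EPOLLIN
    _ <- c_epoll_ctl epfd epollCtlAdd readEnd ev
    _ <- with (0x41 :: Word8) $ \b -> c_write writeEnd b 1
    allocaBytes eventSize $ \out -> do
      ready <- c_epoll_wait epfd out 1 0    -- timeout 0: return immediately
      mask <- peekByteOff out 0
      mapM_ c_close [readEnd, writeEnd, epfd]
      return (ready, mask)

main :: IO ()
main = demo >>= print
```

A byte is written to the pipe before epoll_wait is called with a timeout of 0. Running the sketch should therefore report one ready descriptor with the EPOLLIN bit (0x001) set. An event-driven I/O manager would instead loop on epoll_wait and dispatch on whichever descriptors are ready.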

Raw I/O

The new I/O library resides in the New I/O (NIO) module.

module System.Nio

All I/O actions deal in terms of ByteStrings.

import Data.ByteString (ByteString)
read :: Handle -> Int -> IO ByteString
write :: Handle -> ByteString -> IO Int
tell :: Handle -> IO Integer
seek :: Handle -> SeekMode -> Integer -> IO ()
close :: Handle -> IO ()
truncate :: Handle -> Integer -> IO ()  -- should throw some kind of exception
isReadable :: Handle -> IO Bool
isWritable :: Handle -> IO Bool
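The page does not say how these actions are to be implemented. As a minimal sketch, assume that Handle and SeekMode are the existing types from System.IO. Under that assumption, each action can be a thin wrapper over System.IO and Data.ByteString. The demo function and the temporary file name are hypothetical. Prelude's read and truncate are hidden because the proposed names clash with them.

```haskell
import Prelude hiding (read, truncate)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO (Handle, SeekMode (..), hClose, hFileSize, hIsReadable,
                  hIsWritable, hSeek, hSetFileSize, hTell, openBinaryTempFile)

-- Read at most n bytes; an empty ByteString signals end of file.
read :: Handle -> Int -> IO B.ByteString
read = B.hGetSome

-- Write the whole ByteString and report how many bytes were written.
write :: Handle -> B.ByteString -> IO Int
write h bs = B.hPut h bs >> return (B.length bs)

tell :: Handle -> IO Integer
tell = hTell

seek :: Handle -> SeekMode -> Integer -> IO ()
seek = hSeek

close :: Handle -> IO ()
close = hClose

-- hSetFileSize throws an IOError if the size cannot be changed,
-- e.g. when the handle is not open for writing.
truncate :: Handle -> Integer -> IO ()
truncate = hSetFileSize

isReadable :: Handle -> IO Bool
isReadable = hIsReadable

isWritable :: Handle -> IO Bool
isWritable = hIsWritable

-- Exercise the wrappers on a temporary file.
demo :: IO (Int, Integer, B.ByteString, Integer, Bool)
demo = do
  tmp <- getTemporaryDirectory
  (path, h) <- openBinaryTempFile tmp "nio-raw.bin"
  n <- write h (BC.pack "hello world")
  pos <- tell h
  seek h AbsoluteSeek 0
  hello <- read h 5
  truncate h 5
  size <- hFileSize h
  w <- isWritable h
  close h
  removeFile path
  return (n, pos, hello, size, w)

main :: IO ()
main = demo >>= print
```

ByteString's hPut always writes the whole string, so write here simply returns its length. The Int in the proposed signature suggests that a lower-level implementation might instead report partial writes, as POSIX write(2) does.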

Buffered I/O

Text I/O

Non-blocking I/O

Extensibility
