Difference between revisions of "NIO"

From HaskellWiki
Jump to: navigation, search
Line 8: Line 8:
   
 
We need to first create a low-level API that covers the basic I/O functionality provided by the operating system which other, more high-level libraries can build upon.
 
We need to first create a low-level API that covers the basic I/O functionality provided by the operating system which other, more high-level libraries can build upon.
  +
  +
== Background Study ==
  +
  +
To get a good idea of the different possible trade-offs in designing an I/O library here's an overview over what I/O libraries look like in other programming languages.
  +
  +
=== Java ===
  +
  +
While Java first I/O library was built using streams the new I/O library, dubbed NIO, uses a similar concept called channels. The two basic channels, <code>ReadableByteChannel</code> and <code>WritableByteChannel</code>, have a very narrow interface only providing a single read and a single write function. These two function operate on <code>ByteBuffer</code>s. <code>ByteBuffer</code>s are mutable buffers that keep track on the next position available for writing and reading. Since the buffers can be allocated in a memory region used by the operating system for its native I/O operations additional copying can be avoided and the CPU might not have to be involved in the data transfer at all.
  +
  +
A Haskell clone of this API split into two modules could look as follows. First the buffer module:
  +
  +
<haskell>
  +
module System.Nio.Buffers
  +
( ByteBuffer,
  +
allocate,
  +
clear,
  +
flip,
  +
get,
  +
hasRemaining,
  +
limit,
  +
position,
  +
put,
  +
remaining,
  +
rewind,
  +
setLimit,
  +
setPosition
  +
) where
  +
  +
import Data.ByteString.Internal (mallocByteString)
  +
import Data.IORef (IORef, newIORef, modifyIORef, readIORef, writeIORef)
  +
import Data.Word (Word8)
  +
import Foreign (ForeignPtr, peek, plusPtr, poke, withForeignPtr)
  +
import Prelude hiding (flip)
  +
  +
-- ---------------------------------------------------------------------
  +
-- Buffers
  +
  +
-- | A byte buffer.
  +
data ByteBuffer = ByteBuffer
  +
{-# UNPACK #-} !(IORef Int) -- capacity
  +
{-# UNPACK #-} !(IORef Int) -- limit
  +
{-# UNPACK #-} !(IORef Int) -- position
  +
{-# UNPACK #-} !(ForeignPtr Word8)
  +
  +
-- | Allocates a new byte buffer.
  +
--
  +
-- The new buffer's position will be zero, its limit will be its
  +
-- capacity, and its mark will be undefined.
  +
allocate :: Int -> IO ByteBuffer
  +
allocate capacity = do
  +
cap <- newIORef capacity
  +
lim <- newIORef capacity
  +
pos <- newIORef 0
  +
fp <- mallocByteString capacity
  +
return $! ByteBuffer cap lim pos fp
  +
  +
-- | Clears this buffer. The position is set to zero, and the limit is
  +
-- set to the capacity.
  +
--
  +
-- This method does not actually erase the data in the buffer, but it
  +
-- is named as if it did because it will most often be used in
  +
-- situations in which that might as well be the case.
  +
clear :: ByteBuffer -> IO ()
  +
clear (ByteBuffer cap lim pos _) = do
  +
readIORef cap >>= writeIORef lim
  +
writeIORef pos 0
  +
  +
-- | Flips this buffer. The limit is set to the current position and
  +
-- then the position is set to zero.
  +
flip :: ByteBuffer -> IO ()
  +
flip (ByteBuffer _ lim pos _) = do
  +
readIORef pos >>= writeIORef lim
  +
writeIORef pos 0
  +
  +
-- | Reads the byte at this buffer's current position, and then
  +
-- increments the position.
  +
get :: ByteBuffer -> IO Word8
  +
get (ByteBuffer _ lim pos fp) = do
  +
lim' <- readIORef lim
  +
pos' <- readIORef pos
  +
if pos' == lim'
  +
then error "BufferUnderflowException"
  +
else do
  +
b <- withForeignPtr fp $ \p -> peek (p `plusPtr` pos')
  +
modifyIORef pos (+ 1)
  +
return b
  +
  +
-- | Tells whether there are any elements between the current position
  +
-- and the limit.
  +
hasRemaining :: ByteBuffer -> IO Bool
  +
hasRemaining (ByteBuffer _ lim pos _) = do
  +
lim' <- readIORef lim
  +
pos' <- readIORef pos
  +
return $! pos' < lim'
  +
  +
-- | Returns this buffer's limit.
  +
limit :: ByteBuffer -> IO Int
  +
limit (ByteBuffer _ lim _ _) = readIORef lim
  +
  +
-- | Returns this buffer's position.
  +
position :: ByteBuffer -> IO Int
  +
position (ByteBuffer _ _ pos _) = readIORef pos
  +
  +
-- | Writes the given byte into this buffer at the current position,
  +
-- and then increments the position.
  +
put :: Word8 -> ByteBuffer -> IO ()
  +
put b (ByteBuffer _ lim pos fp) = do
  +
lim' <- readIORef lim
  +
pos' <- readIORef pos
  +
if pos' == lim'
  +
then error "BufferOverflowException"
  +
else do withForeignPtr fp $ \p -> poke (p `plusPtr` pos') b
  +
modifyIORef pos (+ 1)
  +
  +
-- | Returns the number of elements between the current position and
  +
-- the limit.
  +
remaining :: ByteBuffer -> IO Int
  +
remaining (ByteBuffer _ lim pos _) = do
  +
lim' <- readIORef lim
  +
pos' <- readIORef pos
  +
return $! lim' - pos'
  +
  +
-- | Rewinds this buffer. The position is set to zero.
  +
rewind :: ByteBuffer -> IO ()
  +
rewind (ByteBuffer _ _ pos _) = writeIORef pos 0
  +
  +
-- | Sets this buffer's limit. If the position is larger than the new
  +
-- limit then it is set to the new limit.
  +
setLimit :: Int -> ByteBuffer -> IO ()
  +
setLimit newLimit (ByteBuffer cap lim pos _) = do
  +
cap' <- readIORef cap
  +
if newLimit < 0 || newLimit > cap'
  +
then error "IllegalArgumentException"
  +
else do
  +
writeIORef lim newLimit
  +
pos' <- readIORef pos
  +
writeIORef pos $! (min pos' newLimit)
  +
  +
-- | Sets this buffer's position.
  +
setPosition :: Int -> ByteBuffer -> IO ()
  +
setPosition newPosition (ByteBuffer _ lim pos _) = do
  +
lim' <- readIORef lim
  +
if newPosition < 0 || newPosition > lim'
  +
then error "IllegalArgumentException"
  +
else writeIORef pos newPosition
  +
</haskell>
  +
  +
And then the channel module:
  +
  +
<haskell>
  +
module System.Nio.Channels
  +
( Channel(..),
  +
ReadableByteChannel(..),
  +
WriteableByteChannel(..)
  +
) where
  +
  +
import System.Nio.Buffers
  +
  +
-- ---------------------------------------------------------------------
  +
-- Channels
  +
  +
class Channel a where
  +
close :: a -> IO ()
  +
isOpen :: a -> IO Bool
  +
  +
class Channel a => ReadableByteChannel a where
  +
read :: a -> ByteBuffer -> IO Int
  +
  +
class Channel a => WritableByteChannel a where
  +
write :: a -> ByteBuffer -> IO Int
  +
</haskell>
  +
  +
Missing from this implementation is a function to allocate buffers inside the operating system's area for native I/O and instances of the different channel classes.
  +
  +
The main drawback of this design is that it makes heavy use of mutable data structures (i.e. the buffers).
   
 
== Raw I/O ==
 
== Raw I/O ==

Revision as of 19:24, 21 July 2008

New I/O

This page is currently being created. Please do not edit.

Rationale and Goals

Haskell 98 specifies and number of I/O actions. All these actions accept and return Strings. However, Strings are not a good type for performing I/O in all cases. Their structure give them bad cache locality and they take up more memory per byte or character than more compact representations like ByteStrings. It is also conceptually the wrong type for some operations. For example, sockets receive and send bytes while file I/O often deals in terms of characters and yet both use String to represent these two different concepts.

We need to first create a low-level API that covers the basic I/O functionality provided by the operating system which other, more high-level libraries can build upon.

Background Study

To get a good idea of the different possible trade-offs in designing an I/O library here's an overview over what I/O libraries look like in other programming languages.

Java

While Java first I/O library was built using streams the new I/O library, dubbed NIO, uses a similar concept called channels. The two basic channels, ReadableByteChannel and WritableByteChannel, have a very narrow interface only providing a single read and a single write function. These two function operate on ByteBuffers. ByteBuffers are mutable buffers that keep track on the next position available for writing and reading. Since the buffers can be allocated in a memory region used by the operating system for its native I/O operations additional copying can be avoided and the CPU might not have to be involved in the data transfer at all.

A Haskell clone of this API split into two modules could look as follows. First the buffer module:

module System.Nio.Buffers
    ( ByteBuffer,
      allocate,
      clear,
      flip,
      get,
      hasRemaining,
      limit,
      position,
      put,
      remaining,
      rewind,
      setLimit,
      setPosition
    ) where

import Data.ByteString.Internal (mallocByteString)
import Data.IORef (IORef, newIORef, modifyIORef, readIORef, writeIORef)
import Data.Word (Word8)
import Foreign (ForeignPtr, peek, plusPtr, poke, withForeignPtr)
import Prelude hiding (flip)

-- ---------------------------------------------------------------------
-- Buffers

-- | A byte buffer.
data ByteBuffer = ByteBuffer
    {-# UNPACK #-} !(IORef Int)  -- capacity
    {-# UNPACK #-} !(IORef Int)  -- limit
    {-# UNPACK #-} !(IORef Int)  -- position
    {-# UNPACK #-} !(ForeignPtr Word8)

-- | Allocates a new byte buffer.
--
-- The new buffer's position will be zero, its limit will be its
-- capacity, and its mark will be undefined.
allocate :: Int -> IO ByteBuffer
allocate capacity = do
  cap <- newIORef capacity
  lim <- newIORef capacity
  pos <- newIORef 0
  fp <- mallocByteString capacity
  return $! ByteBuffer cap lim pos fp

-- | Clears this buffer. The position is set to zero, and the limit is
-- set to the capacity.
--
-- This method does not actually erase the data in the buffer, but it
-- is named as if it did because it will most often be used in
-- situations in which that might as well be the case.
clear :: ByteBuffer -> IO ()
clear (ByteBuffer cap lim pos _) = do
  readIORef cap >>= writeIORef lim
  writeIORef pos 0

-- | Flips this buffer. The limit is set to the current position and
-- then the position is set to zero.
flip :: ByteBuffer -> IO ()
flip (ByteBuffer _ lim pos _) = do
  readIORef pos >>= writeIORef lim
  writeIORef pos 0

-- | Reads the byte at this buffer's current position, and then
-- increments the position.
get :: ByteBuffer -> IO Word8
get (ByteBuffer _ lim pos fp) = do
  lim' <- readIORef lim
  pos' <- readIORef pos
  if pos' == lim'
     then error "BufferUnderflowException"
     else do
       b <- withForeignPtr fp $ \p -> peek (p `plusPtr` pos')
       modifyIORef pos (+ 1)
       return b

-- | Tells whether there are any elements between the current position
-- and the limit.
hasRemaining :: ByteBuffer -> IO Bool
hasRemaining (ByteBuffer _ lim pos _) = do
  lim' <- readIORef lim
  pos' <- readIORef pos
  return $! pos' < lim'

-- | Returns this buffer's limit.
limit :: ByteBuffer -> IO Int
limit (ByteBuffer _ lim _ _) = readIORef lim

-- | Returns this buffer's position.
position :: ByteBuffer -> IO Int
position (ByteBuffer _ _ pos _) = readIORef pos

-- | Writes the given byte into this buffer at the current position,
-- and then increments the position.
put :: Word8 -> ByteBuffer -> IO ()
put b (ByteBuffer _ lim pos fp) = do
  lim' <- readIORef lim
  pos' <- readIORef pos
  if pos' == lim'
     then error "BufferOverflowException"
     else do withForeignPtr fp $ \p -> poke (p `plusPtr` pos') b
             modifyIORef pos (+ 1)

-- | Returns the number of elements between the current position and
-- the limit.
remaining :: ByteBuffer -> IO Int
remaining (ByteBuffer _ lim pos _) = do
  lim' <- readIORef lim
  pos' <- readIORef pos
  return $! lim' - pos'

-- | Rewinds this buffer. The position is set to zero.
rewind :: ByteBuffer -> IO ()
rewind (ByteBuffer _ _ pos _) = writeIORef pos 0

-- | Sets this buffer's limit. If the position is larger than the new
-- limit then it is set to the new limit.
setLimit :: Int -> ByteBuffer -> IO ()
setLimit newLimit (ByteBuffer cap lim pos _) = do
  cap' <- readIORef cap
  if newLimit < 0 || newLimit > cap'
     then error "IllegalArgumentException"
     else do
       writeIORef lim newLimit
       pos' <- readIORef pos
       writeIORef pos $! (min pos' newLimit)

-- | Sets this buffer's position.
setPosition :: Int -> ByteBuffer -> IO ()
setPosition newPosition (ByteBuffer _ lim pos _) = do
  lim' <- readIORef lim
  if newPosition < 0 || newPosition > lim'
     then error "IllegalArgumentException"
     else writeIORef pos newPosition

And then the channel module:

module System.Nio.Channels
    ( Channel(..),
      ReadableByteChannel(..),
      WriteableByteChannel(..)
    ) where

import System.Nio.Buffers

-- ---------------------------------------------------------------------
-- Channels

class Channel a where
    close :: a -> IO ()
    isOpen :: a -> IO Bool

class Channel a => ReadableByteChannel a where
    read :: a -> ByteBuffer -> IO Int

class Channel a => WritableByteChannel a where
    write :: a -> ByteBuffer -> IO Int

Missing from this implementation is a function to allocate buffers inside the operating system's area for native I/O and instances of the different channel classes.

The main drawback of this design is that it makes heavy use of mutable data structures (i.e. the buffers).

Raw I/O

The new I/O library resides in the New I/O (NIO) module.

module System.Nio

All I/O actions deal in terms of ByteStrings.

import Data.ByteString
read :: Handle -> Int -> IO ByteString
write :: Handle -> ByteString -> IO Int
tell :: Handle -> IO Integer
seek :: Handle -> SeekMode -> Integer -> IO ()
close :: Handle -> IO ()
truncate :: Handle -> Integer -> IO ()  -- should throw some kind of exception
isReadable :: Handle -> IO Bool
isWritable :: Handle -> IO Bool

Buffered I/O

Text I/O

Non-blocking I/O

Extensibility

References

  1. http://www.python.org/dev/peps/pep-3116/