Difference between revisions of "Library/Streams"

Revision as of 21:36, 9 April 2013

Introduction

Streams: the extensible I/O library

I (Bulat Ziganshin) developed a new I/O library in 2006 that IMHO is so sharp that it can eventually replace the current I/O facilities based on using Handles. The main advantage of the new library is its strong modular design using typeclasses. The library consists of small independent modules, each implementing one type of stream (file, memory buffer, pipe) or one part of common stream functionality (buffering, Char encoding, locking). 3rd-party libs can easily add new stream types and new common functionality. Other benefits of the new library include support for streams functioning in any monad, Hugs and GHC compatibility, high speed and an easy migration path from the existing I/O library.

The Streams library is heavily based on the HVIO module written by John Goerzen. I especially want to thank John for his clever design and implementation. Really, I just renamed HVIO to Stream and presented this as my own work. :) Further development direction was inspired by the "New I/O library" written by Simon Marlow.

---

More recent developments (as of 2013-04) have focused on Iteratee_I/O; in particular, io-streams is similar in its focus on I/O and on replacing file handles.

Simple Streams

The key concept of the lib is the Stream class, whose interface mimics the familiar interface for Handles, just with "h" replaced with "v" in function names:

 class (Monad m) => Stream m h where
    vPutStrLn :: h -> String -> m ()
    vGetContents :: h -> m String
    vIsEOF :: h -> m Bool
    vClose :: h -> m ()
    ....................

This means that you already know how to use any stream! The Stream interface currently has 8 implementations: a Handle itself, raw files, pipes, memory buffers and string buffers. Future plans include support for memory-mapped files, sockets, circular memory buffers for interprocess communication and UArray-based streams.

By themselves, these Stream implementations are rather simple. Basically, to implement a new Stream type, it's enough to provide vPutBuf/vGetBuf operations, or even just vGetChar/vPutChar. The latter way, although inefficient, allows us to implement streams that can work in any monad. StringReader and StringBuffer streams use this to provide string-based Stream class implementations for both the IO and ST monads. Yes, you can use the full power of Stream operations inside the ST monad!
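To make the "any monad" point concrete, here is a minimal sketch of a stream built only from Char-level primitives and running in the ST monad. It is not the library's code: the class `CharStream`, the type `STBuffer` and the helper `roundTrip` are hypothetical names invented for this illustration, and the real Stream class has many more operations.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

import Control.Monad.ST (ST, runST)
import Data.STRef (STRef, modifySTRef, newSTRef, readSTRef, writeSTRef)

-- Hypothetical, cut-down stand-in for the Stream class: Char I/O only.
-- The stream type h determines the monad m it runs in.
class Monad m => CharStream m h | h -> m where
  vPutChar :: h -> Char -> m ()
  vGetChar :: h -> m (Maybe Char)  -- Nothing signals end of stream

-- Text operations can be derived from the two primitives alone.
vPutStr :: CharStream m h => h -> String -> m ()
vPutStr h = mapM_ (vPutChar h)

vGetContents :: CharStream m h => h -> m String
vGetContents h = do
  mc <- vGetChar h
  case mc of
    Nothing -> return []
    Just c  -> (c :) <$> vGetContents h

-- A string buffer that lives in the ST monad (toy version, O(n) appends).
newtype STBuffer s = STBuffer (STRef s String)

instance CharStream (ST s) (STBuffer s) where
  vPutChar (STBuffer ref) c = modifySTRef ref (++ [c])
  vGetChar (STBuffer ref) = do
    s <- readSTRef ref
    case s of
      []       -> return Nothing
      (c : cs) -> writeSTRef ref cs >> return (Just c)

-- Pure function that uses stream operations internally.
roundTrip :: String -> String
roundTrip s = runST $ do
  buf <- STBuffer <$> newSTRef ""
  vPutStr buf s
  vGetContents buf
```

Because everything runs inside `runST`, `roundTrip` is an ordinary pure function even though it is written with stream operations; the same class could be given an IO instance without touching `vPutStr` or `vGetContents`.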

Layers of functionality

All additional functionality is implemented via Stream Transformers, which are just parameterized Streams, whose parameters also implement the Stream interface. This allows you to apply any number of stream transformers to the raw stream and then use the result as an ordinary Stream. For example:

          h <- openRawFD "test" WriteMode
                   >>= bufferBlockStream
                   >>= withEncoding utf8
                   >>= withLocking

This code creates a new FD, which represents a raw file, and then adds buffering, Char encoding and locking functionality to this Stream. The result type of "h" is something like this:

          WithLocking (WithEncoding (BufferedBlockStream FD))

The complete type, as well as all the intermediate types, implements the Stream interface. Each transformer intercepts operations corresponding to its nature, and passes the rest through. For example, the encoding transformer intercepts only vGetChar/vPutChar operations and translates them to the sequences of vGetByte/vPutByte calls of the lower-level stream. The locking transformer just wraps any operation in the locking wrapper.

We can trace, for example, the execution of a "vPutBuf" operation on the above-constructed Stream. First, the locking transformer acquires a lock and then passes this call to the next level. Then the encoding transformer does nothing and passes this call to the next level. The buffering transformer flushes the current buffer and passes the call further. Finally, FD itself performs the operation after all these preparations and, on the return path, the locking transformer releases its lock.

As another example, the "vPutChar" call on this Stream is transformed (after locking) into several "vPutByte" calls by the encoding transformer, and these bytes go to the buffer in the buffering transformer, with or without a subsequent call to the FD's "vPutBuf".
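The layering described above can be imitated with a toy model. The sketch below is an assumption-laden illustration, not the library's implementation: the class is fixed to IO and reduced to two operations, `Sink` and `WithTwoByteEncoding` are invented names, the encoding is a naive fixed-width one that is only valid for code points up to U+FFFF, and `withLocking` here is a toy re-implementation that merely borrows the library's name.

```haskell
import Control.Concurrent.MVar (MVar, newMVar, withMVar)
import Data.Char (ord)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)
import Data.Word (Word8)

-- Hypothetical cut-down Stream interface, fixed to IO for brevity.
class Stream h where
  vPutByte :: h -> Word8 -> IO ()
  vPutChar :: h -> Char -> IO ()

-- Raw stream: appends every byte it receives to a list.
newtype Sink = Sink (IORef [Word8])

instance Stream Sink where
  vPutByte (Sink ref) b = modifyIORef ref (++ [b])
  vPutChar h c = vPutByte h (fromIntegral (ord c))  -- keeps the low 8 bits only

-- Encoding transformer: intercepts vPutChar, passes vPutByte through.
newtype WithTwoByteEncoding h = WithTwoByteEncoding h

instance Stream h => Stream (WithTwoByteEncoding h) where
  vPutByte (WithTwoByteEncoding h) = vPutByte h
  vPutChar (WithTwoByteEncoding h) c = do
    let n = ord c
    vPutByte h (fromIntegral (n `div` 256))  -- high byte
    vPutByte h (fromIntegral (n `mod` 256))  -- low byte

-- Locking transformer: wraps every operation in the same lock.
data WithLocking h = WithLocking (MVar ()) h

withLocking :: h -> IO (WithLocking h)
withLocking h = do
  lockVar <- newMVar ()
  return (WithLocking lockVar h)

instance Stream h => Stream (WithLocking h) where
  vPutByte (WithLocking m h) b = withMVar m (\_ -> vPutByte h b)
  vPutChar (WithLocking m h) c = withMVar m (\_ -> vPutChar h c)

-- Build the stack raw -> encoding -> locking and write two characters.
demo :: IO [Word8]
demo = do
  ref <- newIORef []
  h <- withLocking (WithTwoByteEncoding (Sink ref))
  vPutChar h 'A'
  vPutChar h '\x20AC'
  readIORef ref
```

Running `demo` yields `[0,65,32,172]`: each `vPutChar` takes the lock, is split into two `vPutByte` calls by the encoding layer, and those bytes reach the raw sink. Each layer is itself an instance of the same class, which is what lets the layers be stacked in any number.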

Modularity

As you can see, stream transformers really are independent of each other. This allows you to use them on any stream and in any combination (but you should apply them in proper order - buffering, then Char encoding, then locking). As a result, you can apply to the stream only the transformers that you really need. If you don't use the stream in multiple threads, you don't need to apply the locking transformer. If you don't use any encodings other than Latin-1 -- or don't use text I/O at all -- you don't need an encoding transformer. Moreover, you may not even need to know anything about the UserData transformer until you actually need to use it :)

Both streams and stream transformers can be implemented by 3rd-party libraries. Streams and transformers from arbitrary libraries will seamlessly work together as long as they properly implement the Stream interface. My future plans include implementation of an on-the-fly (de)compression transformer and I will be happy to see 3rd-party transformers that intercept vGetBuf/vPutBuf calls and use select(), kqueue() and other methods to overlap I/O operations.

Speed

A quick comment about speed: it's fast enough -- 10-50 MB/s (depending on the type of operation) on a 1 GHz CPU. The Handle operations, for comparison, show speeds of 1-10 MB/s on the same computer. But that doesn't mean that each and every operation in the new library is 10 times faster. Strict I/O (including vGetChar/vPutChar) is a LOT faster. I included a demonstration of this fascinating speed as "Examples/wc.hs". If you need really high speed, don't forget to increase the buffer size with "vSetBuffering".

On the other hand, lazy I/O (including any operations that receive or return strings) shows only a modest speedup. This is limited by Haskell/GHC itself and I can't do much to get around these limits. Instead, I plan to provide support for I/O using packed strings. This will make it possible to write I/O-intensive Haskell programs that are as fast as their C counterparts.

Other sources of slowness include the use of the locking transformer (if you need it, try using "lock" around speed-critical algorithms) and the complex class structure, which may be avoidable by using "forall" types (I'm not sure; Simon Marlow can shed light on this topic).

The library includes benchmarking code in the file "Examples/StreamsBenchmark.hs".


Overview of Stream transformers

Buffering

There are three buffering transformers. Each buffering transformer implements support for vGetByte, vPutChar, vGetContents and other byte- and text-oriented operations for the streams, which by themselves support only vGetBuf/vPutBuf (or vReceiveBuf/vSendBuf) operations.

The first transformer can be applied to any stream supporting vGetBuf/vPutBuf. This is applied by the operation "bufferBlockStream". The well-known vSetBuffering/vGetBuffering operations are intercepted by this transformer and used to control buffer size. At this moment, only BlockBuffering is implemented, while LineBuffering and NoBuffering are only in the planning stages.

Two other transformers can be applied to streams that implement vReceiveBuf/vSendBuf operations -- that is, streams whose data reside in memory, including in-memory streams and memory-mapped files. In these cases, the buffering transformer doesn't need to allocate a buffer itself, it just requests from the underlying stream the address and size of the next available portion of data. Nevertheless, the final result is the same -- we get support for all byte- and text-oriented I/O operations. The "bufferMemoryStream" operation can be applied to any memory-based stream to add buffering to it. The "bufferMemoryStreamUnchecked" operation (which implements the third buffering transformer) can be used instead, if you can guarantee that I/O operations can't overflow the used buffer.
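To make the idea of a buffering transformer concrete, here is a minimal sketch that uses only the base package and not the Streams API itself. All names in it (RawRead, Buffered, bufferBlocks, getByte, listSource) are hypothetical stand-ins: a "raw" source only hands out blocks, and the wrapper adds a byte-oriented read on top of it, refilling its buffer one block at a time.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)
import Data.Word (Word8)

-- Hypothetical stand-in for a raw stream: it only knows how to hand out
-- blocks of at most n bytes, and returns [] at end of data.
type RawRead = Int -> IO [Word8]

-- A buffered reader: the raw block reader, the block size, and the
-- bytes that were fetched but not yet consumed.
data Buffered = Buffered
  { rawRead   :: RawRead
  , blockSize :: Int
  , pending   :: IORef [Word8]
  }

-- Wrap a raw reader (this plays the role of a buffering transformer).
bufferBlocks :: Int -> RawRead -> IO Buffered
bufferBlocks size raw = Buffered raw size <$> newIORef []

-- Byte-oriented read built on top of block reads.
getByte :: Buffered -> IO (Maybe Word8)
getByte b = do
  buf <- readIORef (pending b)
  case buf of
    (x:xs) -> writeIORef (pending b) xs >> pure (Just x)
    []     -> do
      block <- rawRead b (blockSize b)   -- refill: one raw call per block
      case block of
        []     -> pure Nothing           -- end of data
        (x:xs) -> writeIORef (pending b) xs >> pure (Just x)

-- A raw reader over an in-memory list that also counts how many raw
-- calls were made, so the effect of buffering can be observed.
listSource :: [Word8] -> IO (RawRead, IORef Int)
listSource bytes = do
  rest  <- newIORef bytes
  calls <- newIORef 0
  let raw n = do
        modifyIORef' calls (+ 1)
        xs <- readIORef rest
        let (block, remaining) = splitAt n xs
        writeIORef rest remaining
        pure block
  pure (raw, calls)
```

With a block size of 4, reading ten bytes one at a time through getByte touches the raw source only once per block rather than once per byte, which is the point of applying buffering before any byte- or text-oriented operation.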

Encoding

The Char encoding transformer allows you to encode each Char written to the stream as a sequence of bytes, implementing UTF and other encodings. This transformer can be applied to any stream implementing vGetByte/vPutByte operations and in return it implements vGetChar/vPutChar and all other text-oriented operations. This transformer can be applied to a stream with the "withEncoding encoding" operation, where `encoding` may be `latin1`, `utf8` or any other encoding that you (or a 3rd-party lib) implement. Look at the "Data.CharEncoding" module to see how to implement new encodings. The encoding of a stream created with the "withEncoding" operation can be queried with "vGetEncoding". See examples of their usage in the file "Examples/CharEncoding.hs".
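As an illustration of what such a transformer has to do, the following sketch builds a Char-oriented write out of a byte-oriented one. It is not the code of "Data.CharEncoding"; the names utf8Bytes, latin1Bytes and putCharWith are hypothetical and only base is assumed.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Encode one Char as UTF-8: 1 to 4 bytes, depending on the code point.
utf8Bytes :: Char -> [Word8]
utf8Bytes c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. top 6, cont 0]
  | n < 0x10000 = [0xE0 .|. top 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. top 18, cont 12, cont 6, cont 0]
  where
    n      = ord c
    -- leading bits of the code point, placed in the first byte
    top s  = fromIntegral (n `shiftR` s)
    -- continuation byte: 10xxxxxx carrying six bits of the code point
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- Latin-1 keeps one byte per Char; in this sketch code points above 255
-- are simply truncated to their low byte.
latin1Bytes :: Char -> [Word8]
latin1Bytes c = [fromIntegral (ord c)]

-- A text-oriented write derived from an encoding and a byte-oriented write.
putCharWith :: Monad m => (Char -> [Word8]) -> (Word8 -> m ()) -> Char -> m ()
putCharWith encode putByte = mapM_ putByte . encode
```

Here putCharWith utf8Bytes turns any byte writer into a Char writer, which mirrors how the encoding transformer is layered on top of a stream that only offers byte operations.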

Locking

The locking transformer ensures that the stream is properly shared by several threads. You already know enough about its basic usage -- "withLocking" applies this transformer to the stream and all the required locking is performed automagically. You can also use "lock" operations to acquire the lock explicitly during multiple operations:

  lock h $ \h -> do
    savedpos <- vTell h
    vSeek h AbsoluteSeek 100
    vPutStr h ":-)"
    vSeek h AbsoluteSeek savedpos

See the file "Examples/Locking.hs" for examples of using the locking transformer.
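The difference between locking every single operation and holding the lock across a group of operations can be sketched with an MVar. This is only an assumption about how such a transformer could be built, not the library's implementation; Locked, withLockingSketch, lockSketch, writeLocked and demo are hypothetical names.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, newMVar, putMVar, takeMVar, withMVar)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical stand-in for a stream wrapped by a locking transformer.
newtype Locked s = Locked (MVar s)

withLockingSketch :: s -> IO (Locked s)
withLockingSketch s = Locked <$> newMVar s

-- Hold the lock for the whole body; the body receives the unlocked
-- inner value, so it never tries to take the same lock again.
lockSketch :: Locked s -> (s -> IO a) -> IO a
lockSketch (Locked m) body = withMVar m body

-- Per-operation locking: each single write takes and releases the lock.
writeLocked :: Locked (IORef [Int]) -> Int -> IO ()
writeLocked l x = lockSketch l (\ref -> modifyIORef' ref (x :))

-- Two threads each write a group of three values under one lock
-- acquisition, so the two groups cannot interleave.
demo :: IO [Int]
demo = do
  log_ <- newIORef [] >>= withLockingSketch
  done <- newEmptyMVar
  let worker tag = do
        lockSketch log_ $ \ref ->
          mapM_ (\_ -> modifyIORef' ref (tag :)) [1 .. 3 :: Int]
        putMVar done ()
  _ <- forkIO (worker 1)
  _ <- forkIO (worker 2)
  takeMVar done
  takeMVar done
  lockSketch log_ readIORef
```

In this sketch the body passed to lockSketch works on the inner value directly, which is why the seek/write/seek sequence in the example above can run as one uninterrupted unit. Calling writeLocked from inside a lockSketch body on the same value would block forever, since an MVar lock is not re-entrant.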

Attaching user data

This transformer allows you to attach arbitrary data to any Stream. It does nothing extraordinary, except that the stream with attached data is itself a proper Stream. See an example of its usage in the file "Examples/UserData.hs".

Overview of Stream types

Handle (legacy way to access files/sockets)

"Handle" is an instance of the Stream class, with a straightforward implementation. You can use the Char encoding transformer with Handles. Although Handles implement buffering and locking by themselves, you may also be interested in applying these transformers to the Handle type. This has benefits -- "bufferBlockStream" works faster than internal Handle buffering, and the locking transformer enables the use of a "lock" operation to create a lock around a sequence of operations. Moreover, the locking transformer should be used to ensure proper multi-threading operation of Handle with added encoding or buffering facilities.

FD (new way to access files)

The new method of using files, independent of the existing I/O library, is implemented with the FD type. FD is just an Int representing a POSIX file descriptor, and the FD type implements only the basic Stream I/O operations -- vGetBuf and vPutBuf. So, to create a full-featured FD-based stream, you need to apply buffering transformers. Therefore, the library defines two ways to open files with FD: openRawFD/openRawBinaryFD just create an FD, while openFD/openBinaryFD create an FD and immediately apply the buffering transformer (bufferBlockStream) to it. In most cases you will use the latter operations. Both pairs mimic the arguments and behaviour of the well-known Handle operations openFile/openBinaryFile, so you already know how to use them. Other transformers may then be applied as you need. So the above-mentioned example can be abbreviated to:

          h <- openFD "test" WriteMode
                   >>= withEncoding utf8
                   >>= withLocking

Thus, to switch from the existing I/O library to using Streams, you need only to replace "h" with "v" in the names of Handle operations, and replace openFile/openBinaryFile calls with openFD/openBinaryFD while adding the "withLocking" transformer to files used in multiple threads. That's all!

For example, the following code:

  h <- openFile "test" ReadMode
  text <- hGetContents h
  hClose h

should be translated to:

  h <- openFD "test" ReadMode
         --  >>= withLocking  -- needed only for multi-threaded usage
  text <- vGetContents h
  vClose h


The file "Examples/FD.hs" demonstrates FD usage.


In order to work with stdin/stdout/stderr via FDs, you should open them in the same way:

  stdinStream  <- bufferBlockStream fdStdIn
                      >>= withEncoding utf8    -- optional, required only for using non-Latin1 encoding
                      >>= withLocking          -- optional, required only to use this Stream in concurrent Haskell threads

  stdoutStream <- bufferBlockStream fdStdOut
                      >>= withEncoding utf8    -- see above
                      >>= withLocking          -- ...

  stderrStream <- bufferBlockStream fdStdErr
                      >>= withEncoding utf8    -- ...
                      >>= withLocking          -- ...

Please note that Streams currently supports only block buffering; there is no support for line buffering or for unbuffered operation.

MemBuf (memory-resident stream)

MemBuf is a stream type that keeps its contents in a memory buffer. There are two types of MemBufs you can create: you can either open an existing memory buffer with "openMemBuf ptr size" or create a new one with "createMemBuf initsize". A MemBuf opened by "openMemBuf" will never be resized or moved in memory, and will not be freed by "vClose". A MemBuf created by "createMemBuf" will grow as needed, can be manually resized by the "vSetFileSize" operation, and is automatically freed by "vClose".

Actually, raw MemBufs are created by the "createRawMemBuf" and "openRawMemBuf" operations, while createMemBuf/openMemBuf incorporate an additional "bufferMemoryStream" call (as you should remember, buffering adds vGetChar, vPutStr and other text- and byte-I/O operations on top of vReceiveBuf and vSendBuf). You can also apply Char encoding and locking transformers to these streams. The "saveToFile" and "readFromFile" operations provide an easy way to save/restore buffer contents in a file.

File "Examples/MemBuf.hs" demonstrates the usage of MemBuf.
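The behaviour described for a buffer made by "createMemBuf" (grows as needed, freed on close) can be sketched with the standard Foreign modules from base. This is an illustration under assumptions, not the MemBuf implementation: the type MemBufSketch, its functions and the doubling growth policy are all hypothetical.

```haskell
import Control.Monad (when)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.Word (Word8)
import Foreign.Marshal.Alloc (free, mallocBytes, reallocBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekByteOff, pokeByteOff)

-- Hypothetical growable memory buffer: current pointer, capacity, bytes used.
data MemBufSketch = MemBufSketch
  { bufPtr  :: IORef (Ptr Word8)
  , bufCap  :: IORef Int
  , bufUsed :: IORef Int
  }

-- Allocate a new buffer with the given initial size (at least one byte).
createSketch :: Int -> IO MemBufSketch
createSketch initSize = do
  let cap = max 1 initSize
  p <- mallocBytes cap
  MemBufSketch <$> newIORef p <*> newIORef cap <*> newIORef 0

-- Append one byte, doubling the capacity when the buffer is full.
-- reallocBytes may move the block, so the stored pointer is updated.
putByteSketch :: MemBufSketch -> Word8 -> IO ()
putByteSketch b x = do
  used <- readIORef (bufUsed b)
  cap  <- readIORef (bufCap b)
  when (used == cap) $ do
    old <- readIORef (bufPtr b)
    new <- reallocBytes old (cap * 2)
    writeIORef (bufPtr b) new
    writeIORef (bufCap b) (cap * 2)
  p <- readIORef (bufPtr b)
  pokeByteOff p used x
  writeIORef (bufUsed b) (used + 1)

-- Copy the bytes written so far out of the buffer.
contentsSketch :: MemBufSketch -> IO [Word8]
contentsSketch b = do
  used <- readIORef (bufUsed b)
  p    <- readIORef (bufPtr b)
  mapM (peekByteOff p) [0 .. used - 1]

-- Release the memory; the buffer must not be used afterwards.
closeSketch :: MemBufSketch -> IO ()
closeSketch b = readIORef (bufPtr b) >>= free
```

A buffer opened over memory that the caller already owns would differ in exactly the two places the text names: it would never call reallocBytes, and its close operation would not call free.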

FunctionsMemoryStream

This Stream type allows implementation of arbitrary streams, just by providing three functions that implement the vReceiveBuf, vSendBuf and cleanup operations. It seems that this Stream type is of interest only for my own program and is worth studying only as an example of creating 3rd-party Stream types. It is named "FunctionsMemoryStream"; see the sources if you are interested.

StringReader & StringBuffer (String-based streams)

The four remaining Stream types were part of the HVIO module, and I copied their descriptions from there:

In addition to Handle, there are several pre-defined stream types for your use. 'StringReader' is a particularly interesting one. At creation time, you pass it a String. Its contents are read lazily whenever a read call is made. It can be used, therefore, to implement filters (simply initialize it with the result from, say, a map over hGetContents from another Stream object), codecs, and simple I/O testing. Because it is lazy, it need not hold the entire string in memory. You can create a 'StringReader' with a call to 'newStringReader'.
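The lazy behaviour of such a reader can be sketched with a mutable reference to the unread part of the String. The names below (StringReaderSketch, newReaderSketch, readCharSketch, readLineSketch) are hypothetical and are not the HVIO or Streams API; the sketch assumes only base.

```haskell
import Data.Char (toUpper)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Hypothetical reader: a reference to the part of the String not yet read.
newtype StringReaderSketch = StringReaderSketch (IORef String)

newReaderSketch :: String -> IO StringReaderSketch
newReaderSketch s = StringReaderSketch <$> newIORef s

-- Read one Char; Nothing at the end of the String.
readCharSketch :: StringReaderSketch -> IO (Maybe Char)
readCharSketch (StringReaderSketch ref) = do
  s <- readIORef ref
  case s of
    []     -> pure Nothing
    (c:cs) -> writeIORef ref cs >> pure (Just c)

-- Read up to (and consume) the next newline; Nothing at the end.
readLineSketch :: StringReaderSketch -> IO (Maybe String)
readLineSketch (StringReaderSketch ref) = do
  s <- readIORef ref
  case s of
    [] -> pure Nothing
    _  -> do
      let (line, rest) = break (== '\n') s
      writeIORef ref (drop 1 rest)
      pure (Just line)

-- A filter over an endless input: only the demanded prefix is evaluated.
upperCaseReader :: IO StringReaderSketch
upperCaseReader = newReaderSketch (map toUpper (cycle "ab\n"))
```

Because the String is consumed lazily, upperCaseReader can wrap an infinite input: each readLineSketch call forces only one line of it.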

'StringBuffer' is a similar type, but with a different purpose. It provides a full interface like Handle (it supports read, write and seek operations). However, it maintains an in-memory buffer with the contents of the file, rather than an actual on-disk file. You can access the entire contents of this buffer at any time. This can be quite useful for testing I/O code, or for cases where existing APIs use I/O, but you prefer a String representation. Note however that this stream type is very inefficient. You can create a 'StringBuffer' with a call to 'newStringBuffer'.

One significant improvement over the original HVIO library is that 'StringReader' and 'StringBuffer' can work not only in IO, but also in the ST monad.

Pipes (passing data between Haskell threads)

Finally, there are pipes. These pipes are analogous to the Unix pipes that are available from System.Posix, but don't require Unix and work only in Haskell. When you create a pipe, you actually get two Stream objects: a 'PipeReader' and a 'PipeWriter'. You must use the 'PipeWriter' in one thread and the 'PipeReader' in another thread. Data that's written to the 'PipeWriter' will then be available for reading with the 'PipeReader'. The pipes are implemented completely with existing Haskell threading primitives, and require no special operating system support. Unlike Unix pipes, these pipes cannot be used across a fork(). Also unlike Unix pipes, these pipes are portable and interact well with Haskell threads. A new pipe can be created with a call to 'newHVIOPipe'.
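A pipe of this kind can be approximated with the standard Chan type from Control.Concurrent. The sketch below is an assumption about the general shape, not the code behind 'newHVIOPipe'; PipeReaderSketch, PipeWriterSketch and the functions around them are hypothetical names, and end of data is modelled by sending Nothing.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)

-- Two ends of one channel; Nothing marks that the writer has closed.
newtype PipeWriterSketch = PipeWriterSketch (Chan (Maybe String))
newtype PipeReaderSketch = PipeReaderSketch (Chan (Maybe String))

newPipeSketch :: IO (PipeReaderSketch, PipeWriterSketch)
newPipeSketch = do
  ch <- newChan
  pure (PipeReaderSketch ch, PipeWriterSketch ch)

pipeWrite :: PipeWriterSketch -> String -> IO ()
pipeWrite (PipeWriterSketch ch) = writeChan ch . Just

pipeClose :: PipeWriterSketch -> IO ()
pipeClose (PipeWriterSketch ch) = writeChan ch Nothing

-- Read everything until the writer closes its end; blocks while waiting.
pipeReadAll :: PipeReaderSketch -> IO String
pipeReadAll (PipeReaderSketch ch) = go
  where
    go = do
      msg <- readChan ch
      case msg of
        Nothing -> pure []
        Just s  -> (s ++) <$> go

-- The writer runs in its own thread, the reader in the calling thread.
demo :: IO String
demo = do
  (r, w) <- newPipeSketch
  _ <- forkIO (mapM_ (pipeWrite w) ["hello, ", "pipe"] >> pipeClose w)
  pipeReadAll r
```

As in the description above, the two ends are meant to be used from different threads: the reader simply blocks until the writer has produced data or closed its end.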


Additional details

Support for GHC, Hugs and other compilers

The library is compatible with GHC 6.4.


The library fully supports Hugs 2003-2006, but

1) support for FD and MMFile is temporarily disabled because I don't know how to build DLLs

2) Hugs 2003 doesn't include support for "instance Bits Word" and vGetBuf/vPutBuf, so you need to add these implementations manually or delete the lines that use them (look for "2003" in the sources)

3) WinHugs doesn't support preprocessing, so I included the MakeHugs.cmd script to preprocess source files using cpphs


The main disadvantage of the library is that it supports only Hugs and GHC, because it uses extensions to the type class system (namely, MPTC+FD). I think that it can be made H98-compatible at the cost of excluding support for non-IO monads. I will try to make such a stripped version for other compilers if people are interested.
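To show why multi-parameter type classes with functional dependencies matter for non-IO monads, here is a small sketch in which one definition runs in both IO and ST. The class MonadRef and the functions nextChar, takeTwoST and takeTwoIO are hypothetical and are not taken from the library's sources; only base is assumed.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

import Control.Monad.ST (ST, runST)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.STRef (STRef, newSTRef, readSTRef, writeSTRef)

-- Hypothetical class: a monad m together with its mutable reference type r.
-- The functional dependency m -> r says the monad determines the reference type.
class Monad m => MonadRef m r | m -> r where
  newRef   :: a -> m (r a)
  readRef  :: r a -> m a
  writeRef :: r a -> a -> m ()

instance MonadRef IO IORef where
  newRef   = newIORef
  readRef  = readIORef
  writeRef = writeIORef

instance MonadRef (ST s) (STRef s) where
  newRef   = newSTRef
  readRef  = readSTRef
  writeRef = writeSTRef

-- One definition of "read the next Char", usable in either monad.
nextChar :: MonadRef m r => r String -> m (Maybe Char)
nextChar ref = do
  s <- readRef ref
  case s of
    []     -> pure Nothing
    (c:cs) -> writeRef ref cs >> pure (Just c)

-- Pure caller: the mutable state lives inside runST.
takeTwoST :: String -> [Maybe Char]
takeTwoST s = runST (do
  ref <- newRef s
  a <- nextChar ref
  b <- nextChar ref
  pure [a, b])

-- The same steps in IO.
takeTwoIO :: String -> IO [Maybe Char]
takeTwoIO s = do
  ref <- newRef s
  a <- nextChar ref
  b <- nextChar ref
  pure [a, b]
```

Without these extensions the reference type could not be tied to the monad in this way, which is consistent with the remark above that an H98-compatible version would have to give up non-IO monads.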

Downloading and installation

To get Streams 0.1.7, you can download one of

 http://files.pupeno.com/software/streams/Streams-0.1.7.tar.bz2
 http://files.pupeno.com/software/streams/Streams-0.1.7.tar.gz

or you can get it from its repository by running:

darcs get --tag=0.1.7 http://software.pupeno.com/Streams-0.1 Streams-0.1.7

You can also download and keep track of the 0.1 branch, which is supposed to remain stable and only get bug-fixes by running

darcs get http://software.pupeno.com/Streams-0.1/

and then run 'darcs pull' inside it to get further changes.

To get the latest unstable and fluctuating version, the development version, run:

darcs get http://software.pupeno.com/Streams/

Note: as of this moment, while the project is being darcsified you are not going to find anything useful there, but we expect that to change.

Preferably, you should send code patches to Bulat.Ziganshin@gmail.com and patches to other parts of the library to Pupeno. Documentation may be edited right at the project homepage, which remains http://haskell.org/haskellwiki/Library/Streams

Thanks to Jeremy Shaw, the library is now cabalized. To install it, run the command:

 make install

The "Examples" directory contains examples of using the library.

Stage of development

The library is currently at the beta stage. It contains a number of known minor problems and an unknown number of yet-to-be-discovered bugs. It is not properly documented, doesn't include QuickCheck tests, and not all "h*" operations have their "v*" equivalents yet. If anyone wants to join this effort in order to help fix these oddities and prepare the lib for inclusion in the standard libraries suite, I would be really happy. :) I will also be happy (although much less ;) to see bug reports and suggestions about its interface and internal organization. It's just a first public version, so we can still change everything here!

In particular, this wiki page is the official library documentation. Please continue to improve it and add more information about using the library. Feel free to ask me about library usage via email: Bulat.Ziganshin@gmail.com

Changelog

User-visible improvements made in the Streams library since version 0.1 (6 Feb 2006)

0.1a (6 Feb 2006)
- Fixed bug: System.MMFile was uncompilable on non-Windows systems
0.1b (9 Feb 2006)
- Fixed bug: very slow WithLocking.vGetLine
- Fixed bug: System.FD was also uncompilable on non-Windows systems
0.1c (12 Feb 2006)
- Fixed bug: System.FD modified one more time to reach Unix compatibility
0.1d (13 Feb 2006)
- Fixed bug: BufferedBlockStream.vGetLine caused exception
* CharEncoding transformer was made faster, but vSetEncoding is no longer supported
0.1e (8 Jun 2006)
- Fixed bug: "openFD name WriteMode" didn't truncate files on unixes
* Full library now released under BSD3 license, thanks to John Goerzen
+ Now cabalized, thanks to Jeremy Shaw
0.1.6 (Oct 14 2006)
* Added compatibility with just released GHC 6.6
0.1.7 (Nov 24 2006)
* true support for GHC 6.6
* support of files larger than 4 GB on Windows (see FD5gb.hs example)
* files are now open in shared mode on all systems
* haddock'ized internal docs
* ready to be included in any unix packaging system