Difference between revisions of "Library/AltBinary"

From HaskellWiki
Jump to navigation Jump to search
(added old unfinished docs)
m (added <haskell> to code fragments)
Line 1: Line 1:
AltBinary is a part of Streams library, what implements binary I/O
+
AltBinary is a part of [http://haskell.org/haskellwiki/Library/Streams Streams] library, what implements binary I/O
 
and serialization faсilities. Just a list of features implemented in this lib:
 
and serialization faсilities. Just a list of features implemented in this lib:
   
Line 45: Line 45:
 
big-endian representations:
 
big-endian representations:
   
  +
<haskell>
 
h <- openBinaryFD "test" WriteMode
 
h <- openBinaryFD "test" WriteMode
 
vPutByte h (1::Int)
 
vPutByte h (1::Int)
Line 59: Line 60:
 
vClose h
 
vClose h
 
print (a::Int, b::Int, c::Int, d::Int)
 
print (a::Int, b::Int, c::Int, d::Int)
  +
</haskell>
   
 
All these operations work fine not only with Int, but with any
 
All these operations work fine not only with Int, but with any
Line 72: Line 74:
 
each other:
 
each other:
   
- Byte I/O (vGetByte and vPutByte)
+
* Byte I/O (vGetByte and vPutByte)
- Integral values I/O (getWordXX and putWordXX)
+
* Integral values I/O (getWordXX and putWordXX)
- Data structures I/O (over 100 operations :) )
+
* Data structures I/O (over 100 operations :) )
- Serialization API (get and put_)
+
* Serialization API (get and put_)
   
 
We will study them all sequentially, starting from the lowest level.
 
We will study them all sequentially, starting from the lowest level.
Line 86: Line 88:
 
types:
 
types:
   
  +
<haskell>
 
vGetByte :: (Stream m h, Enum a) => h -> m a
 
vGetByte :: (Stream m h, Enum a) => h -> m a
 
vPutByte :: (Stream m h, Enum a) => h -> a -> m ()
 
vPutByte :: (Stream m h, Enum a) => h -> a -> m ()
  +
</haskell>
   
 
This allows to read/write any integral and enumeration values without
 
This allows to read/write any integral and enumeration values without
Line 97: Line 101:
 
binary data. You can freely mix byte and text I/O on one Stream:
 
binary data. You can freely mix byte and text I/O on one Stream:
   
  +
<haskell>
 
main = do vPutByte stdout (1::Int)
 
main = do vPutByte stdout (1::Int)
 
vPutStrLn stdout "text"
 
vPutStrLn stdout "text"
 
vPutBuf stdout buf bufsize
 
vPutBuf stdout buf bufsize
  +
</haskell>
   
   
Line 106: Line 112:
 
The core of this API is two generalized operations:
 
The core of this API is two generalized operations:
   
  +
<haskell>
 
getBits bits h
 
getBits bits h
 
putBits bits h value
 
putBits bits h value
  +
</haskell>
   
 
`getBits` reads certain number of bits from given BinaryStream and
 
`getBits` reads certain number of bits from given BinaryStream and
Line 123: Line 131:
 
4 methods to open BinaryStream on top of plain Stream:
 
4 methods to open BinaryStream on top of plain Stream:
   
  +
<haskell>
 
binaryStream <- openByteAligned stream -- big-endian
 
binaryStream <- openByteAligned stream -- big-endian
 
binaryStream <- openByteAlignedLE stream -- little-endian
 
binaryStream <- openByteAlignedLE stream -- little-endian
 
binaryStream <- openBitAligned stream -- big-endian
 
binaryStream <- openBitAligned stream -- big-endian
 
binaryStream <- openBitAlignedLE stream -- little-endian
 
binaryStream <- openBitAlignedLE stream -- little-endian
  +
</haskell>
   
 
Moreover, to simplify your work, Stream by itself can also be used as
 
Moreover, to simplify your work, Stream by itself can also be used as
 
BinaryStream - in this case byte-aligned big-endian representation used.
 
BinaryStream - in this case byte-aligned big-endian representation used.
 
So, you can write, for example:
 
So, you can write, for example:
  +
<haskell>
 
putBits 16 stdout (0::Int)
 
putBits 16 stdout (0::Int)
  +
</haskell>
 
or
 
or
  +
<haskell>
 
bh <- openByteAlignedLE stdout
 
bh <- openByteAlignedLE stdout
 
putBits 16 bh (0::Int)
 
putBits 16 bh (0::Int)
  +
</haskell>
   
 
There is also operation `flushBits h` what aligns BinaryStream on the
 
There is also operation `flushBits h` what aligns BinaryStream on the
Line 143: Line 157:
 
There are also "shortcut" operations what read/write some number of bits:
 
There are also "shortcut" operations what read/write some number of bits:
   
  +
<haskell>
 
getBit h
 
getBit h
 
getWord8 h
 
getWord8 h
Line 153: Line 168:
 
putWord32 h value
 
putWord32 h value
 
putWord64 h value
 
putWord64 h value
  +
</haskell>
   
 
Although these operations seems like just shortcuts for partial
 
Although these operations seems like just shortcuts for partial
Line 162: Line 178:
 
don't forget to make `flushBits` after bit-aligned chunks of I/O:
 
don't forget to make `flushBits` after bit-aligned chunks of I/O:
   
  +
<haskell>
 
main = do putWord32 stdout (1::Int) -- byte-aligned big-endian
 
main = do putWord32 stdout (1::Int) -- byte-aligned big-endian
   
Line 179: Line 196:
 
putBits 15 stdoutBits (1::Int) -- bit-aligned big-endian
 
putBits 15 stdoutBits (1::Int) -- bit-aligned big-endian
 
flushBits stdoutBit
 
flushBits stdoutBit
  +
</haskell>
   
 
When you request to write, say, 15 bits to byte-aligned BinaryStream,
 
When you request to write, say, 15 bits to byte-aligned BinaryStream,
Line 189: Line 207:
 
allow to intermix little-endian and big-endian I/O:
 
allow to intermix little-endian and big-endian I/O:
   
  +
<haskell>
 
getWord16le h
 
getWord16le h
 
getWord32le h
 
getWord32le h
Line 201: Line 220:
 
putWord32be h value
 
putWord32be h value
 
putWord64be h value
 
putWord64be h value
  +
</haskell>
   
 
For example, you can write:
 
For example, you can write:
   
  +
<haskell>
 
main = do putWord32le stdout (1::Int) -- byte-aligned little-endian
 
main = do putWord32le stdout (1::Int) -- byte-aligned little-endian
 
putWord16be stdout (1::Int) -- byte-aligned big-endian
 
putWord16be stdout (1::Int) -- byte-aligned big-endian
  +
</haskell>
   
 
Please note that `h` in these operations is a Stream, not
 
Please note that `h` in these operations is a Stream, not
Line 215: Line 237:
 
automatically perform `flushBits` at the finish:
 
automatically perform `flushBits` at the finish:
   
  +
<haskell>
 
withBitAlignedLE stdout $ \h -> do
 
withBitAlignedLE stdout $ \h -> do
 
putBit h (1::Int) -- bit-aligned little-endian
 
putBit h (1::Int) -- bit-aligned little-endian
 
putBits 15 h (1::Int) -- bit-aligned little-endian
 
putBits 15 h (1::Int) -- bit-aligned little-endian
  +
</haskell>
   
 
I also should say that you can perform all the Stream operations on
 
I also should say that you can perform all the Stream operations on
Line 223: Line 247:
 
performing any I/O and seeking operations. For example:
 
performing any I/O and seeking operations. For example:
   
  +
<haskell>
 
h <- openBitAligned stdout
 
h <- openBitAligned stdout
 
vPutStr h "text"
 
vPutStr h "text"
Line 230: Line 255:
 
putWord16le h (1::Int) -- little-endian format will be used here despite
 
putWord16le h (1::Int) -- little-endian format will be used here despite
 
-- big-endiannes of the BinaryStream itself
 
-- big-endiannes of the BinaryStream itself
  +
</haskell>
   
   
Line 237: Line 263:
 
This part is a really small! :) There are just two operations:
 
This part is a really small! :) There are just two operations:
   
  +
<haskell>
 
get h
 
get h
 
put_ h a
 
put_ h a
  +
</haskell>
   
 
where `h` is a BinaryStream. These operations read and write binary
 
where `h` is a BinaryStream. These operations read and write binary
Line 255: Line 283:
 
"import Data.Binary" statement with either
 
"import Data.Binary" statement with either
   
  +
<haskell>
 
import Data.Binary.ByteAligned
 
import Data.Binary.ByteAligned
  +
</haskell>
 
or
 
or
  +
<haskell>
 
import Data.Binary.BitAligned
 
import Data.Binary.BitAligned
  +
</haskell>
   
 
depending on what type of access you need. in the first case
 
depending on what type of access you need. in the first case
Line 270: Line 302:
 
=== AltBinary interface ===
 
=== AltBinary interface ===
   
  +
<haskell>
 
let s = encode ("11",123::Int,[1..10::Int])
 
let s = encode ("11",123::Int,[1..10::Int])
 
print (decode s::(String,Int,[Int]))
 
print (decode s::(String,Int,[Int]))
  +
</haskell>
   
   
Line 280: Line 314:
 
but allows to use them directly on Handles and any other streams:
 
but allows to use them directly on Handles and any other streams:
   
  +
<haskell>
 
import Data.AltBinary
 
import Data.AltBinary
   
Line 289: Line 324:
 
x <- get h :: IO [Int]
 
x <- get h :: IO [Int]
 
print x
 
print x
  +
</haskell>
   
 
if you need bit-aligned serialization, use the `openBitAligned` stream
 
if you need bit-aligned serialization, use the `openBitAligned` stream
 
transformer:
 
transformer:
   
  +
<haskell>
 
h <- openBinaryFile "test" WriteMode
 
h <- openBinaryFile "test" WriteMode
 
>>= openBitAligned
 
>>= openBitAligned
Line 298: Line 335:
 
put_ h True
 
put_ h True
 
vClose h
 
vClose h
  +
</haskell>
   
 
of course, to read these data you also need to use `openBitAligned`:
 
of course, to read these data you also need to use `openBitAligned`:
   
  +
<haskell>
 
h <- openBinaryFile "test" ReadMode
 
h <- openBinaryFile "test" ReadMode
 
>>= openBitAligned
 
>>= openBitAligned
Line 306: Line 345:
 
y <- get h :: IO Bool
 
y <- get h :: IO Bool
 
print (x,y)
 
print (x,y)
  +
</haskell>
   
 
The above code writes data in big-endian format, if you need to use
 
The above code writes data in big-endian format, if you need to use
 
low-endian formats, use the following transformers:
 
low-endian formats, use the following transformers:
   
  +
<haskell>
 
h <- openBinaryFile "test" WriteMode
 
h <- openBinaryFile "test" WriteMode
 
>>= openByteAlignedLE
 
>>= openByteAlignedLE
  +
</haskell>
 
and
 
and
  +
<haskell>
 
h <- openBinaryFile "test" WriteMode
 
h <- openBinaryFile "test" WriteMode
 
>>= openBitAlignedLE
 
>>= openBitAlignedLE
  +
</haskell>
   
 
for the byte-aligned and bit-aligned access, respectively.
 
for the byte-aligned and bit-aligned access, respectively.
Line 322: Line 366:
 
bit-aligned I/O:
 
bit-aligned I/O:
   
  +
<haskell>
 
h <- openBinaryFile "test" WriteMode
 
h <- openBinaryFile "test" WriteMode
 
>>= openBitAligned
 
>>= openBitAligned
Line 328: Line 373:
 
vPutStr h "string"
 
vPutStr h "string"
 
vClose h
 
vClose h
  +
</haskell>
   
 
it's also possible to use different types of binary atreams on top of
 
it's also possible to use different types of binary atreams on top of
 
one Stream:
 
one Stream:
   
  +
<haskell>
 
h <- openBinaryFile "test" WriteMode
 
h <- openBinaryFile "test" WriteMode
 
bh <- openBitAligned h
 
bh <- openBitAligned h
Line 339: Line 386:
 
vPutStr bh "string"
 
vPutStr bh "string"
 
vClose h
 
vClose h
  +
</haskell>
   
 
... if you will ever need this :)
 
... if you will ever need this :)
Line 351: Line 399:
 
specified number of bits:
 
specified number of bits:
   
  +
<haskell>
 
putBits 32 h (123::Int)
 
putBits 32 h (123::Int)
 
x <- getBits 32 h :: IO Int
 
x <- getBits 32 h :: IO Int
  +
</haskell>
   
 
if you call on byte-aligned stream putBits with number of bits, what
 
if you call on byte-aligned stream putBits with number of bits, what
Line 363: Line 413:
 
definition:
 
definition:
   
  +
<haskell>
 
instance Binary Bool where
 
instance Binary Bool where
 
put_ h x = putBit h $! (fromEnum x)
 
put_ h x = putBit h $! (fromEnum x)
 
get h = do x <- getBit h; return $! (toEnum x)
 
get h = do x <- getBit h; return $! (toEnum x)
  +
</haskell>
   
 
allows to encode Bool values with just one bit in bit-aligned streams,
 
allows to encode Bool values with just one bit in bit-aligned streams,
Line 371: Line 423:
 
code for Maybe types uses Bool values:
 
code for Maybe types uses Bool values:
   
  +
<haskell>
 
instance Binary a => Binary (Maybe a) where
 
instance Binary a => Binary (Maybe a) where
 
put_ bh (Just a) = do put_ bh True; put_ bh a
 
put_ bh (Just a) = do put_ bh True; put_ bh a
Line 377: Line 430:
 
if flag then do a <- get bh; return (Just a)
 
if flag then do a <- get bh; return (Just a)
 
else return Nothing
 
else return Nothing
  +
</haskell>
   
 
as a result, representation of `Maybe a` uses just one more bit than
 
as a result, representation of `Maybe a` uses just one more bit than
Line 395: Line 449:
 
example, encode Int as 8-bit value:
 
example, encode Int as 8-bit value:
   
  +
<haskell>
 
putWord8 h (length "test")
 
putWord8 h (length "test")
  +
</haskell>
   
 
these fixed-bits routines used in definitions of Binary instances for
 
these fixed-bits routines used in definitions of Binary instances for
Line 404: Line 460:
 
instead of get/put_:
 
instead of get/put_:
   
  +
<haskell>
 
putWord16 h (1::Int)
 
putWord16 h (1::Int)
 
putWord32 h (2::Word)
 
putWord32 h (2::Word)
 
putWord64 h (3::Integer)
 
putWord64 h (3::Integer)
  +
</haskell>
   
 
the same rule applies if you need to write fixed-size value with
 
the same rule applies if you need to write fixed-size value with
 
non-default number of bits:
 
non-default number of bits:
   
  +
<haskell>
 
putWord8 h (4::Int32)
 
putWord8 h (4::Int32)
  +
</haskell>
   
 
functions putWord16..putWord64 uses big-endian representation, also
 
functions putWord16..putWord64 uses big-endian representation, also
Line 428: Line 488:
 
values in given range [min..max]:
 
values in given range [min..max]:
   
  +
<haskell>
 
putBounded min max h x
 
putBounded min max h x
 
x <- getBounded min max h
 
x <- getBounded min max h
  +
</haskell>
   
 
they also support values of any Integral type. These functions are used
 
they also support values of any Integral type. These functions are used
Line 436: Line 498:
 
you can declare:
 
you can declare:
   
  +
<haskell>
 
data Color = Red | Green | Blue deriving (Bounded, Enum)
 
data Color = Red | Green | Blue deriving (Bounded, Enum)
  +
</haskell>
   
 
and now you can use get/put_ on Colors; Color values would be encoded
 
and now you can use get/put_ on Colors; Color values would be encoded
Line 490: Line 554:
 
surrounding code) to specify its type explicitly, say:
 
surrounding code) to specify its type explicitly, say:
   
  +
<haskell>
 
arr <- get h :: IO (Array Int Int32)
 
arr <- get h :: IO (Array Int Int32)
  +
</haskell>
   
   
Line 497: Line 563:
 
(de)serialization routines. first, there are routines
 
(de)serialization routines. first, there are routines
   
  +
<haskell>
 
putIArray h arr
 
putIArray h arr
 
putMArray h arr
 
putMArray h arr
  +
</haskell>
   
 
what can be used to write to the Stream any array that is instance of
 
what can be used to write to the Stream any array that is instance of
Line 507: Line 575:
 
to explicitly pass them bounds of array read:
 
to explicitly pass them bounds of array read:
   
  +
<haskell>
 
arr <- getIArray h bounds
 
arr <- getIArray h bounds
 
arr <- getMArray h bounds
 
arr <- getMArray h bounds
  +
</haskell>
   
note that this operations is not full analogues of put_/get ones, which
+
note that these operations are not full analogues of put_/get ones,
are write and read array bounds automatically. these operations are
+
which writes and reads array bounds automatically. these operations are
more low-level - they are read/write only the array elements. also
+
more low-level - they reads/writes only the array elements. also note that,
note that just like `get` operation, you may need to specify type of
+
just like the `get` operation, you may need to specify type of
 
the array read:
 
the array read:
   
  +
<haskell>
 
arr <- getIArray h (0,9) :: IO (Array Int Int32)
 
arr <- getIArray h (0,9) :: IO (Array Int Int32)
  +
</haskell>
   
   
Line 525: Line 597:
 
as the first argument:
 
as the first argument:
   
  +
<haskell>
 
putIArrayWith putUnsigned h arr
 
putIArrayWith putUnsigned h arr
 
putMArrayWith (putBits 15) h arr
 
putMArrayWith (putBits 15) h arr
 
arr <- getIArrayWith getWord8 h bounds
 
arr <- getIArrayWith getWord8 h bounds
 
arr <- getMArrayWith (getBounded 1 5) h bounds
 
arr <- getMArrayWith (getBounded 1 5) h bounds
  +
</haskell>
   
 
of course, you can also provide your own read/write procedures, if
 
of course, you can also provide your own read/write procedures, if
Line 539: Line 613:
 
`With`:
 
`With`:
   
  +
<haskell>
 
arr <- getIArrayN h 10 :: IO (Array Int Int32)
 
arr <- getIArrayN h 10 :: IO (Array Int Int32)
 
arr <- getMArrayNWith getWord32 h 10 :: IO (IOArray Int Int)
 
arr <- getMArrayNWith getWord32 h 10 :: IO (IOArray Int Int)
  +
</haskell>
   
 
these operations in some way dubs the similar list procedures
 
these operations in some way dubs the similar list procedures
Line 551: Line 627:
 
specify array types in `get` operations, say instead of:
 
specify array types in `get` operations, say instead of:
   
  +
<haskell>
 
arr <- getIArrayN h 10 :: IO (Array Int Int32)
 
arr <- getIArrayN h 10 :: IO (Array Int Int32)
  +
</haskell>
   
 
one can write
 
one can write
   
  +
<haskell>
 
arr <- getArrayN h 10
 
arr <- getArrayN h 10
  +
</haskell>
   
 
it is nothing more than handy shortcuts. the only exclusion is
 
it is nothing more than handy shortcuts. the only exclusion is
Line 573: Line 653:
   
 
==== Defining Binary instances for custom serialization formats (unwritten) ====
 
==== Defining Binary instances for custom serialization formats (unwritten) ====
  +
  +
----

Revision as of 07:28, 12 July 2006

AltBinary is a part of Streams library, what implements binary I/O and serialization faсilities. Just a list of features implemented in this lib:

  • class-based AltBinary interface plus emulation of NewBinary library interface
  • compatibility with Hugs and GHC, with GHC-specific speed optimizations
  • (de)serialization speed of 20-60 mb/sec on 1GHz CPU
  • free intermixing of text and binary i/o on the same Stream
  • support for byte-aligned and bit-aligned, low-endian and big-endian serialization using the same interface
  • data files written are CPU-independent, f.e. you can serialize data on 32-bit low-endian CPU and then read it back on 64-bit big-endian one
  • classical Binary class with "get" and "put_" functions defines default representation for each type
  • get/put_ uses fixed-size encoding for Int8..Word64, but variable-length encoding for Int/Word/Integer (including encoding of array bounds and list lengths)
  • any integral value can be saved with explicitly specified size using functions "putBits bh bits value" and "getBits bh bits" (their shortcuts putBit/putWord8...putWord64/getBit/... is also supported for all integral types)
  • get/put_ uses UTF8 encoding for strings/chars
  • Binary class instances (i.e. get/put_ implementation) for Bounded Enum, Storable, arrays, maps and sets
  • lots of alternative representations for Strings, lists and arrays, including ability to use user-supported function to read/write each list/array element, such as "putMArrayWith (putBits 15) h arr". for example, "putString0With putLatin1Char h s" implements ASCIIZ string encoding
  • Template Haskell can used to automatically derive new Binary instances
  • ability to serialize data to any Stream, including Handle, raw files, memory-mapped files, memory and string buffers, pipes, network sockets and so on
  • after all, it can work in any monad. the only thing required for it's work is something supporting ByteStream interface, i.e. implementing vGetByte and vPutByte operations in some monad

... and i still don't mentioned some features such as (encode :: a->String) and decode functions. i think that i implemented in this lib everything that anyone (except for Einar ;)) ever imagined :)


I still can't finish documentation for AltBinary, partially because it contains so many features!

Below is results of my 3 attempts to do it. This can't be considered as real docs, it's more like some sketches and ideas about how library can be used. Sorry


First attempt: ByteStream

ByteStream interface provides primitive byte I/O operations - vGetByte and vPutByte. On top of this, all binary I/O and serialization facilities are built. As the result, you can use these facilities on any ByteStream, ranging from the file to string buffer.

Please note that all binary I/O operations come in pairs with names putXXX and getXXX. I will introduce only "putXXX" operations, you should just know that each operation has its "getXXX" twin.

First, most ground-close layer includes operations putWord16le, putWord32le, putWord64le and putWord16be, putWord32be, putWord64be. Together with the vPutByte operation they allow to write low-endian and big-endian values of any size and even mix low-endian and big-endian representations:

  h <- openBinaryFD "test" WriteMode
  vPutByte h (1::Int)
  putWord16le h (2::Int)
  putWord32be h (3::Int)
  putWord64le h (4::Int)
  vClose h

  h <- openBinaryFD "test" ReadMode
  a <- vGetByte h
  b <- getWord16le h
  c <- getWord32be h
  d <- getWord64le h
  vClose h
  print (a::Int, b::Int, c::Int, d::Int)

All these operations work fine not only with Int, but with any integral type (Word, Integer, Int8, Word64 and so on). This allows you to read/write any integral value without explicit type conversion. On the other side, if you write literal constants using these functions, you will need to its type (say, Int) explicitly.


Second attempt

In AltBinary library there are 4 methods of binary I/O builded on top of each other:

  • Byte I/O (vGetByte and vPutByte)
  • Integral values I/O (getWordXX and putWordXX)
  • Data structures I/O (over 100 operations :) )
  • Serialization API (get and put_)

We will study them all sequentially, starting from the lowest level.

Byte I/O

Lowest level, the byte I/O, isn't differ significantly from the Char I/O. All Streams support vGetByte and vPutByte operations, either directly or via buffering transformer. These operations has rather generalized types:

vGetByte :: (Stream m h, Enum a) => h -> m a
vPutByte :: (Stream m h, Enum a) => h -> a -> m ()

This allows to read/write any integral and enumeration values without additional type conversions (of course, these values should belong to the 0..255 range)

Together with other Stream operations, such as vIsEOF, vTell/vSeek, vGetBuf/vPutBuf, this allows to write any programs that operate upon binary data. You can freely mix byte and text I/O on one Stream:

main = do vPutByte stdout (1::Int)
          vPutStrLn stdout "text"
          vPutBuf stdout buf bufsize


Integral values / bit sequences I/O

The core of this API is two generalized operations:

getBits bits h
putBits bits h value

`getBits` reads certain number of bits from given BinaryStream and returns it as value of any integral type (Int, Word8, Integer and so on). `putBits` writes given value as a certain number of bits. The `value`, again, may be of any integral type.

These two operations can be implemented in one of 4 ways, depending on the answers on two questions: - whether integral values written as big- or little-endian? - whether values written are bit-aligned or byte-aligned?

The library allows you to select any answers on these questions. The `h` parameter in this operation represents BinaryStream and there are 4 methods to open BinaryStream on top of plain Stream:

binaryStream <- openByteAligned stream      -- big-endian
binaryStream <- openByteAlignedLE stream    -- little-endian
binaryStream <- openBitAligned stream       -- big-endian
binaryStream <- openBitAlignedLE stream     -- little-endian

Moreover, to simplify your work, Stream by itself can also be used as BinaryStream - in this case byte-aligned big-endian representation used. So, you can write, for example:

   putBits 16 stdout (0::Int)

or

   bh <- openByteAlignedLE stdout
   putBits 16 bh (0::Int)

There is also operation `flushBits h` what aligns BinaryStream on the byte boundary. It fills the rest of pyte with zero bits on output and skip the rest of bits in current bytes on input. Of course, this operation does nothing on byte-aligned BinaryStreams.

There are also "shortcut" operations what read/write some number of bits:

getBit h
getWord8 h
getWord16 h
getWord32 h
getWord64 h
putBit h value
putWord8 h value
putWord16 h value
putWord32 h value
putWord64 h value

Although these operations seems like just shortcuts for partial application of getBits/putBits, they are works somewhat faster. In contrast to other binary I/O libraries, each of these operations can accept/return values of any integral type.

You can freely mix text I/O, byte I/O and bits I/O as long as you don't forget to make `flushBits` after bit-aligned chunks of I/O:

main = do putWord32 stdout (1::Int)  -- byte-aligned big-endian

          stdoutLE <- openByteAlignedLE stdout
          putWord32  stdoutLE (1::Int)  -- byte-aligned little-endian
          putBits 15 stdoutLE (1::Int)  -- byte-aligned little-endian

          stdoutBitsLE <- openBitAlignedLE stdout
          putBit     stdoutBitsLE (1::Int)  -- bit-aligned little-endian
          putBits 15 stdoutBitsLE (1::Int)  -- bit-aligned little-endian
          flushBits stdoutBitsLE

          vPutStrLn stdout "text"

          stdoutBits <- openBitAligned stdout
          putBit     stdoutBits (1::Int)  -- bit-aligned big-endian
          putBits 15 stdoutBits (1::Int)  -- bit-aligned big-endian
          flushBits stdoutBit

When you request to write, say, 15 bits to byte-aligned BinaryStream, the whole number of bytes are written. In particular, each `putBit` operation on byte-aligned BinaryStream writes the whole byte to the stream while the same operation on bit-aligned streams fills one bit at a time.

But that is not yet the whole story! There are also operations that allow to intermix little-endian and big-endian I/O:

getWord16le h
getWord32le h
getWord64le h
putWord16le h value
putWord32le h value
putWord64le h value
getWord16be h
getWord32be h
getWord64be h
putWord16be h value
putWord32be h value
putWord64be h value

For example, you can write:

main = do putWord32le stdout (1::Int)  -- byte-aligned little-endian
          putWord16be stdout (1::Int)  -- byte-aligned big-endian

Please note that `h` in these operations is a Stream, not BinaryStream. Actually, these operations just perform several fixed vGetByte or vPutByte operations and, strictly speaking, they should be noted in previous section.

There are also combinator versions of `open*` operations, that automatically perform `flushBits` at the finish:

    withBitAlignedLE stdout $ \h -> do
        putBit     h (1::Int)  -- bit-aligned little-endian
        putBits 15 h (1::Int)  -- bit-aligned little-endian

I also should say that you can perform all the Stream operations on any BinaryStream, and bit-aligned streams will flush themselves before performing any I/O and seeking operations. For example:

    h <- openBitAligned stdout
    vPutStr h "text"
    putBit h (1::Int)
    vPutByte h (1::Int)     -- `flushBits` will be automatically
                            --   called before this operation
    putWord16le h (1::Int)  -- little-endian format will be used here despite
                            --   big-endiannes of the BinaryStream itself


Serialization API

This part is a really small! :) There are just two operations:

get h
put_ h a

where `h` is a BinaryStream. These operations read and write binary representation of any value belonging to the class Binary.



Third attempt

Emulation of Binary interface

This library implements 2 interfaces: Binary and AltBinary. First interface allows to use this library as drop-in replacement for the well-known Binary and NewBinary libs. all you need to do is to replace "import Data.Binary" statement with either

 import Data.Binary.ByteAligned

or

 import Data.Binary.BitAligned

depending on what type of access you need. in the first case representation of any data value will be written/read as the whole number of bytes, in the second case data values may cross byte boundaries and, for example, Bools will be packed 8 values per byte. please draw attention that despite interface emulation this library and original Binary lib use different representations for most of the data types


AltBinary interface

          let s = encode ("11",123::Int,[1..10::Int])
          print (decode s::(String,Int,[Int]))


Types of binary streams: bit/byte-aligned, low/big-endian

AltBinary is "native" interface of this library to (de)serialize data. it provides the same operations `get` and `put_` to read/write data, but allows to use them directly on Handles and any other streams:

 import Data.AltBinary

 h <- openBinaryFile "test" WriteMode
 put_ h [1..100::Int]
 hClose h

 h <- openBinaryFile "test" ReadMode
 x <- get h :: IO [Int]
 print x

if you need bit-aligned serialization, use the `openBitAligned` stream transformer:

 h <- openBinaryFile "test" WriteMode
        >>= openBitAligned
 put_ h "string"
 put_ h True
 vClose h

of course, to read these data you also need to use `openBitAligned`:

 h <- openBinaryFile "test" ReadMode
        >>= openBitAligned
 x <- get h :: IO String
 y <- get h :: IO Bool
 print (x,y)

The above code writes data in big-endian format, if you need to use low-endian formats, use the following transformers:

 h <- openBinaryFile "test" WriteMode
        >>= openByteAlignedLE

and

 h <- openBinaryFile "test" WriteMode
        >>= openBitAlignedLE

for the byte-aligned and bit-aligned access, respectively.

You can also mix the binary and text i/o at the same stream, with only one requirement: use "flushBits h" after you used stream for some bit-aligned I/O:

 h <- openBinaryFile "test" WriteMode
        >>= openBitAligned
 put_ h True
 flushBits h
 vPutStr h "string"
 vClose h

it's also possible to use different types of binary atreams on top of one Stream:

 h <- openBinaryFile "test" WriteMode
 bh <- openBitAligned h
 put_ bh True
 flushBits bh
 bh <- openByteAlignedLE h
 vPutStr bh "string"
 vClose h

... if you will ever need this :)


getBits/putBits; Binary instances for Bool, Maybe, Either

`get` and `put_` operations are just enough if you need only to save some values in Stream and then restore them. but to assemble/parse data in some particular format, you will need some more low-level functions, such as `getBits` and `putBits`, which transfers just the specified number of bits:

 putBits 32 h (123::Int)
 x <- getBits 32 h :: IO Int

if you call on byte-aligned stream putBits with number of bits, what is not divisible by 8, the whole number of bytes are occupied. in particular, putBit on byte-aligned streams occupies entire byte

this makes possible to use the same (de)serialization code and in particular the same definitions of Binary instances both for byte-aligned and bit-aligned streams! for example, the following definition:

 instance Binary Bool where
     put_ h x = putBit h $! (fromEnum x)
     get  h   = do x <- getBit h; return $! (toEnum x)

allows to encode Bool values with just one bit in bit-aligned streams, but uses the whole byte in byte-aligned ones. further, serialization code for Maybe types uses Bool values:

 instance Binary a => Binary (Maybe a) where
     put_ bh (Just a) = do put_ bh True; put_ bh a
     put_ bh Nothing  = do put_ bh False
     get  bh          = do flag <- get bh
                           if flag then do a <- get bh; return (Just a)
                                   else return Nothing

as a result, representation of `Maybe a` uses just one more bit than representation of type `a` in bit-aligned streams, and whole extra byte otherwise. the same story is for Either types


getWord8..putWord64; Binary instances for Int8..Word64

most widespread uses of getBits/putBits is for 1/8/16/32/64 bits, and so there are specialized (and sometimes more efficient) versions of these functions, called putBit, putWord8...putWord64 (and of course their get... counterparts). please draw attention that all these functions accept arguments (or return values) of any Integral type (i.e. types what are instances of Integral class - Int, Integer, Word, Int8..Word64), so you don't need to convert types if you want, for example, encode Int as 8-bit value:

 putWord8 h (length "test")

these fixed-bits routines used in definitions of Binary instances for types with fixed sizes - Int8...Word64. types Int, Word and Integer by default uses variable-sized representation, which would be described later. if you need to read or write values of these types using fixed-size representation, use appropriate fixed-bits procedures instead of get/put_:

 putWord16 h (1::Int)
 putWord32 h (2::Word)
 putWord64 h (3::Integer)

the same rule applies if you need to write fixed-size value with non-default number of bits:

 putWord8 h (4::Int32)

functions putWord16..putWord64 uses big-endian representation, also known as network byte order - it is the order of bytes, used natively on PowerPC/Sparc processors. in this format, representation of value started fom most significant bytes. if you use bit-aligned stream, high bits of each byte are also filled first. if you need little-endian (native for Intel processors) formats, putWord16le..putWord64le is at your service


putBounded; Binary instances for Bounded Enum types

next pair of functions uses mininal possible number of bits to encode values in given range [min..max]:

 putBounded min max h x
 x <- getBounded min max h

they also support values of any Integral type. These functions are used to provide default Binary instances for all Bounded Enum types (i.e. types which support both Bounded and Enum interfaces). for example, you can declare:

 data Color = Red | Green | Blue  deriving (Bounded, Enum)

and now you can use get/put_ on Colors; Color values would be encoded using 2 bits in bit-aligned streams (of course, whole byte would be used in byte-aligned streams)


putUnsigned/putInteger/putLength; Binary instances for Int/Integer/Word

putUnsigned provides variable-sized encoding, what can be used to represent any non-negative Integral value using minimal possible number of bytes. it uses 7+1 encoding, i.e. 7 bits in each byte represents bits of actual value, and higher bit used to distinguish last byte in sequence. so, values in range 0..127 would be encoded using one byte, values in range 128..2^14-1 - using two bytes and so on

putInteger is about the same, but allows to encode also negative values, so -64..63 encoded with one byte, -2^13..2^13-1 - with two bytes...

putLength is synonym for putUnsigned, just used to represent lengths of various containers - strings, lists, arrays and so on

put_ uses putInteger to encode Int and Integer, and putUnsigned to encode Word; i don't used fixed-size representation for Int and Word because that will produce data incompatible between 32-bit and 64-bit platforms. i also don't use internal GHC's representation of Integer to speed up (de)serialization because that will produce data incompatible with other Haskell compilers. but if you need to (de)serialize large number of Integers quickly, you should use putGhcInteger/getGhcInteger procedures, described later. of course, this way your program will become compatible only with the GHC compiler.


Binary instances for Char and String (unwritten)

Lists support (unwritten)

Arrays support

This library supports (de)serialization for all array types, included in standard hierarchical libraries plus PArr arrays, supported only by GHC. Immutable array types can be (de)serialized to any Stream (just like lists); mutable arrays can be (de)serialized only in the corresponding monad (where this array can be read/modified), i.e. IOArray can be get/put only to Stream belonging to IO monad, STArray can be get/put only to Stream belonging to the same state monad. all that is done automatically, just use put_ or get operation on the corresponding array

if you read an array, you may need (or don't need, depending on the surrounding code) to specify its type explicitly, say:

 arr <- get h :: IO (Array Int Int32)


besides of automatic support for all array types in put_/get operations, there are also huge number of "low-level" array (de)serialization routines. first, there are routines

 putIArray h arr
 putMArray h arr

what can be used to write to the Stream any array that is instance of IArray or MArray class, correspondingly (the first class contains all immutable arrays: Array, UArray, DiffArray, DiffUArray; the second - all other, mutable arrays - IOArray, IOUArray, STArray, STUArray, StorableArray). corresponding operations to read these arrays require to explicitly pass them bounds of array read:

 arr <- getIArray h bounds
 arr <- getMArray h bounds

note that these operations are not full analogues of put_/get ones, which writes and reads array bounds automatically. these operations are more low-level - they reads/writes only the array elements. also note that, just like the `get` operation, you may need to specify type of the array read:

 arr <- getIArray h (0,9) :: IO (Array Int Int32)


second, you can read/write array elements with explicitly pointed (de)serialization procedure for array elements isstead of default ones, provided by the Binary class. to achive this, add `With` suffix to routine name and specify procedure to read or write array elements as the first argument:

 putIArrayWith putUnsigned  h arr
 putMArrayWith (putBits 15) h arr
 arr <- getIArrayWith getWord8 h bounds
 arr <- getMArrayWith (getBounded 1 5) h bounds

of course, you can also provide your own read/write procedures, if they have the same types as standard get/put_ functions.


there are also variants of all get operations, which uses `size` parameter instead of `bounds`, and creates arrays with bounds (0,size-1::Int). they have names with `N` at the end of of procedure name, but before `With`:

 arr <- getIArrayN h 10 :: IO (Array Int Int32)
 arr <- getMArrayNWith getWord32 h 10 :: IO (IOArray Int Int)

these operations in some way dubs the similar list procedures


at last, part of the `get` operations have versions, specialized to specific type constructors. for example, `getMArrayN` have `getIOArrayN` and `getIOUArrayN` variants which can read only the IOArray/IOUArray, accordingly. it's just a trick to avoid necessity to specify array types in `get` operations, say instead of:

 arr <- getIArrayN h 10 :: IO (Array Int Int32)

one can write

 arr <- getArrayN h 10

it is nothing more than handy shortcuts. the only exclusion is operations to read `UArray`, what is not specializations of corresponding `IArray` operations, but use some faster algorithm and work only in IO monad. if you need to read `UArray` in any other monad - please use general operations on the `IArray` instead (anyway the compiler will ensure proper use via the typechecking)


so far i don't say anything about specific operations for (de)serialization of parallel arrays (available only in GHC via the module GHC.PArr).


putGhcInteger and putGhcByteArray - GHC-specific routines (unwritten)

Defining Binary instances for custom serialization formats (unwritten)