Dealing with binary data

From HaskellWiki
Revision as of 20:18, 28 January 2008 by AdamLangley (talk | contribs) (Incremental saving: page not ready)

Handling Binary Data with Haskell

Many programming problems call for the use of binary formats for compactness, ease-of-use, compatibility or speed. This page quickly covers some common libraries for handling binary data in Haskell.

ByteStrings

Everything else in this tutorial is based on bytestrings. The normal Haskell String type is a linked list of 32-bit characters. This has a number of useful properties, such as coverage of the Unicode space and laziness. However, when it comes to byte-wise data, a String involves a space inflation of about 24x and a large reduction in speed.

Bytestrings are packed arrays of bytes (8-bit chars). If you have experience with C, their memory representation is the same as a uint8_t[], except that bytestrings know their length and do not allow overflows.
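As a minimal sketch of the points above, the following packs a few bytes into a strict bytestring, asks for its length, and looks up elements with a bounds check. The names sample and safeIndex are hypothetical and exist only for this illustration; B.pack, B.unpack, B.length and B.index come from Data.ByteString.

```haskell
module Main where

import qualified Data.ByteString as B
import Data.Word (Word8)

-- Five raw bytes packed into one contiguous strict ByteString.
sample :: B.ByteString
sample = B.pack [0xDE, 0xAD, 0xBE, 0xEF, 0x00]

-- Hypothetical helper for this sketch: a lookup that returns Nothing
-- rather than reading past the end, relying on the stored length.
safeIndex :: B.ByteString -> Int -> Maybe Word8
safeIndex bs i
  | i >= 0 && i < B.length bs = Just (B.index bs i)
  | otherwise                 = Nothing

main :: IO ()
main = do
  print (B.length sample)      -- the length is known without scanning
  print (B.unpack sample)      -- back to an ordinary list of Word8
  print (safeIndex sample 1)   -- in range
  print (safeIndex sample 99)  -- out of range, so no overflow
```

Unlike a C array, there is no terminator to scan for: a zero byte is ordinary data, and the length travels with the value.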

There are two major flavours of bytestrings: strict and lazy. Strict bytestrings are exactly what you would expect, a linear array of bytes in memory. Lazy bytestrings are a list of strict bytestrings; in other languages this structure is often called a cord. When reading a lazy bytestring from a file, the data is read chunk by chunk, so the file can be larger than the size of memory. The default chunk size is currently 32K.
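The relationship between the two flavours can be sketched as follows. The example builds a lazy bytestring from three strict chunks with L.fromChunks, then flattens it back into one strict bytestring. The names strictParts, lazyWhole and strictWhole are made up for this illustration, and the chunks are deliberately tiny; real file reads use much larger chunks.

```haskell
module Main where

import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L

-- Three separate strict bytestrings, each a contiguous array.
strictParts :: [B.ByteString]
strictParts = [B.pack [1, 2, 3], B.pack [4, 5], B.pack [6]]

-- A lazy bytestring is a list of strict chunks.
lazyWhole :: L.ByteString
lazyWhole = L.fromChunks strictParts

-- Flattening copies every chunk into a single strict bytestring,
-- which forces all of the data into memory at once.
strictWhole :: B.ByteString
strictWhole = B.concat (L.toChunks lazyWhole)

main :: IO ()
main = do
  print (length (L.toChunks lazyWhole))  -- number of chunks
  print (L.length lazyWhole)             -- total bytes across chunks
  print (B.unpack strictWhole)           -- the same bytes, now contiguous
```

Flattening is convenient for small data, but it gives up the main advantage of the lazy flavour, which is never needing the whole input in memory.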

Within each flavour of bytestring there are Word8 and Char8 versions. These are mostly an aid to the type system, since the elements are fundamentally the same size. The Word8 version unpacks as a list of Word8 elements (bytes); the Char8 version unpacks as a list of Char, which may be useful if you want to convert them to Strings.
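A small sketch of the two views, using the strict flavour: the same value is built through the Char8 interface and then unpacked through both interfaces. The name greeting is hypothetical. Note that the Char8 functions only keep the low 8 bits of each Char, so this view is only safe for text that fits in a single byte per character.

```haskell
module Main where

import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C

-- Built through the Char8 interface, but the type is the ordinary
-- strict ByteString; both modules share one representation.
greeting :: B.ByteString
greeting = C.pack "Hi!"

main :: IO ()
main = do
  print (B.unpack greeting)     -- Word8 view: a list of byte values
  putStrLn (C.unpack greeting)  -- Char8 view: an ordinary String
```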

You might want to open the documentation for strict bytestrings and lazy bytestrings in another tab so that you can follow along.

Simple file IO

Here's a very simple program which copies a file from standard input to standard output:

module Main where

import qualified Data.ByteString as B
import System.IO (stdin, stdout)

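main :: IO ()
main = do
  -- Read all of standard input into a single strict ByteString
  contents <- B.hGetContents stdin
  -- Write the bytes back out unchanged
  B.hPut stdout contents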