Dealing with binary data
Handling Binary Data with Haskell
Many programming problems call for the use of binary formats for compactness, ease-of-use, compatibility or speed. This page quickly covers some common libraries for handling binary data in Haskell.
ByteStrings
Everything else in this tutorial will be based on bytestrings. Normal Haskell String types are linked lists of 32-bit characters. This has a number of useful properties, like coverage of the Unicode space and laziness. However, when it comes to dealing with byte-wise data, String involves a space inflation of about 24x and a large reduction in speed.
Bytestrings are packed arrays of bytes or 8-bit chars. If you have experience in C, their memory representation is the same as a uint8_t[], although bytestrings know their own length and do not allow buffer overflows.
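The following is a minimal sketch, assuming the bytestring package, of a strict bytestring built from raw byte values. The names bytes and safeIndex are hypothetical, chosen for illustration. Because a bytestring carries its length, an index can be checked before it is used. The library's own B.index raises an error on an out-of-range index instead of reading past the end of the buffer.

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)

-- Hypothetical example value: the ASCII bytes of "Hello".
bytes :: B.ByteString
bytes = B.pack [0x48, 0x65, 0x6C, 0x6C, 0x6F]

-- Hypothetical helper: a bounds-checked lookup that uses the
-- length stored alongside the bytes.
safeIndex :: B.ByteString -> Int -> Maybe Word8
safeIndex bs i
  | i >= 0 && i < B.length bs = Just (B.index bs i)
  | otherwise                 = Nothing

main :: IO ()
main = do
  print (B.length bytes)      -- 5
  print (safeIndex bytes 0)   -- Just 72
  print (safeIndex bytes 10)  -- Nothing
```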
There are two major flavours of bytestrings: strict and lazy. Strict bytestrings are exactly what you would expect, a linear array of bytes in memory. Lazy bytestrings are a list of strict bytestrings, often called a cord in other languages. When reading a lazy bytestring from a file, the data is read chunk by chunk, so the file can be larger than the available memory. The default chunk size is currently 32K.
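As a small sketch of the relationship between the two flavours, a lazy bytestring can be assembled from strict chunks and flattened back into one strict array. This assumes bytestring 0.10 or later, where Data.ByteString.Lazy provides toStrict. The name lazyBytes is hypothetical.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL

-- Hypothetical example value: a lazy bytestring made of two strict chunks.
lazyBytes :: BL.ByteString
lazyBytes = BL.fromChunks [B.pack [1, 2, 3], B.pack [4, 5]]

main :: IO ()
main = do
  -- The chunk structure is kept: two non-empty strict chunks.
  print (length (BL.toChunks lazyBytes))    -- 2
  -- The lazy length is an Int64 and counts bytes across all chunks.
  print (BL.length lazyBytes)               -- 5
  -- toStrict copies every chunk into one contiguous strict array.
  print (B.unpack (BL.toStrict lazyBytes))  -- [1,2,3,4,5]
```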
Within each flavour of bytestring come the Word8 and Char8 versions. These are mostly an aid to the type system: both share the same underlying type and element size, and differ only in how their functions present the elements. The Word8 version unpacks as a list of Word8 elements (bytes), while the Char8 version unpacks as a list of Char, which may be useful if you want to convert them to Strings.
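A minimal sketch of one value viewed through both interfaces, assuming the bytestring package. The name greeting is hypothetical. Note that Char8's pack keeps only the low 8 bits of each Char, so the Char8 interface is only suitable for ASCII or Latin-1 text, not arbitrary Unicode.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC

-- Hypothetical example value; Char8 and Word8 share one ByteString type.
greeting :: B.ByteString
greeting = BC.pack "hi"

main :: IO ()
main = do
  print (B.unpack greeting)        -- [104,105], via the Word8 interface
  putStrLn (BC.unpack greeting)    -- hi, via the Char8 interface
  -- U+03BB (955) is truncated to its low byte: 955 mod 256 = 187.
  print (B.unpack (BC.pack "\955"))  -- [187]
```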
You might want to open the documentation for strict bytestrings and lazy bytestrings in another tab so that you can follow along.
Simple file IO
Here's a very simple program which copies a file from standard input to standard output:
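    module Main where

    import qualified Data.ByteString as B
    import System.IO (stdin, stdout)

    -- Copy standard input to standard output as raw bytes.
    main :: IO ()
    main = do
      contents <- B.hGetContents stdin  -- reads all input into one strict ByteString
      B.hPut stdout contents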
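The strict program above holds the whole input in memory at once. A lazy variant can stream instead: the input is read chunk by chunk as the output is written, so memory use stays roughly constant even for large files. This is a sketch only: it takes file paths from the command line rather than using standard input, and the helper name copyFileLazy and the usage message are hypothetical. The source and destination must be different files, because the lazily read input stays open while the output is written.

```haskell
module Main where

import qualified Data.ByteString.Lazy as BL
import System.Environment (getArgs)
import System.IO (hPutStrLn, stderr)

-- Hypothetical helper: stream src into dst one chunk at a time.
-- Lazy readFile and writeFile both operate on files in binary mode.
copyFileLazy :: FilePath -> FilePath -> IO ()
copyFileLazy src dst = BL.readFile src >>= BL.writeFile dst

main :: IO ()
main = do
  args <- getArgs
  case args of
    [src, dst] -> copyFileLazy src dst
    _          -> hPutStrLn stderr "usage: copy SOURCE DEST"
```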