Difference between revisions of "Library/Data encoding"

From HaskellWiki
(Page moved to readme.md in source code on github)
 
[[Category:Libraries]]
 
Data Encodings (dataenc): A collection of data encoding algorithms.
 
 
== Data encodings library ==
 
 
The data encodings library strives to provide implementations in Haskell of every major data encoding, and a few minor ones as well. Currently the following encodings are implemented:
 
 
* Base16 (<hask>Codec.Binary.Base16</hask>)
 
* Base32 (<hask>Codec.Binary.Base32</hask>)
 
* Base32Hex (<hask>Codec.Binary.Base32Hex</hask>)
 
* Base64 (<hask>Codec.Binary.Base64</hask>)
 
* Base64Url (<hask>Codec.Binary.Base64Url</hask>)
 
* Uuencode (<hask>Codec.Binary.Uu</hask>)
 
 
== The API ==
 
 
=== Main API ===
 
 
The module <hask>Codec.Binary.DataEncoding</hask> provides a type that collects the functions for an individual encoding:
 
 
<haskell>
data DataCodec = DataCodec {
    encode :: [Word8] -> String,
    decode :: String -> [Word8],
    chop :: Int -> String -> [String],
    unchop :: [String] -> String
}
</haskell>
 
 
It also exposes a value of this type for each encoding:
 
 
<haskell>
base16 :: DataCodec
base32 :: DataCodec
base32Hex :: DataCodec
base64 :: DataCodec
base64Url :: DataCodec
uu :: DataCodec
</haskell>
 
 
=== Secondary API ===
 
 
Each individual encoding module is also exposed and offers four functions:
 
 
<haskell>
encode :: [Word8] -> String
decode :: String -> [Word8]
chop :: Int -> String -> [String]
unchop :: [String] -> String
</haskell>
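
The signatures of <hask>chop</hask> and <hask>unchop</hask> suggest splitting an encoded string into lines and joining such lines again. The following is only a minimal sketch of that idea, assuming plain fixed-width splitting; the names <hask>chopSketch</hask> and <hask>unchopSketch</hask> are hypothetical, and the library's own functions may treat line lengths differently for each encoding (uuencode in particular).

<haskell>
-- Hypothetical sketch, not the library's implementation.
-- Split an encoded string into lines of at most n characters.
chopSketch :: Int -> String -> [String]
chopSketch n s
  | n <= 0    = [s]            -- a non-positive width leaves the input unsplit
  | null s    = []
  | otherwise = let (line, rest) = splitAt n s
                in line : chopSketch n rest

-- Join the lines back into a single encoded string.
unchopSketch :: [String] -> String
unchopSketch = concat
</haskell>

With these definitions, <hask>unchopSketch (chopSketch n s)</hask> gives back <hask>s</hask> for any positive <hask>n</hask>.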
 
 
== Description of the encodings ==
 
 
=== Base16 ===
 
 
Implemented as it's defined in [http://tools.ietf.org/html/rfc4648 RFC 4648].
 
 
Each four-bit nibble of an octet is encoded as one character from the set 0-9, A-F, so every octet becomes two characters.
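
As an illustration of the nibble split, here is a minimal sketch using only <hask>base</hask>. The names <hask>encodeOctet</hask> and <hask>encode16</hask> are hypothetical and are not the functions exported by <hask>Codec.Binary.Base16</hask>.

<haskell>
import Data.Bits (shiftR, (.&.))
import Data.Char (intToDigit, toUpper)
import Data.Word (Word8)

-- Hypothetical sketch: encode one octet as two uppercase hex characters,
-- high nibble first.
encodeOctet :: Word8 -> String
encodeOctet w = [hexDigit (w `shiftR` 4), hexDigit (w .&. 0x0F)]
  where
    -- intToDigit yields lowercase a-f, so convert to the RFC 4648 uppercase set
    hexDigit = toUpper . intToDigit . fromIntegral

encode16 :: [Word8] -> String
encode16 = concatMap encodeOctet
</haskell>

For example, the octets of the ASCII string <tt>foo</tt> (0x66, 0x6F, 0x6F) encode to <tt>666F6F</tt>, matching the test vector in RFC 4648.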
 
 
=== Base32 ===
 
 
Implemented as it's defined in [http://tools.ietf.org/html/rfc4648 RFC 4648].
 
 
Each group of five octets (40 bits) is split into eight 5-bit values, that is, expanded into eight octets in which only the five least significant bits are used. Each value is then encoded as one character of a 32-character alphabet.
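
The regrouping of bits can be sketched as follows. This is an illustration only, not the code in <hask>Codec.Binary.Base32</hask>: it works on a list of bits for clarity rather than speed, and all names (<hask>encode32</hask>, <hask>alphabet32</hask>, <hask>octetBits</hask>, <hask>chunksOf</hask>, <hask>bitsToInt</hask>) are hypothetical. The alphabet and the <tt>=</tt> padding follow RFC 4648.

<haskell>
import Data.Bits (testBit)
import Data.Word (Word8)

-- Base32 alphabet from RFC 4648: A-Z followed by 2-7.
alphabet32 :: String
alphabet32 = ['A'..'Z'] ++ ['2'..'7']

-- Bits of an octet, most significant first.
octetBits :: Word8 -> [Bool]
octetBits w = [testBit w i | i <- [7, 6 .. 0]]

-- Split a list into chunks of n elements (the last chunk may be shorter).
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = let (h, t) = splitAt n xs in h : chunksOf n t

-- Read n bits as a number, padding a short chunk on the right with zero bits.
bitsToInt :: Int -> [Bool] -> Int
bitsToInt n bs = foldl (\acc b -> acc * 2 + fromEnum b) 0 (take n (bs ++ repeat False))

-- Hypothetical sketch of the encoder: regroup into 5-bit values, map each to
-- the alphabet, then pad with '=' to a multiple of eight characters.
encode32 :: [Word8] -> String
encode32 ws = chars ++ replicate padLen '='
  where
    chars  = map ((alphabet32 !!) . bitsToInt 5) (chunksOf 5 (concatMap octetBits ws))
    padLen = (8 - length chars `mod` 8) `mod` 8
</haskell>

Five input octets produce exactly eight characters with no padding; shorter final groups are padded with <tt>=</tt>.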
 
 
=== Base32Hex ===
 
 
Implemented as it's defined in [http://tools.ietf.org/html/rfc4648 RFC 4648].
 
 
Just like Base32 but with a different encoding alphabet. Unlike Base64 and Base32, data encoded with Base32Hex maintains its sort order when the encoded data is compared bitwise.
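
The sort-order property comes from the alphabet itself: in Base32Hex the character for a smaller 5-bit value always has a smaller character code. The sketch below checks this for the two RFC 4648 alphabets, looking at the alphabet characters only and ignoring padding. The names are hypothetical and not part of the library.

<haskell>
-- RFC 4648 alphabets, listed in order of the 5-bit value they represent.
base32Alphabet, base32HexAlphabet :: String
base32Alphabet    = ['A'..'Z'] ++ ['2'..'7']
base32HexAlphabet = ['0'..'9'] ++ ['A'..'V']

-- Hypothetical helper: True when every character sorts strictly before the
-- character for the next larger value.
preservesOrder :: String -> Bool
preservesOrder alphabet = and (zipWith (<) alphabet (drop 1 alphabet))
</haskell>

Here <hask>preservesOrder base32HexAlphabet</hask> is <hask>True</hask>, while <hask>preservesOrder base32Alphabet</hask> is <hask>False</hask>, because the digits <tt>2</tt>-<tt>7</tt> stand for the largest values yet sort before the letters.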
 
 
=== Base64 ===
 
 
Implemented as it's defined in [http://tools.ietf.org/html/rfc4648 RFC 4648].
 
 
Each group of three octets (24 bits) is split into four 6-bit values, that is, expanded into four octets in which only the six least significant bits are used. Each value is then encoded as one character of a 64-character alphabet.
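
A minimal sketch of the three-to-four expansion, using shifts and masks on a 24-bit group. It is an illustration under the RFC 4648 rules (standard alphabet, <tt>=</tt> padding) and not the code in <hask>Codec.Binary.Base64</hask>; the names <hask>alphabet64</hask>, <hask>encodeGroup</hask> and <hask>encode64</hask> are hypothetical.

<haskell>
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word8)

-- Base64 alphabet from RFC 4648.
alphabet64 :: String
alphabet64 = ['A'..'Z'] ++ ['a'..'z'] ++ ['0'..'9'] ++ "+/"

-- Hypothetical sketch: encode one group of one to three octets as four characters.
encodeGroup :: [Word8] -> String
encodeGroup ws = take keep sextets ++ replicate (4 - keep) '='
  where
    -- n octets carry data in n + 1 characters; the rest is padding
    keep = length ws + 1
    -- pack the octets (zero-padded to three) into one 24-bit value
    word24 :: Int
    word24 = foldl (\acc w -> (acc `shiftL` 8) .|. fromIntegral w) 0 (take 3 (ws ++ [0, 0]))
    -- cut the 24 bits into four 6-bit values, most significant first
    sextets = [alphabet64 !! ((word24 `shiftR` s) .&. 0x3F) | s <- [18, 12, 6, 0]]

encode64 :: [Word8] -> String
encode64 [] = []
encode64 ws = let (g, rest) = splitAt 3 ws in encodeGroup g ++ encode64 rest
</haskell>

Only the final group can be shorter than three octets, so padding can appear only at the end of the output.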
 
 
=== Base64Url ===
 
 
Implemented as it's defined in [http://tools.ietf.org/html/rfc4648 RFC 4648].
 
 
Just like Base64 but with a different encoding alphabet. The encoding alphabet is made URL and filename safe by replacing <tt>+</tt> and <tt>/</tt> with <tt>-</tt> and <tt>_</tt> respectively.
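
Because only two characters differ, a string in one alphabet can be translated to the other character by character. The helpers below are a hypothetical sketch of that translation and are not part of <hask>Codec.Binary.Base64Url</hask>.

<haskell>
-- Hypothetical helper: translate standard Base64 text to the URL-safe alphabet.
toUrlSafe :: String -> String
toUrlSafe = map swap
  where
    swap '+' = '-'
    swap '/' = '_'
    swap c   = c

-- Hypothetical helper: translate URL-safe text back to the standard alphabet.
fromUrlSafe :: String -> String
fromUrlSafe = map swap
  where
    swap '-' = '+'
    swap '_' = '/'
    swap c   = c
</haskell>

For example, the octets 0xFB 0xFF encode to <tt>+/8=</tt> in Base64 and to <tt>-_8=</tt> in Base64Url.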
 
 
=== Uuencode ===
 
 
Unfortunately, uuencode is badly specified, and there are in fact several differing implementations of it. This implementation attempts to encode data in the same way as the <tt>uuencode</tt> utility found in [http://www.gnu.org/software/sharutils/ GNU's sharutils]. The workings of <hask>chop</hask> and <hask>unchop</hask> also follow how sharutils splits and rejoins encoded lines.
 

Latest revision as of 16:52, 17 April 2014