Difference between revisions of "Encoding issues"

From HaskellWiki
(How IO should work)
(Deleting page that hasn't been edited for over 10 years)
Character encoding is a tricky issue. Different systems have different conventions, and they may not be correct for a particular use. There is a distinction between characters (<hask>Char</hask>) and bytes (<hask>Word8</hask>). Conversion between the two can be done in many ways. This page gives an overview of how these issues should be handled.
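As a rough illustration of how the same characters can map to different bytes, here is a minimal sketch using only <hask>base</hask>. The names <hask>encodeUtf8</hask> and <hask>encodeLatin1</hask> are hypothetical helpers written for this example, not part of any proposed API, and the UTF-8 encoder deliberately skips validation.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Encode one character as UTF-8 (1 to 4 bytes).
-- Simplification: surrogate code points are not rejected.
utf8Char :: Char -> [Word8]
utf8Char c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n      = ord c
    -- Leading bits of the code point, placed in the first byte.
    hi s   = fromIntegral (n `shiftR` s)
    -- A continuation byte: 10xxxxxx carrying six bits of the code point.
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- Hypothetical helper: encode a whole string as UTF-8.
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap utf8Char

-- Hypothetical helper: encode as Latin-1, failing on characters above U+00FF.
encodeLatin1 :: String -> Maybe [Word8]
encodeLatin1 = mapM enc
  where
    enc c
      | ord c < 0x100 = Just (fromIntegral (ord c))
      | otherwise     = Nothing
```

For the single character U+00E9, <hask>encodeUtf8 "\233"</hask> gives <hask>[195,169]</hask> while <hask>encodeLatin1 "\233"</hask> gives <hask>Just [233]</hask>. For U+20AC, which Latin-1 cannot represent, <hask>encodeLatin1 "\8364"</hask> is <hask>Nothing</hask>.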
It is important to note that the issue of encoding is ''completely'' orthogonal to the use of ByteString/PackedString/WhateverString. Such a type is either a list of bytes, equivalent to <hask>[Word8]</hask>, or a string, that is, a list of characters, <hask>[Char]</hask>.
Any such type should be similar to either <hask>[Char]</hask> or <hask>[Word8]</hask>, and behave accordingly.
There are three kinds of operations...
I/O operations should be split into two parts: binary IO and character IO.
===Binary IO===
Binary IO operates on <hask>[Word8]</hask>.
<haskell>
put :: Handle -> [Word8] -> IO ()
get :: Handle -> IO [Word8]
</haskell>
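One possible way to realise these two signatures on GHC is sketched below. It assumes the <hask>bytestring</hask> package is available (it ships with GHC), and <hask>roundTrip</hask> is a hypothetical helper added only to show usage.

```haskell
import qualified Data.ByteString as B
import Data.Word (Word8)
import System.IO (Handle, IOMode (..), withBinaryFile)

-- Write raw bytes to a handle; no encoding is applied.
put :: Handle -> [Word8] -> IO ()
put h = B.hPut h . B.pack

-- Read all remaining bytes from a handle.
-- Note: B.hGetContents reads strictly and closes the handle afterwards.
get :: Handle -> IO [Word8]
get h = B.unpack <$> B.hGetContents h

-- Hypothetical helper: write bytes to a file, then read them back.
-- withBinaryFile opens the handle in binary mode, so no newline or
-- character translation takes place.
roundTrip :: FilePath -> [Word8] -> IO [Word8]
roundTrip path ws = do
  withBinaryFile path WriteMode (\h -> put h ws)
  withBinaryFile path ReadMode get
```

Going through a strict <hask>ByteString</hask> is only one design choice; the interface itself stays a plain <hask>[Word8]</hask>, which is the point of this section.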
===Character IO===
Character IO is layered on top of the binary IO functions, using the default encoding and decoding appropriate for the platform. An encoding error will result in an exception.
<haskell>
putStr :: Handle -> [Char] -> IO ()
getStr :: Handle -> IO [Char]
</haskell>
The encoding used depends on the platform. On Unix it will be the encoding from the current locale (usually UTF-8). On Windows, the encoding for file IO will be detected from a byte order mark, while the output encoding can be any Unicode encoding, again written with a byte order mark. UTF-8 is probably the safest bet. Handles other than files may have different requirements.
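For comparison, GHC's <hask>System.IO</hask> already attaches an encoding to each text-mode handle, which comes close to the behaviour described here. The sketch below makes that explicit with <hask>hSetEncoding</hask> and <hask>localeEncoding</hask>. The names <hask>putStrDefault</hask> and <hask>getStrDefault</hask> are hypothetical, and this shows GHC's existing mechanism rather than the layered design proposed above.

```haskell
import System.IO

-- Hypothetical name: write a string using the platform's default encoding.
-- A character the encoding cannot represent raises an IOError.
putStrDefault :: Handle -> String -> IO ()
putStrDefault h s = do
  hSetEncoding h localeEncoding
  hPutStr h s

-- Hypothetical name: read the rest of a handle using the default encoding.
getStrDefault :: Handle -> IO String
getStrDefault h = do
  hSetEncoding h localeEncoding
  s <- hGetContents h
  -- Force the lazy read so that any decoding error surfaces here,
  -- before the caller closes the handle.
  length s `seq` return s
```

Setting the encoding explicitly is redundant for freshly opened text-mode handles, since GHC initialises them with <hask>localeEncoding</hask>; it is spelled out here only to show where the default comes from.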
===Advanced character IO===
In situations where the default encoding is not correct, or where a different form of error handling is required, encoding/decoding must be done manually.
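A minimal sketch of such manual conversion follows, assuming the <hask>text</hask> and <hask>bytestring</hask> packages and UTF-8 as the chosen encoding. The function names <hask>decodeUtf8Bytes</hask> and <hask>encodeUtf8Chars</hask> are hypothetical. Decoding errors are returned as a <hask>Left</hask> value instead of being thrown, so the caller decides how to handle them.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Word (Word8)

-- Hypothetical helper: decode UTF-8 bytes, reporting failure as a value
-- rather than an exception.
decodeUtf8Bytes :: [Word8] -> Either String String
decodeUtf8Bytes ws =
  case TE.decodeUtf8' (B.pack ws) of
    Left err -> Left (show err)      -- invalid byte sequence
    Right t  -> Right (T.unpack t)

-- Hypothetical helper: encode a string as UTF-8 bytes.
-- Note: T.pack replaces surrogate code points with U+FFFD.
encodeUtf8Chars :: String -> [Word8]
encodeUtf8Chars = B.unpack . TE.encodeUtf8 . T.pack
```

With these definitions, <hask>decodeUtf8Bytes [0xC3, 0xA9]</hask> yields <hask>Right "\233"</hask>, while the lone byte <hask>[0xFF]</hask>, which is never valid in UTF-8, yields a <hask>Left</hask> carrying the error description.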

Revision as of 14:19, 6 February 2021