Revision as of 07:49, 3 February 2007
A small example showing how to read and write UTF-8 in Haskell.
Do whatever you want; it's going in the public domain (Eric Kow on 2007-02-02 says so, anyway)
> module Main where > import Control.Monad (mapM_) > import Data.Word (Word8) > import Foreign.Marshal.Array (allocaArray, peekArray, pokeArray) > import System.Environment (getArgs) > import System.IO (hFileSize, Handle, hGetBuf, hPutBuf, openBinaryFile, > IOMode(ReadMode, WriteMode))
We're going to be using the 2002 UTF-8 implementation by Sven Moritz Hallberg. It happens to be the one that darcs uses ( http://abridgegame.org/repos/darcs/UTF8.lhs ). Note that Pugs also has a UTF-8 library of its own, which if I believe to handle ByteStrings, but I'm sticking with this one because it's what I know.
> import UTF8
We perform the demonstration on a list of files, specified as command line arguments. What we want to show is that we can both read and write UTF-8, so the demonstration will be of reading a file in, reverse every one of its lines, and writing it back out with the extension '.reversed'
> main :: IO () > main = > do args <- getArgs > mapM_ reverseUTF8File args > reverseUTF8File :: FilePath -> IO () > reverseUTF8File f = > do fb <- readFileBytes f > case decode fb of > (cs, ) -> writeFileBytes (f ++ ".reverse") $ encode $ reverseLines cs > (_, xs) -> fail $ show xs > where > reverseLines = unlines . map reverse . lines
For this to work, we need to have some helper functions for reading and writing [Word8]. It would be nice is if there were some standard functions for reading and writing [Word8] in files.
> readFileBytes :: FilePath -> IO [Word8] > readFileBytes f = > do h <- openBinaryFile f ReadMode > hsize <- fromIntegral `fmap` hFileSize h > hGetBytes h hsize > > writeFileBytes :: FilePath -> [Word8] -> IO () > writeFileBytes f ws = > do h <- openBinaryFile f WriteMode > hPutBytes h (length ws) ws > hGetBytes :: Handle -> Int -> IO [Word8] > hGetBytes h c = allocaArray c $ \p -> > do c' <- hGetBuf h p c > peekArray c' p > > hPutBytes :: Handle -> Int -> [Word8] -> IO () > hPutBytes h c ws = allocaArray c $ \p -> > do pokeArray p ws > hPutBuf h p c