(More harm than good?)
Revision as of 02:48, 3 February 2007
A small example showing how to work with UTF-8 in Haskell.
Do whatever you want; it's going in the public domain (Eric Kow on 2007-02-02 says so, anyway)
Note that I don't really know what the best practices are wrt to reading and writing UTF-8, but here's what works for me.
> module Main where > import Control.Monad (mapM_) > import Data.Word (Word8) > import Foreign.Marshal.Array (allocaArray, peekArray, pokeArray) > import System.Environment (getArgs) > import System.IO (hFileSize, Handle, hGetBuf, hPutBuf, openBinaryFile, > IOMode(ReadMode, WriteMode))
We're going to be using the 2002 UTF-8 implementation by Sven Moritz Hallberg. I don't know if this is the best one, but it's what darcs uses. See http://abridgegame.org/repos/darcs/UTF8.lhs
> import UTF8
We perform the demonstration on a list of files, specified as command line arguments. What we want to show is that we can both read and write UTF-8, so the demonstration will be of reading a file in, reverse every one of its lines, and writing it back out with the extension '.reversed'
> main :: IO () > main = > do args <- getArgs > mapM_ reverseUTF8File args > reverseUTF8File :: FilePath -> IO () > reverseUTF8File f = > do fb <- readFileBytes f > case decode fb of > (cs, ) -> writeFileBytes (f ++ ".reverse") $ encode $ reverseLines cs > (_, xs) -> fail $ show xs > where > reverseLines = unlines . map reverse . lines
For this to work, we need to have some helper functions for reading and writing [Word8]. I don't know if this is the right way to go about it.
> readFileBytes :: FilePath -> IO [Word8] > readFileBytes f = > do h <- openBinaryFile f ReadMode > hsize <- fromIntegral `fmap` hFileSize h > hGetBytes h hsize > > writeFileBytes :: FilePath -> [Word8] -> IO () > writeFileBytes f ws = > do h <- openBinaryFile f WriteMode > hPutBytes h (length ws) ws > hGetBytes :: Handle -> Int -> IO [Word8] > hGetBytes h c = allocaArray c $ \p -> > do c' <- hGetBuf h p c > peekArray c' p > > hPutBytes :: Handle -> Int -> [Word8] -> IO () > hPutBytes h c ws = allocaArray c $ \p -> > do pokeArray p ws > hPutBuf h p c