UTF-8

From HaskellWiki
Revision as of 07:48, 3 February 2007 by EricKow (talk | contribs)
Jump to navigation Jump to search
The printable version is no longer supported and may have rendering errors. Please update your browser bookmarks and please use the default browser print function instead.

A small example showing how to read and write UTF-8 in Haskell.

Do whatever you want; it's going in the public domain (Eric Kow on 2007-02-02 says so, anyway)

> module Main where

> import Control.Monad (mapM_)
> import Data.Word (Word8)
> import Foreign.Marshal.Array (allocaArray, peekArray, pokeArray)
> import System.Environment (getArgs)
> import System.IO (hFileSize, Handle, hGetBuf, hPutBuf, openBinaryFile,
>                   IOMode(ReadMode, WriteMode))

We're going to be using the 2002 UTF-8 implementation by Sven Moritz Hallberg. It happens to be the one that darcs uses ( http://abridgegame.org/repos/darcs/UTF8.lhs ). Note that Pugs also has a UTF-8 library of its own, which if I believe to handle ByteStrings.

> import UTF8

We perform the demonstration on a list of files, specified as command line arguments. What we want to show is that we can both read and write UTF-8, so the demonstration will be of reading a file in, reverse every one of its lines, and writing it back out with the extension '.reversed'

> main :: IO ()
> main =
>  do args <- getArgs
>     mapM_ reverseUTF8File args

> reverseUTF8File :: FilePath -> IO ()
> reverseUTF8File f =
>   do fb <- readFileBytes f
>      case decode fb of
>        (cs, []) -> writeFileBytes (f ++ ".reverse") $ encode $ reverseLines cs
>        (_,  xs) -> fail $ show xs
>   where
>     reverseLines = unlines . map reverse . lines

For this to work, we need to have some helper functions for reading and writing [Word8]. It would be nice is if there were some standard functions for reading and writing [Word8] in files.

> readFileBytes :: FilePath -> IO [Word8]
> readFileBytes f =
>   do h <- openBinaryFile f ReadMode
>      hsize <- fromIntegral `fmap` hFileSize h
>      hGetBytes h hsize
>
> writeFileBytes :: FilePath -> [Word8] -> IO ()
> writeFileBytes f ws =
>  do h <- openBinaryFile f WriteMode
>     hPutBytes h (length ws) ws

> hGetBytes :: Handle -> Int -> IO [Word8]
> hGetBytes h c = allocaArray c $ \p ->
>                   do c' <- hGetBuf h p c
>                      peekArray c' p
>
> hPutBytes :: Handle -> Int -> [Word8] -> IO ()
> hPutBytes h c ws = allocaArray c $ \p ->
>                      do pokeArray p ws
>                         hPutBuf h p c