Tutorials/Programming Haskell/String IO

From HaskellWiki
< Tutorials‎ | Programming Haskell
Revision as of 07:01, 17 December 2006 by DonStewart (talk | contribs) (today's code)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

Programming Haskell: String processing (with a dash of concurrency)

This is part two in a series of tutorials on programming Haskell. You can get up to speed by reading yesterday's introductory article.

Today we'll look more into the basic tools at our disposal in the Haskell language, in particular, operations for doing IO and playing with files and strings.

Administrivia

Before we get started, I should clarify a small point raised by yesterday's article. One issue I forgot to mention was that there are slight differences between running Haskell in ghci, the bytecode interpreter, and compiling it to native code with GHC.

Haskell programs are executed by evaluating the special 'main' function.

    <span class='keyword'>import</span> <span class='conid'>Data</span><span class='varop'>.</span><span class='conid'>List</span>

    <span class='varid'>mylength</span> <span class='keyglyph'>=</span> <span class='varid'>foldr</span> <span class='layout'>(</span><span class='varid'>const</span> <span class='layout'>(</span><span class='varop'>+</span><span class='num'>1</span><span class='layout'>)</span><span class='layout'>)</span> <span class='num'>0</span>
    <span class='varid'>main</span> <span class='keyglyph'>=</span> <span class='varid'>print</span> <span class='layout'>(</span><span class='varid'>mylength</span> <span class='str'>"haskell"</span><span class='layout'>)</span>


To compile this to native code, we would feed the source file to the compiler:

    <span class='varop'>$</span> <span class='varid'>ghc</span> <span class='conid'>A</span><span class='varop'>.</span><span class='varid'>hs</span>
    <span class='varop'>$</span> <span class='varop'>./</span><span class='varid'>a</span><span class='varop'>.</span><span class='varid'>out</span> 
    <span class='num'>7</span>


For a faster turnaround, we can run the code directly through the bytecode interpreter, GHCi, using the 'runhaskell' program:

    <span class='varop'>$</span> <span class='varid'>runhaskell</span> <span class='conid'>A</span><span class='varop'>.</span><span class='varid'>hs</span>
    <span class='num'>7</span>


GHCi, the interactive Haskell environment, is a little bit different. As it is an interactive system, GHCi must execute your code sequentially, as you define each line. This is different to normal Haskell, where the order of definition is irrelevant. GHCi effectively executes your code inside a do-block. Therefore you can use the do-notation at the GHCi prompt to define new functions:

    <span class='varop'>$</span> <span class='varid'>ghci</span>
    <span class='conid'>Prelude</span><span class='varop'>></span> <span class='conop'>:</span><span class='varid'>m</span> <span class='varop'>+</span> <span class='conid'>Data</span><span class='varop'>.</span><span class='conid'>List</span>

    <span class='conid'>Prelude</span><span class='varop'>></span> <span class='keyword'>let</span> <span class='varid'>mylength</span> <span class='keyglyph'>=</span> <span class='varid'>foldr</span> <span class='layout'>(</span><span class='varid'>const</span> <span class='layout'>(</span><span class='varop'>+</span><span class='num'>1</span><span class='layout'>)</span><span class='layout'>)</span> <span class='num'>0</span>

    <span class='conid'>Prelude</span><span class='varop'>></span> <span class='conop'>:</span><span class='varid'>t</span> <span class='varid'>mylength</span>
    <span class='varid'>mylength</span> <span class='keyglyph'>::</span> <span class='keyglyph'>[</span><span class='varid'>a</span><span class='keyglyph'>]</span> <span class='keyglyph'>-></span> <span class='conid'>Integer</span>

    <span class='conid'>Prelude</span><span class='varop'>></span> <span class='varid'>mylength</span> <span class='str'>"haskell"</span>
    <span class='num'>7</span>


For this tutorial I will be developing code in a source file, and either compiling it as above, or loading the source file into GHCi for testing. To load a source file into GHCi, we do:

    <span class='varop'>$</span> <span class='varid'>ghci</span>    
    <span class='conid'>Prelude</span><span class='varop'>></span> <span class='conop'>:</span><span class='varid'>load</span> <span class='conid'>A</span><span class='varop'>.</span><span class='varid'>hs</span>

    <span class='varop'>*</span><span class='conid'>Main</span><span class='varop'>></span> <span class='conop'>:</span><span class='varid'>t</span> <span class='varid'>main</span>
    <span class='varid'>main</span> <span class='keyglyph'>::</span> <span class='conid'>IO</span> <span class='layout'>(</span><span class='layout'>)</span>

    <span class='varop'>*</span><span class='conid'>Main</span><span class='varop'>></span> <span class='conop'>:</span><span class='varid'>t</span> <span class='varid'>mylength</span>
    <span class='varid'>mylength</span> <span class='keyglyph'>::</span> <span class='keyglyph'>[</span><span class='varid'>a</span><span class='keyglyph'>]</span> <span class='keyglyph'>-></span> <span class='conid'>Integer</span>

    <span class='varop'>*</span><span class='conid'>Main</span><span class='varop'>></span> <span class='varid'>mylength</span> <span class='str'>"foo"</span>
    <span class='num'>3</span>

    <span class='varop'>*</span><span class='conid'>Main</span><span class='varop'>></span> <span class='varid'>main</span>
    <span class='num'>7</span>


Now, let's get into the code!

IO

As the Camel Book says:

Unless you're using artificial intelligence to model a solipsistic philosopher, your program needs some way to communicate with the outside world.


In yesterday's tutorial, I briefly introduced 'readFile', for reading a String from a file on disk. Let's consider now IO in more detail. The most common IO operations are defined in the <a href="http://haskell.org/ghc/docs/latest/html/libraries/base/System-IO.html">System.IO</a> library.


For the most basic stdin/stdout Unix-style programs in Haskell, we can use the 'interact' function:

    <span class='varid'>interact</span>    <span class='keyglyph'>::</span>  <span class='layout'>(</span><span class='conid'>String</span> <span class='keyglyph'>-></span> <span class='conid'>String</span><span class='layout'>)</span> <span class='keyglyph'>-></span> <span class='conid'>IO</span> <span class='layout'>(</span><span class='layout'>)</span>


This higher order function takes, as an argument, some function for processing a string (of type String -> String). It runs this function over the standard input stream, printing the result to standard output. A surprisingly large number of useful programs can be written this way. For example, we can write the 'cat' unix program as:

    <span class='varid'>main</span> <span class='keyglyph'>=</span> <span class='varid'>interact</span> <span class='varid'>id</span>


Yes, that's it! Let's compile and run this program:

    <span class='varop'>$</span> <span class='varid'>ghc</span> <span class='comment'>-</span><span class='conid'>O</span> <span class='conid'>A</span><span class='varop'>.</span><span class='varid'>hs</span>       

    <span class='varop'>$</span> <span class='varid'>cat</span> <span class='conid'>A</span><span class='varop'>.</span><span class='varid'>hs</span> <span class='keyglyph'>|</span> <span class='varop'>./</span><span class='varid'>a</span><span class='varop'>.</span><span class='varid'>out</span>
    <span class='varid'>main</span> <span class='keyglyph'>=</span> <span class='varid'>interact</span> <span class='varid'>id</span>


How does this work? Firstly, 'interact' is defined as:

    <span class='varid'>interact</span> <span class='varid'>f</span> <span class='keyglyph'>=</span> <span class='keyword'>do</span> <span class='varid'>s</span> <span class='keyglyph'><-</span> <span class='varid'>getContents</span>
                    <span class='varid'>putStr</span> <span class='layout'>(</span><span class='varid'>f</span> <span class='varid'>s</span><span class='layout'>)</span>


So it reads a string from standard input, and writes to standard output the result of applying its argument function to that string. The 'id' function itself has the type:

    <span class='varid'>id</span> <span class='keyglyph'>::</span> <span class='varid'>a</span> <span class='keyglyph'>-></span> <span class='varid'>a</span>


'id' is a function of one argument, of any type (the lowercase 'a' in the type means any type can be used in that position, i.e. it is a polymorphic function (also called a generic function in some languages)). 'id' takes a value of some type 'a', and returns a value of the same type. There's is only one (non-trivial) function of this type:

    <span class='varid'>id</span> <span class='varid'>a</span> <span class='keyglyph'>=</span> <span class='varid'>a</span>


So 'interact id' will print to the input string to standard output unmodified.


Let's now write the 'wc' program:

    <span class='varid'>main</span>    <span class='keyglyph'>=</span> <span class='varid'>interact</span> <span class='varid'>count</span>
    <span class='varid'>count</span> <span class='varid'>s</span> <span class='keyglyph'>=</span> <span class='varid'>show</span> <span class='layout'>(</span><span class='varid'>length</span> <span class='varid'>s</span><span class='layout'>)</span> <span class='varop'>++</span> <span class='str'>"\n"</span>


This will print the length of the input string, that is, the number of chars:

    <span class='varop'>$</span> <span class='varid'>runhaskell</span> <span class='conid'>A</span><span class='varop'>.</span><span class='varid'>hs</span> <span class='varop'><</span> <span class='conid'>A</span><span class='varop'>.</span><span class='varid'>hs</span>
    <span class='num'>57</span>

Line oriented IO

Only a small number of programs operate on unstructured input streams. It is far more common to treat an input stream as a list of lines. So let's do that. To break a string up into lines, we'll use the ... 'lines' function, defined in the Data.List library:

    <span class='varid'>lines</span> <span class='keyglyph'>::</span> <span class='conid'>String</span> <span class='keyglyph'>-></span> <span class='keyglyph'>[</span><span class='conid'>String</span><span class='keyglyph'>]</span>


The type, once again, tells the story. 'lines' takes a String, and breaks it up into a list of strings, splitting on newlines. To join a list of strings back into a single string, inserting newlines, we'd use the ... 'unlines' function:

    <span class='varid'>unlines</span> <span class='keyglyph'>::</span> <span class='keyglyph'>[</span><span class='conid'>String</span><span class='keyglyph'>]</span> <span class='keyglyph'>-></span> <span class='conid'>String</span>


There are also similar functions for splitting on words, namely 'words' and 'unwords'. Now, an example. To count the number of lines in a file:

    <span class='varid'>main</span> <span class='keyglyph'>=</span> <span class='varid'>interact</span> <span class='layout'>(</span><span class='varid'>count</span> <span class='varop'>.</span> <span class='varid'>lines</span><span class='layout'>)</span>


We can run this as:

    <span class='varop'>$</span> <span class='varid'>ghc</span> <span class='comment'>-</span><span class='conid'>O</span> <span class='conid'>A</span><span class='varop'>.</span><span class='varid'>hs</span>

    <span class='varop'>$</span> <span class='varop'>./</span><span class='varid'>a</span><span class='varop'>.</span><span class='varid'>out</span> <span class='varop'><</span> <span class='conid'>A</span><span class='varop'>.</span><span class='varid'>hs</span>
    <span class='num'>3</span>


Here we reuse the 'count' function from above, by composing it with the lines function.

On composition

This nice code reuse via composition is achieved using the (.) function, pronounced 'compose'. Let's look at how that works. (Feel free to skip this section, if you want to just get things done).


The (.) function is just a normal everyday Haskell function, defined as:

    <span class='layout'>(</span><span class='varop'>.</span><span class='layout'>)</span> <span class='varid'>f</span> <span class='varid'>g</span> <span class='varid'>x</span> <span class='keyglyph'>=</span> <span class='varid'>f</span> <span class='layout'>(</span><span class='varid'>g</span> <span class='varid'>x</span><span class='layout'>)</span>


This looks a little like magic (or line noise), but its pretty easy. The (.) function simply takes two functions as arguments, along with another value. It applies the 'g' function to the value 'x', and then applies 'f' to the result, returning this final value. The functions may be of any type. The type of (.) is actually:

    <span class='layout'>(</span><span class='varop'>.</span><span class='layout'>)</span> <span class='keyglyph'>::</span> <span class='layout'>(</span><span class='varid'>b</span> <span class='keyglyph'>-></span> <span class='varid'>c</span><span class='layout'>)</span> <span class='keyglyph'>-></span> <span class='layout'>(</span><span class='varid'>a</span> <span class='keyglyph'>-></span> <span class='varid'>b</span><span class='layout'>)</span> <span class='keyglyph'>-></span> <span class='varid'>a</span> <span class='keyglyph'>-></span> <span class='varid'>c</span>


which might look a bit hairy, but it essentially specifies what types of arguments make sense to compose. That is, only those where:

    <span class='varid'>f</span> <span class='keyglyph'>::</span> <span class='varid'>b</span> <span class='keyglyph'>-></span> <span class='varid'>c</span>
    <span class='varid'>g</span> <span class='keyglyph'>::</span> <span class='varid'>a</span> <span class='keyglyph'>-></span> <span class='varid'>b</span>
    <span class='varid'>x</span> <span class='keyglyph'>::</span> <span class='varid'>a</span>


can be composed, yielding a new function of type:

    <span class='layout'>(</span><span class='varid'>f</span> <span class='varop'>.</span> <span class='varid'>g</span><span class='layout'>)</span> <span class='keyglyph'>::</span> <span class='varid'>a</span> <span class='keyglyph'>-></span> <span class='varid'>c</span>


The nice thing is that this composition makes sense (and works) for all types a, b and c.


How does this relate to code reuse? Well, since our 'count' function is polymorphic, it works equally well counting the length of a string, or the length of a list of strings. Our littler 'wc' program is the epitome of the phrase: "higher order + polymorphic = reusable". That is, functions which take other functions as arguments, when combined with functions that work over any type, produce great reusable 'glue'. You only need vary the argument function to gain terrific code reuse (and the strong type checking ensures you can only reuse code in ways that work).

More on lines

Another little example, let's reverse each line of a file (like the unix 'rev' command):

    <span class='varid'>main</span> <span class='keyglyph'>=</span> <span class='varid'>interact</span> <span class='layout'>(</span><span class='varid'>unlines</span> <span class='varop'>.</span> <span class='varid'>map</span> <span class='varid'>reverse</span> <span class='varop'>.</span> <span class='varid'>lines</span><span class='layout'>)</span>

Which when run, reverses the input lines:

    <span class='varop'>$</span> <span class='varop'>./</span><span class='varid'>a</span><span class='varop'>.</span><span class='varid'>out</span> <span class='varop'><</span> <span class='conid'>B</span><span class='varop'>.</span><span class='varid'>hs</span>
    <span class='varid'>rahC</span><span class='varop'>.</span><span class='varid'>ataD</span> <span class='varid'>tropmi</span>
    <span class='varid'>ebyaM</span><span class='varop'>.</span><span class='varid'>ataD</span> <span class='varid'>tropmi</span>
    <span class='varid'>tsiL</span><span class='varop'>.</span><span class='varid'>ataD</span> <span class='varid'>tropmi</span>


So we take the input string, split it into lines, and the loop over that list of lines, reversing each of them, using the 'map' function. Finally, once we've reversed each line, we join them back into a single string with unlines, and print it out.


The 'map' function is a fundamental control structure of functional programming, similar to the 'foreach' keyword in a number of imperative languages. 'map' however is just a function on lists, not built in syntax, and has the type:

    <span class='varid'>map</span> <span class='keyglyph'>::</span> <span class='layout'>(</span><span class='varid'>a</span> <span class='keyglyph'>-></span> <span class='varid'>b</span><span class='layout'>)</span> <span class='keyglyph'>-></span> <span class='keyglyph'>[</span><span class='varid'>a</span><span class='keyglyph'>]</span> <span class='keyglyph'>-></span> <span class='keyglyph'>[</span><span class='varid'>b</span><span class='keyglyph'>]</span>


That is, it takes some function, and a list, and applies that function to each element of the list, returning a new list as a result. Since loops are so common in programming, we'll be using 'map' a lot. Just for reference, 'map' is implemented as:

    <span class='varid'>map</span> <span class='keyword'>_</span> <span class='keyglyph'>[</span><span class='keyglyph'>]</span>     <span class='keyglyph'>=</span> <span class='keyglyph'>[</span><span class='keyglyph'>]</span>
    <span class='varid'>map</span> <span class='varid'>f</span> <span class='layout'>(</span><span class='varid'>x</span><span class='conop'>:</span><span class='varid'>xs</span><span class='layout'>)</span> <span class='keyglyph'>=</span> <span class='varid'>f</span> <span class='varid'>x</span> <span class='conop'>:</span> <span class='varid'>map</span> <span class='varid'>f</span> <span class='varid'>xs</span>

File IO

Operating on stdin/stdout is good for scripts (and this is how tools like sed or perl -p work), but for 'real' programs we'll at least need to do some file IO. The basic operations of files are:

    <span class='varid'>readFile</span>  <span class='keyglyph'>::</span> <span class='conid'>FilePath</span> <span class='keyglyph'>-></span> <span class='conid'>IO</span> <span class='conid'>String</span>
    <span class='varid'>writeFile</span> <span class='keyglyph'>::</span> <span class='conid'>FilePath</span> <span class='keyglyph'>-></span> <span class='conid'>String</span> <span class='keyglyph'>-></span> <span class='conid'>IO</span> <span class='layout'>(</span><span class='layout'>)</span>


'readFile' takes a file name as an argument, does some IO, and returns the file's contents as a string. 'writeFile' takes a file name, a string, and does some IO (writing that string to the file), before returning the void (or unit) value, ().


We could implement a 'cp' program on files, as:

    <span class='keyword'>import</span> <span class='conid'>System</span><span class='varop'>.</span><span class='conid'>Environment</span>

    <span class='varid'>main</span> <span class='keyglyph'>=</span> <span class='keyword'>do</span>
        <span class='keyglyph'>[</span><span class='varid'>f</span><span class='layout'>,</span><span class='varid'>g</span><span class='keyglyph'>]</span> <span class='keyglyph'><-</span> <span class='varid'>getArgs</span>
        <span class='varid'>s</span>     <span class='keyglyph'><-</span> <span class='varid'>readFile</span> <span class='varid'>f</span>
        <span class='varid'>writeFile</span> <span class='varid'>g</span> <span class='varid'>s</span>


Running this program:

    <span class='varop'>$</span> <span class='varid'>ghc</span> <span class='comment'>-</span><span class='conid'>O</span> <span class='conid'>A</span><span class='varop'>.</span><span class='varid'>hs</span>

    <span class='varop'>$</span> <span class='varop'>./</span><span class='varid'>a</span><span class='varop'>.</span><span class='varid'>out</span> <span class='conid'>A</span><span class='varop'>.</span><span class='varid'>hs</span> <span class='conid'>Z</span><span class='varop'>.</span><span class='varid'>hs</span>

    <span class='varop'>$</span> <span class='varid'>cat</span> <span class='conid'>Z</span><span class='varop'>.</span><span class='varid'>hs</span>
    <span class='keyword'>import</span> <span class='conid'>System</span><span class='varop'>.</span><span class='conid'>Environment</span>

    <span class='varid'>main</span> <span class='keyglyph'>=</span> <span class='keyword'>do</span>
        <span class='keyglyph'>[</span><span class='varid'>f</span><span class='layout'>,</span><span class='varid'>g</span><span class='keyglyph'>]</span> <span class='keyglyph'><-</span> <span class='varid'>getArgs</span>
        <span class='varid'>s</span>     <span class='keyglyph'><-</span> <span class='varid'>readFile</span> <span class='varid'>f</span>
        <span class='varid'>writeFile</span> <span class='varid'>g</span> <span class='varid'>s</span>


Since we're doing IO (the type of readFile and writeFile enforce this), the code runs inside a do-block, using the IO monad. "Using the IO monad" just means that we wish to use an imperative, sequential order of evaluation. (As an aside, a wide range of other monads exist, for programming different program evaluation strategies, such as Prolog-style backtracking, or continutation-based evaluation. All of imperative programming is just one subset of possible evaluation strategies you can use in Haskell, via monads).


In do-notation, whenever you wish to run an action, for its side effect, and save the result to a variable, you would write:

    <span class='varid'>v</span> <span class='keyglyph'><-</span> <span class='varid'>action</span>


For example, to run the 'readFile' action, which has the side effect of reading a file from disk, we say:

    <span class='varid'>s</span> <span class='keyglyph'><-</span> <span class='varid'>readFile</span> <span class='varid'>f</span>


Finally, we can use the 'appendFile' function to append to an existing file.


File Handles

The most generic interface to files is provided via Handles. Sometimes we need to keep a file open, for multiple reads or writes. To do this we use Handles, an abstraction much like the underlying system's file handles.


To open up a file, and get its Handle, we use:

    <span class='varid'>openFile</span> <span class='keyglyph'>::</span> <span class='conid'>FilePath</span> <span class='keyglyph'>-></span> <span class='conid'>IOMode</span> <span class='keyglyph'>-></span> <span class='conid'>IO</span> <span class='conid'>Handle</span>


So to open a file for reading only, in GHCi:

    <span class='conid'>Prelude</span> <span class='conid'>System</span><span class='varop'>.</span><span class='conid'>IO</span><span class='varop'>></span> <span class='varid'>h</span> <span class='keyglyph'><-</span> <span class='varid'>openFile</span> <span class='str'>"A.hs"</span> <span class='conid'>ReadMode</span>
    <span class='layout'>{</span><span class='varid'>handle</span><span class='conop'>:</span> <span class='conid'>A</span><span class='varop'>.</span><span class='varid'>hs</span><span class='layout'>}</span>


Which returns a Handle onto the file "A.hs". We can read a line from this handle:

    <span class='conid'>Prelude</span> <span class='conid'>System</span><span class='varop'>.</span><span class='conid'>IO</span><span class='varop'>></span> <span class='varid'>hGetLine</span> <span class='varid'>h</span>
    <span class='str'>"main = do"</span>


To close a Handle, and flush the buffer:

    <span class='varid'>hClose</span> <span class='keyglyph'>::</span> <span class='conid'>Handle</span> <span class='keyglyph'>-></span> <span class='conid'>IO</span> <span class='layout'>(</span><span class='layout'>)</span>


Once a Handle is closed, we can no longer read from it:

    <span class='conid'>Prelude</span> <span class='conid'>System</span><span class='varop'>.</span><span class='conid'>IO</span><span class='varop'>></span> <span class='varid'>hClose</span> <span class='varid'>h</span>
    <span class='conid'>Prelude</span> <span class='conid'>System</span><span class='varop'>.</span><span class='conid'>IO</span><span class='varop'>></span> <span class='varid'>hGetLine</span> <span class='varid'>h</span>
    <span class='varop'>***</span> <span class='conid'>Exception</span><span class='conop'>:</span> <span class='conid'>A</span><span class='varop'>.</span><span class='varid'>hs</span><span class='conop'>:</span> <span class='varid'>hGetLine</span><span class='conop'>:</span> <span class='varid'>illegal</span> <span class='varid'>operation</span> <span class='layout'>(</span><span class='varid'>handle</span> <span class='varid'>is</span> <span class='varid'>closed</span><span class='layout'>)</span>


We can also flush explicitly with:

    <span class='varid'>hFlush</span> <span class='keyglyph'>::</span> <span class='conid'>Handle</span> <span class='keyglyph'>-></span> <span class='conid'>IO</span> <span class='layout'>(</span><span class='layout'>)</span>


Other useful operations for reading from Handles:

    <span class='varid'>hGetChar</span>     <span class='keyglyph'>::</span> <span class='conid'>Handle</span> <span class='keyglyph'>-></span> <span class='conid'>IO</span> <span class='conid'>Char</span>
    <span class='varid'>hGetLine</span>     <span class='keyglyph'>::</span> <span class='conid'>Handle</span> <span class='keyglyph'>-></span> <span class='conid'>IO</span> <span class='keyglyph'>[</span><span class='conid'>Char</span><span class='keyglyph'>]</span>
    <span class='varid'>hGetContents</span> <span class='keyglyph'>::</span> <span class='conid'>Handle</span> <span class='keyglyph'>-></span> <span class='conid'>IO</span> <span class='keyglyph'>[</span><span class='conid'>Char</span><span class='keyglyph'>]</span>


To write to Handles:

    <span class='varid'>hPutChar</span>    <span class='keyglyph'>::</span> <span class='conid'>Handle</span> <span class='keyglyph'>-></span> <span class='conid'>Char</span> <span class='keyglyph'>-></span> <span class='conid'>IO</span> <span class='layout'>(</span><span class='layout'>)</span>
    <span class='varid'>hPutStr</span>     <span class='keyglyph'>::</span> <span class='conid'>Handle</span> <span class='keyglyph'>-></span> <span class='keyglyph'>[</span><span class='conid'>Char</span><span class='keyglyph'>]</span> <span class='keyglyph'>-></span> <span class='conid'>IO</span> <span class='layout'>(</span><span class='layout'>)</span>
    <span class='varid'>hPutStrLn</span>   <span class='keyglyph'>::</span> <span class='conid'>Handle</span> <span class='keyglyph'>-></span> <span class='keyglyph'>[</span><span class='conid'>Char</span><span class='keyglyph'>]</span> <span class='keyglyph'>-></span> <span class='conid'>IO</span> <span class='layout'>(</span><span class='layout'>)</span>
    <span class='varid'>hPrint</span>      <span class='keyglyph'>::</span> <span class='conid'>Show</span> <span class='varid'>a</span> <span class='keyglyph'>=></span> <span class='conid'>Handle</span> <span class='keyglyph'>-></span> <span class='varid'>a</span> <span class='keyglyph'>-></span> <span class='conid'>IO</span> <span class='layout'>(</span><span class='layout'>)</span>


Some other useful actions:

    <span class='varid'>hSeek</span>     <span class='keyglyph'>::</span> <span class='conid'>Handle</span> <span class='keyglyph'>-></span> <span class='conid'>SeekMode</span> <span class='keyglyph'>-></span> <span class='conid'>Integer</span> <span class='keyglyph'>-></span> <span class='conid'>IO</span> <span class='layout'>(</span><span class='layout'>)</span>
    <span class='varid'>hTell</span>     <span class='keyglyph'>::</span> <span class='conid'>Handle</span> <span class='keyglyph'>-></span> <span class='conid'>IO</span> <span class='conid'>Integer</span>
    <span class='varid'>hFileSize</span> <span class='keyglyph'>::</span> <span class='conid'>Handle</span> <span class='keyglyph'>-></span> <span class='conid'>IO</span> <span class='conid'>Integer</span>
    <span class='varid'>hIsEOF</span>    <span class='keyglyph'>::</span> <span class='conid'>Handle</span> <span class='keyglyph'>-></span> <span class='conid'>IO</span> <span class='conid'>Bool</span>

An example: spell checking

Here is a small example of combining the Data.Set and List data structures from yesterday's tutorial, with more IO operations. We'll implement a little spell checker, building the dictionary in a Set data type. First, some libraries to import:

    <span class='keyword'>import</span> <span class='conid'>System</span><span class='varop'>.</span><span class='conid'>Environment</span>
    <span class='keyword'>import</span> <span class='conid'>Control</span><span class='varop'>.</span><span class='conid'>Monad</span>
    <span class='keyword'>import</span> <span class='conid'>Data</span><span class='varop'>.</span><span class='conid'>Set</span>


And the complete program:

    <span class='varid'>main</span> <span class='keyglyph'>=</span> <span class='keyword'>do</span>
        <span class='keyglyph'>[</span><span class='varid'>s</span><span class='keyglyph'>]</span> <span class='keyglyph'><-</span> <span class='varid'>getArgs</span>
        <span class='varid'>f</span>   <span class='keyglyph'><-</span> <span class='varid'>readFile</span> <span class='str'>"/usr/share/dict/words"</span>
        <span class='varid'>g</span>   <span class='keyglyph'><-</span> <span class='varid'>readFile</span> <span class='varid'>s</span>
        <span class='keyword'>let</span> <span class='varid'>dict</span> <span class='keyglyph'>=</span> <span class='varid'>fromList</span> <span class='layout'>(</span><span class='varid'>lines</span> <span class='varid'>f</span><span class='layout'>)</span>
        <span class='varid'>mapM_</span> <span class='layout'>(</span><span class='varid'>spell</span> <span class='varid'>dict</span><span class='layout'>)</span> <span class='layout'>(</span><span class='varid'>words</span> <span class='varid'>g</span><span class='layout'>)</span>

    <span class='varid'>spell</span> <span class='varid'>d</span> <span class='varid'>w</span> <span class='keyglyph'>=</span> <span class='varid'>when</span> <span class='layout'>(</span><span class='varid'>w</span> <span class='varop'>`notMember`</span> <span class='varid'>d</span><span class='layout'>)</span> <span class='layout'>(</span><span class='varid'>putStrLn</span> <span class='varid'>w</span><span class='layout'>)</span>


Running this program, on its own source, and it reports the following words are not found in the dictionary:

    <span class='varop'>$</span> <span class='varid'>ghc</span> <span class='comment'>-</span><span class='conid'>O</span> <span class='conid'>Spell</span><span class='varop'>.</span><span class='varid'>hs</span> <span class='comment'>-</span><span class='varid'>o</span> <span class='varid'>spell</span>

    <span class='varop'>$</span> <span class='varop'>./</span><span class='varid'>spell</span> <span class='conid'>A</span><span class='varop'>.</span><span class='varid'>hs</span>
    <span class='conid'>Data</span><span class='varop'>.</span><span class='conid'>Char</span>
    <span class='keyglyph'>=</span>
    <span class='keyglyph'><-</span>
    <span class='layout'>(</span><span class='varid'>map</span>
    <span class='varid'>toUpper</span>
    <span class='varid'>n</span><span class='layout'>)</span>
    <span class='keyglyph'>=</span>
    <span class='keyglyph'><-</span>
    <span class='varid'>getLine</span>
    <span class='num'>1</span>

Writing the results out

If we wanted to write the results out to a temporary file, we can do so. Let's import a couple of other modules:

    <span class='keyword'>import</span> <span class='conid'>Data</span><span class='varop'>.</span><span class='conid'>Set</span>
    <span class='keyword'>import</span> <span class='conid'>Data</span><span class='varop'>.</span><span class='conid'>Maybe</span>
    <span class='keyword'>import</span> <span class='conid'>Text</span><span class='varop'>.</span><span class='conid'>Printf</span>
    <span class='keyword'>import</span> <span class='conid'>System</span><span class='varop'>.</span><span class='conid'>IO</span>
    <span class='keyword'>import</span> <span class='conid'>System</span><span class='varop'>.</span><span class='conid'>Environment</span>
    <span class='keyword'>import</span> <span class='conid'>System</span><span class='varop'>.</span><span class='conid'>Posix</span><span class='varop'>.</span><span class='conid'>Temp</span>


Refactoring the main code to separate out the reading and writing phases in to their own function, we end up with the core code:

    <span class='varid'>main</span> <span class='keyglyph'>=</span> <span class='keyword'>do</span>
        <span class='layout'>(</span><span class='varid'>f</span><span class='layout'>,</span> <span class='varid'>g</span><span class='layout'>)</span> <span class='keyglyph'><-</span> <span class='varid'>readFiles</span>
        <span class='keyword'>let</span> <span class='varid'>dict</span> <span class='keyglyph'>=</span> <span class='varid'>fromList</span> <span class='layout'>(</span><span class='varid'>lines</span> <span class='varid'>f</span><span class='layout'>)</span>
            <span class='varid'>errs</span> <span class='keyglyph'>=</span> <span class='varid'>mapMaybe</span> <span class='layout'>(</span><span class='varid'>spell</span> <span class='varid'>dict</span><span class='layout'>)</span> <span class='layout'>(</span><span class='varid'>words</span> <span class='varid'>g</span><span class='layout'>)</span>
        <span class='varid'>write</span> <span class='varid'>errs</span>

    <span class='varid'>spell</span> <span class='varid'>d</span> <span class='varid'>w</span> <span class='keyglyph'>|</span> <span class='varid'>w</span> <span class='varop'>`notMember`</span> <span class='varid'>d</span> <span class='keyglyph'>=</span> <span class='conid'>Just</span> <span class='varid'>w</span>
              <span class='keyglyph'>|</span> <span class='varid'>otherwise</span>       <span class='keyglyph'>=</span> <span class='conid'>Nothing</span>


Where reading is now its own function:

    <span class='varid'>readFiles</span> <span class='keyglyph'>=</span> <span class='keyword'>do</span>
        <span class='keyglyph'>[</span><span class='varid'>s</span><span class='keyglyph'>]</span> <span class='keyglyph'><-</span> <span class='varid'>getArgs</span>
        <span class='varid'>f</span>   <span class='keyglyph'><-</span> <span class='varid'>readFile</span> <span class='str'>"/usr/share/dict/words"</span>
        <span class='varid'>g</span>   <span class='keyglyph'><-</span> <span class='varid'>readFile</span> <span class='varid'>s</span>
        <span class='varid'>return</span> <span class='layout'>(</span><span class='varid'>f</span><span class='layout'>,</span><span class='varid'>g</span><span class='layout'>)</span>


And writing errors out to their own file:

    <span class='varid'>write</span> <span class='varid'>errs</span> <span class='keyglyph'>=</span> <span class='keyword'>do</span>
        <span class='layout'>(</span><span class='varid'>t</span><span class='layout'>,</span><span class='varid'>h</span><span class='layout'>)</span> <span class='keyglyph'><-</span> <span class='varid'>mkstemp</span> <span class='str'>"/tmp/spell.XXXXXX"</span>
        <span class='varid'>mapM_</span> <span class='layout'>(</span><span class='varid'>hPutStrLn</span> <span class='varid'>h</span><span class='layout'>)</span> <span class='varid'>errs</span>
        <span class='varid'>hClose</span> <span class='varid'>h</span>
        <span class='varid'>printf</span> <span class='str'>"%d spelling errors written to '%s'\n"</span> <span class='layout'>(</span><span class='varid'>length</span> <span class='varid'>errs</span><span class='layout'>)</span> <span class='varid'>t</span>

Pretty simple! Running this program:

    <span class='varop'>$</span> <span class='varid'>ghc</span> <span class='comment'>--make -O Spell.hs -o myspell</span>
    <span class='keyglyph'>[</span><span class='num'>1</span> <span class='keyword'>of</span> <span class='num'>1</span><span class='keyglyph'>]</span> <span class='conid'>Compiling</span> <span class='conid'>Main</span>             <span class='layout'>(</span> <span class='conid'>Spell</span><span class='varop'>.</span><span class='varid'>hs</span><span class='layout'>,</span> <span class='conid'>Spell</span><span class='varop'>.</span><span class='varid'>o</span> <span class='layout'>)</span>
    <span class='conid'>Linking</span> <span class='varid'>myspell</span> <span class='varop'>...</span>

    <span class='varop'>$</span> <span class='varop'>./</span><span class='varid'>myspell</span> <span class='conid'>Spell</span><span class='varop'>.</span><span class='varid'>hs</span>
    <span class='num'>67</span> <span class='varid'>spelling</span> <span class='varid'>errors</span> <span class='varid'>written</span> <span class='varid'>to</span> <span class='chr'>'</span><span class='varop'>/</span><span class='varid'>tmp</span><span class='varop'>/</span><span class='varid'>spell</span><span class='varop'>.</span><span class='varid'>ia8256'</span>

Extension: using SMP parallelism

Finally, just for some bonus fun ... and hold on to your hat 'cause I'm going to go fast ... we'll add some parallelism to the mix.


Haskell was designed from the start to support easy parallelisation, and since GHC 6.6, multithreaded code will run transparently on multicore systems using as many cores as you specify. Let's look at how we'd parallelise our little program to exploit multiple cores. We'll use an explicit threading model, via Control.Concurrent. You can also make your code implicitly parallel, using ">Control.Parallel.Strategies, but we'll leave that for another tutorial.


Here's the source, for you to ponder. First some imports:

    <span class='keyword'>import</span> <span class='conid'>Data</span><span class='varop'>.</span><span class='conid'>Set</span> <span class='varid'>hiding</span> <span class='layout'>(</span><span class='varid'>map</span><span class='layout'>)</span>
    <span class='keyword'>import</span> <span class='conid'>Data</span><span class='varop'>.</span><span class='conid'>Maybe</span>
    <span class='keyword'>import</span> <span class='conid'>Data</span><span class='varop'>.</span><span class='conid'>Char</span>
    <span class='keyword'>import</span> <span class='conid'>Text</span><span class='varop'>.</span><span class='conid'>Printf</span>
    <span class='keyword'>import</span> <span class='conid'>System</span><span class='varop'>.</span><span class='conid'>IO</span>
    <span class='keyword'>import</span> <span class='conid'>System</span><span class='varop'>.</span><span class='conid'>Environment</span>
    <span class='keyword'>import</span> <span class='conid'>Control</span><span class='varop'>.</span><span class='conid'>Concurrent</span>
    <span class='keyword'>import</span> <span class='conid'>Control</span><span class='varop'>.</span><span class='conid'>Monad</span>


The entry point, modified to break the word list into chunks, and then dispatching a chunk to each thread:

    <span class='varid'>main</span> <span class='keyglyph'>=</span> <span class='keyword'>do</span>
        <span class='layout'>(</span><span class='varid'>f</span><span class='layout'>,</span> <span class='varid'>g</span><span class='layout'>,</span> <span class='varid'>n</span><span class='layout'>)</span> <span class='keyglyph'><-</span> <span class='varid'>readFiles</span>
        <span class='keyword'>let</span> <span class='varid'>dict</span> <span class='keyglyph'>=</span> <span class='varid'>fromList</span> <span class='layout'>(</span><span class='varid'>lines</span> <span class='varid'>f</span><span class='layout'>)</span>
            <span class='varid'>work</span> <span class='keyglyph'>=</span> <span class='varid'>chunk</span> <span class='varid'>n</span> <span class='layout'>(</span><span class='varid'>words</span> <span class='varid'>g</span><span class='layout'>)</span>
        <span class='varid'>run</span> <span class='varid'>n</span> <span class='varid'>dict</span> <span class='varid'>work</span>


The 'run' function sets up a channel between the main thread and all children thread ('errs'), and prints spelling errors as they arrive on the channel from the children. It then forks off 'n' children threads on each piece of the work list:

    <span class='varid'>run</span> <span class='varid'>n</span> <span class='varid'>dict</span> <span class='varid'>work</span> <span class='keyglyph'>=</span> <span class='keyword'>do</span>
        <span class='varid'>chan</span> <span class='keyglyph'><-</span> <span class='varid'>newChan</span>
        <span class='varid'>errs</span> <span class='keyglyph'><-</span> <span class='varid'>getChanContents</span> <span class='varid'>chan</span>    <span class='comment'>-- errors returned back to main thread</span>
        <span class='varid'>mapM_</span> <span class='layout'>(</span><span class='varid'>forkIO</span> <span class='varop'>.</span> <span class='varid'>thread</span> <span class='varid'>chan</span> <span class='varid'>dict</span><span class='layout'>)</span> <span class='layout'>(</span><span class='varid'>zip</span> <span class='keyglyph'>[</span><span class='num'>1</span><span class='keyglyph'>..</span><span class='varid'>n</span><span class='keyglyph'>]</span> <span class='varid'>work</span><span class='layout'>)</span>
        <span class='varid'>wait</span> <span class='varid'>n</span> <span class='varid'>errs</span> <span class='num'>0</span>


The main thread then just waits on all the threads to finish, printing any spelling errors they pass up:

    <span class='varid'>wait</span> <span class='varid'>n</span> <span class='varid'>xs</span> <span class='varid'>i</span> <span class='keyglyph'>=</span> <span class='varid'>when</span> <span class='layout'>(</span><span class='varid'>i</span> <span class='varop'><</span> <span class='varid'>n</span><span class='layout'>)</span> <span class='varop'>$</span> <span class='keyword'>case</span> <span class='varid'>xs</span> <span class='keyword'>of</span>
        <span class='conid'>Nothing</span> <span class='conop'>:</span> <span class='varid'>ys</span> <span class='keyglyph'>-></span> <span class='varid'>wait</span> <span class='varid'>n</span> <span class='varid'>ys</span> <span class='varop'>$!</span> <span class='varid'>i</span><span class='varop'>+</span><span class='num'>1</span>
        <span class='conid'>Just</span> <span class='varid'>s</span>  <span class='conop'>:</span> <span class='varid'>ys</span> <span class='keyglyph'>-></span> <span class='varid'>putStrLn</span> <span class='varid'>s</span> <span class='varop'>>></span> <span class='varid'>wait</span> <span class='varid'>n</span> <span class='varid'>ys</span> <span class='varid'>i</span>


Each thread spell checks its own piece of the work list. If it finds a spelling error, it passes the offending word back over the channel to the main thread.

    <span class='varid'>thread</span> <span class='varid'>chan</span> <span class='varid'>dict</span> <span class='layout'>(</span><span class='varid'>me</span><span class='layout'>,</span> <span class='varid'>xs</span><span class='layout'>)</span> <span class='keyglyph'>=</span> <span class='keyword'>do</span>
        <span class='varid'>mapM_</span> <span class='varid'>spellit</span> <span class='varid'>xs</span>
        <span class='varid'>writeChan</span> <span class='varid'>chan</span> <span class='conid'>Nothing</span>

     <span class='keyword'>where</span>
        <span class='varid'>spellit</span> <span class='varid'>w</span> <span class='keyglyph'>=</span> <span class='keyword'>do</span>
            <span class='varid'>when</span> <span class='layout'>(</span><span class='varid'>spell</span> <span class='varid'>dict</span> <span class='varid'>w</span><span class='layout'>)</span> <span class='varop'>$</span>
                <span class='varid'>writeChan</span> <span class='varid'>chan</span> <span class='varop'>.</span> <span class='conid'>Just</span> <span class='varop'>$</span> <span class='varid'>printf</span> <span class='str'>"Thread %d: %-25s"</span> <span class='layout'>(</span><span class='varid'>me</span><span class='keyglyph'>::</span><span class='conid'>Int</span><span class='layout'>)</span> <span class='varid'>w</span>


The 'spell' function is simplified a little:

    <span class='varid'>spell</span> <span class='varid'>d</span> <span class='varid'>w</span> <span class='keyglyph'>=</span> <span class='varid'>w</span> <span class='varop'>`notMember`</span> <span class='varid'>d</span>

which we could also write as:

    <span class='varid'>spell</span> <span class='keyglyph'>=</span> <span class='varid'>flip</span> <span class='varid'>notMember</span>


We modify the readFiles phase to take an additional numeric command line argument, specifying the number of threads to run:

    <span class='varid'>readFiles</span> <span class='keyglyph'>=</span> <span class='keyword'>do</span>
        <span class='keyglyph'>[</span><span class='varid'>s</span><span class='layout'>,</span><span class='varid'>n</span><span class='keyglyph'>]</span> <span class='keyglyph'><-</span> <span class='varid'>getArgs</span>
        <span class='varid'>f</span>     <span class='keyglyph'><-</span> <span class='varid'>readFile</span> <span class='str'>"/usr/share/dict/words"</span>
        <span class='varid'>g</span>     <span class='keyglyph'><-</span> <span class='varid'>readFile</span> <span class='varid'>s</span>
        <span class='varid'>return</span> <span class='layout'>(</span><span class='varid'>f</span><span class='layout'>,</span><span class='varid'>g</span><span class='layout'>,</span> <span class='varid'>read</span> <span class='varid'>n</span><span class='layout'>)</span>


We compile this with the GHC SMP parallel runtime system:

    <span class='varop'>$</span> <span class='varid'>ghc</span> <span class='comment'>-</span><span class='conid'>O</span> <span class='comment'>--make -threaded Spell.hs -o spell</span>


Now, we can run 'n' worker threads (lightweight Haskell threads), mapped onto 'm' OS threads. Since I'm using a 4 core linux server, we'll play around with 4 OS threads. First, running everything in a single thread:

    <span class='varop'>$</span> <span class='varid'>time</span> <span class='varop'>./</span><span class='varid'>spell</span> <span class='varid'>test</span><span class='varop'>.</span><span class='varid'>txt</span> <span class='num'>1</span> <span class='varop'>+</span><span class='conid'>RTS</span> <span class='comment'>-</span><span class='conid'>N1</span>
    <span class='varop'>...</span>
    <span class='conid'>Thread</span> <span class='num'>1</span><span class='conop'>:</span> <span class='varid'>week</span><span class='conop'>:</span>                    
    <span class='conid'>Thread</span> <span class='num'>1</span><span class='conop'>:</span> <span class='conid'>IO</span><span class='varop'>!</span>
    <span class='varop'>./</span><span class='varid'>spell</span> <span class='varid'>test</span><span class='varop'>.</span><span class='varid'>txt</span> <span class='num'>1</span> <span class='varop'>+</span><span class='conid'>RTS</span> <span class='comment'>-</span><span class='conid'>N1</span> <span class='num'>99</span><span class='varop'>%</span> <span class='varid'>cpu</span> <span class='num'>2.533</span> <span class='varid'>total</span>


Ok, now we change the command line flag to run it with 4 OS threads, to try to utilise all 4 cores:

    <span class='varop'>$</span> <span class='varid'>time</span> <span class='varop'>./</span><span class='varid'>spell</span> <span class='num'>4</span> <span class='varop'>+</span><span class='conid'>RTS</span> <span class='comment'>-</span><span class='conid'>N4</span>
    <span class='varop'>...</span>
    <span class='conid'>Thread</span> <span class='num'>2</span><span class='conop'>:</span> <span class='varid'>week</span><span class='conop'>:</span>                    
    <span class='conid'>Thread</span> <span class='num'>3</span><span class='conop'>:</span> <span class='conid'>IO</span><span class='varop'>!</span>
    <span class='varop'>./</span><span class='varid'>spell</span> <span class='varid'>test</span><span class='varop'>.</span><span class='varid'>txt</span> <span class='num'>4</span> <span class='varop'>+</span><span class='conid'>RTS</span> <span class='comment'>-</span><span class='conid'>N4</span> <span class='num'>111</span><span class='varop'>%</span> <span class='varid'>cpu</span> <span class='num'>2.335</span> <span class='varid'>total</span>


Ok. Good... A little bit faster, uses a little bit more cpu. It turns out however the program is bound currently by the time spent in the main thread building the initial dictionary. Actual searching time is only some 10% of the running time. Nonetheless, it was fairly painless to break up the initial simple program into a parallel version.


If the program running time was extended (as the case for a server), the parallelism would be a win. Additionally, should we buy more cores for the server, all we need to is change the +RTS -N argument to the program, to start utilising these extra cores.

Next week

In upcoming tutorials we'll look more into implicitly parallel programs, and the use of the new high performance ByteString data type for string processing.