Tutorials/Programming Haskell/Introduction

It's about time we got some work done in Haskell, eh? Now, one of my favourite programming books as an undergraduate was the Camel Book, "Programming Perl". It was full of practical examples of Perl code, written well. (And I'm grateful to Larry Wall, Tom Christiansen and Randal Schwartz for writing the book that made programming fun.)

So what would it look like if we wrote a Haskell tutorial in this style? Let's have at it!

Getting started

Like some languages, Haskell can be both compiled and interpreted. The most widely used implementation of Haskell is currently GHC, which provides both an optimising native code compiler and an interactive bytecode interpreter. I'll be using GHC (or its interactive front end, GHCi) for all code. So grab a copy of GHC now, from your package system or the GHC home page.

Start up GHCi:

   $ ghci
      ___         ___ _
     / _ \ /\  /\/ __(_)
    / /_\// /_/ / /  | |      GHC Interactive, version 6.6, for Haskell 98.
   / /_\\/ __  / /___| |      http://www.haskell.org/ghc/
   \____/\/ /_/\____/|_|      Type :? for help.
   Loading package base ... linking ... done.
   Prelude>

The interpreter now waits for your code fragments. The "Prelude" prompt shows which library modules are in scope: in this case, only the basic language module, known as the Prelude.

Now we can start running Haskell code.

Prelude> "G'day, world!"
"G'day, world!"

Prelude> putStrLn "G'day, world!"
G'day, world!

You can also compile this code to a native binary using GHC. Write it in a source file:

main = putStrLn "G'day, world!"

and then compile the source to native code. Assuming your file is A.hs:

   $ ghc A.hs

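This produces a new executable, ./a.out (a.out.exe on Windows), which you can run like any other program on your system. (That is the name chosen by the GHC 6.6 shown above; newer GHC releases name the executable after the source file, so you would get ./A instead.)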

   $ ./a.out
   G'day, world!

Variables

We can name arbitrary fragments of Haskell using variables. Like so:

phrase = "G'day, world!"
main = putStrLn phrase

We don't have to define what type phrase is, as Haskell uses type inference to infer at compile time the types of all expressions in the program. As "G'day, world!" is a string, so must phrase be a string. There are a bunch of basic types of values to play with. Here's a small sample:

answer      = 42
pi          = 3.141592653589793238462643383279502884197169399375105820974944592
avocados    = 6.02e23
pet         = "Lambda"
sign        = "I love my " ++ pet
coat        = "It costs $100"
hence       = "whence"
thence      = hence
moles       = 2.5
x           = moles * avocados
c           = '#'
pair        = (2.5, "lambdas")
list        = [5,6,4,3,1]
options     = Just "done"
failed      = Nothing
void        = ()
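
To see what inference is doing for us, here is a sketch of a few of those bindings with their types written out by hand. The signatures are what I would expect GHC to infer (bare numeric literals default to Integer or Double); the main at the end is only there so the file runs on its own.

```haskell
-- A few of the sample bindings, with explicit type signatures.
-- GHC checks each signature against the definition instead of inferring it.
answer :: Integer            -- a bare literal like 42 defaults to Integer
answer = 42

avocados :: Double           -- a literal with a fraction or exponent defaults to Double
avocados = 6.02e23

pet :: String                -- String is a synonym for [Char]
pet = "Lambda"

c :: Char
c = '#'

pair :: (Double, String)     -- a tuple can mix types
pair = (2.5, "lambdas")

list :: [Integer]            -- every element of a list has the same type
list = [5,6,4,3,1]

options :: Maybe String      -- either Just a value or Nothing
options = Just "done"

void :: ()                   -- the unit type has exactly one value, ()
void = ()

main :: IO ()
main = print (answer, pair, options)
```

In GHCi you can ask for any of these with :t, for example :t pair.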

One important thing to remember is that Haskell's variables, like in most functional programming languages, are like variables in mathematics, and are just names for expressions. They're explicitly not mutable boxes, like in most imperative programming languages. As a result, you never need to worry about initialising a Haskell variable, nor do you need to worry about the current value in a variable: it always has the same value, and can always be replaced with its definition. So the following behaves just like it would in maths:

answer      = 42
another     = answer + 1
more        = another + answer
main        = print more

That is,

   $ ghc A.hs
   $ ./a.out
   85

Now, since variables are just names for program fragments, you can evaluate Haskell on paper by replacing all names with their definition, until you reach a final value, like so:

    main = print more
  =>
    main = print (another + answer)
  =>
    main = print ((answer + 1) + answer)
  =>
    main = print ((answer + 1) + 42)
  =>
    main = print ((42 + 1) + 42)
  =>
    main = print (43 + 42)
  =>
    main = print 85
  =>
    85

Having such a simple system for variables allows for a wide range of interesting optimisations, and makes understanding what a program is doing at any point much easier, since you don't have to worry about what state a variable might currently be in. (Of course, some problems need (thread-safe) mutable boxes, and they're available as a library for when you need them.)
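
For the curious, here is a minimal sketch of one such mutable box, using MVar from Control.Concurrent.MVar in the base library. The function name bumpTwice is made up for this example, and MVar is only one of several box types on offer.

```haskell
import Control.Concurrent.MVar (newMVar, modifyMVar_, readMVar)

-- Keep a counter in an MVar, bump it twice, and read it back.
-- bumpTwice is a hypothetical name used only in this sketch.
bumpTwice :: Integer -> IO Integer
bumpTwice start = do
    box <- newMVar start                 -- create the box with an initial value
    modifyMVar_ box (return . (+1))      -- replace the contents with contents + 1
    modifyMVar_ box (return . (+1))
    readMVar box                         -- read the current contents

main :: IO ()
main = bumpTwice 40 >>= print            -- prints 42
```

Note that the mutation shows up in the types: bumpTwice returns an IO Integer, not a plain Integer, so ordinary variables keep their simple substitution behaviour.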

Collections

Often you need to collect a bunch of values together into some kind of collection. Haskell has many collection types, but in particular it has lists and finite maps, which operate much like the arrays and hashes of the imperative world.

Lists

A list is just an ordered, um, list of values. They can be nested, and transformed in all sorts of ways, using functions. Assuming your file, A.hs, contains:

home  = ["couch", "chair", "table", "stove"]

We can play around with this stuff like so:

    $ ghci A.hs

    *Main> home
    ["couch","chair","table","stove"]

    *Main> head home
    "couch"

    *Main> tail home
    ["chair","table","stove"]

    *Main> last home
    "stove"

    *Main> home !! 2
    "table"

    *Main> reverse home
    ["stove","table","chair","couch"]

    *Main> map reverse home
    ["hcuoc","riahc","elbat","evots"]

Loading the Data.List library gives us some more functions to use:

    *Main> :m + Data.List

    *Main Data.List> intersperse "#" home
    ["couch","#","chair","#","table","#","stove"]

    *Main Data.List> concat (intersperse "#" home)
    "couch#chair#table#stove"

    *Main Data.List> home \\ ["table","stove"]
    ["couch","chair"]
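
Since lists can be nested, here is a small sketch of a list of lists being flattened and tidied up. The rooms and inventory names are invented for the example; intercalate comes from Data.List in current versions of base and does the job of concat (intersperse ...) in one call.

```haskell
import Data.List (intercalate, nub, sort)

home :: [String]
home = ["couch", "chair", "table", "stove"]

-- A nested list: each room is itself a list of furniture.
-- (rooms and inventory are hypothetical names for this sketch.)
rooms :: [[String]]
rooms = [home, ["bed", "lamp"], ["chair", "desk"]]

-- Flatten the rooms, drop duplicates, and sort alphabetically.
inventory :: [String]
inventory = sort (nub (concat rooms))

main :: IO ()
main = do
    print (map length rooms)                  -- [4,2,2]
    putStrLn (intercalate ", " inventory)     -- bed, chair, couch, desk, lamp, stove, table
```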

Finite Maps

Finite maps (or maps) are the lookup tables of purely functional programming. Whenever you'd use some kind of hash in an imperative language, you can replace it with a Map in Haskell.

Like hashes, maps can be seen as a table of pairs of keys and values. You can declare a new map:

import Data.Map

days = fromList
        [ ("Sun",  "Sunday"     )
        , ("Mon",  "Monday"     )
        , ("Tue",  "Tuesday"    )
        , ("Wed",  "Wednesday"  )
        , ("Thu",  "Thursday"   )
        , ("Fri",  "Friday"     )
        , ("Sat",  "Saturday"   ) ]

You can also convert a map to a list, using (well, duh!) toList:

    *Main> toList days
    [("Fri","Friday"),("Mon","Monday"),("Sat","Saturday")
    ,("Sun","Sunday"),("Thu","Thursday"),("Tue","Tuesday")
    ,("Wed","Wednesday")]

Note that they come out sorted by key, not in the order we inserted them (a Map is kept ordered by its keys, unlike a typical hash). If you just want the keys of the map, or just the values:

    *Main> keys days
    ["Fri","Mon","Sat","Sun","Thu","Tue","Wed"]

    *Main> elems days
    ["Friday","Monday","Saturday","Sunday","Thursday","Tuesday","Wednesday"]

Since maps are a good structure for looking up values, you can search them using the lookup function. This function returns the element, if found:

    *Main> Data.Map.lookup "Tue" days
    "Tuesday"

Since the name 'lookup' is also used by a list function of similar purpose in the Prelude, we use the qualified name here to disambiguate which 'lookup' to use.
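
A common way to sidestep such clashes altogether is to import the module qualified under a short alias. This is a sketch of that style, not what the sessions above use; the alias M is just a convention, and the map is cut down to three days to keep it short.

```haskell
import qualified Data.Map as M

-- Every Data.Map name now needs the M. prefix, so nothing can
-- collide with Prelude functions such as lookup, map or filter.
days :: M.Map String String
days = M.fromList [("Sun", "Sunday"), ("Mon", "Monday"), ("Tue", "Tuesday")]

main :: IO ()
main = do
    print (M.keys days)            -- ["Mon","Sun","Tue"], in ascending key order
    print (M.member "Tue" days)    -- True
    print (M.size days)            -- 3
```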

On failure

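But what happens if the key is not found? (Feel free to skip this section if you don't care about errors yet.) lookup will then fail, and how it fails depends on what kind of failure you want. Haskell goes to great lengths to make programming for failure flexible. (The sessions below show the Data.Map that ships with GHC 6.6, where lookup can return its result in any monad; later releases of the containers library fix the result type to Maybe.) For example, to fail with an exception: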

    *Main> Data.Map.lookup "Thor" days
    *** Exception: user error (Data.Map.lookup: Key not found)

Which is the same as failing with an IO error. We can ask for this explicitly with a type annotation, to say "fail with an IO error":

    *Main> Data.Map.lookup "Thor" days :: IO String
    *** Exception: user error (Data.Map.lookup: Key not found)

Often you might instead prefer that some special value is returned on failure:

    *Main> Data.Map.lookup "Thor" days :: Maybe String
    Nothing

Maybe you'd just like an empty list:

    *Main> Data.Map.lookup "Thor" days :: [String]
    []

Finally, you can always provide an explicit default value:

    *Main> findWithDefault "Not found" "Thor" days
    "Not found"

Failure is entirely under your control!
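
The sessions above rely on the Data.Map of the GHC 6.6 era. In current releases of the containers library, lookup always returns a Maybe, so you pick the failure behaviour by deciding what to do with the Nothing case. Here is a sketch of how that might look; describe is a hypothetical helper, and the map is shortened to two days.

```haskell
import qualified Data.Map as M
import Data.Maybe (fromMaybe)

days :: M.Map String String
days = M.fromList [("Mon", "Monday"), ("Tue", "Tuesday")]

-- Hypothetical helper: turn a lookup result into a sentence,
-- handling the missing-key case explicitly.
describe :: String -> String
describe key = case M.lookup key days of
    Just long -> key ++ " is " ++ long
    Nothing   -> key ++ " is not a day I know"

main :: IO ()
main = do
    print (M.lookup "Tue" days)                               -- Just "Tuesday"
    print (M.lookup "Thor" days)                              -- Nothing
    putStrLn (fromMaybe "Not found" (M.lookup "Thor" days))   -- Not found
    putStrLn (M.findWithDefault "Not found" "Thor" days)      -- Not found
    putStrLn (describe "Tue")                                 -- Tue is Tuesday
```

findWithDefault works the same way in both the old and the new library, so it is the safest choice if you want code that reads the same across versions.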

Actions

Now, real programs interact with the outside world. They call functions which do IO, as a side effect, and then return some value. In Haskell, functions with side effects are often called actions, to distinguish them from normal Haskell functions (which behave like mathematical functions: they take inputs and return a result, with no side effects). Programming with side effects is carefully handled in Haskell, again to control the possibility of errors, and all functions which have side effects have a special type: the IO type.

For example, the function to print a string has the following type (and you can ask the interpreter for the type interactively):

    Prelude> :t putStr
    putStr :: String -> IO ()

which tells you that this function takes a String as an argument, does some IO side effect, and returns the unit value, (), which carries no information. It is roughly equivalent to the following C type:

   void putStr(char *);

but with a bit of extra information, namely, that the function does some IO. We would print out some element of our map like so:

    main = print ("Tue in long form is " ++ findWithDefault "Not found" "Tue" days)

    *Main> main
    "Tue in long form is Tuesday"

An example

One of the classic programming puzzles for introducing real world problems is the 'class grade' problem. You have a text file containing a list of student names and their grades, and you'd like to extract various information and display it. In deference to The Camel Book, we'll follow this lead, and start with a file "grades", containing something like this:

   Alonzo 70
   Simon 94
   Henk 79
   Eugenio 69
   Bob 80
   Oleg 77
   Philip 73
   ...

Students appear multiple times, with an entry for each of their subjects. Let's read this file, populate a map with the data, and print some statistical information about the results. The first thing to do is import some basic libraries:

import Data.Char
import Data.Maybe
import Data.List
import qualified Data.Map
import Text.Printf

And now here's the entire program, to read the grades file, compute all the averages, and print them:

main = do
    src <- readFile "grades"
    let pairs   = map (split.words) (lines src)
    let grades  = foldr insert Data.Map.empty pairs
    mapM_ (draw grades) (sort (Data.Map.keys grades))
  where
    insert (s, g) = Data.Map.insertWith (++) s [g]
    split [name,mark] = (name, read mark)
 
draw g s = printf "%s\t%s\tAverage: %f\n" s (show marks) avg
  where
    marks = Data.Map.findWithDefault (error "No such student") s g
    avg   = sum marks / fromIntegral (length marks) :: Double
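
If you'd like to poke at the pieces separately, here is one possible way to pull the pure part of that program out of main, so it can be tried on a string without a grades file. This is a sketch, not the program above: the names Grades, parseGrades and average are invented, and it uses fromListWith from Data.Map instead of the foldr.

```haskell
import qualified Data.Map as M

-- Hypothetical names (Grades, parseGrades, average) chosen for this sketch.
type Grades = M.Map String [Double]

-- Turn the text of a grades file into a map from student to marks.
-- Lines that do not have exactly two words are skipped; a mark that
-- is not a number still makes read fail at run time.
-- fromListWith passes the new value first, so flip (++) keeps the
-- marks in the order they appear in the input.
parseGrades :: String -> Grades
parseGrades src =
    M.fromListWith (flip (++)) [ (name, [read mark]) | [name, mark] <- map words (lines src) ]

-- Mean of a non-empty list of marks.
average :: [Double] -> Double
average marks = sum marks / fromIntegral (length marks)

main :: IO ()
main = do
    let grades = parseGrades "Alonzo 70\nSimon 94\nAlonzo 71\n"
    mapM_ print (M.toList (M.map average grades))
    -- ("Alonzo",70.5)
    -- ("Simon",94.0)
```

Swapping the string literal for the result of readFile "grades" would give the same data the full program works with.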


Running it

How do we run this program? There are lots of ways:

Compile it to native code

   $ ghc -O Grades.hs
   $ ./a.out
   Alonzo  [70.0,71.0]     Average: 70.5
   Bob     [80.0,88.0]     Average: 84.0
   Eugenio [69.0,98.0]     Average: 83.5
   Henk    [79.0,81.0]     Average: 80.0
   Oleg    [77.0,68.0]     Average: 72.5
   Philip  [73.0,71.0]     Average: 72.0
   Simon   [94.0,83.0]     Average: 88.5

Run it in the bytecode interpreter

   $ runhaskell Grades.hs
   Alonzo  [70.0,71.0]     Average: 70.5
   Bob     [80.0,88.0]     Average: 84.0
   Eugenio [69.0,98.0]     Average: 83.5
   Henk    [79.0,81.0]     Average: 80.0
   Oleg    [77.0,68.0]     Average: 72.5
   Philip  [73.0,71.0]     Average: 72.0
   Simon   [94.0,83.0]     Average: 88.5

Execute it interactively

   $ ghci Grades.hs
   Prelude Main> main
   Alonzo  [70.0,71.0]     Average: 70.5
   Bob     [80.0,88.0]     Average: 84.0
   Eugenio [69.0,98.0]     Average: 83.5
   Henk    [79.0,81.0]     Average: 80.0
   Oleg    [77.0,68.0]     Average: 72.5
   Philip  [73.0,71.0]     Average: 72.0
   Simon   [94.0,83.0]     Average: 88.5

Make the script executable

Under Unix, you can use the #! convention to make a script executable. Add the following to the top of the source file:

   #!/usr/bin/env runhaskell

And then set the script executable:

   $ chmod +x Grades.hs
   $ ./Grades.hs
   Alonzo  [70.0,71.0]     Average: 70.5
   Bob     [80.0,88.0]     Average: 84.0
   Eugenio [69.0,98.0]     Average: 83.5
   Henk    [79.0,81.0]     Average: 80.0
   Oleg    [77.0,68.0]     Average: 72.5
   Philip  [73.0,71.0]     Average: 72.0
   Simon   [94.0,83.0]     Average: 88.5

Next week

More IO!