https://wiki.haskell.org/api.php?action=feedcontributions&user=JoshHoyt&feedformat=atomHaskellWiki - User contributions [en]2024-03-29T13:34:30ZUser contributionsMediaWiki 1.35.5https://wiki.haskell.org/index.php?title=Hitchhikers_guide_to_Haskell&diff=2385Hitchhikers guide to Haskell2006-02-20T05:32:30Z<p>JoshHoyt: </p>
<hr />
<div>= Hitchhikers Guide To The Haskell =<br />
<br />
== Preface: DONT PANIC! ==<br />
<br />
Recent experiences from a few of my fellow C++/Java programmers<br />
indicate that they read various Haskell tutorials with "exponential<br />
speedup" (think about how TCP/IP session starts up). They start slow<br />
and cautious, but when they see that the first 3-5 pages do not<br />
contain "anything interesting" in terms of code and examples, they<br />
begin skipping paragraphs, then chapters, then whole pages, only to<br />
slow down - often to a complete halt - somewhere on page 50, finding<br />
themselves in the thick of concepts like "type classes", "type<br />
constructors", "monadic IO", at which point they usually panic, think<br />
of a perfectly rational excuse not to read further anymore, and<br />
happily forget this sad and scary encounter with Haskell (as human<br />
beings usually tend to forget sad and scary things).<br />
<br />
This text intends to introduce the reader to the practical aspects of Haskell<br />
from the very beginning (plans for the first chapters include: I/O, darcs,<br />
Parsec, QuickCheck, profiling and debugging, to mention the few). The reader<br />
is expected to know (where to find) at least the basics of haskell: how to run<br />
"hugs" or "ghci", that layout is 2-dimensional, etc. Other than that, we do<br />
not plan to take radical leaps, and will go one step at a time in order not to<br />
lose the reader along the way. So DONT PANIC, take your towel with you and<br />
read along.<br />
<br />
Oh, almost forgot: author is very interested in ANY feedback. Drop him a line<br />
or a word (see [[Adept]] for contact info)<br />
<br />
== Chapter 1: Ubiquitous "Hello world!" and other ways to do IO in Haskell ==<br />
<br />
Each chapter will be dedicated to one small real-life task which we will<br />
complete from the ground up.<br />
<br />
So here is the task for this chapter: in order to free up space on<br />
your hard drive for all the haskell code you are going to write in the<br />
nearest future, you are going to archive some of the old and dusty<br />
information on CDs and DVDs. While CD (or DVD) burning itself is easy<br />
these days, it usually takes some (or quite a lot ot) time to decide<br />
how to put a several Gb's of digital photos on CD-Rs, when directories<br />
with images range from 10 to 300 Mb's in size, and you dont want to<br />
burn half-full (or half-empty) CD-Rs.<br />
<br />
So, the task is to write a program which will help us to put a given<br />
collection of directories on the minimum possible amount of media,<br />
while packing the media as tight as possible. Let's name this program<br />
"cd-fit".<br />
<br />
Oh. Wait. Let's do the usual "hello world" thing, before we forget about it,<br />
and then move on to more interesting things:<br />
<br />
-- put this in hello.hs<br />
module Main where<br />
main = putStrLn "Hello world!"<br />
<br />
Run it:<br />
<br />
$ runhaskell ./hello.hs<br />
Hello world!<br />
<br />
OK, we've done it. Move along now, nothing interesting here :)<br />
<br />
Any serious development must be done with the help of version control<br />
system, and we will not make an exception. We will use modern<br />
distributed version control system "darcs". "Modern" means that it is<br />
written in Haskell, "distributed" means that each working copy is<br />
repository in itself.<br />
<br />
First, lets create an empty directory for all our code, and invoke<br />
"darcs init" there, which will create subdirectory "_darcs" to store<br />
all version-control-related stuff there.<br />
<br />
Fire up your favorite editor and create new file "cd-fit.hs" in our<br />
working directory. Now lets think for a moment about how our program<br />
will operate and express it in pseudocode:<br />
<br />
main = read list of directories and their sizes<br />
decide how to fit them on CD-Rs<br />
print solution<br />
<br />
Sounds reasonable? I thought so.<br />
<br />
Lets simplify our life a little and assume for now that we will<br />
compute directory sizes somewhere outside our program (for example,<br />
with "du -sb *") and read this information from stdin.<br />
Now let me convert all this to Haskell:<br />
<br />
module Main where<br />
<br />
main = do input <- getContents<br />
putStrLn ("DEBUG: got input " ++ input)<br />
-- compute solution and print it<br />
<br />
Not really working, but pretty close to plain English, eh? Let's stop<br />
for a moment and look closer at whats written here line-by-line<br />
<br />
Let's begin from the top:<br />
<br />
input <- getContents<br />
<br />
This is an example of the Haskell syntax for doing IO (namely, input). This<br />
line is an instruction to read all the information available from the stdin,<br />
return it as a single string, and bind to the symbol "input", so we can<br />
process this string any way we want.<br />
<br />
How did I know that? Did I memorize all the functions by heart? Of course not!<br />
Each function has a type, which, along with function's name, usually tells a<br />
lot about what function will do.<br />
<br />
Let's fire up interactive Haskell environment and examine this function<br />
up close:<br />
<br />
$ ghci<br />
___ ___ _<br />
/ _ \ /\ /\/ __(_)<br />
/ /_\// /_/ / / | | GHC Interactive, version 6.4.1, for Haskell 98.<br />
/ /_\\/ __ / /___| | http://www.haskell.org/ghc/<br />
\____/\/ /_/\____/|_| Type :? for help.<br />
<br />
Loading package base-1.0 ... linking ... done.<br />
Prelude> :type getContents<br />
getContents :: IO String<br />
Prelude> <br />
<br />
We see that "getContents" is a function without arguments, that will return<br />
"IO String". Prefix "IO" meant that this is an IO action. It will return<br />
String, when evaluated. Action will be evaluated as soon as we use "<-" to<br />
bind its result to some symbol.<br />
<br />
Note that "<-" is not a fancy way to do assignment to variable. It is a way to<br />
evaluate (execute) IO actions, in other words - to actually do some I/O and<br />
return its result (if any). <br />
<br />
We can choose not to evaluate action obtained from "getContents", but to carry<br />
it around a bit and evaluate later:<br />
<br />
let x = getContents<br />
-- 300 lines of code here<br />
input <- x<br />
<br />
So, as you see, IO actions can act like an ordinary values. Suppose that we<br />
have built a list of IO actions and found a way to execute them one by one.<br />
This would be a way to simulate imperative programming with its notion of<br />
"order of execution".<br />
<br />
Haskell allows you to do better than that. <br />
<br />
Standard language library (named "Prelude", by the way) provides us with lots<br />
of functions that return useful primitive IO actions. In order to combine them<br />
to produce more complex actions, we use a "do":<br />
<br />
c = do a <- someAction<br />
b <- someOtherAction<br />
print (bar b)<br />
print (foo a)<br />
putStrLn "done"<br />
<br />
Here we '''bind''' "c" to an action with the following "scenario":<br />
* '''evaluate''' action "someAction" and '''bind''' its result to "a"<br />
* then, '''evaluate''' "someOtherAction" and '''bind''' its result to "b"<br />
* then, process "b" with function "bar" and print result<br />
* then, process "a" with function "foo" and print result<br />
* then, print the word "done"<br />
<br />
When all this will actually be executed? Answer: as soon as we evaluate "c"<br />
using the "<-" (if it returns result, as "getContents" does) or just<br />
by using it as a function name (if it does not return a result, as "print"<br />
does):<br />
<br />
process = do putStrLn "Will do some processing"<br />
c<br />
putStrLn "Done"<br />
<br />
Notice that we took a bunch of functions ("someAction", "someOtherAction",<br />
"print", "putStrLn") and using "do" created from them a new function, which we<br />
bound to symbol "c". Now we could use "c" as a building block to produce even<br />
more complex function, "process", and we could carry this on and on.<br />
Eventually, some of the functions will be mentioned in the code of function<br />
"main", to which the ultimate topmost IO action any Haskell program is bound.<br />
<br />
When the "main" will be executed/evaluated/forced? As soon as we run the<br />
program. Read this twice and try to comprehend: <br />
<br />
''Execution of the Haskell program is an evaluation of the symbol "main" to<br />
which we have bound an IO action. Via evaluation we obtain the result of that<br />
action''. <br />
<br />
Readers familiar with advanced C++ or Java programming and arcane body of<br />
knowledge named "OOP Design Patterns" might note that "build actions from<br />
actions" and "evaluate actions to get result" is essentially a "Command<br />
pattern" and "Composition pattern" combined. Good new: in Haskell you get them<br />
for all your IO, and get them '''for free''' :)<br />
<br />
----<br />
'''Exercise:'''<br />
Consider the following code:<br />
<br />
module Main where<br />
c = putStrLn "C!"<br />
<br />
combine before after =<br />
do before<br />
putStrLn "In the middle"<br />
after<br />
<br />
main = do combine c c<br />
let b = combine (putStrLn "Hello!") (putStrLn "Bye!)<br />
let d = combine (b) (combine c c)<br />
putStrLn "So long!"<br />
<br />
See how we construct code out of thin air? Try to imagine what this code will<br />
do, then run it and check yourself. <br />
<br />
Do you understand why "Hello!" and "Bye!" are not printed?<br />
----<br />
<br />
Let's examine our "main" function closer:<br />
<br />
Prelude> :load cd-fit.hs<br />
Compiling Main ( ./cd-fit.hs, interpreted )<br />
Ok, modules loaded: Main.<br />
*Main> :type main<br />
main :: IO ()<br />
*Main> <br />
<br />
We see that "main" is indeed an IO action which will return nothing<br />
when evaluated. When combining actions with "do", the type of the<br />
result will be the type of the last action, and "putStrLn something" has type<br />
"IO ()": <br />
<br />
*Main> :type putStrLn "Hello world!"<br />
putStrLn "Hello world!" :: IO ()<br />
*Main> <br />
<br />
Oh, by the way: have you noticed that we actually compiled our first<br />
Haskell program in order to examine "main"? :)<br />
<br />
Lets celebrate that by putting it under version control: execute<br />
"darcs add cd-fit.hs" and "darcs record", answer "y" to all questions<br />
and provide a commit comment "Skeleton of cd-fit.hs"<br />
<br />
Let's try to run it:<br />
<br />
$ echo "foo" | runhaskell cd-fit.hs<br />
DEBUG: got input foo<br />
<br />
----<br />
'''Exercises''':<br />
<br />
* Try to write program that takes your name from the stdin and greets you (keywords: getLine, putStrLn);<br />
<br />
* Try to write program that asks for you name, reads it, greets you, asks for your favorite color, and prints it back (keywords: getLine, putStrLn).<br />
<br />
== Chapter 2: Parsing the input ==<br />
<br />
OK, now that we have proper understanding of the powers of Haskell IO<br />
(and are awed by them, I hope), lets forget about IO and actually do<br />
some usefull work. <br />
<br />
As you remember, we set forth to pack some CD-Rs as tightly as<br />
possible with data scattered in several input directories. We assume<br />
that "du -sb" will compute the sizes of input directories and output<br />
something like:<br />
<br />
65572 /home/adept/photos/raw-to-burn/dir1<br />
68268 /home/adept/photos/raw-to-burn/dir2<br />
53372 /home/adept/photos/raw-to-burn/dir3<br />
713124 /home/adept/photos/raw-to-burn/dir4<br />
437952 /home/adept/photos/raw-to-burn/dir5<br />
<br />
Our next task is to parse that input into some suitable internal<br />
representation.<br />
<br />
For that we will use powerful library of '''parsing combinators''' named<br />
"Parsec" which ships with most Haskell implementations.<br />
<br />
Much like the IO facilities we have seen in the first chapter, this<br />
library provides a set of basic parsers and means to combine into more<br />
complex parsing constructs.<br />
<br />
Unlike other tools in this area (lex/yacc or JavaCC to name a few),<br />
Parsec parsers do not require separate preprocessing stage. Since in<br />
Haskell we can return function as a result of function and thus<br />
construct functions "from the thin air", there is no need for separate<br />
syntax for parser description. But enough advertisements, lets actually<br />
do some parsing:<br />
<br />
import Text.ParserCombinators.Parsec<br />
<br />
-- parseInput parses output of "du -sb", which consists of many lines,<br />
-- each of which describes single directory<br />
parseInput = <br />
do dirs <- many dirAndSize<br />
eof<br />
return dirs<br />
<br />
-- Datatype Dir holds information about single directory - its size and name<br />
data Dir = Dir Int String deriving Show<br />
<br />
-- `dirAndSize` parses information about single directory, which is:<br />
-- a size in bytes (number), some spaces, then directory name, which extends till newline<br />
dirAndSize = <br />
do size <- many1 digit<br />
spaces<br />
dir_name <- anyChar `manyTill` newline<br />
return (Dir (read size) dir_name)<br />
<br />
Just add those lines into "cd-fit.hs". Here we see quite a lot of new<br />
things, and several those that we know already. <br />
<br />
First of all, note the familiar "do" construct, which, as we know, is<br />
used to combine IO actions to produce new IO actions. Here we use it<br />
to combine "parsing" actions into new "parsing" actions. Does this<br />
mean that "parsing" implies "doing IO"? Not at all. Thing is, I must<br />
admit that I lied to you - "do" is used not only to combine IO<br />
actions. Is is used to combine any kind of so-called ''monadic<br />
actions'' or ''monadic values'' together.<br />
<br />
Think about monad as of "design pattern" in the functional world.<br />
Monad is a way to hide from the user (programmer) all the machinery<br />
required for complex functionality to operate.<br />
<br />
As you might have heard, Haskell has no notion of "assignment",<br />
"mutable state", "variables", and is a "pure functional language",<br />
which means that every function called with the same input parameters<br />
will return exactly the same result. Meanwhile "doing IO" requires<br />
hauling around file handles and their states and dealing with IO<br />
errors. "Parsing" requires to track position in the input and dealing<br />
with parsing errors.<br />
<br />
In both cases Wise Men Who Wrote Libraries cared for our needs and<br />
hid all underlying complexities from us, exposing the API of their<br />
libraries (IO and parsing) in the form of "monadic action" which we<br />
are free to combine as we see fit. <br />
<br />
Think of programming with monads as of doing the remodelling with the<br />
help of professional remodelling crew. You describe sequence of<br />
actions on the piece of paper (that's us writing in "do" notation),<br />
and then, when required, that sequence will be evaluated by the<br />
remodelling crew ("in the monad") which will provide you with end<br />
result, hiding all the underlying complexity (how to prepare the<br />
paint, which nails to choose, etc) from you.<br />
<br />
Lets use interactive Haskell environment to decipher all the<br />
instructions we've written for the parsing library. As usually, we'll<br />
go top-down:<br />
<br />
*Main> :reload<br />
Ok, modules loaded: Main.<br />
*Main> :t parseInput<br />
parseInput :: GenParser Char st [Dir]<br />
*Main> :t dirAndSize<br />
dirAndSize :: GenParser Char st Dir<br />
*Main> <br />
<br />
Assuming (well, take my word for it) that "GenParser Char st" is our<br />
parsing monad, we could see that "parseInput", when evaluated, will<br />
produce a list of "Dir", and "dirAndSize", when evaluated, will<br />
produce "Dir". Assuming that "Dir" somehow represents information<br />
about single directory, that is pretty much what we wanted, isn't it?<br />
<br />
Let's see what a "Dir" means. We defined ''datatype'' Dir as a record,<br />
which holds an Int and a String:<br />
<br />
data Dir = Dir Int String deriving Show<br />
<br />
In order to construct such records, we must use ''data constructor''<br />
Dir:<br />
<br />
*Main> :t Dir 1 "foo"<br />
Dir 1 "foo" :: Dir<br />
<br />
In order to reduce confusion for newbies, we could have written:<br />
<br />
data Dir = D Int String deriving Show<br />
<br />
, which would define ''datatype'' "Dir" with ''data constructor'' "D".<br />
However, traditionally name of the datatype and its constructor are<br />
chosen to be the same.<br />
<br />
Clause "deriving Show" instructs compiler to make enough code "behind<br />
the curtains" to make this ''datatype'' conform to the interface of<br />
the ''type class'' Show. We will explain ''type classes'' later, for<br />
now lets just say that this will allow us to "print" instances of<br />
"Dir".<br />
<br />
Exercises: <br />
* examine types of "digit", "anyChar", "many", "many1" and "manyTill" to see how they are used to build more complex parsers from single ones.<br />
<br />
* compare types of "manyTill", "manyTill anyChar" and "manyTill anyChar newline". Note that "anyChar `manyTill` newline" is just another syntax sugar. Note that when function is supplied with less arguments that it actually needs, we get not a value, but a new function, which is called ''partial application''.<br />
<br />
<br />
OK. So, we combined a lot of primitive parsing actions to get ourselves a<br />
parser for output of "du -sb". How can we actually parse something? Parsec<br />
library supplies us with function "parse":<br />
<br />
*Main> :t parse<br />
parse :: GenParser tok () a<br />
-> SourceName<br />
-> [tok]<br />
-> Either ParseError a<br />
*Main> :t parse parseInput<br />
parse parseInput :: SourceName -> [Char] -> Either ParseError [Dir]<br />
*Main> <br />
<br />
First type might be a bit cryptic, but once we supply "parse" with parser we<br />
made, compiler gets more information and presents us with a more concise type.<br />
<br />
Stop and consider this for a moment. Compiler figured out type of the function<br />
without a single type annotation supplied by us! Imagine that Java compiler<br />
deduces types for you, and you dont have to specify types of arguments and<br />
return values of methods, ever.<br />
<br />
OK, back to the code. We can observe that the "parser" is a function, which,<br />
given a parser, a name of the source file or channel (f.e. "stdin"), and<br />
source data (String, which is a list of "Char"s, which is written "[Char]"),<br />
will either produce parse error, or parse us a list of "Dir".<br />
<br />
Datatype "Either" is an example of datatype whose constructor has name, different<br />
from the name of the datatype. In fact, "Either" has two constructors:<br />
<br />
data Either a b = Left a | Right b<br />
<br />
In order to undestand better what does this mean consider the following<br />
example:<br />
<br />
*Main> :t Left 'a'<br />
Left 'a' :: Either Char b<br />
*Main> :t Right "aaa"<br />
Right "aaa" :: Either a [Char]<br />
*Main> <br />
<br />
You see that "Either" is a ''union'' (much like the C/C++ "union") which could<br />
hold value of one of the two distinct types. However, unlike C/C++ "union",<br />
when presented with value of type "Either Int Char" we could immediately see<br />
whether its an Int or a Char - by looking at the constructor which was used to<br />
produce the value. Such datatypes are called "tagged unions", and they are<br />
another power tool in the Haskell toolset.<br />
<br />
Did you also notice that we provide "parse" with parser, which is monadic<br />
value, but receive not a new monadic value, but a parsing result? That is<br />
because "parse" is an evaluator for "Parser" monad, much like the GHC or Hugs<br />
runtime is an evaluator for the IO monad. Function "parser" implements all<br />
monadic machinery: tracks errors and positions in input, implements<br />
backtracking and lookahead, etc.<br />
<br />
Lets extend our "main" function to use "parse" and actually parse the input<br />
and show us the parsed data structures:<br />
<br />
main = do input <- getContents<br />
putStrLn ("DEBUG: got input " ++ input)<br />
let dirs = case parse parseInput "stdin" input of<br />
Left err -> error $ "Input:\n" ++ show input ++ <br />
"\nError:\n" ++ show err<br />
Right result -> result<br />
putStrLn "DEBUG: parsed:"; print dirs<br />
<br />
Exercise:<br />
<br />
* In order to understand this snippet of code better, examine (with ghci or hugs) the difference between 'drop 1 ( drop 1 ( drop 1 ( drop 1 ( drop 1 "foobar" ))))' and 'drop 1 $ drop 1 $ drop 1 $ drop 1 $ drop 1 "foobar"'. Examine type of ($).<br />
* Try putStrLn "aaa" and print "aaa" and see the difference, examine their types.<br />
* Try print (Dir 1 "foo") and putStrLn (Dir 1 "foo"). Examine types of print and putStrLn to understand the behavior in both cases.<br />
<br />
Let's try to run what we have so far:<br />
<br />
$ du -sb * | runhaskell ./cd-fit.hs<br />
<br />
DEBUG: got input 22325 Article.txt<br />
18928 Article.txt~<br />
1706 cd-fit.hs<br />
964 cd-fit.hs~<br />
61609 _darcs<br />
<br />
DEBUG: parsed:<br />
[Dir 22325 "Article.txt",Dir 18928 "Article.txt~",Dir 1706 "cd-fit.hs",Dir 964 "cd-fit.hs~",Dir 61609 "_darcs"]<br />
<br />
Seems to be doing exactly as planned. Now lets try some erroneous<br />
input:<br />
<br />
$ echo "foo" | runhaskell cd-fit.hs<br />
DEBUG: got input foo<br />
<br />
DEBUG: parsed:<br />
*** Exception: Input:<br />
"foo\n"<br />
Error:<br />
"stdin" (line 1, column 1):<br />
unexpected "f"<br />
expecting digit or end of input<br />
<br />
Seems to be doing fine. Let's "darcs record" it, giving the commit comment "Implemented parsing of input".<br />
<br />
----<br />
Here is complete "cd-fit.hs" what we should have written so far:<br />
<br />
module Main where<br />
<br />
import Text.ParserCombinators.Parsec<br />
<br />
-- Output of "du -sb" -- which is our input -- consists of many lines,<br />
-- each of which describes single directory<br />
parseInput = <br />
do dirs <- many dirAndSize<br />
eof<br />
return dirs<br />
<br />
-- Information about single direcory is a size (number), some spaces,<br />
-- then directory name, which extends till newline<br />
data Dir = Dir Int String deriving Show<br />
dirAndSize = <br />
do size <- many1 digit<br />
spaces<br />
dir_name <- anyChar `manyTill` newline<br />
return $ Dir (read size) dir_name<br />
<br />
main = do input <- getContents<br />
putStrLn ("DEBUG: got input " ++ input)<br />
let dirs = case parse parseInput "stdin" input of<br />
Left err -> error $ "Input:\n" ++ show input ++ <br />
"\nError:\n" ++ show err<br />
Right result -> result<br />
putStrLn "DEBUG: parsed:"; print dirs<br />
-- compute solution and print it<br />
<br />
<br />
== Chapter 3: Packing the knapsack and testing it with class, too (and don't forget your towel!) ==<br />
<br />
Enough preliminaries already. Lets go pack some CDs.<br />
<br />
As you might already have recognized, our problem is a classical one. It is<br />
called a "knapsack problem" ([http://www.google.com/search?q=knapsack+problem google it up], if you don't know already what is<br />
it. There are more than 100000 links).<br />
<br />
Lets start from the greedy solution, but first let's slightly modify our "Dir"<br />
datatype to allow easy extraction of its components:<br />
<br />
data Dir = Dir {dir_size::Int, dir_name::String} deriving Show<br />
<br />
----<br />
Exercise: examine types of "Dir", "dir_size" and "dir_name"<br />
----<br />
<br />
From now on, we could use "dir_size d" to get a size of directory, and<br />
"dir_name d" to get its name, provided that "d" is of type "Dir".<br />
<br />
Greedy algorithm sorts directories from the biggest down, and tries to puts<br />
them on CD one by one, until there is no room for more. We will need to track<br />
which directories we added to CD, so lets add another datatype, and code this<br />
simple packing algorithm:<br />
<br />
import Data.List (sortBy)<br />
<br />
-- DirPack holds a set of directories which are to be stored on single CD.<br />
-- 'pack_size' could be calculated, but we will store it separately to reduce<br />
-- amount of calculation<br />
data DirPack = DirPack {pack_size::Int, dirs::[Dir]} deriving Show<br />
<br />
-- For simplicity, lets assume that we deal with standard 700 Mb CDs for now<br />
media_size = 700*1024*1024<br />
<br />
-- Greedy packer tries to add directories one by one to initially empty 'DirPack'<br />
greedy_pack dirs = foldl maybe_add_dir (DirPack 0 []) $ sortBy cmpSize dirs<br />
where<br />
cmpSize d1 d2 = compare (dir_size d1) (dir_size d2)<br />
<br />
-- Helper function, which only adds directory "d" to the pack "p" when new<br />
-- total size does not exceed media_size<br />
maybe_add_dir p d =<br />
let new_size = pack_size p + dir_size d<br />
new_dirs = d:(dirs p)<br />
in if new_size > media_size then p else DirPack new_size new_dirs<br />
<br />
----<br />
I'll highlight the areas which you could explore on your own (using other nice<br />
tutorials out there, of which I especially recommend "Yet Another Haskell<br />
Tutorial" by Hal Daume):<br />
* We choose to import a single function "sortBy" from a module Data.List, not the whole thing.<br />
* Instead of coding case-by-case recursive definition of "greedy_pack", we go with high-order approach, choosing "foldl" as a vehicle for list traversal. Examine its type. Other useful function from the same category are "map", "foldr", "scanl" and "scanr". Look them up!<br />
* To sort list of "Dir" by size only, we use custom sort function and parameterized sort - "sortBy". This sort of setup where user could provide custom "modifier" for generic library function is quite common: look up "deleteBy", "deleteFirstsBy", "groupBy", "insertBy", "intersectBy", "maximumBy", "minimumBy", "sortBy", "unionBy".<br />
* To code quite complex function "maybe_add_dir", we introduced several '''local definition''' in the "let" clause, which we could reuse within function body. We used "where" clause in the "greedy_pack" function to achieve the same. Read about "let" and "where" clauses and difference between them.<br />
* Note that in order to construct a new value of type "DirPack" (in function "maybe_add_dir") we haven't used helper accessor functions "pack_size" and "dirs"<br />
----<br />
<br />
In order to actually use our greedy packer we must call it from our "main"<br />
function, so lets add a lines:<br />
<br />
main = do ...<br />
-- compute solution and print it<br />
putStrLn "Solution:" ; print (greedy_pack dirs)<br />
<br />
Verify integrity of our definitions by (re)loading our code in ghci. Compiles?<br />
Thought so :)<br />
<br />
Now it is time to test our creation. We could do it by actually running it in<br />
the wild like this:<br />
<br />
$ du -sb ~/DOWNLOADS/* | runhaskell ./cd-fit.hs<br />
<br />
This will prove that our code seems to be working. At least, this once. How<br />
about establishing with reasonable degree of certainty that our code, parts<br />
and the whole, works properly, and doing so in re-usable manner? In other<br />
words, how about writing some test?<br />
<br />
Java programmers used to JUnit probably thought about screens of boiler-plate<br />
code and hand-coded method invocations. Never fear, we will not do anything as<br />
silly :)<br />
<br />
Enter '''QuickCheck'''.<br />
<br />
QuickCheck is a tool to do automated testing of you functions using<br />
(semi)random input data. In the spirit of "100b of code examples worth 1kb of<br />
praise" lets show the code for testing the following ''property'': attempt to pack<br />
directories returned by "greedy_pack" should return "DirPack" of exactly the<br />
same pack:<br />
<br />
import Test.QuickCheck<br />
import Control.Monad (liftM2)<br />
<br />
-- We must teach QuickCheck how to generate arbitrary "Dir"s<br />
instance Arbitrary Dir where<br />
-- Let's just skip "coarbitrary" for now, ok? <br />
-- I promise, we will get back to it later :)<br />
coarbitrary = undefined<br />
-- We generate arbitrary "Dir" by generating random size and random name<br />
-- and stuffing them inside "Dir"<br />
arbitrary = liftM2 Dir gen_size gen_name<br />
-- Generate random size between 10 and 1400 Mb<br />
where gen_size = do s <- choose (10,1400)<br />
return (s*1024*1024)<br />
-- Generate random name 1 to 300 chars long, consisting of symbols "fubar/" <br />
gen_name = do n <- choose (1,300)<br />
sequence $ take (n*10+1) $ repeat (elements "fubar/")<br />
<br />
-- For convenience and by tradition, all QuickCheck tests begin with prefix "prop_".<br />
-- Assume that "ds" will be a random list of "Dir"s and code your test.<br />
prop_greedy_pack_is_fixpoint ds =<br />
let pack = greedy_pack ds <br />
in pack_size pack == pack_size (greedy_pack (dirs pack))<br />
<br />
Lets run the test, after which I'll explain how it all works:<br />
<br />
Prelude> :r<br />
Compiling Main ( ./cd-fit.hs, interpreted )<br />
Ok, modules loaded: Main.<br />
*Main> quickCheck prop_greedy_pack_is_fixpoint<br />
[numbers spinning]<br />
OK, passed 100 tests.<br />
*Main> <br />
<br />
We've just seen our "greedy_pack" run on a 100 completely (well, almost<br />
completely) random lists of "Dir"s, and it seems that property indeed holds.<br />
<br />
Lets dissect the code. The most intriguing part is "instance Arbitrary Dir<br />
where", which declares that "Dir" is an '''instance''' of '''typeclass'''<br />
"Arbitrary". Whoa, that's a whole lot of unknown words! :) Let's slow down a<br />
bit. <br />
<br />
What is a '''typeclass'''? Typeclass is a Haskell way of dealing with the<br />
following situation: suppose that you are writing a library of usefull<br />
functions and you dont know in advance how exactly they will be used, so you<br />
want to make them generic. Now, on one hand you dont want to restrict your<br />
users to certain type (f.e. String). On the other hand, you want to enforce<br />
the convention, that arguments for your function must satisfy certain set of<br />
constraints. That is where '''typeclass''' comes in handy. <br />
<br />
Think of typeclass as of '''contract''' (or "interface", in Java terms) that<br />
your type must fulfill in order to be admitted as an argument to certain<br />
function. <br />
<br />
Let's examine typeclass "Arbitrary":<br />
<br />
*Main> :i Arbitrary<br />
class Arbitrary a where<br />
arbitrary :: Gen a<br />
coarbitrary :: a -> Gen b -> Gen b<br />
-- Imported from Test.QuickCheck<br />
instance Arbitrary Dir<br />
-- Defined at ./cd-fit.hs:61:0<br />
instance Arbitrary Bool -- Imported from Test.QuickCheck<br />
instance Arbitrary Double -- Imported from Test.QuickCheck<br />
instance Arbitrary Float -- Imported from Test.QuickCheck<br />
instance Arbitrary Int -- Imported from Test.QuickCheck<br />
instance Arbitrary Integer -- Imported from Test.QuickCheck<br />
-- rest skipped --<br />
<br />
It could be read this way: "Any type (let's name it 'a') could be a member of<br />
class Arbitrary as soon as we define two functions for it: "arbitrary" and<br />
"coarbitrary", with signatures shown. For types Dir, Bool, Double, Float, Int and<br />
Integer such definition were provided, so all those types are instance of<br />
class Arbitrary".<br />
<br />
Now, if you write a function which operates on its arguments solely by means<br />
of "arbitrary" and "coarbitrary", you can be sure that this function will work<br />
on any type which is and instance of "Arbitrary"!<br />
<br />
Lets say it again. Someone (maybe even you) writes the code (API or library),<br />
which requires that input values implement certain ''interfaces'', which is<br />
described in terms of functions. Once you show how your type implements this<br />
''interface'' you are free to use API or library.<br />
<br />
Conside the function "sort" from standard library:<br />
<br />
*Main> :t Data.List.sort<br />
Data.List.sort :: (Ord a) => [a] -> [a]<br />
<br />
We see that it sorts lists of any values which are instance of typeclass<br />
"Ord". Let's examine that class:<br />
<br />
*Main> :i Ord<br />
class Eq a => Ord a where<br />
compare :: a -> a -> Ordering<br />
(<) :: a -> a -> Bool<br />
(>=) :: a -> a -> Bool<br />
(>) :: a -> a -> Bool<br />
(<=) :: a -> a -> Bool<br />
max :: a -> a -> a<br />
min :: a -> a -> a<br />
-- skip<br />
instance Ord Double -- Imported from GHC.Float<br />
instance Ord Float -- Imported from GHC.Float<br />
instance Ord Bool -- Imported from GHC.Base<br />
instance Ord Char -- Imported from GHC.Base<br />
instance Ord Integer -- Imported from GHC.Num<br />
instance Ord Int -- Imported from GHC.Base<br />
-- skip<br />
*Main> <br />
<br />
We see a couple of interesting things: first, there is an additional<br />
requirement listed: in order to be an instance of "Ord", type must first be an<br />
instance of typeclass "Eq". Then, we see that there is an awful lot of<br />
functions to define in order to be an instance of "Ord". Wait a second, isn't<br />
it silly to define both (<) and (>) when one could be expressed via another? <br />
<br />
Right you are! Usually, typeclass contains several "default" implementation<br />
for its functions, when it is possible to express them through each other (as<br />
it is with "Ord"). In this case it is possible to supply only a minimal<br />
definition (which in case of "Ord" consists of any single function) and others<br />
will be automatically derived. If you supplied less functions than required<br />
for minimal implementation, compiler/interpreter will surely say so and<br />
explain which functions you still have to define.<br />
<br />
Once again, we see that a lot of type are already instances of typeclass Ord,<br />
and thus we are able to sort them.<br />
<br />
Now, lets take a look back to the definition of "Dir":<br />
<br />
data Dir = Dir {dir_size::Int, dir_name::String} deriving Show<br />
<br />
See that "deriving" clause? It instructs compiler to automatically derive code<br />
to make "Dir" an instance of typeclass Show. Compiler knows about a bunch of<br />
standard typeclasses (Eq, Ord, Show, Enum, Bound, Typeable to name a few) and<br />
knows how to make a type into "suitably good" instance of any of them. If you<br />
want to derive instances of more than one typeclass, say it this way:<br />
"deriving (Eq,Ord,Show)". Voila! Now we can compare, sort and print data of<br />
that type!<br />
<br />
Sidenote for Java programmers: just imagine java compiler which derives code<br />
for "implements Storable" for you...<br />
<br />
Sidenote for C++ programmers: just imagine that deep copy constructors are<br />
being written for you by compiler....<br />
<br />
----<br />
Exercises:<br />
* Examine typeclasses Eq and Show<br />
* Examine types of (==) and "print"<br />
* Try to make "Dir" instance of "Eq"<br />
----<br />
<br />
OK, back to our tests. So, what we have had to do in order to make "Dir" an<br />
instance of "Arbitrary"? Minimal definition consists of "arbitrary". Let's<br />
examine it up close:<br />
<br />
*Main> :t arbitrary<br />
arbitrary :: (Arbitrary a) => Gen a<br />
<br />
See that "Gen a"? Reminds you of something? Right! Think of "IO a" and "Parser<br />
a" which we've seen already. This is yet another example of action-returning<br />
function, which could be used inside "do"-notation. (You might ask yourself,<br />
wouldn't it be useful to generalize that convenient concept of actions and<br />
"do"? Of course! It is already done, the concept is called "Monad" and we will<br />
talk about it in Chapter 400 :) )<br />
<br />
Since 'a' here is a type which is an instance of "Arbitrary", we could<br />
substitute "Dir" here. So, how we can make and return action of type "Gen<br />
Dir"?<br />
<br />
Let's look at the code:<br />
<br />
arbitrary = liftM2 Dir gen_size gen_name<br />
-- Generate random size between 10 and 1400 Mb<br />
where gen_size = do s <- choose (10,1400)<br />
return (s*1024*1024)<br />
-- Generate random name 1 to 300 chars long, consisting of symbols "fubar/" <br />
gen_name = do n <- choose (1,300)<br />
sequence $ take (n*10+1) $ repeat (elements "fubar/")<br />
<br />
We have used library-provided functions "choose" and "elements" to build up<br />
"gen_size :: Gen Int" and "gen_name :: Gen String" (exercise: don't take my<br />
word on that. Find a way to check types of "gen_name" and "gen_size"). Since<br />
"Int" and "String" are components of "Dir", we sure must be able to use "Gen<br />
Int" and "Gen String" to build "Gen Dir". But where is the "do" block for<br />
that? There is none, and there is only single call to "liftM2". <br />
<br />
Let's examine it:<br />
<br />
*Main> :t liftM2<br />
liftM2 :: (Monad m) => (a1 -> a2 -> r) -> m a1 -> m a2 -> m r<br />
<br />
Kind of scary, right? Let's provide typechecker with more context:<br />
<br />
*Main> :t liftM2 Dir<br />
liftM2 Dir :: (Monad m) => m Int -> m String -> m Dir<br />
<br />
Since you already heard that "Gen" is a "Monad", you could substitute "Gen" fo<br />
r "m" here, obtaining "liftM2 Dir :: (Monad Gen) => Gen Int -> Gen String -><br />
Gen Dir". Exactly what we wanted!<br />
<br />
Consider "liftM2" to be "advanced topic" of this chapter (which we will cover<br />
later) and just note for now that:<br />
* "2" is a number of arguments for data constructor "Dir" and we have used "liftM2" to construct "Gen Dir" out of "Dir"<br />
* There are also "liftM", "liftM3", "liftM4", "liftM5"<br />
* "liftM2" is defined as "liftM2 f a1 a2 = do x<-a1; y<-a2; return (f x y)"<br />
<br />
Hopefully, this will all make sense after you read it for the third time ;)<br />
<br />
== Chapter 4: REALLY packing the knapsack this time == <br />
<br />
In this chapter we are going to write several not-so-trivial packing methods,<br />
compare their efficiency, and learn something new about debugging and<br />
profiling of the Haskell programs along the way<br />
<br />
== Chapter 400: Monads up close ==<br />
<br />
Google "All about monads" and read it. 'Nuff said :)<br />
<br />
== Chapter 500: IO up close ==<br />
<br />
Shows that:<br />
<br />
c = do a <- someAction<br />
b <- someOtherAction<br />
print (bar b)<br />
print (foo a)<br />
print "done"<br />
<br />
really is just a syntax sugar for:<br />
<br />
c = someAction >>= \a -><br />
someOtherAction >>= \b -><br />
print (bar b) >><br />
print (foo a) >><br />
print "done"<br />
<br />
and explains about ">>=" and ">>". Oh wait. This was already explained<br />
in Chapter 400 :)<br />
<br />
== Chapter 9999: Installing Haskell Compiler/Interpreter and all necessary software ==<br />
<br />
Plenty of material on this on the web and this wiki. Just go get<br />
yourself installation of GHC (6.4 or above) or Hugs (v200311 or<br />
above) and "darcs", which we will use for version control.<br />
<br />
== Chapter 10000: Thanks! ==<br />
<br />
Thanks for comments, proofreading, good advice and kind words go to: Helge,<br />
alt, dottedmag, Paul Moore, Ben Rudiak-Gould, Jim Wilkinson, avalez, Martin<br />
Percossi. If I should have mentioned YOU and forgot - tell me so.<br />
<br />
Without you I would have stopped after Chapter 1 :)</div>JoshHoyt