How to profile a Haskell program: Difference between revisions

Revision as of 12:35, 20 March 2007

Just jotting down my notes whilst profiling one of my helper scripts. It would be great if the community could transform this into a tutorial.

The case study

I have a script that converts from an XML format to some pickled data structures via Data.Binary. The XML part is generated by HaXml's DtdToHaskell. On a 54M XML file, the thing swaps like crazy and takes several hours. I would like to improve the situation.

Preliminaries

Enable profiling on libraries

For example, my script uses HaXml, which uses a library called polyparse:

cd polyparse
runhaskell Setup.hs configure --enable-library-profiling
runhaskell Setup.hs build
sudo runhaskell Setup.hs install
cd ..

cd HaXml
runhaskell Setup.hs configure --enable-library-profiling
runhaskell Setup.hs build
sudo runhaskell Setup.hs install

When they are done building, you should notice output like:

ar: creating archive dist/build/libHSpolyparse-1.0.a
ar: creating archive dist/build/libHSpolyparse-1.0_p.a

The _p file is the library with profiling information. Note that the non-profiling one is also created and installed, so you don't have to worry about this slowing down your regular code.

You'll need to do this for every library that you use.
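
Since the same three commands are repeated for every library, they can be wrapped in a small helper. This is only a sketch: the function names are made up, the `DRY_RUN` switch is my own addition so the commands can be previewed, and the archive check assumes the `dist/build` layout shown in the output above.

```shell
#!/bin/sh
# Sketch only: build_profiled, has_profiling_archive and build_all_profiled
# are made-up names. Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Configure, build and install the library in directory $1 with profiling.
build_profiled() {
    (
        cd "$1" || exit 1
        run runhaskell Setup.hs configure --enable-library-profiling &&
        run runhaskell Setup.hs build &&
        run sudo runhaskell Setup.hs install
    )
}

# Succeed if directory $1 contains a profiling archive such as
# dist/build/libHSpolyparse-1.0_p.a.
has_profiling_archive() {
    for f in "$1"/dist/build/libHS*_p.a; do
        [ -e "$f" ] && return 0
    done
    return 1
}

# Build every library directory given as an argument, in order.
build_all_profiled() {
    for lib in "$@"; do
        build_profiled "$lib" || return 1
        if [ "${DRY_RUN:-0}" != 1 ] && ! has_profiling_archive "$lib"; then
            echo "warning: no _p archive found in $lib/dist/build" >&2
        fi
    done
}
```

List dependencies first, for example `build_all_profiled polyparse HaXml`, because HaXml needs the profiling build of polyparse to be installed already.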

Enable profiling on your stuff

Note that I assume you are using Cabal. If not, see How to write a Haskell program. It's super easy, and you'll be happy you did it.

cd yourProgram
runhaskell Setup.hs configure --enable-executable-profiling
runhaskell Setup.hs build

No need to install it. We'll be making changes aplenty.

Get toy data

My script takes hours to convert the 54M XML file, so rerunning it on the full data after every tweak is not practical. You want a sample that is small enough for your program to come back relatively quickly, but large enough to study.

I use something like sed -f makeToy.sed reallyBigFile.xml > toy.xml where makeToy.sed is a bit of text-hacking to chop off the rest of my data after the arbitrarily chosen item #6621:

/6621/{
c\
</grammar>
q
}
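
One caveat about the script above: POSIX describes `c` as starting the next cycle once it has written its text, so on a strictly conforming sed the `q` may never be reached and the rest of the file would be copied through. A variant that avoids `c` is sketched below. The helper name `make_toy` and its two arguments are my own invention, not part of the original script.

```shell
#!/bin/sh
# Sketch only: make_toy is a made-up helper. It copies stdin to stdout up to
# the first line matching the basic regular expression $1, replaces that line
# with the closing tag $2, and stops reading.
# Assumption: $1 contains no "/", and $2 contains no "|", "&" or backslash.
make_toy() {
    sed -e "/$1/{" -e "s|.*|$2|" -e "q" -e "}"
}
```

It would be used as `make_toy 6621 '</grammar>' < reallyBigFile.xml > toy.xml`.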

Test harness

Make things easy on yourself! I find that it's very helpful to automate my way out of my clumsiness. Ideally, each tweak you make to your software should be accompanied by a simple run and not some long sequence of actions, half of which you might forget. Note: you might also consider using a Makefile instead of a bunch of scripts.

We'll be working with two repositories, one stable and one unstable. You will probably make a lot of small modifications to your program, so it is nice to be able to save some of them along the way. Darcs is very handy for this.

Create a profiling directory

mkdir profiling
mv toy.xml profiling

Create a script `profiling/setup`

#!/bin/sh
runhaskell Setup.lhs configure --enable-executable-profiling

Create a script `profiling/run`

This script compiles your code and runs it on the toy profiling data.

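#!/bin/sh
runhaskell Setup.lhs build
# Cabal leaves the uninstalled executable under dist/build.
# +RTS -p makes the profiled binary write yourProgram.prof when it exits
# (newer GHCs may also need the program to be linked with -rtsopts).
dist/build/yourProgram/yourProgram profiling/toy.xml +RTS -p -RTS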
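
Each profiled run overwrites the previous report, so it can help to file the reports away and compare them between tweaks. The helper below is a sketch under two assumptions: that the program was run with `+RTS -p`, which makes a profiled binary write `yourProgram.prof` in the current directory, and that reports are kept in a directory of your choosing. Both `archive_profile` and `profiling/reports` are made-up names.

```shell
#!/bin/sh
# Sketch only: archive_profile is a made-up helper.
# Move the report $1 (e.g. yourProgram.prof) into directory $2 under a
# timestamped name, and print the new path.
archive_profile() {
    prof=$1
    dest=$2
    [ -f "$prof" ] || { echo "no profile found at $prof" >&2; return 1; }
    mkdir -p "$dest" || return 1
    target="$dest/$(date +%Y%m%d-%H%M%S)-$(basename "$prof")"
    mv "$prof" "$target" && echo "$target"
}
```

In `profiling/run` it could be called as `archive_profile yourProgram.prof profiling/reports` right after the program exits.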

Create a script `profiling/save`

Create the stable and unstable repositories

darcs get yourRepository perfUnstable
darcs get yourRepository perfStable

You should work in perfUnstable. From time to time, you'll want to record your changes and push them into the stable repository. More on this later.
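
Until that is written up, here is one guess at what `profiling/save` might contain: record everything in perfUnstable and push it to perfStable. It assumes the two repositories sit side by side, as they do after the `darcs get` commands above, and that it is run from inside perfUnstable. The `save` function and the `DRY_RUN` switch are my own additions.

```shell
#!/bin/sh
# Sketch only: a possible body for profiling/save, run from perfUnstable.
# Set DRY_RUN=1 to print the darcs commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Record all pending changes under the patch name $1, then push every
# patch to the neighbouring perfStable repository.
save() {
    msg=${1:?usage: save MESSAGE}
    run darcs record --all -m "$msg" &&
    run darcs push --all ../perfStable
}
```

To use it as the script itself, add `save "$@"` as the last line and call it with a patch name, for example `profiling/save "strict fold in the parser"`.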

Profiling

Generate the data, advice on how to scrutinise it (help especially wanted)
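
One place to start, offered as a sketch rather than a recipe: a run with `+RTS -p` leaves a `yourProgram.prof` report, and the flat table near the top of it lists cost centres by share of time and allocation. The helper below prints just that table. It assumes the report layout I have seen from GHC, where the summary begins at the first line starting with `COST CENTRE` and the detailed call tree is introduced by a line containing `individual` and `inherited`. `top_cost_centres` is a made-up name.

```shell
#!/bin/sh
# Sketch only: top_cost_centres is a made-up helper.
# Print the header of the flat summary table in profile $1, followed by
# its first $2 rows (default 10).
top_cost_centres() {
    awk -v limit="${2:-10}" '
        # The detailed call tree starts here: stop before it.
        /individual/ && /inherited/ { exit }
        # First header opens the summary; a second one would end it.
        /^COST CENTRE/ { if (seen) exit; seen = 1; print; next }
        # Non-blank lines after the header are table rows.
        seen && NF { print; if (++n >= limit) exit }
    ' "$1"
}
```

For example, `top_cost_centres yourProgram.prof 5` shows the five most expensive cost centres, which is usually enough to decide where to look first.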

@@ Line 36: / Line 36: @@
   cd yourProgram
-  runhaskell Setup.hs configure --enable-binary-profiling
+  runhaskell Setup.hs configure --enable-executable-profiling
   runhaskell Setup.hs build
@@ Line 54: / Line 54: @@
 == Test harness ==
-Make things easy on yourself!  I find that it's very helpful to automate my way out of my clumsiness.  Ideally, each tweak you make to your software should be accompanied by a simple <code>run</code> and not some long sequence of actions, half of which you might forget.
+Make things easy on yourself!  I find that it's very helpful to automate my way out of my clumsiness.  Ideally, each tweak you make to your software should be accompanied by a simple <code>run</code> and not some long sequence of actions, half of which you might forget.  Note: you might also consider using a Makefile instead of a bunch of scripts.
-=== Create stable and unstable repositories ===
+We'll be working with a stable and unstable repository.  It's possible that you'll be making a lot of small modifications to your program, so what would be nice is to be able to save some of your modifications along the way.  Darcs is very handy for this.
-It's possible that you'll be making a lot of small modifications to your program, so what would be nice is to be able to save some of your modifications along the way.  Darcs is very handy for this.
+=== Create a profiling directory ===
+ mkdir profiling
+ mv toy.xml profiling
+=== Create a script <code>profiling/setup</code> ===
+ #!/bin/sh
+ runhaskell Setup.lhs configure --enable-executable-profiling
+=== Create a script <code>profiling/run</code> ===
+This script compiles your code, and runs it on some profiling data
+ #!/bin/sh
+ runhaskell Setup.lhs build
+  yourProgram -prof profiling/toy.xml
+=== Create a script <code>profiling/save</code> ===
+=== Create the stable and unstable repositories ===
   darcs get yourRepository perfUnstable
@@ Line 65: / Line 86: @@
 You should work in perfUnstable.  From time to time, you'll want to record your changes and push them into the stable branch.  More on this later.
-=== Create a <code>run</code> script ===
-=== Create a <code>save</code> script ===
 == Profiling  ==
 :''Generate the data, advice on how to scrutinise it (help especially wanted)''