Difference between revisions of "How to profile a Haskell program"

From HaskellWiki
m (How to profile your code moved to How to profile a Haskell program)

Revision as of 12:03, 20 March 2007


Just jotting down my notes whilst profiling one of my helper scripts. It would be great if the community could transform this into a tutorial.

The case study

I have a script that converts from an XML format to some pickled data structures via Data.Binary. The XML part is generated by HaXml's DtdToHaskell. On a 54M XML file, the thing swaps like crazy and takes several hours. I would like to improve the situation.

Setting things up

Enable profiling on libraries

For example, my script uses HaXml, which in turn uses a library called polyparse. Both need to be rebuilt and reinstalled with profiling enabled, dependency first:

cd polyparse
runhaskell Setup.hs configure --enable-library-profiling
runhaskell Setup.hs build
sudo runhaskell Setup.hs install
cd ..
cd HaXml
runhaskell Setup.hs configure --enable-library-profiling
runhaskell Setup.hs build
sudo runhaskell Setup.hs install

Enable profiling on your stuff

Note that I assume you are using Cabal. If not, see How to write a Haskell program. It's super easy, and you'll be happy you did it.
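
As a minimal sketch of what this step could look like for your own package (the library commands above are the model): the function name `build_profiled` is made up for illustration, and the flags are an assumption about the Cabal and GHC versions of this era. Newer Cabal spells the configure flag `--enable-profiling`, and newer GHC prefers `-fprof-auto` over `-auto-all`.

```shell
# Hypothetical helper: reconfigure and rebuild the current Cabal package
# with profiling turned on for the executable.
# Assumption: an older Cabal that accepts --enable-executable-profiling
# and a GHC that accepts -auto-all (cost centres on all top-level functions).
build_profiled() {
    runhaskell Setup.hs configure \
        --enable-executable-profiling \
        --ghc-options="-auto-all" &&
    runhaskell Setup.hs build
}
```

Run `build_profiled` from the package directory. Once the executable is built this way, running it with `+RTS -p -RTS` should leave a `.prof` report in the current directory.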

Get toy data

My script takes hours to convert the full 54M of XML. Running it on such data every time I tweak something would clearly not be a good idea. You want a sample which is small enough for your program to come back relatively quickly, but large enough to study.

Assemble a test harness

Make things easy on yourself! I find that it's very helpful to automate my way out of my clumsiness. Ideally, each tweak you make to your software should be followed by a single command, say a script called go, and not some long sequence of actions, half of which you might forget.
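
One way such a harness might look, as a rough sketch only: it is written as a shell function so it can be pasted into a session or a script. The executable path `dist/build/myscript/myscript` and the file name `toy.xml` are placeholders, not names from my actual project, and the build is assumed to have been configured with profiling already.

```shell
# Hypothetical defaults; override PROG and TOY to match your own package.
PROG=${PROG:-./dist/build/myscript/myscript}   # assumed Cabal build output path
TOY=${TOY:-toy.xml}                            # assumed name of the toy data file

# go: rebuild, then run the program on the toy data with time profiling on.
# The run only happens if the build succeeds.
go() {
    runhaskell Setup.hs build &&
    "$PROG" "$TOY" +RTS -p -RTS
}
```

After each tweak, typing `go` rebuilds and reruns in one step, and the `-p` runtime option writes a fresh `.prof` file to compare against the previous run.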