How to profile a Haskell program
Revision as of 12:12, 20 March 2007
- Just jotting down my notes whilst profiling one of my helper scripts. It would be great if the community could turn this into a tutorial.
The case study
I have a script that converts from an XML format to some pickled data structures via Data.Binary. The XML-handling code is generated by HaXml's DtdToHaskell. On a 54 MB XML file, the thing swaps like crazy and takes several hours. I would like to improve the situation.
Preliminaries
Enable profiling on libraries
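Every library your program uses must also be built with profiling support, or GHC will not be able to build a profiled version of your program. For example, my script uses HaXml, which uses a library called polyparse: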
cd polyparse
runhaskell Setup.hs configure --enable-library-profiling
runhaskell Setup.hs build
sudo runhaskell Setup.hs install
cd ..

cd HaXml
runhaskell Setup.hs configure --enable-library-profiling
runhaskell Setup.hs build
sudo runhaskell Setup.hs install
Enable profiling on your stuff
Note that I assume you are using Cabal. If not, see How to write a Haskell program; it's super easy, and you'll be happy you did.
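With Cabal, enabling profiling on your own executable is the same configure step used for the libraries above, with a different flag. A sketch, run from your own package's directory; newer Cabal versions spell the flag --enable-profiling.

```shell
# Run from your own package's directory (where Setup.hs and the .cabal file live).
runhaskell Setup.hs configure --enable-executable-profiling
runhaskell Setup.hs build
```

If you want GHC to put a cost centre on every top-level function automatically, rather than only on the ones you annotate by hand, one option is a ghc-prof-options: -auto-all line in the executable's section of the .cabal file (newer GHCs call this flag -fprof-auto).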
Get toy data
My script takes hours to convert 50 MB of XML, so running it on such data every time I tweak something would clearly not be a good idea. You want a sample that is small enough for your program to finish relatively quickly, but large enough to study.
Test harness
Make things easy on yourself! I find that it's very helpful to automate my way out of my clumsiness. Ideally, each tweak you make to your software should be followed by a single run command rather than some long sequence of actions, half of which you might forget.
Create stable and unstable repositories
You'll probably be making a lot of small modifications to your program, so it's nice to be able to save some of them along the way. Darcs is very handy for this.
darcs get yourRepository perfUnstable
darcs get yourRepository perfStable
You should work in perfUnstable. From time to time, you'll want to record your changes and push them into the stable branch. More on this later.
Create a run script
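A minimal sketch of such a script, assuming the Setup.hs-style Cabal layout used above, an executable called yourProgram and a toy input file toy.xml passed as its only argument (both names are hypothetical placeholders). The command below creates the script: it rebuilds with profiling and runs the program once with +RTS -p, which makes a profiled GHC program write yourProgram.prof to the current directory. With newer GHCs you may also need to link with -rtsopts before +RTS -p is accepted.

```shell
cat > run <<'EOF'
#!/bin/sh
# run: rebuild with profiling, then profile one run on the toy data.
# PROG and INPUT are hypothetical placeholders; change them for your project.
set -e
PROG=yourProgram
INPUT=toy.xml

runhaskell Setup.hs configure --enable-executable-profiling
runhaskell Setup.hs build

# Setup.hs-style Cabal builds put executables in dist/build/<name>/<name>.
# +RTS -p -RTS asks the profiled program to write $PROG.prof to this directory.
"./dist/build/$PROG/$PROG" "$INPUT" +RTS -p -RTS
EOF
chmod +x run
```

After each tweak in perfUnstable, ./run is then the only thing to type.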
Create a save script
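These notes don't spell out what save should do; one plausible reading, given the stable and unstable repositories set up earlier, is that it records the current tweak in perfUnstable and pushes it into perfStable. A sketch under that assumption, expecting to be run from inside perfUnstable with perfStable next to it:

```shell
cat > save <<'EOF'
#!/bin/sh
# save: record the current tweak and push it to the stable repository.
# Assumes it is run from inside perfUnstable, with perfStable alongside it.
# Usage: ./save "short description of the tweak"
set -e
if [ -z "$1" ]; then
  echo "usage: $0 \"patch name\"" >&2
  exit 1
fi
darcs record --all --patch-name="$1"
darcs push --all ../perfStable
EOF
chmod +x save
```

If a tweak turns out to be a dead end, perfStable still holds the last version you were happy with.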
Profiling
- Generate the data, advice on how to scrutinise it (help especially wanted)
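As a starting point for generating the data: since the symptom in this case study is heavy swapping, a heap profile is probably worth having alongside the time profile. A sketch, reusing the hypothetical yourProgram and toy.xml names; +RTS -hc breaks the heap down by cost centre, and hp2ps (which ships with GHC) turns the resulting .hp file into a PostScript graph.

```shell
# Heap profile by cost centre: writes yourProgram.hp in the current directory.
./dist/build/yourProgram/yourProgram toy.xml +RTS -hc -RTS

# Render it as a PostScript graph (yourProgram.ps); -c uses colour.
hp2ps -c yourProgram.hp
```

Swapping -hc for -hy (by type) or -hd (by closure description) gives other views of the same heap.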