Difference between revisions of "How to profile a Haskell program"

Revision as of 15:53, 28 March 2018


This page shows you how to profile Haskell programs using GHC. The information is presented in the form of several case studies.

Note: It would be a good idea to gather together the information contained in the case studies and present it here in reference form.

This article is a stub. You can help by expanding it.

Case study: AntVis

This case study is a graphical simulation of ant foraging that demonstrates the use of software transactional memory. A Haskell version of the simulation was written by Jeff Foster and described in detail on his blog (http://www.fatvat.co.uk/2010/08/ants-and-haskell.html).

There were some performance and memory usage problems with the first version of the simulation. Another blog post (http://www.fatvat.co.uk/2010/08/speeding-up-ants-program.html) details Jeff's use of profiling to solve the problems.

Case study: Processing XML using HaXML

I have a script that converts from an XML format to some pickled data structures via Data.Binary. The XML part is generated by HaXml's DtdToHaskell. On a 54M XML file, the thing swaps like crazy and takes several hours. I would like to improve the situation.

Preliminaries

Preflight checklist

  • Are you using compiler flags for optimised code, e.g. -O?
  • Are you using the latest version of your libraries?
  • If you just set an optimisation flag, did you remember to make clean (or the equivalent) and rebuild?

Enable profiling on libraries

For example, my script uses HaXml, which uses a library called polyparse:

cd polyparse
runhaskell Setup.hs configure --enable-library-profiling
runhaskell Setup.hs build
sudo runhaskell Setup.hs install
cd ..
cd HaXml
runhaskell Setup.hs configure --enable-library-profiling
runhaskell Setup.hs build
sudo runhaskell Setup.hs install

When they are done building, you should notice output like:

ar: creating archive dist/build/libHSpolyparse-1.0.a
ar: creating archive dist/build/libHSpolyparse-1.0_p.a

The _p file is the library with profiling information. Note that the non-profiling one is also created and installed, so you don't have to worry about this slowing down your regular code.
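If the build output has already scrolled past, you can look for the archives afterwards. This is only a minimal sketch: it assumes the profiling archives follow the libHS*_p.a naming shown above, and list_profiling_archives is a made-up helper name, not part of Cabal or GHC.

```shell
#!/bin/sh
# list_profiling_archives DIR   (hypothetical helper, not a Cabal command)
# Prints every profiling archive (libHS*_p.a) found under DIR.
# Returns non-zero when none is found, which suggests the library
# was configured without --enable-library-profiling.
list_profiling_archives() {
    found=$(find "$1" -type f -name 'libHS*_p.a' 2>/dev/null)
    if [ -z "$found" ]; then
        echo "no profiling archives under $1" >&2
        return 1
    fi
    printf '%s\n' "$found"
}
```

For example, list_profiling_archives dist/build inside the polyparse directory should print the _p.a file mentioned above.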

You'll need to do this for every library that you use.

Enable profiling on your stuff

Note that I assume you are using Cabal. If not, see How to write a Haskell program. It's super easy, and you'll be happy you did it.

If you are looking to profile code that results in an executable, it's pretty straightforward.

cd yourProgram
runhaskell Setup.hs configure --enable-executable-profiling
runhaskell Setup.hs build

No need to install it. We'll be making changes aplenty.

If you are looking to profile library code that a particular executable invokes, there's one more step. This presentation is specific to GHC's profiling; look to your compiler's manual if you're not using GHC.

cd yourLibrary
runhaskell Setup.hs configure --enable-library-profiling --ghc-option=-auto-all
runhaskell Setup.hs build
sudo runhaskell Setup.hs install

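That extra -auto-all bit tells GHC to attach a cost centre to every top-level function in the library, so the profiler tracks its internals more carefully. (Newer GHC releases spell this flag -fprof-auto; -auto-all is the older name.) You might refine the tracking later (see the GHC manual's chapter on profiling, http://www.haskell.org/ghc/docs/latest/html/users_guide/profiling.html), but -auto-all is usually a good place to start.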

Let's summarize.

  1. Every library the code you want to profile depends on (transitively) must be compiled with --enable-library-profiling.
  2. Your executable must be compiled with --enable-executable-profiling.
  3. If you want to profile the code of a library, then that library needs to be compiled with GHC's -auto-all option (or comparable for other compilers) in addition to --enable-library-profiling. (Cabal might have a generic flag for this after version 1.8, according to this ticket: http://hackage.haskell.org/trac/hackage/ticket/200.)

This article proceeds under the assumption that you are profiling an executable's code, but it's the same basic idea if you're investigating a library's code.

Get toy data

My script takes hours to convert 50M of XML. Running it on such data every time I tweak something would clearly not be a good idea. You want something which is small enough for your program to come back relatively quickly, but large enough to study.

I use something like sed -f makeToy.sed reallyBigFile.xml > toy.xml where makeToy.sed is a bit of text-hacking to chop off the rest of my data after the arbitrarily chosen item #6621:

/6621/{
c\
</grammar>
q
}
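If your sed handles the c and q commands differently, the same truncation can be done with a plain delete. This is a sketch rather than the script I actually use: make_toy is a hypothetical helper, and it assumes the marker is a simple pattern with no "/" in it.

```shell
#!/bin/sh
# make_toy MARKER CLOSING_TAG < big.xml > toy.xml   (hypothetical helper)
# Copies the input up to, but not including, the first line that
# matches MARKER, then writes CLOSING_TAG so the document is still closed.
make_toy() {
    # Delete everything from the first matching line to the end of input.
    sed "/$1/,\$d"
    printf '%s\n' "$2"
}
```

With that in place, make_toy 6621 '</grammar>' < reallyBigFile.xml > toy.xml should give the same kind of toy file as the sed script above.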

Test harness

Make things easy on yourself! I find that it's very helpful to automate my way out of my clumsiness. Ideally, each tweak you make to your software should be accompanied by a simple run and not some long sequence of actions, half of which you might forget. Note: you might also consider using a Makefile instead of a bunch of scripts.

We'll be working with a stable and an unstable repository. You'll probably be making a lot of small modifications to your program, so it would be nice to be able to save some of them along the way. Darcs is very handy for this.

Create a profiling directory

mkdir profiling
mv toy.xml profiling

Create a script profiling/setup

#!/bin/sh
chmod u+x profiling/setup
chmod u+x profiling/run
chmod u+x profiling/compare
chmod u+x profiling/save
runhaskell Setup.lhs configure --enable-executable-profiling

Create a script profiling/run

This script compiles your code and runs it on the toy data, producing a time profile, a heap profile and a run-time summary.

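#!/bin/sh

# Executable name, PostScript viewer, and the arguments to run with
PROG=geniconvert
VIEW=open
FLAGS="--yourflags profiling/toy.xml"

runhaskell Setup.lhs build
# -p writes ${PROG}.prof, -hc writes ${PROG}.hp, -s writes the summary file
dist/build/${PROG}/${PROG} ${FLAGS} +RTS -p -hc -s${PROG}.summary
# Turn the heap profile into PostScript and display it
hp2ps ${PROG}.hp
${VIEW} ${PROG}.ps
cat ${PROG}.summary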

Create a script profiling/compare
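This script has not been written up yet. One possible shape, purely as a sketch, is to diff the -s summaries that profiling/run leaves behind in the stable and unstable repositories. The function name compare_summaries and its arguments are assumptions for illustration.

```shell
#!/bin/sh
# compare_summaries STABLE_DIR UNSTABLE_DIR PROG   (hypothetical helper)
# Shows how PROG.summary (written by +RTS -s in profiling/run)
# differs between the stable and the unstable repository.
compare_summaries() {
    old="$1/$3.summary"
    new="$2/$3.summary"
    for f in "$old" "$new"; do
        if [ ! -f "$f" ]; then
            echo "missing $f (run profiling/run there first)" >&2
            return 2
        fi
    done
    # diff exits with 1 when the files differ; that is not an error here.
    diff -u "$old" "$new" || true
}
```

From the unstable repository this might be called as compare_summaries ../perfStable . geniconvert.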

Create a script profiling/save

#!/bin/sh
darcs push --no-set-default ../perfStable
cd ../perfStable
profiling/run

Create a stable branch

darcs get yourRepository perfStable
cd perfStable
sh profiling/setup
cd ..
cd yourRepository
sh profiling/setup

You should work in the unstable branch (yourRepository). From time to time, you'll want to record your changes and push them into the stable branch. More on this later.

Profiling

Generate the data, advice on how to scrutinise it (help especially wanted)

Generate the data

This should just be:

profiling/run

Determine what is wrong
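A reasonable first look is the short table of the most expensive cost centres near the top of the .prof file that -p writes. The sketch below pulls out just that table. It assumes the usual report layout (a line starting with "COST CENTRE", a blank line, the rows, then another blank line), and top_cost_centres is a made-up name.

```shell
#!/bin/sh
# top_cost_centres FILE.prof   (hypothetical helper)
# Prints the first "COST CENTRE" table of a GHC time profile:
# the header line followed by its rows, stopping at the blank
# line that ends the table.
top_cost_centres() {
    awk '
        /^COST CENTRE/ { if (state == 0) { state = 1; print; next } else { exit } }
        state == 1 && NF == 0 { next }        # blank line right after the header
        state == 1 && NF > 0  { state = 2 }   # first row of the table
        state == 2 && NF == 0 { exit }        # blank line closes the table
        state == 2 { print }
    ' "$1"
}
```

After profiling/run, top_cost_centres geniconvert.prof shows where most of the time and allocation went; the heap graph from hp2ps covers memory over time.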

Fix your code

See Performance for ideas, especially Performance/GHC if relevant.

Run it again

profiling/run

Save the results?

Happy with the direction things are taking?

profiling/save

Go profile again!