Please report any overly-slow GHC-compiled programs. Since GHC doesn't have any credible competition in the performance department these days, it's hard to say what overly-slow means, so just use your judgement! Of course, if a GHC-compiled program runs slower than the same program compiled by another compiler, then it's definitely a bug.
1 Use Optimisation
Optimise, using -O or -O2: this is the most basic way to make your program go faster. Compilation time will be slower, especially with -O2.
At present, -O2 is nearly indistinguishable from -O.
GHCi cannot optimise interpreted code, so when using GHCi, compile critical modules using -O or -O2, then load them into GHCi.
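As a minimal sketch of the above, here is a small program together with the command that would build it with optimisation. The file name Main.hs and the sumSquares workload are assumptions made for illustration only.

```haskell
module Main (main) where

import Data.List (foldl')

-- Hypothetical workload: the sum of the squares of 1..n.
--
-- Build with optimisation (the file name Main.hs is an assumption):
--   ghc -O2 Main.hs
-- For GHCi, compile the module this way first, then load it.
sumSquares :: Int -> Int
sumSquares n = foldl' (+) 0 (map (\x -> x * x) [1 .. n])

main :: IO ()
main = print (sumSquares 10)  -- prints 385
```

How much -O helps depends on the program, so it is worth timing both builds rather than assuming a speed-up.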
2 Measuring Performance
The first thing to do is measure the performance of your program, and find out how much of the time is being spent in the garbage collector. Run your program with the +RTS -sstderr option:
  $ ./clausify 20 +RTS -sstderr
  42,764,972 bytes allocated in the heap
   6,915,348 bytes copied during GC (scavenged)
     360,448 bytes copied during GC (not scavenged)
      36,616 bytes maximum residency (7 sample(s))

          81 collections in generation 0 (  0.07s)
           7 collections in generation 1 (  0.00s)

           2 Mb total memory in use

    INIT  time    0.00s  (  0.00s elapsed)
    MUT   time    0.65s  (  0.94s elapsed)
    GC    time    0.07s  (  0.06s elapsed)
    EXIT  time    0.00s  (  0.00s elapsed)
    Total time    0.72s  (  1.00s elapsed)

    %GC time       9.7%  (6.0% elapsed)

    Alloc rate    65,792,264 bytes per MUT second

    Productivity  90.3% of total user, 65.1% of total elapsed
This tells you how much time is being spent running the program itself (MUT time), and how much time spent in the garbage collector (GC time).
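If GC accounts for a large share of the run time, first try to reduce that cost, for example by allocating less. If you can't reduce the GC cost any further, then using more memory by tweaking the GC options will probably help. For example, increasing the default heap size with +RTS -H128m will reduce the number of GCs.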
If your program isn't doing too much GC, then you should proceed to time and allocation profiling to see where the big hitters are.
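To make the relationship between the reported figures concrete, the following sketch recomputes the user-time percentages from the MUT, GC and Total times in the sample output above. The names percentOf, mutTime, gcTime and totalTime are introduced here for illustration and are not part of GHC.

```haskell
module Main (main) where

import Text.Printf (printf)

-- User times in seconds, copied from the sample -sstderr output.
mutTime, gcTime, totalTime :: Double
mutTime   = 0.65
gcTime    = 0.07
totalTime = 0.72

-- Share of the whole, expressed as a percentage.
percentOf :: Double -> Double -> Double
percentOf part whole = 100 * part / whole

main :: IO ()
main = do
  printf "%%GC time     %.1f%%\n" (percentOf gcTime totalTime)
  printf "Productivity %.1f%%\n" (percentOf mutTime totalTime)
```

This prints 9.7% and 90.3%, matching the user-time figures in the report: with INIT and EXIT at zero, the total is just MUT plus GC.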
3 Unboxed types
When you are really desperate for speed, and you want to get right down to the “raw bits”, see GHC Primitives for some information about using unboxed types.
This should be a last resort, however, since unboxed types and primitives are non-portable. Fortunately, it is usually not necessary to resort to using explicit unboxed types and primitives, because GHC's optimiser can do the work for you by inlining operations it knows about, and unboxing strict function arguments (see Performance:Strictness). Strict and unpacked constructor fields can also help a lot (see Performance:Data Types). Sometimes GHC needs a little help to generate the right code, so you might have to look at the Core output to see whether your tweaks are actually having the desired effect.
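The portable route might look something like the sketch below, which uses strict, unpacked constructor fields and a strict accumulator and leaves the unboxing to the optimiser. The Point type and sumX function are hypothetical, and whether GHC actually unboxes everything here depends on the compiler version and flags.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

-- Hypothetical 2D point. The fields are strict (!) and carry an
-- UNPACK pragma, asking GHC to store the raw doubles directly in
-- the constructor instead of pointers to boxed values.
data Point = Point {-# UNPACK #-} !Double {-# UNPACK #-} !Double

-- Sum the x coordinates with a strict accumulator, which gives the
-- optimiser the chance to keep acc unboxed inside the loop.
sumX :: [Point] -> Double
sumX = go 0
  where
    go !acc []               = acc
    go !acc (Point x _ : ps) = go (acc + x) ps

-- To check what GHC generated, dump the Core when compiling
-- (the file name Main.hs is an assumption):
--   ghc -O2 -ddump-simpl Main.hs
main :: IO ()
main = print (sumX [Point 1 2, Point 3 4])  -- prints 4.0
```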
One thing that can be said for using unboxed types and primitives is that you know you're writing efficient code, rather than relying on GHC's optimiser to do the right thing, and being at the mercy of changes in GHC's optimiser down the line. This may well be important to you, in which case go for it.
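For comparison, here is a minimal sketch of the explicit style, assuming a reasonably recent GHC in which the comparison primitives return Int# and are tested with isTrue#. The function sumToUnboxed is a hypothetical example, not something taken from GHC's libraries.

```haskell
{-# LANGUAGE MagicHash #-}
module Main (main) where

import GHC.Exts (Int (I#), Int#, isTrue#, (+#), (>#))

-- Hypothetical example: sum the integers 1..n using raw machine
-- integers (Int#) throughout, so the loop never allocates a boxed Int.
sumToUnboxed :: Int -> Int
sumToUnboxed (I# n) = I# (go 0# 1#)
  where
    go :: Int# -> Int# -> Int#
    go acc i
      | isTrue# (i ># n) = acc
      | otherwise        = go (acc +# i) (i +# 1#)

main :: IO ()
main = print (sumToUnboxed 100)  -- prints 5050
```

The cost of this certainty is that the code is tied to GHC, and is noticeably harder to read than the ordinary version with a strict accumulator.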
4 Looking at the Core