Please report any overly-slow GHC-compiled programs. Since GHC doesn't have any credible competition in the performance department these days it's hard to say what overly-slow means, so just use your judgement! Of course, if a GHC compiled program runs slower than the same program compiled by another compiler, then it's definitely a bug.
Optimise, using -O or -O2: this is the most basic way to make your program go faster. Compilation time will be slower, especially with -O2.
At present, -O2 is nearly indistinguishable from -O.
GHCi cannot optimise interpreted code, so when using GHCi, compile critical modules using -O or -O2, then load them into GHCi.
The first thing to do is measure the performance of your program, and find out whether all the time is being spent in the garbage collector or not. Run your program with the +RTS -sstderr option:
$ ./clausify 20 +RTS -sstderr 42,764,972 bytes allocated in the heap 6,915,348 bytes copied during GC (scavenged) 360,448 bytes copied during GC (not scavenged) 36,616 bytes maximum residency (7 sample(s))
81 collections in generation 0 ( 0.07s) 7 collections in generation 1 ( 0.00s)
2 Mb total memory in use
INIT time 0.00s ( 0.00s elapsed) MUT time 0.65s ( 0.94s elapsed) GC time 0.07s ( 0.06s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 0.72s ( 1.00s elapsed)
%GC time 9.7% (6.0% elapsed)
Alloc rate 65,792,264 bytes per MUT second
Productivity 90.3% of total user, 65.1% of total elapsed
This tells you how much time is being spent running the program itself (MUT time), and how much time spent in the garbage collector (GC time).
If you can't reduce the GC cost any further, then using more memory by tweaking the GC options will probably help. For example, increasing the default heap size with +RTS -H128m will reduce the number of GCs.
If your program isn't doing too much GC, then you should proceed to time and allocation profiling to see where the big hitters are.