Performance

The printable version is no longer supported and may have rendering errors. Please update your browser bookmarks and please use the default browser print function instead.

Haskell Performance Resource Constructs: Data Types - Functions Overloading - FFI - Arrays Strings - Integers - I/O Floating point - Concurrency Modules - Monads Techniques: Strictness - Laziness Avoiding space leaks Accumulating parameter Implementation-Specific: GHC - nhc98 - Hugs Yhc - JHC

Welcome to the Haskell Performance Resource, the collected wisdom on how to make your Haskell programs go faster.

Introduction

In most cases it is possible to write a Haskell program that performs as well as, or better than, the same program written in [insert language here]. There's a big caveat though: you may have to modify your code significantly in order to improve its performance. Compilers such as GHC are good at eliminating layers of abstraction, but they aren't perfect, and often need some help.

There are many non-invasive techniques: compiler options, for example. Then there are techniques that require adding some small amounts of performance cruft to your program: strictness annotations, for example. If you still don't get the best performance, though, it might be necessary to resort to larger refactorings.

Sometimes the code tweaks required to get the best performance are non-portable, perhaps because they require language extensions that aren't implemented in all compilers (eg. unboxing), or because they require using platform-specific features or libraries. This might not be acceptable in your setting.

If the worst comes to the worst, you can always write your critical code in C and use the FFI to call it. Beware of the boundaries though - marshaling data across the FFI can be expensive, and multi-language memory management can be complex and error-prone. It's usually better to stick to Haskell if possible.

Basic techniques

The key tool to use in making your Haskell program run faster is profiling. Profiling is provided by GHC and nhc98. There is no substitute for finding where your program's time/space is really going, as opposed to where you imagine it is going.

Another point to bear in mind: By far the best way to improve a program's performance dramatically is to use better algorithms. Once profiling has thrown the spotlight on the guilty time-consumer(s), it may be better to re-think your program than to try all the tweaks listed below.

Another extremely efficient way to make your program snappy is to use library code that has been Seriously Tuned By Someone Else. You might be able to write a better quicksort than the one in Data.List but it will take you much longer than typing import Data.List.

We have chosen to organise the rest of this resource first by Haskell construct (data types, pattern matching, integers), and then within each category to describe techniques that apply across implementations, and also techniques that are specific to a certain Haskell implementation (eg. GHC). There are some implementation-specific tecniques that apply in general - those are linked from the General Implementation-Specific Techniques section below.

Haskell constructs

General techniques

Compiler specific techniques

More information

There are plenty of good examples of Haskell code written for performance in the The Computer Language Shootout Benchmarks
And many alternatives, with discussion, on the old Haskell wiki

Specific comparisons of data structures

Data.Sequence VS lists

import Data.Sequence

insert_million 0 sequence = sequence
insert_million n sequence = insert_million (n - 1)(sequence |> n)

main = putStrLn (show (Data.Sequence.length (insert_million 1000000 empty)))

ghc -O2 --make InsertMillionElements.hs time ./InsertMillionElements +RTS -K100M

1000000

real 0m7.238s

user 0m6.804s

sys 0m0.228s

insert_million 0 list = reverse list
insert_million n list = insert_million (n -1) (n:list)

main = putStrLn (show (length (insert_million 1000000 [])))

ghc -O2 --make InsertMillionElements.hs time ./InsertMillionElementsList +RTS -K100M

1000000

real 0m0.588s

user 0m0.528s

sys 0m0.052s

Lists are substantially faster on this micro-benchmark.