This page collects together information about the optimisations that GHC does and does not perform.
- GHC experts: Please check that the info in this page is correct.
- Everybody else: Feel free to add questions!
2 General optimisations
2.1 Common subexpression elimination
First of all, common subexpression elemination (CSE) means that if an expression appears in several places, the code is rearranged so that the value of that expression is computed only once. For example:
foo x = (bar x) * (bar x)
might be transformed into
foo x = let x' = bar x in x' * x'
GHC doesn't actually perform CSE as often as you might expect. The trouble is, performing CSE can affect the strictness/lazyness of the program. So GHC does do CSE, but only in specific circumstances --- see the GHC manual. (Section??)
Long story short: "If you care about CSE, do it by hand."
2.2 InliningInlining is where a function call is replaced by that function's definition. For example, the standard
map :: (a -> b) -> [a] -> [b] map f  =  map f (x:xs) = f x : map f xs
Now if you write something like
foo = map bar
foo  =  foo (x:xs) = bar x : foo xs
So, that's what inlining is. By default, GHC will inline things if they are 'small enough'. Every time you inline a function, you are in a sense making a (customised) copy of that function. Do too much of this and the compiled program will be enormous. So it's only worth it for 'small' functions.
(How does GHC determine 'small'? Isn't there a switch that adjusts this?)
2.3 SpecialisationFlexibility is the enemy of performance. Take
GHC tries to do this where possible. However (as I understand it?) this tends to work less well across module boundaries. For example, suppose you write
module Physics where data Force = ... instance Num Force where ... resultant_force :: [Force] -> Force resultant_force = sum
Generally GHC won't just take an existing function and recompile it with a new type signature. What might happen is that the function gets inlined, and specialised from there. (Can someone say something more concrete here?)
2.4 Strictness analysis
Haskell is a lazy language. Calculations are notionally not performed until their results are 'needed'. However, if the result definitely will be needed, it's a waste of time and effort to save up the expression and execute it later; more efficient to just execute it right now.
Strictness analysis is a process by which GHC attempts to determine, at compile-time, which data definitely will 'always be needed'. GHC can then build code to just calculate such data, rather than the normal (higher overhead) process for storing up the calculation and executing it later.
Unfortunately, looking at a program and saying "will this data be needed?" is a bit like looking at a program and saying "this program will never halt" --- see The Halting Problem. (Good link?) But GHC does its best, and can give big speedups in some cases.
3 Execution Model
In order to understand how to write efficient code, and what GHC does with your code to optimise it, it helps to know a bit about what your compiled code looks like and how it works.
3.1 Graph reduction
To a first approximation, at any moment your program is a 'graph' of objects in memory. ('Graph' in the graph theory sense --- nodes connected by arcs.) Some of the objects are 'data' --- booleans, integers, strings, lists, etc. Some of those objects are functions (because Haskell lets you pass functions around like data). And some of these are thunks --- unevaluated expressions (because Haskell only evaluates expressions 'as needed').The program starts off with a single node representing the unevaluated call to
3.2 About STG
GHC compiles to the spineless tagness G-machine (STG). This is a notional graph reduction machine (i.e., a virtual machine that performs graph reductions as described above). 'G-machine' because it does graph reduction. 'Spineless' because it can't stand up to bullies. 'Tagless' because the graph nodes don't have 'tags' on them to say what they are.
Instead of tags, the nodes have access pointers. If the node is a thunk, its pointer points to the code to evaluate the thunk and return the real result. Otherwise the pointer points to some 'do-nothing' code. So to access any type of node, you just do an indirect jump on this pointer; no case analysis is necessary.
(Gosh I hope I got that lot right!)
Internally, GHC uses a kind of 'machine code' that runs on this non-existent G-machine. It does a number of optimisations on that representation, before finally compiling it into real machine code (possibly via C using GCC).
3.3 STG optimisations
There are a number of optimisations done at the STG level. These mainly involve trying to avoid unnecessary steps. For example, avoid creating a thunk which immediately creates another thunk when executed; make it evaluate all the way down to a final result in one go. (If we 'need' the thunk's value, we're going to evaluate all the way down anyway, so let's leave out the overhead...)
3.4 Primitive data typesHaskell-98 provides some standard types such as
-- From GHC.Exts: data Int = I# Int# data Word = W# Word# data Double = D# Double# -- etc.
3.5 Algebraic data types
(I'm not sure about the basic memory layout. Somebody fill in the general case?)
There are a few special cases:
3.5.1 Types with 1 constructor
If a function puts a bunch of things into a type value, and the caller immediately takes the things out of the bunch again, GHC will try to eliminate the bundle type all together. (Or is that just for tuples?)
3.5.2 Constructors with no fields
Booleans are a good example:
data Bool = False | True