Inlining and Specialisation

From HaskellWiki

The inliner and specialiser are the two parts of the optimiser which are crucial to writing performant functional programs. They ensure that we can write programs at a high-level of abstraction which are simplified when eventually used with concrete arguments.

The inliner's job is to replace a function with its definition. This removes one layer of indirection and most importantly allows other optimisations to fire. The specialiser is important for optimising code which uses type classes. Type classes are desugared into dictionary passing style but the specialiser removes a layer of indirection by creating new functions with the relevant dictionaries already supplied.

This document will explain the basics of these two parts of the optimiser and some user-facing options which can be used to control them.

What does the INLINABLE pragma do?[edit]

Top-level definitions can be marked INLINABLE.

myComplicatedFunction :: (Show a, Num a) => ...
myComplicatedFunction = ...

{-# INLINABLE myComplicatedFunction #-}

This causes exactly two things to happens.

  1. The function's (exact) definition is included in the interface file for the module.
  2. The function will be specialised at use sites -- even across modules.

Note that GHC is no more keen to inline an INLINABLE function than any other.

What does the INLINE pragma do?[edit]

The INLINE pragma can be applied to top-level definitions. It behaves like the INLINABLE pragma but makes GHC very keen to inline the function.

mySimpleFunction :: ...
mySimpleFunction = ...

{-# INLINE mySimpleFunction #-}

It is a sledgehammer and without care you can make the compiler take a long time and produce a lot of code. Most "ticks exhausted" panics are due to library authors misusing INLINE pragmas.

Liberally sprinkling all your definitions with INLINE is likely make the compiler take a very long time to compile your program. It is not beneficial to inline every function, inlining a function which is not optimised further only increases overall code size without improving performance.

One situation where it is useful to use an INLINE pragma is when the definition of the function contains functions which are mentioned in RULES. In this case, it is essential that the optimiser is quite aggressive so that the RULES can fire.

Optimised vs unoptimised unfoldings[edit]

GHC will decide to include some small unfoldings in interface files. When it does this, it first optimises the definitions so that they are not repeatedly optimised at each use site after being inlined. Unfoldings included by INLINE or INLINABLE are unoptimised so that they interact better with RULES.

What is an interface file?[edit]

An interface file stores all information about a module which is needed by other modules.

The key to cross-module inlining and specialisation is making sure that we have the definitions of functions we want to inline at hand. Information is only passed between modules by interface files, therefore we must include the unfoldings of definitions in interface files if we want to inline them across modules.

The extension for interface files is .hi, you can see what's in an interface file by using the --show-iface flag.

What is an unfolding?[edit]

The unfolding of a function f is what f is replaced by when it is inlined. This is usually the definition of f.

When are unfoldings included in interface files?[edit]

Not all definitions are included in interface files by default, doing so might create quite large files. There's no point including an unfolding of very large definitions which we will never inline in other modules.

Unfoldings end up in interface files in three ways:

  1. GHC decides to include unfoldings of small functions by default which it knows it will inline later.
  2. Functions marked as INLINE or INLINABLE are included in interface files.
  3. Compiler flags such as -fexpose-all-unfoldings include all unfoldings of all definitions in a module unless they are marked as NOINLINE.

What is specialisation?[edit]

Specialisation is the process of removing typeclass dictionary arguments by creating a new type-specialised definition for an overloaded function. Once specialised, dictionary methods can be easily inlined which usually creates more efficient code.

For example, if we define the overloaded function foo

foo :: Show a => a -> a -> Bool
foo x y = show x == show y

the following core definition will be produced:

foo = \ @a $dShow x y ->
    eqString (show $dShow x) (show $dShow y)

There are now 4 parameters to foo, the first argument is a type (denoted by @), the second argument is the dictionary for Show (denoted by the $d prefix) and the last two are the arguments x and y of the original definition.

The class constraint is translated into a dictionary. Each time a class method is used, we must dynamically lookup which definition to use in the supplied class dictionary.

If we know which type a is instantiated with, we can specialise the definition of foo and produce much better code.

qux :: Bool -> Bool -> Bool
qux = foo @Bool

Using foo at a specific type produces a new definition foo_$sfoo which is defined as:

foo_$sfoo :: Bool -> Bool -> Bool
foo_$sfoo = foo @Bool $fShowBool

Further optimisations then inline foo and then the dictionary selector show which produces the following more direct program.

foo_$sfoo =
  \ x y ->
    case x of {
      False ->
        case y of {
          False -> foo4;
          True -> foo3
        };
      True ->
        case y of _ {
          False -> foo2;
          True -> foo1
        }
    }

When does specialisation occur?[edit]

Specialisation occurs when an overloaded function is called at a specific type. The specialised definition is placed in the module where the call happens but also exported so that it can be reused if there is another upstream call-site where specialisation would take place.

By default, functions are not specialised across modules.

There are two ways to make functions specialise across modules:

  1. Marking a function as INLINABLE or INLINE.
  2. Using the flag -fspecialise-aggressively when compiling the client module. An unfolding must still be available to perform specialisation.

Further to this, observe that for specialisation to occur across modules, the unfolding must be made available in interface files.

Notice this subtle point, the INLINABLE pragma guarantees the precise conditions for a function to be specialised across modules.

How do I use the SPECIALISE pragma?[edit]

The SPECIALISE pragma is used to create a specialised copy of an overloaded function even if it is not used with that type in the module.

module A where

class C ...

foo :: C a => a -> a

{-# SPECIALISE foo :: Text -> Text #-}

This example will create a new function, foo_$sfoo :: Text -> Text which will be used whenever foo is applied to a Text value even in modules which import A.

This is useful to prevent GHC creating many copies of the same specialised function if you have a very flat module structure.

What is a loop-breaker?[edit]

In general, if we were to inline recursive definitions without care we could easily cause the simplifier to diverge. However, we still want to inline as many functions which appear in mutually recursive blocks as possible. GHC statically analyses each recursive groups of bindings and chooses one of them as the loop-breaker. Any function which is marked as a loop-breaker will never be inlined. Other functions in the recursive group are free to be inlined as eventually a loop-breaker will be reached and the inliner will stop.

Note: Do not apply INLINE pragmas to loop-breakers, GHC will never inline a loop breaker regardless of which pragma you attach. In fact, with a debugging compiler, core lint will warn about using an INLINE pragma on a loopbreaker.

Loop-breakers are discussed in detail in section 4 of Secrets of the Glasgow Haskell Compiler inliner.

How does GHC choose a loop-breaker?[edit]

GHC uses a heuristic to decide which definitions it would be least beneficial to inline and to choose those as loop breakers. For example, it is very beneficial to inline simple expressions and dictionary selector functions so they are given high scores. Discounts are also available if an unfolding is available thus marking a definition as INLINABLE or INLINE will usually cause GHC to not choose it.

Which flags can I use to control the simplifier and inliner?[edit]

-fspecialise-aggressively removes the restrictions about which functions are specialisable. Any overloaded function will be specialised with this flag. This can potentially create lots of additional code.

-fexpose-all-unfoldings will include the (optimised) unfoldings of all functions in interface files so that they can be inlined and specialised across modules.

Using these two flags in conjunction will have nearly the same effect as marking every definition as INLINABLE apart from the fact that the unfoldings for INLINABLE definitions are not optimised.

Will GHC ever inline recursive definitions with static arguments?[edit]

Sometimes people ask if GHC is smart enough to unroll a recursive definition when given a static argument. For example, if we could define sum using direct recursion:

sum :: [Int] -> Int
sum [] = 0
sum (x:xs) = x + sum xs

A user might expect sum [1,2,3] to be optimised to 6. However, GHC will not inline sum because it is a self-recursive definition and hence a loop-breaker. The compiler is not smart enough to realise that repeatedly inlining sum will terminate.

But there are a few tricks that can make this work regardless:

1. Unrolling using rewrite rules[edit]

While the compiler is not currently smart/daring enough on its own to attempt to inline a loop breaker, we can create one or more rewrite rules that make sure that when the function is called with a static argument, it is rewritten to its body:

sumExample :: Int
sumExample = simpleSum [1,2,3]

simpleSum :: [Int] -> Int
{-# NOINLINE simpleSum #-}
simpleSum [] = 0
simpleSum (x : xs) = x + simpleSum xs

{-# RULES 
"simpleSum/base" simpleSum [] = 0
"simpleSum/recurse" forall x xs. simpleSum (x : xs) = x + simpleSum xs
#-}

When above code is compiled with optimizations enabled, sumExample = simpleSum [1, 2, 3] will be rewritten to sumExample = 6. Here is another example:

findFirstExample :: Int -> Maybe String
findFirstExample x = findFirst [(10, "a"), (20, "b"), (30, "c")] x

findFirst :: Eq k => [(k, v)] -> k -> Maybe v
{-# NOINLINE findFirst #-}
findFirst [] _ = Nothing
findFirst ((k, v) : xs) needle = if k == needle then Just v else findFirst xs needle

{-# RULES 
"findFirst/base" forall x. findFirst [] x = Nothing
"findFirst/recurse" forall k v xs needle. findFirst ((k, v) : xs) needle = if k == needle then Just v else findFirst xs needle
#-}

With optimizations enabled, this turns into:

findFirstExample
  = \ x_aIl ->
      case x_aIl of { I# y_aW5 ->
      case y_aW5 of {
        __DEFAULT -> Nothing;
        10# -> lvl_rWv;
        20# -> lvl1_rWw;
        30# -> lvl2_rWx
      }
      }

(That is: after the rule rewrites, GHC is clever enough to collapse all nested ifs into a single case statement!)

This technique has two small disadvantages to be mindful of:

  1. It requires writing the body of the function twice: Once in the normal function definition and once more in the rewrite rule section. When you change the code, be mindful to change it in both places!
  2. You have to be a bit careful: While a rewrite rule will only be applied if it not only matches and type-checks, it is still possible to accidentally create an 'infinite' rewrite rule. Such rewrite rules can cause an infinite loop at compile-time.

This technique works not only for lists, but for all datatypes and functions. It can be very valuable to improve the inlining (and thereby performance), especially of embedded Domain Specific Languages (eDSLs).

See On the unrolling of recursive functions applied to static arguments (‘Inlining of loop breakers’) for a detailed discussion of this technique.

2. Unrolling using GHC's List Fusion[edit]

As a variant on the above: In GHC, a whole system of rewrite rules is already in place when combining static list arguments with common list manipulation functions like foldr, map, filter and much of the Prelude and Data.List modules.

As such, if you're dealing with specifically lists, when you define your function based on these functions rather than writing manual recursion, those pre-existing rewrite rules kick in:

sumExampleFold :: Int
sumExampleFold = foldBasedSum [1,2,3]

foldBasedSum :: [Int] -> Int
{-# INLINE foldBasedSum #-}
foldBasedSum xs = foldr (+) 0 xs

When above code is compiled with optimizations enabled, sumExampleFold = foldBasedSum [1,2,3] is rewritten to sumExampleFold = 6 just as above.

(Side note: Since foldBasedSum itself is no longer a recursive function but instead only calls foldr, we can and really do want it to inline such that the foldr is always exposed, and therefore the list fusion rules have a chance to fire even across module boundaries).

Similarly, the following alternative implementation of findFirst gets compiled to the same simple case statement just as above.

findFirstFold :: Eq k => [(k, v)] -> k -> Maybe v
{-# INLINE findFirstFold #-}
findFirstFold xs key = foldr (\(k,v) acc -> if k == key then Just v else acc) Nothing xs


When dealing with lists, this is nicer than writing your own rewrite rules for unrolling as there is no duplicate code to maintain. However, this technique can only be used with lists.

3. Use type arguments instead of value arguments[edit]

An alternative technique that does not depend on rewrite rules, is to lift value arguments to the type level. This can be used to tell GHC that an argument is truly static: As types (are guaranteed to) fully disappear during compilation, GHC has to evaluate such code during compile-time, including when optimizations are not enabled!

By defining suitable type class instances, we can recurse on the structure of the type as we would on a normal value. This time however, GHC will happily inline each "recursive" call as each call to sum is at a different type.

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE InstanceSigs #-}
{-# LANGUAGE PolyKinds #-}
module Sum where

import Prelude (Integer, (+))
import GHC.TypeLits

data Proxy x = Proxy

class Sum (xs :: [Nat]) where
  sum :: proxy xs -> Integer

instance Sum '[] where
  sum _ = 0

instance (KnownNat x, Sum xs) => Sum (x ': xs) where
  --sum :: Proxy (x ': xs) -> Int
  sum _ = natVal (Proxy @x) + sum (Proxy @xs)

main = sum (Proxy @'[1,2,3])

Inspecting the core we find that the definition of main is simplified to the constant value 6.

One advantage of this technique is that it does not depend on rewrite rules, and also works when optimizations are not enabled (though without optimizations enabled, constant folding might not happen either). A disadvantage however, is that the way your function is called is now quite different. Furthermore, you have to be able to lift not only the (in this case) list to the type level, but also all of its contents, which might not always be possible.

Side note that this is slightly different to the static argument transformation which applies to a multi-parameter recursive functions where one of the arguments is the same for each recursive call. In this case, there are no arguments which remain constant across recursive calls.

This 'lifting to types' technique is due to Andres Löh.

More Links[edit]

Acknowledgements[edit]

Thanks to Reid Barton, Ashok Menon and Csongor Kiss for useful comments on a draft.