Shared libraries and GHC

From HaskellWiki
Revision as of 07:32, 31 October 2017 by Allbery b (talk | contribs) (further clarification about inlining)

GHC, inlining, and dynamic linking

Or: why Haskell shared objects are a bad idea

Lazy evaluation in languages like Haskell is implemented with a graph reduction engine rather than conventional strict evaluation. This is effective, and about as efficient as you can get when the majority of values are not strictly evaluated. However, it is only relatively efficient: evaluation jumps around a lot more than in straightforward procedural or OO code, and many of those jumps are indirect. (The only times the STG graph reduction engine used by GHC makes a call instead of a jump are garbage collection and calls to foreign libraries.) Because of lazy evaluation, all values are indirect and may require additional indirect jumps to force evaluation to the next constructor (or, depending on context, of the entire value). In addition, GHC implements Haskell typeclasses as runtime dictionaries of type-specific operations, which can require a double indirect jump: one for the dictionary itself, the other for the type-specific function (the typeclass method).
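A rough sketch of what dictionary passing amounts to may help here. The class-based `totalArea` and the hand-written `totalAreaWith` below compute the same thing; the second makes the hidden dictionary argument explicit. All names (`Shape`, `ShapeDict`, `squareDict`, and so on) are hypothetical, and the record only approximates what GHC generates internally.

```haskell
module Main where

-- A typeclass with two methods.
class Shape a where
  area      :: a -> Double
  perimeter :: a -> Double

newtype Square = Square Double

instance Shape Square where
  area      (Square s) = s * s
  perimeter (Square s) = 4 * s

-- Overloaded function: callers implicitly pass a Shape dictionary.
totalArea :: Shape a => [a] -> Double
totalArea = sum . map area

-- Hand-written stand-in for that dictionary (an assumption for
-- illustration, not GHC's actual internal representation).
data ShapeDict a = ShapeDict
  { areaOf      :: a -> Double
  , perimeterOf :: a -> Double
  }

squareDict :: ShapeDict Square
squareDict = ShapeDict
  { areaOf      = \(Square s) -> s * s
  , perimeterOf = \(Square s) -> 4 * s
  }

-- The dictionary is now an ordinary argument. Each use of `areaOf dict`
-- first selects a field from the record, then enters the function
-- stored there: the two indirections described above.
totalAreaWith :: ShapeDict a -> [a] -> Double
totalAreaWith dict = sum . map (areaOf dict)

main :: IO ()
main = do
  let squares = [Square 1, Square 2, Square 3]
  print (totalArea squares)                 -- class-based version
  print (totalAreaWith squareDict squares)  -- explicit-dictionary version
```

When the element type is known at compile time and the relevant code is visible, GHC can skip the record lookup and use the `Square` implementation directly.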

All this direct and indirect jumping carries a cost. In some cases involving typeclass dictionaries the cost can be extremely severe: one recent pathological case saw a 90x slowdown when inlining was disabled. Slowdowns of 5-10x are more common, but 20x is not especially unusual.

GHC tries to inline code whenever possible, and in particular tries to resolve typeclass methods at compile time to avoid the dictionary lookup. On its own, however, this doesn't help with calls across module boundaries. To deal with this, GHC exports some of each module's internal code for inlining in the module's .hi (Haskell Interface) file. When compiling another module with inlining enabled, GHC can inline the code from that interface file instead of having to jump to it. This can also mean that a double indirect jump through a typeclass dictionary is optimized to a direct jump, or that the extra jumps are avoided entirely.
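As an illustration of how a library author can influence what is exposed for inlining, here is a minimal sketch using GHC's `INLINE`, `INLINABLE`, and `SPECIALIZE` pragmas. The function names are hypothetical, and everything lives in a single `Main` module only so the example runs as-is; in practice `square` and `sumSquares` would sit in a library module and be used from another one.

```haskell
module Main where

-- Small overloaded helper. INLINE asks GHC to expose the definition
-- and to inline it eagerly at call sites.
square :: Num a => a -> a
square x = x * x
{-# INLINE square #-}

-- Larger overloaded function. INLINABLE asks GHC to keep the
-- definition available to importing modules, so they can inline it
-- or specialise it to a concrete type.
sumSquares :: Num a => [a] -> a
sumSquares = foldr (\x acc -> square x + acc) 0
{-# INLINABLE sumSquares #-}

-- SPECIALIZE requests a copy fixed at Int, which needs no Num
-- dictionary argument at run time.
{-# SPECIALIZE sumSquares :: [Int] -> Int #-}

main :: IO ()
main = print (sumSquares [1, 2, 3 :: Int])
```

The pragmas are requests rather than guarantees, and GHC also exposes small definitions on its own when optimization is enabled.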

It also means that the application binary interface (ABI) of a module includes this exported inlinable code. If you change the code in the module, you must recompile the modules that use it, in case they have inlined some of that code; otherwise you risk errors ranging from bizarre results to crashes.

This means that Haskell shared libraries have to carry an ABI hash incorporating the contents of the .hi file. That hash will usually change whenever you change the source code of a module, unless you happen to change only things that are self-recursive or otherwise can't be inlined. So the primary reason you would want to use a Haskell shared library, being able to drop in a bug fix without rebuilding consumers of that library, is more or less impossible. (There are other reasons you might need a Haskell shared library, mostly related to run-time loading for things like Template Haskell splices.)
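The effect on the ABI hash can be shown with a toy model. This is not GHC's actual fingerprinting algorithm: the `Export` record, the string hash, and every name below are assumptions made up for illustration. The only point is that once an exposed function body is part of the hashed material, editing that body changes the hash, while editing a body that is not exposed does not.

```haskell
module Main where

import Data.Char (ord)
import Data.List (foldl')
import Data.Maybe (fromMaybe)

-- Toy model of one exported binding (hypothetical, not GHC's format).
data Export = Export
  { exportName :: String
  , exportType :: String
  , unfolding  :: Maybe String  -- Just body when exposed for inlining
  }

-- Everything a consumer could depend on: name, type, exposed body.
abiMaterial :: [Export] -> String
abiMaterial = concatMap render
  where
    render e = exportName e ++ "::" ++ exportType e
            ++ "=" ++ fromMaybe "" (unfolding e) ++ ";"

-- Simple multiplicative string hash standing in for a real fingerprint.
hashString :: String -> Int
hashString = foldl' step 5381
  where step h c = (h * 33 + ord c) `mod` 1000000007

fingerprint :: [Export] -> Int
fingerprint = hashString . abiMaterial

-- The same export, with its body either exposed or kept opaque.
doubleExport :: Bool -> String -> Export
doubleExport exposed body =
  Export "double" "Int -> Int" (if exposed then Just body else Nothing)

main :: IO ()
main = do
  let v1 = "\\x -> x + x"  -- original body
      v2 = "\\x -> 2 * x"  -- "bug fix" with the same type
  -- Exposed body: the hashed material differs between versions.
  print (fingerprint [doubleExport True v1] == fingerprint [doubleExport True v2])
  -- Opaque body: the hashed material is identical.
  print (fingerprint [doubleExport False v1] == fingerprint [doubleExport False v2])
```

For a real module, running `ghc --show-iface` on its .hi file should report the ABI hash among other details.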

NOTE: Until recently, GHC shared libraries had an additional problem that precluded their use even with careful build systems such as Nix: they embedded compiler temporary names into the shared library, which again altered the ABI, this time on every build, even of an identical library with identical options. This has been fixed, but the fix does nothing about the more general issue with cross-module inlining. And it is unlikely that cross-module inlining will ever go away, because of the severe performance penalties described above.