HSFFIG/Linkage optimization

From HaskellWiki

This page describes possibilities to optimize linkage with library bindings autogenerated by hsffig from C header files.

The program below was used in HsffigTutorial to illustrate a problem that arises when a C header file contains prototypes for functions that the library associated with the header does not actually define. Even if an application linked against the library never uses those functions, their presence in the header file causes references to them to be included in the object files compiled from the autogenerated Haskell code. The linker then reports unresolved references, and the build of the executable fails.

Another issue with hsffig is that unused object code gets linked into executables: the linker pulls in a whole object module even if only one symbol defined in that module is referenced, and the "standard" way of compiling with GHC produces one object module per source module.

GHC has an object module splitting facility (the -split-objs command line option), used mainly to create package libraries. Its purpose is exactly this: linking against such a library brings in only the pieces of object code needed to build the executable. Unfortunately, the object splitting mode is incompatible with another extremely convenient facility, the make mode (the --make command line option). Additionally, in --make mode GHC recognizes only compilable Haskell source, object, and interface files as targets and dependencies, not libraries.

The purpose of this page is to show how the --make option may be used to generate an executable linked against a library containing split object files, so linkage will be optimized.

The method described on this page uses the ability of GHC to generate intermodular dependencies (the -M command line option) in the Makefile format which may be used with the standard make utility.

The following assumptions are made:

  • The make utility supports pattern rules. GNU make satisfies this. Other vendors' make utilities were not tested.
  • awk, grep, and find are available on the system where compilation is performed.
  • These methods were tested on GHC version 6.2.2 on Linux (Intel). Other platforms/GHC versions were not tested.

The program itself just calls the write function declared in the unistd.h header file; write is in fact a system call. When run, the program outputs "Hello World" on its standard output and terminates.

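-- Test of syscalls invoked directly with unistd.h

module Main where

-- withCStringLen is defined in Foreign.C.String; the explicit import is
-- harmless if UNISTD_H already re-exports it.
import Foreign.C.String (withCStringLen)
import UNISTD_H

-- Pass the string to write(2) on file descriptor 1 (standard output).
main = withCStringLen "Hello World\n" $ \(x,y) -> do
  f_write (fromIntegral 1) x (fromIntegral y)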

The Makefile below builds the executable. It must define the AR, AWK, GCC, GHC, RANLIB, HSC2HS, GREP, FIND, and SPLITTER variables for the library-building rules to work properly; HSFFIG is used by the Makefile itself. Note the include ffilib.mk statement. When the listings are copied into files, each recipe line must begin with a tab character.

AR = ar
AWK = awk
GCC = gcc
GHC = ghc
GREP = grep
FIND = find
RANLIB = ranlib
HSFFIG = /path/to/hsffig
SPLITTER = /path/to/splitter

# The command line to invoke hsc2hs:

HSC2HS = hsc2hs -t /dev/null

all: syscall

include ffilib.mk

# The hsc file is obtained by just passing the header file through hsffig.

UNISTD_H.hsc: /usr/include/unistd.h
        $(GCC) -E -dD /usr/include/unistd.h | $(HSFFIG) > UNISTD_H.hsc

# This target builds the executable out of its source files (syscall.hs, but may be any number of them),
# and the library produced out of the hsc code generated by hsffig.

syscall: syscall.hs libUNISTD_H.a
        $(GHC) -fglasgow-exts -package HSFFIG --make syscall.hs libUNISTD_H.a -o syscall
        strip syscall

The file to include in the Makefile, ffilib.mk:

# An awk script to form the archiver command lines. All object modules must be added to the archive.
# There may be thousands of them even for a small header file. Placed on a single command line,
# they may exceed the shell command line length limit, and the archiver will not be started.
# This script reads object file names one by one, collects them in the command line buffer
# until its length exceeds 16k bytes (an empirically chosen size), and then runs the archiver
# on the collected names.

define make-library
$(AWK) '\
BEGIN {\
  argbuf=""\
}\
{\
  argbuf=argbuf " " $$0 ;\
  if (length(argbuf) > 16384) {\
    system ("$(AR) qv $@ " argbuf) ;\
    argbuf=""\
  }\
}\
END {\
  system ("$(AR) qv $@ " argbuf)\
}\
'
endef

# The target to build the library out of the hsc file. This target is used in the toplevel
# Makefile.

lib%.a : %.hsc

# Run hsc2hs to obtain Haskell code from hsc code.

        $(HSC2HS) $< -o $*.hs_unsplit

# Split the large Haskell file into one-per-structure (see the Tutorial) files.

        $(SPLITTER) $*.hs_unsplit

# Create the intermodular dependencies file. The main Makefile will be included (along with this
# file as it is included in the toplevel Makefile).

        echo "include Makefile" > $*_depend

# Run GHC on all the Haskell sources related to the library being built: their names will be selected
# by the $**.hs pattern ($* is Haskell module name to include with the library, e. g. UNISTD_H).
# Dependencies will be appended to the dependencies file created at the previous step.

        $(GHC) -package HSFFIG -M $**.hs -optdep-f -optdep$*_depend

# Force compilation of the file created by hsc2hs: it contains functions to access bit fields (if any).

        $(GHC) -c $*_hsc.c

# Run make on the dependencies file. ghc -M is not as powerful as ghc --make, so certain modules must be
# forced to compile first.

        $(MAKE) -f $*_depend $*_S_t.o $*_S_n.o $*_S_d.o $*_C.o $*_S.o $*.o

# Delete the target. The library might have existed from previous (possibly failed) runs of make.

        rm -f $@

# Force creation of the library and inclusion of the object module of the file generated by hsc2hs.

        $(AR) cq $@ $*_hsc.o

# The find + grep programs list all the object files created by the object file splitter (not to be confused
# with the hsffig splitter). These file names are piped to the awk script, which keeps the archiver
# command lines to a proper length. The result of this step is a library ready for use. Finally, ranlib is run
# on the library: it is not necessary with GNU ar, which rebuilds the name index even with the q option, but may be
# useful if another vendor's archiver is used.

        $(FIND) . -name '$**.o' | $(GREP) 'split/' | $(make-library)
        $(RANLIB) $@

# The rule to compile a Haskell source and split its object file: used in the dependencies file.

%.hi %.o : %.hs

# Create the directory for split objects and ignore the error if it already exists.

        -mkdir $*_split 2>/dev/null

# To cheat GHC, create a dummy object file named as if it were obtained by traditional compilation
# of the Haskell source by GHC.

        echo "int dummy_$*_stub_entry;" | $(GCC) -x c -c -o $*.o -

# Actually compile the source splitting the object files.

        $(GHC) -split-objs -c -package HSFFIG $<

# Make sure the (fake) object file and the (true) interface file have the same timestamp.

        touch $*.o $*.hi

# This rule (or its .o.hi suffix rule analog) appears in the GHC build system documentation:
# do nothing to obtain an interface file from an object file.

%.hi : %.o
        true
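
The last step of the lib%.a rule (select the split objects with find and grep, then feed them to the archiver in batches) can be tried in isolation, without GHC or ar. The following is a minimal sketch under stated assumptions: the batch_args and demo_split_archive helpers are hypothetical, the UNISTD_H_split/UNISTD_H__N.o file names are made up for the demonstration, the limit is lowered from 16384 to 60 characters so that batching is visible, and echo stands in for the archiver.

```shell
#!/bin/sh
# Sketch only: mimics the last step of the lib%.a rule without GHC or ar.
# The split-object file names below are hypothetical.

# batch_args LIMIT CMD...: read one file name per line on stdin and run
# CMD with the names appended, starting a new command whenever the buffer
# grows past LIMIT characters (ffilib.mk uses 16384 and "$(AR) qv $@").
# Unlike the original script, this sketch skips the final call when the
# buffer is empty.
batch_args() {
  limit=$1
  shift
  awk -v limit="$limit" -v cmd="$*" '
    {
      argbuf = argbuf " " $0
      if (length(argbuf) > limit) { system(cmd argbuf); argbuf = "" }
    }
    END { if (argbuf != "") system(cmd argbuf) }'
}

# demo_split_archive: build a throwaway tree, select the split objects the
# same way the rule does, and print the archiver commands instead of
# running them.
demo_split_archive() (
  demo=$(mktemp -d) || exit 1
  cd "$demo" || exit 1
  mkdir UNISTD_H_split
  for i in 1 2 3 4 5; do
    : > "UNISTD_H_split/UNISTD_H__$i.o"
  done
  : > UNISTD_H.o   # the dummy object from the %.o rule; must not be selected
  find . -name 'UNISTD_H*.o' | grep 'split/' | sort |
    batch_args 60 echo ar qv libUNISTD_H.a
  cd / && rm -rf "$demo"
)

demo_split_archive
```

With the 60-character limit, the five file names are spread over three printed ar command lines, and the dummy UNISTD_H.o in the top directory is left out because its path does not contain split/.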

When the executable is being built, GHC prints multiple messages about skipped object modules. This happens because GHC does not treat libraries as dependencies in --make mode and instead resolves module dependencies using the timestamps of the object and interface files. As both of these are provided with proper timestamps, the compiler decides that they are new enough and does not compile the source file again. Finally, the library is used to resolve all the symbol references that the executable may have to the code autogenerated by hsffig.
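
The timestamp relationship that the %.o rule sets up with touch $*.o $*.hi can be illustrated without a compiler. This is only a sketch: Foo.hs, Foo.o, and Foo.hi are hypothetical stand-ins, the is_stale, report, and demo_timestamps helpers are made up for the example, and the comparison done with find -newer is a simplification, not GHC's actual recompilation check.

```shell
#!/bin/sh
# Sketch only: reproduces the timestamp relationship set up by the %.o rule.
# Foo.hs, Foo.o and Foo.hi are hypothetical stand-ins; no compiler is run.

# is_stale SRC PRODUCT: succeed if SRC is strictly newer than PRODUCT, which
# is when a timestamp-based check would ask for PRODUCT to be rebuilt.
is_stale() {
  [ -n "$(find "$1" -newer "$2")" ]
}

# report SRC PRODUCT: print whether PRODUCT looks current relative to SRC.
report() {
  if is_stale "$1" "$2"; then
    echo "$(basename "$2"): out of date"
  else
    echo "$(basename "$2"): up to date"
  fi
}

demo_timestamps() (
  work=$(mktemp -d) || exit 1
  : > "$work/Foo.hs"                   # the "source"
  : > "$work/Foo.o"                    # stands in for the dummy object
  : > "$work/Foo.hi"                   # stands in for the interface file
  touch "$work/Foo.o" "$work/Foo.hi"   # same step as "touch $*.o $*.hi"
  report "$work/Foo.hs" "$work/Foo.o"
  report "$work/Foo.hs" "$work/Foo.hi"
  # Contrast: an object with an old timestamp is reported as out of date.
  touch -t 200001010000 "$work/Foo.o"
  report "$work/Foo.hs" "$work/Foo.o"
  rm -rf "$work"
)

demo_timestamps
```

The first two reports say "up to date" because neither file is older than the source; the third says "out of date" once the object's timestamp is moved back to the year 2000.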

As a result, the executable is much smaller than if it were produced with --make in the "traditional" way.


User:DimitryGolubovsky