FFI imports packaging utility

From HaskellWiki

Abstract

The Haskell Cabal [1] is a framework that defines a common interface for authors to build their applications more easily and in a portable way. The Haskell Foreign Functions Import Generator (hsffig) [3] is a tool that converts a C header file (.h) into Haskell code containing FFI [2] import declarations for all entities declared in the header file. The FFI Packaging Utility (ffipkg) is a tool that integrates the functionality of hsffig with the Cabal framework, allowing packages that consist entirely of foreign function imports to be built and installed.

It is recommended that readers of this document be familiar with the documents referred to as [3] and [4].

Benefits of packaging FFI imports

To build a Haskell application linked to a foreign library, the Haskell compiler must be told the locations of certain files (C headers and static or shared library files), and this information must be remembered for every application using the library. When an FFI package is built, all such information is contained in the package descriptor, and only the name of the package needs to be remembered.

Purpose

The ffipkg utility prepares a Haskell package for building: it accepts the locations of C header and foreign library files as command line arguments, and produces Haskell source files with FFI declarations, a Makefile, a Cabal package descriptor file, and a Setup.hs file suitable for running the Cabal package setup program. The utility acts as a "driver" that runs the C preprocessor, the equivalent of the hsffig program, and the source splitter. The generated Makefile compiles the Haskell source files into split object files, a feature provided by GHC. This technique is discussed in [4].

Command line options

Synopsis

Usage: ffipkg [OPTION...] include-file...
  -v      --verbose              provide verbose output
  -i      --header               stop after writing package include file
  -?, -h  --help                 print this help message
  -I                             include files location (may be multiple)
  -L                             library files location (may be multiple)
  -l                             library file to link (may be multiple)
  -c      --cpp=                 option for CPP (may be multiple)
  -V      --version              show program version number
  -w 0.0  --package-version=0.0  specify version of the package
  -p      --package-name=        name the package (will be uppercased)
          --with-make=make       path to make
          --with-awk=awk         path to awk
          --with-ar=ar           path to ar
          --with-ghc=ghc         path to ghc
          --with-gcc=gcc         path to gcc
          --with-hsc2hs=hsc2hs   path to hsc2hs

Package naming and versioning

Per the Cabal specification, two fields are mandatory in a package descriptor file: Name and Version. The -p option sets the package name to its argument, converted to upper case. If -p is omitted, the name of the first include file on the command line is used instead: its directory part and file name suffix are stripped, and the result is converted to upper case. The -w option sets the Version field of the package descriptor file to its argument. The version supplied is checked for correctness using the same parser Cabal itself uses. If the syntax of the version is incorrect, or if the option is omitted, the default version string "0.0" is used.
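The naming and versioning rules above can be sketched in Haskell as follows. This is a minimal illustration, not ffipkg's own code: the function names are hypothetical, and the Data.Version parser from base stands in for the Cabal version parser, so the two may disagree on some inputs (for instance, the base parser accepts tagged versions such as "1.0-beta"). The last helper applies the HS_PKG_H module naming convention described under "Using FFI packages".

```haskell
import Data.Char (toUpper)
import Data.Version (Version, makeVersion, parseVersion, showVersion)
import System.FilePath (takeBaseName)
import Text.ParserCombinators.ReadP (eof, readP_to_S)

-- | Package name: the -p argument if given, otherwise the first include
-- file with its directory part and suffix stripped; upper-cased either way.
packageName :: Maybe String -> [FilePath] -> Maybe String
packageName (Just p) _         = Just (map toUpper p)
packageName Nothing  (hdr : _) = Just (map toUpper (takeBaseName hdr))
packageName Nothing  []        = Nothing

-- | Version: the -w argument if the whole string parses as a version,
-- otherwise the default 0.0 (base's parser used as a stand-in for Cabal's).
packageVersion :: Maybe String -> Version
packageVersion arg =
  case [v | Just s <- [arg], (v, "") <- readP_to_S (parseVersion <* eof) s] of
    (v : _) -> v
    []      -> makeVersion [0, 0]

-- | Module that an application imports for package PKG: HS_PKG_H.
importModule :: String -> String
importModule pkg = "HS_" ++ pkg ++ "_H"
```

For example, packageName Nothing ["sys/stat.h"] yields Just "STAT", showVersion (packageVersion (Just "4.2")) is "4.2", and a malformed "4..2" falls back to "0.0".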

For FFI packages, versioning is less meaningful than it is for native library packages. In some cases, as shown in the Berkeley DB binding example, it may be set to the version of the library used, but this is entirely up to the FFI package creator. It is generally safe to omit this option unless separate packages are created for different versions of the same library.

Location of libraries and include (header) files

Similarly to GCC, the -I option specifies location(s) where header files are searched for, and the -L option specifies location(s) where library files are searched for. The -l option specifies the name(s) of library files to link the resulting executable against. All non-option command line arguments, regardless of their position, are treated as include file names (although it is advisable to place all options first on the command line, followed by all non-option arguments).
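One way to collect these options while accepting include files anywhere on the command line is System.Console.GetOpt from base, used in Permute mode. The sketch below is hypothetical (the Locations record and Flag type are not ffipkg's), covers only -I, -L and -l, and does not check whether any directory or file exists.

```haskell
import System.Console.GetOpt

-- | Locations gathered from the command line (hypothetical record).
data Locations = Locations
  { incDirs :: [FilePath]
  , libDirs :: [FilePath]
  , libs    :: [String]
  } deriving (Show, Eq)

data Flag = IncDir FilePath | LibDir FilePath | Lib String

options :: [OptDescr Flag]
options =
  [ Option ['I'] [] (ReqArg IncDir "DIR") "include files location (may be multiple)"
  , Option ['L'] [] (ReqArg LibDir "DIR") "library files location (may be multiple)"
  , Option ['l'] [] (ReqArg Lib "LIB") "library file to link (may be multiple)"
  ]

-- | Permute lets non-option arguments (the include files) appear anywhere;
-- foldr keeps repeated options in command line order.
parseLocations :: [String] -> Either [String] (Locations, [FilePath])
parseLocations argv =
  case getOpt Permute options argv of
    (flags, headers, []) -> Right (foldr add (Locations [] [] []) flags, headers)
    (_, _, errs)         -> Left errs
  where
    add (IncDir d) l = l { incDirs = d : incDirs l }
    add (LibDir d) l = l { libDirs = d : libDirs l }
    add (Lib n)    l = l { libs = n : libs l }
```

With the Berkeley DB command line from the examples, the include files db.h and sys/stat.h come back as the non-option list, in order.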

The ffipkg utility itself does not check for validity or existence of directories and files supplied this way; it only places this information in appropriate fields of the Cabal package descriptor file created by the utility.
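The -I, -L and -l values correspond naturally to the Cabal package description fields include-dirs, extra-lib-dirs and extra-libraries. The function below is a hypothetical sketch of rendering them; the exact set of fields and layout that ffipkg writes may differ, and values containing spaces would need quoting, which is not handled here.

```haskell
-- | Render the Cabal descriptor fields that hold the -I, -L and -l values.
-- Fields with no values are omitted; nothing is checked for existence.
locationFields :: [FilePath] -> [FilePath] -> [String] -> [String]
locationFields incDirs libDirs libs =
  [ name ++ ": " ++ unwords vals
  | (name, vals) <- [ ("include-dirs", incDirs)
                    , ("extra-lib-dirs", libDirs)
                    , ("extra-libraries", libs) ]
  , not (null vals)
  ]
```

For the Berkeley DB example this yields the lines include-dirs: /usr/local/BerkeleyDB.4.2/include, extra-lib-dirs: /usr/local/BerkeleyDB.4.2/lib and extra-libraries: db.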

Include file names may or may not contain a directory part; if present, it may be either a relative or an absolute path. The utility creates a small package include file, which in turn contains #include directives for all include files given on the command line, in the order of their appearance.

For example, if the command line contains:

db.h sys/stat.h

then the include file will look like:

/* File is generated automatically: do not edit */
#include "db.h"
#include "sys/stat.h"
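Generating this file is straightforward; the following minimal Haskell sketch (packageIncludeFile is a hypothetical name, not part of ffipkg) reproduces the text shown above from the list of include file names.

```haskell
-- | Contents of the package include file: a header comment followed by one
-- #include directive per include file, in command line order.
packageIncludeFile :: [FilePath] -> String
packageIncludeFile headers = unlines $
  "/* File is generated automatically: do not edit */"
    : map (\h -> "#include \"" ++ h ++ "\"") headers
```

For example, packageIncludeFile ["db.h", "sys/stat.h"] returns the three lines shown above, each terminated by a newline.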

Library file names may be anything a particular linker accepts. It is recommended, though, to follow the standard practice of not including a directory part in library file names, and to use the -L option instead.

The utility itself does not limit the number of -I, -L, and -l options, or of non-option command line arguments.

External programs

The --with-XXX options may be used to specify paths to certain programs that the utility runs and that the Makefile refers to (the list of programs may vary between versions of ffipkg). This may be necessary if a program is not on the default PATH, or if a version of a program other than the one installed in the "standard" way is to be used.

Only absolute paths to executable programs are accepted with the --with-XXX command line options.

The ffipkg utility checks for existence and executability of these programs (full list in the Synopsis), and also of several other programs (such as echo, rm, find, etc.) to make sure that the Makefile produced is valid.

If any of these programs cannot be executed, the utility aborts with a diagnostic message. In this case, users are advised to check their PATH environment variable.
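The checks described above might look like the following sketch, built on findExecutable and getPermissions from the directory package. It is a hypothetical illustration, not ffipkg's code: bare program names are looked up on PATH, absolute paths (as the --with-XXX options require) are checked for existence and the executable permission, and any other path is rejected.

```haskell
import System.Directory (doesFileExist, executable, findExecutable, getPermissions)
import System.Exit (exitFailure)
import System.FilePath (isAbsolute)
import System.IO (hPutStrLn, stderr)

-- | Resolve one program given as a bare name or as an absolute path.
resolveProgram :: String -> IO (Either String FilePath)
resolveProgram prog
  | isAbsolute prog = do
      exists <- doesFileExist prog
      if not exists
        then return (Left (prog ++ ": no such file"))
        else do
          perms <- getPermissions prog
          return $ if executable perms
            then Right prog
            else Left (prog ++ ": not executable")
  | '/' `elem` prog = return (Left (prog ++ ": only absolute paths are accepted"))
  | otherwise = maybe (Left (prog ++ ": not found on PATH")) Right <$> findExecutable prog

-- | Check every program; print diagnostics and abort if any is unusable.
checkPrograms :: [String] -> IO [FilePath]
checkPrograms progs = do
  results <- mapM resolveProgram progs
  case [err | Left err <- results] of
    []   -> return [path | Right path <- results]
    errs -> do
      mapM_ (hPutStrLn stderr) errs
      hPutStrLn stderr "Check the PATH environment variable."
      exitFailure
```

A call such as checkPrograms ["make", "awk", "ar", "ghc", "gcc", "hsc2hs"] returns their resolved paths, or stops the program with a list of the failures.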


Other options

  • The -v option sets the verbosity level: if specified, the ffipkg utility outputs information about intermediate steps.
  • The -i option causes the utility to stop after the package include file has been created. No hsc code is produced, and neither Makefile nor Setup.hs files are created. This option may be used to just check that all the external programs are available or specified correctly on the utility command line.
  • The -V option prints version number of the utility.
  • The -h option prints the command line options synopsis.
  • The -c option passes an option to the C preprocessor and compiler. To add a definition of a constant for the C preprocessor and compiler, use for example: -c "-DX=Y"

Creating an FFI import package

Preparatory steps

Create an empty package building directory and change into it. No hand-written source files are needed for an FFI package.

Determine which include and library files will be used in the package, and locate them. Keep in mind that the larger the combined size of all include files, the longer the utility takes to run, and this dependency may be nonlinear. Also determine the package installation location: it should be neither the package building directory nor one of its subdirectories.
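The installation location rule can be checked by comparing path components rather than raw strings, so that /home/u/ffi-build2 is not mistaken for a subdirectory of /home/u/ffi-build. This is a hypothetical sketch that assumes both paths are already absolute and canonical (for example, obtained with canonicalizePath from the directory package); it does not resolve ".." or symbolic links itself.

```haskell
import Data.List (isPrefixOf)
import System.FilePath (splitDirectories)

-- | True if the installation directory is neither the build directory
-- nor inside it. Both arguments must be absolute, canonical paths.
installOutsideBuild :: FilePath -> FilePath -> Bool
installOutsideBuild buildDir installDir =
  not (splitDirectories buildDir `isPrefixOf` splitDirectories installDir)
```

For example, installOutsideBuild "/home/u/ffi-build" "/home/u/ffi-build/inst" is False, while installOutsideBuild "/home/u/ffi-build" "/home/u/ffi-build2" is True.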

Possible reasons of failure at this step:

none (in this and the following sections, only failures related to the utility itself and to the Cabal setup program are discussed; reasons such as lack of free space on the device are not mentioned).

Remember: always start with an empty directory. Any pre-existing files may have unspecified effects on the process of FFI package creation.

Creating Haskell sources, makefile, Cabal file, setup program

Execute the ffipkg utility supplying all information about locations of include and library files, and package name and version if applicable.

After the utility finishes, the following files will be created in the directory:

  • the package include file
  • Haskell sources produced by running hsc2hs and the splitter
  • the Cabal package descriptor file
  • the Makefile
  • the Setup.hs file

Possible reasons of failure at this step:

  • one or more external programs necessary to build the package were not found or cannot be executed: rerun the utility with the -i and -v options to see which programs failed.
  • syntax error reported by the preprocessor: check if the correct include files were supplied.
  • syntax error reported by the header file parser: this is an internal error; contact the utility developer/maintainer.

Configuring a package

Execute the runghc Setup.hs configure command as specified in [1].

Possible reasons of failure at this step:

none related to the utility itself. For Cabal-related failures see the Cabal documentation.

Building a package

Execute the runghc Setup.hs build command as specified in [1].

Possible reasons of failure at this step:

none related to the utility itself. For Cabal-related failures see the Cabal documentation.

Package installation

Execute the runghc Setup.hs install command as specified in [1].

Possible reasons of failure at this step:

none related to the utility itself. For Cabal-related failures see the Cabal documentation.
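For scripting, the whole sequence (run ffipkg, then configure, build and install through the Cabal setup program) can be driven from Haskell with callProcess from the process package. This is a hypothetical sketch: packageSteps and runSteps are illustrative names, ffipkg is assumed to be on PATH, and the --prefix value stands for the installation location chosen in the preparatory steps.

```haskell
import System.Process (callProcess)

-- | The commands for one FFI package, as (program, arguments) pairs.
packageSteps :: [String] -> FilePath -> [(FilePath, [String])]
packageSteps ffipkgArgs prefix =
  [ ("ffipkg", ffipkgArgs)
  , ("runghc", ["Setup.hs", "configure", "--prefix=" ++ prefix])
  , ("runghc", ["Setup.hs", "build"])
  , ("runghc", ["Setup.hs", "install"])
  ]

-- | Run the steps in order. callProcess throws an exception on a non-zero
-- exit status, so the sequence stops at the first failing step.
runSteps :: [(FilePath, [String])] -> IO ()
runSteps = mapM_ (uncurry callProcess)
```

For example, runSteps (packageSteps ["unistd.h"] "/opt/ffi") would prepare, build and install the UNISTD package under the hypothetical prefix /opt/ffi.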

Using FFI packages: Naming conventions

If the name of a package is PKG, then the module to import in an application using the package is HS_PKG_H. This module should be imported in every module of the application that uses the package.

See also [3] for information on how imported functions, structures, unions, etc. are named and made visible to applications using FFI packages.

Examples

In this section, several examples of building FFI packages are provided. In these examples, commands running the Cabal setup program are omitted. They are issued in a standard manner as if this were a regular Haskell package.

Hello, World

A very simple example featuring:

  • Utility default operation.

This example shows how to create a package from the unistd.h include file, and how to call low-level library functions directly from a Haskell program.

It is assumed that the include file unistd.h is in a location known to the compiler, so no additional command line options are needed.

Command line to prepare the package UNISTD:

ffipkg unistd.h

A sample program using this package (syscall.hs):

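-- Test of syscalls invoked directly with unistd.h

module Main where

import Foreign.C.String (withCStringLen)
import HS_UNISTD_H

-- write(2) the message to file descriptor 1 (standard output)
main :: IO ()
main = withCStringLen "Hello World\n" $ \(hello_p, hello_len) -> do
  _ <- f_write 1 hello_p (fromIntegral hello_len)
  return ()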

A command line to GHC to compile and build the executable using the package UNISTD:

ghc -fglasgow-exts -package HSFFIG -package UNISTD --make syscall.hs -o syscall

In the sample program above, f_write is the name that the write (2) function is imported under. See [3] for the complete information about how the names of imported entities are formed by HSFFIG.

Alternative bindings to syslog (3)

This example was inspired by [7]. Features:

  • Multiple include files
  • Explicitly naming the package
  • Extra parameters for the C preprocessor
  • Comparison with "hand-crafted" bindings

To prepare the SYSLOG package, the ffipkg utility is invoked as follows:

ffipkg -p SYSLOG -c "-DSYSLOG_NAMES" unistd.h syslog.h

The package name is specified explicitly so that it is not derived from the first include file name (which would give UNISTD). It is necessary to include unistd.h before syslog.h, as the manual page suggests. Part of the definitions in syslog.h is enclosed in an #ifdef SYSLOG_NAMES block, so the -c option is used to define SYSLOG_NAMES and make those definitions available for import.

An example program (hslogger.hs) is a simplified command-line logger:

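module Main where

import Control.Exception (bracket)
import Foreign.C.String (newCString, withCString)
import Foreign.Marshal.Alloc (free)
import Foreign.Marshal.Array (withArray)
import Foreign.Ptr (castPtr, nullPtr)
import System.Environment (getArgs)
import HS_SYSLOG_H

main :: IO ()
main = do
  iargs <- fmap unwords getArgs
  -- openlog(3) may keep the ident pointer instead of copying the string,
  -- so the buffer must stay allocated until closelog(3) has been called
  bracket (newCString "Haskell Logger") free $ \lid -> do
    f_openlog lid (fromIntegral c_LOG_PID) (fromIntegral c_LOG_LOCAL0)
    withCString "%s" $ \fmt ->
      withCString iargs $ \str ->
        withArray [str, nullPtr] $ \atxt ->
          f_vsyslog (fromIntegral c_LOG_INFO) fmt (castPtr atxt)
    f_closelog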

Running it like this:

hslogger This is only a test

results in the following record in the log:

Feb  7 21:27:14 dmghome Haskell Logger[2729]: This is only a test

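One thing to note is that there is no function named f_syslog: syslog(3) is variadic, so it is not imported. The function f_vsyslog is used instead; it takes a va_list in place of the variable number of arguments. Passing a pointer to an array of argument pointers as that va_list, as the program above does, only works on platforms where va_list is a plain pointer into the argument area (such as 32-bit x86); it is not portable, since on x86-64, for example, va_list has a different representation.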

The command line to build the Haskell Logger is:

ghc -fglasgow-exts -package HSFFIG -package SYSLOG --make hslogger.hs -o hslogger

The original bindings developed by Peter Simons are "hand crafted". The developer manually grouped values for Priority, Facility, and Option into algebraic data types. Therefore the bindings interface looks more "functional" from the very beginning.

By contrast, automatically generated bindings are just a mechanical translation of an imperative interface into Haskell. One significant advantage is that the type signatures of all imported functions are captured automatically, which reduces the risk of errors. To make the interface look "functional", similar hand crafting is still necessary; only the starting point is different.
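As an illustration of such hand crafting, the sketch below wraps syslog priorities and options in algebraic data types, in the spirit of the hand-crafted bindings (it is not Peter Simons' code, and all names are hypothetical). To keep it self-contained, it hard-codes the numeric values used by common C libraries such as glibc (LOG_EMERG to LOG_DEBUG are 0 to 7; LOG_PID is 0x01, LOG_CONS 0x02, LOG_NDELAY 0x08); a binding built on the SYSLOG package would take them from c_LOG_INFO, c_LOG_PID, etc. instead.

```haskell
import Data.Bits ((.|.))

-- | Priorities in the order of LOG_EMERG .. LOG_DEBUG.
data Priority = Emergency | Alert | Critical | Error | Warning | Notice | Info | Debug
  deriving (Show, Eq, Enum, Bounded)

-- | A subset of the openlog(3) options.
data Option = LogPid | LogCons | LogNDelay
  deriving (Show, Eq)

-- | Numeric priority; assumes LOG_EMERG .. LOG_DEBUG are 0 .. 7.
priorityCode :: Priority -> Int
priorityCode = fromEnum

-- | Assumed glibc values; use c_LOG_PID etc. from HS_SYSLOG_H in practice.
optionCode :: Option -> Int
optionCode LogPid    = 0x01
optionCode LogCons   = 0x02
optionCode LogNDelay = 0x08

-- | Combine options into the bit mask expected by openlog(3).
optionMask :: [Option] -> Int
optionMask = foldr ((.|.) . optionCode) 0
```

With these types, a call like f_openlog lid (fromIntegral (optionMask [LogPid])) ... replaces raw constants with checked constructors.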

Berkeley DB binding

This example features:

  • Non-standard location of library and header files
  • Explicit package versioning

For the purpose of this example, it is assumed that Berkeley DB version 4.2 is installed at /usr/local/BerkeleyDB.4.2, and that this location is searched by neither the C compiler nor the linker by default.

The command line to prepare the package DB:

ffipkg -w 4.2 -I /usr/local/BerkeleyDB.4.2/include \
              -L /usr/local/BerkeleyDB.4.2/lib \
              -l db \
              db.h sys/stat.h

The version (-w) is set explicitly here, but this is only needed if an application depends on a specific version of the Berkeley DB library and packages for different library versions must be distinguished.

The test program bdbtest.hs is too large to be included here: use the link provided.

The command line to compile the test program is:

ghc -fglasgow-exts -package HSFFIG -package DB --make bdbtest.hs -o bdbtest

So, it is not necessary to remember where the library is installed: this information is saved with the package.

X11 transport protocol

This example is rather speculative: it does not contain much usable code, but instead gives an idea of how to reuse the definitions of a transport layer protocol, represented as a set of C structure declarations, to re-implement the same protocol in Haskell.

Features:

  • Import of header files only to reuse definitions contained in them

Among the many header files that come with the X Window System sources, two are of interest: X.h and Xproto.h. The former contains various constant definitions: event codes, error codes, etc. The latter consists mainly of structure and union declarations, which define the binary structure of the packets that an X11 client and server exchange.

So, to prepare the XPROTO package, the following command may be issued:

hsffig-1.1/bin/ffipkg -p XPROTO \
    -I /usr/X11/include/X11 \
    X.h Xproto.h

Note that there are no references to any libraries. No foreign functions are imported: only what is declared in the header files as constants and structures/unions.

According to [6], a typical X11 request packet consists of an opcode field, a request length field, and a request body of arbitrary length. The request length field usually follows the opcode; if part of the request body fits in one octet, that octet may be placed between the opcode and the length field, as in the example below.

For the simplest request, Bell, the following packet structure is defined in [6]:

Bell
  octets  value   field
  1       104     opcode
  1       INT8    percent
  2       1       request length

The corresponding structure in Xproto.h looks like this:

typedef struct {
    CARD8 reqType;
    INT8 percent;  /* -100 to 100 */
    CARD16 length B16;
} xBellReq;    

This structure is filled with values and transmitted over the X11 client-server connection as is, i.e. as a sequence of octets from offset zero up to the end of the structure's memory image.

So, to form a Bell request, code like the following (shown in fragments, using the hsffig imports) may be written:

...
import HS_XPROTO_H
...

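-- Like alloca, but fills the buffer with zero octets before use
-- (the pattern type signature needs ScopedTypeVariables, enabled by -fglasgow-exts)
allocA f = alloca $ \(buf :: Ptr a) -> do
  pokeArray (castPtr buf) (replicate (sizeOf (undefined :: a)) (0 :: Word8))
  f buf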

...

bell pct =
  allocA $ \(rqptr :: Ptr T_xBellReq) -> do
    rqbl <- fmap fromIntegral (rqptr --> V_sizeof)      -- request size in octets
    (rqptr, V_reqType) <-- fromIntegral c_X_Bell        -- opcode X_Bell (104)
    (rqptr, V_percent) <-- fromIntegral pct             -- volume, -100 to 100
    (rqptr, V_length)  <-- fromIntegral (rqbl `div` 4)  -- length in 4-octet units
    rqb <- peekArray (fromIntegral rqbl) (castPtr rqptr)
    return [RqEnc8 (fromIntegral rqbl) rqb]

The allocA function acts as a standard alloca, but it ensures that the memory allocated is filled with zeros. The RqEnc8 constructor encapsulates an array of octets for further transmission, but this is beyond the scope of this example.
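For comparison, the same four octets can be produced in plain Haskell without the imported structure, which makes the layout from [6] explicit. This is a hypothetical sketch (ByteOrder, card16 and bellRequest are illustrative names): it encodes the CARD16 length in the byte order the client announced at connection setup, whereas the memory-image approach above relies on the client announcing the host's own byte order. Keeping the percent value within -100 to 100 is left to the caller.

```haskell
import Data.Bits (shiftR)
import Data.Int (Int8)
import Data.Word (Word16, Word8)

-- | Byte order announced by the client when the connection is set up.
data ByteOrder = MSBFirst | LSBFirst deriving (Show, Eq)

-- | Encode a CARD16 in the connection's byte order.
card16 :: ByteOrder -> Word16 -> [Word8]
card16 order w = case order of
    MSBFirst -> [hi, lo]
    LSBFirst -> [lo, hi]
  where
    hi = fromIntegral (w `shiftR` 8)
    lo = fromIntegral w

-- | Bell request: opcode 104, percent as INT8, request length 1
-- (the whole request occupies one 4-octet unit).
bellRequest :: ByteOrder -> Int8 -> [Word8]
bellRequest order pct = [104, fromIntegral pct] ++ card16 order 1
```

For example, bellRequest LSBFirst 50 gives [104, 50, 1, 0]; a negative percent is sent in two's complement, so -10 becomes the octet 246.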

Conclusion

The functionality of the FFI Packaging Utility (ffipkg) and examples of its usage have been discussed. This utility has been developed in the hope that it helps speed up the process of building applications that depend on external (foreign) libraries or on structure definitions for data transmission formats.

The developer will always appreciate any feedback. For suggestions, questions, and concerns please send e-mail to: Dimitry Golubovsky.

References