HSFFIG/Tutorial

From HaskellWiki
Revision as of 11:47, 3 June 2009 by F.s. (talk | contribs) (formatting)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Program input

Hsffig acts as a filter (in Unix sense) which means that it consumes all its input information on its standard input (stdin), and produces its output on standard output (stdout). No other files are involved.

The hsffig program is invoked as follows:

gcc -E -dD header.h | hsffig > HEADER_H.hsc

The -E option instructs gcc only to preprocess (no compilation) its input file, and the -dD option instructs it to emit #define preprocessor statements along with the preprocessed header (without -dD, autogenerated binding module will not contain import statements for constants defined this way).

Preprocessor requirements

It is assumed that majority of hsffig users will use tools from the GNU Compiler Collection (GCC), whose preprocessor complies with the hsffig requirements. For other preprocessors from other vendors, the following is expected:

  • The very first line of preprocessor output will contain the following construct:
# 1 header.h

where header.h is the name of the "toplevel" header file; number following after the hash sign does not matter.

Since hsffig does not take any command line arguments, the only way to determine the name of the module is the preprocessor output itself. When hsffig encounters a line like shown above (starting with hash sign and number), it derives the autogenerated Haskell module name by stripping the directory part from the header file name (as in basename(1)), uppercasing all characters (therefore the header file name cannot start with a digit), and replacing dots with underscores. Therefore, for a header file header.h, the autogenerated module will be named HEADER_H.

If compilant preprocessor is not available, and control lines like discussed above cannot appear in the preprocessor output (with gcc, this may be simulated with -P option), a special token @@MODULENAME@@ will be used in the Module statement. Later, it may be substituted with anything desirable with a program like sed(1). Also, if hsffig is unable to determine the header file name from its input, constants defined via #define will not be imported.

Program output

Hsffig outputs hsc code on its standard output, to be redirected to a file or (possibly) to some other filter program. The output is ready for processing by hsc2hs. Description of hsc2hs input file format may be found here:

http://haskell.org/ghc/docs/6.2.2/html/users_guide/hsc2hs.html

The output file will contain the following sections:

  • Special Definitions for hsc2hs
  • Module header
  • Constants defined with #define
  • For each structure/union imported, an instance of the above mentioned class to access each member
  • Constants defined via enumerations
  • Import declarations for each function/extern variable declared

The preprocessor outputs preprocessed contents of not only the header file specified on its command line, but also of all other header files included within. This results in importing of some extra stuff (for instance, if a header file has #include <stdio.h> inside, everything defined in stdio.h will be imported as well). This merely gives the Haskell compiler which "connects" the library to an application written in Haskell the same look as the C compiler would have if it were "connecting" the library to an application written in C.

In first lines of generated hsc code, an #include statement is present to include the same header file that was consumed by hsffig. If hsffig is unable to determine header file name from its input (see Preprocessor Requirements), a special token @@INCLUDEFILE@@ will be placed with the #include statement instead. Similarly to the module name substitute, it may be later converted into a real header file name with a program like sed.

Special definitions for hsc2hs

In generated hsc code, the following hsc2hs macros are used to access members of structures/unions:

  • #peek
  • #poke
  • #ptr

These macros take type declaration of the structure whose members are being accessed, as its first argument.

Consider the following structure declaration (a slightly modified fragment of BerkeleyDB's db.h):

struct __dbc {
        DB *dbp;                        /* Related DB access method. */
        DB_TXN   *txn;                  /* Associated transaction. */

        struct {
                DBC *tqe_next, **tqe_prev;
        } links;
        /* . . . other members . . . */
}

To access members of the {{{struct ___dbc}}}, e. g. to peek a value of dbp, the following hsc2hs macro will be used:

(#peek struct __dbc, dbp)

But links is declared "anonymously", i. e. its structure type is not named. The only way to construct a #peek/#poke/#ptr macro is to put the whole structure declaration as its first argument:

(#peek struct {DBC *tqe_next, **tqe_prev;}, tqe_next)

This will cause an error because the C preprocessor macro that hsc2hs converts the above construct into, takes only two arguments, but comma in the structure declaration will be recognized as argument separator, i. e. there will be three arguments.

To work this around, hsffig makes the following declaration at the beginning of generated hsc code:

#ifndef __quote__
#define __quote__(x...) x
#endif

This is a gcc feature called "macro varargs". The only thing this macro does is to represent its arguments whatever is their number, as a single token for the preprocessor.

So the macros from the example above will look:

(#peek __quote__(struct __dbc), dbp)
(#peek __quote__(struct {DBC *tqe_next, **tqe_prev;}), tqe_next)

Additionally, the template file that hsc2hs needs to convert hsc to Haskell code needs to be modified. It uses the offsetof macro which also takes two arguments, but passing anonymously declared structure type containing commas will cause the same problems. So in the modified template file, the same __quote__ macro is used with offsetof:

#define hsc_peek(t, f) \
    printf ("(\\hsc_ptr -> peekByteOff hsc_ptr %ld)", (long) offsetof (__quote__(t), f));

These special definitions are placed in the very beginning of every file that hsffig generates. A separate template file for hsc2hs is no longer needed. So, to generate importable Haskell code, a command like one below needs to be issued:

hsc2hs -t /dev/null HEADER_H.hsc -o HEADER_H.hs

Specifyiing /dev/null as template file name causes hsc2hs not to use any template file.

Splitting large modules

Considering that hsffig generates import statements for everything it encounters in all header files included within the toplevel header file, amount of generated Haskell code may be quite large, so it cannot be compiled by ghc at once due to memory limitations. To overcome this, a special utility, splitter is included with hsffig. In generated hsc code, the following comments are inserted:

-- Split begin/DB_H

{-# OPTIONS -fglasgow-exts -ffi #-}
#include "db.h"
#ifndef __quote__
#define __quote__(x...) x
#endif

module DB_H(
  module DB_H,
{-- #SPLIT#
  module DB_H_C,
  module DB_H_S,
  module DB_H_F,
  module DB_H_E,
  module DB_H_S_cnt,
#SPLIT# --}
  module Foreign,
  module Foreign.C.String,
  module Foreign.C.Types) where

import Foreign
import Foreign.Ptr
import Foreign.C.Types
import Foreign.C.String
{-- #SPLIT#
import DB_H_C
import DB_H_S
import DB_H_F
import DB_H_E
import DB_H_S_cnt
#SPLIT# --}

-- Split end

This example is found in the very begining of generated hsc code. The splitter program reads its input file line by line. Comments like {{{-- Split begin/DB_H}}} redirect all subsequent lines in a file whose name is composed of characters following after '/' with '.hs' appended. Comments like {{{-- Split end}}} close output file: input lines are lost until another Split begin comment is met. Lines {{{ {-- #SPLIT# }}} and {{{ #SPLIT# --} }}} are removed from output, so enclosed pieces of Haskell code are uncommented.

As a result, one large module is split into many parts:

  • The "Root" module to be imported into an application: name derived from the header file name, see above, e. g. DB_H
  • A separate module containing import statements for constants declared with #define: name is derived from the header file name and postfixed with "_C", e. g. DB_H_C
  • A separate module containing import statements for constants declared via enumerations: name is postfixed with "_E", e. g. DB_H_E
  • A separate module containing import statements for variables and functions: name is postfixed with "_F", e. g. DB_H_F
  • A separate module containing declaration of a multiparameter class and datatypes necessary to access members of structures/unions: name is postfixed with "_S_cnt", e. g. DB_H_S_cnt (Note: as of the patchlevel 2, instead of a single module postfixed with "_S_cnt", three separated modules are created, positixed with: "_S_n" for newtype declarations, "_S_t" for type aliases, and "_S_d" for declarations of structure member selector datatypes (V_..., X_..., D_...).
  • A separate module for each structure/union defined in the header file and files included within: name is postfixed with "_S_" and followed by the structure type identifier derived by hsffig, e. g. DB_H_S_S{{{___}}}db_dbt for {{{struct __db_dbt}}}, or module DB_H_S_S_1142 for a structure declared anonymously (hsffig assigns unique numeric identifiers to such structures).

However, application developers do not need to know all these module names: all they still need to know is name of the "root" module (DB_H in our example) to be imported.

It is recommended to use ghc with --make option: this forces the compiler to chase all module dependencies automatically. Since all the modules created by splitter contain correct import statements, no module will be left behind.

FFI import details

Naming conventions

A significant difference between C and Haskell is that C does not force starting character of its variable, function, or type identifiers to be of certain case while Haskell has requirements of this kind. To resolve this problem, the following rules apply when an identifier declared in a C library is imported in a Haskell application.

All identifiers imported from a C library, or created to make import easier, are prefixed as shown in the table below:

Prefix Purpose Sort of Import
Constant Variable Function
Pointer
Function Structure
Type
Structure
Member
c_ Constant imported with #define Yes No No No No No
e_ Constant imported with enum Yes No No No No No
f_ Standalone function No No No Yes No No
p_ Pointer (memory address) of a variable No Yes Yes No No No
v_ Value of a variable No Yes Yes No No No
s_ Set a variable No Yes Yes No No No
x_ Function pointed to by a variable No No Yes No No No
w_ Wrapper for a function argument No No Yes Yes No No
S_, U_ Structure/Union type as imported No No No No Yes No
V_ Member of a structure (value) No No No No No Yes
D_ Member of a structure - arrays only (dimensions) No No No No No Yes
X_ Member of a structure (callable) No No No No No Yes
T_ C Type Alias (typedef) as imported No No No No No No

The table summarizes all prefixes hsffig adds to imported identifiers, and corresponding types of import. Flags Yes and No show whether given prefix and import purpose is suitable for each sort of imported entity.

Typedefs

C types declared with the typedef statement are imported as type aliases prefixed with "T_". These type aliases may be more convenient to use in certain situations.

The following C declaration:

typedef union {
  struct {
    int a:1;
    int b:1;
    int c:1;
    int d:1;
  } x;
  int y;
} *PBF;

imports as:

type T_PBF = Ptr (U_9)

newtype S_5 = S_5 ()
newtype U_9 = U_9 ()

-- instances to access S_5 and U_9 members follow

In the following Haskell program:

module Main where

import BF_H

main = do
  putStrLn $ "Test of Bit Fields"

  alloca $ \(bfu :: T_PBF) -> do
    x <- (bfu --> V_x)
    (x, V_a) <-- fromIntegral 1
    (x, V_b) <-- fromIntegral 0
    (x, V_c) <-- fromIntegral 0
    (x, V_d) <-- fromIntegral 1
    y <- (bfu --> V_y)
    putStrLn $ "y = " ++ (show y)

it is necessary to declare the type of bfu because there is no foreign import declaration that would drive the type inference mechanism. Since the union is declared anonymously, the type derived by hsffig contains a numeric identifier that is hard to remember. Imported typedef helps because it may be used instead, and is controlled by a developer (Ptr U_9 couls be used with the same effect).

Per FFI Addendum, the following typedefs are imported as they are (the Addendum defines names of Haskell types equivalent to them):

* ptrdiff_t
* size_t
* wchar_t
* sig_atomic_t
* clock_t
* time_t
* FILE
* fpos_t
* jmp_buf

Constants (#define)

For each constant declared with #define, generated hsc code looks as follows:

#define BIG_ENDIAN      __BIG_ENDIAN
#define PDP_ENDIAN      __PDP_ENDIAN
#define BYTE_ORDER      __BYTE_ORDER
#define S_IREAD       S_IRUSR
#define S_IRGRP (S_IRUSR >> 3)
#define S_IROTH (S_IRGRP >> 3)

becomes

c_BIG_ENDIAN = #const BIG_ENDIAN
c_PDP_ENDIAN = #const PDP_ENDIAN
c_BYTE_ORDER = #const BYTE_ORDER
c_S_IREAD = #const S_IREAD
c_S_IRGRP = #const S_IRGRP
c_S_IROTH = #const S_IROTH

that is, hsffig does not deal with actual value of the constant: it is left to hsc2hs.

However, arbitrary #define cannot be imported this way. Some preprocessor macros have parameters, or contain C statements, or otherwise cannot be assignment expression's RHS. To rule such preprocessor macros out, for each #define hsffig encounters on its input, a short C program is created, like this:

#include "header.h"
static int a = BIG_ENDIAN;

for #define BIG_ENDIAN __BIG_ENDIAN. The command line for gcc is:

gcc -pipe -x c -q -fsyntax-only - 2>/dev/null

If this program passes syntax check, the constant is imported, otherwise not imported.

This explains why without being able to determine the header file name from its input, hsffig is unable to import #define 'd constants. This also shows that processing time for a header will be rougly linear to the number of #define preprocessor statements in the header.

Constants are not in the IO monad, and may be used in any function. Bit operations (from Foreign.C.Bits) may be applied to them:

dbperm = c_S_IRUSR .|. c_S_IWUSR .|. c_S_IRGRP .|. c_S_IWGRP .|. c_S_IROTH

may be used to specify file permissions {{{rw-rw-r--}}}

Constants (enum)

Enumerations are imported similarly to preprocessor-defined constants, but identifiers are prefixed with "e_".

An example showing usage of both #define and enumeration constants:

  dbop <- dbp --> V_open
  ret <- withCString "access.db" $ \dbname -> do
    r <- dbop dbp
              nullPtr
              dbname
              nullPtr
              (fromIntegral e_DB_BTREE)
              (fromIntegral c_DB_CREATE)
              (fromIntegral dbperm)
    return r

fromIntegral may be necessary to apply to constants when calling foreign functions because the Haskell compiler infers the type for the following sort of expressions:

c_DB_CREATE = 1
e_DB_BTREE = 1

as Integer while the function referred to by dbop in the example above requires CInt, CUInt, CInt for its last three arguments.

Variables

Regular variables

Regular variables contain either values or pointers to other variables (not functions). For this sort of imports, the following identifiers are created:

  • a pointer to the variable itself: prefix is "p_"
  • a nullary function returning variable's value in the IO monad: prefix is "v_"
  • a unary function taking a value and assigning it to the variable (in the IO monad): prefix is "s_"

So, for a variable declared as

extern int a;

FFI import declarations will be as follows:

foreign import ccall "header.h &a"
  ___2___ :: Ptr (CInt)
p_a = ___2___
v_a = peek ___2___
s_a = poke ___2___

where ___2___ is one of internal unique names that hsffig assigns to all identifiers it encounters in the header file.

So, to retrieve value from a variable, v_a is used:

x <- v_a -- equivalent to peek p_a

To update the variable value, s_a is used:

s_a 4 -- equivalent to poke p_a 4

If a variable contains pointer to a value, e. g

extern int *ap;

then it is imported as follows:

foreign import ccall "header.h &ap"
  ___3___ :: Ptr (Ptr (CInt))
p_ap = ___3___
v_ap = peek ___3___
s_ap = poke ___3___

so to dereference the pointer, additional peek is needed:

c <- v_ap >>= peek

and s_ap will update the variable itself with new address, rather than the value it points to.

Function pointers

Function pointers contain pointers to executable code.

For this sort of imports, the following identifiers are created:

  • pointer to the variable itself: prefix is "p_"
  • a function the variable points to (with proper arity and type signature): prefix is "x_"
  • a nullary function returning the function pointed to by the variable (in the IO monad): prefix is "v_"
  • a unary function updating the variable with new function reference: prefix is "s_"
  • optionally: wrapper functions for arguments taking function references: prefix is "w_"

A variable declared as:

int (*c)(long,int);

is imported as follows:

foreign import ccall "header.h &c"
  ___8___ :: Ptr (FunPtr (CLong -> CInt -> IO CInt))
p_c = ___8___
foreign import ccall "dynamic"
  ___8___mk___ :: FunPtr (CLong -> CInt -> IO CInt) -> (CLong -> CInt -> IO CInt)
foreign import ccall "wrapper"
  ___8___wr___ :: (CLong -> CInt -> IO CInt) -> IO (FunPtr (CLong -> CInt -> IO CInt))
x_c _1 _2 = peek ___8___ >>= \s -> ___8___mk___ s _1 _2
v_c = peek ___8___ >>= (return . ___8___mk___)
s_c = \s -> ___8___wr___ s >>= poke ___8___

Note that the function referred to by c returns IO CInt. It is assumed that all functions imported from a C library are unsafe, and therefore forced to be monadic. Based on the same assumption, all calls are considered "safe": this also allows calling Haskell functions from C code.

A variable declared as a function pointer in C , from the Haskell point of view contains a FunPtr i. e. virtual address of some foreign function's entry point. This cannot be used by a Haskell application directly, so a special dynamic import function needs to be declared. Such a pseudo-function (only its name and type signature need to be declared: the Haskell compiler does the rest) converts a FunPtr into a function usable by Haskell applications. Another pseudo-function, wrapper import does the opposite: converts a Haskell function into a monadic value containing a FunPtr i. e. a virtual address of an entry point usable by foreign functions (useful for callbacks).

For the variable - function pointer c, p_c returns pointer to the variable, just like for a regular variable. However, v_c returns the variable value passed through the dynamic import pseudo-function, so it may be used directly in Haskell applications. The setter function, s_c takes a Haskell function with proper type signature and updates the c variable value with new function address. The function x_c is the foreign function c points to.

The difference between v_c and x_c is illustrated below:

pc1 <- v_c
r1 <- pc1 33::CLong 21::CInt

-- but

r2 <- x_c 33::CLong 21::CInt

v_c is a monadic value; x_c is not.

Optionally additional wrapper imports are created if a function pointed to by a variable takes other functions as arguments.

Consider the following:

typedef int (*F2) (long);
typedef float (*X2) (double);
F2 (*ee) (X2);

The variable ee is a pointer to a function taking a function from double to float, and returning a function from long to int.

The import looks as follows:

foreign import ccall "header.h &ee"
  ___40___ :: Ptr (FunPtr (FunPtr (CDouble -> IO CFloat) -> IO (FunPtr (CLong -> IO CInt))))
p_ee = ___40___
foreign import ccall "dynamic"
  ___40___mk___ :: FunPtr (FunPtr (CDouble -> IO CFloat) -> IO (FunPtr (CLong -> IO CInt))) ->
    (FunPtr (CDouble -> IO CFloat) -> IO (FunPtr (CLong -> IO CInt)))
foreign import ccall "wrapper"
  ___40___wr___ :: (FunPtr (CDouble -> IO CFloat) -> IO (FunPtr (CLong -> IO CInt))) ->
    IO (FunPtr (FunPtr (CDouble -> IO CFloat) -> IO (FunPtr (CLong -> IO CInt))))
x_ee _1 = peek ___40___ >>= \s -> ___40___mk___ s _1
v_ee = peek ___40___ >>= (return . ___40___mk___)
s_ee = \s -> ___40___wr___ s >>= poke ___40___
foreign import ccall "wrapper"
  w_ee_1 :: (CDouble -> IO CFloat) -> IO (FunPtr (CDouble -> IO CFloat))

An additional import declaration, w_ee_1 has been created. It may be used if ee is called, and some Haskell function is passed as a parameter:

mydoublefloat :: CDouble -> IO CFloat
mydoublefloat = {-- function body here --}

mydoublefloat_wrapped <- w_ee_1 mydoublefloat

foo <- x_ee mydoublefloat_wrapped

-- or

foo <- w_ee_1 mydoublefloat >>= x_ee

Note that in the current version, hsffig does not create a dynamic import when a function imported returns a FunPtr.

Name of autogenerated wrapper import for function arguments ends with "_1". If a function takes more than one argument - function then more than one wrapper import will be created (one per function type). For example:

void (*psetpdf)(double (*df)(double,double),int,long,void(*)(int));

imports as:

foreign import ccall "header.h &psetpdf"
  ___6___ :: Ptr (FunPtr (FunPtr (CDouble -> CDouble -> IO CDouble) -> CInt -> CLong -> FunPtr (CInt -> IO ()) -> IO ()))
p_psetpdf = ___6___
foreign import ccall "dynamic"
  ___6___mk___ :: FunPtr (FunPtr (CDouble -> CDouble -> IO CDouble) -> CInt -> CLong ->
    FunPtr (CInt -> IO ()) -> IO ()) ->
    (FunPtr (CDouble -> CDouble -> IO CDouble) -> CInt -> CLong -> FunPtr (CInt -> IO ()) -> IO ())
foreign import ccall "wrapper"
  ___6___wr___ :: (FunPtr (CDouble -> CDouble -> IO CDouble) -> CInt -> CLong -> FunPtr (CInt -> IO ()) -> IO ()) ->
    IO (FunPtr (FunPtr (CDouble -> CDouble -> IO CDouble) -> CInt -> CLong -> FunPtr (CInt -> IO ()) -> IO ()))
x_psetpdf _1 _2 _3 _4 = peek ___6___ >>= \s -> ___6___mk___ s _1 _2 _3 _4
v_psetpdf = peek ___6___ >>= (return . ___6___mk___)
s_psetpdf = \s -> ___6___wr___ s >>= poke ___6___
foreign import ccall "wrapper"
  w_psetpdf_1 :: (CDouble -> CDouble -> IO CDouble) -> IO (FunPtr (CDouble -> CDouble -> IO CDouble))
foreign import ccall "wrapper"
  w_psetpdf_2 :: (CInt -> IO ()) -> IO (FunPtr (CInt -> IO ()))

The function pointed to by setpdf takes two arguments-functions of different types, therefore two wrapper imports have been created.

Enumeration of wrappers created for arguments does not correspond to positional numbers of arguments.

Functions

Functions are imported under their own names prefixed by "f_". Similarly to the situation described above (variables holding function pointers), additional wrappers will be created for arguments representing other functions. Examples of import:

double dplus(double,double);

imports as:

foreign import ccall "static header.h dplus"
  f_dplus :: CDouble -> CDouble -> IO CDouble

A function taking another function as argument:

int db_env_set_func_yield  (int (*)(void))  ;

imports as:

foreign import ccall "static header.h db_env_set_func_yield"
  f_db_env_set_func_yield :: FunPtr (IO CInt) -> IO CInt
foreign import ccall "wrapper"
  w_db_env_set_func_yield_1 :: (IO CInt) -> IO (FunPtr (IO CInt))

Rules for creation and naming of wrapper imports are the same as defined for variables containing function pointers.

Structures and unions

Basic information

C structures/unions are different from Haskell records with labeled fields: different structure types may contain members with same names. Therefore direct mapping of C structures to Haskell records is not possible.

Some structures are declared with struct statement:

struct foo {
  int bar;
  long baz;
};

Some structures are declared with typedef statement:

typedef struct foo {
  int bar;
  long baz;
} foo_t;

In some cases, structure name in the typedef statement may be omitted:

typedef struct {
  int bar;
  long baz;
} foo_t;

There also may be structures/unions declared within another structures/unions without explicit structure type name declaration:

typedef struct {
  int bar;
  long baz;
  struct {
    float faa;
    size_t taa;
  } str_anon;
} foo_t;

Datatypes and combinators for structure members access

Structure types are imported using the newtype Haskell statement. Type name is prefixed with "S_" for structures and "U_" for unions. If a structure was declared as struct foo, Haskell type name will be S_foo. If a structure was declared anonymously (struct without identifier), a unique numeric identifier will be used as structure type name, e. g. S_65.

A imported structure type declaration looks like: newtype S_foo = S_foo (). Indeed, this type name serves only identification purpose.

In practice, Haskell application developers using hsffig do not need to memorize these clumsy type identifiers: Haskell type inference system carries most of the burden.

To access imported structure members, two algebraic data types are declared for each member identifier hsffig encounters in its input:

  • V_member_id: to get/set value of a structure member member_id
  • X_member_id: if a member holds function pointer, to invoke the function member_id points to

Unlike variables, there is no "S_" prefixed item.

A multi-parameter class with functional dependencies is declared to enable access to structures/unions members. Because of this, Haskell-98 extensions must be enabled in the compiler processing output of hsffig. The definition of this class is placed in the HSFFIG.FieldAccess module of the Haskell library coming with the package.

class FieldAccess a b c | a c -> b where
  (==>) :: Ptr a -> c -> b
  (-->) :: Ptr a -> c -> IO b
  (<--) :: (Ptr a, c) -> b -> IO ()
  (==>) _ _ = error " illegal context for ==>"
  (-->) _ _ = error " illegal context for -->"
  (<--) _ _ = error " illegal context for <--"

The functional dependencies a b c | a c -> b specify that type of b depends entirely on the types a and c, and there may be only one type of b for every possible combination of a and c.

Indeed, a is the type of an imported structure/union. The type c is the data type associated with a structure/union member identifier. The type b is type of the member as represented to a Haskell program. In a C structure/union cannot be more than one member with same name (and different types). That's what the functional dependencies mean in this case. This information is sufficient for the Haskell compiler to infer the type of a value retrieved from a structure/union member.

The multiparameter class mentioned above is not used by Haskell applications directly. All imported types of structures/unions are instances of this class.

So, for a simple structure (note that the structure type and one of its members have the same name):

struct abc {
  int abc;
};

the datatypes are:

newtype S_abc = S_abc () -- for the structure itself
data V_abc = V_abc deriving (Show) -- for the member `abc'
data X_abc = X_abc deriving (Show) -- not used

To get/set the member abc of the structure abc, the following instance is defined:

instance FieldAccess S_abc (CInt) V_abc where
  z --> V_abc = (#peek __quote__(struct abc), abc) z
  (z, V_abc) <-- v = (#poke __quote__(struct abc), abc) z v

So, for this instance, a = S_abc, b = CInt, and c = V_abc. Which is semantically equivalent to the above declaration of the member abc of the struct abc.

The --> combinator retrieves value of the given member from the structure pointed to by the given pointer. Thus, to access the value of abc, a Haskell application contains:

r <- z --> V_abc

Because of the multiparameter class and the instance declaration, the compiler is able to determine the type of the {{{z --> V_abc}}} expression correctly.

Assume another structure declaration:

struct def {
  void (*abc)(char *,int);
};

perfectly legal from the C standpoint. Not only do the two different structures have a member with the same name abc, but these members have different types: one is a scalar integer value; another is a function pointer.

The import for struct def is (its part of the interest):

newtype S_def = S_def ()
instance FieldAccess S_def ((Ptr (CChar) -> CInt -> IO ())) V_abc where
  z --> V_abc = (#peek __quote__(struct def), abc) z >>= (return . ___60___S_def___mk)
  (z, V_abc) <-- v = (___60___S_def___wr v) >>= (#poke __quote__(struct def), abc) z

For this instance, a = S_def, b = Ptr (CChar) -> CInt -> IO (), and c = V_abc i. e. the same as for abc in struct abc.

So, in

r <- z --> abc
t <- x --> abc

if x and z point to structures abc and def respectively, types of values bound to r and t will be different.

The <-- combinator is used for updating structure/union members with new values. This is a "real" update, not simulated by Haskell state mechanisms.

The code

(z,V_abc) <-- 4 -- fixme: is fromIntegral needed for a constant?

is equivalent to C code:

z -> abc = 4;

The ==> combinator is used to call functions pointed to by structre/union members. While --> may be applied only to "V_" prefixed identifiers, the ==> combinator may be applied only to "X_" prefixed identifiers.

The difference between "V_" and "X_" prefixed structure member identifiers is similar to that for imported variables holding function pointers:

  fclose <- dbp --> V_close
  ret <- fclose dbp 0

-- but

  ret <- (dbp ==> X_close) dbp 0

The following is complete set of import statements that hsffig generates for struct def:

instance FieldAccess S_def ((Ptr (CChar) -> CInt -> IO ())) V_abc where
  z --> V_abc = (#peek __quote__(struct def), abc) z >>= (return . ___62___S_def___mk)
  (z, V_abc) <-- v = (___62___S_def___wr v) >>= (#poke __quote__(struct def), abc) z
foreign import ccall "dynamic"
  ___62___S_def___mk :: (FunPtr (Ptr (CChar) -> CInt -> IO ())) -> ((Ptr (CChar) -> CInt -> IO ()))
foreign import ccall "wrapper"
  ___62___S_def___wr :: ((Ptr (CChar) -> CInt -> IO ())) -> IO (FunPtr (Ptr (CChar) -> CInt -> IO ())

instance FieldAccess S_def ((Ptr (CChar) -> CInt -> IO ())) X_abc where
  z ==> X_abc = \_1 _2 -> do
    x <- z --> V_abc
    r <- x _1 _2
    return r

So, z ==> X_abc> returns a function with type Ptr (CChar) -> CInt -> IO (). Because instance with V_abc is defined with only --> combinator, and instance with X_abc is defined with only ==> combinator, and instance with X_abc is declared only for members holding pointers to functions, attempt to use wrong combinator-member combination will result in compilation or runtime error.

Members of structures/unions which are structures/unions themselves are imported so that --> combinator returns a Ptr to the member, and <-- combinator is not allowed. Example (from bfd.h):

typedef struct lineno_cache_entry {
  unsigned int line_number;     /* Linenumber from start of function*/
  union {
    struct symbol_cache_entry *sym; /* Function name */
    unsigned long offset;       /* Offset into section */
  } u;
} alent;

imports as:

newtype S_lineno_cache_entry = S_lineno_cache_entry ()
newtype U_40 = U_40 ()

instance FieldAccess S_lineno_cache_entry (Ptr U_40) V_u where
  z --> V_u = return $ (#ptr __quote__(struct lineno_cache_entry), u) z
  (z, V_u) <-- v = error $ "field u is a structure: cannot be set"

instance FieldAccess S_lineno_cache_entry (CUInt) V_line_number where
  z --> V_line_number = (#peek __quote__(struct lineno_cache_entry), line_number) z
  (z, V_line_number) <-- v = (#poke __quote__(struct lineno_cache_entry), line_number) z v

instance FieldAccess S_lineno_cache_entry (CInt) V_sizeof where
  z --> V_sizeof = return $ (#size __quote__(struct lineno_cache_entry))

--

instance FieldAccess U_40 (CULong) V_offset where
  z --> V_offset = (#peek __quote__(union {struct symbol_cache_entry  *sym; unsigned long  offset; }), offset) z
  (z, V_offset) <-- v = (#poke __quote__(union {struct symbol_cache_entry  *sym; unsigned long  offset; }), offset) z v

instance FieldAccess U_40 (Ptr (S_symbol_cache_entry)) V_sym where
  z --> V_sym = (#peek __quote__(union {struct symbol_cache_entry  *sym; unsigned long  offset; }), sym) z
  (z, V_sym) <-- v = (#poke __quote__(union {struct symbol_cache_entry  *sym; unsigned long  offset; }), sym) z v

instance FieldAccess U_40 (CInt) V_sizeof where
  z --> V_sizeof = return $ (#size __quote__(union {struct symbol_cache_entry  *sym; unsigned long  offset; }))

So, to achieve the same as C code below does:

alent *pal;

/* ... */

off = pal -> u.offset

a Haskell program would contain:

pal <- -- whatever returns IO (Ptr S_lineno_cache_entry)

off <- (pal --> V_u) >>= \u -> (u --> V_offset)

Bit fields

Bit fields (declared as int somefield:n) are supported by hsffig, but there are some additional things that need attention.

There is no such thing as offsetof for bit fields. To implement the --> and <-- combinators, special inline C functions are embedded in the hsc code hsffig generates. This is achieved with the #def macro that hsc2hs recognizes.

These functions are placed in a special C file. This file must be compiled and linked to the final executable. Consider the following C header (bf.h):

union bitfields {
  struct {
    int a:1;
    int b:1;
    int c:1;
    int d:1;
  } x;
  int y;
};

and the Haskell program (bftest.hs):

-- Test for bit fields.

module Main where

import BF_H

main = do
  putStrLn $ "Test of Bit Fields"

  alloca $ \(bfu :: Ptr U_bitfields) -> do
    x <- (bfu --> V_x)
    (x, V_a) <-- fromIntegral 1
    (x, V_b) <-- fromIntegral 0
    (x, V_c) <-- fromIntegral 0
    (x, V_d) <-- fromIntegral 1
    y <- (bfu --> V_y)
    putStrLn $ "y = " ++ (show y)

When hsffig processes bf.h it creates BF_H.hsc. When hsc2hs processes BF_H.hsc it creates BF_H.hs, and additionally BF_H_hsc.h and BF_H_hsc.c. The latter is recommended to compile with ghc:

ghc -c BF_H_hsc.c

which provides correct compilation environment, so necessary paths for the runtime header files will be passed to gcc properly.

After all, the executable may be compiled and linked:

ghc -ffi -fglasgow-exts -package HSFFIG --make bftest.hs BF_H_hsc.o -o bftest

If bftest is run, it prints "9", as one might expect (at least on the x86 architecture).

Foreign structures/unions as storables and alloca

It is often necessary to pre-allocate a foreign (C) structure/union, so a foreign function can use it. The alloca and malloc functions are defined in Haskell FFI for this purpose. They require an argument belonging to the class Storable.

So, for every structure type imported by hsffig, iinstance of Storable is declared (for struct abc used in the example above):

instance Storable S_abc where
  sizeOf _ = (#size __quote__(struct abc))
  alignment _ = 1
  peek _ = error $ "peek and poke cannot be used with struct abc"
  poke _ = error $ "peek and poke cannot be used with struct abc"

These four methods are minimally required implementation of Storable per the FFI Addendum.

Actually, only sizeOf plays a role here, returning the same value as C expression sizeof (struct abc) would return.

An example below (retrieval of data from Berkeley DB) shows the use of everything discussed above:

  (ret,
   ks,
   vs) <- alloca $ \dbkey ->                                        -- dbkey = alloca (sizeof (DBT))
          alloca $ \dbdata ->                                       -- dbdata = alloca (sizeof (DBT))
          withCStringLen "fruit" $ \fruit -> do                     -- (address, length)
            (dbkey,V_data) <-- fst fruit                            -- dbkey -> data set to the string address
            (dbkey,V_size) <-- (fromIntegral $ snd fruit)           -- dbkey -> size set to the string length
            r <- (dbp ==> X_get) dbp nullPtr dbkey dbdata 0         -- r = dbp -> get (dbp, NULL, dbkey, dbdata, 0)
            ksc <- (dbkey --> V_data) :: IO CString                 -- ksc = dbkey -> data
            kss <- dbkey --> V_size                                 -- kss = dbkey -> size
            if (r == 0)
              then do dsc <- (dbdata --> V_data) :: IO CString      -- dsc = dbdata -> data
                      dss <- dbdata --> V_size                      -- dss = dbdata -> size
                      ks <- peekCStringLen (ksc, fromIntegral kss)
                      vs <- peekCStringLen (dsc, fromIntegral dss)
                      return (r, ks, vs)
              else return (r, undefined, undefined)

It may be noted again that a Haskell application developer does not need to remember type identifiers of foreign structures involved. The compiler, using FFI import statements and the multiparameter class' functional dependencies, correctly infers all necessary type information. In fact, type of dbkey and dbdata in the example above should be Ptr S_'_'_db_dbt (as defined in the header: typedef struct __db_dbt { ... } DBT;). But this type is not mentioned explicitly in the code.

Compilation of applications using bindings created with HSFFIG

Applications using bindings created with hsffig must be compiled/linked with the option -package HSFFIG passed to ghc. In the example below, the application consists of a single module containing the main funstion: main.hs, and the C library header file is header.h. The library file name is library.a.

The main.hs file must contain an import statement:

  import HEADER_H

When hsffig processes header.h it creates HEADER_H.hsc. When hsc2hs processes HEADER_H.hsc it creates HEADER_H.hs, and additionally HEADER_H_hsc.h and HEADER_H_hsc.c. The latter is recommended to compile with ghc:

ghc -c HEADER_H_hsc.c

which provides correct compilation environment, so necessary paths for the runtime header files will be passed to gcc properly.

After all, the executable may be compiled and linked:

ghc -ffi -fglasgow-exts -package HSFFIG --make main.hs HEADER_H_hsc.o library.a -o main

The library of the package HSFFIG contains definition of the multiparameter class (HSFFIG.Field'Access.Field'Access) and the combinators necessary to access members of C structures/unions (see above). It is not necessary to directly import the hsffig modules in the application modules because the binding module re-exports them. It is however necessary to use the -fglasgow-exts options of ghc.

So, in the simplest case, the following steps are necessary:

gcc -E -dD header.h | hsffig > HEADER_H.hsc

hsc2hs -t /dev/null HEADER_H.hsc -o HEADER_H.hs

ghc -c HEADER_H_hsc.c

ghc -ffi -fglasgow-exts -package HSFFIG --make main.hs HEADER_HSC.o library.a -o main

In some cases, size of the autogenerated Haskell code of the binding module is too large so it cannot be compiled due to lack of system resources. In this case, the binding module has to be split (see above, Splitting Large Modules). Steps to compile and build an executable now are:

gcc -E -dD header.h | hsffig > HEADER_H.hsc

hsc2hs -t /dev/null HEADER_H.hsc -o HEADER_H.hs_unsplit

splitter HEADER_H.hs_unsplit

ghc -c HEADER_H_hsc.c

ghc -ffi -fglasgow-exts -package HSFFIG --make main.hs HEADER_HSC.o library.a -o main

The only one step added is running splitter over the Haskell source code generated by hsc2hs. The way the final executable is linked does not change.

Known problems and workarounds

Variadic functions

C functions whose argument list ends with an ellipsis (...) in other words taking variable number of arguments cannot be imported by hsffig. A common example is printf. If hsffig encounters such a function, it places a comment in its output explaining why the function could not be imported. However if a variable or a structure/union member is defined as a pointer to a variadic function is imported as a pointer to a function with type IO () i. e. void (*p)(void). This preserves the ability to query/set function pointer from a Haskell program.

So:

int (*pvard)(int,int,...);

int vard (int,int,...);

imports as:

foreign import ccall "header.h &pvard"
  ___58___ :: Ptr (FunPtr (IO ()))
p_pvard = ___58___
foreign import ccall "dynamic"
  ___58___mk___ :: FunPtr (IO ()) -> (IO ())
foreign import ccall "wrapper"
  ___58___wr___ :: (IO ()) -> IO (FunPtr (IO ()))
x_pvard  = peek ___58___ >>= \s -> ___58___mk___ s
v_pvard = peek ___58___ >>= (return . ___58___mk___)
s_pvard = \s -> ___58___wr___ s >>= poke ___58___

--

--
-- import of function/variable/structure member(s) vard :: WrongVariadicFunction is not possible
-- because of the following reason(s):
-- function is variadic

Functions/variables of structure type

If a function or a variable has a structure (not structure pointer) type, hsffig cannot import it. If hsffig encounters such a function, it places a comment in its output explaining why the function could not be imported. However, if a structure/union member is itself a structure/union, it is imported so that the --> combinator returns a Ptr to that member. However the <-- combinator is not defined for such members.

So:

struct {int a,b,c,d;} *allocs(int,int,int,int);

struct {int a,b,c,d;} directs(int,int,int,int);

imports as:

foreign import ccall "static header.h allocs"
  f_allocs :: CInt -> CInt -> CInt -> CInt -> IO (Ptr (S_56))

--

--
-- import of function/variable/structure member(s) directs :: CInt -> CInt -> CInt -> CInt -> @S_58 is not possible
-- because of the following reason(s):
-- function takes/returns structure(s) directly
--

Note that while these two functions have return types looking the same, they are not the same from the C compiler standpoint, nor are they the same from the hsffig standpoint.

Unresolved symbols When linking

As mentioned above, hsffig generates FFI import declarations for every function it finds a prototype for. It may rarely happen, the header file has prototypes for functions not defined by the library the header is related to. As an example, consider the following short program printing "Hello World" using the low-level write function of the standard C POSIX library:

module Main where

import UNISTD_H

main = withCStringLen "Hello World\n" $ \hello -> do
  f_write (fromIntegral 0) (fst hello) (fromIntegral $ snd hello)

The prototype for write is contained in the unistd.h include file usually located in /usr/include. However if the program above is compiled and linked the "usual" way, two symbols remain unresolved by the linker: __ftruncate, and pthread_atfork *. As a result, the linker fails to create the executable file.

The possible workarounds are:

  • To find the library where missing symbols are defined, and specify it on the ghc command line to link the executable.
  • To tell the linker to ignore the unresolved symbols and build the executable no matter what.
  • To use the -split-objs option that ghc provides (Note: See also HsffigLinkageOptimization for more detailed discussion of object file splitting possibilities).

The first option provides the best consistency, but it may be hard sometimes to find the correct library, especially for such low-level functions perhaps not assumed to be called directly.

The second option is a "short-cut" indeed: it leaves the problem open. If the developer is cofident that none of the unresolved symbols will be used for a whole life cycle of the apllication being developed, this may be acceptable, but this should not be considered an universal solution.

The third option is kind of a compromise: it leads to creation of consistent executable (and even the file size may be reduced significantly), but it also complicates the compliation/linkage process.

When given an option, -split-objs, ghc invokes a special script on the assembly files generated during compilation which cuts them into large number of small pieces. For this example program, UNISTD_H.hs was split into more than 700 object files.

These object files are placed in the special subdirectory whose name is derived from the module name: e. g. UNISTD_H_split for UNISTD_H. This directory is not autocreated: explicit mkdir command is necessary. The next step is to create a static library containing all those object files (ar -cq). The library must be specified later on the ghc command line to link the executable.

To tell more, this applies to the situation when splitter is not invoked, and there remains only one Haskell module with FFI declarations. If the module is split (this is "source split", not "object split, one not to be confused with another), then it makes sense to apply -split-objs only to the submodule containing FFI import declarations for functions (i. e. whose name is postfixed with "_F", UNISTD_H_F in this example): structures/unions, constants, and enumerations are compile-time items not dealt with by the linker.

The bottom line: uniform processing of object files with ghc --make is no longer available. Even worse, --make does not recognize libraries as compilation targets: if the module Main imports UNISTD_H, and there is UNISTD_H.hs, and libUNISTD_H.a, but no UNISTD_H.o, ghc --make will try to compile UNISTD_H to UNISTD_H.o and will ignore existence of the library.

The Makefile for this example program is shown below. It features the second and the third ways to work around the unresolved symbols problem.

GCC = gcc
GHC = ghc
HSFFIGDIR = /path/to/installed/hsffig-1.0
HSFFIG = $(HSFFIGDIR)/bin/hsffig

# The command line to invoke hsc2hs:

HSC2HS = hsc2hs -t /dev/null

all: syscall

UNISTD_H.hsc: /usr/include/unistd.h
        $(GCC) -E -dD /usr/include/unistd.h | $(HSFFIG) > UNISTD_H.hsc

UNISTD_H.hs: UNISTD_H.hsc
        $(HSC2HS) UNISTD_H.hsc -o UNISTD_H.hs

# The target to build the split-objects library:

libUNISTD_H.a: UNISTD_H.hs
        -mkdir UNISTD_H_split
        $(GHC) -c -split-objs -package HSFFIG UNISTD_H.hs
        ar cq libUNISTD_H.a UNISTD_H_split/*.o

syscall: syscall.hs libUNISTD_H.a UNISTD_H_hsc.o
        $(GHC) -c UNISTD_H_hsc.c

# The second workaround: ignore unresolved symbols. The linker option to do this may depend on the version
# of binutils used. In this example, it is --noinhibit-exec. Note how -optl and -Wl are combined.

        $(GHC) -ffi -package HSFFIG -fglasgow-exts --make syscall.hs UNISTD_H_hsc.o \
                -optl -Wl,--warn-once -optl -Wl,--noinhibit-exec -o syscall-1

# The third workaround: link against the library. The --make option of ghc is not available.

        $(GHC) -package HSFFIG -fglasgow-exts syscall.hs UNISTD_H_hsc.o libUNISTD_H.a -o syscall-2

clean:
        rm -rf syscall-1 syscall-2 libUNISTD_H.a UNISTD_H* UNISTD_H_split *.o *.hi

  • Note: This is of course OS-dependent situation: the situation described was observed on Linux with glibc 2.1.x. This example is provided here to illustrate how unresolved symbols problem may be worked around.

DimitryGolubovsky