Foreign Function Interface: Difference between revisions

From HaskellWiki
 
(One intermediate revision by the same user not shown)
Line 226: Line 226:
This is not very complicated but it can become very cumbersome if our data type has many fields. Several tools have been developed to automatically or semi-automatically create the Storable instances.
This is not very complicated but it can become very cumbersome if our data type has many fields. Several tools have been developed to automatically or semi-automatically create the Storable instances.


==== Renaming and Storable instances ====
==== Newtypes and Storable instances ====


It is very common to use type renaming (i.e. newtype) to wrap a data type as in the following example, where we declare a type Pair that contains a pair of Double values just like our Complex type.
It is very common to use <code>newtype</code> to rename types for some specific purpose. In the following example we declare the type Pair which contains a pair of Double values just like our Complex type from above.


<haskell>
<haskell>

Latest revision as of 16:46, 20 December 2023


WARNING: This article covers the ccall method, which is discouraged. Use capi instead (User's Guide)![1] (This just means replacing all uses of ccall with capi and adding {-# LANGUAGE CApiFFI #-} at the top of the file. The rest is still correct.)

Introduction

The Foreign Function Interface (FFI) allows Haskell programs to cooperate with programs written with other languages. Haskell programs can call foreign functions and foreign functions can call Haskell code.

Compared to many other languages, Haskell FFI is very easy to use: in the most common case, you only have to translate the prototype of the foreign function into the equivalent Haskell prototype and you're done. For instance, to call the exponential function ("exp") of the libc, you only have to translate its prototype:

    double exp(double);

into the following Haskell code

    --- warning: consider whether you really want to use ccall, see warning in beginning of article
    foreign import ccall "exp" c_exp :: Double -> Double

Now you can use the function "c_exp" just like any other Haskell function. When evaluated, it will call "exp".

Similarly, to export the following Haskell function:

    triple :: Int -> Int
    triple x = 3*x

so that it can be used in foreign codes, you only have to write:

    foreign export ccall triple :: Int -> Int

It can get at little more complicated depending on what you want to do, the function parameters, the foreign code you target, etc. This page is here to explain all of this to you.

Generalities

FFI extension

The Foreign Function Interface (FFI) is an extension to the Haskell standard. To use it, you need to enable it with the following compiler pragma at the beginning of your source file:

    {-# LANGUAGE ForeignFunctionInterface #-}

Calling conventions

When a program (in any language) is compiled into machine code, functions and procedures become labels: a label is a symbol (a string) associated to a position into the machine code. Calling a function only consists in putting parameters at appropriate places into memory and registers and then branching at the label position. The caller needs to know where to store parameters and the callee needs to know where to retrieve parameters from: there is a calling convention.

To interact with foreign code, you need to know the calling conventions that are used by the other language implementation on the given architecture. It can also depend on the operating system.

GHC supports standard calling conventions with the FFI: it can generate code to convert between its internal (non-standard) convention and the foreign one. If we consider the previous example:

    foreign import ccall "exp" c_exp :: Double -> Double

we see that the C calling convention ("ccall") is used. GHC will generate code to put (and to retrieve) parameters into memory and registers conforming to what is expected by a code generated with a C compiler (or any other compiler conforming to this convention).

Other available conventions supported by GHC include "stdcall" (i.e. Pascal convention).

Foreign types

Calling conventions depend on parameter types. For instance, floating-point values (Double, Float) may be passed into floating-point registers. Several values can be combined into a single vector register. And so on. As an example, in http://www.x86-64.org/documentation/abi.pdf you can find the algorithm describing how to pass parameters to functions on Linux on a x86-64 architecture depending on the types of the parameters.

Only some Haskell types can be directly used as parameters for foreign functions, because they correspond to basic types of low-level languages such as C and are used to define calling conventions.

According to [1], the type of a foreign function is a foreign type, that is a function type with zero or more arguments where:

  • the argument types can be marshallable foreign types, i.e. Char, Int, Double, Float, Bool, Int8, Int16, Int32, Int64, Word8, Word16, Word32, Word64, Ptr a, FunPtr a, StablePtr a or a renaming of any of these using newtype.
  • the return type is either a marshallable foreign type or has the form IO t where t is a marshallable foreign type or ().

Warning: GHC does not support passing structures as values yet.

The Foreign.C.Types module contains renaming of some of these marshallable foreign types with names closer to those of C types (e.g. CLong, CShort).

If the foreign function performs side-effects, you have to explicitly indicate it in its type (using IO). GHC has no other way to detect it.

    foreign import ccall "my_func" myFunc :: Int -> IO Double

Data structures have to passed by reference (using pointers). We will see how to use them later in this document.

Exported functions

GHC can generate wrappers so that a foreign code can call Haskell code:

    triple :: Int -> Int
    triple x = 3*x

    foreign export ccall triple :: Int -> Int

In the generated binary object, there will be a label "triple" that can be called from a language using the C convention.

Note that to call a Haskell function, the runtime system must have been initialized with a call to "hs_init". It must be released with a call to "hs_exit" when it is no longer required.

The GHC Users Guide has a section on the Foreign Function Interface.

Function pointers

Sometimes you want to manipulate foreign pointers to foreign functions: these are FunPtr in Haskell.

You can get a function pointer by using "&" before a foreign function symbol:

    foreign import ccall "&exp" a_exp :: FunPtr (Double -> Double)

Some foreign functions can also return function pointers.

To call a function pointer from Haskell, GHC needs to convert between its own calling convention and the one expected by the foreign code. To create a function doing this conversion, you must use "dynamic" wrappers:

    foreign import ccall "dynamic" mkFun :: FunPtr (Double -> Double) -> (Double -> Double)

Then you can apply this wrapper to a FunPtr to get a Haskell function:

    c_exp :: Double -> Double
    c_exp = mkFun a_exp

You can also perform the opposite operation to give to the foreign code a pointer to a Haskell function. You need a "wrapper" wrapper. GHC generates the callable code to execute the wrapped Haskell closure with the appropriate calling convention and returns a pointer (FunPtr) on it. You have to release the generated code explicitly with `freeHaskellFunPtr` to avoid memory leaks: GHC has no way to know if the function pointer is still referenced in some foreign code, hence it doesn't collect it.

    add :: Int -> Int
    add = x+y
    
    foreign import ccall "wrapper" createAddPtr :: (Int -> Int) -> IO (FunPtr (Int -> Int))
    
    main = do
        addPtr <- createAddPtr add
        -- you can use addPtr like any other FunPtr (e.g. give it to foreign code)
        ...
        -- you MUST free the FunPtr, otherwise it won't be collected
        freeHaskellFunPtr addPtr

Marshalling data

In Haskell we are accustomed to let the runtime system -- especially the garbage collector -- manage memory. When we use the FFI, however, we sometimes need to do some manual memory management to comply with the data representations of the foreign codes. Hopefully, Haskell makes it very easy to manipulate low-level objects such as pointers. Moreover, many useful Haskell tools have been designed to simplify conversions between data representations.

Pointers

A pointer is an offset in memory. In Haskell, it is represented with the Ptr a data type. Where "a" is a phantom type that can be used to differentiate two pointers. You can think of "Ptr Stuff" as being equivalent to a "Stuff *" type in C (i.e. a pointer to a "Stuff" data). This analogy may not hold if "a" is a Haskell type not representable in the foreign language. For instance, you can have a pointer with the type "Ptr (Stuff -> OtherStuff)" but it is not function pointer in the foreign language: it is just a pointer tagged with the "Stuff -> OtherStuff" type.

You can easily cast between pointer types using `castPtr` or perform pointer arithmetic using `plusPtr`, `minusPtr` and `alignPtr`. NULL pointer is represented with `nullPtr`.

Memory allocation

There are basically two ways to allocate memory:

The allocation is ephemeral: it lasts the time of the execution of an IO action, as in the following example:

    do
        allocaBytes 128 $ \ptr -> do
            -- do stuff with the pointer ptr...
            -- ...
            -- do not return "ptr" in any way because it will become an invalid pointer
        -- here the 128 bytes have been released and should not be accessed
  • on the "low-level" heap (the same as the runtime system uses), using `malloc*` functions in Foreign.Marshal.Alloc

Allocations on the low-level heap are not managed by the Haskell implementation and must be freed explicitly with `free`.

    do
        ptr <- mallocBytes 128
        -- do stuff with the pointer ptr...
        -- ...
        free ptr
        -- here the 128 bytes have been released and should not be accessed

Foreign Pointers

An hybrid approach is to use ForeignPtr. Foreign pointers are similar to Ptr except that they have finalizers (i.e. actions) attached to them. When the garbage collector detects that a ForeignPtr is no longer accessible, it executes its associated finalizers. A basic finalizer is `finalizerFree` [2] that calls `free` on the pointer.

You can convert a Ptr into a ForeignPtr using `newForeignPtr`, add additional finalizers, etc. [3].

In the following example, we use `mallocForeignPtrBytes`. It is equivalent to call `malloc` and then to associate the `finalizerFree` finalizer with `newForeignPtr`. GHC has optimized implementations for `mallocForeignPtr*` functions, hence they should be preferred.

    do
        ptr <- mallocForeignPtrBytes 128
        -- do stuff with the pointer ptr...
        -- ...
        --
        -- ptr is freed when it is collected

Using pointers: Storable instances

You often want to read or to write at the address a of pointer. Reading consists in obtaining a Haskell value from a pointer; writing consists in somehow writing a representation of the Haskell value at the pointed address. Writing and reading a value depends on the type of the value, hence these methods are encapsulated into the Storable type class.

For any type T such that it exists a Storable T instance:

  • you can read a value, using
    peek :: Ptr T -> IO T
  • you can write a value, using
    poke :: Ptr T -> T -> IO ()

`Storable a` also defines a `sizeOf :: a -> Int` method that returns the size of the stored value in bytes.

All the marshallable foreign types (i.e. basic types) have Storable instances. Hence we can use these to write new Storable instances for more involved data types. In the following example, we create a Storable instance for a Complex data type:

    data Complex = Complex Double Double

    instance Storable Complex where
        sizeOf _ = 2 * sizeOf (undefined :: Double) -- stored complex size = 2 * size of a stored Double
        peek ptr = do
            real <- peek ptr
            img  <- peekByteOff ptr (sizeOf real) -- we skip the bytes containing the real part
            return $ Complex real img
        poke ptr (Complex real img) = do
            poke ptr real
            pokeByteOff ptr (sizeOf real) img
        ...

This is not very complicated but it can become very cumbersome if our data type has many fields. Several tools have been developed to automatically or semi-automatically create the Storable instances.

Newtypes and Storable instances

It is very common to use newtype to rename types for some specific purpose. In the following example we declare the type Pair which contains a pair of Double values just like our Complex type from above.

    newtype Pair = Pair Complex

If we want to store Pair values exactly like Complex values, we have several possibilities:

  • unwrap the Complex value each time we want to use its Storable instance
  • create a new Storable Pair instance that does the same thing
  • automatically derive the Storable Pair instance

The last solution is obviously the simplest one. It requires an extension however:

    {-# LANGUAGE GeneralizedNewtypeDeriving #-}

    newtype Pair = Pair Complex deriving (Storable)

Arrays

It is very common to read and to write arrays of values. Foreign.Marshal.Array provides many functions to deal with pointers to arrays. You can easily write an Haskell list of storable values as an array of values, and vice versa.

Strings

Strings in Haskell are lists of Char, where Char represents a unicode character. Many foreign codes use the C representation for strings (CString in Haskell): an array of bytes where each byte is a extended ASCII character terminated with a NUL character.

In Foreign.C.String, you have many functions to convert between both representations. Be careful because Unicode characters outside of the ASCII range may not be representable with the C representation.

    foreign import ccall "echo" c_echo :: CString -> IO ()
    
    echo :: String -> IO ()
    echo str = withCString str $ \c_str ->
        c_echo c_str

Data structures

Marshalling data structures of foreign languages is the most cumbersome task: you have to find out the offset of each field of the data structure (considering padding bytes, etc.). Hopefully, there are Haskell tools to help with this task.

Suppose you have a C data structure like this:

   struct MyStruct {
       double d;
       char c;
       int32_t i;
   };

And its Haskell counterpart:

    data MyStruct = MyStruct
            { d :: Double
            , c :: Word8
            , i :: Int32
            }

The following sub-sections present the different ways to write the Storable instance for MyStruct.

The following header is implied:

    import Control.Applicative ((<$>), (<*>))
    import Foreign.Storable

The explicit way

    instance Storable MyStruct where
        alignment _ = 8
        sizeOf _    = 16
        peek ptr    = MyStruct
            <$> peekByteOff ptr 0
            <*> peekByteOff ptr 8
            <*> peekByteOff ptr 12 -- skip padding bytes after "c"
        poke ptr (MyStruct d c i) = do
            pokeByteOff ptr 0 d
            pokeByteOff ptr 8 c
            pokeByteOff ptr 12 i
  • The structure alignment is the least common multiple of the alignments of the structure fields. The alignment of primitive types is equal to their size in bytes (e.g. 8 for Double, 1 for Word8 and 4 for Word32). Hence the alignment for MyStruct is 8.
  • We indicate the offset of each field explicitly for peek and poke methods. We introduce padding bytes to align the "i" field (Word32) on 4 bytes. A C compiler does the same thing (except for packed structures).
  • The size of the structure is the total number of bytes, including padding bytes between fields.

hsc2hs

hsc2hs is a tool that can help you compute field offsets by using C headers directly.

Save your Haskell file with a .hsc extension to enable the support of hsc2hs.

    #include <myheader.h>
    
    instance Storable MyStruct where
        peek ptr = MyStruct
                      <$> (#peek MyStruct, d) ptr
                      <*> (#peek MyStruct, c) ptr
                      <*> (#peek MyStruct, i) ptr
        ...

c2hs

c2hs is another tool that can help you doing the same thing as hsc2hs for data structure marshalling. They have differences in other aspects though.

    #include <myheader.h>
    
    instance Storable MyStruct where
        peek ptr = MyStruct
                      <$> {#get MyStruct->d} ptr
                      <*> {#get MyStruct->c} ptr
                      <*> {#get MyStruct->i} ptr
        ...


c-storable-deriving

You can also derive the Storable instances from the types of the fields and their order in the Haskell data type by using c-storable-deriving package.

    {-# LANGUAGE DeriveGeneric #-}
    
    import GHC.Generics (Generic)
    import Foreign.CStorable
    
    data MyStruct = MyStruct {...} deriving (Generic)
    
    instance CStorable MyStruct
    
    instance Storable MyStruct where
        sizeOf = cSizeOf
        alignment = cAlignment
        poke = cPoke
        peek = cPeek

The CStorable type-class is equivalent to the Storable type-class but has additional default implementations for its methods if the type has an instance of Generic.

Pointers to Haskell data

In some cases, you may want to give to the foreign code an opaque reference to a Haskell value that you will retrieve later on. You need to be sure that the value is not collected between the time you give it and the time you retrieve it. Stable pointers have been created exactly to do this. You can wrap a value into a StablePtr and give it to the foreign code (StablePtr is one of the marshallable foreign types).

You need to manually free stable pointers using `freeStablePtr` when they are not required anymore.

Tools

There are several tools to help writing bindings using the FFI. In particular by using C headers.

Using C headers

Haskell

Dynamic function call

  • libffi: allow to call a C function without knowing its type at compile time.

Linking

There are several ways for GHC to find the foreign code to link with:

  • Static linking: the foreign code binary object is merged with the Haskell one to form the final executable
  • Dynamic linking: the generated binary uses libraries (e.g. .dll or .so) that are automatically linked when the binary is executed
  • Explicit dynamic linking: the Haskell code explicitly loads libraries, finds symbols in them and makes calls to them

The first two modes are well described in GHC and Cabal manuals. For the last one, you need to use platform dependent methods:

Explicit dynamic linking helps you obtaining function pointers (FunPtr). You need to write "dynamic" wrappers to call the functions from Haskell.

Dynamic linker template

dynamic-linker-template is a package that uses template Haskell to automatically generate "dynamic" wrappers for explicit dynamic linking (only supporting Unix for now).

The idea is that a library is like a record containing functions, hence it is easy to generate the code that load symbols from a library and store them into a Haskell record.

In the following code, the record matching library symbols is the data type MyLib. The generated code will apply "myModifier" to each field name of the record to find corresponding symbols in the library. myModifier should often be "id" but it is sometimes useful when symbols are not pretty. Here in the foreign code "_v2" is appended at the end of each symbol to avoid symbol clashes with the first version of the library.

The package supports optional symbols: functions that may or may not be present in the library. These optional functions are represented by encapsulating the function type into Maybe.

The `libHandle` field is mandatory and contains a pointer to the loaded library. You can use it to unload the library.

A function called `loadMyLib` is generated to load symbols from a library, wrap them using "dynamic" wrappers and store them into a MyLib value that is returned.

    {-# LANGUAGE TemplateHaskell, ForeignFunctionInterface #-}
    
    import System.Posix.DynamicLinker.Template
    
    data MyLib = MyLib {
        libHandle :: DL,
        thing1 :: Double -> IO Int,         -- Mandatory symbol
        thing2 :: Maybe (Int -> Int -> Int) -- Optional symbol
    }
    
    myModifier :: String -> String
    myModifier = (++ "_v2")
    
    $(makeDynamicLinker ''MyLib CCall 'myModifier)
    
    -- Load your library with:
    -- loadMyLib :: FilePath -> [RTLDFlags] -> IO MyLib

Enhancing performance and advanced topics

To enhance performance of a call to a foreign function, you first need to understand how GHC runtime system works. GHC uses user-space threads. It uses a set of system threads (called "Tasks"). Each system thread can execute a "capability" (i.e. a user-space thread manager). User-space threads are distributed on capabilities. Each capability executes its associated user-space threads, one at a time, using cooperative scheduling or preemption if necessary.

All the capabilities have to synchronize to perform garbage collection.

When a FFI call is made:

  • the user-space thread is suspended (indicating it is waiting for the result of a foreign call)
  • the current system thread executing the capability executing the user-space thread releases the capability
    • the capability can be picked up by another system thread
    • the user-space threads that are not suspended in the capability can be executed
    • garbage collection can occur
  • the system thread executes the FFI call
  • when the FFI call returns, the user-space thread is woken up

If there are too many blocked system threads, the runtime system can spawn new ones.

Unsafe calls

All the capability management before and after a FFI call adds some overhead. It is possible to avoid it in some cases by adding the "unsafe" keyword as in the following example:

    foreign import ccall unsafe "exp" c_exp :: Double -> Double

By doing this, the foreign code will be directly called but the capability won't be released by the system thread during the call. Here are the drawbacks of this approach:

  • if the foreign function blocks indefinitely, the other user-space threads of the capability won't be executed anymore (deadlock)
  • if the foreign code calls back into the Haskell code, a deadlock may occur
    • it may wait for a value produced by one of the locked user-space threads on the capability
    • there may not be enough capabilities to execute the code

Foreign PrimOps

If unsafe foreign calls are not fast enough for you, you can try the GHCForeignImportPrim extension.

    {-# LANGUAGE GHCForeignImportPrim,
                 MagicHash,
                 UnboxedTuples,
                 UnliftedFFITypes #-}
    
    import GHC.Base
    import GHC.Int
    
    -- Primitive with type :: Int -> Int -> IO Int
    foreign import prim "my_primop_cmm"
        my_primop# :: Int# -> Int# -> State# RealWorld -> (# State# RealWorld, Int# #)
    
    my_primop :: Int64 -> Int64 -> IO Int64
    my_primop (I64# x) (I64# y) = IO $ \s ->
        case (my_primop# x y s) of (# s1, r #) -> (# s1, I64# r #)

Then you have to write your foreign function "my_primop_cmm" using C-- language used internally by GHC.

As an alternative, if you know how C-- is compiled on your architecture, you can write code in other languages. For instance directly in assembly or using C and LLVM.

Here is a comparison of the different approaches on a specific case.

Bound threads

Some foreign codes use (system) thread-local storage. Some others are not thread-safe. In both case, you have to be sure that the same system thread executes the FFI calls. To control how user-space threads are scheduled on system threads, GHC provide bound threads. Bound threads are user-space threads (Haskell threads) that can only be executed by a single system thread.

Note that bound threads are more expensive to schedule than normal threads. The first thread executing "main" is a bound thread.

Inline FFI calls

If you want to make a one-shot FFI call without the hassle of writing the foreign import, you can use the following technique (using Template Haskell).

In AddTopDecls.hs:

    {-# LANGUAGE TemplateHaskell #-}
     
    module AddTopDecls where
     
    import Language.Haskell.TH
    import Language.Haskell.TH.Syntax
     
    importDoubleToDouble :: String -> ExpQ
    importDoubleToDouble fname = do
        n <- newName fname
        d <- forImpD CCall unsafe fname n [t|Double -> Double|]
        addTopDecls [d]
        [|$(varE n)|]

In your module:

    {-# LANGUAGE TemplateHaskell #-}
     
    module Main where
     
    import Language.Haskell.TH
    import Language.Haskell.TH.Syntax
     
    import AddTopDecls
     
    main :: IO ()
    main = do
        print ($(importDoubleToDouble "sin") pi)
        print ($(importDoubleToDouble "cos") pi)

History

Header inclusion

In old versions of GHC (6.8.3 and earlier), the compiler was able to check the prototypes of the foreign imports by including C header files into the generated C code. For instance, you could write:

{-# INCLUDE <math.h> #-}

or

foreign import ccall "math.h sin" c_sin :: Double -> Double

to include the "math.h" header.

This is deprecated in GHC. Nevertheless you may still find examples using this syntax so it is good to know that it has been used. Moreover, other compilers may still use this feature.

"C functions are normally declared using prototypes in a C header file. Earlier versions of GHC (6.8.3 and earlier) #included the header file in the C source file generated from the Haskell code, and the C compiler could therefore check that the C function being called via the FFI was being called at the right type.

GHC no longer includes external header files when compiling via C, so this checking is not performed. The change was made for compatibility with the native code backend (-fasm) and to comply strictly with the FFI specification, which requires that FFI calls are not subject to macro expansion and other CPP conversions that may be applied when using C header files. This approach also simplifies the inlining of foreign calls across module and package boundaries: there's no need for the header file to be available when compiling an inlined version of a foreign call, so the compiler is free to inline foreign calls in any context.

The -#include option is now deprecated, and the include-files field in a Cabal package specification is ignored."


References

  1. Oleg Grenrus/phadej on capi vs ccall: haskell-cafe post from June 2021


Related links

  • Ivory: EDSL for writing safer low-level C.

Old links

Select one of the following links for more information:

Blog articles

Example libraries

TODO

  • Fix References section
  • Foreign language specific issues
    • C++ symbol mangling
    • Embedded Objective C
  • Precision
The Haskell report only guarantees that Int has 30 bits of signed precision, so converting CInt to Int is not safe! On the other hand, many classes have instances for Int and Integer but not CInt, so it's generally more convenient to convert from the C types. To convert, I suppose you could either write a checkedFromIntegral function if you're sure it's small or just use Integer.
  • Fix [4]
    • One page per tool?
    • Links to external tool specific tutorials