Foreign Function Interface (FFI)

From HaskellWiki
Revision as of 18:10, 13 February 2015 by Hsyl20 (talk | contribs) (First draft)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search
The printable version is no longer supported and may have rendering errors. Please update your browser bookmarks and please use the default browser print function instead.

Introduction

The Foreign Function Interface (FFI) allows Haskell programs to cooperate with programs written with other languages. Haskell programs can call foreign functions and foreign functions can call back into Haskell code.

Compared to many other languages, Haskell FFI is very easy to use: in the most common case, you only have to translate the prototype of the foreign function into the equivalent Haskell prototype and you're done. For instance, to call the exponential function ("exp") of the libc, you only have to translate its prototype:

   double exp(double);

into the following Haskell code

   foreign import ccall "exp" c_exp :: Double -> Double

Now you can use the function "c_exp" just like any other Haskell function. When evaluated, it will call "exp".

Similarly, to export the following Haskell function:

   triple :: Int -> Int
   triple x = 3*x

so that it can be used in foreign codes, you only have to write:

   foreign export ccall triple :: Int -> Int

It can get at little more complicated depending on what you want to do, the function parameters, the foreign code you target, etc. This page is here to explain all of this to you.

Generalities

FFI extension

The Foreign Function Interface (FFI) is an extension to the Haskell standard. To use it, you need to enable it with the following compiler pragma at the beginning of your source file:

   {-# LANGUAGE ForeignFunctionInterface #-}


Calling conventions

When a program (in any language) is compiled into machine code, functions and procedures become labels: a label is a symbol (a string) associated to a position into the machine code. Calling a function only consists in putting parameters at appropriate places into memory and registers and then branching at the label position. The caller needs to know where to store parameters and the callee needs to know where to retrieve parameters from: there is a calling convention.

To interact with foreign code, you need to know the calling conventions that are used by the other language implementation on the given architecture. It can also depend on the operating system.

GHC supports standard calling conventions with the FFI: it can generate code to convert between its internal (non-standard) convention and the foreign one. If we consider the previous example:

   foreign import ccall "exp" c_exp :: Double -> Double

we see that the C calling convention ("ccall") is used. GHC will generate code to put (and to retrieve) parameters into memory and registers conforming to what is expected by a code generated with a C compiler (or any other compiler conforming to this convention).

Other available conventions supported by GHC include "stdcall" (i.e. Pascal convention).

Foreign types

Calling conventions depend on parameter types. For instance, floating-point values (Double, Float) may be passed into floating-point registers. Several values can be combined into a single vector register. And so on. As an example, in http://www.x86-64.org/documentation/abi.pdf you can find the algorithm describing how to pass parameters to functions on Linux on a x86-64 architecture depending on the types of the parameters.

Only some Haskell types can be directly used as parameters for foreign functions, because they correspond to basic types of low-level languages such as C and are used to define calling conventions.

According to [1], the type of a foreign function is a foreign type, that is a function type with zero or more arguments where:

  • the argument types can be marshallable foreign types, i.e. Char, Int, Double, Float, Bool, Int8, Int16, Int32, Int64, Word8, Word16, Word32, Word64, Ptr a, FunPtr a, StablePtr a or a renaming of any of these using newtype.
  • the return type is either a marshallable foreign type or has the form IO t where t is a marshallable foreign type or ().

Warning: GHC does not support passing structures as values yet.

The Foreign.C.Types module contains renaming of some of these marshallable foreign types with names closer to those of C types (e.g. CLong, CShort).

If the foreign function performs side-effects, you have to explicitly indicate it in its type (using IO). GHC has no other way to detect it.

   foreign import ccall "my_func" myFunc :: Int -> IO Double

Data structures have to passed by reference (using pointers). We will see how to use them later in this document.

Exported functions

GHC can generate wrappers so that a foreign code can call Haskell code:

   triple :: Int -> Int
   triple x = 3*x
   foreign export ccall triple :: Int -> Int

In the generated binary object, there will be a label "triple" that can be called from a language using the C convention.

Note that to call a Haskell function, the runtime system must have been initialized with a call to "hs_init". It must be released with a call to "hs_exit" when it is no longer required.

See the user guide for more details.

Function pointers

Sometimes you want to manipulate foreign pointers to foreign functions: these are FunPtr in Haskell.

You can get a function pointer by using "&" before a foreign function symbol:

   foreign import ccall "&exp" a_exp :: FunPtr (Double -> Double)

Some foreign functions can also return function pointers.

To call a function pointer from Haskell, GHC needs to convert between its own calling convention and the one expected by the foreign code. To create a function doing this conversion, you must use "dynamic" wrappers:

   foreign import ccall "dynamic" mkFun :: FunPtr (Double -> Double) -> (Double -> Double)
   

Then you can apply this wrapper to a FunPtr to get a Haskell function:

   c_exp :: Double -> Double
   c_exp = mkFun a_exp

You can also perform the opposite operation to give to the foreign code a pointer to a Haskell function. You need a "wrapper" wrapper. GHC generates the callable code to execute the wrapped Haskell closure with the appropriate calling convention and returns a pointer (FunPtr) on it. You have to release the generated code explicitly with `freeHaskellFunPtr` to avoid memory leaks: GHC has no way to know if the function pointer is still referenced in some foreign code, hence it doesn't collect it.

   add :: Int -> Int
   add = x+y
   
   foreign import ccall "wrapper" createAddPtr :: (Int -> Int) -> IO (FunPtr (Int -> Int))
   
   main = do
       addPtr <- createAddPtr add
       -- you can use addPtr like any other FunPtr (e.g. give it to foreign code)
       ...
       -- you MUST free the FunPtr, otherwise it won't be collected
       freeHaskellFunPtr addPtr

Marshalling data

In Haskell we are accustomed to let the runtime system -- especially the garbage collector -- manage memory. When we use the FFI, however, we sometimes need to do some manual memory management to comply with the data representations of the foreign codes. Hopefully, Haskell makes it very easy to manipulate low-level objects such as pointers. Moreover, many useful Haskell tools have been designed to simplify conversions between data representations.

Pointers

A pointer is an offset in memory. In Haskell, it is represented with the Ptr a data type. Where "a" is a phantom type that can be used to differentiate two pointers. You can think of "Ptr Stuff" as being equivalent to a "Stuff *" type in C (i.e. a pointer to a "Stuff" data). This analogy may not hold if "a" is a Haskell type not representable in the foreign language. For instance, you can have a pointer with the type "Ptr (Stuff -> OtherStuff)" but it is not function pointer in the foreign language: it is just a pointer tagged with the "Stuff -> OtherStuff" type.

You can easily cast between pointer types using `castPtr` or perform pointer arithmetic using `plusPtr`, `minusPtr` and `alignptr`. NULL pointer is represented with `nullPtr`.

Memory allocation

There are basically two ways to allocate memory:

The allocation is ephemeral: it lasts the time of the execution of an IO action, as in the following example:

   do
       allocaBytes 128 $ \ptr -> do
           -- do stuff with the pointer ptr...
           -- ...
           -- do not return "ptr" in any way because it will become an invalid pointer
       -- here the 128 bytes have been released and should not be accessed
  • on the "low-level" heap (the same as the runtime system uses), using `malloc*` functions in Foreign.Marshal.Alloc

Allocations on the low-level heap are not managed by the Haskell implementation and must be freed explicitly with `free`.

   do
       ptr <- mallocBytes 128
       -- do stuff with the pointer ptr...
       -- ...
       free ptr
       -- here the 128 bytes have been released and should not be accessed


Foreign Pointers

An hybrid approach is to use ForeignPtr. Foreign pointers are similar to Ptr except that they have finalizers (i.e. actions) attached to them. When the garbage collector detects that a ForeignPtr is no longer accessible, it executes its associated finalizers. A basic finalizer is `finalizerFree` [2] that calls `free` on the pointer.

You can convert a Ptr into a ForeignPtr using `newForeignPtr`, add additional finalizers, etc. [3].

In the following example, we use `mallocForeignPtrBytes`. It is equivalent to call `malloc` and then to associate the `finalizerFree` finalizer with `newForeignPtr`. GHC has optimized implementations for `mallocForeignPtr*` functions, hence they should be preferred.

   do
       ptr <- mallocForeignPtrBytes 128
       -- do stuff with the pointer ptr...
       -- ...
       --
       -- ptr is freed when it is collected

Using pointers: Storable instances

You often want to read or to write at the address a of pointer. Reading consists in obtaining a Haskell value from a pointer; writing consists in somehow writing a representation of the Haskell value at the pointed address. Writing and reading a value depends on the type of the value, hence these methods are encapsulated into the Storable type class.

For any type T such that it exists a Storable T instance, you can

  • use `peek :: Ptr T -> IO T` to read a value
  • use `poke :: Ptr T -> T -> IO ()` to write a value

`Storable a` also defines a `sizeOf :: a -> Int` method that returns the size of the stored value in bytes.

All the marshallable foreign types (i.e. basic types) have Storable instances. Hence we can use these to write new Storable instances for more involved data types. In the following example, we create a Storable instance for a Complex data type:

   data Complex = Complex Double Double
   instance Storable Complex where
       sizeOf _ = 2 * sizeOf (undefined :: Double) -- stored complex size = 2 * size of a stored Double
       peek ptr = do
           real <- peek ptr
           img  <- peekByteOff ptr (sizeOf real) -- we skip the bytes containing the real part
           return $ Complex real img
       poke ptr (Complex real img) = do
           poke ptr real
           pokeByteOff ptr (sizeOf real) img
       ...

This is not very complicated but it can become very cumbersome if our data type has many fields. Several tools have been developed to automatically or semi-automatically create the Storable instances.

Renaming and Storable instances

It is very common to use type renaming (i.e. newtype) to wrap a data type as in the following example, where we declare a type Pair that contains a pair of Double values just like our Complex type.

   newtype Pair = Pair Complex

If we want to store Pair values exactly like Complex values, we have several possibilities:

  • unwrap the Complex value each time we want to use its Storable instance
  • create a new Storable Pair instance that does the same thing
  • automatically derive the Storable Pair instance

The last solution is obviously the simplest one. It requires an extension however:

   {-# LANGUAGE GeneralizedNewtypeDeriving #-}
   newtype Pair = Pair Complex deriving (Storable)

Arrays

It is very common to read and to write arrays of values. Foreign.Marshal.Array provides many functions to deal with pointers to arrays. You can easily write an Haskell list of storable values as an array of values, and vice versa.

Strings

Strings in Haskell are lists of Char, where Char represents a unicode character. Many foreign codes use the C representation for strings (Foreign.C.String in Haskell): an array of bytes where each byte is a extended ASCII character terminated with a NUL character.

In Foreign.C.String, you have many functions to convert between both representations. Be careful because Unicode characters outside of the ASCII range may not be representable with the C representation.

   foreign import ccall "echo" c_echo :: CString -> IO ()
   
   echo :: String -> IO ()
   echo str = withCString str $ \str' ->
       c_echo str'

Data structures

Marshalling data structures of foreign languages is the most cumbersome task: you have to find out the offset of each field of the data structure (considering padding bytes, etc.). Hopefully, there are Haskell tools to help with this task.

Suppose you have a C data structure like this:

   struct MyStruct {
       double d;
       char c;
       int32_t i;
   };

And its Haskell counterpart:

   data MyStruct = MyStruct
           { d :: Double
           , c :: Word8
           , i :: Int32
           }

Here are the different ways to write the Storable instance for MyStruct.

The following header is implied:

   import Control.Applicative ((<$>), (<*>))
   import Foreign.Storable

The explicit way

Use offset explicitly

   instance Storable MyStruct where
       peek ptr = MyStruct
                     <$> peekByteOff ptr 0
                     <*> peekByteOff ptr 8
                     <*> peekByteOff ptr 12 -- skip padding bytes after "c"
       ...

hsc2hc

hsc2hs is a tool that can help you compute field offsets by using C headers directly.

Save your Haskell file with a .hsc extension to enable the support of hsc2hs.

   #include <myheader.h>
   
   instance Storable MyStruct where
       peek ptr = MyStruct
                     <$> (#peek MyStruct, d) ptr
                     <*> (#peek MyStruct, c) ptr
                     <*> (#peek MyStruct, i) ptr
       ...

c2hs

c2hs is another tool that can help you doing the same thing as hsc2hs for data structure marshalling. They have differences in other aspects though.

   #include <myheader.h>
   
   instance Storable MyStruct where
       peek ptr = MyStruct
                     <$> {#get MyStruct->d} ptr
                     <*> {#get MyStruct->c} ptr
                     <*> {#get MyStruct->i} ptr
       ...


c-storable-deriving

You can also derive the Storable instances from the types of the fields and their order in the Haskell data type by using c-storable-deriving package.

   {-# LANGUAGE DeriveGeneric #-}
   
   import GHC.Generics (Generic)
   import Foreign.CStorable
   
   data MyStruct = MyStruct {...} deriving (Generic)
   
   instance CStorable ModeStruct
   
   instance Storable ModeStruct where
       sizeOf = cSizeOf
       alignment = cAlignment
       poke = cPoke
       peek = cPeek

The CStorable type-class is equivalent to the Storable type-class but has additional default implementations for its methods if the type has an instance of Generic.

Pointers to Haskell data

In some cases, you may want to give to the foreign code an opaque reference to a Haskell value that you will retrieve later on. You need to be sure that the value is not collected between the time you give it and the time you retrieve it. Stable pointers have been created exactly to do this. You can wrap a value into a StablePtr and give it to the foreign code (StablePtr is one of the marshallable foreign types).

You need to manually free stable pointers using `freeStablePtr` when they are not required anymore.

Tools

There are several tools to help writing bindings using the FFI. In particular by using C headers.

Using C headers

Haskell

Dynamic function call

  • libffi: allow to call a C function without knowing its type at compile time.

Linking

There are several ways for GHC to find the foreign code to link with:

  • Static linking: the foreign code binary object is merged with the Haskell one to form the final executable
  • Dynamic linking: the generated binary uses libraries (e.g. .dll or .so) that are automatically linked when the binary is executed
  • Explicit dynamic linking: the Haskell code explicitly loads libraries, finds symbols in them and makes calls to them

The first two modes are well described in GHC and Cabal manuals. For the last one, you need to use platform dependent methods:

Explicit dynamic linking helps you obtaining function pointers (FunPtr). You need to write "dynamic" wrappers to call the functions from Haskell.

Dynamic linker template

dynamic-linker-template is a package that use template Haskell to automatically generate "dynamic" wrappers for explicit dynamic linking (only supporting Unix for now).

The idea is that a library is like a record containing functions, hence it is easy to generate the code that load symbols from a library and store them into a Haskell record.

In the following code, the record matching library symbols is the data type MyLib. The generated code will apply "myModifier" to each field name of the record to find corresponding symbols in the library. myModifier should often be "id" but it is sometimes useful when symbols are not pretty. Here in the foreign code "_v2" is appended at the end of each symbol to avoid symbol clashes with the first version of the library.

The package supports optional symbols: functions that may or may not be present in the library. These optional functions are represented by encapsulating the function type into Maybe.

The `libHandle` field is mandatory and contains a pointer to loaded library. You can use it to unload the library.

A function called `loadMyLib` is generated to load symbols from a library, wrap them using "dynamic" wrappers and store them into a MyLib value that is returned.

   {-# LANGUAGE TemplateHaskell, ForeignFunctionInterface #-}
   
   import System.Posix.DynamicLinker.Template
   
   data MyLib = MyLib {
       libHandle :: DL,
       thing1 :: Double -> IO Int,         -- Mandatory symbol
       thing2 :: Maybe (Int -> Int -> Int) -- Optional symbol
   }
   
   myModifier :: String -> String
   myModifier = (++ "_v2")
   
   $(makeDynamicLinker MyLib CCall 'myModifier)
   
   -- Load your library with:
   -- loadMyLib :: FilePath -> [RTLDFlags] -> IO MyLib

Enhancing performance and advanced topics

To enhance performance of a call to a foreign function, you first need to understand how GHC runtime system works. GHC uses user-space threads. It uses a set of system threads (called "Tasks"). Each system thread can execute a "capability" (i.e. a user-space thread manager). User-space threads are distributed on capabilities. Each capability executes its associated user-space threads, one at a time, using cooperative scheduling or preemption if necessary.

All the capabilities have to synchronize to perform garbage collection.

When a FFI call is made:

  • the user-space thread is suspended (indicating it is waiting for the result of a foreign call)
  • the current system thread executing the capability executing the user-space thread releases the capability
    • the capability can be picked up by another system thread
    • the user-space threads that are not suspended in the capability can be executed
    • garbage collection can occur
  • the system thread executes the FFI call
  • when the FFI call returns, the user-space thread is woken up

If there are too many blocked system threads, the runtime system can spawn new ones.

Unsafe calls

All the capability management before and after a FFI call adds some overhead. It is possible to avoid it in some cases by adding the "unsafe" keyword as in the following example:

   foreign import ccall unsafe "exp" c_exp :: Double -> Double

By doing this, the foreign code will be directly called but the capability won't be released by the system thread during the call. Here are the drawbacks of this approach:

  • if the foreign function blocks indefinitely, the other user-space threads of the capability won't be executed anymore (deadlock)
  • if the foreign code calls back into the Haskell code, a deadlock may occur
    • it may wait for a value produced by one of the locked user-space threads on the capability
    • there may not be enough capabilities to execute the code

Foreign PrimOps

If unsafe foreign calls are not fast enough for you, you can try the GHCForeignImportPrim extension.

   {-# LANGUAGE GHCForeignImportPrim,
                MagicHash,
                UnboxedTuples,
                UnliftedFFITypes #-}
   
   import GHC.Base
   import GHC.Int
   
   -- Primitive with type :: Int -> Int -> IO Int
   foreign import prim "my_primop_cmm" my_primop# :: Int# -> Int# -> State# RealWorld -> (# State# RealWorld, Int# #)
   
   my_primop :: Int64 -> Int64 -> IO Int64
   my_primop (I64# x) (I64# y) = IO $ \s ->
       case (my_primop# x y s) of (# s1, r #) -> (# s1, I64# r #)

Then you have to write you foreign function "my_primop_cmm" using C-- language used internally by GHC.

As an alternative, if you know how C-- is compiled on your architecture, you can write code in other languages. For instance directly in assembly or using C and LLVM.

Here is a comparison of the different approaches on a specific case.

Bound threads

Some foreign codes use (system) thread-local storage. Some others are not thread-safe. In both case, you have to be sure that the same system thread executes the FFI calls. To control how user-space threads are scheduled on system threads, GHC provide bound threads. Bound threads are user-space threads (Haskell threads) that can only be executed by a single system thread.

Note that bound threads are more expensive to schedule than normal threads. The first thread executing "main" is a bound thread.

Inline FFI calls

If you want to make a one-shot FFI call without the hassle of writing the foreign import, you can use the following technique (using Template Haskell).

In AddTopDecls.hs:

   {-# LANGUAGE TemplateHaskell #-}
    
   module AddTopDecls where
    
   import Language.Haskell.TH
   import Language.Haskell.TH.Syntax
    
   importDoubleToDouble :: String -> ExpQ
   importDoubleToDouble fname = do
       n <- newName fname
       d <- forImpD CCall unsafe fname n [t|Double -> Double|]
       addTopDecls [d]
       [|$(varE n)|]

In your module:

   {-# LANGUAGE TemplateHaskell #-}
    
   module Main where
    
   import Language.Haskell.TH
   import Language.Haskell.TH.Syntax
    
   import AddTopDecls
    
   main :: IO ()
   main = do
       print ($(importDoubleToDouble "sin") pi)
       print ($(importDoubleToDouble "cos") pi)

TODO

  • Foreign language specific issues
    • C++ symbol mangling
    • Embedded Objective C

References