FFI cook book

From HaskellWiki
Jump to navigation Jump to search

This attempts to be a guide/tutorial/cookbook approach to writing a library using external (FFI) functions. Some people complain that cookbook approaches encourage a lack of thinking; that may be so, but they also help novices get started faster. Being a little hard of thinking myself, I would have been grateful for something like this when I was getting started. The FFI spec, while valuable, is not a tutorial.

This guide contains examples and lessons accumulated writing an FFI binding to the Oracle DBMS OCI (Oracle Call Interface), a low-level C library.

My FFI library code tends to look like imperative code written in Haskell. I guess we should expect this to some extent when dealing with external libraries, although it might be better (for me) to explore more functional alternatives. (However, Haskell also seems to be quite a good language for writing imperative code in.)

-- AlistairBayley

These libraries are useful for memory management, and using C pointers.

Contains peek, poke, peekByteOff, pokeByteOff, etc:

http://www.haskell.org/ghc/docs/latest/html/libraries/base/Foreign-Storable.html

Contains alloca, malloc, free, etc:

http://www.haskell.org/ghc/docs/latest/html/libraries/base/Foreign-Marshal-Alloc.html

Calling C functions

Passing opaque structures/types

Problem: A C function creates an opaque structure, which I must later pass to other C functions. What type should I use?

Solution: Create a datatype to represent the opaque structure. Note that the C functions expect a pointer to the structure, so I've created a type synonym called OCIHandle for these.

> data OCIStruct = OCIStruct
> type OCIHandle = Ptr OCIStruct

GHC allows this constructor-less version (with EmptyDataDecls):

> data OCIStruct

this would also work, if only for the lesser lines of code you'd need to type:

--no need for the data declaration
type OCIStruct = Ptr ()

i don't know if there are any side effects to this but it works fine for me so far -- eyan at eyan dot org

The side-effect I wanted to avoid was using the wrong pointer at the wrong time. Consider:

type EnvStruct = Ptr ()
type EnvHandle = Ptr EnvStruct
type ErrorStruct = Ptr ()
type ErrorHandle = Ptr ErrorStruct

ErrorHandle and EnvHandle have the same type i.e. you can use one where you would use the other. I would rather use different datatypes so the compiler can help me catch these type errors. Better would be:

data EnvStruct = EnvStruct
type EnvHandle = Ptr EnvStruct
data ErrorStruct = ErrorStruct
type ErrorHandle = Ptr ErrorStruct

-- AlistairBayley

There is also a third style, which uses newtype:

newtype EnvHandle = EnvHandle (Ptr EnvHandle)

This style is notably supported by c2hs {#pointer *GtkObject as Object newtype#}. However, it is annoying to use along with ForeignPtr, because the functions that let you treat ForeignPtrs as Ptrs will mismatch with your newtype, and you will need to manually wrap the value before you ship it off to the FFI itself, and unwrap the value if you want to use it with a function that use Ptr Foo.

You can also use something like Ptr a when functions expect void*, and use that to enforce type consistency. For instance, you have a function that expects a pointer to a buffer and returns the old one:

foreign import ccall "name_of_function" function
 :: Ptr a -> IO (Ptr a)

Passing pointer-to-pointer-to-thing

Problem: C function takes a pointer-to-a-pointer argument, which is modified to point to some newly allocated structure or value. The return value of the C function is a success-or-failure code (int). So we effectively have parameters which are in-out. How do you wrap these in Haskell functions that return the actual structure (and raise an exception on failure)?

Solutions:

Single argument case

If the function only modifies one of its arguments, then use code like this:

OCIHandle is a synonym for Ptr OCIStruct, so the second argument to 
OCIHandleAlloc has type Ptr Ptr OCIStruct. The C signature for OCIHandleAlloc
describes the second argument as **void, i.e. a pointer to a pointer to something.

> foreign import ccall "oci.h OCIHandleAlloc" ociHandleAlloc ::
>   OCIHandle -> Ptr OCIHandle -> CInt -> CInt -> Ptr a -> IO CInt
>
> handleAlloc :: CInt -> OCIHandle -> IO OCIHandle
> handleAlloc handleType env = alloca $ \ptr -> do
>   rc <- ociHandleAlloc env ptr handleType 0 nullPtr
>   if rc < 0
>     then throwOCI (OCIException rc "allocate handle")
>     else peek ptr

(Shouldn't that be "... else peek ptr"?)

(yes, it should. Fixed)

Here, memory is allocated for ptr, and then it is passed to the foreign function. alloca is prefered because it frees the memory for ptr when the function exits, or when an exception is raised. We use peek to get at the value returned. alloca takes an IO action which takes a single argument: the newly allocated ptr. We use a lambda expression here to create an anonymous function (actually an IO action).


Multiple argument case

If the function modifies more than one of its arguments, then things get a little more complex. In this case we have to allocate the memory for the arguments (again, using the alloca* family of functions), call the C function, and extract the values. In this example the ociErrorGet function modifies the third and fourth args (int and string respectively). I've chosen an arbitrary size for the buffer for the string: 1000 bytes.

> getOCIErrorMsg2 :: OCIHandle -> CInt -> Ptr CInt -> CString -> CInt -> IO (CInt, String)
> getOCIErrorMsg2 ocihandle handleType errCodePtr errMsgBuf maxErrMsgLen = do
>   rc <- ociErrorGet ocihandle 1 0 errCodePtr errMsgBuf maxErrMsgLen handleType
>   if rc < 0
>     then return (0, "Error message not available.")
>     else do
>       msg <- peekCString errMsg
>       e <- peek errCode
>       return (e, msg)
>
> getOCIErrorMsg :: OCIHandle -> CInt -> IO (CInt, String)
> getOCIErrorMsg ocihandle handleType = do
>   let stringBufferLen = 1000
>   allocaBytes stringBufferLen $ \errMsg ->
>     alloca $ \errCode ->
>     getOCIErrorMsg2 ocihandle handleType errCode errMsg (mkCInt stringBufferLen)

(Thanks to Udo Stenzel for tips for avoiding memory leaks.)

Passing strings

Problem: C function expects strings with lengths, where each string (char*) is followed by an int stating how long it is.

Solution: Convert Haskell Strings to CStringLens, and pull CStringLens apart with utility functions. A CStringLen is just a (CString, Int) pair. Would it have been better to make CStringLen a (CString, CInt) pair?

> mkCInt :: Int -> CInt
> mkCInt n = fromIntegral n
> cStrLen :: CStringLen -> CInt
> cStrLen = mkCInt . snd
> cStr :: CStringLen -> CString
> cStr = fst
>
> dbLogon :: String -> String -> String -> EnvHandle -> ErrorHandle -> IO ConnHandle
> dbLogon user pswd db env err =
>   withCStringLen user $ \userC ->
>   withCStringLen pswd $ \pswdC ->
>   withCStringLen db   $ \dbC ->
>   alloca $ \conn -> do
>     rc <- ociLogon env err conn (cStr userC) (cStrLen userC) (cStr pswdC) (cStrLen pswdC) (cStr dbC) (cStrLen dbC)
>     case () of
>       _ | rc == oci_SUCCESS_WITH_INFO -> testForErrorWithPtr oci_ERROR "logon" conn
>         | otherwise -> testForErrorWithPtr rc "logon" conn

Working with structs

Please check me

For example suppose you had a struct like:

typedef struct {
  int a;
  char b;
  double c;
} my_struct;

And you wanted to write a storable instance. Here pre-processing with hsc2hs tool can be helpfull. You would write it like this:

instance Storable Struct where
  alignment _ = #{alignment my_struct}
  sizeOf _ = #{size my_struct}
  peek ptr = do
    a <- #{peek my_struct, a} ptr
    b <- #{peek my_struct, b} ptr
    c <- #{peek my_struct, c} ptr
    return (MyStruct a, b, c)
  poke ptr (MyStruct a b c) = do
    #{poke my_struct, a} ptr a
    #{poke my_struct, b} ptr b
    #{poke my_struct, c} ptr c

Note that the #{alignment foo} syntax is not currently built-in to hsc2hs. You have to add the following line to your haskell source file to add the alignment syntax:

#let alignment t = "%lu", (unsigned long)offsetof(struct {char x__; t (y__); }, y__)

With a string field (struct { char c_string_field[MAX_LEN]; }) you would have to do it a little different. For example in peek:

  peek ptr = do
    s <- peekCString $ #{ptr c_type,c_string_field} ptr
    return (Foo s)

Then in poke you would do:

  poke ptr (Foo s) = do
    withCStringLen (take maxLen value) $ uncurry (copyArray $ #{ptr
c_type,c_string_field} ptr)
    where maxLen = #{const MAX_LEN}

Note the use of #{ptr} instead of #{peek} since we want the address of the c_string_field rather than it's value. Unfortunately for the "struct { char *c_string_field; }" style there is no good general solution because you have to worry about allocating memory to have c_string_field point to.


Calling C functions in DLL

by Ronald Guida, hCafe, 12 sep 2007

1. I can leave "test_foo.lhs" and "foo.cpp" as-is:

foo.cpp:

#include "foo.h"

__stdcall int foo(int x)
{
  return 3 * x + 1;
}

test_foo.lhs:

 > {-# OPTIONS_GHC -fglasgow-exts #-}
 > module Main
 >     where
 > import Foreign
 > import Foreign.C

 > foreign import stdcall unsafe "foo"
 >   c_Foo :: Word32 -> IO Word32

 > main = do
 >   putStrLn "Entering main"
 >   let x = 7::Word32
 >   y <- c_Foo(x)
 >   putStrLn $ "y = " ++ show y
 >   putStrLn "Exiting main"

2. I need to change "foo.h" to the following:

#if BUILD_DLL
#define DECLSPEC __declspec(dllexport)
#else
#define DECLSPEC __declspec(dllimport)
#endif

extern "C"
{
  DECLSPEC __stdcall int foo(int x);
}

3. I need to create a "foo.def" file and list the functions to be exported in a DLL:

LIBRARY foo
DESCRIPTION "Foo Library"
EXPORTS
   foo
Note: The library name on the first line must match the dll name. "LIBRARY foo" corresponds to "foo.dll"

4. The build process is as follows.

(1) gcc -DBUILD_DLL -c foo.cpp
(2) gcc -shared -o foo.dll foo.o foo.def \
   -Wl,--enable-stdcall-fixup,--out-implib,libfoo.a

5. At this point, I'll have "foo.dll" and "libfoo.a". I can load my

  "foo" library, as a DLL, into GHCi with the command:
    $ ghci -lfoo
  In reality, I would use:
    $ ghci test_foo.lhs -lfoo

6. Once I'm satisfied and ready to compile:

   ghc --make test_foo.lhs -L. -lfoo
Notes:
(1) "-L." directs GHC to look in the current directory for
    libraries.  GHCi seems to look there by default.
(2) The resulting "test_foo.exe" will dynamicly load "foo.dll".

7. If I want a staticly linked executable instead:

   ar rcs libfoo_static.a foo.o
   ghc --make test_foo.lhs -L. -lfoo_static

8. Finally, I can put the build steps into a Makefile:

# Makefile for foo

test_foo.exe : test_foo.lhs libfoo.a foo.dll
    ghc --make test_foo.lhs -L. -lfoo

test_foo_static.exe : test_foo.lhs libfoo_static.a
    ghc --make test_foo.lhs -L. -lfoo_static -o test_foo_static.exe

libfoo.a : foo.dll

foo.dll : foo.o foo.def
    gcc -shared -o foo.dll foo.o foo.def \
    -Wl,--enable-stdcall-fixup,--out-implib,libfoo.a

libfoo_static.a : foo.o
    ar rcs libfoo_static.a foo.o

foo.o : foo.cpp foo.h
    gcc -DBUILD_DLL -c foo.cpp

.PHONY: clean
clean:
    rm -f *.[oa]
    rm -f *.dll
    rm -f *.hi
    rm -f *.exe


Calling C++ functions in a Visual Studio DLL from Haskell, and Haskell functions from the C++ DLL

This material is placed on the separate page


Raising and handling exceptions

Follow the advice for Dynamic Exceptions, in: http://www.haskell.org/ghc/docs/latest/html/libraries/base/Control-Exception.html#g:10

Create your own exceptions, and your own throw and catch functions. This makes it easier to trap only exceptions raised by your code.

> data OCIException = OCIException Int String deriving (Typeable, Show)
>
> catchOCI :: IO a -> (OCIException -> IO a) -> IO a
> catchOCI = catchDyn
> throwOCI :: OCIException -> a
> throwOCI = throwDyn

If we can't derive Typeable then the following code should do the trick:

> -- replaces:
> data OCIException = OCIException CInt String deriving (Show)
> ociExceptionTc :: TyCon
> ociExceptionTc = mkTyCon "Database.Oracle.OciFunctions.OCIException"
> instance Typeable OCIException where typeOf _ = mkAppTy ociExceptionTc []


Use the catch functions like this: (Here convertAndRethrow converts the low-level FFI exceptions from one module into higher (application-level) exceptions.)

> commit :: Session -> IO ()
> commit (Session env err conn) = catchOCI ( do
>   OCI.commitTrans err conn
>   ) (\exc -> convertAndRethrow err exc nullAction)
>
> nullAction :: IO ()
> nullAction = return ()
>
> convertAndRethrow :: ErrorHandle -> OCIException -> IO () -> IO ()
> convertAndRethrow err exc cleanupAction = do
>   (e, m) <- OCI.formatErrorMsg exc err
>   cleanupAction
>   throwDB (DBError e m)

Or, an example that must do some cleanup when the exception is thrown: (Note also that the exception handler must return a value of the same type as the main action.)

> logon :: String -> String -> String -> EnvHandle -> ErrorHandle -> IO ConnHandle
> logon user pswd dbname env err = catchOCI ( do
>     connection <- OCI.dbLogon user pswd dbname env err
>     return connection
>   ) (\ociexc -> do
>     convertAndRethrow err ociexc $ do
>       freeHandle (castPtr err) oci_HTYPE_ERROR
>       freeHandle (castPtr env) oci_HTYPE_ENV
>     return undefined
>   )

Suppose I've got a pointer-to-function, a !FunPtr. How do I call the pointed-to function from Haskell? (This is a real problem: When I tried to create a binding to Libdb 4, all functions are actually !FunPtrs contained in structs. I really don't want to write a C function that extracts and dereferences the pointer for every single one of them.) -- UdoStenzel

I haven't done this before, so I can only suggest looking at the docs and experimenting: http://www.haskell.org/ghc/docs/latest/html/libraries/base/Foreign-Ptr.html#t%3AFunPtr

This comment (from that Foreign.Ptr page) might help: "To convert !FunPtr values to corresponding Haskell functions, one can define a dynamic stub for the specific foreign type, e.g.

 type IntFunction = CInt -> IO ()
 foreign import ccall "dynamic" 
   mkFun :: FunPtr IntFunction -> IntFunction

Thanks, I somehow missed that note. Now it seems for every !FunPtr in some structure I need to define a separate dynamic import? This is annoying, I'd have to spell out the type of every such function at least twice (three times when counting the convenient Haskell wrapper)! Is there a way around it? Maybe a preprocessor (c2hs comes close, but doesn't seem to handle !FunPtrs)? -- UdoStenzel

That is what HSFFIG tries to address, especially related to function pointers held in structures' fields, and parsing of their type signatures. And problems with BerkeleyDB described above sort of inspired creation of HSFFIG. See also the HsffigExamples page.

What HSFFIG does not do well yet, is autocreation of dynamic wrappers for !FunPtrs passed as other functions' parameters and/or return values: this is available only in part and not always done in consistent way. -- DimitryGolubovsky