FFI cook book

This is a guide, tutorial, and cookbook for writing a library that uses external (FFI) functions. Some people complain that cookbook approaches discourage thinking; that may be so, but they also help novices get started faster. Being a little hard of thinking myself, I would have been grateful for something like this when I was getting started. The FFI spec, while valuable, is not a tutorial.

This guide contains examples and lessons accumulated while writing an FFI binding to the Oracle DBMS OCI (Oracle Call Interface), a low-level C library.

My FFI library code tends to look like imperative code written in Haskell. I guess we should expect this to some extent when dealing with external libraries, although it might be better (for me) to explore more functional alternatives. (However, Haskell also seems to be quite a good language for writing imperative code in.)

-- AlistairBayley

These libraries are useful for memory management and for working with C pointers.

Contains peek, poke, peekByteOff, pokeByteOff, etc:

http://www.haskell.org/ghc/docs/latest/html/libraries/base/Foreign.Storable.html

Contains alloca, malloc, free, etc:

http://www.haskell.org/ghc/docs/latest/html/libraries/base/Foreign.Marshal.Alloc.html
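
As a quick orientation, here is a minimal sketch of how functions from these two modules fit together. The names roundTrip and pairRoundTrip are made up for this illustration; no C library is involved, the memory is simply written and read back from Haskell.

```haskell
import Foreign.C.Types (CInt)
import Foreign.Marshal.Alloc (alloca, allocaBytes)
import Foreign.Storable (peek, peekByteOff, poke, pokeByteOff, sizeOf)

-- Allocate temporary storage for one CInt, write to it, and read it back.
-- The memory is released when the lambda finishes.
roundTrip :: CInt -> IO CInt
roundTrip x = alloca $ \p -> do
  poke p x
  peek p

-- Treat a raw buffer as two adjacent CInts, addressed by byte offset.
pairRoundTrip :: CInt -> CInt -> IO (CInt, CInt)
pairRoundTrip a b = allocaBytes (2 * sz) $ \buf -> do
  pokeByteOff buf 0 a
  pokeByteOff buf sz b
  x <- peekByteOff buf 0
  y <- peekByteOff buf sz
  return (x, y)
  where
    sz = sizeOf (undefined :: CInt)
```

peekByteOff and pokeByteOff are what you would typically use to reach the fields of a C struct whose layout you know.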

Calling C functions

Passing opaque structures/types

Problem: A C function creates an opaque structure, which I must later pass to other C functions. What type should I use?

Solution: Create a datatype to represent the opaque structure. Note that the C functions expect a pointer to the structure, so I've created a type synonym called OCIHandle for these.

> data OCIStruct = OCIStruct
> type OCIHandle = Ptr OCIStruct

GHC allows this constructor-less version (with -fglasgow-exts; empty data declarations are also part of Haskell 2010):

> data OCIStruct

This would also work, and it needs fewer lines of code:

--no need for the data declaration
type OCIStruct = Ptr ()

I don't know if there are any side effects to this, but it works fine for me so far. -- eyan at eyan dot org

The side-effect I wanted to avoid was using the wrong pointer at the wrong time. Consider:

type EnvStruct = Ptr ()
type EnvHandle = Ptr EnvStruct
type ErrorStruct = Ptr ()
type ErrorHandle = Ptr ErrorStruct

ErrorHandle and EnvHandle are the same type, i.e. you can use one where you would use the other. I would rather use different datatypes so the compiler can help me catch these type errors. Better would be:

data EnvStruct = EnvStruct
type EnvHandle = Ptr EnvStruct
data ErrorStruct = ErrorStruct
type ErrorHandle = Ptr ErrorStruct

-- AlistairBayley
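
To make the difference concrete, here is a small sketch. The function describeEnv is hypothetical and exists only to have something that accepts an EnvHandle; the constructor-less declarations are used, as mentioned earlier.

```haskell
import Foreign.Ptr (Ptr, castPtr, nullPtr)

-- Two distinct opaque types, so their pointers cannot be confused.
data EnvStruct
data ErrorStruct

type EnvHandle = Ptr EnvStruct
type ErrorHandle = Ptr ErrorStruct

-- Hypothetical function that only accepts an environment handle.
describeEnv :: EnvHandle -> String
describeEnv h
  | h == nullPtr = "null env handle"
  | otherwise    = "env handle"

-- The following line would be rejected by the type checker:
--   describeEnv (nullPtr :: ErrorHandle)

-- Crossing from one handle type to the other has to be spelled out.
errorAsEnv :: ErrorHandle -> EnvHandle
errorAsEnv = castPtr
```

With the Ptr () synonyms, the commented-out call would compile without complaint.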

Passing pointer-to-pointer-to-thing

Problem: A C function takes a pointer-to-a-pointer argument, which it modifies to point to some newly allocated structure or value. The return value of the C function is a success-or-failure code (int). So we effectively have parameters that act as outputs. How do you wrap these in Haskell functions that return the actual structure (and raise an exception on failure)?

Solutions:

Single argument case

If the function only modifies one of its arguments, then use code like this:

OCIHandle is a synonym for Ptr OCIStruct, so the second argument to
ociHandleAlloc has type Ptr (Ptr OCIStruct). The C signature for OCIHandleAlloc
describes the second argument as void **, i.e. a pointer to a pointer to something.

> foreign import ccall "oci.h OCIHandleAlloc" ociHandleAlloc ::
>   OCIHandle -> Ptr OCIHandle -> CInt -> CInt -> Ptr a -> IO CInt
>
> handleAlloc :: CInt -> OCIHandle -> IO OCIHandle
> handleAlloc handleType env = alloca $ \ptr -> do
>   rc <- ociHandleAlloc env ptr handleType 0 nullPtr
>   if rc < 0
>     then throwOCI (OCIException rc "allocate handle")
>     else peek ptr

Here, alloca allocates memory for ptr, which is then passed to the foreign function. alloca is preferred because it frees the memory for ptr when the enclosed action finishes, whether it returns normally or raises an exception. We use peek to read the value the C function stored there. alloca takes a function from the newly allocated pointer to an IO action; we use a lambda expression here to write that function anonymously.
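
If you don't have OCI to hand, the same alloca-then-peek pattern can be tried against the C standard library. The sketch below uses strtol, whose second argument is a char ** that it sets to point just past the parsed number. The wrapper name parsePrefix is made up for this example, and error handling is omitted.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.String (CString, peekCString, withCString)
import Foreign.C.Types (CInt(..), CLong(..))
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peek)

-- long strtol(const char *nptr, char **endptr, int base);
foreign import ccall unsafe "stdlib.h strtol"
  c_strtol :: CString -> Ptr CString -> CInt -> IO CLong

-- Parse a leading decimal number; return it with the unparsed remainder.
parsePrefix :: String -> IO (Integer, String)
parsePrefix s =
  withCString s $ \cs ->
    alloca $ \endPtr -> do
      n <- c_strtol cs endPtr 10
      -- endPtr now holds a pointer into the buffer owned by withCString,
      -- so it must be read before withCString returns.
      end <- peek endPtr
      rest <- peekCString end
      return (toInteger n, rest)
```

Note that the pointer written by strtol points into memory that withCString frees, which is why the remainder is converted to a Haskell String inside the lambda.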

Multiple argument case

If the function modifies more than one of its arguments, then things get a little more complex. In this case we have to allocate the memory for the arguments (again, using the alloca* family of functions), call the C function, and extract the values. In this example the ociErrorGet function modifies its fourth and fifth arguments (an int and a string buffer respectively). I've chosen an arbitrary size for the string buffer: 1000 bytes.

> getOCIErrorMsg2 :: OCIHandle -> CInt -> Ptr CInt -> CString -> CInt -> IO (CInt, String)
> getOCIErrorMsg2 ocihandle handleType errCodePtr errMsgBuf maxErrMsgLen = do
>   rc <- ociErrorGet ocihandle 1 0 errCodePtr errMsgBuf maxErrMsgLen handleType
>   if rc < 0
>     then return (0, "Error message not available.")
>     else do
>       msg <- peekCString errMsgBuf
>       e <- peek errCodePtr
>       return (e, msg)
>
> getOCIErrorMsg :: OCIHandle -> CInt -> IO (CInt, String)
> getOCIErrorMsg ocihandle handleType = do
>   let stringBufferLen = 1000
>   allocaBytes stringBufferLen $ \errMsg ->
>     alloca $ \errCode ->
>     getOCIErrorMsg2 ocihandle handleType errCode errMsg (mkCInt stringBufferLen)

(Thanks to Udo Stenzel for tips for avoiding memory leaks.)
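
The nesting of allocaBytes and alloca can be tried without a database. In the sketch below, fakeErrorGet is a hypothetical stand-in written in Haskell that behaves like a C function with two output parameters; the error code and message it writes are invented for the example.

```haskell
import Foreign.C.String (CString, castCharToCChar, peekCString)
import Foreign.C.Types (CInt)
import Foreign.Marshal.Alloc (alloca, allocaBytes)
import Foreign.Marshal.Array (pokeArray0)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peek, poke)

-- Hypothetical stand-in for a C function that fills in an error code
-- and writes a NUL-terminated message into a caller-supplied buffer.
fakeErrorGet :: Ptr CInt -> CString -> CInt -> IO CInt
fakeErrorGet codePtr buf bufLen = do
  poke codePtr 1017
  let msg = "invalid username/password"
      -- leave room for the terminating NUL
      bytes = take (fromIntegral bufLen - 1) (map castCharToCChar msg)
  pokeArray0 0 buf bytes
  return 0

getError :: IO (CInt, String)
getError =
  allocaBytes bufLen $ \buf ->
    alloca $ \codePtr -> do
      rc <- fakeErrorGet codePtr buf (fromIntegral bufLen)
      if rc < 0
        then return (0, "Error message not available.")
        else do
          code <- peek codePtr
          msg <- peekCString buf
          return (code, msg)
  where
    bufLen :: Int
    bufLen = 1000
```

Both buffers are converted to Haskell values before the lambdas return, so nothing refers to the temporary memory once it has been freed.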

Passing strings

Problem: C function expects strings with lengths, where each string (char*) is followed by an int stating how long it is.

Solution: Convert Haskell Strings to CStringLens, and pull CStringLens apart with utility functions. A CStringLen is just a (CString, Int) pair. Would it have been better to make CStringLen a (CString, CInt) pair?

> mkCInt :: Int -> CInt
> mkCInt n = fromIntegral n
> cStrLen :: CStringLen -> CInt
> cStrLen = mkCInt . snd
> cStr :: CStringLen -> CString
> cStr = fst
>
> dbLogon :: String -> String -> String -> EnvHandle -> ErrorHandle -> IO ConnHandle
> dbLogon user pswd db env err =
>   withCStringLen user $ \userC ->
>   withCStringLen pswd $ \pswdC ->
>   withCStringLen db   $ \dbC ->
>   alloca $ \conn -> do
>     rc <- ociLogon env err conn (cStr userC) (cStrLen userC) (cStr pswdC) (cStrLen pswdC) (cStr dbC) (cStrLen dbC)
>     case () of
>       _ | rc == oci_SUCCESS_WITH_INFO -> testForErrorWithPtr oci_ERROR "logon" conn
>         | otherwise -> testForErrorWithPtr rc "logon" conn
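
An alternative to the cStr and cStrLen helpers is to pattern-match on the pair directly. The sketch below does this against strnlen from the C library, chosen only because it takes a pointer and a length; byteLength is a made-up name. Keep in mind that the buffer from withCStringLen is not NUL-terminated, so the C function must respect the length it is given.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.String (CString, withCStringLen)
import Foreign.C.Types (CSize(..))

-- size_t strnlen(const char *s, size_t maxlen);
foreign import ccall unsafe "string.h strnlen"
  c_strnlen :: CString -> CSize -> IO CSize

-- Number of bytes the marshalled string occupies, as seen from C.
byteLength :: String -> IO Int
byteLength s =
  withCStringLen s $ \(ptr, len) -> do
    n <- c_strnlen ptr (fromIntegral len)
    return (fromIntegral n)
```

The length is a count of bytes in the marshalled encoding, which can differ from the number of Haskell characters for non-ASCII text.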

Raising and handling exceptions

Follow the advice for Dynamic Exceptions, in: http://www.haskell.org/ghc/docs/latest/html/libraries/base/Control.Exception.html#10

Create your own exceptions, and your own throw and catch functions. This makes it easier to trap only exceptions raised by your code.

> data OCIException = OCIException CInt String deriving (Typeable, Show)
>
> catchOCI :: IO a -> (OCIException -> IO a) -> IO a
> catchOCI = catchDyn
> throwOCI :: OCIException -> a
> throwOCI = throwDyn

If we can't derive Typeable then the following code should do the trick:

> -- replaces the deriving (Typeable, Show) declaration above:
> data OCIException = OCIException CInt String deriving (Show)
> ociExceptionTc :: TyCon
> ociExceptionTc = mkTyCon "Database.Oracle.OciFunctions.OCIException"
> instance Typeable OCIException where typeOf _ = mkAppTy ociExceptionTc []
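
The catchDyn and throwDyn functions come from the older dynamic-exception API and are not exported by Control.Exception in recent versions of base. A minimal sketch of the same idea using the Exception class, keeping the throwOCI and catchOCI names, might look like this:

```haskell
import Control.Exception (Exception, catch, throw)
import Foreign.C.Types (CInt)

data OCIException = OCIException CInt String
  deriving Show

-- The default methods of Exception are sufficient here.
instance Exception OCIException

throwOCI :: OCIException -> a
throwOCI = throw

-- Because the handler's argument type is fixed, only OCIException is
-- caught; other exceptions propagate unchanged.
catchOCI :: IO a -> (OCIException -> IO a) -> IO a
catchOCI = catch
```

The benefit is the same as before: callers can trap exactly the exceptions raised by the binding and nothing else.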

Use the catch functions like this: (Here convertAndRethrow converts the low-level FFI exceptions from one module into higher-level (application-level) exceptions.)

> commit :: Session -> IO ()
> commit (Session env err conn) = catchOCI ( do
>   OCI.commitTrans err conn
>   ) (\exc -> convertAndRethrow err exc nullAction)
>
> nullAction :: IO ()
> nullAction = return ()
>
> convertAndRethrow :: ErrorHandle -> OCIException -> IO () -> IO ()
> convertAndRethrow err exc cleanupAction = do
>   (e, m) <- OCI.formatErrorMsg exc err
>   cleanupAction
>   throwDB (DBError e m)

Or, an example that must do some cleanup when the exception is thrown: (Note also that the exception handler must return a value of the same type as the main action, hence the return undefined, which is never reached because convertAndRethrow always throws.)

> logon :: String -> String -> String -> EnvHandle -> ErrorHandle -> IO ConnHandle
> logon user pswd dbname env err = catchOCI ( do
>     connection <- OCI.dbLogon user pswd dbname env err
>     return connection
>   ) (\ociexc -> do
>     convertAndRethrow err ociexc $ do
>       freeHandle (castPtr err) oci_HTYPE_ERROR
>       freeHandle (castPtr env) oci_HTYPE_ENV
>     return undefined
>   )
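
When the handler only needs to run cleanup and let the original exception continue, onException from Control.Exception is a different technique that avoids the return undefined. The sketch below is not the OCI code: failingLogon is a hypothetical action that always fails, and the cleanup just records a message in an IORef so the effect can be observed.

```haskell
import Control.Exception (ErrorCall(..), onException, throwIO)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- Hypothetical action standing in for a logon that fails.
failingLogon :: IO Int
failingLogon = throwIO (ErrorCall "logon failed")

-- Run the logon; if it throws, run the cleanup and rethrow the same
-- exception. The cleanup is skipped when the logon succeeds.
logonWithCleanup :: IORef [String] -> IO Int
logonWithCleanup cleanupLog =
  failingLogon `onException` modifyIORef cleanupLog ("freed handles" :)
```

Converting the exception to another type, as convertAndRethrow does, still needs an explicit handler; onException only covers the cleanup half.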

Suppose I've got a pointer-to-function, a FunPtr. How do I call the pointed-to function from Haskell? (This is a real problem: when I tried to create a binding to Libdb 4, I found that all functions are actually FunPtrs contained in structs. I really don't want to write a C function that extracts and dereferences the pointer for every single one of them.) -- UdoStenzel

I haven't done this before, so I can only suggest looking at the docs and experimenting: http://www.haskell.org/ghc/docs/latest/html/libraries/base/Foreign.Ptr.html#t%3AFunPtr

This comment (from that Foreign.Ptr page) might help: to convert FunPtr values to corresponding Haskell functions, one can define a dynamic stub for the specific foreign type, e.g.

 type IntFunction = CInt -> IO ()
 foreign import ccall "dynamic" 
   mkFun :: FunPtr IntFunction -> IntFunction

Thanks, I somehow missed that note. Now it seems that for every FunPtr in some structure I need to define a separate dynamic import? This is annoying: I'd have to spell out the type of every such function at least twice (three times when counting the convenient Haskell wrapper)! Is there a way around it? Maybe a preprocessor (c2hs comes close, but doesn't seem to handle FunPtrs)? -- UdoStenzel

That is what HSFFIG (see the HsffigTutorial page) tries to address, especially for function pointers held in structure fields and the parsing of their type signatures. The problems with BerkeleyDB described above partly inspired the creation of HSFFIG. See also the HsffigExamples page.

What HSFFIG does not do well yet is automatic creation of dynamic wrappers for FunPtrs passed as other functions' parameters and/or return values: this is available only in part and is not always done in a consistent way. -- DimitryGolubovsky
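
For reference, here is a hand-written sketch of the dynamic-stub approach applied to a function pointer read out of memory. It is only an illustration: the "struct" is a single slot allocated from Haskell, the function pointer stored in it is the address of the C library's abs, and the names absPtr, mkIntFunction and callThroughSlot are made up for this example.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CInt(..))
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (FunPtr)
import Foreign.Storable (peek, poke)

type IntFunction = CInt -> IO CInt

-- Address of the C function abs, imported as a function pointer.
foreign import ccall "stdlib.h &abs"
  absPtr :: FunPtr IntFunction

-- Dynamic stub: turns a FunPtr of this type into a callable function.
foreign import ccall "dynamic"
  mkIntFunction :: FunPtr IntFunction -> IntFunction

-- Store the pointer in a memory slot (as a C struct field would hold
-- it), read it back, and call through it.
callThroughSlot :: CInt -> IO CInt
callThroughSlot x = alloca $ \slot -> do
  poke slot absPtr
  f <- peek slot
  mkIntFunction f x
```

For a real struct you would read the field with peekByteOff at the field's offset instead of peek, and you still need one dynamic import per distinct function type, which is the repetition complained about above.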