GHC/As a library (up to 6.8)

From HaskellWiki
< GHC
Jump to navigation Jump to search

For instructions on the GHC API with GHC 7 and above please refer to GHC/As_a_library

Using GHC as a library

In GHC 6.5 and subsequently you can import GHC as a Haskell library, which lets you write a Haskell program that has access to all of GHC.

This page is a place for everyone to add

  • Notes about how to get it working
  • Comments about the API
  • Suggestions for improvement

and so on.

More documentation is available on the GHC wiki: http://cvs.haskell.org/trac/ghc/wiki/Commentary/Compiler/API

Getting started

You'll need a version of GHC (at least 6.5) that supports the GHC API. The GHC download page offers stable releases and development versions; you can also use CVS (instructions) or darcs (e.g., darcs get --partial http://darcs.haskell.org/ghc).

To use the GHC API you say

import GHC

Doing this imports the module GHC from the package ghc. This module exports the "GHC API", which is still in a state of flux. Currently it's not even Haddock-documented. You can see the source code (somewhat documented). There are also other modules of interest as you do more special things.

Here's an example main program that does it Media:Main.hs (good for GHC 6.6). You need to manually change the value of myGhcRoot to point to your GHC directory.

To compile Media:Main.hs, you have to turn on the flag "-package ghc", e.g.

  ghc -package ghc Main.hs

Common use cases and functions

Assumes GHC 6.6.

Default exception handling

If you don't handle exceptions yourself, you are recommended to wrap all code inside the wrapper:

defaultErrorHandler :: DynFlags -> IO a -> IO a
DynFlags.defaultDynFlags :: DynFlags

This catches exceptions and prints out exception details and exits your program with exit code 1.

Example:

import GHC
import DynFlags(defaultDynFlags)

main = defaultErrorHandler defaultDynFlags $ do
         {-
            stuff in the following subsections
         -}

You do not have to use defaultDynFlags, but it's the easiest starting point.

Initialization

First create a session:

newSession :: GhcMode         -- BatchCompile | Interactive | MkDepend | ...
           -> Maybe FilePath  -- GHC installation directory
           -> IO Session      -- your session; you will need it

The path to your GHC installation directory (e.g., /usr/local/lib/ghc-6.6) is in practice mandatory, even though in theory marked as optional. Also, in 6.8, the type signature of newSession changed to remove the GhcMode (but retaining the still not actually optional Maybe FilePath:

newSession :: Maybe FilePath -> IO Session

The session is configurable by dynamic flags (GHC dynamic flags plus session state; think -O2, -fvia-C, -fglasgow-exts, -package). This can be done with:

getSessionDynFlags :: Session -> IO DynFlags
setSessionDynFlags :: Session
                   -> DynFlags
                   -> IO [PackageId]  -- important iff dynamic-linking
parseDynamicFlags :: DynFlags  -- old flags
                  -> [String]  -- e.g., all or part of getArgs
                  -> IO (DynFlags, [String])  -- new flags, unknown args

The DynFlags record has a gazillion fields; ask ghci to show all of them. You can change them by hand, or use the parser (which implements the GHC command line format and does the Right Thing). But there are a couple you must note:

data DynFlags = DynFlags { ...,
    hscTarget :: HscTarget }    -- HscC | HscAsm | HscInterpreted | ...

This corresponds to -fvia-C, -fasm, or interpreting. When the session needs to re-compile a module, this field controls how. The default is HscAsm, even in the interactive mode, meaning the interactive mode may produce .hi and .o files too. If you want to follow GHCi in not doing that, you must set this field to HscInterpreted yourself. (On the other hand, it is fun to contemplate an interactive session that generates machine code upon your command.)

setSessionDynFlags also sets up your session's awareness of the package database (without which you can't even use the Prelude), so even if you like the defaults, you should still call it. (Older code called PackageConfig.initPackages for this.)

Examples:

  • vanilla compiler, use all defaults (rare but good start)
session <- newSession BatchCompile (Just "/usr/local/lib/ghc-6.6")
getSessionDynFlags session >>= setSessionDynFlags session
  • compiler with custom flags, easy with parser
session <- newSession BatchCompile (Just "/usr/local/lib/ghc-6.6")
f0 <- getSessionDynFlags session
(f1,b) <- parseDynamicFlags f0 ["-fglasgow-exts", "-O", "-package", "ghc", "-package Cabal",
                                "foo", "-v", "bar"]
-- b = ["foo", "bar"]; the other args are recognized
-- in GHC 6.6 "-O" implies "-fvia-C", that kind of thing is automatic here too
setSessionDynFlags session f1
  • interactive session with interpreter
session <- newSession Interactive (Just "/usr/local/lib/ghc-6.6")
f0 <- getSessionDynFlags session
setSessionDynFlags session f0{hscTarget = HscInterpreted}
  • no linking

The GHC API behaves just like --make: it always links the program if you have a Main module. There are some cases where you will want to turn the linker off; this has the same affect as --make -c. In order to do this, set the ghcLink flag to NoLink:

session <- newSession BatchCompile (Just "/usr/local/lib/ghc-6.6")
f0 <- getSessionDynFlags session
(f1,b) <- parseDynamicFlags f0 ["-fglasgow-exts", "-O", "-package", "ghc", "-package Cabal",
                                "foo", "-v", "bar"]
-- b = ["foo", "bar"]; the other args are recognized
-- in GHC 6.6 "-O" implies "-fvia-C", that kind of thing is automatic here too
setSessionDynFlags session $ f1 {ghcLink=NoLink}

Load or compile modules

To compile code or load modules, first set one or more targets, then call the load function.

guessTarget :: String       -- "filename.hs" or "filename.lhs" or "MyModule"
            -> Maybe Phase  -- if not Nothing, specifies starting phase
            -> IO Target
addTarget :: Session -> Target -> IO ()
setTargets :: Session -> [Target] -> IO ()
getTargets :: Session -> IO [Target]
removeTarget :: Session -> TargetId -> IO ()
load :: Session -> LoadHowMuch -> IO SuccessFlag
data LoadHowMuch
  = LoadAllTargets
  | LoadUpTo ModuleName
  | LoadDependenciesOf ModuleName

Loading or compiling produces temp directories and files, which can only be correctly cleaned up or kept (depending on temp file flags in DynFlags) with the wrapper:

defaultCleanupHandler :: DynFlags -> IO a -> IO a

Two factors constrain how much code should be wrapped. At the minimal, function calls such as load and depanal that potentially unlit or compile or link should be wrapped. At the maximal, the flags passed to defaultCleanupHandler should be identical to those set to the session, so practically the wrapping should be after setting up seesion flags.

Example:

t <- guessTarget "Main.hs" Nothing
addTarget session t    -- setTargets session [t] is also good
f <- getSessionDynFlags session
sf <- defaultCleanupHandler f (load session LoadAllTargets)
case sf of Succeeded -> ...
           Failed -> ...

Dependencies (both modules and packages) are processed automatically, and an executable is produced if appropriate, precisely like --make.

Modules are compiled as per the hscTarget flag (-fasm, -fvia-C, or interpreter) in DynFlags, independent of GHC mode.

Compiling to and from Core

Note: the following will only work as advertised in the HEAD (GHC 6.9).

To compile a file to Core (the intermediate language used in GHC's simplifier), call compileToCoreModule:

compileToCoreModule :: Session -> FilePath -> IO (Maybe CoreModule)

compileToCoreModule takes a session and a filename and, if successful, returns a CoreModule corresponding to the Haskell module denoted by the filename. It is not necessary to set a target first.

The CoreModule data type is specified in the HscTypes module in GHC. The most important field is probably cm_binds, a list of Core bindings for all top-level declarations from the original Haskell source file.

data CoreModule
  = CoreModule {
      -- Module name
      cm_module   :: !Module,
      -- Type environment for types declared in this module
      cm_types    :: !TypeEnv,
      -- Declarations
      cm_binds    :: [CoreBind]
      -- Imports
      cm_imports  :: ![Module]
    }

The type TypeEnv is defined in HscTypes.lhs.

For documentation of the Core data types, see the CoreSyn module in GHC.

For some applications, you may want to compile the Haskell source for various standard libraries into Core. Given the current API, this is probably only possible in the presence of a GHC tree that was built from source, including libraries. (Otherwise, there may be problems finding the right import files.) If you have experience suggesting otherwise, you may wish to report back.

You can also compile Core code to object code:

compileCoreToObj :: Bool -> Session -> CoreModule -> IO Bool

compileCoreToObj takes a boolean flag that says whether or not to run the simplifier, a session, and a CoreModule, and returns True iff compilation to machine code succeeded. As a side effect, it creates the executable file, as well as any necessary .hi or .o files, in the current working directory. This has only been tested with a single self-contained Core module (i.e., one that only depends on standard library functions rather than on functions in other user-defined modules).

These functions are a work-in-progress, so if you notice anything that is not what you expect, please complain on the cvs-ghc mailing list or file a bug report.

Interactive evaluation

Interactive evaluation ala GHCi is done by runStmt. But first, this is always done under a current context, i.e., which modules are in scope. Most probably you want to have at least the Prelude and those you loaded in the previous section. How to manipulate the context:

setContext :: Session
           -> [Module]    -- their top levels will be visible
           -> [Module]    -- their exports will be visible
           -> IO ()
getContext :: Session -> IO ([Module], [Module])
findModule :: Session -> ModuleName -> Maybe PackageId -> IO Module
mkModule :: PackageId -> ModuleName -> Module
mkModuleName :: String -> ModuleName
PackageConfig.stringToPackageId :: String -> PackageId

Every module given to setContext must be either in a package known to the session or has been loaded as per the previous subsection. Example:

-- equivalent to GHCi's :m Prelude Control.Monad *Main
prelude <- findModule session (mkModuleName "Prelude") Nothing
monad <- findModule session (mkModuleName "Control.Monad") Nothing
usermod <- findModule session (mkModuleName "Main") Nothing  -- we have loaded this
setContext session [usermod] [prelude,monad]

You can also be specific about packages. You can also use mkModule instead of findModule, or even some module query functions in the next subsection.

Having set a useful context, we're now ready to evaluate.

runStmt :: Session -> String -> IO RunResult
data RunResult
    = RunOk [Name]    -- names bound by the expression
    | RunFailed
    | RunException GHC.IOBase.Exception  -- that's Control.Exception.Exception

Example:

runStmt session "let n = 2 + 2"  -- n is bound
runStmt session "n"              -- 4 is printed (note "it" is bound)

(Interactive evaluation works in BatchCompile mode too! There are still other subtle differences, so this is not recommended.)

Type checking

What if I want the type info from a module?

Once the modules are loaded in the session, they are already type-checked. The type information of a loaded module are stored in a data-structure called ModuleInfo. To access the type information, we need to apply function getModuleInfo to the target module.

ModuleInfo is defined as follows,

data ModuleInfo = ModuleInfo {
	minf_type_env  :: TypeEnv,
	minf_exports   :: NameSet, -- ToDo, [AvailInfo] like ModDetails?
	minf_rdr_env   :: Maybe GlobalRdrEnv,	-- Nothing for a compiled/package mod
	minf_instances :: [Instance]
#ifdef GHCI
        ,minf_modBreaks :: ModBreaks 
#endif
	-- ToDo: this should really contain the ModIface too
  }

The field minf_type_env is holding the type environment, of type TypeEnv, which is defined as,

type TypeEnv = [TyThing]

where TyThing can be an identifier, a class, a type constructor or a data constructor.

data TyThing = AnId     Id
	     | ADataCon DataCon
	     | ATyCon   TyCon
	     | AClass   Class

Recalling the running example in the previous subsection, note that the variable usermod captures the user-defined module "Main". We retrieve the module information of "Main" module and unfold the type environment out of it.

mb_userModInfo <- getModuleInfo session usermod
case mb_userModInfo of 
  Just userModInfo ->
    let userTyThings = modInfoTyThings userModInfo              -- access the type environments
        userTys = [ (i, idType i) | (AnId i) <- userTyThings ]  -- we are only interested in the declared ids and their (inferred) types. 
    in  ... -- do something with userTys
  Nothing -> return ()


Queries

-- Get module dependency graph
getModuleGraph :: Session -> IO ModuleGraph    -- ModuleGraph = [ModSummary]
-- Get bindings
getBindings :: Session -> IO [TyThing]

Messages

Compiler messages (including progress, warnings, errors) are controlled by verbosity and routed through a callback mechanism. These are fields in DynFlags:

data DynFlags = DynFlags { ...,
    verbosity :: Int,
    log_action :: Severity -> SrcLoc.SrcSpan -> Outputable.PprStyle -> ErrUtils.Message -> IO () }

The verbosity field corresponds to the command line -v flag; parseDynamicFlags is applicable. A low verbosity means log_action is seldom called.

You can set the callback to your logger, like

f <- getSessionDynFlags session
setSessionDynFlags session f{log_action = my_log_action}

This sets the session's logger, but it will not see exceptions.

If you call defaultErrorHandler at the outermost, clearly its message logger has to be set separately:

main = defaultErrorHandler defaultDynFlags{log_action = my_log_action} $ do ...

This logger will see messages produced by defaultErrorHandler upon exceptions.

Interactive mode example

The file Media:Interactive-6.6.hs or Media:Interactive-6.8.hs (requires Media:MyPrelude.hs) serve as an example for using GHC as a library in interactive mode. It also shows how to replace some of the standard prelude functions with modified versions. See the comments in the code for further information.

Using the GHC library from inside GHCi

This works, to some extent. However, beware about loading object code, because there is only a single linker symbol table in the runtime, so GHCi will be sharing the symbol table with the new GHC session.

$ ghci -package ghc
Prelude> :m + GHC PackageConfig
Prelude GHC> session <- newSession Interactive (Just "/usr/local/lib/ghc-6.6")
Prelude GHC> setSessionDynFlags session =<< getSessionDynFlags session
Prelude GHC> setContext session [] [mkModule (stringToPackageId "base") (mkModuleName "Prelude")]
Prelude GHC> runStmt session "let add1 x = x + 1"
Prelude GHC> runStmt session "add1 2"
3

Profiling

To build the profiling version of GHC-as-a-library, add:

GhcCompilerWays=p

to your build.mk file, and rebuild GHC.

Note: This seems to be broken as of GHC 6.8.3.

Binary size

Using the GHC API in your applications results in large executables (e.g. > 15Mb). You can mitigate this by an enormous amount using the tools strip and gzexe; this may reduce the executable to 15-30% of its previous size. (Example:

"I will take this time to point out that using the GHC API in your

applications results in *large* executables. The Interact example above when compiled with vanilla --make options resulted in a whopping 17mb executable. I've observed however you can mitigate this by an

enormous amount using the tools strip and gzexe [see also upx. -ed], taking it down to a light 2.5mb (a size reduction of about 85%)."[1]

(Using 6.8.1 for me results in a 16M binary using the example interactive session from this page, even when thoroughly stripped. Using UPX on its most intensive settings brought it down to 2.9M. --Gwern)