GHC/As a library (up to 6.8)

From HaskellWiki
Revision as of 18:26, 1 December 2007 by Gwern (Initialization: 6.8 update)

Using GHC as a library

In GHC 6.5 and later, you can import GHC as a Haskell library, which lets you write Haskell programs that have access to all of GHC.

This page is a place for everyone to add

  • Notes about how to get it working
  • Comments about the API
  • Suggestions for improvement

and so on.

More documentation is available on the GHC wiki: http://cvs.haskell.org/trac/ghc/wiki/Commentary/Compiler/API

Getting started

You'll need a version of GHC (at least 6.5) that supports the GHC API. The GHC download page offers stable releases and development versions; you can also use CVS (instructions) or darcs (e.g., darcs get --partial http://darcs.haskell.org/ghc).

To use the GHC API, write

import GHC

Doing this imports the module GHC from the package ghc. This module exports the "GHC API", which is still in a state of flux. It is not yet Haddock-documented, but you can read the (somewhat documented) source code. Other modules become of interest as you do more specialised things.

Media:Main.hs is an example main program that does this (good for GHC 6.6). You need to change the value of myGhcRoot by hand so that it points to your GHC installation directory.

To compile Media:Main.hs, you have to pass the flag -package ghc, e.g.

  ghc -package ghc Main.hs

Common use cases and functions

Assumes GHC 6.6.

Default exception handling

If you don't handle exceptions yourself, it is recommended that you wrap all your code in this wrapper:

defaultErrorHandler :: DynFlags -> IO a -> IO a
DynFlags.defaultDynFlags :: DynFlags

This catches exceptions, prints the exception details, and exits your program with exit code 1.

Example:

import GHC
import DynFlags(defaultDynFlags)

main = defaultErrorHandler defaultDynFlags $ do
         {-
            stuff in the following subsections
         -}
         return ()  -- placeholder so that the do block is not empty

You do not have to use defaultDynFlags, but it's the easiest starting point.

Initialization

First create a session:

newSession :: GhcMode         -- BatchCompile | Interactive | MkDepend | ...
           -> Maybe FilePath  -- GHC installation directory
           -> IO Session      -- your session; you will need it

The path to your GHC installation directory (e.g., /usr/local/lib/ghc-6.6) is in practice mandatory, even though in theory it is marked as optional. In GHC 6.8, the type signature of newSession changed to simply

newSession :: Maybe FilePath -> IO Session

and the FilePath is still not actually optional.

The session is configured by dynamic flags (GHC's dynamic flags plus session state; think -O2, -fvia-C, -fglasgow-exts, -package), which you read and set with:

getSessionDynFlags :: Session -> IO DynFlags
setSessionDynFlags :: Session
                   -> DynFlags
                   -> IO [PackageId]  -- important iff dynamic-linking
parseDynamicFlags :: DynFlags  -- old flags
                  -> [String]  -- e.g., all or part of getArgs
                  -> IO (DynFlags, [String])  -- new flags, unknown args

The DynFlags record has a gazillion fields; ask GHCi (e.g. with :info DynFlags) to show all of them. You can change them by hand, or use the parser, which implements the GHC command-line format and does the Right Thing. But there is one field you must know about:

data DynFlags = DynFlags { ...,
    hscTarget :: HscTarget }    -- HscC | HscAsm | HscInterpreted | ...

This corresponds to -fvia-C, -fasm, or interpreting. When the session needs to recompile a module, this field controls how. The default is HscAsm, even in interactive mode, so an interactive session may produce .hi and .o files too. If you want to behave like GHCi and avoid this, you must set this field to HscInterpreted yourself. (On the other hand, it is fun to contemplate an interactive session that generates machine code upon your command.)

setSessionDynFlags also sets up your session's awareness of the package database (without which you can't even use the Prelude), so even if you like the defaults, you should still call it. (Older code called PackageConfig.initPackages for this.)

Examples:

  • vanilla compiler, use all defaults (rare but good start)
session <- newSession BatchCompile (Just "/usr/local/lib/ghc-6.6")
getSessionDynFlags session >>= setSessionDynFlags session
  • compiler with custom flags, easy with parser
session <- newSession BatchCompile (Just "/usr/local/lib/ghc-6.6")
f0 <- getSessionDynFlags session
-- each command-line word must be a separate list element ("-package", "Cabal")
(f1,b) <- parseDynamicFlags f0 ["-fglasgow-exts", "-O", "-package", "ghc", "-package", "Cabal",
                                "foo", "-v", "bar"]
-- b = ["foo", "bar"]; the other args are recognized
-- in GHC 6.6 "-O" implies "-fvia-C", that kind of thing is automatic here too
setSessionDynFlags session f1
  • interactive session with interpreter
session <- newSession Interactive (Just "/usr/local/lib/ghc-6.6")
f0 <- getSessionDynFlags session
setSessionDynFlags session f0{hscTarget = HscInterpreted}
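Putting the initialization steps together, here is a minimal sketch of a complete program. It assumes GHC 6.6 (where newSession still takes a GhcMode), treats the command-line arguments as GHC flags, and uses a hypothetical installation path ghcRoot that you will need to adjust.

```haskell
-- Sketch against the GHC 6.6 API described above; compile with: ghc -package ghc Setup.hs
module Main (main) where

import GHC
import DynFlags (DynFlags (..), HscTarget (..), defaultDynFlags)
import System.Environment (getArgs)

-- Hypothetical installation directory: adjust to your system.
ghcRoot :: FilePath
ghcRoot = "/usr/local/lib/ghc-6.6"

main :: IO ()
main = defaultErrorHandler defaultDynFlags $ do
  args    <- getArgs
  session <- newSession Interactive (Just ghcRoot)
  flags0  <- getSessionDynFlags session
  -- Let GHC's own command-line parser interpret the arguments we were given.
  (flags1, leftovers) <- parseDynamicFlags flags0 args
  -- Interpret instead of producing .hi/.o files, as GHCi does.
  -- setSessionDynFlags also initialises the package database.
  _ <- setSessionDynFlags session flags1 { hscTarget = HscInterpreted }
  putStrLn ("Unrecognised arguments: " ++ show leftovers)
```

Arguments that parseDynamicFlags does not recognise, such as plain file names, come back in the second component of its result; this sketch only prints them.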

Load or compile modules

To compile code or load modules, first set one or more targets, then call the load function.

guessTarget :: String       -- "filename.hs" or "filename.lhs" or "MyModule"
            -> Maybe Phase  -- if not Nothing, specifies starting phase
            -> IO Target
addTarget :: Session -> Target -> IO ()
setTargets :: Session -> [Target] -> IO ()
getTargets :: Session -> IO [Target]
removeTarget :: Session -> TargetId -> IO ()
load :: Session -> LoadHowMuch -> IO SuccessFlag
data LoadHowMuch
  = LoadAllTargets
  | LoadUpTo ModuleName
  | LoadDependenciesOf ModuleName

Loading or compiling produces temp directories and files, which can only be correctly cleaned up or kept (depending on temp file flags in DynFlags) with the wrapper:

defaultCleanupHandler :: DynFlags -> IO a -> IO a

Two factors constrain how much code should be wrapped. At a minimum, calls such as load and depanal, which may unlit, compile, or link, should be wrapped. At most, the flags passed to defaultCleanupHandler should be identical to those set on the session, so in practice the wrapping should begin after the session flags have been set up.

Example:

t <- guessTarget "Main.hs" Nothing
addTarget session t    -- setTargets session [t] is also good
f <- getSessionDynFlags session
sf <- defaultCleanupHandler f (load session LoadAllTargets)
case sf of Succeeded -> ...
           Failed -> ...
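The fragment above can be fleshed out into a small --make-style driver. This is a sketch assuming GHC 6.6 and a hypothetical installation path ghcRoot; it compiles every file or module named on the command line and uses the SuccessFlag returned by load as the exit status. The helper toExitCode is illustrative, not part of the API.

```haskell
-- Sketch against the GHC 6.6 API described above; compile with: ghc -package ghc Make.hs
module Main (main) where

import GHC
import DynFlags (defaultDynFlags)
import System.Environment (getArgs)
import System.Exit (ExitCode (..), exitWith)

-- Hypothetical installation directory: adjust to your system.
ghcRoot :: FilePath
ghcRoot = "/usr/local/lib/ghc-6.6"

-- Map GHC's success flag onto a process exit code.
toExitCode :: SuccessFlag -> ExitCode
toExitCode Succeeded = ExitSuccess
toExitCode Failed    = ExitFailure 1

main :: IO ()
main = defaultErrorHandler defaultDynFlags $ do
  names   <- getArgs
  session <- newSession BatchCompile (Just ghcRoot)
  getSessionDynFlags session >>= setSessionDynFlags session
  -- guessTarget accepts file names ("Foo.hs") as well as module names ("Foo").
  targets <- mapM (\n -> guessTarget n Nothing) names
  setTargets session targets
  -- Use exactly the flags the session has, so temp files are handled consistently.
  flags  <- getSessionDynFlags session
  result <- defaultCleanupHandler flags (load session LoadAllTargets)
  exitWith (toExitCode result)
```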

Dependencies (both modules and packages) are processed automatically, and an executable is produced if appropriate, precisely like --make.

Modules are compiled as per the hscTarget flag (-fasm, -fvia-C, or interpreter) in DynFlags, independent of GHC mode.

Compiling to Core

To compile a file to Core (the intermediate language used in GHC's simplifier), call compileToCore:

compileToCore :: Session -> FilePath -> IO (Maybe [CoreBind])

compileToCore takes a session and a filename and, if successful, returns a list of Core bindings corresponding to the given module. It is not necessary to set a target first. For documentation of the Core data types, see the CoreSyn module in GHC.
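For instance, a helper that reports how many top-level Core bindings a module produces might look like the sketch below. It assumes your ghc package's GHC module exports compileToCore with the signature given above and that the session has already been set up; the name coreBindingCount is hypothetical.

```haskell
-- Hypothetical helper module; assumes GHC exports compileToCore as documented above.
module CoreCount (coreBindingCount) where

import GHC

-- Number of top-level CoreBinds in the file, or Nothing if compilation failed.
-- A mutually recursive group (Rec) is a single CoreBind, so it counts once.
coreBindingCount :: Session -> FilePath -> IO (Maybe Int)
coreBindingCount session file = do
  mb_binds <- compileToCore session file
  return (fmap length mb_binds)
```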

If you need access to the type declarations contained within a module as well, use compileToCoreModule instead:

compileToCoreModule :: Session -> FilePath -> IO (Maybe CoreModule)

compileToCoreModule is similar to compileToCore, except that it returns a CoreModule, which has the following definition:

data CoreModule
  = CoreModule {
      -- Module name
      cm_module   :: !Module,
      -- Type environment for types declared in this module
      cm_types    :: !TypeEnv,
      -- Declarations
      cm_binds    :: [CoreBind]
    }

The type TypeEnv is defined in HscTypes.lhs.

Interactive evaluation

Interactive evaluation à la GHCi is done by runStmt. Evaluation always happens in a current context, i.e., the set of modules in scope. You most likely want at least the Prelude and the modules you loaded in the previous section. The context is manipulated with:

setContext :: Session
           -> [Module]    -- their top levels will be visible
           -> [Module]    -- their exports will be visible
           -> IO ()
getContext :: Session -> IO ([Module], [Module])
findModule :: Session -> ModuleName -> Maybe PackageId -> IO Module
mkModule :: PackageId -> ModuleName -> Module
mkModuleName :: String -> ModuleName
PackageConfig.stringToPackageId :: String -> PackageId

Every module given to setContext must either be in a package known to the session or have been loaded as described in the previous subsection. Example:

-- equivalent to GHCi's :m Prelude Control.Monad *Main
prelude <- findModule session (mkModuleName "Prelude") Nothing
monad <- findModule session (mkModuleName "Control.Monad") Nothing
usermod <- findModule session (mkModuleName "Main") Nothing  -- we have loaded this
setContext session [usermod] [prelude,monad]

You can also be specific about packages by passing Just a PackageId to findModule, use mkModule instead of findModule, or even use some of the module query functions in the next subsection.

Having set a useful context, we're now ready to evaluate.

runStmt :: Session -> String -> IO RunResult
data RunResult
    = RunOk [Name]    -- names bound by the expression
    | RunFailed
    | RunException GHC.IOBase.Exception  -- that's Control.Exception.Exception

Example:

runStmt session "let n = 2 + 2"  -- n is bound
runStmt session "n"              -- 4 is printed (note "it" is bound)

(Interactive evaluation works in BatchCompile mode too! There are still other subtle differences, so this is not recommended.)
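The following sketch puts context setup and runStmt together: it brings the Prelude into scope, evaluates a few statements, and summarises each RunResult. It is written against the GHC 6.6 signatures listed above; ghcRoot is a hypothetical path, and describe is an illustrative helper, not part of the API.

```haskell
-- Sketch against the GHC 6.6 API described above; compile with: ghc -package ghc Eval.hs
module Main (main) where

import GHC
import DynFlags (DynFlags (..), HscTarget (..), defaultDynFlags)
import Outputable (ppr, showSDoc)

-- Hypothetical installation directory: adjust to your system.
ghcRoot :: FilePath
ghcRoot = "/usr/local/lib/ghc-6.6"

-- Illustrative helper (not part of the API): summarise one runStmt outcome.
describe :: RunResult -> String
describe (RunOk names)    = "ok, bound: " ++ unwords (map (showSDoc . ppr) names)
describe RunFailed        = "failed to compile"
describe (RunException e) = "exception: " ++ show e

main :: IO ()
main = defaultErrorHandler defaultDynFlags $ do
  session <- newSession Interactive (Just ghcRoot)
  flags   <- getSessionDynFlags session
  _ <- setSessionDynFlags session flags { hscTarget = HscInterpreted }
  -- Equivalent to GHCi's :m Prelude
  prelude <- findModule session (mkModuleName "Prelude") Nothing
  setContext session [] [prelude]
  mapM_ (\stmt -> runStmt session stmt >>= putStrLn . describe)
        ["let n = 2 + 2", "n", "head []"]
```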

Type checking

What if I want the type info from a module?

Once modules are loaded into the session, they have already been type-checked. The type information of a loaded module is stored in a data structure called ModuleInfo, which you obtain by applying the function getModuleInfo to the module.

ModuleInfo is defined as follows,

data ModuleInfo = ModuleInfo {
	minf_type_env  :: TypeEnv,
	minf_exports   :: NameSet, -- ToDo, [AvailInfo] like ModDetails?
	minf_rdr_env   :: Maybe GlobalRdrEnv,	-- Nothing for a compiled/package mod
	minf_instances :: [Instance]
#ifdef GHCI
        ,minf_modBreaks :: ModBreaks 
#endif
	-- ToDo: this should really contain the ModIface too
  }

The field minf_type_env holds the type environment, of type TypeEnv, which is defined in HscTypes as

type TypeEnv = NameEnv TyThing   -- a map from Name to TyThing; modInfoTyThings returns its elements as a list

where TyThing can be an identifier, a class, a type constructor or a data constructor.

data TyThing = AnId     Id
	     | ADataCon DataCon
	     | ATyCon   TyCon
	     | AClass   Class

Returning to the running example from the previous subsection, the variable usermod holds the user-defined module Main. We retrieve the module information for Main and extract the list of TyThings from its type environment:

mb_userModInfo <- getModuleInfo session usermod
case mb_userModInfo of 
  Just userModInfo ->
    let userTyThings = modInfoTyThings userModInfo              -- access the type environments
        userTys = [ (i, idType i) | (AnId i) <- userTyThings ]  -- we are only interested in the declared ids and their (inferred) types. 
    in  ... -- do something with userTys
  Nothing -> return ()
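A complete version of this traversal, sketched for GHC 6.6, prints each identifier together with its type using the pretty-printer from GHC's Outputable module. It assumes the module has already been loaded and that modul came from findModule, as in the running example; printIdTypes is a hypothetical name.

```haskell
-- Hypothetical helper module, sketched against the GHC 6.6 API.
module IdTypes (printIdTypes) where

import GHC
import Outputable (ppr, showSDoc, text, (<+>))

-- Print "name :: type" for every identifier in the type environment of a loaded module.
printIdTypes :: Session -> Module -> IO ()
printIdTypes session modul = do
  mb_info <- getModuleInfo session modul
  case mb_info of
    Nothing   -> putStrLn "no module information (is the module loaded?)"
    Just info ->
      let ids = [ i | AnId i <- modInfoTyThings info ]  -- keep only identifiers
      in  mapM_ (\i -> putStrLn (showSDoc (ppr i <+> text "::" <+> ppr (idType i)))) ids
```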


Queries

-- Get module dependency graph
getModuleGraph :: Session -> IO ModuleGraph    -- ModuleGraph = [ModSummary]
-- Get bindings
getBindings :: Session -> IO [TyThing]
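As a sketch (GHC 6.6 assumed; the helper names are hypothetical), the two queries can be used to print the names of the modules in the dependency graph and the names bound so far in an interactive session:

```haskell
-- Hypothetical helper module, sketched against the GHC 6.6 API.
module GraphQueries (printModuleGraph, printBindings) where

import GHC
import HscTypes (ms_mod)
import Module (moduleName, moduleNameString)
import Name (getName)
import Outputable (ppr, showSDoc)

-- One line per module in the dependency graph (ModuleGraph = [ModSummary]).
printModuleGraph :: Session -> IO ()
printModuleGraph session = do
  graph <- getModuleGraph session
  mapM_ (putStrLn . moduleNameString . moduleName . ms_mod) graph

-- One line per thing bound interactively so far (e.g. by runStmt).
printBindings :: Session -> IO ()
printBindings session = do
  things <- getBindings session
  mapM_ (putStrLn . showSDoc . ppr . getName) things
```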

Messages

Compiler messages (including progress, warnings, errors) are controlled by verbosity and routed through a callback mechanism. These are fields in DynFlags:

data DynFlags = DynFlags { ...,
    verbosity :: Int,
    log_action :: Severity -> SrcLoc.SrcSpan -> Outputable.PprStyle -> ErrUtils.Message -> IO () }

The verbosity field corresponds to the command-line -v flag, so parseDynamicFlags can set it. With a low verbosity, log_action is called less often.

You can set the callback to your own logger, like this:

f <- getSessionDynFlags session
setSessionDynFlags session f{log_action = my_log_action}
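What my_log_action does is up to you. As one possible sketch for GHC 6.6, the hypothetical helper below builds a logger that renders each message with its source span and appends it to an IORef, so your program can inspect compiler output after load or runStmt returns. It ignores the Severity argument.

```haskell
-- Hypothetical helper module, sketched against the GHC 6.6 API.
module CollectingLogger (collectingLogAction, installCollectingLogger) where

import Data.IORef (IORef, modifyIORef, newIORef)
import GHC
import DynFlags (DynFlags (..))
import ErrUtils (Message, Severity)
import Outputable (PprStyle, ppr, showSDoc, withPprStyle, (<+>))
import SrcLoc (SrcSpan)

-- A log_action that renders "span message" in the requested style and appends it to ref.
collectingLogAction :: IORef [String]
                    -> Severity -> SrcSpan -> PprStyle -> Message -> IO ()
collectingLogAction ref _severity srcSpan style msg =
  modifyIORef ref (++ [showSDoc (withPprStyle style (ppr srcSpan <+> msg))])

-- Install the logger on a session; read the returned ref later to see the messages.
installCollectingLogger :: Session -> IO (IORef [String])
installCollectingLogger session = do
  ref <- newIORef []
  f   <- getSessionDynFlags session
  _   <- setSessionDynFlags session f { log_action = collectingLogAction ref }
  return ref
```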

This sets the session's logger, but it will not see exceptions.

If you call defaultErrorHandler at the outermost level, its message logger has to be set separately, since it takes its own DynFlags:

main = defaultErrorHandler defaultDynFlags{log_action = my_log_action} $ do ...

This logger will see messages produced by defaultErrorHandler upon exceptions.

Interactive mode example

The files Media:Interactive-6.6.hs and Media:Interactive-6.8.hs (requires Media:MyPrelude.hs) serve as examples of using GHC as a library in interactive mode. They also show how to replace some of the standard Prelude functions with modified versions. See the comments in the code for further information.

Using the GHC library from inside GHCi

This works to some extent. However, be careful about loading object code: there is only a single linker symbol table in the runtime, so GHCi shares it with the new GHC session.

$ ghci -package ghc
Prelude> :m + GHC PackageConfig
Prelude GHC> session <- newSession Interactive (Just "/usr/local/lib/ghc-6.6")
Prelude GHC> setSessionDynFlags session =<< getSessionDynFlags session
Prelude GHC> setContext session [] [mkModule (stringToPackageId "base") (mkModuleName "Prelude")]
Prelude GHC> runStmt session "let add1 x = x + 1"
Prelude GHC> runStmt session "add1 2"
3

Profiling

To build the profiling version of GHC-as-a-library, add:

GhcCompilerWays=p

to your build.mk file, and rebuild GHC.

Binary size

Using the GHC API in your applications results in large executables (e.g. over 15 MB). You can shrink them enormously with the tools strip and gzexe, which may reduce the executable to 15-30% of its previous size. For example:

"I will take this time to point out that using the GHC API in your applications results in *large* executables. The Interact example above when compiled with vanilla --make options resulted in a whopping 17mb executable. I've observed however you can mitigate this by an enormous amount using the tools strip and gzexe [see also upx. -ed], taking it down to a light 2.5mb (a size reduction of about 85%)."[1]

(Using 6.8.1 for me results in a 16M binary using the example interactive session from this page, even when thoroughly stripped. Using UPX on its most intensive settings brought it down to 2.9M. --Gwern)

Links to other info about the GHC API