Difference between revisions of "Toolmaking libraries"

Revision as of 19:38, 24 March 2007

Basic concept

Writing tools involves duplicating work that gets done over and over, and which is affected by the existence of extensions. So let's define interfaces for libraries that make this easier and build some initial implementations, while letting new Haskell implementations with new extensions provide their own implementations that hopefully make the extensions less painful to deal with.

Views

Different tools need different information about code - for example, haddock doesn't care about terms at all, whereas other tools might want to know about dependencies and yet others might perform abstract interpretations on code. So we should provide different views of a source file, with different modules implementing them.

Apps like Haddock need to see the bindings present in a module and some information about those bindings, but no terms. Some term-relevant analyses could still be usefully provided - so long as they're done on-demand, apps that don't need them won't be restricted by tighter well-formedness constraints.
Other applications could use a desugared 'core' language (and preferably means to find the corresponding source for a given piece of core code). At least two variants would make sense:
- An untyped core. Producing such a language isn't too hard, though there are arguments to be had as to how to handle type classes.
- An explicitly typed core. This could be trickier - the Right Thing would appear to be a core sufficiently expressive to handle type-preserving translations from the numerous extensions kicking around. Type classes are again an issue, in that much of the work involved with them is in resolving which instance to use - this can be dealt with, but requires some thought.

Proposed interfaces

Bindings only

A sketch proposal for a "just the names" interface (... where I've not filled in concrete details yet, feel free to add in the obvious):

-- [Item] corresponds to the order the items appear in the source
data Module = Module Identifier [Export] [Item]
data Export = ...
data Item = Comment String |
            Function FuncDec |
            Fixity FixityDec |
            Type TypeDec |
            Class ClassDec |
            Instance InstanceDec |
            Import ImportDec |
            ... -- new extensions may add new constructors, tools should handle these gracefully

data FuncDec = FuncDec Identifier (Maybe Type)
data FixityDec = FixDec Fixity Int 
data Fixity = FLeft | FRight | FNon
data TypeDec = TypeDec {name :: Identifier,
                        parms,
                        constructors :: [DataConDec],
                        ...} |
               TypeSynonym ... |
               NewType ...
data ClassDec = ... -- keep abstract
data InstanceDec = ... -- keep abstract
data ImportDec = ...

data Type = ... -- keep abstract

-- parse source
readSource :: String -> Module

{- queries on Module -}
funcs :: Module -> [FuncDec]
types :: Module -> [TypeDec]
classes :: Module -> [ClassDec]
instances :: Module -> [InstanceDec]

{- compatibility - extended/custom version of lib will want additional such queries -}
class MaybeH98 dec where
  isH98 :: dec -> Bool

-- instances of MaybeH98 for all Dec structures

{- queries on InstanceDec -}
-- print out the 'head' of the instance, "instance <class> <parms>"
showInstanceHead :: InstanceDec -> String
showInstance :: InstanceDec -> String -- use for Show instance

{- queries on FuncDec -}
-- clearly I have done some handwaving, as FuncDecs'd need tying to their source for this
inferType :: FuncDec -> Type

{- queries on Type -}
showType :: Type -> String -- use for Show instance, should be legit Haskell code

-- Any other useful queries want adding?

Core languages

How about building our untyped core around this language, after going over it with a comb to check it's adequately expressive?:

type Program = [Dec]

data Dec = Dec Identifier Term

data Term = Var Identifier |
            App Term [Term] |
            Lam [Identifier] Term |
            Let [Dec] Term | -- recursive let
            Case Term [Alt] -- simple case

data Alt = Alt {constructor :: Identifier, 
                bindings :: [Identifier],
                result :: Term}

Important design choices to note: this isn't a supercombinator language or a variant of ANF, it's an ordinary untyped lambda calculus with minimal extensions to capture pattern matching and binding structure. Including lambda lifting and A-normalisation functions in the library would probably be a good idea. An interpreter may not go amiss either.

A sensible typed core might use explicit instance lambdas and applications as well as type lambdas and applications in the vein of System F. This would leave resolving which instance to supply to the type inference process. It would need to leave extension mechanisms for new constructs in order to naturally accomodate changes such as GADTs - System Fc is preferable to hacking in casts where it can be used. The library would definitely have to include a typechecker.