Yhc/API/Compiler

Part of Yhc

I want an API to Yhc: GHC has one, and I don't want the other compiler authors laughing at us :) This page is notes towards a potential API, rather than documentation of an existing one. If you would like an additional feature, just list it.

Sample Definitions

-- first stage, lexing!
data Pos = Pos {file :: FileName, line :: Int, pos :: Int}
data Lex = {- Lexemes -} | InlineComment String | MultilineComment String | WhiteSpace String
type LexPos = (Lex, Pos)

lex :: String -> [LexPos] -- lex Haskell bits
lexAll :: String -> [LexPos] -- also lex white space and comments

output :: ParseTree -> [LexPos] -> String

-- QUESTION:
-- should the standard lexer just lex white space, and pass it onwards
-- what about moving to BLG?
-- that will handle bracketing first, then lexing
-- you can always run the BLG in Lex only mode, just not do it that way for the compiler
-- or offer to dump the bracketed version out before higher merging has been done

-- ONE OPTION:
-- lexOnly - not used by the compiler at all, using the lex phase plus some additional rules?
--           maybe entirely separate, just distribute a standalone lexer if people want it?
-- bracketLex - first step, as the compiler does it - a bracketLex tree, possibly with comments/spaces

-- Group should not be run when spaces and comments are in the source!
-- would anyone ever want the source with just lexing, i.e. before bracketing?

-- we want round tripping, converting from input to output, can that be done?

data ParseTree = ... | Position Pos ParseTree

data Lex = ... | Comment String

data Pos = Pos String Int Int

-- maybe with an annotation mechanism
data ParseTree x = ... | ParseAnnotation x (ParseTree x)

parse :: FilePath -> IO ParseTree
lexer :: FilePath -> IO [(Pos, Lex)]

dependencies :: ParseTree -> IO [FilePath]

typeCheck :: ParseTree -> TypedParseTree

desugar :: ParseTree -> ParseTree

desugarCase :: ParseTree -> ParseTree
desugarListComprehension :: ParseTree -> ParseTree

bytecode :: ParseTree -> ByteCode

-- we were going to have a separate bytecode manipulation library, maybe put that in here?


Round Tripping

You want to parse the code to the parse tree, modify it in some way, then write it out again. How?

One option: lex returns everything, whitespace and comments included. Then there is a parse tree phase. Code that modifies the parse tree can go back and look at the lex output, along with the positions of the parse tree elements, and decide where things should end up.

But this assumes lexing comes first, which we don't want: we want preprocess, bracket, lex, group (at least I do, and I will fight for this). So how? You can still lex, take the tokens returned from lex, intersperse them with the ParseTree, then write them out again. It means lexing and bracketing the whole thing twice, but really, is that a problem? It will not be done by the compiler, but would be done by the type annotator, by HaRe, and I'm sure we can think of a few more utilities. Round tripping would be really nice.
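As a rough illustration of why a lex that "returns everything" makes round tripping possible, here is a minimal sketch. It is not Yhc code: `Tok`, `toyLexAll`, `unlex` and `rename` are hypothetical names, and the toy lexer knows nothing about string literals, `{- -}` comments or operators that start with `--`. The point is only that if every input character ends up in exactly one token, concatenating the token text gives back the source, and an edit to a code token leaves the layout and comments alone.

```haskell
import Data.Char (isSpace)

-- Hypothetical token type: every input character lands in exactly one token.
data Tok
  = Code String           -- anything that is not white space or a comment
  | WhiteSpace String
  | InlineComment String  -- "--" up to, but not including, the newline
  deriving (Eq, Show)

-- Toy lexer that keeps white space and comments (an assumption-laden stand-in
-- for the lexAll proposed on this page).
toyLexAll :: String -> [Tok]
toyLexAll [] = []
toyLexAll s@('-':'-':_) =
  let (c, rest) = break (== '\n') s in InlineComment c : toyLexAll rest
toyLexAll s@(c:_)
  | isSpace c = let (w, rest) = span isSpace s in WhiteSpace w : toyLexAll rest
  | otherwise = let (w, rest) = breakCode s in Code w : toyLexAll rest

-- Take characters up to the next white space or the start of a "--" comment.
breakCode :: String -> (String, String)
breakCode s@('-':'-':_) = ([], s)
breakCode (c:cs)
  | isSpace c = ([], c:cs)
  | otherwise = let (w, rest) = breakCode cs in (c:w, rest)
breakCode [] = ([], [])

-- The text a token was built from.
tokText :: Tok -> String
tokText (Code s) = s
tokText (WhiteSpace s) = s
tokText (InlineComment s) = s

-- Writing the tokens back out is just concatenation.
unlex :: [Tok] -> String
unlex = concatMap tokText

-- Example modification: rename an identifier, touching nothing else.
rename :: String -> String -> [Tok] -> [Tok]
rename old new = map step
  where
    step (Code s) | s == old = Code new
    step t = t
```

With `src = "foo x = x  -- identity\nbar = foo 1\n"`, `unlex (toyLexAll src)` gives back `src` unchanged, and `unlex (rename "foo" "identity" (toyLexAll src))` changes the two uses of `foo` while the double space and the comment stay where they were. A real tool would work from the ParseTree positions rather than raw tokens, which is the harder part this sketch leaves out.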

Users

Potential or real; these are really just use cases, so we can make sure we've got all the functionality.

Catch

Formal verifier for Haskell programs. Will want parsing and desugaring, but no type checking. Finding out where desugared lumps came from is important.
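One way to keep track of where desugared lumps came from is the annotation mechanism floated in the sample definitions. The sketch below is only an illustration on a made-up miniature tree: `Expr`, `Origin`, `desugarIf` and `originPos` are hypothetical and not part of any existing Yhc API. It rewrites `if` into `case` and records on the new node which construct it replaced and at which position.

```haskell
-- Source position, mirroring the three-field Pos from the sample definitions.
data Pos = Pos FilePath Int Int
  deriving (Eq, Show)

-- Hypothetical miniature tree with an annotation node, in the style of
-- ParseAnnotation x (ParseTree x).
data Expr x
  = Var String
  | If (Expr x) (Expr x) (Expr x)
  | Case (Expr x) [(String, Expr x)]
  | Annot x (Expr x)
  deriving (Eq, Show)

-- Where a node came from: written by the user, or produced by desugaring
-- the named construct found at a position.
data Origin
  = FromSource Pos
  | DesugaredFrom String Pos
  deriving (Eq, Show)

originPos :: Origin -> Pos
originPos (FromSource p) = p
originPos (DesugaredFrom _ p) = p

-- Rewrite if-expressions into case-expressions. An annotated 'if' keeps its
-- position but is re-labelled as desugared; an unannotated one has no
-- position to carry over, so it is rewritten without a label.
desugarIf :: Expr Origin -> Expr Origin
desugarIf (Annot o (If c t e)) =
  Annot (DesugaredFrom "if" (originPos o))
        (Case (desugarIf c) [("True", desugarIf t), ("False", desugarIf e)])
desugarIf (Annot o e) = Annot o (desugarIf e)
desugarIf (If c t e) =
  Case (desugarIf c) [("True", desugarIf t), ("False", desugarIf e)]
desugarIf (Case s alts) = Case (desugarIf s) [(k, desugarIf a) | (k, a) <- alts]
desugarIf (Var v) = Var v

-- A small input: an 'if' at line 3, column 5 of M.hs.
example :: Expr Origin
example = Annot (FromSource (Pos "M.hs" 3 5)) (If (Var "b") (Var "x") (Var "y"))
```

Here `desugarIf example` yields a `Case` node wrapped in `Annot (DesugaredFrom "if" (Pos "M.hs" 3 5))`, so a verifier reporting a problem in the `case` could point back at the original `if`.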

HaRe

Refactorer. Important to be able to output the syntax tree again, with comments intact and with indentation preserved exactly as it was before.

Salmon

Like Haddock. Important to see comments, and be able to attach them. A type checker would be very handy.
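A minimal sketch of what "attaching" comments could mean, assuming the simplest rule: each run of comments belongs to the declaration that follows it. `Item`, `Documented` and `attach` are hypothetical names for illustration, and the flat `Item` stream stands in for whatever a comment-preserving lexer and parser would really provide.

```haskell
-- Hypothetical, simplified input: comments and declaration names in source order.
data Item = Comment String | Decl String
  deriving (Eq, Show)

-- A declaration name paired with the comments that immediately precede it.
type Documented = (String, [String])

-- Attach each run of comments to the next declaration.
-- Comments with no following declaration are dropped.
attach :: [Item] -> [Documented]
attach = go []
  where
    go _ [] = []
    go acc (Comment c : rest) = go (c : acc) rest
    go acc (Decl d : rest) = (d, reverse acc) : go [] rest
```

For example, `attach [Comment "a", Comment "b", Decl "f", Decl "g", Comment "c"]` gives `[("f", ["a", "b"]), ("g", [])]`. Other attachment rules (trailing comments, comments on arguments) would need position information from the lexer.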