GHC/GHCi debugger

From HaskellWiki
Revision as of 08:18, 17 February 2007 by DonStewart

The GHCi Debugger project extends ghci with basic debugging capabilities. The GHC 6.7 documentation includes a section on the debugger.

This page is a dump of the designs, ideas, and results of the project. Check here for a Quicktime video demonstrating the use of the debugger.

Current status

UPDATE: The debugger is currently available in GHC 6.7 nightly builds, so there is no excuse anymore. GHC 6.8 will probably include a debugger, which may or may not be based on this one.

Feature-wise:

  • Stack Traces have been dropped and are not being pursued at the moment
  • The Closure viewer includes support for everything in GHC, but currently GADTs might be problematic
  • Dynamic breakpoints are fully supported

Intermediate closure viewer

The closure viewer is intended to permit working with polymorphic values in breakpoints, as well as to explore intermediate computations without altering the evaluation order.

This feature is now more or less complete. It currently provides two new GHCi commands, :print and :sprint, both used in the same way as :type or :info. :sprint prints a semi-evaluated closure, using underscores to represent suspended computations (much as Hood does). :print does the same and additionally binds these thunks to variable names, so that you can work with them.

Example:

Prelude> let li = map Just [1..5]
Prelude> length li
5
Prelude> :sp li
li - [_,_,_,_,_]

Prelude> head li
Just 1

Prelude> :sp li
li - [Just 1,_,_,_,_]

Prelude> last li
Just 5

Prelude> :sp li
li - [Just 1,_,_,_,Just 5]

Prelude> :p li
li - [Just 1, (_t1::Maybe Integer),(_t2::Maybe Integer),(_t3::Maybe Integer),Just 5]

Prelude> _t1 `seq` ()

Prelude> :p li
li - [Just 1, Just 2,(_t3::Maybe Integer),(_t4::Maybe Integer),Just 5]

Prelude> _t2
Just 3

Its best feature is that it works without type information, so you can display polymorphic values whose type you do not know. When type information is available, it is used. As a result, it can work with opaque or coerced types. For instance:

data Opaque = forall a. O a
*Test2> let li = map Just [1..5]
*Test2> let o = O li
*Test2> head li `seq` ()
*Test2> length li `seq` ()
*Test2> :p o
o - O [Just 1,(_t1::Integer),(_t2::Integer),(_t3::Integer),(_t4::Integer)]

In the example above the li inside o has an opaque existential type. However, the closure viewer makes it possible to recover its type when it gets evaluated.

Other proposed extensions are a safeCoerce function (less useful, since it depends on ghc-api) and an unsafeDeepSeq (which is decoupled from ghc-api). There is also an isFullyEvaluated query function, which is generally useful for compiler and tool developers. Their signatures are:

isFullyEvaluated :: a -> IO Bool 
unsafeDeepSeq    :: a -> b -> b 
safeCoerce       :: GHC.Session -> a -> Maybe b
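
To illustrate what deep sequencing means here, the following is a minimal sketch and not the proposed implementation. The proposed unsafeDeepSeq needs no class constraint because it inspects closures directly; this stand-in uses a hypothetical type class (DeepForce) instead, and the name deepSeqSketch is likewise an assumption made for illustration.

```haskell
-- Class-based stand-in for the idea behind unsafeDeepSeq.
-- All names here (DeepForce, deepForce, deepSeqSketch) are hypothetical.
class DeepForce a where
  -- Evaluate the argument completely, then return ().
  deepForce :: a -> ()

instance DeepForce Integer where
  deepForce n = n `seq` ()

instance DeepForce a => DeepForce (Maybe a) where
  deepForce Nothing  = ()
  deepForce (Just x) = deepForce x

instance DeepForce a => DeepForce [a] where
  deepForce []       = ()
  deepForce (x : xs) = deepForce x `seq` deepForce xs

-- Same shape as unsafeDeepSeq :: a -> b -> b, but with a class constraint.
deepSeqSketch :: DeepForce a => a -> b -> b
deepSeqSketch a b = deepForce a `seq` b
```

After evaluating deepSeqSketch li () for the li of the earlier example, :sprint li should show no underscores, whereas a plain seq would only evaluate li to its first constructor.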

Finally, note that there are some inconveniences with the current implementation, such as :p binding the same closure to different names when used twice on the same closure, but they are minor and temporary (hopefully).

Dynamic breakpoints

For user-level details of the current implementation, see the GHC User's Guide.

Event sites and events

We define 'event sites' as points in the code where you might want to set a breakpoint. Current candidates for sites are:

  • On the entrance to a function / lambda abstraction
  • Prior to function applications (this one does not make sense unless it forces the application using $!)
  • Local bindings in lets and wheres
  • Entrance to statements in monadic-do code
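
As an illustration, the comments in the small program below mark where the candidate sites listed above would fall. The functions (mean, doubleAll, report) are hypothetical examples, and nothing in the code is actually instrumented.

```haskell
-- Hypothetical example code; comments mark candidate event sites only.

-- site: entrance to the function
mean :: [Double] -> Double
mean xs = total / count
  where
    total = sum xs                     -- site: local binding (where)
    count = fromIntegral (length xs)   -- site: local binding (where)

doubleAll :: [Double] -> [Double]
doubleAll = map (\x -> 2 * x)          -- site: entrance to the lambda abstraction

report :: [Double] -> IO Double
report xs = do
  putStrLn "computing mean"            -- site: entrance to a do statement
  let m = mean $! doubleAll xs         -- site: application forced with $!
  return m                             -- site: entrance to a do statement
```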

Overlapping or unnecessary events should be coalesced into a single one. The rationale for what counts as an event is to find a middle ground between the user's interests and the overhead introduced:

  • We want to keep the overhead manageable, thus we want to keep the number of breakpoints low.
  • The user wants to introduce breakpoints at will.

Credit goes to both A. Tolmach's ML debugger and the OCaml time-travel debugger for providing the inspiration.

Proposals

There are currently the following proposals:

  • Instrument the code with a conditional breakpoint at every event site. Sites are numbered, and the condition uses a site-indexed array to check if there is a breakpoint enabled. The array is maintained inside ghci. Hopefully not much magic is required for this one.
  • In the style of the previous one, but no array is maintained. All the breakpoint conditions are set to False, so almost no overhead is incurred. When the user demands a breakpoint, its BCO in the heap is rewritten to enable the breakpoint. Feasibility of this?
  • Don't use instrumentation. Have a new header for BCOs with breakpoints, say BCO_BREAK, and change headers at run time on user demand (as in the previous proposal). The problem I see with this one is how to extract the local bindings. I don't fully grok the scheme Lemmih uses to do that yet.
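
The first proposal can be sketched roughly as follows. This is a minimal illustration and not the code used in GHCi: all names (SiteId, BreakTable, newBreakTable, setBreakpoint, atSite) are hypothetical, and the sketch assumes the array package that ships with GHC.

```haskell
import Control.Monad (when)
import Data.Array.IO (IOUArray, newArray, readArray, writeArray)

-- Hypothetical names throughout; this is not the GHCi implementation.
type SiteId = Int
type BreakTable = IOUArray SiteId Bool

-- One flag per numbered event site, all initially disabled.
newBreakTable :: Int -> IO BreakTable
newBreakTable nSites = newArray (0, nSites - 1) False

-- Enable or disable the breakpoint at a given site.
setBreakpoint :: BreakTable -> SiteId -> Bool -> IO ()
setBreakpoint = writeArray

-- The conditional breakpoint that instrumentation would plant at a site:
-- the handler runs only if the flag for this site is set.
atSite :: BreakTable -> SiteId -> IO () -> IO ()
atSite table site onBreak = do
  enabled <- readArray table site
  when enabled onBreak
```

With this shape, a disabled site costs one array read and a branch, which is the overhead the second and third proposals try to avoid.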

During this project we have explored the first one, under the motto "do the simplest thing that could possibly work". I'm sure there are many other designs. Please add your proposal or just throw an idea in.

Call traces

We want to have strict call traces, not lazy ones.

Proposals

  • It has been suggested that stealing ideas from Cost-Centre Stacks may be useful.

  • Based on Tolmach's debugger, we can instrument the source code to build a timeline of events (either lazily or not). Each event contains a pointer to its lexical parent event. With that it should be possible to extract a call trace:
  1. CASE 1: We are in a Function definition (FN):
    1. Go back one step in the timeline: it necessarily is an application (APP)
    2. Go back to its 'binding', i.e. its lexical parent. Keep doing this until it is a FN, then start again from case 1.
    3. Once you reach the top, i.e. the 0 event, you are done. Display all the APPs you encountered on the way.
  2. CASE 2: We are in a site other than a FN:
    1. Go back lexically until you hit a FN and continue with case 1.

This is just a wild, untested idea, and it may not work. Even if it did, the overhead might be unacceptable. WON'T WORK WITH LAZINESS.
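
The walk described in the two cases above can be sketched as follows. This is an untested-in-practice illustration under stated assumptions: the Event representation, the names (Kind, Event, callTrace, example), and the sample timeline are all hypothetical. It assumes strict evaluation, that event 0 is the top and is its own lexical parent, and that every FN event is immediately preceded on the timeline by the APP that entered it.

```haskell
-- Hypothetical event representation; indices into the list are timeline positions.
data Kind = FN | APP | Other deriving (Eq, Show)

data Event = Event
  { kind   :: Kind
  , parent :: Int      -- timeline index of the lexical parent event
  , label  :: String
  }

type Timeline = [Event]

-- Indices of the APP events on the call path, innermost call first.
callTrace :: Timeline -> Int -> [Int]
callTrace tl = fromSite
  where
    ev i = tl !! i

    -- Case 2 and step 1.2: climb lexical parents until a FN or the top.
    fromSite 0 = []
    fromSite i
      | kind (ev i) == FN = fromFn i
      | otherwise         = fromSite (parent (ev i))

    -- Case 1: the event just before a FN on the timeline is its APP.
    fromFn i
      | kind (ev app) == APP = app : fromSite (parent (ev app))
      | otherwise            = []   -- assumption violated; stop here
      where
        app = i - 1

-- Sample timeline: top level applies f, whose body applies g.
example :: Timeline
example =
  [ Event Other 0 "top"     -- 0
  , Event APP   0 "f 3"     -- 1: application at top level
  , Event FN    0 "f"       -- 2: entrance to f
  , Event APP   2 "g x"     -- 3: application inside the body of f
  , Event FN    0 "g"       -- 4: entrance to g
  , Event Other 4 "let y"   -- 5: local binding inside g
  ]
```

For the sample timeline, callTrace example 5 yields [3,1], i.e. the applications "g x" and "f 3".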

Integration

Allowing other tools to integrate with the debugger is an important goal. It should not be taken lightly though.

  • It has been suggested to create a client/server protocol so that the debugger can be used by other tools.
  • On the other hand, arguably it would be much easier to provide integration to clients of the ghc-api via some form of debugger api.
  • Finally, it should be possible to derive the client/server architecture as an afterthought provided there is a debugger api in the ghc-api.

Further pointers

  1. Rectus, Oleg Mürk and Lennart Kolmodin
  2. The OCaml Debugger, The OCaml Team
  3. A debugger for Standard ML, A. Tolmach, A. Appel
  4. The original discussion in the ghc-cvs mailing list

How to get the patches

The patches are available at the SoC ghc.debugger darcs repo:

darcs get --partial http://darcs.haskell.org/SoC/ghc.debugger

This is a modified version of GHC 6.6. Build it following the instructions at the GHC developers wiki.

If darcs-all does not do it for you, you will need to manually pull a few patches into the libraries/base repo from http://darcs.haskell.org/SoC/ghc.debugger/libraries/base

Have fun! (and feel free to spam me with bugs, suggestions or requests!)