Difference between revisions of "Literate programming"

Revision as of 17:14, 25 June 2016

Literate programming was introduced by Dr. Donald Knuth. In fact, if you asked Dr. Knuth what his favourite programming language was (5th question of http://www-cs-faculty.stanford.edu/~knuth/faq.html), you would be told CWEB, which is a literate programming tool combining C and TeX.

What is literate programming? To quote Dr. Knuth:

"The main idea is to regard a program as a communication to human beings rather than as a set of instructions to a computer."

The tool used to generate hyperlinked documentation from literate code is Haddock.

Haskell and literate programming

Haskell is one of the few languages that provides native features to support literate programming. In Haskell, a literate program is one with the suffix .lhs rather than .hs.

In a literate Haskell program, there are two ways to distinguish between code and non-code portions. You can either prefix every line of code with > (Bird style) or surround lines of code with \begin{code} and \end{code} pairs (LaTeX style). For those who know, use and love LaTeX, the latter is the suggested way to go.

Note that Happy supports literate programming as well, via bird style (the > marker) in .ly files. Sadly, Alex no longer does. You might consider using noweb with it.

Bird Style

According to the Haskell Report, this style of comment was developed by Richard Bird (hence the name) and Philip Wadler. All lines starting with > are interpreted as code, everything else is considered a comment. One additional requirement is that you always leave a blank line before and after the code block:

In Bird-style you have to leave a blank line before the code.

> fact :: Integer -> Integer
> fact 0 = 1
> fact n = n * fact (n-1)

And you have to leave a blank line after the code as well.

The idea behind this restriction is to catch the mistake of omitting the > mark at the beginning of a line. In general, this is not only good practice but also formatting that makes the code more readable.

However, there are cases in which you might like to get around this restriction. Perhaps you're writing Haskell code within a markup language that's not LaTeX, and you may have to surround your code with something equivalent to \begin{code} and \end{code}. In this case, GHC provides a flag that can be used to lift the blank-line requirement:
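The rule described above can be sketched in Haskell itself. This is only an illustration, not the preprocessor GHC actually uses: the names LineKind, classify and unlitBird are hypothetical, and the choices to replace the leading > with a space and to emit an empty line for every non-code line (so that line and column positions are preserved) are assumptions of this sketch.

```haskell
import Data.Char (isSpace)

-- | Kind of a single line in a Bird-style literate file (hypothetical type).
data LineKind = Code | Blank | Comment
  deriving (Eq, Show)

-- | A line starting with '>' is code; whitespace-only lines are blank.
classify :: String -> LineKind
classify ('>' : _) = Code
classify l
  | all isSpace l = Blank
  | otherwise     = Comment

-- | Recover the program text from Bird-style source.
-- Returns Left n if line n (1-based) breaks the blank-line rule,
-- i.e. a code line and a comment line are directly adjacent.
unlitBird :: String -> Either Int String
unlitBird src = go Blank (zip [1 ..] (lines src))
  where
    go _ [] = Right ""
    go prev ((n, l) : rest)
      | adjacent prev kind = Left n
      | otherwise          = fmap (emit kind l ++) (go kind rest)
      where
        kind = classify l

    adjacent Code Comment = True
    adjacent Comment Code = True
    adjacent _ _          = False

    -- Replace the '>' with a space; blank out everything that is not code.
    emit Code l = ' ' : drop 1 l ++ "\n"
    emit _    _ = "\n"
```

With this sketch, a comment line immediately followed by a > line is reported as an error at the > line, which is the kind of mistake the blank-line requirement is meant to expose.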

$ ghc -optL -q

Latex suggestions for literate programming

(See also the lhs2TeX section below.) In the majority of these suggestions, you can simply write:

\begin{code}
tsort []     = []
tsort (x:xs) = tsort [y | y<-xs, y>x] ++ [x] ++ tsort [y | y<-xs, y<=x]
\end{code}

and the code will be formatted as you requested.

The advantage: Source code and documentation are consistent! The code environment is understood by Haskell compilers, so you can run your documentation files directly.
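As a small illustration of running the documented code, the definition above can be loaded into GHCi and checked. The type signature and the helper isDescending are additions made for this sketch and are not part of the original example; note that tsort, as written, sorts in descending order.

```haskell
-- The definition from the example above, with an explicit signature added.
tsort :: Ord a => [a] -> [a]
tsort []     = []
tsort (x:xs) = tsort [y | y<-xs, y>x] ++ [x] ++ tsort [y | y<-xs, y<=x]

-- | Hypothetical helper: True when every element is >= its successor.
isDescending :: Ord a => [a] -> Bool
isDescending xs = and (zipWith (>=) xs (drop 1 xs))
```

For instance, evaluating isDescending (tsort [3, 1, 2]) in GHCi should yield True.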

Verbatim package

One can always simply use verbatim mode, which will format the code "as-is":

\usepackage{verbatim}
\newenvironment{code}{\footnotesize\verbatim}{\endverbatim\normalsize}

Listings package

Another alternative is to use the LaTeX package listings, which allows you to do much more customization of the output:

\usepackage{listings}
\lstnewenvironment{code}{\lstset{language=Haskell,basicstyle=\small}}{}

You can configure the appearance of the listings quite a bit! Some people find these settings to be the most satisfying:

\usepackage{listings}
\lstloadlanguages{Haskell}
\lstnewenvironment{code}
    {\lstset{}%
      \csname lst@SetFirstLabel\endcsname}
    {\csname lst@SaveFirstLabel\endcsname}
    \lstset{
      basicstyle=\small\ttfamily,
      flexiblecolumns=false,
      basewidth={0.5em,0.45em},
      literate={+}{{$+$}}1 {/}{{$/$}}1 {*}{{$*$}}1 {=}{{$=$}}1
               {>}{{$>$}}1 {<}{{$<$}}1 {\\}{{$\lambda$}}1
               {\\\\}{{\char`\\\char`\\}}1
               {->}{{$\rightarrow$}}2 {>=}{{$\geq$}}2 {<-}{{$\leftarrow$}}2
               {<=}{{$\leq$}}2 {=>}{{$\Rightarrow$}}2 
               {\ .}{{$\circ$}}2 {\ .\ }{{$\circ$}}2
               {>>}{{>>}}2 {>>=}{{>>=}}2
               {|}{{$\mid$}}1               
    }

You might want to consult the documentation of the "listings" package to find out whether there is more you can tune to your liking (adding line numbers, for example). Note that the suggested "literate" option above will replace the given symbols anywhere in the text, including inside strings, which is probably not what one wants.

For arrow notation, add the following line (it requires the MnSymbol package, or stmaryrd if you use \Yleft instead):

               {<<<}{{$\lll$}}2 {>>>}{{$\ggg$}}2 {-<}{{$\leftY$}}1

Hiding code from Latex

If you want to hide some code, you can, for example, define:

\long\def\ignore#1{}

Auxiliary functions can be hidden as follows:

\ignore{
\begin{code}
help = putStr "Help me, what is this LiterateProgramming thing??"
\end{code}
}

Thanks to Wolfram Kahl, Oliver Braun and the people of the German TeX-newsgroup.

Ciao, Steffen Mazanek

http://www.steffen-mazanek.de

Hiding code from Haskell

If you want to hide a \begin{code}...\end{code} block from the compiler, say, if you want to show an example in the text that is not actually part of the source code, you can just add a comment right after the "\begin{code}" statement. This will cause the Haskell parser to treat this block as text, not code:

And the definition of the following function
would totally screw up my program, so I'm not
defining it:

\begin{code}% this is not really code
main :: IO ()
main = print "just an example"
\end{code}

See?

While this works well for vanilla LaTeX, if you're using lhs2TeX, then you'll get the "% this is not really code" printed in your output. Instead of doing the above, either use \begin{spec} ... \end{spec}, or if you're writing Bird-style code, flip your ">" characters around:

Neither of the following definitions is really code:

\begin{spec}
main :: IO ()
main = print "another example"
\end{spec}

< main :: IO ()
< main = print "...and another"

Transformation of .lhs-files

Sub-pages here have scripts to convert from demarcation via > (Bird style) to \begin{code} and \end{code} pairs.
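The same conversion can also be sketched in Haskell. This is a minimal illustration, not one of the scripts from the sub-pages: the function name birdToLatex is hypothetical, and the sketch assumes that each maximal run of consecutive > lines forms one code block and that a single space after the > is part of the marker.

```haskell
import Data.List (isPrefixOf)

-- | Rewrite Bird-style code lines as \begin{code} ... \end{code} blocks.
-- All other lines are passed through unchanged.
birdToLatex :: String -> String
birdToLatex = unlines . go . lines
  where
    go [] = []
    go ls@(l : rest)
      | isBird l =
          -- Collect the whole run of '>' lines into one code block.
          let (code, others) = span isBird ls
          in  "\\begin{code}" : map strip code ++ "\\end{code}" : go others
      | otherwise = l : go rest

    isBird = (">" `isPrefixOf`)

    -- Drop the '>' and, if present, the single space that follows it.
    strip l = case drop 1 l of
      ' ' : xs -> xs
      xs       -> xs
```

Because Bird style already requires blank lines around code, the surrounding blank lines are kept as they are and the delimiters are placed directly around the code lines.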

Editors

Multi-mode support in Emacs

Another useful tool for literate programmers is the mmm-mode for Emacs. mmm-mode switches the current major mode of the buffer between two alternatives, depending on the context the cursor is in. If you're in, say, a \begin{code}...\end{code} block, you'll be editing in haskell-mode, but once you leave that block, you'll be editing in latex-mode.

I have managed to cook up a configuration for both literate styles, but surely some Emacs guru can enhance these. To configure mmm-mode for Haskell, add these lines to your .emacs file:

(add-hook 'haskell-mode-hook 'my-mmm-mode)

(mmm-add-classes
 '((literate-haskell-bird
    :submode text-mode
    :front "^[^>]"
    :include-front true
    :back "^>\\|$"
    )
   (literate-haskell-latex
    :submode literate-haskell-mode
    :front "^\\\\begin{code}"
    :front-offset (end-of-line 1)
    :back "^\\\\end{code}"
    :include-back nil
    :back-offset (beginning-of-line -1)
    )))

(defun my-mmm-mode ()
  ;; go into mmm minor mode when class is given
  (make-local-variable 'mmm-global-mode)
  (setq mmm-global-mode 'true))

(setq mmm-submode-decoration-level 0)

You can activate mmm-mode by running "M-x mmm-ify-by-class" in the buffer. Emacs will prompt you for the class to use, to which you should answer literate-haskell-bird or literate-haskell-latex, respectively.

If you want Emacs to activate mmm-mode automatically for certain literate Haskell files, add these lines at the end of each such file:

% ----- Configure Emacs -----
%
% Local Variables: ***
% mode: latex ***
% mmm-classes: literate-haskell-latex ***
% End: ***

This is what the my-mmm-mode hook does, by the way.

Vim

See Literate programming/Vim.

lhs2TeX

Highly recommended is lhs2TeX at http://www.cs.uu.nl/~andres/lhs2tex/, courtesy of Andres Löh. It is designed for typesetting papers about Haskell, but lhs2TeX is easily configured and can make for a powerful preprocessor and documentation generator.

Input to lhs2TeX is a slightly modified .lhs file. One would typically follow the standard LaTeX recommendations above, using a \begin{code} and \end{code} pair to demarcate code. Additionally, lhs2TeX provides specialized macros to control the preprocessing.

Note that lhs2TeX and in-line commenting do not seem to mix well.

Since it can typeset Haskell formulas in mathematical notation with LaTeX's math mode, you can also use it to create testable papers. That is, readers can play with the formulas presented in the paper if they obtain the literate Haskell source code for the paper.

Shuffle

Shuffle is a tool that extends the capabilities of lhs2TeX. It was used in documenting the Essential Haskell Compiler project. It is available as part of that project, but it can also be used independently.

@@ Line 1: / Line 1: @@
 [[Category:Glossary]] [[Category:Tutorials]]
-Literate programming was invented / coined / started by Dr. Donald
+Literate programming was invented / coined / started by [http://www-cs-faculty.stanford.edu/~knuth/ Dr. Donald Knuth]. In fact, if you asked Dr. Knuth what his favourite programming
+language was, (5th question of http://www-cs-faculty.stanford.edu/~knuth/faq.html) you would be told CWEB - which is a literate programming
-Knuth. In, fact if you asked Dr. Knuth what his favourite programming
+tool combining C and tex.
-language was, you would be told CWEB - which is a literate programming
-tool combining C and tex.
+What is literate programming? To quote Dr. Knuth:
+<blockquote>"The main idea is to regard a program as a communication to human beings rather than as a set of instructions to a computer."</blockquote>
+The tool used to generate hyperlinked documentation from literate code is [[Haddock]]
+==Haskell and literate programming==
+Haskell is one of the few languages that provides native features to support literate programming. In haskell, a literate program is one with the suffix <code>.lhs</code> rather than <code>.hs</code>.
+In a literate Haskell program, there are two ways to distinguish between code and non-code portions. You can either prepend all code with a <code>&gt; </code>, (bird style) or surround lines of code with <code>\begin{code}</code> and <code>\end{code}</code> pairs (latex style). For those who know, use and love latex, the latter is the suggested way to go.
+Note that [[Happy]] supports literate programming as well, via bird style (the <code>&gt; </code> marker) in <code>.ly</code> files. Sadly, [[Alex]] no longer does. You might consider using [http://www.eecs.harvard.edu/~nr/noweb/ noweb] with it.
+==Bird Style==
+According to the [http://www.haskell.org/onlinereport/literate.html Haskell Report], this style of comment was developed by Richard Bird (hence the name) and Philip Wadler.  All lines starting with <code>></code> are interpreted as code, everything else is considered a comment.  One additional requirement is that you always leave a blank line before and after the code block:
+<haskell>
+In Bird-style you have to leave a blank before the code.
+> fact :: Integer -> Integer
+> fact 0 = 1
+> fact n = n * fact (n-1)
+And you have to leave a blank line after the code as well.
+</haskell>
+The idea behind this restriction is capturing the mistake of not inserting the <code>></code> mark at the beginning of the line.  In general this is not only good practice, but also a formatting that makes the code more readable.
+However, there are cases in which you might like to get around this restriction.  Perhaps you're writing Haskell code within a markup language that's not Latex, and you may have to surround your code with something equivalent to <code>\begin{code}</code> and <code>\end{code}</code>.  In this case, GHC provides a flag that can be used to lift the blank lines requirement:
+<haskell>
+$ ghc -optL -q
+</haskell>
 ==Latex suggestions for literate programming ==
+(See also [[#lhs2TeX]] below)
 In the majority of these suggestions, you can simply write:
 <haskell>
 \begin{code}
-qsort []     = []
+tsort []     = []
-qsort (x:xs) = qsort [y | y<-xs, y>x] ++ [x] ++ qsort [y | y<-xs, y<=x]
+tsort (x:xs) = tsort [y | y<-xs, y>x] ++ [x] ++ tsort [y | y<-xs, y<=x]
 \end{code}
 </haskell>
@@ Line 32: / Line 67: @@
 </pre>
 ===Listings package===
-Another alternatative is to use the latex-package listings, which
+Another alternative is to use the latex-package listings, which
 allows you to do much more customization of the output:
@@ Line 55: / Line 90: @@
       literate={+}{{$+$}}1 {/}{{$/$}}1 {*}{{$*$}}1 {=}{{$=$}}1
                {>}{{$>$}}1 {<}{{$<$}}1 {\\}{{$\lambda$}}1
+               {\\\\}{{\char`\\\char`\\}}1
                {->}{{$\rightarrow$}}2 {>=}{{$\geq$}}2 {<-}{{$\leftarrow$}}2
                {<=}{{$\leq$}}2 {=>}{{$\Rightarrow$}}2
@@ Line 62: / Line 98: @@
     }
 </pre>
 You might want to consult the documentation of the "listings" package, to find out whether there's more you can tune to your likings. (Like adding line numbers, etc.)  Note that the suggested "literate" option above will replace the given symbols anywhere in the text, including inside strings, which is probably not what one wants.
+For arrow notation add the line (requires MnSymbol package, or stmaryrd if you use Yleft instead):
+<pre>
+               {<<<}{{$\lll$}}2 {>>>}{{$\ggg$}}2 {-<}{{$\leftY$}}1
+</pre>
 === Hiding code from Latex ===
@@ Line 105: / Line 146: @@
 </pre>
+While this works well for vanilla LaTeX, if you're using lhs2TeX, then you'll get the "% this is not really code" printed in your output. Instead of doing the above, either use <code>\begin{spec} ... \end{spec}</code>, or if you're writing Bird-style code, flip your ">" characters around:
-== Transformation of .lhs-files ==
+<pre>
-To convert a Haskell script file which uses bird style to
-\begin{code}...\end{code} style use one of the following scripts:
+Neither of the following definitions are really code:
+\begin{spec}
-awk:
+main :: IO ()
+main = print "another example"
+\end{spec}
+< main :: IO ()
-<pre>
+< main = print "...and another"
-# bird2code.awk
-^[^>] || ^$ {print; next}
-^> {
-  print "\\begin{code}"
-  sub(/^> /,"")
-  print
-  rc = getline
-  while(($0 ~ ^>) && (rc > 0)) {
-    sub(/^> /,"")
-    print
-    rc = getline
-  }
-  print "\\end{code}\n"
-}
 </pre>
+== Transformation of .lhs-files ==
-Or in sed:
+Sub-pages here have scripts to convert from the demarcation via <code>&gt; </code> (called "bird style" after Dr. Richard Bird) to <code>\begin{code}</code> and <code>\end{code}</code> pairs
-<pre>
-# bird2code.sed
-^> !p
-^> {
-  i\
-\\begin{code}
+* [[Literate programming/Bird conversion via awk]]
-  :loop
+* [[Literate programming/Bird conversion via sed]]
-  N
-  /\n>[^\n]*$/{
-    b loop
-  }
-  s/^> //
-  s/\(\n\)> /\1/g
-  s/\n$//
-  a\
-\\end{code}\
-  p
-}
-</pre>
-Thanks to Peter Tillier from the comp.lang.awk newsgroup.
 == Editors ==
@@ Line 207: / Line 220: @@
 === Vim ===
+See [[Literate programming/Vim]].
-Take a look at http://urchin.earth.li/~ian/vim.  This improves considerably vim's syntax highlighting.
-See also [[Vim]] for the bits of vim script needed to integrate with the ghc compiler.
 == lhs2TeX ==
 Highly recommended is '''lhs2TeX''' at [http://www.cs.uu.nl/~andres/lhs2tex/], courtesy of Andres Löh.  It is designed for typesetting papers ''about'' Haskell, but '''lhs2TeX''' is easily configured and can make for a powerful preprocessor and documentation generator.
-Since it is able to typeset Haskell formulas in mathematical notation
+Input to lhs2TeX is a slightly modified <code>.lhs</code> file. One would typically use the standard latex recommendations above, using a <code>\begin{code}</code> and <code>\end{code}</code> pair to demarcate code. Additionally, lhs2TeX provides specialized macros to control the preprocessing.
-with LaTeX's math mode you can also use it to create ''testable''
-papers. That is the reader can play with the formulas presented in the
+Note that lhs2TeX and in-line commenting do not seem to mix well.
-paper if he obtains the literate Haskell source code of the paper.
+Since it can typeset Haskell formulas in mathematical notation
+with LaTeX's math mode, you can also use it to create ''testable''
+papers. That is, readers can play with the formulas presented in the
+paper if they obtain the literate Haskell source code for the paper.
+== Shuffle ==
+[http://www.cs.uu.nl/wiki/Ehc/Shuffle Shuffle] is a tool which extends the capabilities of <code>lhs2TeX</code>. It was  used in documenting the [http://www.cs.uu.nl/wiki/Ehc/WebHome Essential Haskell Compiler project]. It is available as part of this project, but it is usable also independently.
+==See also==
+* [http://www.literateprogramming.com/ Daniel Mall's website for Literate Programming.]
+* [http://en.wikipedia.org/wiki/Literate_programming Wikipedia]
+* [[Textual Haskell source]] - a <code>.ths</code> processor, where text is king.