Difference between revisions of "User:Maeder"

m (Haskell programming guidelines proposal)
 
(moved to Programming_guidelines)
 
 
== Haskell programming guidelines proposal ==
 
 
 
Programming guidelines should help to make the code of a project more
readable and maintainable for its varying number of contributors.
 
 
It takes some programming experience to develop something like a
personal "coding style", and guidelines serve only as a rough shape for
code. Guidelines should be followed by all members working on the
project, even if they prefer (or are already used to) different guidelines.
 
 
In the following I will describe documentation, file format, naming
conventions and good programming practice (adapted from Matt's C/C++
Programming Guidelines and the Linux kernel coding style).
 
 
 
=== Documentation ===
 
 
 
Comments are to be written in application terms (i.e. user's point of
 
view). Don't use technical terms - that's what the code is for!
 
 
Comments should be written using correct spelling and grammar in complete
sentences with punctuation (in English only).
 
 
"Generally, you want your comments to tell WHAT your code does, not HOW.
 
Also, try to avoid putting comments inside a function body: if the
 
function is so complex that you need to separately comment parts of it,
 
you should probably" (... decompose it)
 
 
Put a haddock comment on top of every exported function and data type!
 
Make sure haddock accepts these comments.
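As a minimal sketch of such comments (the type Item and the function stockValue are hypothetical and only serve as illustration), the following declarations carry Haddock comments written in application terms:

```haskell
-- | An article in a shop inventory (hypothetical example type).
data Item = Item
    { itemName :: String  -- ^ the name shown to the customer
    , unitPrice :: Int    -- ^ the price of one piece in cents
    , quantity :: Int     -- ^ the number of pieces in stock
    } deriving (Show, Eq)

-- | Compute the value of all pieces of one article in cents.
stockValue :: Item -> Int
stockValue item = unitPrice item * quantity item
```

The comments say what an item and its stock value are, not how the multiplication is carried out.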
 
 
 
=== File Format ===
 
 
 
All Haskell source files start with a haddock header of the form:
 
 
 {- |
 Module : <File name, i.e. generated by \$Header\$>
 Description : <Short text displayed on contents page>
 Copyright : (c) <You> and <Your affiliation>
 License : similar to LGPL, see LICENSE.txt
 
 Maintainer : maeder@tzi.de
 Stability : provisional
 Portability : portable
 
 <module description starting at first column>
 -}
 
 
 
A possible compiler pragma (like {-# OPTIONS -cpp #-}) may precede
 
this header. The following hierarchical module name must of course
 
match the file name.
 
 
Make sure that the description is changed to match the module (if the
header was copied from elsewhere). Insert your email address as maintainer.
 
 
Try to write portable (Haskell98) code. If you (indirectly) import
a module that uses, e.g., multi-parameter type classes and functional
dependencies, the code becomes "non-portable (MPTC with FD)".
 
 
The Dollar-Header-Dollar entry is automatically expanded by cvs (and will wrap
 
around). All other lines should not be longer than 80 (preferably 75)
 
characters to avoid wrapped lines (for casual readers)!
 
 
Expand all your tabs to spaces to avoid the danger of wrongly expanding
 
them (or a different display of tabs versus eight spaces). Possibly put
 
something like the following in your ~/.emacs file.
 
 
(custom-set-variables '(indent-tabs-mode nil))
 
 
The last character in your file should be a newline! Under Solaris
you'll get a warning if this is not the case, and sometimes last lines
without newlines are ignored (e.g. "#endif" without newline). Emacs
usually asks for a final newline.
 
 
The whole module should not be too long (about 400 lines).
 
 
 
=== Naming Conventions ===
 
 
 
In Haskell, types start with an uppercase letter and functions with a
lowercase letter, so the only advice left is to avoid infix identifiers!
Defining symbolic infix identifiers should be left to library writers only.
 
 
(The infix identifier "\\" at the end of a line causes cpp preprocessor
 
problems.)
 
 
Names (especially global ones) should be descriptive, and if you need
long names, write them as mixed case words (aka camelCase). (However, "tmp"
is to be preferred over "thisVariableIsATemporaryCounter".)
 
 
 
=== Good Programming Practice ===
 
 
 
"Functions should be short and sweet, and do just one thing. They should
 
fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24,
 
as we all know), and do one thing and do that well."
 
 
Most Haskell functions should be at most a few lines long. Only case
expressions over large data types (which should be avoided, too) may need
correspondingly more space.
 
 
The code should be succinct (though not obfuscated), readable and easy to
 
maintain (after unforeseeable changes). Don't exploit exotic language
 
features without good reason.
 
 
It's not fixed how deep you indent (4 or 8 chars). You can break the
line after "do", "let", "where", and "case .. of". Make sure that
renamings don't destroy your layout. (If you get too far to the right,
the code is unreadable anyway and needs to be decomposed.)
 
 
Bad:

 case foo of Foo -> "Foo"
             Bar -> "Bar"

Good:

 case <longer expression> of
     Foo -> "Foo"
     Bar -> "Bar"
 
 
Avoid the notation with braces and semicolons since the layout rule
 
forces you to properly align your alternatives.
 
 
Respect compiler warnings. Supply type signatures, avoid shadowing and
 
unused variables. Particularly avoid non-exhaustive and
 
overlapping patterns. Missing unreachable cases can be filled in using
 
"error" with a fixed string "<ModuleName>.<function>" to indicate the
 
error position (in case the impossible should happen). Don't invest
 
time to "show" the offending value, only do this temporarily when
 
debugging the code.
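A minimal sketch of filling in an unreachable case (the function frequencies and the module name Guidelines are hypothetical): Data.List.group never returns empty groups, so the empty-list alternative only exists to keep the patterns exhaustive and reports a fixed position string.

```haskell
import Data.List (group, sort)

-- | Count how often each element occurs in a list.
frequencies :: Ord a => [a] -> [(a, Int)]
frequencies = map count . group . sort
  where
    count grp = case grp of
        x : _ -> (x, length grp)
        -- Unreachable, because "group" only yields non-empty lists.
        [] -> error "Guidelines.frequencies"
```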
 
 
Don't leave unused or commented-out code in your files! Readers don't
 
know what to think of it.
 
 
 
==== Case expressions ====
 
 
Prefer case expressions over pattern binding declarations.
 
 
Not always nice:

 longFunctionName (Foo: _ : _) = e1
 longFunctionName (Bar: _) = e2

Better (I think):

 longFunctionName arg = case arg of
     Foo : _ : _ -> e1
     Bar : _ -> e2
     _ -> error "ProgrammingGuidelines.longFunctionName"
 
 
For partial functions, document their preconditions (if not obvious)
and make sure that partial functions are only called when the
preconditions are obviously fulfilled (e.g. by a case expression or a
previous test). In particular, "head" should be used with
care or (even better) be made obsolete by a case expression.
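As an illustration (firstLineOr is a hypothetical helper, not part of any library), a case expression can replace a call of "head" and makes the empty case explicit:

```haskell
-- | Return the first line of a text, or a default for an empty text.
firstLineOr :: String -> String -> String
firstLineOr def text = case lines text of
    l : _ -> l
    [] -> def
```

Writing "head (lines text)" instead would fail at runtime for an empty text, and nothing in the code would point the reader to that possibility.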
 
 
Usually a case statement (and the import of isJust and fromJust from
 
Data.Maybe) can be avoided by using the "maybe" function:
 
 
maybe (error "<ModuleName>.<function>") id $ Map.lookup key map
 
 
Generally we require you to be more explicit about failure
 
cases. Surely a missing (or an irrefutable) pattern
 
would precisely report the position of a runtime error, but these are
 
not so obvious when reading the code.
 
 
Do avoid mixing "let" and "where". (I prefer "let" and have auxiliary
functions at the top level that are not exported.) Export lists also
support the detection of unused functions.
 
 
If you notice that you're doing the same task again, try to generalize
it in order to avoid duplicate code. It is frustrating to fix the
same error in several places.
 
 
 
==== Application notation ====
 
 
 
Many parentheses can be eliminated using the infix application operator "$"
with lowest priority. Try at least to avoid unnecessary parentheses in
standard infix expressions.
 
 
f x : g x ++ h x
 
 
a == 1 && b == 1 || a == 0 && b == 0
 
 
Rather than putting a large final argument in parentheses (with a
 
distant closing one) consider using "$" instead.
 
 
"f (g x)" becomes "f $ g x" and consecutive applications
 
"f (g (h x))" can be written as "f $ g $ h x" or "f . g $ h x".
 
 
A function definition like
 
"f x = g $ h x" can be abbreviated to "f = g . h".
 
 
Note that the final argument may even be an infix- or case expression:
 
 
map id $ c : l
 
 
filter (const True) . map id $ case l of ...
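The rewriting steps from parentheses to "$" and "." can be checked on a small example. The three functions below are hypothetical and are meant to be equivalent:

```haskell
import Data.Char (toUpper)

-- Nested parentheses with a distant closing one.
shoutParens :: String -> String
shoutParens s = reverse (map toUpper (filter (/= ' ') s))

-- The same applications chained with "$".
shoutDollar :: String -> String
shoutDollar s = reverse $ map toUpper $ filter (/= ' ') s

-- The argument dropped and the functions composed with ".".
shoutPointFree :: String -> String
shoutPointFree = reverse . map toUpper . filter (/= ' ')
```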
 
 
However, be aware that $-terms cannot be composed further in infix
 
expressions.
 
 
Probably wrong:
 
f $ x ++ g $ x
 
 
But the scope of an expression is also limited by the layout rule, so
it is usually safe to use "$" on right hand sides.
 
 
Ok:
 
do y <- f $ l
 
++
 
do y <- g $ l
 
 
Of course "$" cannot be used in types. GHC also has some primitive
functions involving the kind "#" that cannot be applied using "$".
 
 
Last warning: always leave spaces around "$" (and other mixfix
operators) since a clash with Template Haskell is possible.
 
 
(Also write "\ t" instead of "\t" in lambda expressions)
 
 
 
==== List Comprehensions ====
 
 
Use these only when "short and sweet". (I prefer map, filter, and foldr.)
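For comparison, here is a sketch of one small task written both ways (both function names are hypothetical). Which version reads better is a matter of taste as long as the comprehension stays this short:

```haskell
-- A list comprehension that is still "short and sweet".
squaresOfEvens :: [Int] -> [Int]
squaresOfEvens xs = [x * x | x <- xs, even x]

-- The same function using map and filter.
squaresOfEvens' :: [Int] -> [Int]
squaresOfEvens' = map (\ x -> x * x) . filter even
```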
 
 
 
==== Types ====
 
 
Prefer proper data types over type synonyms or tuples even if you have
to do more constructing and unpacking. This will make it easier to
supply class instances later on. Don't put class constraints on
a data type; constraints belong only to the functions that manipulate
the data.
 
 
Using type synonyms consistently is difficult over a longer time,
because this is not checked by the compiler. (The types shown by
the compiler may be unpredictable: e.g. FilePath, String or [Char])
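A minimal sketch of the difference (UserName, ModuleName and moduleFile are hypothetical names): the synonym is interchangeable with String everywhere, whereas the newtype is checked by the compiler and can receive its own class instances.

```haskell
-- A synonym: any String is accepted where a UserName is expected.
type UserName = String

-- A proper type: a plain String is rejected where a ModuleName is expected.
newtype ModuleName = ModuleName String deriving (Show, Eq, Ord)

-- | Render a module name as a relative source file path.
moduleFile :: ModuleName -> FilePath
moduleFile (ModuleName name) = map dotToSlash name ++ ".hs"
  where
    dotToSlash c = if c == '.' then '/' else c
```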
 
 
Take care if your data type has many variants (unless it is an
enumeration type). Don't repeat common parts in every variant since
this will cause code duplication.
 
 
Bad (to handle arguments in sync):
 
 
data Mode f p = Box f p | Diamond f p
 
 
Good (to handle arguments only once):
 
 
data BoxOrDiamond = Box | Diamond
 
 
data Mode f p = Mode BoxOrDiamond f p
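With the second declaration, a function that only touches the common parts needs a single equation. The function mapFormula below is a hypothetical example of this:

```haskell
data BoxOrDiamond = Box | Diamond deriving (Show, Eq)

data Mode f p = Mode BoxOrDiamond f p deriving (Show, Eq)

-- | Replace the formula, whatever the modality is.
mapFormula :: (f -> g) -> Mode f p -> Mode g p
mapFormula fun (Mode m f p) = Mode m (fun f) p
```

With the first declaration the same function would need one equation for Box and one for Diamond, and both would have to be kept in sync.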
 
 
 
Consider (bad):
 
 
data Tupel a b = Tupel a b | Undefined
 
 
versus (better):
 
 
data Tupel a b = Tupel a b
 
 
and using:
 
 
Maybe (Tupel a b)
 
 
(or another monad) whenever an undefined result needs to be propagated.
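A small sketch of this style (splitEntry is a hypothetical function): the data type has no Undefined variant, and the failure is expressed by Maybe in the result type.

```haskell
data Tupel a b = Tupel a b deriving (Show, Eq)

-- | Split a "key=value" entry at the first equals sign.
splitEntry :: String -> Maybe (Tupel String String)
splitEntry entry = case break (== '=') entry of
    (key, _ : value) -> Just $ Tupel key value
    _ -> Nothing
```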
 
 
 
==== Records ====
 
 
For (large) records avoid the use of the constructor directly and
 
remember that the order and number of fields may change.
 
 
Take care with (the rare case of) dependent polymorphic fields:
 
 
 data Fields a =
     VariantWithTwo { field1 :: a
                    , field2 :: a }
 
 
The type of a value v cannot be changed by only setting field1:
 
 
v { field1 = f }
 
 
Better construct a new value:
 
 
VariantWithTwo { field1 = f } -- leaving field2 undefined
 
 
Or use a polymorphic element that is instantiated by updating:
 
 
e = VariantWithTwo { field1 = [], field2 = [] }
 
 
e { field1 = [f] }
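The following sketch spells this out as a compilable fragment (emptyFields, oneNumber and toStrings are hypothetical names). It also shows that the type can be changed by a record update, provided both fields are set in the same update:

```haskell
data Fields a = VariantWithTwo
    { field1 :: a
    , field2 :: a
    } deriving (Show, Eq)

-- | A polymorphic element: both fields are empty lists of any type.
emptyFields :: Fields [b]
emptyFields = VariantWithTwo { field1 = [], field2 = [] }

-- | Updating one field instantiates the element type to Int.
oneNumber :: Fields [Int]
oneNumber = emptyFields { field1 = [1] }

-- | Changing the type works if both fields are set in one update.
toStrings :: Fields Int -> Fields String
toStrings v = v { field1 = show (field1 v), field2 = show (field2 v) }
```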
 
 
Several variants with identical fields may avoid some code duplication
when selecting and updating, though possibly not in a few
dependent polymorphic cases.
 
 
However, I doubt if the following is a really good alternative to the
 
above data Mode with data BoxOrDiamond.
 
 
 
 data Mode f p =
     Box { formula :: f, positions :: p }
   | Diamond { formula :: f, positions :: p }
 
 
 
 
==== IO ====
 
 
Try to strictly separate IO, Monad and pure (without do) function
 
programming (possibly via separate modules).
 
 
Bad:
 
x <- return y
 
...
 
 
Good:
 
let x = y
 
...
 
 
 
Don't use Prelude.interact and make sure your program does not depend
on the (not always obvious) order of evaluation. For example, don't read
from and write to the same file:
 
 
This will fail:

 do s <- readFile f
    writeFile f $ 'a' : s

because of lazy IO! (Writing starts before reading has finished.)
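If a file really must be rewritten in place, one possible way out (a sketch only; prependChar is a hypothetical name) is to force the whole contents before writing. Reading a lazily read file to its end closes the handle, so the subsequent write can open the file again:

```haskell
import Control.Exception (evaluate)

-- | Prepend a character to the contents of a file.
prependChar :: Char -> FilePath -> IO ()
prependChar c f = do
    s <- readFile f
    -- Forcing the length reads the file to its end, which closes the handle.
    _ <- evaluate $ length s
    writeFile f $ c : s
```

Writing to a different file and renaming it afterwards avoids the problem altogether and is probably the more robust choice.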
 
 
 
==== Trace ====
 
 
Tracing is for debugging purposes only and should not be used as
 
feedback for the user. Clean code is not cluttered by trace calls.
 
 
 
==== Imports ====
 
 
Standard library modules like Char, List, Maybe, Monad, etc. should be
imported by their hierarchical module names, i.e. from the base package (so
that Haddock finds them):
 
 
import Data.List
 
import Control.Monad
 
import System.Environment
 
 
The libraries for Set and Map are to be imported qualified:
 
 
import qualified Data.Set as Set
 
import qualified Data.Map as Map
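A short sketch of how the qualified names read in use (priceOf and distinctArticles are hypothetical functions, and Guidelines is a hypothetical module name). The prefix shows at a glance which "lookup" or "fromList" is meant:

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- | Look up the price of an article in cents.
-- Precondition: the article is a key of the price map.
priceOf :: Map.Map String Int -> String -> Int
priceOf prices article =
    maybe (error "Guidelines.priceOf") id $ Map.lookup article prices

-- | Collect the distinct articles of an order.
distinctArticles :: [String] -> Set.Set String
distinctArticles = Set.fromList
```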
 
 
 
 
==== Glasgow extensions and Classes ====
 
 
Stay away from extensions as long as possible. Also use classes with
 
care because soon the desire for overlapping instances (like for lists
 
and strings) may arise. Then you may want MPTC (multi-parameter type
 
classes), functional dependencies (FD), undecidable and possibly incoherent
 
instances and then you are "in the wild" (according to SPJ).
 
 
 
=== Final remarks ===
 
 
 
Despite guidelines, writing "correct code" (without formal proof
 
support yet) still remains the major challenge. As motivation to
 
follow these guidelines consider the points that are from the "C++
 
Coding Standard", where I replaced "C++" with "Haskell".
 
 
Good Points:
 
 
* programmers can go into any code and figure out what's going on
 
 
* new people can get up to speed quickly
 
 
* people new to Haskell are spared the need to develop a personal
 
style and defend it to the death
 
 
* people new to Haskell are spared making the same mistakes over
 
and over again
 
 
* people make fewer mistakes in consistent environments
 
 
* programmers have a common enemy :-)
 
 
Bad Points:
 
 
* the standard is usually stupid because it was made by someone
 
who doesn't understand Haskell
 
 
* the standard is usually stupid because it's not what I do
 
 
* standards reduce creativity
 
 
* standards are unnecessary as long as people are consistent
 
 
* standards enforce too much structure
 
 
* people ignore standards anyway
 
 
Send comments to the "Maintainer" of this file
 

Latest revision as of 13:12, 24 February 2006