Difference between revisions of "Programming guidelines"

From HaskellWiki
m (changed my email)
 
(26 intermediate revisions by 13 users not shown)
Line 8: Line 8:
 
guidelines.
 
guidelines.
   
These guidelines have been originally set up for the hets-project
+
These guidelines have been originally set up for the [http://hets.eu hets-project] and are
[http://www.informatik.uni-bremen.de/cofi/hets/ hets-project] and are
 
 
now put on the [http://haskell.org/haskellwiki/ HaskellWiki] gradually
 
now put on the [http://haskell.org/haskellwiki/ HaskellWiki] gradually
integrating parts of [http://haskell.org/hawiki the old hawiki]
+
integrating parts of the old hawiki
 
entries [http://haskell.org/haskellwiki/Things_to_avoid ThingsToAvoid] and
 
entries [http://haskell.org/haskellwiki/Things_to_avoid ThingsToAvoid] and
[http://haskell.org/hawiki/HaskellStyle HaskellStyle] (hopefully not
+
HaskellStyle (hopefully not
 
hurting someone's copyrights). The other related entry
 
hurting someone's copyrights). The other related entry
[http://haskell.org/hawiki/TipsAndTricks TipsAndTricks] treats more
+
TipsAndTricks treats more
 
specific points that are left out here,
 
specific points that are left out here,
   
Line 28: Line 27:
   
 
The following quote and links are taken from
 
The following quote and links are taken from
  +
HaskellStyle:
[http://haskell.org/hawiki/HaskellStyle the old general comments]:
 
   
We all have our own ideas about good Haskell style. There's More Than
+
:<i>We all have our own ideas about good Haskell style. There's More Than One Way To Do It. But some ways are better than others.
  +
</i>
One Way To Do It. But some ways are better than others.
 
   
 
Some comments from the GHC team about their internal coding
 
Some comments from the GHC team about their internal coding
Line 37: Line 36:
 
http://hackage.haskell.org/trac/ghc/wiki/WorkingConventions
 
http://hackage.haskell.org/trac/ghc/wiki/WorkingConventions
   
  +
Also https://simon.peytonjones.org/publications-2000/#wearing-the-hair-shirt-a-retrospective-on-haskell-2003 contains some brief comments on syntax and style.
Also http://research.microsoft.com/~simonpj/papers/haskell-retrospective/
 
contains some brief comments on syntax and style,
 
   
 
What now follows are descriptions of program documentation, file
 
What now follows are descriptions of program documentation, file
Line 46: Line 44:
   
   
=== Documentation ===
+
== Documentation ==
   
  +
* Comments are to be written in application terms (i.e. user's point of view). Don't use technical terms - that's what the code is for!
   
Comments are to be written in application terms (i.e. user's point of
+
* Comments should be written using correct spelling and grammar in complete sentences with punctuation (in English only):
view). Don't use technical terms - that's what the code is for!
 
   
  +
::<i>Generally, you want your comments to tell WHAT your code does, not HOW. Also, try to avoid putting comments inside a function body: if the function is so complex that you need to separately comment parts of it, you should probably [... decompose it.]
Comments should be written using correct spelling and grammar in complete
 
  +
</i>
sentences with punctation (in English only).
 
   
  +
* Put a [http://haskell-haddock.readthedocs.io/en/latest/markup.html haddock comment] on top of every exported function and data type! Make sure haddock accepts these comments.
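A minimal sketch of what such haddock comments might look like. The type and function names (<code>Circle</code>, <code>radius</code>, <code>area</code>) are hypothetical and chosen only for illustration:

```haskell
-- | A circle, described by its radius.
newtype Circle = Circle
  { radius :: Double -- ^ Distance from the centre to the edge.
  }

-- | The surface area covered by a circle.
area :: Circle -> Double
area c = pi * radius c * radius c
```

Note that the comments say what the values mean to a user, not how they are computed.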
"Generally, you want your comments to tell WHAT your code does, not HOW.
 
Also, try to avoid putting comments inside a function body: if the
 
function is so complex that you need to separately comment parts of it,
 
you should probably" (... decompose it)
 
   
  +
== File Format ==
Put a haddock comment on top of every exported function and data type!
 
Make sure haddock accepts these comments.
 
   
  +
* All Haskell source files start with a haddock header of the form:
   
  +
:<haskell>
=== File Format ===
 
 
 
All Haskell source files start with a haddock header of the form:
 
 
<pre>
 
 
{- |
 
{- |
 
Module : <File name or $Header$ to be replaced automatically>
 
Module : <File name or $Header$ to be replaced automatically>
Description : <Short text displayed on contents page>
+
Description : <optional short text displayed on contents page>
Copyright : (c) <You> and <Your affiliation>
+
Copyright : (c) <Authors or Affiliations>
License : similar to LGPL, see LICENSE.txt
+
License : <license>
   
Maintainer : Christian.Maeder@dfki.de
+
Maintainer : <email>
Stability : provisional
+
Stability : unstable | experimental | provisional | stable | frozen
Portability : portable
+
Portability : portable | non-portable (<reason>)
   
 
<module description starting at first column>
 
<module description starting at first column>
 
-}
 
-}
</pre>
+
</haskell>
   
  +
:(the <code>\$Header\$</code> entry will be automatically expanded.)
A possible compiler pragma (like {-# OPTIONS -cpp #-}) may precede
 
this header. The following hierarchical module name must of course
 
match the file name.
 
   
  +
:A possible compiler pragma (like <code>{-# LANGUAGE CPP #-}</code>) may precede this header. The following hierarchical module name must, of course, match the file name.
Make sure that the description is changed to meet the module (if the
 
header was copied from elsewhere). Insert your email address as maintainer.
 
   
  +
* Make sure that the description is changed to meet the module (if the header was copied from elsewhere). Insert your email address as maintainer.
Try to write portable (Haskell98) code. If you (indirectly) import
 
a module that uses i.e. multi-parameter type classes and functional
 
dependencies the code becomes "non-portable (MPTC with FD)".
 
   
  +
* Try to write portable ([[Language_and_library_specification#The_Haskell_2010_report|Haskell 2010]]) code. If you use, e.g., [[MPTC|multi-parameter type classes]] (MPTC) and [[Functional dependencies|functional dependencies]] (FD), the code becomes [https://mail.haskell.org/pipermail/haskell-prime/2006-February/000609.html "non-portable (MPTC with FD)"].
The \$Header\$ entry is automatically expanded by cvs (and will wrap
 
around). All other lines should not be longer than 80 (preferably 75)
 
characters to avoid wrapped lines (for casual readers)!
 
   
  +
* Lines should not be longer than 80 (preferably 75) characters. Code with short lines is easier to read and understand. If an expression is longer than 80 characters, try to restructure and rewrite the code in a more expressive way. A limit of 80 characters per line is widely considered good practice and is well supported by common tools.
Expand all your tabs to spaces to avoid the danger of wrongly expanding
 
them (or a different display of tabs versus eight spaces). Possibly put
 
something like the following in your ~/.emacs file.
 
   
  +
* Don't leave trailing white space on any line of your code.
(custom-set-variables '(indent-tabs-mode nil))
 
   
  +
* Expand all your tabs to spaces to avoid the danger of wrongly expanding them (or a different display of tabs versus eight spaces). Possibly put something like the following in your <tt>~/.emacs</tt> file:
The last character in your file should be a newline! Under solaris
 
you'll get a warning if this is not the case and sometimes last lines
 
without newlines are ignored (i.e. "#endif" without newline). Emacs
 
usually asks for a final newline.
 
   
  +
: <code>(custom-set-variables '(indent-tabs-mode nil))</code>
The whole module should not be too long (about 400 lines)
 
   
  +
* The last character in your file should be a newline! Under Solaris you'll get a warning if this is not the case, and sometimes last lines without a newline are ignored (e.g. <code>#endif</code> without newline). Emacs usually asks for a final newline.
   
  +
* You may use http://hackage.haskell.org/package/scan to check your file format.
=== Naming Conventions ===
 
   
  +
* The whole module should not be too long (about 400 lines).
   
  +
Please have a look at the [http://haskell-haddock.readthedocs.io/en/latest/markup.html#the-module-description Haddock module header documentation].
In Haskell types start with capital and functions with lowercase
 
letters, so only avoid infix identifiers! Defining symbolic infix
 
identifiers should be left to library writers only.
 
   
  +
== Naming Conventions ==
(The infix identifier "\\" at the end of a line causes cpp preprocessor
 
problems.)
 
   
  +
* In Haskell, types start with capital letters and functions with lowercase letters, so the only thing left to avoid is infix identifiers! Defining symbolic infix identifiers should be left to library writers only.
Names (especially global ones) should be descriptive and if you need
 
long names write them as mixed case words (aka camelCase). (but "tmp"
 
is to be preferred over "thisVariableIsATemporaryCounter")
 
   
  +
:(The infix identifier <code>"\\"</code> at the end of a line causes CPP preprocessor problems.)
Also in the standard libraries, function names with multiple words are
 
written using the camelCase convention. Similarly, type, typeclass and
 
constructor names are written using the StudlyCaps convention.
 
   
  +
* Names (especially global ones) should be descriptive. If you need long names write them in [[wiki:CamelCase|lowerCamelCase]]. Laconic names are preferred.
Some parts of our code use underlines (without unnecessary uppercase
 
letters) for long identifiers to better reflect names given with
 
hyphens in the requirement documentation. Also such names should be
 
transliterated to camlCase identifiers possibly adding a (consistent)
 
suffix or prefix to avoid conflicts with keywords. However, instead of
 
a recurring prefix or suffix you may consider to use qualified imports
 
and names.
 
   
  +
* Similarly, type, type class, and constructor names are written using [[wiki:CamelCase|UpperCamelCase]].
   
  +
* In the standard libraries, some parts of Haskell code use [https://en.wikipedia.org/wiki/Snake_case snake_case] for long identifiers to better reflect names given with hyphens in the requirement documentation. If used in other code, such names should be transliterated to camelCase identifiers, possibly adding a (consistent) suffix or prefix to avoid conflicts with keywords. Instead of a recurring prefix or suffix, you may consider using qualified imports and names.
=== Good Programming Practice ===
 
   
  +
== Good Programming Practice ==
   
"Functions should be short and sweet, and do just one thing. They should
+
::<i>Functions should be short and sweet, and do just one thing. They should fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24, as we all know), and do one thing and do that well.
  +
</i>
fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24,
 
as we all know), and do one thing and do that well."
 
   
Most haskell functions should be at most a few lines, only case
+
* Most Haskell functions should be at most a few lines, only case expressions over large data types (that should be avoided, too) may need corresponding space.
expression over large data types (that should be avoided, too) may need
 
corresponding space.
 
   
  +
* For lambda expressions, write <code>\ t -> …</code> instead of <code>\t -> …</code>.
The code should be succinct (though not obfuscated), readable and easy to
 
maintain (after unforeseeable changes). Don't exploit exotic language
 
features without good reason.
 
   
  +
* The code should be succinct (though not obfuscated), readable and easy to maintain (after unforeseeable changes). Don't exploit exotic language features without good reason.
It's not fixed how deep you indent (4 or 8 chars). You can break the
 
line after "do", "let", "where", and "case .. of". Make sure that
 
renamings don't destroy your layout. (If you get to far to the right,
 
the code is unreadable anyway and needs to be decomposed.)
 
   
  +
* It's not fixed how deep you indent (4 or 8 chars). You can break the line after <code>do</code>, <code>let</code>, <code>where</code>, and <code>case … of …</code>. Make sure that renamings don't destroy your layout. (If you get too far to the right, the code is unreadable anyway and needs to be decomposed.)
Bad:
 
case foo of Foo -> "Foo"
 
Bar -> "Bar"
 
Good:
 
case <longer expression> of
 
Foo -> "Foo"
 
Bar -> "Bar"
 
   
  +
::Bad:
Avoid the notation with braces and semicolons since the layout rule
 
  +
::<haskell>
forces you to properly align your alternatives.
 
  +
case foo of Foo -> "Foo"
  +
Bar -> "Bar"
  +
</haskell>
   
  +
::Good:
Respect compiler warnings. Supply type signatures, avoid shadowing and
 
  +
::<haskell>
unused variables. Particularly avoid non-exhaustive and
 
  +
case <longer expression> of
overlapping patterns. Missing unreachable cases can be filled in using
 
  +
Foo -> "Foo"
"error" with a fixed string "<ModuleName>.<function>" to indicate the
 
  +
Bar -> "Bar"
error position (in case the impossible should happen). Don't invest
 
  +
</haskell>
time to "show" the offending value, only do this temporarily when
 
debugging the code.
 
   
  +
:Avoid the notation with braces and semicolons since the layout rule forces you to properly align your alternatives.
Don't leave unused or commented-out code in your files! Readers don't
 
know what to think of it.
 
   
  +
* Respect compiler warnings. Supply type signatures, avoid shadowing and unused variables. Particularly avoid non-exhaustive and overlapping patterns. Missing unreachable cases can be filled in using <code>error</code> with a fixed string <code>"<ModuleName>.<function>"</code> to indicate the error position (in case the impossible should happen). Don't invest time to "show" the offending value, only do this temporarily when debugging the code.
   
  +
* Don't leave unused or commented-out code in your files! Readers don't know what to think of it.
==== Case expressions ====
 
   
  +
=== Partial functions ===
Prefer case expressions over pattern binding declarations with
 
multiple equations.
 
   
  +
* For partial functions do document their preconditions (if not obvious) and make sure that partial functions are only called when preconditions are obviously fulfilled (e.g. by a case statement or a previous test). In particular, the call of <code>head</code> should be used with care or (even better) be made obsolete by a case statement.
Not always nice:
 
longFunctionName (Foo: _ : _) = e1
 
longFunctionName (Bar: _) = e2
 
   
  +
* Usually a case-expression (and the import of <code>isJust</code> and <code>fromJust</code> from <code>Data.[[Maybe]]</code>) can be avoided by using the <code>maybe</code> function:
Better:
 
longFunctionName arg = case arg of
 
Foo : _ : _ -> e1
 
Bar : _ -> e2
 
_ -> error "ProgrammingGuidelines.longFunctionName"
 
   
  +
:<haskell>
In
 
  +
maybe (error "<ModuleName>.<function>") id $ Map.lookup key map
http://research.microsoft.com/~simonpj/papers/haskell-retrospective/
 
  +
</haskell>
the first example is said to be written in [[declaration style]]. The
 
equations look like written for a rewrite system (although their order
 
matters of course).
 
   
  +
:Generally we require you to be more explicit about failure cases. Surely a missing (or an irrefutable) pattern would precisely report the position of a runtime error, but these are not so obvious when reading the code.
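The following is a minimal sketch of both points, assuming the qualified <code>Data.Map</code> import used elsewhere in these guidelines. The function names <code>firstOr</code> and <code>lookupKnown</code> and the error string are hypothetical:

```haskell
import qualified Data.Map as Map

-- | The first element of a list, or a default for the empty list.
-- A case expression makes the empty case explicit, so 'head' is not needed.
firstOr :: a -> [a] -> a
firstOr def xs = case xs of
  x : _ -> x
  [] -> def

-- | Look up a key that the caller guarantees to be present.
-- Precondition: the key is a member of the map.
lookupKnown :: Ord k => k -> Map.Map k v -> v
lookupKnown key = maybe (error "Example.lookupKnown") id . Map.lookup key
```

The failure case of <code>lookupKnown</code> is visible in the code and carries a fixed string that points to the function.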
But this declarative style is only nice for toy examples and annoying
 
if functions are renamed or if the number of arguments changes.
 
   
  +
=== Let or where expressions ===
The other extreme (according to SPJ) is [[expression style]]:
 
longFunctionName = \ arg -> ...
 
   
  +
Do avoid mixing and nesting <code>let</code> and <code>where</code>. (I prefer the expression-stylistic <code>let</code>.) Use auxiliary top-level functions that you do not export. Export lists also support the detection of unused functions.
We don't propose this style either. We propose to use as much pattern
 
matching (including as-patterns) on a single left-hand-side as appropriate.
 
   
  +
=== Code reuse ===
However, the expression style with a lambda term may come in handy, when
 
setting record fields of a function type.
 
   
  +
If you notice that you're doing the same task again, try to generalize it in order to avoid duplicate code. It is frustrating to fix the same error in several places.
We avoid lambda expressions if this is easily possibly using the
 
Prelude functions const, flip, curry, uncurry or section notation or
 
plain partial application. We do not indroduce an auxiliary function only to
 
avoid the lambda, though.
 
   
  +
=== Application notation ===
   
  +
Many parentheses can be eliminated using the infix application operator <code>$</code> with lowest priority. Try at least to avoid unnecessary parentheses in standard infix expressions.
==== Partial functions ====
 
   
  +
<haskell>
For partial functions do document their preconditions (if not obvious)
 
  +
f x : g x ++ h x
and make sure that partial functions are only called when
 
preconditions are obviously fulfilled (i.e. by a case statement or a
 
previous test). Particularly the call of "head" should be used with
 
care or (even better) be made obsolete by a case statement.
 
   
  +
a == 1 && b == 1 || a == 0 && b == 0
Usually a case statement (and the import of isJust and fromJust from
 
  +
</haskell>
Data.[[Maybe]]) can be avoided by using the "maybe" function:
 
   
  +
Rather than putting a large final argument in parentheses (with a distant closing one) consider using <code>$</code> instead.
maybe (error "<ModuleName>.<function>") id $ Map.lookup key map
 
   
  +
* <code>f (g x)</code> becomes <code>f $ g x</code> and consecutive applications
Generally we require you to be more explicit about failure
 
  +
* <code>f (g (h x))</code> can be written as <code>f $ g $ h x</code> or <code>f . g $ h x</code>.
cases. Surely a missing (or an irrefutable) pattern would precisely
 
report the position of a runtime error, but these are not so obvious
 
when reading the code.
 
 
==== Let or where expressions ====
 
 
Do avoid mixing and nesting "let" and "where". (I prefer the
 
expression-stylistic "let".) Use auxiliary top-level functions that
 
you do not export. Export lists also support the detection of unused
 
functions.
 
 
 
==== Code reuse ====
 
 
If you notice that you're doing the same task again, try to generalize
 
it in order to avoid duplicate code. It is frustrating to change the
 
same error in several places.
 
 
 
==== Application notation ====
 
 
Many parentheses can be eliminated using the infix application operator "$"
 
with lowest priority. Try at least to avoid unnecessary parentheses in
 
standard infix expression.
 
 
f x : g x ++ h x
 
 
a == 1 && b == 1 || a == 0 && b == 0
 
 
Rather than putting a large final argument in parentheses (with a
 
distant closing one) consider using "$" instead.
 
 
"f (g x)" becomes "f $ g x" and consecutive applications
 
"f (g (h x))" can be written as "f $ g $ h x" or "f . g $ h x".
 
   
 
A function definition like
 
A function definition like
"f x = g $ h x" can be abbreviated to "f = g . h".
+
<code>f x = g $ h x</code> can be abbreviated to <code>f = g . h</code>.
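As a small illustration, the three definitions below should be interchangeable. The names are hypothetical; only <code>toUpper</code> from <code>Data.Char</code> and Prelude functions are used:

```haskell
import Data.Char (toUpper)

-- Three equivalent ways to write the same function.
shoutParens, shoutDollar, shoutCompose :: String -> String
shoutParens s = reverse (map toUpper s) -- explicit parentheses
shoutDollar s = reverse $ map toUpper s -- final argument after $
shoutCompose = reverse . map toUpper -- argument dropped, functions composed
```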
   
Note that the final argument may even be an infix- or case expression:
+
Note that the final argument may even be an infix or case-expression:
   
  +
:<haskell>
map id $ c : l
 
  +
map id $ c : l
   
filter (const True) . map id $ case l of ...
+
filter (const True) . map id $ case l of
  +
</haskell>
   
However, be aware that $-terms cannot be composed further in infix
+
However, be aware that <code>$</code>-terms cannot be composed further in infix expressions.
expressions.
 
   
 
Probably wrong:
 
Probably wrong:
  +
:<haskell>
f $ x ++ g $ x
 
  +
f $ x ++ g $ x
 
  +
</haskell>
But the scope of an expression is also limited by the layout rule, so
 
it is usually save to use "$" on right hand sides.
 
 
Ok:
 
do f $ l
 
++
 
do g $ l
 
   
  +
But the scope of an expression is also limited by the layout rule, so it is usually safe to use <code>$</code> on right hand sides:
Of course "$" can not be used in types. GHC has also some primitive
 
functions involving the kind "#" that cannot be applied using "$".
 
   
  +
:Ok:
Last warning: always leave spaces around "$" (and other mixfix
 
  +
::<haskell>
operators) since a clash with template haskell is possible.
 
  +
do f $ l
  +
++
  +
do g $ l
  +
</haskell>
   
  +
Of course <code>$</code> cannot be used in types. GHC also has some primitive functions involving the kind <code>#</code> that cannot be applied using <code>$</code>.
(Also write "\ t" instead of "\t" in lambda expressions)
 
   
  +
Last warning: always leave spaces around <code>$</code> (and other mixfix operators) since a clash with Template Haskell is possible.
   
==== List Comprehensions ====
+
=== List Comprehensions ===
   
Use these only when "short and sweet". Prefer map, filter, and foldr!
+
Use these only when "short and sweet". Prefer <code>map</code>, <code>filter</code>, and <code>foldr</code>!
   
 
Instead of:
 
Instead of:
   
  +
:<haskell>
[toUpper c | c <- s]
 
  +
[toUpper c | c <- s]
  +
</haskell>
   
 
write:
 
write:
   
  +
:<haskell>
map toUpper s
 
  +
map toUpper s
 
  +
</haskell>
   
 
Consider:
 
Consider:
   
  +
:<haskell>
[toUpper c | s <- strings, c <- s]
 
  +
[toUpper c | s <- strings, c <- s]
  +
</haskell>
   
Here it takes some time for the reader to find out which value depends
+
Here it takes some time for the reader to find out which value depends on what other value and it is not so clear how many times the interim values <code>s</code> and <code>c</code> are used.
on what other value and it is not so clear how many times the interim
 
values s and c are used. In contrast to that the following can't be clearer:
 
   
  +
In contrast to that the following can't be clearer:
map toUpper (concat strings)
 
   
  +
:<haskell>
  +
map toUpper (concat strings)
  +
</haskell>
   
When using higher order functions you can switch easier to data
+
When using higher-order functions you can switch more easily to data structures other than lists. Compare:
structures different from list. Compare:
 
   
  +
:<haskell>
map (1+) list
 
  +
map (1+) list
  +
</haskell>
   
 
and:
 
and:
   
  +
:<haskell>
Set.map (1+) set
 
  +
Set.map (1+) set
  +
</haskell>
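A sketch of the same comparison as complete definitions, assuming the qualified <code>Data.Set</code> import recommended in these guidelines. The function names are hypothetical:

```haskell
import qualified Data.Set as Set

-- | Add one to every element of a list.
incrementList :: [Int] -> [Int]
incrementList = map (1 +)

-- | Add one to every element of a set; only the 'map' function changes.
incrementSet :: Set.Set Int -> Set.Set Int
incrementSet = Set.map (1 +)
```

A list comprehension would have to be rewritten completely to work on a set.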
   
  +
=== Records ===
   
  +
For (large) records avoid the use of the constructor directly and remember that the order and number of fields may change.
==== Types ====
 
   
  +
Take care with (the rare case of) dependent polymorphic fields:
Prefer proper data types over type synonyms or tuples even if you have
 
to do more constructing and unpacking. This will make it easier to
 
supply class instances later on. Don't put class constraints on
 
a data type, constraints belong only to the functions that manipulate
 
the data.
 
   
  +
:<haskell>
Using type synonyms consistently is difficult over a longer time,
 
  +
data Fields a = VariantWithTwo
because this is not checked by the compiler. (The types shown by
 
  +
{ field1 :: a
the compiler may be unpredictable: i.e. FilePath, String or [Char])
 
  +
, field2 :: a }
  +
</haskell>
   
  +
The type of a value <code>v</code> cannot be changed by only setting <code>field1</code>:
Take care if your data type has many variants (unless it is an
 
enumeration type.) Don't repeat common parts in every variant since
 
this will cause code duplication.
 
   
  +
:<haskell>
Bad (to handle arguments in sync):
 
  +
v { field1 = f }
  +
</haskell>
   
  +
Better construct a new value:
data Mode f p = Box f p | Diamond f p
 
   
  +
:<haskell>
Good (to handle arguments only once):
 
  +
VariantWithTwo { field1 = f } -- leaving field2 undefined
  +
</haskell>
   
  +
Or use a polymorphic element that is instantiated by updating:
data BoxOrDiamond = Box | Diamond
 
   
  +
:<haskell>
data Mode f p = Mode BoxOrDiamond f p
 
  +
empty = VariantWithTwo { field1 = [], field2 = [] }
   
  +
empty { field1 = [f] }
  +
</haskell>
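Put together, the polymorphic variant might look like the following sketch. The name <code>withChar</code> is hypothetical; the data type and <code>empty</code> are taken from the snippets above:

```haskell
data Fields a = VariantWithTwo
  { field1 :: a
  , field2 :: a
  }

-- Both fields are empty lists, so the value is polymorphic in the
-- element type.
empty :: Fields [b]
empty = VariantWithTwo { field1 = [], field2 = [] }

-- Updating one field of the polymorphic value fixes the element type
-- for both fields.
withChar :: Fields String
withChar = empty { field1 = "f" }

-- By contrast, for some v :: Fields Int the update v { field1 = 'x' }
-- is rejected, because field2 would keep the type Int.
```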
   
  +
Several variants with identical fields may avoid some code duplication when selecting and updating, though possibly not in a few dependent polymorphic cases.
Consider (bad):
 
   
  +
However, I doubt if the following is a really good alternative to the above data <code>Mode</code> with data <code>BoxOrDiamond</code>.
data Tuple a b = Tuple a b | Undefined
 
   
  +
:<haskell>
versus (better):
 
  +
data Mode f p =
  +
Box { formula :: f, positions :: p }
  +
| Diamond { formula :: f, positions :: p }
  +
</haskell>
   
  +
=== Types ===
data Tuple a b = Tuple a b
 
   
  +
* Prefer proper data types over type synonyms or tuples even if you have to do more constructing and unpacking. This will make it easier to supply class instances later on. Don't put class constraints on a data type, constraints belong only to the functions that manipulate the data.
and using:
 
   
  +
* Using type synonyms consistently is difficult over a longer time, because this is not checked by the compiler. (The types shown by the compiler may be unpredictable: i.e. <code>FilePath</code>, <code>String</code> or <code>[Char]</code>.)
Maybe (Tuple a b)
 
   
  +
* Take care if your data type has many variants (unless it is an enumeration type). Don't repeat common parts in every variant since this will cause code duplication.
(or another monad) whenever an undefined result needs to be propagated
 
   
  +
:Bad (to handle arguments in sync):
   
  +
::<haskell>
==== Records ====
 
  +
data Mode f p = Box f p | Diamond f p
  +
</haskell>
   
  +
:Good (to handle arguments only once):
For (large) records avoid the use of the constructor directly and
 
remember that the order and number of fields may change.
 
   
  +
::<haskell>
Take care with (the rare case of) depend polymorphic fields:
 
  +
data BoxOrDiamond = Box | Diamond
   
data Fields a = VariantWithTwo
+
data Mode f p = Mode BoxOrDiamond f p
  +
</haskell>
{ field1 :: a
 
, field2 :: a }
 
   
The type of a value v can not be changed by only setting field1:
 
   
  +
* Consider (bad):
v { field1 = f }
 
   
  +
::<haskell>
Better construct a new value:
 
  +
data Tuple a b = Tuple a b | Undefined
 
  +
</haskell>
VariantWithTwo { field1 = f } -- leaving field2 undefined
 
 
Or use a polymorphic element that is instantiated by updating:
 
 
empty = VariantWithTwo { field1 = [], field2 = [] }
 
   
  +
:versus (better):
empty { field1 = [f] }
 
   
  +
::<haskell>
Several variants with identical fields may avoid some code duplication
 
  +
data Tuple a b = Tuple a b
when selecting and updating, though possibly not in a few
 
  +
</haskell>
depended polymorphic cases.
 
   
  +
:and using:
However, I doubt if the following is a really good alternative to the
 
above data Mode with data BoxOrDiamond.
 
   
  +
::<haskell>
  +
Maybe (Tuple a b)
  +
</haskell>
   
  +
:(or another monad) whenever an undefined result needs to be propagated
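A minimal sketch of how the <code>Maybe</code> wrapper could propagate an undefined result. The functions <code>safeDivMod</code> and <code>quotientSum</code> are hypothetical examples, not part of any library:

```haskell
data Tuple a b = Tuple a b
  deriving (Eq, Show)

-- | Quotient and remainder, or Nothing for a zero divisor.
safeDivMod :: Int -> Int -> Maybe (Tuple Int Int)
safeDivMod n d =
  if d == 0 then Nothing else Just (Tuple (n `div` d) (n `mod` d))

-- | Sum of two quotients; a single Nothing makes the whole result Nothing.
quotientSum :: Int -> Int -> Int -> Maybe Int
quotientSum n d1 d2 = do
  Tuple q1 _ <- safeDivMod n d1
  Tuple q2 _ <- safeDivMod n d2
  return (q1 + q2)
```

The data type itself stays free of an extra <code>Undefined</code> variant, and callers do not have to test for it by hand.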
data Mode f p =
 
Box { formula :: f, positions :: p }
 
| Diamond { formula :: f, positions :: p }
 
   
  +
=== I/O ===
   
  +
Try to strictly separate monadic I/O and pure (without <code>do</code>) function programming (possibly via separate modules).
==== IO ====
 
   
  +
:Bad:
Try to strictly separate IO, Monad and pure (without do) function
 
programming (possibly via separate modules).
 
   
  +
::<haskell>
Bad:
 
x <- return y
+
x <- return y
...
+
...
  +
</haskell>
   
Good:
+
:Good:
let x = y
 
...
 
   
  +
::<haskell>
  +
let x = y
  +
...
  +
</haskell>
   
Don't use Prelude.interact and make sure your program does not depend
+
Don't use <code>Prelude.interact</code> and make sure your program does not depend on the (not always obvious) order of evaluation, e.g. don't read from and write to the same file!
on the (not always obvious) order of evaluation. I.e. don't read and
 
write to the same file:
 
   
 
This will fail:
 
This will fail:
   
  +
:<haskell>
do s <- readFile f
 
  +
do s <- readFile f
writeFile f $ 'a' : s
 
  +
writeFile f $ 'a' : s
  +
</haskell>
   
because of lazy IO! (Writing is starting before reading is finished.)
+
because of lazy I/O! (Writing starts before reading has finished.)
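One possible workaround, shown here only as a sketch, is to force the whole contents before writing. It relies on <code>evaluate</code> from <code>Control.Exception</code>; the function name <code>prependChar</code> is hypothetical:

```haskell
import Control.Exception (evaluate)

-- | Prepend a character to the contents of a file.
prependChar :: Char -> FilePath -> IO ()
prependChar c f = do
  s <- readFile f
  -- Force the whole contents, so that reading is finished (and the
  -- file handle released) before writing starts.
  _ <- evaluate (length s)
  writeFile f $ c : s
```

Writing to a different file and renaming it afterwards avoids the problem altogether and may be the more robust choice.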
   
  +
=== Imports ===
   
  +
Standard library modules like <code>Char</code>, <code>List</code>, <code>Maybe</code>, <code>Monad</code>, etc. should be imported by their hierarchical module names, i.e. from the base package (so that haddock finds them):
==== Trace ====
 
   
  +
:<haskell>
Tracing is for debugging purposes only and should not be used as
 
  +
import Data.List
feedback for the user. Clean code is not cluttered by trace calls.
 
  +
import Control.Monad
  +
import System.Environment
  +
</haskell>
   
  +
The libraries for <code>Set</code> and <code>Map</code> are to be imported qualified:
   
  +
:<haskell>
==== Imports ====
 
  +
import qualified Data.Set as Set
  +
import qualified Data.Map as Map
  +
</haskell>
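A short sketch of how the qualified names then appear in code. Functions such as <code>Map.lookup</code>, <code>Map.filter</code> and <code>Set.map</code> share their names with Prelude functions, which is one reason for the qualified import. The functions <code>wordCounts</code> and <code>distinctWords</code> are hypothetical:

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- | Count how often each word occurs.
wordCounts :: [String] -> Map.Map String Int
wordCounts = Map.fromListWith (+) . map (\ w -> (w, 1))

-- | The distinct words, in ascending order.
distinctWords :: [String] -> [String]
distinctWords = Set.toList . Set.fromList
```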
   
Standard library modules like Char. List, Maybe, Monad, etc should be
 
imported by their hierarchical module name, i.e. the base package (so
 
that haddock finds them):
 
   
  +
=== Implementation-specific extensions and classes ===
import Data.List
 
import Control.Monad
 
import System.Environment
 
   
  +
[[Use of language extensions|Stay away from extensions]] as long as possible. Also use classes with care because soon the desire for overlapping instances (like for lists and strings) may arise. Then you may want MPTC (multi-parameter type classes), functional dependencies (FD), undecidable and possibly incoherent instances and then you are "in the wild" (according to SPJ).
The libraries for Set and Map are to be imported qualified:
 
   
  +
=== Trace ===
import qualified Data.Set as Set
 
import qualified Data.Map as Map
 
   
  +
Tracing is for debugging purposes only and should not be used as feedback for the user. Clean code is not cluttered by trace calls.
   
==== Glasgow extensions and Classes ====
+
== Style in other languages ==
 
[[Use of language extensions|Stay away from extensions]] as long as possible. Also use classes with
 
care because soon the desire for overlapping instances (like for lists
 
and strings) may arise. Then you may want MPTC (multi-parameter type
 
classes), functional dependencies (FD), undecidable and possibly incoherent
 
instances and then you are "in the wild" (according to SPJ).
 
 
=== Style in other languages ===
 
   
 
* [http://www.cs.caltech.edu/~cs20/a/style.html OCaml style]
 
* [http://www.cs.caltech.edu/~cs20/a/style.html OCaml style]
   
=== Final remarks ===
+
== Final remarks ==
   
Despite guidelines, writing "correct code" (without formal proof
+
Despite guidelines, writing "correct code" (without formal proof support yet) still remains the major challenge. As motivation to follow these guidelines consider the points that are from the "C++ Coding Standard", where I replaced "C++" with "Haskell".
support yet) still remains the major challenge. As motivation to
 
follow these guidelines consider the points that are from the "C++
 
Coding Standard", where I replaced "C++" with "Haskell".
 
   
 
Good Points:
 
Good Points:
Line 512: Line 426:
 
* people ignore standards anyway
 
* people ignore standards anyway
   
  +
== Other guidelines ==
  +
  +
[https://www.cse.unsw.edu.au/~cs3161/15s2/StyleGuide.html COMP3161]
  +
  +
[http://docs.ganeti.org/ganeti/2.13/html/dev-codestyle.html#haskell Google's ganeti]
  +
  +
[https://kowainik.github.io/posts/2019-02-06-style-guide kowainik]
  +
  +
[https://github.com/Soostone/style-guides/blob/master/haskell-style-guide.md Soostone]
  +
  +
[https://github.com/tibbe/haskell-style-guide/blob/master/haskell-style.md tibbe]
   
  +
[https://github.com/tweag/guides/blob/master/style/Haskell.md tweag]
   
 
[[Category:Style]]
 
[[Category:Style]]

Latest revision as of 09:36, 10 August 2022

Programming guidelines help to make the code of a project more readable and maintainable for a varying number of contributors.

It takes some programming experience to develop something like a personal "coding style", and guidelines serve only as a rough shape for code. Guidelines should be followed by all members working on the project even if they prefer (or are already used to) different guidelines.

These guidelines were originally set up for the hets-project and are now being put on the HaskellWiki, gradually integrating parts of the old hawiki entries ThingsToAvoid and HaskellStyle (hopefully not infringing anyone's copyright). The other related entry, TipsAndTricks, treats more specific points that are left out here.

Surely some style choices are a bit arbitrary (or "religious") and too restrictive with respect to language extensions. Nevertheless I hope to keep up these guidelines (at least as a basis) for our project in order to avoid maintaining diverging guidelines. Of course I want to supply - partly tool-dependent - reasons for certain decisions and also show alternatives by possibly bad examples. At the time of writing I use ghc-6.4.1, haddock-0.7 and (GNU-) emacs with the latest haskell mode.

The following quote and links are taken from HaskellStyle:

We all have our own ideas about good Haskell style. There's More Than One Way To Do It. But some ways are better than others.

Some comments from the GHC team about their internal coding standards can be found at http://hackage.haskell.org/trac/ghc/wiki/WorkingConventions

Also https://simon.peytonjones.org/publications-2000/#wearing-the-hair-shirt-a-retrospective-on-haskell-2003 contains some brief comments on syntax and style.

What now follows are descriptions of program documentation, file format, naming conventions and good programming practice (adapted from Matt's C/C++ Programming Guidelines and the Linux kernel coding style).


Documentation

  • Comments are to be written in application terms (i.e. user's point of view). Don't use technical terms - that's what the code is for!
  • Comments should be written using correct spelling and grammar in complete sentences with punctuation (in English only):
Generally, you want your comments to tell WHAT your code does, not HOW. Also, try to avoid putting comments inside a function body: if the function is so complex that you need to separately comment parts of it, you should probably [... decompose it.]

  • Put a haddock comment on top of every exported function and data type! Make sure haddock accepts these comments.
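
As a minimal sketch of the points above, the following shows haddock comments written in application terms. The names Basket and totalItems are hypothetical and serve only as illustration.

```haskell
-- | A shopping basket: each item name paired with the quantity ordered.
newtype Basket = Basket [(String, Int)]

-- | Count all items in the basket, respecting quantities.
totalItems :: Basket -> Int
totalItems (Basket entries) = sum (map snd entries)
```

Both comments say what the code means to a user of the module and leave the how to the code itself.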

File Format

  • All Haskell source files start with a haddock header of the form:
{- |
Module      :  <File name or $Header$ to be replaced automatically>
Description :  <optional short text displayed on contents page>
Copyright   :  (c) <Authors or Affiliations>
License     :  <license>

Maintainer  :  <email>
Stability   :  unstable | experimental | provisional | stable | frozen
Portability :  portable | non-portable (<reason>)

<module description starting at first column>
-}
(The $Header$ entry will be automatically expanded.)
A possible compiler pragma (like {-# LANGUAGE CPP #-}) may precede this header. The following hierarchical module name must, of course, match the file name.
  • Make sure that the description is changed to meet the module (if the header was copied from elsewhere). Insert your email address as maintainer.
  • Lines should not be longer than 80 (preferably 75) characters. Code with short lines is easier to read and understand. If an expression is longer than 80 characters, try to restructure and rewrite the code in a more expressive way. Lines of at most 80 characters are considered good practice across the IT industry and are supported everywhere.
  • Don't leave trailing white space on any line of your code.
  • Expand all your tabs to spaces to avoid the danger of wrongly expanding them (or a different display of tabs versus eight spaces). Possibly put something like the following in your ~/.emacs file:
(custom-set-variables '(indent-tabs-mode nil))
  • The last character in your file should be a newline! Under Solaris you'll get a warning if this is not the case, and sometimes a last line without a newline is ignored (e.g. an #endif without a newline). Emacs usually asks for a final newline.
  • The whole module should not be too long (about 400 lines)

Please have a look at the Haddock module header documentation.

Naming Conventions

  • In Haskell, types start with capital letters and functions with lowercase letters, so the only thing left to avoid is infix identifiers! Defining symbolic infix identifiers should be left to library writers only.
(The infix identifier "\\" at the end of a line causes CPP preprocessor problems.)
  • Names (especially global ones) should be descriptive. If you need long names write them in lowerCamelCase. Laconic names are preferred.
  • Similarly, type, type class, and constructor names are written using UpperCamelCase.
  • In the standard libraries, some parts of Haskell code use snake_case for long identifiers to better reflect names given with hyphens in the required documentation. If used in outside code, such names should be transliterated to camelCase identifiers, possibly adding a (consistent) suffix or prefix to avoid conflicts with keywords. Instead of a recurring prefix or suffix, you may consider using qualified imports and names.

Good Programming Practice

Functions should be short and sweet, and do just one thing. They should fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24, as we all know), and do one thing and do that well.

  • Most Haskell functions should be at most a few lines, only case expressions over large data types (that should be avoided, too) may need corresponding space.
  • For lambda expressions, write \ t -> … instead of \t -> ….
  • The code should be succinct (though not obfuscated), readable and easy to maintain (after unforeseeable changes). Don't exploit exotic language features without good reason.
  • It's not fixed how deep you indent (4 or 8 chars). You can break the line after do, let, where, and case … of …. Make sure that renamings don't destroy your layout. (If you get too far to the right, the code is unreadable anyway and needs to be decomposed.)
Bad:
case foo of Foo -> "Foo"
            Bar -> "Bar"
Good:
case <longer expression> of
    Foo -> "Foo"
    Bar -> "Bar"
Avoid the notation with braces and semicolons since the layout rule forces you to properly align your alternatives.
  • Respect compiler warnings. Supply type signatures, avoid shadowing and unused variables. Particularly avoid non-exhaustive and overlapping patterns. Missing unreachable cases can be filled in using error with a fixed string "<ModuleName>.<function>" to indicate the error position (in case the impossible should happen). Don't invest time to "show" the offending value, only do this temporarily when debugging the code.
  • Don't leave unused or commented-out code in your files! Readers don't know what to think of it.
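
The layout and warning advice above can be sketched as follows. The type Colour, the function signalName and the module name Signal in the error string are hypothetical, and it is an assumption of the example that callers never pass Blue.

```haskell
data Colour = Red | Green | Blue

-- | Meaning of a traffic signal colour.
-- Precondition (assumed for this sketch): the colour is never 'Blue'.
signalName :: Colour -> String
signalName c = case c of
    Red -> "stop"
    Green -> "go"
    -- Unreachable by assumption; the fixed string marks the position.
    Blue -> error "Signal.signalName"
```

The case alternatives start on their own lines, so renaming signalName or c cannot destroy the alignment, and the pattern match stays exhaustive.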

Partial functions

  • For partial functions, do document their preconditions (if not obvious) and make sure that partial functions are only called when the preconditions are obviously fulfilled (e.g. by a case expression or a previous test). In particular, head should be used with care or (even better) be made obsolete by a case expression.
  • Usually a case-expression (and the import of isJust and fromJust from Data.Maybe) can be avoided by using the maybe function:
maybe (error "<ModuleName>.<function>") id $ Map.lookup key map
Generally we require you to be more explicit about failure cases. Surely a missing (or an irrefutable) pattern would precisely report the position of a runtime error, but these are not so obvious when reading the code.
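
A minimal sketch of both points, assuming hypothetical helper names firstOr and lookupExisting and a hypothetical module name Example in the error string:

```haskell
import qualified Data.Map as Map

-- | First element of a list, or the given default for an empty list.
-- The case expression makes a call of 'head' unnecessary.
firstOr :: a -> [a] -> a
firstOr def l = case l of
    [] -> def
    x : _ -> x

-- | Value stored under a key.
-- Precondition: the key is present in the map.
lookupExisting :: Ord k => k -> Map.Map k v -> v
lookupExisting key =
    maybe (error "Example.lookupExisting") id . Map.lookup key
```

The failure case is visible at the call site of maybe, rather than hidden in a missing pattern.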

Let or where expressions

Do avoid mixing and nesting let and where. (I prefer the expression-stylistic let.) Use auxiliary top-level functions that you do not export. Export lists also support the detection of unused functions.

Code reuse

If you notice that you're doing the same task again, try to generalize it in order to avoid duplicate code. It is frustrating to fix the same error in several places.

Application notation

Many parentheses can be eliminated using the infix application operator $, which has the lowest priority. Try at least to avoid unnecessary parentheses in standard infix expressions.

f x : g x ++ h x

a == 1 && b == 1 || a == 0 && b == 0

Rather than putting a large final argument in parentheses (with a distant closing one) consider using $ instead.

  • f (g x) becomes f $ g x and consecutive applications
  • f (g (h x)) can be written as f $ g $ h x or f . g $ h x.

A function definition like f x = g $ h x can be abbreviated to f = g . h.

Note that the final argument may even be an infix or case-expression:

map id $ c : l

filter (const True) . map id $ case l of 

However, be aware that $-terms cannot be composed further in infix expressions.

Probably wrong:

f $ x ++ g $ x

But the scope of an expression is also limited by the layout rule, so it is usually safe to use $ on right hand sides:

Ok:
do f $ l
++
do g $ l

Of course $ cannot be used in types. GHC also has some primitive functions involving the kind # that cannot be applied using $.

Last warning: always leave spaces around $ (and other mixfix operators) since a clash with Template Haskell is possible.
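
The rewrites described in this section can be sketched with one small pipeline written three ways. The function names are hypothetical; all three definitions are intended to be equivalent.

```haskell
import Data.Char (toUpper)

shoutParens, shoutDollar, shoutPointFree :: String -> String

-- Nested parentheses with a distant closing one.
shoutParens s = reverse (map toUpper (filter (/= ' ') s))

-- The large final argument is applied with $ instead.
shoutDollar s = reverse . map toUpper $ filter (/= ' ') s

-- f x = g $ h x abbreviated to f = g . h.
shoutPointFree = reverse . map toUpper . filter (/= ' ')
```

Which spelling reads best is a judgment call; the point is only that the parentheses are optional.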

List Comprehensions

Use these only when "short and sweet". Prefer map, filter, and foldr!

Instead of:

[toUpper c | c <- s]

write:

map toUpper s

Consider:

[toUpper c | s <- strings, c <- s]

Here it takes some time for the reader to find out which value depends on what other value and it is not so clear how many times the interim values s and c are used.

In contrast to that the following can't be clearer:

map toUpper (concat strings)

When using higher-order functions, it is easier to switch to data structures other than lists. Compare:

map (1+) list

and:

Set.map (1+) set
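
As a self-contained sketch of the comparison above (the function names are hypothetical), the comprehension and the higher-order version compute the same result, and the step from lists to sets changes only the qualifier:

```haskell
import Data.Char (toUpper)
import qualified Data.Set as Set

-- Nested comprehension: the reader must trace s and c.
upperAllComprehension :: [String] -> String
upperAllComprehension strings = [toUpper c | s <- strings, c <- s]

-- Higher-order version: flatten first, then convert.
upperAll :: [String] -> String
upperAll strings = map toUpper (concat strings)

incrementList :: [Int] -> [Int]
incrementList = map (1 +)

incrementSet :: Set.Set Int -> Set.Set Int
incrementSet = Set.map (1 +)
```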

Records

For (large) records avoid the use of the constructor directly and remember that the order and number of fields may change.

Take care with (the rare case of) dependent polymorphic fields:

data Fields a = VariantWithTwo
    { field1 :: a
    , field2 :: a }

The type of a value v cannot be changed by only setting field1:

v { field1 = f }

Better construct a new value:

VariantWithTwo { field1 = f }  -- leaving field2 undefined

Or use a polymorphic element that is instantiated by updating:

empty = VariantWithTwo { field1 = [], field2 = [] }

empty { field1 = [f] }
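
A minimal sketch of the two workable approaches, assuming hypothetical names retype, emptyFields and withOne. Updating every field that mentions the type parameter in a single update expression is what allows the type to change.

```haskell
data Fields a = VariantWithTwo
    { field1 :: a
    , field2 :: a }

-- | Change the element type by updating both fields at once.
retype :: (a -> b) -> Fields a -> Fields b
retype f v = v { field1 = f (field1 v), field2 = f (field2 v) }

-- | A polymorphic starting value; its element type is fixed by later use.
emptyFields :: Fields [b]
emptyFields = VariantWithTwo { field1 = [], field2 = [] }

-- | Updating one field instantiates the element type to Int.
withOne :: Fields [Int]
withOne = emptyFields { field1 = [1] }
```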

Several variants with identical fields may avoid some code duplication when selecting and updating, though possibly not in a few dependent polymorphic cases.

However, I doubt that the following is really a good alternative to the data Mode with data BoxOrDiamond shown in the Types section below.

data Mode f p =
      Box     { formula :: f,  positions :: p }
    | Diamond { formula :: f,  positions :: p }

Types

  • Prefer proper data types over type synonyms or tuples even if you have to do more constructing and unpacking. This will make it easier to supply class instances later on. Don't put class constraints on a data type, constraints belong only to the functions that manipulate the data.
  • Using type synonyms consistently is difficult over a longer time, because this is not checked by the compiler. (The types shown by the compiler may be unpredictable: e.g. FilePath, String or [Char].)
  • Take care if your data type has many variants (unless it is an enumeration type). Don't repeat common parts in every variant since this will cause code duplication.
Bad (to handle arguments in sync):
data Mode f p = Box f p | Diamond f p
Good (to handle arguments only once):
data BoxOrDiamond = Box | Diamond

data Mode f p = Mode BoxOrDiamond f p


  • Consider (bad):
data Tuple a b = Tuple a b | Undefined
versus (better):
data Tuple a b = Tuple a b
and using:
Maybe (Tuple a b)
(or another monad) whenever an undefined result needs to be propagated
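
The benefit of both recommendations can be sketched as follows. The functions mapFormula and lookupPair are hypothetical and exist only to show that the common arguments are handled once and that Maybe carries the undefined case.

```haskell
data BoxOrDiamond = Box | Diamond deriving (Eq, Show)

data Mode f p = Mode BoxOrDiamond f p

-- | Replace the formula; one equation covers both modalities.
mapFormula :: (f -> g) -> Mode f p -> Mode g p
mapFormula fn (Mode m f p) = Mode m (fn f) p

data Tuple a b = Tuple a b deriving (Eq, Show)

-- | Pair a key with its value; 'Nothing' propagates a missing entry.
lookupPair :: Eq a => a -> [(a, b)] -> Maybe (Tuple a b)
lookupPair key = fmap (Tuple key) . lookup key
```

With the variant data Mode f p = Box f p | Diamond f p, mapFormula would need one equation per constructor.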

I/O

Try to strictly separate monadic I/O from pure (without do) functional programming (possibly via separate modules).

Bad:
    x <- return y
    ...
Good:
    let x = y
    ...

Don't use Prelude.interact and make sure your program does not depend on the (not always obvious) order of evaluation e.g. don't read and write to the same file!

This will fail:

do s <- readFile f
   writeFile f $ 'a' : s

because of lazy I/O! (Writing starts before reading has finished.)
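
One common way around this, shown here as a sketch rather than as the recommended fix of these guidelines, is to force the whole contents before writing. The function name prependChar is hypothetical.

```haskell
-- | Prepend a character to the contents of a file.
prependChar :: Char -> FilePath -> IO ()
prependChar c f = do
    s <- readFile f
    -- Forcing the length reads the file to its end, which closes the
    -- handle before 'writeFile' opens the same file again.
    length s `seq` writeFile f (c : s)
```

Writing to a different file and renaming it afterwards avoids the problem altogether and does not hold the whole contents in memory.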

Imports

Standard library modules like Char, List, Maybe, Monad, etc. should be imported by their hierarchical module names from the base package (so that haddock finds them):

import Data.List
import Control.Monad
import System.Environment

The libraries for Set and Map are to be imported qualified:

import qualified Data.Set as Set
import qualified Data.Map as Map


Implementation-specific extensions and classes

Stay away from extensions as long as possible. Also use classes with care because soon the desire for overlapping instances (like for lists and strings) may arise. Then you may want MPTC (multi-parameter type classes), functional dependencies (FD), undecidable and possibly incoherent instances and then you are "in the wild" (according to SPJ).

Trace

Tracing is for debugging purposes only and should not be used as feedback for the user. Clean code is not cluttered by trace calls.

Style in other languages

  • OCaml style: http://www.cs.caltech.edu/~cs20/a/style.html

Final remarks

Despite guidelines, writing "correct code" (without formal proof support yet) still remains the major challenge. As motivation to follow these guidelines consider the points that are from the "C++ Coding Standard", where I replaced "C++" with "Haskell".

Good Points:

  • programmers can go into any code and figure out what's going on
  • new people can get up to speed quickly
  • people new to Haskell are spared the need to develop a personal style and defend it to the death
  • people new to Haskell are spared making the same mistakes over and over again
  • people make fewer mistakes in consistent environments
  • programmers have a common enemy :-)

Bad Points:

  • the standard is usually stupid because it was made by someone who doesn't understand Haskell
  • the standard is usually stupid because it's not what I do
  • standards reduce creativity
  • standards are unnecessary as long as people are consistent
  • standards enforce too much structure
  • people ignore standards anyway

Other guidelines

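  • COMP3161: https://www.cse.unsw.edu.au/~cs3161/15s2/StyleGuide.html
  • Google's ganeti: http://docs.ganeti.org/ganeti/2.13/html/dev-codestyle.html#haskell
  • kowainik: https://kowainik.github.io/posts/2019-02-06-style-guide
  • Soostone: https://github.com/Soostone/style-guides/blob/master/haskell-style-guide.md
  • tibbe: https://github.com/tibbe/haskell-style-guide/blob/master/haskell-style.md
  • tweag: https://github.com/tweag/guides/blob/master/style/Haskell.md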