Difference between revisions of "User:Maeder"
Jump to navigation
Jump to search
m (Haskell programming guidelines proposal) |
(moved to Programming_guidelines) |
||
Line 1: | Line 1: | ||
− | |||
− | == Haskell programming guidelines proposal == |
||
− | |||
− | |||
− | Programming guidelines shall help to make the code of a project better |
||
− | readable and maintainable by the varying number of contributors. |
||
− | |||
− | It takes some programming experience to develop something like a |
||
− | personal "coding style" and guidelines only serve |
||
− | as rough shape for code. Guidelines should be followed by all members |
||
− | working on the project even if they prefer (or are already used to) |
||
− | different guidelines. |
||
− | |||
− | In the following I will describe documentation, file format, naming |
||
− | conventions and good programming practice (adapted form Matt's C/C++ |
||
− | Programming Guidelines and the Linux kernel coding style). |
||
− | |||
− | |||
− | === Documentation === |
||
− | |||
− | |||
− | Comments are to be written in application terms (i.e. user's point of |
||
− | view). Don't use technical terms - that's what the code is for! |
||
− | |||
− | Comments should be written using correct spelling and grammar in complete |
||
− | sentences with punctation (in English only). |
||
− | |||
− | "Generally, you want your comments to tell WHAT your code does, not HOW. |
||
− | Also, try to avoid putting comments inside a function body: if the |
||
− | function is so complex that you need to separately comment parts of it, |
||
− | you should probably" (... decompose it) |
||
− | |||
− | Put a haddock comment on top of every exported function and data type! |
||
− | Make sure haddock accepts these comments. |
||
− | |||
− | |||
− | === File Format === |
||
− | |||
− | |||
− | All Haskell source files start with a haddock header of the form: |
||
− | |||
− | {- | <br> |
||
− | Module : <File name, i.e. generated by \$Header\$> <br> |
||
− | Description : <Short text displayed on contents page> <br> |
||
− | Copyright : (c) <You> and <Your affiliation> <br> |
||
− | License : similar to LGPL, see LICENSE.txt <br> |
||
− | <br> |
||
− | Maintainer : maeder@tzi.de <br> |
||
− | Stability : provisional <br> |
||
− | Portability : portable <br> |
||
− | <br> |
||
− | <module description starting at first column> <br> |
||
− | -} <br> |
||
− | |||
− | |||
− | A possible compiler pragma (like {-# OPTIONS -cpp #-}) may precede |
||
− | this header. The following hierarchical module name must of course |
||
− | match the file name. |
||
− | |||
− | Make sure that the description is changed to meet the module (if the |
||
− | header was copied from elsewhere). Insert your email address as maintainer. |
||
− | |||
− | Try to write portable (Haskell98) code. If you (indirectly) import |
||
− | a module that uses i.e. multi-parameter type classes and functional |
||
− | dependencies the code becomes "non-portable (MPTC with FD)". |
||
− | |||
− | The Dollar-Header-Dollar entry is automatically expanded by cvs (and will wrap |
||
− | around). All other lines should not be longer than 80 (preferably 75) |
||
− | characters to avoid wrapped lines (for casual readers)! |
||
− | |||
− | Expand all your tabs to spaces to avoid the danger of wrongly expanding |
||
− | them (or a different display of tabs versus eight spaces). Possibly put |
||
− | something like the following in your ~/.emacs file. |
||
− | |||
− | (custom-set-variables '(indent-tabs-mode nil)) |
||
− | |||
− | The last character in your file should be a newline! Under solaris |
||
− | you'll get a warning if this is not the case and sometimes last lines |
||
− | without newlines are ignored (i.e. "#endif" without newline). Emacs |
||
− | usually asks for a final newline. |
||
− | |||
− | The whole module should not be too long (about 400 lines) |
||
− | |||
− | |||
− | === Naming Conventions === |
||
− | |||
− | |||
− | In Haskell types start with capital and functions with lowercase |
||
− | letters, so only avoid infix identifiers! Defining symbolic infix |
||
− | identifiers should be left to library writers only. |
||
− | |||
− | (The infix identifier "\\" at the end of a line causes cpp preprocessor |
||
− | problems.) |
||
− | |||
− | Names (especially global ones) should be descriptive and if you need |
||
− | long names write them as mixed case words (aka camelCase). (but "tmp" |
||
− | is to be preferred over "thisVariableIsATemporaryCounter") |
||
− | |||
− | |||
− | === Good Programming Practice === |
||
− | |||
− | |||
− | "Functions should be short and sweet, and do just one thing. They should |
||
− | fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24, |
||
− | as we all know), and do one thing and do that well." |
||
− | |||
− | Most haskell functions should be at most a few lines, only case |
||
− | expression over large data types (that should be avoided, too) may need |
||
− | corresponding space. |
||
− | |||
− | The code should be succinct (though not obfuscated), readable and easy to |
||
− | maintain (after unforeseeable changes). Don't exploit exotic language |
||
− | features without good reason. |
||
− | |||
− | It's not fixed how deep you indent (4 or 8 chars). You can break the |
||
− | line after "do", "let", "where", and "case .. of". Make sure that |
||
− | renamings don't destroy your layout. (If you get to far to the right, |
||
− | the code is unreadable anyway and needs to be decomposed.) |
||
− | |||
− | Bad: |
||
− | case foo of Foo -> "Foo" |
||
− | Bar -> "Bar" |
||
− | Good: |
||
− | case <longer expression> of |
||
− | Foo -> "Foo" |
||
− | Bar -> "Bar" |
||
− | |||
− | Avoid the notation with braces and semicolons since the layout rule |
||
− | forces you to properly align your alternatives. |
||
− | |||
− | Respect compiler warnings. Supply type signatures, avoid shadowing and |
||
− | unused variables. Particularly avoid non-exhaustive and |
||
− | overlapping patterns. Missing unreachable cases can be filled in using |
||
− | "error" with a fixed string "<ModuleName>.<function>" to indicate the |
||
− | error position (in case the impossible should happen). Don't invest |
||
− | time to "show" the offending value, only do this temporarily when |
||
− | debugging the code. |
||
− | |||
− | Don't leave unused or commented-out code in your files! Readers don't |
||
− | know what to think of it. |
||
− | |||
− | |||
− | ==== Case expressions ==== |
||
− | |||
− | Prefer case expressions over pattern binding declarations. |
||
− | |||
− | Not always nice: |
||
− | longFunctionName (Foo: _ : _) = e1 |
||
− | longFunctionName (Bar: _) = e2 |
||
− | |||
− | Better (I think): |
||
− | longFunctionName arg = case arg of |
||
− | Foo : _ : _ -> e1 |
||
− | Bar : _ -> e2 |
||
− | _ -> error "ProgrammingGuidelines.longFunctionName" |
||
− | |||
− | For partial functions document their preconditions (if not obvious) |
||
− | and make sure that partial functions are only called when |
||
− | preconditions are obviously fulfilled (i.e. by a case statement or a |
||
− | previous test). Particularly the call of "head" should be used with |
||
− | care or (even better) be made obsolete by a case statement. |
||
− | |||
− | Usually a case statement (and the import of isJust and fromJust from |
||
− | Data.Maybe) can be avoided by using the "maybe" function: |
||
− | |||
− | maybe (error "<ModuleName>.<function>") id $ Map.lookup key map |
||
− | |||
− | Generally we require you to be more explicit about failure |
||
− | cases. Surely a missing (or an irrefutable) pattern |
||
− | would precisely report the position of a runtime error, but these are |
||
− | not so obvious when reading the code. |
||
− | |||
− | Do avoid mixing "let" and "where". (I prefer "let" and have auxiliary |
||
− | function on the top-level that are not exported.) Export lists also |
||
− | support the detection of unused functions. |
||
− | |||
− | If you notice that you're doing the same task again, try to generalize |
||
− | it in order to avoid duplicate code. It is frustrating to change the |
||
− | same error in several places. |
||
− | |||
− | |||
− | ==== Application notation ==== |
||
− | |||
− | |||
− | Many parentheses can be eliminated using the infix application operator "$" |
||
− | with lowest priority. Try at least to avoid unnecessary parentheses in |
||
− | standard infix expression. |
||
− | |||
− | f x : g x ++ h x |
||
− | |||
− | a == 1 && b == 1 || a == 0 && b == 0 |
||
− | |||
− | Rather than putting a large final argument in parentheses (with a |
||
− | distant closing one) consider using "$" instead. |
||
− | |||
− | "f (g x)" becomes "f $ g x" and consecutive applications |
||
− | "f (g (h x))" can be written as "f $ g $ h x" or "f . g $ h x". |
||
− | |||
− | A function definition like |
||
− | "f x = g $ h x" can be abbreviated to "f = g . h". |
||
− | |||
− | Note that the final argument may even be an infix- or case expression: |
||
− | |||
− | map id $ c : l |
||
− | |||
− | filter (const True) . map id $ case l of ... |
||
− | |||
− | However, be aware that $-terms cannot be composed further in infix |
||
− | expressions. |
||
− | |||
− | Probably wrong: |
||
− | f $ x ++ g $ x |
||
− | |||
− | But the scope of an expression is also limited by the layout rule, so |
||
− | it is usually save to use "$" on right hand sides. |
||
− | |||
− | Ok: |
||
− | do y <- f $ l |
||
− | ++ |
||
− | do y <- g $ l |
||
− | |||
− | Of course "$" can not be used in types. GHC has also some primitive |
||
− | functions involving the kind "#" that cannot be applied using "$". |
||
− | |||
− | Last warning: always leave spaces around "$" (and other mixfix |
||
− | operators) since a clash with template haskell is possible. |
||
− | |||
− | (Also write "\ t" instead of "\t" in lambda expressions) |
||
− | |||
− | |||
− | ==== List Comprehensions ==== |
||
− | |||
− | Use these only when "short and sweet". (I prefer map, filter, and foldr.) |
||
− | |||
− | |||
− | ==== Types ==== |
||
− | |||
− | Prefer proper data types over type synonyms or tuples even if you have |
||
− | to do more constructing and unpacking. This will make it easier to |
||
− | supply class instances later on. Don't put class constraints on |
||
− | a data type, constraints belong only to the functions that manipulate |
||
− | the data. |
||
− | |||
− | Using type synonyms consistently is difficult over a longer time, |
||
− | because this is not checked by the compiler. (The types shown by |
||
− | the compiler may be unpredictable: i.e. FilePath, String or [Char]) |
||
− | |||
− | Take care if your data type has many variants (unless it is an |
||
− | enumeration type.) Don't repeat common parts in every variant since |
||
− | this will cause code duplication. |
||
− | |||
− | Bad (to handle arguments in sync): |
||
− | |||
− | data Mode f p = Box f p | Diamond f p |
||
− | |||
− | Good (to handle arguments only once): |
||
− | |||
− | data BoxOrDiamond = Box | Diamond |
||
− | |||
− | data Mode f p = Mode BoxOrDiamond f p |
||
− | |||
− | |||
− | Consider (bad): |
||
− | |||
− | data Tupel a b = Tupel a b | Undefined |
||
− | |||
− | versus (better): |
||
− | |||
− | data Tupel a b = Tupel a b |
||
− | |||
− | and using: |
||
− | |||
− | Maybe (Tupel a b) |
||
− | |||
− | (or another monad) whenever an undefined result needs to be propagated |
||
− | |||
− | |||
− | ==== Records ==== |
||
− | |||
− | For (large) records avoid the use of the constructor directly and |
||
− | remember that the order and number of fields may change. |
||
− | |||
− | Take care with (the rare case of) depend polymorphic fields: |
||
− | |||
− | data Fields a = |
||
− | VariantWithTwo { field1 :: a |
||
− | , field2 :: a } |
||
− | |||
− | The type of a value v can not be changed by only setting field1: |
||
− | |||
− | v { field1 = f } |
||
− | |||
− | Better construct a new value: |
||
− | |||
− | VariantWithTwo { field1 = f } -- leaving field2 undefined |
||
− | |||
− | Or use a polymorphic element that is instantiated by updating: |
||
− | |||
− | e = VariantWithTwo { field1 = [], field2 = [] } |
||
− | |||
− | e { field1 = [f] } |
||
− | |||
− | Several variants with identical fields may avoid some code duplication |
||
− | when selecting and updating, though possibly not in a few |
||
− | depended polymorphic cases. |
||
− | |||
− | However, I doubt if the following is a really good alternative to the |
||
− | above data Mode with data BoxOrDiamond. |
||
− | |||
− | |||
− | data Mode f p = |
||
− | Box { formula :: f, positions :: p } |
||
− | | Diamond { formula :: f, positions :: p } |
||
− | |||
− | |||
− | |||
− | ==== IO ==== |
||
− | |||
− | Try to strictly separate IO, Monad and pure (without do) function |
||
− | programming (possibly via separate modules). |
||
− | |||
− | Bad: |
||
− | x <- return y |
||
− | ... |
||
− | |||
− | Good: |
||
− | let x = y |
||
− | ... |
||
− | |||
− | |||
− | Don't use Prelude.interact and make sure your program does not depend |
||
− | on the (not always obvious) order of evaluation. I.e. don't read and |
||
− | write to the same file: |
||
− | |||
− | This will fail: |
||
− | |||
− | do s <- readFile f |
||
− | writeFile f $ 'a' : s |
||
− | |||
− | because of lazy IO! (Writing is starting before reading is finished.) |
||
− | |||
− | |||
− | ==== Trace ==== |
||
− | |||
− | Tracing is for debugging purposes only and should not be used as |
||
− | feedback for the user. Clean code is not cluttered by trace calls. |
||
− | |||
− | |||
− | ==== Imports ==== |
||
− | |||
− | Standard library modules like Char. List, Maybe, Monad, etc should be |
||
− | imported by their hierarchical module name, i.e. the base package (so |
||
− | that haddock finds them): |
||
− | |||
− | import Data.List |
||
− | import Control.Monad |
||
− | import System.Environment |
||
− | |||
− | The libraries for Set and Map are to be imported qualified: |
||
− | |||
− | import qualified Data.Set as Set |
||
− | import qualified Data.Map as Map |
||
− | |||
− | |||
− | |||
− | ==== Glasgow extensions and Classes ==== |
||
− | |||
− | Stay away form extensions as long as possible. Also use classes with |
||
− | care because soon the desire for overlapping instances (like for lists |
||
− | and strings) may arise. Then you may want MPTC (multi-parameter type |
||
− | classes), functional dependencies (FD), undecidable and possibly incoherent |
||
− | instances and then you are "in the wild" (according to SPJ). |
||
− | |||
− | |||
− | === Final remarks === |
||
− | |||
− | |||
− | Despite guidelines, writing "correct code" (without formal proof |
||
− | support yet) still remains the major challenge. As motivation to |
||
− | follow these guidelines consider the points that are from the "C++ |
||
− | Coding Standard", where I replaced "C++" with "Haskell". |
||
− | |||
− | Good Points: |
||
− | |||
− | * programmers can go into any code and figure out what's going on |
||
− | |||
− | * new people can get up to speed quickly |
||
− | |||
− | * people new to Haskell are spared the need to develop a personal |
||
− | style and defend it to the death |
||
− | |||
− | * people new to Haskell are spared making the same mistakes over |
||
− | and over again |
||
− | |||
− | * people make fewer mistakes in consistent environments |
||
− | |||
− | * programmers have a common enemy :-) |
||
− | |||
− | Bad Points: |
||
− | |||
− | * the standard is usually stupid because it was made by someone |
||
− | who doesn't understand Haskell |
||
− | |||
− | * the standard is usually stupid because it's not what I do |
||
− | |||
− | * standards reduce creativity |
||
− | |||
− | * standards are unnecessary as long as people are consistent |
||
− | |||
− | * standards enforce too much structure |
||
− | |||
− | * people ignore standards anyway |
||
− | |||
− | Send comments to the "Maintainer" of this file |