HaskellWiki - User contributions [en]

MonadPlus reform proposal

2015-06-23T18:22:14Z

Blaisorblade: Question unbiased MonadPlus instance for Maybe

The [[MonadPlus]] class is ambiguous: while all instances satisfy '''Monoid''' and '''Left Zero''', some such as <tt>[]</tt> satisfy '''Left Distribution''', while others such as <tt>Maybe</tt> and <tt>IO</tt> satisfy '''Left Catch'''.

== Proposal ==

It is proposed that MonadPlus be split like this:

=== MonadZero ===

<haskell>
class Monad m => MonadZero m where
    mzero :: m a
</haskell>

satisfying '''Left Zero''':

<haskell>
mzero >>= k = mzero
</haskell>

=== MonadPlus ===

<haskell>
class MonadZero m => MonadPlus m where
    mplus :: m a -> m a -> m a
</haskell>

satisfying '''Monoid''' and '''Left Distribution''':

<haskell>
mplus mzero b = b
mplus a mzero = a
mplus (mplus a b) c = mplus a (mplus b c)
mplus a b >>= k = mplus (a >>= k) (b >>= k)
</haskell>

=== MonadOr ===

<haskell>
class MonadZero m => MonadOr m where
    morelse :: m a -> m a -> m a
</haskell>

satisfying '''Monoid''' and '''Left Catch''':

<haskell>
morelse mzero b = b
morelse a mzero = a
morelse (morelse a b) c = morelse a (morelse b c)
morelse (return a) b = return a
</haskell>

== Instances of both ==

Some types could be made instances of both. For instance:

<haskell>
instance MonadOr [] where
    morelse [] b = b
    morelse a  _ = a
</haskell>

The left-biased implementation of mplus for the Maybe monad should be used as an implementation of morelse, but it is also possible to give an unbiased mplus for Maybe:

<haskell>
instance MonadPlus Maybe where
    mplus (Just a) Nothing  = Just a
    mplus Nothing  (Just a) = Just a
    mplus _        _        = Nothing

instance MonadOr Maybe where
    morelse (Just a) _ = Just a
    morelse _        b = b
</haskell>

Question: But does this instance satisfy '''Left Distribution'''? If <hask>a = Just v1</hask> and <hask>b = Just v2</hask>, then <hask>mplus a b = Nothing</hask>, so '''Left Distribution''' implies that <hask>Nothing = mplus (k v1) (k v2)</hask>, which isn't generally true. Take for instance
<haskell>
v1 = 0
v2 = 1
k 0 = Just 0
k 1 = Nothing
</haskell>

Am I missing something? -- Blaisorblade
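
The counterexample in the question can be checked mechanically. The following is a minimal sketch, not part of the proposal: <hask>mplusU</hask> is a made-up name for a standalone copy of the unbiased <hask>mplus</hask> above (so that it does not clash with the existing instance in base), and <hask>k</hask> has been made total by mapping every other input to <hask>Nothing</hask>.

<haskell>
-- Standalone copy of the unbiased mplus proposed above.
-- The name mplusU is hypothetical; it is not part of any library.
mplusU :: Maybe a -> Maybe a -> Maybe a
mplusU (Just a) Nothing  = Just a
mplusU Nothing  (Just a) = Just a
mplusU _        _        = Nothing

-- The continuation from the question, made total for convenience.
k :: Int -> Maybe Int
k 0 = Just 0
k _ = Nothing

-- Left-hand side of Left Distribution: mplus a b >>= k
lhs :: Maybe Int
lhs = mplusU (Just 0) (Just 1) >>= k

-- Right-hand side: mplus (a >>= k) (b >>= k)
rhs :: Maybe Int
rhs = mplusU (Just 0 >>= k) (Just 1 >>= k)

main :: IO ()
main = do
    print lhs   -- Nothing
    print rhs   -- Just 0
</haskell>

Since <hask>lhs</hask> and <hask>rhs</hask> differ, the unbiased definition does not satisfy '''Left Distribution''' for this choice of <hask>k</hask>.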

== Discussion ==
Given that Control.Applicative(Alternative) now defines a class which seems innately bound to '''Left Catch''', at least in spirit, it seems to make sense to clean up MonadPlus such that all instances obey '''Left Distribution'''? --sclv

I'd actually suggest almost the opposite, that MonadPlus be dispensed with and merged into Monad. The (controversial) fail method looks no different than an mzero, except the string argument; indeed, so far as I know <tt>fail s</tt> is just mzero for any MonadPlus. MonadPlus is also barely made use of; just guard and msum in the standard? To be concrete, I would make the following the default definitions (in Monad):

<haskell>
mzero = fail "something"
mplus a b = a
</haskell>

These are thus somewhat trivial by default, but having msum=head and guard=assert (roughly; more like <tt>(`assert` return ())</tt>) for less-flexible monads doesn't seem actually wrong and could be useful fallbacks.

I also question the claim that Maybe and IO should be thought of as "left catch". IO is not even in MonadPlus, and I don't see how it can be meaningfully in any way other than the above. Maybe does satisfy Left Catch, but it seems almost like that's only because it's such a simple monad (holding only one value). It is a useful observation that it fails Left Distribution, but that may only call for weaker Monad/Plus conditions.

The MonadOr idea is a solid one, but it seems to be taking the monad in a different direction. So if there's a good match in Control.Applicative or Parsec, that might be the best place to develop that idea. -- Galen

The default <hask>mplus</hask> doesn't satisfy <hask>mplus mzero b = b</hask>, so you lose Monoid which seems to be the only thing people actually agree on :) -- [[User:Benmachine|Benmachine]]
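
Benmachine's objection can be made concrete with a small sketch. The names <hask>mzeroD</hask> and <hask>mplusD</hask> are made up for this illustration; they restate the proposed defaults for <hask>Maybe</hask>, where <hask>fail</hask> yields <hask>Nothing</hask>.

<haskell>
-- The proposed default definitions, specialised to Maybe.
-- mzeroD and mplusD are hypothetical names used only for this sketch.
mzeroD :: Maybe a
mzeroD = Nothing          -- what fail "something" gives in Maybe

mplusD :: Maybe a -> Maybe a -> Maybe a
mplusD a _ = a            -- the proposed default: always pick the left argument

-- Left identity from the Monoid laws: mplus mzero b should equal b.
leftIdentityHolds :: Bool
leftIdentityHolds = mplusD mzeroD (Just 'x') == Just 'x'

main :: IO ()
main = print leftIdentityHolds   -- False
</haskell>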

[[Category:Proposals]] [[Category:Monad]]

GHC/Using rules

2014-05-23T10:33:50Z

Blaisorblade: Merge lines to prevent unintended line breaks in output (and fix typo)

[[Category:GHC|Rules]]
[[Category:Performance]]
[[Category:Program transformation]]
== Using rules in GHC ==

GHC's rewrite rules (invoked by the RULES pragma) offer a powerful way to optimise your program. This page is a place for people who use rewrite rules to collect thoughts about how to use them.

If you aren't already familiar with RULES, read this stuff first:
* [http://www.haskell.org/ghc/docs/latest/html/users_guide/rewrite-rules.html The relevant section of the GHC user manual]
* [http://research.microsoft.com/%7Esimonpj/Papers/rules.htm Playing by the rules: rewriting as a practical optimisation technique in GHC]. This paper, from the 2001 Haskell workshop, describes the idea of rewrite rules.

=== Advice about using rewrite rules ===

* Remember to use the flag <tt>-fglasgow-exts</tt> and the optimisation flag <tt>-O</tt>
* Use the flag <tt>-ddump-simpl-stats</tt> to see how many rules actually fired.
* For even more detail use <tt>-ddump-simpl-stats -ddump-simpl-iterations</tt> to see the Core code at each iteration of the simplifier. Note that this produces '''lots''' of output so you'll want to direct the output to a file or pipe it to <tt>less</tt>. Looking at the output of this can help you figure out why rules are not firing when you expect them to do so.
* Another tip for discovering why rules do not fire is to use the flag <tt>-dverbose-core2core</tt>, which (amongst other things) produces the AST after every rule is fired. This can help you to examine whether one rule is creating an expression that thereby prevents another rule from firing, for example.
* You need to be careful that your identifiers aren't inlined before your RULES have a chance to fire. Consider
<haskell>
{-# INLINE nonFusable #-}
{-# RULES "fusable/aux" forall x y.
    fusable x (aux y) = faux x y ; #-}
nonFusable x y = fusable x (aux y)
</haskell>
: You may be surprised when the rule for <hask>fusable</hask> does not fire. It may well be that <hask>fusable</hask> was inlined before rules were applied.
: To control this we add a <hask>NOINLINE</hask> or an <hask>INLINE [1]</hask> pragma to identifiers we want to match in rules, to ensure they haven't disappeared by the time the rule matching comes around.
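
A minimal, self-contained sketch of this advice follows. The bodies of <hask>fusable</hask>, <hask>aux</hask> and <hask>faux</hask> are invented for illustration (the example above does not define them); only the pragmas matter here.

<haskell>
-- Hypothetical definitions standing in for the functions named above.
aux :: Int -> Int
aux y = y + 1
{-# NOINLINE aux #-}          -- keep aux visible so the rule can match it

fusable :: Int -> Int -> Int
fusable x y = x * y
{-# INLINE [1] fusable #-}    -- do not inline before phase 1

-- The "fused" form the rule rewrites to; equal to fusable x (aux y).
faux :: Int -> Int -> Int
faux x y = x * (y + 1)

{-# RULES
"fusable/aux" forall x y. fusable x (aux y) = faux x y
  #-}

nonFusable :: Int -> Int -> Int
nonFusable x y = fusable x (aux y)
{-# INLINE nonFusable #-}

main :: IO ()
main = print (nonFusable 3 4)   -- 15, whether or not the rule fires
</haskell>

Compiling this with <tt>-O -ddump-simpl-stats</tt> should show whether <tt>"fusable/aux"</tt> fired.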

To have rewrite rules fire in code interpreted in GHCi, you'll need
to explicitly ask for <tt>-frewrite-rules</tt> (spelled <tt>-fenable-rewrite-rules</tt> in more recent GHC versions) in an options pragma at the
start of your file.

=== Structure of simplification process ===

The simplifier currently runs the phases "gentle", 2, 1 and 0, each consisting of 4 iterations by default.
Starting with GHC 6.10 you can alter these numbers with the command line options <code>-fsimplifier-phases</code> and <code>-fmax-simplifier-iterations</code>.
Within each iteration, rules are applied repeatedly until no rule matches any more.
Which rule gets to fire where depends on the order in which the simplifier traverses an expression, from the outside in or the reverse.
In practice the order is always the same, but you should not rely on it.
The good effect of repeated application is that arbitrarily big expressions of the form <hask>map f0 . map f1 . ... . map fn</hask>
can be collapsed to a single <hask>map</hask> by the single rule <hask>map f (map g xs) = map (f . g) xs</hask>.
The bad effect is that rules like <hask>f x y = f y x</hask> lead to an infinite loop.
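
As a sketch of the repeated application described above, the following module defines a private copy of <hask>map</hask> (called <hask>myMap</hask> here, a name chosen for this example so that the rule does not interact with the rules shipped in base) together with the single collapsing rule.

<haskell>
-- A private copy of map, so our rule does not interfere with base's rules.
myMap :: (a -> b) -> [a] -> [b]
myMap _ []     = []
myMap f (x:xs) = f x : myMap f xs
{-# NOINLINE myMap #-}

{-# RULES
"myMap/myMap" forall f g xs. myMap f (myMap g xs) = myMap (f . g) xs
  #-}

-- Three nested traversals; the single rule can fire twice to leave one.
pipeline :: [Int] -> [Int]
pipeline xs = myMap (+ 1) (myMap (* 2) (myMap (subtract 3) xs))

main :: IO ()
main = print (pipeline [1, 2, 3])   -- [-3,-1,1]
</haskell>

The result is the same with or without optimisation; the rule only changes how many intermediate lists are built.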

=== Example: <hask>map</hask> ===

(This example code is taken from GHC's <tt>base/GHC/Base.lhs</tt> module.)

<haskell>
map :: (a -> b) -> [a] -> [b]
map _ []     = []
map f (x:xs) = f x : map f xs

mapFB :: (elt -> lst -> lst) -> (a -> elt) -> a -> lst -> lst
{-# INLINE [0] mapFB #-}
mapFB c f x ys = c (f x) ys
</haskell>

The rules for map work like this.

Up to (but not including) phase 1, we use the <tt>"map"</tt> rule to
rewrite all saturated applications of map with its build/fold
form, hoping for fusion to happen.
In phase 1 and 0, we switch off that rule, inline build, and
switch on the <tt>"mapList"</tt> rule, which rewrites the foldr/mapFB
thing back into plain map.

It's important that these two rules aren't both active at once
(along with build's unfolding) else we'd get an infinite loop
in the rules. Hence the activation control below.

The <tt>"mapFB"</tt> rule optimises compositions of map.

This same pattern is followed by many other functions:
e.g. <hask>append</hask>, <hask>filter</hask>, <hask>iterate</hask>, <hask>repeat</hask>, etc.

<haskell>
{-# RULES
"map"     [~1] forall f xs. map f xs = build (\c n -> foldr (mapFB c f) n xs)
"mapList" [1]  forall f.    foldr (mapFB (:) f) [] = map f
"mapFB"   forall c f g.     mapFB (mapFB c f) g = mapFB c (f.g)
  #-}
</haskell>

== Questions ==

=== Order of rule-matching ===

For example, let's say we have two rules
<haskell>
"f->g" forall x y . f x (h y) = g x y
"h->g" forall x . h x = g 0 x
</haskell>
and a fragment of the AST corresponding to
<haskell>
f a (h b)
</haskell>

Which rule will fire? "f->g" or "h->g"? (Each rule disables the other.)

Answer: rules are matched against the AST for expressions basically
''bottom-up'' rather than top-down. In this example, "h->g" is the rule
that fires. But due to the nature of inlining and so on, there are
absolutely no guarantees about this kind of behaviour. If you really
need to control the order of matching, phase control is the only
reliable mechanism.
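
A sketch of what phase control could look like for the two rules above. The function bodies are invented so that both rules preserve meaning; the phase annotations <hask>[~1]</hask> and <hask>[1]</hask> are the part that matters: the first rule is only active before phase 1, the second only from phase 1 onwards.

<haskell>
-- Hypothetical definitions chosen so that both rules are semantically valid.
h :: Int -> Int
h y = 2 * y
{-# NOINLINE h #-}

f :: Int -> Int -> Int
f x y = x + y
{-# NOINLINE f #-}

g :: Int -> Int -> Int
g x y = x + 2 * y
{-# NOINLINE g #-}

{-# RULES
"f->g" [~1] forall x y. f x (h y) = g x y
"h->g" [1]  forall x.   h x       = g 0 x
  #-}

main :: IO ()
main = print (f 1 (h 5))   -- 11, whichever rule fires
</haskell>

With these annotations <tt>"f->g"</tt> gets its chance in the early phases, and <tt>"h->g"</tt> only sees the calls to <hask>h</hask> that are still left in phase 1.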

=== [[Confluent term rewriting system]] ===

Since there is no guarantee on a particular order of rule application, except the control by phases,
you should ensure that the result is the same regardless of the order of rule application.
This property of a term rewriting system is called confluence.
See for example:
<haskell>
{-# RULES
"project/project" forall x.
    project (project x) = project x ;

"project/foo" forall x.
    project (foo x) = projectFoo x ;
  #-}

f = project . project . foo
</haskell>
For this set of rewriting rules it matters whether you apply "project/project" or "project/foo" first to the body of <hask>f</hask>.
In the first case you can apply additionally "project/foo" yielding <hask>projectFoo x</hask>,
whereas the second case leaves you with <hask>project (projectFoo x)</hask>.
To make the system confluent you should add the rule
<haskell>
project (projectFoo x) = projectFoo x
</haskell>
You can complete a rule system this way by hand, although it'd be quite a nice thing to automate it in GHC.
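
The completed rule set could look as follows. This is only a sketch: the definitions of <hask>project</hask>, <hask>foo</hask> and <hask>projectFoo</hask> are invented for illustration and chosen so that all three rules are valid equations.

<haskell>
-- Hypothetical definitions: project is idempotent, projectFoo = project . foo.
project :: Int -> Int
project x = abs x
{-# NOINLINE project #-}

foo :: Int -> Int
foo x = negate x
{-# NOINLINE foo #-}

projectFoo :: Int -> Int
projectFoo x = abs (negate x)
{-# NOINLINE projectFoo #-}

{-# RULES
"project/project"    forall x. project (project x)    = project x
"project/foo"        forall x. project (foo x)        = projectFoo x
"project/projectFoo" forall x. project (projectFoo x) = projectFoo x
  #-}

-- Written with explicit application so that the rule patterns can match.
f :: Int -> Int
f x = project (project (foo x))

main :: IO ()
main = print (f 5)   -- 5
</haskell>

Whichever of the first two rules fires first, the third rule lets both paths end in <hask>projectFoo x</hask>.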

We assume that non-confluent rewriting systems are bad design,
but it is not clear how to achieve confluence for an arbitrary system.

=== Pair rules ===

It is often useful to provide two implementations of a function, and
have rewrite rules pick which version to use depending on context. In
both GHC's foldr/build fusion, and more extensively in Data.ByteString's
stream fusion system, pair rules are used to allow the compiler to
choose between two implementations of a function.

Consider the rules:

<haskell>
"FPS length -> fused" [~1]
    length = F.strConsumerBi F.lengthS

"FPS length -> unfused" [1]
    F.strConsumerBi F.lengthS = length
</haskell>

This rule pair tells the compiler to rewrite occurrences of <hask>length</hask> to a stream-fusible form in early simplification phases, hoping for fusion to happen. However, if by phase 1 (remember that phase numbers count down to 0) the fusible form remains unfused, it is better to rewrite it back to the unfused-but-fast implementation of length. A similar trick is used for <hask>map</hask> in the base libraries.

As we want to match <hask>length</hask> in the rules, we need to ensure that it isn't inlined too soon:

<haskell>
length :: ByteString -> Int
length (PS _ _ l) = assert (l >= 0) $ l
{-# INLINE [1] length #-}
</haskell>

and we need <hask>strConsumerBi</hask> to stick around for even longer:

<haskell>
strConsumerBi :: (Stream -> a) -> (ByteString -> a)
strConsumerBi f = f . readStrUp
{-# INLINE [0] strConsumerBi #-}

lengthS :: Stream -> Int
lengthS ...
{-# INLINE [0] lengthS #-}
</haskell>

Pair rules thus provide a useful mechanism to allow a library to provide
multiple implementations of a function, picking the best one to use
based on context.

=== Custom specialisation rules ===

Another use for rules is to replace a particular use of a slow,
polymorphic function with a custom monomorphic implementation.

Consider:
<haskell>
zipWith :: (Word8 -> Word8 -> a) -> ByteString -> ByteString -> [a]
</haskell>

This is a bit slow, but useful. It's often used to zip ByteStrings into
a new ByteString, that is:
<haskell>
pack (zipWith f p q)
</haskell>

We'd like to spot this, and throw away the intermediate [a] created. And
also use a specialised implementation of:
<haskell>
zipWith :: (Word8 -> Word8 -> Word8) -> ByteString -> ByteString -> ByteString
</haskell>

We can use rules for this:

<haskell>
"FPS specialise pack.zipWith" forall (f :: Word8 -> Word8 -> Word8) p q .
    pack (zipWith f p q) = zipWith' f p q
</haskell>

This rule spots the specific use of zipWith we're looking for, and
replaces it with a fast, specialised version.
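
To try the idea without the ByteString library, here is a self-contained sketch. Everything in it is a stand-in invented for this example: <hask>Bytes</hask> is a toy wrapper around <hask>[Word8]</hask>, and <hask>zipWithB</hask> and <hask>zipWithB'</hask> play the roles of the slow and the specialised function. The binder signature on <hask>f</hask> is left out because the types of <hask>pack</hask> and <hask>zipWithB'</hask> already fix it.

<haskell>
import Data.Word (Word8)

-- Toy stand-in for ByteString; not the real type.
newtype Bytes = Bytes [Word8] deriving (Eq, Show)

pack :: [Word8] -> Bytes
pack = Bytes
{-# NOINLINE pack #-}

-- General version: produces an intermediate list.
zipWithB :: (Word8 -> Word8 -> a) -> Bytes -> Bytes -> [a]
zipWithB f (Bytes xs) (Bytes ys) = zipWith f xs ys
{-# NOINLINE zipWithB #-}

-- Specialised version: goes straight to Bytes.
zipWithB' :: (Word8 -> Word8 -> Word8) -> Bytes -> Bytes -> Bytes
zipWithB' f (Bytes xs) (Bytes ys) = Bytes (zipWith f xs ys)

{-# RULES
"pack/zipWithB" forall f p q. pack (zipWithB f p q) = zipWithB' f p q
  #-}

main :: IO ()
main = print (pack (zipWithB (+) (Bytes [1, 2]) (Bytes [10, 20])))
-- prints: Bytes [11,22]
</haskell>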

=== Rules and sections ===

This is useful for higher order functions as well. As of ghc 6.6, the
rule LHS syntax has been relaxed, allowing for sections and lambda
abstractions to appear. Previously, only applications of the following
form were valid:

<haskell>
"FPS specialise break (x==)" forall x.
    break ((==) x) = breakByte x
</haskell>

That is, replace occurrences of <hask>break (x==)</hask> with the optimised breakByte function.

This code illustrates how higher order functions can be rewritten to optimised first order equivalents, for special cases like <hask>(==)</hask>. In the case of Data.ByteString, functions using <hask>(==)</hask> or <hask>(/=)</hask> are much faster when implemented with memchr(3), and we can use rules to do this, as long as it is possible to match sections. In ghc 6.6 we can now write:

<haskell>
"FPS specialise break (==x)" forall x.
    break (==x) = breakByte x

"FPS specialise break (x==)" forall x.
    break (x==) = breakByte x
</haskell>

Some fragility remains in this rule though, as described below.

=== Literals, dictionaries and sections ===

Consider:
<haskell>
break (== 10)
</haskell>

Hopefully, this can be rewritten to a <hask>breakByte 10</hask> call, however, the combination of sections, literals and dictionaries for Eq makes this rather fragile.

The rule for break ends up translated by GHC as:

<haskell>
forall ($dEq :: base:GHC.Base.Eq base:GHC.Word.Word8)
(x :: base:GHC.Word.Word8)

break (base:GHC.Base.== @ base:GHC.Word.Word8 $dEq x) =
breakByte x
</haskell>

Notice the LHS: an application of the selector to a (suitably-typed) Eq
dictionary. GHC does very little simplification on LHSs, because if it
does too much, the LHS doesn't look like you thought it did. Here it
might perhaps be better to simplify to GHC.Word.Word8.==, by selecting
from the dictionary, but GHC does not do that.

When this rule works, GHC generates exactly that pattern; we get

<haskell>
eq = (==) deq
main = ... break (\x. eq x y) ...
</haskell>

GHC is anxious about substituting eq inside the lambda, but it does it
because (==) is just a record selector, and hence is very cheap.

But when we put a literal inline, we get an (Eq a) constraint and a (Num
a) constraint (from the literal). Ultimately, 'a' turns out to be Int,
by defaulting, but we don't know that yet. So GHC picks the Eq
dictionary from the Num dictionary:

<haskell>
eq = (==) ($p1 dnum)
main = ... break (\x. eq x y) ...
</haskell>

Now the 'eq' doesn't look quite so cheap, and it isn't inlined, so the
rule does not fire. However, GHC 6.6 has been modified to believe that
nested selection is also cheap, so that makes the rule fire.

The underlying lesson is this: the only robust way to make rules fire is
if the LHS is a normal form. Otherwise GHC may miss the fleeting moment
at which (an instance of) the rule LHS appears in the program. The way
you ensure this is with inline phases: don't inline LHS stuff until
later, so that the LHS stuff appears in the program more than
fleetingly.

But in this case you have (==) on the LHS, and you have no phase control
there. So it gets inlined right away, so the rule doesn't match any
more. The only way the rule "works" is because GHC catches the pattern
right away, before (==) is inlined. Not very robust.

To make this robust, you'd have to say something like

<haskell>
instance Eq Word8 where
    (==) = eqWord8

eqWord8 = ..
{-# NOINLINE [1] eqWord8 #-}

{-# RULES
"FPS specialise break (x==)" forall x.
    break (x`eqWord8`) = breakByte x
  #-}
</haskell>

=== Rules and method sharing ===

GHC by default instantiates overloaded methods by partially applying the original overloaded identifier. This facilitates sharing of multiple method instances with one global definition. However, since a new function name is created during this process, rules matching the original names will not fire. Here is an example from <tt>Control.Arrow</tt>:

<haskell>
class Arrow a where
    arr   :: (b -> c) -> a b c
    first :: a b c -> a (b,d) (c,d)
    (>>>) :: a b c -> a c d -> a b d

{-# RULES
"compose/arr" forall f g . arr f >>> arr g = arr (f >>> g)
"first/arr" forall f . first (arr f) = arr (first f)
...
  #-}
</haskell>

Consider an instance of an arrow and some code on which the rules above should fire:

<haskell>
newtype SF a b = SF ([a] -> [b])

instance Arrow SF where
    arr f = SF (map f)
    ...

foo :: SF (Int,Int) (Int,Int)
foo = first (arr (+1)) >>> first (arr (+2) >>> arr (+3))
</haskell>

GHC would generate intermediate code like:

<haskell>
dsf :: Arrow SF
dsf = ...

first_1 = Control.Arrow.first SF dsf
arr_1 = Control.Arrow.arr SF dsf

foo = first_1 (arr_1 (+1)) ...etc...
</haskell>

Due to the introduction of <tt>first_1</tt> and <tt>arr_1</tt>, the rules no longer match since the names have changed.

The solution is to switch off sharing with the <tt>-fno-method-sharing</tt> flag.

=== Coexistence of fusion frameworks ===

I would like to use my own fusion framework on an existing data structure, because I want to experiment with it
or because I have a specific application and want to optimize the fusion framework for it.
How can I disable the fusion rules shipped with that data structure, or at least defer them until the optimizer is finished with my rules?

Answer:
Second part of the question first:
Ensuring that your rules are used before the standard rules is not possible with [[GHC]] up to version 6.8.
The current system is quite monolithic in this respect.
This would be a nice application for a more sophisticated rule control system that allows any number of simplifier phases, with explicit statements about which phase shall be entered after which other phase.

First part of the question:
You may wrap the data structure in a <hask>newtype</hask> or, to be entirely safe, redefine the data structure.
This means that several functions have to be lifted to the wrapped data type.
This is tedious, but given that you make an application specific fusion framework,
the set of basic functions will be different from that of the general data structure.
You might have planned to make your data type distinct anyway, if only for the sake of the <hask>Arbitrary</hask> class of [[QuickCheck]].
Remember to attach a NOINLINE pragma to the wrapped functions,
otherwise the compiler may unpack the wrappers and start fusion on the underlying data structure.
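
A minimal sketch of the wrapping approach, using lists as the underlying data structure. The names <hask>MyList</hask> and <hask>mapW</hask> are made up for this example.

<haskell>
-- Hypothetical wrapper type; the shipped list fusion rules do not mention it.
newtype MyList a = MyList [a] deriving (Eq, Show)

-- Lifted function. NOINLINE keeps the wrapper from being unpacked,
-- so the underlying map is not exposed to the standard rules too early.
mapW :: (a -> b) -> MyList a -> MyList b
mapW f (MyList xs) = MyList (map f xs)
{-# NOINLINE mapW #-}

-- Our own fusion rule, stated on the wrapped type only.
{-# RULES
"mapW/mapW" forall f g xs. mapW f (mapW g xs) = mapW (f . g) xs
  #-}

main :: IO ()
main = print (mapW (+ 1) (mapW (* 2) (MyList [1, 2, 3])))
-- prints: MyList [3,5,7]
</haskell>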

=== Interaction of INLINE and RULES ===

Rules can be seen as alternative function definitions. They are somewhat special because they do not allow pattern matching, but do allow expressions built from existing names in the arguments on the left-hand side. Since pattern matching can be moved into a <hask>case</hask>, fusion rules are actually the more flexible way to define functions. Having alternative function definitions means that the compiler has to decide which definition to use: the original function definition (by an explicit call or by [[inlining]]/[[unfolding]]) or one of the optimizer rules ("a rule fires").

Now the critical question is: How do inlining and rule application interact?

Inlining is always an option for the compiler, whether you use the INLINE pragma or not.
The compiler measures some kind of size of the function and
decides whether a function is small enough in order to be inlined.
The INLINE pragma reduces this size virtually.
But the inlining can still be omitted.
In that sense RULES are applied more aggressively because they don't respect a size measurement of functions.

It's interesting to note that declaring a function as INLINE disables fusion inside the body of that function.
E.g., you can expect that
<haskell>
doubleMap f g = map f . map g
</haskell>
is fused to
<haskell>
doubleMap f g = map (f . g)
</haskell>
However, if you add
<haskell>
{-# INLINE doubleMap #-}
</haskell>
then the function definition is not fused to <hask>map (f . g)</hask>.
The compiler expects that the function will never be called as is, and thus skips fusion.
That is, you cannot control the order of rule application by wrapping expressions that should be fused first,
before fusion with the surrounding code, in their own function definitions.

However, since the compiler may decide not to inline, the compiler may leave you with a call to the unoptimized <hask>doubleMap</hask>. You could prevent this e.g. by:

<haskell>
{-# NOINLINE doubleMap #-}
doubleMap f g = map f . map g -- will be fused

{-# RULES
"doubleMap" forall f g. doubleMap f g = map f . map g
#-}
</haskell>

This makes sure that the right hand side of <hask>doubleMap</hask> will be optimised for those cases when the rule doesn't fire, e.g. when <hask>doubleMap</hask> is applied to fewer than two arguments.

=== Interaction of SPECIALISE and INLINE ===

<hask>SPECIALISE</hask> pragmas are also some kind of rules, where calls to functions with a specific [[type class dictionary]] are replaced by calls to versions of a function which are instantiated to a specific type.
If you want to use a function the inlined way, it might be a bad idea to add the <hask>SPECIALISE</hask> pragma, since this will replace a call to the function by a call to a specialised function instead of inlining it.
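
For reference, a small sketch of what a <hask>SPECIALISE</hask> pragma looks like. The function <hask>sumSquares</hask> is invented for this example.

<haskell>
-- Overloaded function: normally called with a Num dictionary argument.
sumSquares :: Num a => [a] -> a
sumSquares xs = sum (map (\x -> x * x) xs)
-- Ask GHC for a copy at type Int and for a rule that redirects calls
-- at that type to the copy.
{-# SPECIALISE sumSquares :: [Int] -> Int #-}

main :: IO ()
main = print (sumSquares [1, 2, 3 :: Int])   -- 14
</haskell>

When compiling with <tt>-O</tt>, the call in <hask>main</hask> should be redirected to the <hask>Int</hask> copy, which is the replacement the paragraph above warns about if you were counting on the call being inlined instead.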

== Future of rules in GHC ==

GHC has much too rigid a notion of phases up to version 6.8.
There are precisely 3, namely 2 then 1 then 0, and that does not give enough control.
Really we should let you give arbitrary names to phases,
express constraints (A must be before B), and run a constraint solver to map phase names to a linear ordering.
The current system is horribly non-modular.
(See Haskell-Cafe on [http://www.haskell.org/pipermail/haskell-cafe/2008-January/038198.html Properties of optimizer rule application?])

Phantom type

2011-08-06T14:05:04Z

Blaisorblade: /* Why not type synonyms */ Insulate comments into a specific section

A '''phantom type''' is a parametrised type whose parameters do not all appear on the right-hand side of its definition, e.g. from <tt>Control.Applicative</tt>:

<haskell>
newtype Const a b = Const { getConst :: a }
</haskell>

Here <tt>Const</tt> is a phantom type, because the <tt>b</tt> parameter doesn't appear after the <tt>=</tt> sign.

Phantom types are useful in a variety of contexts: in the standard <tt>Data.Fixed</tt> module they are used with type classes to encode the precision being used, with [[smart constructors]] or GADTs they can encode information about how and where a value can be used, or with more exotic extensions they can be used for [[Smart_constructors#Enforcing_the_constraint_statically|encoding bounds checks in the type system.]]

Since the values of type parameters in a phantom type may be unused, they are often used in combination with [[empty type]]s.

A phantom type should not be a type synonym; it should be a newtype or a data type. The compiler will in fact accept a "phantom type synonym", but it's a very bad idea, as explained below.

==Simple examples==

A phantom type will have a declaration that looks something like this:

<haskell>
data FormData a = FormData String
</haskell>

This looks strange since at first it seems the type parameter is unused and could be anything, without affecting the value inside. Indeed, one can write:

<haskell>
changeType :: FormData a -> FormData b
changeType (FormData str) = FormData str
</haskell>

to change it from any type to any other. However, if the constructor is not exported then users of the library that defined <hask>FormData</hask> can't define functions like the above, so the type parameter can only be set or changed by library functions. So we might do:

<haskell>
data Validated
data Unvalidated

-- since we don't export the constructor itself,
-- users with a String can only create Unvalidated values
formData :: String -> FormData Unvalidated
formData str = FormData str

-- Nothing if the data doesn't validate
validate :: FormData Unvalidated -> Maybe (FormData Validated)
validate (FormData str) = ...

-- can only be fed the result of a call to validate!
useData :: FormData Validated -> IO ()
useData (FormData str) = ...
</haskell>

The beauty of this is that we can define functions that work on all kinds of <hask>FormData</hask>, but still can't turn unvalidated data into validated data:

<haskell>
-- the library exports this
liftStringFn :: (String -> String) -> FormData a -> FormData a
liftStringFn fn (FormData str) = FormData (fn str)

-- the validation state is the *same* in the return type and the argument
dataToUpper :: FormData a -> FormData a
dataToUpper = liftStringFn (map toUpper)
</haskell>

With type classes, we can even choose different behaviours conditional on information that is nonexistent at runtime:

<haskell>
class Sanitise a where
    sanitise :: FormData a -> FormData Validated

-- do nothing to data that is already validated
instance Sanitise Validated where
    sanitise = id

-- sanitise untrusted data
instance Sanitise Unvalidated where
    sanitise (FormData str) = FormData (filter isAlpha str)
</haskell>

This technique is perfect for e.g. escaping user input to a web application. We can ensure with zero overhead that the data is escaped once and only once everywhere that it needs to be, or else we get a compile-time error.
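
The pieces above can be put together into a program that compiles as it stands. This is only a sketch: the validation rule (non-empty, letters only) and the body of <hask>useData</hask> are invented here to fill in the parts the examples leave open, and in a real library the <hask>FormData</hask> constructor would be hidden by the module's export list.

<haskell>
import Data.Char (isAlpha)

data Validated
data Unvalidated

data FormData a = FormData String

-- users with a String can only create Unvalidated values
formData :: String -> FormData Unvalidated
formData str = FormData str

-- Hypothetical validation rule: non-empty and letters only.
validate :: FormData Unvalidated -> Maybe (FormData Validated)
validate (FormData str)
    | not (null str) && all isAlpha str = Just (FormData str)
    | otherwise                         = Nothing

-- Hypothetical consumer: just echoes the validated string.
useData :: FormData Validated -> IO ()
useData (FormData str) = putStrLn ("using: " ++ str)

main :: IO ()
main = do
    case validate (formData "hello") of
        Just ok -> useData ok            -- prints: using: hello
        Nothing -> putStrLn "rejected"
    -- useData (formData "hello")        -- rejected by the type checker
</haskell>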

==The use of a type system to guarantee well-formedness.==

We create a parameterised type in which the parameter does not appear
on the right-hand side (shamelessly cut and pasted from Daan Leijen and Erik Meijer):
<haskell>
data Expr a = Expr PrimExpr

constant :: Show a => a -> Expr a
(.+.) :: Expr Int -> Expr Int -> Expr Int
(.==.) :: Eq a => Expr a -> Expr a -> Expr Bool
(.&&.) :: Expr Bool -> Expr Bool -> Expr Bool

data PrimExpr
    = BinExpr BinOp PrimExpr PrimExpr
    | UnExpr UnOp PrimExpr
    | ConstExpr String

data BinOp
    = OpEq | OpAnd | OpPlus | ...
</haskell>
i.e. the datatype is such that we could get garbage such as
<haskell>
BinExpr OpEq (ConstExpr "1") (ConstExpr "\"foo\"")
</haskell>
but since we only expose the functions our attempts
to create this expression via
<haskell>
constant 1 .==. constant "foo"
</haskell>
would fail to typecheck.

== Why not type synonyms ==
Remember that type synonyms are expanded behind the scenes before typechecking.
Suppose that in the above example you replace the declaration of Expr with <hask>type Expr a = PrimExpr</hask>. Then <hask>Expr Int</hask> and <hask>Expr String</hask> are both expanded to <hask>PrimExpr</hask> before being compared, and those types would be compatible, defeating the point of using a phantom type.
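
The effect can be seen in a small sketch. <hask>PrimExpr</hask> is cut down to two constructors for brevity, <hask>EqExpr</hask> is a made-up stand-in for <hask>BinExpr OpEq</hask>, and the <hask>Eq a</hask> constraint on <hask>(.==.)</hask> is dropped because with the synonym the type variable no longer appears in the expanded type.

<haskell>
-- Cut-down expression type for this sketch.
data PrimExpr
    = ConstExpr String
    | EqExpr PrimExpr PrimExpr
    deriving Show

-- Phantom type *synonym*: the parameter vanishes on expansion.
type Expr a = PrimExpr

constant :: Show a => a -> Expr a
constant x = ConstExpr (show x)

(.==.) :: Expr a -> Expr a -> Expr Bool
l .==. r = EqExpr l r

-- Compares an Int with a String, yet the type checker accepts it,
-- because both operands have type PrimExpr after expansion.
illTyped :: Expr Bool
illTyped = constant (1 :: Int) .==. constant "foo"

main :: IO ()
main = print illTyped
</haskell>

Changing the synonym back to <hask>data Expr a = Expr PrimExpr</hask> (and wrapping the results accordingly) makes <hask>illTyped</hask> a type error again.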

== Comments ==
I believe this technique is used when trying to interface
with a language that would cause a runtime exception if the types
were wrong but would have a go at running the expression first.
(They use it in the context of SQL but I have also seen it in the
context of FLI work.)

-- ChrisAngus

[http://www.brics.dk/RS/02/34/ A foundation for embedded languages] provides some formal background for embedding typed languages in Haskell, and also its references give a fairly comprehensive survey of uses of phantom types and related techniques.

[[Category:Idioms]]
[[Category:Glossary]]

Phantom type

2011-08-06T14:04:22Z

Blaisorblade: Explain that type synonyms shouldn't be used for phantom types

A '''phantom type''' is a parametrised type whose parameters do not all appear on the right-hand side of its definition, e.g. from <tt>Control.Applicative</tt>:

<haskell>
newtype Const a b = Const { getConst :: a }
</haskell>

Here <tt>Const</tt> is a phantom type, because the <tt>b</tt> parameter doesn't appear after the <tt>=</tt> sign.

Phantom types are useful in a variety of contexts: in the standard <tt>Data.Fixed</tt> module they are used with type classes to encode the precision being used, with [[smart constructors]] or GADTs they can encode information about how and where a value can be used, or with more exotic extensions they can be used for [[Smart_constructors#Enforcing_the_constraint_statically|encoding bounds checks in the type system.]]

Since the values of type parameters in a phantom type may be unused, they are often used in combination with [[empty type]]s.

A phantom type might not be a type synonym, but must be a newtype or a data type. Actually, the compiler will accept a "phantom type synonym", but it's a very bad idea, as explained below.

==Simple examples==

A phantom type will have a declaration that looks something like this:

<haskell>
data FormData a = FormData String
</haskell>

This looks strange since at first it seems the type parameter is unused and could be anything, without affecting the value inside. Indeed, one can write:

<haskell>
changeType :: FormData a -> FormData b
changeType (FormData str) = FormData str
</haskell>

to change it from any type to any other. However, if the constructor is not exported then users of the library that defined <hask>FormData</hask> can't define functions like the above, so the type parameter can only be set or changed by library functions. So we might do:

<haskell>
data Validated
data Unvalidated

-- since we don't export the constructor itself,
-- users with a String can only create Unvalidated values
formData :: String -> FormData Unvalidated
formData str = FormData str

-- Nothing if the data doesn't validate
validate :: FormData Unvalidated -> Maybe (FormData Validated)
validate (FormData str) = ...

-- can only be fed the result of a call to validate!
useData :: FormData Validated -> IO ()
useData (FormData str) = ...
</haskell>

The beauty of this is that we can define functions that work on all kinds of <hask>FormData</hask>, but still can't turn unvalidated data into validated data:

<haskell>
-- the library exports this
liftStringFn :: (String -> String) -> FormData a -> FormData a
liftStringFn fn (FormData str) = FormData (fn str)

-- the validation state is the *same* in the return type and the argument
dataToUpper :: FormData a -> FormData a
dataToUpper = liftStringFn (map toUpper)
</haskell>

With type classes, we can even choose different behaviours conditional on information that is nonexistent at runtime:

<haskell>
class Sanitise a where
sanitise :: FormData a -> FormData Validated

-- do nothing to data that is already validated
instance Sanitise Validated where
sanitise = id

-- sanitise untrusted data
instance Sanitise Unvalidated where
sanitise (FormData str) = FormData (filter isAlpha str)
</haskell>

This technique is perfect for e.g. escaping user input to a web application. We can ensure with zero overhead that the data is escaped once and only once everywhere that it needs to be, or else we get a compile-time error.

==The use of a type system to guarantee well-formedness.==

We create a Parameterized type in which the parameter does not appear
on the rhs (shameless cutting and pasting from Daan Leijen and Erik Meijer)
<haskell>
data Expr a = Expr PrimExpr

constant :: Show a => a -> Expr a
(.+.) :: Expr Int -> Expr Int -> Expr Int
(.==.) :: Eq a=> Expr a-> Expr a-> Expr Bool
(.&&.) :: Expr Bool -> Expr Bool-> Expr Bool

data PrimExpr
= BinExpr BinOp PrimExpr PrimExpr
| UnExpr UnOp PrimExpr
| ConstExpr String

data BinOp
= OpEq | OpAnd | OpPlus | ...
</haskell>
i.e. the datatype is such that we could get garbage such as
<haskell>
BinExpr OpEq (ConstExpr "1") (ConstExpr "\"foo\"")
</haskell>
but since we only expose the typed functions, our attempt
to create this expression via
<haskell>
constant 1 .==. constant "foo"
</haskell>
would fail to typecheck.
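
The following is a small runnable sketch of this interface, not the original library code. The constructor list is trimmed (no <hask>UnExpr</hask>), and the helpers <hask>binary</hask> and <hask>unExpr</hask> as well as the example value <hask>wellTyped</hask> are hypothetical names added for illustration.

<haskell>
-- Untyped representation: nothing here prevents ill-typed trees.
data PrimExpr
  = BinExpr BinOp PrimExpr PrimExpr
  | ConstExpr String
  deriving (Eq, Show)

data BinOp = OpEq | OpAnd | OpPlus
  deriving (Eq, Show)

-- Typed wrapper: 'a' is a phantom parameter.
data Expr a = Expr PrimExpr

constant :: Show a => a -> Expr a
constant x = Expr (ConstExpr (show x))

-- Hypothetical helper shared by the typed operators; it should not be exported.
binary :: BinOp -> Expr a -> Expr b -> Expr c
binary op (Expr l) (Expr r) = Expr (BinExpr op l r)

(.+.) :: Expr Int -> Expr Int -> Expr Int
(.+.) = binary OpPlus

(.==.) :: Eq a => Expr a -> Expr a -> Expr Bool
(.==.) = binary OpEq

(.&&.) :: Expr Bool -> Expr Bool -> Expr Bool
(.&&.) = binary OpAnd

-- Hypothetical accessor, used here only to inspect the result.
unExpr :: Expr a -> PrimExpr
unExpr (Expr p) = p

-- Accepted: both sides of (.==.) are Expr Int.
wellTyped :: Expr Bool
wellTyped = (constant (1 :: Int) .+. constant 2) .==. constant 3
</haskell>

Replacing the right-hand operand of <hask>(.==.)</hask> with <hask>constant "foo"</hask> makes the module fail to compile, since <hask>Expr Int</hask> and <hask>Expr String</hask> are different types even though both wrap a <hask>PrimExpr</hask>.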

== Why not type synonyms ==
Remember that type synonyms are expanded behind the scenes before typechecking.
Suppose that in the above example you replace the declaration of Expr with <hask>type Expr a = PrimExpr</hask>. Then <hask>Expr Int</hask> and <hask>Expr String</hask> are both expanded to <hask>PrimExpr</hask> before being compared, and those types would be compatible, defeating the point of using a phantom type.
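
A minimal sketch of the synonym variant shows the problem. The definitions are trimmed to what the example needs, and <hask>garbage</hask> is a hypothetical name. Note one assumption that differs from the signatures above: the <hask>Eq a</hask> constraint is dropped from <hask>(.==.)</hask>, because after expanding the synonym the variable <hask>a</hask> would occur only in the constraint and GHC would reject the signature as ambiguous.

<haskell>
data PrimExpr
  = BinExpr BinOp PrimExpr PrimExpr
  | ConstExpr String
  deriving (Eq, Show)

data BinOp = OpEq
  deriving (Eq, Show)

-- The synonym variant: 'Expr a' is just another name for PrimExpr.
type Expr a = PrimExpr

constant :: Show a => a -> Expr a
constant x = ConstExpr (show x)

-- No 'Eq a' constraint here; see the note above the block.
(.==.) :: Expr a -> Expr a -> Expr Bool
l .==. r = BinExpr OpEq l r

-- Accepted by the type checker, although it compares an Int with a String.
garbage :: Expr Bool
garbage = constant (1 :: Int) .==. constant "foo"
</haskell>

Here <hask>garbage</hask> evaluates to exactly the ill-formed tree <hask>BinExpr OpEq (ConstExpr "1") (ConstExpr "\"foo\"")</hask> that the phantom type was supposed to rule out.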

I believe this technique is used when interfacing
with a language that would raise a runtime exception if the types
were wrong, but would attempt to run the expression first.
(They use it in the context of SQL, but I have also seen it in the
context of FFI work.)

-- ChrisAngus

[http://www.brics.dk/RS/02/34/ A foundation for embedded languages] provides some formal background for embedding typed languages in Haskell, and also its references give a fairly comprehensive survey of uses of phantom types and related techniques.

[[Category:Idioms]]
[[Category:Glossary]]

Import modules properly

2011-08-06T09:39:43Z

Blaisorblade: /* Clashing of module name abbreviations */ Reformat like introduction

== Introduction ==

Haskell offers many ways of [[Import|importing]] identifiers from other modules.
However, not all of them are as convenient as they seem at first glance.
We recommend focusing on the following two forms of import:
<haskell>
import qualified Very.Special.Module as VSM
import Another.Important.Module (printf, (<|>), )
</haskell>
instead of
<haskell>
import Very.Special.Module
import Another.Important.Module hiding (open, close, )
</haskell>

There are three kinds of reasons for this.

* '''Style:''' If you read <hask>printf</hask>, <hask><|></hask> or <hask>VSM.open</hask> in the program you can find out easily where the identifier comes from. In the second case you don't know if these identifiers are from <hask>Very.Special.Module</hask>, <hask>Another.Important.Module</hask> or even other modules. Mind you that grep won't help, because <hask>Very.Special.Module</hask> and <hask>Another.Important.Module</hask> might just re-export other modules. You might guess the origin of <hask>printf</hask> according to its name, but for the infix operator <hask><|></hask> you will certainly have no idea.
* '''Compatibility:''' In the second case, if new identifiers are added to the imported modules, they might clash with names from other modules. Thus updating imported modules may break your code. If you import a package A with version a.b.c.d that follows the [[Package versioning policy]], then identifiers may be added within versions that share the same a.b. This means that if you import in the suggested way, you can safely specify <code>A >= a.b.c && <a.b+1</code> in your [[Cabal]] file. Otherwise you have to choose the smaller range <code>A >= a.b.c && <a.b.c+1</code>. It may also be that <hask>Another.Important.Module.open</hask> was already deprecated when you hid it, and when a module update removes that identifier, your import fails. That is, an identifier that you never needed and that only annoyed you annoys you again, just when it was meant to stop bothering you! The first variant of import does not suffer from these problems.
* '''Correctness:''' I once found a bug in the StorableVector package by converting anonymous imports to explicit imports. I found out that the function <hask>Foreign.Ptr.plusPtr</hask> was imported, although functions from this module always have to calculate with unit "element" not "byte". That is, <hask>advancePtr</hask> must be used instead. Actually, the <hask>reverse</hask> function used <hask>plusPtr</hask> and this was wrong. A misbehaviour could only be observed for sub-vectors and elements with size greater than 1 byte. The test suite did miss that.
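
As an illustration of the recommended forms, here is a small sketch that uses only modules from <code>base</code>. The function <hask>sortCaseInsensitive</hask> is a hypothetical example; every non-Prelude identifier in it is either qualified or named in an import list, so its origin can be read off the import section.

<haskell>
import qualified Data.Char as Char
import Data.List (sortBy, )
import Data.Ord (comparing, )

-- Hypothetical helper: sorts strings while ignoring letter case.
-- 'Char.toLower' is visibly from Data.Char; 'sortBy' and 'comparing'
-- are listed explicitly in the imports above.
sortCaseInsensitive :: [String] -> [String]
sortCaseInsensitive = sortBy (comparing (map Char.toLower))
</haskell>

If a later version of <hask>Data.List</hask> or <hask>Data.Char</hask> exports additional names, this module is unaffected, because none of them are brought into scope unqualified.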

== Exception from the rule ==

Since the Prelude is intended to be fixed for the future, it should be safe to use the <hask>hiding</hask> clause when importing <hask>Prelude</hask>.
Actually if you do not mention Prelude it will be imported anonymously.

== Clashing of module name abbreviations ==

In Haskell it is possible to use the same abbreviation for different modules:
<haskell>
import qualified Data.List as List
import qualified Data.List.Extra as List
</haskell>
This is discouraged for the same reasons as above:

* '''Style''': The identifier <hask>List.intercalate</hask> may refer to either <hask>Data.List</hask> or <hask>Data.List.Extra</hask>. The reader of that module has to check these modules in order to find it out.

* '''Compatibility''': The function <hask>List.intercalate</hask> may be currently defined only in <hask>Data.List.Extra</hask>. However after many people found it useful, it is also added to <hask>Data.List</hask>. Then <hask>List.intercalate</hask> can no longer be resolved.

== Counter-arguments to explicit import lists ==

The issue of whether to use explicit import lists is not always clear-cut, however.
Here are some reasons you might not want to do this:

* Development is slower: almost every change is accompanied by an import list change, especially if you want to keep your code warning-clean.

* When working on a project with multiple developers, explicit import lists can cause spurious conflicts, since two otherwise-unrelated changes to a file may both require changes to the same import list.

For these reasons amongst others, the GHC project decided to drop the use of explicit import lists.
We recommend using explicit import lists when importing from other packages,
but not when importing modules within the same package.

Qualified use of identifiers does not suffer from the above problems.

== See also ==

* [[Qualified names]]

{{essay}}

[[Category:Style]]

Import modules properly

2011-08-06T09:38:21Z

Blaisorblade: /* Introduction */ Remove extra linebreaks, reformat list of reasons, highlight their name

== Introduction ==

Haskell offers many ways of [[Import|importing]] identifiers from other modules.
However, not all of them are as convenient as they seem at first glance.
We recommend focusing on the following two forms of import:
<haskell>
import qualified Very.Special.Module as VSM
import Another.Important.Module (printf, (<|>), )
</haskell>
instead of
<haskell>
import Very.Special.Module
import Another.Important.Module hiding (open, close, )
</haskell>

There are three kinds of reasons for this.

* '''Style:''' If you read <hask>printf</hask>, <hask><|></hask> or <hask>VSM.open</hask> in the program you can find out easily where the identifier comes from. In the second case you don't know if these identifiers are from <hask>Very.Special.Module</hask>, <hask>Another.Important.Module</hask> or even other modules. Mind you that grep won't help, because <hask>Very.Special.Module</hask> and <hask>Another.Important.Module</hask> might just re-export other modules. You might guess the origin of <hask>printf</hask> according to its name, but for the infix operator <hask><|></hask> you will certainly have no idea.
* '''Compatibility:''' In the second case, if new identifiers are added to the imported modules, they might clash with names from other modules. Thus updating imported modules may break your code. If you import a package A with version a.b.c.d that follows the [[Package versioning policy]], then identifiers may be added within versions that share the same a.b. This means that if you import in the suggested way, you can safely specify <code>A >= a.b.c && <a.b+1</code> in your [[Cabal]] file. Otherwise you have to choose the smaller range <code>A >= a.b.c && <a.b.c+1</code>. It may also be that <hask>Another.Important.Module.open</hask> was already deprecated when you hid it, and when a module update removes that identifier, your import fails. That is, an identifier that you never needed and that only annoyed you annoys you again, just when it was meant to stop bothering you! The first variant of import does not suffer from these problems.
* '''Correctness:''' I once found a bug in the StorableVector package by converting anonymous imports to explicit imports. I found out that the function <hask>Foreign.Ptr.plusPtr</hask> was imported, although functions from this module always have to calculate with unit "element" not "byte". That is, <hask>advancePtr</hask> must be used instead. Actually, the <hask>reverse</hask> function used <hask>plusPtr</hask> and this was wrong. A misbehaviour could only be observed for sub-vectors and elements with size greater than 1 byte. The test suite did miss that.

== Exception from the rule ==

Since the Prelude is intended to be fixed for the future, it should be safe to use the <hask>hiding</hask> clause when importing <hask>Prelude</hask>.
Actually if you do not mention Prelude it will be imported anonymously.

== Clashing of module name abbreviations ==

In Haskell it is possible to use the same abbreviation for different modules:
<haskell>
import qualified Data.List as List
import qualified Data.List.Extra as List
</haskell>
This is discouraged for the same reasons as above:

Stylistic reason:
The identifier <hask>List.intercalate</hask> may refer to either <hask>Data.List</hask> or <hask>Data.List.Extra</hask>.
The reader of that module has to check these modules in order to find it out.

Compatibility reason:
The function <hask>List.intercalate</hask> may be currently defined only in <hask>Data.List.Extra</hask>.
However after many people found it useful, it is also added to <hask>Data.List</hask>.
Then <hask>List.intercalate</hask> can no longer be resolved.

== Counter-arguments to explicit import lists ==

The issue of whether to use explicit import lists is not always clear-cut, however.
Here are some reasons you might not want to do this:

* Development is slower: almost every change is accompanied by an import list change, especially if you want to keep your code warning-clean.

* When working on a project with multiple developers, explicit import lists can cause spurious conflicts, since two otherwise-unrelated changes to a file may both require changes to the same import list.

For these reasons amongst others, the GHC project decided to drop the use of explicit import lists.
We recommend using explicit import lists when importing from other packages,
but not when importing modules within the same package.

Qualified use of identifiers does not suffer from the above problems.

== See also ==

* [[Qualified names]]

{{essay}}

[[Category:Style]]

Import modules properly

2011-08-06T09:29:43Z

Blaisorblade: /* Introduction */ Remove extra line break missed before

== Introduction ==

Haskell offers many ways of [[Import|importing]] identifiers from other modules.
However, not all of them are as convenient as they seem at first glance.
We recommend focusing on the following two forms of import:
<haskell>
import qualified Very.Special.Module as VSM
import Another.Important.Module (printf, (<|>), )
</haskell>
instead of
<haskell>
import Very.Special.Module
import Another.Important.Module hiding (open, close, )
</haskell>

Stylistic reason:
If you read <hask>printf</hask>, <hask><|></hask> or <hask>VSM.open</hask> in the program you can find out easily where the identifier comes from.
In the second case you don't know if these identifiers are from <hask>Very.Special.Module</hask>, <hask>Another.Important.Module</hask> or even other modules.
Mind you that grep won't help, because <hask>Very.Special.Module</hask> and <hask>Another.Important.Module</hask> might just re-export other modules.
You might guess the origin of <hask>printf</hask> according to its name,
but for the infix operator <hask><|></hask> you will certainly have no idea.

Compatibility reason:
In the second case, if new identifiers are added to the imported modules they might clash with names of other modules.
Thus updating imported modules may break your code.
If you import a package A with version a.b.c.d that follows the [[Package versioning policy]] then within versions with the same a.b it is allowed to add identifiers.
This means that if you import the suggested way, you can safely specify <code>A >= a.b.c && <a.b+1</code> in your [[Cabal]] file.
Otherwise you have to choose the smaller range <code>A >= a.b.c && <a.b.c+1</code>.

It may also be that <hask>Another.Important.Module.open</hask> was deprecated when you hid it, and with a module update removing that identifier, your import fails.
That is, an identifier that you never needed but only annoyed you, annoys you again, when it was meant to not bother you any longer!
The first variant of import does not suffer from these problems.

Correctness reason:
I once found a bug in the StorableVector package by converting anonymous imports to explicit imports.
I found out that the function <hask>Foreign.Ptr.plusPtr</hask> was imported, although functions from this module always have to calculate with unit "element" not "byte".
That is, <hask>advancePtr</hask> must be used instead.
Actually, the <hask>reverse</hask> function used <hask>plusPtr</hask> and this was wrong.
A misbehaviour could only be observed for sub-vectors and elements with size greater than 1 byte.
The test suite did miss that.

== Exception from the rule ==

Since the Prelude is intended to be fixed for the future, it should be safe to use the <hask>hiding</hask> clause when importing <hask>Prelude</hask>.
Actually if you do not mention Prelude it will be imported anonymously.

== Clashing of module name abbreviations ==

In Haskell it is possible to use the same abbreviation for different modules:
<haskell>
import qualified Data.List as List
import qualified Data.List.Extra as List
</haskell>
This is discouraged for the same reasons as above:

Stylistic reason:
The identifier <hask>List.intercalate</hask> may refer to either <hask>Data.List</hask> or <hask>Data.List.Extra</hask>.
The reader of that module has to check these modules in order to find it out.

Compatibility reason:
The function <hask>List.intercalate</hask> may be currently defined only in <hask>Data.List.Extra</hask>.
However after many people found it useful, it is also added to <hask>Data.List</hask>.
Then <hask>List.intercalate</hask> can no longer be resolved.

== Counter-arguments to explicit import lists ==

The issue of whether to use explicit import lists is not always clear-cut, however.
Here are some reasons you might not want to do this:

* Development is slower: almost every change is accompanied by an import list change, especially if you want to keep your code warning-clean.

* When working on a project with multiple developers, explicit import lists can cause spurious conflicts, since two otherwise-unrelated changes to a file may both require changes to the same import list.

For these reasons amongst others, the GHC project decided to drop the use of explicit import lists.
We recommend using explicit import lists when importing from other packages,
but not when importing modules within the same package.

Qualified use of identifiers does not suffer from the above problems.

== See also ==

* [[Qualified names]]

{{essay}}

[[Category:Style]]

Import modules properly

2011-08-06T09:28:54Z

Blaisorblade: Remove spurious extra line breaks

== Introduction ==

Haskell offers many ways of [[Import|importing]] identifiers from other modules.
However, not all of them are as convenient as they seem at first glance.
We recommend focusing on the following two forms of import:
<haskell>
import qualified Very.Special.Module as VSM
import Another.Important.Module (printf, (<|>), )
</haskell>
instead of
<haskell>
import Very.Special.Module
import Another.Important.Module hiding (open, close, )
</haskell>

Stylistic reason:
If you read <hask>printf</hask>, <hask><|></hask> or <hask>VSM.open</hask> in the program you can find out easily where the identifier comes from.
In the second case you don't know if these identifiers are from <hask>Very.Special.Module</hask>, <hask>Another.Important.Module</hask> or even other modules.
Mind you that grep won't help, because <hask>Very.Special.Module</hask> and <hask>Another.Important.Module</hask> might just re-export other modules.
You might guess the origin of <hask>printf</hask> according to its name,
but for the infix operator <hask><|></hask> you will certainly have no idea.

Compatibility reason:
In the second case, if new identifiers are added to the imported modules they might clash with names of other modules.
Thus updating imported modules may break your code.
If you import a package A with version a.b.c.d that follows the [[Package versioning policy]] then within versions with the same a.b it is allowed to add identifiers.
This means that if you import the suggested way, you can safely specify <code>A >= a.b.c && <a.b+1</code> in your [[Cabal]] file.
Otherwise you have to choose the smaller range <code>A >= a.b.c && <a.b.c+1</code>.

It may also be that <hask>Another.Important.Module.open</hask> was deprecated when you hid it, and with a module update removing that identifier, your import fails.
That is, an identifier that you never needed but only annoyed you, annoys you again, when it was meant to not bother you any longer!
The first variant of import does not suffer from these problems.

Correctness reason:
I once found a bug in the StorableVector package by converting anonymous imports to explicit imports.
I found out that the function <hask>Foreign.Ptr.plusPtr</hask> was imported,
although functions from this module always have to calculate with unit "element" not "byte".
That is, <hask>advancePtr</hask> must be used instead.
Actually, the <hask>reverse</hask> function used <hask>plusPtr</hask> and this was wrong.
A misbehaviour could only be observed for sub-vectors and elements with size greater than 1 byte.
The test suite did miss that.

== Exception from the rule ==

Since the Prelude is intended to be fixed for the future, it should be safe to use the <hask>hiding</hask> clause when importing <hask>Prelude</hask>.
Actually if you do not mention Prelude it will be imported anonymously.

== Clashing of module name abbreviations ==

In Haskell it is possible to use the same abbreviation for different modules:
<haskell>
import qualified Data.List as List
import qualified Data.List.Extra as List
</haskell>
This is discouraged for the same reasons as above:

Stylistic reason:
The identifier <hask>List.intercalate</hask> may refer to either <hask>Data.List</hask> or <hask>Data.List.Extra</hask>.
The reader of that module has to check these modules in order to find it out.

Compatibility reason:
The function <hask>List.intercalate</hask> may be currently defined only in <hask>Data.List.Extra</hask>.
However after many people found it useful, it is also added to <hask>Data.List</hask>.
Then <hask>List.intercalate</hask> can no longer be resolved.

== Counter-arguments to explicit import lists ==

The issue of whether to use explicit import lists is not always clear-cut, however.
Here are some reasons you might not want to do this:

* Development is slower: almost every change is accompanied by an import list change, especially if you want to keep your code warning-clean.

* When working on a project with multiple developers, explicit import lists can cause spurious conflicts, since two otherwise-unrelated changes to a file may both require changes to the same import list.

For these reasons amongst others, the GHC project decided to drop the use of explicit import lists.
We recommend using explicit import lists when importing from other packages,
but not when importing modules within the same package.

Qualified use of identifiers does not suffer from the above problems.

== See also ==

* [[Qualified names]]

{{essay}}

[[Category:Style]]

Pointfree

2011-06-05T14:44:55Z

Blaisorblade: Jos -> José

__TOC__

'''Pointfree Style'''

It is very common for functional programmers to write functions as a
composition of other functions, never mentioning the actual arguments
they will be applied to. For example, compare:

<haskell>
sum = foldr (+) 0
</haskell>

with:

<haskell>
sum' xs = foldr (+) 0 xs
</haskell>

These functions perform the same operation; however, the former is more compact and is considered cleaner. This is closely related to function pipelines (and to [http://www.vex.net/~trebla/weblog/pointfree.html unix shell scripting]): it is clearer to write <hask>let fn = f . g . h</hask> than to write <hask>let fn x = f (g (h x))</hask>.

This style is particularly useful when deriving efficient programs by
calculation and, in general, constitutes good discipline. It helps the writer
(and reader) think about composing functions (high level), rather than
shuffling data (low level).

It is a common experience when rewriting expressions in pointfree style
to derive more compact, clearer versions of the code -- explicit points
often obscure the underlying algorithm.

Point-free map fusion:

<haskell>
foldr f e . map g == foldr (f . g) e
</haskell>

versus pointful map fusion:

<haskell>
foldr f e . map g == foldr f' e
    where f' a b = f (g a) b
</haskell>
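
The fusion law can be spot-checked on a concrete input. The sketch below is only a sanity check on one list, not a proof; the names <hask>lhs</hask>, <hask>rhs</hask> and <hask>fusionHolds</hask> are made up for this example.

<haskell>
-- Left-hand side of the law: map first, then fold.
lhs :: (b -> c -> c) -> c -> (a -> b) -> [a] -> c
lhs f e g = foldr f e . map g

-- Right-hand side: a single fold whose step function applies g first.
rhs :: (b -> c -> c) -> c -> (a -> b) -> [a] -> c
rhs f e g = foldr (f . g) e

-- Compare both sides on one sample input.
fusionHolds :: Bool
fusionHolds = lhs (+) 0 (* 2) xs == rhs (+) 0 (* 2) xs
  where xs = [1 .. 10] :: [Int]
</haskell>

The right-hand side traverses the list once and builds no intermediate list, which is the practical payoff of the law.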

Some more examples:

<haskell>
-- point-wise, and point-free member
mem, mem' :: Eq a => a -> [a] -> Bool

mem x lst = any (== x) lst
mem' = any . (==)
</haskell>
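
The step from <hask>mem</hask> to <hask>mem'</hask> can be spelled out as a sequence of small rewrites. This is a sketch with hypothetical names <hask>mem1</hask> to <hask>mem3</hask>; the middle step assumes that <hask>(==)</hask> is symmetric for the element type, which holds for any lawful <hask>Eq</hask> instance.

<haskell>
mem1, mem2, mem3 :: Eq a => a -> [a] -> Bool

-- Fully pointful starting point.
mem1 x lst = any (== x) lst

-- Drop the trailing argument lst (eta-reduction) and flip the section,
-- assuming x == y and y == x always agree.
mem2 x = any (x ==)

-- Drop x as well: \x -> any ((==) x) is the composition any . (==).
mem3 = any . (==)
</haskell>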

== But pointfree has more points! ==

A common misconception is that the 'points' of pointfree style are the <hask>(.)</hask> operator (function composition, as an ASCII symbol), which uses the same identifier as the decimal point. This is wrong. The term originated in topology, a branch of mathematics which works with spaces composed of points, and functions between those spaces. So a 'points-free' definition of a function is one which does not explicitly mention the points (values) of the space on which the function acts. In Haskell, our 'space' is some type, and 'points' are values. In the declaration
<haskell>
f x = x + 1
</haskell>
we define the function <hask>f</hask> in terms of its action on an arbitrary point <hask>x</hask>. Contrast this with the points-free version:
<haskell>
f = (+ 1)
</haskell>
where there is no mention of the value on which the function is acting.

== Background ==

To find out more about this style, search for Squiggol and the Bird-Meertens Formalism, a style of functional programming by calculation that was developed by [http://web.comlab.ox.ac.uk/oucl/work/richard.bird/publications.html Richard Bird], [http://www.kestrel.edu/home/people/meertens/ Lambert Meertens], and others at Oxford University. [http://web.comlab.ox.ac.uk/oucl/work/jeremy.gibbons/publications/ Jeremy Gibbons] has also written a number of papers about the topic, which are cited below.

== Tool support ==

Thomas Yaeger has
[http://www.cse.unsw.edu.au/~dons/code/lambdabot/Plugins/Pl/ written] a
[http://haskell.org/haskellwiki/Lambdabot Lambdabot]
plugin to automatically convert a large subset of Haskell expressions to
pointfree form. This tool has made it easier to use the more abstract
pointfree encodings (as it saves some mental gymnastics on the part of
the programmer). You can experiment with this in the [[IRC channel|Haskell IRC channel]]. A stand-alone command-line version is available at [http://hackage.haskell.org/package/pointfree HackageDB] (package pointfree).

The @pl (point-less) plugin is rather infamous for using the <hask>(->) a</hask> [[Monad|monad]] to obtain concise code. It also makes use of [[Arrow|Arrows]]. It also sometimes produces (amusing) code blow ups with the
<hask>(.)</hask> operator.
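
To see how the <hask>(->) a</hask> monad produces such concise code, it helps to try a few of these idioms directly. The sketch below relies only on <hask>join</hask> and <hask>liftM2</hask> from <hask>Control.Monad</hask>; the names <hask>square</hask>, <hask>triple</hask> and <hask>evenAndPositive</hask> are hypothetical.

<haskell>
import Control.Monad (join, liftM2)

-- In the function monad, join f x = f x x, so this is \x -> x * x.
square :: Int -> Int
square = join (*)

-- k =<< m is \x -> k (m x) x, so this is \x -> (x + x) + x.
triple :: Int -> Int
triple = (+) =<< join (+)

-- liftM2 h f g is \x -> f x `h` g x: both functions see the same argument.
evenAndPositive :: Int -> Bool
evenAndPositive = liftM2 (&&) even (> 0)
</haskell>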

Recently, @unpl has been written, which attempts to unscramble @pl-ified code. It also has a [http://hackage.haskell.org/package/pointful stand-alone command-line version] (package pointful).

A transcript:

<haskell>
> pl \x y -> x y
id

> unpl id
(\ a -> a)

> pl \x y -> x + 1
const . (1 +)

> unpl const . (1 +)
(\ e _ -> 1 + e)

> pl \v1 v2 -> sum (zipWith (*) v1 v2)
(sum .) . zipWith (*)

> unpl (sum .) . zipWith (*)
(\ d g -> sum (zipWith (*) d g))

> pl \x y z -> f (g x y z)
((f .) .) . g

> unpl ((f .) .) . g
(\ e j m -> f (g e j m))

> pl \x y z -> f (g x y) z
(f .) . g

> unpl (f .) . g
(\ d i -> f (g d i))

> pl \x y z -> f z (g x y)
(flip f .) . g

> unpl (flip f .) . g
(\ i l c -> f c (g i l))

> pl \(a,b) -> (b,a)
uncurry (flip (,))

> pl f a b = b a
f = flip id

> pl \ x -> x * x
join (*)

> pl \a b -> a:b:[]
(. return) . (:)

> pl \x -> x+x+x
(+) =<< join (+)

> pl \a b -> Nothing
const (const Nothing)

> pl \(a,b) -> (f a, g b)
f *** g

> pl \f g h x -> f x `h` g x
flip . (ap .) . flip (.)

> pl \x y -> x . f . y
(. (f .)) . (.)

> pl \f xs -> xs >>= return . f
fmap

> pl \h f g x -> f x `h` g x
liftM2

> pl \f a b c d -> f b c d a
flip . ((flip . (flip .)) .)

> pl \a (b,c) -> a c b
(`ap` snd) . (. fst) . flip

> pl \x y -> compare (f x) (f y)
(. f) . compare . f
</haskell>

For many many more examples, google for the results of '@pl' in the [[IRC_channel|#haskell]] logs. (Or join #haskell on FreeNode and try it yourself!) It can, of course, get out of hand:

<haskell>
> pl \(a,b) -> a:b:[]
uncurry ((. return) . (:))

> pl \a b c -> a*b+2+c
((+) .) . flip flip 2 . ((+) .) . (*)

> pl \f (a,b) -> (f a, f b)
(`ap` snd) . (. fst) . (flip =<< (((.) . (,)) .))

> pl \f g (a,b) -> (f a, g b)
flip flip snd . (ap .) . flip flip fst . ((.) .) . flip . (((.) . (,)) .)

> unpl flip flip snd . (ap .) . flip flip fst . ((.) .) . flip . (((.) . (,)) .)
(\ aa f ->
(\ p w -> ((,)) (aa (fst p)) (f w)) >>=
\ ao -> snd >>= \ an -> return (ao an))
</haskell>

== Combinator discoveries ==

Some fun combinators have been found via @pl. Here we list some of the best:

=== The owl ===

<haskell>
((.)$(.))
</haskell>

The owl has type <hask>(a -> b -> c) -> a -> (a1 -> b) -> a1 -> c</hask>, and in pointful style can be written as <hask> f a b c d = a b (c d)</hask>.

Example
<haskell>
> ((.)$(.)) (==) 1 (1+) 0
True
</haskell>
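
A small sketch that puts the owl next to its pointful reading; the names <hask>owl</hask> and <hask>owl'</hask> are introduced here only for the comparison.

<haskell>
-- The owl itself.
owl :: (a -> b -> c) -> a -> (a1 -> b) -> a1 -> c
owl = (.) $ (.)

-- Its pointful equivalent: apply c to d, then feed b and the result to a.
owl' :: (a -> b -> c) -> a -> (a1 -> b) -> a1 -> c
owl' a b c d = a b (c d)
</haskell>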

=== Dot ===

<haskell>
dot = ((.).(.))

dot :: (b -> c) -> (a -> a1 -> b) -> a -> a1 -> c
</haskell>

Example:

<haskell>
sequence `dot` replicate ==
(sequence .) . replicate ==
replicateM

(=<<) == join `dot` fmap
</haskell>
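
The two equations above can be checked on sample inputs. In this sketch <hask>replicateM'</hask> and <hask>bind'</hask> are hypothetical names for the <hask>dot</hask>-based definitions, and the signatures are specialised to lists where that keeps the example simple.

<haskell>
import Control.Monad (join, replicateM)

dot :: (b -> c) -> (a -> a1 -> b) -> a -> a1 -> c
dot = (.) . (.)

-- sequence applied to the result of the two-argument function replicate.
replicateM' :: Monad m => Int -> m a -> m [a]
replicateM' = sequence `dot` replicate

-- join applied to the result of the two-argument function fmap.
bind' :: Monad m => (a -> m b) -> m a -> m b
bind' = join `dot` fmap

-- Spot checks against the library versions.
agree :: Bool
agree = replicateM' 2 "ab" == replicateM 2 "ab"
     && bind' (\x -> [x, x + 1]) [1, 10 :: Int] == ((\x -> [x, x + 1]) =<< [1, 10])
</haskell>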

=== Swing ===

-- Note: @pl had nothing to do with the invention of this combinator. I constructed it by hand after noticing a common pattern. -- Cale

<haskell>
swing :: (((a -> b) -> b) -> c -> d) -> c -> a -> d
swing = flip . (. flip id)
swing f = flip (f . runCont . return)
swing f c a = f ($ a) c
</haskell>

Some examples of use:

<haskell>
swing map :: forall a b. [a -> b] -> a -> [b]
swing any :: forall a. [a -> Bool] -> a -> Bool
swing foldr :: forall a b. b -> a -> [a -> b -> b] -> b
swing zipWith :: forall a b c. [a -> b -> c] -> a -> [b] -> [c]
swing find :: forall a. [a -> Bool] -> a -> Maybe (a -> Bool)
-- applies each of the predicates to the given value, returning the first predicate which succeeds, if any
swing partition :: forall a. [a -> Bool] -> a -> ([a -> Bool], [a -> Bool])
</haskell>
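
Two of these uses can be tried out as follows. The sketch picks the fully pointful definition of <hask>swing</hask>; <hask>applyAll</hask> and <hask>anyHolds</hask> are hypothetical names for <hask>swing map</hask> and <hask>swing any</hask>, specialised to lists.

<haskell>
swing :: (((a -> b) -> b) -> c -> d) -> c -> a -> d
swing f c a = f ($ a) c

-- Apply every function in the list to the same value.
applyAll :: [a -> b] -> a -> [b]
applyAll = swing map

-- Does any predicate in the list accept the value?
anyHolds :: [a -> Bool] -> a -> Bool
anyHolds = swing any
</haskell>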

=== Squish ===

<haskell>
f >>= a . b . c =<< g
</haskell>

Example:

<haskell>
(readFile y >>=) . ((a . b) .) . c =<< readFile x
</haskell>

[[/Combine|An actually useful example]], numbering lines of a file.

== Problems with pointfree ==

Point-free style can (clearly) lead to [[Obfuscation]] when used unwisely.
As higher-order functions are chained together, it can become harder to
mentally infer the types of expressions. The mental cues to an
expression's type (explicit function arguments, and the number of
arguments) go missing.

Point-free style often leads to code that is difficult to modify. A function written in pointfree style may have to be radically restructured to make a minor change in functionality. This is because the changed function may no longer be expressible as a simple composition of lambdas and other functions, so the compositions must be turned back into applications, as in a pointful function.

Perhaps these are why pointfree style is sometimes (often?) referred to as
''pointless style''.

== References ==

One early reference is

* Backus, J. 1978. "Can Programming Be Liberated from the von Neumann Style? A Functional Style and Its Algebra of Programs," Communications of the Association for Computing Machinery 21:613-641.

which appears to be available (as a scan) at http://www.stanford.edu/class/cs242/readings/backus.pdf

A paper specifically about point-free style:
* http://web.comlab.ox.ac.uk/oucl/work/jeremy.gibbons/publications/index.html#radix

This style underlies many expert Haskellers' intuitions. A rather infamous paper (for all the cute symbols) is Erik Meijer et al.'s:

* Functional Programming with Bananas, Lenses, and Barbed Wire, http://wwwhome.cs.utwente.nl/~fokkinga/mmf91m.ps.

[http://en.wikipedia.org/wiki/Squiggol Squiggol], and the Bird-Meertens Formalism:
* http://web.comlab.ox.ac.uk/oucl/work/jeremy.gibbons/publications/index.html#squiggolintro.
* A Calculus of Functions for Program Derivation, R.S. Bird, in Research Topics in Functional Programming, D. Turner ed., Addison-Wesley 1990.
* The Squiggolist, ed Johan Jeuring, published irregularly by CWI Amsterdam.

[http://wiki.di.uminho.pt/twiki/bin/view/Personal/Alcino/PointlessHaskell Pointless Haskell] is a library for point-free programming with recursion patterns defined as hylomorphisms. It also allows the visualization of the intermediate data structure of the hylomorphisms with GHood. This feature together with the DrHylo tool allows us to easily visualize recursion trees of Haskell functions. [http://wiki.di.uminho.pt/wiki/pub/Ze/Bic/report.pdf Haskell Manipulation] by José Miguel Paiva Proença discusses this tool-based approach to refactoring.

This project is written by [http://www.di.uminho.pt/~mac/ Manuel Alcino Cunha], see his homepage for more related materials on the topic.
An extended version of his paper ''Point-free Programming with Hylomorphisms'' can be found [http://web.comlab.ox.ac.uk/oucl/research/pdt/ap/dgp/workshop2004/cunha.pdf here].

== Other areas ==

[[Combinatory logic]] and [[Recursive function theory]] can both be said to be pointfree in some sense.

Are there pointfree approaches to [[relational algebra]]?
See [http://www.di.uminho.pt/~jno/ps/_.pdf First Steps in Pointfree Functional Dependency Theory] written by José Nuno Oliveira. A concise and deep approach. See also [http://www.di.uminho.pt/~jno/html/ the author's homepage] and also [http://www.di.uminho.pt/~jno/html/jnopub.html his many other papers] -- many materials related to this topic can be found there.

[[Category:Idioms]]

GHC/Type families

2010-08-14T19:35:14Z

Blaisorblade: Minor clarifications to TillmannRendel's changes

[[Category:GHC|Indexed types]]

Indexed type families, or '''type families''' for short, are a Haskell extension supporting ad-hoc overloading of data types. Type families are parametric types that can be assigned specialized representations based on the type parameters they are instantiated with. They are the data type analogue of [[Type class|type classes]]: families are used to define overloaded ''data'' in the same way that classes are used to define overloaded ''functions''. Type families are useful for generic programming, for creating highly parameterised library interfaces, and for creating interfaces with enhanced static information, much like dependent types.

Type families come in two flavors: ''data families'' and ''type synonym families''. Data families are the indexed form of data and newtype definitions. Type synonym families are the indexed form of type synonyms. Each of these flavors can be defined in a standalone manner or ''associated'' with a type class. Standalone definitions are more general, while associated types can more clearly express how a type is used and lead to better error messages.

== What are type families? ==

The concept of a type family comes from type theory. An indexed type family in type theory is a partial function at the type level. Applying the function to parameters (called ''type indices'') yields a type. Type families permit a program to compute what data constructors it will operate on, rather than having them fixed statically (as with simple type systems) or treated as opaque unknowns (as with parametrically polymorphic types).

Type families are to vanilla data types what type class methods are to regular functions. Vanilla polymorphic data types and functions have a single definition, which is used at all type instances. Classes and type families, on the other hand, have an interface definition and any number of instance definitions. A type family's interface definition declares its [[kind]] and its arity, or the number of type indices it takes. Instance definitions define the type family over some part of the domain.

As a simple example of how type families differ from ordinary parametric data types, consider a strict list type. We can represent a list of <hask>Char</hask> in the usual way, with cons cells. We can do the same thing to represent a list of <hask>()</hask>, but since a strict <hask>()</hask> value carries no useful information, it would be more efficient to just store the length of the list. This can't be done with an ordinary parametric data type, because the data constructors used to represent the list would depend on the list's type parameter: if it's <hask>Char</hask> then the list consists of cons cells; if it's <hask>()</hask>, then the list consists of a single integer. We basically want to select between two different data types based on a type parameter. Using type families, this list type could be declared as follows:

<haskell>
-- Declare a list-like data family
data family XList a

-- Declare a list-like instance for Char
data instance XList Char = XCons !Char !(XList Char) | XNil

-- Declare a number-like instance for ()
data instance XList () = XListUnit !Int
</haskell>

The right-hand sides of the two <code>data instance</code> declarations are exactly ordinary data definitions. However, they define two instances of the same parametric data type, <hask>XList Char</hask> and <hask>XList ()</hask>, whereas ordinary data declarations define completely unrelated types. A recent [[Simonpj/Talk:FunWithTypeFuns|tutorial paper]] has more in-depth examples of programming with type families.
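To make the difference concrete, here is a minimal sketch of a function that works on both instances. Because each instance has its own constructors, the function is written as a method of a type class. The class <hask>XLength</hask> and its method <hask>xlength</hask> are hypothetical names chosen for this illustration only.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- The data family and its two instances, as declared above.
data family XList a
data instance XList Char = XCons !Char !(XList Char) | XNil
data instance XList ()   = XListUnit !Int

-- Hypothetical class (not from any library): one method, with one
-- implementation per family instance.
class XLength a where
  xlength :: XList a -> Int

-- For Char, walk the cons cells.
instance XLength Char where
  xlength XNil           = 0
  xlength (XCons _ rest) = 1 + xlength rest

-- For (), the length is the stored integer.
instance XLength () where
  xlength (XListUnit n) = n
```

With these definitions loaded, <hask>xlength (XCons 'a' XNil)</hask> traverses one cons cell, while <hask>xlength (XListUnit 5)</hask> simply returns the stored count.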

GADTs bear some similarity to type families, in the sense that they allow a parametric type's constructors to depend on the type's parameters. However, all GADT constructors must be defined in one place, whereas type families can be extended. Functional dependencies are similar to type families, and many type classes that use functional dependencies can be equivalently expressed with type families. Type families provide a more functional style of type-level programming than the relational style of functional dependencies.

== What do I need to use type families? ==

Type families are a GHC extension enabled with the <code>-XTypeFamilies</code> flag or the <code>{-# LANGUAGE TypeFamilies #-}</code> pragma. The first stable release of GHC that properly supports type families is 6.10.1. (The 6.8 release included an early partial implementation, but its use is deprecated.) Please [http://hackage.haskell.org/trac/ghc/query?status=new&status=assigned&status=reopened&group=priority&type=bug&order=id&desc=1 report bugs] via the GHC bug tracker, ideally accompanied by a small example program that demonstrates the problem. Use the [mailto:glasgow-haskell-users@haskell.org GHC mailing list] for questions or for a discussion of this language extension and its description on this wiki page.

== An associated data type example ==

As an example, consider Ralf Hinze's [http://www.informatik.uni-bonn.de/~ralf/publications.html#J4 generalised tries], a form of generic finite maps.

=== The class declaration ===

We define a type class whose instances are the types that we can use as keys in our generic maps:
<haskell>
class GMapKey k where
  data GMap k :: * -> *
  empty  :: GMap k v
  lookup :: k -> GMap k v -> Maybe v
  insert :: k -> v -> GMap k v -> GMap k v
</haskell>
The interesting part is the ''associated data family'' declaration of the class. It gives a [http://www.haskell.org/ghc/docs/latest/html/users_guide/type-families.html#data-family-declarations ''kind signature''] (here <hask>* -> *</hask>) for the associated data type <hask>GMap k</hask> - analogous to how methods receive a type signature in a class declaration.

It is important to notice that the first parameter of the associated type <hask>GMap</hask> coincides with the class parameter of <hask>GMapKey</hask>. This means that, in every instance of the class, the corresponding instance of the associated data type must have its first argument match the instance type. In general, the type arguments of an associated type can be a subset of the class parameters (in a multi-parameter type class) and they can appear in any order, possibly in an order other than in the class head. The latter can be useful if the associated data type is partially applied in some contexts.

The second important point is that as <hask>GMap k</hask> has kind <hask>* -> *</hask> and <hask>k</hask> (implicitly) has kind <hask>*</hask>, the type constructor <hask>GMap</hask> (without an argument) has kind <hask>* -> * -> *</hask>. Consequently, we see that <hask>GMap</hask> is applied to two arguments in the signatures of the methods <hask>empty</hask>, <hask>lookup</hask>, and <hask>insert</hask>.

=== An Int instance ===

To use Ints as keys into generic maps, we declare an instance that simply uses <hask>Data.IntMap</hask>, thusly:
<haskell>
instance GMapKey Int where
  data GMap Int v        = GMapInt (Data.IntMap.IntMap v)
  empty                  = GMapInt Data.IntMap.empty
  lookup k (GMapInt m)   = Data.IntMap.lookup k m
  insert k v (GMapInt m) = GMapInt (Data.IntMap.insert k v m)
</haskell>
The <hask>Int</hask> instance of the associated data type <hask>GMap</hask> needs to have both of its parameters, but as only the first one corresponds to a class parameter, the second needs to be a type variable (here <hask>v</hask>). As mentioned before, any associated type parameter that corresponds to a class parameter must be identical to the class parameter in each instance. The right hand side of the associated data declaration is like that of any other data type.

NB: At the moment, GADT syntax is not allowed for associated data types (or other indexed types). This is not a fundamental limitation, but just a shortcoming of the current implementation, which we expect to lift in the future.

As an exercise, implement an instance for <hask>Char</hask> that maps back to the <hask>Int</hask> instance using the conversion functions <hask>Data.Char.ord</hask> and <hask>Data.Char.chr</hask>.
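One possible solution is sketched below; it is not necessarily the one intended by the exercise. The sketch assumes the class and the <hask>Int</hask> instance from above, repeated here so that the file compiles on its own. It hides the Prelude's <hask>lookup</hask> and imports <hask>Data.IntMap</hask> qualified as <hask>IntMap</hask>. Only <hask>ord</hask> is needed, because keys are converted on the way in and never converted back.

```haskell
{-# LANGUAGE TypeFamilies #-}

import Prelude hiding (lookup)
import Data.Char (ord)
import qualified Data.IntMap as IntMap

-- The class from the text.
class GMapKey k where
  data GMap k :: * -> *
  empty  :: GMap k v
  lookup :: k -> GMap k v -> Maybe v
  insert :: k -> v -> GMap k v -> GMap k v

-- The Int instance from the text.
instance GMapKey Int where
  data GMap Int v        = GMapInt (IntMap.IntMap v)
  empty                  = GMapInt IntMap.empty
  lookup k (GMapInt m)   = IntMap.lookup k m
  insert k v (GMapInt m) = GMapInt (IntMap.insert k v m)

-- Sketch of a Char instance: wrap the Int map and convert each key
-- with 'ord' before delegating to the Int instance.
instance GMapKey Char where
  data GMap Char v        = GMapChar (GMap Int v)
  empty                   = GMapChar empty
  lookup k (GMapChar m)   = lookup (ord k) m
  insert k v (GMapChar m) = GMapChar (insert (ord k) v m)
```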

=== A unit instance ===

Generic definitions, apart from elementary types, typically cover units, products, and sums. We start here with the unit instance for <hask>GMap</hask>:
<haskell>
instance GMapKey () where
  data GMap () v           = GMapUnit (Maybe v)
  empty                    = GMapUnit Nothing
  lookup () (GMapUnit v)   = v
  insert () v (GMapUnit _) = GMapUnit $ Just v
</haskell>
For unit, the map is just a <hask>Maybe</hask> value.

=== Product and sum instances ===

Next, let us define the instances for pairs and sums (i.e., <hask>Either</hask>):
<haskell>
instance (GMapKey a, GMapKey b) => GMapKey (a, b) where
  data GMap (a, b) v            = GMapPair (GMap a (GMap b v))
  empty                         = GMapPair empty
  lookup (a, b) (GMapPair gm)   = lookup a gm >>= lookup b
  insert (a, b) v (GMapPair gm) = GMapPair $ case lookup a gm of
    Nothing  -> insert a (insert b v empty) gm
    Just gm2 -> insert a (insert b v gm2  ) gm

instance (GMapKey a, GMapKey b) => GMapKey (Either a b) where
  data GMap (Either a b) v                = GMapEither (GMap a v) (GMap b v)
  empty                                   = GMapEither empty empty
  lookup (Left  a) (GMapEither gm1  _gm2) = lookup a gm1
  lookup (Right b) (GMapEither _gm1 gm2 ) = lookup b gm2
  insert (Left  a) v (GMapEither gm1 gm2) = GMapEither (insert a v gm1) gm2
  insert (Right b) v (GMapEither gm1 gm2) = GMapEither gm1 (insert b v gm2)
</haskell>
If you find this code algorithmically surprising, I'd suggest having a look at [http://www.informatik.uni-bonn.de/~ralf/publications.html#J4 Ralf Hinze's paper]. The only novelty concerning associated types in these two instances is that the instances have a context <hask>(GMapKey a, GMapKey b)</hask>. Consequently, the right hand sides of the associated type declarations can use <hask>GMap</hask> recursively at the key types <hask>a</hask> and <hask>b</hask>, much as the method definitions use the class methods recursively at the types for which the class is given in the instance context.

=== Using a generic map ===

Finally, some code building and querying a generic map:
<haskell>
myGMap :: GMap (Int, Either Char ()) String
myGMap = insert (5, Left 'c') "(5, Left 'c')"    $
         insert (4, Right ()) "(4, Right ())"    $
         insert (5, Right ()) "This is the one!" $
         insert (5, Right ()) "This is the two!" $
         insert (6, Right ()) "(6, Right ())"    $
         insert (5, Left 'a') "(5, Left 'a')"    $
         empty

main = putStrLn $ maybe "Couldn't find key!" id $ lookup (5, Right ()) myGMap
</haskell>

=== Download the code ===

If you want to play with this example without copying it off the wiki, just download the [http://darcs.haskell.org/testsuite/tests/ghc-regress/indexed-types/should_run/GMapAssoc.hs source code for <hask>GMap</hask>] from GHC's test suite.

== Detailed definition of data families ==

Data families appear in two flavours: (1) they can be defined on the toplevel or (2) they can appear inside type classes (in which case they are known as associated types). The former is the more general variant, as it lacks the requirement for the type-indices to coincide with the class parameters. However, the latter can lead to more clearly structured code and compiler warnings if some type instances were - possibly accidentally - omitted. In the following, we always discuss the general toplevel form first and then cover the additional constraints placed on associated types.

=== Family declarations ===

Indexed data families are introduced by a signature, such as
<haskell>
data family GMap k :: * -> *
</haskell>
The special identifier <hask>family</hask> distinguishes family declarations from standard data declarations. The result kind annotation is optional and, as usual, defaults to <hask>*</hask> if omitted. An example is
<haskell>
data family Array e
</haskell>
Named arguments can also be given explicit kind signatures if needed. Just as with [http://www.haskell.org/ghc/docs/latest/html/users_guide/gadt.html GADT declarations], named arguments are entirely optional, so that we can declare <hask>Array</hask> alternatively with
<haskell>
data family Array :: * -> *
</haskell>

==== Associated family declarations ====

When a data family is declared as part of a type class, we drop the <hask>family</hask> keyword. The <hask>GMap</hask> declaration takes the following form
<haskell>
class GMapKey k where
  data GMap k :: * -> *
  ...
</haskell>
In contrast to toplevel declarations, named arguments must be used for all type parameters that are to be used as type-indices. Moreover, the argument names must be class parameters. Each class parameter may only be used at most once per associated type, but some may be omitted and they may be in an order other than in the class head. In other words: '''the named type parameters of the data declaration must be a permutation of a subset of the class variables'''.

Examples of admissible and inadmissible declarations:
<haskell>
class C a b c where { data T c a :: * } -- OK
class C a b c where { data T a a :: * } -- Bad: repeated variable
class D a where { data T a x :: * } -- Bad: x is not a class variable
class D a where { data T a :: * -> * } -- OK
</haskell>

=== Instance declarations ===

Instance declarations of data and newtype families are very similar to standard data and newtype declarations. The only two differences are that the keyword <hask>data</hask> or <hask>newtype</hask> is followed by <hask>instance</hask> and that some or all of the type arguments can be non-variable types, but may not contain forall types or type synonym families. However, data families are generally allowed in type parameters, and type synonyms are allowed as long as they are fully applied and expand to a type that is itself admissible - exactly as this is required for occurrences of type synonyms in class instance parameters. For example, the <hask>Either</hask> instance for <hask>GMap</hask> is
<haskell>
data instance GMap (Either a b) v = GMapEither (GMap a v) (GMap b v)
</haskell>
In this example, the declaration has only one variant. In general, it can have any number of variants (data constructors).

Data and newtype instance declarations are only legal when an appropriate family declaration is in scope - just like class instances require the class declaration to be visible. Moreover, each instance declaration has to conform to the kind determined by its family declaration. This implies that the number of parameters of an instance declaration matches the arity determined by the kind of the family. Although all data families are declared with the <hask>data</hask> keyword, instances can be either <hask>data</hask> or <hask>newtype</hask>s, or a mix of both.
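The following minimal sketch shows a family with one <hask>data</hask> instance and one <hask>newtype</hask> instance. The family <hask>Box</hask>, its constructors, and the two helper functions are hypothetical names used only for this illustration.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Hypothetical family, declared with the 'data' keyword as always.
data family Box a

-- A data instance with two fields.
data instance Box Int = BoxInt Int Int

-- A newtype instance of the same family. As with ordinary newtypes,
-- it must have exactly one constructor with exactly one field.
newtype instance Box Bool = BoxBool Bool

-- Each function works on one particular instance of the family.
sumBox :: Box Int -> Int
sumBox (BoxInt x y) = x + y

unBox :: Box Bool -> Bool
unBox (BoxBool b) = b
```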

Even if type families are defined as toplevel declarations, functions that perform different computations for different family instances still need to be defined as methods of type classes. In particular, the following is not possible:
<haskell>
data family T a
data instance T Int = A
data instance T Char = B
nonsense :: T a -> Int
nonsense A = 1 -- WRONG: These two equations together...
nonsense B = 2 -- ...will produce a type error.
</haskell>
Given the functionality provided by GADTs (Generalised Algebraic Data Types), it might seem as if a definition, such as the above, should be feasible. However, type families - in contrast to GADTs - are ''open''; i.e., new instances can always be added, possibly in other modules. Supporting pattern matching across different data instances would require a form of extensible case construct.
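A version that does type-check moves the function into a type class, so that each equation lives in the class instance for the matching family instance. This is a minimal sketch; the class <hask>Sense</hask> and its method <hask>sense</hask> are hypothetical names.

```haskell
{-# LANGUAGE TypeFamilies #-}

data family T a
data instance T Int  = A
data instance T Char = B

-- Hypothetical class replacing the rejected toplevel function:
-- one method, implemented separately per family instance.
class Sense a where
  sense :: T a -> Int

instance Sense Int where
  sense A = 1

instance Sense Char where
  sense B = 2
```

Each instance only ever pattern matches on the constructors of a single data instance, so no extensible case construct is needed.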

==== Associated type instances ====

When an associated family instance is declared within a type class instance, we drop the <hask>instance</hask> keyword in the family instance. So, the <hask>Either</hask> instance for <hask>GMap</hask> becomes:
<haskell>
instance (GMapKey a, GMapKey b) => GMapKey (Either a b) where
  data GMap (Either a b) v = GMapEither (GMap a v) (GMap b v)
  ...
</haskell>
The most important point about associated family instances is that the type indices corresponding to class parameters must be identical to the type given in the instance head; here this is the first argument of <hask>GMap</hask>, namely <hask>Either a b</hask>, which coincides with the only class parameter. Any parameters to the family constructor that do not correspond to class parameters, need to be variables in every instance; here this is the variable <hask>v</hask>.

Instances for an associated family can only appear as part of instance declarations of the class in which the family was declared - just as with the equations of the methods of a class. Also in correspondence to how methods are handled, declarations of associated types can be omitted in class instances. If an associated family instance is omitted, the corresponding instance type is not inhabited; i.e., only diverging expressions, such as <hask>undefined</hask>, can assume the type.

==== Scoping of class parameters ====

In the case of multi-parameter type classes, the visibility of class parameters in the right-hand side of associated family instances depends ''solely'' on the parameters of the data family. As an example, consider the simple class declaration
<haskell>
class C a b where
  data T a
</haskell>
Only one of the two class parameters is a parameter to the data family. Hence, the following instance declaration is invalid:
<haskell>
instance C [c] d where
  data T [c] = MkT (c, d) -- WRONG!! 'd' is not in scope
</haskell>
Here, the right-hand side of the data instance mentions the type variable <hask>d</hask> that does not occur in its left-hand side. We cannot admit such data instances as they would compromise type safety.

==== Type class instances of family instances ====

Type class instances of instances of data families can be defined as usual, and in particular data instance declarations can have <hask>deriving</hask> clauses. For example, we can write
<haskell>
data GMap () v = GMapUnit (Maybe v)
  deriving Show
</haskell>
which implicitly defines an instance of the form
<haskell>
instance Show v => Show (GMap () v) where ...
</haskell>

Note that class instances are always for particular ''instances'' of a data family and never for an entire family as a whole. This is for essentially the same reasons that we cannot define a toplevel function that performs pattern matching on the data constructors of ''different'' instances of a single type family. It would require a form of extensible case construct.
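As a minimal sketch of the toplevel form, assume <hask>GMap</hask> is declared as a standalone family rather than associated with <hask>GMapKey</hask>. The <hask>deriving</hask> clause then attaches to the <hask>data instance</hask> declaration. The binding <hask>rendered</hask> is a hypothetical name added for illustration.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Assumption for this sketch: a standalone (toplevel) family.
data family GMap k :: * -> *

-- The deriving clause applies to this one instance only.
data instance GMap () v = GMapUnit (Maybe v)
  deriving Show

-- Uses the derived instance Show v => Show (GMap () v).
rendered :: String
rendered = show (GMapUnit (Just 'x'))
```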

==== Overlap ====

The instance declarations of a data family used in a single program may not overlap at all, independent of whether they are associated or not. In contrast to type class instances, this is not only a matter of consistency, but one of type safety.

=== Import and export ===

The association of data constructors with type families is more dynamic than is the case with standard data and newtype declarations. In the standard case, the notation <hask>T(..)</hask> in an import or export list denotes the type constructor and all the data constructors introduced in its declaration. However, a family declaration never introduces any data constructors; instead, data constructors are introduced by family instances. As a result, which data constructors are associated with a type family depends on the currently visible instance declarations for that family. Consequently, an import or export item of the form <hask>T(..)</hask> denotes the family constructor and all currently visible data constructors - in the case of an export item, these may be either imported or defined in the current module. The treatment of import and export items that explicitly list data constructors, such as <hask>GMap(GMapEither)</hask>, is analogous.

==== Associated families ====

As expected, an import or export item of the form <hask>C(..)</hask> denotes all of the class' methods and associated types. However, when associated types are explicitly listed as subitems of a class, we need some new syntax, as uppercase identifiers as subitems are usually data constructors, not type constructors. To clarify that we denote types here, each associated type name needs to be prefixed by the keyword <hask>type</hask>. So for example, when explicitly listing the components of the <hask>GMapKey</hask> class, we write <hask>GMapKey(type GMap, empty, lookup, insert)</hask>.

==== Examples ====

Assuming our running <hask>GMapKey</hask> class example, let us look at some export lists and their meaning:

* <hask>module GMap (GMapKey) where...</hask>: Exports just the class name.
* <hask>module GMap (GMapKey(..)) where...</hask>: Exports the class, the associated type <hask>GMap</hask> and the member functions <hask>empty</hask>, <hask>lookup</hask>, and <hask>insert</hask>. None of the data constructors is exported.
* <hask>module GMap (GMapKey(..), GMap(..)) where...</hask>: As before, but also exports all the data constructors <hask>GMapInt</hask>, <hask>GMapChar</hask>, <hask>GMapUnit</hask>, <hask>GMapPair</hask>, and <hask>GMapEither</hask>.
* <hask>module GMap (GMapKey(empty, lookup, insert), GMap(..)) where...</hask>: As before.
* <hask>module GMap (GMapKey, empty, lookup, insert, GMap(..)) where...</hask>: As before.

Finally, you can write <hask>GMapKey(type GMap)</hask> to denote both the class <hask>GMapKey</hask> as well as its associated type <hask>GMap</hask>. However, you cannot write <hask>GMapKey(type GMap(..))</hask> — i.e., sub-component specifications cannot be nested. To specify <hask>GMap</hask>'s data constructors, you have to list it separately.

==== Instances ====

Family instances are implicitly exported, just like class instances. However, this applies only to the heads of instances, not to the data constructors an instance defines.

== An associated type synonym example ==

Type synonym families are an alternative to functional dependencies, which makes functional dependency examples well suited to introducing type synonym families. In fact, type families are a more functional way to express the same thing as functional dependencies (despite the name!), as they replace the relational notation of functional dependencies with an expression-oriented notation; i.e., functions on types are really represented by functions and not relations.

=== The <hask>class</hask> declaration ===

Here's an example from Mark Jones' seminal paper on functional dependencies:
<haskell>
class Collects e ce | ce -> e where
  empty  :: ce
  insert :: e -> ce -> ce
  member :: e -> ce -> Bool
  toList :: ce -> [e]
</haskell>

With associated type synonyms we can write this as
<haskell>
class Collects ce where
  type Elem ce
  empty  :: ce
  insert :: Elem ce -> ce -> ce
  member :: Elem ce -> ce -> Bool
  toList :: ce -> [Elem ce]
</haskell>
Instead of the multi-parameter type class, we use a single-parameter class, and the parameter <hask>e</hask> is turned into an associated type synonym <hask>Elem ce</hask>.

=== An <hask>instance</hask> ===

Instances change correspondingly. An instance of the two-parameter class
<haskell>
instance Eq e => Collects e [e] where
  empty           = []
  insert e l      = (e:l)
  member e []     = False
  member e (x:xs)
    | e == x      = True
    | otherwise   = member e xs
  toList l        = l
</haskell>
becomes an instance of a single-parameter class, where the dependent type parameter turns into an associated type instance declaration:
<haskell>
instance Eq e => Collects [e] where
  type Elem [e]   = e
  empty           = []
  insert e l      = (e:l)
  member e []     = False
  member e (x:xs)
    | e == x      = True
    | otherwise   = member e xs
  toList l        = l
</haskell>

=== Using generic collections ===

With Functional Dependencies the code would be:
<haskell>
sumCollects :: (Collects e c1, Collects e c2) => c1 -> c2 -> c2
sumCollects c1 c2 = foldr insert c2 (toList c1)
</haskell>

In contrast, with associated type synonyms, we get:
<haskell>
sumCollects :: (Collects c1, Collects c2, Elem c1 ~ Elem c2) => c1 -> c2 -> c2
sumCollects c1 c2 = foldr insert c2 (toList c1)
</haskell>
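The pieces above can be assembled into one file, as in the following sketch. To show why the equality constraint matters, it adds a second collection type, <hask>UniqueList</hask>. This type is a hypothetical addition for illustration and does not come from the original example. The equality <hask>Elem c1 ~ Elem c2</hask> lets <hask>sumCollects</hask> combine two different collection types, provided they share an element type.

```haskell
{-# LANGUAGE TypeFamilies #-}

class Collects ce where
  type Elem ce
  empty  :: ce
  insert :: Elem ce -> ce -> ce
  member :: Elem ce -> ce -> Bool
  toList :: ce -> [Elem ce]

-- The list instance from the text.
instance Eq e => Collects [e] where
  type Elem [e] = e
  empty      = []
  insert e l = e : l
  member e l = e `elem` l
  toList l   = l

-- Hypothetical second collection: a list that never stores duplicates.
newtype UniqueList e = UniqueList [e]

instance Eq e => Collects (UniqueList e) where
  type Elem (UniqueList e) = e
  empty = UniqueList []
  insert e c@(UniqueList l)
    | e `elem` l = c
    | otherwise  = UniqueList (e : l)
  member e (UniqueList l) = e `elem` l
  toList (UniqueList l)   = l

-- c1 and c2 may be different collection types with equal element types.
sumCollects :: (Collects c1, Collects c2, Elem c1 ~ Elem c2) => c1 -> c2 -> c2
sumCollects c1 c2 = foldr insert c2 (toList c1)
```

For example, <hask>sumCollects "aab" (empty :: UniqueList Char)</hask> inserts the characters of a plain list into a <hask>UniqueList</hask>, dropping the repeated <hask>'a'</hask>.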

== Detailed definition of type synonym families ==

Type synonym families appear in two flavours: (1) they can be defined on the toplevel or (2) they can appear inside type classes (in which case they are known as associated type synonyms). The former is the more general variant, as it lacks the requirement for the type-indices to coincide with the class parameters. However, the latter can lead to more clearly structured code and compiler warnings if some type instances were - possibly accidentally - omitted. In the following, we always discuss the general toplevel form first and then cover the additional constraints placed on associated types.

=== Family declarations ===

Indexed type families are introduced by a signature, such as
<haskell>
type family Elem c :: *
</haskell>
The special identifier <hask>family</hask> distinguishes family declarations from standard type declarations. The result kind annotation is optional and, as usual, defaults to <hask>*</hask> if omitted. An example is
<haskell>
type family Elem c
</haskell>
Parameters can also be given explicit kind signatures if needed. We call the number of parameters in a type family declaration the family's arity, and all applications of a type family must be fully saturated with respect to that arity. This requirement is unlike ordinary type synonyms and it implies that the kind of a type family is not sufficient to determine a family's arity, and hence in general, also insufficient to determine whether a type family application is well formed. As an example, consider the following declaration:
<haskell>
type family F a b :: * -> * -- F's arity is 2,
-- although its overall kind is * -> * -> * -> *
</haskell>
Given this declaration the following are examples of well-formed and malformed types:
<haskell>
F Char [Int] -- OK! Kind: * -> *
F Char [Int] Bool -- OK! Kind: *
F IO Bool -- WRONG: kind mismatch in the first argument
F Bool -- WRONG: unsaturated application
</haskell>
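The two well-formed applications can be exercised with a small sketch. The instance equation and the names <hask>Wrap</hask>, <hask>saturated</hask>, and <hask>asConstructor</hask> are assumptions made up for this illustration.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Arity 2, result kind * -> *.
type family F a b :: * -> *

-- Hypothetical instance: the right-hand side has kind * -> *.
type instance F Char [Int] = Maybe

-- F applied to its two indices plus one further argument: kind *.
saturated :: F Char [Int] Bool
saturated = Just True

-- Hypothetical wrapper expecting an argument of kind * -> *.
newtype Wrap f = Wrap (f Int)

-- F applied to exactly its two indices: kind * -> *, so it fits Wrap.
asConstructor :: Wrap (F Char [Int])
asConstructor = Wrap (Just 3)
```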

==== Associated family declarations ====

When a type family is declared as part of a type class, we drop the special identifier <hask>family</hask>. The <hask>Elem</hask> declaration takes the following form
<haskell>
class Collects ce where
  type Elem ce :: *
  ...
</haskell>
Exactly as in the case of an associated data declaration, '''the named type parameters must be a permutation of a subset of the class parameters'''. Examples:
<haskell>
class C a b c where { type T c a :: * } -- OK
class D a where { type T a x :: * } -- No: x is not a class parameter
class D a where { type T a :: * -> * } -- OK
</haskell>

=== Instance declarations ===

Instance declarations of type families are very similar to standard type synonym declarations. The only two differences are that the keyword <hask>type</hask> is followed by <hask>instance</hask> and that some or all of the type arguments can be non-variable types, but may not contain forall types or type synonym families. However, data families are generally allowed, and type synonyms are allowed as long as they are fully applied and expand to a type that is admissible - these are the exact same requirements as for data instances. For example, the <hask>[e]</hask> instance for <hask>Elem</hask> is
<haskell>
type instance Elem [e] = e
</haskell>

A type family instance declaration must satisfy the following rules:
* An appropriate family declaration is in scope - just like class instances require the class declaration to be visible.
* The instance declaration conforms to the kind determined by its family declaration.
* The number of type parameters in an instance declaration matches the number of type parameters in the family declaration.
* The right-hand side of a type instance must be a monotype (i.e., it may not include foralls) and, after the expansion of all saturated vanilla type synonyms, no synonyms except family synonyms may remain.

Here are some examples of admissible and illegal type instances:
<haskell>
type family F a :: *
type instance F [Int] = Int -- OK!
type instance F String = Char -- OK!
type instance F (F a) = a -- WRONG: type parameter mentions a type family
type instance F (forall a. (a, b)) = b -- WRONG: a forall type appears in a type parameter
type instance F Float = forall a.a -- WRONG: right-hand side may not be a forall type

type family G a b :: * -> *
type instance G Int = (,) -- WRONG: must be two type parameters
type instance G Int Char Float = Double -- WRONG: must be two type parameters
</haskell>

==== Associated type instances ====

When an associated family instance is declared within a type class instance, we drop the <hask>instance</hask> keyword in the family instance. So, the <hask>[e]</hask> instance for <hask>Elem</hask> becomes:
<haskell>
instance (Eq (Elem [e])) => Collects ([e]) where
  type Elem [e] = e
  ...
</haskell>
The most important point about associated family instances is that the type indices corresponding to class parameters must be identical to the type given in the instance head; here this is <hask>[e]</hask>, which coincides with the only class parameter.

Instances for an associated family can only appear as part of instance declarations of the class in which the family was declared - just as with the equations of the methods of a class. Also in correspondence to how methods are handled, declarations of associated types can be omitted in class instances. If an associated family instance is omitted, the corresponding instance type is not inhabited; i.e., only diverging expressions, such as <hask>undefined</hask>, can assume the type.

==== Overlap ====

The instance declarations of a type family used in a single program may only overlap if the right-hand sides of the overlapping instances coincide for the overlapping types. More formally, two instance declarations overlap if there is a substitution that makes the left-hand sides of the instances syntactically the same. Whenever that is the case, the right-hand sides of the instances must also be syntactically equal under the same substitution. This condition is independent of whether the type family is associated or not, and it is not only a matter of consistency, but one of type safety.

Here are two examples to illustrate the condition under which overlap is permitted.
<haskell>
type instance F (a, Int) = [a]
type instance F (Int, b) = [b] -- overlap permitted

type instance G (a, Int) = [a]
type instance G (Char, a) = [a] -- ILLEGAL overlap, as [Char] /= [Int]
</haskell>
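The permitted case can be checked with a short sketch. The value names <hask>both</hask> and <hask>onlyFirst</hask> are hypothetical. The type <hask>(Int, Int)</hask> matches both equations of <hask>F</hask>, and both give the same result, <hask>[Int]</hask>.

```haskell
{-# LANGUAGE TypeFamilies #-}

type family F a
type instance F (a, Int) = [a]
type instance F (Int, b) = [b]  -- overlaps at (Int, Int), where both sides are [Int]

-- Matches both equations; either one reduces the type to [Int].
both :: F (Int, Int)
both = [1, 2, 3]

-- Matches only the first equation; the type reduces to [Bool].
onlyFirst :: F (Bool, Int)
onlyFirst = [True]
```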

==== Decidability ====

In order to guarantee that type inference in the presence of type families is decidable, we need to place a number of additional restrictions on the formation of type instance declarations (cf. Definition 5 (Relaxed Conditions) of [http://www.cse.unsw.edu.au/~chak/papers/SPCS08.html Type Checking with Open Type Functions]). Instance declarations have the general form
<haskell>
type instance F t1 .. tn = t
</haskell>
where we require that for every type family application <hask>(G s1 .. sm)</hask> in <hask>t</hask>,
# <hask>s1 .. sm</hask> do not contain any type family constructors,
# the total number of symbols (data type constructors and type variables) in <hask>s1 .. sm</hask> is strictly smaller than in <hask>t1 .. tn</hask>, and
# for every type variable <hask>a</hask>, <hask>a</hask> occurs in <hask>s1 .. sm</hask> at most as often as in <hask>t1 .. tn</hask>.
These restrictions are easily verified and ensure termination of type inference. However, they are not sufficient to guarantee completeness of type inference in the presence of so-called ''loopy equalities'', such as <hask>a ~ [F a]</hask>, where a recursive occurrence of a type variable is underneath a family application and data constructor application - see the above-mentioned paper for details.

If the option <tt>-XUndecidableInstances</tt> is passed to the compiler, the above restrictions are not enforced and it is up to the programmer to ensure termination of the normalisation of type families during type inference.

=== Equality constraints ===

Type contexts can include equality constraints of the form <hask>t1 ~ t2</hask>, which denote that the types <hask>t1</hask> and <hask>t2</hask> need to be the same. In the presence of type families, whether two types are equal cannot generally be decided locally. Hence, the contexts of function signatures may include equality constraints, as in the following example:
<haskell>
sumCollects :: (Collects c1, Collects c2, Elem c1 ~ Elem c2) => c1 -> c2 -> c2
</haskell>
where we require that the element types of <hask>c1</hask> and <hask>c2</hask> are the same. In general, the types <hask>t1</hask> and <hask>t2</hask> of an equality constraint may be arbitrary monotypes; i.e., they may not contain any quantifiers, independent of whether higher-rank types are otherwise enabled.

Equality constraints can also appear in class and instance contexts. The former enable a simple translation of programs using functional dependencies into programs using family synonyms instead. The general idea is to rewrite a class declaration of the form
<haskell>
class C a b | a -> b
</haskell>
to
<haskell>
class (F a ~ b) => C a b where
  type F a
</haskell>
That is, we represent every functional dependency (FD) <hask>a1 .. an -> b</hask> by an FD type family <hask>F a1 .. an</hask> and a superclass context equality <hask>F a1 .. an ~ b</hask>, essentially giving a name to the functional dependency. In class instances, we define the type instances of FD families in accordance with the class head. Method signatures are not affected by that process.
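A minimal sketch of the translation follows. To keep the example conservative it declares <hask>F</hask> as a standalone family instead of an associated one; the idea is the same. The method <hask>conv</hask> and the <hask>Int</hask>/<hask>Bool</hask> instance are hypothetical and serve only to give the class some content.

```haskell
{-# LANGUAGE TypeFamilies, MultiParamTypeClasses, FlexibleContexts #-}

-- Assumption for this sketch: F is a toplevel family that names the
-- dependency a -> b.
type family F a

-- The superclass equality plays the role of the functional dependency.
class (F a ~ b) => C a b where
  conv :: a -> b   -- hypothetical method

-- The type instance is defined in accordance with the class head.
type instance F Int = Bool

instance C Int Bool where
  conv n = n /= 0
```

An instance such as <hask>instance C Int Char</hask> would be rejected here, because <hask>F Int</hask> is <hask>Bool</hask>, which is not equal to <hask>Char</hask>. A functional dependency would rule out that instance in the same way.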

NB: Equalities in superclass contexts are not fully implemented in the GHC 6.10 branch.

== Frequently asked questions ==

=== Injectivity, type inference, and ambiguity ===

A common problem is this:
<haskell>
type family F a

f :: F a -> F a
f = undefined

g :: F Int -> F Int
g x = f x
</haskell>
The compiler complains about the definition of <tt>g</tt> saying
<haskell>
Couldn't match expected type `F Int' against inferred type `F a1'
</haskell>
In type-checking <tt>g</tt>'s right hand side GHC discovers (by instantiating <tt>f</tt>'s type with a fresh type variable) that it has type <tt>F a1 -> F a1</tt> for some as-yet-unknown type <tt>a1</tt>. Now it tries to make the inferred type match <tt>g</tt>'s type signature. Well, you say, just make <tt>a1</tt> equal to <tt>Int</tt> and you are done. True, but what if there were these instances
<haskell>
type instance F Int = Bool
type instance F Char = Bool
</haskell>
Then making <tt>a1</tt> equal to <tt>Char</tt> would ''also'' make the two types equal. Because there is more than one choice, the program is rejected.

Or, to put it another way, knowing that <tt>F t1</tt>=<tt>F t2</tt> does ''not'' imply that <tt>t1</tt> = <tt>t2</tt>.
The difficulty is that the type function <tt>F</tt> need not be ''injective''; it can map two distinct types to the same type. For an injective type constructor like <tt>Maybe</tt>, if we know that <tt>Maybe t1</tt> = <tt>Maybe t2</tt>, then we know that <tt>t1</tt> = <tt>t2</tt>. But not so for non-injective type functions.

The problem starts with <tt>f</tt>. Its type is ''ambiguous''; even if I know the argument and result types for <tt>f</tt>, I cannot use that to find the type at which <tt>a</tt> should be instantiated. (So arguably, <tt>f</tt> should be rejected as having an ambiguous type, and probably will be in future.) The situation is well known in type classes:
<haskell>
bad :: (Read a, Show a) => String -> String
bad x = show (read x)
</haskell>
At a call of <tt>bad</tt> one cannot tell at what type <tt>a</tt> should be instantiated.

The only solution is to avoid ambiguous types. In the type signature of a function,
* Ensure that every type variable occurs in the part after the "<tt>=></tt>"
* Ensure that every type variable appears at least once outside a type function call.
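
One common way to follow the second rule is to add an argument that mentions the type variable directly. The sketch below uses <hask>Proxy</hask> from <hask>Data.Proxy</hask> for that purpose; this is a technique brought in for illustration, not something the text above prescribes, and <hask>f</hask> is defined as the identity so that the example can be run.

<haskell>
{-# LANGUAGE TypeFamilies #-}

import Data.Proxy (Proxy (..))

type family F a
type instance F Int  = Bool
type instance F Char = Bool

-- The variable a now also occurs outside a call to F, so f is unambiguous.
f :: Proxy a -> F a -> F a
f _ x = x

-- The caller states explicitly at which type f is used.
g :: F Int -> F Int
g x = f (Proxy :: Proxy Int) x
</haskell>

Here <hask>g True</hask> type-checks because <hask>F Int</hask> reduces to <hask>Bool</hask>, and the proxy argument leaves no choice between <hask>Int</hask> and <hask>Char</hask>.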

Even then ambiguity is possible. For example:
<haskell>
f :: F a -> [a]
f = undefined

g :: F b -> Int
g x = length (f x)
</haskell>
Although <tt>f</tt>'s type is unambiguous, its result type is swallowed up by <tt>length</tt>, which now makes <tt>g</tt>'s type ambiguous.

The above ambiguity is caused by <tt>F</tt> being a type family, so it is possibly non-injective. However, data families create new types, so they are always injective and the following code works:

<haskell>data family F a

f :: F a -> F a
f = undefined

g :: F Int -> F Int
g x = f x</haskell>
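
To make this concrete, here is a sketch with two data instances. The constructor names <hask>FInt</hask> and <hask>FChar</hask> are invented for the example, and <hask>f</hask> is defined as <hask>id</hask> rather than <hask>undefined</hask> so that it can be evaluated.

<haskell>
{-# LANGUAGE TypeFamilies #-}

data family F a

-- Each data instance introduces a new type with its own constructor,
-- so F Int and F Char can never turn out to be the same type.
data instance F Int  = FInt Int   deriving (Eq, Show)
data instance F Char = FChar Char deriving (Eq, Show)

f :: F a -> F a
f = id

-- From F a1 ~ F Int the type checker may conclude a1 ~ Int.
g :: F Int -> F Int
g x = f x
</haskell>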

== References ==

* [http://www.cse.unsw.edu.au/~chak/papers/CKPM05.html Associated Types with Class.] Manuel M. T. Chakravarty, Gabriele Keller, Simon Peyton Jones, and Simon Marlow. In ''Proceedings of The 32nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'05)'', pages 1-13, ACM Press, 2005.
* [http://www.cse.unsw.edu.au/~chak/papers/CKP05.html Associated Type Synonyms.] Manuel M. T. Chakravarty, Gabriele Keller, and Simon Peyton Jones. In ''Proceedings of The Tenth ACM SIGPLAN International Conference on Functional Programming'', ACM Press, pages 241-253, 2005.
* [http://www.cse.unsw.edu.au/~chak/papers/SCPD07.html System F with Type Equality Coercions.] Martin Sulzmann, Manuel M. T. Chakravarty, Simon Peyton Jones, and Kevin Donnelly. In ''Proceedings of The Third ACM SIGPLAN Workshop on Types in Language Design and Implementation'', ACM Press, 2007.
* [http://www.cse.unsw.edu.au/~chak/papers/SPCS08.html Type Checking With Open Type Functions.] Tom Schrijvers, Simon Peyton-Jones, Manuel M. T. Chakravarty, Martin Sulzmann. In ''Proceedings of The 13th ACM SIGPLAN International Conference on Functional Programming'', ACM Press, pages 51-62, 2008.

[[Category:Type-level programming]]
[[Category:Language extension]]

Infix operator

2008-01-06T13:34:42Z

Blaisorblade: Correct a claim: backticks work even for functions with more than two arguments

[[Category:Syntax]] [[Category:Glossary]]
== Overview ==

Functions in Haskell are usually called using prefix notation: the function name followed by its arguments. However, some functions, like <tt>+</tt>, are called using infix notation: the function name is placed between its two arguments.

== Using infix functions with prefix notation ==

Putting parentheses around an infix operator converts it into a prefix function:

Prelude> (+) 1 2
3
Prelude> (*) 3 4
12
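
The parenthesised form is most useful when an operator has to be passed to another function. A small sketch (the names <hask>pairwiseSums</hask> and <hask>productTo5</hask> are made up for this example):

<haskell>
-- An operator in parentheses is an ordinary function value.
pairwiseSums :: [Int]
pairwiseSums = zipWith (+) [1, 2, 3] [10, 20, 30]   -- [11,22,33]

productTo5 :: Int
productTo5 = foldr (*) 1 [1 .. 5]                   -- 120
</haskell>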

== Using prefix functions with infix notation ==

Putting backticks (<tt>`</tt>) around a prefix function allows us to use it like an infix function:

Prelude> let concatPrint x y = putStrLn $ (++) x y
Prelude> concatPrint "a" "b"
ab
Prelude> "a" `concatPrint` "b"
ab

This reads most naturally with a function that takes two arguments. It also works for a function taking more than two arguments, but it is not nearly as nice (note the need for extra parentheses):

Prelude> foldl (+) 0 [1..5]
15
Prelude> ((+) `foldl` 0) [1..5]
15
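
Backticks are commonly seen with two-argument Prelude functions whose names read well between their arguments. A short sketch (the binding names are chosen only for this example):

<haskell>
quotient :: Int
quotient = 17 `div` 5                  -- same as: div 17 5

remainder :: Int
remainder = 17 `mod` 5                 -- same as: mod 17 5

hasThree :: Bool
hasThree = 3 `elem` [1, 2, 3 :: Int]   -- same as: elem 3 [1, 2, 3]
</haskell>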

== See also ==

* [[section of an infix operator]]
* [[use of infix operators]]

Learn Haskell in 10 minutes

2008-01-06T13:28:04Z

Blaisorblade: Typo fix: actully -> actually

== Overview ==

Haskell is a functional (that is, everything is done with function calls), statically, implicitly typed ([[type]]s are checked by the compiler, but you don't have to declare them), lazy (nothing is done until it needs to be) language. Its closest popular relative is probably the ML family of languages (which are not, however, lazy languages).

The most common Haskell compiler is [[GHC]]. You can download GHC from http://www.haskell.org/ghc/download_ghc_661.html . GHC binaries are available for [[GNU/Linux]], [[BSD | FreeBSD]], [[Mac OS X |MacOS]], [[Windows]], and [[Solaris]]. Once you've installed [[GHC]], you get two programs you're interested in right now: <tt>ghc</tt>, and <tt>[[GHC/GHCi | ghci]]</tt>. The first compiles Haskell libraries or applications to binary code. The second is an interpreter that lets you write Haskell code and get feedback right away.

== Simple expressions ==

You can type most math expressions directly into <tt>ghci</tt> and get an answer. <tt>Prelude></tt> is the default GHCi prompt.

Prelude> <hask>3 * 5</hask>
15
Prelude> <hask>4 ^ 2 - 1</hask>
15
Prelude> <hask>(1 - 5)^(3 * 2 - 4)</hask>
16

Strings are in "double quotes." You can concatenate them with <hask>++</hask>.

Prelude> <hask>"Hello"</hask>
"Hello"
Prelude> <hask>"Hello" ++ ", Haskell"</hask>
"Hello, Haskell"

Calling [[function]]s is done by putting the arguments directly after the function. There are no parentheses as part of the function call:

Prelude> <hask>succ 5</hask>
6
Prelude> <hask>truncate 6.59</hask>
6
Prelude> <hask>round 6.59</hask>
7
Prelude> <hask>sqrt 2</hask>
1.4142135623730951
Prelude> <hask>not (5 < 3)</hask>
True
Prelude> <hask>gcd 21 14</hask>
7

== The console ==

[[Introduction to IO |I/O actions]] can be used to read from and write to the console. Some common ones include:

Prelude> <hask>putStrLn "Hello, Haskell"</hask>
Hello, Haskell
Prelude> <hask>putStr "No newline"</hask>
No newlinePrelude> <hask>print (5 + 4)</hask>
9
Prelude> <hask>print (1 < 2)</hask>
True

The <hask>putStr</hask> and <hask>putStrLn</hask> functions output strings to the terminal. The <hask>print</hask> function outputs any type of value. (If you <hask>print</hask> a string, it will have quotes around it.)

If you need multiple I/O actions in one expression, you can use a <hask>do</hask> block. Actions are separated by semicolons.

Prelude> <hask>do { putStr "2 + 2 = " ; print (2 + 2) }</hask>
2 + 2 = 4
Prelude> <hask>do { putStrLn "ABCDE" ; putStrLn "12345" }</hask>
ABCDE
12345

Reading can be done with <hask>getLine</hask> (which gives back a <hask>String</hask>) or <hask>readLn</hask> (which gives back whatever type of value you want). The <hask> <- </hask> symbol is used to assign a name to the result of an I/O action.

Prelude> <hask>do { n <- readLn ; print (n^2) }</hask>
4
16

(The 4 was input. The 16 was a result.)

There is actually another way to write <hask>do</hask> blocks. If you leave off the braces and semicolons, then indentation becomes significant. This doesn't work so well in <tt>ghci</tt>, but try putting the code in a source file (say, <tt>Test.hs</tt>) and building it.

<haskell>
main = do putStrLn "What is 2 + 2?"
          x <- readLn
          if x == 4
              then putStrLn "You're right!"
              else putStrLn "You're wrong!"
</haskell>

You can build with <tt>ghc --make Test.hs</tt>, and the result will be called <tt>Test</tt>. (On [[Windows]], <tt>Test.exe</tt>) You get an <hask>if</hask> expression as a bonus.

The first non-space character after <hask>do</hask> is special. In this case, it's the <tt>p</tt> from <hask>putStrLn</hask>. Every line that starts in the same column as that <hask>p</hask> is another statement in the <hask>do</hask> block. If you indent more, it's part of the previous statement. If you indent less, it ends the <hask>do</hask> block. This is called "layout", and Haskell uses it to avoid making you put in statement terminators and braces all the time. (The <hask>then</hask> and <hask>else</hask> phrases have to be indented for this reason: if they started in the same column, they'd be separate statements, which is wrong.)
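
To see what layout saves you, here is the same small <hask>do</hask> block written both ways. The names <hask>greetLayout</hask> and <hask>greetBraces</hask> are invented for this sketch.

<haskell>
-- Layout version: each statement starts in the same column as the
-- first token after `do`.
greetLayout :: IO ()
greetLayout = do putStrLn "Hello"
                 putStrLn "World"

-- Explicit version: braces and semicolons mark the statements, so the
-- alignment of the second statement no longer matters.
greetBraces :: IO ()
greetBraces = do { putStrLn "Hello"
  ; putStrLn "World" }
</haskell>

Both definitions run the same two actions in the same order.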

(Note: Do '''not''' indent with tabs if you're using layout. It technically still works if your tabs are 8 spaces, but it's a bad idea. Also, don't use proportional fonts -- which apparently some people do, even when programming!)

== Simple types ==

So far, not a single [[type]] declaration has been mentioned. That's because Haskell does type inference. You generally don't have to declare types unless you want to. If you do want to declare types, you use <hask>::</hask> to do it.

Prelude> <hask>5 :: Int</hask>
5
Prelude> <hask>5 :: Double</hask>
5.0

[[Type]]s (and type [[class]]es, discussed later) always start with upper-case letters in Haskell. Variables always start with lower-case letters. This is a rule of the language, not a [[Studly capitals|naming convention]].

You can also ask <tt>ghci</tt> what type it has chosen for something. This is useful because you don't generally have to declare your types.

Prelude> :t <hask>True</hask>
<hask>True :: Bool</hask>
Prelude> :t <hask>'X'</hask>
<hask>'X' :: Char</hask>
Prelude> :t <hask>"Hello, Haskell"</hask>
<hask>"Hello, Haskell" :: [Char]</hask>

(In case you noticed, <hask>[Char]</hask> is another way of saying <hask>String</hask>. See the [[#Structured data|section on lists]] later.)

Things get more interesting for numbers.

Prelude> :t <hask>42</hask>
<hask>42 :: (Num t) => t</hask>
Prelude> :t <hask>42.0</hask>
<hask>42.0 :: (Fractional t) => t</hask>
Prelude> :t <hask>gcd 15 20</hask>
<hask>gcd 15 20 :: (Integral t) => t</hask>

These types use "type classes." They mean:

* <hask>42</hask> can be used as any numeric type. (This is why I was able to declare <hask>5</hask> as either an <hask>Int</hask> or a <hask>Double</hask> earlier.)
* <hask>42.0</hask> can be any fractional type, but not an integral type.
* <hask>gcd 15 20</hask> (which is a function call, incidentally) can be any integral type, but not a fractional type.

There are five numeric types in the Haskell "prelude" (the part of the library you get without having to import anything):

* <hask>Int</hask> is an integer with at least 30 bits of precision.
* <hask>Integer</hask> is an integer with unlimited precision.
* <hask>Float</hask> is a single precision floating point number.
* <hask>Double</hask> is a double precision floating point number.
* <hask>Rational</hask> is a fraction type, with no rounding error.

All five are '''instances''' of the <hask>Num</hask> type class. The first two are '''instances''' of <hask>Integral</hask>, and the last three are '''instances''' of <hask>Fractional</hask>.
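
As a small illustration, the same kind of literal can be given each of these types with a declaration. The names below are made up for the example.

<haskell>
asInt :: Int
asInt = 42

asDouble :: Double
asDouble = 42

-- Division is available because Rational is a Fractional type;
-- the result is stored exactly as the fraction 21/4.
asRational :: Rational
asRational = 42 / 8

-- Integer has unlimited precision, so this does not overflow.
big :: Integer
big = 2 ^ 100
</haskell>

Writing <hask>42 / 8</hask> at type <hask>Int</hask> instead would be rejected, because <hask>Int</hask> is not an instance of <hask>Fractional</hask>.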

Putting it all together,

Prelude> <hask>gcd 42 35 :: Int</hask>
7
Prelude> <hask>gcd 42 35 :: Double</hask>

<interactive>:1:0:
No instance for (Integral Double)

The final type worth mentioning here is <hask>()</hask>, pronounced "unit." It only has one value, also written as <hask>()</hask> and pronounced "unit."

Prelude> <hask>()</hask>
<hask>()</hask>
Prelude> :t <hask>()</hask>
<hask>() :: ()</hask>

You can think of this as similar to the <tt>void</tt> keyword in C family languages. You can return <hask>()</hask> from an I/O action if you don't want to return anything.

== Structured data ==

Basic data types can be easily combined in two ways: lists, which go in [square brackets], and tuples, which go in (parentheses).

Lists are used to hold multiple values of the same type.

Prelude> <hask>[1, 2, 3]</hask>
[1,2,3]
Prelude> <hask>[1 .. 5]</hask>
[1,2,3,4,5]
Prelude> <hask>[1, 3 .. 10]</hask>
[1,3,5,7,9]
Prelude> <hask>[True, False, True]</hask>
[True,False,True]

Strings are just lists of characters.

Prelude> <hask>['H', 'e', 'l', 'l', 'o']</hask>
"Hello"

The <hask>:</hask> operator adds an item to the beginning of a list. (It is Haskell's version of the <tt>cons</tt> function in the Lisp family of languages.)

Prelude> <hask>'C' : ['H', 'e', 'l', 'l', 'o']</hask>
"CHello"

Tuples hold a fixed number of values, which can have different types.

Prelude> <hask>(1, True)</hask>
(1,True)
Prelude> <hask>zip [1 .. 5] ['a' .. 'e']</hask>
[(1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e')]

The last example used <hask>zip</hask>, a library function that turns two lists into a list of tuples.

The types are probably what you'd expect.

Prelude> :t <hask>['a' .. 'c']</hask>
<hask>['a' .. 'c'] :: [Char]</hask>
Prelude> :t <hask>[('x', True), ('y', False)]</hask>
<hask>[('x', True), ('y', False)] :: [(Char, Bool)]</hask>

Lists are used a lot in Haskell. There are several functions that do nice things with them.

Prelude> <hask>[1 .. 5]</hask>
<hask>[1,2,3,4,5]</hask>
Prelude> <hask>map (+ 2) [1 .. 5]</hask>
<hask>[3,4,5,6,7]</hask>
Prelude> <hask>filter (> 2) [1 .. 5]</hask>
<hask>[3,4,5]</hask>

There are two nice functions on ordered pairs (tuples of two elements):

Prelude> <hask>fst (1, 2)</hask>
<hask>1</hask>
Prelude> <hask>snd (1, 2)</hask>
<hask>2</hask>
Prelude> <hask>map fst [(1, 2), (3, 4), (5, 6)]</hask>
<hask>[1,3,5]</hask>
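
These functions combine well. The sketch below puts several of them together in a source file; the binding names are invented for the example, and <hask>even</hask> is a Prelude function not otherwise used in this tutorial.

<haskell>
-- Squares of the even numbers from 1 to 10.
evenSquares :: [Int]
evenSquares = map (^ 2) (filter even [1 .. 10])   -- [4,16,36,64,100]

-- Pair each word with a position, then take the pairs apart again.
numbered :: [(Int, String)]
numbered = zip [1 .. 3] ["alpha", "beta", "gamma"]

positions :: [Int]
positions = map fst numbered                      -- [1,2,3]

wordsOnly :: [String]
wordsOnly = map snd numbered                      -- ["alpha","beta","gamma"]
</haskell>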

Also see [[how to work on lists]]

== [[Function]] definitions ==

We wrote a definition of an [[Introduction to Haskell IO/Actions |IO action]] earlier, called <hask>main</hask>:

<haskell>
main = do putStrLn "What is 2 + 2?"
          x <- readLn
          if x == 4
              then putStrLn "You're right!"
              else putStrLn "You're wrong!"
</haskell>

Now, let's supplement it by actually writing a ''[[function]]'' definition and call it <hask>factorial</hask>. I'm also adding a module header, which is good form.

<haskell>
module Main where

factorial n = if n == 0 then 1 else n * factorial (n - 1)

main = do putStrLn "What is 5! ?"
          x <- readLn
          if x == factorial 5
              then putStrLn "You're right!"
              else putStrLn "You're wrong!"
</haskell>

Build again with <tt>ghc --make Test.hs</tt>. And,

$ ./Test
What is 5! ?
120
You're right!

There's a function. Just like the built-in functions, it can be called as <hask>factorial 5</hask> without needing parentheses.

Now ask <tt>ghci</tt> for the [[type]].

$ ghci Test.hs
<< GHCi banner >>
Ok, modules loaded: Main.
Prelude Main> :t <hask>factorial</hask>
<hask>factorial :: (Num a) => a -> a</hask>

Function types are written with the argument type, then <hask> -> </hask>, then the result type. (This also has the type class <hask>Num</hask>.)

Factorial can be simplified by writing it with case analysis.

<haskell>
factorial 0 = 1
factorial n = n * factorial (n - 1)
</haskell>
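
The same style of case analysis works on lists, using the <hask>:</hask> operator from the section on structured data as a pattern. The function <hask>sumList</hask> is a made-up example, and the explicit <hask>Integer</hask> signatures are a choice made for this sketch so that large results stay exact.

<haskell>
-- A list is either empty or an item followed by the rest of the list.
sumList :: [Integer] -> Integer
sumList []       = 0
sumList (x : xs) = x + sumList xs

-- factorial again, this time with a declared type.
factorial :: Integer -> Integer
factorial 0 = 1
factorial n = n * factorial (n - 1)
</haskell>

The equations are tried from top to bottom, so the special case has to come before the general one.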

== Convenient syntax ==

A couple of extra pieces of [[:Category:Syntax |syntax]] are helpful.

<haskell>
secsToWeeks secs = let perMinute = 60
                       perHour   = 60 * perMinute
                       perDay    = 24 * perHour
                       perWeek   =  7 * perDay
                   in  secs / perWeek
</haskell>

The <hask>let</hask> expression defines temporary names. (This is using layout again. You could use {braces}, and separate the names with semicolons, if you prefer.)

<haskell>
classify age = case age of 0 -> "newborn"
                           1 -> "infant"
                           2 -> "toddler"
                           _ -> "senior citizen"
</haskell>

The <hask>case</hask> expression does a multi-way branch. The special label <hask>_</hask> means "anything else".
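
Since <hask>let</hask> and <hask>case</hask> are both expressions, they can be nested. A small sketch, with a hypothetical function <hask>lifeStage</hask> and arbitrary labels:

<haskell>
-- Name an intermediate value with let, then branch on it with case.
lifeStage :: Int -> String
lifeStage age =
  let decade = div age 10
  in  case decade of
        0 -> "child"
        1 -> "teenager"
        _ -> "adult"
</haskell>

For example, <hask>lifeStage 15</hask> computes <hask>decade</hask> as <hask>1</hask> and so picks the second branch.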

== Using libraries ==

Everything used so far in this tutorial is part of the [[Prelude]], which is the set of Haskell functions that are always there in any program.

The best road from here to becoming a very productive Haskell programmer (aside from practice!) is becoming familiar with other [[Applications and libraries | libraries]] that do the things you need. Documentation on the standard libraries is at [http://haskell.org/ghc/docs/latest/html/libraries/ http://haskell.org/ghc/docs/latest/html/libraries/]. There are modules there with:

* [[Applications and libraries/Data structures |Useful data structures]]
* [[Applications and libraries/Concurrency and parallelism |Concurrent and parallel programming]]
* [[Applications and libraries/GUI libraries | Graphics and GUI libraries]]
* [[Applications and libraries/Network | Networking, POSIX, and other system-level stuff]]
* Two test frameworks, QuickCheck and HUnit
* Regular expressions and predictive parsers
* More...

<haskell>
module Main where

import qualified Data.Map as M

errorsPerLine = M.fromList
    [ ("Chris", 472), ("Don", 100), ("Simon", -5) ]

main = do putStrLn "Who are you?"
          name <- getLine
          case M.lookup name errorsPerLine of
              Nothing -> putStrLn "I don't know you"
              Just n  -> do putStr "Errors per line: "
                            print n
</haskell>

The <hask>import</hask> says to use code from <hask>Data.Map</hask> and that it will be prefixed by <hask>M</hask>. (That's necessary because some of the functions have the same names as functions from the prelude. Most libraries don't need the <hask>as</hask> part.)
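
A few more <hask>Data.Map</hask> functions, as a sketch that reuses the map from above. The entry <hask>"Alice"</hask> and the names <hask>withNew</hask> and <hask>errorsFor</hask> are invented for this example.

<haskell>
import qualified Data.Map as M

errorsPerLine :: M.Map String Int
errorsPerLine = M.fromList
    [ ("Chris", 472), ("Don", 100), ("Simon", -5) ]

-- insert returns a new map; errorsPerLine itself is left unchanged.
withNew :: M.Map String Int
withNew = M.insert "Alice" 0 errorsPerLine

-- findWithDefault avoids the Nothing/Just case analysis when a
-- fallback value is good enough.
errorsFor :: String -> Int
errorsFor name = M.findWithDefault 0 name withNew
</haskell>

With these definitions, <hask>errorsFor "Don"</hask> gives <hask>100</hask>, while a name that is not in the map gives the default <hask>0</hask>.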

If you want something that's not in the standard library, try looking at http://hackage.haskell.org/packages/hackage.html or this wiki's [[applications and libraries]] page. This is a collection of many different libraries written by a lot of people for Haskell. Once you've got a library, extract it and switch into that directory and do this:

runhaskell Setup configure
runhaskell Setup build
runhaskell Setup install

On a UNIX system, you may need to be root for that last part.

== Topics that don't fit in 10 minute limit ==

* [[:Category:Language | Advanced data types]]
** Arithmetic lists
** [[List comprehension]]s
** [[Type#Type and newtype | Type synonyms]]
** [[Type|data vs newtype]] (and [[Newtype|here]])
** [[Class |Type classes and instances]]
* [[:Category:Syntax |Advanced syntax]]
** [[Operator]]s
** [[Infix operator |(+) and `foo`]]
** [[Fixity declaration]]s
* Advanced functions
** [[Currying]]
** [[Lambda abstraction]]s
** [[Section of an infix operator |Sections]]
* [[:Category:Monad |Monads]]
* [[Tutorials/Programming Haskell/String IO |File I/O]]
** Reading files
** Writing Files

[[Category:Tutorials]]
Languages: [[Learn Haskell in 10 minutes|en]] [[Cn/十分钟学会 Haskell|zh/cn]]

Learn Haskell in 10 minutes

2008-01-06T13:24:21Z

Blaisorblade: Mention the important difference between Haskell and *ML, i.e. that ML is not lazy

== Overview ==

Haskell is a functional (that is, everything is done with function calls), statically, implicitly typed ([[type]]s are checked by the compiler, but you don't have to declare them), lazy (nothing is done until it needs to be) language. Its closest popular relative is probably the ML family of languages (which are not, however, lazy languages).

The most common Haskell compiler is [[GHC]]. You can download GHC from http://www.haskell.org/ghc/download_ghc_661.html . GHC binaries are available for [[GNU/Linux]], [[BSD | FreeBSD]], [[Mac OS X |MacOS]], [[Windows]], and [[Solaris]]. Once you've installed [[GHC]], you get two programs you're interested in right now: <tt>ghc</tt>, and <tt>[[GHC/GHCi | ghci]]</tt>. The first compiles Haskell libraries or applications to binary code. The second is an interpreter that lets you write Haskell code and get feedback right away.

== Simple expressions ==

You can type most math expressions directly into <tt>ghci</tt> and get an answer. <tt>Prelude></tt> is the default GHCi prompt.

Prelude> <hask>3 * 5</hask>
15
Prelude> <hask>4 ^ 2 - 1</hask>
15
Prelude> <hask>(1 - 5)^(3 * 2 - 4)</hask>
16

Strings are in "double quotes." You can concatenate them with <hask>++</hask>.

Prelude> <hask>"Hello"</hask>
"Hello"
Prelude> <hask>"Hello" ++ ", Haskell"</hask>
"Hello, Haskell"

Calling [[function]]s is done by putting the arguments directly after the function. There are no parentheses as part of the function call:

Prelude> <hask>succ 5</hask>
6
Prelude> <hask>truncate 6.59</hask>
6
Prelude> <hask>round 6.59</hask>
7
Prelude> <hask>sqrt 2</hask>
1.4142135623730951
Prelude> <hask>not (5 < 3)</hask>
True
Prelude> <hask>gcd 21 14</hask>
7

== The console ==

[[Introduction to IO |I/O actions]] can be used to read from and write to the console. Some common ones include:

Prelude> <hask>putStrLn "Hello, Haskell"</hask>
Hello, Haskell
Prelude> <hask>putStr "No newline"</hask>
No newlinePrelude> <hask>print (5 + 4)</hask>
9
Prelude> <hask>print (1 < 2)</hask>
True

The <hask>putStr</hask> and <hask>putStrLn</hask> functions output strings to the terminal. The <hask>print</hask> function outputs any type of value. (If you <hask>print</hask> a string, it will have quotes around it.)

If you need multiple I/O actions in one expression, you can use a <hask>do</hask> block. Actions are separated by semicolons.

Prelude> <hask>do { putStr "2 + 2 = " ; print (2 + 2) }</hask>
2 + 2 = 4
Prelude> <hask>do { putStrLn "ABCDE" ; putStrLn "12345" }</hask>
ABCDE
12345

Reading can be done with <hask>getLine</hask> (which gives back a <hask>String</hask>) or <hask>readLn</hask> (which gives back whatever type of value you want). The <hask> <- </hask> symbol is used to assign a name to the result of an I/O action.

Prelude> <hask>do { n <- readLn ; print (n^2) }</hask>
4
16

(The 4 was input. The 16 was a result.)

There is actually another way to write <hask>do</hask> blocks. If you leave off the braces and semicolons, then indentation becomes significant. This doesn't work so well in <tt>ghci</tt>, but try putting the code in a source file (say, <tt>Test.hs</tt>) and building it.

<haskell>
main = do putStrLn "What is 2 + 2?"
          x <- readLn
          if x == 4
              then putStrLn "You're right!"
              else putStrLn "You're wrong!"
</haskell>

You can build with <tt>ghc --make Test.hs</tt>, and the result will be called <tt>Test</tt>. (On [[Windows]], <tt>Test.exe</tt>) You get an <hask>if</hask> expression as a bonus.

The first non-space character after <hask>do</hask> is special. In this case, it's the <tt>p</tt> from <hask>putStrLn</hask>. Every line that starts in the same column as that <hask>p</hask> is another statement in the <hask>do</hask> block. If you indent more, it's part of the previous statement. If you indent less, it ends the <hask>do</hask> block. This is called "layout", and Haskell uses it to avoid making you put in statement terminators and braces all the time. (The <hask>then</hask> and <hask>else</hask> phrases have to be indented for this reason: if they started in the same column, they'd be separate statements, which is wrong.)

(Note: Do '''not''' indent with tabs if you're using layout. It technically still works if your tabs are 8 spaces, but it's a bad idea. Also, don't use proportional fonts -- which apparently some people do, even when programming!)

== Simple types ==

So far, not a single [[type]] declaration has been mentioned. That's because Haskell does type inference. You generally don't have to declare types unless you want to. If you do want to declare types, you use <hask>::</hask> to do it.

Prelude> <hask>5 :: Int</hask>
5
Prelude> <hask>5 :: Double</hask>
5.0

[[Type]]s (and type [[class]]es, discussed later) always start with upper-case letters in Haskell. Variables always start with lower-case letters. This is a rule of the language, not a [[Studly capitals|naming convention]].

You can also ask <tt>ghci</tt> what type it has chosen for something. This is useful because you don't generally have to declare your types.

Prelude> :t <hask>True</hask>
<hask>True :: Bool</hask>
Prelude> :t <hask>'X'</hask>
<hask>'X' :: Char</hask>
Prelude> :t <hask>"Hello, Haskell"</hask>
<hask>"Hello, Haskell" :: [Char]</hask>

(In case you noticed, <hask>[Char]</hask> is another way of saying <hask>String</hask>. See the [[#Structured data|section on lists]] later.)

Things get more interesting for numbers.

Prelude> :t <hask>42</hask>
<hask>42 :: (Num t) => t</hask>
Prelude> :t <hask>42.0</hask>
<hask>42.0 :: (Fractional t) => t</hask>
Prelude> :t <hask>gcd 15 20</hask>
<hask>gcd 15 20 :: (Integral t) => t</hask>

These types use "type classes." They mean:

* <hask>42</hask> can be used as any numeric type. (This is why I was able to declare <hask>5</hask> as either an <hask>Int</hask> or a <hask>Double</hask> earlier.)
* <hask>42.0</hask> can be any fractional type, but not an integral type.
* <hask>gcd 15 20</hask> (which is a function call, incidentally) can be any integral type, but not a fractional type.

There are five numeric types in the Haskell "prelude" (the part of the library you get without having to import anything):

* <hask>Int</hask> is an integer with at least 30 bits of precision.
* <hask>Integer</hask> is an integer with unlimited precision.
* <hask>Float</hask> is a single precision floating point number.
* <hask>Double</hask> is a double precision floating point number.
* <hask>Rational</hask> is a fraction type, with no rounding error.

All five are '''instances''' of the <hask>Num</hask> type class. The first two are '''instances''' of <hask>Integral</hask>, and the last three are '''instances''' of <hask>Fractional</hask>.

Putting it all together,

Prelude> <hask>gcd 42 35 :: Int</hask>
7
Prelude> <hask>gcd 42 35 :: Double</hask>

<interactive>:1:0:
No instance for (Integral Double)

The final type worth mentioning here is <hask>()</hask>, pronounced "unit." It only has one value, also written as <hask>()</hask> and pronounced "unit."

Prelude> <hask>()</hask>
<hask>()</hask>
Prelude> :t <hask>()</hask>
<hask>() :: ()</hask>

You can think of this as similar to the <tt>void</tt> keyword in C family languages. You can return <hask>()</hask> from an I/O action if you don't want to return anything.

== Structured data ==

Basic data types can be easily combined in two ways: lists, which go in [square brackets], and tuples, which go in (parentheses).

Lists are used to hold multiple values of the same type.

Prelude> <hask>[1, 2, 3]</hask>
[1,2,3]
Prelude> <hask>[1 .. 5]</hask>
[1,2,3,4,5]
Prelude> <hask>[1, 3 .. 10]</hask>
[1,3,5,7,9]
Prelude> <hask>[True, False, True]</hask>
[True,False,True]

Strings are just lists of characters.

Prelude> <hask>['H', 'e', 'l', 'l', 'o']</hask>
"Hello"

The <hask>:</hask> operator adds an item to the beginning of a list. (It is Haskell's version of the <tt>cons</tt> function in the Lisp family of languages.)

Prelude> <hask>'C' : ['H', 'e', 'l', 'l', 'o']</hask>
"CHello"

Tuples hold a fixed number of values, which can have different types.

Prelude> <hask>(1, True)</hask>
(1,True)
Prelude> <hask>zip [1 .. 5] ['a' .. 'e']</hask>
[(1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e')]

The last example used <hask>zip</hask>, a library function that turns two lists into a list of tuples.

The types are probably what you'd expect.

Prelude> :t <hask>['a' .. 'c']</hask>
<hask>['a' .. 'c'] :: [Char]</hask>
Prelude> :t <hask>[('x', True), ('y', False)]</hask>
<hask>[('x', True), ('y', False)] :: [(Char, Bool)]</hask>

Lists are used a lot in Haskell. There are several functions that do nice things with them.

Prelude> <hask>[1 .. 5]</hask>
<hask>[1,2,3,4,5]</hask>
Prelude> <hask>map (+ 2) [1 .. 5]</hask>
<hask>[3,4,5,6,7]</hask>
Prelude> <hask>filter (> 2) [1 .. 5]</hask>
<hask>[3,4,5]</hask>

There are two nice functions on ordered pairs (tuples of two elements):

Prelude> <hask>fst (1, 2)</hask>
<hask>1</hask>
Prelude> <hask>snd (1, 2)</hask>
<hask>2</hask>
Prelude> <hask>map fst [(1, 2), (3, 4), (5, 6)]</hask>
<hask>[1,3,5]</hask>

Also see [[how to work on lists]]

== [[Function]] definitions ==

We wrote a definition of an [[Introduction to Haskell IO/Actions |IO action]] earlier, called <hask>main</hask>:

<haskell>
main = do putStrLn "What is 2 + 2?"
          x <- readLn
          if x == 4
              then putStrLn "You're right!"
              else putStrLn "You're wrong!"
</haskell>

Now, let's supplement it by actually writing a ''[[function]]'' definition and call it <hask>factorial</hask>. I'm also adding a module header, which is good form.

<haskell>
module Main where

factorial n = if n == 0 then 1 else n * factorial (n - 1)

main = do putStrLn "What is 5! ?"
          x <- readLn
          if x == factorial 5
              then putStrLn "You're right!"
              else putStrLn "You're wrong!"
</haskell>

Build again with <tt>ghc --make Test.hs</tt>. And,

$ ./Test
What is 5! ?
120
You're right!

There's a function. Just like the built-in functions, it can be called as <hask>factorial 5</hask> without needing parentheses.

Now ask <tt>ghci</tt> for the [[type]].

$ ghci Test.hs
<< GHCi banner >>
Ok, modules loaded: Main.
Prelude Main> :t <hask>factorial</hask>
<hask>factorial :: (Num a) => a -> a</hask>

Function types are written with the argument type, then <hask> -> </hask>, then the result type. (This also has the type class <hask>Num</hask>.)

Factorial can be simplified by writing it with case analysis.

<haskell>
factorial 0 = 1
factorial n = n * factorial (n - 1)
</haskell>

== Convenient syntax ==

A couple of extra pieces of [[:Category:Syntax |syntax]] are helpful.

<haskell>
secsToWeeks secs = let perMinute = 60
                       perHour   = 60 * perMinute
                       perDay    = 24 * perHour
                       perWeek   =  7 * perDay
                   in  secs / perWeek
</haskell>

The <hask>let</hask> expression defines temporary names. (This is using layout again. You could use {braces}, and separate the names with semicolons, if you prefer.)

<haskell>
classify age = case age of 0 -> "newborn"
                           1 -> "infant"
                           2 -> "toddler"
                           _ -> "senior citizen"
</haskell>

The <hask>case</hask> expression does a multi-way branch. The special label <hask>_</hask> means "anything else".

== Using libraries ==

Everything used so far in this tutorial is part of the [[Prelude]], which is the set of Haskell functions that are always there in any program.

The best road from here to becoming a very productive Haskell programmer (aside from practice!) is becoming familiar with other [[Applications and libraries | libraries]] that do the things you need. Documentation on the standard libraries is at [http://haskell.org/ghc/docs/latest/html/libraries/ http://haskell.org/ghc/docs/latest/html/libraries/]. There are modules there with:

* [[Applications and libraries/Data structures |Useful data structures]]
* [[Applications and libraries/Concurrency and parallelism |Concurrent and parallel programming]]
* [[Applications and libraries/GUI libraries | Graphics and GUI libraries]]
* [[Applications and libraries/Network | Networking, POSIX, and other system-level stuff]]
* Two test frameworks, QuickCheck and HUnit
* Regular expressions and predictive parsers
* More...

<haskell>
module Main where

import qualified Data.Map as M

errorsPerLine = M.fromList
    [ ("Chris", 472), ("Don", 100), ("Simon", -5) ]

main = do putStrLn "Who are you?"
          name <- getLine
          case M.lookup name errorsPerLine of
              Nothing -> putStrLn "I don't know you"
              Just n  -> do putStr "Errors per line: "
                            print n
</haskell>

The <hask>import</hask> says to use code from <hask>Data.Map</hask> and that it will be prefixed by <hask>M</hask>. (That's necessary because some of the functions have the same names as functions from the prelude. Most libraries don't need the <hask>as</hask> part.)

If you want something that's not in the standard library, try looking at http://hackage.haskell.org/packages/hackage.html or this wiki's [[applications and libraries]] page. This is a collection of many different libraries written by a lot of people for Haskell. Once you've got a library, extract it and switch into that directory and do this:

 runhaskell Setup configure
 runhaskell Setup build
 runhaskell Setup install

On a UNIX system, you may need to be root for that last part.

== Topics that don't fit in the 10-minute limit ==

* [[:Category:Language | Advanced data types]]
** Arithmetic lists
** [[List comprehension]]s
** [[Type#Type and newtype | Type synonyms]]
** [[Type|data vs newtype]] (and [[Newtype|here]])
** [[Class |Type classes and instances]]
* [[:Category:Syntax |Advanced syntax]]
** [[Operator]]s
** [[Infix operator |(+) and `foo`]]
** [[Fixity declaration]]s
* Advanced functions
** [[Currying]]
** [[Lambda abstraction]]s
** [[Section of an infix operator |Sections]]
* [[:Category:Monad |Monads]]
* [[Tutorials/Programming Haskell/String IO |File I/O]]
** Reading files
** Writing Files

[[Category:Tutorials]]
Languages: [[Learn Haskell in 10 minutes|en]] [[Cn/十分钟学会 Haskell|zh/cn]]

How to read Haskell

2008-01-06T13:20:48Z

Blaisorblade: Clearify "'" with a description (it can be hard to read in some fonts) + a spelling fix -> "arbitrary"

== Introduction ==

This tutorial is aimed at the non-Haskeller who probably doesn't care too much about trying to write code, but wants to understand it.
Our adopted format is a collection of tips and tricks broken down by category. It probably isn't very important what order you read it in, but it might be good to start with the general advice. Please feel encouraged to make any complaints about Haskell on the discussion page! It will help us to improve this tutorial.

Note: you should also consider having a look at [http://www.haskell.org/~pairwise/intro/intro.html Haskell for C Programmers]. It might be a good way to get over the culture shock.

----

== General advice ==

=== Tip: it's just very very concise ===

One thing that can make Haskell hard to read is that Haskell code is extremely succinct. One tiny little piece of code can say a lot, so many times, when you are faced with something you don't understand, the best thing you can do is to think about it for some time. It will usually make sense after a while. The good news is that because of this succinctness, Haskell functions tend to be very small, which means that when you're trying to understand a difficult piece of Haskell, you normally do not have to look very far. It's just two sides of the same coin:
* bad news: high density == spending more time per line of code
* good news: succinctness == fewer lines of code to spend time on

Spending this much time to understand one tiny line of code may be frustrating, but it's well worth the effort. The fact that a very small piece of code is hard to understand probably means that it's very abstract, and the fact that it is abstract probably means that it's going to be used in many places. So understanding that one tiny line of code, as painful as it may have been initially, can pay off in a big way.

=== Trick: use the haddock ===

When reading a long piece of Haskell code, one which is broken up into many modules, you should consider keeping a browser window open with the auto-generated API documentation on the side (if any).

----

== What does this function do? ==

=== Trick: use type signatures ===

When you see stuff like this
<haskell>
map :: (a -> b) -> [a] -> [b]
</haskell>
...don't fight it! These are type signatures and they are an incredibly useful way of getting a rough idea what a function is supposed to do.

For example, the function above takes any function of type <code>(a -> b)</code> and yields a function that takes a list of <code>a</code>'s and produces a list of <code>b</code>'s. So, if <code>sqrt</code> takes a number and returns the square root of that number, <code>map sqrt</code> takes a ''list'' of numbers and returns a ''list'' of their square roots.

As another example,
<haskell>
swap :: (a,b) -> (b,a)
</haskell>
This takes a tuple of type <code>(a,b)</code> and gives back a tuple of type <code>(b,a)</code>.
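
To see the two signatures in action, here is a small sketch. The definition of <code>swap</code> is written out locally so the example stands alone, and <code>roots</code> is a made-up name for <code>map sqrt</code> with the number type fixed to <code>Double</code>.

<haskell>
module Main where

-- Same type as in the text; defined here so the example is self-contained.
swap :: (a, b) -> (b, a)
swap (x, y) = (y, x)

-- 'map sqrt' turns a list of numbers into a list of their square roots.
roots :: [Double] -> [Double]
roots = map sqrt

main :: IO ()
main = do
  print (roots [1, 4, 9])
  print (swap (1 :: Int, "one"))
</haskell>

Note that the signature of <code>swap</code> alone already tells you almost everything: given an <code>(a,b)</code>, there is not much else it could return as a <code>(b,a)</code>.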

Here are some more things you might see in Haskell type signatures:

<haskell>
fn :: (b -> c) -> Foo -- fn is higher order; it takes a function from b to c as input
fn :: x -> IO Int     -- fn takes an x and returns an input/output action that yields an Int
fn :: x -> [y]        -- fn returns a list of ys
fn :: x -> (y,z)      -- fn returns a tuple of (y,z)
fn :: x               -- fn is just a value
</haskell>

=== Tip: Haskellers love pattern matching ===

<haskell>
head [x] = x
</haskell>
This says that if 'head' is followed by a list containing only 1 item, label that item as 'x', and then return 'x'. Another example might be
<haskell>
fst (x,y) = x

snd (x,y) = y
</haskell>
These functions fetch the '''f'''ir'''st''' and '''s'''eco'''nd''' items in a tuple, respectively. It should be fairly obvious how they work.
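
As a further sketch of the same idea, the definitions below match on the shape of their argument. The names <code>middle</code> and <code>describeList</code> are hypothetical, and <code>_</code> stands for a part of the pattern we do not care about.

<haskell>
module Main where

-- Hypothetical helper: pick the middle item of a triple.
middle :: (a, b, c) -> b
middle (_, y, _) = y

-- Each equation matches one shape of list; they are tried top to bottom.
describeList :: [a] -> String
describeList []      = "empty"
describeList [_]     = "exactly one item"
describeList (_ : _) = "two or more items"

main :: IO ()
main = do
  print (middle (1 :: Int, 'b', "three"))
  putStrLn (describeList "")
  putStrLn (describeList "a")
  putStrLn (describeList "abc")
</haskell>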

:''elaborate''

=== Tip: a function may be defined in more than one piece ===

Remember math class, where functions would be defined like abs(x) = x if x >= 0 or -x otherwise? It's a bit like that in Haskell too. Sometimes, rather than writing one big if-then-else, Haskellers find it more convenient to define a function separately for each case, such as...
<haskell>
abs x | x >= 0 = x
abs x = -x
</haskell>

What gets confusing is when you look at a definition like this...
<haskell>
foo x | blah =
  some enormously long thing

foo x =
  some other enormously long thing
</haskell>

Especially looking at the bottom bit, it's hard to remember that <code>foo</code> might have <em>another</em> definition lurking around. Luckily, you never have to look very far: the other piece is either immediately above or immediately below.

(Note: some programmers will perhaps write something like <code>foo x | otherwise = ...</code>. The <code>otherwise</code> is redundant (and equal to <code>True</code>), but useful as a reminder that this isn't the entire definition of <code>foo</code>.)
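
A minimal sketch of the two styles side by side; <code>absA</code> and <code>absB</code> are made-up names chosen to avoid clashing with the Prelude's <code>abs</code>.

<haskell>
module Main where

-- Two separate pieces, as in the text.
absA :: Int -> Int
absA x | x >= 0 = x
absA x = -x

-- The same function with an explicit 'otherwise' on the second piece.
absB :: Int -> Int
absB x | x >= 0 = x
absB x | otherwise = -x

main :: IO ()
main = do
  print (map absA [-2, 0, 3])
  print (map absB [-2, 0, 3])
</haskell>

Both functions behave identically; the <code>otherwise</code> only changes how the code reads.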

=== Tip: pattern matching and guards can be mixed and matched ===

:''elaborate''

<haskell>
combine ((f,a,b,r):(f',a',b',r'):ss)
  | f == f' = combine ((f,a.+a',b.+b',r+r'):ss)
combine ((f,a,b,r):ss) = (f,a,b,r) : combine ss
combine [] = []
</haskell>
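
The snippet above relies on an operator <code>.+</code> that is not shown, so here is a smaller runnable sketch of the same pattern-plus-guard shape. The function <code>mergeRuns</code> and its data are hypothetical: it merges neighbouring entries that share a key by adding their counts.

<haskell>
module Main where

-- The first piece matches a list with at least two entries, but only
-- applies when the guard holds. If the guard fails, matching falls
-- through to the next piece.
mergeRuns :: [(String, Int)] -> [(String, Int)]
mergeRuns ((k, n) : (k', n') : rest)
  | k == k' = mergeRuns ((k, n + n') : rest)
mergeRuns (entry : rest) = entry : mergeRuns rest
mergeRuns []             = []

main :: IO ()
main = print (mergeRuns [("a", 1), ("a", 2), ("b", 5), ("a", 1)])
</haskell>

The key point when reading such code is that a failed guard does not end the search: the later pieces of the function still get their turn.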

----

== What the heck is xyz? ==

One problem you might face when reading Haskell code is figuring out what some cryptic entity like <code>xyz</code> is.

=== Tip: the smaller the name, the smaller the scope ===

Do you hate the way Haskell code is littered with short, meaningless names like <code>x</code> and <code>xs</code>? When Haskell programmers use names like that, it's often for good reason.

First, typically, the short, "meaningless" names are contained within a very small space. Consider this typical (and inefficient!) implementation of a prime number generator:
<haskell>
primes :: [Integer]
primes = sieve [2..]
  where
    sieve (p:xs) = p : sieve [x | x <- xs, x `mod` p > 0]
</haskell>

The where block contains a function with strange variables like <code>x</code> and <code>xs</code> and <code>p</code>. In a more verbose language this could be difficult to read simply because it's difficult to actually find the definitions of small variables in long blocks of code. In C, for example, these would usually be defined at the top of a function which could have dozens (if not hundreds) of lines of code. Thus you might want to see <code>p</code> named as <code>known_prime</code> and <code>xs</code> named as <code>candidate_primes</code> or the like.

In this code, however, there is no such need. <code>p</code> is (implicitly) defined in the same line of code that uses it. <code>xs</code>, too, is defined there, as is <code>x</code>. Further, all three variables follow a popular naming convention, which appends 's' to the names of lists (or equivalents) and uses single letters for singular values. The only unusual part is the selection of the pattern <code>(p:xs)</code> in the arguments over the more common <code>(x:xs)</code>. Here the programmer is signalling (subtly) that this list head is somehow different from an ordinary list head. Quick inspection shows that <code>p</code> is guaranteed to be a prime number.

The reason code can be expressed this way in Haskell without undue confusion is its extreme conciseness. The habits you've had to learn to manage more verbose languages simply don't apply anymore. It takes some getting used to, but it becomes a joy once you reach that point.

This, however, is not the main reason for such "meaningless" names. The real reason is deeper. The Haskell language allows for unparalleled levels of abstraction through function composition and higher-order functions. Where in most imperative languages a "function" (or, more often, a procedure masquerading as a function) is a pretty low-level entity with very specific, tangible functionality, Haskell functions can be extremely abstract. Consider this canonical implementation of <code>foldl</code> from the Prelude:
<haskell>
foldl :: (a -> b -> a) -> a -> [b] -> a
foldl f z [] = z
foldl f z (x:xs) = foldl f (f z x) xs
</haskell>

This function is highly abstract. It is, in fact, an abstraction of iterating over a list and computing an aggregate result. What kind of list? Pretty much any kind. What kind of computation? Anything you'd care to name. What kind of result? Anything that matches the type of the priming value. What "meaningful" names can you apply to the variables here? Should it look something like this (elided and formatted to fit a reasonable screen)?
<haskell>
foldl binary_operation priming_value (list_head:list_body) =
  foldl
    binary_operation
    (binary_operation priming_value list_head)
    list_body
</haskell>

Knowing a few simple conventions of Haskell variable naming (functions progress, typically, as f, g, etc. for example) makes the first, terse version far more readable as an abstract definition than does the second, verbose version—once you get used to it.
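
To make the "any list, any computation" point concrete, here is a sketch that reuses the terse definition under the hypothetical name <code>foldLeft</code> (renamed only to avoid clashing with the Prelude) and applies it to three unrelated jobs.

<haskell>
module Main where

-- The terse definition from the text, under a different name.
foldLeft :: (a -> b -> a) -> a -> [b] -> a
foldLeft f z []     = z
foldLeft f z (x:xs) = foldLeft f (f z x) xs

main :: IO ()
main = do
  -- Summing numbers: the accumulator is a running total.
  print (foldLeft (+) 0 [1, 2, 3, 4 :: Int])
  -- Reversing a string: the accumulator is the reversed prefix.
  putStrLn (foldLeft (flip (:)) [] "abc")
  -- Counting items: each element is ignored, the accumulator is a counter.
  print (foldLeft (\n _ -> n + 1) (0 :: Int) "hello")
</haskell>

No single "meaningful" name for <code>f</code>, <code>z</code> or <code>x</code> would fit all three uses, which is why the short names are the honest ones.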

=== Tip: types, functions and values ===

Type variables in Haskell are typically named starting at <code>a</code>, <code>b</code>, etc. They are sometimes (but not often) decorated with numbers like <code>a1</code> or <code>b3</code>.

Functions used as higher-order arguments are typically named starting at <code>f</code>, <code>g</code>, etc. Like type variables, they will sometimes be decorated with numbers, and they may also be decorated with the <code>'</code> character, as in <code>g'</code>. You would read this latter example as "gee-prime"; it is typically a function that is in some way related to <code>g</code>, used as a helper or the like. Occasionally functions are given names that are not on this continuum as an aide-mémoire; for example, a function parameter used internally as a predicate may be given the name <code>p</code>.

Arguments to functions, or variables used exclusively inside short functions, are often given names starting at <code>x</code>, <code>y</code>, etc., again occasionally decorated by numbers. Other single-letter variable names may be chosen if they can act as a mnemonic for their role such as using a variable named <code>p</code> for a value known to be prime.

Note that these are guidelines and not rules. ''Any'' of them can and will be ignored, modified and/or abused in ''any'' given piece of Haskell code. (A quick look at the Standard Prelude as provided in the Haskell 98 Report should be convincing enough for this.)

=== Tip: the -s and m- habits ===

There is a variable name habit that sometimes comes with short names. Typically, if you have a thing you want to name <code>x</code>, you'll sometimes want to name lists of these <code>xs</code>. As in the plural of <code>x</code>. So if you see a name like <code>as</code> or <code>bs</code> or <code>foos</code>, it's often good to mentally read that as "aeyes" (the plural of a), "bees" (the plural of b), and "foohs" (the plural of foo). It might seem obvious to some, but it took me a while to stop asking myself in situations like this, "<code>as</code>? What the heck is aey-ess?"

Similarly, another habit you might see is people who begin variable names with m-. This is probably less common, but if you see a lot of m-, it might be because of the Maybe type. Sometimes we have <code>foo</code> of type <code>Whatever</code>, and <code>mfoo</code> of type <code>Maybe Whatever</code>. Relax, this isn't [http://en.wikipedia.org/wiki/Hungarian_notation Hungarian notation]. It's not something that's used systematically, or rigidly in any way.

Both of these conventions are helpful mainly when you have both variants floating around in the same place. When you have both a <code>Whatever</code> and a <code>[Whatever]</code> (that is, a list of <code>Whatever</code>), <code>x</code> and <code>xs</code> are a good way to indicate that they are the same kind of thing, except that one comes in a list. Likewise, when you have both a <code>Whatever</code> and a <code>Maybe Whatever</code> in the same function, <code>x</code> and <code>mx</code> do the same job.
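
A small sketch of both habits together; the helper names <code>safeHead</code> and <code>orZero</code> are made up for this example.

<haskell>
module Main where

-- 'x' is one Int; the list pattern splits off the first item.
safeHead :: [Int] -> Maybe Int
safeHead []      = Nothing
safeHead (x : _) = Just x

-- 'mx' is a Maybe Int; 'x' is the Int inside it, if there is one.
orZero :: Maybe Int -> Int
orZero mx =
  case mx of
    Nothing -> 0
    Just x  -> x

-- 'xs' is a list of Int, the plural of 'x'.
firstOrZero :: [Int] -> Int
firstOrZero xs = orZero (safeHead xs)

main :: IO ()
main = do
  print (safeHead [7, 8, 9])
  print (firstOrZero [])
</haskell>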

Finally, library functions are sometimes suffixed with "l", "r", "_", "M" or "'" (a single quote). What do these mean?
<haskell>
mapM   -- the 'map' function lifted into a monad. An 'M' suffix implies that the
       -- function is a monadic version of an equivalent pure function
mapM_  -- the '_' suffix indicates that the result of this computation is discarded,
       -- and () is returned (by analogy with the _ pattern)
foldl  -- a left-associative fold: elements are combined starting from the left
foldr  -- a right-associative fold: elements are combined starting from the right
foldl' -- a fold that is strict in its accumulator; "'" is often used to indicate
       -- a strict variant of a function
</haskell>
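
The following sketch exercises each of these suffixed functions once. It is only an illustration; the subtraction examples are chosen because subtraction makes the difference between left and right grouping visible.

<haskell>
module Main where

import Data.List (foldl')

main :: IO ()
main = do
  -- mapM runs an action per element and collects the results in a list.
  doubled <- mapM (\x -> return (x * 2)) [1, 2, 3 :: Int]
  print doubled
  -- mapM_ runs the actions for their effects only and returns ().
  mapM_ print [1, 2, 3 :: Int]
  -- foldl groups from the left: ((10 - 1) - 2) - 3
  print (foldl (-) 10 [1, 2, 3 :: Int])
  -- foldr groups from the right: 1 - (2 - (3 - 10))
  print (foldr (-) 10 [1, 2, 3 :: Int])
  -- foldl' gives the same result as foldl but forces the accumulator at each step.
  print (foldl' (+) 0 [1 .. 100000 :: Int])
</haskell>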

=== Tip: order mostly doesn't matter ===

It doesn't matter what order functions are defined in. This:
<haskell>
foo x y z = ...

bar a b = ... foo b ...
</haskell>
is exactly equivalent to this:
<haskell>
bar a b = ... foo b ...

foo x y z = ...
</haskell>
Functions further up can call functions that are defined lower down, and vice versa. Functions can be written in any order at all. It doesn't matter.
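
A complete, if contrived, sketch of this: <code>main</code> comes first and uses <code>double</code>, which in turn uses <code>twice</code>, and both are defined further down. The function names are hypothetical.

<haskell>
module Main where

-- 'main' uses 'double', which is only defined below.
main :: IO ()
main = print (double 21)

-- 'double' uses 'twice', which is defined below it in turn.
double :: Int -> Int
double n = twice (+ n) 0

-- Apply a function two times in a row.
twice :: (a -> a) -> a -> a
twice f = f . f
</haskell>

Rearranging the three definitions in any order gives the same program.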

:* ''scope in a nutshell''

=== Tip: order does matter for pieces of functions ===

Very important: whilst the order in which you define individual functions does not matter, what does matter is the order in which you define the individual pieces of a single function.

For example, these two versions of abs do NOT mean the same thing!

<haskell>
-- the right order
abs x | x >= 0 = x
abs x = -x

-- the wrong order
abs2 x = -x
abs2 x | x >= 0 = x
</haskell>
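
The pieces of a function are tried from top to bottom, and the first one that matches wins. The sketch below renames the two versions to <code>absRight</code> and <code>absWrong</code> (hypothetical names) so that both can live in one runnable file.

<haskell>
module Main where

-- The guarded piece is tried first; negative numbers fall through
-- to the second piece.
absRight :: Int -> Int
absRight x | x >= 0 = x
absRight x = -x

-- The unguarded piece matches every argument, so the second piece
-- is never reached. The compiler may warn that it is redundant.
absWrong :: Int -> Int
absWrong x = -x
absWrong x | x >= 0 = x

main :: IO ()
main = do
  print (map absRight [-3, 0, 4])
  print (map absWrong [-3, 0, 4])
</haskell>

Running this shows that <code>absWrong</code> simply negates its argument, whatever its sign.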

=== Trick: use grep ===

(This might seem really obvious, but it's sometimes easy to forget)

Or use the search feature of your favourite text editor. It's probably defined right there before your eyes, and if it's true to Haskell style, the definition is probably so small you blew right through it. In vi, for example, you could do <code>/= *xyz</code> which searches for =, an arbitrary number of spaces, and then xyz.

Barring that, <code>xyz</code> might be defined in some different module in the code you downloaded. You can look for telltale signs like
<haskell>
import Manamana (xyz)
</haskell>

But note that sometimes programmers get lazy, and they don't specify that <code>xyz</code> should be imported. They just let rip with
<haskell>
import Manamana
</haskell>

So solution number 3 would be to do something like
<code>
grep xyz *.lhs *.hs
</code>
(Note that literate programs sometimes use non-literate code, so search in both lhs AND hs)

A fourth idea, if you can't find something, is to look it up in [http://haskell.org/hoogle/ Hoogle]

A fifth idea, for Hugs/WinHugs users, is to use the ":find" command: ":find xyz" will open up your text editor with the appropriate module, jumped to the correct place. GHCi users can use ":i xyz" to find out where "xyz" is defined. (It won't open an editor, though.)

[[Category:Tutorials]]