https://wiki.haskell.org/api.php?action=feedcontributions&user=Anton-Latukha&feedformat=atomHaskellWiki - User contributions [en]2020-12-04T16:16:22ZUser contributionsMediaWiki 1.27.4https://wiki.haskell.org/index.php?title=Monads_as_containers&diff=63149Monads as containers2019-11-23T18:07:13Z<p>Anton-Latukha: upd references `return -> pure`, `map -> fmap`; add code tags</p>
<hr />
<div>''There now exists a translation of this article into [http://community.livejournal.com/ru_lambda/12467.html Russian]!''<br />
<br />
----<br />
<br />
A [[monad]] is a container type together with a few methods defined on it.<br />
Monads model different kinds of computations.<br />
<br />
Like Haskell lists, all the elements which a monadic container holds at any<br />
one time must be the same type (it is homogeneous).<br />
<br />
There are a few ways to choose the basic set of functions that one can perform<br />
on these containers to be able to define a monad. Haskell generally uses a<br />
pair of functions called <code>pure</code> (<code>return = pure</code>) and monadic<br />
bind <code>(>>=)</code>, but it is more natural sometimes to begin with <code>fmap</code>,<br />
<code>pure</code> and <code>join</code>, as these are simpler<br />
to understand at first. We can later define bind in terms of these.<br />
<br />
The first of these three, generally called ''map'', (but called <code>fmap</code> in<br />
Haskell 98) actually comes from the definition of a ''functor''. We can think<br />
of a functor as a type of container where we are permitted to apply a single<br />
function to every object in the container.<br />
<br />
That is, if <code style="color:darkgreen">f</code> is a functor, and we are given a function of type <code>(<span style="color:firebrick">a</span> -> <span style="color:mediumblue">b</span>)</code>,<br />
and a container of type <code>(<span style="color:darkgreen">f</span> <span style="color:firebrick">a</span>)</code>, we can get a new container of type <code>(<span style="color:darkgreen">f</span> <span style="color:mediumblue">b</span>)</code>.<br />
<br />
This is expressed in the type of <code>fmap</code>:<br />
<haskell><br />
fmap :: (Functor f) => (a -> b) -> f a -> f b<br />
</haskell><br />
''If you will give me a blueberry for each apple I give you'' <code>(<span style="color:firebrick">a</span> -> <span style="color:mediumblue">b</span>)</code>'', and I have a box of apples ''<code>(<span style="color:darkgreen">f</span> <span style="color:firebrick">a</span>)</code>'', then I can get a box of blueberries ''<code>(<span style="color:darkgreen">f</span> <span style="color:mediumblue">b</span>)</code>''.''<br />
<br />
Every monad is a functor.<br />
<br />
The second method, ''return'', is specific to monads. If <code style="color:indigo">m</code> is a monad, then<br />
return takes an element of type <code style="color:firebrick">a</code>, and gives a container of type <code>(<span style="color:indigo">m</span> <span style="color:firebrick">a</span>)</code><br />
with that element in it. So, its type in Haskell is<br />
<haskell><br />
return :: (Monad m) => a -> m a<br />
</haskell><br />
''If I have an apple'' <code>(<span style="color:firebrick">a</span>)</code> ''then I can put it in a box'' <code>(<span style="color:indigo">m</span> <span style="color:firebrick">a</span>)</code>''.''<br />
<br />
The third method, ''join'', also specific to monads, takes a container of<br />
containers <code><span style="color:indigo">m</span> (<span style="color:indigo">m</span> <span style="color:firebrick">a</span>)</code>, and combines them into one <code><span style="color:indigo">m</span> <span style="color:firebrick">a</span></code> in some sensible<br />
fashion. Its Haskell type is<br />
<haskell><br />
join :: (Monad m) => m (m a) -> m a<br />
</haskell><br />
''If I have a box of boxes of apples'' <code>(<span style="color:indigo">m</span> (<span style="color:indigo">m</span> <span style="color:firebrick">a</span>))</code> ''then I can take the apples from each, and put them in a new box'' <code>(<span style="color:indigo">m</span> <span style="color:firebrick">a</span>)</code>''.''<br />
<br />
From these, we can construct an important operation called ''bind'' or<br />
''extend'', which is commonly given the symbol <code>(>>=)</code>. When you define your<br />
own monad in Haskell, it will expect you to define just return and bind. It<br />
turns out that mapping and joining come for free from these two. Although only<br />
return and bind are needed to define a monad, it is usually simpler to think<br />
about map, return, and join first, and then get bind from these, as map and<br />
join are in general simpler than bind.<br />
<br />
What bind does is to take a container of type <code>(<span style="color:indigo">m</span> <span style="color:firebrick">a</span>)</code> and a function of type<br />
<code>(<span style="color:firebrick">a</span> -> <span style="color:indigo">m</span> <span style="color:mediumblue">b</span>)</code>. It first maps the function over the container, (which would give <br />
an <code><span style="color:indigo">m</span> (<span style="color:indigo">m</span> <span style="color:mediumblue">b</span>)</code>) and then applies join to the result to get a container of type<br />
<code>(<span style="color:indigo">m</span> <span style="color:mediumblue">b</span>)</code>. Its type and definition in Haskell is then<br />
<haskell><br />
(>>=) :: (Monad m) => m a -> (a -> m b) -> m b<br />
xs >>= f = join (fmap f xs) <br />
-- how we would get bind (>>=) in Haskell if it were join and fmap<br />
-- that were chosen to be primitive<br />
</haskell><br />
''If I have a box of apples'' <code>(<span style="color:indigo">m</span> <span style="color:firebrick">a</span>)</code> ''and for each apple, you will give me a box of blueberries'' <code>(<span style="color:firebrick">a</span> -> <span style="color:indigo">m</span> <span style="color:mediumblue">b</span>)</code> ''then I can get a box with all the blueberries together'' <code>(<span style="color:indigo">m</span> <span style="color:mediumblue">b</span>)</code>''.''<br />
<br />
Note that for a given container type, there might be more than one way to<br />
define these basic operations (though for obvious reasons, Haskell will only<br />
permit one instance of the Monad class per actual type). [Technical side note:<br />
The functions return and bind need to satisfy<br />
[[Monad_laws | a few laws]] in order to<br />
make a monad, but if you define them in a sensible way given what they are<br />
supposed to do, the laws will work out. The laws are only a formal way to<br />
give the informal description of the meanings of return and bind I have here.]<br />
<br />
It would be good to have a concrete example of a monad at this point, as these<br />
functions are not very useful if we cannot find any examples of a type to<br />
apply them to.<br />
<br />
Lists are most likely the simplest, most illustrative example. Here, <code>fmap</code><br />
is just the usual <code>map</code>, <code>return</code> is just <code>(\x -> [x])</code> and <code>join</code> is <code>concat</code>.<br />
<br />
<haskell><br />
instance Monad [] where<br />
--return :: a -> [a]<br />
return x = [x] -- make a list containing the one element given<br />
<br />
--(>>=) :: [a] -> (a -> [b]) -> [b]<br />
xs >>= f = concat (map f xs) <br />
-- collect up all the results of f (which are lists)<br />
-- and combine them into a new list<br />
</haskell><br />
<br />
The list monad, in some sense, models computations which could return any<br />
number of values. Bind pumps values in, and catches all the values output.<br />
Such computations are known in computer science as ''nondeterministic''.<br />
That is, a list [x,y,z] represents a value which is all of the values x, y,<br />
and z at once.<br />
<br />
A couple examples of using this definition of bind:<br />
<haskell><br />
[10,20,30] >>= \x -> [x, x+1] <br />
-- a function which takes a number and gives both it and its<br />
-- successor at once<br />
= [10,11,20,21,30,31]<br />
<br />
[10,20,30] >>= \x -> [x, x+1] >>= \y -> if y > 20 then [] else [y,y]<br />
= [10,10,11,11,20,20]<br />
</haskell><br />
<br />
And a simple fractal, exploiting the fact that lists are ordered:<br />
<haskell><br />
f x | x == '#' = "# #"<br />
| otherwise = " "<br />
<br />
"#" >>= f >>= f >>= f >>= f<br />
= "# # # # # # # # # # # # # # # #"<br />
</haskell><br />
<br />
You might notice a similarity here between bind and function application or<br />
composition, and this is no coincidence. The reason that bind is so important<br />
is that it serves to chain computations on monadic containers together.<br />
<br />
You might be interested in how, given just bind and return, we can get back<br />
to map and join.<br />
<br />
Mapping is equivalent to binding to a function which only returns containers<br />
with a single value in them -- the value that we get from the function of<br />
type <code>(a -> b)</code> which we are handed.<br />
<br />
The function that does this for any monad in Haskell is called <code>liftM</code> -- it<br />
can be written in terms of return and bind as follows:<br />
<haskell><br />
liftM :: (Monad m) => (a -> b) -> m a -> m b<br />
liftM f xs = xs >>= (return . f)<br />
-- take a container full of a's, to each, apply f,<br />
-- put the resulting value of type b in a new container,<br />
-- and then join all the containers together.<br />
</haskell><br />
<br />
Joining is equivalent to binding a container with the identity map. This is<br />
indeed still called <code>join</code> in Haskell:<br />
<haskell><br />
join :: (Monad m) => m (m a) -> m a<br />
join xss = xss >>= id<br />
</haskell><br />
<br />
It is common when constructing monadic computations that one ends up with a<br />
large chain of binds and lambdas. For this reason, some syntactic sugar called<br />
"do notation" was created to simplify this process, and at the same time, make<br />
the computations look somewhat like imperative programs.<br />
<br />
Note that in what follows, the syntax emphasizes the fact that the list monad<br />
models nondeterminism: the code <code>y <- xs</code> can be thought of as <code>y</code> taking on<br />
all the values in the list <code>xs</code> at once.<br />
<br />
The above (perhaps somewhat silly) list computations could be written:<br />
<haskell><br />
do x <- [10,20,30]<br />
[x, x+1]<br />
</haskell><br />
and,<br />
<haskell><br />
do x <- [10,20,30]<br />
y <- [x, x+1]<br />
if y > 20 then [] else [y,y]<br />
</haskell><br />
<br />
The code for liftM could be written:<br />
<haskell><br />
liftM f xs = do a <- xs<br />
return (f a)<br />
</haskell><br />
<br />
----<br />
<br />
If you understood the above, then you have a good start on understanding<br />
monads in Haskell. Check out Maybe (containers with at most one thing in them,<br />
modelling computations that might not return a value) and State (modelling<br />
computations that carry around a state parameter using an odd sort of<br />
container described below), and you'll start getting the picture as to how<br />
things work.<br />
<br />
A good exercise is to figure out a definition of bind and return (or fmap,<br />
join and return) which make the following tree type a monad. Just keep in<br />
mind what they are supposed to do.<br />
<haskell><br />
data Tree a = Leaf a | Branch (Tree a) (Tree a)<br />
</haskell><br />
<br />
For more good examples of monads, and lots of explanation see<br />
[http://www.haskell.org/haskellwiki/All_About_Monads All About Monads] which has a catalogue of the commonest<br />
ones, more explanation as to why you might be interested in monads, and<br />
information about how they work.<br />
<br />
----<br />
<br />
The question that many people will be asking at this point is "What does this<br />
all have to do with IO?".<br />
<br />
Well, in Haskell, IO is a monad. How does this mesh with the notion of a<br />
container?<br />
<br />
Consider <code>getChar :: IO Char</code> -- that is, an IO container with a character<br />
value in it. The exact character that the container holds is determined when<br />
the program runs by which keys the user presses. Trying to get the character<br />
out of the box will cause a side effect: the program stops and waits for the<br />
user to press a key. This is generally true of IO values - when you get a<br />
value out with bind, side effects can occur. Many IO containers don't<br />
actually contain interesting values. For example,<br />
<haskell><br />
putStrLn "Hello, World!" :: IO ()<br />
</haskell><br />
That is, the value returned by <code>putStrLn "Hello, World!"</code> is an IO container<br />
filled with a value of type <code>()</code>, a not so interesting type. However, when<br />
you pull this value out during a bind operation, the string <code>Hello, World!</code><br />
is printed on the screen. So another way to think about values of type <code>IO t</code><br />
is as computations which when executed, may have side effects before<br />
returning a value of type t.<br />
<br />
One thing that you might notice as well, is that there is no ordinary Haskell<br />
function you can call (at least not in standard Haskell) to actually get a<br />
value out of an IO container/computation, other than bind, which puts it right<br />
back in. Such a function of type <code>IO a -> a</code> would be very unsafe in the pure<br />
Haskell world, because the value produced could be different each time it was<br />
called, and the IO computation could have side effects, and there would be no<br />
way to control when it was executed (Haskell is lazy after all). So how do IO<br />
actions ever get run? The IO action called <code>main</code> runs when the program is<br />
executed. It can make use of other IO actions in the process, and everything<br />
starts from there.<br />
<br />
When doing IO, a handy special form of bind when you just want the side<br />
effects and don't care about the values returned by the container on the left<br />
is this:<br />
<haskell><br />
(>>) :: Monad m => m a -> m b -> m b<br />
m >> k = m >>= \_ -> k<br />
</haskell><br />
<br />
An example of doing some IO in do notation:<br />
<haskell><br />
main = do putStrLn "Hello, what is your name?"<br />
name <- getLine<br />
putStrLn ("Hello " ++ name ++ "!")<br />
</haskell><br />
<br />
or in terms of bind, making use of the special form:<br />
<haskell><br />
main = putStrLn "Hello, what is your name?" >><br />
getLine >>= \name -><br />
putStrLn ("Hello " ++ name ++ "!")<br />
</haskell><br />
<br />
or, very primitive, without the special form for bind:<br />
<haskell><br />
main = putStrLn "Hello, what is your name?" >>= \x -><br />
getLine >>= \name -><br />
putStrLn ("Hello " ++ name ++ "!")<br />
</haskell><br />
<br />
----<br />
<br />
Another good example of a monad which perhaps isn't obviously a container at<br />
first, is the [[Reader monad]]. This monad basically consists<br />
of functions from a particular type: <code>((->) e)</code>, which might be written<br />
<code>(e ->)</code> if that were supported syntax. These can be viewed as containers<br />
indexed by values of type <code>e</code>, having one spot for each and every value of<br />
type <code>e</code>. The primitive operations on them follow naturally from thinking<br />
this way.<br />
<br />
The [[Reader monad]] models computations which read from (depend on) a shared<br />
environment. To clear up the correspondence, the type of the environment is<br />
the index type on our indexed containers.<br />
<br />
<haskell><br />
type Reader e = (->) e -- our monad<br />
</haskell><br />
<br />
Return simply produces the container having a given value at every spot.<br />
<haskell><br />
return :: a -> (Reader e a)<br />
return x = (\k -> x)<br />
</haskell><br />
<br />
Mapping a function over such a container turns out to be nothing more than<br />
what composition does.<br />
<br />
<haskell><br />
fmap :: (a -> b) -> Reader e a -> Reader e b<br />
= (a -> b) -> (e -> a) -> (e -> b)<br />
-- by definition, (Reader a b) = (a -> b)<br />
<br />
fmap f xs = f . xs<br />
</haskell><br />
<br />
How about join? Well, let's have a look at the types.<br />
<haskell><br />
join :: (Reader e) (Reader e a) -> (Reader e a)<br />
= (e -> e -> a) -> (e -> a) -- by definition of (Reader a)<br />
</haskell><br />
There's only one thing the function of type <code>(e -> a)</code> constructed could<br />
really be doing:<br />
<br />
<haskell><br />
join xss = (\k -> xss k k)<br />
</haskell><br />
<br />
From the container perspective, we are taking an indexed container of indexed<br />
containers and producing a new one which at index <code>k</code>, has the value at index<br />
<code>k</code> in the container at index <code>k</code>.<br />
<br />
So we can derive what we want bind to do based on this:<br />
<haskell><br />
(>>=) :: (Reader e a) -> (a -> Reader e b) -> (Reader e b)<br />
= (e -> a) -> (a -> (e -> b)) -> (e -> b) -- by definition<br />
<br />
xs >>= f = join (fmap f xs)<br />
= join (f . xs)<br />
= (\k -> (f . xs) k k)<br />
= (\k -> f (xs k) k)<br />
</haskell><br />
<br />
Which is exactly what you'll find in other definitions of the Reader monad.<br />
What is it doing? Well, it's taking a container <code>xs</code>, and a function <code>f</code> from<br />
the values in it to new containers, and producing a new container which at<br />
index <code>k</code>, holds the result of looking up the value at <code>k</code> in <code>xs</code>, and then<br />
applying <code>f</code> to it to get a new container, and finally looking up the value<br />
in that container at <code>k</code>.<br />
<br />
The [[Monads as computation]] perspective makes the purpose of such a monad perhaps<br />
more obvious: bind is taking a computation which may read from the environment<br />
before producing a value of type <code>a</code>, and a function from values of type <code>a</code><br />
to computations which may read from the environment before returning a value<br />
of type <code>b</code>, and composing these together, to get a computation which might<br />
read from the (shared) environment, before returning a value of type <code>b</code>.<br />
<br />
----<br />
<br />
How about the [[State monad]]? Although I'll admit that<br />
with State and IO in particular, it is generally more natural to take the<br />
view of [[Monads as computation]], it is good to see that the container analogy<br />
doesn't break down. The state monad is a particular refinement of the reader<br />
monad discussed above.<br />
<br />
I won't go into huge detail about the state monad here, so if you don't<br />
already know what it's for, what follows may seem a bit unnatural. It's<br />
perhaps better taken as a secondary way to look at the structure.<br />
<br />
For reference to the analogy, a value of type <code>(State s a)</code> is like a<br />
container indexed by values of type <code>s</code>, and at each index, it has a value of<br />
type <code>a</code> and another, new value of type <code>s</code>. The function <code>runState</code> does<br />
this "lookup".<br />
<br />
<haskell><br />
newtype State s a = State { runState :: (s -> (a,s)) }<br />
</haskell><br />
<br />
What does return do? It gives a <code>State</code> container with the given element at<br />
every index, and with the "address" (a.k.a. state parameter) unchanged.<br />
<haskell><br />
return :: a -> State s a<br />
return x = State (\s -> (x,s))<br />
</haskell><br />
<br />
Mapping does the natural thing, applying a function to each of the values of<br />
type <code>a</code>, throughout the structure.<br />
<haskell><br />
fmap :: (a -> b) -> (State s a) -> (State s b)<br />
fmap f (State m) = State (onVal f . m)<br />
where onVal f (x, s) = (f x, s)<br />
</haskell><br />
<br />
Joining needs a bit more thought. We want to take a value of type<br />
<code>(State s (State s a))</code> and turn it into a <code>(State s a)</code> in a natural way.<br />
This is essentially removal of indirection. We take the new address and new<br />
box that we get from looking up a given address in the box, and we do another<br />
lookup -- note that this is almost the same as what we did with the reader<br />
monad, only we use the new address that we get at the location, rather than<br />
the same address as for the first lookup.<br />
<br />
So we get:<br />
<haskell><br />
join :: (State s (State s a)) -> (State s a)<br />
join xss = State (\s -> uncurry runState (runState xss s))<br />
</haskell><br />
<br />
----<br />
<br />
I hope that the above was a reasonably clear introduction to what monads are<br />
about. Feel free to [[Talk:Monads as containers|make criticisms and ask questions]].<br />
<br />
- [[User:CaleGibbard|CaleGibbard]]<br />
<br />
==See also==<br />
<br />
* [[Monad]]<br />
* [[Monads as computation]]<br />
<br />
<br />
[[Category:Tutorials]] [[Category:Monad]]</div>Anton-Latukhahttps://wiki.haskell.org/index.php?title=Failure&diff=62994Failure2019-08-12T12:45:20Z<p>Anton-Latukha: rm /* safe-failure */ - deprecated</p>
<hr />
<div>= Exceptions, errors, failures oh my! =<br />
<br />
Terminology is a little confusing. There's a few different things floating around:<br />
<br />
* Error usually refers to a programming error. It can also refer to the "error" function, which in fact causes a runtime exception to be triggered.<br />
* Exception usually refers to an exception which is thrown in the IO monad. It can also refer to the actual typeclass "Exception" which was introduced along with extensible exceptions.<br />
* Fail is referring to the "fail" function, which is part of the Monad typeclass and is almost universally despised.<br />
<br />
To avoid the baggage and name clashes introduced with all of the above, we use the term failure. This name is used consistently for module names, type classes, function names and the abstract concept.<br />
<br />
But what is a failure? It's any time that something does not succeed. Of course, this depends on your definition of success. Here are some examples:<br />
<br />
* readInt fails when it receives invalid input.<br />
* head fails when it receives an empty list.<br />
* lookup fails when the key is not found in the list.<br />
* at (also known as !!) fails when the index is past the end of the list.<br />
<br />
Now that we know what a failure is, let's discuss how to deal with it.<br />
<br />
= Prior Art =<br />
<br />
There are currently a number of methods available for dealing with failures. However, all of them are lacking in some way:<br />
<br />
* Maybe. However, there's no information on *what* failed.<br />
* The error function. But not all failures should be fatal.<br />
* IO exceptions. But they force your code into your IO monad, plus it's non-obvious that a function might fail with an exception based solely on its return type.<br />
* Custom error values. But how do you compose two libraries together?<br />
* The Either/ErrorT monads. The main problem with these lies in composability and extensibility.<br />
<br />
This abundance of error handling methods leads to lots of gluing code when combining libraries which use different notions of failure. <br />
Examples:<br />
<br />
* The partial functions in the Prelude and other base modules, such as head, tail, fromJust, lookup, etc. use the error function.<br />
* parsec models failure using Either ParseError<br />
* http models failure using Either ConnError<br />
* The HDBC package models failure using the SQLError exception<br />
* The cgi package uses exceptions<br />
<br />
Quoting from Eric Kidd:<br />
{| class="wikitable"<br />
|-<br />
|<br />
<pre><br />
Consider a program to download a web page and parse it:<br />
<br />
1. Network.URI.parseURI returns (Maybe URI).<br />
2. Network.HTTP.simpleHTTP returns (IO (Result Request)), which is basically a broken version of (IO (Either ConnError Request)).<br />
3. Parsec returns (Either ParseError a)<br />
<br />
So there's no hope that I can write anything like:<br />
<br />
do uri <- parseURI uriStr<br />
doc <- evenSimplerHTTP uri<br />
parsed <- parse grammar uriStr doc<br />
</pre><br />
|}<br />
<br />
= Enter failure package, stage left =<br />
<br />
What we want is an abstract notion of failure which does not tie us to any particular error handling mechanism. This is what the [http://hackage.haskell.org/cgi-bin/hackage-scripts/package/failure failure] package intends to provide. <br />
<br />
Failure is the typeclass used to model this abstract notion of failure. It lives in the Control.Failure module and consists entirely of the following two lines:<br />
<br />
class Failure e f where<br />
failure :: e -> f v<br />
<br />
So now, you could define a safe head function as follows:<br />
<br />
data EmptyListFailure = EmptyListFailure<br />
head :: (Monad m, Failure EmptyListFailure m) => [a] -> m a<br />
head [] = failure EmptyListFailure<br />
head (x:_) = return x<br />
<br />
Notice how the opposite of failure is "return". Here, we introduced the Monad requirement explicitly; there are also two convenience classes that can help you out here:<br />
<br />
* ApplicativeFailure. In this case, just replace "return" with "pure"<br />
* MonadFailure: code would remain exactly as-is.<br />
<br />
== Handling failures ==<br />
<br />
When dealing with a failure, the type signature states what might go wrong. In the example of head above, head will fail if the list is empty.<br />
<br />
In any event, we need to instantiate Failure in order to actually call the head function. The failure package comes with instances for each of the above mentioned error handling mechanisms.<br />
<br />
* Maybe. failure == Nothing<br />
* List. failure == []<br />
* IO. failure == throw<br />
* Either. failure == Left<br />
<br />
Note: we previously provided an instance for ErrorT, where failure == throwError. Now, however, this is provided by the control-monad-failure or control-monad-failure-mtl packages.<br />
<br />
However, there are also two other packages available which provide more resilient options.<br />
<br />
=== [http://hackage.haskell.org/cgi-bin/hackage-scripts/package/control-monad-exception control-monad-exception ]===<br />
<br />
c-m-e provides a EM monad for explicitly typed, checked exceptions a la Java.<br />
The type of a EM computation carries the list of exceptions it can throw in its type. For example, let's say that we have a basic arithmetic expression evaluator which can throw divide by zero or sum overflow exceptions. Its type will be:<br />
<br />
eval :: (Throws DivByZero l, Throws SumOverflow l) => Expr -> EM l Double<br />
<br />
Correspondingly, a computation which calls eval and handles only the DivByZero exception gets the type:<br />
<br />
:t eval <expr> `catch` \DivByZero -> return 0<br />
eval <expr> :: Throws SumOverflow l => EM l Double<br />
<br />
<br />
=== [http://hackage.haskell.org/cgi-bin/hackage-scripts/package/attempt attempt] ===<br />
<br />
Attempt is intended when you don't know what might go wrong, but you know it could happen. For example, let's say I want to define a type class:<br />
<br />
class Convert a b where<br />
convert :: a -> Attempt b<br />
<br />
Other libraries may want to instantiate Convert for their own types. However, when writing the Convert typeclass, I don't know exactly what failures may occur. When writing the "Convert String Int" instances, it might be InvalidInteger. But when writing the "Convert English Spanish" typeclass (hey, it could happen) it might be InvalidGrammar.<br />
<br />
= Stack traces =<br />
<br />
As an added bonus, control-monad-exception provides stack traces, or more exactly [http://pepeiborra.posterous.com/monadic-stack-traces-that-make-a-lot-of-sense monadic call traces], via [http://hackage.haskell.org/cgi-bin/hackage-scripts/package/monadloc MonadLoc].<br />
<br />
= See also =<br />
<br />
* [[Error]]<br />
* [[Exception]]<br />
* [[Error vs. Exception]]<br />
<br />
= References =<br />
<br />
* http://www.randomhacks.net/articles/2007/03/10/haskell-8-ways-to-report-errors<br />
* http://www.haskell.org/pipermail/libraries/2007-March/007010.html</div>Anton-Latukhahttps://wiki.haskell.org/index.php?title=Jobs&diff=62987Jobs2019-08-08T07:50:55Z<p>Anton-Latukha: rm cufp.org/jobs as not-existent</p>
<hr />
<div>This page collects advertisements for commercial and academic positions involving Haskell or related technologies. <br />
<br />
If you are seeking Haskell jobs, or wish to recruit, good places to follow are the Haskell and Haskell-cafe mailing lists, the Types mailing list (particularly for research jobs). Contacting those organisations listed on the [[Haskell in industry]] is a good idea. Joining the nascent networking site [http://www.haskellers.com Haskellers] may also prove beneficial.<br />
<br />
Please also supply the date when you add a new job opening to the list below.<br />
<br />
<br />
==Industry positions==<br />
<br />
*[http://www.bluespec.com/careers.html Bluespec, Inc.]<br />
*[http://cpmed.de/jobs factis research GmbH / Checkpad MED (German)]<br />
*[http://www.functor.se/careers/openings/ Functor AB]<br />
*[http://corp.galois.com/careers/ Galois, Inc]<br />
*[http://www.starling-software.com/en/employment/ Starling Software K.K.]<br />
*[http://www.tsurucapital.com/en/ Tsuru Capital LLC] <br />
*[http://www.wagonhq.com/jobs Wagon]<br />
*[https://www.xoken.org/careers/ Xoken Labs (Bangalore, India)]<br />
<br />
===Related positions===<br />
<br />
* [https://www.erlang-solutions.com/about/careers Erlang Solutions]<br />
<br />
==Academic positions==<br />
<br />
* [http://people.cs.kuleuven.be/~tom.schrijvers/postdocposition2.html Postdoctoral position in Functional and Constraint Programming at KU Leuven] (added October, 2014)<br />
* [http://www3.imperial.ac.uk/computing/vacancies#JD0414 Research Assistant / Research Associate in Applied Computing] (added April 2014)<br />
* [http://cufp.org/jobs/programming-languages-researcher-haskell-expertise Programming Languages Researcher with Haskell Expertise] (added Sep 2010)<br />
<br />
==PhD/MSc Studentships==<br />
<br />
* [http://www.functor.se Industrial PhDs and MSc projects at Functor AB]<br />
<br />
==Internships==<br />
<br />
* [http://hackage.haskell.org/trac/ghc/wiki/Internships Internships on Haskell and GHC, at Microsoft Research, Cambridge]<br />
* [http://www.haskellers.com/jobs/5 Spring Internship at Intel to develop EDSL for parallel vector computation] (added Nov 2010)<br />
* [https://www.vacationlabs.com/haskell-internship/ Remote Internship in Haskell web-development at Vacation Labs, India] (added Oct, 2016)<br />
<br />
==Job sites==<br />
<br />
* [http://www.haskellers.com/jobs Haskellers]<br />
* [http://cufp.org/jobs/language/31 Commercial Users of Functional Programming]<br />
* [http://functionaljobs.com/ Functional Jobs]<br />
<br />
[[Category:Community]]</div>Anton-Latukhahttps://wiki.haskell.org/index.php?title=Error_vs._Exception&diff=62969Error vs. Exception2019-07-23T16:45:24Z<p>Anton-Latukha: /* Examples */ rm first person talk</p>
<hr />
<div>There has been confusion about the distinction between [[error]]s and [[exception]]s for a long time,<br />
repeated threads in Haskell-Cafe and more and more packages that handle errors and exceptions or something between.<br />
Although both terms are related and sometimes hard to distinguish, it is important to do it carefully.<br />
This is like the confusion between [[Parallelism vs. Concurrency|parallelism and concurrency]].<br />
<br />
The first problem is that "exception" seems to me to be the historically younger term.<br />
Before there were only "errors", independent of whether they were programming, I/O or user errors.<br />
In this article we use the term '''exception''' for expected but irregular situations at runtime<br />
and the term '''error''' for mistakes in the running program that can be resolved only by fixing the program.<br />
We do not want to distinguish between different ways of representing exceptions:<br />
<hask>Maybe</hask>, <hask>Either</hask>, exceptions in <hask>IO</hask> monad, or return codes,<br />
they all represent exceptions and are worth considering for exception handling.<br />
<br />
The history may have led to the identifiers we find today in the Haskell language and standard Haskell modules.<br />
<br />
* Exceptions: <hask>Prelude.catch</hask>, <hask>Control.Exception.catch</hask>, <hask>Control.Exception.try</hask>, <hask>IOError</hask>, <hask>Control.Monad.Error</hask><br />
* Errors: <hask>error</hask>, <hask>assert</hask>, <hask>Control.Exception.catch</hask>, <hask>Debug.Trace.trace</hask><br />
<br />
Note, that the <hask>catch</hask> function from <hask>Prelude</hask> handles exclusively exceptions,<br />
whereas its counterpart from <hask>Control.Exception</hask> also catches certain kinds of undefined values.<br />
<haskell><br />
Prelude> catch (error "bla") (\msg -> putStrLn $ "caught " ++ show msg)<br />
*** Exception: bla<br />
<br />
Prelude> Control.Exception.catch (error "bla") (\msg -> putStrLn $ "caught " ++ show (msg::Control.Exception.SomeException))<br />
caught bla<br />
</haskell><br />
This is unsafe, since Haskell's <hask>error</hask> is just sugar for <hask>undefined</hask>, that shall help spotting a programming error.<br />
A program should work as well when all <hask>error</hask>s and <hask>undefined</hask>s are replaced by infinite loops.<br />
However infinite loops in general cannot be caught, whereas calls to sugared functions like <hask>error</hask> can.<br />
<br />
Even more confusion was initiated by the Java programming language<br />
to use the term "exceptions" for programming errors like the <code>NullPointerException</code><br />
and introducing the distinction between<br />
[http://java.sun.com/docs/books/tutorial/essential/exceptions/runtime.html checked and unchecked exceptions].<br />
<br />
<br />
== Examples ==<br />
<br />
Let's give some examples for explaining the difference between errors and exceptions and why the distinction is important.<br />
<br />
First, consider a compiler like [[GHC]].<br />
If you feed it a program that contains invalid syntax or inconsistent types, it emits a description of the problem.<br />
Such occurrences are considered to be exceptions.<br />
GHC anticipates bad syntax and mismatched types and handles them by generating useful messages for the user.<br />
However, if GHC spits out a message like "Panic! This should not happen: ... Send a bug report to ghc@haskell.org", then you've encountered a situation which indicates a flaw in GHC.<br />
This would be considered an error.<br />
It cannot be handled by GHC or by the user.<br />
The error message, "Panic!...", is only useful to the GHC developers in fixing the problem.<br />
<br />
Ok, these are possible reactions to user input.<br />
Now a more difficult question:<br />
How should GHC handle corruptions in the files it has generated itself like the interface (.hi) and object files (.o)?<br />
These corruptions can be introduced easily by the user by editing the files in a simple text editor,<br />
or by network problems or by exchanging files between operating systems or different GHC versions.<br />
Thus GHC must be prepared for them, which means, it must generate and handle exceptions here.<br />
Program must tell the user at least that there is some problem with the file.<br />
A more obscure case - modified files by malicious software.<br />
Next question: Must GHC also be prepared for corrupt memory or damages in the CPU?<br />
According to the above definition corrupt memory is an exception, not an error.<br />
However, GHC cannot do much to solve such situations. Checking hardware is a OS responsibility.<br />
<br />
<!-- User enters number as string, pass to function that expects non-negative number --><br />
<br />
Now we proceed with two examples that show, what happens if you try to treat errors like exceptions:<br><br />
I was involved in the development of a library that was written in C++.<br />
One of the developers told me, that the developers are divided into the ones who like exceptions<br />
and the other ones who prefer return codes.<br />
As it seem to me, the friends of return codes won.<br />
However, I got the impression that they debated the wrong point:<br />
Exceptions and return codes are equally expressive, they should however not be used to describe errors.<br />
Actually the return codes contained definitions like ARRAY_INDEX_OUT_OF_RANGE.<br />
But I wondered: How shall my function react, when it gets this return code from a subroutine?<br />
Shall it send a mail to its programmer?<br />
It could return this code to its caller in turn, but it will also not know how to cope with it.<br />
Even worse, since I cannot make assumptions about the implementation of a function,<br />
I have to expect an ARRAY_INDEX_OUT_OF_RANGE from every subroutine.<br />
My conclusion is, that ARRAY_INDEX_OUT_OF_RANGE is a (programming) error.<br />
It cannot be handled or fixed at runtime, it can only be fixed by its developer.<br />
Thus there should be no according return code, but instead there should be <code>assert</code>s.<br />
<br />
The second example is a library for advanced arithmetic in Modula-3.<br />
I decided to use exceptions for signalling problems.<br />
One of the exceptions was VectorSizeMismatch,<br />
that was raised whenever two vectors of different sizes should be added or multiplied by a scalar product.<br />
However I found, that quickly almost every function in the library could potentially raise this exception<br />
and Modula-3 urges you to declare all potential exceptions.<br />
(However, ignoring potential exceptions only yields a compiler warning, that can even be suppressed.)<br />
I also noticed that due to the way I generated and combined the vectors and matrices<br />
the sizes would always match.<br />
Thus in case of a mismatch this means, there is not a problem with user input but with my program.<br />
Consequently, I removed this exception and replaced the checks by <code>ASSERT</code>.<br />
These <code>ASSERT</code>s can be disabled by a compiler switch for efficiency concerns.<br />
A correct program fulfils all <code>ASSERT</code>s and thus it does not make a difference<br />
whether they are present in the compiled program or not.<br />
In a faulty program the presence (or lack) of <code>ASSERT</code>s controls the way a program fails: <br />
either by continuing execution giving wrong results, or by immediate termination due to a failed assertion.<br />
<br />
With the new handling of vector size compatibility,<br />
if the operands of a vector addition originate from user input,<br />
then you have to check that their sizes match before you call vector addition.<br />
However this is a cheap check.<br />
Thus if you want another criterion for distinction of errors and exceptions:<br />
Errors can be prevented by (cheap) checks in advance,<br />
whereas exceptions can only be handled after a risky action was run.<br />
You can easily check for array indices being within array bounds, pointers for being not <code>NULL</code>,<br />
divisors for being not zero before calling according functions.<br />
In many cases you will not need those checks,<br />
because e.g. you have a loop traversing all valid indices of an array,<br />
and consequently you know that every index is allowed.<br />
You do not need to check exceptions afterwards.<br />
In contrast to that, memory full, disk full, file not existing, file without write permission<br />
and even overflows are clearly exceptions.<br />
Even if you check that there is enough memory available before allocating,<br />
the required chunk of memory might just be allocated by someone else between your memory check and your allocation.<br />
The file permission might be just changed between checking the permission and writing to the file.<br />
Permissions might even change while you write.<br />
Overflows are deterministic, but in order to prevent an overflow say for a multiplication,<br />
you have to reimplement the multiplication in an overflow-proof way.<br />
This will be slower than the actual multiplication.<br />
(Processors always show overflows by flags,<br />
but almost none of the popular high-level languages allows to query this information.)<br />
<br />
My conclusion is that (programming) errors can only be handled by the programmer,<br />
not by the running program.<br />
Thus the term "error handling" sounds contradictory to me.<br />
However supporting a programmer with finding errors (bugs) in their programs is a good thing.<br />
I just wouldn't call it "error handling" but "debugging".<br />
An important example in Haskell is the module <hask>Debug.Trace</hask>. It provides the function <hask>trace</hask> that looks like a non-I/O function<br />
but actually outputs something on the console.<br />
It is natural that debugging functions employ hacks.<br />
For finding a programming error it would be inappropriate to transform the program code<br />
to allow I/O in a set of functions that do not need it otherwise.<br />
The change would only persist until the bug is detected and fixed.<br />
Summarized, hacks in debugging functions<br />
are necessary for quickly finding problems without large restructuring of the program<br />
and they are not problematic, because they only exist until the bug is removed.<br />
<br />
Different from that exceptions are things you cannot fix in advance.<br />
You will always have to live with files that cannot be found and user input that is malformed.<br />
You can insist that the user does not hit the X key, but your program has to be prepared to receive a "X key pressed" message nonetheless.<br />
Thus exceptions belong to the program and<br />
the program must be adapted to treat exceptional values where they can occur.<br />
No hacks can be accepted for exception handling.<br />
<br />
== When exceptions become errors ==<br />
<br />
Another issue that makes distinction between exceptions and errors difficult is,<br />
that sometimes the one gets converted into the other one.<br />
<br />
It is an error to not handle an exception.<br />
If a file cannot be opened you must respect that result.<br />
You can proceed as if the file could be opened, though.<br />
If you do so you might crash the machine<br />
or the runtime system terminates your program.<br />
All of these effects are possible consequences of a (programming) error.<br />
Again, it does not matter whether the exceptional situation is signaled by a return code that you ignore or an IO exception for which you did not run a <hask>catch</hask>.<br />
<br />
== When errors become exceptions ==<br />
<br />
Often there is criticism about the distinction between errors and exceptions<br />
because there are software architectures<br />
where even programming errors of a part shall not crash a larger piece of software.<br />
Typical examples are:<br />
A process in an operating system shall not crash the whole system if it crashes itself.<br />
A buggy browser plugin shall not terminate the browser.<br />
A corrupt CGI script shall not bring the web server down, where it runs on.<br />
<br />
In these cases errors are handled like exceptions.<br />
But there is no reason to dismiss the distinction of errors and exceptions, at all.<br />
Obviously there are levels, and when crossing level boundaries it is ok to turn an error into an exception.<br />
The part that contains an error cannot do anything to recover from it.<br />
Also the next higher level cannot fix it, but it can restrict the damage.<br />
Within one encapsulated part of an architecture errors and exceptions shall be strictly separated.<br />
(Or put differently: If at one place you think you have to handle an error like an exception,<br />
why not dividing the program into two parts at this position? :-) )<br />
<br />
There is an important reason to not simply catch an error and proceed as if it were only an exception:<br />
The error might have corrupted some data and there is no general way to recover from that.<br />
Say, after detecting an error you might want to close a file that you were working on.<br />
But since you are coping with an error, something you did not foresee,<br />
you cannot know whether the file was already closed again or never opened.<br />
So it is better to just abort the program.<br />
<br />
The next higher level, the shell calling your program or the browser calling your plugin,<br />
shall have registered what has been opened and allocated and can reliably free those resources.<br />
<br />
<br />
== Errors and type system ==<br />
<br />
It is generally considered, that errors in a program imply a lack in the type system. If the type system would be strong enough and the programmers would be patient enough to work out the proofs imposed by library functions, then there would be no errors in programs at all, only exceptions.<br />
<br />
An alternative to extending the type system to [[dependent type]] system that allows for a wide range of proofs<br />
is the [[Extended Static Checking]].<br />
For example:<br />
<haskell><br />
{-# CONTRACT head :: { xs | not (null xs) } -> Ok #-}<br />
head :: [a] -> a<br />
head [] = error "head: empty list"<br />
head (x:_) = x<br />
</haskell><br />
<br />
When there is a pre-condition (or a contract) like here, it is a programming error to give an empty list to <hask>head</hask>. This means that checking if the list is empty must be done before the call. It has to statically deductible from the call site.<br />
<br />
If you write a function and cannot prove that you will not call head on the empty list then either you check before calling, or you use a safe-head function like <hask>viewL :: [a] -> Maybe (a, [a])</hask> or a <hask>case xs of x:_ -> doSomethingWithHead a; [] -> doSomethingElse</hask> or you add a pre-condition to your function.<br />
<br />
These contracts somehow look like the exception declarations, but they specify something about preconditions, not about possible results. There would be no sense to give the contracts names in order to handle different ways of violating the contracts after the function has been called with inappropriate arguments.<br />
<br />
== Call stacks ==<br />
<br />
Both for errors and exceptions some kind of call stack might be helpful to be reported to the programmer or user, respectively.<br />
However the call stacks for programmers (for debugging) noticably differ from those for users generated as result of an exception.<br />
<br />
For errors we might prefer something like:<br />
<br />
Prelude.head:42:23: empty list<br />
when calling recursively MyModule.scan.go:2009:12 and MyModule.scan.view:2009:7<br />
when calling MyGUI.promptString:1234:321<br />
... many more details ...<br />
<br />
whereas users would certainly more like to see<br />
<br />
Program could not be started,<br />
because Config file could not be read<br />
because Config file does not exist in dir0, dir1, dir2<br />
<br />
but the exception handler may also decide to use a default configuration instead or ask the user, how to proceed.<br />
<br />
== Escaping from control structures ==<br />
<br />
In imperative languages we often find statements that escape from a control structure.<br />
These escapers are refered to as exceptions, as well.<br />
E.g. in C/C++/Java <code>break</code> escapes <code>for</code> loops and <code>return</code> escapes functions and methods.<br />
Analogously in Modula-3 <code>EXIT</code> escapes <code>LOOP</code>s<br />
and <code>RETURN</code> escapes <code>PROCEDURE</code>s.<br />
The analogy between these statements and using the explicit exception handling mechanism of the particular language<br />
is also helpful in order to describe the interaction between these statements and handling of regular exceptions.<br />
E.g. what exception handlers and resource deallocators shall be run when you leave a loop or function using <code>break</code>?<br />
Analogously exceptions can also be used to escape from custom control structures<br />
(yeah, higher order functions are also possible in imperative languages)<br />
or deep recursive searches.<br />
In imperative languages exceptions are often implemented in a way that is especially efficient<br />
when deep recursions have to be aborted.<br />
<br />
You might debate intensively about whether using exceptions for escaping control structures is abuse of exceptions or not.<br />
At least escaping from control structures is more exception than error.<br />
Escaping from a control structure is just the irregular case with respect to the regular case of looping/descending in recursion.<br />
In Haskell, when you use exception monads like <code>Control.Monad.Exception.Synchronous</code> or <code>Control.Monad.Error</code>,<br />
exceptions are just an automated handling of return codes.<br />
<br />
<br />
== See also ==<br />
<br />
* [[Error]]<br />
* [[Exception]]<br />
<br />
<br />
''This article is written by Henning Thielemann. Other authors may use the terms 'error' and 'exceptions' in different ways or do not distinguish them at all.''<br />
<br />
<br />
[[Category:Idioms]]</div>Anton-Latukhahttps://wiki.haskell.org/index.php?title=GHC/Type_families&diff=62968GHC/Type families2019-07-22T19:52:14Z<p>Anton-Latukha: /* Injectivity, type inference, and ambiguity */ : fx code block, upd language</p>
<hr />
<div>{{GHCUsersGuide|glasgow_exts|type-families|a Type Family section}}<br />
<br />
<br />
Indexed type families, or '''type families''' for short, are a Haskell extension supporting ad-hoc overloading of data types. Type families are parametric types that can be assigned specialized representations based on the type parameters they are instantiated with. They are the data type analogue of [[Type class|type classes]]: families are used to define overloaded ''data'' in the same way that classes are used to define overloaded ''functions''. Type families are useful for generic programming, for creating highly parameterised library interfaces, and for creating interfaces with enhanced static information, much like dependent types.<br />
<br />
Type families come in two flavors: ''data families'' and ''type synonym families''. Data families are the indexed form of data and newtype definitions. Type synonym families are the indexed form of type synonyms. Each of these flavors can be defined in a standalone manner or ''associated'' with a type class. Standalone definitions are more general, while associated types can more clearly express how a type is used and lead to better error messages.<br />
<br />
''NB: see also Simon's [https://www.haskell.org/ghc/blog/20100930-LetGeneralisationInGhc7.html blog entry on let generalisation] for a significant change in the policy for let generalisation, driven by the type family extension. In brief: a few programs will puzzlingly fail to compile with <tt>-XTypeFamilies</tt> even though the code is legal Haskell 98.''<br />
<br />
== What are type families? ==<br />
<br />
The concept of a type family comes from type theory. An indexed type family in type theory is a partial function at the type level. Applying the function to parameters (called ''type indices'') yields a type. Type families permit a program to compute what data constructors it will operate on, rather than having them fixed statically (as with simple type systems) or treated as opaque unknowns (as with parametrically polymorphic types).<br />
<br />
Type families are to vanilla data types what type class methods are to regular functions. Vanilla polymorphic data types and functions have a single definition, which is used at all type instances. Classes and type families, on the other hand, have an interface definition and any number of instance definitions. A type family's interface definition declares its [[kind]] and its arity, or the number of type indices it takes. Instance definitions define the type family over some part of the domain.<br />
<br />
As a simple example of how type families differ from ordinary parametric data types, consider a strict list type. We can represent a list of <hask>Char</hask> in the usual way, with cons cells. We can do the same thing to represent a list of <hask>()</hask>, but since a strict <hask>()</hask> value carries no useful information, it would be more efficient to just store the length of the list. This can't be done with an ordinary parametric data type, because the data constructors used to represent the list would depend on the list's type parameter: if it's <hask>Char</hask> then the list consists of cons cells; if it's <hask>()</hask>, then the list consists of a single integer. We basically want to select between two different data types based on a type parameter. Using type families, this list type could be declared as follows:<br />
<br />
<haskell><br />
-- Declare a list-like data family<br />
data family XList a<br />
<br />
-- Declare a list-like instance for Char<br />
data instance XList Char = XCons !Char !(XList Char) | XNil<br />
<br />
-- Declare a number-like instance for ()<br />
data instance XList () = XListUnit !Int<br />
</haskell><br />
<br />
The right-hand sides of the two <code>data instance</code> declarations are exactly ordinary data definitions. In fact, a <code>data instance</code> declaration is nothing more than a shorthand for a <code>data</code> declaration followed by a <code>type instance</code> (see below) declaration. However, they define two instances of the same parametric data type, <hask>XList Char</hask> and <hask>XList ()</hask>, whereas ordinary data declarations define completely unrelated types. A recent [[Simonpj/Talk:FunWithTypeFuns|tutorial paper]] has more in-depth examples of programming with type families. <br />
<br />
[[GADT]]s bear some similarity to type families, in the sense that they allow a parametric type's constructors to depend on the type's parameters. However, all GADT constructors must be defined in one place, whereas type families can be extended. Functional dependencies are similar to type families, and many type classes that use functional dependencies can be equivalently expressed with type families. Type families provide a more functional style of type-level programming than the relational style of functional dependencies.<br />
<br />
== Requirements to use type families ==<br />
<br />
Type families are a GHC extension enabled with the <code>-XTypeFamilies</code> flag or the <code>{-# LANGUAGE TypeFamilies #-}</code> option. The first stable release of GHC that properly supports type families is 6.10.1. Release 6.8 includes an early partial deprecated implementation.<br />
<br />
== An associated data type example ==<br />
<br />
As an example, consider Ralf Hinze's [http://www.cs.ox.ac.uk/ralf.hinze/publications/GGTries.ps.gz generalised tries], a form of generic finite maps. <br />
<br />
=== The class declaration ===<br />
<br />
We define a type class whose instances are the types that we can use as keys in our generic maps:<br />
<haskell><br />
class GMapKey k where<br />
data GMap k :: * -> *<br />
empty :: GMap k v<br />
lookup :: k -> GMap k v -> Maybe v<br />
insert :: k -> v -> GMap k v -> GMap k v<br />
</haskell><br />
The interesting part is the ''associated data family'' declaration of the class. It gives a [http://www.haskell.org/ghc/docs/latest/html/users_guide/type-families.html#data-family-declarations ''kind signature''] (here <hask>* -> *</hask>) for the associated data type <hask>GMap k</hask> - analogous to how methods receive a type signature in a class declaration.<br />
<br />
What it is important to notice is that the first parameter of the associated type <hask>GMap</hask> coincides with the class parameter of <hask>GMapKey</hask>. This indicates that also in all instances of the class, the instances of the associated data type need to have their first argument match up with the instance type. In general, the type arguments of an associated type can be a subset of the class parameters (in a multi-parameter type class) and they can appear in any order, possibly in an order other than in the class head. The latter can be useful if the associated data type is partially applied in some contexts.<br />
<br />
The second important point is that as <hask>GMap k</hask> has kind <hask>* -> *</hask> and <hask>k</hask> (implicitly) has kind <hask>*</hask>, the type constructor <hask>GMap</hask> (without an argument) has kind <hask>* -> * -> *</hask>. Consequently, we see that <hask>GMap</hask> is applied to two arguments in the signatures of the methods <hask>empty</hask>, <hask>lookup</hask>, and <hask>insert</hask>.<br />
<br />
=== An Int instance ===<br />
<br />
To use Ints as keys into generic maps, we declare an instance that simply uses <hask>Data.IntMap</hask>, thusly:<br />
<haskell><br />
instance GMapKey Int where<br />
data GMap Int v = GMapInt (Data.IntMap.IntMap v)<br />
empty = GMapInt Data.IntMap.empty<br />
lookup k (GMapInt m) = Data.IntMap.lookup k m<br />
insert k v (GMapInt m) = GMapInt (Data.IntMap.insert k v m)<br />
</haskell><br />
The <hask>Int</hask> instance of the associated data type <hask>GMap</hask> needs to have both of its parameters, but as only the first one corresponds to a class parameter, the second needs to be a type variable (here <hask>v</hask>). As mentioned before, any associated type parameter that corresponds to a class parameter must be identical to the class parameter in each instance. The right hand side of the associated data declaration is like that of any other data type.<br />
<br />
NB: At the moment, GADT syntax is not allowed for associated data types (or other indexed types). This is not a fundamental limitation, but just a shortcoming of the current implementation, which we expect to lift in the future.<br />
<br />
As an exercise, implement an instance for <hask>Char</hask> that maps back to the <hask>Int</hask> instance using the conversion functions <hask>Char.ord</hask> and <hask>Char.chr</hask>.<br />
<br />
=== A unit instance ===<br />
<br />
Generic definitions, apart from elementary types, typically cover units, products, and sums. We start here with the unit instance for <hask>GMap</hask>:<br />
<haskell><br />
instance GMapKey () where<br />
data GMap () v = GMapUnit (Maybe v)<br />
empty = GMapUnit Nothing<br />
lookup () (GMapUnit v) = v<br />
insert () v (GMapUnit _) = GMapUnit $ Just v<br />
</haskell><br />
For unit, the map is just a <hask>Maybe</hask> value.<br />
<br />
=== Product and sum instances ===<br />
<br />
Next, let us define the instances for pairs and sums (i.e., <hask>Either</hask>):<br />
<haskell><br />
instance (GMapKey a, GMapKey b) => GMapKey (a, b) where<br />
data GMap (a, b) v = GMapPair (GMap a (GMap b v))<br />
empty = GMapPair empty<br />
lookup (a, b) (GMapPair gm) = lookup a gm >>= lookup b <br />
insert (a, b) v (GMapPair gm) = GMapPair $ case lookup a gm of<br />
Nothing -> insert a (insert b v empty) gm<br />
Just gm2 -> insert a (insert b v gm2 ) gm<br />
<br />
instance (GMapKey a, GMapKey b) => GMapKey (Either a b) where<br />
data GMap (Either a b) v = GMapEither (GMap a v) (GMap b v)<br />
empty = GMapEither empty empty<br />
lookup (Left a) (GMapEither gm1 _gm2) = lookup a gm1<br />
lookup (Right b) (GMapEither _gm1 gm2 ) = lookup b gm2<br />
insert (Left a) v (GMapEither gm1 gm2) = GMapEither (insert a v gm1) gm2<br />
insert (Right b) v (GMapEither gm1 gm2) = GMapEither gm1 (insert b v gm2)<br />
</haskell><br />
If you find this code algorithmically surprising, I'd suggest to have a look at [http://www.cs.ox.ac.uk/ralf.hinze/publications/index.html#J4 Ralf Hinze's paper]. The only novelty concerning associated types, in these two instances, is that the instances have a context <hask>(GMapKey a, GMapKey b)</hask>. Consequently, the right hand sides of the associated type declarations can use <hask>GMap</hask> recursively at the key types <hask>a</hask> and <hask>b</hask> - not unlike the method definitions use the class methods recursively at the types for which the class is given in the instance context.<br />
<br />
=== Using a generic map ===<br />
<br />
Finally, some code building and querying a generic map:<br />
<haskell><br />
myGMap :: GMap (Int, Either Char ()) String<br />
myGMap = insert (5, Left 'c') "(5, Left 'c')" $<br />
insert (4, Right ()) "(4, Right ())" $<br />
insert (5, Right ()) "This is the one!" $<br />
insert (5, Right ()) "This is the two!" $<br />
insert (6, Right ()) "(6, Right ())" $<br />
insert (5, Left 'a') "(5, Left 'a')" $<br />
empty<br />
main = putStrLn $ maybe "Couldn't find key!" id $ lookup (5, Right ()) myGMap<br />
</haskell><br />
<br />
=== Download the code ===<br />
<br />
If you want to play with this example without copying it off the wiki, just download the source code[http://darcs.haskell.org/testsuite/tests/ghc-regress/indexed-types/should_run/GMapAssoc.hs] for <hask>GMap</hask> from GHC's test suite.<br />
<br />
== Detailed definition of data families ==<br />
<br />
Data families appear in two flavours: (1) they can be defined on the toplevel or (2) they can appear inside type classes (in which case they are known as associated types). The former is the more general variant, as it lacks the requirement for the type-indices to coincide with the class parameters. However, the latter can lead to more clearly structured code and compiler warnings if some type instances were - possibly accidentally - omitted. In the following, we always discuss the general toplevel form first and then cover the additional constraints placed on associated types.<br />
<br />
=== Family declarations ===<br />
<br />
Indexed data families are introduced by a signature, such as <br />
<haskell><br />
data family GMap k :: * -> *<br />
</haskell><br />
The special <hask>family</hask> distinguishes family from standard data declarations. The result kind annotation is optional and, as usual, defaults to <hask>*</hask> if omitted. An example is<br />
<haskell><br />
data family Array e<br />
</haskell><br />
Named arguments can also be given explicit kind signatures if needed. Just as with [http://www.haskell.org/ghc/docs/latest/html/users_guide/gadt.html GADT declarations] named arguments are entirely optional, so that we can declare <hask>Array</hask> alternatively with<br />
<haskell><br />
data family Array :: * -> *<br />
</haskell><br />
<br />
==== Associated family declarations ====<br />
<br />
When a data family is declared as part of a type class, we drop the <hask>family</hask> keyword. The <hask>GMap</hask> declaration takes the following form<br />
<haskell><br />
class GMapKey k where<br />
data GMap k :: * -> *<br />
...<br />
</haskell><br />
In contrast to toplevel declarations, named arguments must be used for all type parameters that are to be used as type-indices. Moreover, the argument names must be class parameters. Each class parameter may only be used at most once per associated type, but some may be omitted and they may be in an order other than in the class head. In other words: '''the named type parameters of the data declaration must be a permutation of a subset of the class variables'''. <br />
<br />
Example is admissible:<br />
<haskell><br />
class C a b c where { data T c a :: * } -- OK<br />
class C a b c where { data T a a :: * } -- Bad: repeated variable<br />
class D a where { data T a x :: * } -- Bad: x is not a class variable<br />
class D a where { data T a :: * -> * } -- OK<br />
</haskell><br />
<br />
=== Instance declarations ===<br />
<br />
Instance declarations of data and newtype families are very similar to standard data and newtype declarations. The only two differences are that the keyword <hask>data</hask> or <hask>newtype</hask> is followed by <hask>instance</hask> and that some or all of the type arguments can be non-variable types, but may not contain forall types or type synonym families. However, data families are generally allowed in type parameters, and type synonyms are allowed as long as they are fully applied and expand to a type that is itself admissible - exactly as this is required for occurrences of type synonyms in class instance parameters. For example, the <hask>Either</hask> instance for <hask>GMap</hask> is<br />
<haskell><br />
data instance GMap (Either a b) v = GMapEither (GMap a v) (GMap b v)<br />
</haskell><br />
In this example, the declaration has only one variant. In general, it can be any number.<br />
<br />
Data and newtype instance declarations are only legit when an appropriate family declaration is in scope - just like class instances require the class declaration to be visible. Moreover, each instance declaration has to conform to the kind determined by its family declaration. This implies that the number of parameters of an instance declaration matches the arity determined by the kind of the family. Although all data families are declared with the <hask>data</hask> keyword, instances can be either <hask>data</hask> or <hask>newtype</hask>s, or a mix of both.<br />
<br />
Even if type families are defined as toplevel declarations, functions that perform different computations for different family instances still need to be defined as methods of type classes. In particular, the following is not possible:<br />
<haskell><br />
data family T a<br />
data instance T Int = A<br />
data instance T Char = B<br />
nonsense :: T a -> Int<br />
nonsense A = 1 -- WRONG: These two equations together...<br />
nonsense B = 2 -- ...will produce a type error.<br />
</haskell><br />
Given the functionality provided by GADTs (Generalised Algebraic Data Types), it might seem as if a definition, such as the above, should be feasible. However, type families - in contrast to GADTs - are ''open''; i.e., new instances can always be added, possibly in other modules. Supporting pattern matching across different data instances would require a form of extensible case construct.<br />
<br />
==== Associated type instances ====<br />
<br />
When an associated family instance is declared within a type class instance, we drop the <hask>instance</hask> keyword in the family instance. So, the <hask>Either</hask> instance for <hask>GMap</hask> becomes:<br />
<haskell><br />
instance (GMapKey a, GMapKey b) => GMapKey (Either a b) where<br />
data GMap (Either a b) v = GMapEither (GMap a v) (GMap b v)<br />
...<br />
</haskell><br />
The most important point about associated family instances is that the type indices corresponding to class parameters must be identical to the type given in the instance head; here this is the first argument of <hask>GMap</hask>, namely <hask>Either a b</hask>, which coincides with the only class parameter. Any parameters to the family constructor that do not correspond to class parameters, need to be variables in every instance; here this is the variable <hask>v</hask>.<br />
<br />
Instances for an associated family can only appear as part of instance declarations of the class in which the family was declared - just as with the equations of the methods of a class. Also in correspondence to how methods are handled, declarations of associated types can be omitted in class instances. If an associated family instance is omitted, the corresponding instance type is not inhabited; i.e., only diverging expressions, such as <hask>undefined</hask>, can assume the type.<br />
<br />
==== Scoping of class parameters ====<br />
<br />
In the case of multi-parameter type classes, the visibility of class parameters in the right-hand side of associated family instances depends ''solely'' on the parameters of the data family. As an example, consider the simple class declaration<br />
<haskell><br />
class C a b where<br />
data T a<br />
</haskell><br />
Only one of the two class parameters is a parameter to the data family. Hence, the following instance declaration is invalid:<br />
<haskell><br />
instance C [c] d where<br />
data T [c] = MkT (c, d) -- WRONG!! 'd' is not in scope<br />
</haskell><br />
Here, the right-hand side of the data instance mentions the type variable <hask>d</hask> that does not occur in its left-hand side. We cannot admit such data instances as they would compromise type safety.<br />
<br />
==== Type class instances of family instances ====<br />
<br />
Type class instances of instances of data families can be defined as usual, and in particular data instance declarations can have <hask>deriving</hask> clauses. For example, we can write<br />
<haskell><br />
data GMap () v = GMapUnit (Maybe v)<br />
deriving Show<br />
</haskell><br />
which implicitly defines an instance of the form<br />
<haskell><br />
instance Show v => Show (GMap () v) where ...<br />
</haskell><br />
<br />
Note that class instances are always for particular ''instances'' of a data family and never for an entire family as a whole. This is for essentially the same reasons that we cannot define a toplevel function that performs pattern matching on the data constructors of ''different'' instances of a single type family. It would require a form of extensible case construct.<br />
<br />
==== Overlap ====<br />
<br />
The instance declarations of a data family used in a single program may not overlap at all, independent of whether they are associated or not. In contrast to type class instances, this is not only a matter of consistency, but one of type safety.<br />
<br />
=== Import and export ===<br />
<br />
The association of data constructors with type families is more dynamic than that is the case with standard data and newtype declarations. In the standard case, the notation <hask>T(..)</hask> in an import or export list denotes the type constructor and all the data constructors introduced in its declaration. However, a family declaration never introduces any data constructors; instead, data constructors are introduced by family instances. As a result, which data constructors are associated with a type family depends on the currently visible instance declarations for that family. Consequently, an import or export item of the form <hask>T(..)</hask> denotes the family constructor and all currently visible data constructors - in the case of an export item, these may be either imported or defined in the current module. The treatment of import and export items that explicitly list data constructors, such as <hask>GMap(GMapEither)</hask>, is analogous.<br />
<br />
==== Associated families ====<br />
<br />
As expected, an import or export item of the form <hask>C(..)</hask> denotes all of the class' methods and associated types. However, when associated types are explicitly listed as subitems of a class, we need some new syntax, as uppercase identifiers as subitems are usually data constructors, not type constructors. To clarify that we denote types here, each associated type name needs to be prefixed by the keyword <hask>type</hask>. So for example, when explicitly listing the components of the <hask>GMapKey</hask> class, we write <hask>GMapKey(type GMap, empty, lookup, insert)</hask>.<br />
<br />
==== Examples ====<br />
<br />
Assuming our running <hask>GMapKey</hask> class example, let us look at some export lists and their meaning:<br />
<br />
* <hask>module GMap (GMapKey) where...</hask>: Exports just the class name.<br />
* <hask>module GMap (GMapKey(..)) where...</hask>: Exports the class, the associated type <hask>GMap</hask> and the member functions <hask>empty</hask>, <hask>lookup</hask>, and <hask>insert</hask>. None of the data constructors is exported.<br />
* <hask>module GMap (GMapKey(..), GMap(..)) where...</hask>: As before, but also exports all the data constructors <hask>GMapInt</hask>, <hask>GMapChar</hask>, <hask>GMapUnit</hask>, <hask>GMapPair</hask>, and <hask>GMapEither</hask>.<br />
* <hask>module GMap (GMapKey(empty, lookup, insert), GMap(..)) where...</hask>: As before.<br />
* <hask>module GMap (GMapKey, empty, lookup, insert, GMap(..)) where...</hask>: As before.<br />
<br />
Finally, you can write <hask>GMapKey(type GMap)</hask> to denote both the class <hask>GMapKey</hask> as well as its associated type <hask>GMap</hask>. However, you cannot write <hask>GMapKey(type GMap(..))</hask> &mdash; i.e., sub-component specifications cannot be nested. To specify <hask>GMap</hask>'s data constructors, you have to list it separately.<br />
<br />
==== Instances ====<br />
<br />
Family instances are implicitly exported, just like class instances. However, this applies only to the heads of instances, not to the data constructors an instance defines.<br />
<br />
== An associated type synonym example ==<br />
<br />
Type synonym families are an alternative to functional dependencies, which makes functional dependency examples well suited to introduce type synonym families. In fact, type families are a more functional way to express the same as functional dependencies (despite the name!), as they replace the relational notation of functional dependencies by an expression-oriented notation; i.e., functions on types are really represented by functions and not relations.<br />
<br />
=== The <hask>class</hask> declaration ===<br />
<br />
Here's an example from Mark Jones' seminal paper on functional dependencies:<br />
<haskell><br />
class Collects e ce | ce -> e where<br />
empty :: ce<br />
insert :: e -> ce -> ce<br />
member :: e -> ce -> Bool<br />
toList :: ce -> [e]<br />
</haskell><br />
<br />
With associated type synonyms we can write this as<br />
<haskell><br />
class Collects ce where<br />
type Elem ce<br />
empty :: ce<br />
insert :: Elem ce -> ce -> ce<br />
member :: Elem ce -> ce -> Bool<br />
toList :: ce -> [Elem ce]<br />
</haskell><br />
Instead of the multi-parameter type class, we use a single parameter class, and the parameter <hask>e</hask><br />
turned into an associated type synonym <hask>Elem ce</hask>.<br />
<br />
=== An <hask>instance</hask>===<br />
<br />
Instances change correspondingly. An instance of the two-parameter class<br />
<haskell><br />
instance Eq e => Collects e [e] where<br />
empty = []<br />
insert e l = (e:l)<br />
member e [] = False<br />
member e (x:xs) <br />
| e == x = True<br />
| otherwise = member e xs<br />
toList l = l<br />
</haskell><br />
becomes an instance of a single-parameter class, where the dependent type parameter turns into an associated type instance declaration:<br />
<haskell><br />
instance Eq e => Collects [e] where<br />
type Elem [e] = e<br />
empty = []<br />
insert e l = (e:l)<br />
member e [] = False<br />
member e (x:xs) <br />
| e == x = True<br />
| otherwise = member e xs<br />
toList l = l<br />
</haskell><br />
<br />
=== Using generic collections ===<br />
<br />
With Functional Dependencies the code would be:<br />
<haskell><br />
sumCollects :: (Collects e c1, Collects e c2) => c1 -> c2 -> c2<br />
sumCollects c1 c2 = foldr insert c2 (toList c1)<br />
</haskell><br />
<br />
In contrast, with associated type synonyms, we get:<br />
<haskell><br />
sumCollects :: (Collects c1, Collects c2, Elem c1 ~ Elem c2) => c1 -> c2 -> c2<br />
sumCollects c1 c2 = foldr insert c2 (toList c1)<br />
</haskell><br />
<br />
== Detailed definition of type synonym families ==<br />
<br />
Type families appear in two flavours: (1) they can be defined on the toplevel or (2) they can appear inside type classes (in which case they are known as associated type synonyms). The former is the more general variant, as it lacks the requirement for the type-indices to coincide with the class parameters. However, the latter can lead to more clearly structured code and compiler warnings if some type instances were - possibly accidentally - omitted. In the following, we always discuss the general toplevel form first and then cover the additional constraints placed on associated types.<br />
<br />
=== Family declarations ===<br />
<br />
Indexed type families are introduced by a signature, such as <br />
<haskell><br />
type family Elem c :: *<br />
</haskell><br />
The special <hask>family</hask> distinguishes family from standard type declarations. The result kind annotation is optional and, as usual, defaults to <hask>*</hask> if omitted. An example is<br />
<haskell><br />
type family Elem c<br />
</haskell><br />
Parameters can also be given explicit kind signatures if needed. We call the number of parameters in a type family declaration, the family's arity, and all applications of a type family must be fully saturated w.r.t. to that arity. This requirement is unlike ordinary type synonyms and it implies that the kind of a type family is not sufficient to determine a family's arity, and hence in general, also insufficient to determine whether a type family application is well formed. As an example, consider the following declaration:<br />
<haskell><br />
type family F a b :: * -> * -- F's arity is 2, <br />
-- although its overall kind is * -> * -> * -> *<br />
</haskell><br />
Given this declaration the following are examples of well-formed and malformed types:<br />
<haskell><br />
F Char [Int] -- OK! Kind: * -> *<br />
F Char [Int] Bool -- OK! Kind: *<br />
F IO Bool -- WRONG: kind mismatch in the first argument<br />
F Bool -- WRONG: unsaturated application<br />
</haskell><br />
<br />
A top-level type family can be declared as open or closed. (Associated type<br />
families are always open.) A closed type family has all of its equations<br />
defined in one place and cannot be extended, whereas an open family can have<br />
instances spread across modules. The advantage of a closed family is that<br />
its equations are tried in order, similar to a term-level function definition:<br />
<haskell><br />
type family G a where<br />
G Int = Bool<br />
G a = Char<br />
</haskell><br />
With this definition, the type <hask>G Int</hask> becomes <hask>Bool</hask><br />
and, say, <hask>G Double</hask> becomes <hask>Char</hask>. See also [http://ghc.haskell.org/trac/ghc/wiki/NewAxioms here] for more information about closed type families.<br />
<br />
==== Associated family declarations ====<br />
<br />
When a type family is declared as part of a type class, we drop the <hask>family</hask> special. The <hask>Elem</hask> declaration takes the following form<br />
<haskell><br />
class Collects ce where<br />
type Elem ce :: *<br />
...<br />
</haskell><br />
Exactly as in the case of an associated data declaration, '''the named type parameters must be a permutation of a subset of the class parameters'''. Examples<br />
<haskell><br />
class C a b c where { type T c a :: * } -- OK<br />
class D a where { type T a x :: * } -- No: x is not a class parameter<br />
class D a where { type T a :: * -> * } -- OK<br />
</haskell><br />
<br />
=== Type instance declarations ===<br />
<br />
Instance declarations of open type families are very similar to standard type synonym declarations. The only two differences are that the keyword <hask>type</hask> is followed by <hask>instance</hask> and that some or all of the type arguments can be non-variable types, but may not contain forall types or type synonym families. However, data families are generally allowed, and type synonyms are allowed as long as they are fully applied and expand to a type that is admissible - these are the exact same requirements as for data instances. For example, the <hask>[e]</hask> instance for <hask>Elem</hask> is<br />
<haskell><br />
type instance Elem [e] = e<br />
</haskell><br />
<br />
A type family instance declaration must satisfy the following rules:<br />
* An appropriate family declaration is in scope - just like class instances require the class declaration to be visible. <br />
* The instance declaration conforms to the kind determined by its family declaration<br />
* The number of type parameters in an instance declaration matches the number of type parameters in the family declaration.<br />
* The right-hand side of a type instance must be a monotype (i.e., it may not include foralls) and after the expansion of all saturated vanilla type synonyms, no synonyms, except family synonyms may remain.<br />
<br />
Here are some examples of admissible and illegal type instances and closed families:<br />
<haskell><br />
type family F a :: *<br />
type instance F [Int] = Int -- OK!<br />
type instance F String = Char -- OK!<br />
type instance F (F a) = a -- WRONG: type parameter mentions a type family<br />
type instance F (forall a. (a, b)) = b -- WRONG: a forall type appears in a type parameter<br />
type instance F Float = forall a.a -- WRONG: right-hand side may not be a forall type<br />
<br />
type family F2 a where -- OK!<br />
F2 (Maybe Int) = Int<br />
F2 (Maybe Bool) = Bool<br />
F2 (Maybe a) = String<br />
<br />
type family G a b :: * -> *<br />
type instance G Int = (,) -- WRONG: must be two type parameters<br />
type instance G Int Char Float = Double -- WRONG: must be two type parameters<br />
</haskell><br />
<br />
==== Closed family simplification ====<br />
<br />
Included in ghc starting 7.8.1.<br />
<br />
When dealing with closed families, simplifying the type is harder than just finding a left-hand side that matches and replacing that with a right-hand side. GHC will select an equation to use in a given type family application (the "target") if and only if the following 2 conditions hold:<br />
<br />
# There is a substitution from the variables in the equation's LHS that makes the left-hand side of the branch coincide with the target.<br />
# For each previous equation in the family: either the LHS of that equation is ''apart'' from the type family application, '''or''' the equation is ''compatible'' with the chosen equation.<br />
<br />
Now, we define ''apart'' and ''compatible'':<br />
# Two types are ''apart'' when one cannot simplify to the other, even after arbitrary type-family simplifications<br />
# Two equations are ''compatible'' if, either, their LHSs are apart or their LHSs unify and their RHSs are the same under the substitution induced by the unification.<br />
<br />
Some examples are in order:<br />
<haskell><br />
type family F a where<br />
F Int = Bool<br />
F Bool = Char<br />
F a = Bool<br />
<br />
type family And (a :: Bool) (b :: Bool) :: Bool where<br />
And False c = False<br />
And True d = d<br />
And e False = False<br />
And f True = f<br />
And g g = g<br />
</haskell><br />
<br />
In <hask>F</hask>, all pairs of equations are compatible except the second and third. The first two are compatible because their LHSs are apart. The first and third are compatible because the unifying substitution leads the RHSs to be the same. But, the second and third are not compatible because neither of these conditions holds. As a result, GHC will not use the third equation to simplify a target unless that target is apart from <hask>Bool</hask>.<br />
<br />
In <hask>And</hask>, ''every'' pair of equations is compatible, meaning GHC never has to make the extra apartness check during simplification.<br />
<br />
Why do all of this? It's a matter of type safety. Consider this example:<br />
<br />
<haskell><br />
type family J a b where<br />
J a a = Int<br />
J a b = Bool<br />
</haskell><br />
<br />
Say GHC selected the second branch just because the first doesn't apply at the moment, because two type variables are distinct. The problem is that those variables might later be instantiated at the same value, and then the first branch would have applied. You can convince this sort of inconsistency to produce <hask>unsafeCoerce</hask>.<br />
<br />
It gets worse. GHC has no internal notion of inequality, so it can't use previous, failed term-level GADT pattern matches to refine its type assumptions. For example:<br />
<br />
<haskell><br />
data G :: * -> * where<br />
GInt :: G Int<br />
GBool :: G Bool<br />
<br />
type family Foo (a :: *) :: * where<br />
Foo Int = Char<br />
Foo a = Double<br />
<br />
bar :: G a -> Foo a<br />
bar GInt = 'x'<br />
bar _ = 3.14<br />
</haskell><br />
<br />
The last line will fail to typecheck, because GHC doesn't know that the type variable <hask>a</hask> can't be <hask>Int</hask> here, even though it's obvious. The only general way to fix this is to have inequality evidence introduced into GHC, and that's a big deal and we don't know if we have the motivation for such a change yet.<br />
<br />
==== Associated type instances ====<br />
<br />
When an associated family instance is declared within a type class instance, we drop the <hask>instance</hask> keyword in the family instance. So, the <hask>[e]</hask> instance for <hask>Elem</hask> becomes:<br />
<haskell><br />
instance (Eq (Elem [e])) => Collects ([e]) where<br />
type Elem [e] = e<br />
...<br />
</haskell><br />
The most important point about associated family instances is that the type indexes corresponding to class parameters must be identical to the type given in the instance head; here this is <hask>[e]</hask>, which coincides with the only class parameter.<br />
<br />
Instances for an associated family can only appear as part of instance declarations of the class in which the family was declared - just as with the equations of the methods of a class. Also in correspondence to how methods are handled, declarations of associated types can be omitted in class instances. If an associated family instance is omitted, the corresponding instance type is not inhabited; i.e., only diverging expressions, such as <hask>undefined</hask>, can assume the type.<br />
<br />
==== Overlap ====<br />
<br />
The instance declarations of an open type family used in a single program must be compatible, in the form defined above. This condition is independent of whether the type family is associated or not, and it is not only a matter of consistency, but one of type safety. <br />
<br />
Here are two examples to illustrate the condition under which overlap is permitted.<br />
<haskell><br />
type instance F (a, Int) = [a]<br />
type instance F (Int, b) = [b] -- overlap permitted<br />
<br />
type instance G (a, Int) = [a]<br />
type instance G (Char, a) = [a] -- ILLEGAL overlap, as [Char] /= [Int]<br />
</haskell><br />
<br />
==== Decidability ====<br />
<br />
In order to guarantee that type inference in the presence of type families is decidable, we need to place a number of additional restrictions on the formation of type instance declarations (c.f., Definition 5 (Relaxed Conditions) of [http://www.cse.unsw.edu.au/~chak/papers/SPCS08.html Type Checking with Open Type Functions]). Instance declarations have the general form<br />
<haskell><br />
type instance F t1 .. tn = t<br />
</haskell><br />
where we require that for every type family application <hask>(G s1 .. sm)</hask> in <hask>t</hask>, <br />
# <hask>s1 .. sm</hask> do not contain any type family constructors,<br />
# the total number of symbols (data type constructors and type variables) in <hask>s1 .. sm</hask> is strictly smaller than in <hask>t1 .. tn</hask>, and<br />
# for every type variable <hask>a</hask>, <hask>a</hask> occurs in <hask>s1 .. sm</hask> at most as often as in <hask>t1 .. tn</hask>.<br />
These restrictions are easily verified and ensure termination of type inference. However, they are not sufficient to guarantee completeness of type inference in the presence of, so called, ''loopy equalities'', such as <hask>a ~ [F a]</hask>, where a recursive occurrence of a type variable is underneath a family application and data constructor application - see the above mentioned paper for details. <br />
<br />
If the option <tt>-XUndecidableInstances</tt> is passed to the compiler, the above restrictions are not enforced and it is on the programmer to ensure termination of the normalisation of type families during type inference.<br />
<br />
=== Equality constraints ===<br />
<br />
Type context can include equality constraints of the form <hask>t1 ~ t2</hask>, which denote that the types <hask>t1</hask> and <hask>t2</hask> need to be the same. In the presence of type families, whether two types are equal cannot generally be decided locally. Hence, the contexts of function signatures may include equality constraints, as in the following example:<br />
<haskell><br />
sumCollects :: (Collects c1, Collects c2, Elem c1 ~ Elem c2) => c1 -> c2 -> c2<br />
</haskell><br />
where we require that the element type of <hask>c1</hask> and <hask>c2</hask> are the same. In general, the types <hask>t1</hask> and <hask>t2</hask> of an equality constraint may be arbitrary monotypes; i.e., they may not contain any quantifiers, independent of whether higher-rank types are otherwise enabled.<br />
<br />
Equality constraints can also appear in class and instance contexts. The former enable a simple translation of programs using functional dependencies into programs using family synonyms instead. The general idea is to rewrite a class declaration of the form<br />
<haskell><br />
class C a b | a -> b<br />
</haskell><br />
to<br />
<haskell><br />
class (F a ~ b) => C a b where<br />
type F a<br />
</haskell><br />
That is, we represent every functional dependency (FD) <hask>a1 .. an -> b</hask> by an FD type family <hask>F a1 .. an</hask> and a superclass context equality <hask>F a1 .. an ~ b</hask>, essentially giving a name to the functional dependency. In class instances, we define the type instances of FD families in accordance with the class head. Method signatures are not affected by that process.<br />
<br />
== Frequently asked questions ==<br />
<br />
=== Comparing type families and functional dependencies ===<br />
<br />
Functional dependencies cover some of the same territory as type families. How do the two compare?<br />
There are some articles about this question:<br />
<br />
* Experiences in converting functional dependencies to type families: "[[Functional dependencies vs. type families]]"<br />
* [https://ghc.haskell.org/trac/ghc/wiki/TFvsFD GHC trac] on a comparison of functional dependencies and type families<br />
<br />
=== Injectivity, type inference, and ambiguity ===<br />
<br />
A common problem is if:<br />
<br />
<haskell><br />
type family F a<br />
<br />
f :: F a -> F a<br />
f = undefined<br />
<br />
g :: F Int -> F Int<br />
g x = f x<br />
</haskell><br />
<br />
Then compiler complains about the definition of <tt>g</tt>:<br />
<br />
<code><br />
Couldn't match expected type `F Int' against inferred type `F a1'.<br />
</code><br />
<br />
In type-checking <tt>g</tt>'s right hand side GHC discovers (by instantiating <tt>f</tt>'s type with a fresh type variable) that it has type <tt>F a1 -> F a1</tt> for some as-yet-unknown type <tt>a1</tt>. Now it tries to make the inferred type match <tt>g</tt>'s type signature. Well, you say, just make <tt>a1</tt> equal to <tt>Int</tt> and you are done. True, but what if there were these instances<br />
<haskell><br />
type instance F Int = Bool<br />
type instance F Char = Bool<br />
</haskell><br />
Then making <tt>a1</tt> equal to <tt>Char</tt> would ''also'' make the two types equal. Because there is (potentially) more than one choice, the program is rejected.<br />
<br />
However (and confusingly) if you omit the type signature on <tt>g</tt> altogether, thus<br />
<haskell><br />
f :: F a -> F a<br />
f = undefined<br />
<br />
g x = f x<br />
</haskell><br />
GHC will happily infer the type <tt>g :: F a -> F a</tt>. But you can't ''write'' that type signature or, indeed, the more specific one above. (Arguably this behaviour, where GHC ''infers'' a type it can't ''check'', is very confusing. I suppose we could make GHC reject both programs, with and without type signatures.)<br />
<br />
'''What is the problem?''' The nub of the issue is this: knowing that <tt>F t1</tt>=<tt>F t2</tt> does ''not'' imply that <tt>t1</tt> = <tt>t2</tt>.<br />
The difficulty is that the type function <tt>F</tt> need not be ''injective''; it can map two distinct types to the same type. For an injective type constructor like <tt>Maybe</tt>, if we know that <tt>Maybe t1</tt> = <tt>Maybe t2</tt>, then we know that <tt>t1</tt> = <tt>t2</tt>. But not so for non-injective type functions.<br />
<br />
The problem starts with <tt>f</tt>. Its type is ''ambiguous''; even if I know the argument and result types for <tt>f</tt>, I cannot use that to find the type at which <tt>a</tt> should be instantiated. (So arguably, <tt>f</tt> should be rejected as having an ambiguous type, and probably will be in future.) The situation is well known in type classes: <br />
<haskell><br />
bad :: (Read a, Show a) => String -> String<br />
bad x = show (read x)<br />
</haskell><br />
At a call of <tt>bad</tt> one cannot tell at what type <tt>a</tt> should be instantiated.<br />
<br />
The only solution is to avoid ambiguous types. In the type signature of a function, <br />
* Ensure that every type variable occurs in the part after the "<tt>=></tt>"<br />
* Ensure that every type variable appears at least once outside a type function call.<br />
Alternatively, you can use data families, which create new types and are therefore injective. The following code works:<br />
<br />
<haskell>data family F a<br />
<br />
f :: F a -> F a<br />
f = undefined<br />
<br />
g :: F Int -> F Int<br />
g x = f x</haskell><br />
<br />
== References ==<br />
<br />
* [http://www.cse.unsw.edu.au/~chak/papers/CKPM05.html Associated Types with Class.] Manuel M. T. Chakravarty, Gabriele Keller, Simon Peyton Jones, and Simon Marlow. In ''Proceedings of The 32nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'05)'', pages 1-13, ACM Press, 2005.<br />
* [http://www.cse.unsw.edu.au/~chak/papers/CKP05.html Associated Type Synonyms.] Manuel M. T. Chakravarty, Gabriele Keller, and Simon Peyton Jones. In ''Proceedings of The Tenth ACM SIGPLAN International Conference on Functional Programming'', ACM Press, pages 241-253, 2005.<br />
* [http://www.cse.unsw.edu.au/~chak/papers/SCPD07.html System F with Type Equality Coercions.] Martin Sulzmann, Manuel M. T. Chakravarty, Simon Peyton Jones, and Kevin Donnelly. In ''Proceedings of The Third ACM SIGPLAN Workshop on Types in Language Design and Implementation'', ACM Press, 2007.<br />
* [http://www.cse.unsw.edu.au/~chak/papers/SPCS08.html Type Checking With Open Type Functions.] Tom Schrijvers, Simon Peyton-Jones, Manuel M. T. Chakravarty, Martin Sulzmann. In ''Proceedings of The 13th ACM SIGPLAN International Conference on Functional Programming'', ACM Press, pages 51-62, 2008.<br />
* [[Simonpj/Talk:FunWithTypeFuns | Fun with Type Functions]] Oleg Kiselyov, Simon Peyton Jones, Chung-chieh Shan (the source for this paper can be found at http://patch-tag.com/r/schoenfinkel/typefunctions/wiki ) <br />
<br />
[[Category:Type-level programming]]<br />
[[Category:Language extensions]]<br />
[[Category:GHC|Indexed types]]</div>Anton-Latukhahttps://wiki.haskell.org/index.php?title=GHC/Type_families&diff=62967GHC/Type families2019-07-22T19:08:49Z<p>Anton-Latukha: /* Requirements to use type families */: upd LANGUAGE TypeFamilies is an option of pragma</p>
<hr />
<div>{{GHCUsersGuide|glasgow_exts|type-families|a Type Family section}}<br />
<br />
<br />
Indexed type families, or '''type families''' for short, are a Haskell extension supporting ad-hoc overloading of data types. Type families are parametric types that can be assigned specialized representations based on the type parameters they are instantiated with. They are the data type analogue of [[Type class|type classes]]: families are used to define overloaded ''data'' in the same way that classes are used to define overloaded ''functions''. Type families are useful for generic programming, for creating highly parameterised library interfaces, and for creating interfaces with enhanced static information, much like dependent types.<br />
<br />
Type families come in two flavors: ''data families'' and ''type synonym families''. Data families are the indexed form of data and newtype definitions. Type synonym families are the indexed form of type synonyms. Each of these flavors can be defined in a standalone manner or ''associated'' with a type class. Standalone definitions are more general, while associated types can more clearly express how a type is used and lead to better error messages.<br />
<br />
''NB: see also Simon's [https://www.haskell.org/ghc/blog/20100930-LetGeneralisationInGhc7.html blog entry on let generalisation] for a significant change in the policy for let generalisation, driven by the type family extension. In brief: a few programs will puzzlingly fail to compile with <tt>-XTypeFamilies</tt> even though the code is legal Haskell 98.''<br />
<br />
== What are type families? ==<br />
<br />
The concept of a type family comes from type theory. An indexed type family in type theory is a partial function at the type level. Applying the function to parameters (called ''type indices'') yields a type. Type families permit a program to compute what data constructors it will operate on, rather than having them fixed statically (as with simple type systems) or treated as opaque unknowns (as with parametrically polymorphic types).<br />
<br />
Type families are to vanilla data types what type class methods are to regular functions. Vanilla polymorphic data types and functions have a single definition, which is used at all type instances. Classes and type families, on the other hand, have an interface definition and any number of instance definitions. A type family's interface definition declares its [[kind]] and its arity, or the number of type indices it takes. Instance definitions define the type family over some part of the domain.<br />
<br />
As a simple example of how type families differ from ordinary parametric data types, consider a strict list type. We can represent a list of <hask>Char</hask> in the usual way, with cons cells. We can do the same thing to represent a list of <hask>()</hask>, but since a strict <hask>()</hask> value carries no useful information, it would be more efficient to just store the length of the list. This can't be done with an ordinary parametric data type, because the data constructors used to represent the list would depend on the list's type parameter: if it's <hask>Char</hask> then the list consists of cons cells; if it's <hask>()</hask>, then the list consists of a single integer. We basically want to select between two different data types based on a type parameter. Using type families, this list type could be declared as follows:<br />
<br />
<haskell><br />
-- Declare a list-like data family<br />
data family XList a<br />
<br />
-- Declare a list-like instance for Char<br />
data instance XList Char = XCons !Char !(XList Char) | XNil<br />
<br />
-- Declare a number-like instance for ()<br />
data instance XList () = XListUnit !Int<br />
</haskell><br />
<br />
The right-hand sides of the two <code>data instance</code> declarations are exactly ordinary data definitions. In fact, a <code>data instance</code> declaration is nothing more than a shorthand for a <code>data</code> declaration followed by a <code>type instance</code> (see below) declaration. However, they define two instances of the same parametric data type, <hask>XList Char</hask> and <hask>XList ()</hask>, whereas ordinary data declarations define completely unrelated types. A recent [[Simonpj/Talk:FunWithTypeFuns|tutorial paper]] has more in-depth examples of programming with type families. <br />
<br />
[[GADT]]s bear some similarity to type families, in the sense that they allow a parametric type's constructors to depend on the type's parameters. However, all GADT constructors must be defined in one place, whereas type families can be extended. Functional dependencies are similar to type families, and many type classes that use functional dependencies can be equivalently expressed with type families. Type families provide a more functional style of type-level programming than the relational style of functional dependencies.<br />
<br />
== Requirements to use type families ==<br />
<br />
Type families are a GHC extension enabled with the <code>-XTypeFamilies</code> flag or the <code>{-# LANGUAGE TypeFamilies #-}</code> option. The first stable release of GHC that properly supports type families is 6.10.1. Release 6.8 includes an early partial deprecated implementation.<br />
<br />
== An associated data type example ==<br />
<br />
As an example, consider Ralf Hinze's [http://www.cs.ox.ac.uk/ralf.hinze/publications/GGTries.ps.gz generalised tries], a form of generic finite maps. <br />
<br />
=== The class declaration ===<br />
<br />
We define a type class whose instances are the types that we can use as keys in our generic maps:<br />
<haskell><br />
class GMapKey k where<br />
data GMap k :: * -> *<br />
empty :: GMap k v<br />
lookup :: k -> GMap k v -> Maybe v<br />
insert :: k -> v -> GMap k v -> GMap k v<br />
</haskell><br />
The interesting part is the ''associated data family'' declaration of the class. It gives a [http://www.haskell.org/ghc/docs/latest/html/users_guide/type-families.html#data-family-declarations ''kind signature''] (here <hask>* -> *</hask>) for the associated data type <hask>GMap k</hask> - analogous to how methods receive a type signature in a class declaration.<br />
<br />
What it is important to notice is that the first parameter of the associated type <hask>GMap</hask> coincides with the class parameter of <hask>GMapKey</hask>. This indicates that also in all instances of the class, the instances of the associated data type need to have their first argument match up with the instance type. In general, the type arguments of an associated type can be a subset of the class parameters (in a multi-parameter type class) and they can appear in any order, possibly in an order other than in the class head. The latter can be useful if the associated data type is partially applied in some contexts.<br />
<br />
The second important point is that as <hask>GMap k</hask> has kind <hask>* -> *</hask> and <hask>k</hask> (implicitly) has kind <hask>*</hask>, the type constructor <hask>GMap</hask> (without an argument) has kind <hask>* -> * -> *</hask>. Consequently, we see that <hask>GMap</hask> is applied to two arguments in the signatures of the methods <hask>empty</hask>, <hask>lookup</hask>, and <hask>insert</hask>.<br />
<br />
=== An Int instance ===<br />
<br />
To use Ints as keys into generic maps, we declare an instance that simply uses <hask>Data.IntMap</hask>, thusly:<br />
<haskell><br />
instance GMapKey Int where<br />
data GMap Int v = GMapInt (Data.IntMap.IntMap v)<br />
empty = GMapInt Data.IntMap.empty<br />
lookup k (GMapInt m) = Data.IntMap.lookup k m<br />
insert k v (GMapInt m) = GMapInt (Data.IntMap.insert k v m)<br />
</haskell><br />
The <hask>Int</hask> instance of the associated data type <hask>GMap</hask> needs to have both of its parameters, but as only the first one corresponds to a class parameter, the second needs to be a type variable (here <hask>v</hask>). As mentioned before, any associated type parameter that corresponds to a class parameter must be identical to the class parameter in each instance. The right hand side of the associated data declaration is like that of any other data type.<br />
<br />
NB: At the moment, GADT syntax is not allowed for associated data types (or other indexed types). This is not a fundamental limitation, but just a shortcoming of the current implementation, which we expect to lift in the future.<br />
<br />
As an exercise, implement an instance for <hask>Char</hask> that maps back to the <hask>Int</hask> instance using the conversion functions <hask>Char.ord</hask> and <hask>Char.chr</hask>.<br />
<br />
=== A unit instance ===<br />
<br />
Generic definitions, apart from elementary types, typically cover units, products, and sums. We start here with the unit instance for <hask>GMap</hask>:<br />
<haskell><br />
instance GMapKey () where<br />
data GMap () v = GMapUnit (Maybe v)<br />
empty = GMapUnit Nothing<br />
lookup () (GMapUnit v) = v<br />
insert () v (GMapUnit _) = GMapUnit $ Just v<br />
</haskell><br />
For unit, the map is just a <hask>Maybe</hask> value.<br />
<br />
=== Product and sum instances ===<br />
<br />
Next, let us define the instances for pairs and sums (i.e., <hask>Either</hask>):<br />
<haskell><br />
instance (GMapKey a, GMapKey b) => GMapKey (a, b) where<br />
data GMap (a, b) v = GMapPair (GMap a (GMap b v))<br />
empty = GMapPair empty<br />
lookup (a, b) (GMapPair gm) = lookup a gm >>= lookup b <br />
insert (a, b) v (GMapPair gm) = GMapPair $ case lookup a gm of<br />
Nothing -> insert a (insert b v empty) gm<br />
Just gm2 -> insert a (insert b v gm2 ) gm<br />
<br />
instance (GMapKey a, GMapKey b) => GMapKey (Either a b) where<br />
data GMap (Either a b) v = GMapEither (GMap a v) (GMap b v)<br />
empty = GMapEither empty empty<br />
lookup (Left a) (GMapEither gm1 _gm2) = lookup a gm1<br />
lookup (Right b) (GMapEither _gm1 gm2 ) = lookup b gm2<br />
insert (Left a) v (GMapEither gm1 gm2) = GMapEither (insert a v gm1) gm2<br />
insert (Right b) v (GMapEither gm1 gm2) = GMapEither gm1 (insert b v gm2)<br />
</haskell><br />
If you find this code algorithmically surprising, I'd suggest to have a look at [http://www.cs.ox.ac.uk/ralf.hinze/publications/index.html#J4 Ralf Hinze's paper]. The only novelty concerning associated types, in these two instances, is that the instances have a context <hask>(GMapKey a, GMapKey b)</hask>. Consequently, the right hand sides of the associated type declarations can use <hask>GMap</hask> recursively at the key types <hask>a</hask> and <hask>b</hask> - not unlike the method definitions use the class methods recursively at the types for which the class is given in the instance context.<br />
<br />
=== Using a generic map ===<br />
<br />
Finally, some code building and querying a generic map:<br />
<haskell><br />
myGMap :: GMap (Int, Either Char ()) String<br />
myGMap = insert (5, Left 'c') "(5, Left 'c')" $<br />
insert (4, Right ()) "(4, Right ())" $<br />
insert (5, Right ()) "This is the one!" $<br />
insert (5, Right ()) "This is the two!" $<br />
insert (6, Right ()) "(6, Right ())" $<br />
insert (5, Left 'a') "(5, Left 'a')" $<br />
empty<br />
main = putStrLn $ maybe "Couldn't find key!" id $ lookup (5, Right ()) myGMap<br />
</haskell><br />
<br />
=== Download the code ===<br />
<br />
If you want to play with this example without copying it off the wiki, just download the source code[http://darcs.haskell.org/testsuite/tests/ghc-regress/indexed-types/should_run/GMapAssoc.hs] for <hask>GMap</hask> from GHC's test suite.<br />
<br />
== Detailed definition of data families ==<br />
<br />
Data families appear in two flavours: (1) they can be defined on the toplevel or (2) they can appear inside type classes (in which case they are known as associated types). The former is the more general variant, as it lacks the requirement for the type-indices to coincide with the class parameters. However, the latter can lead to more clearly structured code and compiler warnings if some type instances were - possibly accidentally - omitted. In the following, we always discuss the general toplevel form first and then cover the additional constraints placed on associated types.<br />
<br />
=== Family declarations ===<br />
<br />
Indexed data families are introduced by a signature, such as <br />
<haskell><br />
data family GMap k :: * -> *<br />
</haskell><br />
The special <hask>family</hask> distinguishes family from standard data declarations. The result kind annotation is optional and, as usual, defaults to <hask>*</hask> if omitted. An example is<br />
<haskell><br />
data family Array e<br />
</haskell><br />
Named arguments can also be given explicit kind signatures if needed. Just as with [http://www.haskell.org/ghc/docs/latest/html/users_guide/gadt.html GADT declarations] named arguments are entirely optional, so that we can declare <hask>Array</hask> alternatively with<br />
<haskell><br />
data family Array :: * -> *<br />
</haskell><br />
<br />
==== Associated family declarations ====<br />
<br />
When a data family is declared as part of a type class, we drop the <hask>family</hask> keyword. The <hask>GMap</hask> declaration takes the following form<br />
<haskell><br />
class GMapKey k where<br />
data GMap k :: * -> *<br />
...<br />
</haskell><br />
In contrast to toplevel declarations, named arguments must be used for all type parameters that are to be used as type-indices. Moreover, the argument names must be class parameters. Each class parameter may only be used at most once per associated type, but some may be omitted and they may be in an order other than in the class head. In other words: '''the named type parameters of the data declaration must be a permutation of a subset of the class variables'''. <br />
<br />
Example is admissible:<br />
<haskell><br />
class C a b c where { data T c a :: * } -- OK<br />
class C a b c where { data T a a :: * } -- Bad: repeated variable<br />
class D a where { data T a x :: * } -- Bad: x is not a class variable<br />
class D a where { data T a :: * -> * } -- OK<br />
</haskell><br />
<br />
=== Instance declarations ===<br />
<br />
Instance declarations of data and newtype families are very similar to standard data and newtype declarations. The only two differences are that the keyword <hask>data</hask> or <hask>newtype</hask> is followed by <hask>instance</hask> and that some or all of the type arguments can be non-variable types, but may not contain forall types or type synonym families. However, data families are generally allowed in type parameters, and type synonyms are allowed as long as they are fully applied and expand to a type that is itself admissible - exactly as this is required for occurrences of type synonyms in class instance parameters. For example, the <hask>Either</hask> instance for <hask>GMap</hask> is<br />
<haskell><br />
data instance GMap (Either a b) v = GMapEither (GMap a v) (GMap b v)<br />
</haskell><br />
In this example, the declaration has only one variant. In general, it can be any number.<br />
<br />
Data and newtype instance declarations are only legit when an appropriate family declaration is in scope - just like class instances require the class declaration to be visible. Moreover, each instance declaration has to conform to the kind determined by its family declaration. This implies that the number of parameters of an instance declaration matches the arity determined by the kind of the family. Although all data families are declared with the <hask>data</hask> keyword, instances can be either <hask>data</hask> or <hask>newtype</hask>s, or a mix of both.<br />
<br />
Even if type families are defined as toplevel declarations, functions that perform different computations for different family instances still need to be defined as methods of type classes. In particular, the following is not possible:<br />
<haskell><br />
data family T a<br />
data instance T Int = A<br />
data instance T Char = B<br />
nonsense :: T a -> Int<br />
nonsense A = 1 -- WRONG: These two equations together...<br />
nonsense B = 2 -- ...will produce a type error.<br />
</haskell><br />
Given the functionality provided by GADTs (Generalised Algebraic Data Types), it might seem as if a definition, such as the above, should be feasible. However, type families - in contrast to GADTs - are ''open''; i.e., new instances can always be added, possibly in other modules. Supporting pattern matching across different data instances would require a form of extensible case construct.<br />
<br />
==== Associated type instances ====<br />
<br />
When an associated family instance is declared within a type class instance, we drop the <hask>instance</hask> keyword in the family instance. So, the <hask>Either</hask> instance for <hask>GMap</hask> becomes:<br />
<haskell><br />
instance (GMapKey a, GMapKey b) => GMapKey (Either a b) where<br />
data GMap (Either a b) v = GMapEither (GMap a v) (GMap b v)<br />
...<br />
</haskell><br />
The most important point about associated family instances is that the type indices corresponding to class parameters must be identical to the type given in the instance head; here this is the first argument of <hask>GMap</hask>, namely <hask>Either a b</hask>, which coincides with the only class parameter. Any parameters to the family constructor that do not correspond to class parameters, need to be variables in every instance; here this is the variable <hask>v</hask>.<br />
<br />
Instances for an associated family can only appear as part of instance declarations of the class in which the family was declared - just as with the equations of the methods of a class. Also in correspondence to how methods are handled, declarations of associated types can be omitted in class instances. If an associated family instance is omitted, the corresponding instance type is not inhabited; i.e., only diverging expressions, such as <hask>undefined</hask>, can assume the type.<br />
<br />
==== Scoping of class parameters ====<br />
<br />
In the case of multi-parameter type classes, the visibility of class parameters in the right-hand side of associated family instances depends ''solely'' on the parameters of the data family. As an example, consider the simple class declaration<br />
<haskell><br />
class C a b where<br />
data T a<br />
</haskell><br />
Only one of the two class parameters is a parameter to the data family. Hence, the following instance declaration is invalid:<br />
<haskell><br />
instance C [c] d where<br />
data T [c] = MkT (c, d) -- WRONG!! 'd' is not in scope<br />
</haskell><br />
Here, the right-hand side of the data instance mentions the type variable <hask>d</hask> that does not occur in its left-hand side. We cannot admit such data instances as they would compromise type safety.<br />
<br />
==== Type class instances of family instances ====<br />
<br />
Type class instances of instances of data families can be defined as usual, and in particular data instance declarations can have <hask>deriving</hask> clauses. For example, we can write<br />
<haskell><br />
data GMap () v = GMapUnit (Maybe v)<br />
deriving Show<br />
</haskell><br />
which implicitly defines an instance of the form<br />
<haskell><br />
instance Show v => Show (GMap () v) where ...<br />
</haskell><br />
<br />
Note that class instances are always for particular ''instances'' of a data family and never for an entire family as a whole. This is for essentially the same reasons that we cannot define a toplevel function that performs pattern matching on the data constructors of ''different'' instances of a single type family. It would require a form of extensible case construct.<br />
<br />
==== Overlap ====<br />
<br />
The instance declarations of a data family used in a single program may not overlap at all, independent of whether they are associated or not. In contrast to type class instances, this is not only a matter of consistency, but one of type safety.<br />
<br />
=== Import and export ===<br />
<br />
The association of data constructors with type families is more dynamic than that is the case with standard data and newtype declarations. In the standard case, the notation <hask>T(..)</hask> in an import or export list denotes the type constructor and all the data constructors introduced in its declaration. However, a family declaration never introduces any data constructors; instead, data constructors are introduced by family instances. As a result, which data constructors are associated with a type family depends on the currently visible instance declarations for that family. Consequently, an import or export item of the form <hask>T(..)</hask> denotes the family constructor and all currently visible data constructors - in the case of an export item, these may be either imported or defined in the current module. The treatment of import and export items that explicitly list data constructors, such as <hask>GMap(GMapEither)</hask>, is analogous.<br />
<br />
==== Associated families ====<br />
<br />
As expected, an import or export item of the form <hask>C(..)</hask> denotes all of the class' methods and associated types. However, when associated types are explicitly listed as subitems of a class, we need some new syntax, as uppercase identifiers as subitems are usually data constructors, not type constructors. To clarify that we denote types here, each associated type name needs to be prefixed by the keyword <hask>type</hask>. So for example, when explicitly listing the components of the <hask>GMapKey</hask> class, we write <hask>GMapKey(type GMap, empty, lookup, insert)</hask>.<br />
<br />
==== Examples ====<br />
<br />
Assuming our running <hask>GMapKey</hask> class example, let us look at some export lists and their meaning:<br />
<br />
* <hask>module GMap (GMapKey) where...</hask>: Exports just the class name.<br />
* <hask>module GMap (GMapKey(..)) where...</hask>: Exports the class, the associated type <hask>GMap</hask> and the member functions <hask>empty</hask>, <hask>lookup</hask>, and <hask>insert</hask>. None of the data constructors is exported.<br />
* <hask>module GMap (GMapKey(..), GMap(..)) where...</hask>: As before, but also exports all the data constructors <hask>GMapInt</hask>, <hask>GMapChar</hask>, <hask>GMapUnit</hask>, <hask>GMapPair</hask>, and <hask>GMapEither</hask>.<br />
* <hask>module GMap (GMapKey(empty, lookup, insert), GMap(..)) where...</hask>: As before.<br />
* <hask>module GMap (GMapKey, empty, lookup, insert, GMap(..)) where...</hask>: As before.<br />
<br />
Finally, you can write <hask>GMapKey(type GMap)</hask> to denote both the class <hask>GMapKey</hask> as well as its associated type <hask>GMap</hask>. However, you cannot write <hask>GMapKey(type GMap(..))</hask> &mdash; i.e., sub-component specifications cannot be nested. To specify <hask>GMap</hask>'s data constructors, you have to list it separately.<br />
<br />
==== Instances ====<br />
<br />
Family instances are implicitly exported, just like class instances. However, this applies only to the heads of instances, not to the data constructors an instance defines.<br />
<br />
== An associated type synonym example ==<br />
<br />
Type synonym families are an alternative to functional dependencies, which makes functional dependency examples well suited to introduce type synonym families. In fact, type families are a more functional way to express the same as functional dependencies (despite the name!), as they replace the relational notation of functional dependencies by an expression-oriented notation; i.e., functions on types are really represented by functions and not relations.<br />
<br />
=== The <hask>class</hask> declaration ===<br />
<br />
Here's an example from Mark Jones' seminal paper on functional dependencies:<br />
<haskell><br />
class Collects e ce | ce -> e where<br />
empty :: ce<br />
insert :: e -> ce -> ce<br />
member :: e -> ce -> Bool<br />
toList :: ce -> [e]<br />
</haskell><br />
<br />
With associated type synonyms we can write this as<br />
<haskell><br />
class Collects ce where<br />
type Elem ce<br />
empty :: ce<br />
insert :: Elem ce -> ce -> ce<br />
member :: Elem ce -> ce -> Bool<br />
toList :: ce -> [Elem ce]<br />
</haskell><br />
Instead of the multi-parameter type class, we use a single parameter class, and the parameter <hask>e</hask><br />
turned into an associated type synonym <hask>Elem ce</hask>.<br />
<br />
=== An <hask>instance</hask>===<br />
<br />
Instances change correspondingly. An instance of the two-parameter class<br />
<haskell><br />
instance Eq e => Collects e [e] where<br />
empty = []<br />
insert e l = (e:l)<br />
member e [] = False<br />
member e (x:xs) <br />
| e == x = True<br />
| otherwise = member e xs<br />
toList l = l<br />
</haskell><br />
becomes an instance of a single-parameter class, where the dependent type parameter turns into an associated type instance declaration:<br />
<haskell><br />
instance Eq e => Collects [e] where<br />
type Elem [e] = e<br />
empty = []<br />
insert e l = (e:l)<br />
member e [] = False<br />
member e (x:xs) <br />
| e == x = True<br />
| otherwise = member e xs<br />
toList l = l<br />
</haskell><br />
<br />
=== Using generic collections ===<br />
<br />
With Functional Dependencies the code would be:<br />
<haskell><br />
sumCollects :: (Collects e c1, Collects e c2) => c1 -> c2 -> c2<br />
sumCollects c1 c2 = foldr insert c2 (toList c1)<br />
</haskell><br />
<br />
In contrast, with associated type synonyms, we get:<br />
<haskell><br />
sumCollects :: (Collects c1, Collects c2, Elem c1 ~ Elem c2) => c1 -> c2 -> c2<br />
sumCollects c1 c2 = foldr insert c2 (toList c1)<br />
</haskell><br />
<br />
== Detailed definition of type synonym families ==<br />
<br />
Type families appear in two flavours: (1) they can be defined on the toplevel or (2) they can appear inside type classes (in which case they are known as associated type synonyms). The former is the more general variant, as it lacks the requirement for the type-indices to coincide with the class parameters. However, the latter can lead to more clearly structured code and compiler warnings if some type instances were - possibly accidentally - omitted. In the following, we always discuss the general toplevel form first and then cover the additional constraints placed on associated types.<br />
<br />
=== Family declarations ===<br />
<br />
Indexed type families are introduced by a signature, such as <br />
<haskell><br />
type family Elem c :: *<br />
</haskell><br />
The special <hask>family</hask> distinguishes family from standard type declarations. The result kind annotation is optional and, as usual, defaults to <hask>*</hask> if omitted. An example is<br />
<haskell><br />
type family Elem c<br />
</haskell><br />
Parameters can also be given explicit kind signatures if needed. We call the number of parameters in a type family declaration, the family's arity, and all applications of a type family must be fully saturated w.r.t. to that arity. This requirement is unlike ordinary type synonyms and it implies that the kind of a type family is not sufficient to determine a family's arity, and hence in general, also insufficient to determine whether a type family application is well formed. As an example, consider the following declaration:<br />
<haskell><br />
type family F a b :: * -> * -- F's arity is 2, <br />
-- although its overall kind is * -> * -> * -> *<br />
</haskell><br />
Given this declaration the following are examples of well-formed and malformed types:<br />
<haskell><br />
F Char [Int] -- OK! Kind: * -> *<br />
F Char [Int] Bool -- OK! Kind: *<br />
F IO Bool -- WRONG: kind mismatch in the first argument<br />
F Bool -- WRONG: unsaturated application<br />
</haskell><br />
<br />
A top-level type family can be declared as open or closed. (Associated type<br />
families are always open.) A closed type family has all of its equations<br />
defined in one place and cannot be extended, whereas an open family can have<br />
instances spread across modules. The advantage of a closed family is that<br />
its equations are tried in order, similar to a term-level function definition:<br />
<haskell><br />
type family G a where<br />
G Int = Bool<br />
G a = Char<br />
</haskell><br />
With this definition, the type <hask>G Int</hask> becomes <hask>Bool</hask><br />
and, say, <hask>G Double</hask> becomes <hask>Char</hask>. See also [http://ghc.haskell.org/trac/ghc/wiki/NewAxioms here] for more information about closed type families.<br />
<br />
==== Associated family declarations ====<br />
<br />
When a type family is declared as part of a type class, we drop the <hask>family</hask> special. The <hask>Elem</hask> declaration takes the following form<br />
<haskell><br />
class Collects ce where<br />
type Elem ce :: *<br />
...<br />
</haskell><br />
Exactly as in the case of an associated data declaration, '''the named type parameters must be a permutation of a subset of the class parameters'''. Examples<br />
<haskell><br />
class C a b c where { type T c a :: * } -- OK<br />
class D a where { type T a x :: * } -- No: x is not a class parameter<br />
class D a where { type T a :: * -> * } -- OK<br />
</haskell><br />
<br />
=== Type instance declarations ===<br />
<br />
Instance declarations of open type families are very similar to standard type synonym declarations. The only two differences are that the keyword <hask>type</hask> is followed by <hask>instance</hask> and that some or all of the type arguments can be non-variable types, but may not contain forall types or type synonym families. However, data families are generally allowed, and type synonyms are allowed as long as they are fully applied and expand to a type that is admissible - these are the exact same requirements as for data instances. For example, the <hask>[e]</hask> instance for <hask>Elem</hask> is<br />
<haskell><br />
type instance Elem [e] = e<br />
</haskell><br />
<br />
A type family instance declaration must satisfy the following rules:<br />
* An appropriate family declaration is in scope - just like class instances require the class declaration to be visible. <br />
* The instance declaration conforms to the kind determined by its family declaration<br />
* The number of type parameters in an instance declaration matches the number of type parameters in the family declaration.<br />
* The right-hand side of a type instance must be a monotype (i.e., it may not include foralls) and after the expansion of all saturated vanilla type synonyms, no synonyms, except family synonyms may remain.<br />
<br />
Here are some examples of admissible and illegal type instances and closed families:<br />
<haskell><br />
type family F a :: *<br />
type instance F [Int] = Int -- OK!<br />
type instance F String = Char -- OK!<br />
type instance F (F a) = a -- WRONG: type parameter mentions a type family<br />
type instance F (forall a. (a, b)) = b -- WRONG: a forall type appears in a type parameter<br />
type instance F Float = forall a.a -- WRONG: right-hand side may not be a forall type<br />
<br />
type family F2 a where -- OK!<br />
F2 (Maybe Int) = Int<br />
F2 (Maybe Bool) = Bool<br />
F2 (Maybe a) = String<br />
<br />
type family G a b :: * -> *<br />
type instance G Int = (,) -- WRONG: must be two type parameters<br />
type instance G Int Char Float = Double -- WRONG: must be two type parameters<br />
</haskell><br />
<br />
==== Closed family simplification ====<br />
<br />
Included in ghc starting 7.8.1.<br />
<br />
When dealing with closed families, simplifying the type is harder than just finding a left-hand side that matches and replacing that with a right-hand side. GHC will select an equation to use in a given type family application (the "target") if and only if the following 2 conditions hold:<br />
<br />
# There is a substitution from the variables in the equation's LHS that makes the left-hand side of the branch coincide with the target.<br />
# For each previous equation in the family: either the LHS of that equation is ''apart'' from the type family application, '''or''' the equation is ''compatible'' with the chosen equation.<br />
<br />
Now, we define ''apart'' and ''compatible'':<br />
# Two types are ''apart'' when one cannot simplify to the other, even after arbitrary type-family simplifications<br />
# Two equations are ''compatible'' if, either, their LHSs are apart or their LHSs unify and their RHSs are the same under the substitution induced by the unification.<br />
<br />
Some examples are in order:<br />
<haskell><br />
type family F a where<br />
F Int = Bool<br />
F Bool = Char<br />
F a = Bool<br />
<br />
type family And (a :: Bool) (b :: Bool) :: Bool where<br />
And False c = False<br />
And True d = d<br />
And e False = False<br />
And f True = f<br />
And g g = g<br />
</haskell><br />
<br />
In <hask>F</hask>, all pairs of equations are compatible except the second and third. The first two are compatible because their LHSs are apart. The first and third are compatible because the unifying substitution leads the RHSs to be the same. But, the second and third are not compatible because neither of these conditions holds. As a result, GHC will not use the third equation to simplify a target unless that target is apart from <hask>Bool</hask>.<br />
<br />
In <hask>And</hask>, ''every'' pair of equations is compatible, meaning GHC never has to make the extra apartness check during simplification.<br />
<br />
Why do all of this? It's a matter of type safety. Consider this example:<br />
<br />
<haskell><br />
type family J a b where<br />
J a a = Int<br />
J a b = Bool<br />
</haskell><br />
<br />
Say GHC selected the second branch just because the first doesn't apply at the moment, because two type variables are distinct. The problem is that those variables might later be instantiated at the same value, and then the first branch would have applied. You can convince this sort of inconsistency to produce <hask>unsafeCoerce</hask>.<br />
<br />
It gets worse. GHC has no internal notion of inequality, so it can't use previous, failed term-level GADT pattern matches to refine its type assumptions. For example:<br />
<br />
<haskell><br />
data G :: * -> * where<br />
GInt :: G Int<br />
GBool :: G Bool<br />
<br />
type family Foo (a :: *) :: * where<br />
Foo Int = Char<br />
Foo a = Double<br />
<br />
bar :: G a -> Foo a<br />
bar GInt = 'x'<br />
bar _ = 3.14<br />
</haskell><br />
<br />
The last line will fail to typecheck, because GHC doesn't know that the type variable <hask>a</hask> can't be <hask>Int</hask> here, even though it's obvious. The only general way to fix this is to have inequality evidence introduced into GHC, and that's a big deal and we don't know if we have the motivation for such a change yet.<br />
<br />
==== Associated type instances ====<br />
<br />
When an associated family instance is declared within a type class instance, we drop the <hask>instance</hask> keyword in the family instance. So, the <hask>[e]</hask> instance for <hask>Elem</hask> becomes:<br />
<haskell><br />
instance (Eq (Elem [e])) => Collects ([e]) where<br />
type Elem [e] = e<br />
...<br />
</haskell><br />
The most important point about associated family instances is that the type indexes corresponding to class parameters must be identical to the type given in the instance head; here this is <hask>[e]</hask>, which coincides with the only class parameter.<br />
<br />
Instances for an associated family can only appear as part of instance declarations of the class in which the family was declared - just as with the equations of the methods of a class. Also in correspondence to how methods are handled, declarations of associated types can be omitted in class instances. If an associated family instance is omitted, the corresponding instance type is not inhabited; i.e., only diverging expressions, such as <hask>undefined</hask>, can assume the type.<br />
<br />
==== Overlap ====<br />
<br />
The instance declarations of an open type family used in a single program must be compatible, in the form defined above. This condition is independent of whether the type family is associated or not, and it is not only a matter of consistency, but one of type safety. <br />
<br />
Here are two examples to illustrate the condition under which overlap is permitted.<br />
<haskell><br />
type instance F (a, Int) = [a]<br />
type instance F (Int, b) = [b] -- overlap permitted<br />
<br />
type instance G (a, Int) = [a]<br />
type instance G (Char, a) = [a] -- ILLEGAL overlap, as [Char] /= [Int]<br />
</haskell><br />
<br />
==== Decidability ====<br />
<br />
In order to guarantee that type inference in the presence of type families is decidable, we need to place a number of additional restrictions on the formation of type instance declarations (c.f., Definition 5 (Relaxed Conditions) of [http://www.cse.unsw.edu.au/~chak/papers/SPCS08.html Type Checking with Open Type Functions]). Instance declarations have the general form<br />
<haskell><br />
type instance F t1 .. tn = t<br />
</haskell><br />
where we require that for every type family application <hask>(G s1 .. sm)</hask> in <hask>t</hask>, <br />
# <hask>s1 .. sm</hask> do not contain any type family constructors,<br />
# the total number of symbols (data type constructors and type variables) in <hask>s1 .. sm</hask> is strictly smaller than in <hask>t1 .. tn</hask>, and<br />
# for every type variable <hask>a</hask>, <hask>a</hask> occurs in <hask>s1 .. sm</hask> at most as often as in <hask>t1 .. tn</hask>.<br />
These restrictions are easily verified and ensure termination of type inference. However, they are not sufficient to guarantee completeness of type inference in the presence of, so called, ''loopy equalities'', such as <hask>a ~ [F a]</hask>, where a recursive occurrence of a type variable is underneath a family application and data constructor application - see the above mentioned paper for details. <br />
<br />
If the option <tt>-XUndecidableInstances</tt> is passed to the compiler, the above restrictions are not enforced and it is on the programmer to ensure termination of the normalisation of type families during type inference.<br />
<br />
=== Equality constraints ===<br />
<br />
Type context can include equality constraints of the form <hask>t1 ~ t2</hask>, which denote that the types <hask>t1</hask> and <hask>t2</hask> need to be the same. In the presence of type families, whether two types are equal cannot generally be decided locally. Hence, the contexts of function signatures may include equality constraints, as in the following example:<br />
<haskell><br />
sumCollects :: (Collects c1, Collects c2, Elem c1 ~ Elem c2) => c1 -> c2 -> c2<br />
</haskell><br />
where we require that the element type of <hask>c1</hask> and <hask>c2</hask> are the same. In general, the types <hask>t1</hask> and <hask>t2</hask> of an equality constraint may be arbitrary monotypes; i.e., they may not contain any quantifiers, independent of whether higher-rank types are otherwise enabled.<br />
<br />
Equality constraints can also appear in class and instance contexts. The former enable a simple translation of programs using functional dependencies into programs using family synonyms instead. The general idea is to rewrite a class declaration of the form<br />
<haskell><br />
class C a b | a -> b<br />
</haskell><br />
to<br />
<haskell><br />
class (F a ~ b) => C a b where<br />
type F a<br />
</haskell><br />
That is, we represent every functional dependency (FD) <hask>a1 .. an -> b</hask> by an FD type family <hask>F a1 .. an</hask> and a superclass context equality <hask>F a1 .. an ~ b</hask>, essentially giving a name to the functional dependency. In class instances, we define the type instances of FD families in accordance with the class head. Method signatures are not affected by that process.<br />
<br />
== Frequently asked questions ==<br />
<br />
=== Comparing type families and functional dependencies ===<br />
<br />
Functional dependencies cover some of the same territory as type families. How do the two compare?<br />
There are some articles about this question:<br />
<br />
* Experiences in converting functional dependencies to type families: "[[Functional dependencies vs. type families]]"<br />
* [https://ghc.haskell.org/trac/ghc/wiki/TFvsFD GHC trac] on a comparison of functional dependencies and type families<br />
<br />
=== Injectivity, type inference, and ambiguity ===<br />
<br />
A common problem is this<br />
<haskell><br />
type family F a<br />
<br />
f :: F a -> F a<br />
f = undefined<br />
<br />
g :: F Int -> F Int<br />
g x = f x<br />
</haskell><br />
The compiler complains about the definition of <tt>g</tt> saying<br />
<code><br />
Couldn't match expected type `F Int' against inferred type `F a1'<br />
</code><br />
In type-checking <tt>g</tt>'s right hand side GHC discovers (by instantiating <tt>f</tt>'s type with a fresh type variable) that it has type <tt>F a1 -> F a1</tt> for some as-yet-unknown type <tt>a1</tt>. Now it tries to make the inferred type match <tt>g</tt>'s type signature. Well, you say, just make <tt>a1</tt> equal to <tt>Int</tt> and you are done. True, but what if there were these instances<br />
<haskell><br />
type instance F Int = Bool<br />
type instance F Char = Bool<br />
</haskell><br />
Then making <tt>a1</tt> equal to <tt>Char</tt> would ''also'' make the two types equal. Because there is (potentially) more than one choice, the program is rejected.<br />
<br />
However (and confusingly) if you omit the type signature on <tt>g</tt> altogether, thus<br />
<haskell><br />
f :: F a -> F a<br />
f = undefined<br />
<br />
g x = f x<br />
</haskell><br />
GHC will happily infer the type <tt>g :: F a -> F a</tt>. But you can't ''write'' that type signature or, indeed, the more specific one above. (Arguably this behaviour, where GHC ''infers'' a type it can't ''check'', is very confusing. I suppose we could make GHC reject both programs, with and without type signatures.)<br />
<br />
'''What is the problem?''' The nub of the issue is this: knowing that <tt>F t1</tt>=<tt>F t2</tt> does ''not'' imply that <tt>t1</tt> = <tt>t2</tt>.<br />
The difficulty is that the type function <tt>F</tt> need not be ''injective''; it can map two distinct types to the same type. For an injective type constructor like <tt>Maybe</tt>, if we know that <tt>Maybe t1</tt> = <tt>Maybe t2</tt>, then we know that <tt>t1</tt> = <tt>t2</tt>. But not so for non-injective type functions.<br />
<br />
The problem starts with <tt>f</tt>. Its type is ''ambiguous''; even if I know the argument and result types for <tt>f</tt>, I cannot use that to find the type at which <tt>a</tt> should be instantiated. (So arguably, <tt>f</tt> should be rejected as having an ambiguous type, and probably will be in future.) The situation is well known in type classes: <br />
<haskell><br />
bad :: (Read a, Show a) => String -> String<br />
bad x = show (read x)<br />
</haskell><br />
At a call of <tt>bad</tt> one cannot tell at what type <tt>a</tt> should be instantiated.<br />
<br />
The only solution is to avoid ambiguous types. In the type signature of a function, <br />
* Ensure that every type variable occurs in the part after the "<tt>=></tt>"<br />
* Ensure that every type variable appears at least once outside a type function call.<br />
Alternatively, you can use data families, which create new types and are therefore injective. The following code works:<br />
<br />
<haskell>data family F a<br />
<br />
f :: F a -> F a<br />
f = undefined<br />
<br />
g :: F Int -> F Int<br />
g x = f x</haskell><br />
<br />
== References ==<br />
<br />
* [http://www.cse.unsw.edu.au/~chak/papers/CKPM05.html Associated Types with Class.] Manuel M. T. Chakravarty, Gabriele Keller, Simon Peyton Jones, and Simon Marlow. In ''Proceedings of The 32nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'05)'', pages 1-13, ACM Press, 2005.<br />
* [http://www.cse.unsw.edu.au/~chak/papers/CKP05.html Associated Type Synonyms.] Manuel M. T. Chakravarty, Gabriele Keller, and Simon Peyton Jones. In ''Proceedings of The Tenth ACM SIGPLAN International Conference on Functional Programming'', ACM Press, pages 241-253, 2005.<br />
* [http://www.cse.unsw.edu.au/~chak/papers/SCPD07.html System F with Type Equality Coercions.] Martin Sulzmann, Manuel M. T. Chakravarty, Simon Peyton Jones, and Kevin Donnelly. In ''Proceedings of The Third ACM SIGPLAN Workshop on Types in Language Design and Implementation'', ACM Press, 2007.<br />
* [http://www.cse.unsw.edu.au/~chak/papers/SPCS08.html Type Checking With Open Type Functions.] Tom Schrijvers, Simon Peyton-Jones, Manuel M. T. Chakravarty, Martin Sulzmann. In ''Proceedings of The 13th ACM SIGPLAN International Conference on Functional Programming'', ACM Press, pages 51-62, 2008.<br />
* [[Simonpj/Talk:FunWithTypeFuns | Fun with Type Functions]] Oleg Kiselyov, Simon Peyton Jones, Chung-chieh Shan (the source for this paper can be found at http://patch-tag.com/r/schoenfinkel/typefunctions/wiki ) <br />
<br />
[[Category:Type-level programming]]<br />
[[Category:Language extensions]]<br />
[[Category:GHC|Indexed types]]</div>Anton-Latukhahttps://wiki.haskell.org/index.php?title=GHC/Type_families&diff=62966GHC/Type families2019-07-22T19:07:21Z<p>Anton-Latukha: /* Requirements to use type families */: rm out-of place bug reporting (that is for GHC wiki)</p>
<hr />
<div>{{GHCUsersGuide|glasgow_exts|type-families|a Type Family section}}<br />
<br />
<br />
Indexed type families, or '''type families''' for short, are a Haskell extension supporting ad-hoc overloading of data types. Type families are parametric types that can be assigned specialized representations based on the type parameters they are instantiated with. They are the data type analogue of [[Type class|type classes]]: families are used to define overloaded ''data'' in the same way that classes are used to define overloaded ''functions''. Type families are useful for generic programming, for creating highly parameterised library interfaces, and for creating interfaces with enhanced static information, much like dependent types.<br />
<br />
Type families come in two flavors: ''data families'' and ''type synonym families''. Data families are the indexed form of data and newtype definitions. Type synonym families are the indexed form of type synonyms. Each of these flavors can be defined in a standalone manner or ''associated'' with a type class. Standalone definitions are more general, while associated types can more clearly express how a type is used and lead to better error messages.<br />
<br />
''NB: see also Simon's [https://www.haskell.org/ghc/blog/20100930-LetGeneralisationInGhc7.html blog entry on let generalisation] for a significant change in the policy for let generalisation, driven by the type family extension. In brief: a few programs will puzzlingly fail to compile with <tt>-XTypeFamilies</tt> even though the code is legal Haskell 98.''<br />
<br />
== What are type families? ==<br />
<br />
The concept of a type family comes from type theory. An indexed type family in type theory is a partial function at the type level. Applying the function to parameters (called ''type indices'') yields a type. Type families permit a program to compute what data constructors it will operate on, rather than having them fixed statically (as with simple type systems) or treated as opaque unknowns (as with parametrically polymorphic types).<br />
<br />
Type families are to vanilla data types what type class methods are to regular functions. Vanilla polymorphic data types and functions have a single definition, which is used at all type instances. Classes and type families, on the other hand, have an interface definition and any number of instance definitions. A type family's interface definition declares its [[kind]] and its arity, or the number of type indices it takes. Instance definitions define the type family over some part of the domain.<br />
<br />
As a simple example of how type families differ from ordinary parametric data types, consider a strict list type. We can represent a list of <hask>Char</hask> in the usual way, with cons cells. We can do the same thing to represent a list of <hask>()</hask>, but since a strict <hask>()</hask> value carries no useful information, it would be more efficient to just store the length of the list. This can't be done with an ordinary parametric data type, because the data constructors used to represent the list would depend on the list's type parameter: if it's <hask>Char</hask> then the list consists of cons cells; if it's <hask>()</hask>, then the list consists of a single integer. We basically want to select between two different data types based on a type parameter. Using type families, this list type could be declared as follows:<br />
<br />
<haskell><br />
-- Declare a list-like data family<br />
data family XList a<br />
<br />
-- Declare a list-like instance for Char<br />
data instance XList Char = XCons !Char !(XList Char) | XNil<br />
<br />
-- Declare a number-like instance for ()<br />
data instance XList () = XListUnit !Int<br />
</haskell><br />
<br />
The right-hand sides of the two <code>data instance</code> declarations are exactly ordinary data definitions. In fact, a <code>data instance</code> declaration is nothing more than a shorthand for a <code>data</code> declaration followed by a <code>type instance</code> (see below) declaration. However, they define two instances of the same parametric data type, <hask>XList Char</hask> and <hask>XList ()</hask>, whereas ordinary data declarations define completely unrelated types. A recent [[Simonpj/Talk:FunWithTypeFuns|tutorial paper]] has more in-depth examples of programming with type families. <br />
<br />
[[GADT]]s bear some similarity to type families, in the sense that they allow a parametric type's constructors to depend on the type's parameters. However, all GADT constructors must be defined in one place, whereas type families can be extended. Functional dependencies are similar to type families, and many type classes that use functional dependencies can be equivalently expressed with type families. Type families provide a more functional style of type-level programming than the relational style of functional dependencies.<br />
<br />
== Requirements to use type families ==<br />
<br />
Type families are a GHC extension enabled with the <code>-XTypeFamilies</code> flag or the <code>{-# LANGUAGE TypeFamilies #-}</code> progma. The first stable release of GHC that properly supports type families is 6.10.1. Release 6.8 includes an early partial deprecated implementation.<br />
<br />
== An associated data type example ==<br />
<br />
As an example, consider Ralf Hinze's [http://www.cs.ox.ac.uk/ralf.hinze/publications/GGTries.ps.gz generalised tries], a form of generic finite maps. <br />
<br />
=== The class declaration ===<br />
<br />
We define a type class whose instances are the types that we can use as keys in our generic maps:<br />
<haskell><br />
class GMapKey k where<br />
data GMap k :: * -> *<br />
empty :: GMap k v<br />
lookup :: k -> GMap k v -> Maybe v<br />
insert :: k -> v -> GMap k v -> GMap k v<br />
</haskell><br />
The interesting part is the ''associated data family'' declaration of the class. It gives a [http://www.haskell.org/ghc/docs/latest/html/users_guide/type-families.html#data-family-declarations ''kind signature''] (here <hask>* -> *</hask>) for the associated data type <hask>GMap k</hask> - analogous to how methods receive a type signature in a class declaration.<br />
<br />
What it is important to notice is that the first parameter of the associated type <hask>GMap</hask> coincides with the class parameter of <hask>GMapKey</hask>. This indicates that also in all instances of the class, the instances of the associated data type need to have their first argument match up with the instance type. In general, the type arguments of an associated type can be a subset of the class parameters (in a multi-parameter type class) and they can appear in any order, possibly in an order other than in the class head. The latter can be useful if the associated data type is partially applied in some contexts.<br />
<br />
The second important point is that as <hask>GMap k</hask> has kind <hask>* -> *</hask> and <hask>k</hask> (implicitly) has kind <hask>*</hask>, the type constructor <hask>GMap</hask> (without an argument) has kind <hask>* -> * -> *</hask>. Consequently, we see that <hask>GMap</hask> is applied to two arguments in the signatures of the methods <hask>empty</hask>, <hask>lookup</hask>, and <hask>insert</hask>.<br />
<br />
=== An Int instance ===<br />
<br />
To use Ints as keys into generic maps, we declare an instance that simply uses <hask>Data.IntMap</hask>, thusly:<br />
<haskell><br />
instance GMapKey Int where<br />
data GMap Int v = GMapInt (Data.IntMap.IntMap v)<br />
empty = GMapInt Data.IntMap.empty<br />
lookup k (GMapInt m) = Data.IntMap.lookup k m<br />
insert k v (GMapInt m) = GMapInt (Data.IntMap.insert k v m)<br />
</haskell><br />
The <hask>Int</hask> instance of the associated data type <hask>GMap</hask> needs to have both of its parameters, but as only the first one corresponds to a class parameter, the second needs to be a type variable (here <hask>v</hask>). As mentioned before, any associated type parameter that corresponds to a class parameter must be identical to the class parameter in each instance. The right hand side of the associated data declaration is like that of any other data type.<br />
<br />
NB: At the moment, GADT syntax is not allowed for associated data types (or other indexed types). This is not a fundamental limitation, but just a shortcoming of the current implementation, which we expect to lift in the future.<br />
<br />
As an exercise, implement an instance for <hask>Char</hask> that maps back to the <hask>Int</hask> instance using the conversion functions <hask>Char.ord</hask> and <hask>Char.chr</hask>.<br />
<br />
=== A unit instance ===<br />
<br />
Generic definitions, apart from elementary types, typically cover units, products, and sums. We start here with the unit instance for <hask>GMap</hask>:<br />
<haskell><br />
instance GMapKey () where<br />
data GMap () v = GMapUnit (Maybe v)<br />
empty = GMapUnit Nothing<br />
lookup () (GMapUnit v) = v<br />
insert () v (GMapUnit _) = GMapUnit $ Just v<br />
</haskell><br />
For unit, the map is just a <hask>Maybe</hask> value.<br />
<br />
=== Product and sum instances ===<br />
<br />
Next, let us define the instances for pairs and sums (i.e., <hask>Either</hask>):<br />
<haskell><br />
instance (GMapKey a, GMapKey b) => GMapKey (a, b) where<br />
data GMap (a, b) v = GMapPair (GMap a (GMap b v))<br />
empty = GMapPair empty<br />
lookup (a, b) (GMapPair gm) = lookup a gm >>= lookup b <br />
insert (a, b) v (GMapPair gm) = GMapPair $ case lookup a gm of<br />
Nothing -> insert a (insert b v empty) gm<br />
Just gm2 -> insert a (insert b v gm2 ) gm<br />
<br />
instance (GMapKey a, GMapKey b) => GMapKey (Either a b) where<br />
data GMap (Either a b) v = GMapEither (GMap a v) (GMap b v)<br />
empty = GMapEither empty empty<br />
lookup (Left a) (GMapEither gm1 _gm2) = lookup a gm1<br />
lookup (Right b) (GMapEither _gm1 gm2 ) = lookup b gm2<br />
insert (Left a) v (GMapEither gm1 gm2) = GMapEither (insert a v gm1) gm2<br />
insert (Right b) v (GMapEither gm1 gm2) = GMapEither gm1 (insert b v gm2)<br />
</haskell><br />
If you find this code algorithmically surprising, I'd suggest to have a look at [http://www.cs.ox.ac.uk/ralf.hinze/publications/index.html#J4 Ralf Hinze's paper]. The only novelty concerning associated types, in these two instances, is that the instances have a context <hask>(GMapKey a, GMapKey b)</hask>. Consequently, the right hand sides of the associated type declarations can use <hask>GMap</hask> recursively at the key types <hask>a</hask> and <hask>b</hask> - not unlike the method definitions use the class methods recursively at the types for which the class is given in the instance context.<br />
<br />
=== Using a generic map ===<br />
<br />
Finally, some code building and querying a generic map:<br />
<haskell><br />
myGMap :: GMap (Int, Either Char ()) String<br />
myGMap = insert (5, Left 'c') "(5, Left 'c')" $<br />
insert (4, Right ()) "(4, Right ())" $<br />
insert (5, Right ()) "This is the one!" $<br />
insert (5, Right ()) "This is the two!" $<br />
insert (6, Right ()) "(6, Right ())" $<br />
insert (5, Left 'a') "(5, Left 'a')" $<br />
empty<br />
main = putStrLn $ maybe "Couldn't find key!" id $ lookup (5, Right ()) myGMap<br />
</haskell><br />
<br />
=== Download the code ===<br />
<br />
If you want to play with this example without copying it off the wiki, just download the source code[http://darcs.haskell.org/testsuite/tests/ghc-regress/indexed-types/should_run/GMapAssoc.hs] for <hask>GMap</hask> from GHC's test suite.<br />
<br />
== Detailed definition of data families ==<br />
<br />
Data families appear in two flavours: (1) they can be defined on the toplevel or (2) they can appear inside type classes (in which case they are known as associated types). The former is the more general variant, as it lacks the requirement for the type-indices to coincide with the class parameters. However, the latter can lead to more clearly structured code and compiler warnings if some type instances were - possibly accidentally - omitted. In the following, we always discuss the general toplevel form first and then cover the additional constraints placed on associated types.<br />
<br />
=== Family declarations ===<br />
<br />
Indexed data families are introduced by a signature, such as <br />
<haskell><br />
data family GMap k :: * -> *<br />
</haskell><br />
The special <hask>family</hask> distinguishes family from standard data declarations. The result kind annotation is optional and, as usual, defaults to <hask>*</hask> if omitted. An example is<br />
<haskell><br />
data family Array e<br />
</haskell><br />
Named arguments can also be given explicit kind signatures if needed. Just as with [http://www.haskell.org/ghc/docs/latest/html/users_guide/gadt.html GADT declarations] named arguments are entirely optional, so that we can declare <hask>Array</hask> alternatively with<br />
<haskell><br />
data family Array :: * -> *<br />
</haskell><br />
<br />
==== Associated family declarations ====<br />
<br />
When a data family is declared as part of a type class, we drop the <hask>family</hask> keyword. The <hask>GMap</hask> declaration takes the following form<br />
<haskell><br />
class GMapKey k where<br />
data GMap k :: * -> *<br />
...<br />
</haskell><br />
In contrast to toplevel declarations, named arguments must be used for all type parameters that are to be used as type-indices. Moreover, the argument names must be class parameters. Each class parameter may only be used at most once per associated type, but some may be omitted and they may be in an order other than in the class head. In other words: '''the named type parameters of the data declaration must be a permutation of a subset of the class variables'''. <br />
<br />
Example is admissible:<br />
<haskell><br />
class C a b c where { data T c a :: * } -- OK<br />
class C a b c where { data T a a :: * } -- Bad: repeated variable<br />
class D a where { data T a x :: * } -- Bad: x is not a class variable<br />
class D a where { data T a :: * -> * } -- OK<br />
</haskell><br />
<br />
=== Instance declarations ===<br />
<br />
Instance declarations of data and newtype families are very similar to standard data and newtype declarations. The only two differences are that the keyword <hask>data</hask> or <hask>newtype</hask> is followed by <hask>instance</hask> and that some or all of the type arguments can be non-variable types, but may not contain forall types or type synonym families. However, data families are generally allowed in type parameters, and type synonyms are allowed as long as they are fully applied and expand to a type that is itself admissible - exactly as this is required for occurrences of type synonyms in class instance parameters. For example, the <hask>Either</hask> instance for <hask>GMap</hask> is<br />
<haskell><br />
data instance GMap (Either a b) v = GMapEither (GMap a v) (GMap b v)<br />
</haskell><br />
In this example, the declaration has only one variant. In general, it can be any number.<br />
<br />
Data and newtype instance declarations are only legit when an appropriate family declaration is in scope - just like class instances require the class declaration to be visible. Moreover, each instance declaration has to conform to the kind determined by its family declaration. This implies that the number of parameters of an instance declaration matches the arity determined by the kind of the family. Although all data families are declared with the <hask>data</hask> keyword, instances can be either <hask>data</hask> or <hask>newtype</hask>s, or a mix of both.<br />
<br />
Even if type families are defined as toplevel declarations, functions that perform different computations for different family instances still need to be defined as methods of type classes. In particular, the following is not possible:<br />
<haskell><br />
data family T a<br />
data instance T Int = A<br />
data instance T Char = B<br />
nonsense :: T a -> Int<br />
nonsense A = 1 -- WRONG: These two equations together...<br />
nonsense B = 2 -- ...will produce a type error.<br />
</haskell><br />
Given the functionality provided by GADTs (Generalised Algebraic Data Types), it might seem as if a definition, such as the above, should be feasible. However, type families - in contrast to GADTs - are ''open''; i.e., new instances can always be added, possibly in other modules. Supporting pattern matching across different data instances would require a form of extensible case construct.<br />
<br />
==== Associated type instances ====<br />
<br />
When an associated family instance is declared within a type class instance, we drop the <hask>instance</hask> keyword in the family instance. So, the <hask>Either</hask> instance for <hask>GMap</hask> becomes:<br />
<haskell><br />
instance (GMapKey a, GMapKey b) => GMapKey (Either a b) where<br />
data GMap (Either a b) v = GMapEither (GMap a v) (GMap b v)<br />
...<br />
</haskell><br />
The most important point about associated family instances is that the type indices corresponding to class parameters must be identical to the type given in the instance head; here this is the first argument of <hask>GMap</hask>, namely <hask>Either a b</hask>, which coincides with the only class parameter. Any parameters to the family constructor that do not correspond to class parameters, need to be variables in every instance; here this is the variable <hask>v</hask>.<br />
<br />
Instances for an associated family can only appear as part of instance declarations of the class in which the family was declared - just as with the equations of the methods of a class. Also in correspondence to how methods are handled, declarations of associated types can be omitted in class instances. If an associated family instance is omitted, the corresponding instance type is not inhabited; i.e., only diverging expressions, such as <hask>undefined</hask>, can assume the type.<br />
<br />
==== Scoping of class parameters ====<br />
<br />
In the case of multi-parameter type classes, the visibility of class parameters in the right-hand side of associated family instances depends ''solely'' on the parameters of the data family. As an example, consider the simple class declaration<br />
<haskell><br />
class C a b where<br />
data T a<br />
</haskell><br />
Only one of the two class parameters is a parameter to the data family. Hence, the following instance declaration is invalid:<br />
<haskell><br />
instance C [c] d where<br />
data T [c] = MkT (c, d) -- WRONG!! 'd' is not in scope<br />
</haskell><br />
Here, the right-hand side of the data instance mentions the type variable <hask>d</hask> that does not occur in its left-hand side. We cannot admit such data instances as they would compromise type safety.<br />
<br />
==== Type class instances of family instances ====<br />
<br />
Type class instances of instances of data families can be defined as usual, and in particular data instance declarations can have <hask>deriving</hask> clauses. For example, we can write<br />
<haskell><br />
data GMap () v = GMapUnit (Maybe v)<br />
deriving Show<br />
</haskell><br />
which implicitly defines an instance of the form<br />
<haskell><br />
instance Show v => Show (GMap () v) where ...<br />
</haskell><br />
<br />
Note that class instances are always for particular ''instances'' of a data family and never for an entire family as a whole. This is for essentially the same reasons that we cannot define a toplevel function that performs pattern matching on the data constructors of ''different'' instances of a single type family. It would require a form of extensible case construct.<br />
<br />
==== Overlap ====<br />
<br />
The instance declarations of a data family used in a single program may not overlap at all, independent of whether they are associated or not. In contrast to type class instances, this is not only a matter of consistency, but one of type safety.<br />
<br />
=== Import and export ===<br />
<br />
The association of data constructors with type families is more dynamic than that is the case with standard data and newtype declarations. In the standard case, the notation <hask>T(..)</hask> in an import or export list denotes the type constructor and all the data constructors introduced in its declaration. However, a family declaration never introduces any data constructors; instead, data constructors are introduced by family instances. As a result, which data constructors are associated with a type family depends on the currently visible instance declarations for that family. Consequently, an import or export item of the form <hask>T(..)</hask> denotes the family constructor and all currently visible data constructors - in the case of an export item, these may be either imported or defined in the current module. The treatment of import and export items that explicitly list data constructors, such as <hask>GMap(GMapEither)</hask>, is analogous.<br />
<br />
==== Associated families ====<br />
<br />
As expected, an import or export item of the form <hask>C(..)</hask> denotes all of the class' methods and associated types. However, when associated types are explicitly listed as subitems of a class, we need some new syntax, as uppercase identifiers as subitems are usually data constructors, not type constructors. To clarify that we denote types here, each associated type name needs to be prefixed by the keyword <hask>type</hask>. So for example, when explicitly listing the components of the <hask>GMapKey</hask> class, we write <hask>GMapKey(type GMap, empty, lookup, insert)</hask>.<br />
<br />
==== Examples ====<br />
<br />
Assuming our running <hask>GMapKey</hask> class example, let us look at some export lists and their meaning:<br />
<br />
* <hask>module GMap (GMapKey) where...</hask>: Exports just the class name.<br />
* <hask>module GMap (GMapKey(..)) where...</hask>: Exports the class, the associated type <hask>GMap</hask> and the member functions <hask>empty</hask>, <hask>lookup</hask>, and <hask>insert</hask>. None of the data constructors is exported.<br />
* <hask>module GMap (GMapKey(..), GMap(..)) where...</hask>: As before, but also exports all the data constructors <hask>GMapInt</hask>, <hask>GMapChar</hask>, <hask>GMapUnit</hask>, <hask>GMapPair</hask>, and <hask>GMapEither</hask>.<br />
* <hask>module GMap (GMapKey(empty, lookup, insert), GMap(..)) where...</hask>: As before.<br />
* <hask>module GMap (GMapKey, empty, lookup, insert, GMap(..)) where...</hask>: As before.<br />
<br />
Finally, you can write <hask>GMapKey(type GMap)</hask> to denote both the class <hask>GMapKey</hask> as well as its associated type <hask>GMap</hask>. However, you cannot write <hask>GMapKey(type GMap(..))</hask> &mdash; i.e., sub-component specifications cannot be nested. To specify <hask>GMap</hask>'s data constructors, you have to list it separately.<br />
<br />
==== Instances ====<br />
<br />
Family instances are implicitly exported, just like class instances. However, this applies only to the heads of instances, not to the data constructors an instance defines.<br />
<br />
== An associated type synonym example ==<br />
<br />
Type synonym families are an alternative to functional dependencies, which makes functional dependency examples well suited to introduce type synonym families. In fact, type families are a more functional way to express the same as functional dependencies (despite the name!), as they replace the relational notation of functional dependencies by an expression-oriented notation; i.e., functions on types are really represented by functions and not relations.<br />
<br />
=== The <hask>class</hask> declaration ===<br />
<br />
Here's an example from Mark Jones' seminal paper on functional dependencies:<br />
<haskell><br />
class Collects e ce | ce -> e where<br />
empty :: ce<br />
insert :: e -> ce -> ce<br />
member :: e -> ce -> Bool<br />
toList :: ce -> [e]<br />
</haskell><br />
<br />
With associated type synonyms we can write this as<br />
<haskell><br />
class Collects ce where<br />
type Elem ce<br />
empty :: ce<br />
insert :: Elem ce -> ce -> ce<br />
member :: Elem ce -> ce -> Bool<br />
toList :: ce -> [Elem ce]<br />
</haskell><br />
Instead of the multi-parameter type class, we use a single parameter class, and the parameter <hask>e</hask><br />
turned into an associated type synonym <hask>Elem ce</hask>.<br />
<br />
=== An <hask>instance</hask>===<br />
<br />
Instances change correspondingly. An instance of the two-parameter class<br />
<haskell><br />
instance Eq e => Collects e [e] where<br />
empty = []<br />
insert e l = (e:l)<br />
member e [] = False<br />
member e (x:xs) <br />
| e == x = True<br />
| otherwise = member e xs<br />
toList l = l<br />
</haskell><br />
becomes an instance of a single-parameter class, where the dependent type parameter turns into an associated type instance declaration:<br />
<haskell><br />
instance Eq e => Collects [e] where<br />
type Elem [e] = e<br />
empty = []<br />
insert e l = (e:l)<br />
member e [] = False<br />
member e (x:xs) <br />
| e == x = True<br />
| otherwise = member e xs<br />
toList l = l<br />
</haskell><br />
<br />
=== Using generic collections ===<br />
<br />
With Functional Dependencies the code would be:<br />
<haskell><br />
sumCollects :: (Collects e c1, Collects e c2) => c1 -> c2 -> c2<br />
sumCollects c1 c2 = foldr insert c2 (toList c1)<br />
</haskell><br />
<br />
In contrast, with associated type synonyms, we get:<br />
<haskell><br />
sumCollects :: (Collects c1, Collects c2, Elem c1 ~ Elem c2) => c1 -> c2 -> c2<br />
sumCollects c1 c2 = foldr insert c2 (toList c1)<br />
</haskell><br />
<br />
== Detailed definition of type synonym families ==<br />
<br />
Type families appear in two flavours: (1) they can be defined on the toplevel or (2) they can appear inside type classes (in which case they are known as associated type synonyms). The former is the more general variant, as it lacks the requirement for the type-indices to coincide with the class parameters. However, the latter can lead to more clearly structured code and compiler warnings if some type instances were - possibly accidentally - omitted. In the following, we always discuss the general toplevel form first and then cover the additional constraints placed on associated types.<br />
<br />
=== Family declarations ===<br />
<br />
Indexed type families are introduced by a signature, such as <br />
<haskell><br />
type family Elem c :: *<br />
</haskell><br />
The special <hask>family</hask> distinguishes family from standard type declarations. The result kind annotation is optional and, as usual, defaults to <hask>*</hask> if omitted. An example is<br />
<haskell><br />
type family Elem c<br />
</haskell><br />
Parameters can also be given explicit kind signatures if needed. We call the number of parameters in a type family declaration, the family's arity, and all applications of a type family must be fully saturated w.r.t. to that arity. This requirement is unlike ordinary type synonyms and it implies that the kind of a type family is not sufficient to determine a family's arity, and hence in general, also insufficient to determine whether a type family application is well formed. As an example, consider the following declaration:<br />
<haskell><br />
type family F a b :: * -> * -- F's arity is 2, <br />
-- although its overall kind is * -> * -> * -> *<br />
</haskell><br />
Given this declaration the following are examples of well-formed and malformed types:<br />
<haskell><br />
F Char [Int] -- OK! Kind: * -> *<br />
F Char [Int] Bool -- OK! Kind: *<br />
F IO Bool -- WRONG: kind mismatch in the first argument<br />
F Bool -- WRONG: unsaturated application<br />
</haskell><br />
<br />
A top-level type family can be declared as open or closed. (Associated type<br />
families are always open.) A closed type family has all of its equations<br />
defined in one place and cannot be extended, whereas an open family can have<br />
instances spread across modules. The advantage of a closed family is that<br />
its equations are tried in order, similar to a term-level function definition:<br />
<haskell><br />
type family G a where<br />
G Int = Bool<br />
G a = Char<br />
</haskell><br />
With this definition, the type <hask>G Int</hask> becomes <hask>Bool</hask><br />
and, say, <hask>G Double</hask> becomes <hask>Char</hask>. See also [http://ghc.haskell.org/trac/ghc/wiki/NewAxioms here] for more information about closed type families.<br />
<br />
==== Associated family declarations ====<br />
<br />
When a type family is declared as part of a type class, we drop the <hask>family</hask> special. The <hask>Elem</hask> declaration takes the following form<br />
<haskell><br />
class Collects ce where<br />
type Elem ce :: *<br />
...<br />
</haskell><br />
Exactly as in the case of an associated data declaration, '''the named type parameters must be a permutation of a subset of the class parameters'''. Examples<br />
<haskell><br />
class C a b c where { type T c a :: * } -- OK<br />
class D a where { type T a x :: * } -- No: x is not a class parameter<br />
class D a where { type T a :: * -> * } -- OK<br />
</haskell><br />
<br />
=== Type instance declarations ===<br />
<br />
Instance declarations of open type families are very similar to standard type synonym declarations. The only two differences are that the keyword <hask>type</hask> is followed by <hask>instance</hask> and that some or all of the type arguments can be non-variable types, but may not contain forall types or type synonym families. However, data families are generally allowed, and type synonyms are allowed as long as they are fully applied and expand to a type that is admissible - these are the exact same requirements as for data instances. For example, the <hask>[e]</hask> instance for <hask>Elem</hask> is<br />
<haskell><br />
type instance Elem [e] = e<br />
</haskell><br />
<br />
A type family instance declaration must satisfy the following rules:<br />
* An appropriate family declaration is in scope - just like class instances require the class declaration to be visible. <br />
* The instance declaration conforms to the kind determined by its family declaration<br />
* The number of type parameters in an instance declaration matches the number of type parameters in the family declaration.<br />
* The right-hand side of a type instance must be a monotype (i.e., it may not include foralls) and after the expansion of all saturated vanilla type synonyms, no synonyms, except family synonyms may remain.<br />
<br />
Here are some examples of admissible and illegal type instances and closed families:<br />
<haskell><br />
type family F a :: *<br />
type instance F [Int] = Int -- OK!<br />
type instance F String = Char -- OK!<br />
type instance F (F a) = a -- WRONG: type parameter mentions a type family<br />
type instance F (forall a. (a, b)) = b -- WRONG: a forall type appears in a type parameter<br />
type instance F Float = forall a.a -- WRONG: right-hand side may not be a forall type<br />
<br />
type family F2 a where -- OK!<br />
F2 (Maybe Int) = Int<br />
F2 (Maybe Bool) = Bool<br />
F2 (Maybe a) = String<br />
<br />
type family G a b :: * -> *<br />
type instance G Int = (,) -- WRONG: must be two type parameters<br />
type instance G Int Char Float = Double -- WRONG: must be two type parameters<br />
</haskell><br />
<br />
==== Closed family simplification ====<br />
<br />
Included in ghc starting 7.8.1.<br />
<br />
When dealing with closed families, simplifying the type is harder than just finding a left-hand side that matches and replacing that with a right-hand side. GHC will select an equation to use in a given type family application (the "target") if and only if the following 2 conditions hold:<br />
<br />
# There is a substitution from the variables in the equation's LHS that makes the left-hand side of the branch coincide with the target.<br />
# For each previous equation in the family: either the LHS of that equation is ''apart'' from the type family application, '''or''' the equation is ''compatible'' with the chosen equation.<br />
<br />
Now, we define ''apart'' and ''compatible'':<br />
# Two types are ''apart'' when one cannot simplify to the other, even after arbitrary type-family simplifications<br />
# Two equations are ''compatible'' if, either, their LHSs are apart or their LHSs unify and their RHSs are the same under the substitution induced by the unification.<br />
<br />
Some examples are in order:<br />
<haskell><br />
type family F a where<br />
F Int = Bool<br />
F Bool = Char<br />
F a = Bool<br />
<br />
type family And (a :: Bool) (b :: Bool) :: Bool where<br />
And False c = False<br />
And True d = d<br />
And e False = False<br />
And f True = f<br />
And g g = g<br />
</haskell><br />
<br />
In <hask>F</hask>, all pairs of equations are compatible except the second and third. The first two are compatible because their LHSs are apart. The first and third are compatible because the unifying substitution leads the RHSs to be the same. But, the second and third are not compatible because neither of these conditions holds. As a result, GHC will not use the third equation to simplify a target unless that target is apart from <hask>Bool</hask>.<br />
<br />
In <hask>And</hask>, ''every'' pair of equations is compatible, meaning GHC never has to make the extra apartness check during simplification.<br />
<br />
Why do all of this? It's a matter of type safety. Consider this example:<br />
<br />
<haskell><br />
type family J a b where<br />
J a a = Int<br />
J a b = Bool<br />
</haskell><br />
<br />
Say GHC selected the second branch just because the first doesn't apply at the moment, because two type variables are distinct. The problem is that those variables might later be instantiated at the same value, and then the first branch would have applied. You can convince this sort of inconsistency to produce <hask>unsafeCoerce</hask>.<br />
<br />
It gets worse. GHC has no internal notion of inequality, so it can't use previous, failed term-level GADT pattern matches to refine its type assumptions. For example:<br />
<br />
<haskell><br />
data G :: * -> * where<br />
GInt :: G Int<br />
GBool :: G Bool<br />
<br />
type family Foo (a :: *) :: * where<br />
Foo Int = Char<br />
Foo a = Double<br />
<br />
bar :: G a -> Foo a<br />
bar GInt = 'x'<br />
bar _ = 3.14<br />
</haskell><br />
<br />
The last line will fail to typecheck, because GHC doesn't know that the type variable <hask>a</hask> can't be <hask>Int</hask> here, even though it's obvious. The only general way to fix this is to have inequality evidence introduced into GHC, and that's a big deal and we don't know if we have the motivation for such a change yet.<br />
<br />
==== Associated type instances ====<br />
<br />
When an associated family instance is declared within a type class instance, we drop the <hask>instance</hask> keyword in the family instance. So, the <hask>[e]</hask> instance for <hask>Elem</hask> becomes:<br />
<haskell><br />
instance (Eq (Elem [e])) => Collects ([e]) where<br />
type Elem [e] = e<br />
...<br />
</haskell><br />
The most important point about associated family instances is that the type indexes corresponding to class parameters must be identical to the type given in the instance head; here this is <hask>[e]</hask>, which coincides with the only class parameter.<br />
<br />
Instances for an associated family can only appear as part of instance declarations of the class in which the family was declared - just as with the equations of the methods of a class. Also in correspondence to how methods are handled, declarations of associated types can be omitted in class instances. If an associated family instance is omitted, the corresponding instance type is not inhabited; i.e., only diverging expressions, such as <hask>undefined</hask>, can assume the type.<br />
<br />
==== Overlap ====<br />
<br />
The instance declarations of an open type family used in a single program must be compatible, in the form defined above. This condition is independent of whether the type family is associated or not, and it is not only a matter of consistency, but one of type safety. <br />
<br />
Here are two examples to illustrate the condition under which overlap is permitted.<br />
<haskell><br />
type instance F (a, Int) = [a]<br />
type instance F (Int, b) = [b] -- overlap permitted<br />
<br />
type instance G (a, Int) = [a]<br />
type instance G (Char, a) = [a] -- ILLEGAL overlap, as [Char] /= [Int]<br />
</haskell><br />
<br />
==== Decidability ====<br />
<br />
In order to guarantee that type inference in the presence of type families is decidable, we need to place a number of additional restrictions on the formation of type instance declarations (c.f., Definition 5 (Relaxed Conditions) of [http://www.cse.unsw.edu.au/~chak/papers/SPCS08.html Type Checking with Open Type Functions]). Instance declarations have the general form<br />
<haskell><br />
type instance F t1 .. tn = t<br />
</haskell><br />
where we require that for every type family application <hask>(G s1 .. sm)</hask> in <hask>t</hask>, <br />
# <hask>s1 .. sm</hask> do not contain any type family constructors,<br />
# the total number of symbols (data type constructors and type variables) in <hask>s1 .. sm</hask> is strictly smaller than in <hask>t1 .. tn</hask>, and<br />
# for every type variable <hask>a</hask>, <hask>a</hask> occurs in <hask>s1 .. sm</hask> at most as often as in <hask>t1 .. tn</hask>.<br />
These restrictions are easily verified and ensure termination of type inference. However, they are not sufficient to guarantee completeness of type inference in the presence of, so called, ''loopy equalities'', such as <hask>a ~ [F a]</hask>, where a recursive occurrence of a type variable is underneath a family application and data constructor application - see the above mentioned paper for details. <br />
<br />
If the option <tt>-XUndecidableInstances</tt> is passed to the compiler, the above restrictions are not enforced and it is on the programmer to ensure termination of the normalisation of type families during type inference.<br />
<br />
=== Equality constraints ===<br />
<br />
Type context can include equality constraints of the form <hask>t1 ~ t2</hask>, which denote that the types <hask>t1</hask> and <hask>t2</hask> need to be the same. In the presence of type families, whether two types are equal cannot generally be decided locally. Hence, the contexts of function signatures may include equality constraints, as in the following example:<br />
<haskell><br />
sumCollects :: (Collects c1, Collects c2, Elem c1 ~ Elem c2) => c1 -> c2 -> c2<br />
</haskell><br />
where we require that the element type of <hask>c1</hask> and <hask>c2</hask> are the same. In general, the types <hask>t1</hask> and <hask>t2</hask> of an equality constraint may be arbitrary monotypes; i.e., they may not contain any quantifiers, independent of whether higher-rank types are otherwise enabled.<br />
<br />
Equality constraints can also appear in class and instance contexts. The former enable a simple translation of programs using functional dependencies into programs using family synonyms instead. The general idea is to rewrite a class declaration of the form<br />
<haskell><br />
class C a b | a -> b<br />
</haskell><br />
to<br />
<haskell><br />
class (F a ~ b) => C a b where<br />
type F a<br />
</haskell><br />
That is, we represent every functional dependency (FD) <hask>a1 .. an -> b</hask> by an FD type family <hask>F a1 .. an</hask> and a superclass context equality <hask>F a1 .. an ~ b</hask>, essentially giving a name to the functional dependency. In class instances, we define the type instances of FD families in accordance with the class head. Method signatures are not affected by that process.<br />
<br />
== Frequently asked questions ==<br />
<br />
=== Comparing type families and functional dependencies ===<br />
<br />
Functional dependencies cover some of the same territory as type families. How do the two compare?<br />
There are some articles about this question:<br />
<br />
* Experiences in converting functional dependencies to type families: "[[Functional dependencies vs. type families]]"<br />
* [https://ghc.haskell.org/trac/ghc/wiki/TFvsFD GHC trac] on a comparison of functional dependencies and type families<br />
<br />
=== Injectivity, type inference, and ambiguity ===<br />
<br />
A common problem is this<br />
<haskell><br />
type family F a<br />
<br />
f :: F a -> F a<br />
f = undefined<br />
<br />
g :: F Int -> F Int<br />
g x = f x<br />
</haskell><br />
The compiler complains about the definition of <tt>g</tt> saying<br />
<code><br />
Couldn't match expected type `F Int' against inferred type `F a1'<br />
</code><br />
In type-checking <tt>g</tt>'s right hand side GHC discovers (by instantiating <tt>f</tt>'s type with a fresh type variable) that it has type <tt>F a1 -> F a1</tt> for some as-yet-unknown type <tt>a1</tt>. Now it tries to make the inferred type match <tt>g</tt>'s type signature. Well, you say, just make <tt>a1</tt> equal to <tt>Int</tt> and you are done. True, but what if there were these instances<br />
<haskell><br />
type instance F Int = Bool<br />
type instance F Char = Bool<br />
</haskell><br />
Then making <tt>a1</tt> equal to <tt>Char</tt> would ''also'' make the two types equal. Because there is (potentially) more than one choice, the program is rejected.<br />
<br />
However (and confusingly) if you omit the type signature on <tt>g</tt> altogether, thus<br />
<haskell><br />
f :: F a -> F a<br />
f = undefined<br />
<br />
g x = f x<br />
</haskell><br />
GHC will happily infer the type <tt>g :: F a -> F a</tt>. But you can't ''write'' that type signature or, indeed, the more specific one above. (Arguably this behaviour, where GHC ''infers'' a type it can't ''check'', is very confusing. I suppose we could make GHC reject both programs, with and without type signatures.)<br />
<br />
'''What is the problem?''' The nub of the issue is this: knowing that <tt>F t1</tt>=<tt>F t2</tt> does ''not'' imply that <tt>t1</tt> = <tt>t2</tt>.<br />
The difficulty is that the type function <tt>F</tt> need not be ''injective''; it can map two distinct types to the same type. For an injective type constructor like <tt>Maybe</tt>, if we know that <tt>Maybe t1</tt> = <tt>Maybe t2</tt>, then we know that <tt>t1</tt> = <tt>t2</tt>. But not so for non-injective type functions.<br />
<br />
The problem starts with <tt>f</tt>. Its type is ''ambiguous''; even if I know the argument and result types for <tt>f</tt>, I cannot use that to find the type at which <tt>a</tt> should be instantiated. (So arguably, <tt>f</tt> should be rejected as having an ambiguous type, and probably will be in future.) The situation is well known in type classes: <br />
<haskell><br />
bad :: (Read a, Show a) => String -> String<br />
bad x = show (read x)<br />
</haskell><br />
At a call of <tt>bad</tt> one cannot tell at what type <tt>a</tt> should be instantiated.<br />
<br />
The only solution is to avoid ambiguous types. In the type signature of a function, <br />
* Ensure that every type variable occurs in the part after the "<tt>=></tt>"<br />
* Ensure that every type variable appears at least once outside a type function call.<br />
Alternatively, you can use data families, which create new types and are therefore injective. The following code works:<br />
<br />
<haskell>data family F a<br />
<br />
f :: F a -> F a<br />
f = undefined<br />
<br />
g :: F Int -> F Int<br />
g x = f x</haskell><br />
<br />
== References ==<br />
<br />
* [http://www.cse.unsw.edu.au/~chak/papers/CKPM05.html Associated Types with Class.] Manuel M. T. Chakravarty, Gabriele Keller, Simon Peyton Jones, and Simon Marlow. In ''Proceedings of The 32nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'05)'', pages 1-13, ACM Press, 2005.<br />
* [http://www.cse.unsw.edu.au/~chak/papers/CKP05.html Associated Type Synonyms.] Manuel M. T. Chakravarty, Gabriele Keller, and Simon Peyton Jones. In ''Proceedings of The Tenth ACM SIGPLAN International Conference on Functional Programming'', ACM Press, pages 241-253, 2005.<br />
* [http://www.cse.unsw.edu.au/~chak/papers/SCPD07.html System F with Type Equality Coercions.] Martin Sulzmann, Manuel M. T. Chakravarty, Simon Peyton Jones, and Kevin Donnelly. In ''Proceedings of The Third ACM SIGPLAN Workshop on Types in Language Design and Implementation'', ACM Press, 2007.<br />
* [http://www.cse.unsw.edu.au/~chak/papers/SPCS08.html Type Checking With Open Type Functions.] Tom Schrijvers, Simon Peyton-Jones, Manuel M. T. Chakravarty, Martin Sulzmann. In ''Proceedings of The 13th ACM SIGPLAN International Conference on Functional Programming'', ACM Press, pages 51-62, 2008.<br />
* [[Simonpj/Talk:FunWithTypeFuns | Fun with Type Functions]] Oleg Kiselyov, Simon Peyton Jones, Chung-chieh Shan (the source for this paper can be found at http://patch-tag.com/r/schoenfinkel/typefunctions/wiki ) <br />
<br />
[[Category:Type-level programming]]<br />
[[Category:Language extensions]]<br />
[[Category:GHC|Indexed types]]</div>Anton-Latukhahttps://wiki.haskell.org/index.php?title=GHC/Type_families&diff=62965GHC/Type families2019-07-22T19:02:52Z<p>Anton-Latukha: /* Requirements to use type families */: upd description</p>
<hr />
<div>{{GHCUsersGuide|glasgow_exts|type-families|a Type Family section}}<br />
<br />
<br />
Indexed type families, or '''type families''' for short, are a Haskell extension supporting ad-hoc overloading of data types. Type families are parametric types that can be assigned specialized representations based on the type parameters they are instantiated with. They are the data type analogue of [[Type class|type classes]]: families are used to define overloaded ''data'' in the same way that classes are used to define overloaded ''functions''. Type families are useful for generic programming, for creating highly parameterised library interfaces, and for creating interfaces with enhanced static information, much like dependent types.<br />
<br />
Type families come in two flavors: ''data families'' and ''type synonym families''. Data families are the indexed form of data and newtype definitions. Type synonym families are the indexed form of type synonyms. Each of these flavors can be defined in a standalone manner or ''associated'' with a type class. Standalone definitions are more general, while associated types can more clearly express how a type is used and lead to better error messages.<br />
<br />
''NB: see also Simon's [https://www.haskell.org/ghc/blog/20100930-LetGeneralisationInGhc7.html blog entry on let generalisation] for a significant change in the policy for let generalisation, driven by the type family extension. In brief: a few programs will puzzlingly fail to compile with <tt>-XTypeFamilies</tt> even though the code is legal Haskell 98.''<br />
<br />
== What are type families? ==<br />
<br />
The concept of a type family comes from type theory. An indexed type family in type theory is a partial function at the type level. Applying the function to parameters (called ''type indices'') yields a type. Type families permit a program to compute what data constructors it will operate on, rather than having them fixed statically (as with simple type systems) or treated as opaque unknowns (as with parametrically polymorphic types).<br />
<br />
Type families are to vanilla data types what type class methods are to regular functions. Vanilla polymorphic data types and functions have a single definition, which is used at all type instances. Classes and type families, on the other hand, have an interface definition and any number of instance definitions. A type family's interface definition declares its [[kind]] and its arity, or the number of type indices it takes. Instance definitions define the type family over some part of the domain.<br />
<br />
As a simple example of how type families differ from ordinary parametric data types, consider a strict list type. We can represent a list of <hask>Char</hask> in the usual way, with cons cells. We can do the same thing to represent a list of <hask>()</hask>, but since a strict <hask>()</hask> value carries no useful information, it would be more efficient to just store the length of the list. This can't be done with an ordinary parametric data type, because the data constructors used to represent the list would depend on the list's type parameter: if it's <hask>Char</hask> then the list consists of cons cells; if it's <hask>()</hask>, then the list consists of a single integer. We basically want to select between two different data types based on a type parameter. Using type families, this list type could be declared as follows:<br />
<br />
<haskell><br />
-- Declare a list-like data family<br />
data family XList a<br />
<br />
-- Declare a list-like instance for Char<br />
data instance XList Char = XCons !Char !(XList Char) | XNil<br />
<br />
-- Declare a number-like instance for ()<br />
data instance XList () = XListUnit !Int<br />
</haskell><br />
<br />
The right-hand sides of the two <code>data instance</code> declarations are exactly ordinary data definitions. In fact, a <code>data instance</code> declaration is nothing more than a shorthand for a <code>data</code> declaration followed by a <code>type instance</code> (see below) declaration. However, they define two instances of the same parametric data type, <hask>XList Char</hask> and <hask>XList ()</hask>, whereas ordinary data declarations define completely unrelated types. A recent [[Simonpj/Talk:FunWithTypeFuns|tutorial paper]] has more in-depth examples of programming with type families. <br />
<br />
[[GADT]]s bear some similarity to type families, in the sense that they allow a parametric type's constructors to depend on the type's parameters. However, all GADT constructors must be defined in one place, whereas type families can be extended. Functional dependencies are similar to type families, and many type classes that use functional dependencies can be equivalently expressed with type families. Type families provide a more functional style of type-level programming than the relational style of functional dependencies.<br />
<br />
== Requirements to use type families ==<br />
<br />
Type families are a GHC extension enabled with the <code>-XTypeFamilies</code> flag or the <code>{-# LANGUAGE TypeFamilies #-}</code> progma. The first stable release of GHC that properly supports type families is 6.10.1. Release 6.8 includes an early partial deprecated implementation.<br />
<br />
Please [http://hackage.haskell.org/trac/ghc/query?status=new&status=assigned&status=reopened&group=priority&type=bug&order=id&desc=1 report bugs] via the GHC bug tracker, ideally accompanied by a small example program that demonstrates the problem. Use the [mailto:glasgow-haskell-users@haskell.org GHC mailing list] for questions or for a discussion of this language extension and its description on this wiki page.<br />
<br />
== An associated data type example ==<br />
<br />
As an example, consider Ralf Hinze's [http://www.cs.ox.ac.uk/ralf.hinze/publications/GGTries.ps.gz generalised tries], a form of generic finite maps. <br />
<br />
=== The class declaration ===<br />
<br />
We define a type class whose instances are the types that we can use as keys in our generic maps:<br />
<haskell><br />
class GMapKey k where<br />
data GMap k :: * -> *<br />
empty :: GMap k v<br />
lookup :: k -> GMap k v -> Maybe v<br />
insert :: k -> v -> GMap k v -> GMap k v<br />
</haskell><br />
The interesting part is the ''associated data family'' declaration of the class. It gives a [http://www.haskell.org/ghc/docs/latest/html/users_guide/type-families.html#data-family-declarations ''kind signature''] (here <hask>* -> *</hask>) for the associated data type <hask>GMap k</hask> - analogous to how methods receive a type signature in a class declaration.<br />
<br />
What it is important to notice is that the first parameter of the associated type <hask>GMap</hask> coincides with the class parameter of <hask>GMapKey</hask>. This indicates that also in all instances of the class, the instances of the associated data type need to have their first argument match up with the instance type. In general, the type arguments of an associated type can be a subset of the class parameters (in a multi-parameter type class) and they can appear in any order, possibly in an order other than in the class head. The latter can be useful if the associated data type is partially applied in some contexts.<br />
<br />
The second important point is that as <hask>GMap k</hask> has kind <hask>* -> *</hask> and <hask>k</hask> (implicitly) has kind <hask>*</hask>, the type constructor <hask>GMap</hask> (without an argument) has kind <hask>* -> * -> *</hask>. Consequently, we see that <hask>GMap</hask> is applied to two arguments in the signatures of the methods <hask>empty</hask>, <hask>lookup</hask>, and <hask>insert</hask>.<br />
<br />
=== An Int instance ===<br />
<br />
To use Ints as keys into generic maps, we declare an instance that simply uses <hask>Data.IntMap</hask>, thusly:<br />
<haskell><br />
instance GMapKey Int where<br />
data GMap Int v = GMapInt (Data.IntMap.IntMap v)<br />
empty = GMapInt Data.IntMap.empty<br />
lookup k (GMapInt m) = Data.IntMap.lookup k m<br />
insert k v (GMapInt m) = GMapInt (Data.IntMap.insert k v m)<br />
</haskell><br />
The <hask>Int</hask> instance of the associated data type <hask>GMap</hask> needs to have both of its parameters, but as only the first one corresponds to a class parameter, the second needs to be a type variable (here <hask>v</hask>). As mentioned before, any associated type parameter that corresponds to a class parameter must be identical to the class parameter in each instance. The right hand side of the associated data declaration is like that of any other data type.<br />
<br />
NB: At the moment, GADT syntax is not allowed for associated data types (or other indexed types). This is not a fundamental limitation, but just a shortcoming of the current implementation, which we expect to lift in the future.<br />
<br />
As an exercise, implement an instance for <hask>Char</hask> that maps back to the <hask>Int</hask> instance using the conversion functions <hask>Char.ord</hask> and <hask>Char.chr</hask>.<br />
<br />
=== A unit instance ===<br />
<br />
Generic definitions, apart from elementary types, typically cover units, products, and sums. We start here with the unit instance for <hask>GMap</hask>:<br />
<haskell><br />
instance GMapKey () where<br />
data GMap () v = GMapUnit (Maybe v)<br />
empty = GMapUnit Nothing<br />
lookup () (GMapUnit v) = v<br />
insert () v (GMapUnit _) = GMapUnit $ Just v<br />
</haskell><br />
For unit, the map is just a <hask>Maybe</hask> value.<br />
<br />
=== Product and sum instances ===<br />
<br />
Next, let us define the instances for pairs and sums (i.e., <hask>Either</hask>):<br />
<haskell><br />
instance (GMapKey a, GMapKey b) => GMapKey (a, b) where<br />
data GMap (a, b) v = GMapPair (GMap a (GMap b v))<br />
empty = GMapPair empty<br />
lookup (a, b) (GMapPair gm) = lookup a gm >>= lookup b <br />
insert (a, b) v (GMapPair gm) = GMapPair $ case lookup a gm of<br />
Nothing -> insert a (insert b v empty) gm<br />
Just gm2 -> insert a (insert b v gm2 ) gm<br />
<br />
instance (GMapKey a, GMapKey b) => GMapKey (Either a b) where<br />
data GMap (Either a b) v = GMapEither (GMap a v) (GMap b v)<br />
empty = GMapEither empty empty<br />
lookup (Left a) (GMapEither gm1 _gm2) = lookup a gm1<br />
lookup (Right b) (GMapEither _gm1 gm2 ) = lookup b gm2<br />
insert (Left a) v (GMapEither gm1 gm2) = GMapEither (insert a v gm1) gm2<br />
insert (Right b) v (GMapEither gm1 gm2) = GMapEither gm1 (insert b v gm2)<br />
</haskell><br />
If you find this code algorithmically surprising, I'd suggest to have a look at [http://www.cs.ox.ac.uk/ralf.hinze/publications/index.html#J4 Ralf Hinze's paper]. The only novelty concerning associated types, in these two instances, is that the instances have a context <hask>(GMapKey a, GMapKey b)</hask>. Consequently, the right hand sides of the associated type declarations can use <hask>GMap</hask> recursively at the key types <hask>a</hask> and <hask>b</hask> - not unlike the method definitions use the class methods recursively at the types for which the class is given in the instance context.<br />
<br />
=== Using a generic map ===<br />
<br />
Finally, some code building and querying a generic map:<br />
<haskell><br />
myGMap :: GMap (Int, Either Char ()) String<br />
myGMap = insert (5, Left 'c') "(5, Left 'c')" $<br />
insert (4, Right ()) "(4, Right ())" $<br />
insert (5, Right ()) "This is the one!" $<br />
insert (5, Right ()) "This is the two!" $<br />
insert (6, Right ()) "(6, Right ())" $<br />
insert (5, Left 'a') "(5, Left 'a')" $<br />
empty<br />
main = putStrLn $ maybe "Couldn't find key!" id $ lookup (5, Right ()) myGMap<br />
</haskell><br />
<br />
=== Download the code ===<br />
<br />
If you want to play with this example without copying it off the wiki, just download the source code[http://darcs.haskell.org/testsuite/tests/ghc-regress/indexed-types/should_run/GMapAssoc.hs] for <hask>GMap</hask> from GHC's test suite.<br />
<br />
== Detailed definition of data families ==<br />
<br />
Data families appear in two flavours: (1) they can be defined on the toplevel or (2) they can appear inside type classes (in which case they are known as associated types). The former is the more general variant, as it lacks the requirement for the type-indices to coincide with the class parameters. However, the latter can lead to more clearly structured code and compiler warnings if some type instances were - possibly accidentally - omitted. In the following, we always discuss the general toplevel form first and then cover the additional constraints placed on associated types.<br />
<br />
=== Family declarations ===<br />
<br />
Indexed data families are introduced by a signature, such as <br />
<haskell><br />
data family GMap k :: * -> *<br />
</haskell><br />
The special <hask>family</hask> distinguishes family from standard data declarations. The result kind annotation is optional and, as usual, defaults to <hask>*</hask> if omitted. An example is<br />
<haskell><br />
data family Array e<br />
</haskell><br />
Named arguments can also be given explicit kind signatures if needed. Just as with [http://www.haskell.org/ghc/docs/latest/html/users_guide/gadt.html GADT declarations] named arguments are entirely optional, so that we can declare <hask>Array</hask> alternatively with<br />
<haskell><br />
data family Array :: * -> *<br />
</haskell><br />
<br />
==== Associated family declarations ====<br />
<br />
When a data family is declared as part of a type class, we drop the <hask>family</hask> keyword. The <hask>GMap</hask> declaration takes the following form<br />
<haskell><br />
class GMapKey k where<br />
data GMap k :: * -> *<br />
...<br />
</haskell><br />
In contrast to toplevel declarations, named arguments must be used for all type parameters that are to be used as type-indices. Moreover, the argument names must be class parameters. Each class parameter may only be used at most once per associated type, but some may be omitted and they may be in an order other than in the class head. In other words: '''the named type parameters of the data declaration must be a permutation of a subset of the class variables'''. <br />
<br />
Example is admissible:<br />
<haskell><br />
class C a b c where { data T c a :: * } -- OK<br />
class C a b c where { data T a a :: * } -- Bad: repeated variable<br />
class D a where { data T a x :: * } -- Bad: x is not a class variable<br />
class D a where { data T a :: * -> * } -- OK<br />
</haskell><br />
<br />
=== Instance declarations ===<br />
<br />
Instance declarations of data and newtype families are very similar to standard data and newtype declarations. The only two differences are that the keyword <hask>data</hask> or <hask>newtype</hask> is followed by <hask>instance</hask> and that some or all of the type arguments can be non-variable types, but may not contain forall types or type synonym families. However, data families are generally allowed in type parameters, and type synonyms are allowed as long as they are fully applied and expand to a type that is itself admissible - exactly as this is required for occurrences of type synonyms in class instance parameters. For example, the <hask>Either</hask> instance for <hask>GMap</hask> is<br />
<haskell><br />
data instance GMap (Either a b) v = GMapEither (GMap a v) (GMap b v)<br />
</haskell><br />
In this example, the declaration has only one variant. In general, it can be any number.<br />
<br />
Data and newtype instance declarations are only legit when an appropriate family declaration is in scope - just like class instances require the class declaration to be visible. Moreover, each instance declaration has to conform to the kind determined by its family declaration. This implies that the number of parameters of an instance declaration matches the arity determined by the kind of the family. Although all data families are declared with the <hask>data</hask> keyword, instances can be either <hask>data</hask> or <hask>newtype</hask>s, or a mix of both.<br />
<br />
Even if type families are defined as toplevel declarations, functions that perform different computations for different family instances still need to be defined as methods of type classes. In particular, the following is not possible:<br />
<haskell><br />
data family T a<br />
data instance T Int = A<br />
data instance T Char = B<br />
nonsense :: T a -> Int<br />
nonsense A = 1 -- WRONG: These two equations together...<br />
nonsense B = 2 -- ...will produce a type error.<br />
</haskell><br />
Given the functionality provided by GADTs (Generalised Algebraic Data Types), it might seem as if a definition, such as the above, should be feasible. However, type families - in contrast to GADTs - are ''open''; i.e., new instances can always be added, possibly in other modules. Supporting pattern matching across different data instances would require a form of extensible case construct.<br />
<br />
==== Associated type instances ====<br />
<br />
When an associated family instance is declared within a type class instance, we drop the <hask>instance</hask> keyword in the family instance. So, the <hask>Either</hask> instance for <hask>GMap</hask> becomes:<br />
<haskell><br />
instance (GMapKey a, GMapKey b) => GMapKey (Either a b) where<br />
data GMap (Either a b) v = GMapEither (GMap a v) (GMap b v)<br />
...<br />
</haskell><br />
The most important point about associated family instances is that the type indices corresponding to class parameters must be identical to the type given in the instance head; here this is the first argument of <hask>GMap</hask>, namely <hask>Either a b</hask>, which coincides with the only class parameter. Any parameters to the family constructor that do not correspond to class parameters, need to be variables in every instance; here this is the variable <hask>v</hask>.<br />
<br />
Instances for an associated family can only appear as part of instance declarations of the class in which the family was declared - just as with the equations of the methods of a class. Also in correspondence to how methods are handled, declarations of associated types can be omitted in class instances. If an associated family instance is omitted, the corresponding instance type is not inhabited; i.e., only diverging expressions, such as <hask>undefined</hask>, can assume the type.<br />
<br />
==== Scoping of class parameters ====<br />
<br />
In the case of multi-parameter type classes, the visibility of class parameters in the right-hand side of associated family instances depends ''solely'' on the parameters of the data family. As an example, consider the simple class declaration<br />
<haskell><br />
class C a b where<br />
data T a<br />
</haskell><br />
Only one of the two class parameters is a parameter to the data family. Hence, the following instance declaration is invalid:<br />
<haskell><br />
instance C [c] d where<br />
data T [c] = MkT (c, d) -- WRONG!! 'd' is not in scope<br />
</haskell><br />
Here, the right-hand side of the data instance mentions the type variable <hask>d</hask> that does not occur in its left-hand side. We cannot admit such data instances as they would compromise type safety.<br />
<br />
==== Type class instances of family instances ====<br />
<br />
Type class instances of instances of data families can be defined as usual, and in particular data instance declarations can have <hask>deriving</hask> clauses. For example, we can write<br />
<haskell><br />
data GMap () v = GMapUnit (Maybe v)<br />
deriving Show<br />
</haskell><br />
which implicitly defines an instance of the form<br />
<haskell><br />
instance Show v => Show (GMap () v) where ...<br />
</haskell><br />
<br />
Note that class instances are always for particular ''instances'' of a data family and never for an entire family as a whole. This is for essentially the same reasons that we cannot define a toplevel function that performs pattern matching on the data constructors of ''different'' instances of a single type family. It would require a form of extensible case construct.<br />
<br />
==== Overlap ====<br />
<br />
The instance declarations of a data family used in a single program may not overlap at all, independent of whether they are associated or not. In contrast to type class instances, this is not only a matter of consistency, but one of type safety.<br />
<br />
=== Import and export ===<br />
<br />
The association of data constructors with type families is more dynamic than that is the case with standard data and newtype declarations. In the standard case, the notation <hask>T(..)</hask> in an import or export list denotes the type constructor and all the data constructors introduced in its declaration. However, a family declaration never introduces any data constructors; instead, data constructors are introduced by family instances. As a result, which data constructors are associated with a type family depends on the currently visible instance declarations for that family. Consequently, an import or export item of the form <hask>T(..)</hask> denotes the family constructor and all currently visible data constructors - in the case of an export item, these may be either imported or defined in the current module. The treatment of import and export items that explicitly list data constructors, such as <hask>GMap(GMapEither)</hask>, is analogous.<br />
<br />
==== Associated families ====<br />
<br />
As expected, an import or export item of the form <hask>C(..)</hask> denotes all of the class' methods and associated types. However, when associated types are explicitly listed as subitems of a class, we need some new syntax, as uppercase identifiers as subitems are usually data constructors, not type constructors. To clarify that we denote types here, each associated type name needs to be prefixed by the keyword <hask>type</hask>. So for example, when explicitly listing the components of the <hask>GMapKey</hask> class, we write <hask>GMapKey(type GMap, empty, lookup, insert)</hask>.<br />
<br />
==== Examples ====<br />
<br />
Assuming our running <hask>GMapKey</hask> class example, let us look at some export lists and their meaning:<br />
<br />
* <hask>module GMap (GMapKey) where...</hask>: Exports just the class name.<br />
* <hask>module GMap (GMapKey(..)) where...</hask>: Exports the class, the associated type <hask>GMap</hask> and the member functions <hask>empty</hask>, <hask>lookup</hask>, and <hask>insert</hask>. None of the data constructors is exported.<br />
* <hask>module GMap (GMapKey(..), GMap(..)) where...</hask>: As before, but also exports all the data constructors <hask>GMapInt</hask>, <hask>GMapChar</hask>, <hask>GMapUnit</hask>, <hask>GMapPair</hask>, and <hask>GMapEither</hask>.<br />
* <hask>module GMap (GMapKey(empty, lookup, insert), GMap(..)) where...</hask>: As before.<br />
* <hask>module GMap (GMapKey, empty, lookup, insert, GMap(..)) where...</hask>: As before.<br />
<br />
Finally, you can write <hask>GMapKey(type GMap)</hask> to denote both the class <hask>GMapKey</hask> as well as its associated type <hask>GMap</hask>. However, you cannot write <hask>GMapKey(type GMap(..))</hask> &mdash; i.e., sub-component specifications cannot be nested. To specify <hask>GMap</hask>'s data constructors, you have to list it separately.<br />
<br />
==== Instances ====<br />
<br />
Family instances are implicitly exported, just like class instances. However, this applies only to the heads of instances, not to the data constructors an instance defines.<br />
<br />
== An associated type synonym example ==<br />
<br />
Type synonym families are an alternative to functional dependencies, which makes functional dependency examples well suited to introduce type synonym families. In fact, type families are a more functional way to express the same as functional dependencies (despite the name!), as they replace the relational notation of functional dependencies by an expression-oriented notation; i.e., functions on types are really represented by functions and not relations.<br />
<br />
=== The <hask>class</hask> declaration ===<br />
<br />
Here's an example from Mark Jones' seminal paper on functional dependencies:<br />
<haskell><br />
class Collects e ce | ce -> e where<br />
empty :: ce<br />
insert :: e -> ce -> ce<br />
member :: e -> ce -> Bool<br />
toList :: ce -> [e]<br />
</haskell><br />
<br />
With associated type synonyms we can write this as<br />
<haskell><br />
class Collects ce where<br />
type Elem ce<br />
empty :: ce<br />
insert :: Elem ce -> ce -> ce<br />
member :: Elem ce -> ce -> Bool<br />
toList :: ce -> [Elem ce]<br />
</haskell><br />
Instead of the multi-parameter type class, we use a single parameter class, and the parameter <hask>e</hask><br />
turned into an associated type synonym <hask>Elem ce</hask>.<br />
<br />
=== An <hask>instance</hask>===<br />
<br />
Instances change correspondingly. An instance of the two-parameter class<br />
<haskell><br />
instance Eq e => Collects e [e] where<br />
empty = []<br />
insert e l = (e:l)<br />
member e [] = False<br />
member e (x:xs) <br />
| e == x = True<br />
| otherwise = member e xs<br />
toList l = l<br />
</haskell><br />
becomes an instance of a single-parameter class, where the dependent type parameter turns into an associated type instance declaration:<br />
<haskell><br />
instance Eq e => Collects [e] where<br />
type Elem [e] = e<br />
empty = []<br />
insert e l = (e:l)<br />
member e [] = False<br />
member e (x:xs) <br />
| e == x = True<br />
| otherwise = member e xs<br />
toList l = l<br />
</haskell><br />
<br />
=== Using generic collections ===<br />
<br />
With Functional Dependencies the code would be:<br />
<haskell><br />
sumCollects :: (Collects e c1, Collects e c2) => c1 -> c2 -> c2<br />
sumCollects c1 c2 = foldr insert c2 (toList c1)<br />
</haskell><br />
<br />
In contrast, with associated type synonyms, we get:<br />
<haskell><br />
sumCollects :: (Collects c1, Collects c2, Elem c1 ~ Elem c2) => c1 -> c2 -> c2<br />
sumCollects c1 c2 = foldr insert c2 (toList c1)<br />
</haskell><br />
<br />
== Detailed definition of type synonym families ==<br />
<br />
Type families appear in two flavours: (1) they can be defined on the toplevel or (2) they can appear inside type classes (in which case they are known as associated type synonyms). The former is the more general variant, as it lacks the requirement for the type-indices to coincide with the class parameters. However, the latter can lead to more clearly structured code and compiler warnings if some type instances were - possibly accidentally - omitted. In the following, we always discuss the general toplevel form first and then cover the additional constraints placed on associated types.<br />
<br />
=== Family declarations ===<br />
<br />
Indexed type families are introduced by a signature, such as <br />
<haskell><br />
type family Elem c :: *<br />
</haskell><br />
The special <hask>family</hask> distinguishes family from standard type declarations. The result kind annotation is optional and, as usual, defaults to <hask>*</hask> if omitted. An example is<br />
<haskell><br />
type family Elem c<br />
</haskell><br />
Parameters can also be given explicit kind signatures if needed. We call the number of parameters in a type family declaration, the family's arity, and all applications of a type family must be fully saturated w.r.t. to that arity. This requirement is unlike ordinary type synonyms and it implies that the kind of a type family is not sufficient to determine a family's arity, and hence in general, also insufficient to determine whether a type family application is well formed. As an example, consider the following declaration:<br />
<haskell><br />
type family F a b :: * -> * -- F's arity is 2, <br />
-- although its overall kind is * -> * -> * -> *<br />
</haskell><br />
Given this declaration the following are examples of well-formed and malformed types:<br />
<haskell><br />
F Char [Int] -- OK! Kind: * -> *<br />
F Char [Int] Bool -- OK! Kind: *<br />
F IO Bool -- WRONG: kind mismatch in the first argument<br />
F Bool -- WRONG: unsaturated application<br />
</haskell><br />
<br />
A top-level type family can be declared as open or closed. (Associated type<br />
families are always open.) A closed type family has all of its equations<br />
defined in one place and cannot be extended, whereas an open family can have<br />
instances spread across modules. The advantage of a closed family is that<br />
its equations are tried in order, similar to a term-level function definition:<br />
<haskell><br />
type family G a where<br />
G Int = Bool<br />
G a = Char<br />
</haskell><br />
With this definition, the type <hask>G Int</hask> becomes <hask>Bool</hask><br />
and, say, <hask>G Double</hask> becomes <hask>Char</hask>. See also [http://ghc.haskell.org/trac/ghc/wiki/NewAxioms here] for more information about closed type families.<br />
<br />
==== Associated family declarations ====<br />
<br />
When a type family is declared as part of a type class, we drop the <hask>family</hask> special. The <hask>Elem</hask> declaration takes the following form<br />
<haskell><br />
class Collects ce where<br />
type Elem ce :: *<br />
...<br />
</haskell><br />
Exactly as in the case of an associated data declaration, '''the named type parameters must be a permutation of a subset of the class parameters'''. Examples<br />
<haskell><br />
class C a b c where { type T c a :: * } -- OK<br />
class D a where { type T a x :: * } -- No: x is not a class parameter<br />
class D a where { type T a :: * -> * } -- OK<br />
</haskell><br />
<br />
=== Type instance declarations ===<br />
<br />
Instance declarations of open type families are very similar to standard type synonym declarations. The only two differences are that the keyword <hask>type</hask> is followed by <hask>instance</hask> and that some or all of the type arguments can be non-variable types, but may not contain forall types or type synonym families. However, data families are generally allowed, and type synonyms are allowed as long as they are fully applied and expand to a type that is admissible - these are the exact same requirements as for data instances. For example, the <hask>[e]</hask> instance for <hask>Elem</hask> is<br />
<haskell><br />
type instance Elem [e] = e<br />
</haskell><br />
<br />
A type family instance declaration must satisfy the following rules:<br />
* An appropriate family declaration is in scope - just like class instances require the class declaration to be visible. <br />
* The instance declaration conforms to the kind determined by its family declaration<br />
* The number of type parameters in an instance declaration matches the number of type parameters in the family declaration.<br />
* The right-hand side of a type instance must be a monotype (i.e., it may not include foralls) and after the expansion of all saturated vanilla type synonyms, no synonyms, except family synonyms may remain.<br />
<br />
Here are some examples of admissible and illegal type instances and closed families:<br />
<haskell><br />
type family F a :: *<br />
type instance F [Int] = Int -- OK!<br />
type instance F String = Char -- OK!<br />
type instance F (F a) = a -- WRONG: type parameter mentions a type family<br />
type instance F (forall a. (a, b)) = b -- WRONG: a forall type appears in a type parameter<br />
type instance F Float = forall a.a -- WRONG: right-hand side may not be a forall type<br />
<br />
type family F2 a where -- OK!<br />
F2 (Maybe Int) = Int<br />
F2 (Maybe Bool) = Bool<br />
F2 (Maybe a) = String<br />
<br />
type family G a b :: * -> *<br />
type instance G Int = (,) -- WRONG: must be two type parameters<br />
type instance G Int Char Float = Double -- WRONG: must be two type parameters<br />
</haskell><br />
<br />
==== Closed family simplification ====<br />
<br />
Included in ghc starting 7.8.1.<br />
<br />
When dealing with closed families, simplifying the type is harder than just finding a left-hand side that matches and replacing that with a right-hand side. GHC will select an equation to use in a given type family application (the "target") if and only if the following 2 conditions hold:<br />
<br />
# There is a substitution from the variables in the equation's LHS that makes the left-hand side of the branch coincide with the target.<br />
# For each previous equation in the family: either the LHS of that equation is ''apart'' from the type family application, '''or''' the equation is ''compatible'' with the chosen equation.<br />
<br />
Now, we define ''apart'' and ''compatible'':<br />
# Two types are ''apart'' when one cannot simplify to the other, even after arbitrary type-family simplifications<br />
# Two equations are ''compatible'' if, either, their LHSs are apart or their LHSs unify and their RHSs are the same under the substitution induced by the unification.<br />
<br />
Some examples are in order:<br />
<haskell><br />
type family F a where<br />
F Int = Bool<br />
F Bool = Char<br />
F a = Bool<br />
<br />
type family And (a :: Bool) (b :: Bool) :: Bool where<br />
And False c = False<br />
And True d = d<br />
And e False = False<br />
And f True = f<br />
And g g = g<br />
</haskell><br />
<br />
In <hask>F</hask>, all pairs of equations are compatible except the second and third. The first two are compatible because their LHSs are apart. The first and third are compatible because the unifying substitution leads the RHSs to be the same. But, the second and third are not compatible because neither of these conditions holds. As a result, GHC will not use the third equation to simplify a target unless that target is apart from <hask>Bool</hask>.<br />
<br />
In <hask>And</hask>, ''every'' pair of equations is compatible, meaning GHC never has to make the extra apartness check during simplification.<br />
<br />
Why do all of this? It's a matter of type safety. Consider this example:<br />
<br />
<haskell><br />
type family J a b where<br />
J a a = Int<br />
J a b = Bool<br />
</haskell><br />
<br />
Say GHC selected the second branch just because the first doesn't apply at the moment, because two type variables are distinct. The problem is that those variables might later be instantiated at the same value, and then the first branch would have applied. You can convince this sort of inconsistency to produce <hask>unsafeCoerce</hask>.<br />
<br />
It gets worse. GHC has no internal notion of inequality, so it can't use previous, failed term-level GADT pattern matches to refine its type assumptions. For example:<br />
<br />
<haskell><br />
data G :: * -> * where<br />
GInt :: G Int<br />
GBool :: G Bool<br />
<br />
type family Foo (a :: *) :: * where<br />
Foo Int = Char<br />
Foo a = Double<br />
<br />
bar :: G a -> Foo a<br />
bar GInt = 'x'<br />
bar _ = 3.14<br />
</haskell><br />
<br />
The last line will fail to typecheck, because GHC doesn't know that the type variable <hask>a</hask> can't be <hask>Int</hask> here, even though it's obvious. The only general way to fix this is to have inequality evidence introduced into GHC, and that's a big deal and we don't know if we have the motivation for such a change yet.<br />
<br />
==== Associated type instances ====<br />
<br />
When an associated family instance is declared within a type class instance, we drop the <hask>instance</hask> keyword in the family instance. So, the <hask>[e]</hask> instance for <hask>Elem</hask> becomes:<br />
<haskell><br />
instance (Eq (Elem [e])) => Collects ([e]) where<br />
type Elem [e] = e<br />
...<br />
</haskell><br />
The most important point about associated family instances is that the type indexes corresponding to class parameters must be identical to the type given in the instance head; here this is <hask>[e]</hask>, which coincides with the only class parameter.<br />
<br />
Instances for an associated family can only appear as part of instance declarations of the class in which the family was declared - just as with the equations of the methods of a class. Also in correspondence to how methods are handled, declarations of associated types can be omitted in class instances. If an associated family instance is omitted, the corresponding instance type is not inhabited; i.e., only diverging expressions, such as <hask>undefined</hask>, can assume the type.<br />
<br />
==== Overlap ====<br />
<br />
The instance declarations of an open type family used in a single program must be compatible, in the form defined above. This condition is independent of whether the type family is associated or not, and it is not only a matter of consistency, but one of type safety. <br />
<br />
Here are two examples to illustrate the condition under which overlap is permitted.<br />
<haskell><br />
type instance F (a, Int) = [a]<br />
type instance F (Int, b) = [b] -- overlap permitted<br />
<br />
type instance G (a, Int) = [a]<br />
type instance G (Char, a) = [a] -- ILLEGAL overlap, as [Char] /= [Int]<br />
</haskell><br />
<br />
==== Decidability ====<br />
<br />
In order to guarantee that type inference in the presence of type families is decidable, we need to place a number of additional restrictions on the formation of type instance declarations (c.f., Definition 5 (Relaxed Conditions) of [http://www.cse.unsw.edu.au/~chak/papers/SPCS08.html Type Checking with Open Type Functions]). Instance declarations have the general form<br />
<haskell><br />
type instance F t1 .. tn = t<br />
</haskell><br />
where we require that for every type family application <hask>(G s1 .. sm)</hask> in <hask>t</hask>, <br />
# <hask>s1 .. sm</hask> do not contain any type family constructors,<br />
# the total number of symbols (data type constructors and type variables) in <hask>s1 .. sm</hask> is strictly smaller than in <hask>t1 .. tn</hask>, and<br />
# for every type variable <hask>a</hask>, <hask>a</hask> occurs in <hask>s1 .. sm</hask> at most as often as in <hask>t1 .. tn</hask>.<br />
These restrictions are easily verified and ensure termination of type inference. However, they are not sufficient to guarantee completeness of type inference in the presence of, so called, ''loopy equalities'', such as <hask>a ~ [F a]</hask>, where a recursive occurrence of a type variable is underneath a family application and data constructor application - see the above mentioned paper for details. <br />
<br />
If the option <tt>-XUndecidableInstances</tt> is passed to the compiler, the above restrictions are not enforced and it is on the programmer to ensure termination of the normalisation of type families during type inference.<br />
<br />
=== Equality constraints ===<br />
<br />
Type context can include equality constraints of the form <hask>t1 ~ t2</hask>, which denote that the types <hask>t1</hask> and <hask>t2</hask> need to be the same. In the presence of type families, whether two types are equal cannot generally be decided locally. Hence, the contexts of function signatures may include equality constraints, as in the following example:<br />
<haskell><br />
sumCollects :: (Collects c1, Collects c2, Elem c1 ~ Elem c2) => c1 -> c2 -> c2<br />
</haskell><br />
where we require that the element type of <hask>c1</hask> and <hask>c2</hask> are the same. In general, the types <hask>t1</hask> and <hask>t2</hask> of an equality constraint may be arbitrary monotypes; i.e., they may not contain any quantifiers, independent of whether higher-rank types are otherwise enabled.<br />
<br />
Equality constraints can also appear in class and instance contexts. The former enable a simple translation of programs using functional dependencies into programs using family synonyms instead. The general idea is to rewrite a class declaration of the form<br />
<haskell><br />
class C a b | a -> b<br />
</haskell><br />
to<br />
<haskell><br />
class (F a ~ b) => C a b where<br />
type F a<br />
</haskell><br />
That is, we represent every functional dependency (FD) <hask>a1 .. an -> b</hask> by an FD type family <hask>F a1 .. an</hask> and a superclass context equality <hask>F a1 .. an ~ b</hask>, essentially giving a name to the functional dependency. In class instances, we define the type instances of FD families in accordance with the class head. Method signatures are not affected by that process.<br />
<br />
== Frequently asked questions ==<br />
<br />
=== Comparing type families and functional dependencies ===<br />
<br />
Functional dependencies cover some of the same territory as type families. How do the two compare?<br />
There are some articles about this question:<br />
<br />
* Experiences in converting functional dependencies to type families: "[[Functional dependencies vs. type families]]"<br />
* [https://ghc.haskell.org/trac/ghc/wiki/TFvsFD GHC trac] on a comparison of functional dependencies and type families<br />
<br />
=== Injectivity, type inference, and ambiguity ===<br />
<br />
A common problem is this<br />
<haskell><br />
type family F a<br />
<br />
f :: F a -> F a<br />
f = undefined<br />
<br />
g :: F Int -> F Int<br />
g x = f x<br />
</haskell><br />
The compiler complains about the definition of <tt>g</tt> saying<br />
<code><br />
Couldn't match expected type `F Int' against inferred type `F a1'<br />
</code><br />
In type-checking <tt>g</tt>'s right hand side GHC discovers (by instantiating <tt>f</tt>'s type with a fresh type variable) that it has type <tt>F a1 -> F a1</tt> for some as-yet-unknown type <tt>a1</tt>. Now it tries to make the inferred type match <tt>g</tt>'s type signature. Well, you say, just make <tt>a1</tt> equal to <tt>Int</tt> and you are done. True, but what if there were these instances<br />
<haskell><br />
type instance F Int = Bool<br />
type instance F Char = Bool<br />
</haskell><br />
Then making <tt>a1</tt> equal to <tt>Char</tt> would ''also'' make the two types equal. Because there is (potentially) more than one choice, the program is rejected.<br />
<br />
However (and confusingly) if you omit the type signature on <tt>g</tt> altogether, thus<br />
<haskell><br />
f :: F a -> F a<br />
f = undefined<br />
<br />
g x = f x<br />
</haskell><br />
GHC will happily infer the type <tt>g :: F a -> F a</tt>. But you can't ''write'' that type signature or, indeed, the more specific one above. (Arguably this behaviour, where GHC ''infers'' a type it can't ''check'', is very confusing. I suppose we could make GHC reject both programs, with and without type signatures.)<br />
<br />
'''What is the problem?''' The nub of the issue is this: knowing that <tt>F t1</tt>=<tt>F t2</tt> does ''not'' imply that <tt>t1</tt> = <tt>t2</tt>.<br />
The difficulty is that the type function <tt>F</tt> need not be ''injective''; it can map two distinct types to the same type. For an injective type constructor like <tt>Maybe</tt>, if we know that <tt>Maybe t1</tt> = <tt>Maybe t2</tt>, then we know that <tt>t1</tt> = <tt>t2</tt>. But not so for non-injective type functions.<br />
<br />
The problem starts with <tt>f</tt>. Its type is ''ambiguous''; even if I know the argument and result types for <tt>f</tt>, I cannot use that to find the type at which <tt>a</tt> should be instantiated. (So arguably, <tt>f</tt> should be rejected as having an ambiguous type, and probably will be in future.) The situation is well known in type classes: <br />
<haskell><br />
bad :: (Read a, Show a) => String -> String<br />
bad x = show (read x)<br />
</haskell><br />
At a call of <tt>bad</tt> one cannot tell at what type <tt>a</tt> should be instantiated.<br />
<br />
The only solution is to avoid ambiguous types. In the type signature of a function, <br />
* Ensure that every type variable occurs in the part after the "<tt>=></tt>"<br />
* Ensure that every type variable appears at least once outside a type function call.<br />
Alternatively, you can use data families, which create new types and are therefore injective. The following code works:<br />
<br />
<haskell>data family F a<br />
<br />
f :: F a -> F a<br />
f = undefined<br />
<br />
g :: F Int -> F Int<br />
g x = f x</haskell><br />
<br />
== References ==<br />
<br />
* [http://www.cse.unsw.edu.au/~chak/papers/CKPM05.html Associated Types with Class.] Manuel M. T. Chakravarty, Gabriele Keller, Simon Peyton Jones, and Simon Marlow. In ''Proceedings of The 32nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'05)'', pages 1-13, ACM Press, 2005.<br />
* [http://www.cse.unsw.edu.au/~chak/papers/CKP05.html Associated Type Synonyms.] Manuel M. T. Chakravarty, Gabriele Keller, and Simon Peyton Jones. In ''Proceedings of The Tenth ACM SIGPLAN International Conference on Functional Programming'', ACM Press, pages 241-253, 2005.<br />
* [http://www.cse.unsw.edu.au/~chak/papers/SCPD07.html System F with Type Equality Coercions.] Martin Sulzmann, Manuel M. T. Chakravarty, Simon Peyton Jones, and Kevin Donnelly. In ''Proceedings of The Third ACM SIGPLAN Workshop on Types in Language Design and Implementation'', ACM Press, 2007.<br />
* [http://www.cse.unsw.edu.au/~chak/papers/SPCS08.html Type Checking With Open Type Functions.] Tom Schrijvers, Simon Peyton-Jones, Manuel M. T. Chakravarty, Martin Sulzmann. In ''Proceedings of The 13th ACM SIGPLAN International Conference on Functional Programming'', ACM Press, pages 51-62, 2008.<br />
* [[Simonpj/Talk:FunWithTypeFuns | Fun with Type Functions]] Oleg Kiselyov, Simon Peyton Jones, Chung-chieh Shan (the source for this paper can be found at http://patch-tag.com/r/schoenfinkel/typefunctions/wiki ) <br />
<br />
[[Category:Type-level programming]]<br />
[[Category:Language extensions]]<br />
[[Category:GHC|Indexed types]]</div>Anton-Latukhahttps://wiki.haskell.org/index.php?title=GHC/Type_families&diff=62964GHC/Type families2019-07-22T12:59:30Z<p>Anton-Latukha: fx /* Requirements to use type families */</p>
<hr />
<div>{{GHCUsersGuide|glasgow_exts|type-families|a Type Family section}}<br />
<br />
<br />
Indexed type families, or '''type families''' for short, are a Haskell extension supporting ad-hoc overloading of data types. Type families are parametric types that can be assigned specialized representations based on the type parameters they are instantiated with. They are the data type analogue of [[Type class|type classes]]: families are used to define overloaded ''data'' in the same way that classes are used to define overloaded ''functions''. Type families are useful for generic programming, for creating highly parameterised library interfaces, and for creating interfaces with enhanced static information, much like dependent types.<br />
<br />
Type families come in two flavors: ''data families'' and ''type synonym families''. Data families are the indexed form of data and newtype definitions. Type synonym families are the indexed form of type synonyms. Each of these flavors can be defined in a standalone manner or ''associated'' with a type class. Standalone definitions are more general, while associated types can more clearly express how a type is used and lead to better error messages.<br />
<br />
''NB: see also Simon's [https://www.haskell.org/ghc/blog/20100930-LetGeneralisationInGhc7.html blog entry on let generalisation] for a significant change in the policy for let generalisation, driven by the type family extension. In brief: a few programs will puzzlingly fail to compile with <tt>-XTypeFamilies</tt> even though the code is legal Haskell 98.''<br />
<br />
== What are type families? ==<br />
<br />
The concept of a type family comes from type theory. An indexed type family in type theory is a partial function at the type level. Applying the function to parameters (called ''type indices'') yields a type. Type families permit a program to compute what data constructors it will operate on, rather than having them fixed statically (as with simple type systems) or treated as opaque unknowns (as with parametrically polymorphic types).<br />
<br />
Type families are to vanilla data types what type class methods are to regular functions. Vanilla polymorphic data types and functions have a single definition, which is used at all type instances. Classes and type families, on the other hand, have an interface definition and any number of instance definitions. A type family's interface definition declares its [[kind]] and its arity, or the number of type indices it takes. Instance definitions define the type family over some part of the domain.<br />
<br />
As a simple example of how type families differ from ordinary parametric data types, consider a strict list type. We can represent a list of <hask>Char</hask> in the usual way, with cons cells. We can do the same thing to represent a list of <hask>()</hask>, but since a strict <hask>()</hask> value carries no useful information, it would be more efficient to just store the length of the list. This can't be done with an ordinary parametric data type, because the data constructors used to represent the list would depend on the list's type parameter: if it's <hask>Char</hask> then the list consists of cons cells; if it's <hask>()</hask>, then the list consists of a single integer. We basically want to select between two different data types based on a type parameter. Using type families, this list type could be declared as follows:<br />
<br />
<haskell><br />
-- Declare a list-like data family<br />
data family XList a<br />
<br />
-- Declare a list-like instance for Char<br />
data instance XList Char = XCons !Char !(XList Char) | XNil<br />
<br />
-- Declare a number-like instance for ()<br />
data instance XList () = XListUnit !Int<br />
</haskell><br />
<br />
The right-hand sides of the two <code>data instance</code> declarations are exactly ordinary data definitions. In fact, a <code>data instance</code> declaration is nothing more than a shorthand for a <code>data</code> declaration followed by a <code>type instance</code> (see below) declaration. However, they define two instances of the same parametric data type, <hask>XList Char</hask> and <hask>XList ()</hask>, whereas ordinary data declarations define completely unrelated types. A recent [[Simonpj/Talk:FunWithTypeFuns|tutorial paper]] has more in-depth examples of programming with type families. <br />
<br />
[[GADT]]s bear some similarity to type families, in the sense that they allow a parametric type's constructors to depend on the type's parameters. However, all GADT constructors must be defined in one place, whereas type families can be extended. Functional dependencies are similar to type families, and many type classes that use functional dependencies can be equivalently expressed with type families. Type families provide a more functional style of type-level programming than the relational style of functional dependencies.<br />
<br />
== Requirements to use type families ==<br />
<br />
Type families are a GHC extension enabled with the <code>-XTypeFamilies</code> flag or the <code>{-# LANGUAGE TypeFamilies #-}</code> pragma. The first stable release of GHC that properly supports type families is 6.10.1. (The 6.8 release included an early partial implementation, but its use is deprecated.) Please [http://hackage.haskell.org/trac/ghc/query?status=new&status=assigned&status=reopened&group=priority&type=bug&order=id&desc=1 report bugs] via the GHC bug tracker, ideally accompanied by a small example program that demonstrates the problem. Use the [mailto:glasgow-haskell-users@haskell.org GHC mailing list] for questions or for a discussion of this language extension and its description on this wiki page.<br />
<br />
== An associated data type example ==<br />
<br />
As an example, consider Ralf Hinze's [http://www.cs.ox.ac.uk/ralf.hinze/publications/GGTries.ps.gz generalised tries], a form of generic finite maps. <br />
<br />
=== The class declaration ===<br />
<br />
We define a type class whose instances are the types that we can use as keys in our generic maps:<br />
<haskell><br />
class GMapKey k where<br />
data GMap k :: * -> *<br />
empty :: GMap k v<br />
lookup :: k -> GMap k v -> Maybe v<br />
insert :: k -> v -> GMap k v -> GMap k v<br />
</haskell><br />
The interesting part is the ''associated data family'' declaration of the class. It gives a [http://www.haskell.org/ghc/docs/latest/html/users_guide/type-families.html#data-family-declarations ''kind signature''] (here <hask>* -> *</hask>) for the associated data type <hask>GMap k</hask> - analogous to how methods receive a type signature in a class declaration.<br />
<br />
What it is important to notice is that the first parameter of the associated type <hask>GMap</hask> coincides with the class parameter of <hask>GMapKey</hask>. This indicates that also in all instances of the class, the instances of the associated data type need to have their first argument match up with the instance type. In general, the type arguments of an associated type can be a subset of the class parameters (in a multi-parameter type class) and they can appear in any order, possibly in an order other than in the class head. The latter can be useful if the associated data type is partially applied in some contexts.<br />
<br />
The second important point is that as <hask>GMap k</hask> has kind <hask>* -> *</hask> and <hask>k</hask> (implicitly) has kind <hask>*</hask>, the type constructor <hask>GMap</hask> (without an argument) has kind <hask>* -> * -> *</hask>. Consequently, we see that <hask>GMap</hask> is applied to two arguments in the signatures of the methods <hask>empty</hask>, <hask>lookup</hask>, and <hask>insert</hask>.<br />
<br />
=== An Int instance ===<br />
<br />
To use Ints as keys into generic maps, we declare an instance that simply uses <hask>Data.IntMap</hask>, thusly:<br />
<haskell><br />
instance GMapKey Int where<br />
data GMap Int v = GMapInt (Data.IntMap.IntMap v)<br />
empty = GMapInt Data.IntMap.empty<br />
lookup k (GMapInt m) = Data.IntMap.lookup k m<br />
insert k v (GMapInt m) = GMapInt (Data.IntMap.insert k v m)<br />
</haskell><br />
The <hask>Int</hask> instance of the associated data type <hask>GMap</hask> needs to have both of its parameters, but as only the first one corresponds to a class parameter, the second needs to be a type variable (here <hask>v</hask>). As mentioned before, any associated type parameter that corresponds to a class parameter must be identical to the class parameter in each instance. The right hand side of the associated data declaration is like that of any other data type.<br />
<br />
NB: At the moment, GADT syntax is not allowed for associated data types (or other indexed types). This is not a fundamental limitation, but just a shortcoming of the current implementation, which we expect to lift in the future.<br />
<br />
As an exercise, implement an instance for <hask>Char</hask> that maps back to the <hask>Int</hask> instance using the conversion functions <hask>Char.ord</hask> and <hask>Char.chr</hask>.<br />
<br />
=== A unit instance ===<br />
<br />
Generic definitions, apart from elementary types, typically cover units, products, and sums. We start here with the unit instance for <hask>GMap</hask>:<br />
<haskell><br />
instance GMapKey () where<br />
data GMap () v = GMapUnit (Maybe v)<br />
empty = GMapUnit Nothing<br />
lookup () (GMapUnit v) = v<br />
insert () v (GMapUnit _) = GMapUnit $ Just v<br />
</haskell><br />
For unit, the map is just a <hask>Maybe</hask> value.<br />
<br />
=== Product and sum instances ===<br />
<br />
Next, let us define the instances for pairs and sums (i.e., <hask>Either</hask>):<br />
<haskell><br />
instance (GMapKey a, GMapKey b) => GMapKey (a, b) where<br />
data GMap (a, b) v = GMapPair (GMap a (GMap b v))<br />
empty = GMapPair empty<br />
lookup (a, b) (GMapPair gm) = lookup a gm >>= lookup b <br />
insert (a, b) v (GMapPair gm) = GMapPair $ case lookup a gm of<br />
Nothing -> insert a (insert b v empty) gm<br />
Just gm2 -> insert a (insert b v gm2 ) gm<br />
<br />
instance (GMapKey a, GMapKey b) => GMapKey (Either a b) where<br />
data GMap (Either a b) v = GMapEither (GMap a v) (GMap b v)<br />
empty = GMapEither empty empty<br />
lookup (Left a) (GMapEither gm1 _gm2) = lookup a gm1<br />
lookup (Right b) (GMapEither _gm1 gm2 ) = lookup b gm2<br />
insert (Left a) v (GMapEither gm1 gm2) = GMapEither (insert a v gm1) gm2<br />
insert (Right b) v (GMapEither gm1 gm2) = GMapEither gm1 (insert b v gm2)<br />
</haskell><br />
If you find this code algorithmically surprising, I'd suggest to have a look at [http://www.cs.ox.ac.uk/ralf.hinze/publications/index.html#J4 Ralf Hinze's paper]. The only novelty concerning associated types, in these two instances, is that the instances have a context <hask>(GMapKey a, GMapKey b)</hask>. Consequently, the right hand sides of the associated type declarations can use <hask>GMap</hask> recursively at the key types <hask>a</hask> and <hask>b</hask> - not unlike the method definitions use the class methods recursively at the types for which the class is given in the instance context.<br />
<br />
=== Using a generic map ===<br />
<br />
Finally, some code building and querying a generic map:<br />
<haskell><br />
myGMap :: GMap (Int, Either Char ()) String<br />
myGMap = insert (5, Left 'c') "(5, Left 'c')" $<br />
insert (4, Right ()) "(4, Right ())" $<br />
insert (5, Right ()) "This is the one!" $<br />
insert (5, Right ()) "This is the two!" $<br />
insert (6, Right ()) "(6, Right ())" $<br />
insert (5, Left 'a') "(5, Left 'a')" $<br />
empty<br />
main = putStrLn $ maybe "Couldn't find key!" id $ lookup (5, Right ()) myGMap<br />
</haskell><br />
<br />
=== Download the code ===<br />
<br />
If you want to play with this example without copying it off the wiki, just download the source code[http://darcs.haskell.org/testsuite/tests/ghc-regress/indexed-types/should_run/GMapAssoc.hs] for <hask>GMap</hask> from GHC's test suite.<br />
<br />
== Detailed definition of data families ==<br />
<br />
Data families appear in two flavours: (1) they can be defined on the toplevel or (2) they can appear inside type classes (in which case they are known as associated types). The former is the more general variant, as it lacks the requirement for the type-indices to coincide with the class parameters. However, the latter can lead to more clearly structured code and compiler warnings if some type instances were - possibly accidentally - omitted. In the following, we always discuss the general toplevel form first and then cover the additional constraints placed on associated types.<br />
<br />
=== Family declarations ===<br />
<br />
Indexed data families are introduced by a signature, such as <br />
<haskell><br />
data family GMap k :: * -> *<br />
</haskell><br />
The special <hask>family</hask> distinguishes family from standard data declarations. The result kind annotation is optional and, as usual, defaults to <hask>*</hask> if omitted. An example is<br />
<haskell><br />
data family Array e<br />
</haskell><br />
Named arguments can also be given explicit kind signatures if needed. Just as with [http://www.haskell.org/ghc/docs/latest/html/users_guide/gadt.html GADT declarations] named arguments are entirely optional, so that we can declare <hask>Array</hask> alternatively with<br />
<haskell><br />
data family Array :: * -> *<br />
</haskell><br />
<br />
==== Associated family declarations ====<br />
<br />
When a data family is declared as part of a type class, we drop the <hask>family</hask> keyword. The <hask>GMap</hask> declaration takes the following form<br />
<haskell><br />
class GMapKey k where<br />
data GMap k :: * -> *<br />
...<br />
</haskell><br />
In contrast to toplevel declarations, named arguments must be used for all type parameters that are to be used as type-indices. Moreover, the argument names must be class parameters. Each class parameter may only be used at most once per associated type, but some may be omitted and they may be in an order other than in the class head. In other words: '''the named type parameters of the data declaration must be a permutation of a subset of the class variables'''. <br />
<br />
Example is admissible:<br />
<haskell><br />
class C a b c where { data T c a :: * } -- OK<br />
class C a b c where { data T a a :: * } -- Bad: repeated variable<br />
class D a where { data T a x :: * } -- Bad: x is not a class variable<br />
class D a where { data T a :: * -> * } -- OK<br />
</haskell><br />
<br />
=== Instance declarations ===<br />
<br />
Instance declarations of data and newtype families are very similar to standard data and newtype declarations. The only two differences are that the keyword <hask>data</hask> or <hask>newtype</hask> is followed by <hask>instance</hask> and that some or all of the type arguments can be non-variable types, but may not contain forall types or type synonym families. However, data families are generally allowed in type parameters, and type synonyms are allowed as long as they are fully applied and expand to a type that is itself admissible - exactly as this is required for occurrences of type synonyms in class instance parameters. For example, the <hask>Either</hask> instance for <hask>GMap</hask> is<br />
<haskell><br />
data instance GMap (Either a b) v = GMapEither (GMap a v) (GMap b v)<br />
</haskell><br />
In this example, the declaration has only one variant. In general, it can be any number.<br />
<br />
Data and newtype instance declarations are only legit when an appropriate family declaration is in scope - just like class instances require the class declaration to be visible. Moreover, each instance declaration has to conform to the kind determined by its family declaration. This implies that the number of parameters of an instance declaration matches the arity determined by the kind of the family. Although all data families are declared with the <hask>data</hask> keyword, instances can be either <hask>data</hask> or <hask>newtype</hask>s, or a mix of both.<br />
<br />
Even if type families are defined as toplevel declarations, functions that perform different computations for different family instances still need to be defined as methods of type classes. In particular, the following is not possible:<br />
<haskell><br />
data family T a<br />
data instance T Int = A<br />
data instance T Char = B<br />
nonsense :: T a -> Int<br />
nonsense A = 1 -- WRONG: These two equations together...<br />
nonsense B = 2 -- ...will produce a type error.<br />
</haskell><br />
Given the functionality provided by GADTs (Generalised Algebraic Data Types), it might seem as if a definition, such as the above, should be feasible. However, type families - in contrast to GADTs - are ''open''; i.e., new instances can always be added, possibly in other modules. Supporting pattern matching across different data instances would require a form of extensible case construct.<br />
<br />
==== Associated type instances ====<br />
<br />
When an associated family instance is declared within a type class instance, we drop the <hask>instance</hask> keyword in the family instance. So, the <hask>Either</hask> instance for <hask>GMap</hask> becomes:<br />
<haskell><br />
instance (GMapKey a, GMapKey b) => GMapKey (Either a b) where<br />
data GMap (Either a b) v = GMapEither (GMap a v) (GMap b v)<br />
...<br />
</haskell><br />
The most important point about associated family instances is that the type indices corresponding to class parameters must be identical to the type given in the instance head; here this is the first argument of <hask>GMap</hask>, namely <hask>Either a b</hask>, which coincides with the only class parameter. Any parameters to the family constructor that do not correspond to class parameters, need to be variables in every instance; here this is the variable <hask>v</hask>.<br />
<br />
Instances for an associated family can only appear as part of instance declarations of the class in which the family was declared - just as with the equations of the methods of a class. Also in correspondence to how methods are handled, declarations of associated types can be omitted in class instances. If an associated family instance is omitted, the corresponding instance type is not inhabited; i.e., only diverging expressions, such as <hask>undefined</hask>, can assume the type.<br />
<br />
==== Scoping of class parameters ====<br />
<br />
In the case of multi-parameter type classes, the visibility of class parameters in the right-hand side of associated family instances depends ''solely'' on the parameters of the data family. As an example, consider the simple class declaration<br />
<haskell><br />
class C a b where<br />
data T a<br />
</haskell><br />
Only one of the two class parameters is a parameter to the data family. Hence, the following instance declaration is invalid:<br />
<haskell><br />
instance C [c] d where<br />
data T [c] = MkT (c, d) -- WRONG!! 'd' is not in scope<br />
</haskell><br />
Here, the right-hand side of the data instance mentions the type variable <hask>d</hask> that does not occur in its left-hand side. We cannot admit such data instances as they would compromise type safety.<br />
<br />
==== Type class instances of family instances ====<br />
<br />
Type class instances of instances of data families can be defined as usual, and in particular data instance declarations can have <hask>deriving</hask> clauses. For example, we can write<br />
<haskell><br />
data GMap () v = GMapUnit (Maybe v)<br />
deriving Show<br />
</haskell><br />
which implicitly defines an instance of the form<br />
<haskell><br />
instance Show v => Show (GMap () v) where ...<br />
</haskell><br />
<br />
Note that class instances are always for particular ''instances'' of a data family and never for an entire family as a whole. This is for essentially the same reasons that we cannot define a toplevel function that performs pattern matching on the data constructors of ''different'' instances of a single type family. It would require a form of extensible case construct.<br />
<br />
==== Overlap ====<br />
<br />
The instance declarations of a data family used in a single program may not overlap at all, independent of whether they are associated or not. In contrast to type class instances, this is not only a matter of consistency, but one of type safety.<br />
<br />
=== Import and export ===<br />
<br />
The association of data constructors with type families is more dynamic than that is the case with standard data and newtype declarations. In the standard case, the notation <hask>T(..)</hask> in an import or export list denotes the type constructor and all the data constructors introduced in its declaration. However, a family declaration never introduces any data constructors; instead, data constructors are introduced by family instances. As a result, which data constructors are associated with a type family depends on the currently visible instance declarations for that family. Consequently, an import or export item of the form <hask>T(..)</hask> denotes the family constructor and all currently visible data constructors - in the case of an export item, these may be either imported or defined in the current module. The treatment of import and export items that explicitly list data constructors, such as <hask>GMap(GMapEither)</hask>, is analogous.<br />
<br />
==== Associated families ====<br />
<br />
As expected, an import or export item of the form <hask>C(..)</hask> denotes all of the class' methods and associated types. However, when associated types are explicitly listed as subitems of a class, we need some new syntax, as uppercase identifiers as subitems are usually data constructors, not type constructors. To clarify that we denote types here, each associated type name needs to be prefixed by the keyword <hask>type</hask>. So for example, when explicitly listing the components of the <hask>GMapKey</hask> class, we write <hask>GMapKey(type GMap, empty, lookup, insert)</hask>.<br />
<br />
==== Examples ====<br />
<br />
Assuming our running <hask>GMapKey</hask> class example, let us look at some export lists and their meaning:<br />
<br />
* <hask>module GMap (GMapKey) where...</hask>: Exports just the class name.<br />
* <hask>module GMap (GMapKey(..)) where...</hask>: Exports the class, the associated type <hask>GMap</hask> and the member functions <hask>empty</hask>, <hask>lookup</hask>, and <hask>insert</hask>. None of the data constructors is exported.<br />
* <hask>module GMap (GMapKey(..), GMap(..)) where...</hask>: As before, but also exports all the data constructors <hask>GMapInt</hask>, <hask>GMapChar</hask>, <hask>GMapUnit</hask>, <hask>GMapPair</hask>, and <hask>GMapEither</hask>.<br />
* <hask>module GMap (GMapKey(empty, lookup, insert), GMap(..)) where...</hask>: As before.<br />
* <hask>module GMap (GMapKey, empty, lookup, insert, GMap(..)) where...</hask>: As before.<br />
<br />
Finally, you can write <hask>GMapKey(type GMap)</hask> to denote both the class <hask>GMapKey</hask> as well as its associated type <hask>GMap</hask>. However, you cannot write <hask>GMapKey(type GMap(..))</hask> &mdash; i.e., sub-component specifications cannot be nested. To specify <hask>GMap</hask>'s data constructors, you have to list it separately.<br />
<br />
==== Instances ====<br />
<br />
Family instances are implicitly exported, just like class instances. However, this applies only to the heads of instances, not to the data constructors an instance defines.<br />
<br />
== An associated type synonym example ==<br />
<br />
Type synonym families are an alternative to functional dependencies, which makes functional dependency examples well suited to introduce type synonym families. In fact, type families are a more functional way to express the same as functional dependencies (despite the name!), as they replace the relational notation of functional dependencies by an expression-oriented notation; i.e., functions on types are really represented by functions and not relations.<br />
<br />
=== The <hask>class</hask> declaration ===<br />
<br />
Here's an example from Mark Jones' seminal paper on functional dependencies:<br />
<haskell><br />
class Collects e ce | ce -> e where<br />
empty :: ce<br />
insert :: e -> ce -> ce<br />
member :: e -> ce -> Bool<br />
toList :: ce -> [e]<br />
</haskell><br />
<br />
With associated type synonyms we can write this as<br />
<haskell><br />
class Collects ce where<br />
type Elem ce<br />
empty :: ce<br />
insert :: Elem ce -> ce -> ce<br />
member :: Elem ce -> ce -> Bool<br />
toList :: ce -> [Elem ce]<br />
</haskell><br />
Instead of the multi-parameter type class, we use a single parameter class, and the parameter <hask>e</hask><br />
turned into an associated type synonym <hask>Elem ce</hask>.<br />
<br />
=== An <hask>instance</hask>===<br />
<br />
Instances change correspondingly. An instance of the two-parameter class<br />
<haskell><br />
instance Eq e => Collects e [e] where<br />
empty = []<br />
insert e l = (e:l)<br />
member e [] = False<br />
member e (x:xs) <br />
| e == x = True<br />
| otherwise = member e xs<br />
toList l = l<br />
</haskell><br />
becomes an instance of a single-parameter class, where the dependent type parameter turns into an associated type instance declaration:<br />
<haskell><br />
instance Eq e => Collects [e] where<br />
type Elem [e] = e<br />
empty = []<br />
insert e l = (e:l)<br />
member e [] = False<br />
member e (x:xs) <br />
| e == x = True<br />
| otherwise = member e xs<br />
toList l = l<br />
</haskell><br />
<br />
=== Using generic collections ===<br />
<br />
With Functional Dependencies the code would be:<br />
<haskell><br />
sumCollects :: (Collects e c1, Collects e c2) => c1 -> c2 -> c2<br />
sumCollects c1 c2 = foldr insert c2 (toList c1)<br />
</haskell><br />
<br />
In contrast, with associated type synonyms, we get:<br />
<haskell><br />
sumCollects :: (Collects c1, Collects c2, Elem c1 ~ Elem c2) => c1 -> c2 -> c2<br />
sumCollects c1 c2 = foldr insert c2 (toList c1)<br />
</haskell><br />
<br />
== Detailed definition of type synonym families ==<br />
<br />
Type families appear in two flavours: (1) they can be defined on the toplevel or (2) they can appear inside type classes (in which case they are known as associated type synonyms). The former is the more general variant, as it lacks the requirement for the type-indices to coincide with the class parameters. However, the latter can lead to more clearly structured code and compiler warnings if some type instances were - possibly accidentally - omitted. In the following, we always discuss the general toplevel form first and then cover the additional constraints placed on associated types.<br />
<br />
=== Family declarations ===<br />
<br />
Indexed type families are introduced by a signature, such as <br />
<haskell><br />
type family Elem c :: *<br />
</haskell><br />
The special <hask>family</hask> distinguishes family from standard type declarations. The result kind annotation is optional and, as usual, defaults to <hask>*</hask> if omitted. An example is<br />
<haskell><br />
type family Elem c<br />
</haskell><br />
Parameters can also be given explicit kind signatures if needed. We call the number of parameters in a type family declaration, the family's arity, and all applications of a type family must be fully saturated w.r.t. to that arity. This requirement is unlike ordinary type synonyms and it implies that the kind of a type family is not sufficient to determine a family's arity, and hence in general, also insufficient to determine whether a type family application is well formed. As an example, consider the following declaration:<br />
<haskell><br />
type family F a b :: * -> * -- F's arity is 2, <br />
-- although its overall kind is * -> * -> * -> *<br />
</haskell><br />
Given this declaration the following are examples of well-formed and malformed types:<br />
<haskell><br />
F Char [Int] -- OK! Kind: * -> *<br />
F Char [Int] Bool -- OK! Kind: *<br />
F IO Bool -- WRONG: kind mismatch in the first argument<br />
F Bool -- WRONG: unsaturated application<br />
</haskell><br />
<br />
A top-level type family can be declared as open or closed. (Associated type<br />
families are always open.) A closed type family has all of its equations<br />
defined in one place and cannot be extended, whereas an open family can have<br />
instances spread across modules. The advantage of a closed family is that<br />
its equations are tried in order, similar to a term-level function definition:<br />
<haskell><br />
type family G a where<br />
G Int = Bool<br />
G a = Char<br />
</haskell><br />
With this definition, the type <hask>G Int</hask> becomes <hask>Bool</hask><br />
and, say, <hask>G Double</hask> becomes <hask>Char</hask>. See also [http://ghc.haskell.org/trac/ghc/wiki/NewAxioms here] for more information about closed type families.<br />
<br />
==== Associated family declarations ====<br />
<br />
When a type family is declared as part of a type class, we drop the <hask>family</hask> special. The <hask>Elem</hask> declaration takes the following form<br />
<haskell><br />
class Collects ce where<br />
type Elem ce :: *<br />
...<br />
</haskell><br />
Exactly as in the case of an associated data declaration, '''the named type parameters must be a permutation of a subset of the class parameters'''. Examples<br />
<haskell><br />
class C a b c where { type T c a :: * } -- OK<br />
class D a where { type T a x :: * } -- No: x is not a class parameter<br />
class D a where { type T a :: * -> * } -- OK<br />
</haskell><br />
<br />
=== Type instance declarations ===<br />
<br />
Instance declarations of open type families are very similar to standard type synonym declarations. The only two differences are that the keyword <hask>type</hask> is followed by <hask>instance</hask> and that some or all of the type arguments can be non-variable types, but may not contain forall types or type synonym families. However, data families are generally allowed, and type synonyms are allowed as long as they are fully applied and expand to a type that is admissible - these are the exact same requirements as for data instances. For example, the <hask>[e]</hask> instance for <hask>Elem</hask> is<br />
<haskell><br />
type instance Elem [e] = e<br />
</haskell><br />
<br />
A type family instance declaration must satisfy the following rules:<br />
* An appropriate family declaration is in scope - just like class instances require the class declaration to be visible. <br />
* The instance declaration conforms to the kind determined by its family declaration<br />
* The number of type parameters in an instance declaration matches the number of type parameters in the family declaration.<br />
* The right-hand side of a type instance must be a monotype (i.e., it may not include foralls) and after the expansion of all saturated vanilla type synonyms, no synonyms, except family synonyms may remain.<br />
<br />
Here are some examples of admissible and illegal type instances and closed families:<br />
<haskell><br />
type family F a :: *<br />
type instance F [Int] = Int -- OK!<br />
type instance F String = Char -- OK!<br />
type instance F (F a) = a -- WRONG: type parameter mentions a type family<br />
type instance F (forall a. (a, b)) = b -- WRONG: a forall type appears in a type parameter<br />
type instance F Float = forall a.a -- WRONG: right-hand side may not be a forall type<br />
<br />
type family F2 a where -- OK!<br />
F2 (Maybe Int) = Int<br />
F2 (Maybe Bool) = Bool<br />
F2 (Maybe a) = String<br />
<br />
type family G a b :: * -> *<br />
type instance G Int = (,) -- WRONG: must be two type parameters<br />
type instance G Int Char Float = Double -- WRONG: must be two type parameters<br />
</haskell><br />
<br />
==== Closed family simplification ====<br />
<br />
Included in ghc starting 7.8.1.<br />
<br />
When dealing with closed families, simplifying the type is harder than just finding a left-hand side that matches and replacing that with a right-hand side. GHC will select an equation to use in a given type family application (the "target") if and only if the following 2 conditions hold:<br />
<br />
# There is a substitution from the variables in the equation's LHS that makes the left-hand side of the branch coincide with the target.<br />
# For each previous equation in the family: either the LHS of that equation is ''apart'' from the type family application, '''or''' the equation is ''compatible'' with the chosen equation.<br />
<br />
Now, we define ''apart'' and ''compatible'':<br />
# Two types are ''apart'' when one cannot simplify to the other, even after arbitrary type-family simplifications<br />
# Two equations are ''compatible'' if, either, their LHSs are apart or their LHSs unify and their RHSs are the same under the substitution induced by the unification.<br />
<br />
Some examples are in order:<br />
<haskell><br />
type family F a where<br />
F Int = Bool<br />
F Bool = Char<br />
F a = Bool<br />
<br />
type family And (a :: Bool) (b :: Bool) :: Bool where<br />
And False c = False<br />
And True d = d<br />
And e False = False<br />
And f True = f<br />
And g g = g<br />
</haskell><br />
<br />
In <hask>F</hask>, all pairs of equations are compatible except the second and third. The first two are compatible because their LHSs are apart. The first and third are compatible because the unifying substitution leads the RHSs to be the same. But, the second and third are not compatible because neither of these conditions holds. As a result, GHC will not use the third equation to simplify a target unless that target is apart from <hask>Bool</hask>.<br />
<br />
In <hask>And</hask>, ''every'' pair of equations is compatible, meaning GHC never has to make the extra apartness check during simplification.<br />
<br />
Why do all of this? It's a matter of type safety. Consider this example:<br />
<br />
<haskell><br />
type family J a b where<br />
J a a = Int<br />
J a b = Bool<br />
</haskell><br />
<br />
Say GHC selected the second branch just because the first doesn't apply at the moment, because two type variables are distinct. The problem is that those variables might later be instantiated at the same value, and then the first branch would have applied. You can convince this sort of inconsistency to produce <hask>unsafeCoerce</hask>.<br />
<br />
It gets worse. GHC has no internal notion of inequality, so it can't use previous, failed term-level GADT pattern matches to refine its type assumptions. For example:<br />
<br />
<haskell><br />
data G :: * -> * where<br />
GInt :: G Int<br />
GBool :: G Bool<br />
<br />
type family Foo (a :: *) :: * where<br />
Foo Int = Char<br />
Foo a = Double<br />
<br />
bar :: G a -> Foo a<br />
bar GInt = 'x'<br />
bar _ = 3.14<br />
</haskell><br />
<br />
The last line will fail to typecheck, because GHC doesn't know that the type variable <hask>a</hask> can't be <hask>Int</hask> here, even though it's obvious. The only general way to fix this is to have inequality evidence introduced into GHC, and that's a big deal and we don't know if we have the motivation for such a change yet.<br />
<br />
==== Associated type instances ====<br />
<br />
When an associated family instance is declared within a type class instance, we drop the <hask>instance</hask> keyword in the family instance. So, the <hask>[e]</hask> instance for <hask>Elem</hask> becomes:<br />
<haskell><br />
instance (Eq (Elem [e])) => Collects ([e]) where<br />
type Elem [e] = e<br />
...<br />
</haskell><br />
The most important point about associated family instances is that the type indexes corresponding to class parameters must be identical to the type given in the instance head; here this is <hask>[e]</hask>, which coincides with the only class parameter.<br />
<br />
Instances for an associated family can only appear as part of instance declarations of the class in which the family was declared - just as with the equations of the methods of a class. Also in correspondence to how methods are handled, declarations of associated types can be omitted in class instances. If an associated family instance is omitted, the corresponding instance type is not inhabited; i.e., only diverging expressions, such as <hask>undefined</hask>, can assume the type.<br />
<br />
==== Overlap ====<br />
<br />
The instance declarations of an open type family used in a single program must be compatible, in the form defined above. This condition is independent of whether the type family is associated or not, and it is not only a matter of consistency, but one of type safety. <br />
<br />
Here are two examples to illustrate the condition under which overlap is permitted.<br />
<haskell><br />
type instance F (a, Int) = [a]<br />
type instance F (Int, b) = [b] -- overlap permitted<br />
<br />
type instance G (a, Int) = [a]<br />
type instance G (Char, a) = [a] -- ILLEGAL overlap, as [Char] /= [Int]<br />
</haskell><br />
<br />
==== Decidability ====<br />
<br />
In order to guarantee that type inference in the presence of type families is decidable, we need to place a number of additional restrictions on the formation of type instance declarations (c.f., Definition 5 (Relaxed Conditions) of [http://www.cse.unsw.edu.au/~chak/papers/SPCS08.html Type Checking with Open Type Functions]). Instance declarations have the general form<br />
<haskell><br />
type instance F t1 .. tn = t<br />
</haskell><br />
where we require that for every type family application <hask>(G s1 .. sm)</hask> in <hask>t</hask>, <br />
# <hask>s1 .. sm</hask> do not contain any type family constructors,<br />
# the total number of symbols (data type constructors and type variables) in <hask>s1 .. sm</hask> is strictly smaller than in <hask>t1 .. tn</hask>, and<br />
# for every type variable <hask>a</hask>, <hask>a</hask> occurs in <hask>s1 .. sm</hask> at most as often as in <hask>t1 .. tn</hask>.<br />
These restrictions are easily verified and ensure termination of type inference. However, they are not sufficient to guarantee completeness of type inference in the presence of, so called, ''loopy equalities'', such as <hask>a ~ [F a]</hask>, where a recursive occurrence of a type variable is underneath a family application and data constructor application - see the above mentioned paper for details. <br />
<br />
If the option <tt>-XUndecidableInstances</tt> is passed to the compiler, the above restrictions are not enforced and it is on the programmer to ensure termination of the normalisation of type families during type inference.<br />
<br />
=== Equality constraints ===<br />
<br />
Type context can include equality constraints of the form <hask>t1 ~ t2</hask>, which denote that the types <hask>t1</hask> and <hask>t2</hask> need to be the same. In the presence of type families, whether two types are equal cannot generally be decided locally. Hence, the contexts of function signatures may include equality constraints, as in the following example:<br />
<haskell><br />
sumCollects :: (Collects c1, Collects c2, Elem c1 ~ Elem c2) => c1 -> c2 -> c2<br />
</haskell><br />
where we require that the element type of <hask>c1</hask> and <hask>c2</hask> are the same. In general, the types <hask>t1</hask> and <hask>t2</hask> of an equality constraint may be arbitrary monotypes; i.e., they may not contain any quantifiers, independent of whether higher-rank types are otherwise enabled.<br />
<br />
Equality constraints can also appear in class and instance contexts. The former enable a simple translation of programs using functional dependencies into programs using family synonyms instead. The general idea is to rewrite a class declaration of the form<br />
<haskell><br />
class C a b | a -> b<br />
</haskell><br />
to<br />
<haskell><br />
class (F a ~ b) => C a b where<br />
type F a<br />
</haskell><br />
That is, we represent every functional dependency (FD) <hask>a1 .. an -> b</hask> by an FD type family <hask>F a1 .. an</hask> and a superclass context equality <hask>F a1 .. an ~ b</hask>, essentially giving a name to the functional dependency. In class instances, we define the type instances of FD families in accordance with the class head. Method signatures are not affected by that process.<br />
<br />
== Frequently asked questions ==<br />
<br />
=== Comparing type families and functional dependencies ===<br />
<br />
Functional dependencies cover some of the same territory as type families. How do the two compare?<br />
There are some articles about this question:<br />
<br />
* Experiences in converting functional dependencies to type families: "[[Functional dependencies vs. type families]]"<br />
* [https://ghc.haskell.org/trac/ghc/wiki/TFvsFD GHC trac] on a comparison of functional dependencies and type families<br />
<br />
=== Injectivity, type inference, and ambiguity ===<br />
<br />
A common problem is this<br />
<haskell><br />
type family F a<br />
<br />
f :: F a -> F a<br />
f = undefined<br />
<br />
g :: F Int -> F Int<br />
g x = f x<br />
</haskell><br />
The compiler complains about the definition of <tt>g</tt> saying<br />
<code><br />
Couldn't match expected type `F Int' against inferred type `F a1'<br />
</code><br />
In type-checking <tt>g</tt>'s right hand side GHC discovers (by instantiating <tt>f</tt>'s type with a fresh type variable) that it has type <tt>F a1 -> F a1</tt> for some as-yet-unknown type <tt>a1</tt>. Now it tries to make the inferred type match <tt>g</tt>'s type signature. Well, you say, just make <tt>a1</tt> equal to <tt>Int</tt> and you are done. True, but what if there were these instances<br />
<haskell><br />
type instance F Int = Bool<br />
type instance F Char = Bool<br />
</haskell><br />
Then making <tt>a1</tt> equal to <tt>Char</tt> would ''also'' make the two types equal. Because there is (potentially) more than one choice, the program is rejected.<br />
<br />
However (and confusingly) if you omit the type signature on <tt>g</tt> altogether, thus<br />
<haskell><br />
f :: F a -> F a<br />
f = undefined<br />
<br />
g x = f x<br />
</haskell><br />
GHC will happily infer the type <tt>g :: F a -> F a</tt>. But you can't ''write'' that type signature or, indeed, the more specific one above. (Arguably this behaviour, where GHC ''infers'' a type it can't ''check'', is very confusing. I suppose we could make GHC reject both programs, with and without type signatures.)<br />
<br />
'''What is the problem?''' The nub of the issue is this: knowing that <tt>F t1</tt>=<tt>F t2</tt> does ''not'' imply that <tt>t1</tt> = <tt>t2</tt>.<br />
The difficulty is that the type function <tt>F</tt> need not be ''injective''; it can map two distinct types to the same type. For an injective type constructor like <tt>Maybe</tt>, if we know that <tt>Maybe t1</tt> = <tt>Maybe t2</tt>, then we know that <tt>t1</tt> = <tt>t2</tt>. But not so for non-injective type functions.<br />
<br />
The problem starts with <tt>f</tt>. Its type is ''ambiguous''; even if I know the argument and result types for <tt>f</tt>, I cannot use that to find the type at which <tt>a</tt> should be instantiated. (So arguably, <tt>f</tt> should be rejected as having an ambiguous type, and probably will be in future.) The situation is well known in type classes: <br />
<haskell><br />
bad :: (Read a, Show a) => String -> String<br />
bad x = show (read x)<br />
</haskell><br />
At a call of <tt>bad</tt> one cannot tell at what type <tt>a</tt> should be instantiated.<br />
<br />
The only solution is to avoid ambiguous types. In the type signature of a function, <br />
* Ensure that every type variable occurs in the part after the "<tt>=></tt>"<br />
* Ensure that every type variable appears at least once outside a type function call.<br />
Alternatively, you can use data families, which create new types and are therefore injective. The following code works:<br />
<br />
<haskell>data family F a<br />
<br />
f :: F a -> F a<br />
f = undefined<br />
<br />
g :: F Int -> F Int<br />
g x = f x</haskell><br />
<br />
== References ==<br />
<br />
* [http://www.cse.unsw.edu.au/~chak/papers/CKPM05.html Associated Types with Class.] Manuel M. T. Chakravarty, Gabriele Keller, Simon Peyton Jones, and Simon Marlow. In ''Proceedings of The 32nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'05)'', pages 1-13, ACM Press, 2005.<br />
* [http://www.cse.unsw.edu.au/~chak/papers/CKP05.html Associated Type Synonyms.] Manuel M. T. Chakravarty, Gabriele Keller, and Simon Peyton Jones. In ''Proceedings of The Tenth ACM SIGPLAN International Conference on Functional Programming'', ACM Press, pages 241-253, 2005.<br />
* [http://www.cse.unsw.edu.au/~chak/papers/SCPD07.html System F with Type Equality Coercions.] Martin Sulzmann, Manuel M. T. Chakravarty, Simon Peyton Jones, and Kevin Donnelly. In ''Proceedings of The Third ACM SIGPLAN Workshop on Types in Language Design and Implementation'', ACM Press, 2007.<br />
* [http://www.cse.unsw.edu.au/~chak/papers/SPCS08.html Type Checking With Open Type Functions.] Tom Schrijvers, Simon Peyton-Jones, Manuel M. T. Chakravarty, Martin Sulzmann. In ''Proceedings of The 13th ACM SIGPLAN International Conference on Functional Programming'', ACM Press, pages 51-62, 2008.<br />
* [[Simonpj/Talk:FunWithTypeFuns | Fun with Type Functions]] Oleg Kiselyov, Simon Peyton Jones, Chung-chieh Shan (the source for this paper can be found at http://patch-tag.com/r/schoenfinkel/typefunctions/wiki ) <br />
<br />
[[Category:Type-level programming]]<br />
[[Category:Language extensions]]<br />
[[Category:GHC|Indexed types]]</div>Anton-Latukhahttps://wiki.haskell.org/index.php?title=GHC/Type_families&diff=62963GHC/Type families2019-07-22T12:58:46Z<p>Anton-Latukha: ch /* (What do I need -> Requirements) to use type families? */</p>
<hr />
<div>{{GHCUsersGuide|glasgow_exts|type-families|a Type Family section}}<br />
<br />
<br />
Indexed type families, or '''type families''' for short, are a Haskell extension supporting ad-hoc overloading of data types. Type families are parametric types that can be assigned specialized representations based on the type parameters they are instantiated with. They are the data type analogue of [[Type class|type classes]]: families are used to define overloaded ''data'' in the same way that classes are used to define overloaded ''functions''. Type families are useful for generic programming, for creating highly parameterised library interfaces, and for creating interfaces with enhanced static information, much like dependent types.<br />
<br />
Type families come in two flavors: ''data families'' and ''type synonym families''. Data families are the indexed form of data and newtype definitions. Type synonym families are the indexed form of type synonyms. Each of these flavors can be defined in a standalone manner or ''associated'' with a type class. Standalone definitions are more general, while associated types can more clearly express how a type is used and lead to better error messages.<br />
<br />
''NB: see also Simon's [https://www.haskell.org/ghc/blog/20100930-LetGeneralisationInGhc7.html blog entry on let generalisation] for a significant change in the policy for let generalisation, driven by the type family extension. In brief: a few programs will puzzlingly fail to compile with <tt>-XTypeFamilies</tt> even though the code is legal Haskell 98.''<br />
<br />
== What are type families? ==<br />
<br />
The concept of a type family comes from type theory. An indexed type family in type theory is a partial function at the type level. Applying the function to parameters (called ''type indices'') yields a type. Type families permit a program to compute what data constructors it will operate on, rather than having them fixed statically (as with simple type systems) or treated as opaque unknowns (as with parametrically polymorphic types).<br />
<br />
Type families are to vanilla data types what type class methods are to regular functions. Vanilla polymorphic data types and functions have a single definition, which is used at all type instances. Classes and type families, on the other hand, have an interface definition and any number of instance definitions. A type family's interface definition declares its [[kind]] and its arity, or the number of type indices it takes. Instance definitions define the type family over some part of the domain.<br />
<br />
As a simple example of how type families differ from ordinary parametric data types, consider a strict list type. We can represent a list of <hask>Char</hask> in the usual way, with cons cells. We can do the same thing to represent a list of <hask>()</hask>, but since a strict <hask>()</hask> value carries no useful information, it would be more efficient to just store the length of the list. This can't be done with an ordinary parametric data type, because the data constructors used to represent the list would depend on the list's type parameter: if it's <hask>Char</hask> then the list consists of cons cells; if it's <hask>()</hask>, then the list consists of a single integer. We basically want to select between two different data types based on a type parameter. Using type families, this list type could be declared as follows:<br />
<br />
<haskell><br />
-- Declare a list-like data family<br />
data family XList a<br />
<br />
-- Declare a list-like instance for Char<br />
data instance XList Char = XCons !Char !(XList Char) | XNil<br />
<br />
-- Declare a number-like instance for ()<br />
data instance XList () = XListUnit !Int<br />
</haskell><br />
<br />
The right-hand sides of the two <code>data instance</code> declarations are exactly ordinary data definitions. In fact, a <code>data instance</code> declaration is nothing more than a shorthand for a <code>data</code> declaration followed by a <code>type instance</code> (see below) declaration. However, they define two instances of the same parametric data type, <hask>XList Char</hask> and <hask>XList ()</hask>, whereas ordinary data declarations define completely unrelated types. A recent [[Simonpj/Talk:FunWithTypeFuns|tutorial paper]] has more in-depth examples of programming with type families. <br />
<br />
[[GADT]]s bear some similarity to type families, in the sense that they allow a parametric type's constructors to depend on the type's parameters. However, all GADT constructors must be defined in one place, whereas type families can be extended. Functional dependencies are similar to type families, and many type classes that use functional dependencies can be equivalently expressed with type families. Type families provide a more functional style of type-level programming than the relational style of functional dependencies.<br />
<br />
== Requirements to use type families? ==<br />
<br />
Type families are a GHC extension enabled with the <code>-XTypeFamilies</code> flag or the <code>{-# LANGUAGE TypeFamilies #-}</code> pragma. The first stable release of GHC that properly supports type families is 6.10.1. (The 6.8 release included an early partial implementation, but its use is deprecated.) Please [http://hackage.haskell.org/trac/ghc/query?status=new&status=assigned&status=reopened&group=priority&type=bug&order=id&desc=1 report bugs] via the GHC bug tracker, ideally accompanied by a small example program that demonstrates the problem. Use the [mailto:glasgow-haskell-users@haskell.org GHC mailing list] for questions or for a discussion of this language extension and its description on this wiki page.<br />
<br />
== An associated data type example ==<br />
<br />
As an example, consider Ralf Hinze's [http://www.cs.ox.ac.uk/ralf.hinze/publications/GGTries.ps.gz generalised tries], a form of generic finite maps. <br />
<br />
=== The class declaration ===<br />
<br />
We define a type class whose instances are the types that we can use as keys in our generic maps:<br />
<haskell><br />
class GMapKey k where<br />
data GMap k :: * -> *<br />
empty :: GMap k v<br />
lookup :: k -> GMap k v -> Maybe v<br />
insert :: k -> v -> GMap k v -> GMap k v<br />
</haskell><br />
The interesting part is the ''associated data family'' declaration of the class. It gives a [http://www.haskell.org/ghc/docs/latest/html/users_guide/type-families.html#data-family-declarations ''kind signature''] (here <hask>* -> *</hask>) for the associated data type <hask>GMap k</hask> - analogous to how methods receive a type signature in a class declaration.<br />
<br />
What it is important to notice is that the first parameter of the associated type <hask>GMap</hask> coincides with the class parameter of <hask>GMapKey</hask>. This indicates that also in all instances of the class, the instances of the associated data type need to have their first argument match up with the instance type. In general, the type arguments of an associated type can be a subset of the class parameters (in a multi-parameter type class) and they can appear in any order, possibly in an order other than in the class head. The latter can be useful if the associated data type is partially applied in some contexts.<br />
<br />
The second important point is that as <hask>GMap k</hask> has kind <hask>* -> *</hask> and <hask>k</hask> (implicitly) has kind <hask>*</hask>, the type constructor <hask>GMap</hask> (without an argument) has kind <hask>* -> * -> *</hask>. Consequently, we see that <hask>GMap</hask> is applied to two arguments in the signatures of the methods <hask>empty</hask>, <hask>lookup</hask>, and <hask>insert</hask>.<br />
<br />
=== An Int instance ===<br />
<br />
To use Ints as keys into generic maps, we declare an instance that simply uses <hask>Data.IntMap</hask>, thusly:<br />
<haskell><br />
instance GMapKey Int where<br />
data GMap Int v = GMapInt (Data.IntMap.IntMap v)<br />
empty = GMapInt Data.IntMap.empty<br />
lookup k (GMapInt m) = Data.IntMap.lookup k m<br />
insert k v (GMapInt m) = GMapInt (Data.IntMap.insert k v m)<br />
</haskell><br />
The <hask>Int</hask> instance of the associated data type <hask>GMap</hask> needs to have both of its parameters, but as only the first one corresponds to a class parameter, the second needs to be a type variable (here <hask>v</hask>). As mentioned before, any associated type parameter that corresponds to a class parameter must be identical to the class parameter in each instance. The right hand side of the associated data declaration is like that of any other data type.<br />
<br />
NB: At the moment, GADT syntax is not allowed for associated data types (or other indexed types). This is not a fundamental limitation, but just a shortcoming of the current implementation, which we expect to lift in the future.<br />
<br />
As an exercise, implement an instance for <hask>Char</hask> that maps back to the <hask>Int</hask> instance using the conversion functions <hask>Char.ord</hask> and <hask>Char.chr</hask>.<br />
<br />
=== A unit instance ===<br />
<br />
Generic definitions, apart from elementary types, typically cover units, products, and sums. We start here with the unit instance for <hask>GMap</hask>:<br />
<haskell><br />
instance GMapKey () where<br />
data GMap () v = GMapUnit (Maybe v)<br />
empty = GMapUnit Nothing<br />
lookup () (GMapUnit v) = v<br />
insert () v (GMapUnit _) = GMapUnit $ Just v<br />
</haskell><br />
For unit, the map is just a <hask>Maybe</hask> value.<br />
<br />
=== Product and sum instances ===<br />
<br />
Next, let us define the instances for pairs and sums (i.e., <hask>Either</hask>):<br />
<haskell><br />
instance (GMapKey a, GMapKey b) => GMapKey (a, b) where<br />
data GMap (a, b) v = GMapPair (GMap a (GMap b v))<br />
empty = GMapPair empty<br />
lookup (a, b) (GMapPair gm) = lookup a gm >>= lookup b <br />
insert (a, b) v (GMapPair gm) = GMapPair $ case lookup a gm of<br />
Nothing -> insert a (insert b v empty) gm<br />
Just gm2 -> insert a (insert b v gm2 ) gm<br />
<br />
instance (GMapKey a, GMapKey b) => GMapKey (Either a b) where<br />
data GMap (Either a b) v = GMapEither (GMap a v) (GMap b v)<br />
empty = GMapEither empty empty<br />
lookup (Left a) (GMapEither gm1 _gm2) = lookup a gm1<br />
lookup (Right b) (GMapEither _gm1 gm2 ) = lookup b gm2<br />
insert (Left a) v (GMapEither gm1 gm2) = GMapEither (insert a v gm1) gm2<br />
insert (Right b) v (GMapEither gm1 gm2) = GMapEither gm1 (insert b v gm2)<br />
</haskell><br />
If you find this code algorithmically surprising, I'd suggest to have a look at [http://www.cs.ox.ac.uk/ralf.hinze/publications/index.html#J4 Ralf Hinze's paper]. The only novelty concerning associated types, in these two instances, is that the instances have a context <hask>(GMapKey a, GMapKey b)</hask>. Consequently, the right hand sides of the associated type declarations can use <hask>GMap</hask> recursively at the key types <hask>a</hask> and <hask>b</hask> - not unlike the method definitions use the class methods recursively at the types for which the class is given in the instance context.<br />
<br />
=== Using a generic map ===<br />
<br />
Finally, some code building and querying a generic map:<br />
<haskell><br />
myGMap :: GMap (Int, Either Char ()) String<br />
myGMap = insert (5, Left 'c') "(5, Left 'c')" $<br />
insert (4, Right ()) "(4, Right ())" $<br />
insert (5, Right ()) "This is the one!" $<br />
insert (5, Right ()) "This is the two!" $<br />
insert (6, Right ()) "(6, Right ())" $<br />
insert (5, Left 'a') "(5, Left 'a')" $<br />
empty<br />
main = putStrLn $ maybe "Couldn't find key!" id $ lookup (5, Right ()) myGMap<br />
</haskell><br />
<br />
=== Download the code ===<br />
<br />
If you want to play with this example without copying it off the wiki, just download the source code[http://darcs.haskell.org/testsuite/tests/ghc-regress/indexed-types/should_run/GMapAssoc.hs] for <hask>GMap</hask> from GHC's test suite.<br />
<br />
== Detailed definition of data families ==<br />
<br />
Data families appear in two flavours: (1) they can be defined on the toplevel or (2) they can appear inside type classes (in which case they are known as associated types). The former is the more general variant, as it lacks the requirement for the type-indices to coincide with the class parameters. However, the latter can lead to more clearly structured code and compiler warnings if some type instances were - possibly accidentally - omitted. In the following, we always discuss the general toplevel form first and then cover the additional constraints placed on associated types.<br />
<br />
=== Family declarations ===<br />
<br />
Indexed data families are introduced by a signature, such as <br />
<haskell><br />
data family GMap k :: * -> *<br />
</haskell><br />
The special <hask>family</hask> distinguishes family from standard data declarations. The result kind annotation is optional and, as usual, defaults to <hask>*</hask> if omitted. An example is<br />
<haskell><br />
data family Array e<br />
</haskell><br />
Named arguments can also be given explicit kind signatures if needed. Just as with [http://www.haskell.org/ghc/docs/latest/html/users_guide/gadt.html GADT declarations] named arguments are entirely optional, so that we can declare <hask>Array</hask> alternatively with<br />
<haskell><br />
data family Array :: * -> *<br />
</haskell><br />
<br />
==== Associated family declarations ====<br />
<br />
When a data family is declared as part of a type class, we drop the <hask>family</hask> keyword. The <hask>GMap</hask> declaration takes the following form<br />
<haskell><br />
class GMapKey k where<br />
data GMap k :: * -> *<br />
...<br />
</haskell><br />
In contrast to toplevel declarations, named arguments must be used for all type parameters that are to be used as type-indices. Moreover, the argument names must be class parameters. Each class parameter may only be used at most once per associated type, but some may be omitted and they may be in an order other than in the class head. In other words: '''the named type parameters of the data declaration must be a permutation of a subset of the class variables'''. <br />
<br />
Example is admissible:<br />
<haskell><br />
class C a b c where { data T c a :: * } -- OK<br />
class C a b c where { data T a a :: * } -- Bad: repeated variable<br />
class D a where { data T a x :: * } -- Bad: x is not a class variable<br />
class D a where { data T a :: * -> * } -- OK<br />
</haskell><br />
<br />
=== Instance declarations ===<br />
<br />
Instance declarations of data and newtype families are very similar to standard data and newtype declarations. The only two differences are that the keyword <hask>data</hask> or <hask>newtype</hask> is followed by <hask>instance</hask> and that some or all of the type arguments can be non-variable types, but may not contain forall types or type synonym families. However, data families are generally allowed in type parameters, and type synonyms are allowed as long as they are fully applied and expand to a type that is itself admissible - exactly as this is required for occurrences of type synonyms in class instance parameters. For example, the <hask>Either</hask> instance for <hask>GMap</hask> is<br />
<haskell><br />
data instance GMap (Either a b) v = GMapEither (GMap a v) (GMap b v)<br />
</haskell><br />
In this example, the declaration has only one variant. In general, it can be any number.<br />
<br />
Data and newtype instance declarations are only legit when an appropriate family declaration is in scope - just like class instances require the class declaration to be visible. Moreover, each instance declaration has to conform to the kind determined by its family declaration. This implies that the number of parameters of an instance declaration matches the arity determined by the kind of the family. Although all data families are declared with the <hask>data</hask> keyword, instances can be either <hask>data</hask> or <hask>newtype</hask>s, or a mix of both.<br />
<br />
Even if type families are defined as toplevel declarations, functions that perform different computations for different family instances still need to be defined as methods of type classes. In particular, the following is not possible:<br />
<haskell><br />
data family T a<br />
data instance T Int = A<br />
data instance T Char = B<br />
nonsense :: T a -> Int<br />
nonsense A = 1 -- WRONG: These two equations together...<br />
nonsense B = 2 -- ...will produce a type error.<br />
</haskell><br />
Given the functionality provided by GADTs (Generalised Algebraic Data Types), it might seem as if a definition, such as the above, should be feasible. However, type families - in contrast to GADTs - are ''open''; i.e., new instances can always be added, possibly in other modules. Supporting pattern matching across different data instances would require a form of extensible case construct.<br />
<br />
==== Associated type instances ====<br />
<br />
When an associated family instance is declared within a type class instance, we drop the <hask>instance</hask> keyword in the family instance. So, the <hask>Either</hask> instance for <hask>GMap</hask> becomes:<br />
<haskell><br />
instance (GMapKey a, GMapKey b) => GMapKey (Either a b) where<br />
data GMap (Either a b) v = GMapEither (GMap a v) (GMap b v)<br />
...<br />
</haskell><br />
The most important point about associated family instances is that the type indices corresponding to class parameters must be identical to the type given in the instance head; here this is the first argument of <hask>GMap</hask>, namely <hask>Either a b</hask>, which coincides with the only class parameter. Any parameters to the family constructor that do not correspond to class parameters, need to be variables in every instance; here this is the variable <hask>v</hask>.<br />
<br />
Instances for an associated family can only appear as part of instance declarations of the class in which the family was declared - just as with the equations of the methods of a class. Also in correspondence to how methods are handled, declarations of associated types can be omitted in class instances. If an associated family instance is omitted, the corresponding instance type is not inhabited; i.e., only diverging expressions, such as <hask>undefined</hask>, can assume the type.<br />
<br />
==== Scoping of class parameters ====<br />
<br />
In the case of multi-parameter type classes, the visibility of class parameters in the right-hand side of associated family instances depends ''solely'' on the parameters of the data family. As an example, consider the simple class declaration<br />
<haskell><br />
class C a b where<br />
data T a<br />
</haskell><br />
Only one of the two class parameters is a parameter to the data family. Hence, the following instance declaration is invalid:<br />
<haskell><br />
instance C [c] d where<br />
data T [c] = MkT (c, d) -- WRONG!! 'd' is not in scope<br />
</haskell><br />
Here, the right-hand side of the data instance mentions the type variable <hask>d</hask> that does not occur in its left-hand side. We cannot admit such data instances as they would compromise type safety.<br />
<br />
==== Type class instances of family instances ====<br />
<br />
Type class instances of instances of data families can be defined as usual, and in particular data instance declarations can have <hask>deriving</hask> clauses. For example, we can write<br />
<haskell><br />
data GMap () v = GMapUnit (Maybe v)<br />
deriving Show<br />
</haskell><br />
which implicitly defines an instance of the form<br />
<haskell><br />
instance Show v => Show (GMap () v) where ...<br />
</haskell><br />
<br />
Note that class instances are always for particular ''instances'' of a data family and never for an entire family as a whole. This is for essentially the same reasons that we cannot define a toplevel function that performs pattern matching on the data constructors of ''different'' instances of a single type family. It would require a form of extensible case construct.<br />
<br />
==== Overlap ====<br />
<br />
The instance declarations of a data family used in a single program may not overlap at all, independent of whether they are associated or not. In contrast to type class instances, this is not only a matter of consistency, but one of type safety.<br />
<br />
=== Import and export ===<br />
<br />
The association of data constructors with type families is more dynamic than that is the case with standard data and newtype declarations. In the standard case, the notation <hask>T(..)</hask> in an import or export list denotes the type constructor and all the data constructors introduced in its declaration. However, a family declaration never introduces any data constructors; instead, data constructors are introduced by family instances. As a result, which data constructors are associated with a type family depends on the currently visible instance declarations for that family. Consequently, an import or export item of the form <hask>T(..)</hask> denotes the family constructor and all currently visible data constructors - in the case of an export item, these may be either imported or defined in the current module. The treatment of import and export items that explicitly list data constructors, such as <hask>GMap(GMapEither)</hask>, is analogous.<br />
<br />
==== Associated families ====<br />
<br />
As expected, an import or export item of the form <hask>C(..)</hask> denotes all of the class' methods and associated types. However, when associated types are explicitly listed as subitems of a class, we need some new syntax, as uppercase identifiers as subitems are usually data constructors, not type constructors. To clarify that we denote types here, each associated type name needs to be prefixed by the keyword <hask>type</hask>. So for example, when explicitly listing the components of the <hask>GMapKey</hask> class, we write <hask>GMapKey(type GMap, empty, lookup, insert)</hask>.<br />
<br />
==== Examples ====<br />
<br />
Assuming our running <hask>GMapKey</hask> class example, let us look at some export lists and their meaning:<br />
<br />
* <hask>module GMap (GMapKey) where...</hask>: Exports just the class name.<br />
* <hask>module GMap (GMapKey(..)) where...</hask>: Exports the class, the associated type <hask>GMap</hask> and the member functions <hask>empty</hask>, <hask>lookup</hask>, and <hask>insert</hask>. None of the data constructors is exported.<br />
* <hask>module GMap (GMapKey(..), GMap(..)) where...</hask>: As before, but also exports all the data constructors <hask>GMapInt</hask>, <hask>GMapChar</hask>, <hask>GMapUnit</hask>, <hask>GMapPair</hask>, and <hask>GMapEither</hask>.<br />
* <hask>module GMap (GMapKey(empty, lookup, insert), GMap(..)) where...</hask>: As before.<br />
* <hask>module GMap (GMapKey, empty, lookup, insert, GMap(..)) where...</hask>: As before.<br />
<br />
Finally, you can write <hask>GMapKey(type GMap)</hask> to denote both the class <hask>GMapKey</hask> as well as its associated type <hask>GMap</hask>. However, you cannot write <hask>GMapKey(type GMap(..))</hask> &mdash; i.e., sub-component specifications cannot be nested. To specify <hask>GMap</hask>'s data constructors, you have to list it separately.<br />
<br />
==== Instances ====<br />
<br />
Family instances are implicitly exported, just like class instances. However, this applies only to the heads of instances, not to the data constructors an instance defines.<br />
<br />
== An associated type synonym example ==<br />
<br />
Type synonym families are an alternative to functional dependencies, which makes functional dependency examples well suited to introduce type synonym families. In fact, type families are a more functional way to express the same as functional dependencies (despite the name!), as they replace the relational notation of functional dependencies by an expression-oriented notation; i.e., functions on types are really represented by functions and not relations.<br />
<br />
=== The <hask>class</hask> declaration ===<br />
<br />
Here's an example from Mark Jones' seminal paper on functional dependencies:<br />
<haskell><br />
class Collects e ce | ce -> e where<br />
empty :: ce<br />
insert :: e -> ce -> ce<br />
member :: e -> ce -> Bool<br />
toList :: ce -> [e]<br />
</haskell><br />
<br />
With associated type synonyms we can write this as<br />
<haskell><br />
class Collects ce where<br />
type Elem ce<br />
empty :: ce<br />
insert :: Elem ce -> ce -> ce<br />
member :: Elem ce -> ce -> Bool<br />
toList :: ce -> [Elem ce]<br />
</haskell><br />
Instead of the multi-parameter type class, we use a single parameter class, and the parameter <hask>e</hask><br />
turned into an associated type synonym <hask>Elem ce</hask>.<br />
<br />
=== An <hask>instance</hask>===<br />
<br />
Instances change correspondingly. An instance of the two-parameter class<br />
<haskell><br />
instance Eq e => Collects e [e] where<br />
empty = []<br />
insert e l = (e:l)<br />
member e [] = False<br />
member e (x:xs) <br />
| e == x = True<br />
| otherwise = member e xs<br />
toList l = l<br />
</haskell><br />
becomes an instance of a single-parameter class, where the dependent type parameter turns into an associated type instance declaration:<br />
<haskell><br />
instance Eq e => Collects [e] where<br />
type Elem [e] = e<br />
empty = []<br />
insert e l = (e:l)<br />
member e [] = False<br />
member e (x:xs) <br />
| e == x = True<br />
| otherwise = member e xs<br />
toList l = l<br />
</haskell><br />
<br />
=== Using generic collections ===<br />
<br />
With Functional Dependencies the code would be:<br />
<haskell><br />
sumCollects :: (Collects e c1, Collects e c2) => c1 -> c2 -> c2<br />
sumCollects c1 c2 = foldr insert c2 (toList c1)<br />
</haskell><br />
<br />
In contrast, with associated type synonyms, we get:<br />
<haskell><br />
sumCollects :: (Collects c1, Collects c2, Elem c1 ~ Elem c2) => c1 -> c2 -> c2<br />
sumCollects c1 c2 = foldr insert c2 (toList c1)<br />
</haskell><br />
<br />
== Detailed definition of type synonym families ==<br />
<br />
Type families appear in two flavours: (1) they can be defined on the toplevel or (2) they can appear inside type classes (in which case they are known as associated type synonyms). The former is the more general variant, as it lacks the requirement for the type-indices to coincide with the class parameters. However, the latter can lead to more clearly structured code and compiler warnings if some type instances were - possibly accidentally - omitted. In the following, we always discuss the general toplevel form first and then cover the additional constraints placed on associated types.<br />
<br />
=== Family declarations ===<br />
<br />
Indexed type families are introduced by a signature, such as <br />
<haskell><br />
type family Elem c :: *<br />
</haskell><br />
The special <hask>family</hask> distinguishes family from standard type declarations. The result kind annotation is optional and, as usual, defaults to <hask>*</hask> if omitted. An example is<br />
<haskell><br />
type family Elem c<br />
</haskell><br />
Parameters can also be given explicit kind signatures if needed. We call the number of parameters in a type family declaration, the family's arity, and all applications of a type family must be fully saturated w.r.t. to that arity. This requirement is unlike ordinary type synonyms and it implies that the kind of a type family is not sufficient to determine a family's arity, and hence in general, also insufficient to determine whether a type family application is well formed. As an example, consider the following declaration:<br />
<haskell><br />
type family F a b :: * -> * -- F's arity is 2, <br />
-- although its overall kind is * -> * -> * -> *<br />
</haskell><br />
Given this declaration the following are examples of well-formed and malformed types:<br />
<haskell><br />
F Char [Int] -- OK! Kind: * -> *<br />
F Char [Int] Bool -- OK! Kind: *<br />
F IO Bool -- WRONG: kind mismatch in the first argument<br />
F Bool -- WRONG: unsaturated application<br />
</haskell><br />
<br />
A top-level type family can be declared as open or closed. (Associated type<br />
families are always open.) A closed type family has all of its equations<br />
defined in one place and cannot be extended, whereas an open family can have<br />
instances spread across modules. The advantage of a closed family is that<br />
its equations are tried in order, similar to a term-level function definition:<br />
<haskell><br />
type family G a where<br />
G Int = Bool<br />
G a = Char<br />
</haskell><br />
With this definition, the type <hask>G Int</hask> becomes <hask>Bool</hask><br />
and, say, <hask>G Double</hask> becomes <hask>Char</hask>. See also [http://ghc.haskell.org/trac/ghc/wiki/NewAxioms here] for more information about closed type families.<br />
<br />
==== Associated family declarations ====<br />
<br />
When a type family is declared as part of a type class, we drop the <hask>family</hask> special. The <hask>Elem</hask> declaration takes the following form<br />
<haskell><br />
class Collects ce where<br />
type Elem ce :: *<br />
...<br />
</haskell><br />
Exactly as in the case of an associated data declaration, '''the named type parameters must be a permutation of a subset of the class parameters'''. Examples<br />
<haskell><br />
class C a b c where { type T c a :: * } -- OK<br />
class D a where { type T a x :: * } -- No: x is not a class parameter<br />
class D a where { type T a :: * -> * } -- OK<br />
</haskell><br />
<br />
=== Type instance declarations ===<br />
<br />
Instance declarations of open type families are very similar to standard type synonym declarations. The only two differences are that the keyword <hask>type</hask> is followed by <hask>instance</hask> and that some or all of the type arguments can be non-variable types, but may not contain forall types or type synonym families. However, data families are generally allowed, and type synonyms are allowed as long as they are fully applied and expand to a type that is admissible - these are the exact same requirements as for data instances. For example, the <hask>[e]</hask> instance for <hask>Elem</hask> is<br />
<haskell><br />
type instance Elem [e] = e<br />
</haskell><br />
<br />
A type family instance declaration must satisfy the following rules:<br />
* An appropriate family declaration is in scope - just like class instances require the class declaration to be visible. <br />
* The instance declaration conforms to the kind determined by its family declaration<br />
* The number of type parameters in an instance declaration matches the number of type parameters in the family declaration.<br />
* The right-hand side of a type instance must be a monotype (i.e., it may not include foralls) and after the expansion of all saturated vanilla type synonyms, no synonyms, except family synonyms may remain.<br />
<br />
Here are some examples of admissible and illegal type instances and closed families:<br />
<haskell><br />
type family F a :: *<br />
type instance F [Int] = Int -- OK!<br />
type instance F String = Char -- OK!<br />
type instance F (F a) = a -- WRONG: type parameter mentions a type family<br />
type instance F (forall a. (a, b)) = b -- WRONG: a forall type appears in a type parameter<br />
type instance F Float = forall a.a -- WRONG: right-hand side may not be a forall type<br />
<br />
type family F2 a where -- OK!<br />
F2 (Maybe Int) = Int<br />
F2 (Maybe Bool) = Bool<br />
F2 (Maybe a) = String<br />
<br />
type family G a b :: * -> *<br />
type instance G Int = (,) -- WRONG: must be two type parameters<br />
type instance G Int Char Float = Double -- WRONG: must be two type parameters<br />
</haskell><br />
<br />
==== Closed family simplification ====<br />
<br />
Included in ghc starting 7.8.1.<br />
<br />
When dealing with closed families, simplifying the type is harder than just finding a left-hand side that matches and replacing that with a right-hand side. GHC will select an equation to use in a given type family application (the "target") if and only if the following 2 conditions hold:<br />
<br />
# There is a substitution from the variables in the equation's LHS that makes the left-hand side of the branch coincide with the target.<br />
# For each previous equation in the family: either the LHS of that equation is ''apart'' from the type family application, '''or''' the equation is ''compatible'' with the chosen equation.<br />
<br />
Now, we define ''apart'' and ''compatible'':<br />
# Two types are ''apart'' when one cannot simplify to the other, even after arbitrary type-family simplifications<br />
# Two equations are ''compatible'' if, either, their LHSs are apart or their LHSs unify and their RHSs are the same under the substitution induced by the unification.<br />
<br />
Some examples are in order:<br />
<haskell><br />
type family F a where<br />
F Int = Bool<br />
F Bool = Char<br />
F a = Bool<br />
<br />
type family And (a :: Bool) (b :: Bool) :: Bool where<br />
And False c = False<br />
And True d = d<br />
And e False = False<br />
And f True = f<br />
And g g = g<br />
</haskell><br />
<br />
In <hask>F</hask>, all pairs of equations are compatible except the second and third. The first two are compatible because their LHSs are apart. The first and third are compatible because the unifying substitution leads the RHSs to be the same. But, the second and third are not compatible because neither of these conditions holds. As a result, GHC will not use the third equation to simplify a target unless that target is apart from <hask>Bool</hask>.<br />
<br />
In <hask>And</hask>, ''every'' pair of equations is compatible, meaning GHC never has to make the extra apartness check during simplification.<br />
<br />
Why do all of this? It's a matter of type safety. Consider this example:<br />
<br />
<haskell><br />
type family J a b where<br />
J a a = Int<br />
J a b = Bool<br />
</haskell><br />
<br />
Say GHC selected the second branch just because the first doesn't apply at the moment, because two type variables are distinct. The problem is that those variables might later be instantiated at the same value, and then the first branch would have applied. You can convince this sort of inconsistency to produce <hask>unsafeCoerce</hask>.<br />
<br />
It gets worse. GHC has no internal notion of inequality, so it can't use previous, failed term-level GADT pattern matches to refine its type assumptions. For example:<br />
<br />
<haskell><br />
data G :: * -> * where<br />
GInt :: G Int<br />
GBool :: G Bool<br />
<br />
type family Foo (a :: *) :: * where<br />
Foo Int = Char<br />
Foo a = Double<br />
<br />
bar :: G a -> Foo a<br />
bar GInt = 'x'<br />
bar _ = 3.14<br />
</haskell><br />
<br />
The last line will fail to typecheck, because GHC doesn't know that the type variable <hask>a</hask> can't be <hask>Int</hask> here, even though it's obvious. The only general way to fix this is to have inequality evidence introduced into GHC, and that's a big deal and we don't know if we have the motivation for such a change yet.<br />
<br />
==== Associated type instances ====<br />
<br />
When an associated family instance is declared within a type class instance, we drop the <hask>instance</hask> keyword in the family instance. So, the <hask>[e]</hask> instance for <hask>Elem</hask> becomes:<br />
<haskell><br />
instance (Eq (Elem [e])) => Collects ([e]) where<br />
type Elem [e] = e<br />
...<br />
</haskell><br />
The most important point about associated family instances is that the type indexes corresponding to class parameters must be identical to the type given in the instance head; here this is <hask>[e]</hask>, which coincides with the only class parameter.<br />
<br />
Instances for an associated family can only appear as part of instance declarations of the class in which the family was declared - just as with the equations of the methods of a class. Also in correspondence to how methods are handled, declarations of associated types can be omitted in class instances. If an associated family instance is omitted, the corresponding instance type is not inhabited; i.e., only diverging expressions, such as <hask>undefined</hask>, can assume the type.<br />
<br />
==== Overlap ====<br />
<br />
The instance declarations of an open type family used in a single program must be compatible, in the form defined above. This condition is independent of whether the type family is associated or not, and it is not only a matter of consistency, but one of type safety. <br />
<br />
Here are two examples to illustrate the condition under which overlap is permitted.<br />
<haskell><br />
type instance F (a, Int) = [a]<br />
type instance F (Int, b) = [b] -- overlap permitted<br />
<br />
type instance G (a, Int) = [a]<br />
type instance G (Char, a) = [a] -- ILLEGAL overlap, as [Char] /= [Int]<br />
</haskell><br />
<br />
==== Decidability ====<br />
<br />
In order to guarantee that type inference in the presence of type families is decidable, we need to place a number of additional restrictions on the formation of type instance declarations (c.f., Definition 5 (Relaxed Conditions) of [http://www.cse.unsw.edu.au/~chak/papers/SPCS08.html Type Checking with Open Type Functions]). Instance declarations have the general form<br />
<haskell><br />
type instance F t1 .. tn = t<br />
</haskell><br />
where we require that for every type family application <hask>(G s1 .. sm)</hask> in <hask>t</hask>, <br />
# <hask>s1 .. sm</hask> do not contain any type family constructors,<br />
# the total number of symbols (data type constructors and type variables) in <hask>s1 .. sm</hask> is strictly smaller than in <hask>t1 .. tn</hask>, and<br />
# for every type variable <hask>a</hask>, <hask>a</hask> occurs in <hask>s1 .. sm</hask> at most as often as in <hask>t1 .. tn</hask>.<br />
These restrictions are easily verified and ensure termination of type inference. However, they are not sufficient to guarantee completeness of type inference in the presence of, so called, ''loopy equalities'', such as <hask>a ~ [F a]</hask>, where a recursive occurrence of a type variable is underneath a family application and data constructor application - see the above mentioned paper for details. <br />
<br />
If the option <tt>-XUndecidableInstances</tt> is passed to the compiler, the above restrictions are not enforced and it is on the programmer to ensure termination of the normalisation of type families during type inference.<br />
<br />
=== Equality constraints ===<br />
<br />
Type context can include equality constraints of the form <hask>t1 ~ t2</hask>, which denote that the types <hask>t1</hask> and <hask>t2</hask> need to be the same. In the presence of type families, whether two types are equal cannot generally be decided locally. Hence, the contexts of function signatures may include equality constraints, as in the following example:<br />
<haskell><br />
sumCollects :: (Collects c1, Collects c2, Elem c1 ~ Elem c2) => c1 -> c2 -> c2<br />
</haskell><br />
where we require that the element type of <hask>c1</hask> and <hask>c2</hask> are the same. In general, the types <hask>t1</hask> and <hask>t2</hask> of an equality constraint may be arbitrary monotypes; i.e., they may not contain any quantifiers, independent of whether higher-rank types are otherwise enabled.<br />
<br />
Equality constraints can also appear in class and instance contexts. The former enable a simple translation of programs using functional dependencies into programs using family synonyms instead. The general idea is to rewrite a class declaration of the form<br />
<haskell><br />
class C a b | a -> b<br />
</haskell><br />
to<br />
<haskell><br />
class (F a ~ b) => C a b where<br />
type F a<br />
</haskell><br />
That is, we represent every functional dependency (FD) <hask>a1 .. an -> b</hask> by an FD type family <hask>F a1 .. an</hask> and a superclass context equality <hask>F a1 .. an ~ b</hask>, essentially giving a name to the functional dependency. In class instances, we define the type instances of FD families in accordance with the class head. Method signatures are not affected by that process.<br />
<br />
== Frequently asked questions ==<br />
<br />
=== Comparing type families and functional dependencies ===<br />
<br />
Functional dependencies cover some of the same territory as type families. How do the two compare?<br />
There are some articles about this question:<br />
<br />
* Experiences in converting functional dependencies to type families: "[[Functional dependencies vs. type families]]"<br />
* [https://ghc.haskell.org/trac/ghc/wiki/TFvsFD GHC trac] on a comparison of functional dependencies and type families<br />
<br />
=== Injectivity, type inference, and ambiguity ===<br />
<br />
A common problem is this<br />
<haskell><br />
type family F a<br />
<br />
f :: F a -> F a<br />
f = undefined<br />
<br />
g :: F Int -> F Int<br />
g x = f x<br />
</haskell><br />
The compiler complains about the definition of <tt>g</tt> saying<br />
<code><br />
Couldn't match expected type `F Int' against inferred type `F a1'<br />
</code><br />
In type-checking <tt>g</tt>'s right hand side GHC discovers (by instantiating <tt>f</tt>'s type with a fresh type variable) that it has type <tt>F a1 -> F a1</tt> for some as-yet-unknown type <tt>a1</tt>. Now it tries to make the inferred type match <tt>g</tt>'s type signature. Well, you say, just make <tt>a1</tt> equal to <tt>Int</tt> and you are done. True, but what if there were these instances<br />
<haskell><br />
type instance F Int = Bool<br />
type instance F Char = Bool<br />
</haskell><br />
Then making <tt>a1</tt> equal to <tt>Char</tt> would ''also'' make the two types equal. Because there is (potentially) more than one choice, the program is rejected.<br />
<br />
However (and confusingly) if you omit the type signature on <tt>g</tt> altogether, thus<br />
<haskell><br />
f :: F a -> F a<br />
f = undefined<br />
<br />
g x = f x<br />
</haskell><br />
GHC will happily infer the type <tt>g :: F a -> F a</tt>. But you can't ''write'' that type signature or, indeed, the more specific one above. (Arguably this behaviour, where GHC ''infers'' a type it can't ''check'', is very confusing. I suppose we could make GHC reject both programs, with and without type signatures.)<br />
<br />
'''What is the problem?''' The nub of the issue is this: knowing that <tt>F t1</tt>=<tt>F t2</tt> does ''not'' imply that <tt>t1</tt> = <tt>t2</tt>.<br />
The difficulty is that the type function <tt>F</tt> need not be ''injective''; it can map two distinct types to the same type. For an injective type constructor like <tt>Maybe</tt>, if we know that <tt>Maybe t1</tt> = <tt>Maybe t2</tt>, then we know that <tt>t1</tt> = <tt>t2</tt>. But not so for non-injective type functions.<br />
<br />
The problem starts with <tt>f</tt>. Its type is ''ambiguous''; even if I know the argument and result types for <tt>f</tt>, I cannot use that to find the type at which <tt>a</tt> should be instantiated. (So arguably, <tt>f</tt> should be rejected as having an ambiguous type, and probably will be in future.) The situation is well known in type classes: <br />
<haskell><br />
bad :: (Read a, Show a) => String -> String<br />
bad x = show (read x)<br />
</haskell><br />
At a call of <tt>bad</tt> one cannot tell at what type <tt>a</tt> should be instantiated.<br />
<br />
The only solution is to avoid ambiguous types. In the type signature of a function, <br />
* Ensure that every type variable occurs in the part after the "<tt>=></tt>"<br />
* Ensure that every type variable appears at least once outside a type function call.<br />
Alternatively, you can use data families, which create new types and are therefore injective. The following code works:<br />
<br />
<haskell>data family F a<br />
<br />
f :: F a -> F a<br />
f = undefined<br />
<br />
g :: F Int -> F Int<br />
g x = f x</haskell><br />
<br />
== References ==<br />
<br />
* [http://www.cse.unsw.edu.au/~chak/papers/CKPM05.html Associated Types with Class.] Manuel M. T. Chakravarty, Gabriele Keller, Simon Peyton Jones, and Simon Marlow. In ''Proceedings of The 32nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'05)'', pages 1-13, ACM Press, 2005.<br />
* [http://www.cse.unsw.edu.au/~chak/papers/CKP05.html Associated Type Synonyms.] Manuel M. T. Chakravarty, Gabriele Keller, and Simon Peyton Jones. In ''Proceedings of The Tenth ACM SIGPLAN International Conference on Functional Programming'', ACM Press, pages 241-253, 2005.<br />
* [http://www.cse.unsw.edu.au/~chak/papers/SCPD07.html System F with Type Equality Coercions.] Martin Sulzmann, Manuel M. T. Chakravarty, Simon Peyton Jones, and Kevin Donnelly. In ''Proceedings of The Third ACM SIGPLAN Workshop on Types in Language Design and Implementation'', ACM Press, 2007.<br />
* [http://www.cse.unsw.edu.au/~chak/papers/SPCS08.html Type Checking With Open Type Functions.] Tom Schrijvers, Simon Peyton-Jones, Manuel M. T. Chakravarty, Martin Sulzmann. In ''Proceedings of The 13th ACM SIGPLAN International Conference on Functional Programming'', ACM Press, pages 51-62, 2008.<br />
* [[Simonpj/Talk:FunWithTypeFuns | Fun with Type Functions]] Oleg Kiselyov, Simon Peyton Jones, Chung-chieh Shan (the source for this paper can be found at http://patch-tag.com/r/schoenfinkel/typefunctions/wiki ) <br />
<br />
[[Category:Type-level programming]]<br />
[[Category:Language extensions]]<br />
[[Category:GHC|Indexed types]]</div>Anton-Latukhahttps://wiki.haskell.org/index.php?title=Thunk&diff=62927Thunk2019-06-06T16:50:59Z<p>Anton-Latukha: wiki is no place for personal active voice; add explanation of G-machine name</p>
<hr />
<div>[[Category:Glossary]]<br />
A '''thunk''' is a value that is yet to be evaluated.<br />
It is used in Haskell systems, that implement [[non-strict semantics]] by [[lazy evaluation]].<br />
A lazy run-time system does not evaluate a thunk unless it has to.<br />
Expressions are translated into a graph (''not'' a tree, as you might have expected, this enables sharing, and infinite lists!) and a Spineless Tagless G-machine (STG, G-machine is short for graph reduction machine) ''reduces'' it, chucking out any unneeded thunk, unevaluated.<br />
<br />
== Why are thunks useful? ==<br />
Well, if you don't need it, why evaluate it? Take for example the "lazy" <hask>&&</hask> (and) operation. Two boolean expressions joined by <hask>&&</hask> together is true if and only if both of them are. If you find out that one of them is false, you immediately know the joined expression cannot be true.<br />
<haskell><br />
-- the first is false, so is the answer, don't even need to know what the other is<br />
False && _ = False<br />
-- so the first turns out to be true, hmm...<br />
-- if the second is true, then the result is true<br />
-- if it's false, so is the result<br />
-- in other words, the result is the second!<br />
True && x = x<br />
</haskell> <br />
This function only evaluates the first parameter, because that's all that is needed. Even if the first parameter ''is'' true, you don't need to evaluate the second, so don't (so this version effectively is smarter than the explicit truth table). Who knows, the second parameter may get thrown out later as well!<br />
<br />
Perhaps a more convincing example is a (naive but intuitive) algorithm to find out if a given number is prime.<br />
<haskell><br />
-- the non-trivial factors are those who divide the number so no remainder<br />
factors n = filter (\m -> n `mod` m == 0) [2 .. (n - 1)]<br />
-- a number is a prime if it has no non-trivial factors<br />
isPrime n = n > 1 && null (factors n)<br />
</haskell><br />
Fascinatingly, <hask>isPrime</hask> evaluates to <hask>False</hask> as soon as it finds a factor (due to the lazy definition of <hask>null</hask>), and discards the rest of the list.<br />
<br />
== When are thunks not so useful? ==<br />
If you keep building up a very complicated graph to reduce later, it consumes memory (naturally), and can hinder performance, like, (from a blog comment made by Cale Gibbard)<br />
<haskell><br />
foldl (+) 0 [1,2,3]<br />
==> foldl (+) (0 + 1) [2,3]<br />
==> foldl (+) ((0 + 1) + 2) [3]<br />
==> foldl (+) (((0 + 1) + 2) + 3) []<br />
==> ((0 + 1) + 2) + 3<br />
==> (1 + 2) + 3<br />
==> 3 + 3<br />
==> 6<br />
</haskell><br />
and by <hask>==></hask>, "is reduced to" is meant. Lucky, the example is not applied to <hask>[1 .. 2^40]</hask> because it would have taken a very long time to load this page then, and so would your program when run. For more involved (and subtle!) examples, see [[Performance/Strictness]].<br />
<br />
== See also ==<br />
[[Performance/Laziness]]</div>Anton-Latukhahttps://wiki.haskell.org/index.php?title=Programming_guidelines&diff=62899Programming guidelines2019-04-10T10:06:51Z<p>Anton-Latukha: /* Naming Conventions */ upd handling of standard libs snake_case in outer code</p>
<hr />
<div>Programming guidelines shall help to make the code of a project better<br />
readable and maintainable by the varying number of contributors.<br />
<br />
It takes some programming experience to develop something like a<br />
personal "coding style" and guidelines only serve as rough shape for<br />
code. Guidelines should be followed by all members working on the<br />
project even if they prefer (or are already used to) different<br />
guidelines.<br />
<br />
These guidelines have been originally set up for the [http://hets.eu hets-project] and are<br />
now put on the [http://haskell.org/haskellwiki/ HaskellWiki] gradually<br />
integrating parts of the old hawiki<br />
entries [http://haskell.org/haskellwiki/Things_to_avoid ThingsToAvoid] and<br />
HaskellStyle (hopefully not<br />
hurting someone's copyrights). The other related entry<br />
TipsAndTricks treats more<br />
specific points that are left out here,<br />
<br />
Surely some style choices are a bit arbitrary (or "religious") and<br />
too restrictive with respect to language extensions. Nevertheless I hope<br />
to keep up these guidelines (at least as a basis) for our project<br />
in order to avoid maintaining diverging guidelines. Of course I want<br />
to supply - partly tool-dependent - reasons for certain decisions and<br />
also show alternatives by possibly bad examples. At the time of<br />
writing I use ghc-6.4.1, haddock-0.7 and (GNU-) emacs with the latest<br />
[http://www.haskell.org/haskell-mode/ haskell mode].<br />
<br />
The following quote and links are taken from<br />
HaskellStyle:<br />
<br />
We all have our own ideas about good Haskell style. There's More Than<br />
One Way To Do It. But some ways are better than others.<br />
<br />
Some comments from the GHC team about their internal coding<br />
standards can be found at<br />
http://hackage.haskell.org/trac/ghc/wiki/WorkingConventions<br />
<br />
Also http://research.microsoft.com/~simonpj/papers/haskell-retrospective/<br />
contains some brief comments on syntax and style,<br />
<br />
What now follows are descriptions of program documentation, file<br />
format, naming conventions and good programming practice (adapted form<br />
Matt's C/C++ Programming Guidelines and the Linux kernel coding<br />
style).<br />
<br />
<br />
=== Documentation ===<br />
<br />
<br />
Comments are to be written in application terms (i.e. user's point of<br />
view). Don't use technical terms - that's what the code is for!<br />
<br />
Comments should be written using correct spelling and grammar in complete<br />
sentences with punctation (in English only).<br />
<br />
"Generally, you want your comments to tell WHAT your code does, not HOW.<br />
Also, try to avoid putting comments inside a function body: if the<br />
function is so complex that you need to separately comment parts of it,<br />
you should probably" (... decompose it)<br />
<br />
Put a [http://haskell-haddock.readthedocs.io/en/latest/markup.html haddock comment] on top of every exported function and data type!<br />
Make sure haddock accepts these comments.<br />
<br />
=== File Format ===<br />
<br />
<br />
All Haskell source files start with a haddock header of the form:<br />
<br />
<haskell><br />
{- |<br />
Module : <File name or $Header$ to be replaced automatically><br />
Description : <optional short text displayed on contents page><br />
Copyright : (c) <Authors or Affiliations><br />
License : <license><br />
<br />
Maintainer : <email><br />
Stability : unstable | experimental | provisional | stable | frozen<br />
Portability : portable | non-portable (<reason>)<br />
<br />
<module description starting at first column><br />
-}<br />
</haskell><br />
<br />
A possible compiler pragma (like {-# LANGUAGE CPP #-}) may precede<br />
this header. The following hierarchical module name must, of course,<br />
match the file name.<br />
<br />
Make sure that the description is changed to meet the module (if the<br />
header was copied from elsewhere). Insert your email address as maintainer.<br />
<br />
Try to write portable ([[Language_and_library_specification#The_Haskell_98_report|Haskell98]]) code.<br />
If you use e.g. [[MPTC|multi-parameter type classes]] (MTPC) and [[Functional dependencies|functional dependencies]] (FD)<br />
the code becomes [https://mail.haskell.org/pipermail/haskell-prime/2006-February/000609.html "non-portable (MPTC with FD)"].<br />
<br />
The \$Header\$ entry will be automatically expanded.<br />
<br />
Lines should not be longer than 80 (preferably 75) characters.<br />
Code with short lines reads casually and easier to understand.<br />
If the expression is longer than 80 lines, try to structure & rewrite<br />
the code in a more expressive way.<br />
80 character lines considered over IT industry good practice and<br />
supported in all cases.<br />
<br />
Don't leave trailing white space in your code in every line.<br />
<br />
Expand all your tabs to spaces to avoid the danger of wrongly expanding<br />
them (or a different display of tabs versus eight spaces). Possibly put<br />
something like the following in your ~/.emacs file.<br />
<br />
(custom-set-variables '(indent-tabs-mode nil))<br />
<br />
The last character in your file should be a newline! Under Solaris<br />
you'll get a warning if this is not the case and sometimes last lines<br />
without newlines are ignored (i.e. "#endif" without newline). Emacs<br />
usually asks for a final newline.<br />
<br />
You may use http://hackage.haskell.org/package/scan to check your file format.<br />
<br />
The whole module should not be too long (about 400 lines)<br />
<br />
Please have a look at the [http://haskell-haddock.readthedocs.io/en/latest/markup.html#the-module-description Haddock module header documentation].<br />
<br />
=== Naming Conventions ===<br />
<br />
<br />
In Haskell types start with capital and functions with lowercase<br />
letters, so only avoid infix identifiers! Defining symbolic infix<br />
identifiers should be left to library writers only.<br />
<br />
(The infix identifier "\\" at the end of a line causes cpp preprocessor<br />
problems.)<br />
<br />
Names (especially global ones) should be descriptive. If you need<br />
long names write them in [[wiki:CamelCase|loverCamelCase]].<br />
Laconic names are preferred.<br />
<br />
Similarly, type, type class, and constructor names are written using<br />
[[wiki:CamelCase|UpperCamelCase]].<br />
<br />
In the standard libraries, some parts of Haskell code use<br />
[https://en.wikipedia.org/wiki/Snake_case snake_case] for long<br />
identifiers to better reflect names given with hyphens in the required<br />
documentation. If used in outer code - such names should be<br />
transliterated to camlCase identifiers possibly adding a (consistent)<br />
suffix or prefix to avoid conflicts with keywords. And instead of<br />
a recurring prefix or suffix, you may consider using qualified imports<br />
and names.<br />
<br />
=== Good Programming Practice ===<br />
<br />
<br />
"Functions should be short and sweet, and do just one thing. They should<br />
fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24,<br />
as we all know), and do one thing and do that well."<br />
<br />
Most haskell functions should be at most a few lines, only case<br />
expression over large data types (that should be avoided, too) may need<br />
corresponding space.<br />
<br />
The code should be succinct (though not obfuscated), readable and easy to<br />
maintain (after unforeseeable changes). Don't exploit exotic language<br />
features without good reason.<br />
<br />
It's not fixed how deep you indent (4 or 8 chars). You can break the<br />
line after "do", "let", "where", and "case .. of". Make sure that<br />
renamings don't destroy your layout. (If you get too far to the right,<br />
the code is unreadable anyway and needs to be decomposed.)<br />
<br />
Bad:<br />
<haskell><br />
case foo of Foo -> "Foo"<br />
Bar -> "Bar"<br />
</haskell><br />
<br />
Good:<br />
<haskell><br />
case <longer expression> of<br />
Foo -> "Foo"<br />
Bar -> "Bar"<br />
</haskell><br />
<br />
Avoid the notation with braces and semicolons since the layout rule<br />
forces you to properly align your alternatives.<br />
<br />
Respect compiler warnings. Supply type signatures, avoid shadowing and<br />
unused variables. Particularly avoid non-exhaustive and<br />
overlapping patterns. Missing unreachable cases can be filled in using<br />
"error" with a fixed string "<ModuleName>.<function>" to indicate the<br />
error position (in case the impossible should happen). Don't invest<br />
time to "show" the offending value, only do this temporarily when<br />
debugging the code.<br />
<br />
Don't leave unused or commented-out code in your files! Readers don't<br />
know what to think of it.<br />
<br />
<br />
==== Partial functions ====<br />
<br />
For partial functions do document their preconditions (if not obvious)<br />
and make sure that partial functions are only called when<br />
preconditions are obviously fulfilled (i.e. by a case statement or a<br />
previous test). Particularly the call of "head" should be used with<br />
care or (even better) be made obsolete by a case statement.<br />
<br />
Usually a case statement (and the import of isJust and fromJust from<br />
Data.[[Maybe]]) can be avoided by using the "maybe" function:<br />
<br />
<haskell><br />
maybe (error "<ModuleName>.<function>") id $ Map.lookup key map<br />
</haskell><br />
<br />
Generally we require you to be more explicit about failure<br />
cases. Surely a missing (or an irrefutable) pattern would precisely<br />
report the position of a runtime error, but these are not so obvious<br />
when reading the code.<br />
<br />
==== Let or where expressions ====<br />
<br />
Do avoid mixing and nesting "let" and "where". (I prefer the<br />
expression-stylistic "let".) Use auxiliary top-level functions that<br />
you do not export. Export lists also support the detection of unused<br />
functions.<br />
<br />
<br />
==== Code reuse ====<br />
<br />
If you notice that you're doing the same task again, try to generalize<br />
it in order to avoid duplicate code. It is frustrating to change the<br />
same error in several places.<br />
<br />
<br />
==== Application notation ====<br />
<br />
Many parentheses can be eliminated using the infix application operator <code>$</code><br />
with lowest priority. Try at least to avoid unnecessary parentheses in<br />
standard infix expression.<br />
<br />
<haskell><br />
f x : g x ++ h x<br />
<br />
a == 1 && b == 1 || a == 0 && b == 0<br />
</haskell><br />
<br />
Rather than putting a large final argument in parentheses (with a<br />
distant closing one) consider using <code>$</code> instead.<br />
<br />
<code>f (g x)</code> becomes <code>f $ g x</code> and consecutive applications<br />
<code>f (g (h x))</code> can be written as <code>f $ g $ h x</code> or <code>f . g $ h x</code>.<br />
<br />
A function definition like<br />
<code>f x = g $ h x</code> can be abbreviated to <code>f = g . h</code>.<br />
<br />
Note that the final argument may even be an infix- or case expression:<br />
<br />
<haskell><br />
map id $ c : l<br />
<br />
filter (const True) . map id $ case l of ...<br />
</haskell><br />
<br />
However, be aware that $-terms cannot be composed further in infix<br />
expressions.<br />
<br />
Probably wrong:<br />
<haskell><br />
f $ x ++ g $ x<br />
</haskell><br />
<br />
But the scope of an expression is also limited by the layout rule, so<br />
it is usually safe to use "$" on right hand sides.<br />
<br />
Ok:<br />
<br />
<haskell><br />
do f $ l<br />
++<br />
do g $ l<br />
</haskell><br />
<br />
Of course <code>$</code> can not be used in types. GHC has also some primitive<br />
functions involving the kind <code>#</code> that cannot be applied using <code>$</code>.<br />
<br />
Last warning: always leave spaces around <code>$</code> (and other mixfix<br />
operators) since a clash with template haskell is possible.<br />
<br />
(Also write <code>\ t</code> instead of <code>\t</code> in lambda expressions)<br />
<br />
==== List Comprehensions ====<br />
<br />
Use these only when "short and sweet". Prefer map, filter, and foldr!<br />
<br />
Instead of:<br />
<br />
<haskell><br />
[toUpper c | c <- s]<br />
</haskell><br />
<br />
write:<br />
<br />
<haskell><br />
map toUpper s<br />
</haskell><br />
<br />
Consider:<br />
<br />
<haskell><br />
[toUpper c | s <- strings, c <- s]<br />
</haskell><br />
<br />
Here it takes some time for the reader to find out which value depends<br />
on what other value and it is not so clear how many times the interim<br />
values s and c are used. In contrast to that the following can't be clearer:<br />
<br />
<haskell><br />
map toUpper (concat strings)<br />
</haskell><br />
<br />
<br />
When using higher order functions you can switch easier to data<br />
structures different from list. Compare:<br />
<br />
<haskell><br />
map (1+) list<br />
</haskell><br />
<br />
and:<br />
<br />
<haskell><br />
Set.map (1+) set<br />
</haskell><br />
<br />
<br />
==== Types ====<br />
<br />
Prefer proper data types over type synonyms or tuples even if you have<br />
to do more constructing and unpacking. This will make it easier to<br />
supply class instances later on. Don't put class constraints on<br />
a data type, constraints belong only to the functions that manipulate<br />
the data.<br />
<br />
Using type synonyms consistently is difficult over a longer time,<br />
because this is not checked by the compiler. (The types shown by<br />
the compiler may be unpredictable: i.e. FilePath, String or [Char])<br />
<br />
Take care if your data type has many variants (unless it is an<br />
enumeration type.) Don't repeat common parts in every variant since<br />
this will cause code duplication.<br />
<br />
Bad (to handle arguments in sync):<br />
<br />
<haskell><br />
data Mode f p = Box f p | Diamond f p<br />
</haskell><br />
<br />
Good (to handle arguments only once):<br />
<br />
<haskell><br />
data BoxOrDiamond = Box | Diamond<br />
<br />
data Mode f p = Mode BoxOrDiamond f p<br />
</haskell><br />
<br />
<br />
Consider (bad):<br />
<br />
<haskell><br />
data Tuple a b = Tuple a b | Undefined<br />
</haskell><br />
<br />
versus (better):<br />
<br />
<haskell><br />
data Tuple a b = Tuple a b<br />
</haskell><br />
<br />
and using:<br />
<br />
<haskell><br />
Maybe (Tuple a b)<br />
</haskell><br />
<br />
(or another monad) whenever an undefined result needs to be propagated<br />
<br />
<br />
==== Records ====<br />
<br />
For (large) records avoid the use of the constructor directly and<br />
remember that the order and number of fields may change.<br />
<br />
Take care with (the rare case of) depend polymorphic fields:<br />
<br />
<haskell><br />
data Fields a = VariantWithTwo<br />
{ field1 :: a<br />
, field2 :: a }<br />
</haskell><br />
<br />
The type of a value v can not be changed by only setting field1:<br />
<br />
<haskell><br />
v { field1 = f }<br />
</haskell><br />
<br />
Better construct a new value:<br />
<br />
<haskell><br />
VariantWithTwo { field1 = f } -- leaving field2 undefined<br />
</haskell><br />
<br />
Or use a polymorphic element that is instantiated by updating:<br />
<br />
<haskell><br />
empty = VariantWithTwo { field1 = [], field2 = [] }<br />
<br />
empty { field1 = [f] }<br />
</haskell><br />
<br />
Several variants with identical fields may avoid some code duplication<br />
when selecting and updating, though possibly not in a few<br />
depended polymorphic cases.<br />
<br />
However, I doubt if the following is a really good alternative to the<br />
above data Mode with data BoxOrDiamond.<br />
<br />
<haskell><br />
data Mode f p =<br />
Box { formula :: f, positions :: p }<br />
| Diamond { formula :: f, positions :: p }<br />
</haskell><br />
<br />
<br />
==== IO ====<br />
<br />
Try to strictly separate IO, Monad and pure (without do) function<br />
programming (possibly via separate modules).<br />
<br />
Bad:<br />
<br />
<haskell><br />
x <- return y<br />
...<br />
</haskell><br />
<br />
Good:<br />
<br />
<haskell><br />
let x = y<br />
...<br />
</haskell><br />
<br />
Don't use <code>Prelude.interact</code> and make sure your program does not depend<br />
on the (not always obvious) order of evaluation. E.g. don't read and<br />
write to the same file:<br />
<br />
This will fail:<br />
<br />
<haskell><br />
do s <- readFile f<br />
writeFile f $ 'a' : s<br />
</haskell><br />
<br />
because of lazy IO! (Writing is starting before reading is finished.)<br />
<br />
==== Trace ====<br />
<br />
Tracing is for debugging purposes only and should not be used as<br />
feedback for the user. Clean code is not cluttered by trace calls.<br />
<br />
<br />
==== Imports ====<br />
<br />
Standard library modules like Char. List, Maybe, Monad, etc should be<br />
imported by their hierarchical module name, i.e. the base package (so<br />
that haddock finds them):<br />
<br />
<haskell><br />
import Data.List<br />
import Control.Monad<br />
import System.Environment<br />
</haskell><br />
<br />
The libraries for Set and Map are to be imported qualified:<br />
<br />
<haskell><br />
import qualified Data.Set as Set<br />
import qualified Data.Map as Map<br />
</haskell><br />
<br />
<br />
==== Glasgow extensions and Classes ====<br />
<br />
[[Use of language extensions|Stay away from extensions]] as long as possible. Also use classes with<br />
care because soon the desire for overlapping instances (like for lists<br />
and strings) may arise. Then you may want MPTC (multi-parameter type<br />
classes), functional dependencies (FD), undecidable and possibly incoherent<br />
instances and then you are "in the wild" (according to SPJ).<br />
<br />
=== Style in other languages ===<br />
<br />
* [http://www.cs.caltech.edu/~cs20/a/style.html OCaml style]<br />
<br />
=== Final remarks ===<br />
<br />
Despite guidelines, writing "correct code" (without formal proof<br />
support yet) still remains the major challenge. As motivation to<br />
follow these guidelines consider the points that are from the "C++<br />
Coding Standard", where I replaced "C++" with "Haskell".<br />
<br />
Good Points:<br />
<br />
* programmers can go into any code and figure out what's going on<br />
<br />
* new people can get up to speed quickly<br />
<br />
* people new to Haskell are spared the need to develop a personal style and defend it to the death<br />
<br />
* people new to Haskell are spared making the same mistakes over and over again<br />
<br />
* people make fewer mistakes in consistent environments<br />
<br />
* programmers have a common enemy :-)<br />
<br />
Bad Points:<br />
<br />
* the standard is usually stupid because it was made by someone who doesn't understand Haskell<br />
<br />
* the standard is usually stupid because it's not what I do<br />
<br />
* standards reduce creativity<br />
<br />
* standards are unnecessary as long as people are consistent<br />
<br />
* standards enforce too much structure<br />
<br />
* people ignore standards anyway<br />
<br />
== Another guidelines ==<br />
<br />
[https://www.cse.unsw.edu.au/~cs3161/15s2/StyleGuide.html COMP3161]<br />
<br />
[http://docs.ganeti.org/ganeti/2.13/html/dev-codestyle.html#haskell Google's ganeti]<br />
<br />
[https://kowainik.github.io/posts/2019-02-06-style-guide kowainik]<br />
<br />
[https://github.com/Soostone/style-guides/blob/master/haskell-style-guide.md Soostone]<br />
<br />
[https://github.com/tibbe/haskell-style-guide/blob/master/haskell-style.md tibbe]<br />
<br />
[https://github.com/tweag/guides/blob/master/style/Haskell.md tweag]<br />
<br />
[[Category:Style]]</div>Anton-Latukhahttps://wiki.haskell.org/index.php?title=Programming_guidelines&diff=62898Programming guidelines2019-04-10T10:01:00Z<p>Anton-Latukha: /* Naming Conventions */ fx url</p>
<hr />
<div>Programming guidelines shall help to make the code of a project better<br />
readable and maintainable by the varying number of contributors.<br />
<br />
It takes some programming experience to develop something like a<br />
personal "coding style" and guidelines only serve as rough shape for<br />
code. Guidelines should be followed by all members working on the<br />
project even if they prefer (or are already used to) different<br />
guidelines.<br />
<br />
These guidelines have been originally set up for the [http://hets.eu hets-project] and are<br />
now put on the [http://haskell.org/haskellwiki/ HaskellWiki] gradually<br />
integrating parts of the old hawiki<br />
entries [http://haskell.org/haskellwiki/Things_to_avoid ThingsToAvoid] and<br />
HaskellStyle (hopefully not<br />
hurting someone's copyrights). The other related entry<br />
TipsAndTricks treats more<br />
specific points that are left out here,<br />
<br />
Surely some style choices are a bit arbitrary (or "religious") and<br />
too restrictive with respect to language extensions. Nevertheless I hope<br />
to keep up these guidelines (at least as a basis) for our project<br />
in order to avoid maintaining diverging guidelines. Of course I want<br />
to supply - partly tool-dependent - reasons for certain decisions and<br />
also show alternatives by possibly bad examples. At the time of<br />
writing I use ghc-6.4.1, haddock-0.7 and (GNU-) emacs with the latest<br />
[http://www.haskell.org/haskell-mode/ haskell mode].<br />
<br />
The following quote and links are taken from<br />
HaskellStyle:<br />
<br />
We all have our own ideas about good Haskell style. There's More Than<br />
One Way To Do It. But some ways are better than others.<br />
<br />
Some comments from the GHC team about their internal coding<br />
standards can be found at<br />
http://hackage.haskell.org/trac/ghc/wiki/WorkingConventions<br />
<br />
Also http://research.microsoft.com/~simonpj/papers/haskell-retrospective/<br />
contains some brief comments on syntax and style,<br />
<br />
What now follows are descriptions of program documentation, file<br />
format, naming conventions and good programming practice (adapted form<br />
Matt's C/C++ Programming Guidelines and the Linux kernel coding<br />
style).<br />
<br />
<br />
=== Documentation ===<br />
<br />
<br />
Comments are to be written in application terms (i.e. user's point of<br />
view). Don't use technical terms - that's what the code is for!<br />
<br />
Comments should be written using correct spelling and grammar in complete<br />
sentences with punctation (in English only).<br />
<br />
"Generally, you want your comments to tell WHAT your code does, not HOW.<br />
Also, try to avoid putting comments inside a function body: if the<br />
function is so complex that you need to separately comment parts of it,<br />
you should probably" (... decompose it)<br />
<br />
Put a [http://haskell-haddock.readthedocs.io/en/latest/markup.html haddock comment] on top of every exported function and data type!<br />
Make sure haddock accepts these comments.<br />
<br />
=== File Format ===<br />
<br />
<br />
All Haskell source files start with a haddock header of the form:<br />
<br />
<haskell><br />
{- |<br />
Module : <File name or $Header$ to be replaced automatically><br />
Description : <optional short text displayed on contents page><br />
Copyright : (c) <Authors or Affiliations><br />
License : <license><br />
<br />
Maintainer : <email><br />
Stability : unstable | experimental | provisional | stable | frozen<br />
Portability : portable | non-portable (<reason>)<br />
<br />
<module description starting at first column><br />
-}<br />
</haskell><br />
<br />
A possible compiler pragma (like {-# LANGUAGE CPP #-}) may precede<br />
this header. The following hierarchical module name must, of course,<br />
match the file name.<br />
<br />
Make sure that the description is changed to meet the module (if the<br />
header was copied from elsewhere). Insert your email address as maintainer.<br />
<br />
Try to write portable ([[Language_and_library_specification#The_Haskell_98_report|Haskell98]]) code.<br />
If you use e.g. [[MPTC|multi-parameter type classes]] (MTPC) and [[Functional dependencies|functional dependencies]] (FD)<br />
the code becomes [https://mail.haskell.org/pipermail/haskell-prime/2006-February/000609.html "non-portable (MPTC with FD)"].<br />
<br />
The \$Header\$ entry will be automatically expanded.<br />
<br />
Lines should not be longer than 80 (preferably 75) characters.<br />
Code with short lines reads casually and easier to understand.<br />
If the expression is longer than 80 lines, try to structure & rewrite<br />
the code in a more expressive way.<br />
80 character lines considered over IT industry good practice and<br />
supported in all cases.<br />
<br />
Don't leave trailing white space in your code in every line.<br />
<br />
Expand all your tabs to spaces to avoid the danger of wrongly expanding<br />
them (or a different display of tabs versus eight spaces). Possibly put<br />
something like the following in your ~/.emacs file.<br />
<br />
(custom-set-variables '(indent-tabs-mode nil))<br />
<br />
The last character in your file should be a newline! Under Solaris<br />
you'll get a warning if this is not the case and sometimes last lines<br />
without newlines are ignored (i.e. "#endif" without newline). Emacs<br />
usually asks for a final newline.<br />
<br />
You may use http://hackage.haskell.org/package/scan to check your file format.<br />
<br />
The whole module should not be too long (about 400 lines)<br />
<br />
Please have a look at the [http://haskell-haddock.readthedocs.io/en/latest/markup.html#the-module-description Haddock module header documentation].<br />
<br />
=== Naming Conventions ===<br />
<br />
<br />
In Haskell types start with capital and functions with lowercase<br />
letters, so only avoid infix identifiers! Defining symbolic infix<br />
identifiers should be left to library writers only.<br />
<br />
(The infix identifier "\\" at the end of a line causes cpp preprocessor<br />
problems.)<br />
<br />
Names (especially global ones) should be descriptive. If you need<br />
long names write them in [[wiki:CamelCase|loverCamelCase]].<br />
Laconic names are preferred.<br />
<br />
Similarly, type, type class, and constructor names are written using<br />
[[wiki:CamelCase|UpperCamelCase]].<br />
<br />
In the standard libraries, some parts of Haskell code use<br />
[https://en.wikipedia.org/wiki/Snake_case snake_case] for long<br />
identifiers to better reflect names given with hyphens in the required<br />
documentation. Also, such names should be<br />
transliterated to camlCase identifiers possibly adding a (consistent)<br />
suffix or prefix to avoid conflicts with keywords. However, instead of<br />
a recurring prefix or suffix, you may consider using qualified imports<br />
and names.<br />
<br />
=== Good Programming Practice ===<br />
<br />
<br />
"Functions should be short and sweet, and do just one thing. They should<br />
fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24,<br />
as we all know), and do one thing and do that well."<br />
<br />
Most haskell functions should be at most a few lines, only case<br />
expression over large data types (that should be avoided, too) may need<br />
corresponding space.<br />
<br />
The code should be succinct (though not obfuscated), readable and easy to<br />
maintain (after unforeseeable changes). Don't exploit exotic language<br />
features without good reason.<br />
<br />
It's not fixed how deep you indent (4 or 8 chars). You can break the<br />
line after "do", "let", "where", and "case .. of". Make sure that<br />
renamings don't destroy your layout. (If you get too far to the right,<br />
the code is unreadable anyway and needs to be decomposed.)<br />
<br />
Bad:<br />
<haskell><br />
case foo of Foo -> "Foo"<br />
Bar -> "Bar"<br />
</haskell><br />
<br />
Good:<br />
<haskell><br />
case <longer expression> of<br />
Foo -> "Foo"<br />
Bar -> "Bar"<br />
</haskell><br />
<br />
Avoid the notation with braces and semicolons since the layout rule<br />
forces you to properly align your alternatives.<br />
<br />
Respect compiler warnings. Supply type signatures, avoid shadowing and<br />
unused variables. Particularly avoid non-exhaustive and<br />
overlapping patterns. Missing unreachable cases can be filled in using<br />
"error" with a fixed string "<ModuleName>.<function>" to indicate the<br />
error position (in case the impossible should happen). Don't invest<br />
time to "show" the offending value, only do this temporarily when<br />
debugging the code.<br />
<br />
Don't leave unused or commented-out code in your files! Readers don't<br />
know what to think of it.<br />
<br />
<br />
==== Partial functions ====<br />
<br />
For partial functions do document their preconditions (if not obvious)<br />
and make sure that partial functions are only called when<br />
preconditions are obviously fulfilled (i.e. by a case statement or a<br />
previous test). Particularly the call of "head" should be used with<br />
care or (even better) be made obsolete by a case statement.<br />
<br />
Usually a case statement (and the import of isJust and fromJust from<br />
Data.[[Maybe]]) can be avoided by using the "maybe" function:<br />
<br />
<haskell><br />
maybe (error "<ModuleName>.<function>") id $ Map.lookup key map<br />
</haskell><br />
<br />
Generally we require you to be more explicit about failure<br />
cases. Surely a missing (or an irrefutable) pattern would precisely<br />
report the position of a runtime error, but these are not so obvious<br />
when reading the code.<br />
<br />
==== Let or where expressions ====<br />
<br />
Do avoid mixing and nesting "let" and "where". (I prefer the<br />
expression-stylistic "let".) Use auxiliary top-level functions that<br />
you do not export. Export lists also support the detection of unused<br />
functions.<br />
<br />
<br />
==== Code reuse ====<br />
<br />
If you notice that you're doing the same task again, try to generalize<br />
it in order to avoid duplicate code. It is frustrating to change the<br />
same error in several places.<br />
<br />
<br />
==== Application notation ====<br />
<br />
Many parentheses can be eliminated using the infix application operator <code>$</code><br />
with lowest priority. Try at least to avoid unnecessary parentheses in<br />
standard infix expression.<br />
<br />
<haskell><br />
f x : g x ++ h x<br />
<br />
a == 1 && b == 1 || a == 0 && b == 0<br />
</haskell><br />
<br />
Rather than putting a large final argument in parentheses (with a<br />
distant closing one) consider using <code>$</code> instead.<br />
<br />
<code>f (g x)</code> becomes <code>f $ g x</code> and consecutive applications<br />
<code>f (g (h x))</code> can be written as <code>f $ g $ h x</code> or <code>f . g $ h x</code>.<br />
<br />
A function definition like<br />
<code>f x = g $ h x</code> can be abbreviated to <code>f = g . h</code>.<br />
<br />
Note that the final argument may even be an infix- or case expression:<br />
<br />
<haskell><br />
map id $ c : l<br />
<br />
filter (const True) . map id $ case l of ...<br />
</haskell><br />
<br />
However, be aware that $-terms cannot be composed further in infix<br />
expressions.<br />
<br />
Probably wrong:<br />
<haskell><br />
f $ x ++ g $ x<br />
</haskell><br />
<br />
But the scope of an expression is also limited by the layout rule, so<br />
it is usually safe to use "$" on right hand sides.<br />
<br />
Ok:<br />
<br />
<haskell><br />
do f $ l<br />
++<br />
do g $ l<br />
</haskell><br />
<br />
Of course <code>$</code> can not be used in types. GHC has also some primitive<br />
functions involving the kind <code>#</code> that cannot be applied using <code>$</code>.<br />
<br />
Last warning: always leave spaces around <code>$</code> (and other mixfix<br />
operators) since a clash with template haskell is possible.<br />
<br />
(Also write <code>\ t</code> instead of <code>\t</code> in lambda expressions)<br />
<br />
==== List Comprehensions ====<br />
<br />
Use these only when "short and sweet". Prefer map, filter, and foldr!<br />
<br />
Instead of:<br />
<br />
<haskell><br />
[toUpper c | c <- s]<br />
</haskell><br />
<br />
write:<br />
<br />
<haskell><br />
map toUpper s<br />
</haskell><br />
<br />
Consider:<br />
<br />
<haskell><br />
[toUpper c | s <- strings, c <- s]<br />
</haskell><br />
<br />
Here it takes some time for the reader to find out which value depends<br />
on what other value and it is not so clear how many times the interim<br />
values s and c are used. In contrast to that the following can't be clearer:<br />
<br />
<haskell><br />
map toUpper (concat strings)<br />
</haskell><br />
<br />
<br />
When using higher order functions you can switch easier to data<br />
structures different from list. Compare:<br />
<br />
<haskell><br />
map (1+) list<br />
</haskell><br />
<br />
and:<br />
<br />
<haskell><br />
Set.map (1+) set<br />
</haskell><br />
<br />
<br />
==== Types ====<br />
<br />
Prefer proper data types over type synonyms or tuples even if you have<br />
to do more constructing and unpacking. This will make it easier to<br />
supply class instances later on. Don't put class constraints on<br />
a data type, constraints belong only to the functions that manipulate<br />
the data.<br />
<br />
Using type synonyms consistently is difficult over a longer time,<br />
because this is not checked by the compiler. (The types shown by<br />
the compiler may be unpredictable: i.e. FilePath, String or [Char])<br />
<br />
Take care if your data type has many variants (unless it is an<br />
enumeration type.) Don't repeat common parts in every variant since<br />
this will cause code duplication.<br />
<br />
Bad (to handle arguments in sync):<br />
<br />
<haskell><br />
data Mode f p = Box f p | Diamond f p<br />
</haskell><br />
<br />
Good (to handle arguments only once):<br />
<br />
<haskell><br />
data BoxOrDiamond = Box | Diamond<br />
<br />
data Mode f p = Mode BoxOrDiamond f p<br />
</haskell><br />
<br />
<br />
Consider (bad):<br />
<br />
<haskell><br />
data Tuple a b = Tuple a b | Undefined<br />
</haskell><br />
<br />
versus (better):<br />
<br />
<haskell><br />
data Tuple a b = Tuple a b<br />
</haskell><br />
<br />
and using:<br />
<br />
<haskell><br />
Maybe (Tuple a b)<br />
</haskell><br />
<br />
(or another monad) whenever an undefined result needs to be propagated<br />
<br />
<br />
==== Records ====<br />
<br />
For (large) records avoid the use of the constructor directly and<br />
remember that the order and number of fields may change.<br />
<br />
Take care with (the rare case of) depend polymorphic fields:<br />
<br />
<haskell><br />
data Fields a = VariantWithTwo<br />
{ field1 :: a<br />
, field2 :: a }<br />
</haskell><br />
<br />
The type of a value v can not be changed by only setting field1:<br />
<br />
<haskell><br />
v { field1 = f }<br />
</haskell><br />
<br />
Better construct a new value:<br />
<br />
<haskell><br />
VariantWithTwo { field1 = f } -- leaving field2 undefined<br />
</haskell><br />
<br />
Or use a polymorphic element that is instantiated by updating:<br />
<br />
<haskell><br />
empty = VariantWithTwo { field1 = [], field2 = [] }<br />
<br />
empty { field1 = [f] }<br />
</haskell><br />
<br />
Several variants with identical fields may avoid some code duplication<br />
when selecting and updating, though possibly not in a few<br />
depended polymorphic cases.<br />
<br />
However, I doubt if the following is a really good alternative to the<br />
above data Mode with data BoxOrDiamond.<br />
<br />
<haskell><br />
data Mode f p =<br />
Box { formula :: f, positions :: p }<br />
| Diamond { formula :: f, positions :: p }<br />
</haskell><br />
<br />
<br />
==== IO ====<br />
<br />
Try to strictly separate IO, Monad and pure (without do) function<br />
programming (possibly via separate modules).<br />
<br />
Bad:<br />
<br />
<haskell><br />
x <- return y<br />
...<br />
</haskell><br />
<br />
Good:<br />
<br />
<haskell><br />
let x = y<br />
...<br />
</haskell><br />
<br />
Don't use <code>Prelude.interact</code> and make sure your program does not depend<br />
on the (not always obvious) order of evaluation. E.g. don't read and<br />
write to the same file:<br />
<br />
This will fail:<br />
<br />
<haskell><br />
do s <- readFile f<br />
writeFile f $ 'a' : s<br />
</haskell><br />
<br />
because of lazy IO! (Writing is starting before reading is finished.)<br />
<br />
==== Trace ====<br />
<br />
Tracing is for debugging purposes only and should not be used as<br />
feedback for the user. Clean code is not cluttered by trace calls.<br />
<br />
<br />
==== Imports ====<br />
<br />
Standard library modules like Char. List, Maybe, Monad, etc should be<br />
imported by their hierarchical module name, i.e. the base package (so<br />
that haddock finds them):<br />
<br />
<haskell><br />
import Data.List<br />
import Control.Monad<br />
import System.Environment<br />
</haskell><br />
<br />
The libraries for Set and Map are to be imported qualified:<br />
<br />
<haskell><br />
import qualified Data.Set as Set<br />
import qualified Data.Map as Map<br />
</haskell><br />
<br />
<br />
==== Glasgow extensions and Classes ====<br />
<br />
[[Use of language extensions|Stay away from extensions]] as long as possible. Also use classes with<br />
care because soon the desire for overlapping instances (like for lists<br />
and strings) may arise. Then you may want MPTC (multi-parameter type<br />
classes), functional dependencies (FD), undecidable and possibly incoherent<br />
instances and then you are "in the wild" (according to SPJ).<br />
<br />
=== Style in other languages ===<br />
<br />
* [http://www.cs.caltech.edu/~cs20/a/style.html OCaml style]<br />
<br />
=== Final remarks ===<br />
<br />
Despite guidelines, writing "correct code" (without formal proof<br />
support yet) still remains the major challenge. As motivation to<br />
follow these guidelines consider the points that are from the "C++<br />
Coding Standard", where I replaced "C++" with "Haskell".<br />
<br />
Good Points:<br />
<br />
* programmers can go into any code and figure out what's going on<br />
<br />
* new people can get up to speed quickly<br />
<br />
* people new to Haskell are spared the need to develop a personal style and defend it to the death<br />
<br />
* people new to Haskell are spared making the same mistakes over and over again<br />
<br />
* people make fewer mistakes in consistent environments<br />
<br />
* programmers have a common enemy :-)<br />
<br />
Bad Points:<br />
<br />
* the standard is usually stupid because it was made by someone who doesn't understand Haskell<br />
<br />
* the standard is usually stupid because it's not what I do<br />
<br />
* standards reduce creativity<br />
<br />
* standards are unnecessary as long as people are consistent<br />
<br />
* standards enforce too much structure<br />
<br />
* people ignore standards anyway<br />
<br />
== Another guidelines ==<br />
<br />
[https://www.cse.unsw.edu.au/~cs3161/15s2/StyleGuide.html COMP3161]<br />
<br />
[http://docs.ganeti.org/ganeti/2.13/html/dev-codestyle.html#haskell Google's ganeti]<br />
<br />
[https://kowainik.github.io/posts/2019-02-06-style-guide kowainik]<br />
<br />
[https://github.com/Soostone/style-guides/blob/master/haskell-style-guide.md Soostone]<br />
<br />
[https://github.com/tibbe/haskell-style-guide/blob/master/haskell-style.md tibbe]<br />
<br />
[https://github.com/tweag/guides/blob/master/style/Haskell.md tweag]<br />
<br />
[[Category:Style]]</div>Anton-Latukhahttps://wiki.haskell.org/index.php?title=Programming_guidelines&diff=62897Programming guidelines2019-04-10T10:00:06Z<p>Anton-Latukha: /* Naming Conventions */ upd case suggestions</p>
<hr />
<div>Programming guidelines shall help to make the code of a project better<br />
readable and maintainable by the varying number of contributors.<br />
<br />
It takes some programming experience to develop something like a<br />
personal "coding style" and guidelines only serve as rough shape for<br />
code. Guidelines should be followed by all members working on the<br />
project even if they prefer (or are already used to) different<br />
guidelines.<br />
<br />
These guidelines have been originally set up for the [http://hets.eu hets-project] and are<br />
now put on the [http://haskell.org/haskellwiki/ HaskellWiki] gradually<br />
integrating parts of the old hawiki<br />
entries [http://haskell.org/haskellwiki/Things_to_avoid ThingsToAvoid] and<br />
HaskellStyle (hopefully not<br />
hurting someone's copyrights). The other related entry<br />
TipsAndTricks treats more<br />
specific points that are left out here,<br />
<br />
Surely some style choices are a bit arbitrary (or "religious") and<br />
too restrictive with respect to language extensions. Nevertheless I hope<br />
to keep up these guidelines (at least as a basis) for our project<br />
in order to avoid maintaining diverging guidelines. Of course I want<br />
to supply - partly tool-dependent - reasons for certain decisions and<br />
also show alternatives by possibly bad examples. At the time of<br />
writing I use ghc-6.4.1, haddock-0.7 and (GNU-) emacs with the latest<br />
[http://www.haskell.org/haskell-mode/ haskell mode].<br />
<br />
The following quote and links are taken from<br />
HaskellStyle:<br />
<br />
We all have our own ideas about good Haskell style. There's More Than<br />
One Way To Do It. But some ways are better than others.<br />
<br />
Some comments from the GHC team about their internal coding<br />
standards can be found at<br />
http://hackage.haskell.org/trac/ghc/wiki/WorkingConventions<br />
<br />
Also http://research.microsoft.com/~simonpj/papers/haskell-retrospective/<br />
contains some brief comments on syntax and style,<br />
<br />
What now follows are descriptions of program documentation, file<br />
format, naming conventions and good programming practice (adapted form<br />
Matt's C/C++ Programming Guidelines and the Linux kernel coding<br />
style).<br />
<br />
<br />
=== Documentation ===<br />
<br />
<br />
Comments are to be written in application terms (i.e. user's point of<br />
view). Don't use technical terms - that's what the code is for!<br />
<br />
Comments should be written using correct spelling and grammar in complete<br />
sentences with punctation (in English only).<br />
<br />
"Generally, you want your comments to tell WHAT your code does, not HOW.<br />
Also, try to avoid putting comments inside a function body: if the<br />
function is so complex that you need to separately comment parts of it,<br />
you should probably" (... decompose it)<br />
<br />
Put a [http://haskell-haddock.readthedocs.io/en/latest/markup.html haddock comment] on top of every exported function and data type!<br />
Make sure haddock accepts these comments.<br />
<br />
=== File Format ===<br />
<br />
<br />
All Haskell source files start with a haddock header of the form:<br />
<br />
<haskell><br />
{- |<br />
Module : <File name or $Header$ to be replaced automatically><br />
Description : <optional short text displayed on contents page><br />
Copyright : (c) <Authors or Affiliations><br />
License : <license><br />
<br />
Maintainer : <email><br />
Stability : unstable | experimental | provisional | stable | frozen<br />
Portability : portable | non-portable (<reason>)<br />
<br />
<module description starting at first column><br />
-}<br />
</haskell><br />
<br />
A possible compiler pragma (like {-# LANGUAGE CPP #-}) may precede<br />
this header. The following hierarchical module name must, of course,<br />
match the file name.<br />
<br />
Make sure that the description is changed to meet the module (if the<br />
header was copied from elsewhere). Insert your email address as maintainer.<br />
<br />
Try to write portable ([[Language_and_library_specification#The_Haskell_98_report|Haskell98]]) code.<br />
If you use e.g. [[MPTC|multi-parameter type classes]] (MTPC) and [[Functional dependencies|functional dependencies]] (FD)<br />
the code becomes [https://mail.haskell.org/pipermail/haskell-prime/2006-February/000609.html "non-portable (MPTC with FD)"].<br />
<br />
The \$Header\$ entry will be automatically expanded.<br />
<br />
Lines should not be longer than 80 (preferably 75) characters.<br />
Code with short lines reads casually and easier to understand.<br />
If the expression is longer than 80 lines, try to structure & rewrite<br />
the code in a more expressive way.<br />
80 character lines considered over IT industry good practice and<br />
supported in all cases.<br />
<br />
Don't leave trailing white space in your code in every line.<br />
<br />
Expand all your tabs to spaces to avoid the danger of wrongly expanding<br />
them (or a different display of tabs versus eight spaces). Possibly put<br />
something like the following in your ~/.emacs file.<br />
<br />
(custom-set-variables '(indent-tabs-mode nil))<br />
<br />
The last character in your file should be a newline! Under Solaris<br />
you'll get a warning if this is not the case and sometimes last lines<br />
without newlines are ignored (i.e. "#endif" without newline). Emacs<br />
usually asks for a final newline.<br />
<br />
You may use http://hackage.haskell.org/package/scan to check your file format.<br />
<br />
The whole module should not be too long (about 400 lines)<br />
<br />
Please have a look at the [http://haskell-haddock.readthedocs.io/en/latest/markup.html#the-module-description Haddock module header documentation].<br />
<br />
=== Naming Conventions ===<br />
<br />
<br />
In Haskell types start with capital and functions with lowercase<br />
letters, so only avoid infix identifiers! Defining symbolic infix<br />
identifiers should be left to library writers only.<br />
<br />
(The infix identifier "\\" at the end of a line causes cpp preprocessor<br />
problems.)<br />
<br />
Names (especially global ones) should be descriptive. If you need<br />
long names write them in [[wiki:CamelCase|loverCamelCase]].<br />
Laconic names are preferred.<br />
<br />
Similarly, type, type class, and constructor names are written using<br />
[[wiki:CamelCase|UpperCamelCase]].<br />
<br />
In the standard libraries, some parts of Haskell code use<br />
[https://en.wikipedia.org/wiki/Snake_case|snake_case] for long<br />
identifiers to better reflect names given with hyphens in the required<br />
documentation. Also, such names should be<br />
transliterated to camlCase identifiers possibly adding a (consistent)<br />
suffix or prefix to avoid conflicts with keywords. However, instead of<br />
a recurring prefix or suffix, you may consider using qualified imports<br />
and names.<br />
<br />
=== Good Programming Practice ===<br />
<br />
<br />
"Functions should be short and sweet, and do just one thing. They should<br />
fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24,<br />
as we all know), and do one thing and do that well."<br />
<br />
Most haskell functions should be at most a few lines, only case<br />
expression over large data types (that should be avoided, too) may need<br />
corresponding space.<br />
<br />
The code should be succinct (though not obfuscated), readable and easy to<br />
maintain (after unforeseeable changes). Don't exploit exotic language<br />
features without good reason.<br />
<br />
It's not fixed how deep you indent (4 or 8 chars). You can break the<br />
line after "do", "let", "where", and "case .. of". Make sure that<br />
renamings don't destroy your layout. (If you get too far to the right,<br />
the code is unreadable anyway and needs to be decomposed.)<br />
<br />
Bad:<br />
<haskell><br />
case foo of Foo -> "Foo"<br />
Bar -> "Bar"<br />
</haskell><br />
<br />
Good:<br />
<haskell><br />
case <longer expression> of<br />
Foo -> "Foo"<br />
Bar -> "Bar"<br />
</haskell><br />
<br />
Avoid the notation with braces and semicolons since the layout rule<br />
forces you to properly align your alternatives.<br />
<br />
Respect compiler warnings. Supply type signatures, avoid shadowing and<br />
unused variables. Particularly avoid non-exhaustive and<br />
overlapping patterns. Missing unreachable cases can be filled in using<br />
"error" with a fixed string "<ModuleName>.<function>" to indicate the<br />
error position (in case the impossible should happen). Don't invest<br />
time to "show" the offending value, only do this temporarily when<br />
debugging the code.<br />
<br />
Don't leave unused or commented-out code in your files! Readers don't<br />
know what to think of it.<br />
<br />
<br />
==== Partial functions ====<br />
<br />
For partial functions do document their preconditions (if not obvious)<br />
and make sure that partial functions are only called when<br />
preconditions are obviously fulfilled (i.e. by a case statement or a<br />
previous test). Particularly the call of "head" should be used with<br />
care or (even better) be made obsolete by a case statement.<br />
<br />
Usually a case statement (and the import of isJust and fromJust from<br />
Data.[[Maybe]]) can be avoided by using the "maybe" function:<br />
<br />
<haskell><br />
maybe (error "<ModuleName>.<function>") id $ Map.lookup key map<br />
</haskell><br />
<br />
Generally we require you to be more explicit about failure<br />
cases. Surely a missing (or an irrefutable) pattern would precisely<br />
report the position of a runtime error, but these are not so obvious<br />
when reading the code.<br />
<br />
==== Let or where expressions ====<br />
<br />
Do avoid mixing and nesting "let" and "where". (I prefer the<br />
expression-stylistic "let".) Use auxiliary top-level functions that<br />
you do not export. Export lists also support the detection of unused<br />
functions.<br />
<br />
<br />
==== Code reuse ====<br />
<br />
If you notice that you're doing the same task again, try to generalize<br />
it in order to avoid duplicate code. It is frustrating to change the<br />
same error in several places.<br />
<br />
<br />
==== Application notation ====<br />
<br />
Many parentheses can be eliminated using the infix application operator <code>$</code><br />
with lowest priority. Try at least to avoid unnecessary parentheses in<br />
standard infix expression.<br />
<br />
<haskell><br />
f x : g x ++ h x<br />
<br />
a == 1 && b == 1 || a == 0 && b == 0<br />
</haskell><br />
<br />
Rather than putting a large final argument in parentheses (with a<br />
distant closing one) consider using <code>$</code> instead.<br />
<br />
<code>f (g x)</code> becomes <code>f $ g x</code> and consecutive applications<br />
<code>f (g (h x))</code> can be written as <code>f $ g $ h x</code> or <code>f . g $ h x</code>.<br />
<br />
A function definition like<br />
<code>f x = g $ h x</code> can be abbreviated to <code>f = g . h</code>.<br />
<br />
Note that the final argument may even be an infix- or case expression:<br />
<br />
<haskell><br />
map id $ c : l<br />
<br />
filter (const True) . map id $ case l of ...<br />
</haskell><br />
<br />
However, be aware that $-terms cannot be composed further in infix<br />
expressions.<br />
<br />
Probably wrong:<br />
<haskell><br />
f $ x ++ g $ x<br />
</haskell><br />
<br />
But the scope of an expression is also limited by the layout rule, so<br />
it is usually safe to use "$" on right hand sides.<br />
<br />
Ok:<br />
<br />
<haskell><br />
do f $ l<br />
++<br />
do g $ l<br />
</haskell><br />
<br />
Of course <code>$</code> can not be used in types. GHC has also some primitive<br />
functions involving the kind <code>#</code> that cannot be applied using <code>$</code>.<br />
<br />
Last warning: always leave spaces around <code>$</code> (and other mixfix<br />
operators) since a clash with template haskell is possible.<br />
<br />
(Also write <code>\ t</code> instead of <code>\t</code> in lambda expressions)<br />
<br />
==== List Comprehensions ====<br />
<br />
Use these only when "short and sweet". Prefer map, filter, and foldr!<br />
<br />
Instead of:<br />
<br />
<haskell><br />
[toUpper c | c <- s]<br />
</haskell><br />
<br />
write:<br />
<br />
<haskell><br />
map toUpper s<br />
</haskell><br />
<br />
Consider:<br />
<br />
<haskell><br />
[toUpper c | s <- strings, c <- s]<br />
</haskell><br />
<br />
Here it takes some time for the reader to find out which value depends<br />
on what other value and it is not so clear how many times the interim<br />
values s and c are used. In contrast to that the following can't be clearer:<br />
<br />
<haskell><br />
map toUpper (concat strings)<br />
</haskell><br />
<br />
<br />
When using higher order functions you can switch easier to data<br />
structures different from list. Compare:<br />
<br />
<haskell><br />
map (1+) list<br />
</haskell><br />
<br />
and:<br />
<br />
<haskell><br />
Set.map (1+) set<br />
</haskell><br />
<br />
<br />
==== Types ====<br />
<br />
Prefer proper data types over type synonyms or tuples even if you have<br />
to do more constructing and unpacking. This will make it easier to<br />
supply class instances later on. Don't put class constraints on<br />
a data type, constraints belong only to the functions that manipulate<br />
the data.<br />
<br />
Using type synonyms consistently is difficult over a longer time,<br />
because this is not checked by the compiler. (The types shown by<br />
the compiler may be unpredictable: i.e. FilePath, String or [Char])<br />
<br />
Take care if your data type has many variants (unless it is an<br />
enumeration type.) Don't repeat common parts in every variant since<br />
this will cause code duplication.<br />
<br />
Bad (to handle arguments in sync):<br />
<br />
<haskell><br />
data Mode f p = Box f p | Diamond f p<br />
</haskell><br />
<br />
Good (to handle arguments only once):<br />
<br />
<haskell><br />
data BoxOrDiamond = Box | Diamond<br />
<br />
data Mode f p = Mode BoxOrDiamond f p<br />
</haskell><br />
<br />
<br />
Consider (bad):<br />
<br />
<haskell><br />
data Tuple a b = Tuple a b | Undefined<br />
</haskell><br />
<br />
versus (better):<br />
<br />
<haskell><br />
data Tuple a b = Tuple a b<br />
</haskell><br />
<br />
and using:<br />
<br />
<haskell><br />
Maybe (Tuple a b)<br />
</haskell><br />
<br />
(or another monad) whenever an undefined result needs to be propagated<br />
<br />
<br />
==== Records ====<br />
<br />
For (large) records avoid the use of the constructor directly and<br />
remember that the order and number of fields may change.<br />
<br />
Take care with (the rare case of) depend polymorphic fields:<br />
<br />
<haskell><br />
data Fields a = VariantWithTwo<br />
{ field1 :: a<br />
, field2 :: a }<br />
</haskell><br />
<br />
The type of a value v can not be changed by only setting field1:<br />
<br />
<haskell><br />
v { field1 = f }<br />
</haskell><br />
<br />
Better construct a new value:<br />
<br />
<haskell><br />
VariantWithTwo { field1 = f } -- leaving field2 undefined<br />
</haskell><br />
<br />
Or use a polymorphic element that is instantiated by updating:<br />
<br />
<haskell><br />
empty = VariantWithTwo { field1 = [], field2 = [] }<br />
<br />
empty { field1 = [f] }<br />
</haskell><br />
<br />
Several variants with identical fields may avoid some code duplication<br />
when selecting and updating, though possibly not in a few<br />
depended polymorphic cases.<br />
<br />
However, I doubt if the following is a really good alternative to the<br />
above data Mode with data BoxOrDiamond.<br />
<br />
<haskell><br />
data Mode f p =<br />
Box { formula :: f, positions :: p }<br />
| Diamond { formula :: f, positions :: p }<br />
</haskell><br />
<br />
<br />
==== IO ====<br />
<br />
Try to strictly separate IO, Monad and pure (without do) function<br />
programming (possibly via separate modules).<br />
<br />
Bad:<br />
<br />
<haskell><br />
x <- return y<br />
...<br />
</haskell><br />
<br />
Good:<br />
<br />
<haskell><br />
let x = y<br />
...<br />
</haskell><br />
<br />
Don't use <code>Prelude.interact</code> and make sure your program does not depend<br />
on the (not always obvious) order of evaluation. E.g. don't read and<br />
write to the same file:<br />
<br />
This will fail:<br />
<br />
<haskell><br />
do s <- readFile f<br />
writeFile f $ 'a' : s<br />
</haskell><br />
<br />
because of lazy IO! (Writing is starting before reading is finished.)<br />
<br />
==== Trace ====<br />
<br />
Tracing is for debugging purposes only and should not be used as<br />
feedback for the user. Clean code is not cluttered by trace calls.<br />
<br />
<br />
==== Imports ====<br />
<br />
Standard library modules like Char. List, Maybe, Monad, etc should be<br />
imported by their hierarchical module name, i.e. the base package (so<br />
that haddock finds them):<br />
<br />
<haskell><br />
import Data.List<br />
import Control.Monad<br />
import System.Environment<br />
</haskell><br />
<br />
The libraries for Set and Map are to be imported qualified:<br />
<br />
<haskell><br />
import qualified Data.Set as Set<br />
import qualified Data.Map as Map<br />
</haskell><br />
<br />
<br />
==== Glasgow extensions and Classes ====<br />
<br />
[[Use of language extensions|Stay away from extensions]] as long as possible. Also use classes with<br />
care because soon the desire for overlapping instances (like for lists<br />
and strings) may arise. Then you may want MPTC (multi-parameter type<br />
classes), functional dependencies (FD), undecidable and possibly incoherent<br />
instances and then you are "in the wild" (according to SPJ).<br />
<br />
=== Style in other languages ===<br />
<br />
* [http://www.cs.caltech.edu/~cs20/a/style.html OCaml style]<br />
<br />
=== Final remarks ===<br />
<br />
Despite guidelines, writing "correct code" (without formal proof<br />
support yet) still remains the major challenge. As motivation to<br />
follow these guidelines consider the points that are from the "C++<br />
Coding Standard", where I replaced "C++" with "Haskell".<br />
<br />
Good Points:<br />
<br />
* programmers can go into any code and figure out what's going on<br />
<br />
* new people can get up to speed quickly<br />
<br />
* people new to Haskell are spared the need to develop a personal style and defend it to the death<br />
<br />
* people new to Haskell are spared making the same mistakes over and over again<br />
<br />
* people make fewer mistakes in consistent environments<br />
<br />
* programmers have a common enemy :-)<br />
<br />
Bad Points:<br />
<br />
* the standard is usually stupid because it was made by someone who doesn't understand Haskell<br />
<br />
* the standard is usually stupid because it's not what I do<br />
<br />
* standards reduce creativity<br />
<br />
* standards are unnecessary as long as people are consistent<br />
<br />
* standards enforce too much structure<br />
<br />
* people ignore standards anyway<br />
<br />
== Another guidelines ==<br />
<br />
[https://www.cse.unsw.edu.au/~cs3161/15s2/StyleGuide.html COMP3161]<br />
<br />
[http://docs.ganeti.org/ganeti/2.13/html/dev-codestyle.html#haskell Google's ganeti]<br />
<br />
[https://kowainik.github.io/posts/2019-02-06-style-guide kowainik]<br />
<br />
[https://github.com/Soostone/style-guides/blob/master/haskell-style-guide.md Soostone]<br />
<br />
[https://github.com/tibbe/haskell-style-guide/blob/master/haskell-style.md tibbe]<br />
<br />
[https://github.com/tweag/guides/blob/master/style/Haskell.md tweag]<br />
<br />
[[Category:Style]]</div>Anton-Latukhahttps://wiki.haskell.org/index.php?title=Programming_guidelines&diff=62896Programming guidelines2019-04-10T09:19:57Z<p>Anton-Latukha: /* File Format */ upd 80-char lines recommendation</p>
<hr />
<div>Programming guidelines shall help to make the code of a project better<br />
readable and maintainable by the varying number of contributors.<br />
<br />
It takes some programming experience to develop something like a<br />
personal "coding style" and guidelines only serve as rough shape for<br />
code. Guidelines should be followed by all members working on the<br />
project even if they prefer (or are already used to) different<br />
guidelines.<br />
<br />
These guidelines have been originally set up for the [http://hets.eu hets-project] and are<br />
now put on the [http://haskell.org/haskellwiki/ HaskellWiki] gradually<br />
integrating parts of the old hawiki<br />
entries [http://haskell.org/haskellwiki/Things_to_avoid ThingsToAvoid] and<br />
HaskellStyle (hopefully not<br />
hurting someone's copyrights). The other related entry<br />
TipsAndTricks treats more<br />
specific points that are left out here,<br />
<br />
Surely some style choices are a bit arbitrary (or "religious") and<br />
too restrictive with respect to language extensions. Nevertheless I hope<br />
to keep up these guidelines (at least as a basis) for our project<br />
in order to avoid maintaining diverging guidelines. Of course I want<br />
to supply - partly tool-dependent - reasons for certain decisions and<br />
also show alternatives by possibly bad examples. At the time of<br />
writing I use ghc-6.4.1, haddock-0.7 and (GNU-) emacs with the latest<br />
[http://www.haskell.org/haskell-mode/ haskell mode].<br />
<br />
The following quote and links are taken from<br />
HaskellStyle:<br />
<br />
We all have our own ideas about good Haskell style. There's More Than<br />
One Way To Do It. But some ways are better than others.<br />
<br />
Some comments from the GHC team about their internal coding<br />
standards can be found at<br />
http://hackage.haskell.org/trac/ghc/wiki/WorkingConventions<br />
<br />
Also http://research.microsoft.com/~simonpj/papers/haskell-retrospective/<br />
contains some brief comments on syntax and style,<br />
<br />
What now follows are descriptions of program documentation, file<br />
format, naming conventions and good programming practice (adapted form<br />
Matt's C/C++ Programming Guidelines and the Linux kernel coding<br />
style).<br />
<br />
<br />
=== Documentation ===<br />
<br />
<br />
Comments are to be written in application terms (i.e. user's point of<br />
view). Don't use technical terms - that's what the code is for!<br />
<br />
Comments should be written using correct spelling and grammar in complete<br />
sentences with punctation (in English only).<br />
<br />
"Generally, you want your comments to tell WHAT your code does, not HOW.<br />
Also, try to avoid putting comments inside a function body: if the<br />
function is so complex that you need to separately comment parts of it,<br />
you should probably" (... decompose it)<br />
<br />
Put a [http://haskell-haddock.readthedocs.io/en/latest/markup.html haddock comment] on top of every exported function and data type!<br />
Make sure haddock accepts these comments.<br />
<br />
=== File Format ===<br />
<br />
<br />
All Haskell source files start with a haddock header of the form:<br />
<br />
<haskell><br />
{- |<br />
Module : <File name or $Header$ to be replaced automatically><br />
Description : <optional short text displayed on contents page><br />
Copyright : (c) <Authors or Affiliations><br />
License : <license><br />
<br />
Maintainer : <email><br />
Stability : unstable | experimental | provisional | stable | frozen<br />
Portability : portable | non-portable (<reason>)<br />
<br />
<module description starting at first column><br />
-}<br />
</haskell><br />
<br />
A possible compiler pragma (like {-# LANGUAGE CPP #-}) may precede<br />
this header. The following hierarchical module name must, of course,<br />
match the file name.<br />
<br />
Make sure that the description is changed to meet the module (if the<br />
header was copied from elsewhere). Insert your email address as maintainer.<br />
<br />
Try to write portable ([[Language_and_library_specification#The_Haskell_98_report|Haskell98]]) code.<br />
If you use e.g. [[MPTC|multi-parameter type classes]] (MTPC) and [[Functional dependencies|functional dependencies]] (FD)<br />
the code becomes [https://mail.haskell.org/pipermail/haskell-prime/2006-February/000609.html "non-portable (MPTC with FD)"].<br />
<br />
The \$Header\$ entry will be automatically expanded.<br />
<br />
Lines should not be longer than 80 (preferably 75) characters.<br />
Code with short lines reads casually and easier to understand.<br />
If the expression is longer than 80 lines, try to structure & rewrite<br />
the code in a more expressive way.<br />
80 character lines considered over IT industry good practice and<br />
supported in all cases.<br />
<br />
Don't leave trailing white space in your code in every line.<br />
<br />
Expand all your tabs to spaces to avoid the danger of wrongly expanding<br />
them (or a different display of tabs versus eight spaces). Possibly put<br />
something like the following in your ~/.emacs file.<br />
<br />
(custom-set-variables '(indent-tabs-mode nil))<br />
<br />
The last character in your file should be a newline! Under Solaris<br />
you'll get a warning if this is not the case and sometimes last lines<br />
without newlines are ignored (i.e. "#endif" without newline). Emacs<br />
usually asks for a final newline.<br />
<br />
You may use http://hackage.haskell.org/package/scan to check your file format.<br />
<br />
The whole module should not be too long (about 400 lines)<br />
<br />
Please have a look at the [http://haskell-haddock.readthedocs.io/en/latest/markup.html#the-module-description Haddock module header documentation].<br />
<br />
=== Naming Conventions ===<br />
<br />
<br />
In Haskell types start with capital and functions with lowercase<br />
letters, so only avoid infix identifiers! Defining symbolic infix<br />
identifiers should be left to library writers only.<br />
<br />
(The infix identifier "\\" at the end of a line causes cpp preprocessor<br />
problems.)<br />
<br />
Names (especially global ones) should be descriptive and if you need<br />
long names write them as mixed case words (aka camelCase). (but "tmp"<br />
is to be preferred over "thisVariableIsATemporaryCounter")<br />
<br />
Also in the standard libraries, function names with multiple words are<br />
written using the camelCase convention. Similarly, type, typeclass and<br />
constructor names are written using the StudlyCaps convention.<br />
<br />
Some parts of our code use underlines (without unnecessary uppercase<br />
letters) for long identifiers to better reflect names given with<br />
hyphens in the requirement documentation. Also such names should be<br />
transliterated to camlCase identifiers possibly adding a (consistent)<br />
suffix or prefix to avoid conflicts with keywords. However, instead of<br />
a recurring prefix or suffix you may consider to use qualified imports<br />
and names.<br />
<br />
<br />
=== Good Programming Practice ===<br />
<br />
<br />
"Functions should be short and sweet, and do just one thing. They should<br />
fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24,<br />
as we all know), and do one thing and do that well."<br />
<br />
Most haskell functions should be at most a few lines, only case<br />
expression over large data types (that should be avoided, too) may need<br />
corresponding space.<br />
<br />
The code should be succinct (though not obfuscated), readable and easy to<br />
maintain (after unforeseeable changes). Don't exploit exotic language<br />
features without good reason.<br />
<br />
It's not fixed how deep you indent (4 or 8 chars). You can break the<br />
line after "do", "let", "where", and "case .. of". Make sure that<br />
renamings don't destroy your layout. (If you get too far to the right,<br />
the code is unreadable anyway and needs to be decomposed.)<br />
<br />
Bad:<br />
<haskell><br />
case foo of Foo -> "Foo"<br />
Bar -> "Bar"<br />
</haskell><br />
<br />
Good:<br />
<haskell><br />
case <longer expression> of<br />
Foo -> "Foo"<br />
Bar -> "Bar"<br />
</haskell><br />
<br />
Avoid the notation with braces and semicolons since the layout rule<br />
forces you to properly align your alternatives.<br />
<br />
Respect compiler warnings. Supply type signatures, avoid shadowing and<br />
unused variables. Particularly avoid non-exhaustive and<br />
overlapping patterns. Missing unreachable cases can be filled in using<br />
"error" with a fixed string "<ModuleName>.<function>" to indicate the<br />
error position (in case the impossible should happen). Don't invest<br />
time to "show" the offending value, only do this temporarily when<br />
debugging the code.<br />
<br />
Don't leave unused or commented-out code in your files! Readers don't<br />
know what to think of it.<br />
<br />
<br />
==== Partial functions ====<br />
<br />
For partial functions do document their preconditions (if not obvious)<br />
and make sure that partial functions are only called when<br />
preconditions are obviously fulfilled (i.e. by a case statement or a<br />
previous test). Particularly the call of "head" should be used with<br />
care or (even better) be made obsolete by a case statement.<br />
<br />
Usually a case statement (and the import of isJust and fromJust from<br />
Data.[[Maybe]]) can be avoided by using the "maybe" function:<br />
<br />
<haskell><br />
maybe (error "<ModuleName>.<function>") id $ Map.lookup key map<br />
</haskell><br />
<br />
Generally we require you to be more explicit about failure<br />
cases. Surely a missing (or an irrefutable) pattern would precisely<br />
report the position of a runtime error, but these are not so obvious<br />
when reading the code.<br />
<br />
==== Let or where expressions ====<br />
<br />
Do avoid mixing and nesting "let" and "where". (I prefer the<br />
expression-stylistic "let".) Use auxiliary top-level functions that<br />
you do not export. Export lists also support the detection of unused<br />
functions.<br />
<br />
<br />
==== Code reuse ====<br />
<br />
If you notice that you're doing the same task again, try to generalize<br />
it in order to avoid duplicate code. It is frustrating to change the<br />
same error in several places.<br />
<br />
<br />
==== Application notation ====<br />
<br />
Many parentheses can be eliminated using the infix application operator <code>$</code><br />
with lowest priority. Try at least to avoid unnecessary parentheses in<br />
standard infix expression.<br />
<br />
<haskell><br />
f x : g x ++ h x<br />
<br />
a == 1 && b == 1 || a == 0 && b == 0<br />
</haskell><br />
<br />
Rather than putting a large final argument in parentheses (with a<br />
distant closing one) consider using <code>$</code> instead.<br />
<br />
<code>f (g x)</code> becomes <code>f $ g x</code> and consecutive applications<br />
<code>f (g (h x))</code> can be written as <code>f $ g $ h x</code> or <code>f . g $ h x</code>.<br />
<br />
A function definition like<br />
<code>f x = g $ h x</code> can be abbreviated to <code>f = g . h</code>.<br />
<br />
Note that the final argument may even be an infix- or case expression:<br />
<br />
<haskell><br />
map id $ c : l<br />
<br />
filter (const True) . map id $ case l of ...<br />
</haskell><br />
<br />
However, be aware that $-terms cannot be composed further in infix<br />
expressions.<br />
<br />
Probably wrong:<br />
<haskell><br />
f $ x ++ g $ x<br />
</haskell><br />
<br />
But the scope of an expression is also limited by the layout rule, so<br />
it is usually safe to use "$" on right hand sides.<br />
<br />
Ok:<br />
<br />
<haskell><br />
do f $ l<br />
++<br />
do g $ l<br />
</haskell><br />
<br />
Of course <code>$</code> can not be used in types. GHC has also some primitive<br />
functions involving the kind <code>#</code> that cannot be applied using <code>$</code>.<br />
<br />
Last warning: always leave spaces around <code>$</code> (and other mixfix<br />
operators) since a clash with template haskell is possible.<br />
<br />
(Also write <code>\ t</code> instead of <code>\t</code> in lambda expressions)<br />
<br />
==== List Comprehensions ====<br />
<br />
Use these only when "short and sweet". Prefer map, filter, and foldr!<br />
<br />
Instead of:<br />
<br />
<haskell><br />
[toUpper c | c <- s]<br />
</haskell><br />
<br />
write:<br />
<br />
<haskell><br />
map toUpper s<br />
</haskell><br />
<br />
Consider:<br />
<br />
<haskell><br />
[toUpper c | s <- strings, c <- s]<br />
</haskell><br />
<br />
Here it takes some time for the reader to find out which value depends<br />
on what other value and it is not so clear how many times the interim<br />
values s and c are used. In contrast to that the following can't be clearer:<br />
<br />
<haskell><br />
map toUpper (concat strings)<br />
</haskell><br />
<br />
<br />
When using higher order functions you can switch easier to data<br />
structures different from list. Compare:<br />
<br />
<haskell><br />
map (1+) list<br />
</haskell><br />
<br />
and:<br />
<br />
<haskell><br />
Set.map (1+) set<br />
</haskell><br />
<br />
<br />
==== Types ====<br />
<br />
Prefer proper data types over type synonyms or tuples even if you have<br />
to do more constructing and unpacking. This will make it easier to<br />
supply class instances later on. Don't put class constraints on<br />
a data type, constraints belong only to the functions that manipulate<br />
the data.<br />
<br />
Using type synonyms consistently is difficult over a longer time,<br />
because this is not checked by the compiler. (The types shown by<br />
the compiler may be unpredictable: i.e. FilePath, String or [Char])<br />
<br />
Take care if your data type has many variants (unless it is an<br />
enumeration type.) Don't repeat common parts in every variant since<br />
this will cause code duplication.<br />
<br />
Bad (to handle arguments in sync):<br />
<br />
<haskell><br />
data Mode f p = Box f p | Diamond f p<br />
</haskell><br />
<br />
Good (to handle arguments only once):<br />
<br />
<haskell><br />
data BoxOrDiamond = Box | Diamond<br />
<br />
data Mode f p = Mode BoxOrDiamond f p<br />
</haskell><br />
<br />
<br />
Consider (bad):<br />
<br />
<haskell><br />
data Tuple a b = Tuple a b | Undefined<br />
</haskell><br />
<br />
versus (better):<br />
<br />
<haskell><br />
data Tuple a b = Tuple a b<br />
</haskell><br />
<br />
and using:<br />
<br />
<haskell><br />
Maybe (Tuple a b)<br />
</haskell><br />
<br />
(or another monad) whenever an undefined result needs to be propagated<br />
<br />
<br />
==== Records ====<br />
<br />
For (large) records avoid the use of the constructor directly and<br />
remember that the order and number of fields may change.<br />
<br />
Take care with (the rare case of) depend polymorphic fields:<br />
<br />
<haskell><br />
data Fields a = VariantWithTwo<br />
{ field1 :: a<br />
, field2 :: a }<br />
</haskell><br />
<br />
The type of a value v can not be changed by only setting field1:<br />
<br />
<haskell><br />
v { field1 = f }<br />
</haskell><br />
<br />
Better construct a new value:<br />
<br />
<haskell><br />
VariantWithTwo { field1 = f } -- leaving field2 undefined<br />
</haskell><br />
<br />
Or use a polymorphic element that is instantiated by updating:<br />
<br />
<haskell><br />
empty = VariantWithTwo { field1 = [], field2 = [] }<br />
<br />
empty { field1 = [f] }<br />
</haskell><br />
<br />
Several variants with identical fields may avoid some code duplication<br />
when selecting and updating, though possibly not in a few<br />
depended polymorphic cases.<br />
<br />
However, I doubt if the following is a really good alternative to the<br />
above data Mode with data BoxOrDiamond.<br />
<br />
<haskell><br />
data Mode f p =<br />
Box { formula :: f, positions :: p }<br />
| Diamond { formula :: f, positions :: p }<br />
</haskell><br />
<br />
<br />
==== IO ====<br />
<br />
Try to strictly separate IO, Monad and pure (without do) function<br />
programming (possibly via separate modules).<br />
<br />
Bad:<br />
<br />
<haskell><br />
x <- return y<br />
...<br />
</haskell><br />
<br />
Good:<br />
<br />
<haskell><br />
let x = y<br />
...<br />
</haskell><br />
<br />
Don't use <code>Prelude.interact</code> and make sure your program does not depend<br />
on the (not always obvious) order of evaluation. E.g. don't read and<br />
write to the same file:<br />
<br />
This will fail:<br />
<br />
<haskell><br />
do s <- readFile f<br />
writeFile f $ 'a' : s<br />
</haskell><br />
<br />
because of lazy IO! (Writing is starting before reading is finished.)<br />
<br />
==== Trace ====<br />
<br />
Tracing is for debugging purposes only and should not be used as<br />
feedback for the user. Clean code is not cluttered by trace calls.<br />
<br />
<br />
==== Imports ====<br />
<br />
Standard library modules like Char. List, Maybe, Monad, etc should be<br />
imported by their hierarchical module name, i.e. the base package (so<br />
that haddock finds them):<br />
<br />
<haskell><br />
import Data.List<br />
import Control.Monad<br />
import System.Environment<br />
</haskell><br />
<br />
The libraries for Set and Map are to be imported qualified:<br />
<br />
<haskell><br />
import qualified Data.Set as Set<br />
import qualified Data.Map as Map<br />
</haskell><br />
<br />
<br />
==== Glasgow extensions and Classes ====<br />
<br />
[[Use of language extensions|Stay away from extensions]] as long as possible. Also use classes with<br />
care because soon the desire for overlapping instances (like for lists<br />
and strings) may arise. Then you may want MPTC (multi-parameter type<br />
classes), functional dependencies (FD), undecidable and possibly incoherent<br />
instances and then you are "in the wild" (according to SPJ).<br />
<br />
=== Style in other languages ===<br />
<br />
* [http://www.cs.caltech.edu/~cs20/a/style.html OCaml style]<br />
<br />
=== Final remarks ===<br />
<br />
Despite guidelines, writing "correct code" (without formal proof<br />
support yet) still remains the major challenge. As motivation to<br />
follow these guidelines consider the points that are from the "C++<br />
Coding Standard", where I replaced "C++" with "Haskell".<br />
<br />
Good Points:<br />
<br />
* programmers can go into any code and figure out what's going on<br />
<br />
* new people can get up to speed quickly<br />
<br />
* people new to Haskell are spared the need to develop a personal style and defend it to the death<br />
<br />
* people new to Haskell are spared making the same mistakes over and over again<br />
<br />
* people make fewer mistakes in consistent environments<br />
<br />
* programmers have a common enemy :-)<br />
<br />
Bad Points:<br />
<br />
* the standard is usually stupid because it was made by someone who doesn't understand Haskell<br />
<br />
* the standard is usually stupid because it's not what I do<br />
<br />
* standards reduce creativity<br />
<br />
* standards are unnecessary as long as people are consistent<br />
<br />
* standards enforce too much structure<br />
<br />
* people ignore standards anyway<br />
<br />
== Another guidelines ==<br />
<br />
[https://www.cse.unsw.edu.au/~cs3161/15s2/StyleGuide.html COMP3161]<br />
<br />
[http://docs.ganeti.org/ganeti/2.13/html/dev-codestyle.html#haskell Google's ganeti]<br />
<br />
[https://kowainik.github.io/posts/2019-02-06-style-guide kowainik]<br />
<br />
[https://github.com/Soostone/style-guides/blob/master/haskell-style-guide.md Soostone]<br />
<br />
[https://github.com/tibbe/haskell-style-guide/blob/master/haskell-style.md tibbe]<br />
<br />
[https://github.com/tweag/guides/blob/master/style/Haskell.md tweag]<br />
<br />
[[Category:Style]]</div>Anton-Latukhahttps://wiki.haskell.org/index.php?title=User:Anton-Latukha/common.js&diff=62895User:Anton-Latukha/common.js2019-04-09T11:18:46Z<p>Anton-Latukha: upd wiki-monkey to Wikipedia profile</p>
<hr />
<div>mw.loader.load('https://rawcdn.githack.com/kynikos/wiki-monkey/v5.0.1/dist/WikiMonkey-Wikipedia.min.js');</div>Anton-Latukhahttps://wiki.haskell.org/index.php?title=User:Anton-Latukha/common.js&diff=62894User:Anton-Latukha/common.js2019-04-09T11:16:54Z<p>Anton-Latukha: upd wiki-monkey to arch-wiki profile</p>
<hr />
<div>mw.loader.load('https://rawcdn.githack.com/kynikos/wiki-monkey/v5.0.1/dist/WikiMonkey-ArchWiki.min.js');</div>Anton-Latukhahttps://wiki.haskell.org/index.php?title=User:Anton-Latukha/common.js&diff=62893User:Anton-Latukha/common.js2019-04-09T11:12:40Z<p>Anton-Latukha: add wiki-monkey script</p>
<hr />
<div>mw.loader.load('https://rawcdn.githack.com/kynikos/wiki-monkey/v5.0.1/dist/WikiMonkey-Wikipedia.min.js');</div>Anton-Latukhahttps://wiki.haskell.org/index.php?title=Programming_guidelines&diff=62892Programming guidelines2019-04-08T21:08:55Z<p>Anton-Latukha: /* File Format */ add reference links</p>
<hr />
<div>Programming guidelines shall help to make the code of a project better<br />
readable and maintainable by the varying number of contributors.<br />
<br />
It takes some programming experience to develop something like a<br />
personal "coding style" and guidelines only serve as rough shape for<br />
code. Guidelines should be followed by all members working on the<br />
project even if they prefer (or are already used to) different<br />
guidelines.<br />
<br />
These guidelines have been originally set up for the [http://hets.eu hets-project] and are<br />
now put on the [http://haskell.org/haskellwiki/ HaskellWiki] gradually<br />
integrating parts of the old hawiki<br />
entries [http://haskell.org/haskellwiki/Things_to_avoid ThingsToAvoid] and<br />
HaskellStyle (hopefully not<br />
hurting someone's copyrights). The other related entry<br />
TipsAndTricks treats more<br />
specific points that are left out here,<br />
<br />
Surely some style choices are a bit arbitrary (or "religious") and<br />
too restrictive with respect to language extensions. Nevertheless I hope<br />
to keep up these guidelines (at least as a basis) for our project<br />
in order to avoid maintaining diverging guidelines. Of course I want<br />
to supply - partly tool-dependent - reasons for certain decisions and<br />
also show alternatives by possibly bad examples. At the time of<br />
writing I use ghc-6.4.1, haddock-0.7 and (GNU-) emacs with the latest<br />
[http://www.haskell.org/haskell-mode/ haskell mode].<br />
<br />
The following quote and links are taken from<br />
HaskellStyle:<br />
<br />
We all have our own ideas about good Haskell style. There's More Than<br />
One Way To Do It. But some ways are better than others.<br />
<br />
Some comments from the GHC team about their internal coding<br />
standards can be found at<br />
http://hackage.haskell.org/trac/ghc/wiki/WorkingConventions<br />
<br />
Also http://research.microsoft.com/~simonpj/papers/haskell-retrospective/<br />
contains some brief comments on syntax and style,<br />
<br />
What now follows are descriptions of program documentation, file<br />
format, naming conventions and good programming practice (adapted form<br />
Matt's C/C++ Programming Guidelines and the Linux kernel coding<br />
style).<br />
<br />
<br />
=== Documentation ===<br />
<br />
<br />
Comments are to be written in application terms (i.e. user's point of<br />
view). Don't use technical terms - that's what the code is for!<br />
<br />
Comments should be written using correct spelling and grammar in complete<br />
sentences with punctation (in English only).<br />
<br />
"Generally, you want your comments to tell WHAT your code does, not HOW.<br />
Also, try to avoid putting comments inside a function body: if the<br />
function is so complex that you need to separately comment parts of it,<br />
you should probably" (... decompose it)<br />
<br />
Put a [http://haskell-haddock.readthedocs.io/en/latest/markup.html haddock comment] on top of every exported function and data type!<br />
Make sure haddock accepts these comments.<br />
<br />
=== File Format ===<br />
<br />
<br />
All Haskell source files start with a haddock header of the form:<br />
<br />
<haskell><br />
{- |<br />
Module : <File name or $Header$ to be replaced automatically><br />
Description : <optional short text displayed on contents page><br />
Copyright : (c) <Authors or Affiliations><br />
License : <license><br />
<br />
Maintainer : <email><br />
Stability : unstable | experimental | provisional | stable | frozen<br />
Portability : portable | non-portable (<reason>)<br />
<br />
<module description starting at first column><br />
-}<br />
</haskell><br />
<br />
A possible compiler pragma (like {-# LANGUAGE CPP #-}) may precede<br />
this header. The following hierarchical module name must, of course,<br />
match the file name.<br />
<br />
Make sure that the description is changed to meet the module (if the<br />
header was copied from elsewhere). Insert your email address as maintainer.<br />
<br />
Try to write portable ([[Language_and_library_specification#The_Haskell_98_report|Haskell98]]) code.<br />
If you use e.g. [[MPTC|multi-parameter type classes]] (MTPC) and [[Functional dependencies|functional dependencies]] (FD)<br />
the code becomes [https://mail.haskell.org/pipermail/haskell-prime/2006-February/000609.html "non-portable (MPTC with FD)"].<br />
<br />
The \$Header\$ entry will be automatically expanded.<br />
<br />
Lines should not be longer than 80 (preferably 75)<br />
characters to avoid wrapped lines (for casual readers)!<br />
<br />
Don't leave trailing white space in your code in every line.<br />
<br />
Expand all your tabs to spaces to avoid the danger of wrongly expanding<br />
them (or a different display of tabs versus eight spaces). Possibly put<br />
something like the following in your ~/.emacs file.<br />
<br />
(custom-set-variables '(indent-tabs-mode nil))<br />
<br />
The last character in your file should be a newline! Under Solaris<br />
you'll get a warning if this is not the case and sometimes last lines<br />
without newlines are ignored (i.e. "#endif" without newline). Emacs<br />
usually asks for a final newline.<br />
<br />
You may use http://hackage.haskell.org/package/scan to check your file format.<br />
<br />
The whole module should not be too long (about 400 lines)<br />
<br />
Please have a look at the [http://haskell-haddock.readthedocs.io/en/latest/markup.html#the-module-description Haddock module header documentation].<br />
<br />
=== Naming Conventions ===<br />
<br />
<br />
In Haskell types start with capital and functions with lowercase<br />
letters, so only avoid infix identifiers! Defining symbolic infix<br />
identifiers should be left to library writers only.<br />
<br />
(The infix identifier "\\" at the end of a line causes cpp preprocessor<br />
problems.)<br />
<br />
Names (especially global ones) should be descriptive and if you need<br />
long names write them as mixed case words (aka camelCase). (but "tmp"<br />
is to be preferred over "thisVariableIsATemporaryCounter")<br />
<br />
Also in the standard libraries, function names with multiple words are<br />
written using the camelCase convention. Similarly, type, typeclass and<br />
constructor names are written using the StudlyCaps convention.<br />
<br />
Some parts of our code use underlines (without unnecessary uppercase<br />
letters) for long identifiers to better reflect names given with<br />
hyphens in the requirement documentation. Also such names should be<br />
transliterated to camlCase identifiers possibly adding a (consistent)<br />
suffix or prefix to avoid conflicts with keywords. However, instead of<br />
a recurring prefix or suffix you may consider to use qualified imports<br />
and names.<br />
<br />
<br />
=== Good Programming Practice ===<br />
<br />
<br />
"Functions should be short and sweet, and do just one thing. They should<br />
fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24,<br />
as we all know), and do one thing and do that well."<br />
<br />
Most haskell functions should be at most a few lines, only case<br />
expression over large data types (that should be avoided, too) may need<br />
corresponding space.<br />
<br />
The code should be succinct (though not obfuscated), readable and easy to<br />
maintain (after unforeseeable changes). Don't exploit exotic language<br />
features without good reason.<br />
<br />
It's not fixed how deep you indent (4 or 8 chars). You can break the<br />
line after "do", "let", "where", and "case .. of". Make sure that<br />
renamings don't destroy your layout. (If you get too far to the right,<br />
the code is unreadable anyway and needs to be decomposed.)<br />
<br />
Bad:<br />
<haskell><br />
case foo of Foo -> "Foo"<br />
Bar -> "Bar"<br />
</haskell><br />
<br />
Good:<br />
<haskell><br />
case <longer expression> of<br />
Foo -> "Foo"<br />
Bar -> "Bar"<br />
</haskell><br />
<br />
Avoid the notation with braces and semicolons since the layout rule<br />
forces you to properly align your alternatives.<br />
<br />
Respect compiler warnings. Supply type signatures, avoid shadowing and<br />
unused variables. Particularly avoid non-exhaustive and<br />
overlapping patterns. Missing unreachable cases can be filled in using<br />
"error" with a fixed string "<ModuleName>.<function>" to indicate the<br />
error position (in case the impossible should happen). Don't invest<br />
time to "show" the offending value, only do this temporarily when<br />
debugging the code.<br />
<br />
Don't leave unused or commented-out code in your files! Readers don't<br />
know what to think of it.<br />
<br />
<br />
==== Partial functions ====<br />
<br />
For partial functions do document their preconditions (if not obvious)<br />
and make sure that partial functions are only called when<br />
preconditions are obviously fulfilled (i.e. by a case statement or a<br />
previous test). Particularly the call of "head" should be used with<br />
care or (even better) be made obsolete by a case statement.<br />
<br />
Usually a case statement (and the import of isJust and fromJust from<br />
Data.[[Maybe]]) can be avoided by using the "maybe" function:<br />
<br />
<haskell><br />
maybe (error "<ModuleName>.<function>") id $ Map.lookup key map<br />
</haskell><br />
<br />
Generally we require you to be more explicit about failure<br />
cases. Surely a missing (or an irrefutable) pattern would precisely<br />
report the position of a runtime error, but these are not so obvious<br />
when reading the code.<br />
<br />
==== Let or where expressions ====<br />
<br />
Do avoid mixing and nesting "let" and "where". (I prefer the<br />
expression-stylistic "let".) Use auxiliary top-level functions that<br />
you do not export. Export lists also support the detection of unused<br />
functions.<br />
<br />
<br />
==== Code reuse ====<br />
<br />
If you notice that you're doing the same task again, try to generalize<br />
it in order to avoid duplicate code. It is frustrating to change the<br />
same error in several places.<br />
<br />
<br />
==== Application notation ====<br />
<br />
Many parentheses can be eliminated using the infix application operator <code>$</code><br />
with lowest priority. Try at least to avoid unnecessary parentheses in<br />
standard infix expression.<br />
<br />
<haskell><br />
f x : g x ++ h x<br />
<br />
a == 1 && b == 1 || a == 0 && b == 0<br />
</haskell><br />
<br />
Rather than putting a large final argument in parentheses (with a<br />
distant closing one) consider using <code>$</code> instead.<br />
<br />
<code>f (g x)</code> becomes <code>f $ g x</code> and consecutive applications<br />
<code>f (g (h x))</code> can be written as <code>f $ g $ h x</code> or <code>f . g $ h x</code>.<br />
<br />
A function definition like<br />
<code>f x = g $ h x</code> can be abbreviated to <code>f = g . h</code>.<br />
<br />
Note that the final argument may even be an infix- or case expression:<br />
<br />
<haskell><br />
map id $ c : l<br />
<br />
filter (const True) . map id $ case l of ...<br />
</haskell><br />
<br />
However, be aware that $-terms cannot be composed further in infix<br />
expressions.<br />
<br />
Probably wrong:<br />
<haskell><br />
f $ x ++ g $ x<br />
</haskell><br />
<br />
But the scope of an expression is also limited by the layout rule, so<br />
it is usually safe to use "$" on right hand sides.<br />
<br />
Ok:<br />
<br />
<haskell><br />
do f $ l<br />
++<br />
do g $ l<br />
</haskell><br />
<br />
Of course <code>$</code> can not be used in types. GHC has also some primitive<br />
functions involving the kind <code>#</code> that cannot be applied using <code>$</code>.<br />
<br />
Last warning: always leave spaces around <code>$</code> (and other mixfix<br />
operators) since a clash with template haskell is possible.<br />
<br />
(Also write <code>\ t</code> instead of <code>\t</code> in lambda expressions)<br />
<br />
==== List Comprehensions ====<br />
<br />
Use these only when "short and sweet". Prefer map, filter, and foldr!<br />
<br />
Instead of:<br />
<br />
<haskell><br />
[toUpper c | c <- s]<br />
</haskell><br />
<br />
write:<br />
<br />
<haskell><br />
map toUpper s<br />
</haskell><br />
<br />
Consider:<br />
<br />
<haskell><br />
[toUpper c | s <- strings, c <- s]<br />
</haskell><br />
<br />
Here it takes some time for the reader to find out which value depends<br />
on what other value and it is not so clear how many times the interim<br />
values s and c are used. In contrast to that the following can't be clearer:<br />
<br />
<haskell><br />
map toUpper (concat strings)<br />
</haskell><br />
<br />
<br />
When using higher order functions you can switch easier to data<br />
structures different from list. Compare:<br />
<br />
<haskell><br />
map (1+) list<br />
</haskell><br />
<br />
and:<br />
<br />
<haskell><br />
Set.map (1+) set<br />
</haskell><br />
<br />
<br />
==== Types ====<br />
<br />
Prefer proper data types over type synonyms or tuples even if you have<br />
to do more constructing and unpacking. This will make it easier to<br />
supply class instances later on. Don't put class constraints on<br />
a data type, constraints belong only to the functions that manipulate<br />
the data.<br />
<br />
Using type synonyms consistently is difficult over a longer time,<br />
because this is not checked by the compiler. (The types shown by<br />
the compiler may be unpredictable: i.e. FilePath, String or [Char])<br />
<br />
Take care if your data type has many variants (unless it is an<br />
enumeration type.) Don't repeat common parts in every variant since<br />
this will cause code duplication.<br />
<br />
Bad (to handle arguments in sync):<br />
<br />
<haskell><br />
data Mode f p = Box f p | Diamond f p<br />
</haskell><br />
<br />
Good (to handle arguments only once):<br />
<br />
<haskell><br />
data BoxOrDiamond = Box | Diamond<br />
<br />
data Mode f p = Mode BoxOrDiamond f p<br />
</haskell><br />
<br />
<br />
Consider (bad):<br />
<br />
<haskell><br />
data Tuple a b = Tuple a b | Undefined<br />
</haskell><br />
<br />
versus (better):<br />
<br />
<haskell><br />
data Tuple a b = Tuple a b<br />
</haskell><br />
<br />
and using:<br />
<br />
<haskell><br />
Maybe (Tuple a b)<br />
</haskell><br />
<br />
(or another monad) whenever an undefined result needs to be propagated<br />
<br />
<br />
==== Records ====<br />
<br />
For (large) records avoid the use of the constructor directly and<br />
remember that the order and number of fields may change.<br />
<br />
Take care with (the rare case of) depend polymorphic fields:<br />
<br />
<haskell><br />
data Fields a = VariantWithTwo<br />
{ field1 :: a<br />
, field2 :: a }<br />
</haskell><br />
<br />
The type of a value v can not be changed by only setting field1:<br />
<br />
<haskell><br />
v { field1 = f }<br />
</haskell><br />
<br />
Better construct a new value:<br />
<br />
<haskell><br />
VariantWithTwo { field1 = f } -- leaving field2 undefined<br />
</haskell><br />
<br />
Or use a polymorphic element that is instantiated by updating:<br />
<br />
<haskell><br />
empty = VariantWithTwo { field1 = [], field2 = [] }<br />
<br />
empty { field1 = [f] }<br />
</haskell><br />
<br />
Several variants with identical fields may avoid some code duplication<br />
when selecting and updating, though possibly not in a few<br />
depended polymorphic cases.<br />
<br />
However, I doubt if the following is a really good alternative to the<br />
above data Mode with data BoxOrDiamond.<br />
<br />
<haskell><br />
data Mode f p =<br />
Box { formula :: f, positions :: p }<br />
| Diamond { formula :: f, positions :: p }<br />
</haskell><br />
<br />
<br />
==== IO ====<br />
<br />
Try to strictly separate IO, Monad and pure (without do) function<br />
programming (possibly via separate modules).<br />
<br />
Bad:<br />
<br />
<haskell><br />
x <- return y<br />
...<br />
</haskell><br />
<br />
Good:<br />
<br />
<haskell><br />
let x = y<br />
...<br />
</haskell><br />
<br />
Don't use <code>Prelude.interact</code> and make sure your program does not depend<br />
on the (not always obvious) order of evaluation. E.g. don't read and<br />
write to the same file:<br />
<br />
This will fail:<br />
<br />
<haskell><br />
do s <- readFile f<br />
writeFile f $ 'a' : s<br />
</haskell><br />
<br />
because of lazy IO! (Writing is starting before reading is finished.)<br />
<br />
==== Trace ====<br />
<br />
Tracing is for debugging purposes only and should not be used as<br />
feedback for the user. Clean code is not cluttered by trace calls.<br />
<br />
<br />
==== Imports ====<br />
<br />
Standard library modules like Char. List, Maybe, Monad, etc should be<br />
imported by their hierarchical module name, i.e. the base package (so<br />
that haddock finds them):<br />
<br />
<haskell><br />
import Data.List<br />
import Control.Monad<br />
import System.Environment<br />
</haskell><br />
<br />
The libraries for Set and Map are to be imported qualified:<br />
<br />
<haskell><br />
import qualified Data.Set as Set<br />
import qualified Data.Map as Map<br />
</haskell><br />
<br />
<br />
==== Glasgow extensions and Classes ====<br />
<br />
[[Use of language extensions|Stay away from extensions]] as long as possible. Also use classes with<br />
care because soon the desire for overlapping instances (like for lists<br />
and strings) may arise. Then you may want MPTC (multi-parameter type<br />
classes), functional dependencies (FD), undecidable and possibly incoherent<br />
instances and then you are "in the wild" (according to SPJ).<br />
<br />
=== Style in other languages ===<br />
<br />
* [http://www.cs.caltech.edu/~cs20/a/style.html OCaml style]<br />
<br />
=== Final remarks ===<br />
<br />
Despite guidelines, writing "correct code" (without formal proof<br />
support yet) still remains the major challenge. As motivation to<br />
follow these guidelines consider the points that are from the "C++<br />
Coding Standard", where I replaced "C++" with "Haskell".<br />
<br />
Good Points:<br />
<br />
* programmers can go into any code and figure out what's going on<br />
<br />
* new people can get up to speed quickly<br />
<br />
* people new to Haskell are spared the need to develop a personal style and defend it to the death<br />
<br />
* people new to Haskell are spared making the same mistakes over and over again<br />
<br />
* people make fewer mistakes in consistent environments<br />
<br />
* programmers have a common enemy :-)<br />
<br />
Bad Points:<br />
<br />
* the standard is usually stupid because it was made by someone who doesn't understand Haskell<br />
<br />
* the standard is usually stupid because it's not what I do<br />
<br />
* standards reduce creativity<br />
<br />
* standards are unnecessary as long as people are consistent<br />
<br />
* standards enforce too much structure<br />
<br />
* people ignore standards anyway<br />
<br />
== Another guidelines ==<br />
<br />
[https://www.cse.unsw.edu.au/~cs3161/15s2/StyleGuide.html COMP3161]<br />
<br />
[http://docs.ganeti.org/ganeti/2.13/html/dev-codestyle.html#haskell Google's ganeti]<br />
<br />
[https://kowainik.github.io/posts/2019-02-06-style-guide kowainik]<br />
<br />
[https://github.com/Soostone/style-guides/blob/master/haskell-style-guide.md Soostone]<br />
<br />
[https://github.com/tibbe/haskell-style-guide/blob/master/haskell-style.md tibbe]<br />
<br />
[https://github.com/tweag/guides/blob/master/style/Haskell.md tweag]<br />
<br />
[[Category:Style]]</div>Anton-Latukhahttps://wiki.haskell.org/index.php?title=Programming_guidelines&diff=62891Programming guidelines2019-04-08T20:30:57Z<p>Anton-Latukha: /* File Format */</p>
<hr />
<div>Programming guidelines shall help to make the code of a project better<br />
readable and maintainable by the varying number of contributors.<br />
<br />
It takes some programming experience to develop something like a<br />
personal "coding style" and guidelines only serve as rough shape for<br />
code. Guidelines should be followed by all members working on the<br />
project even if they prefer (or are already used to) different<br />
guidelines.<br />
<br />
These guidelines have been originally set up for the [http://hets.eu hets-project] and are<br />
now put on the [http://haskell.org/haskellwiki/ HaskellWiki] gradually<br />
integrating parts of the old hawiki<br />
entries [http://haskell.org/haskellwiki/Things_to_avoid ThingsToAvoid] and<br />
HaskellStyle (hopefully not<br />
hurting someone's copyrights). The other related entry<br />
TipsAndTricks treats more<br />
specific points that are left out here,<br />
<br />
Surely some style choices are a bit arbitrary (or "religious") and<br />
too restrictive with respect to language extensions. Nevertheless I hope<br />
to keep up these guidelines (at least as a basis) for our project<br />
in order to avoid maintaining diverging guidelines. Of course I want<br />
to supply - partly tool-dependent - reasons for certain decisions and<br />
also show alternatives by possibly bad examples. At the time of<br />
writing I use ghc-6.4.1, haddock-0.7 and (GNU-) emacs with the latest<br />
[http://www.haskell.org/haskell-mode/ haskell mode].<br />
<br />
The following quote and links are taken from<br />
HaskellStyle:<br />
<br />
We all have our own ideas about good Haskell style. There's More Than<br />
One Way To Do It. But some ways are better than others.<br />
<br />
Some comments from the GHC team about their internal coding<br />
standards can be found at<br />
http://hackage.haskell.org/trac/ghc/wiki/WorkingConventions<br />
<br />
Also http://research.microsoft.com/~simonpj/papers/haskell-retrospective/<br />
contains some brief comments on syntax and style,<br />
<br />
What now follows are descriptions of program documentation, file<br />
format, naming conventions and good programming practice (adapted form<br />
Matt's C/C++ Programming Guidelines and the Linux kernel coding<br />
style).<br />
<br />
<br />
=== Documentation ===<br />
<br />
<br />
Comments are to be written in application terms (i.e. user's point of<br />
view). Don't use technical terms - that's what the code is for!<br />
<br />
Comments should be written using correct spelling and grammar in complete<br />
sentences with punctation (in English only).<br />
<br />
"Generally, you want your comments to tell WHAT your code does, not HOW.<br />
Also, try to avoid putting comments inside a function body: if the<br />
function is so complex that you need to separately comment parts of it,<br />
you should probably" (... decompose it)<br />
<br />
Put a [http://haskell-haddock.readthedocs.io/en/latest/markup.html haddock comment] on top of every exported function and data type!<br />
Make sure haddock accepts these comments.<br />
<br />
=== File Format ===<br />
<br />
<br />
All Haskell source files start with a haddock header of the form:<br />
<br />
<haskell><br />
{- |<br />
Module : <File name or $Header$ to be replaced automatically><br />
Description : <optional short text displayed on contents page><br />
Copyright : (c) <Authors or Affiliations><br />
License : <license><br />
<br />
Maintainer : <email><br />
Stability : unstable | experimental | provisional | stable | frozen<br />
Portability : portable | non-portable (<reason>)<br />
<br />
<module description starting at first column><br />
-}<br />
</haskell><br />
<br />
A possible compiler pragma (like {-# LANGUAGE CPP #-}) may precede<br />
this header. The following hierarchical module name must, of course,<br />
match the file name.<br />
<br />
Make sure that the description is changed to meet the module (if the<br />
header was copied from elsewhere). Insert your email address as maintainer.<br />
<br />
Try to write portable ([[Language_and_library_specification#The_Haskell_98_report|Haskell98]]) code. If you use e.g. multi-parameter<br />
type classes and functional dependencies the code becomes<br />
"non-portable (MPTC with FD)".<br />
<br />
The \$Header\$ entry will be automatically expanded.<br />
<br />
Lines should not be longer than 80 (preferably 75)<br />
characters to avoid wrapped lines (for casual readers)!<br />
<br />
Don't leave trailing white space in your code in every line.<br />
<br />
Expand all your tabs to spaces to avoid the danger of wrongly expanding<br />
them (or a different display of tabs versus eight spaces). Possibly put<br />
something like the following in your ~/.emacs file.<br />
<br />
(custom-set-variables '(indent-tabs-mode nil))<br />
<br />
The last character in your file should be a newline! Under Solaris<br />
you'll get a warning if this is not the case and sometimes last lines<br />
without newlines are ignored (i.e. "#endif" without newline). Emacs<br />
usually asks for a final newline.<br />
<br />
You may use http://hackage.haskell.org/package/scan to check your file format.<br />
<br />
The whole module should not be too long (about 400 lines)<br />
<br />
Please have a look at the [http://haskell-haddock.readthedocs.io/en/latest/markup.html#the-module-description Haddock module header documentation].<br />
<br />
=== Naming Conventions ===<br />
<br />
<br />
In Haskell types start with capital and functions with lowercase<br />
letters, so only avoid infix identifiers! Defining symbolic infix<br />
identifiers should be left to library writers only.<br />
<br />
(The infix identifier "\\" at the end of a line causes cpp preprocessor<br />
problems.)<br />
<br />
Names (especially global ones) should be descriptive and if you need<br />
long names write them as mixed case words (aka camelCase). (but "tmp"<br />
is to be preferred over "thisVariableIsATemporaryCounter")<br />
<br />
Also in the standard libraries, function names with multiple words are<br />
written using the camelCase convention. Similarly, type, typeclass and<br />
constructor names are written using the StudlyCaps convention.<br />
<br />
Some parts of our code use underlines (without unnecessary uppercase<br />
letters) for long identifiers to better reflect names given with<br />
hyphens in the requirement documentation. Also such names should be<br />
transliterated to camlCase identifiers possibly adding a (consistent)<br />
suffix or prefix to avoid conflicts with keywords. However, instead of<br />
a recurring prefix or suffix you may consider to use qualified imports<br />
and names.<br />
<br />
<br />
=== Good Programming Practice ===<br />
<br />
<br />
"Functions should be short and sweet, and do just one thing. They should<br />
fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24,<br />
as we all know), and do one thing and do that well."<br />
<br />
Most haskell functions should be at most a few lines, only case<br />
expression over large data types (that should be avoided, too) may need<br />
corresponding space.<br />
<br />
The code should be succinct (though not obfuscated), readable and easy to<br />
maintain (after unforeseeable changes). Don't exploit exotic language<br />
features without good reason.<br />
<br />
It's not fixed how deep you indent (4 or 8 chars). You can break the<br />
line after "do", "let", "where", and "case .. of". Make sure that<br />
renamings don't destroy your layout. (If you get too far to the right,<br />
the code is unreadable anyway and needs to be decomposed.)<br />
<br />
Bad:<br />
<haskell><br />
case foo of Foo -> "Foo"<br />
Bar -> "Bar"<br />
</haskell><br />
<br />
Good:<br />
<haskell><br />
case <longer expression> of<br />
Foo -> "Foo"<br />
Bar -> "Bar"<br />
</haskell><br />
<br />
Avoid the notation with braces and semicolons since the layout rule<br />
forces you to properly align your alternatives.<br />
<br />
Respect compiler warnings. Supply type signatures, avoid shadowing and<br />
unused variables. Particularly avoid non-exhaustive and<br />
overlapping patterns. Missing unreachable cases can be filled in using<br />
"error" with a fixed string "<ModuleName>.<function>" to indicate the<br />
error position (in case the impossible should happen). Don't invest<br />
time to "show" the offending value, only do this temporarily when<br />
debugging the code.<br />
<br />
Don't leave unused or commented-out code in your files! Readers don't<br />
know what to think of it.<br />
<br />
<br />
==== Partial functions ====<br />
<br />
For partial functions do document their preconditions (if not obvious)<br />
and make sure that partial functions are only called when<br />
preconditions are obviously fulfilled (i.e. by a case statement or a<br />
previous test). Particularly the call of "head" should be used with<br />
care or (even better) be made obsolete by a case statement.<br />
<br />
Usually a case statement (and the import of isJust and fromJust from<br />
Data.[[Maybe]]) can be avoided by using the "maybe" function:<br />
<br />
<haskell><br />
maybe (error "<ModuleName>.<function>") id $ Map.lookup key map<br />
</haskell><br />
<br />
Generally we require you to be more explicit about failure<br />
cases. Surely a missing (or an irrefutable) pattern would precisely<br />
report the position of a runtime error, but these are not so obvious<br />
when reading the code.<br />
<br />
==== Let or where expressions ====<br />
<br />
Do avoid mixing and nesting "let" and "where". (I prefer the<br />
expression-stylistic "let".) Use auxiliary top-level functions that<br />
you do not export. Export lists also support the detection of unused<br />
functions.<br />
<br />
<br />
==== Code reuse ====<br />
<br />
If you notice that you're doing the same task again, try to generalize<br />
it in order to avoid duplicate code. It is frustrating to change the<br />
same error in several places.<br />
<br />
<br />
==== Application notation ====<br />
<br />
Many parentheses can be eliminated using the infix application operator <code>$</code><br />
with lowest priority. Try at least to avoid unnecessary parentheses in<br />
standard infix expression.<br />
<br />
<haskell><br />
f x : g x ++ h x<br />
<br />
a == 1 && b == 1 || a == 0 && b == 0<br />
</haskell><br />
<br />
Rather than putting a large final argument in parentheses (with a<br />
distant closing one) consider using <code>$</code> instead.<br />
<br />
<code>f (g x)</code> becomes <code>f $ g x</code> and consecutive applications<br />
<code>f (g (h x))</code> can be written as <code>f $ g $ h x</code> or <code>f . g $ h x</code>.<br />
<br />
A function definition like<br />
<code>f x = g $ h x</code> can be abbreviated to <code>f = g . h</code>.<br />
<br />
Note that the final argument may even be an infix- or case expression:<br />
<br />
<haskell><br />
map id $ c : l<br />
<br />
filter (const True) . map id $ case l of ...<br />
</haskell><br />
<br />
However, be aware that $-terms cannot be composed further in infix<br />
expressions.<br />
<br />
Probably wrong:<br />
<haskell><br />
f $ x ++ g $ x<br />
</haskell><br />
<br />
But the scope of an expression is also limited by the layout rule, so<br />
it is usually safe to use "$" on right hand sides.<br />
<br />
Ok:<br />
<br />
<haskell><br />
do f $ l<br />
++<br />
do g $ l<br />
</haskell><br />
<br />
Of course <code>$</code> can not be used in types. GHC has also some primitive<br />
functions involving the kind <code>#</code> that cannot be applied using <code>$</code>.<br />
<br />
Last warning: always leave spaces around <code>$</code> (and other mixfix<br />
operators) since a clash with template haskell is possible.<br />
<br />
(Also write <code>\ t</code> instead of <code>\t</code> in lambda expressions)<br />
<br />
==== List Comprehensions ====<br />
<br />
Use these only when "short and sweet". Prefer map, filter, and foldr!<br />
<br />
Instead of:<br />
<br />
<haskell><br />
[toUpper c | c <- s]<br />
</haskell><br />
<br />
write:<br />
<br />
<haskell><br />
map toUpper s<br />
</haskell><br />
<br />
Consider:<br />
<br />
<haskell><br />
[toUpper c | s <- strings, c <- s]<br />
</haskell><br />
<br />
Here it takes some time for the reader to find out which value depends<br />
on what other value and it is not so clear how many times the interim<br />
values s and c are used. In contrast to that the following can't be clearer:<br />
<br />
<haskell><br />
map toUpper (concat strings)<br />
</haskell><br />
<br />
<br />
When using higher order functions you can switch easier to data<br />
structures different from list. Compare:<br />
<br />
<haskell><br />
map (1+) list<br />
</haskell><br />
<br />
and:<br />
<br />
<haskell><br />
Set.map (1+) set<br />
</haskell><br />
<br />
<br />
==== Types ====<br />
<br />
Prefer proper data types over type synonyms or tuples even if you have<br />
to do more constructing and unpacking. This will make it easier to<br />
supply class instances later on. Don't put class constraints on<br />
a data type, constraints belong only to the functions that manipulate<br />
the data.<br />
<br />
Using type synonyms consistently is difficult over a longer time,<br />
because this is not checked by the compiler. (The types shown by<br />
the compiler may be unpredictable: i.e. FilePath, String or [Char])<br />
<br />
Take care if your data type has many variants (unless it is an<br />
enumeration type.) Don't repeat common parts in every variant since<br />
this will cause code duplication.<br />
<br />
Bad (to handle arguments in sync):<br />
<br />
<haskell><br />
data Mode f p = Box f p | Diamond f p<br />
</haskell><br />
<br />
Good (to handle arguments only once):<br />
<br />
<haskell><br />
data BoxOrDiamond = Box | Diamond<br />
<br />
data Mode f p = Mode BoxOrDiamond f p<br />
</haskell><br />
<br />
<br />
Consider (bad):<br />
<br />
<haskell><br />
data Tuple a b = Tuple a b | Undefined<br />
</haskell><br />
<br />
versus (better):<br />
<br />
<haskell><br />
data Tuple a b = Tuple a b<br />
</haskell><br />
<br />
and using:<br />
<br />
<haskell><br />
Maybe (Tuple a b)<br />
</haskell><br />
<br />
(or another monad) whenever an undefined result needs to be propagated<br />
<br />
<br />
==== Records ====<br />
<br />
For (large) records avoid the use of the constructor directly and<br />
remember that the order and number of fields may change.<br />
<br />
Take care with (the rare case of) depend polymorphic fields:<br />
<br />
<haskell><br />
data Fields a = VariantWithTwo<br />
{ field1 :: a<br />
, field2 :: a }<br />
</haskell><br />
<br />
The type of a value v can not be changed by only setting field1:<br />
<br />
<haskell><br />
v { field1 = f }<br />
</haskell><br />
<br />
Better construct a new value:<br />
<br />
<haskell><br />
VariantWithTwo { field1 = f } -- leaving field2 undefined<br />
</haskell><br />
<br />
Or use a polymorphic element that is instantiated by updating:<br />
<br />
<haskell><br />
empty = VariantWithTwo { field1 = [], field2 = [] }<br />
<br />
empty { field1 = [f] }<br />
</haskell><br />
<br />
Several variants with identical fields may avoid some code duplication<br />
when selecting and updating, though possibly not in a few<br />
depended polymorphic cases.<br />
<br />
However, I doubt if the following is a really good alternative to the<br />
above data Mode with data BoxOrDiamond.<br />
<br />
<haskell><br />
data Mode f p =<br />
Box { formula :: f, positions :: p }<br />
| Diamond { formula :: f, positions :: p }<br />
</haskell><br />
<br />
<br />
==== IO ====<br />
<br />
Try to strictly separate IO, Monad and pure (without do) function<br />
programming (possibly via separate modules).<br />
<br />
Bad:<br />
<br />
<haskell><br />
x <- return y<br />
...<br />
</haskell><br />
<br />
Good:<br />
<br />
<haskell><br />
let x = y<br />
...<br />
</haskell><br />
<br />
Don't use <code>Prelude.interact</code> and make sure your program does not depend<br />
on the (not always obvious) order of evaluation. E.g. don't read and<br />
write to the same file:<br />
<br />
This will fail:<br />
<br />
<haskell><br />
do s <- readFile f<br />
writeFile f $ 'a' : s<br />
</haskell><br />
<br />
because of lazy IO! (Writing is starting before reading is finished.)<br />
<br />
==== Trace ====<br />
<br />
Tracing is for debugging purposes only and should not be used as<br />
feedback for the user. Clean code is not cluttered by trace calls.<br />
<br />
<br />
==== Imports ====<br />
<br />
Standard library modules like Char. List, Maybe, Monad, etc should be<br />
imported by their hierarchical module name, i.e. the base package (so<br />
that haddock finds them):<br />
<br />
<haskell><br />
import Data.List<br />
import Control.Monad<br />
import System.Environment<br />
</haskell><br />
<br />
The libraries for Set and Map are to be imported qualified:<br />
<br />
<haskell><br />
import qualified Data.Set as Set<br />
import qualified Data.Map as Map<br />
</haskell><br />
<br />
<br />
==== Glasgow extensions and Classes ====<br />
<br />
[[Use of language extensions|Stay away from extensions]] as long as possible. Also use classes with<br />
care because soon the desire for overlapping instances (like for lists<br />
and strings) may arise. Then you may want MPTC (multi-parameter type<br />
classes), functional dependencies (FD), undecidable and possibly incoherent<br />
instances and then you are "in the wild" (according to SPJ).<br />
<br />
=== Style in other languages ===<br />
<br />
* [http://www.cs.caltech.edu/~cs20/a/style.html OCaml style]<br />
<br />
=== Final remarks ===<br />
<br />
Despite guidelines, writing "correct code" (without formal proof<br />
support yet) still remains the major challenge. As motivation to<br />
follow these guidelines consider the points that are from the "C++<br />
Coding Standard", where I replaced "C++" with "Haskell".<br />
<br />
Good Points:<br />
<br />
* programmers can go into any code and figure out what's going on<br />
<br />
* new people can get up to speed quickly<br />
<br />
* people new to Haskell are spared the need to develop a personal style and defend it to the death<br />
<br />
* people new to Haskell are spared making the same mistakes over and over again<br />
<br />
* people make fewer mistakes in consistent environments<br />
<br />
* programmers have a common enemy :-)<br />
<br />
Bad Points:<br />
<br />
* the standard is usually stupid because it was made by someone who doesn't understand Haskell<br />
<br />
* the standard is usually stupid because it's not what I do<br />
<br />
* standards reduce creativity<br />
<br />
* standards are unnecessary as long as people are consistent<br />
<br />
* standards enforce too much structure<br />
<br />
* people ignore standards anyway<br />
<br />
== Another guidelines ==<br />
<br />
[https://www.cse.unsw.edu.au/~cs3161/15s2/StyleGuide.html COMP3161]<br />
<br />
[http://docs.ganeti.org/ganeti/2.13/html/dev-codestyle.html#haskell Google's ganeti]<br />
<br />
[https://kowainik.github.io/posts/2019-02-06-style-guide kowainik]<br />
<br />
[https://github.com/Soostone/style-guides/blob/master/haskell-style-guide.md Soostone]<br />
<br />
[https://github.com/tibbe/haskell-style-guide/blob/master/haskell-style.md tibbe]<br />
<br />
[https://github.com/tweag/guides/blob/master/style/Haskell.md tweag]<br />
<br />
[[Category:Style]]</div>Anton-Latukhahttps://wiki.haskell.org/index.php?title=HaskellWiki:Guidelines&diff=62890HaskellWiki:Guidelines2019-04-08T20:17:17Z<p>Anton-Latukha: add sec "MediaWiki configuration bugs and feature requests"</p>
<hr />
<div>Here you will find guidelines for editing this wiki.<br />
<br />
== General principles ==<br />
<br />
Please bear in mind that our first audience is those expecting a web-site on Haskell and not necessarily interested in this wiki thing. The default style, for instance, tries to make the wiki-ness as discreet as possible for readers not logged in.<br />
<br />
== Page structure ==<br />
<br />
There shall not be a single top level section. If you divide a page into sections, please introduce multiple sections at the top level. That means, the sections should form a true forest instead of just a tree. Using a single top level section is very untypical and is also not needed for giving the whole text a headline since there is the page title for this issue.<br />
<br />
== Page titles ==<br />
<br />
These are some guidelines for page titles:<br />
; No CamelCase<br />
: Page titles should be real titles like titles in a book or paper. Therefore, please don’t use CamelCase.<br />
; No mnemonic path names<br />
: Page titles not only affect a page’s URL but they are also shown at the top of the page. Therefore, it is important that a page title is a title, not a mnemonic path name. On the other hand, it is good if titles are not unnecessarily long so that URLs have reasonable size.<br />
; Sentence-style capitalization<br />
: Titles should use sentence-style capitalization (see [http://en.wikipedia.org/wiki/Capitalization#Headings_and_publication_titles]). Capitalizing more words seems to be only appropriate in larger documents like books. If slashes are used to denote hierarchy, each part of the hierarchical title should be handled as if it were a title itself, i.e., its first word should be capitalized.<br />
<br />
=== Page titles of “quiz” pages ===<br />
The de-facto standard for quiz and contest pages is capitalization of each word in the page title, apparently contradicting the “sentence case” guideline above. This is based on the idea that each of the quiz/contest/shootout problems is considered a proper name, hence should be capitalized. Similarly, the solutions are named “Solution John Doe”, where the author of the solution is John Doe, again requiring capitalization of all words.<br />
<br />
== Renaming of page titles ==<br />
When doing a significant renaming of a page, (i.e., something more than just correcting the case as per the guidelines), use the discussion page to suggest a new title and the reasons for it. After a reasonable amount of time, if there are no objections, go ahead and move/rename the page.<br />
<br />
== Categories ==<br />
<br />
In order to ease navigation in the wiki, several categories are<br />
introduced. Whenever possible, please classify your page under one of<br />
the categories. For example, if you write a page about your new project,<br />
you can add it to the [[Category:Applications|Applications category]]<br />
by adding <nowiki>[[Category:Applications]]</nowiki> to your page.<br />
<br />
All known categories are in the wiki special page [[Special:Categories]].<br />
<br />
== Headlines ==<br />
<br />
Level 1 headlines (the ones written as <code>= Headline =</code>) shall not be used. Instead, a headline which is logically at the top level shall be implemented as a level 2 headline (i.e, as <code>== Headline ==</code>). The reason is that the page title and the level 1 headlines are both implemented as <tt>h1</tt> elements in the page’s HTML. They are distinguished by using different HTML classes indeed, but this distinction is too light. For example, a browser which doesn't support CSS would probably render page titles and level 1 headlines the same way.<br />
<br />
Like page titles, headlines shall use sentence-style capitalization [http://en.wikipedia.org/wiki/Capitalization#Headings_and_publication_titles].<br />
<br />
== Links ==<br />
<br />
Please don’t use the click here approach for inserting links. Instead, formulate your text as if you would put it on paper (where the request to click doesn't make sense) and markup parts of the text (often only single words) as links.<br />
<br />
== Code on the Wiki ==<br />
<br />
As this site is about Haskell, there will obviously be cases where you need to include Haskell (or other) code. Use the appropriate tags for the language (<code>&lt;hask&gt;</code> and <code>&lt;/hask&gt;</code> for inline Haskell, <code>&lt;haskell&gt;</code> and <code>&lt;/haskell&gt;</code> for full blocks.) See, for example, [[HaskellWiki:Style test]].<br />
<br />
Code examples should be kept as small and relevant as possible.<br />
<br />
=== Large blocks of code ===<br />
<br />
Rarely will people wade through a large piece of code, unless they are looking for something to implement or actually use. For those cases, consider using the Wiki [[Special:Upload | file upload]] capability to load the source code and then provide a link to that on the page. As an example, see the [[Alex/Wrapper monadUser]] page. <br />
<br />
This will contribute to the readability of the page.<br />
<br />
== Punctuation ==<br />
<br />
If would be great if you would use the punctuation symbols which are used in professional documents like “, ” and — instead of their ASCII workarounds like " and -. If you use an X server with a keymap like mine, you can get “ and ” with AltGr&nbsp;+&nbsp;V and AltGr&nbsp;+&nbsp;B.<br />
<br />
== Editing ==<br />
<br />
When editing the wiki, the bottom of the editing page has a place to put a summary. It is considered good form to put a one line comment about any changes you are making. As well, you have the capability to mark an edit as ''minor''. Please use this only for changing format, spelling, fixing bad links or minor word re-arrangement. It should not be used when adding whole new sections, a significant number of links, new pages or rewrites of the page.<br />
<br />
There is special [[Help:Editing|information on editor support]] for the Haskell wiki.<br />
<br />
== MediaWiki configuration bugs and feature requests ==<br />
<br />
Refer to [[HaskellWiki:About]] and [https://github.com/haskell/haskell-wiki-configuration haskell-wiki-configuration repo].</div>Anton-Latukhahttps://wiki.haskell.org/index.php?title=User:Anton-Latukha&diff=62889User:Anton-Latukha2019-04-08T19:55:57Z<p>Anton-Latukha: </p>
<hr />
<div>Profiles:<br />
* [https://github.com/Anton-Latukha GitHub]<br />
* [https://gitlab.com/Anton.Latukha GitLab]<br />
* [https://gitlab.haskell.org/Anton-Latukha Haskell GitLab]</div>Anton-Latukhahttps://wiki.haskell.org/index.php?title=User:Anton-Latukha&diff=62888User:Anton-Latukha2019-04-08T19:55:20Z<p>Anton-Latukha: init</p>
<hr />
<div>Profiles:<br />
* [https://github.com/Anton-Latukha GitHub]<br />
* [https://gitlab.com/Anton.Latukha GitLab]<br />
* [https://gitlab.haskell.org/Anton-Latukha Haskell GitLab profile]</div>Anton-Latukhahttps://wiki.haskell.org/index.php?title=HaskellWiki_talk:Guidelines&diff=62887HaskellWiki talk:Guidelines2019-04-08T19:50:45Z<p>Anton-Latukha: /* Why this article attached to Category: Applications */</p>
<hr />
<div>==Headlines==<br />
<br />
[[User:BrettGiles|Brett]], this is not to offend you. It just expresses my conviction. -- [[User:Wolfgang Jeltsch|Wolfgang Jeltsch]]<br />
<br />
:Not to worry [[User:Wolfgang Jeltsch|Wolfgang]] - I never take offence :) - My thoughts on the usage of level 2 only were primarily due to the size balance, but also I note this is a guideline for the original mediawiki site. Does anyone know why that is? [[User:BrettGiles|BrettGiles]] 22:11, 25 February 2006 (UTC)<br />
<br />
::No idea, but it seems loose to me. -- [[User:Wolfgang Jeltsch|Wolfgang Jeltsch]] 22:40, 25 February 2006 (UTC)<br />
<br />
:::I did some research on [http://meta.wikimedia.org/wiki/Help_talk:Section metawiki] and this is the reason they start at level 2. ''"So that when a user without CSS, or a text-mode browser, or a screen reader visits, they'll be presented with a page that at least has a logical document flow."'' From what I can tell, by default, the page title is rendered as a '''h1''' element. So are single '''=''' headings. Double '''==''' are '''h2''' and so on. So from that point of view it does make sense to start at two equals. I think the real issue is that the software is slightly broken in not generating an '''h2''' for a single equals sign. So, what do you think? [[User:BrettGiles|BrettGiles]] 22:36, 26 February 2006 (UTC)<br />
<br />
::::I prefer starting with ==, like Wikipedia. &mdash;[[User:Ashley Y|Ashley Y]] 00:52, 2 March 2006 (UTC)<br />
<br />
::::MediaWiki is broken in not creating an '''h2''' element for a headline with single equal signs. So we should work around this brokeness by making our top-level headlines <tt>==</tt> headlines. -- [[User:Wolfgang Jeltsch|Wolfgang Jeltsch]] 21:22, 8 March 2006 (UTC)<br />
<br />
:::::I've added a respective section to [[HaskellWiki:Guidelines]]. -- [[User:Wolfgang Jeltsch|Wolfgang Jeltsch]] 21:40, 8 March 2006 (UTC)<br />
<br />
==Page nameing and renaming==<br />
Apparently the last rename I did so offended the author that they quit the wiki. I used to follow the guideline I just added on the main page, but had never got a response before. However, considering the size of the wiki and the number of new users (especially those that seem to not know there are guidelines :), perhaps it is time to re-institute the practice. --[[User:BrettGiles|BrettGiles]] 18:04, 24 July 2007 (UTC)<br />
<br />
==Official guidelines?==<br />
<br />
What makes a guidline an "official" guidline? What's the process? &mdash;[[User:Ashley Y|Ashley Y]] 00:53, 2 March 2006 (UTC)<br />
<br />
:I suspect that a large number of the users / contributors to this site have an academic background. My experience with academia is that some kind of consensus based decision is the most successful at getting the decision adopted. In a community like this wiki, I’m not sure how to get that. We could send out notices on the various mailing lists, we could post notes on the front page and probably other ways as well. Like any consensus decision making it would take a long time. On the other hand, those that support the site could just be autocratic, set the rule and if there is too much flak, change it. <br />
:As to what actually makes an "official" guidline, I think it is up to the '''sysops''' whether we even have such a thing. The word tends to mean that '''sysops''' are required to monitor and fix the site when these are not followed. So, we might not even want to go to official, stay at unofficial and just let the community grow the guidlines (and do the monitoring) as they see fit. Sometimes, a certain level of anarchy works fine.[[User:BrettGiles|BrettGiles]] 16:06, 2 March 2006 (UTC)<br />
::I think that the traditional consensus decision making process is too slow. In addition, not everyone is interested in every guideline question, so it is not sensible to bother everyone with each of these questions. Therefore, it seems to me that it is better to follow your “autocracy approach”. And I also doubt that it is sensible to let some sysops decide about what guidelines are official. At least, this would be a bit contrary to the wiki approach, in my opinion. So I propose that users are allowed to set up guidelines they think of as sensible—possibly after some short discussion—and to declare them as official. If others don't like the guidelines, the guidelines can still be discussed (again) and changed. -- [[User:Wolfgang Jeltsch|Wolfgang Jeltsch]] 21:22, 8 March 2006 (UTC)<br />
:::So this is how all that started: with an abuse of power. Is this real? I mean, it really started this way? Do you understand that autocracy is against the very wiki philosophy and is an abuse against every other wiki users? Users are not allowed to set rules and start enforcing them '''against''' other users. This is going to create tension and, in the long run, to destroy a wiki. --AndreaRossato Thu 26 2007 07:37 CEST <br />
<br />
OK. Everything on the project page should have a consensus. Everything else should be discussed here. I have adjusted the project page accordingly. &mdash;[[User:Ashley Y|Ashley Y]] 06:34, 3 March 2006 (UTC)<br />
:What is the “project page”? -- [[User:Wolfgang Jeltsch|Wolfgang Jeltsch]] 21:22, 8 March 2006 (UTC)<br />
::Ah, I understand. It's [[HaskellWiki:Guidelines]]. -- [[User:Wolfgang Jeltsch|Wolfgang Jeltsch]] 21:40, 8 March 2006 (UTC)<br />
<br />
== Headlines ==<br />
<br />
Each page has a title which is automatically shown near the top of the page (currently centered and in red). So it doesn't make sense to put a single level 1 headline at the top of the page. (At the moment, you will see this on a few pages of this wiki. I suppose this is a legacy which comes from copying content from the old website to this wiki.)<br />
<br />
For top-level structuring, level 1 headlines (the ones denoted by single equals signs in the source) should be used. It is semantically incorrect to use level 2 headlines for this purpose. I note that level 1 headlines are currently rather large and in a font that might not be preferable. However, the solution to this is not to refrain from using level 1 headlines but to change the stylesheet appropriately. ''&mdash;says [[User:Wolfgang Jeltsch]]''<br />
<br />
:Note that is is currently being discussed. Level 1 (single equals) create an <tt>&lt;h1&gt;</tt> heading, which is the same level as the page title. We may (or may not:) switch to the standard of starting at level 2 (double equals) ''&mdash;says [[User:BrettGiles]]''<br />
<br />
:I prefer starting with level 2 as per Wikipedia. I consider the page title to be implicitly level 1 (whether it gets rendered as h1 or not). &mdash;[[User:Ashley Y|Ashley Y]] 06:34, 3 March 2006 (UTC)<br />
<br />
<br />
== Wiki Bug ==<br />
Try giving a Haskell example with the greater than or the less than sign in it. It clashes with the HTML rendering! -- says [[User:Uchchwhash|Pirated Dreams]] 04:22, 17 March 2006 (UTC)<br />
<br />
==What's Wrong With This Guidelines==<br />
I don't know how these guidelines became the official policy of this<br />
wiki, and I cannot find out how some of us decided to become the<br />
housekeepers, or the [http://c2.com/cgi/wiki?WikiGnome WikiGnome]s, of<br />
it.<br />
<br />
Still I've been involved in wiki development and I have some ideas on<br />
how a wiki should work. I tried to discuss it in the<br />
[http://www.haskell.org/pipermail/haskell-cafe/2007-July/029483.html haskell-café mailing list].<br />
<br />
Probably I should do that here too.<br />
<br />
[http://c2.com/cgi/wiki?HowToReactToGoodStyleTransgressions How to react to good style transgression?]<br />
Are these guidelines to be enforced? Aren't enforced guidelines a<br />
contradiction in terms? Sure they are...<br />
<br />
I don't think that cosmetic edits done just to enforce these<br />
guidelines are going to help this wiki on the long run. They piss<br />
newcomers off. And you are pissing off right the newcomers you would<br />
like to keep, the ones who are willing to contribute.<br />
<br />
You could start by reading the new material, and doing some edit to<br />
the content: if you collaborate the wiki way, probably your<br />
suggestions on style issues would be followed.<br />
<br />
Anyway, these are guidelines, or they pretend they are, and so you<br />
must always think about the fact the new paths can be taken.<br />
<br />
Instead what you have been doing, in the last year, is setting some<br />
rules here and reshaping the wiki in the name of those rules.<br />
<br />
===And what is right with these guidelines===<br />
So, when the "new" wiki started, there was quite a bit of discussion here and on the haskell list about what was an appropriate style for the wiki pages, considering that the site is meant to be a resource for new and exsisting Haskell users and as such, were desired to be reasonably consistent in style. These guidelines were the result. A variety of people edit pages to bring them into the stylistic guidelines, but if anyone is not happy with the current set, this seems like a good place to discuss changing them. <br />
<br />
Andrea's link to the "WikiGnome" has some very good pointers about stylistic changes being applied to the edits of ''newcomers'', and an even better one about dropping the argument. So this is my last edit about the subject :) --[[User:BrettGiles|BrettGiles]] 21:58, 25 July 2007 (UTC)<br />
:: Style consistency is not a primary goal of a wiki. A primary goal for the wiki is always having new people contribute to its grow. If you think that stylistic guidelines enforcing must be achieved regardless of the consequences than you should drop wiki technologies and use a blogging system or other kind of content management system. Instead, if you want to use a wiki, you should try to stick to the wiki philosophy as close as you can. --AndreaRossato Thu 26 2007 07:27 CEST <br />
::I'd like to point out that I perfectly understand that your effort has the aim of improving this wiki, and I don't think that style consistency is something to be avoided. What I'm trying to do, is to help finding a better way of achieving style consistency without increasing the level of entrance for new users. I hope you - Brett - will understand I have nothing against you. But I would like you to take into account the perspective of a newcomer too. I hope you are going to keep on discussing these issues with me. That would be helpful for the both of us. Thanks for your kind attention. --AndreaRossato Jul 26 2007 10:05 CEST. <br />
<br />
And that leads me to the second real big issue: page renaming.<br />
<br />
==Why Page Renaming Is BAD==<br />
<br />
Wiki page titles are [http://en.wikipedia.org/wiki/URI URI], and URI<br />
should be persistent over time.<br />
<br />
When you rename a page you are breaking URI persistence. A redirect is<br />
only a work around. A bad work around.<br />
<br />
You did a lot of page renaming lately. By doing so you broke a lot of<br />
URIs.<br />
<br />
You are also breaking the structure of this wiki, by creating a false<br />
hierarchy of pages. In a wiki all pages are top level. Hierarchy and<br />
wikis belong to different worlds.<br />
<br />
I do believe that this guidelines are used in a vary bad way. I would<br />
like you to stop. Or at least to seriously reconsider your way of<br />
enforcing them.<br />
<br />
AndreaRossato -- Wed Jul 25 2007 19:57 CEST<br />
<br />
==Amendments Proposal==<br />
I think we could use this episode to grow, as a community. I suggest some amendments to these guidelines:<br />
# enforcement: we could add something about guidelines enforcement, referring to the link I [http://c2.com/cgi/wiki?HowToReactToGoodStyleTransgressions posted before]: the user's Talk page should be used instead, and cosmetic edits should be ruled out as a bad practice, incompatible with the wiki philosopy;<br />
# page renaming: we should adopt a very strict policy for page renaming. URI persistence should be the primary goal, so page names can be changed only within a very short period of time after page creation, only for compelling reasons and only with the consent of other users, after discussing it in the talk page;<br />
# wiki structure: I'm not going to make any proposal but I would like you to think about the fact that wiki pages should be all top level, with no (false) hierarchy. This means that using slashes in page name should be considered bad practice.<br />
I hope I will find people willing to discuss these proposals.<br />
Thanks for your kind attention. --AndreaRossato Jul 26 07:59 CEST<br />
:Hi Andrea - I would suggest a few things regarding this:<br />
:*Let's move the proposal to the top of the page so that it is the first thing people see. We may even want to group the rest (or some of the rest) as old history.<br />
:*It might be a good idea to advertise this on the haskell mailing list and the haskell-cafe lists. Some people who are interested might not check the latest changes. Previous discussions seemed effective to me because there were multiple points of view.<br />
::--[[User:BrettGiles|BrettGiles]] 01:24, 27 July 2007 (UTC)<br />
:::So, a month has gone by with no changes / additions / updates to this page. I wonder if anyone supports or disagrees with the proposal by Andrea? --[[User:BrettGiles|BrettGiles]] 20:10, 24 August 2007 (UTC)<br />
<br />
== Please, attach this article to better Category ==<br />
<br />
It is strange to see it grooped in Categoty: Applications. Mostly all internal wiki articles group around several internal special categories. For people to be able to explore the internal Wiki articles, rules, guidelines, style guides.<br />
I found these Categories:<br />
# Community<br />
# FAQ<br />
# Tutorials<br />
# Style<br />
# Reference<br />
# special category for internal documentation.<br />
<br />
Each of them seems more suitable then Category: Applications.<br />
[[User:Anton-Latukha|Anton-Latukha]] ([[User talk:Anton-Latukha|talk]])</div>Anton-Latukhahttps://wiki.haskell.org/index.php?title=HaskellWiki_talk:Guidelines&diff=62886HaskellWiki talk:Guidelines2019-04-08T19:15:43Z<p>Anton-Latukha: /* Why this article attached to Category: Applications */</p>
<hr />
<div>==Headlines==<br />
<br />
[[User:BrettGiles|Brett]], this is not to offend you. It just expresses my conviction. -- [[User:Wolfgang Jeltsch|Wolfgang Jeltsch]]<br />
<br />
:Not to worry [[User:Wolfgang Jeltsch|Wolfgang]] - I never take offence :) - My thoughts on the usage of level 2 only were primarily due to the size balance, but also I note this is a guideline for the original mediawiki site. Does anyone know why that is? [[User:BrettGiles|BrettGiles]] 22:11, 25 February 2006 (UTC)<br />
<br />
::No idea, but it seems loose to me. -- [[User:Wolfgang Jeltsch|Wolfgang Jeltsch]] 22:40, 25 February 2006 (UTC)<br />
<br />
:::I did some research on [http://meta.wikimedia.org/wiki/Help_talk:Section metawiki] and this is the reason they start at level 2. ''"So that when a user without CSS, or a text-mode browser, or a screen reader visits, they'll be presented with a page that at least has a logical document flow."'' From what I can tell, by default, the page title is rendered as a '''h1''' element. So are single '''=''' headings. Double '''==''' are '''h2''' and so on. So from that point of view it does make sense to start at two equals. I think the real issue is that the software is slightly broken in not generating an '''h2''' for a single equals sign. So, what do you think? [[User:BrettGiles|BrettGiles]] 22:36, 26 February 2006 (UTC)<br />
<br />
::::I prefer starting with ==, like Wikipedia. &mdash;[[User:Ashley Y|Ashley Y]] 00:52, 2 March 2006 (UTC)<br />
<br />
::::MediaWiki is broken in not creating an '''h2''' element for a headline with single equal signs. So we should work around this brokeness by making our top-level headlines <tt>==</tt> headlines. -- [[User:Wolfgang Jeltsch|Wolfgang Jeltsch]] 21:22, 8 March 2006 (UTC)<br />
<br />
:::::I've added a respective section to [[HaskellWiki:Guidelines]]. -- [[User:Wolfgang Jeltsch|Wolfgang Jeltsch]] 21:40, 8 March 2006 (UTC)<br />
<br />
==Page nameing and renaming==<br />
Apparently the last rename I did so offended the author that they quit the wiki. I used to follow the guideline I just added on the main page, but had never got a response before. However, considering the size of the wiki and the number of new users (especially those that seem to not know there are guidelines :), perhaps it is time to re-institute the practice. --[[User:BrettGiles|BrettGiles]] 18:04, 24 July 2007 (UTC)<br />
<br />
==Official guidelines?==<br />
<br />
What makes a guidline an "official" guidline? What's the process? &mdash;[[User:Ashley Y|Ashley Y]] 00:53, 2 March 2006 (UTC)<br />
<br />
:I suspect that a large number of the users / contributors to this site have an academic background. My experience with academia is that some kind of consensus based decision is the most successful at getting the decision adopted. In a community like this wiki, I’m not sure how to get that. We could send out notices on the various mailing lists, we could post notes on the front page and probably other ways as well. Like any consensus decision making it would take a long time. On the other hand, those that support the site could just be autocratic, set the rule and if there is too much flak, change it. <br />
:As to what actually makes an "official" guidline, I think it is up to the '''sysops''' whether we even have such a thing. The word tends to mean that '''sysops''' are required to monitor and fix the site when these are not followed. So, we might not even want to go to official, stay at unofficial and just let the community grow the guidlines (and do the monitoring) as they see fit. Sometimes, a certain level of anarchy works fine.[[User:BrettGiles|BrettGiles]] 16:06, 2 March 2006 (UTC)<br />
::I think that the traditional consensus decision making process is too slow. In addition, not everyone is interested in every guideline question, so it is not sensible to bother everyone with each of these questions. Therefore, it seems to me that it is better to follow your “autocracy approach”. And I also doubt that it is sensible to let some sysops decide about what guidelines are official. At least, this would be a bit contrary to the wiki approach, in my opinion. So I propose that users are allowed to set up guidelines they think of as sensible—possibly after some short discussion—and to declare them as official. If others don't like the guidelines, the guidelines can still be discussed (again) and changed. -- [[User:Wolfgang Jeltsch|Wolfgang Jeltsch]] 21:22, 8 March 2006 (UTC)<br />
:::So this is how all that started: with an abuse of power. Is this real? I mean, it really started this way? Do you understand that autocracy is against the very wiki philosophy and is an abuse against every other wiki users? Users are not allowed to set rules and start enforcing them '''against''' other users. This is going to create tension and, in the long run, to destroy a wiki. --AndreaRossato Thu 26 2007 07:37 CEST <br />
<br />
OK. Everything on the project page should have a consensus. Everything else should be discussed here. I have adjusted the project page accordingly. &mdash;[[User:Ashley Y|Ashley Y]] 06:34, 3 March 2006 (UTC)<br />
:What is the “project page”? -- [[User:Wolfgang Jeltsch|Wolfgang Jeltsch]] 21:22, 8 March 2006 (UTC)<br />
::Ah, I understand. It's [[HaskellWiki:Guidelines]]. -- [[User:Wolfgang Jeltsch|Wolfgang Jeltsch]] 21:40, 8 March 2006 (UTC)<br />
<br />
== Headlines ==<br />
<br />
Each page has a title which is automatically shown near the top of the page (currently centered and in red). So it doesn't make sense to put a single level 1 headline at the top of the page. (At the moment, you will see this on a few pages of this wiki. I suppose this is a legacy which comes from copying content from the old website to this wiki.)<br />
<br />
For top-level structuring, level 1 headlines (the ones denoted by single equals signs in the source) should be used. It is semantically incorrect to use level 2 headlines for this purpose. I note that level 1 headlines are currently rather large and in a font that might not be preferable. However, the solution to this is not to refrain from using level 1 headlines but to change the stylesheet appropriately. ''&mdash;says [[User:Wolfgang Jeltsch]]''<br />
<br />
:Note that is is currently being discussed. Level 1 (single equals) create an <tt>&lt;h1&gt;</tt> heading, which is the same level as the page title. We may (or may not:) switch to the standard of starting at level 2 (double equals) ''&mdash;says [[User:BrettGiles]]''<br />
<br />
:I prefer starting with level 2 as per Wikipedia. I consider the page title to be implicitly level 1 (whether it gets rendered as h1 or not). &mdash;[[User:Ashley Y|Ashley Y]] 06:34, 3 March 2006 (UTC)<br />
<br />
<br />
== Wiki Bug ==<br />
Try giving a Haskell example with the greater than or the less than sign in it. It clashes with the HTML rendering! -- says [[User:Uchchwhash|Pirated Dreams]] 04:22, 17 March 2006 (UTC)<br />
<br />
==What's Wrong With This Guidelines==<br />
I don't know how these guidelines became the official policy of this<br />
wiki, and I cannot find out how some of us decided to become the<br />
housekeepers, or the [http://c2.com/cgi/wiki?WikiGnome WikiGnome]s, of<br />
it.<br />
<br />
Still I've been involved in wiki development and I have some ideas on<br />
how a wiki should work. I tried to discuss it in the<br />
[http://www.haskell.org/pipermail/haskell-cafe/2007-July/029483.html haskell-café mailing list].<br />
<br />
Probably I should do that here too.<br />
<br />
[http://c2.com/cgi/wiki?HowToReactToGoodStyleTransgressions How to react to good style transgression?]<br />
Are these guidelines to be enforced? Aren't enforced guidelines a<br />
contradiction in terms? Sure they are...<br />
<br />
I don't think that cosmetic edits done just to enforce these<br />
guidelines are going to help this wiki on the long run. They piss<br />
newcomers off. And you are pissing off right the newcomers you would<br />
like to keep, the ones who are willing to contribute.<br />
<br />
You could start by reading the new material, and doing some edit to<br />
the content: if you collaborate the wiki way, probably your<br />
suggestions on style issues would be followed.<br />
<br />
Anyway, these are guidelines, or they pretend they are, and so you<br />
must always think about the fact the new paths can be taken.<br />
<br />
Instead what you have been doing, in the last year, is setting some<br />
rules here and reshaping the wiki in the name of those rules.<br />
<br />
===And what is right with these guidelines===<br />
So, when the "new" wiki started, there was quite a bit of discussion here and on the haskell list about what was an appropriate style for the wiki pages, considering that the site is meant to be a resource for new and exsisting Haskell users and as such, were desired to be reasonably consistent in style. These guidelines were the result. A variety of people edit pages to bring them into the stylistic guidelines, but if anyone is not happy with the current set, this seems like a good place to discuss changing them. <br />
<br />
Andrea's link to the "WikiGnome" has some very good pointers about stylistic changes being applied to the edits of ''newcomers'', and an even better one about dropping the argument. So this is my last edit about the subject :) --[[User:BrettGiles|BrettGiles]] 21:58, 25 July 2007 (UTC)<br />
:: Style consistency is not a primary goal of a wiki. A primary goal for the wiki is always having new people contribute to its grow. If you think that stylistic guidelines enforcing must be achieved regardless of the consequences than you should drop wiki technologies and use a blogging system or other kind of content management system. Instead, if you want to use a wiki, you should try to stick to the wiki philosophy as close as you can. --AndreaRossato Thu 26 2007 07:27 CEST <br />
::I'd like to point out that I perfectly understand that your effort has the aim of improving this wiki, and I don't think that style consistency is something to be avoided. What I'm trying to do, is to help finding a better way of achieving style consistency without increasing the level of entrance for new users. I hope you - Brett - will understand I have nothing against you. But I would like you to take into account the perspective of a newcomer too. I hope you are going to keep on discussing these issues with me. That would be helpful for the both of us. Thanks for your kind attention. --AndreaRossato Jul 26 2007 10:05 CEST. <br />
<br />
And that leads me to the second real big issue: page renaming.<br />
<br />
==Why Page Renaming Is BAD==<br />
<br />
Wiki page titles are [http://en.wikipedia.org/wiki/URI URI], and URI<br />
should be persistent over time.<br />
<br />
When you rename a page you are breaking URI persistence. A redirect is<br />
only a work around. A bad work around.<br />
<br />
You did a lot of page renaming lately. By doing so you broke a lot of<br />
URIs.<br />
<br />
You are also breaking the structure of this wiki, by creating a false<br />
hierarchy of pages. In a wiki all pages are top level. Hierarchy and<br />
wikis belong to different worlds.<br />
<br />
I do believe that this guidelines are used in a vary bad way. I would<br />
like you to stop. Or at least to seriously reconsider your way of<br />
enforcing them.<br />
<br />
AndreaRossato -- Wed Jul 25 2007 19:57 CEST<br />
<br />
==Amendments Proposal==<br />
I think we could use this episode to grow, as a community. I suggest some amendments to these guidelines:<br />
# enforcement: we could add something about guidelines enforcement, referring to the link I [http://c2.com/cgi/wiki?HowToReactToGoodStyleTransgressions posted before]: the user's Talk page should be used instead, and cosmetic edits should be ruled out as a bad practice, incompatible with the wiki philosopy;<br />
# page renaming: we should adopt a very strict policy for page renaming. URI persistence should be the primary goal, so page names can be changed only within a very short period of time after page creation, only for compelling reasons and only with the consent of other users, after discussing it in the talk page;<br />
# wiki structure: I'm not going to make any proposal but I would like you to think about the fact that wiki pages should be all top level, with no (false) hierarchy. This means that using slashes in page name should be considered bad practice.<br />
I hope I will find people willing to discuss these proposals.<br />
Thanks for your kind attention. --AndreaRossato Jul 26 07:59 CEST<br />
:Hi Andrea - I would suggest a few things regarding this:<br />
:*Let's move the proposal to the top of the page so that it is the first thing people see. We may even want to group the rest (or some of the rest) as old history.<br />
:*It might be a good idea to advertise this on the haskell mailing list and the haskell-cafe lists. Some people who are interested might not check the latest changes. Previous discussions seemed effective to me because there were multiple points of view.<br />
::--[[User:BrettGiles|BrettGiles]] 01:24, 27 July 2007 (UTC)<br />
:::So, a month has gone by with no changes / additions / updates to this page. I wonder if anyone supports or disagrees with the proposal by Andrea? --[[User:BrettGiles|BrettGiles]] 20:10, 24 August 2007 (UTC)<br />
<br />
== Why this article attached to Category: Applications ==<br />
<br />
It is strange. Mostly all internal wiki articles should group around several internal special categories. For people to be able to explore the internal Wiki articles, rules, guidelines, style guides.<br />
I found these Categories:<br />
# Community<br />
# FAQ<br />
# Tutorials<br />
# Style<br />
# Reference<br />
# special category for internal documentation.<br />
<br />
Each of them seems more suitable then Category: Applications.<br />
[[User:Anton-Latukha|Anton-Latukha]] ([[User talk:Anton-Latukha|talk]])</div>Anton-Latukhahttps://wiki.haskell.org/index.php?title=HaskellWiki_talk:Guidelines&diff=62885HaskellWiki talk:Guidelines2019-04-08T19:14:38Z<p>Anton-Latukha: /* Why this article attached to Category: Applications */ new section</p>
<hr />
<div>==Headlines==<br />
<br />
[[User:BrettGiles|Brett]], this is not to offend you. It just expresses my conviction. -- [[User:Wolfgang Jeltsch|Wolfgang Jeltsch]]<br />
<br />
:Not to worry [[User:Wolfgang Jeltsch|Wolfgang]] - I never take offence :) - My thoughts on the usage of level 2 only were primarily due to the size balance, but also I note this is a guideline for the original mediawiki site. Does anyone know why that is? [[User:BrettGiles|BrettGiles]] 22:11, 25 February 2006 (UTC)<br />
<br />
::No idea, but it seems loose to me. -- [[User:Wolfgang Jeltsch|Wolfgang Jeltsch]] 22:40, 25 February 2006 (UTC)<br />
<br />
:::I did some research on [http://meta.wikimedia.org/wiki/Help_talk:Section metawiki] and this is the reason they start at level 2. ''"So that when a user without CSS, or a text-mode browser, or a screen reader visits, they'll be presented with a page that at least has a logical document flow."'' From what I can tell, by default, the page title is rendered as a '''h1''' element. So are single '''=''' headings. Double '''==''' are '''h2''' and so on. So from that point of view it does make sense to start at two equals. I think the real issue is that the software is slightly broken in not generating an '''h2''' for a single equals sign. So, what do you think? [[User:BrettGiles|BrettGiles]] 22:36, 26 February 2006 (UTC)<br />
<br />
::::I prefer starting with ==, like Wikipedia. &mdash;[[User:Ashley Y|Ashley Y]] 00:52, 2 March 2006 (UTC)<br />
<br />
::::MediaWiki is broken in not creating an '''h2''' element for a headline with single equal signs. So we should work around this brokeness by making our top-level headlines <tt>==</tt> headlines. -- [[User:Wolfgang Jeltsch|Wolfgang Jeltsch]] 21:22, 8 March 2006 (UTC)<br />
<br />
:::::I've added a respective section to [[HaskellWiki:Guidelines]]. -- [[User:Wolfgang Jeltsch|Wolfgang Jeltsch]] 21:40, 8 March 2006 (UTC)<br />
<br />
==Page nameing and renaming==<br />
Apparently the last rename I did so offended the author that they quit the wiki. I used to follow the guideline I just added on the main page, but had never got a response before. However, considering the size of the wiki and the number of new users (especially those that seem to not know there are guidelines :), perhaps it is time to re-institute the practice. --[[User:BrettGiles|BrettGiles]] 18:04, 24 July 2007 (UTC)<br />
<br />
==Official guidelines?==<br />
<br />
What makes a guidline an "official" guidline? What's the process? &mdash;[[User:Ashley Y|Ashley Y]] 00:53, 2 March 2006 (UTC)<br />
<br />
:I suspect that a large number of the users / contributors to this site have an academic background. My experience with academia is that some kind of consensus based decision is the most successful at getting the decision adopted. In a community like this wiki, I’m not sure how to get that. We could send out notices on the various mailing lists, we could post notes on the front page and probably other ways as well. Like any consensus decision making it would take a long time. On the other hand, those that support the site could just be autocratic, set the rule and if there is too much flak, change it. <br />
:As to what actually makes an "official" guidline, I think it is up to the '''sysops''' whether we even have such a thing. The word tends to mean that '''sysops''' are required to monitor and fix the site when these are not followed. So, we might not even want to go to official, stay at unofficial and just let the community grow the guidlines (and do the monitoring) as they see fit. Sometimes, a certain level of anarchy works fine.[[User:BrettGiles|BrettGiles]] 16:06, 2 March 2006 (UTC)<br />
::I think that the traditional consensus decision making process is too slow. In addition, not everyone is interested in every guideline question, so it is not sensible to bother everyone with each of these questions. Therefore, it seems to me that it is better to follow your “autocracy approach”. And I also doubt that it is sensible to let some sysops decide about what guidelines are official. At least, this would be a bit contrary to the wiki approach, in my opinion. So I propose that users are allowed to set up guidelines they think of as sensible—possibly after some short discussion—and to declare them as official. If others don't like the guidelines, the guidelines can still be discussed (again) and changed. -- [[User:Wolfgang Jeltsch|Wolfgang Jeltsch]] 21:22, 8 March 2006 (UTC)<br />
:::So this is how all that started: with an abuse of power. Is this real? I mean, it really started this way? Do you understand that autocracy is against the very wiki philosophy and is an abuse against every other wiki users? Users are not allowed to set rules and start enforcing them '''against''' other users. This is going to create tension and, in the long run, to destroy a wiki. --AndreaRossato Thu 26 2007 07:37 CEST <br />
<br />
OK. Everything on the project page should have a consensus. Everything else should be discussed here. I have adjusted the project page accordingly. &mdash;[[User:Ashley Y|Ashley Y]] 06:34, 3 March 2006 (UTC)<br />
:What is the “project page”? -- [[User:Wolfgang Jeltsch|Wolfgang Jeltsch]] 21:22, 8 March 2006 (UTC)<br />
::Ah, I understand. It's [[HaskellWiki:Guidelines]]. -- [[User:Wolfgang Jeltsch|Wolfgang Jeltsch]] 21:40, 8 March 2006 (UTC)<br />
<br />
== Headlines ==<br />
<br />
Each page has a title which is automatically shown near the top of the page (currently centered and in red). So it doesn't make sense to put a single level 1 headline at the top of the page. (At the moment, you will see this on a few pages of this wiki. I suppose this is a legacy which comes from copying content from the old website to this wiki.)<br />
<br />
For top-level structuring, level 1 headlines (the ones denoted by single equals signs in the source) should be used. It is semantically incorrect to use level 2 headlines for this purpose. I note that level 1 headlines are currently rather large and in a font that might not be preferable. However, the solution to this is not to refrain from using level 1 headlines but to change the stylesheet appropriately. ''&mdash;says [[User:Wolfgang Jeltsch]]''<br />
<br />
:Note that is is currently being discussed. Level 1 (single equals) create an <tt>&lt;h1&gt;</tt> heading, which is the same level as the page title. We may (or may not:) switch to the standard of starting at level 2 (double equals) ''&mdash;says [[User:BrettGiles]]''<br />
<br />
:I prefer starting with level 2 as per Wikipedia. I consider the page title to be implicitly level 1 (whether it gets rendered as h1 or not). &mdash;[[User:Ashley Y|Ashley Y]] 06:34, 3 March 2006 (UTC)<br />
<br />
<br />
== Wiki Bug ==<br />
Try giving a Haskell example with the greater than or the less than sign in it. It clashes with the HTML rendering! -- says [[User:Uchchwhash|Pirated Dreams]] 04:22, 17 March 2006 (UTC)<br />
<br />
==What's Wrong With This Guidelines==<br />
I don't know how these guidelines became the official policy of this<br />
wiki, and I cannot find out how some of us decided to become the<br />
housekeepers, or the [http://c2.com/cgi/wiki?WikiGnome WikiGnome]s, of<br />
it.<br />
<br />
Still I've been involved in wiki development and I have some ideas on<br />
how a wiki should work. I tried to discuss it in the<br />
[http://www.haskell.org/pipermail/haskell-cafe/2007-July/029483.html haskell-café mailing list].<br />
<br />
Probably I should do that here too.<br />
<br />
[http://c2.com/cgi/wiki?HowToReactToGoodStyleTransgressions How to react to good style transgression?]<br />
Are these guidelines to be enforced? Aren't enforced guidelines a<br />
contradiction in terms? Sure they are...<br />
<br />
I don't think that cosmetic edits done just to enforce these<br />
guidelines are going to help this wiki on the long run. They piss<br />
newcomers off. And you are pissing off right the newcomers you would<br />
like to keep, the ones who are willing to contribute.<br />
<br />
You could start by reading the new material, and doing some edit to<br />
the content: if you collaborate the wiki way, probably your<br />
suggestions on style issues would be followed.<br />
<br />
Anyway, these are guidelines, or they pretend they are, and so you<br />
must always think about the fact the new paths can be taken.<br />
<br />
Instead what you have been doing, in the last year, is setting some<br />
rules here and reshaping the wiki in the name of those rules.<br />
<br />
===And what is right with these guidelines===<br />
So, when the "new" wiki started, there was quite a bit of discussion here and on the haskell list about what was an appropriate style for the wiki pages, considering that the site is meant to be a resource for new and exsisting Haskell users and as such, were desired to be reasonably consistent in style. These guidelines were the result. A variety of people edit pages to bring them into the stylistic guidelines, but if anyone is not happy with the current set, this seems like a good place to discuss changing them. <br />
<br />
Andrea's link to the "WikiGnome" has some very good pointers about stylistic changes being applied to the edits of ''newcomers'', and an even better one about dropping the argument. So this is my last edit about the subject :) --[[User:BrettGiles|BrettGiles]] 21:58, 25 July 2007 (UTC)<br />
:: Style consistency is not a primary goal of a wiki. A primary goal for the wiki is always having new people contribute to its grow. If you think that stylistic guidelines enforcing must be achieved regardless of the consequences than you should drop wiki technologies and use a blogging system or other kind of content management system. Instead, if you want to use a wiki, you should try to stick to the wiki philosophy as close as you can. --AndreaRossato Thu 26 2007 07:27 CEST <br />
::I'd like to point out that I perfectly understand that your effort has the aim of improving this wiki, and I don't think that style consistency is something to be avoided. What I'm trying to do, is to help finding a better way of achieving style consistency without increasing the level of entrance for new users. I hope you - Brett - will understand I have nothing against you. But I would like you to take into account the perspective of a newcomer too. I hope you are going to keep on discussing these issues with me. That would be helpful for the both of us. Thanks for your kind attention. --AndreaRossato Jul 26 2007 10:05 CEST. <br />
<br />
And that leads me to the second real big issue: page renaming.<br />
<br />
==Why Page Renaming Is BAD==<br />
<br />
Wiki page titles are [http://en.wikipedia.org/wiki/URI URI], and URI<br />
should be persistent over time.<br />
<br />
When you rename a page you are breaking URI persistence. A redirect is<br />
only a work around. A bad work around.<br />
<br />
You did a lot of page renaming lately. By doing so you broke a lot of<br />
URIs.<br />
<br />
You are also breaking the structure of this wiki, by creating a false<br />
hierarchy of pages. In a wiki all pages are top level. Hierarchy and<br />
wikis belong to different worlds.<br />
<br />
I do believe that this guidelines are used in a vary bad way. I would<br />
like you to stop. Or at least to seriously reconsider your way of<br />
enforcing them.<br />
<br />
AndreaRossato -- Wed Jul 25 2007 19:57 CEST<br />
<br />
==Amendments Proposal==<br />
I think we could use this episode to grow, as a community. I suggest some amendments to these guidelines:<br />
# enforcement: we could add something about guidelines enforcement, referring to the link I [http://c2.com/cgi/wiki?HowToReactToGoodStyleTransgressions posted before]: the user's Talk page should be used instead, and cosmetic edits should be ruled out as a bad practice, incompatible with the wiki philosopy;<br />
# page renaming: we should adopt a very strict policy for page renaming. URI persistence should be the primary goal, so page names can be changed only within a very short period of time after page creation, only for compelling reasons and only with the consent of other users, after discussing it in the talk page;<br />
# wiki structure: I'm not going to make any proposal but I would like you to think about the fact that wiki pages should be all top level, with no (false) hierarchy. This means that using slashes in page name should be considered bad practice.<br />
I hope I will find people willing to discuss these proposals.<br />
Thanks for your kind attention. --AndreaRossato Jul 26 07:59 CEST<br />
:Hi Andrea - I would suggest a few things regarding this:<br />
:*Let's move the proposal to the top of the page so that it is the first thing people see. We may even want to group the rest (or some of the rest) as old history.<br />
:*It might be a good idea to advertise this on the haskell mailing list and the haskell-cafe lists. Some people who are interested might not check the latest changes. Previous discussions seemed effective to me because there were multiple points of view.<br />
::--[[User:BrettGiles|BrettGiles]] 01:24, 27 July 2007 (UTC)<br />
:::So, a month has gone by with no changes / additions / updates to this page. I wonder if anyone supports or disagrees with the proposal by Andrea? --[[User:BrettGiles|BrettGiles]] 20:10, 24 August 2007 (UTC)<br />
<br />
== Why this article attached to Category: Applications ==<br />
<br />
It is strange. Mostly all internal wiki articles should group around several internal special categories. For people to be able to explore the internal Wiki articles, rules, guidelines, style guides.<br />
I found these Categories:<br />
# Community<br />
# FAQ<br />
# Tutorials<br />
# Style<br />
# Reference<br />
# create special category for internal documentation.<br />
<br />
Each of them seems more suitable then Category: Applications.<br />
[[User:Anton-Latukha|Anton-Latukha]] ([[User talk:Anton-Latukha|talk]])</div>Anton-Latukhahttps://wiki.haskell.org/index.php?title=Programming_guidelines&diff=62884Programming guidelines2019-04-08T18:47:13Z<p>Anton-Latukha: /* File Format */ fx -> Solaris</p>
<hr />
<div>Programming guidelines shall help to make the code of a project better<br />
readable and maintainable by the varying number of contributors.<br />
<br />
It takes some programming experience to develop something like a<br />
personal "coding style" and guidelines only serve as rough shape for<br />
code. Guidelines should be followed by all members working on the<br />
project even if they prefer (or are already used to) different<br />
guidelines.<br />
<br />
These guidelines have been originally set up for the [http://hets.eu hets-project] and are<br />
now put on the [http://haskell.org/haskellwiki/ HaskellWiki] gradually<br />
integrating parts of the old hawiki<br />
entries [http://haskell.org/haskellwiki/Things_to_avoid ThingsToAvoid] and<br />
HaskellStyle (hopefully not<br />
hurting someone's copyrights). The other related entry<br />
TipsAndTricks treats more<br />
specific points that are left out here,<br />
<br />
Surely some style choices are a bit arbitrary (or "religious") and<br />
too restrictive with respect to language extensions. Nevertheless I hope<br />
to keep up these guidelines (at least as a basis) for our project<br />
in order to avoid maintaining diverging guidelines. Of course I want<br />
to supply - partly tool-dependent - reasons for certain decisions and<br />
also show alternatives by possibly bad examples. At the time of<br />
writing I use ghc-6.4.1, haddock-0.7 and (GNU-) emacs with the latest<br />
[http://www.haskell.org/haskell-mode/ haskell mode].<br />
<br />
The following quote and links are taken from<br />
HaskellStyle:<br />
<br />
We all have our own ideas about good Haskell style. There's More Than<br />
One Way To Do It. But some ways are better than others.<br />
<br />
Some comments from the GHC team about their internal coding<br />
standards can be found at<br />
http://hackage.haskell.org/trac/ghc/wiki/WorkingConventions<br />
<br />
Also http://research.microsoft.com/~simonpj/papers/haskell-retrospective/<br />
contains some brief comments on syntax and style,<br />
<br />
What now follows are descriptions of program documentation, file<br />
format, naming conventions and good programming practice (adapted form<br />
Matt's C/C++ Programming Guidelines and the Linux kernel coding<br />
style).<br />
<br />
<br />
=== Documentation ===<br />
<br />
<br />
Comments are to be written in application terms (i.e. user's point of<br />
view). Don't use technical terms - that's what the code is for!<br />
<br />
Comments should be written using correct spelling and grammar in complete<br />
sentences with punctation (in English only).<br />
<br />
"Generally, you want your comments to tell WHAT your code does, not HOW.<br />
Also, try to avoid putting comments inside a function body: if the<br />
function is so complex that you need to separately comment parts of it,<br />
you should probably" (... decompose it)<br />
<br />
Put a [http://haskell-haddock.readthedocs.io/en/latest/markup.html haddock comment] on top of every exported function and data type!<br />
Make sure haddock accepts these comments.<br />
<br />
=== File Format ===<br />
<br />
<br />
All Haskell source files start with a haddock header of the form:<br />
<br />
<haskell><br />
{- |<br />
Module : <File name or $Header$ to be replaced automatically><br />
Description : <optional short text displayed on contents page><br />
Copyright : (c) <Authors or Affiliations><br />
License : <license><br />
<br />
Maintainer : <email><br />
Stability : unstable | experimental | provisional | stable | frozen<br />
Portability : portable | non-portable (<reason>)<br />
<br />
<module description starting at first column><br />
-}<br />
</haskell><br />
<br />
A possible compiler pragma (like {-# LANGUAGE CPP #-}) may precede<br />
this header. The following hierarchical module name must, of course,<br />
match the file name.<br />
<br />
Make sure that the description is changed to meet the module (if the<br />
header was copied from elsewhere). Insert your email address as maintainer.<br />
<br />
Try to write portable (Haskell98) code. If you use e.g. multi-parameter<br />
type classes and functional dependencies the code becomes<br />
"non-portable (MPTC with FD)".<br />
<br />
The \$Header\$ entry will be automatically expanded.<br />
<br />
Lines should not be longer than 80 (preferably 75)<br />
characters to avoid wrapped lines (for casual readers)!<br />
<br />
Don't leave trailing white space in your code in every line.<br />
<br />
Expand all your tabs to spaces to avoid the danger of wrongly expanding<br />
them (or a different display of tabs versus eight spaces). Possibly put<br />
something like the following in your ~/.emacs file.<br />
<br />
(custom-set-variables '(indent-tabs-mode nil))<br />
<br />
The last character in your file should be a newline! Under Solaris<br />
you'll get a warning if this is not the case and sometimes last lines<br />
without newlines are ignored (i.e. "#endif" without newline). Emacs<br />
usually asks for a final newline.<br />
<br />
You may use http://hackage.haskell.org/package/scan to check your file format.<br />
<br />
The whole module should not be too long (about 400 lines)<br />
<br />
Please have a look at the [http://haskell-haddock.readthedocs.io/en/latest/markup.html#the-module-description Haddock module header documentation].<br />
<br />
=== Naming Conventions ===<br />
<br />
<br />
In Haskell types start with capital and functions with lowercase<br />
letters, so only avoid infix identifiers! Defining symbolic infix<br />
identifiers should be left to library writers only.<br />
<br />
(The infix identifier "\\" at the end of a line causes cpp preprocessor<br />
problems.)<br />
<br />
Names (especially global ones) should be descriptive and if you need<br />
long names write them as mixed case words (aka camelCase). (but "tmp"<br />
is to be preferred over "thisVariableIsATemporaryCounter")<br />
<br />
Also in the standard libraries, function names with multiple words are<br />
written using the camelCase convention. Similarly, type, typeclass and<br />
constructor names are written using the StudlyCaps convention.<br />
<br />
Some parts of our code use underlines (without unnecessary uppercase<br />
letters) for long identifiers to better reflect names given with<br />
hyphens in the requirement documentation. Also such names should be<br />
transliterated to camlCase identifiers possibly adding a (consistent)<br />
suffix or prefix to avoid conflicts with keywords. However, instead of<br />
a recurring prefix or suffix you may consider to use qualified imports<br />
and names.<br />
<br />
<br />
=== Good Programming Practice ===<br />
<br />
<br />
"Functions should be short and sweet, and do just one thing. They should<br />
fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24,<br />
as we all know), and do one thing and do that well."<br />
<br />
Most haskell functions should be at most a few lines, only case<br />
expression over large data types (that should be avoided, too) may need<br />
corresponding space.<br />
<br />
The code should be succinct (though not obfuscated), readable and easy to<br />
maintain (after unforeseeable changes). Don't exploit exotic language<br />
features without good reason.<br />
<br />
It's not fixed how deep you indent (4 or 8 chars). You can break the<br />
line after "do", "let", "where", and "case .. of". Make sure that<br />
renamings don't destroy your layout. (If you get too far to the right,<br />
the code is unreadable anyway and needs to be decomposed.)<br />
<br />
Bad:<br />
<haskell><br />
case foo of Foo -> "Foo"<br />
Bar -> "Bar"<br />
</haskell><br />
<br />
Good:<br />
<haskell><br />
case <longer expression> of<br />
Foo -> "Foo"<br />
Bar -> "Bar"<br />
</haskell><br />
<br />
Avoid the notation with braces and semicolons since the layout rule<br />
forces you to properly align your alternatives.<br />
<br />
Respect compiler warnings. Supply type signatures, avoid shadowing and<br />
unused variables. Particularly avoid non-exhaustive and<br />
overlapping patterns. Missing unreachable cases can be filled in using<br />
"error" with a fixed string "<ModuleName>.<function>" to indicate the<br />
error position (in case the impossible should happen). Don't invest<br />
time to "show" the offending value, only do this temporarily when<br />
debugging the code.<br />
<br />
Don't leave unused or commented-out code in your files! Readers don't<br />
know what to think of it.<br />
<br />
<br />
==== Partial functions ====<br />
<br />
For partial functions do document their preconditions (if not obvious)<br />
and make sure that partial functions are only called when<br />
preconditions are obviously fulfilled (i.e. by a case statement or a<br />
previous test). Particularly the call of "head" should be used with<br />
care or (even better) be made obsolete by a case statement.<br />
<br />
Usually a case statement (and the import of isJust and fromJust from<br />
Data.[[Maybe]]) can be avoided by using the "maybe" function:<br />
<br />
<haskell><br />
maybe (error "<ModuleName>.<function>") id $ Map.lookup key map<br />
</haskell><br />
<br />
Generally we require you to be more explicit about failure<br />
cases. Surely a missing (or an irrefutable) pattern would precisely<br />
report the position of a runtime error, but these are not so obvious<br />
when reading the code.<br />
<br />
==== Let or where expressions ====<br />
<br />
Do avoid mixing and nesting "let" and "where". (I prefer the<br />
expression-stylistic "let".) Use auxiliary top-level functions that<br />
you do not export. Export lists also support the detection of unused<br />
functions.<br />
<br />
<br />
==== Code reuse ====<br />
<br />
If you notice that you're doing the same task again, try to generalize<br />
it in order to avoid duplicate code. It is frustrating to change the<br />
same error in several places.<br />
<br />
<br />
==== Application notation ====<br />
<br />
Many parentheses can be eliminated using the infix application operator <code>$</code><br />
with lowest priority. Try at least to avoid unnecessary parentheses in<br />
standard infix expression.<br />
<br />
<haskell><br />
f x : g x ++ h x<br />
<br />
a == 1 && b == 1 || a == 0 && b == 0<br />
</haskell><br />
<br />
Rather than putting a large final argument in parentheses (with a<br />
distant closing one) consider using <code>$</code> instead.<br />
<br />
<code>f (g x)</code> becomes <code>f $ g x</code> and consecutive applications<br />
<code>f (g (h x))</code> can be written as <code>f $ g $ h x</code> or <code>f . g $ h x</code>.<br />
<br />
A function definition like<br />
<code>f x = g $ h x</code> can be abbreviated to <code>f = g . h</code>.<br />
<br />
Note that the final argument may even be an infix- or case expression:<br />
<br />
<haskell><br />
map id $ c : l<br />
<br />
filter (const True) . map id $ case l of ...<br />
</haskell><br />
<br />
However, be aware that $-terms cannot be composed further in infix<br />
expressions.<br />
<br />
Probably wrong:<br />
<haskell><br />
f $ x ++ g $ x<br />
</haskell><br />
<br />
But the scope of an expression is also limited by the layout rule, so<br />
it is usually safe to use "$" on right hand sides.<br />
<br />
Ok:<br />
<br />
<haskell><br />
do f $ l<br />
++<br />
do g $ l<br />
</haskell><br />
<br />
Of course <code>$</code> can not be used in types. GHC has also some primitive<br />
functions involving the kind <code>#</code> that cannot be applied using <code>$</code>.<br />
<br />
Last warning: always leave spaces around <code>$</code> (and other mixfix<br />
operators) since a clash with template haskell is possible.<br />
<br />
(Also write <code>\ t</code> instead of <code>\t</code> in lambda expressions)<br />
<br />
==== List Comprehensions ====<br />
<br />
Use these only when "short and sweet". Prefer map, filter, and foldr!<br />
<br />
Instead of:<br />
<br />
<haskell><br />
[toUpper c | c <- s]<br />
</haskell><br />
<br />
write:<br />
<br />
<haskell><br />
map toUpper s<br />
</haskell><br />
<br />
Consider:<br />
<br />
<haskell><br />
[toUpper c | s <- strings, c <- s]<br />
</haskell><br />
<br />
Here it takes some time for the reader to find out which value depends<br />
on what other value and it is not so clear how many times the interim<br />
values s and c are used. In contrast to that the following can't be clearer:<br />
<br />
<haskell><br />
map toUpper (concat strings)<br />
</haskell><br />
<br />
<br />
When using higher order functions you can switch easier to data<br />
structures different from list. Compare:<br />
<br />
<haskell><br />
map (1+) list<br />
</haskell><br />
<br />
and:<br />
<br />
<haskell><br />
Set.map (1+) set<br />
</haskell><br />
<br />
<br />
==== Types ====<br />
<br />
Prefer proper data types over type synonyms or tuples even if you have<br />
to do more constructing and unpacking. This will make it easier to<br />
supply class instances later on. Don't put class constraints on<br />
a data type, constraints belong only to the functions that manipulate<br />
the data.<br />
<br />
Using type synonyms consistently is difficult over a longer time,<br />
because this is not checked by the compiler. (The types shown by<br />
the compiler may be unpredictable: i.e. FilePath, String or [Char])<br />
<br />
Take care if your data type has many variants (unless it is an<br />
enumeration type.) Don't repeat common parts in every variant since<br />
this will cause code duplication.<br />
<br />
Bad (to handle arguments in sync):<br />
<br />
<haskell><br />
data Mode f p = Box f p | Diamond f p<br />
</haskell><br />
<br />
Good (to handle arguments only once):<br />
<br />
<haskell><br />
data BoxOrDiamond = Box | Diamond<br />
<br />
data Mode f p = Mode BoxOrDiamond f p<br />
</haskell><br />
<br />
<br />
Consider (bad):<br />
<br />
<haskell><br />
data Tuple a b = Tuple a b | Undefined<br />
</haskell><br />
<br />
versus (better):<br />
<br />
<haskell><br />
data Tuple a b = Tuple a b<br />
</haskell><br />
<br />
and using:<br />
<br />
<haskell><br />
Maybe (Tuple a b)<br />
</haskell><br />
<br />
(or another monad) whenever an undefined result needs to be propagated<br />
<br />
<br />
==== Records ====<br />
<br />
For (large) records avoid the use of the constructor directly and<br />
remember that the order and number of fields may change.<br />
<br />
Take care with (the rare case of) depend polymorphic fields:<br />
<br />
<haskell><br />
data Fields a = VariantWithTwo<br />
{ field1 :: a<br />
, field2 :: a }<br />
</haskell><br />
<br />
The type of a value v can not be changed by only setting field1:<br />
<br />
<haskell><br />
v { field1 = f }<br />
</haskell><br />
<br />
Better construct a new value:<br />
<br />
<haskell><br />
VariantWithTwo { field1 = f } -- leaving field2 undefined<br />
</haskell><br />
<br />
Or use a polymorphic element that is instantiated by updating:<br />
<br />
<haskell><br />
empty = VariantWithTwo { field1 = [], field2 = [] }<br />
<br />
empty { field1 = [f] }<br />
</haskell><br />
<br />
Several variants with identical fields may avoid some code duplication<br />
when selecting and updating, though possibly not in a few<br />
depended polymorphic cases.<br />
<br />
However, I doubt if the following is a really good alternative to the<br />
above data Mode with data BoxOrDiamond.<br />
<br />
<haskell><br />
data Mode f p =<br />
Box { formula :: f, positions :: p }<br />
| Diamond { formula :: f, positions :: p }<br />
</haskell><br />
<br />
<br />
==== IO ====<br />
<br />
Try to strictly separate IO, Monad and pure (without do) function<br />
programming (possibly via separate modules).<br />
<br />
Bad:<br />
<br />
<haskell><br />
x <- return y<br />
...<br />
</haskell><br />
<br />
Good:<br />
<br />
<haskell><br />
let x = y<br />
...<br />
</haskell><br />
<br />
Don't use <code>Prelude.interact</code> and make sure your program does not depend<br />
on the (not always obvious) order of evaluation. E.g. don't read and<br />
write to the same file:<br />
<br />
This will fail:<br />
<br />
<haskell><br />
do s <- readFile f<br />
writeFile f $ 'a' : s<br />
</haskell><br />
<br />
because of lazy IO! (Writing is starting before reading is finished.)<br />
<br />
==== Trace ====<br />
<br />
Tracing is for debugging purposes only and should not be used as<br />
feedback for the user. Clean code is not cluttered by trace calls.<br />
<br />
<br />
==== Imports ====<br />
<br />
Standard library modules like Char. List, Maybe, Monad, etc should be<br />
imported by their hierarchical module name, i.e. the base package (so<br />
that haddock finds them):<br />
<br />
<haskell><br />
import Data.List<br />
import Control.Monad<br />
import System.Environment<br />
</haskell><br />
<br />
The libraries for Set and Map are to be imported qualified:<br />
<br />
<haskell><br />
import qualified Data.Set as Set<br />
import qualified Data.Map as Map<br />
</haskell><br />
<br />
<br />
==== Glasgow extensions and Classes ====<br />
<br />
[[Use of language extensions|Stay away from extensions]] as long as possible. Also use classes with<br />
care because soon the desire for overlapping instances (like for lists<br />
and strings) may arise. Then you may want MPTC (multi-parameter type<br />
classes), functional dependencies (FD), undecidable and possibly incoherent<br />
instances and then you are "in the wild" (according to SPJ).<br />
<br />
=== Style in other languages ===<br />
<br />
* [http://www.cs.caltech.edu/~cs20/a/style.html OCaml style]<br />
<br />
=== Final remarks ===<br />
<br />
Despite guidelines, writing "correct code" (without formal proof<br />
support yet) still remains the major challenge. As motivation to<br />
follow these guidelines consider the points that are from the "C++<br />
Coding Standard", where I replaced "C++" with "Haskell".<br />
<br />
Good Points:<br />
<br />
* programmers can go into any code and figure out what's going on<br />
<br />
* new people can get up to speed quickly<br />
<br />
* people new to Haskell are spared the need to develop a personal style and defend it to the death<br />
<br />
* people new to Haskell are spared making the same mistakes over and over again<br />
<br />
* people make fewer mistakes in consistent environments<br />
<br />
* programmers have a common enemy :-)<br />
<br />
Bad Points:<br />
<br />
* the standard is usually stupid because it was made by someone who doesn't understand Haskell<br />
<br />
* the standard is usually stupid because it's not what I do<br />
<br />
* standards reduce creativity<br />
<br />
* standards are unnecessary as long as people are consistent<br />
<br />
* standards enforce too much structure<br />
<br />
* people ignore standards anyway<br />
<br />
== Another guidelines ==<br />
<br />
[https://www.cse.unsw.edu.au/~cs3161/15s2/StyleGuide.html COMP3161]<br />
<br />
[http://docs.ganeti.org/ganeti/2.13/html/dev-codestyle.html#haskell Google's ganeti]<br />
<br />
[https://kowainik.github.io/posts/2019-02-06-style-guide kowainik]<br />
<br />
[https://github.com/Soostone/style-guides/blob/master/haskell-style-guide.md Soostone]<br />
<br />
[https://github.com/tibbe/haskell-style-guide/blob/master/haskell-style.md tibbe]<br />
<br />
[https://github.com/tweag/guides/blob/master/style/Haskell.md tweag]<br />
<br />
[[Category:Style]]</div>Anton-Latukhahttps://wiki.haskell.org/index.php?title=GHC/Stand-alone_deriving_declarations&diff=62812GHC/Stand-alone deriving declarations2019-03-05T20:37:00Z<p>Anton-Latukha: fx option name 'XUndecidableInstances'</p>
<hr />
<div>[[Category:GHC|Stand-alone deriving declarations]]<br />
<br />
{{GHCUsersGuide|glasgow_exts|stand-alone-deriving-declarations| a StandaloneDeriving section}}<br />
<br />
''This page is from an early point in the life of the stand-alone deriving mechanism. Please see the linked documentation for an up-to-date account of the present situation.''<br />
<br />
== Standalone deriving ==<br />
<br />
This page mentions points that may not be immediately obvious from the manual.<br />
<br />
== Deriving data types with non-standard contexts ==<br />
<br />
In Haskell 98, and GHC, you can't say this<br />
<haskell><br />
data T m = MkT (m Int) deriving Eq<br />
</haskell><br />
because the instance declaration would have a non-standard context. It would have to look like this:<br />
<haskell><br />
instance Eq (m Int) => Eq (T m) where ...<br />
</haskell><br />
Of course, you can write the instance manually, but then you have to write<br />
all that tiresome code for equality. Standalone deriving lets you supply the context yourself, but have GHC write the code:<br />
<haskell><br />
data T m = MkT (m Int)<br />
deriving instance Eq (m Int) => Eq (T m)<br />
</haskell><br />
Of course, you'll need to add the flags <hask>-XFlexibleContexts</hask> and <hask>-XUndecidableInstances</hask> to allow this instance declaration, but that's fair enough.<br />
<br />
The same applies to data type declarations involving type functions.<br />
<br />
== Variations (not implemented) ==<br />
<br />
This section collects some un-implemented ideas.<br />
<br />
=== Interaction with "newtype-deriving" ===<br />
<br />
GHC's "newtype deriving mechanism" (see [http://www.haskell.org/ghc/dist/current/docs/users_guide/deriving.html#newtype-deriving]) should obviously work in a standalone deriving setting too. But perhaps it can be generalised a little. Currently you can only say<br />
<haskell><br />
deriving instance C a Foo<br />
</haskell><br />
(where Foo is the newtype), and get an instance for <hask>(C a Foo)</hask>. But what if you want an instance for <hask>C Foo a</hask>, where the new type is not the last parameter? You can't do that at the moment. However, even with the new instance-like syntax, it's not clear to me how to signal the type to be derived. Consider<br />
<haskell><br />
newtype Foo = F Int<br />
newtype Bar = B Bool<br />
deriving instance C Foo Bar<br />
</haskell><br />
Which of these thee instances do we want?<br />
<haskell><br />
instance C Foo Bool => C Foo Bar<br />
instance C Int Bar => C Foo Bar<br />
instance C Int Bool => C Foo Bar<br />
</haskell><br />
The obvious way to signal this is to give the instance context (just as above). This is perhaps another reason for having an explicit instance context in a standalone deriving declaration.<br />
<br />
Incidentally, notice that the third of the alternatives in the previous bullet unwraps two newtypes simultaneously. John Meacham suggested this example:<br />
<haskell><br />
class SetLike m k where <br />
instance SetLike IntSet Int where<br />
<br />
newtype Id = Id Int<br />
newtype IdSet = IdSet IntSet<br />
deriving instance SetLike IntSet Int => SetLike IdSet Id<br />
</haskell><br />
<br />
=== Duplicate instances === <br />
<br />
Suppose two modules, M1 and M2 both contain an identical standalone deriving declaration<br />
<haskell><br />
deriving Show T<br />
</haskell><br />
Then, can you import M1 and M2 into another module X and use show on values of type T, or will you get an overlapping instance error? Since both instances are derived in the very same way, their code must be identical, so arguably we can choose either. (There is some duplicated code of course.)<br />
<br />
This situation is expected to be common, as the main use of the standalone feature is to obtain derived instances that were omitted when the data type was defined.<br />
<br />
But, that means whether or not an instance was derived is now part of the module's. Programs would be able to use this (mis)feature to perform a compile-time check and execute code differently depending on whether any given instance is derived or hand-coded:<br />
<haskell><br />
module MA(A) where<br />
data A = A deriving Show<br />
<br />
module MB(B) where<br />
data B = B deriving Show<br />
<br />
module MC where<br />
import MA<br />
import MB<br />
<br />
-- verify that the A and B Show instances were derived<br />
-- (they need to be derived to ensure the output can<br />
-- be parsed in our non-Haskell code).<br />
deriving instance Show A <br />
deriving instance Show B<br />
</haskell><br />
The writer of MC already knows that MA and MB defined instances of Show for A and B. He just wants to ensure that nobody changes either module to use a non-derived instance; if someone does try to use a non-derived instance:<br />
<haskell><br />
module MA(A) where<br />
data A = A<br />
instance Show A where<br />
show _ = "a"<br />
</haskell><br />
then they will get an overlapping instance error in MC. <br />
<br />
The result is that programs would be able to require, for any Class, not just that an instance of the class was defined for a type, but that a /derived/ instance was defined. Is this good?</div>Anton-Latukha