HaskellWiki - User contributions [en]

Talk:Structure of a Haskell project

2008-08-06T20:02:30Z

MathematicalOrchid:

Do we ''really'' want <code>runtests.sh</code>? Shell scripts are only likely to work on Unix-like operating systems. Something more portable would be better.

[[User:MathematicalOrchid|MathematicalOrchid]] 20:02, 6 August 2008 (UTC)

Cabal/FAQ

2008-05-25T20:29:21Z

MathematicalOrchid:

[[Category:FAQ]]
== What is this hidden package? ==

You build a package and get a message like:

<pre>
Could not find module `Data.Map': it is a member of package
containers-0.1.0.0, which is hidden.
</pre>

This is because the package has not been updated for ghc-6.8, which split the base package into lots of smaller packages. The package needs to be updated to say that it depends on these new split-out packages, such as containers, process and several others.

If you just want to get the package to build, add the missing package names to the build-depends: line in the .cabal file. For example given the above error message we would add the 'containers' package to the build-depends.

Developers of packages who want to know how to update their package properly so that it will continue to work with old and new compilers should see [[Upgrading_packages]].

== How do I handle extensions? ==
If your code uses some of the advanced Haskell extensions, you have a number of options.
# If you're distributing via Cabal, you can simply add <code>ghc-options: -fglasgow-exts</code> to your .cabal file.
# You can use the <code>OPTIONS_GHC</code> pragma to supply <code>-fglasgow-exts</code> on a per-file basis (as opposed to the .cabal file, where it applies to every file), like this: <haskell>{-# OPTIONS_GHC -fglasgow-exts #-}</haskell>
# The best way, if you know your users are on GHC 6.8.x, is to name the individual extensions instead. Each extension has a name, and you list only those you are actually using (you can find the list in [http://haskell.org/ghc/docs/latest/html/libraries/Cabal/Distribution-Extension.html Distribution.Extension]). As before, you can enable them for the whole package in the .cabal file, like <code>extensions: CPP, ForeignFunctionInterface</code>.
# Of course, even better is specifying them only in the files that use them, with the new LANGUAGE pragma. A pragma might look like: <haskell>{-# LANGUAGE CPP, ForeignFunctionInterface #-}</haskell> (You'll probably still want to use the extensions field just to make clear at the top level what extensions the project uses.)

== [Windows] I tried to install a Haskell binding to (some external C library), but I get build errors ==

Packages consisting of 100% Haskell code almost always build perfectly on Windows. However, packages which implement bindings to external C libraries (e.g., OpenSSH, libSDL, etc.) sometimes won't build on Windows without prodding.

# Check that the external C library is actually installed on your system. (Cabal does ''not'' do this for you.)
# Check the package contents, package home page, etc., to see if the author has ''told'' you how to get this package to work on Windows.

If those two fail to get you any further, proceed as follows:

* Cabal probably needs to be able to find header files in order to compile the package. In future there will be some switches for the 'configure' step to allow you to specify the path to these. For now, you'll have to manually ''hack'' the Cabal information file to tell Cabal where to look. Try adding a line in the 'library' section saying something like <code>include-dirs: "C:\\Program Files\\My External Library\\include"</code> (Note carefully the quotes and double backslashes!) Obviously the actual path varies depending on where you installed the thing.
* Cabal may also need to find object files that need to be statically linked. Again, a future Cabal release will allow you to specify these during the configure step with switches, but for now try adding <code>extra-lib-dirs: "C:\\Program Files\\My External Library\\lib"</code> or similar.
* Assuming you get your library to compile, you may still need to add DLLs or other resources to your PATH variable to get any programs ''using'' the package to actually run. (But the installer for the external library might have done this for you already.)

Talk:GHC/Type families

2007-11-11T12:55:25Z

MathematicalOrchid:

How does this change now that GHC 6.8.1 is out? [[User:MathematicalOrchid|MathematicalOrchid]] 12:55, 11 November 2007 (UTC)

In 4.3.2 Examples

module GMap (GMapKey(..), GMap(..)) where...: As before, but also exports all the data constructors GMapInt, GMapChar, GMapUnit, GMapPair, and GMapUnit.

should probably be:

module GMap (GMapKey(..), GMap(..)) where...: As before, but also exports all the data constructors GMapInt, GMapChar, GMapUnit, GMapPair, and GMapEither.

Paragraph 6 is a copy of Paragraph 4 except Collects is used.

GHC optimisations

2007-08-13T18:47:09Z

MathematicalOrchid: More details in dead code elimination.

[[Category:GHC]]

== Introduction ==

This page collects together information about the optimisations that GHC does and does ''not'' perform.

* GHC experts: Please check that the info in this page is correct.
* Everybody else: Feel free to add questions!

== General optimisations ==

=== Dead code elimination ===

Does GHC remove code that you're not actually using?

Yes and no. If there is something in a module that isn't exported and isn't used by anything that ''is'' exported, it gets ignored. (This makes your compiled program smaller.) So at the module level, yes, GHC does dead code elimination.

On the other hand, if you import a module and use just 1 function from it, ''all'' of the code for ''all'' of the functions in that module gets linked in. So in this sense, no, GHC doesn't do dead code elimination.

(There is a switch, <code>-split-objs</code>, to make GHC spit out a separate object file for each individual function in a module. If you use this, only the functions that are actually used will get linked into your executable. But this tends to freak out the linker program...)

If you want to be warned about unused code (Why do you have it there if it's unused? Did you forget to type something?) you can use the <code>-fwarn-unused-binds</code> option (or just <code>-Wall</code>).

=== Common subexpression elimination ===

First of all, ''common subexpression elimination'' (CSE) means that if an expression appears in several places, the code is rearranged so that the value of that expression is computed only once. For example:

<haskell>
foo x = (bar x) * (bar x)
</haskell>

might be transformed into

<haskell>
foo x = let x' = bar x in x' * x'
</haskell>

thus, the <hask>bar</hask> function is only called once. (And if <hask>bar</hask> is a particularly expensive function, this might save quite a lot of work.)

GHC doesn't actually perform CSE as often as you might expect. The trouble is, performing CSE can affect the strictness/laziness of the program. So GHC ''does'' do CSE, but only in specific circumstances --- see the GHC manual. (Section??)

Long story short: "If you care about CSE, do it by hand."

=== Inlining ===

Inlining is where a function call is replaced by that function's definition. For example, the standard <hask>map</hask> function can be defined as

<haskell>
map :: (a -> b) -> [a] -> [b]
map f [] = []
map f (x:xs) = f x : map f xs
</haskell>

Now if you write something like

<haskell>
foo = map bar
</haskell>

it's possible that the compiler might ''inline'' the definition of <hask>map</hask>, yielding something like

<haskell>
foo [] = []
foo (x:xs) = bar x : foo xs
</haskell>

which is (hopefully!) faster, because it doesn't involve a call to the <hask>map</hask> function any more; it just does the work directly. (This might also expose new optimisation opportunities; <hask>map</hask> works for ''any'' types, whereas <hask>foo</hask> probably works for only ''one'' type.)

So, that's what inlining is. By default, GHC will inline things if they are 'small enough'. Every time you inline a function, you are in a sense making a (customised) ''copy'' of that function. Do too much of this and the compiled program will be enormous. So it's only worth it for 'small' functions.

(How does GHC determine 'small'? Isn't there a switch that adjusts this?)
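
If you would rather not rely on GHC's own judgement, the <code>INLINE</code> and <code>NOINLINE</code> pragmas let you ask for (or forbid) inlining of a particular function. The following is only a sketch: <hask>square</hask> and <hask>bigTable</hask> are made-up names for illustration, and whether the pragmas actually help depends on the program.

<haskell>
module Main where

-- Hypothetical helper: small enough that GHC would probably inline it
-- anyway, but the pragma makes the request explicit.
square :: Int -> Int
square x = x * x
{-# INLINE square #-}

-- Hypothetical function that we would rather keep as one shared copy.
bigTable :: Int -> Int
bigTable n = sum [square k | k <- [1 .. n]]
{-# NOINLINE bigTable #-}

main :: IO ()
main = print (bigTable 3)
</haskell>

The pragmas only affect how the code is compiled, not what it computes.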

=== Specialisation ===

Flexibility is the enemy of performance. Take <hask>(+)</hask> for example. As you know, it adds two numbers together. However, would that be two integers? Two floating-point numbers? Two complex numbers? Two vectors? The generated machine code is very, very different in each case!

It's easy enough to make a function such as <hask>sum</hask>, which will work for ''any'' type of number. However, in the interests of performance, if it can be determined exactly which type of number we're going to be working on, the compiler can generate exactly the right machine code, without having to do lots of runtime lookups.

GHC tries to do this where possible. However (as I understand it?) this tends to work less well across module boundaries. For example, suppose you write

<haskell>
module Physics where

data Force = ...

instance Num Force where ...

resultant_force :: [Force] -> Force
resultant_force = sum
</haskell>

One might ''hope'' that <hask>resultant_force</hask> would get compiled using a special version of <hask>sum</hask> tailored to adding up only <hask>Force</hask> objects. This may or may not happen.

Generally GHC won't just take an existing function and recompile it with a new type signature. What ''might'' happen is that the function gets inlined, and specialised from there. (Can someone say something more concrete here?)
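
One thing you ''can'' do by hand is ask for a specialised copy with the <code>SPECIALIZE</code> pragma. The sketch below uses a made-up overloaded function, <hask>sumSquares</hask>, rather than the <hask>Physics</hask> module above; the pragma is normally placed in the module that defines the function.

<haskell>
module Main where

-- Hypothetical overloaded function: it works for any Num type, so by
-- default it is handed the Num operations for that type at run time.
sumSquares :: Num a => [a] -> a
sumSquares = foldr (\x acc -> x * x + acc) 0

-- Ask GHC to also compile a copy with the type fixed at Int.
{-# SPECIALIZE sumSquares :: [Int] -> Int #-}

main :: IO ()
main = print (sumSquares [1, 2, 3 :: Int])
</haskell>

Calls at type <hask>[Int] -> Int</hask> can then use the specialised copy, while every other type still goes through the general version.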

=== Strictness analysis ===

Haskell is a lazy language. Calculations are notionally not performed until their results are 'needed'. However, if the result definitely ''will'' be needed, it's a waste of time and effort to save up the expression and execute it later; more efficient to just execute it right now.

''Strictness analysis'' is a process by which GHC attempts to determine, at compile-time, which data definitely will 'always be needed'. GHC can then build code to just calculate such data, rather than the normal (higher overhead) process for storing up the calculation and executing it later.

Unfortunately, looking at a program and saying "will this data be needed?" is a bit like looking at a program and saying "this program will never halt" --- see The Halting Problem. (Good link?) But GHC does its best, and can give big speedups in some cases.
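
Where the analysis can't work it out (or you are compiling without optimisation), you can state the strictness yourself. The sketch below contrasts a lazy accumulator with one forced by <hask>seq</hask>; both function names are made up for illustration. With <code>-O</code>, GHC may well make the first version strict on its own.

<haskell>
module Main where

-- Lazy accumulator: without optimisation, acc can build up a long chain
-- of unevaluated (+) thunks before anything forces it.
sumLazy :: [Int] -> Int
sumLazy = go 0
  where
    go acc []     = acc
    go acc (x:xs) = go (acc + x) xs

-- Same loop, but seq forces the accumulator at every step, so no chain
-- of thunks is built.
sumStrict :: [Int] -> Int
sumStrict = go 0
  where
    go acc []     = acc
    go acc (x:xs) = let acc' = acc + x in acc' `seq` go acc' xs

main :: IO ()
main = do
  print (sumLazy   [1 .. 100000])
  print (sumStrict [1 .. 100000])
</haskell>

Both functions return the same result; the difference is only in how much work is stored up along the way.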

=== Fusion ===

In Haskell, it is common to write expressions such as

<haskell>
foo = length . filter p . map g . map f . concat
</haskell>

This style of writing makes it very clear what the function ''does'' (it takes a list of lists, concatenates them all, applies f to every element, applies g to every element, throws away all elements that fail p, and then calculates the length of the result). However, if executed literally, it's very inefficient.

When executed, <hask>concat</hask> takes a list of lists and constructs a flat list. Then <hask>map</hask> constructs another list. Then the second <hask>map</hask> function creates yet another list...

Since Haskell is a lazy language, these intermediate lists never exist in memory in their entirety. One element will be generated by one function, and then immediately consumed by the next function in the chain. So as each element is generated, it instantly becomes garbage. So the memory usage isn't that great, but the GC load is quite high. (Not to mention all the time wasted on creating thunks, evaluating thunks, and allocating/deallocating RAM.) So we really want to avoid all this!

The term ''fusion'' refers to program transformations aimed at removing intermediate data structures. (''Deforestation'' refers specifically to lists, but in general fusion is applicable to operations on any structure.)

The standard libraries provide a function <hask>concatMap</hask> such that

<haskell>
concatMap f = concat . map f
</haskell>

As you can see, we don't 'need' this function --- we can define it in terms of other, simpler functions. However, it's more efficient to run because it doesn't generate an intermediate list of lists. (It's also used to define the list monad, which is probably why it's there.)

Having <hask>concatMap</hask> is nice. But we really don't want to define new functions for every possible combination of list operators. (Do ''you'' fancy implementing a <hask>lengthFilterMapMapConcat</hask> function?) So one of the optimisations that GHC performs is to attempt to perform fusion automatically.

One way that we could try to do this is by inlining all the function definitions. But list processing functions are generally recursive, which makes matters rather complicated. (I.e., this doesn't really work.)

Currently (GHC 6.6.1) we have build/foldr fusion. That is, where a function ''builds'' a list and passes the result to a function that ''consumes'' a list, GHC can (usually) elide the list itself. There are also other transformations that can be applied. For example, map fusion. Map fusion simply says that

<haskell>
map g . map f
</haskell>

is equivalent to

<haskell>
map (g . f)
</haskell>

(but the latter is more efficient).

All of this is implemented using GHC's ''transformation rules'' facility. See the manual. (Section??) This functionality is only turned on with -O or -O2.
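
To give a flavour of what such a rule looks like, here is a sketch of map fusion written with the <code>RULES</code> pragma. It uses a made-up stand-in, <hask>myMap</hask>, so that it doesn't clash with the rules already attached to the real <hask>map</hask>; it is not the rule set the standard library itself uses.

<haskell>
module Main where

-- Hypothetical stand-in for map.
myMap :: (a -> b) -> [a] -> [b]
myMap _ []     = []
myMap f (x:xs) = f x : myMap f xs
{-# NOINLINE myMap #-}  -- keep calls visible so the rule has a chance to match

-- Two traversals become one. GHC does not check that the two sides
-- really are equal; that is the rule author's responsibility.
{-# RULES
"myMap/myMap" forall f g xs. myMap g (myMap f xs) = myMap (g . f) xs
  #-}

main :: IO ()
main = print (myMap (+ 1) (myMap (* 2) [1, 2, 3 :: Int]))
</haskell>

The program prints the same list whether or not the rule fires; the rule only removes the intermediate list.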

In the future (GHC 6.7?) we will have ''stream fusion''. In layman's terms, this increases the number of functions that can be fused = big speedups.

To be more technical, a ''stream'' represents a traversal of a list (or, indeed, some other structure such as an array). All the list functions become stream functions --- but, crucially, stream operations are ''non-recursive'', meaning they can all be glued together. Taking our example above:

<haskell>
foo = length . filter p . map g . map f . concat
</haskell>

becomes something like

<haskell>
foo =
  length .
  fromStream . streamFilter p . toStream .
  fromStream . streamMap g . toStream .
  fromStream . streamMap f . toStream .
  fromStream . streamConcat . toStream
</haskell>

which, obviously, is massively ''less'' efficient than the original. However, since

<haskell>
toStream . fromStream = id
</haskell>

we can simplify that down to

<haskell>
foo =
  length .
  fromStream . streamFilter p .
  streamMap g .
  streamMap f .
  streamConcat . toStream
</haskell>

In other words, we have a <hask>toStream</hask> at one end, and a <hask>fromStream</hask> at the other end, with a bunch of stream operations in the middle. These are all non-recursive; only <hask>fromStream</hask> actually performs a recursive loop, so once GHC does all its inlining we'll end up with something like

<code>
foreach x in xs do
... concat ...
... map f ...
... map g ...
... filter p ...
... length ...
</code>

which is what we want.
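
To make the idea a little more concrete, here is a minimal sketch of how such a stream type ''could'' be defined. The names <hask>toStream</hask>, <hask>fromStream</hask>, <hask>streamMap</hask> and <hask>streamFilter</hask> follow the ones used above; the <hask>Step</hask> type and the use of an existential type are assumptions made for this sketch, not a description of what GHC's libraries actually contain.

<haskell>
{-# LANGUAGE ExistentialQuantification #-}
module Main where

-- One step of a traversal: finished, no element this time, or an element.
data Step s a = Done | Skip s | Yield a s

-- A stream is a step function plus a starting state.
data Stream a = forall s. Stream (s -> Step s a) s

toStream :: [a] -> Stream a
toStream xs0 = Stream next xs0
  where
    next []     = Done
    next (x:xs) = Yield x xs

-- The only recursive function here: it runs the whole pipeline as one loop.
fromStream :: Stream a -> [a]
fromStream (Stream next s0) = go s0
  where
    go s = case next s of
      Done       -> []
      Skip s'    -> go s'
      Yield x s' -> x : go s'

-- Non-recursive: just wraps the step function.
streamMap :: (a -> b) -> Stream a -> Stream b
streamMap f (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
      Done       -> Done
      Skip s'    -> Skip s'
      Yield x s' -> Yield (f x) s'

-- Non-recursive: rejected elements turn into Skip instead of a loop.
streamFilter :: (a -> Bool) -> Stream a -> Stream a
streamFilter p (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
      Done       -> Done
      Skip s'    -> Skip s'
      Yield x s'
        | p x       -> Yield x s'
        | otherwise -> Skip s'

main :: IO ()
main = print (length (fromStream (streamFilter even (streamMap (+ 1) (toStream [1 .. 10])))))
</haskell>

The <hask>Skip</hask> constructor is what lets <hask>streamFilter</hask> stay non-recursive: instead of looping until it finds an element that passes, it reports "nothing this time" and leaves the looping to <hask>fromStream</hask>.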

(...Add link to papers...)

== Execution Model ==

In order to understand how to write efficient code, and what GHC does with your code to optimise it, it helps to know a bit about what your compiled code looks like and how it works.

=== Graph reduction ===

To a first approximation, at any moment your program is a 'graph' of objects in memory. ('Graph' in the graph theory sense --- nodes connected by arcs.) Some of the objects are 'data' --- booleans, integers, strings, lists, etc. Some of those objects are functions (because Haskell lets you pass functions around like data). And some of these are ''thunks'' --- unevaluated expressions (because Haskell only evaluates expressions 'as needed').

The program starts off with a single node representing the unevaluated call to <hask>main</hask>, and proceeds to execute from there. Each time a thunk is executed, the result (whatever it is) overwrites the thunk data. (It's possible that the result of evaluating a thunk is a new thunk of course.)
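
You can watch a thunk being overwritten with its result by using <hask>Debug.Trace.trace</hask>, which prints a message at the moment the expression it wraps is evaluated. In this small sketch <hask>x</hask> is used twice, but the message should appear only once, because the second use finds the already-computed result.

<haskell>
module Main where

import Debug.Trace (trace)

main :: IO ()
main = do
  -- x starts life as a thunk; nothing is computed at this point.
  let x = trace "evaluating x" (2 + 3) :: Int
  -- The first use of x runs the thunk and overwrites it with the result;
  -- the second use just reads that result.
  print (x + x)
</haskell>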

=== About STG ===

GHC compiles to the ''Spineless Tagless G-machine'' (STG). This is a notional graph reduction machine (i.e., a virtual machine that performs graph reductions as described above). 'G-machine' because it does graph reduction. 'Spineless' because it can't stand up to bullies. 'Tagless' because the graph nodes don't have 'tags' on them to say what they are.

Instead of tags, the nodes have access pointers. If the node is a thunk, its pointer points to the code to evaluate the thunk and return the real result. Otherwise the pointer points to some 'do-nothing' code. So to access any type of node, you just do an indirect jump on this pointer; no case analysis is necessary.

(Gosh I hope I got that lot right!)

Internally, GHC uses a kind of 'machine code' that runs on this non-existent G-machine. It does a number of optimisations on that representation, before finally compiling it into ''real'' machine code (possibly via C using GCC).

=== STG optimisations ===

There are a number of optimisations done at the STG level. These mainly involve trying to avoid unnecessary steps. For example, avoid creating a thunk which immediately creates another thunk when executed; make it evaluate all the way down to a final result in one go. (If we 'need' the thunk's value, we're going to evaluate all the way down anyway, so let's leave out the overhead...)

=== Primitive data types ===

Haskell-98 provides some standard types such as <hask>Int</hask>, etc. GHC defines these as 'boxed' versions of GHC-specific 'unboxed' types:

<haskell>
-- From GHC.Exts:
data Int = I# Int#
data Word = W# Word#
data Double = D# Double#
-- etc.
</haskell>

Here <hask>Int#</hask> is a GHC-specific internal type representing, literally, a plain ordinary bundle of 32 or 64 bits inside the computer somewhere. (Depending on whether it's a 32 or 64-bit architecture.)

In particular, an <hask>Int#</hask> is strict, whereas an <hask>Int</hask> isn't.
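
With the <code>MagicHash</code> extension switched on you can work with the unboxed types directly. The sketch below adds two <hask>Int</hask> values by taking them out of their boxes, adding the raw machine integers with <hask>(+#)</hask>, and boxing the result again; <hask>plusInt</hask> is a made-up name for illustration.

<haskell>
{-# LANGUAGE MagicHash #-}
module Main where

import GHC.Exts (Int (I#), (+#))

-- Pattern matching on I# forces each argument and exposes the Int# inside.
plusInt :: Int -> Int -> Int
plusInt (I# a) (I# b) = I# (a +# b)

main :: IO ()
main = print (plusInt 2 3)
</haskell>

In ordinary code you rarely need to write this yourself; it is roughly what GHC aims to arrive at after inlining and strictness analysis.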

=== Algebraic data types ===

(I'm not sure about the basic memory layout. Somebody fill in the general case?)

There are a few special cases:

==== Types with 1 constructor ====

If a function returns a tuple of values, and the caller immediately takes the tuple apart again, GHC will attempt to eliminate the tuple completely at the machine code level. Actually, this works for ''all'' types having exactly 1 constructor.

==== Constructors with no fields ====

Booleans are a good example:

<haskell>
data Bool = False | True
</haskell>

GHC will construct a single object in memory representing <hask>False</hask>, and another representing <hask>True</hask>. All <hask>Bool</hask> values are thus pointers to one or the other of these objects. (And hence, consume either 32 or 64 bits.)

GHC optimisations

2007-08-13T18:32:33Z

MathematicalOrchid: Fixed note about tuples.


GHC optimisations

2007-08-13T18:30:20Z

MathematicalOrchid: Added dead code elimination.

[[Category:GHC]]

== Introduction ==

This page collects together information about the optimisations that GHC does and does ''not'' perform.

* GHC experts: Please check that the info in this page is correct.
* Everybody else: Feel free to add questions!

== General optimisations ==

=== Dead code elimination ===

Does GHC remove code that you're not actually using?

Yes and no. If there is something in a module that isn't exported and isn't used by anything that ''is'' exported, it gets ignored. (This makes your compiled program smaller.) So at the module level, yes, GHC does dead code elimination.

On the other hand, if you import a module and use just 1 function from it, ''all'' of the code for ''all'' of the functions in that module gets linked in. So in this sense, no, GHC doesn't do dead code elimination.

(There is a switch to make GHC spit out a separate object file for each individual function in a module. If you use this, only the functions you actually use will get linked into your executable. But this tends to freak out the linker program...)

=== Common subexpression elimination ===

First of all, ''common subexpression elimination'' (CSE) means that if an expression appears in several places, the code is rearranged so that the value of that expression is computed only once. For example:

<haskell>
foo x = (bar x) * (bar x)
</haskell>

might be transformed into

<haskell>
foo x = let x' = bar x in x' * x'
</haskell>

thus, the <hask>bar</hask> function is only called once. (And if <hask>bar</hask> is a particularly expensive function, this might save quite a lot of work.)

GHC doesn't actually perform CSE as often as you might expect. The trouble is, performing CSE can affect the strictness/laziness of the program. So GHC ''does'' do CSE, but only in specific circumstances --- see the GHC manual. (Section??)

Long story short: "If you care about CSE, do it by hand."
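
As a minimal sketch of doing CSE by hand (the names <hask>expensive</hask>, <hask>fooNaive</hask> and <hask>fooShared</hask> are made up for illustration), bind the shared subexpression with <hask>let</hask> or <hask>where</hask> so that it is evaluated at most once:

<haskell>
-- Hypothetical stand-in for a costly function.
expensive :: Int -> Int
expensive n = sum [1 .. n]

-- As written, GHC may or may not share the two calls.
fooNaive :: Int -> Int
fooNaive x = expensive x * expensive x

-- Sharing made explicit: 'y' is a single thunk, evaluated at most once.
fooShared :: Int -> Int
fooShared x = y * y
  where
    y = expensive x
</haskell>

Both versions compute the same result; the second simply doesn't depend on the optimiser to avoid the duplicated work.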

=== Inlining ===

Inlining is where a function call is replaced by that function's definition. For example, the standard <hask>map</hask> function can be defined as

<haskell>
map :: (a -> b) -> [a] -> [b]
map f [] = []
map f (x:xs) = f x : map f xs
</haskell>

Now if you write something like

<haskell>
foo = map bar
</haskell>

it's possible that the compiler might ''inline'' the definition of <hask>map</hask>, yielding something like

<haskell>
foo [] = []
foo (x:xs) = bar x : foo xs
</haskell>

which is (hopefully!) faster, because it doesn't involve a call to the <hask>map</hask> function any more; it just does the work directly. (This might also expose new optimisation opportunities; <hask>map</hask> works for ''any'' types, whereas <hask>foo</hask> probably works for only ''one'' type.)

So, that's what inlining is. By default, GHC will inline things if they are 'small enough'. Every time you inline a function, you are in a sense making a (customised) ''copy'' of that function. Do too much of this and the compiled program will be enormous. So it's only worth it for 'small' functions.

(How does GHC determine 'small'? Isn't there a switch that adjusts this?)
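
If you don't want to rely on GHC's idea of 'small', you can ask for inlining explicitly with an <hask>INLINE</hask> pragma, or forbid it with <hask>NOINLINE</hask>. A minimal sketch with made-up function names; the pragma is a request to the optimiser, not a promise of any particular speedup:

<haskell>
-- Ask GHC to inline 'square' at its call sites.
{-# INLINE square #-}
square :: Int -> Int
square x = x * x

-- Ask GHC never to inline 'cube'.
{-# NOINLINE cube #-}
cube :: Int -> Int
cube x = x * x * x

sumOfSquares :: [Int] -> Int
sumOfSquares = sum . map square
</haskell>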

=== Specialisation ===

Flexibility is the enemy of performance. Take <hask>(+)</hask> for example. As you know, it adds two numbers together. However, would that be two integers? Two floating-point numbers? Two complex numbers? Two vectors? The generated machine code is very, very different in each case!

It's easy enough to make a function such as <hask>sum</hask>, which will work for ''any'' type of number. However, in the interests of performance, if it can be determined exactly which type of number we're going to be working on, the compiler can generate exactly the right machine code, without having to do lots of runtime lookups.

GHC tries to do this where possible. However (as I understand it?) this tends to work less well across module boundaries. For example, suppose you write

<haskell>
module Physics where

data Force = ...

instance Num Force where ...

resultant_force :: [Force] -> Force
resultant_force = sum
</haskell>

One might ''hope'' that <hask>resultant_force</hask> would get compiled using a special version of <hask>sum</hask> tailored to adding up only <hask>Force</hask> objects. This may or may not happen.

Generally GHC won't just take an existing function and recompile it with a new type signature. What ''might'' happen is that the function gets inlined, and specialised from there. (Can someone say something more concrete here?)
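
One concrete tool here is the <hask>SPECIALIZE</hask> pragma, which asks GHC to compile an extra copy of an overloaded function at a particular type. The sketch below uses a made-up <hask>total</hask> function rather than the library's <hask>sum</hask>; whether the specialised copy actually gets used at a given call site is still up to the optimiser.

<haskell>
-- An overloaded function: works for any Num type, at the cost of
-- passing a class dictionary around at run time.
total :: Num a => [a] -> a
total = go 0
  where
    go acc []       = acc
    go acc (x : xs) = go (acc + x) xs

-- Request a dedicated copy of 'total' for Int.
{-# SPECIALIZE total :: [Int] -> Int #-}

totalInts :: [Int] -> Int
totalInts = total
</haskell>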

=== Strictness analysis ===

Haskell is a lazy language. Calculations are notionally not performed until their results are 'needed'. However, if the result definitely ''will'' be needed, it's a waste of time and effort to save up the expression and execute it later; more efficient to just execute it right now.

''Strictness analysis'' is a process by which GHC attempts to determine, at compile-time, which data definitely will 'always be needed'. GHC can then build code to just calculate such data, rather than the normal (higher overhead) process for storing up the calculation and executing it later.

Unfortunately, looking at a program and saying "will this data be needed?" is a bit like looking at a program and saying "this program will never halt" --- see The Halting Problem. (Good link?) But GHC does its best, and can give big speedups in some cases.
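
When the analysis can't see that a value is needed (or when you compile without optimisation), you can say so yourself. A minimal sketch using <hask>seq</hask> to force an accumulator at each step; the function names are made up for illustration:

<haskell>
-- Lazy accumulator: without strictness analysis this can build a long
-- chain of unevaluated (+) thunks before anything is added up.
sumLazy :: [Int] -> Int
sumLazy = go 0
  where
    go acc []       = acc
    go acc (x : xs) = go (acc + x) xs

-- Strict accumulator: 'seq' forces the running total before recursing.
sumStrict :: [Int] -> Int
sumStrict = go 0
  where
    go acc []       = acc
    go acc (x : xs) = let acc' = acc + x in acc' `seq` go acc' xs
</haskell>

The two functions return the same answer; they differ only in when the additions happen.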

=== Fusion ===

In Haskell, it is common to write expressions such as

<haskell>
foo = length . filter p . map g . map f . concat
</haskell>

This style of writing makes it very clear what the function ''does'' (it takes a list of lists, concatenates them all, applies f to every element, applies g to every element, throws away all elements that fail p, and then calculates the length of the result). However, if executed literally, it's very inefficient.

When executed, <hask>concat</hask> takes a list of lists and constructs a flat list. Then <hask>map</hask> constructs another list. Then the second <hask>map</hask> function creates yet another list...

Since Haskell is a lazy language, these intermediate lists never exist in memory in their entirety. One element will be generated by one function, and then immediately consumed by the next function in the chain. So as each element is generated, it instantly becomes garbage. So the memory usage isn't that great, but the GC load is quite high. (Not to mention all the time wasted on creating thunks, evaluating thunks, and allocating/deallocating RAM.) So we really want to avoid all this!

The term ''fusion'' refers to program transformations aimed at removing intermediate data structures. (''Deforestation'' refers specifically to lists, but in general fusion is applicable to operations on any structure.)

The standard libraries provide a function <hask>concatMap</hask> such that

<haskell>
concatMap f = concat . map f
</haskell>

As you can see, we don't 'need' this function --- we can define it in terms of other, simpler functions. However, it's more efficient to run because it doesn't generate an intermediate list of lists. (It's also used to define the list monad, which is probably why it's there.)

Having <hask>concatMap</hask> is nice. But we really don't want to define new functions for every possible combination of list operators. (Do ''you'' fancy implementing a <hask>lengthFilterMapMapConcat</hask> function?) So one of the optimisations that GHC performs is to attempt to perform fusion automatically.

One way that we could try to do this is by inlining all the function definitions. But list processing functions are generally recursive, which makes matters rather complicated. (I.e., this doesn't really work.)

Currently (GHC 6.6.1) we have build/foldr fusion. That is, where a function ''builds'' a list and passes the result to a function that ''consumes'' a list, GHC can (usually) elide the list itself. There are also other transformations that can be applied. For example, map fusion. Map fusion simply says that

<haskell>
map g . map f
</haskell>

is equivalent to

<haskell>
map (g . f)
</haskell>

(but the latter is more efficient).

All of this is implemented using GHC's ''transformation rules'' facility. See the manual. (Section??) This functionality is only turned on with -O or -O2.
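
To give a flavour of what such a rule looks like, here is a sketch of map fusion written with the <hask>RULES</hask> pragma for a home-made <hask>myMap</hask> (a hypothetical name; the real library rules are phrased in terms of build/foldr and are more involved). The rule can only fire when optimisation is on, and GHC does not check that the two sides really are equal; that part is the programmer's responsibility.

<haskell>
-- A private copy of map, so the rule below doesn't clash with the
-- library's own rules. NOINLINE keeps calls visible for the rule to match.
myMap :: (a -> b) -> [a] -> [b]
myMap _ []       = []
myMap f (x : xs) = f x : myMap f xs
{-# NOINLINE myMap #-}

-- Rewrite two traversals into one.
{-# RULES "myMap/myMap" forall f g xs. myMap f (myMap g xs) = myMap (f . g) xs #-}

-- With the rule applied this becomes a single pass over the list.
twoPasses :: [Int] -> [Int]
twoPasses xs = myMap (+ 1) (myMap (* 2) xs)
</haskell>

The result is the same whether or not the rule fires; only the number of intermediate lists changes.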

In the future (GHC 6.7?) we will have ''stream fusion''. In layman's terms, this increases the number of functions that can be fused, which should mean big speedups.

To be more technical, a ''stream'' represents a traversal of a list (or, indeed, some other structure such as an array). All the list functions become stream functions --- but, crucially, stream operations are ''non-recursive'', meaning they can all be glued together. Taking our example above:

<haskell>
foo = length . filter p . map g . map f . concat
</haskell>

becomes something like

<haskell>
foo =
  length .
  fromStream . streamFilter p . toStream .
  fromStream . streamMap g . toStream .
  fromStream . streamMap f . toStream .
  fromStream . streamConcat . toStream
</haskell>

which, obviously, is massively ''less'' efficient than the original. However, since

<haskell>
toStream . fromStream = id
</haskell>

we can simplify that down to

<haskell>
foo =
  length .
  fromStream . streamFilter p .
  streamMap g .
  streamMap f .
  streamConcat . toStream
</haskell>

In other words, we have a <hask>toStream</hask> at one end, and a <hask>fromStream</hask> at the other end, with a bunch of stream operations in the middle. These are all non-recursive; only <hask>fromStream</hask> actually performs a recursive loop, so once GHC does all its inlining we'll end up with something like

<code>
foreach x in xs do
... concat ...
... map f ...
... map g ...
... filter p ...
... length ...
</code>

which is what we want.
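
To make the idea less abstract, here is a minimal sketch of what a stream type and its conversion functions might look like. The <hask>Step</hask> type and its constructor names follow the usual presentation in the stream fusion literature, but everything here is a simplified illustration and not the code of any actual library. Note that <hask>streamMap</hask> and <hask>streamFilter</hask> contain no recursion at all; only <hask>fromStream</hask> loops.

<haskell>
{-# LANGUAGE ExistentialQuantification #-}

-- One step of a traversal: finished, nothing to yield yet, or an element.
data Step a s = Done | Skip s | Yield a s

-- A stream is a step function plus a hidden traversal state.
data Stream a = forall s. Stream (s -> Step a s) s

-- Non-recursive: the remaining list is the traversal state.
toStream :: [a] -> Stream a
toStream xs0 = Stream next xs0
  where
    next []       = Done
    next (x : xs) = Yield x xs

-- The only recursive function: run the traversal and rebuild a list.
fromStream :: Stream a -> [a]
fromStream (Stream next s0) = go s0
  where
    go s = case next s of
      Done       -> []
      Skip s'    -> go s'
      Yield x s' -> x : go s'

streamMap :: (a -> b) -> Stream a -> Stream b
streamMap f (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
      Done       -> Done
      Skip s'    -> Skip s'
      Yield x s' -> Yield (f x) s'

streamFilter :: (a -> Bool) -> Stream a -> Stream a
streamFilter p (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
      Done    -> Done
      Skip s' -> Skip s'
      Yield x s'
        | p x       -> Yield x s'
        | otherwise -> Skip s'

-- One toStream, one fromStream, stream operations in between.
-- (Whether GHC removes all the allocation depends on its inlining.)
example :: [Int] -> Int
example = length . fromStream . streamFilter even . streamMap (+ 1) . toStream
</haskell>

The <hask>Skip</hask> constructor is what lets <hask>streamFilter</hask> drop an element without looping to find the next one.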

(...Add link to papers...)

== Execution Model ==

In order to understand how to write efficient code, and what GHC does with your code to optimise it, it helps to know a bit about what your compiled code looks like and how it works.

=== Graph reduction ===

To a first approximation, at any moment your program is a 'graph' of objects in memory. ('Graph' in the graph theory sense --- nodes connected by arcs.) Some of the objects are 'data' --- booleans, integers, strings, lists, etc. Some of those objects are functions (because Haskell lets you pass functions around like data). And some of these are ''thunks'' --- unevaluated expressions (because Haskell only evaluates expressions 'as needed').

The program starts off with a single node representing the unevaluated call to <hask>main</hask>, and proceeds to execute from there. Each time a thunk is executed, the result (whatever it is) overwrites the thunk data. (It's possible that the result of evaluating a thunk is a new thunk of course.)

=== About STG ===

GHC compiles to the ''spineless tagless G-machine'' (STG). This is a notional graph reduction machine (i.e., a virtual machine that performs graph reductions as described above). 'G-machine' because it does graph reduction. 'Spineless' because it can't stand up to bullies. 'Tagless' because the graph nodes don't have 'tags' on them to say what they are.

Instead of tags, the nodes have access pointers. If the node is a thunk, its pointer points to the code to evaluate the thunk and return the real result. Otherwise the pointer points to some 'do-nothing' code. So to access any type of node, you just do an indirect jump on this pointer; no case analysis is necessary.

(Gosh I hope I got that lot right!)

Internally, GHC uses a kind of 'machine code' that runs on this non-existent G-machine. It does a number of optimisations on that representation, before finally compiling it into ''real'' machine code (possibly via C using GCC).

=== STG optimisations ===

There are a number of optimisations done at the STG level. These mainly involve trying to avoid unnecessary steps. For example, avoid creating a thunk which immediately creates another thunk when executed; make it evaluate all the way down to a final result in one go. (If we 'need' the thunk's value, we're going to evaluate all the way down anyway, so let's leave out the overhead...)

=== Primitive data types ===

Haskell-98 provides some standard types such as <hask>Int</hask>, etc. GHC defines these as 'boxed' versions of GHC-specific 'unboxed' types:

<haskell>
-- From GHC.Exts:
data Int = I# Int#
data Word = W# Word#
data Double = D# Double#
-- etc.
</haskell>

Here <hask>Int#</hask> is a GHC-specific internal type representing, literally, a plain ordinary bundle of 32 or 64 bits inside the computer somewhere. (Depending on whether it's a 32 or 64-bit architecture.)

In particular, an <hask>Int#</hask> is strict (it is always an evaluated value, never a thunk), whereas an <hask>Int</hask> isn't.
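
With the <hask>MagicHash</hask> extension (in GHC versions that support <hask>LANGUAGE</hask> pragmas) you can work with the unboxed types directly. A minimal sketch, with a made-up function name: unwrap two boxed <hask>Int</hask>s, add the raw machine integers with the primitive <hask>+#</hask>, and box the result again.

<haskell>
{-# LANGUAGE MagicHash #-}

import GHC.Exts (Int (I#), (+#))

-- Pattern matching on I# forces each argument and exposes the raw Int#.
plusInt :: Int -> Int -> Int
plusInt (I# a) (I# b) = I# (a +# b)
</haskell>

In ordinary code you rarely need to write this yourself; it is roughly what the optimiser aims to produce from plain <hask>Int</hask> arithmetic.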

=== Algebraic data types ===

(I'm not sure about the basic memory layout. Somebody fill in the general case?)

There are a few special cases:

==== Types with 1 constructor ====

If a function puts a bunch of things into a type value, and the caller immediately takes the things out of the bunch again, GHC will try to eliminate the bundle type altogether. (Or is that just for ''tuples''?)

==== Constructors with no fields ====

Booleans are a good example:

<haskell>
data Bool = False | True
</haskell>

GHC will construct a single object in memory representing <hask>False</hask>, and another representing <hask>True</hask>. All <hask>Bool</hask> values are thus pointers to one or the other of these objects. (And hence, consume either 32 or 64 bits.)

GHC optimisations

2007-08-12T08:43:56Z

MathematicalOrchid: Fusion goodness.

[[Category:GHC]]

== Introduction ==

This page collects together information about the optimisations that GHC does and does ''not'' perform.

* GHC experts: Please check that the info in this page is correct.
* Everybody else: Feel free to add questions!

== General optimisations ==

=== Common subexpression elimination ===

First of all, ''common subexpression elimination'' (CSE) means that if an expression appears in several places, the code is rearranged so that the value of that expression is computed only once. For example:

<haskell>
foo x = (bar x) * (bar x)
</haskell>

might be transformed into

<haskell>
foo x = let x' = bar x in x' * x'
</haskell>

thus, the <hask>bar</hask> function is only called once. (And if <hask>bar</hask> is a particularly expensive function, this might save quite a lot of work.)

GHC doesn't actually perform CSE as often as you might expect. The trouble is, performing CSE can affect the strictness/laziness of the program. So GHC ''does'' do CSE, but only in specific circumstances --- see the GHC manual. (Section??)

Long story short: "If you care about CSE, do it by hand."

=== Inlining ===

Inlining is where a function call is replaced by that function's definition. For example, the standard <hask>map</hask> function can be defined as

<haskell>
map :: (a -> b) -> [a] -> [b]
map f [] = []
map f (x:xs) = f x : map f xs
</haskell>

Now if you write something like

<haskell>
foo = map bar
</haskell>

it's possible that the compiler might ''inline'' the definition of <hask>map</hask>, yielding something like

<haskell>
foo [] = []
foo (x:xs) = bar x : foo xs
</haskell>

which is (hopefully!) faster, because it doesn't involve a call to the <hask>map</hask> function any more; it just does the work directly. (This might also expose new optimisation opportunities; <hask>map</hask> works for ''any'' types, whereas <hask>foo</hask> probably works for only ''one'' type.)

So, that's what inlining is. By default, GHC will inline things if they are 'small enough'. Every time you inline a function, you are in a sense making a (customised) ''copy'' of that function. Do too much of this and the compiled program will be enormous. So it's only worth it for 'small' functions.

(How does GHC determine 'small'? Isn't there a switch that adjusts this?)

=== Specialisation ===

Flexibility is the enemy of performance. Take <hask>(+)</hask> for example. As you know, it adds two numbers together. However, would that be two integers? Two floating-point numbers? Two complex numbers? Two vectors? The generated machine code is very, very different in each case!

It's easy enough to make a function such as <hask>sum</hask>, which will work for ''any'' type of number. However, in the interests of performance, if it can be determined exactly which type of number we're going to be working on, the compiler can generate exactly the right machine code, without having to do lots of runtime lookups.

GHC tries to do this where possible. However (as I understand it?) this tends to work less well across module boundaries. For example, suppose you write

<haskell>
module Physics where

data Force = ...

instance Num Force where ...

resultant_force :: [Force] -> Force
resultant_force = sum
</haskell>

One might ''hope'' that <hask>resultant_force</hask> would get compiled using a special version of <hask>sum</hask> tailored to adding up only <hask>Force</hask> objects. This may or may not happen.

Generally GHC won't just take an existing function and recompile it with a new type signature. What ''might'' happen is that the function gets inlined, and specialised from there. (Can someone say something more concrete here?)

=== Strictness analysis ===

Haskell is a lazy language. Calculations are notionally not performed until their results are 'needed'. However, if the result definitely ''will'' be needed, it's a waste of time and effort to save up the expression and execute it later; more efficient to just execute it right now.

''Strictness analysis'' is a process by which GHC attempts to determine, at compile-time, which data definitely will 'always be needed'. GHC can then build code to just calculate such data, rather than the normal (higher overhead) process for storing up the calculation and executing it later.

Unfortunately, looking at a program and saying "will this data be needed?" is a bit like looking at a program and saying "this program will never halt" --- see The Halting Problem. (Good link?) But GHC does its best, and can give big speedups in some cases.

=== Fusion ===

In Haskell, it is common to write expressions such as

<haskell>
foo = length . filter p . map g . map f . concat
</haskell>

This style of writing makes it very clear what the function ''does'' (it takes a list of lists, concatenates them all, applies f to every element, applies g to every element, throws away all elements that fail p, and then calculates the length of the result). However, if executed literally, it's very inefficient.

When executed, <hask>concat</hask> takes a list of lists and constructs a flat list. Then <hask>map</hask> constructs another list. Then the second <hask>map</hask> function creates yet another list...

Since Haskell is a lazy language, these intermediate lists never exist in memory in their entirety. One element will be generated by one function, and then immediately consumed by the next function in the chain. So as each element is generated, it instantly becomes garbage. So the memory usage isn't that great, but the GC load is quite high. (Not to mention all the time wasted on creating thunks, evaluating thunks, and allocating/deallocating RAM.) So we really want to avoid all this!

The term ''fusion'' refers to program transformations aimed at removing intermediate data structures. (''Deforestation'' refers specifically to lists, but in general fusion is applicable to operations on any structure.)

The standard libraries provide a function <hask>concatMap</hask> such that

<haskell>
concatMap f = concat . map f
</haskell>

As you can see, we don't 'need' this function --- we can define it in terms of other, simpler functions. However, it's more efficient to run because it doesn't generate an intermediate list of lists. (It's also used to define the list monad, which is probably why it's there.)

Having <hask>concatMap</hask> is nice. But we really don't want to define new functions for every possible combination of list operators. (Do ''you'' fancy implementing a <hask>lengthFilterMapMapConcat</hask> function?) So one of the optimisations that GHC performs is to attempt to perform fusion automatically.

One way that we could try to do this is by inlining all the function definitions. But list processing functions are generally recursive, which makes matters rather complicated. (I.e., this doesn't really work.)

Currently (GHC 6.6.1) we have build/foldr fusion. That is, where a function ''builds'' a list and passes the result to a function that ''consumes'' a list, GHC can (usually) elide the list itself. There are also other transformations that can be applied. For example, map fusion. Map fusion simply says that

<haskell>
map g . map f
</haskell>

is equivalent to

<haskell>
map (g . f)
</haskell>

(but the latter is more efficient).

All of this is implemented using GHC's ''transformation rules'' facility. See the manual. (Section??) This functionality is only turned on with -O or -O2.

In the future (GHC 6.7?) we will have ''stream fusion''. In layman's terms, this increases the number of functions that can be fused, which should mean big speedups.

To be more technical, a ''stream'' represents a traversal of a list (or, indeed, some other structure such as an array). All the list functions become stream functions --- but, crucially, stream operations are ''non-recursive'', meaning they can all be glued together. Taking our example above:

<haskell>
foo = length . filter p . map g . map f . concat
</haskell>

becomes something like

<haskell>
foo =
  length .
  fromStream . streamFilter p . toStream .
  fromStream . streamMap g . toStream .
  fromStream . streamMap f . toStream .
  fromStream . streamConcat . toStream
</haskell>

which, obviously, is massively ''less'' efficient than the original. However, since

<haskell>
toStream . fromStream = id
</haskell>

we can simplify that down to

<haskell>
foo =
  length .
  fromStream . streamFilter p .
  streamMap g .
  streamMap f .
  streamConcat . toStream
</haskell>

In other words, we have a <hask>toStream</hask> at one end, and a <hask>fromStream</hask> at the other end, with a bunch of stream operations in the middle. These are all non-recursive; only <hask>fromStream</hask> actually performs a recursive loop, so once GHC does all its inlining we'll end up with something like

<code>
foreach x in xs do
... concat ...
... map f ...
... map g ...
... filter p ...
... length ...
</code>

which is what we want.

(...Add link to papers...)

== Execution Model ==

In order to understand how to write efficient code, and what GHC does with your code to optimise it, it helps to know a bit about what your compiled code looks like and how it works.

=== Graph reduction ===

To a first approximation, at any moment your program is a 'graph' of objects in memory. ('Graph' in the graph theory sense --- nodes connected by arcs.) Some of the objects are 'data' --- booleans, integers, strings, lists, etc. Some of those objects are functions (because Haskell lets you pass functions around like data). And some of these are ''thunks'' --- unevaluated expressions (because Haskell only evaluates expressions 'as needed').

The program starts off with a single node representing the unevaluated call to <hask>main</hask>, and proceeds to execute from there. Each time a thunk is executed, the result (whatever it is) overwrites the thunk data. (It's possible that the result of evaluating a thunk is a new thunk of course.)

=== About STG ===

GHC compiles to the ''spineless tagless G-machine'' (STG). This is a notional graph reduction machine (i.e., a virtual machine that performs graph reductions as described above). 'G-machine' because it does graph reduction. 'Spineless' because it can't stand up to bullies. 'Tagless' because the graph nodes don't have 'tags' on them to say what they are.

Instead of tags, the nodes have access pointers. If the node is a thunk, its pointer points to the code to evaluate the thunk and return the real result. Otherwise the pointer points to some 'do-nothing' code. So to access any type of node, you just do an indirect jump on this pointer; no case analysis is necessary.

(Gosh I hope I got that lot right!)

Internally, GHC uses a kind of 'machine code' that runs on this non-existent G-machine. It does a number of optimisations on that representation, before finally compiling it into ''real'' machine code (possibly via C using GCC).

=== STG optimisations ===

There are a number of optimisations done at the STG level. These mainly involve trying to avoid unnecessary steps. For example, avoid creating a thunk which immediately creates another thunk when executed; make it evaluate all the way down to a final result in one go. (If we 'need' the thunk's value, we're going to evaluate all the way down anyway, so let's leave out the overhead...)

=== Primitive data types ===

Haskell-98 provides some standard types such as <hask>Int</hask>, etc. GHC defines these as 'boxed' versions of GHC-specific 'unboxed' types:

<haskell>
-- From GHC.Exts:
data Int = I# Int#
data Word = W# Word#
data Double = D# Double#
-- etc.
</haskell>

Here <hask>Int#</hask> is a GHC-specific internal type representing, literally, a plain ordinary bundle of 32 or 64 bits inside the computer somewhere. (Depending on whether it's a 32 or 64-bit architecture.)

In particular, an <hask>Int#</hask> is strict (it is always an evaluated value, never a thunk), whereas an <hask>Int</hask> isn't.

=== Algebraic data types ===

(I'm not sure about the basic memory layout. Somebody fill in the general case?)

There are a few special cases:

==== Types with 1 constructor ====

If a function puts a bunch of things into a type value, and the caller immediately takes the things out of the bunch again, GHC will try to eliminate the bundle type altogether. (Or is that just for ''tuples''?)

==== Constructors with no fields ====

Booleans are a good example:

<haskell>
data Bool = False | True
</haskell>

GHC will construct a single object in memory representing <hask>False</hask>, and another representing <hask>True</hask>. All <hask>Bool</hask> values are thus pointers to one or the other of these objects. (And hence, consume either 32 or 64 bits.)

GHC optimisations

2007-08-11T15:54:15Z

MathematicalOrchid:

GHC optimisations

2007-08-11T15:43:16Z

MathematicalOrchid: More babble.

GHC optimisations

2007-08-11T11:13:30Z

MathematicalOrchid: OK, I'm done with this - for now.

GHC optimisations

2007-08-11T10:49:23Z

MathematicalOrchid: More content. Still not finished...

GHC optimisations

2007-08-11T10:31:14Z

MathematicalOrchid: Initial version; still working on this...

Rank-N types

2007-07-09T12:40:01Z

MathematicalOrchid: +category

[[Category:Language extensions]]
[[Category:Stub articles]]

== About ==

As best as I can tell, rank-N types are exactly like [[existential type]]s - except that they're completely different.

Rank-2 types are a special case of rank-N types, and normal Haskell 98 types are all rank-1 types.

== Also see ==

[http://hackage.haskell.org/trac/haskell-prime/wiki/RankNTypes Rank-N types] on the Haskell' website.

Existential type

2007-07-09T12:37:44Z

MathematicalOrchid: Corrected invalid syntax. (I wonder if anybody ever checks this...?)

__TOC__
This is an extension of Haskell available in [[GHC]]. See the GHC documentation:
http://www.haskell.org/ghc/docs/latest/html/users_guide/type-extensions.html

==Introduction to existential types==

=== Overview ===

Normally when creating a new type using <hask>type</hask>, <hask>newtype</hask>, <hask>data</hask>, etc., every type variable that appears on the right-hand side must also appear on the left-hand side. Existential types are a way of turning this off.

=== Basics ===

Existential types can be ''used'' for several different purposes. But what they ''do'' is to 'hide' a type variable on the right-hand side.

Normally, any type variable appearing on the right must also appear on the left:

<haskell>
data Worker x y = Worker {buffer :: b, input :: x, output :: y}
</haskell>

This is an error, since the type of the buffer isn't specified on the right (it's a type variable rather than a type) but also isn't specified on the left (there's no 'b' in the left part). In Haskell98, you would have to write

<haskell>
data Worker b x y = Worker {buffer :: b, input :: x, output :: y}
</haskell>

That may or may not be an actual problem.

Usually there is no problem at all with this state of affairs (which is why Haskell98 works this way). However, suppose that a <hask>Worker</hask> can use ''any'' type 'b' so long as it belongs to some particular class. Then every function that uses a <hask>Worker</hask> will have a type like

<haskell>
foo :: (Buffer b) => Worker b Int Int
</haskell>

or something. (In particular, failing to write an explicit type signature will invoke the dreaded [[monomorphism restriction]].) Using existential types, we can avoid this:

<haskell>
data Worker x y = forall b. Buffer b => Worker {buffer :: b, input :: x, output :: y}

foo :: Worker Int Int
</haskell>

The type of the buffer now does ''not'' appear in the <hask>Worker</hask> type at all.

This has a number of consequences. First, it is now impossible for a function to demand a <hask>Worker</hask> having a specific type of buffer. Second, the type of <hask>foo</hask> can now be derived automatically without needing an explicit type signature. (No [[monomorphism restriction]].) Third, since code now has ''no idea'' what type the <hask>buffer</hask> function returns, you are more limited in what you can do to it.

In general, when you use a 'hidden' type in this way, you will usually want that type to belong to a specific class, or you will want to pass some functions along that can work on that type. Otherwise you'll have some value belonging to a random unknown type, and you won't be able to ''do'' anything to it!
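
As a small sketch of the second option (passing functions along with the hidden value), consider a made-up <hask>Counter</hask> type. The state type <hask>s</hask> is hidden, but because the constructor also carries functions that work on <hask>s</hask>, there is still something useful we can do with it. The <hask>LANGUAGE</hask> pragma assumes a GHC new enough to support it; on older versions <hask>-fglasgow-exts</hask> plays the same role.

<haskell>
{-# LANGUAGE ExistentialQuantification #-}

-- A hidden state, a way to advance it, and a way to observe it.
data Counter = forall s. Counter s (s -> s) (s -> Int)

-- Advance the counter one step; the caller never learns what 's' is.
tick :: Counter -> Counter
tick (Counter s step view) = Counter (step s) step view

-- Observe the current value.
current :: Counter -> Int
current (Counter s _ view) = view s

-- Two counters with different hidden state types in the same list.
counters :: [Counter]
counters =
  [ Counter (0 :: Int) (+ 1) id
  , Counter "" ('x' :) length
  ]
</haskell>

Here <hask>map (current . tick . tick) counters</hask> works on both elements, even though one hides an <hask>Int</hask> and the other a <hask>String</hask>.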

Note: You can use existential types to convert a more specific type into a less specific one. (See the examples below.) There is ''no way'' to perform the reverse conversion!

== Examples ==

===A short example===

This illustrates creating a heterogeneous list, all of whose members implement "Show", and progressing through that list to show these items:

<haskell>
data Obj = forall a. (Show a) => Obj a

xs = [Obj 1, Obj "foo", Obj 'c']

doShow :: [Obj] -> String
doShow [] = ""
doShow ((Obj x):xs) = show x ++ doShow xs
</haskell>

With output: <code>doShow xs ==> "1\"foo\"'c'"</code>

===Expanded example - rendering objects in a raytracer===

====Problem statement====

In a raytracer, a requirement is to be able to render several different objects (like a ball, mesh or whatever). The first step is a type class for Renderable like so:

<haskell>
class Renderable a where
    boundingSphere :: a -> Sphere
    hit :: a -> [Fragment] -- returns the "fragments" of all hits with ray
    {- ... etc ... -}
</haskell>

To solve the problem, the <hask>hit</hask> function must apply to several objects (like a sphere and a polygon for instance).

<haskell>
hits :: Renderable a => [a] -> [Fragment]
hits xs = sortByDistance $ concatMap hit xs
</haskell>

However, this does not work as written since the elements of the list can be of '''SEVERAL''' different types (like a sphere and a polygon and a mesh etc. etc.) but lists need to have elements of the same type.

====The solution====

Use 'existential types' - an extension to Haskell that can be found in most compilers.

The following example is based on GHC:

<haskell>
{-# OPTIONS -fglasgow-exts #-}

{- ...-}

data AnyRenderable = forall a. Renderable a => AnyRenderable a

instance Renderable AnyRenderable where
    boundingSphere (AnyRenderable a) = boundingSphere a
    hit (AnyRenderable a) = hit a
    {- ... -}
</haskell>

Now, create lists with type <hask>[AnyRenderable]</hask>, for example,
<haskell>
[ AnyRenderable x
, AnyRenderable y
, AnyRenderable z ]
</haskell>
where x, y, z can be from different instances of <hask>Renderable</hask>.

=== Dynamic dispatch mechanism of OOP ===

'''Existential types''' in conjunction with type classes can be used to emulate the dynamic dispatch mechanism of object oriented programming languages. To illustrate this concept I show how a classic example from object oriented programming can be encoded in Haskell.

<haskell>
class Shape_ a where
    perimeter :: a -> Double
    area :: a -> Double

data Shape = forall a. Shape_ a => Shape a

type Radius = Double
type Side = Double

data Circle = Circle Radius
data Rectangle = Rectangle Side Side
data Square = Square Side

instance Shape_ Circle where
    perimeter (Circle r) = 2 * pi * r
    area (Circle r) = pi * r * r

instance Shape_ Rectangle where
    perimeter (Rectangle x y) = 2*(x + y)
    area (Rectangle x y) = x * y

instance Shape_ Square where
    perimeter (Square s) = 4*s
    area (Square s) = s*s

instance Shape_ Shape where
    perimeter (Shape shape) = perimeter shape
    area (Shape shape) = area shape

--
-- Smart constructor
--

circle :: Radius -> Shape
circle r = Shape (Circle r)

rectangle :: Side -> Side -> Shape
rectangle x y = Shape (Rectangle x y)

square :: Side -> Shape
square s = Shape (Square s)

shapes :: [Shape]
shapes = [circle 2.4, rectangle 3.1 4.4, square 2.1]
</haskell>

(You may see other [[Smart constructors]] for other purposes).
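
To see the dispatch happen, here is a cut-down sketch of the same idea with only two shapes and no smart constructors. The names <hask>areas</hask> and <hask>totalPerimeter</hask> are introduced here purely for illustration; GHC with <code>ExistentialQuantification</code> is assumed.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

class Shape_ a where
  perimeter :: a -> Double
  area      :: a -> Double

data Shape = forall a. Shape_ a => Shape a

data Rectangle = Rectangle Double Double
data Square    = Square Double

instance Shape_ Rectangle where
  perimeter (Rectangle x y) = 2 * (x + y)
  area      (Rectangle x y) = x * y

instance Shape_ Square where
  perimeter (Square s) = 4 * s
  area      (Square s) = s * s

-- The wrapper forwards each method to whatever instance it was built with.
instance Shape_ Shape where
  perimeter (Shape s) = perimeter s
  area      (Shape s) = area s

shapes :: [Shape]
shapes = [Shape (Rectangle 3 4), Shape (Square 2)]

-- Each call picks the method of the instance packed inside the Shape.
areas :: [Double]
areas = map area shapes

totalPerimeter :: Double
totalPerimeter = sum (map perimeter shapes)
```

The caller of <hask>area</hask> never learns which concrete type sits inside a given <hask>Shape</hask>; the packed class dictionary decides which method runs, which is roughly what a virtual method call does in an object-oriented language.
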

=== [[Generalised algebraic datatype]] ===

The type of the <hask>parse</hask> function for [[Generalised algebraic datatype#Motivating example|this GADT]] is a good example to illustrate the concept of existential type.

==Alternate methods==
===Concrete data types===
====Universal instance of a Class====
Here is one way to simulate existentials (Hawiki note: borrowed from somewhere...)

Suppose I have a type class Shape a
<haskell>
type Point = (Float,Float)

class Shape a where
    draw :: a -> IO ()
    translate :: a -> Point -> a

</haskell>

Then we can pack shapes up into a [[concrete data type]] like this:
<haskell>
data SHAPE = SHAPE (IO ()) (Point -> SHAPE)
</haskell>
with a function like this
<haskell>
packShape :: Shape a => a -> SHAPE
packShape s = SHAPE (draw s) (\(x,y) -> packShape (translate s (x,y)))
</haskell>
This would be useful if we needed a list of shapes that we would need to translate and draw.

In fact we can make <hask>SHAPE</hask> an instance of <hask>Shape</hask>:
<haskell>
instance Shape SHAPE where
    draw (SHAPE d t) = d
    translate (SHAPE d t) = t
</haskell>

So SHAPE is a sort of universal instance.

====Using constructors and combinators====
Why bother with class <hask>Shape</hask>? Why not just go straight to

<haskell>
data Shape = Shape {
    draw :: IO (),
    translate :: (Int, Int) -> Shape
  }
</haskell>

Then you can create a library of shape [[constructor]]s and [[combinator]]s
that each have defined "draw" and "translate" in their "where" clauses.

<haskell>
circle :: (Int, Int) -> Int -> Shape
circle (x,y) r =
    Shape draw1 translate1
  where
    draw1 = ...
    translate1 (x1,y1) = circle (x+x1, y+y1) r

shapeGroup :: [Shape] -> Shape
shapeGroup shapes = Shape draw1 translate1
  where
    draw1 = sequence_ $ map draw shapes
    -- translate is a field selector (Shape -> (Int, Int) -> Shape),
    -- so the shape comes first and the offset second
    translate1 v = shapeGroup $ map (\s -> translate s v) shapes
</haskell>

===Cases that really require existentials===

There are cases where this sort of trick doesn't work. Here are two examples from a Haskell mailing list discussion (from K. Claussen) that don't seem expressible without
existentials. (But maybe one can rethink the whole thing :)
<haskell>
data Expr a = Val a | forall b . Apply (Expr (b -> a)) (Expr b)
</haskell>
and
<haskell>
data Action = forall b . Act (IORef b) (b -> IO ())
</haskell>
(Maybe this last one could be done as a <hask>type Act (IORef b) (IORef b -> IO ())</hask> then we could hide the <hask>IORef</hask> as above, that is go ahead and apply the second argument to the first)
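
To make the first of these concrete, here is a small sketch of an evaluator for <hask>Expr</hask>. The <hask>eval</hask> function and the <hask>example</hask> value are not from the mailing list thread; they are added here as an illustration, assuming GHC with <code>ExistentialQuantification</code>.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

data Expr a = Val a | forall b. Apply (Expr (b -> a)) (Expr b)

-- The hidden type b only has to agree between the two halves of an Apply,
-- so evaluating both and applying one to the other is always well typed.
eval :: Expr a -> a
eval (Val x)     = x
eval (Apply f x) = eval f (eval x)

-- 2 + length "abc"; the intermediate types Int and String are both hidden.
example :: Expr Int
example = Apply (Apply (Val (+)) (Val 2)) (Apply (Val length) (Val "abc"))
```

The argument type <hask>b</hask> differs from one <hask>Apply</hask> node to the next, so it cannot be a parameter of <hask>Expr</hask> without leaking into every signature; this is the sense in which the type seems to need an existential.
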

== Examples from the [http://www.cs.uu.nl/wiki/Ehc/ Essential Haskell Compiler] project ==

See the [http://www.cs.uu.nl/wiki/Ehc/#On_EHC documentation on EHC], each paper at the ''Version 4'' part:
* Chapter 8 (EH4) of Atze Dijkstra's [http://www.cs.uu.nl/groups/ST/Projects/ehc/ehc-book.pdf Essential Haskell PhD thesis] (most recent version). A detailed explanation. It explains also that existential types can be expressed in Haskell, but their use is restricted to data declarations, and the notation (using keyword <hask>forall</hask>) may be confusing. In Essential Haskell, existential types can occur not only in data declarations, and a separate keyword <hask>exists</hask> is used for their notation.
* [http://www.cs.uu.nl/wiki/pub/Ehc/WebHome/20050107-eh-intro.pdf Essential Haskell Compiler overview]
* [http://www.cs.uu.nl/wiki/Ehc/Examples#EH_4_forall_and_exists_everywher Examples]

==See also==
* A mailing list discussion: http://haskell.org/pipermail/haskell-cafe/2003-October/005231.html
* An example of encoding existentials using RankTwoPolymorphism: http://haskell.org/pipermail/haskell-cafe/2003-October/005304.html
=== Trac ===

[http://hackage.haskell.org/trac/haskell-prime/wiki/ExistentialQuantification Existential Quantification] is a detailed treatment of the topic. It also links to the shorter [http://hackage.haskell.org/trac/haskell-prime/wiki/ExistentialQuantifier Existential Quantifier] page.

[[Category:Idioms]]
[[Category:Glossary]]
[[Category:Language extensions]]

User:MathematicalOrchid

2007-07-09T12:36:03Z

MathematicalOrchid:

=== Status ===

Enthusiastic Haskell newbie.

=== Main Interests ===

* Using Haskell to write triposcopic mathematical algorithms with only a tiny amount of code.
* Using Haskell to do seriously compute-bound work in a multiprocessor setup.

=== Projects ===

==== Active ====

* Toy compression implementations in Haskell.

==== On Hold ====

* ''Indoculate'' — Program to convert a single (custom) source to both HTML and LaTeX, and also do cross-linking. (Status: in production use)
* ''Chaos'' — chaos pendulum simulator (Status: moderately working, needs UI)
* ''Haktal'' — fractal generator. (Status: minimal functionality)
* ''HoJ'' — Haskell to Java compiler. (Status: skeletal)
* ''Evlor'' — Interactive Haskell step-line debugger. (Status: skeletal)
* Sorting algorithm benchmarks.
* Audio DSP in Haskell.
* [[POV-Ray SDL project|Haskell SDL]] for [http://www.povray.org/ POV-Ray].

==== Failed ====

* Haskell ray tracer.
* Haskell type deducer.
* Haskell program to cause world peace.

=== Darcs ===

==== Indoculate ====

* <code>darcs get http://www.orphi.me.uk/darcs/Indoculate</code>
* <code>ghc --make MakeHTML</code>
* <code>ghc --make MakeSite</code>
* <code>ghc --make MakeLaTeX</code>
* Comes with a minimal manual. (<code>Manual.html</code> in the darcs repo.)

==== Chaos pendulum simulator ====

* <code>darcs get http://www.orphi.me.uk/darcs/Chaos</code> (Chaos pendulum simulator.)
* <code>ghc -O2 --make System1</code>
* <code>System1</code>
* Go have a cup of tea, watch some TV, go to bed, come back next day, and it might have finished. Will draw 500 frames at 200x200 pixels each, and save them as PPM image files. Make an animation out of these, and enjoy the light show!

==== Toy Compression ====

* <code>darcs get http://www.orphi.me.uk/darcs/ToyCompression</code>
* <code>ghc -O2 --make Encode</code>
* <code>ghc -O2 --make Decode</code>
* <code>Encode algorithm file</code> (Compress <code>file</code> using specified algorithm, and save as <code>file-algorithm</code>.)
* <code>Decode algorithm file</code> (Decompress <code>file</code> using specified algorithm, and save as <code>file-unalgorithm</code>.)

Currently working algorithms:
* '<code>RLE</code>': ''Run-length encoding''. Works well on files containing lots of 'runs' of the same value - e.g., pixel data. Works horribly on text.
* '<code>BWT</code>': ''Burrows-Wheeler transform''. Doesn't actually do any compression, but tends to make data more compressible.
* '<code>MTF</code>': ''Move-to-front encoding''. Again, doesn't compress, but makes the data more compressible.
* '<code>Fib</code>': ''Fibonacci codes''. Low numbers take up fewer bits than large numbers.
* '<code>LZW</code>': ''Lempel-Ziv-Welch''. Works well on just about everything!

Notes:
* Danger: BWT is ''extremely'' slow. It also uses ''absurd'' amounts of RAM! Run this algorithm only on small files. (Less than about 10 KB.)
* LZW works very well, but BWT+MTF+Fib is currently unbeaten...

=== Contributed Code ===

* [[Library for binary]]
* [[Library for vectors]]
* [[Library for colours]]
* [[Library for PPM images]]
* [[Toy compression implementations]]

=== Current Unsolved Questions ===

* Why do Haskell language extensions exist?
* How do you do graphics in Haskell?
* How come (e.g.) Smalltalk provides 27 different types of collection, but Haskell only ever involves single-linked lists and binary trees?
* Why is <hask>putStr xs1; putStr xs2</hask> slower than <hask>putStr (xs1 ++ xs2)</hask>?

User:MathematicalOrchid

2007-07-09T12:31:05Z

MathematicalOrchid:

=== Status ===

Enthusiastic Haskell newbie.

=== Main Interests ===

* Using Haskell to write triposcopic mathematical algorithms with only a tiny amount of code.
* Using Haskell to do seriously compute-bound work in a multiprocessor setup.

=== Projects ===

==== Active ====

* Toy compression implementations in Haskell.

==== On Hold ====

* ''Indoculate'' — Program to convert a single (custom) source to both HTML and LaTeX, and also do cross-linking. (Status: in production use)
* ''Chaos'' — chaos pendulum simulator (Status: moderately working, needs UI)
* ''Haktal'' — fractal generator. (Status: minimal functionality)
* ''HoJ'' — Haskell to Java compiler. (Status: skeletal)
* ''Evlor'' — Interactive Haskell step-line debugger. (Status: skeletal)
* Sorting algorithm benchmarks.
* Audio DSP in Haskell.
* [[POV-Ray SDL project|Haskell SDL]] for [http://www.povray.org/ POV-Ray].

==== Failed ====

* Haskell ray tracer.
* Haskell type deducer.
* Haskell program to cause world peace.

=== Darcs ===

==== Indoculate ====

* <code>darcs get http://www.orphi.me.uk/darcs/Indoculate</code>
* <code>ghc --make MakeHTML</code>
* <code>ghc --make MakeSite</code>
* <code>ghc --make MakeLaTeX</code>

==== Chaos pendulum simulator ====

* <code>darcs get http://www.orphi.me.uk/darcs/Chaos</code> (Chaos pendulum simulator.)

==== Toy Compression ====

* <code>darcs get http://www.orphi.me.uk/darcs/ToyCompression</code>
* <code>ghc -O2 --make Encode</code>
* <code>ghc -O2 --make Decode</code>
* <code>Encode algorithm file</code> (Compress <code>file</code> using specified algorithm, and save as <code>file-algorithm</code>.)
* <code>Decode algorithm file</code> (Decompress <code>file</code> using specified algorithm, and save as <code>file-unalgorithm</code>.)

Currently working algorithms:
* '<code>RLE</code>': ''Run-length encoding''. Works well on files containing lots of 'runs' of the same value - e.g., pixel data. Works horribly on text.
* '<code>BWT</code>': ''Burrows-Wheeler transform''. Doesn't actually do any compression, but tends to make data more compressible.
* '<code>MTF</code>': ''Move-to-front encoding''. Again, doesn't compress, but makes the data more compressible.
* '<code>Fib</code>': ''Fibonacci codes''. Low numbers take up fewer bits than large numbers.
* '<code>LZW</code>': ''Lempel-Ziv-Welch''. Works well on just about everything!

Notes:
* Danger: BWT is ''extremely'' slow. It also uses ''absurd'' amounts of RAM! Run this algorithm only on small files. (Less than about 10 KB.)
* LZW works very well, but BWT+MTF+Fib is currently unbeaten...

=== Contributed Code ===

* [[Library for binary]]
* [[Library for vectors]]
* [[Library for colours]]
* [[Library for PPM images]]
* [[Toy compression implementations]]

=== Current Unsolved Questions ===

* Why do Haskell language extensions exist?
* How do you do graphics in Haskell?
* How come (e.g.) Smalltalk provides 27 different types of collection, but Haskell only ever involves single-linked lists and binary trees?
* Why is <hask>putStr xs1; putStr xs2</hask> slower than <hask>putStr (xs1 ++ xs2)</hask>?

User talk:MathematicalOrchid

2007-07-09T12:07:01Z

MathematicalOrchid:

Rank-N types

2007-07-09T12:05:49Z

MathematicalOrchid: +category

[[Category:Language extensions]]

== About ==

As best as I can tell, rank-N types are exactly like [[existential type]]s - except that they're completely different.

Rank-2 types are a special case of rank-N types, and normal Haskell 98 types are all rank-1 types.
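
As a rough illustration of the difference (not part of the original stub, and assuming GHC's <code>RankNTypes</code> extension), compare an ordinary rank-1 signature with a rank-2 one. The function names are made up for this sketch.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Rank-1: the caller chooses a, once, for the whole call.
applyTwice :: (a -> a) -> a -> a
applyTwice f = f . f

-- Rank-2: the forall sits inside the argument type, so the argument must
-- itself be polymorphic and the body may use it at two different types.
applyBoth :: (forall a. [a] -> [a]) -> ([Int], String) -> ([Int], String)
applyBoth f (ns, s) = (f ns, f s)
```

For example, <hask>applyBoth reverse ([1,2,3], "abc")</hask> reverses both components. Without the inner <hask>forall</hask>, <hask>f</hask> would be fixed to a single element type and could not be applied to both a list of numbers and a string.
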

== Also see ==

[http://hackage.haskell.org/trac/haskell-prime/wiki/RankNTypes Rank-N types] on the Haskell' website.

Rank-N types

2007-07-09T12:05:20Z

MathematicalOrchid: Can somebody add some content here?

== About ==

As best as I can tell, rank-N types are exactly like [[existential type]]s - except that they're completely different.

Rank-2 types are a special case of rank-N types, and normal Haskell 98 types are all rank-1 types.

== Also see ==

[http://hackage.haskell.org/trac/haskell-prime/wiki/RankNTypes Rank-N types] on the Haskell' website.

Type

2007-07-09T12:01:06Z

MathematicalOrchid: Added link to the language extensions category. (Clumsy, but works.)

In Haskell, '''types''' are how you describe the data your program will work with.

[[Category:Language]]

==Data declarations==

One introduces, or declares, a type in Haskell via the <code>data</code> statement. In general a data declaration looks like:

 data [context =>] type tv1 ... tvi = con1 c1t1 c1t2 ... c1tn |
                   ... | conm cmt1 ... cmtq
                   [deriving]

which probably explains nothing if you don't already know Haskell!

The essence of the above statement is that you use the keyword <hask>data</hask>,
supply an optional context, give the type name and a variable number of
[[type variable]]s. This is then followed by a variable number of [[constructor]]s, each of which has a list of [[type variable]]s or [[type constant]]s. At the end, there is an optional <code>deriving</code>.

There are a number of other subtleties associated with this, such as requiring
parameters to the data constructors to be [[eager]], what [[class]]es are
allowed in the [[deriving]], use of [[field]] names in the constructors
and what the [[context]] actually does. Please refer to the specific articles for more on each of those.

Let's look at some examples. The Haskell standard data type [[Maybe]] is typically declared as:
<haskell>
data Maybe a = Just a | Nothing
</haskell>
What this means is that the type '''Maybe''' has one type variable, represented by the ''a'' and two [[constructor]]s '''Just''' and '''Nothing'''. (Note that Haskell requires type names and constructor names to begin with an uppercase letter). The '''Just''' constructor takes one parameter, ''a''.

As another example, consider binary [[Tree]]s. They could be represented by:
<haskell>
data Tree a = Branch (Tree a) (Tree a) | Leaf a
</haskell>
Here, one of the constructors, '''Branch''' of '''Tree''', takes two trees as
parameters to the constructor, while '''Leaf''' takes a value of type ''a''. This type of recursion is a very common [[:Category:Idioms |pattern]] in Haskell.
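
Functions over such a type usually follow the same recursive shape, with one equation per constructor. A minimal sketch (the names <hask>leaves</hask>, <hask>depth</hask> and <hask>sample</hask> are introduced here for illustration only):

```haskell
data Tree a = Branch (Tree a) (Tree a) | Leaf a

-- Collect the values stored in the leaves, left to right.
leaves :: Tree a -> [a]
leaves (Leaf x)     = [x]
leaves (Branch l r) = leaves l ++ leaves r

-- Number of branches on the longest path from the root to a leaf.
depth :: Tree a -> Int
depth (Leaf _)     = 0
depth (Branch l r) = 1 + max (depth l) (depth r)

sample :: Tree Int
sample = Branch (Leaf 1) (Branch (Leaf 2) (Leaf 3))
```

Here <hask>leaves sample</hask> lists the three leaf values in order and <hask>depth sample</hask> counts two levels of branching.
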

==Type and newtype==

The other two ways one may introduce types to Haskell programs are via the
<hask>type</hask> and <hask>newtype</hask> statements.

<hask>type</hask> introduces a synonym for a type and uses the same data
constructors. <hask>newtype</hask> introduces a renaming of a type and
requires you to provide new constructors.

When using a <hask>type</hask> declaration, the type synonym and its base type
are interchangeable almost everywhere (there are some restrictions when dealing with [[instance]] declarations). For example, if you had the declaration:
<haskell>
type Name = String
</haskell>
then any [[function]] you had declared that had <hask>String</hask> in its
signature could be used on any element of type <code>Name</code>.

However, if one had the declaration:
<haskell>
newtype FirstName = FirstName String
</haskell>
this would no longer be the case. Functions would have to be declared that
actually were defined on '''FirstName'''. Often, one creates a deconstructor
at the same time which helps alleviate this requirement. e.g.:
<haskell>
unFirstName :: FirstName -> String
unFirstName (FirstName s) = s
</haskell>
This is often done by the use of [[field]]s in the <code>newtype</code>. (Note
that many consider the Haskell field implementation sub-optimal, while
others use it extensively. See [[Programming guidelines]] and [[Future of Haskell]])
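
A short sketch of the field-based variant, where the record field plays the role of the deconstructor. The helper functions <hask>shout</hask> and <hask>shoutName</hask> are invented for this example.

```haskell
-- The field name doubles as the unwrapping function.
newtype FirstName = FirstName { unFirstName :: String }

-- An ordinary String function; it does not accept a FirstName directly.
shout :: String -> String
shout s = s ++ "!"

-- Unwrap, apply the String function, and wrap again.
shoutName :: FirstName -> FirstName
shoutName = FirstName . shout . unFirstName
```

Passing a <hask>FirstName</hask> straight to <hask>shout</hask> is a type error, which is the point of using <hask>newtype</hask> rather than <hask>type</hask>.
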

==A simple example==

Suppose you want to create a program to play bridge. You need something to represent cards. Here is one way to do that.

First, create data types for the suit and card number.
<haskell>
data Suit = Club | Diamond | Heart | Spade
    deriving (Read, Show, Enum, Eq, Ord)

data CardValue = Two | Three | Four
    | Five | Six | Seven | Eight | Nine | Ten
    | Jack | Queen | King | Ace
    deriving (Read, Show, Enum, Eq, Ord)
</haskell>
Each of these uses a [[deriving]] clause to allow us to convert them from / to [[String]] and Int, test them for equality and ordering. With types like this,
where there are no [[type variable]]s, equality is based upon which constructor is used and order by the order you wrote them. e.g. <code>Three</code> is less than <code>Queen</code>.

Now we define an actual <code>Card</code>:
<haskell>
data Card = Card {value::CardValue,
                  suit::Suit}
    deriving (Read, Show, Eq)
</haskell>
In this definition, we use [[field]]s, which give us ready made functions to
access the two parts of a <code>Card</code>. Again, [[type variables]] were not
used, but the data [[constructor]] requires its two parameters to be of
specific types, <code>CardValue</code> and <code>Suit</code>.

The deriving clause here only specifies three of our desired [[Class]]es; we supply [[instance]] declarations for [[Ord]] and [[Enum]] ourselves.
<haskell>
instance Ord Card where
    compare c1 c2 | (value c1 == (value c2)) = compare (suit c1) (suit c2)
                  | otherwise = compare (value c1) (value c2)

instance Enum Card where
    toEnum n = Card (toEnum (n `div` 4)) (toEnum (n `mod` 4))
    fromEnum c = 4*(fromEnum (value c)) + (fromEnum (suit c))
</haskell>
Finally, we alias the type <code>Deck</code> to a list of <code>Card</code>s
and populate the deck with a [[list comprehension]]:
<haskell>
type Deck = [Card]

deck::Deck
deck = [Card val su | val <- [Two .. Ace], su <- [Club .. Spade]]
</haskell>
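
Putting the pieces together, the following sketch restates the definitions in one module and adds a few sanity checks. The names <hask>deckSize</hask>, <hask>roundTrips</hask> and <hask>alreadySorted</hask> are not part of the original example, and the <hask>Ord</hask> instance is written with tuples, which should order cards the same way as the guard-based version above.

```haskell
data Suit = Club | Diamond | Heart | Spade
  deriving (Read, Show, Enum, Eq, Ord)

data CardValue = Two | Three | Four
               | Five | Six | Seven | Eight | Nine | Ten
               | Jack | Queen | King | Ace
  deriving (Read, Show, Enum, Eq, Ord)

data Card = Card { value :: CardValue, suit :: Suit }
  deriving (Read, Show, Eq)

-- Compare by value first, then by suit.
instance Ord Card where
  compare c1 c2 = compare (value c1, suit c1) (value c2, suit c2)

instance Enum Card where
  toEnum n   = Card (toEnum (n `div` 4)) (toEnum (n `mod` 4))
  fromEnum c = 4 * fromEnum (value c) + fromEnum (suit c)

type Deck = [Card]

deck :: Deck
deck = [Card val su | val <- [Two .. Ace], su <- [Club .. Spade]]

-- 13 values times 4 suits.
deckSize :: Int
deckSize = length deck

-- Converting each card to its number and back gives the same card.
roundTrips :: Bool
roundTrips = map (toEnum . fromEnum) deck == deck

-- The comprehension already yields the cards in ascending order.
alreadySorted :: Bool
alreadySorted = and (zipWith (<) deck (tail deck))
```

Because the comprehension varies the suit fastest, the deck comes out in exactly the order that <hask>fromEnum</hask> numbers the cards.
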

==Please add==

Further illustrative examples would be most appreciated.

==See also==
Read the (wanted) articles about data [[constructor]]s and [[class]]es. The
[http://haskell.org/definition/haskell98-report.pdf Haskell 98 report] and the documentation of
your chosen implementation (e.g. [[GHC/Documentation]]) have the final word.

*[http://www.haskell.org/haskellwiki/Category:Language_extensions Language extensions] - many language extensions are to do with changes to the type system.
*[[Smart constructors]] shows some interesting examples including a non-trivial usage of <code>newtype</code>.
*[[Unboxed type]] shows ways to have values closer to the bare metal :).
*[[Phantom type]] discusses types without constructors.
*[[Type witness]] gives an example of [[Generalised algebraic datatype | GADTs]], a [[GHC]] extension.
*[[Existential type]] shows how to implement a common O-O programming paradigm.
*[[Type arithmetic]] implements the [[Peano numbers]].
*[[Reified type]], [[Non-trivial type synonyms]], [[Abstract data type]], [[Concrete data type]], [[Algebraic data type]].
*[[Research_papers/Type_systems]] allow the curious to delve deeper.

Existential type

2007-07-05T09:20:16Z

MathematicalOrchid: Idiot-features here doesn't know left from right... >_<

__TOC__
This is an extension of Haskell available in [[GHC]]. See the GHC documentation:
http://www.haskell.org/ghc/docs/latest/html/users_guide/type-extensions.html

==Introduction to existential types==

=== Overview ===

Normally when creating a new type using <hask>type</hask>, <hask>newtype</hask>, <hask>data</hask>, etc., every type variable that appears on the right-hand side must also appear on the left-hand side. Existential types are a way of turning this off.

=== Basics ===

Existential types can be ''used'' for several different purposes. But what they ''do'' is to 'hide' a type variable on the right-hand side.

Normally, any type variable appearing on the right must also appear on the left:

<haskell>
data Worker x y = Worker {buffer :: b, input :: x, output :: y}
</haskell>

This is an error, since the type of the buffer isn't specified on the right (it's a type variable rather than a type) but also isn't specified on the left (there's no 'b' in the left part). In Haskell98, you would have to write

<haskell>
data Worker b x y = Worker {buffer :: b, input :: x, output :: y}
</haskell>

That may or may not be an actual problem.

Usually there is no problem at all with this state of affairs (which is why Haskell98 works this way). However, suppose that a <hask>Worker</hask> can use ''any'' type 'b' so long as it belongs to some particular class. Then every function that uses a <hask>Worker</hask> will have a type like

<haskell>
foo :: (Buffer b) => Worker b Int Int
</haskell>

or something. (In particular, failing to write an explicit type signature will invoke the dreaded [[monomorphism restriction]].) Using existential types, we can avoid this:

<haskell>
data Worker x y = forall b. Buffer b => Worker {buffer :: b, input :: x, output :: y}

foo :: Worker Int Int
</haskell>

The type of the buffer now does ''not'' appear in the <hask>Worker</hask> type at all.

This has a number of consequences. First of all, it is now impossible for a function to demand a <hask>Worker</hask> having a specific type of buffer. Second, the type of <hask>foo</hask> can now be derived automatically without needing an explicit type signature. (No [[monomorphism restriction]].) Thirdly, since code now has ''no idea'' what type the <hask>buffer</hask> function returns, you are more limited in what you can do to it.

In general, when you use a 'hidden' type in this way, you will usually want that type to belong to a specific class, or you will want to pass some functions along that can work on that type. Otherwise you'll have some value belonging to a random unknown type, and you won't be able to ''do'' anything to it!
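
The following sketch shows what working with such a hidden type can look like. The methods of <hask>Buffer</hask> and the <hask>Stack</hask> and <hask>Queue</hask> types are invented for the example (the text above never says what a buffer can do), and the constructor uses positional fields rather than record syntax, since GHC does not allow a record selector for an existentially hidden field to be used as an ordinary function. GHC with <code>ExistentialQuantification</code> is assumed.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- Hypothetical class: the operations a Worker may perform on its buffer.
class Buffer b where
  push        :: Int -> b -> b
  bufferItems :: b -> [Int]

newtype Stack = Stack [Int]
newtype Queue = Queue [Int]

instance Buffer Stack where
  push x (Stack xs)      = Stack (x : xs)
  bufferItems (Stack xs) = xs

instance Buffer Queue where
  push x (Queue xs)      = Queue (xs ++ [x])
  bufferItems (Queue xs) = xs

-- The buffer type b is hidden; only its Buffer instance travels along.
data Worker x y = forall b. Buffer b => Worker b x y

-- Pattern matching brings the buffer into scope with an unknown type,
-- so only the class methods can be applied to it.
feed :: Int -> Worker x y -> Worker x y
feed n (Worker buf i o) = Worker (push n buf) i o

contents :: Worker x y -> [Int]
contents (Worker buf _ _) = bufferItems buf

-- Workers with different buffer types share one type.
workers :: [Worker Int Int]
workers = [Worker (Stack []) 0 0, Worker (Queue []) 0 0]
```

Both workers accept the same calls, but a stack-backed worker and a queue-backed worker hand their items back in different orders, and no caller can ask for "a <hask>Worker</hask> with a <hask>Stack</hask>" specifically.
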

Note: You can use existential types to convert a more specific type into a less specific one. (See the examples below.) There is ''no way'' to perform the reverse conversion!

== Examples ==

===A short example===

This illustrates creating a heterogeneous list, all of whose members implement "Show", and progressing through that list to show these items:

<haskell>
data Obj = forall a. (Show a) => Obj a

xs = [Obj 1, Obj "foo", Obj 'c']

doShow :: [Obj] -> String
doShow [] = ""
doShow ((Obj x):xs) = show x ++ doShow xs
</haskell>

With output: <code>doShow xs ==> "1\"foo\"'c'"</code>

===Expanded example - rendering objects in a raytracer===

====Problem statement====

In a raytracer, a requirement is to be able to render several different objects (like a ball, mesh or whatever). The first step is a type class for Renderable like so:

<haskell>
class Renderable a where
    boundingSphere :: a -> Sphere
    hit :: a -> [Fragment] -- returns the "fragments" of all hits with ray
    {- ... etc ... -}
</haskell>

To solve the problem, the <hask>hit</hask> function must apply to several objects (like a sphere and a polygon for instance).

<haskell>
hits :: Renderable a => [a] -> [Fragment]
hits xs = sortByDistance $ concatMap hit xs
</haskell>

However, this does not work as written since the elements of the list can be of '''SEVERAL''' different types (like a sphere and a polygon and a mesh etc. etc.) but
lists need to have elements of the same type.

====The solution====

Use 'existential types' - an extension to Haskell that can be found in most compilers.

The following example is based on GHC :

<haskell>
{-# OPTIONS -fglasgow-exts #-}

{- ...-}

data AnyRenderable = forall a. Renderable a => AnyRenderable a

instance Renderable AnyRenderable where
    boundingSphere (AnyRenderable a) = boundingSphere a
    hit (AnyRenderable a) = hit a
    {- ... -}
</haskell>

Now, create lists with type <hask>[AnyRenderable]</hask>, for example,
<haskell>
[ AnyRenderable x
, AnyRenderable y
, AnyRenderable z ]
</haskell>
where x, y, z can be from different instances of <hask>Renderable</hask>.

=== Dynamic dispatch mechanism of OOP ===

'''Existential types''' in conjunction with type classes can be used to emulate the dynamic dispatch mechanism of object oriented programming languages. To illustrate this concept I show how a classic example from object oriented programming can be encoded in Haskell.

<haskell>
class Shape_ a where
    perimeter :: a -> Double
    area :: a -> Double

data Shape = forall a. Shape_ a => Shape a

type Radius = Double
type Side = Double

data Circle = Circle Radius
data Rectangle = Rectangle Side Side
data Square = Square Side

instance Shape_ Circle where
    perimeter (Circle r) = 2 * pi * r
    area (Circle r) = pi * r * r

instance Shape_ Rectangle where
    perimeter (Rectangle x y) = 2*(x + y)
    area (Rectangle x y) = x * y

instance Shape_ Square where
    perimeter (Square s) = 4*s
    area (Square s) = s*s

instance Shape_ Shape where
    perimeter (Shape shape) = perimeter shape
    area (Shape shape) = area shape

--
-- Smart constructor
--

circle :: Radius -> Shape
circle r = Shape (Circle r)

rectangle :: Side -> Side -> Shape
rectangle x y = Shape (Rectangle x y)

square :: Side -> Shape
square s = Shape (Square s)

shapes :: [Shape]
shapes = [circle 2.4, rectangle 3.1 4.4, square 2.1]
</haskell>

(You may see other [[Smart constructors]] for other purposes).

=== [[Generalised algebraic datatype]] ===

The type of the <hask>parse</hask> function for [[Generalised algebraic datatype#Motivating example|this GADT]] is a good example to illustrate the concept of existential type.

==Alternate methods==
===Concrete data types===
====Universal instance of a Class====
Here is one way to simulate existentials (Hawiki note: borrowed from somewhere...)

Suppose I have a type class Shape a
<haskell>
type Point = (Float,Float)

class Shape a where
    draw :: a -> IO ()
    translate :: a -> Point -> a

</haskell>

Then we can pack shapes up into a [[concrete data type]] like this:
<haskell>
data SHAPE = SHAPE (IO ()) (Point -> SHAPE)
</haskell>
with a function like this
<haskell>
packShape :: Shape a => a -> SHAPE
packShape s = SHAPE (draw s) (\(x,y) -> packShape (translate s (x,y)))
</haskell>
This would be useful if we needed a list of shapes that we would need to translate and draw.

In fact we can make <hask>SHAPE</hask> an instance of <hask>Shape</hask>:
<haskell>
instance Shape SHAPE where
    draw (SHAPE d t) = d
    translate (SHAPE d t) = t
</haskell>

So SHAPE is a sort of universal instance.

====Using constructors and combinators====
Why bother with class <hask>Shape</hask>? Why not just go straight to

<haskell>
data Shape = Shape {
    draw :: IO (),
    translate :: (Int, Int) -> Shape
  }
</haskell>

Then you can create a library of shape [[constructor]]s and [[combinator]]s
that each have defined "draw" and "translate" in their "where" clauses.

<haskell>
circle :: (Int, Int) -> Int -> Shape
circle (x,y) r =
    Shape draw1 translate1
  where
    draw1 = ...
    translate1 (x1,y1) = circle (x+x1, y+y1) r

shapeGroup :: [Shape] -> Shape
shapeGroup shapes = Shape draw1 translate1
  where
    draw1 = sequence_ $ map draw shapes
    -- translate is a field selector (Shape -> (Int, Int) -> Shape),
    -- so the shape comes first and the offset second
    translate1 v = shapeGroup $ map (\s -> translate s v) shapes
</haskell>

===Cases that really require existentials===

There are cases where this sort of trick doesn't work. Here are two examples from a Haskell mailing list discussion (from K. Claussen) that don't seem expressible without
existentials. (But maybe one can rethink the whole thing :)
<haskell>
data Expr a = Val a | forall b . Apply (Expr (b -> a)) (Expr b)
</haskell>
and
<haskell>
data Action = forall b . Act (IORef b) (b -> IO ())
</haskell>
(Maybe this last one could be done as a <hask>type Act (IORef b) (IORef b -> IO ())</hask> then we could hide the <hask>IORef</hask> as above, that is go ahead and apply the second argument to the first)

== Examples from the [http://www.cs.uu.nl/wiki/Ehc/ Essential Haskell Compiler] project ==

See the [http://www.cs.uu.nl/wiki/Ehc/#On_EHC documentation on EHC], each paper at the ''Version 4'' part:
* Chapter 8 (EH4) of Atze Dijkstra's [http://www.cs.uu.nl/groups/ST/Projects/ehc/ehc-book.pdf Essential Haskell PhD thesis] (most recent version). A detailed explanation. It explains also that existential types can be expressed in Haskell, but their use is restricted to data declarations, and the notation (using keyword <hask>forall</hask>) may be confusing. In Essential Haskell, existential types can occur not only in data declarations, and a separate keyword <hask>exists</hask> is used for their notation.
* [http://www.cs.uu.nl/wiki/pub/Ehc/WebHome/20050107-eh-intro.pdf Essential Haskell Compiler overview]
* [http://www.cs.uu.nl/wiki/Ehc/Examples#EH_4_forall_and_exists_everywher Examples]

==See also==
* A mailing list discussion: http://haskell.org/pipermail/haskell-cafe/2003-October/005231.html
* An example of encoding existentials using RankTwoPolymorphism: http://haskell.org/pipermail/haskell-cafe/2003-October/005304.html
=== Trac ===

[http://hackage.haskell.org/trac/haskell-prime/wiki/ExistentialQuantification Existential Quantification] is a detailed treatment of the topic. It also links to the shorter [http://hackage.haskell.org/trac/haskell-prime/wiki/ExistentialQuantifier Existential Quantifier] page.

[[Category:Idioms]]
[[Category:Glossary]]
[[Category:Language extensions]]

Existential type

2007-07-05T09:16:50Z

MathematicalOrchid: Added an explanation that mere mortals can (hopefully!) understand.

__TOC__
This is an extension of Haskell available in [[GHC]]. See the GHC documentation:
http://www.haskell.org/ghc/docs/latest/html/users_guide/type-extensions.html

==Introduction to existential types==

=== Overview ===

Normally when creating a new type using <hask>type</hask>, <hask>newtype</hask>, <hask>data</hask>, etc., every type variable that appears on the right-hand side must also appear on the left-hand side. Existential types are a way of turning this off.

=== Basics ===

Existential types can be ''used'' for several different purposes. But what they ''do'' is to 'hide' a type variable on the right-hand side.

Normally, any type variable appearing on the right must also appear on the left:

<haskell>
data Worker x y = Worker {buffer :: b, input :: x, output :: y}
</haskell>

This is an error, since the type of the buffer isn't specified on the right (it's a type variable rather than a type) but also isn't specified on the left (there's no 'b' in the left part). In Haskell98, you would have to write

<haskell>
data Worker b x y = Worker {buffer :: b, input :: x, output :: y}
</haskell>

That may or may not be an actual problem.

Usually there is no problem at all with this state of affairs (which is why Haskell98 works this way). However, suppose that a <hask>Worker</hask> can use ''any'' type 'b' so long as it belongs to some particular class. Then every function that uses a <hask>Worker</hask> will have a type like

<haskell>
foo :: (Buffer b) => Worker b Int Int
</haskell>

or something. (In particular, failing to write an explicit type signature will invoke the dreaded [[monomorphism restriction]].) Using existential types, we can avoid this:

<haskell>
data Worker x y = forall b. Buffer b => Worker {buffer :: b, input :: x, output :: y}

foo :: Worker Int Int
</haskell>

The type of the buffer now does ''not'' appear in the <hask>Worker</hask> type at all.

This has a number of consequences. First of all, it is now impossible for a function to demand a <hask>Worker</hask> having a specific type of buffer. Second, the type of <hask>foo</hask> can now be derived automatically without needing an explicit type signature. (No [[monomorphism restriction]].) Thirdly, since code now has ''no idea'' what type the <hask>buffer</hask> function returns, you are more limited in what you can do to it.

In general, when you use a 'hidden' type in this way, you will usually want that type to belong to a specific class, or you will want to pass some functions along that can work on that type. Otherwise you'll have some value belonging to a random unknown type, and you won't be able to ''do'' anything to it!

Note: You can use existential types to convert a more specific type into a less specific one. (See the examples below.) There is ''no way'' to perform the reverse conversion!
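
A minimal sketch of the points above, assuming a hypothetical <hask>Buffer</hask> class with a single method <hask>bufferSize</hask> (neither is defined anywhere on this page) and plain constructor fields instead of record syntax. It only aims to show that a function can accept any <hask>Worker</hask> and still use the class methods on the hidden buffer:

<haskell>
{-# LANGUAGE ExistentialQuantification #-}

-- Hypothetical class: the only operations available on a hidden buffer.
class Buffer b where
  bufferSize :: b -> Int

instance Buffer [a] where
  bufferSize = length

instance Buffer () where
  bufferSize _ = 0

-- The buffer type b appears only on the right-hand side.
data Worker x y = forall b. Buffer b => Worker b x y

-- Pattern matching brings the Buffer instance into scope, so the class
-- method works even though the concrete type of b is unknown.
workerBufferSize :: Worker x y -> Int
workerBufferSize (Worker b _ _) = bufferSize b

-- Two workers with different buffer types share one type.
workers :: [Worker Int Int]
workers = [Worker "abc" 1 2, Worker () 3 4]

main :: IO ()
main = print (map workerBufferSize workers)
</haskell>

Once the two values are in the list, nothing can tell which one holds the string and which one holds the unit value; only <hask>bufferSize</hask> remains available.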

== Examples ==

===A short example===

This illustrates creating a heterogeneous list, all of whose members implement "Show", and progressing through that list to show these items:

<haskell>
data Obj = forall a. (Show a) => Obj a

xs = [Obj 1, Obj "foo", Obj 'c']

doShow :: [Obj] -> String
doShow [] = ""
doShow ((Obj x):xs) = show x ++ doShow xs
</haskell>

With output: <code>doShow xs ==> "1\"foo\"'c'"</code>

===Expanded example - rendering objects in a raytracer===

====Problem statement====

In a raytracer, a requirement is to be able to render several different objects (like a ball, mesh or whatever). The first step is a type class for Renderable like so:

<haskell>
class Renderable a where
  boundingSphere :: a -> Sphere
  hit :: a -> [Fragment] -- returns the "fragments" of all hits with ray
  {- ... etc ... -}
</haskell>

To solve the problem, the <hask>hit</hask> function must apply to several objects (like a sphere and a polygon for instance).

<haskell>
hits :: Renderable a => [a] -> [Fragment]
hits xs = sortByDistance $ concatMap hit xs
</haskell>

However, this does not work as written since the elements of the list can be of '''SEVERAL''' different types (like a sphere and a polygon and a mesh etc. etc.) but
lists need to have elements of the same type.

====The solution====

Use 'existential types' - an extension to Haskell that can be found in most compilers.

The following example is based on GHC :

<haskell>
{-# LANGUAGE ExistentialQuantification #-}

{- ...-}

data AnyRenderable = forall a. Renderable a => AnyRenderable a

instance Renderable AnyRenderable where
  boundingSphere (AnyRenderable a) = boundingSphere a
  hit (AnyRenderable a) = hit a
  {- ... -}
</haskell>

Now, create lists with type <hask>[AnyRenderable]</hask>, for example,
<haskell>
[ AnyRenderable x
, AnyRenderable y
, AnyRenderable z ]
</haskell>
where x, y, z can be from different instances of <hask>Renderable</hask>.
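
As a self-contained sketch of the whole idea: <hask>Sphere</hask>, <hask>Fragment</hask>, <hask>Ball</hask> and <hask>Plate</hask> below are stand-ins invented for illustration (the page does not define them), and <hask>hit</hask> takes no ray, as in the class above.

<haskell>
{-# LANGUAGE ExistentialQuantification #-}

-- Stand-in types; a real raytracer would carry far more information.
data Sphere   = Sphere Double   deriving (Eq, Show)  -- radius only
data Fragment = Fragment Double deriving (Eq, Show)  -- distance only

class Renderable a where
  boundingSphere :: a -> Sphere
  hit            :: a -> [Fragment]

data AnyRenderable = forall a. Renderable a => AnyRenderable a

instance Renderable AnyRenderable where
  boundingSphere (AnyRenderable a) = boundingSphere a
  hit            (AnyRenderable a) = hit a

-- Two hypothetical scene objects of different types.
data Ball  = Ball Double
data Plate = Plate Double Double

instance Renderable Ball where
  boundingSphere (Ball r) = Sphere r
  hit (Ball r)            = [Fragment r]

instance Renderable Plate where
  boundingSphere (Plate w h) = Sphere (max w h)
  hit (Plate w h)            = [Fragment w, Fragment h]

-- hits from the problem statement, without the sorting step.
hits :: Renderable a => [a] -> [Fragment]
hits = concatMap hit

-- A mixed scene: every element has the single type AnyRenderable.
scene :: [AnyRenderable]
scene = [AnyRenderable (Ball 1.5), AnyRenderable (Plate 2 3)]

main :: IO ()
main = print (hits scene)
</haskell>

Because <hask>AnyRenderable</hask> is itself an instance of <hask>Renderable</hask>, the original signature of <hask>hits</hask> does not need to change.
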
=== Dynamic dispatch mechanism of OOP ===

'''Existential types''' in conjunction with type classes can be used to emulate the dynamic dispatch mechanism of object oriented programming languages. To illustrate this concept I show how a classic example from object oriented programming can be encoded in Haskell.

<haskell>
class Shape_ a where
  perimeter :: a -> Double
  area      :: a -> Double

data Shape = forall a. Shape_ a => Shape a

type Radius = Double
type Side = Double

data Circle = Circle Radius
data Rectangle = Rectangle Side Side
data Square = Square Side

instance Shape_ Circle where
  perimeter (Circle r) = 2 * pi * r
  area      (Circle r) = pi * r * r

instance Shape_ Rectangle where
  perimeter (Rectangle x y) = 2*(x + y)
  area      (Rectangle x y) = x * y

instance Shape_ Square where
  perimeter (Square s) = 4*s
  area      (Square s) = s*s

instance Shape_ Shape where
  perimeter (Shape shape) = perimeter shape
  area      (Shape shape) = area shape

--
-- Smart constructor
--

circle :: Radius -> Shape
circle r = Shape (Circle r)

rectangle :: Side -> Side -> Shape
rectangle x y = Shape (Rectangle x y)

square :: Side -> Shape
square s = Shape (Square s)

shapes :: [Shape]
shapes = [circle 2.4, rectangle 3.1 4.4, square 2.1]
</haskell>

(You may see other [[Smart constructors]] for other purposes).

=== [[Generalised algebraic datatype]] ===

The type of the <hask>parse</hask> function for [[Generalised algebraic datatype#Motivating example|this GADT]] is a good example to illustrate the concept of existential type.

==Alternate methods==
===Concrete data types===
====Universal instance of a Class====
Here is one way to simulate existentials (HaWiki note: borrowed from somewhere...).

Suppose I have a type class Shape a
<haskell>
type Point = (Float,Float)

class Shape a where
  draw      :: a -> IO ()
  translate :: a -> Point -> a
</haskell>

Then we can pack shapes up into a [[concrete data type]] like this:
<haskell>
data SHAPE = SHAPE (IO ()) (Point -> SHAPE)
</haskell>
with a function like this
<haskell>
packShape :: Shape a => a -> SHAPE
packShape s = SHAPE (draw s) (\(x,y) -> packShape (translate s (x,y)))
</haskell>
This would be useful if we needed a list of shapes that we would need to translate and draw.

In fact we can make <hask>SHAPE</hask> an instance of <hask>Shape</hask>:
<haskell>
instance Shape SHAPE where
  draw      (SHAPE d t) = d
  translate (SHAPE d t) = t
</haskell>

So SHAPE is a sort of universal instance.
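
To see the packing at work, here is a runnable sketch under a few assumptions: <hask>Dot</hask>, <hask>Segment</hask> and <hask>shift</hask> are hypothetical names invented for this example, and "drawing" just prints a line of text.

<haskell>
type Point = (Float, Float)

class Shape a where
  draw      :: a -> IO ()
  translate :: a -> Point -> a

-- A shape reduced to its two operations.
data SHAPE = SHAPE (IO ()) (Point -> SHAPE)

packShape :: Shape a => a -> SHAPE
packShape s = SHAPE (draw s) (\p -> packShape (translate s p))

instance Shape SHAPE where
  draw      (SHAPE d _) = d
  translate (SHAPE _ t) = t

-- Hypothetical concrete shapes.
newtype Dot  = Dot Point
data Segment = Segment Point Point

shift :: Point -> Point -> Point
shift (x, y) (dx, dy) = (x + dx, y + dy)

instance Shape Dot where
  draw (Dot p)        = putStrLn ("dot at " ++ show p)
  translate (Dot p) v = Dot (shift p v)

instance Shape Segment where
  draw (Segment a b)        = putStrLn ("segment " ++ show a ++ " to " ++ show b)
  translate (Segment a b) v = Segment (shift a v) (shift b v)

-- A mixed list, built without any language extension.
shapes :: [SHAPE]
shapes = [packShape (Dot (0, 0)), packShape (Segment (0, 0) (1, 2))]

-- Translate every shape by (1, 1), then draw it.
main :: IO ()
main = mapM_ (draw . (`translate` (1, 1))) shapes
</haskell>

The price is that a <hask>SHAPE</hask> offers only the operations that were packed into it; the original <hask>Dot</hask> or <hask>Segment</hask> cannot be recovered.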

====Using constructors and combinators====
Why bother with class <hask>Shape</hask>? Why not just go straight to

<haskell>
data Shape = Shape {
    draw      :: IO (),
    translate :: (Int, Int) -> Shape
}
</haskell>

Then you can create a library of shape [[constructor]]s and [[combinator]]s
that each have defined "draw" and "translate" in their "where" clauses.

<haskell>
circle :: (Int, Int) -> Int -> Shape
circle (x,y) r =
    Shape draw1 translate1
  where
    draw1 = ...
    translate1 (x1,y1) = circle (x+x1, y+y1) r

shapeGroup :: [Shape] -> Shape
shapeGroup shapes = Shape draw1 translate1
  where
    draw1 = sequence_ $ map draw shapes
    -- translate is a record selector, so the shape comes first
    translate1 v = shapeGroup $ map (\s -> translate s v) shapes
</haskell>

===Cases that really require existentials===

There are cases where this sort of trick doesn't work. Here are two examples from a Haskell mailing list discussion (from K. Claussen) that don't seem expressible without
existentials. (But maybe one can rethink the whole thing :)
<haskell>
data Expr a = Val a | forall b . Apply (Expr (b -> a)) (Expr b)
</haskell>
and
<haskell>
data Action = forall b . Act (IORef b) (b -> IO ())
</haskell>
(Maybe this last one could be done as a <hask>type Act (IORef b) (IORef b -> IO ())</hask> then we could hide the <hask>IORef</hask> as above, that is go ahead and apply the second argument to the first)
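
A sketch of how these two types can be consumed. The names <hask>eval</hask>, <hask>runAction</hask> and <hask>three</hask> are made up for this example and do not come from the mailing list discussion. In both cases the hidden type <hask>b</hask> is unknown, but the two fields agree on it, and that is all the code needs:

<haskell>
{-# LANGUAGE ExistentialQuantification #-}

import Data.IORef (IORef, newIORef, readIORef)

data Expr a = Val a | forall b. Apply (Expr (b -> a)) (Expr b)

-- The function and its argument share the hidden type b.
eval :: Expr a -> a
eval (Val x)     = x
eval (Apply f x) = eval f (eval x)

data Action = forall b. Act (IORef b) (b -> IO ())

-- Read the hidden reference and hand its contents to the matching consumer.
runAction :: Action -> IO ()
runAction (Act ref k) = readIORef ref >>= k

-- (+) applied to 1 and 2, built as an expression tree.
three :: Expr Int
three = Apply (Apply (Val (+)) (Val 1)) (Val 2)

main :: IO ()
main = do
  print (eval three)
  n <- newIORef (42 :: Int)
  s <- newIORef "hello"
  mapM_ runAction [Act n print, Act s putStrLn]
</haskell>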

== Examples from the [http://www.cs.uu.nl/wiki/Ehc/ Essential Haskell Compiler] project ==

See the [http://www.cs.uu.nl/wiki/Ehc/#On_EHC documentation on EHC], each paper at the ''Version 4'' part:
* Chapter 8 (EH4) of Atze Dijkstra's [http://www.cs.uu.nl/groups/ST/Projects/ehc/ehc-book.pdf Essential Haskell PhD thesis] (most recent version). A detailed explanation. It explains also that existential types can be expressed in Haskell, but their use is restricted to data declarations, and the notation (using keyword <hask>forall</hask>) may be confusing. In Essential Haskell, existential types can occur not only in data declarations, and a separate keyword <hask>exists</hask> is used for their notation.
* [http://www.cs.uu.nl/wiki/pub/Ehc/WebHome/20050107-eh-intro.pdf Essential Haskell Compiler overview]
* [http://www.cs.uu.nl/wiki/Ehc/Examples#EH_4_forall_and_exists_everywher Examples]

==See also==
* A mailinglist discussion: http://haskell.org/pipermail/haskell-cafe/2003-October/005231.html
*An example of encoding existentials using RankTwoPolymorphism : http://haskell.org/pipermail/haskell-cafe/2003-October/005304.html
=== Trac ===

[http://hackage.haskell.org/trac/haskell-prime/wiki/ExistentialQuantification Existential Quantification] is a detailed treatment of the topic. It also links to the smaller [http://hackage.haskell.org/trac/haskell-prime/wiki/ExistentialQuantifier Existential Quantifier] page.

[[Category:Idioms]]
[[Category:Glossary]]
[[Category:Language extensions]]

Talk:Haskell in industry

2007-06-23T07:44:50Z

MathematicalOrchid: Erlang...?

Erm... Erlang Training and Consultancy Ltd? What's that got to do with ''Haskell''? [[User:MathematicalOrchid|MathematicalOrchid]] 07:44, 23 June 2007 (UTC)

Library for PPM images

2007-04-24T10:45:20Z

MathematicalOrchid: Coming soon...

[[Category:Code]]

Here's a trivial little thing I wrote for saving PPM images.

For those that don't know, PPM is probably the simplest possible image file format that other software will actually read! For example, [http://www.irfanview.com/ IrfanView] will read it. Thus, this is a simple, light-weight way to write programs that will output graphics files, using only pure Haskell 98 I/O.

The code is actually designed to work with my [[Library for colours]] - but you can supply something of your own if you prefer.

=== ASCII PPM ===

This is the 'P3' PPM format. The entire thing is plain ASCII. This makes it very easy to read and write, and extremely inefficient. Don't be surprised if an 800x800 pixel image takes up a couple of MB of space!

<haskell>
module PPM (make_ppm, save_ppm) where

import Colour

save_ppm :: FilePath -> [[Colour]] -> IO ()
save_ppm f css = writeFile f $ make_ppm css

make_ppm :: [[Colour]] -> String
make_ppm css =
  "P3\n" ++ (show $ length $ head css) ++ " " ++ (show $ length css) ++ " 255\n" ++
  (unlines $ map unwords $ group 15 $ map show $ concatMap colour $ concat css)

group _ [] = []
group n xs =
  let (xs0,xs1) = splitAt n xs
  in  xs0 : group n xs1

colour (Colour r g b) = [channel r, channel g, channel b]

channel :: Double -> Int
channel = floor . (255*) . min 1 . max 0
</haskell>
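
To show what the output looks like, here is a condensed, self-contained variant run on a tiny image. The local <hask>Colour</hask> type stands in for the one from the colour library, <hask>group</hask> is renamed <hask>chunk</hask> (my choice, to avoid a clash with <hask>Data.List.group</hask>), and the 2x2 <hask>image</hask> is made up for the example.

<haskell>
-- Stand-in for the Colour type from the colour library.
data Colour = Colour Double Double Double

-- Same logic as make_ppm above, condensed.
make_ppm :: [[Colour]] -> String
make_ppm css =
  "P3\n" ++ show (length (head css)) ++ " " ++ show (length css) ++ " 255\n" ++
  unlines (map unwords (chunk 15 (map show (concatMap colour (concat css)))))

-- Split a list into pieces of at most n elements.
chunk :: Int -> [a] -> [[a]]
chunk _ [] = []
chunk n xs = let (xs0, xs1) = splitAt n xs in xs0 : chunk n xs1

colour :: Colour -> [Int]
colour (Colour r g b) = [channel r, channel g, channel b]

-- Clamp to [0,1], then scale to 0..255.
channel :: Double -> Int
channel = floor . (255 *) . min 1 . max 0

-- Top row: red, green. Bottom row: blue, white.
image :: [[Colour]]
image = [ [Colour 1 0 0, Colour 0 1 0]
        , [Colour 0 0 1, Colour 1 1 1] ]

main :: IO ()
main = putStr (make_ppm image)
</haskell>

This prints the header <code>P3</code>, then <code>2 2 255</code>, then a single line holding the twelve channel values <code>255 0 0 0 255 0 0 0 255 255 255 255</code>.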

=== Binary PPM ===

This is the 'P6' PPM format. The header is still plain ASCII, but the actual raster data is binary. This makes the file roughly 10x smaller. I suspect it also makes saving ''faster''. This library is a drop-in replacement for the one above; include whichever one you want depending on what output you want.

<haskell>
module Fast_PPM (make_ppm, save_ppm) where

import Data.Word
import qualified Data.ByteString as BIN
import Colour

quant8 :: Double -> Word8
quant8 x = floor $ x * 0xFF

cquant8 :: Colour -> [Word8]
cquant8 (Colour r g b) = [quant8 r, quant8 g, quant8 b]

string_to_bin :: String -> BIN.ByteString
string_to_bin = BIN.pack . map (fromIntegral . fromEnum)

header :: [[Colour]] -> BIN.ByteString
header pss =
  let nx = length $ head pss
      ny = length pss
  in  string_to_bin $ "P6\n" ++ show nx ++ " " ++ show ny ++ " 255\n"

body :: [[Colour]] -> BIN.ByteString
body pss = BIN.pack $ concatMap (cquant8 . cclip) $ concat pss

make_ppm :: [[Colour]] -> BIN.ByteString
make_ppm pss = BIN.append (header pss) (body pss)

save_ppm :: FilePath -> [[Colour]] -> IO ()
save_ppm f pss = BIN.writeFile f (make_ppm pss)
</haskell>

=== Binary PPM using Arrays ===

Coming soon. External interface looks something like this:

<haskell>
module FrameBuffer where

import Colour

data FrameBuffer

make_fb :: (Int,Int) -> IO FrameBuffer

write_pixel :: FrameBuffer -> (Int,Int) -> Colour -> IO ()

save_ppm :: FrameBuffer -> FilePath -> IO ()
</haskell>

Uses IOUArrays to drastically improve save speed. (And, in general, improves the efficiency of the rest of the program by 1) being more strict, and 2) using constant space for all drawing operations.)

User:MathematicalOrchid

2007-04-24T10:41:24Z

MathematicalOrchid:

User:MathematicalOrchid

2007-04-24T10:37:30Z

MathematicalOrchid: New projects, better formatting, etc.

'''Status''': Enthusiastic Haskell newbie.

=== Main Interests ===

* Using Haskell to write triposcopic mathematical algorithms with only a tiny amount of code.
* Using Haskell to do seriously compute-bounded work in a multiprocessor setup.

=== Current Projects ===

* (no name) — chaos pendulum simulator (Status: moderately working, needs UI)
* ''Haktal'' — fractal generator. (Status: minimal functionality)
* ''HoJ'' — Haskell to Java compiler. (Status: skeletal)
* ''Evlor'' — Interactive Haskell step-line debugger. (Status: skeletal)
* ''Indoculate'' — Program to convert a single (custom) source to both HTML and LaTeX, and also do cross-linking. (Status: in production use)

=== Projects On Hold ===

* Sorting algorithm benchmarks.
* Audio DSP in Haskell.
* [[Toy compression implementations|Haskell implementation of compression algorithms]].
* [[POV-Ray SDL project|Haskell SDL]] for [http://www.povray.org/ POV-Ray].

=== Failed Projects ===

* Haskell ray tracer.
* Haskell program to cause world peace.

=== Contributed Code ===

* [[Library for binary]]
* [[Library for vectors]]
* [[Library for colours]]
* [[Library for PPM images]]

=== Current Unsolved Questions ===

* Why do Haskell language extensions exist?
* How do you do graphics in Haskell?
* How come (e.g.) Smalltalk provides 27 different types of collection, but Haskell only ever involves single-linked lists and binary trees?
* Why is <hask>putStr xs1; putStr xs2</hask> slower than <hask>putStr (xs1 ++ xs2)</hask>?

User talk:MathematicalOrchid

2007-04-24T10:27:46Z

MathematicalOrchid: Interesting idea...

==Indoculate==

My [http://www.haskell.org/pipermail/haskell-cafe/2007-March/023335.html survey] on a HTML+LaTeX generator resulted in [http://sophos.berkeley.edu/macfarlane/pandoc/ PanDoc].

Interesting tip. However, Indoculate performs 2 functions. First, it provides a tool <code>MakeHTML</code>, which takes a source file and converts it to HTML. Similarly, <code>MakeLaTeX</code> takes the same source file and converts it to LaTeX instead. However, it also provides a second function. The <code>MakeSite</code> tool takes an entire hierarchical tree of source files and generates a complete fully cross-linked website. I very much doubt PanDoc can do that. [[User:MathematicalOrchid|MathematicalOrchid]] 10:27, 24 April 2007 (UTC)

Library for PPM images

2007-04-18T14:18:24Z

MathematicalOrchid:

Library for PPM images

2007-04-18T14:17:24Z

MathematicalOrchid:

Library for colours

2007-04-18T14:12:33Z

MathematicalOrchid: Removed old quantinize function. (Doesn't really belong here.)

[[Category:Code]]

Simple thing for working on colours in the RGB colours space. (The intention being that each component is in the interval 0 ≤ x ≤ 1.) You could just use tuples, but this library provides simple colour arithmetic.

<haskell>
module Colour where

data Colour = Colour {red, green, blue :: Double} deriving (Eq, Show)

cmap :: (Double -> Double) -> Colour -> Colour
cmap f (Colour r g b) = Colour (f r) (f g) (f b)

czip :: (Double -> Double -> Double) -> Colour -> Colour -> Colour
czip f (Colour r1 g1 b1) (Colour r2 g2 b2) = Colour (f r1 r2) (f g1 g2) (f b1 b2)

cfold :: (Double -> Double -> Double) -> Colour -> Double
cfold f (Colour r g b) = r `f` g `f` b

cpromote :: Double -> Colour
cpromote x = Colour x x x

instance Num Colour where
  (+) = czip (+)
  (-) = czip (-)
  (*) = czip (*)
  negate = cmap negate
  abs    = cmap abs
  signum = cmap signum
  fromInteger x = cpromote (fromInteger x)

instance Fractional Colour where
  (/) = czip (/)
  recip = cmap recip
  fromRational x = cpromote (fromRational x)

clip :: (Num n, Ord n) => n -> n
clip n
  | n < 0 = 0
  | n > 1 = 1
  | otherwise = n

cclip :: Colour -> Colour
cclip = cmap clip
</haskell>
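
A short usage sketch of the colour arithmetic. It repeats a condensed copy of the definitions so that it runs on its own; <hask>cclip</hask> is written with <hask>max</hask> and <hask>min</hask> here instead of the guarded <hask>clip</hask>, and <hask>orange</hask> and <hask>tinted</hask> are names invented for the example.

<haskell>
data Colour = Colour {red, green, blue :: Double} deriving (Eq, Show)

cmap :: (Double -> Double) -> Colour -> Colour
cmap f (Colour r g b) = Colour (f r) (f g) (f b)

czip :: (Double -> Double -> Double) -> Colour -> Colour -> Colour
czip f (Colour r1 g1 b1) (Colour r2 g2 b2) = Colour (f r1 r2) (f g1 g2) (f b1 b2)

instance Num Colour where
  (+)    = czip (+)
  (-)    = czip (-)
  (*)    = czip (*)
  negate = cmap negate
  abs    = cmap abs
  signum = cmap signum
  fromInteger x = let v = fromInteger x in Colour v v v

instance Fractional Colour where
  (/)   = czip (/)
  recip = cmap recip
  fromRational x = let v = fromRational x in Colour v v v

-- Clamp every channel to the interval [0,1].
cclip :: Colour -> Colour
cclip = cmap (max 0 . min 1)

-- Numeric literals are promoted to grey colours by fromInteger and
-- fromRational, so whole colours can be scaled and offset like numbers.
orange, tinted :: Colour
orange = Colour 1 0.5 0
tinted = cclip (orange * 0.5 + 0.75)

main :: IO ()
main = print tinted
</haskell>

Here <hask>orange * 0.5</hask> halves each channel, <hask>+ 0.75</hask> brightens each channel, and <hask>cclip</hask> brings the red channel (1.25) back down to 1.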

Library for vectors

2007-04-18T14:10:32Z

MathematicalOrchid: Added vmag_sqr

[[Category:Code]]

Most people just use <hask>(Int,Int)</hask> or similar for a 2-vector. However, if you find yourself wanting to do lots of vector arithmetic, that becomes annoying quite quickly. Below is what I use; feel free to adapt it to your needs.

<haskell>
module Vector where

type Scalar = Double

class Vector v where
  vmap  :: (Scalar -> Scalar) -> v -> v
  vzip  :: (Scalar -> Scalar -> Scalar) -> v -> v -> v
  vfold :: (x -> Scalar -> x) -> x -> v -> x

vdot :: Vector v => v -> v -> Scalar
vdot v0 v1 = vfold (+) 0 $ vzip (*) v0 v1

vmag_sqr :: Vector v => v -> Scalar
vmag_sqr v = v `vdot` v

vmag :: Vector v => v -> Scalar
vmag = sqrt . vmag_sqr

vscale :: Vector v => Scalar -> v -> v
vscale s = vmap (s*)

vunit :: Vector v => v -> v
vunit v =
  if vmag v == 0
    then v
    else vscale (1 / vmag v) v

data Vector2 = Vector2 {v2x, v2y :: Scalar} deriving (Eq)

instance Show Vector2 where
  show (Vector2 x y) = "<" ++ (show x) ++ ", " ++ (show y) ++ ">"

instance Vector Vector2 where
  vmap  f   (Vector2 x y) = Vector2 (f x) (f y)
  vfold f i (Vector2 x y) = (i `f` x) `f` y
  vzip  f   (Vector2 x0 y0) (Vector2 x1 y1) = Vector2 (f x0 x1) (f y0 y1)

instance Num Vector2 where
  (+) = vzip (+)
  (-) = vzip (-)
  (*) = vzip (*)
  negate = vmap negate
  abs    = vmap abs
  signum = vmap signum
  fromInteger s = Vector2 (fromInteger s) (fromInteger s)

instance Fractional Vector2 where
  (/) = vzip (/)
  -- Fractional has no fromDouble method; literals go through fromRational
  fromRational s = Vector2 (fromRational s) (fromRational s)

data Vector3 = Vector3 {v3x, v3y, v3z :: Scalar} deriving (Eq)

instance Show Vector3 where
  show (Vector3 x y z) = "<" ++ (show x) ++ ", " ++ (show y) ++ ", " ++ (show z) ++ ">"

instance Vector Vector3 where
  vmap  f   (Vector3 x y z) = Vector3 (f x) (f y) (f z)
  vfold f i (Vector3 x y z) = ((i `f` x) `f` y) `f` z
  vzip  f   (Vector3 x0 y0 z0) (Vector3 x1 y1 z1) = Vector3 (f x0 x1) (f y0 y1) (f z0 z1)

instance Num Vector3 where
  (+) = vzip (+)
  (-) = vzip (-)
  (*) = vzip (*)
  negate = vmap negate
  abs    = vmap abs
  signum = vmap signum
  fromInteger s = Vector3 (fromInteger s) (fromInteger s) (fromInteger s)

instance Fractional Vector3 where
  (/) = vzip (/)
  -- Fractional has no fromDouble method; literals go through fromRational
  fromRational s = Vector3 (fromRational s) (fromRational s) (fromRational s)

-- Cross product; note the sign of the middle component.
v3cross :: Vector3 -> Vector3 -> Vector3
v3cross (Vector3 x0 y0 z0) (Vector3 x1 y1 z1) = Vector3 (y0*z1 - z0*y1) (z0*x1 - x0*z1) (x0*y1 - y0*x1)
</haskell>

PS. If anybody knows a way to make every instance of <hask>Vector</hask> automatically become an instance of <hask>Num</hask>, etc., let me know!

Library for PPM images

2007-04-17T12:48:49Z

MathematicalOrchid: Added 'P6' PPM file format.

Library for colours

2007-04-16T11:50:20Z

MathematicalOrchid:

Library for PPM images

2007-04-16T11:46:07Z

MathematicalOrchid: I'll get it right in a minute...

Library for PPM images

2007-04-16T11:45:43Z

MathematicalOrchid:

Library for PPM images

2007-04-16T11:45:25Z

MathematicalOrchid:

User talk:MathematicalOrchid

2007-04-12T15:55:12Z

MathematicalOrchid:

Library for colours

2007-04-12T15:54:19Z

MathematicalOrchid: Ooops! Meant to press preview, not save...

[[Category:Code]]

Simple thing for working on colours in the RGB colours space. (The intention being that each component is in the interval 0 <= x <= 1.) You could just use tuples, but this library provides simple colour arithmetic.

<haskell>
module Colour where

data Colour = Colour {red, green, blue :: Double} deriving (Eq, Show)

cmap :: (Double -> Double) -> Colour -> Colour
cmap f (Colour r g b) = Colour (f r) (f g) (f b)

czip :: (Double -> Double -> Double) -> Colour -> Colour -> Colour
czip f (Colour r1 g1 b1) (Colour r2 g2 b2) = Colour (f r1 r2) (f g1 g2) (f b1 b2)

cfold :: (Double -> Double -> Double) -> Colour -> Double
cfold f (Colour r g b) = r `f` g `f` b

cpromote :: Double -> Colour
cpromote x = Colour x x x

instance Num Colour where
  (+) = czip (+)
  (-) = czip (-)
  (*) = czip (*)
  negate = cmap negate
  abs    = cmap abs
  signum = cmap signum
  fromInteger x = cpromote (fromInteger x)

instance Fractional Colour where
  (/) = czip (/)
  recip = cmap recip
  fromRational x = cpromote (fromRational x)

clip :: (Num n, Ord n) => n -> n
clip n
  | n < 0 = 0
  | n > 1 = 1
  | otherwise = n

cclip :: Colour -> Colour
cclip = cmap clip

quantinize :: Int -> Double -> Int
quantinize max v = floor (v * (fromIntegral max))
</haskell>

Library for colours

2007-04-12T15:52:51Z

MathematicalOrchid:

Library for PPM images

2007-04-12T15:51:59Z

MathematicalOrchid: Simple PPM image saving.

User:MathematicalOrchid

2007-04-12T15:48:00Z

MathematicalOrchid: Added links to random stuff I wrote.

'''Status''': Enthusiastic Haskell newbie.

'''Main Interests''':

* Using Haskell to write triposcopic mathematical algorithms in tiny amounts of code.
* Using Haskell to do seriously compute-bounded work in a multiprocessor setup.

'''Current Projects''':

* Haskell to Java compiler. (Status: skeletal)
* ''Evlor'' — Interactive Haskell step-line debugger. (Status: broken/incomplete/major work required)
* ''Indoculate'' — Program to convert a single (custom) source to both HTML and LaTeX. (Status: medium-complete)

'''Projects On Hold''':

* Sorting algorithm benchmarks.
* Audio DSP in Haskell.
* [[Toy compression implementations|Haskell implementation of compression algorithms]].
* [[POV-Ray SDL project|Haskell SDL]] for [http://www.povray.org/ POV-Ray].

'''Failed Projects''':

* Haskell fractal generator.
* Haskell ray tracer.
* Haskell program to cause world peace.

'''Contributed Code'''

* [[Library for binary]]
* [[Library for vectors]]
* [[Library for colours]]
* [[Library for PPM images]]

'''Current Unsolved Questions''':

* Why do Haskell language extensions exist?
* How do you do graphics in Haskell?
* Why does Hugs crash so much?
* How come (e.g.) Smalltalk provides 27 different types of collection, but Haskell only ever involves single-linked lists and binary trees?
* Is <hask>putStr xs1; putStr xs2</hask> faster or slower than <hask>putStr (xs1 ++ xs2)</hask>?

Talk:Haskell in 5 steps

2007-03-26T12:54:31Z

MathematicalOrchid: Just noticing...

1. Is there a difference between <code>ghc -o hello hello.hs</code> and <code>ghc --make hello</code>? Which one is preferable?

2. The example with <code>let</code> is only going to work in GHCi, not Hugs. (Don't want to confuse beginners with that one...)

[[User:MathematicalOrchid|MathematicalOrchid]] 12:54, 26 March 2007 (UTC)

Monomorphism

2007-03-16T22:01:12Z

MathematicalOrchid: This is a wild guess! Somebody check it...

[[Category:Glossary]]

Monomorphism is the opposite of [[polymorphism]]. That is, a function is polymorphic if it works for several different types - and thus, a function is ''monomorphic'' if it works only for ''one'' type.

As an example, <hask>map</hask> is polymorphic. Its type is simply

<haskell>
map :: (a -> b) -> [a] -> [b]
</haskell>

However, the function

<haskell>
foo :: (Int -> Int) -> [Int] -> [Int]
foo = map
</haskell>

performs an identical operation to <hask>map</hask> (as is evident from the second line), but has a monomorphic type; it will ''only'' accept lists of <hask>Int</hask> and functions over them.

Perhaps you were looking for [[monomorphism restriction]]?

Parametric polymorphism

2007-03-16T21:56:27Z

MathematicalOrchid: ...have I got this right?

Parametric polymorphism is when a function's type signature allows various arguments to take on arbitrary types, but the types must be ''related'' to each other in some way.

For example, in Java one can write a function that accepts two arguments of any possible type. However, Haskell goes further by allowing a function to accept two arguments of any type so long as they are both ''the same'' type.
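
A minimal sketch of such a function (<hask>pick</hask> is a made-up name, not a library function):

<haskell>
-- Both value arguments may be of any type, as long as it is the same type.
pick :: Bool -> a -> a -> a
pick True  x _ = x
pick False _ y = y

main :: IO ()
main = do
  print (pick True 'a' 'b')
  print (pick False [1, 2] [3 :: Int])
  -- pick True 'a' "b" is rejected at compile time: Char and String differ
</haskell>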

As a specific (and slightly more complicated) example, the well-known <hask>map</hask> function has a parametrically polymorphic type

<haskell>
map :: (a -> b) -> [a] -> [b]
</haskell>

which means that the function will accept ''any'' type of list and ''any'' type of function, '''provided''' the types match up. This makes <hask>map</hask> highly polymorphic, yet there is still no risk of a runtime type mismatch.

Talk:Combinator

2007-03-12T20:27:14Z

MathematicalOrchid:

I heard Parsec described as a "combinator library", but I have literally ''no concept'' of what the hell that actually means. This page doesn't leave me feeling any less confused. Any hints? [[User:MathematicalOrchid|MathematicalOrchid]] 12:39, 7 March 2007 (UTC)

I've created the referenced page "combinator pattern". Some knowledgeable soul should probably check I haven't said something dumb...

Combinator pattern

2007-03-12T20:26:19Z

MathematicalOrchid: Somebody who knows stuff should check this...

[[Category:Idioms]]

Libraries such as Parsec use the ''combinator pattern'', where complex structures are built by defining a small set of very simple 'primitives', and a set of 'combinators' for combining them into more complicated structures. It's somewhat similar to the Composition pattern found in object-oriented programming.

In the case of Parsec, the library provides a set of extremely simple (almost trivial) parsers, and ways to combine small parsers into bigger parsers. Many other libraries and programs use the same ideas to build other structures:

* Parsec builds parsers out of smaller parsers.
* The School of Expression (SOE) graphics library builds pictures out of individual shapes.
* The SOE book also mentions a library to build music out of individual notes and rests.
* Another textbook describes building financial contracts.
* [[Software transactional memory]] builds big transactions out of smaller ones.
* The Haskell IO system itself builds whole programs out of small I/O actions using <hask>>>=</hask> and <hask>return</hask>.
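
To make the pattern concrete, here is a toy parser library written from scratch. It is ''not'' Parsec's API; every name in it (<hask>Parser</hask>, <hask>satisfy</hask>, <hask>andThen</hask>, <hask>orElse</hask>, <hask>manyOf</hask>, <hask>measurement</hask>) is invented for this sketch. There is one primitive and three combinators, and the last definition builds a bigger parser purely by combining smaller ones.

<haskell>
import Data.Char (isDigit)

-- A parser consumes a prefix of the input, or fails with Nothing.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

-- Primitive: accept one character that satisfies a predicate.
satisfy :: (Char -> Bool) -> Parser Char
satisfy p = Parser $ \s -> case s of
  (c:cs) | p c -> Just (c, cs)
  _            -> Nothing

-- Combinator: run p, then q on the remaining input.
andThen :: Parser a -> Parser b -> Parser (a, b)
andThen p q = Parser $ \s -> case runParser p s of
  Nothing        -> Nothing
  Just (a, rest) -> case runParser q rest of
    Nothing         -> Nothing
    Just (b, rest') -> Just ((a, b), rest')

-- Combinator: try p, and fall back to q if p fails.
orElse :: Parser a -> Parser a -> Parser a
orElse p q = Parser $ \s -> case runParser p s of
  Nothing -> runParser q s
  result  -> result

-- Combinator: apply p zero or more times (p must consume input).
manyOf :: Parser a -> Parser [a]
manyOf p = Parser go
  where
    go s = case runParser p s of
      Nothing        -> Just ([], s)
      Just (x, rest) -> case go rest of
        Just (xs, rest') -> Just (x : xs, rest')
        Nothing          -> Nothing

-- A bigger parser: digits followed by a unit, 'm' or 's'.
measurement :: Parser (String, Char)
measurement =
  manyOf (satisfy isDigit) `andThen` (satisfy (== 'm') `orElse` satisfy (== 's'))

main :: IO ()
main = print (runParser measurement "120s!")
</haskell>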

Talk:Type arithmetic

2007-03-12T20:07:35Z

MathematicalOrchid: Is this why?

== Why? ==

This page seems to explain ''what'' but not ''why''. I don't know about anyone else, but when I read 'arithmetic at the type level', the very first thought that pops into my head is 'why in the name of God would you ''want'' to do such an insane thing?' [[User:MathematicalOrchid|MathematicalOrchid]] 11:53, 12 March 2007 (UTC)

== This why? ==

Following some discussions in #haskell, I understand this is related to that widespread Haskell obsession with attempting to "prove" things about programs (in spite of the fact that this is obviously impossible).

If I'm understanding this right, the idea is to be able to construct a type that means not merely "a List containing Integers", but "a List containing at least 6 Integers". And the "arithmetic" part comes in when one wants to say something like

: "This function takes a List containing at least X objects of type T and another List containing at least Y objects of type T, and returns a List containing at least X+Y objects of type T."

In other words, the "arithmetic" part is calculating X+Y at compile-time. And any function that calls the one so-described must prove to the type system that it satisfies the constraints. And, once the constraints are statically verified, no further runtime checks are required.

Is that roughly what this is all about? (And if so, can somebody add some statements to that effect to the content page?) [[User:MathematicalOrchid|MathematicalOrchid]] 20:07, 12 March 2007 (UTC)

Talk:Type arithmetic

2007-03-12T11:53:05Z

MathematicalOrchid: Why?

Talk:Toy compression implementations

2007-03-09T13:34:17Z

MathematicalOrchid: I'm impressed...

Much kudos for fixing the underflow error. The new LZW implementation is much smaller, but... how in the name of God does it actually work? o_O [[User:MathematicalOrchid|MathematicalOrchid]] 11:36, 9 March 2007 (UTC)

To understand it I rewrote it a bit:

<haskell>
import Data.List (elemIndex, inits, tails)
import Data.Maybe (fromJust)

encode_LZW :: (Eq t) => [t] -> [t] -> [Int]
encode_LZW alphabet = work (map (:[]) alphabet) where
  work table []  = []
  work table lst = index : work table' rest
    where (tok, rest) = last . takeWhile ((`elem` table) . fst) . tail $ zip (inits lst) (tails lst)
          index = fromJust (elemIndex tok table)
          table' = table ++ [tok']
          tok' = tok ++ [head rest]
</haskell>

The idea of the table, which is the 1st argument to 'work', is that some prefix of the input is already in the table.

(encode_LZW chars) uses 'chars' to make the initial table for the 'work' function by turning the list of characters into a list of length 1 strings.

The <hask>where (tok, rest)</hask> definition can be read right to left:
* The <hask>zip (inits lst) (tails lst)</hask> computes every possible way to split <hask>lst</hask> input into a prefix and suffix, in increasing length of prefix.
* The <hask>tail</hask> function just drops the head because it doesn't want to consider the length 0 prefix
* <hask>takeWhile</hask> applies the predicate <hask>(`elem` table)</hask> to the prefix. This will always succeed on the length 1 prefix, and may find longer prefixes in the table.
* The <hask>last</hask> function takes the last prefix in the table, which will always be the longest such prefix
* <hask>tok</hask> is this prefix, and <hask>rest</hask> is the remaining suffix to process.
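
The steps can be checked on a small input. In this sketch the pipeline is split into two helpers, <hask>splits</hask> and <hask>longestKnown</hask>; both names are mine, not from the original code. It also uses <hask>take 1 rest</hask> where the original has <hask>[head rest]</hask>, and it assumes every input symbol occurs in the alphabet.

<haskell>
import Data.List (elemIndex, inits, tails)
import Data.Maybe (fromJust)

-- Every (prefix, suffix) split with a non-empty prefix, shortest first.
splits :: [t] -> [([t], [t])]
splits lst = tail (zip (inits lst) (tails lst))

-- The longest prefix that is already in the table, plus what follows it.
longestKnown :: Eq t => [[t]] -> [t] -> ([t], [t])
longestKnown table = last . takeWhile ((`elem` table) . fst) . splits

encode_LZW :: Eq t => [t] -> [t] -> [Int]
encode_LZW alphabet = work (map (: []) alphabet)
  where
    work _ [] = []
    -- Emit the index of the known prefix, then extend the table with
    -- that prefix plus the next input symbol.
    work table lst = index : work (table ++ [tok ++ take 1 rest]) rest
      where
        (tok, rest) = longestKnown table lst
        index       = fromJust (elemIndex tok table)

main :: IO ()
main = do
  print (splits "abab")
  print (longestKnown ["a", "b"] "abab")
  print (encode_LZW "ab" "abab")
</haskell>

With the table <hask>["a", "b"]</hask>, only the one-character prefix <hask>"a"</hask> of <hask>"abab"</hask> is known, so the first code is 0 and <hask>"ab"</hask> is added to the table. The second step emits 1 for <hask>"b"</hask> and adds <hask>"ba"</hask>. The third step finds the whole remaining <hask>"ab"</hask> in the table and emits 2.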

Wow... a most ingenious (and inefficient) approach! Well, now it makes sense anyway. [[User:MathematicalOrchid|MathematicalOrchid]] 13:34, 9 March 2007 (UTC)

Talk:Toy compression implementations

2007-03-09T11:36:12Z

MathematicalOrchid: My mind is blown...