https://wiki.haskell.org/api.php?action=feedcontributions&user=Chak&feedformat=atomHaskellWiki - User contributions [en]2015-12-02T05:35:08ZUser contributionsMediaWiki 1.19.14+dfsg-1https://wiki.haskell.org/IDEsIDEs2015-10-24T06:07:14Z<p>Chak: /* Haskell for Mac */</p>
<hr />
<div>The IDE world in Haskell is incomplete, but is in motion. There are many choices. When choosing your IDE, there are the following things to consider.<br />
<br />
== Notable features of interest to consider ==<br />
<br />
This is a list of features that any Haskell IDE could or should have. The IDEs listed below generally support some subset of these features. Please add more to this list if you think of anything. In future this should be expanded into separate headings with more description of how they would desirably work. For a discussion of IDEs there is the [https://groups.google.com/forum/#!forum/haskell-ide haskell-ide mailing list] and the [https://github.com/haskell/haskell-ide haskell-ide repository]<br />
<br />
* Syntax highlighting (e.g. for Haskell, Cabal, Literate Haskell, Core, etc.)<br />
* Macros (e.g. inserting imports/aligning/sorting imports, aligning up text, transposing/switching/moving things around)<br />
* Type information (e.g. type at point, info at point, type of expression)<br />
* IntelliSense/completion (e.g. jump-to-definition, who-calls, calls-who, search by type, completion, etc.)<br />
* Project management (e.g. understanding of Cabal, configuration/building/installing, package sandboxing)<br />
* Interactive REPL (e.g. GHCi/Hugs interaction, expression evaluation and such)<br />
* Knowledge of Haskell in the GHCi/GHC side (e.g. understanding error types, the REPL, REPL objects, object inspection)<br />
* Indentation support (e.g. tab cycle, simple back-forward indentation, whole area indentation, structured editing, etc.)<br />
* Proper syntactic awareness of Haskell (e.g. with a proper parser and proper editor transpositions a la the structured editors of the 80s and Isabel et al)<br />
* Documentation support (e.g. ability to call up documentation of symbol or module, either in the editor, or in the browser)<br />
* Debugger support (e.g. stepping, breakpoints, etc.)<br />
* Refactoring support (e.g. symbol renaming, hlint, etc.)<br />
* Templates (e.g. snippets, Zen Coding type stuff, filling in all the cases of a case, etc.)<br />
<br />
== Open Source ==<br />
<br />
=== [https://github.com/rikvdkleij/intellij-haskell IntelliJ plugin for Haskell] ===<br />
:See [http://www.haskell.org/pipermail/haskell-cafe/2014-October/116567.html the announcement of the plugin] and the [http://en.wikipedia.org/wiki/IntelliJ_IDEA Wikipedia article about IntelliJ].<br />
<br />
=== [http://eclipsefp.github.com/ EclipseFP plugin for Eclipse IDE] ===<br />
:Eclipse is an open, extensible IDE platform for "everything and nothing in particular". It is implemented in Java and runs on several platforms. The Java IDE built on top of it has already become very popular among Java developers. The Haskell tools extend it to support editing (syntax coloring, code assist), compiling, and running Haskell programs from within the IDE. In more details, it features:<br />
:* Syntax highlighting and errors/warning highlighting<br />
:* A module browser showing all installed packages, their modules and the contents of the modules (functions, types, etc.)<br />
:* Integration with [http://www.haskell.org/hoogle/ Hoogle]: select an identifier in your code, press F4 and see the results in hoogle<br />
:* Code navigation: from within a Haskell source file, jump to the file where a symbol in declared, or everywhere a symbol is used (type sensitive search, not just a text search)<br />
:* Outline view: quickly jump to definitions in your file<br />
:* Quick fixes on common errors and import management<br />
:* A cabal file editor and integration with Cabal (uses cabal configure, cabal build under the covers), and a graphical view of installed packages<br />
:* Integration with GHCi: launch GHCi inside Eclipse on any module<br />
:* Integration with the GHCi debugger: performs the GHCi debugging commands for you from the standard Eclipse debugging interface<br />
:* Integration with [http://community.haskell.org/~ndm/hlint/ HLint]: gives you HLint warning on building and allows you to quick fix them<br />
:* Integration with [https://github.com/jaspervdj/stylish-haskell Stylish-Haskell]: format your code with stylish-haskell<br />
:* Test support: shows results of test-framework based test suite in a graphical format. HTF support to come soon.<br />
<br />
=== [http://colorer.sourceforge.net/eclipsecolorer/index.html Colorer plugin for Eclipse IDE] ===<br />
:Syntax highlighting in Eclipse can be achieved using the Colorer plugin. This is more light weight than using the EclipseFP plugin which has much functionality but can be messy to install and has sometimes been a bit shaky.<br />
<br />
:Eclipse Colorer is a plugin that enables syntax highlighting for a wide range of languages. It uses its own XML-based language for describing syntactic regions of languages. It does not include support for Haskell by default, but this can be added using the syntax description files attached below.<br />
<br />
:<b>Installation instructions</b><br />
:# Install the Colorer from the update site <code>http://colorer.sf.net/eclipsecolorer/</code> (for more detailed instructions see the project page).<br />
:# Download the Haskell syntax description files in [http://www.haskell.org/wikiupload/1/16/Haskell_Eclipse_Colorer.tar.gz Haskell_Eclipse_Colorer.tar.gz].<br />
:#Extract its contents (haskell.hrc and proto.hrc) into the following directory (overwriting proto.hrc): <code>eclipse_installation_dir/plugins/net.sf.colorer_0.9.9/colorer/hrc</code> (sometimes the wiki seems to create a nesting tar file, so you might have to unpack twise).<br />
:# Finished. A restart of Eclipse might be required. .hs files should open with syntax highlighting.<br />
<br />
:<b>Tips</b><br />
:* If .hs files open with another kind of syntax highlighting check that they are associated with the Colorer Editor (Preferences -> General -> Editors -> File Associations). Or right click on them and choose Open With -> Other -> Colorer Editor.<br />
:* Sometimes the highlighting gets confused. Then it might help to press Ctrl+R and re-colour the editor.<br />
:* Use the Word Completion feature (Shift+Alt+7) as a poor man's content assist.<br />
:* Use the standard External Tools feature in Eclipse to invoke the compiler from inside the IDE.<br />
:* Use the Block selection feature (Shift+Alt+A) to insert/remove line comments on multiple lines at the same time.<br />
:* Some other useful standard Eclipse features include the Open resource command (Ctrl+Shift+R), the File search command (Ctrl+H) and the bookmarks feature (Edit -> Add bookmark). Make sure to check Include in next/prev navigation box (Windows -> Preferences -> General -> Editors -> Text Editors -> Annotations -> Bookmarks).<br />
<br />
=== [[Leksah]] ===<br />
:Leksah is an IDE for Haskell written in Haskell. Leksah is intended as a practical tool to support the Haskell development process. Leksah uses GTK+ as GUI Toolkit with the gtk2hs binding. It is platform independent and should run on any platform where GTK+, gtk2hs and GHC can be installed.<br />
<br />
* http://www.leksah.org/<br />
* https://hackage.haskell.org/package/leksah<br />
* https://github.com/leksah/leksah<br />
<br />
=== [http://kdevelop.org/ KDevelop] ===<br />
:This IDE supports many languages. For Haskell it currently supports project management, syntax highlighting, building (with GHC) & executing within the IDE.<br />
<br />
=== [http://www.vim.org Vim] ===<br />
<br />
This may or may not be up to date. A Vim user should update it.<br />
<br />
:* [https://github.com/begriffs/haskell-vim-now Haskell-vim-now] -- Full-featured Vim config with install script. Supports type inspection, linting, Hoogle, tagging with codex+hasktags, unicode concealing, and refactoring.<br />
:* [http://projects.haskell.org/haskellmode-vim/ Haskell mode for Vim by Claus Reinke] - These plugins provide Vim integration with GHC and Haddock.<br />
:* [https://github.com/scrooloose/syntastic Syntastic] -- An extremely useful Vim plugin which will interact with ghc_mod (when editing a Haskell file) every time the source file is saved to check for syntax and type errors.<br />
:* [http://www.vim.org/scripts/script.php?script_id=2356 SHIM by Lars Kotthoff] -- Superior Haskell Interaction Mode (SHIM) plugin for Vim providing full GHCi integration (requires Vim compiled with Ruby support).<br />
:* [http://www.vim.org/scripts/script.php?script_id=3200 Haskell Conceal] -- shows Unicode symbols for common Haskell operators such as ++ and other lexical notation in Vim window (source file itself remains unchanged).<br />
:* [http://urchin.earth.li/~ian/vim/ by Ian Lynagh]: distinguishes different literal Haskell styles (Vim 7.0 includes a syntax file which supersedes these plugins).<br />
:* There's a [[Literate programming/Vim|copy of lhaskell.vim]] on the Wiki.<br />
:* [https://github.com/MarcWeber/vim-addon-haskell by Marc Weber] -- Vim script-based function/module completion, cabal support, tagging by one command, context completion ( w<tab> -> where ), module outline, etc<br />
:* [http://www.vim.org/scripts/script.php?script_id=1968 Vim indenting mode for Haskell]<br />
:* [https://github.com/ujihisa/neco-ghc neco-ghc] pragma, module, function completion.<br />
:* [https://github.com/eagletmt/ghcmod-vim Ghcmod-vim]<br />
:* [https://github.com/bitc/vim-hdevtools Hdevtools] - gives type information, quicker reloading and more.<br />
:* [http://blog-mno2.csie.org/blog/2011/11/17/vim-plugins-for-haskell-programmers/ Addition list] with some missing here with screen shots of many of the above.<br />
<br />
=== [http://www.gnu.org/s/emacs/ Emacs] ===<br />
<br />
See [[Emacs]].<br />
<br />
=== [http://atom.io Atom] ===<br />
<br />
Atom is very similar to Sublime Text 2 (which is now discontinued). A huge [http://atom.io/packages package database] exists and two packages important to haskell developers are:<br />
:* [https://atom.io/packages/language-haskell language-haskell] for haskell syntax highlighting.<br />
:* [https://atom.io/packages/ide-haskell ide-haskell] for cabal-support, linting and ghc-mod utilities like type previewing.<br />
<br />
== Commercial ==<br />
<br />
=== [https://www.fpcomplete.com/business/haskell-center/overview/ FP Haskell Center] ===<br />
<br />
FP Complete has developed a commercial Haskell IDE.<br />
<br />
It's in the cloud, and comes with all of the libraries on Stackage ready to go. (Basically, the Haskell Platform on steroids.)<br />
<br />
It's "in the cloud," which has its pros and cons.<br />
<br />
The standard IDE is in your browser, and has integration with Git and Github. Emacs, Sublime and Vim support will be released soon. One particularly cool feature is that you can spin up temporary web servers to test out the Haskell-powered website you might be coding up. It's really easy, and you can pay for FP Complete to host your permanent application, too.<br />
<br />
There's a free trial, with free academic licenses and paid commercial licenses. There will be "personal" licenses in a few weeks (from early Sept 2013) as well, since the commercial pricing is a bit steep for hobbyists.<br />
<br />
==== Feature set ====<br />
<br />
Some of the features:<br />
<br />
* Auto-completion.<br />
* Hoogle searching of all of Stackage.<br />
* Hoogling in the context of a module and its imports.<br />
* Live typechecking/recompiling / jump to error.<br />
* Hlint suggestions.<br />
* Jump to definition.<br />
* Auto-removal of unnecessary imports.<br />
* Get type of any identifier (globally or locally defined).<br />
* Show documentation of any symbol (via hoogle), or open haddocks.<br />
* Refactoring.<br />
* Build project, run project.<br />
* Auto-code formatting.<br />
* Run a temporary web service for testing web apps.<br />
* Deploy project to an Amazon instance.<br />
<br />
=== [http://haskellformac.com Haskell for Mac] ===<br />
<br />
Haskell for Mac is an easy-to-use integrated programming environment for Haskell on OS X. It is a one-click install of a complete Haskell system, including Haskell compiler, editor, many libraries, and a novel form of interactive <em>Haskell playgrounds</em>. Haskell playgrounds support exploration and experimentation with code. They are convenient to learn functional programming, prototype Haskell code, interactively visualize data, and to create interactive animations.<br />
<br />
Features include the following:<br />
* Built-in Haskell editor with customisable themes, or you can use a separate text editor.<br />
* Autosaving and automatic project versioning.<br />
* Interactive Haskell playgrounds evaluate your code as you type.<br />
* Playground results can be text or images produced by the Rasterific, Diagrams, and Chart packages.<br />
* Add code and multimedia files to a Haskell project with drag'n'drop. <br />
* Haskell binding to Apple's 2D animation and games framework SpriteKit.<br />
<br />
Haskell for Mac requires OS X Yosemite or above.<br />
<br />
=== [https://github.com/SublimeHaskell/SublimeHaskell/ Sublime-Haskell] ===<br />
<br />
Sublime-Haskell is a plugin for the [http://www.sublimetext.com/ Sublime Text Editor]. It is installed through the [https://sublime.wbond.net/ Sublime Package Controller].<br />
<br />
It is built as a plugin to the Sublime text editor, so all the standard editing functionality is there. Here are the Haskell specific features:<br />
* Syntax highlighting and error marking for Haskell and Cabal. Errors provided by interaction with the compiler. The errors are listed in an error pane, and the user can navigate through the errors.<br />
* When working on a project that has a Cabal file, the Cabal file is detected, and the project can be configured, built, run, and tested using Cabal. The Cabal file is automatically detected. This also enhances error reporting, and auto-completion (all exported symbols from the project can then be matched against). Thus, there is good project management support.<br />
* Rescan/build on file change.<br />
* Can use Cabal-dev for sandboxing/pristine builds.<br />
* Prettification/indentation and alignment via Stylish-Haskell.<br />
* Jump to definition, and show information for a definition (using haskell-docs).<br />
* Type display and insertion<br />
* Fast building and type-inference via hdevtools.<br />
* HLint provided by GHC-Mod.<br />
<br />
Thus, Sublime-Haskell satisfies all the requirements listed at the top of the wiki for a baseline Haskell IDE. Sublime-Text is closed source, but the Haskell plugin is open source.<br />
<br />
== See also ==<br />
<br />
* [http://blog.johantibell.com/2011/08/results-from-state-of-haskell-2011.html Results from the State of Haskell, 2011 Survey]. <br />
* [http://nickknowlson.com/blog/2011/09/12/haskell-survey-categorized-weaknesses/ Categorized Weaknesses from the State of Haskell 2011 Survey], which barely touched upon IDEs.<br />
* [[Editors]]<br />
* [[Applications and libraries/Program development#Editor support]]<br />
* [http://code.haskell.org/shim/ Shim]; the aim of the shim (Superior Haskell Interaction Mode) project is to provide better support for editing Haskell code in VIM and Emacs<br />
<br />
== Other IDEs and Editors ==<br />
<br />
The list below is incomplete. Please add to it with whatever you think of. This list should be expanded into sections, as above, with more details, with links to the actual documentation of the described features.<br />
<br />
* Vim — '''PROS:''' Free. Works on Windows. Works in terminal. Decent alignment support. Tag-based completion and jumps. Very good syntax highlighting, flymake (via Syntastic), Cabal integration, Hoogle. Documentation for symbol at point '''CONS:''' Arcane, difficult for new users. Some complain of bad indentation support.<br />
* [http://www.haskell.org/haskellwiki/Haskell_mode_for_Emacs Emacs]— '''PROS:''' Free. Works on Windows. Works in terminal. Decent alignment, indentation, syntax highlighting. Limited type information (type and info of name at point). Cabal/GHC/GHCi awareness and Haskell-aware REPL. Completion and jump-to-definition (via ETAGS). Documentation of symbol at point. Hoogle. Documentation for symbol at point. Flymake (error checking on the fly). '''CONS:''' Arcane, difficult for new users.<br />
* Sublime — '''PROS:''' Works on Windows. '''CONS:''' Poor alignment support (though [http://www.reddit.com/r/haskell/comments/ts8fi/haskell_ides_emacs_vim_and_sublime_oh_my_opinions/c4pair1 there are packages] to do indentation a little better). Proprietary.<br />
* [[Yi]] — '''PROS:''' Written in Haskell. Works in terminal. '''CONS:''' Very immature, lacking features. Problems building generally, especially on Windows.<br />
* [http://www.haskell.org/haskellwiki/Leksah Leksah] — '''PROS:''' Syntax highlighting. Understands Cabal, Module browser, dependency knowledge, documentation display inside the IDE, jump-to-definition, flymake (error checking on the fly), limited evaluation of snippets, scratch buffer. Autocompletion. Not an arcane interface a la Emacs/Vim. '''CONS:''' Doesn't have a decent REPL. Are there any other cons? — This should be moved to the section above.<br />
* [[Editors | Other Editors]<br />
* [http://www.cs.kent.ac.uk/projects/heat/ HEAT:] An Interactive Development Environment for Learning & Teaching Haskell<br />
* [http://www.geany.org/ Geany] '''PROS:''' Free. Works on Windows. Syntax highlighting, REPL. '''CONS:''' After using it for a while, Geany freezes quite often.<br />
<br />
== Outdated ==<br />
<br />
* [http://web.archive.org/web/20110726153330/http://hoovy.org/HaskellXcodePlugin/ plugin for Xcode] (links to the web archive)<br />
<br />
=== [http://www.haskell.org/haskellwiki/HIDE hIDE] ===<br />
:hIDE is a GUI-based Haskell IDE written using gtk+hs. It does not include an editor but instead interfaces with NEdit, vim or GNU emacs.<br />
<br />
=== [http://www.haskell.org/haskellwiki/HIDE hIDE-2] ===<br />
:Through the dark ages many a programmer has longed for the ultimate tool. In response to this most unnerving craving, of which we ourselves have had maybe more than our fair share, the dynamic trio of #Haskellaniacs (dons, dcoutts and Lemmih) hereby announce, to the relief of the community, that a fetus has been conceived: ''hIDE - the Haskell Integrated Development Environment''. So far the unborn integrates source code recognition and a chameleon editor, resenting these in a snappy gtk2 environment. Although no seer has yet predicted the date of birth of our hIDEous creature, we hope that the mere knowledge of its existence will spread peace of mind throughout the community as oil on troubled waters. See also: [[HIDE/Screenshots of HIDE]] and [[HIDE]]<br />
<br />
=== [http://web.archive.org/web/20060213161530/http://www.students.cs.uu.nl/people/rjchaaft/JCreator/ JCreator with Haskell support] ===<br />
: <b>N.B. The link above is to the Wayback Machine (Web Archive); it seem that JCreator is no longer supported.</b><br />
:JCreator is a highly customizable Java IDE for Windows. Features include extensive project support, fully customizable toolbars (including the images of user tools) and menus, increase/decrease indent for a selected block of text (tab/shift+tab respectively). The Haskell support module adds syntax highlighting for Haskell files and WinHugs, hugs, a static checker (if you double click on the error message, JCreator will jump to the right file and line and highlight it yellow) and the Haskell 98 Report as tools. Platforms: Win95, Win98, WinNT and Win2000 (only Win95 not tested yet). Size: 6MB. JCreator is a trademark of Xinox Software; Copyright &copy; 2000 Xinox Software. The Haskell support module is made by Rijk-Jan van Haaften.<br />
<br />
=== [[haste]] - Haskell TurboEdit ===<br />
:haste - Haskell TurboEdit - was an IDE for the functional programming language Haskell, written in Haskell.<br />
<br />
=== [http://www.haskell.org/visualhaskell Visual Haskell] ===<br />
:Visual Haskell is a complete development environment for Haskell software, based on Microsoft's [http://www.microsoft.com/visualstudio/en-us Microsoft Visual Studio] platform. Visual Haskell integrates with the Visual Studio editor to provide interactive features to aid Haskell development, and it enables the construction of projects consisting of multiple Haskell modules, using the Cabal building/packaging infrastructure.<br />
<br />
=== [http://www.cs.kent.ac.uk/projects/vital/ Vital] ===<br />
:Vital is a visual programming environment. It is particularly intended for supporting the open-ended, incremental style of development often preferred by end users (engineers, scientists, analysts, etc.).<br />
<br />
=== [http://www.cs.kent.ac.uk/projects/pivotal/ Pivotal] ===<br />
:Pivotal 0.025 is an early prototype of a Vital-like environment for Haskell. Unlike Vital, however, Pivotal is implemented entirely in Haskell. The implementation is based on the use of the hs-plugins library to allow dynamic compilation and evaluation of Haskell expressions together with the gtk2hs library for implementing the GUI.</div>Chakhttps://wiki.haskell.org/IDEsIDEs2015-10-24T06:06:22Z<p>Chak: /* Sublime-Haskell */</p>
<hr />
<div>The IDE world in Haskell is incomplete, but is in motion. There are many choices. When choosing your IDE, there are the following things to consider.<br />
<br />
== Notable features of interest to consider ==<br />
<br />
This is a list of features that any Haskell IDE could or should have. The IDEs listed below generally support some subset of these features. Please add more to this list if you think of anything. In future this should be expanded into separate headings with more description of how they would desirably work. For a discussion of IDEs there is the [https://groups.google.com/forum/#!forum/haskell-ide haskell-ide mailing list] and the [https://github.com/haskell/haskell-ide haskell-ide repository]<br />
<br />
* Syntax highlighting (e.g. for Haskell, Cabal, Literate Haskell, Core, etc.)<br />
* Macros (e.g. inserting imports/aligning/sorting imports, aligning up text, transposing/switching/moving things around)<br />
* Type information (e.g. type at point, info at point, type of expression)<br />
* IntelliSense/completion (e.g. jump-to-definition, who-calls, calls-who, search by type, completion, etc.)<br />
* Project management (e.g. understanding of Cabal, configuration/building/installing, package sandboxing)<br />
* Interactive REPL (e.g. GHCi/Hugs interaction, expression evaluation and such)<br />
* Knowledge of Haskell in the GHCi/GHC side (e.g. understanding error types, the REPL, REPL objects, object inspection)<br />
* Indentation support (e.g. tab cycle, simple back-forward indentation, whole area indentation, structured editing, etc.)<br />
* Proper syntactic awareness of Haskell (e.g. with a proper parser and proper editor transpositions a la the structured editors of the 80s and Isabel et al)<br />
* Documentation support (e.g. ability to call up documentation of symbol or module, either in the editor, or in the browser)<br />
* Debugger support (e.g. stepping, breakpoints, etc.)<br />
* Refactoring support (e.g. symbol renaming, hlint, etc.)<br />
* Templates (e.g. snippets, Zen Coding type stuff, filling in all the cases of a case, etc.)<br />
<br />
== Open Source ==<br />
<br />
=== [https://github.com/rikvdkleij/intellij-haskell IntelliJ plugin for Haskell] ===<br />
:See [http://www.haskell.org/pipermail/haskell-cafe/2014-October/116567.html the announcement of the plugin] and the [http://en.wikipedia.org/wiki/IntelliJ_IDEA Wikipedia article about IntelliJ].<br />
<br />
=== [http://eclipsefp.github.com/ EclipseFP plugin for Eclipse IDE] ===<br />
:Eclipse is an open, extensible IDE platform for "everything and nothing in particular". It is implemented in Java and runs on several platforms. The Java IDE built on top of it has already become very popular among Java developers. The Haskell tools extend it to support editing (syntax coloring, code assist), compiling, and running Haskell programs from within the IDE. In more details, it features:<br />
:* Syntax highlighting and errors/warning highlighting<br />
:* A module browser showing all installed packages, their modules and the contents of the modules (functions, types, etc.)<br />
:* Integration with [http://www.haskell.org/hoogle/ Hoogle]: select an identifier in your code, press F4 and see the results in hoogle<br />
:* Code navigation: from within a Haskell source file, jump to the file where a symbol in declared, or everywhere a symbol is used (type sensitive search, not just a text search)<br />
:* Outline view: quickly jump to definitions in your file<br />
:* Quick fixes on common errors and import management<br />
:* A cabal file editor and integration with Cabal (uses cabal configure, cabal build under the covers), and a graphical view of installed packages<br />
:* Integration with GHCi: launch GHCi inside Eclipse on any module<br />
:* Integration with the GHCi debugger: performs the GHCi debugging commands for you from the standard Eclipse debugging interface<br />
:* Integration with [http://community.haskell.org/~ndm/hlint/ HLint]: gives you HLint warning on building and allows you to quick fix them<br />
:* Integration with [https://github.com/jaspervdj/stylish-haskell Stylish-Haskell]: format your code with stylish-haskell<br />
:* Test support: shows results of test-framework based test suite in a graphical format. HTF support to come soon.<br />
<br />
=== [http://colorer.sourceforge.net/eclipsecolorer/index.html Colorer plugin for Eclipse IDE] ===<br />
:Syntax highlighting in Eclipse can be achieved using the Colorer plugin. This is more light weight than using the EclipseFP plugin which has much functionality but can be messy to install and has sometimes been a bit shaky.<br />
<br />
:Eclipse Colorer is a plugin that enables syntax highlighting for a wide range of languages. It uses its own XML-based language for describing syntactic regions of languages. It does not include support for Haskell by default, but this can be added using the syntax description files attached below.<br />
<br />
:<b>Installation instructions</b><br />
:# Install the Colorer from the update site <code>http://colorer.sf.net/eclipsecolorer/</code> (for more detailed instructions see the project page).<br />
:# Download the Haskell syntax description files in [http://www.haskell.org/wikiupload/1/16/Haskell_Eclipse_Colorer.tar.gz Haskell_Eclipse_Colorer.tar.gz].<br />
:#Extract its contents (haskell.hrc and proto.hrc) into the following directory (overwriting proto.hrc): <code>eclipse_installation_dir/plugins/net.sf.colorer_0.9.9/colorer/hrc</code> (sometimes the wiki seems to create a nesting tar file, so you might have to unpack twise).<br />
:# Finished. A restart of Eclipse might be required. .hs files should open with syntax highlighting.<br />
<br />
:<b>Tips</b><br />
:* If .hs files open with another kind of syntax highlighting check that they are associated with the Colorer Editor (Preferences -> General -> Editors -> File Associations). Or right click on them and choose Open With -> Other -> Colorer Editor.<br />
:* Sometimes the highlighting gets confused. Then it might help to press Ctrl+R and re-colour the editor.<br />
:* Use the Word Completion feature (Shift+Alt+7) as a poor man's content assist.<br />
:* Use the standard External Tools feature in Eclipse to invoke the compiler from inside the IDE.<br />
:* Use the Block selection feature (Shift+Alt+A) to insert/remove line comments on multiple lines at the same time.<br />
:* Some other useful standard Eclipse features include the Open resource command (Ctrl+Shift+R), the File search command (Ctrl+H) and the bookmarks feature (Edit -> Add bookmark). Make sure to check Include in next/prev navigation box (Windows -> Preferences -> General -> Editors -> Text Editors -> Annotations -> Bookmarks).<br />
<br />
=== [[Leksah]] ===<br />
:Leksah is an IDE for Haskell written in Haskell. Leksah is intended as a practical tool to support the Haskell development process. Leksah uses GTK+ as GUI Toolkit with the gtk2hs binding. It is platform independent and should run on any platform where GTK+, gtk2hs and GHC can be installed.<br />
<br />
* http://www.leksah.org/<br />
* https://hackage.haskell.org/package/leksah<br />
* https://github.com/leksah/leksah<br />
<br />
=== [http://kdevelop.org/ KDevelop] ===<br />
:This IDE supports many languages. For Haskell it currently supports project management, syntax highlighting, building (with GHC) & executing within the IDE.<br />
<br />
=== [http://www.vim.org Vim] ===<br />
<br />
This may or may not be up to date. A Vim user should update it.<br />
<br />
:* [https://github.com/begriffs/haskell-vim-now Haskell-vim-now] -- Full-featured Vim config with install script. Supports type inspection, linting, Hoogle, tagging with codex+hasktags, unicode concealing, and refactoring.<br />
:* [http://projects.haskell.org/haskellmode-vim/ Haskell mode for Vim by Claus Reinke] - These plugins provide Vim integration with GHC and Haddock.<br />
:* [https://github.com/scrooloose/syntastic Syntastic] -- An extremely useful Vim plugin which will interact with ghc_mod (when editing a Haskell file) every time the source file is saved to check for syntax and type errors.<br />
:* [http://www.vim.org/scripts/script.php?script_id=2356 SHIM by Lars Kotthoff] -- Superior Haskell Interaction Mode (SHIM) plugin for Vim providing full GHCi integration (requires Vim compiled with Ruby support).<br />
:* [http://www.vim.org/scripts/script.php?script_id=3200 Haskell Conceal] -- shows Unicode symbols for common Haskell operators such as ++ and other lexical notation in Vim window (source file itself remains unchanged).<br />
:* [http://urchin.earth.li/~ian/vim/ by Ian Lynagh]: distinguishes different literal Haskell styles (Vim 7.0 includes a syntax file which supersedes these plugins).<br />
:* There's a [[Literate programming/Vim|copy of lhaskell.vim]] on the Wiki.<br />
:* [https://github.com/MarcWeber/vim-addon-haskell by Marc Weber] -- Vim script-based function/module completion, cabal support, tagging by one command, context completion ( w<tab> -> where ), module outline, etc<br />
:* [http://www.vim.org/scripts/script.php?script_id=1968 Vim indenting mode for Haskell]<br />
:* [https://github.com/ujihisa/neco-ghc neco-ghc] pragma, module, function completion.<br />
:* [https://github.com/eagletmt/ghcmod-vim Ghcmod-vim]<br />
:* [https://github.com/bitc/vim-hdevtools Hdevtools] - gives type information, quicker reloading and more.<br />
:* [http://blog-mno2.csie.org/blog/2011/11/17/vim-plugins-for-haskell-programmers/ Addition list] with some missing here with screen shots of many of the above.<br />
<br />
=== [http://www.gnu.org/s/emacs/ Emacs] ===<br />
<br />
See [[Emacs]].<br />
<br />
=== [http://atom.io Atom] ===<br />
<br />
Atom is very similar to Sublime Text 2 (which is now discontinued). A huge [http://atom.io/packages package database] exists and two packages important to haskell developers are:<br />
:* [https://atom.io/packages/language-haskell language-haskell] for haskell syntax highlighting.<br />
:* [https://atom.io/packages/ide-haskell ide-haskell] for cabal-support, linting and ghc-mod utilities like type previewing.<br />
<br />
== Commercial ==<br />
<br />
=== [https://www.fpcomplete.com/business/haskell-center/overview/ FP Haskell Center] ===<br />
<br />
FP Complete has developed a commercial Haskell IDE.<br />
<br />
It's in the cloud, and comes with all of the libraries on Stackage ready to go. (Basically, the Haskell Platform on steroids.)<br />
<br />
It's "in the cloud," which has its pros and cons.<br />
<br />
The standard IDE is in your browser, and has integration with Git and Github. Emacs, Sublime and Vim support will be released soon. One particularly cool feature is that you can spin up temporary web servers to test out the Haskell-powered website you might be coding up. It's really easy, and you can pay for FP Complete to host your permanent application, too.<br />
<br />
There's a free trial, with free academic licenses and paid commercial licenses. There will be "personal" licenses in a few weeks (from early Sept 2013) as well, since the commercial pricing is a bit steep for hobbyists.<br />
<br />
==== Feature set ====<br />
<br />
Some of the features:<br />
<br />
* Auto-completion.<br />
* Hoogle searching of all of Stackage.<br />
* Hoogling in the context of a module and its imports.<br />
* Live typechecking/recompiling / jump to error.<br />
* Hlint suggestions.<br />
* Jump to definition.<br />
* Auto-removal of unnecessary imports.<br />
* Get type of any identifier (globally or locally defined).<br />
* Show documentation of any symbol (via hoogle), or open haddocks.<br />
* Refactoring.<br />
* Build project, run project.<br />
* Auto-code formatting.<br />
* Run a temporary web service for testing web apps.<br />
* Deploy project to an Amazon instance.<br />
<br />
=== [http://haskellformac.com Haskell for Mac] ===<br />
<br />
Haskell for Mac is an easy-to-use integrated programming environment for Haskell on OS X. It is a one-click install of a complete Haskell system, including Haskell compiler, editor, many libraries, and a novel form of interactive <em>Haskell playgrounds</em>. Haskell playgrounds support exploration and experimentation with code. They are convenient to learn functional programming, prototype Haskell code, interactively visualize data, and to create interactive animations.<br />
<br />
Features include the following:<br />
* Built-in Haskell editor with customisable themes, or you can use a separate text editor.<br />
* Autosaving and automatic project versioning.<br />
* Interactive Haskell playgrounds evaluate your code as you type.<br />
* Playground results can be text or images produced by the Rasterific, Diagrams, and Chart packages.<br />
* Add code and multimedia files to a Haskell project with drag'n'drop. <br />
* Haskell binding to Apple's 2D animation and games framework SpriteKit.<br />
<br />
Haskell for Mac is requires for OS X Yosemite or above.<br />
<br />
=== [https://github.com/SublimeHaskell/SublimeHaskell/ Sublime-Haskell] ===<br />
<br />
Sublime-Haskell is a plugin for the [http://www.sublimetext.com/ Sublime Text Editor]. It is installed through the [https://sublime.wbond.net/ Sublime Package Controller].<br />
<br />
It is built as a plugin to the Sublime text editor, so all the standard editing functionality is there. Here are the Haskell specific features:<br />
* Syntax highlighting and error marking for Haskell and Cabal. Errors provided by interaction with the compiler. The errors are listed in an error pane, and the user can navigate through the errors.<br />
* When working on a project that has a Cabal file, the Cabal file is detected, and the project can be configured, built, run, and tested using Cabal. The Cabal file is automatically detected. This also enhances error reporting, and auto-completion (all exported symbols from the project can then be matched against). Thus, there is good project management support.<br />
* Rescan/build on file change.<br />
* Can use Cabal-dev for sandboxing/pristine builds.<br />
* Prettification/indentation and alignment via Stylish-Haskell.<br />
* Jump to definition, and show information for a definition (using haskell-docs).<br />
* Type display and insertion<br />
* Fast building and type-inference via hdevtools.<br />
* HLint provided by GHC-Mod.<br />
<br />
Thus, Sublime-Haskell satisfies all the requirements listed at the top of the wiki for a baseline Haskell IDE. Sublime-Text is closed source, but the Haskell plugin is open source.<br />
<br />
== See also ==<br />
<br />
* [http://blog.johantibell.com/2011/08/results-from-state-of-haskell-2011.html Results from the State of Haskell, 2011 Survey]. <br />
* [http://nickknowlson.com/blog/2011/09/12/haskell-survey-categorized-weaknesses/ Categorized Weaknesses from the State of Haskell 2011 Survey], which barely touched upon IDEs.<br />
* [[Editors]]<br />
* [[Applications and libraries/Program development#Editor support]]<br />
* [http://code.haskell.org/shim/ Shim]; the aim of the shim (Superior Haskell Interaction Mode) project is to provide better support for editing Haskell code in VIM and Emacs<br />
<br />
== Other IDEs and Editors ==<br />
<br />
The list below is incomplete. Please add to it with whatever you think of. This list should be expanded into sections, as above, with more details, with links to the actual documentation of the described features.<br />
<br />
* Vim — '''PROS:''' Free. Works on Windows. Works in terminal. Decent alignment support. Tag-based completion and jumps. Very good syntax highlighting, flymake (via Syntastic), Cabal integration, Hoogle. Documentation for symbol at point '''CONS:''' Arcane, difficult for new users. Some complain of bad indentation support.<br />
* [http://www.haskell.org/haskellwiki/Haskell_mode_for_Emacs Emacs]— '''PROS:''' Free. Works on Windows. Works in terminal. Decent alignment, indentation, syntax highlighting. Limited type information (type and info of name at point). Cabal/GHC/GHCi awareness and Haskell-aware REPL. Completion and jump-to-definition (via ETAGS). Documentation of symbol at point. Hoogle. Documentation for symbol at point. Flymake (error checking on the fly). '''CONS:''' Arcane, difficult for new users.<br />
* Sublime — '''PROS:''' Works on Windows. '''CONS:''' Poor alignment support (though [http://www.reddit.com/r/haskell/comments/ts8fi/haskell_ides_emacs_vim_and_sublime_oh_my_opinions/c4pair1 there are packages] to do indentation a little better). Proprietary.<br />
* [[Yi]] — '''PROS:''' Written in Haskell. Works in terminal. '''CONS:''' Very immature, lacking features. Problems building generally, especially on Windows.<br />
* [http://www.haskell.org/haskellwiki/Leksah Leksah] — '''PROS:''' Syntax highlighting. Understands Cabal, Module browser, dependency knowledge, documentation display inside the IDE, jump-to-definition, flymake (error checking on the fly), limited evaluation of snippets, scratch buffer. Autocompletion. Not an arcane interface a la Emacs/Vim. '''CONS:''' Doesn't have a decent REPL. Are there any other cons? — This should be moved to the section above.<br />
* [[Editors | Other Editors]<br />
* [http://www.cs.kent.ac.uk/projects/heat/ HEAT:] An Interactive Development Environment for Learning & Teaching Haskell<br />
* [http://www.geany.org/ Geany] '''PROS:''' Free. Works on Windows. Syntax highlighting, REPL. '''CONS:''' After using it for a while, Geany freezes quite often.<br />
<br />
== Outdated ==<br />
<br />
* [http://web.archive.org/web/20110726153330/http://hoovy.org/HaskellXcodePlugin/ plugin for Xcode] (links to the web archive)<br />
<br />
=== [http://www.haskell.org/haskellwiki/HIDE hIDE] ===<br />
:hIDE is a GUI-based Haskell IDE written using gtk+hs. It does not include an editor but instead interfaces with NEdit, vim or GNU emacs.<br />
<br />
=== [http://www.haskell.org/haskellwiki/HIDE hIDE-2] ===<br />
:Through the dark ages many a programmer has longed for the ultimate tool. In response to this most unnerving craving, of which we ourselves have had maybe more than our fair share, the dynamic trio of #Haskellaniacs (dons, dcoutts and Lemmih) hereby announce, to the relief of the community, that a fetus has been conceived: ''hIDE - the Haskell Integrated Development Environment''. So far the unborn integrates source code recognition and a chameleon editor, resenting these in a snappy gtk2 environment. Although no seer has yet predicted the date of birth of our hIDEous creature, we hope that the mere knowledge of its existence will spread peace of mind throughout the community as oil on troubled waters. See also: [[HIDE/Screenshots of HIDE]] and [[HIDE]]<br />
<br />
=== [http://web.archive.org/web/20060213161530/http://www.students.cs.uu.nl/people/rjchaaft/JCreator/ JCreator with Haskell support] ===<br />
: <b>N.B. The link above is to the Wayback Machine (Web Archive); it seem that JCreator is no longer supported.</b><br />
:JCreator is a highly customizable Java IDE for Windows. Features include extensive project support, fully customizable toolbars (including the images of user tools) and menus, increase/decrease indent for a selected block of text (tab/shift+tab respectively). The Haskell support module adds syntax highlighting for Haskell files and WinHugs, hugs, a static checker (if you double click on the error message, JCreator will jump to the right file and line and highlight it yellow) and the Haskell 98 Report as tools. Platforms: Win95, Win98, WinNT and Win2000 (only Win95 not tested yet). Size: 6MB. JCreator is a trademark of Xinox Software; Copyright &copy; 2000 Xinox Software. The Haskell support module is made by Rijk-Jan van Haaften.<br />
<br />
=== [[haste]] - Haskell TurboEdit ===<br />
:haste - Haskell TurboEdit - was an IDE for the functional programming language Haskell, written in Haskell.<br />
<br />
=== [http://www.haskell.org/visualhaskell Visual Haskell] ===<br />
:Visual Haskell is a complete development environment for Haskell software, based on Microsoft's [http://www.microsoft.com/visualstudio/en-us Microsoft Visual Studio] platform. Visual Haskell integrates with the Visual Studio editor to provide interactive features to aid Haskell development, and it enables the construction of projects consisting of multiple Haskell modules, using the Cabal building/packaging infrastructure.<br />
<br />
=== [http://www.cs.kent.ac.uk/projects/vital/ Vital] ===<br />
:Vital is a visual programming environment. It is particularly intended for supporting the open-ended, incremental style of development often preferred by end users (engineers, scientists, analysts, etc.).<br />
<br />
=== [http://www.cs.kent.ac.uk/projects/pivotal/ Pivotal] ===<br />
:Pivotal 0.025 is an early prototype of a Vital-like environment for Haskell. Unlike Vital, however, Pivotal is implemented entirely in Haskell. The implementation is based on the use of the hs-plugins library to allow dynamic compilation and evaluation of Haskell expressions together with the gtk2hs library for implementing the GUI.</div>Chakhttps://wiki.haskell.org/TutorialsTutorials2015-10-11T03:56:06Z<p>Chak: </p>
<hr />
<div>==Introductions to Haskell==<br />
<br />
These are the recommended places to start learning, short of buying a [[Books#Textbooks|textbook]].<br />
<br />
=== Best places to start ===<br />
<br />
;[http://www.seas.upenn.edu/~cis194/lectures.html CIS 194: Introduction to Haskell (Spring 2013)]: An excellent tutorial to Haskell for beginners given as a course at UPenn by the author of the Typeclassopedia and Diagrams, Brent Yorgey. More compact than LYAH and RWH, but still communicates both basics and some notoriously unfamiliar concepts effectively.<br />
<br />
;[http://learnyouahaskell.com Learn You a Haskell for Great Good! (LYAH)]<br />
: Nicely illustrated tutorial showing Haskell concepts while interacting in GHCi. Written and drawn by Miran Lipovača.<br />
<br />
;[http://book.realworldhaskell.org/ Real World Haskell (RWH)]<br />
: A free online version of the complete book, with numerous reader-submitted comments. RWH is best suited for people who know the fundamentals of Haskell already, and can write basic Haskell programs themselves already. It makes a great follow up after finishing LYAH. It can easily be read cover-to-cover, or you can focus on the chapters that interest you most, or when you find an idea you don't yet understand.<br />
<br />
;[http://en.wikibooks.org/wiki/Haskell/YAHT Yet Another Haskell Tutorial (YAHT)]<br />
:By Hal Daume III et al. A recommended tutorial for Haskell that is still under construction but covers already much ground. Also a classic text.<br />
<br />
;[http://en.wikibooks.org/wiki/Haskell Haskell Wikibook] <br />
:A communal effort by several authors to produce the definitive Haskell textbook. It's very much a work in progress at the moment, and contributions are welcome. For 6 inch e-Readers/tablet computers, there is [http://commons.wikimedia.org/wiki/File:Haskell_eBook_Reader.pdf a PDF version of the book]. <br />
<br />
;[http://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours Write Yourself a Scheme in 48 Hours in Haskell]<br />
:A Haskell Tutorial, by Jonathan Tang. Most Haskell tutorials on the web seem to take a language-reference-manual approach to teaching. They show you the syntax of the language, a few language constructs, and then have you construct a few simple functions at the interactive prompt. The "hard stuff" of how to write a functioning, useful program is left to the end, or sometimes omitted entirely. This tutorial takes a different tack. You'll start off with command-line arguments and parsing, and progress to writing a fully-functional Scheme interpreter that implements a good-sized subset of R5RS Scheme. Along the way, you'll learn Haskell's I/O, mutable state, dynamic typing, error handling, and parsing features. By the time you finish, you should be fairly fluent in both Haskell and Scheme.<br />
<br />
;[http://acm.wustl.edu/functional/haskell.php How to Learn Haskell]<br />
:Some students at Washington University in St. Louis documented the path they took to learning Haskell and put together a nice meta-tutorial to guide beginners through some of the available resources. Experienced programmers looking for some quick code examples may be interested in their [http://acm.wustl.edu/functional/hs-breads.php breadcrumbs].<br />
<br />
;[http://ohaskell.dshevchenko.biz/ О Haskell по-человечески]<br />
:About Haskell from a beginner for beginners. Not an academical, but practical tutorial. Written by Denis Shevchenko in Russian.<br />
<br />
=== Other tutorials ===<br />
<br />
;[http://dev.stephendiehl.com/hask/ What I wish I knew when learning Haskell] :By Stephen Diehl. Does what it says on the tin. See [http://www.reddit.com/r/haskell/comments/23srcm/what_i_wish_i_knew_when_learning_haskell_20/ Reddit appreciation]<br />
<br />
;[http://channel9.msdn.com/Series/C9-Lectures-Erik-Meijer-Functional-Programming-Fundamentals C9 Lectures: Erik Meijer - Functional Programming Fundamentals]<br />
:A set of videos of lectures by Erik Meijer<br />
<br />
;[http://www.yellosoft.us/evilgenius/ Haskell for the Evil Genius] :By Andrew Pennebaker. An overview of how functional and declarative programming can increase the accuracy and efficiency of digital superweapons, empowering evil geniuses in their supreme goal of taking over the world.<br />
<br />
;[http://www.yellosoft.us/parallel-processing-with-haskell Parallel Processing with Haskell] :By Andrew Pennebaker. A short, accelerated introduction to Haskell for coding parallel programs.<br />
<br />
;[http://www.yellosoft.us/getoptfu GetOptFu] :By Andrew Pennebaker. A guide to robust command line argument parsing in Haskell. Available online in HTML, and offline in ePUB and MOBI formats.<br />
<br />
;[http://www.haskell.org/tutorial/ A [[Gentle]] Introduction to Haskell] :By Paul Hudak, John Peterson, and Joseph H. Fasel. The title is misleading. Some knowledge of another functional programming language is expected. The emphasis is on the type system and those features which are really new in Haskell (compared to other functional programming languages). A classic, but not for the faint of heart (it's not so gentle). Also available in [http://www.haskell.org/wikiupload//5/5e/GentleFR.pdf French] [http://gorgonite.developpez.com/livres/traductions/haskell/gentle-haskell/ from this website] and also [http://www.rsdn.ru/article/haskell/haskell_part1.xml in Russian]. <br />
<br />
;[[H-99: Ninety-Nine Haskell Problems]]<br />
:A collection of programming puzzles, with Haskell solutions. Solving these is a great way to get into Haskell programming.<br />
<br />
;[http://shuklan.com/haskell Undergraduate Haskell Lectures from the University of Virginia] <br />
:An introductory set of slides full of example code for an undergraduate course in Haskell. Topics include basic list manipulations, higher order functions, cabal, the IO Monad, and Category Theory.<br />
<br />
;[[Haskell Tutorial for C Programmers]]<br />
:By Eric Etheridge. From the intro: "This tutorial assumes that the reader is familiar with C/C++, Python, Java, or Pascal. I am writing for you because it seems that no other tutorial was written to help students overcome the difficulty of moving from C/C++, Java, and the like to Haskell."<br />
<br />
;[http://www.ibm.com/developerworks/linux/tutorials/l-hask/ Beginning Haskell] <br />
:From IBM developerWorks. This tutorial targets programmers of imperative languages wanting to learn about functional programming in the language Haskell. If you have programmed in languages such as C, Pascal, Fortran, C++, Java, Cobol, Ada, Perl, TCL, REXX, JavaScript, Visual Basic, or many others, you have been using an imperative paradigm. This tutorial provides a gentle introduction to the paradigm of functional programming, with specific illustrations in the Haskell 98 language. (Free registration required.)<br />
<br />
;[http://www.cse.chalmers.se/~rjmh/tutorials.html Tutorial Papers in Functional Programming].<br />
:A collection of links to other Haskell tutorials, from John Hughes.<br />
<br />
;[http://www.cs.ou.edu/~rlpage/fpclassCurrent/textbook/haskell.shtml Two Dozen Short Lessons in Haskell] <br />
:By Rex Page. A draft of a textbook on functional programming, available by ftp. It calls for active participation from readers by omitting material at certain points and asking the reader to attempt to fill in the missing information based on knowledge they have already acquired. The missing information is then supplied on the reverse side of the page. <br />
<br />
;[ftp://ftp.geoinfo.tuwien.ac.at/navratil/HaskellTutorial.pdf Haskell-Tutorial] <br />
:By Damir Medak and Gerhard Navratil. The fundamentals of functional languages for beginners. <br />
<br />
;[http://video.s-inf.de/#FP.2005-SS-Giesl.(COt).HD_Videoaufzeichnung Video Lectures] <br />
:Lectures (in English) by Jürgen Giesl. About 30 hours in total, and great for learning Haskell. The lectures are 2005-SS-FP.V01 through 2005-SS-FP.V26. Videos 2005-SS-FP.U01 through 2005-SS-FP.U11 are exercise answer sessions, so you probably don't want those.<br />
<br />
;[http://www.cs.utoronto.ca/~trebla/fp/ Albert's Functional Programming Course] <br />
:A 15 lesson introduction to most aspects of Haskell.<br />
<br />
;[http://www.iceteks.com/articles.php/haskell/1 Introduction to Haskell]<br />
:By Chris Dutton, An "attempt to bring the ideas of functional programming to the masses here, and an experiment in finding ways to make it easy and interesting to follow".<br />
<br />
;[http://www.csc.depauw.edu/~bhoward/courses/0203Spring/csc122/haskintro/ An Introduction to Haskell]<br />
:A brief introduction, by Brian Howard.<br />
<br />
;[http://www.linuxjournal.com/article/9096 Translating Haskell into English]<br />
:By Shannon Behrens, a glimpse of the Zen of Haskell, without requiring that they already be Haskell converts.<br />
<br />
;[http://www.shlomifish.org/lecture/Perl/Haskell/slides/ Haskell for Perl Programmers]<br />
:Brief introduction to Haskell, with a view to what perl programmers are interested in<br />
<br />
;[http://lisperati.com/haskell/ How To Organize a Picnic on a Computer]<br />
:Fun introduction to Haskell, step by step building of a program to seat people at a planned picnic, based on their similarities using data from a survey and a map of the picnic location.<br />
<br />
;[http://cs.wallawalla.edu/research/KU/PR/Haskell.html Haskell Tutorial]<br />
<br />
;[http://www.lisperati.com/haskell/ Conrad Barski's Haskell tutorial .. with robots]<br />
<br />
;[[Media:Introduction.pdf|Frederick Ross's Haskell introduction]]<br />
<br />
;[http://de.wikibooks.org/wiki/Haskell Dirk's Haskell Tutorial]<br />
:in German for beginners by a beginner. Not so deep, but with a lot examples with very small steps.<br />
<br />
;[http://www.crsr.net/Programming_Languages/SoftwareTools/index.html Software Tools in Haskell]<br />
:A tutorial for advanced readers<br />
<br />
;[http://learn.hfm.io/ Learning Haskell]<br />
:A comprehensive introduction to Haskell that combines text with screencasts. No previous knowledge of functional programming is required. The tutorial is still work in progress with additional chapters being added over time.<br />
<br />
See also the discussion [http://www.reddit.com/r/haskell/comments/2blsqa/papers_every_haskeller_should_read/ Papers every haskeller should read].<br />
<br />
== Motivation for using Haskell ==<br />
<br />
;[http://www.cse.chalmers.se/~rjmh/Papers/whyfp.html Why Functional Programming Matters] <br />
:By [http://www.cse.chalmers.se/~rjmh/ John Hughes], The Computer Journal, Vol. 32, No. 2, 1989, pp. 98 - 107. Also in: David A. Turner (ed.): Research Topics in Functional Programming, Addison-Wesley, 1990, pp. 17 - 42.<BR> Exposes the advantages of functional programming languages. Demonstrates how higher-order functions and lazy evaluation enable new forms of modularization of programs.<br />
<br />
;[[Why Haskell matters]] <br />
:Discussion of the advantages of using Haskell in particular. An excellent article.<br />
<br />
;[http://www.youtube.com/watch?v=Fqi0Xu2Enaw Haskell Introduction]<br />
:A video from FP Complete<br />
<br />
;[http://www.cs.kent.ac.uk/pubs/1997/224/index.html Higher-order + Polymorphic = Reusable] <br />
:By [http://www.cs.kent.ac.uk/people/staff/sjt/index.html Simon Thompson]. Unpublished, May 1997.<BR> <STRONG>Abstract:</STRONG> This paper explores how certain ideas in object oriented languages have their correspondents in functional languages. In particular we look at the analogue of the iterators of the C++ standard template library. We also give an example of the use of constructor classes which feature in Haskell 1.3 and Gofer.<br />
<br />
;[http://www.ibm.com/developerworks/java/library/j-cb07186/index.html Explore functional programming with Haskell]<br />
:Introduction to the benefits of functional programming in Haskell by Bruce Tate.<br />
<br />
== Blog articles ==<br />
<br />
There are a large number of tutorials covering diverse Haskell topics<br />
published as blogs. Some of the best of these articles are collected<br />
here:<br />
<br />
;[[Blog articles]]<br />
<br />
==Practical Haskell==<br />
<br />
These tutorials examine using Haskell to writing complex real-world applications<br />
<br />
;[http://research.microsoft.com/en-us/um/people/simonpj/Papers/marktoberdorf/ Tackling the awkward squad: monadic input/output, concurrency, exceptions, and foreign-language calls in Haskell]<br />
:Simon Peyton Jones. Presented at the 2000 Marktoberdorf Summer School. In "Engineering theories of software construction", ed Tony Hoare, Manfred Broy, Ralf Steinbruggen, IOS Press, ISBN 1-58603-1724, 2001, pp47-96. The standard reference for monadic IO in GHC/Haskell. <br><strong>Abstract:</strong>Functional programming may be beautiful, but to write real applications we must grapple with awkward real-world issues: input/output, robustness, concurrency, and interfacing to programs written in other languages.<br />
<br />
;[[Hitchhikers Guide to the Haskell]]<br />
: Tutorial for C/Java/OCaml/... programers by Dmitry Astapov. From the intro: "This text intends to introduce the reader to the practical aspects of Haskell from the very beginning (plans for the first chapters include: I/O, darcs, Parsec, QuickCheck, profiling and debugging, to mention a few)".<br />
<br />
;[http://www.haskell.org/haskellwiki/IO_inside Haskell I/O inside: Down the Rabbit's Hole]<br />
:By Bulat Ziganshin (2006), a comprehensive tutorial on using IO monad.<br />
<br />
;[http://web.archive.org/web/20060622030538/http://www.reid-consulting-uk.ltd.uk/docs/ffi.html A Guide to Haskell's Foreign Function Interface]<br />
:A guide to using the foreign function interface extension, using the rich set of functions in the Foreign libraries, design issues, and FFI preprocessors.<br />
<br />
;[[Haskell IO for Imperative Programmers]]<br />
:A short introduction to IO from the perspective of an imperative programmer.<br />
<br />
;[[A brief introduction to Haskell|A Brief Introduction to Haskell]]<br />
:A translation of the article, [http://www.cs.jhu.edu/~scott/pl/lectures/caml-intro.html Introduction to OCaml], to Haskell.<br />
<br />
;[[Roll your own IRC bot]]<br />
:This tutorial is designed as a practical guide to writing real world code in Haskell and hopes to intuitively motivate and introduce some of the advanced features of Haskell to the novice programmer, including monad transformers. Our goal is to write a concise, robust and elegant IRC bot in Haskell.<br />
<br />
;[http://projects.haskell.org/gtk2hs/docs/tutorial/glade/ Glade Tutorial (GUI Programming)]<br />
:For the absolute beginner in both Glade and Gtk2Hs. Covers the basics of Glade and how to access a .glade file and widgets in Gtk2Hs. Estimated learning time: 2 hours.<br />
;[http://www.muitovar.com/glade/es-index.html Tutorial de Glade]<br />
:A Spanish translation of the Glade tutorial<br />
<br />
;[http://www.muitovar.com/gtk2hs/index.html Gtk2Hs Tutorial]<br />
: An extensive [[Gtk2Hs]] programming guide, based on the GTK+2.0 tutorial by Tony Gale and Ian Main. This tutorial on GUI programming with Gtk2Hs has 22 chapters in 7 sections, plus an appendix on starting drawing with Cairo. A Spanish translation and source code of the examples are also available.<br />
<br />
;Applications of Functional Programming<br />
:Colin Runciman and David Wakeling (ed.), UCL Press, 1995, ISBN 1-85728-377-5 HB. From the cover:<blockquote>This book is unique in showcasing real, non-trivial applications of functional programming using the Haskell language. It presents state-of-the-art work from the FLARE project and will be an invaluable resource for advanced study, research and implementation.</blockquote><br />
<br />
;[[DealingWithBinaryData]] a guide to ByteStrings, the various <tt>Get</tt> monads and the <tt>Put</tt> monad.<br />
<br />
;[[Internationalization of Haskell programs]]<br />
:Short tutorial on how to use GNU gettext utility to make applications, written on Haskell, multilingual.<br />
<br />
===Testing===<br />
<br />
;[http://blog.moertel.com/articles/2006/10/31/introductory-haskell-solving-the-sorting-it-out-kata Small overview of QuickCheck]<br />
<br />
;[[Introduction to QuickCheck]]<br />
<br />
==Reference material==<br />
<br />
;[http://www.haskell.org/haskellwiki/Category:Tutorials A growing list of Haskell tutorials on a diverse range of topics]<br />
:Available on this wiki<br />
<br />
;[http://www.haskell.org/haskellwiki/Category:How_to "How to"-style tutorials and information]<br />
<br />
;[http://zvon.org/other/haskell/Outputglobal/index.html Haskell Reference] <br />
:By Miloslav Nic.<br />
<br />
;[http://members.chello.nl/hjgtuyl/tourdemonad.html A tour of the Haskell Monad functions]<br />
:By Henk-Jan van Tuyl.<br />
<br />
;[http://www.cse.unsw.edu.au/~en1000/haskell/inbuilt.html Useful Haskell functions]<br />
:An explanation for beginners of many Haskell functions that are predefined in the Haskell Prelude.<br />
<br />
;[http://www.haskell.org/ghc/docs/latest/html/libraries/ Documentation for the standard libraries]<br />
:Complete documentation of the standard Haskell libraries.<br />
<br />
;[http://www.haskell.org/haskellwiki/Category:Idioms Haskell idioms]<br />
:A collection of articles describing some common Haskell idioms. Often quite advanced.<br />
<br />
;[http://www.haskell.org/haskellwiki/Blow_your_mind Useful idioms]<br />
:A collection of short, useful Haskell idioms.<br />
<br />
;[http://www.haskell.org/haskellwiki/Programming_guidelines Programming guidelines]<br />
:Some Haskell programming and style conventions.<br />
<br />
;[http://www.cse.chalmers.se/~rjmh/Combinators/LightningTour/index.htm Lightning Tour of Haskell]<br />
:By John Hughes, as part of a Chalmers programming course<br />
<br />
;[http://vmg.pp.ua/books/КопьютерыИсети/_ИХТИК31G/single/Hall%20C.The%20little%20Haskeller.pdf The Little Haskeller] <br />
:By Cordelia Hall and John Hughes. 9. November 1993, 26 pages. An introduction using the Chalmers Haskell B interpreter (hbi). Beware that it relies very much on the user interface of hbi which is quite different for other Haskell systems, and the tutorials cover Haskell 1.2 , not Haskell 98.<br />
<br />
;[http://www.staff.science.uu.nl/~fokke101/courses/fp-eng.pdf Functional Programming]<br />
:By Jeroen Fokker, 1995. (153 pages, 600 KB). Textbook for learning functional programming with Gofer (an older implementation of Haskell). Here without Chapters&nbsp;6 and&nbsp;7.<br />
<br />
== Comparisons to other languages ==<br />
<br />
Articles contrasting feature of Haskell with other languages.<br />
<br />
;[http://programming.reddit.com/goto?id=nq1k Haskell versus Scheme]<br />
:Mark C. Chu-Carroll, Haskell and Scheme: Which One and Why?<br />
<br />
;[http://wiki.python.org/moin/PythonVsHaskell Comparing Haskell and Python]<br />
:A short overview of similarities and differences between Haskell and Python.<br />
<br />
;[http://programming.reddit.com/goto?id=nwm2 Monads in OCaml]<br />
:Syntax extension for monads in OCaml<br />
<br />
;[http://www.shlomifish.org/lecture/Perl/Haskell/slides/ Haskell for Perl programmers]<br />
:Short intro for perlers<br />
<br />
;[[A_brief_introduction_to_Haskell|Introduction to Haskell]] versus [http://www.cs.jhu.edu/~scott/pl/lectures/caml-intro.html Introduction to OCaml].<br />
<br />
;[http://www.thaiopensource.com/relaxng/derivative.html An algorithm for RELAX NG validation]<br />
:by James Clark (of RELAX NG fame). Describes an algorithm for validating an XML document against a RELAX NG schema, uses Haskell to describe the algorithm. The algorithm in Haskell and Java is then [http://www.donhopkins.com/drupal/node/117 discussed here].<br />
<br />
;[http://blog.prb.io/first-steps-with-haskell-for-web-applications.html Haskell + FastCGI versus Ruby on Rails]<br />
:A short blog entry documenting performance results with ruby on rails and Haskell with fastcgi<br />
<br />
;[http://haskell.cs.yale.edu/wp-content/uploads/2011/03/HaskellVsAda-NSWC.pdf Haskell vs. Ada vs. C++ vs. Awk vs. ..., An Experiment in Software Prototyping Productivity] (PDF)<br />
:Paul Hudak and Mark P. Jones, 16 pages.<blockquote>Description of the results of an experiment in which several conventional programming languages, together with the functional language Haskell, were used to prototype a Naval Surface Warfare Center requirement for Geometric Region Servers. The resulting programs and development metrics were reviewed by a committee chosen by the US Navy. The results indicate that the Haskell prototype took significantly less time to develop and was considerably more concise and easier to understand than the corresponding prototypes written in several different imperative languages, including Ada and C++. </blockquote> <br />
<br />
;[http://www.osl.iu.edu/publications/prints/2003/comparing_generic_programming03.pdf A Comparative Study of Language Support for Generic Programming] (pdf)<br />
:Ronald Garcia, Jaakko Jrvi, Andrew Lumsdaine, Jeremy G. Siek, and Jeremiah Willcock. In Proceedings of the 2003 ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications (OOPSLA'03), October 2003.<blockquote>An interesting comparison of generic programming support across languages, including: Haskell, SML, C++, Java, C#. Haskell supports all constructs described in the paper -- the only language to do so. </blockquote><br />
<br />
;[http://homepages.inf.ed.ac.uk/wadler/realworld/index.html Functional Programming in the Real World]<br />
:A list of functional programs applied to real-world tasks. The main criterion for being real-world is that the program was written primarily to perform some task, not primarily to experiment with functional programming. Functional is used in the broad sense that includes both `pure' programs (no side effects) and `impure' (some use of side effects). Languages covered include CAML, Clean, Erlang, Haskell, Miranda, Scheme, SML, and others.<br />
<br />
;[http://www.defmacro.org/ramblings/lisp-in-haskell.html Lisp in Haskell]<br />
:Writing A Lisp Interpreter In Haskell, a tutorial<br />
<br />
;[http://bendyworks.com/geekville/articles/2012/12/from-ruby-to-haskell-part-1-testing From Ruby to Haskell, Part 1: Testing]<br />
:A quick comparison between ruby's and haskell's BDD.<br />
<br />
== Teaching Haskell ==<br />
<br />
;[http://www.cs.kent.ac.uk/pubs/1997/208/index.html Where do I begin? A problem solving approach to teaching functional programming]<br />
:By [http://www.cs.kent.ac.uk/people/staff/sjt/index.html Simon Thompson]. In Krzysztof Apt, Pieter Hartel, and Paul Klint, editors, First International Conference on Declarative Programming Languages in Education. Springer-Verlag, September 1997. <br> <STRONG>Abstract:</STRONG> This paper introduces a problem solving method for teaching functional programming, based on Polya's `How To Solve It', an introductory investigation of mathematical method. We first present the language independent version, and then show in particular how it applies to the development of programs in Haskell. The method is illustrated by a sequence of examples and a larger case study. <br />
<br />
;[http://www.cs.kent.ac.uk/pubs/1995/214/index.html Functional programming through the curriculum]<br />
:By [http://www.cs.kent.ac.uk/people/staff/sjt/index.html Simon Thompson] and Steve Hill. In Pieter H. Hartel and Rinus Plasmeijer, editors, Functional Programming Languages in Education, LNCS 1022, pages 85-102. Springer-Verlag, December 1995. <br> <STRONG>Abstract:</STRONG> This paper discusses our experience in using a functional language in topics across the computer science curriculum. After examining the arguments for taking a functional approach, we look in detail at four case studies from different areas: programming language semantics, machine architectures, graphics and formal languages. <br />
<br />
;[http://www.cse.unsw.edu.au/~chak/papers/CK02a.html The Risks and Benefits of Teaching Purely Functional Programming in First Year]<br />
:By [http://www.cse.unsw.edu.au/~chak/ Manuel M. T. Chakravarty] and [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]. Journal of Functional Programming 14(1), pp 113-123, 2004. An earlier version of this paper was presented at Functional and Declarative Programming in Education (FDPE02). <br> <strong>Abstract</strong> We argue that teaching purely functional programming as such in freshman courses is detrimental to both the curriculum as well as to promoting the paradigm. Instead, we need to focus on the more general aims of teaching elementary techniques of programming and essential concepts of computing. We support this viewpoint with experience gained during several semesters of teaching large first-year classes (up to 600 students) in Haskell. These classes consisted of computer science students as well as students from other disciplines. We have systematically gathered student feedback by conducting surveys after each semester. This article contributes an approach to the use of modern functional languages in first year courses and, based on this, advocates the use of functional languages in this setting.<br />
<br />
<br />
==Using monads==<br />
<br />
;[http://www.haskell.org/wikiupload/c/c6/ICMI45-paper-en.pdf How to build a monadic interpreter in one day] (PDF)<br />
:By Dan Popa. A small tutorial on how to build a language in one day, using the Parser Monad in the front end and a monad with state and I/O string in the back end. Read it if you are interested in learning: <br />
:# language construction and <br />
:# interpreter construction<br />
<br />
;[[Monad Transformers Explained]]<br />
<br />
;[[MonadCont under the hood]]<br />
:A detailed description of the ''Cont'' data type and its monadic operations, including the class ''MonadCont''.<br />
<br />
;[http://en.wikipedia.org/wiki/Monads_in_functional_programming Article on monads on Wikipedia]<br />
<br />
;[[IO inside]] page<br />
:Explains why I/O in Haskell is implemented with a monad.<br />
<br />
;[http://stefan-klinger.de/files/monadGuide.pdf The Haskell Programmer's Guide to the IO Monad - Don't Panic.] <br />
:By Stefan Klinger. This report scratches the surface of category theory, an abstract branch of algebra, just deep enough to find the monad structure. It seems well written.<br />
<br />
;[https://karczmarczuk.users.greyc.fr/TEACH/Doc/monads.html Systematic Design of Monads]<br />
:By John Hughes and Magnus Carlsson. Many useful monads can be designed in a systematic way, by successively adding facilities to a trivial monad. The capabilities that can be added in this way include state, exceptions, backtracking, and output. Here we give a brief description of the trivial monad, each kind of extension, and sketches of some interesting operations that each monad supports.<br />
<br />
;[[Simple monad examples]]<br />
<br />
<br />
See also: <br />
<br />
* the [[Monad]] HaskellWiki page<br />
* [[Research papers/Monads and arrows]].<br />
* [[Blog articles#Monads |Blog articles]]<br />
* [[Monad tutorials timeline]]<br />
<br />
===Tutorials===<br />
<br />
''The comprehensive list is available at [[Monad tutorials timeline]].''<br />
<br />
;[http://mvanier.livejournal.com/3917.html Mike Vanier's monad tutorial]<br />
:Recommended by David Balaban.<br />
<br />
;[[All About Monads]], [http://www.sampou.org/haskell/a-a-monads/html/index.html モナドのすべて]<br />
:By Jeff Newbern. This tutorial aims to explain the concept of a monad and its application to functional programming in a way that is easy to understand and useful to beginning and intermediate Haskell programmers. Familiarity with the Haskell language is assumed, but no prior experience with monads is required. <br />
<br />
;[[Monads as computation]]<br />
:A tutorial which gives a broad overview to motivate the use of monads as an abstraction in functional programming and describe their basic features. It makes an attempt at showing why they arise naturally from some basic premises about the design of a library.<br />
<br />
;[[Monads as containers]]<br />
:A tutorial describing monads from a rather different perspective: as an abstraction of container-types, rather than an abstraction of types of computation.<br />
<br />
;[http://www.grabmueller.de/martin/www/pub/Transformers.en.html Monad Transformers Step by Step]<br />
:By Martin Grabm&uuml;ller. A small tutorial on using monad transformers. In contrast to others found on the web, it concentrates on using them, not on their implementation.<br />
<br />
;[[What a Monad is not]]<br />
<br />
;[http://noordering.wordpress.com/2009/03/31/how-you-shouldnt-use-monad/ How you should(n’t) use Monad]<br />
<br />
;[http://www-users.mat.uni.torun.pl/~fly/materialy/fp/haskell-doc/Monads.html What the hell are Monads?] <br />
:By Noel Winstanley. A basic introduction to monads, monadic programming and IO. This introduction is presented by means of examples rather than theory, and assumes a little knowledge of Haskell. <br />
<br />
;[http://www.engr.mun.ca/~theo/Misc/haskell_and_monads.htm Monads for the Working Haskell Programmer -- a short tutorial]<br />
:By Theodore Norvell. <br />
<br />
;[http://blog.sigfpe.com/2006/08/you-could-have-invented-monads-and.html You Could Have Invented Monads! (And Maybe You Already Have.)]<br />
:A short tutorial on monads, introduced from a pragmatic approach, with less category theory references <br />
<br />
;[[Meet Bob The Monadic Lover]]<br />
:By Andrea Rossato. A humorous and short introduction to Monads, with code but without any reference to category theory: what monads look like and what they are useful for, from the perspective of a ... lover. (There is also the slightly more serious [[The Monadic Way]] by the same author.)<br />
<br />
;[http://www.haskell.org/pipermail/haskell-cafe/2006-November/019190.html Monstrous Monads]<br />
:Andrew Pimlott's humourous introduction to monads, using the metaphor of "monsters".<br />
<br />
;[http://strabismicgobbledygook.wordpress.com/2010/03/06/a-state-monad-tutorial/ A State Monad Tutorial]<br />
:A detailed tutorial with simple but practical examples.<br />
<br />
;[http://www.reddit.com/r/programming/comments/ox6s/ask_reddit_what_the_hell_are_monads/coxiv Ask Reddit: What the hell are monads? answer by tmoertel] and [http://programming.reddit.com/info/ox6s/comments/coxoh dons].<br />
<br />
;[[The Monadic Way]]<br />
<br />
;[http://www.alpheccar.org/content/60.html Three kind of monads] : sequencing, side effects or containers<br />
<br />
;[http://www.muitovar.com/monad/moncow.html The Greenhorn's Guide to becoming a Monad Cowboy]<br />
:Covers basics, with simple examples, in a ''for dummies'' style. Includes monad transformers and monadic functions. Estimated learning time 2-3 days.<br />
<br />
;[http://ertes.de/articles/monads.html Understanding Haskell Monads]<br />
<br />
;[http://www.reddit.com/r/programming/comments/64th1/monads_in_python_in_production_code_you_can_and/c02u9mb An explanation by 808140]<br />
<br />
==Workshops on advanced functional programming==<br />
<br />
;[http://compilers.iecc.com/comparch/article/95-04-024 Advanced Functional Programming: 1st International Spring School on Advanced Functional Programming Techniques], Bastad, Sweden, May 24 - 30, 1995. Tutorial Text (Lecture Notes in Computer Science) <br />
<br />
;[http://alfa.di.uminho.pt/~afp98/ Advanced Functional Programming: 3rd International School], AFP'98, Braga, Portugal, September 12-19, 1998, Revised Lectures (Lecture Notes in Computer Science) <br />
<br />
;[http://www.staff.science.uu.nl/~jeuri101/afp/afp4/ Advanced Functional Programming: 4th International School], AFP 2002, Oxford, UK, August 19-24, 2002, Revised Lectures (Lecture Notes in Computer Science) <br />
<br />
;[http://www.cs.ut.ee/afp04/ Advanced Functional Programming: 5th International School], AFP 2004, Tartu, Estonia, August 14-21, 2004, Revised Lectures (Lecture Notes in Computer Science) <br />
<br />
More advanced materials available from the [[Conferences|conference proceedings]], and the [[Research papers]] collection.<br />
<br />
<br />
[[Category:Tutorials]]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2012-02-10T01:34:13Z<p>Chak: /* Overview */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
<center><br />
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png<br />
</center><br />
<br />
''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''<br />
<br />
__TOC__<br />
<br />
<br />
<br />
<br />
=== Project status ===<br />
<br />
Data Parallel Haskell (DPH) is available as an add-on for [http://haskell.org/ghc/download_ghc_7_4_1 GHC 7.4] in the form of a few separate cabal package. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude for vectorised code, and a lack of optimisations (leading to poor performance in some cases).<br />
<br />
The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed or no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code.<br />
<br />
DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://repa.ouroborus.net/ Repa], which builds on the parallel array infrastructure of DPH.<br />
<br />
'''Note:''' This page describes version 0.6.* of the DPH libraries. We only support this version of DPH as well as the current development version.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
To get DPH, install [http://haskell.org/ghc/download_ghc_7_4_1 GHC 7.4] and then install the DPH libraries with <code>cabal install</code> as follows:<br />
<blockquote><br />
<code>$ cabal update</code><br><br />
<code>$ cabal install dph-examples</code><br />
</blockquote><br />
This will install all DPH packages, including a set of simple examples, see [http://hackage.haskell.org/package/dph-examples dph-examples]. (The package [http://hackage.haskell.org/package/dph-examples dph-examples] does depend on OpenGL and Gloss as both are used in a visualiser for the nobody example.)<br />
<br />
'''WARNING:''' The vanilla GHC distribution does '''not''' include <code>cabal install</code>. This is in contrast to the Haskell Platform, which does include <code>cabal install</code>. If you want to avoid installing the <code>cabal-intstall</code> package and its dependencies explicitly, simply install GHC 7.4.1 in addition to your current Haskell Platform installation. (How to do that depends on your platform and personal preferences. One option is to install a bindist into your home directory with symbolic links to the binaries including the version number.) Then, install DPH with the following command:<br />
<blockquote><br />
<code>cabal install --with-compiler=`which ghc-7.4.1` --with-hc-pkg=`which ghc-pkg-7.4.1` dph-examples</code><br />
</blockquote><br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by using a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE ParallelArrays #-}<br />
{-# OPTIONS_GHC -fvectorise #-}<br />
<br />
module DotP (dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:<br />
<haskell><br />
import Data.Array.Parallel<br />
import Data.Array.Parallel.PArray (PArray, fromList)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= let v = fromList [1..10] -- convert lists...<br />
w = fromList [1,2..20] -- ...to parallel arrays<br />
result = dotp_wrapper v w -- invoke vectorised code<br />
in<br />
print result -- print the result<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par Main.hs</code><br />
</blockquote><br />
and finally link the two modules into an executable <code>dotp</code> with<br />
<blockquote><br />
<code>ghc -o dotp -threaded -fdph-par -rtsopts DotP.o Main.o</code><br />
</blockquote><br />
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend. We include <code>-rtsopts</code> to be able to explicitly determine the number of OS threads used to execute our code.<br />
<br />
==== Generating input data ====<br />
<br />
To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile and link the program as described above.<br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose. For a variant of the dot-product example code that determines the CPU time consumed by <hask>dotp_wrapper</hask>, see [[GHC/Data Parallel Haskell/MainTimed|timed dot product]].<br />
<br />
==== Parallel execution ====<br />
<br />
By default, a Haskell program uses only one OS thread, and hence, also only one CPU core for execution. To use multiple cores, we need to invoke the executable with an explicit RTS command line option — e.g., <code>./dotp +RTS -N2</code> uses two cores. (Strictly speaking, it uses two OS threads, which will be scheduled on two separate cores if available.) To determine the runtime of parallel code, measuring CPU time, as demonstrated in the [[GHC/Data Parallel Haskell/MainTimed|timed variant of the dot product example]], is not sufficient anymore. We need to measure wall clock times instead. For proper benchmarking, it is advisable to use a library, such as [http://hackage.haskell.org/package/criterion criterion].<br />
<br />
A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with just one core and to move to multiple cores only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is often bad as GHC's runtime makes no effort at optimising placement.<br />
<br />
=== Further examples and documentation ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/packages/dph/dph-examples/ examples directory of the package dph source]. This code also includes reference implementations for some of the example that are useful for benchmarking. <br />
<br />
The interfaces of the various components of the DPH library are in the [http://hackage.haskell.org/package/dph-par library documentation] on Hackage.<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2012-02-06T03:57:58Z<p>Chak: /* Project status */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
<center><br />
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png<br />
</center><br />
<br />
''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''<br />
<br />
__TOC__<br />
<br />
<br />
<br />
<br />
=== Project status ===<br />
<br />
Data Parallel Haskell (DPH) is available as an add-on for [http://haskell.org/ghc/download_ghc_7_4_1 GHC 7.4] in the form of a few separate cabal package. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude for vectorised code, and a lack of optimisations (leading to poor performance in some cases).<br />
<br />
The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed or no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code.<br />
<br />
DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://repa.ouroborus.net/ Repa], which builds on the parallel array infrastructure of DPH.<br />
<br />
'''Note:''' This page describes version 0.6.* of the DPH libraries. We only support this version of DPH as well as the current development version.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
To get DPH, install [http://haskell.org/ghc/download_ghc_7_4_1 GHC 7.4] and then install the DPH libraries with <code>cabal install</code> as follows:<br />
<blockquote><br />
<code>$ cabal update</code><br><br />
<code>$ cabal install dph-examples</code><br />
</blockquote><br />
This will install all DPH packages, including a set of simple examples, see [http://hackage.haskell.org/package/dph-examples dph-examples]. (The package [http://hackage.haskell.org/package/dph-examples dph-examples] does depend on OpenGL and Gloss as both are used in a visualiser for the nobody example.)<br />
<br />
'''WARNING:''' The vanilla GHC distribution does '''not''' include <code>cabal install</code>. This is in contrast to the Haskell Platform, which does include <code>cabal install</code>. If you want to avoid installing the <code>cabal-intstall</code> package and its dependencies explicitly, simply install GHC 7.4.1 in addition to your current Haskell Platform installation. (How to do that depends on your platform and personal preferences. One option is to install a bindist into your home directory with symbolic links to the binaries including the version number.) Then, install DPH with the following command:<br />
<blockquote><br />
<code>cabal install --with-compiler=`which ghc-7.4.1` --with-hc-pkg=`which ghc-pkg-7.4.1` dph-examples</code><br />
</blockquote><br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE ParallelArrays #-}<br />
{-# OPTIONS_GHC -fvectorise #-}<br />
<br />
module DotP (dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:<br />
<haskell><br />
import Data.Array.Parallel<br />
import Data.Array.Parallel.PArray (PArray, fromList)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= let v = fromList [1..10] -- convert lists...<br />
w = fromList [1,2..20] -- ...to parallel arrays<br />
result = dotp_wrapper v w -- invoke vectorised code<br />
in<br />
print result -- print the result<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par Main.hs</code><br />
</blockquote><br />
and finally link the two modules into an executable <code>dotp</code> with<br />
<blockquote><br />
<code>ghc -o dotp -threaded -fdph-par -rtsopts DotP.o Main.o</code><br />
</blockquote><br />
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend. We include <code>-rtsopts</code> to be able to explicitly determine the number of OS threads used to execute our code.<br />
<br />
==== Generating input data ====<br />
<br />
To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile and link the program as described above.<br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose. For a variant of the dot-product example code that determines the CPU time consumed by <hask>dotp_wrapper</hask>, see [[GHC/Data Parallel Haskell/MainTimed|timed dot product]].<br />
<br />
==== Parallel execution ====<br />
<br />
By default, a Haskell program uses only one OS thread, and hence, also only one CPU core for execution. To use multiple cores, we need to invoke the executable with an explicit RTS command line option — e.g., <code>./dotp +RTS -N2</code> uses two cores. (Strictly speaking, it uses two OS threads, which will be scheduled on two separate cores if available.) To determine the runtime of parallel code, measuring CPU time, as demonstrated in the [[GHC/Data Parallel Haskell/MainTimed|timed variant of the dot product example]], is not sufficient anymore. We need to measure wall clock times instead. For proper benchmarking, it is advisable to use a library, such as [http://hackage.haskell.org/package/criterion criterion].<br />
<br />
A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with just one core and to move to multiple cores only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is often bad as GHC's runtime makes no effort at optimising placement.<br />
<br />
=== Further examples and documentation ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/packages/dph/dph-examples/ examples directory of the package dph source]. This code also includes reference implementations for some of the example that are useful for benchmarking. <br />
<br />
The interfaces of the various components of the DPH library are in the [http://hackage.haskell.org/package/dph-par library documentation] on Hackage.<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2012-02-06T03:54:54Z<p>Chak: /* Where to get it */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
<center><br />
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png<br />
</center><br />
<br />
''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''<br />
<br />
__TOC__<br />
<br />
<br />
<br />
<br />
=== Project status ===<br />
<br />
Data Parallel Haskell (DPH) is available as an add-on for [http://haskell.org/ghc/download_ghc_7_4_1 GHC 7.4] in the form of a few separate cabal package. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude for vectorised code, and a lack of optimisations (leading to poor performance in some cases).<br />
<br />
The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed or no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code.<br />
<br />
DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://repa.ouroborus.net/ Repa], which builds on the parallel array infrastructure of DPH.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
To get DPH, install [http://haskell.org/ghc/download_ghc_7_4_1 GHC 7.4] and then install the DPH libraries with <code>cabal install</code> as follows:<br />
<blockquote><br />
<code>$ cabal update</code><br><br />
<code>$ cabal install dph-examples</code><br />
</blockquote><br />
This will install all DPH packages, including a set of simple examples, see [http://hackage.haskell.org/package/dph-examples dph-examples]. (The package [http://hackage.haskell.org/package/dph-examples dph-examples] does depend on OpenGL and Gloss as both are used in a visualiser for the nobody example.)<br />
<br />
'''WARNING:''' The vanilla GHC distribution does '''not''' include <code>cabal install</code>. This is in contrast to the Haskell Platform, which does include <code>cabal install</code>. If you want to avoid installing the <code>cabal-intstall</code> package and its dependencies explicitly, simply install GHC 7.4.1 in addition to your current Haskell Platform installation. (How to do that depends on your platform and personal preferences. One option is to install a bindist into your home directory with symbolic links to the binaries including the version number.) Then, install DPH with the following command:<br />
<blockquote><br />
<code>cabal install --with-compiler=`which ghc-7.4.1` --with-hc-pkg=`which ghc-pkg-7.4.1` dph-examples</code><br />
</blockquote><br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE ParallelArrays #-}<br />
{-# OPTIONS_GHC -fvectorise #-}<br />
<br />
module DotP (dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:<br />
<haskell><br />
import Data.Array.Parallel<br />
import Data.Array.Parallel.PArray (PArray, fromList)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= let v = fromList [1..10] -- convert lists...<br />
w = fromList [1,2..20] -- ...to parallel arrays<br />
result = dotp_wrapper v w -- invoke vectorised code<br />
in<br />
print result -- print the result<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par Main.hs</code><br />
</blockquote><br />
and finally link the two modules into an executable <code>dotp</code> with<br />
<blockquote><br />
<code>ghc -o dotp -threaded -fdph-par -rtsopts DotP.o Main.o</code><br />
</blockquote><br />
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend. We include <code>-rtsopts</code> to be able to explicitly determine the number of OS threads used to execute our code.<br />
<br />
==== Generating input data ====<br />
<br />
To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile and link the program as described above.<br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose. For a variant of the dot-product example code that determines the CPU time consumed by <hask>dotp_wrapper</hask>, see [[GHC/Data Parallel Haskell/MainTimed|timed dot product]].<br />
<br />
==== Parallel execution ====<br />
<br />
By default, a Haskell program uses only one OS thread, and hence, also only one CPU core for execution. To use multiple cores, we need to invoke the executable with an explicit RTS command line option — e.g., <code>./dotp +RTS -N2</code> uses two cores. (Strictly speaking, it uses two OS threads, which will be scheduled on two separate cores if available.) To determine the runtime of parallel code, measuring CPU time, as demonstrated in the [[GHC/Data Parallel Haskell/MainTimed|timed variant of the dot product example]], is not sufficient anymore. We need to measure wall clock times instead. For proper benchmarking, it is advisable to use a library, such as [http://hackage.haskell.org/package/criterion criterion].<br />
<br />
A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with just one core and to move to multiple cores only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is often bad as GHC's runtime makes no effort at optimising placement.<br />
<br />
=== Further examples and documentation ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/packages/dph/dph-examples/ examples directory of the package dph source]. This code also includes reference implementations for some of the example that are useful for benchmarking. <br />
<br />
The interfaces of the various components of the DPH library are in the [http://hackage.haskell.org/package/dph-par library documentation] on Hackage.<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2012-02-06T03:51:04Z<p>Chak: /* Project status */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
<center><br />
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png<br />
</center><br />
<br />
''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''<br />
<br />
__TOC__<br />
<br />
<br />
<br />
<br />
=== Project status ===<br />
<br />
Data Parallel Haskell (DPH) is available as an add-on for [http://haskell.org/ghc/download_ghc_7_4_1 GHC 7.4] in the form of a few separate cabal package. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude for vectorised code, and a lack of optimisations (leading to poor performance in some cases).<br />
<br />
The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed or no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code.<br />
<br />
DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://repa.ouroborus.net/ Repa], which builds on the parallel array infrastructure of DPH.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
To get DPH, install [http://haskell.org/ghc/download_ghc_7_2_1 GHC 7.2] and then install the DPH libraries with <code>cabal install</code> as follows:<br />
<blockquote><br />
<code>$ cabal update</code><br><br />
<code>$ cabal install dph-examples</code><br />
</blockquote><br />
This will install all DPH packages, including a set of simple examples, see [http://hackage.haskell.org/package/dph-examples dph-examples]. (The package [http://hackage.haskell.org/package/dph-examples dph-examples] does depend on OpenGL and Gloss as both are used in a visualised for the nobody example.)<br />
<br />
'''WARNING:''' The vanilla GHC distribution does '''not''' include <code>cabal install</code>. This is in contrast to the Haskell Platform, which does include <code>cabal install</code>. If you want to avoid installing the <code>cabal-intstall</code> package and its dependencies explicitly, simply install GHC 7.2.1 in addition to your current Haskell Platform installation. (How to do that depends on your platform and personal preferences. One option is to install a bindist into your home directory with symbolic links to the binaries including the version number.) Then, install DPH with the following command:<br />
<blockquote><br />
<code>cabal install --with-compiler=`which ghc-7.2.1` --with-hc-pkg=`which ghc-pkg-7.2.1` dph-examples</code><br />
</blockquote><br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE ParallelArrays #-}<br />
{-# OPTIONS_GHC -fvectorise #-}<br />
<br />
module DotP (dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:<br />
<haskell><br />
import Data.Array.Parallel<br />
import Data.Array.Parallel.PArray (PArray, fromList)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= let v = fromList [1..10] -- convert lists...<br />
w = fromList [1,2..20] -- ...to parallel arrays<br />
result = dotp_wrapper v w -- invoke vectorised code<br />
in<br />
print result -- print the result<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par Main.hs</code><br />
</blockquote><br />
and finally link the two modules into an executable <code>dotp</code> with<br />
<blockquote><br />
<code>ghc -o dotp -threaded -fdph-par -rtsopts DotP.o Main.o</code><br />
</blockquote><br />
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend. We include <code>-rtsopts</code> to be able to explicitly determine the number of OS threads used to execute our code.<br />
<br />
==== Generating input data ====<br />
<br />
To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile and link the program as described above.<br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose. For a variant of the dot-product example code that determines the CPU time consumed by <hask>dotp_wrapper</hask>, see [[GHC/Data Parallel Haskell/MainTimed|timed dot product]].<br />
<br />
==== Parallel execution ====<br />
<br />
By default, a Haskell program uses only one OS thread, and hence, also only one CPU core for execution. To use multiple cores, we need to invoke the executable with an explicit RTS command line option — e.g., <code>./dotp +RTS -N2</code> uses two cores. (Strictly speaking, it uses two OS threads, which will be scheduled on two separate cores if available.) To determine the runtime of parallel code, measuring CPU time, as demonstrated in the [[GHC/Data Parallel Haskell/MainTimed|timed variant of the dot product example]], is not sufficient anymore. We need to measure wall clock times instead. For proper benchmarking, it is advisable to use a library, such as [http://hackage.haskell.org/package/criterion criterion].<br />
<br />
A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with just one core and to move to multiple cores only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is often bad as GHC's runtime makes no effort at optimising placement.<br />
<br />
=== Further examples and documentation ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/packages/dph/dph-examples/ examples directory of the package dph source]. This code also includes reference implementations for some of the example that are useful for benchmarking. <br />
<br />
The interfaces of the various components of the DPH library are in the [http://hackage.haskell.org/package/dph-par library documentation] on Hackage.<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/HakkuTaikai/AttendeesHakkuTaikai/Attendees2011-08-30T00:34:12Z<p>Chak: /* HakkuTaikai Attendees */</p>
<hr />
<div>= HakkuTaikai Attendees =<br />
<br />
The venue is not open to the public on the day of the Hackathon, so we must submit a list of names to security. If you're name is not on the list, you may not be admitted.<br />
<br />
If you do not wish to announce your attendance in public, please email haskathon★liyang.hu instead.<br />
<br />
{| class="wikitable"<br />
!Nickname<br />
!Real Name<br />
!Affiliation<br />
!Mobile<br />
|-<br />
| liyang<br />
| Liyang HU<br />
| Tsuru Capital LLC<br />
| +81 80 4361 1307<br />
|-<br />
| kfish<br />
| [[User:ConradParker|Conrad Parker]]<br />
| Tsuru Capital SG Pte Ltd<br />
| +81 80 4162 1307<br />
|-<br />
| erikde/m3ga<br />
| [[User:Erik de Castro Lopo|Erik de Castro Lopo]]<br />
| bCODE Pty Ltd<br />
| +61 400 912 480<br />
|-<br />
| kazu<br />
| Kazu Yamamoto<br />
| IIJ<br />
| not public<br />
|-<br />
| tibbe<br />
| Johan Tibell<br />
| Google<br />
| not public<br />
|-<br />
| lpeterse<br />
| Lars Petersen<br />
| -<br />
| not public<br />
|-<br />
| TacticalGrace<br />
| [[User:chak|Manuel Chakravarty]]<br />
| University of New South Wales<br />
| not public<br />
|}</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2011-08-11T13:42:46Z<p>Chak: /* Where to get it */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
<center><br />
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png<br />
</center><br />
<br />
''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''<br />
<br />
__TOC__<br />
<br />
<br />
<br />
<br />
=== Project status ===<br />
<br />
Data Parallel Haskell (DPH) is available as an add-on for [http://haskell.org/ghc/download_ghc_7_2_1 GHC 7.2] in the form of a few separate cabal package. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).<br />
<br />
The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used and certain idioms are avoided. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].<br />
<br />
DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
To get DPH, install [http://haskell.org/ghc/download_ghc_7_2_1 GHC 7.2] and then install the DPH libraries with <code>cabal install</code> as follows:<br />
<blockquote><br />
<code>$ cabal update</code><br><br />
<code>$ cabal install dph-examples</code><br />
</blockquote><br />
This will install all DPH packages, including a set of simple examples, see [http://hackage.haskell.org/package/dph-examples dph-examples]. (The package [http://hackage.haskell.org/package/dph-examples dph-examples] does depend on OpenGL and Gloss as both are used in a visualised for the nobody example.)<br />
<br />
'''WARNING:''' The vanilla GHC distribution does '''not''' include <code>cabal install</code>. This is in contrast to the Haskell Platform, which does include <code>cabal install</code>. If you want to avoid installing the <code>cabal-intstall</code> package and its dependencies explicitly, simply install GHC 7.2.1 in addition to your current Haskell Platform installation. (How to do that depends on your platform and personal preferences. One option is to install a bindist into your home directory with symbolic links to the binaries including the version number.) Then, install DPH with the following command:<br />
<blockquote><br />
<code>cabal install --with-compiler=`which ghc-7.2.1` --with-hc-pkg=`which ghc-pkg-7.2.1` dph-examples</code><br />
</blockquote><br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following three import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE ParallelArrays #-}<br />
{-# OPTIONS_GHC -fvectorise #-}<br />
<br />
module DotP (dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:<br />
<haskell><br />
import Data.Array.Parallel.PArray (PArray, fromList)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= let v = fromList [1..10] -- convert lists...<br />
w = fromList [1,2..20] -- ...to parallel arrays<br />
result = dotp_wrapper v w -- invoke vectorised code<br />
in<br />
print result -- print the result<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par Main.hs</code><br />
</blockquote><br />
and finally link the two modules into an executable <code>dotp</code> with<br />
<blockquote><br />
<code>ghc -o dotp -threaded -fdph-par -rtsopts DotP.o Main.o</code><br />
</blockquote><br />
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend. We include <code>-rtsopts</code> to be able to explicitly determine the number of OS threads used to execute our code.<br />
<br />
==== Generating input data ====<br />
<br />
To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile and link the program as described above.<br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose. For a variant of the dot-product example code that determines the CPU time consumed by <hask>dotp_wrapper</hask>, see [[GHC/Data Parallel Haskell/MainTimed|timed dot product]].<br />
<br />
==== Parallel execution ====<br />
<br />
By default, a Haskell program uses only one OS thread, and hence, also only one CPU core for execution. To use multiple cores, we need to invoke the executable with an explicit RTS command line option — e.g., <code>./dotp +RTS -N2</code> uses two cores. (Strictly speaking, it uses two OS threads, which will be scheduled on two separate cores if available.) To determine the runtime of parallel code, measuring CPU time, as demonstrated in the [[GHC/Data Parallel Haskell/MainTimed|timed variant of the dot product example]], is not sufficient anymore. We need to measure wall clock times instead. For proper benchmarking, it is advisable to use a library, such as [http://hackage.haskell.org/package/criterion criterion].<br />
<br />
A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with just one core and to move to multiple cores only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is often bad as GHC's runtime makes no effort at optimising placement.<br />
<br />
=== Further examples and documentation ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/packages/dph/dph-examples/ examples directory of the package dph source]. This code also includes reference implementations for some of the example that are useful for benchmarking. <br />
<br />
The interfaces of the various components of the DPH library are in the [http://hackage.haskell.org/package/dph-par library documentation] on Hackage.<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2011-08-11T12:52:27Z<p>Chak: /* Further examples and documentation */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
<center><br />
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png<br />
</center><br />
<br />
''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''<br />
<br />
__TOC__<br />
<br />
<br />
<br />
<br />
=== Project status ===<br />
<br />
Data Parallel Haskell (DPH) is available as an add-on for [http://haskell.org/ghc/download_ghc_7_2_1 GHC 7.2] in the form of a few separate cabal package. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).<br />
<br />
The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used and certain idioms are avoided. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].<br />
<br />
DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
To get DPH, install [http://haskell.org/ghc/download_ghc_7_2_1 GHC 7.2] and then install the DPH libraries with <code>cabal install</code> as follows:<br />
<blockquote><br />
<code>$ cabal update</code><br><br />
<code>$ cabal install dph-examples</code><br />
</blockquote><br />
This will install all DPH packages, including a set of simple examples, see [http://hackage.haskell.org/package/dph-examples dph-examples]. (The package [http://hackage.haskell.org/package/dph-examples dph-examples] does depend on OpenGL and Gloss as both are used in a visualised for the nobody example.)<br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following three import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE ParallelArrays #-}<br />
{-# OPTIONS_GHC -fvectorise #-}<br />
<br />
module DotP (dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:<br />
<haskell><br />
import Data.Array.Parallel.PArray (PArray, fromList)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= let v = fromList [1..10] -- convert lists...<br />
w = fromList [1,2..20] -- ...to parallel arrays<br />
result = dotp_wrapper v w -- invoke vectorised code<br />
in<br />
print result -- print the result<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par Main.hs</code><br />
</blockquote><br />
and finally link the two modules into an executable <code>dotp</code> with<br />
<blockquote><br />
<code>ghc -o dotp -threaded -fdph-par -rtsopts DotP.o Main.o</code><br />
</blockquote><br />
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend. We include <code>-rtsopts</code> to be able to explicitly determine the number of OS threads used to execute our code.<br />
<br />
==== Generating input data ====<br />
<br />
To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile and link the program as described above.<br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose. For a variant of the dot-product example code that determines the CPU time consumed by <hask>dotp_wrapper</hask>, see [[GHC/Data Parallel Haskell/MainTimed|timed dot product]].<br />
<br />
==== Parallel execution ====<br />
<br />
By default, a Haskell program uses only one OS thread, and hence, also only one CPU core for execution. To use multiple cores, we need to invoke the executable with an explicit RTS command line option — e.g., <code>./dotp +RTS -N2</code> uses two cores. (Strictly speaking, it uses two OS threads, which will be scheduled on two separate cores if available.) To determine the runtime of parallel code, measuring CPU time, as demonstrated in the [[GHC/Data Parallel Haskell/MainTimed|timed variant of the dot product example]], is not sufficient anymore. We need to measure wall clock times instead. For proper benchmarking, it is advisable to use a library, such as [http://hackage.haskell.org/package/criterion criterion].<br />
<br />
A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with just one core and to move to multiple cores only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is often bad as GHC's runtime makes no effort at optimising placement.<br />
<br />
=== Further examples and documentation ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/packages/dph/dph-examples/ examples directory of the package dph source]. This code also includes reference implementations for some of the example that are useful for benchmarking. <br />
<br />
The interfaces of the various components of the DPH library are in the [http://hackage.haskell.org/package/dph-par library documentation] on Hackage.<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2011-08-11T12:49:32Z<p>Chak: /* Where to get it */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
<center><br />
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png<br />
</center><br />
<br />
''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''<br />
<br />
__TOC__<br />
<br />
<br />
<br />
<br />
=== Project status ===<br />
<br />
Data Parallel Haskell (DPH) is available as an add-on for [http://haskell.org/ghc/download_ghc_7_2_1 GHC 7.2] in the form of a few separate cabal package. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).<br />
<br />
The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used and certain idioms are avoided. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].<br />
<br />
DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
To get DPH, install [http://haskell.org/ghc/download_ghc_7_2_1 GHC 7.2] and then install the DPH libraries with <code>cabal install</code> as follows:<br />
<blockquote><br />
<code>$ cabal update</code><br><br />
<code>$ cabal install dph-examples</code><br />
</blockquote><br />
This will install all DPH packages, including a set of simple examples, see [http://hackage.haskell.org/package/dph-examples dph-examples]. (The package [http://hackage.haskell.org/package/dph-examples dph-examples] does depend on OpenGL and Gloss as both are used in a visualised for the nobody example.)<br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following three import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE ParallelArrays #-}<br />
{-# OPTIONS_GHC -fvectorise #-}<br />
<br />
module DotP (dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:<br />
<haskell><br />
import Data.Array.Parallel.PArray (PArray, fromList)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= let v = fromList [1..10] -- convert lists...<br />
w = fromList [1,2..20] -- ...to parallel arrays<br />
result = dotp_wrapper v w -- invoke vectorised code<br />
in<br />
print result -- print the result<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par Main.hs</code><br />
</blockquote><br />
and finally link the two modules into an executable <code>dotp</code> with<br />
<blockquote><br />
<code>ghc -o dotp -threaded -fdph-par -rtsopts DotP.o Main.o</code><br />
</blockquote><br />
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend. We include <code>-rtsopts</code> to be able to explicitly determine the number of OS threads used to execute our code.<br />
<br />
==== Generating input data ====<br />
<br />
To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile and link the program as described above.<br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose. For a variant of the dot-product example code that determines the CPU time consumed by <hask>dotp_wrapper</hask>, see [[GHC/Data Parallel Haskell/MainTimed|timed dot product]].<br />
<br />
==== Parallel execution ====<br />
<br />
By default, a Haskell program uses only one OS thread, and hence, also only one CPU core for execution. To use multiple cores, we need to invoke the executable with an explicit RTS command line option — e.g., <code>./dotp +RTS -N2</code> uses two cores. (Strictly speaking, it uses two OS threads, which will be scheduled on two separate cores if available.) To determine the runtime of parallel code, measuring CPU time, as demonstrated in the [[GHC/Data Parallel Haskell/MainTimed|timed variant of the dot product example]], is not sufficient anymore. We need to measure wall clock times instead. For proper benchmarking, it is advisable to use a library, such as [http://hackage.haskell.org/package/criterion criterion].<br />
<br />
A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with just one core and to move to multiple cores only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is often bad as GHC's runtime makes no effort at optimising placement.<br />
<br />
=== Further examples and documentation ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/packages/dph/dph-examples/ examples directory of the package dph source]. This code also includes reference implementations for some of the example that are useful for benchmarking. <br />
<br />
The interfaces of the various components of the DPH library are in the [http://hackage.haskell.org/package/dph library documentation] on Hackage (which will be uploaded with the GHC 7.2 DPH release).<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2011-08-11T12:43:17Z<p>Chak: /* Where to get it */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
<center><br />
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png<br />
</center><br />
<br />
''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''<br />
<br />
__TOC__<br />
<br />
<br />
<br />
<br />
=== Project status ===<br />
<br />
Data Parallel Haskell (DPH) is available as an add-on for [http://haskell.org/ghc/download_ghc_7_2_1 GHC 7.2] in the form of a few separate cabal package. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).<br />
<br />
The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used and certain idioms are avoided. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].<br />
<br />
DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
To get DPH, install [http://haskell.org/ghc/download_ghc_7_2_1 GHC 7.2] and then install the DPH libraries with <hask>cabal install<hask> as follows:<br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following three import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE ParallelArrays #-}<br />
{-# OPTIONS_GHC -fvectorise #-}<br />
<br />
module DotP (dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:<br />
<haskell><br />
import Data.Array.Parallel.PArray (PArray, fromList)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= let v = fromList [1..10] -- convert lists...<br />
w = fromList [1,2..20] -- ...to parallel arrays<br />
result = dotp_wrapper v w -- invoke vectorised code<br />
in<br />
print result -- print the result<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par Main.hs</code><br />
</blockquote><br />
and finally link the two modules into an executable <code>dotp</code> with<br />
<blockquote><br />
<code>ghc -o dotp -threaded -fdph-par -rtsopts DotP.o Main.o</code><br />
</blockquote><br />
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend. We include <code>-rtsopts</code> to be able to explicitly determine the number of OS threads used to execute our code.<br />
<br />
==== Generating input data ====<br />
<br />
To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile and link the program as described above.<br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose. For a variant of the dot-product example code that determines the CPU time consumed by <hask>dotp_wrapper</hask>, see [[GHC/Data Parallel Haskell/MainTimed|timed dot product]].<br />
<br />
==== Parallel execution ====<br />
<br />
By default, a Haskell program uses only one OS thread, and hence, also only one CPU core for execution. To use multiple cores, we need to invoke the executable with an explicit RTS command line option — e.g., <code>./dotp +RTS -N2</code> uses two cores. (Strictly speaking, it uses two OS threads, which will be scheduled on two separate cores if available.) To determine the runtime of parallel code, measuring CPU time, as demonstrated in the [[GHC/Data Parallel Haskell/MainTimed|timed variant of the dot product example]], is not sufficient anymore. We need to measure wall clock times instead. For proper benchmarking, it is advisable to use a library, such as [http://hackage.haskell.org/package/criterion criterion].<br />
<br />
A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with just one core and to move to multiple cores only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is often bad as GHC's runtime makes no effort at optimising placement.<br />
<br />
=== Further examples and documentation ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/packages/dph/dph-examples/ examples directory of the package dph source]. This code also includes reference implementations for some of the example that are useful for benchmarking. <br />
<br />
The interfaces of the various components of the DPH library are in the [http://hackage.haskell.org/package/dph library documentation] on Hackage (which will be uploaded with the GHC 7.2 DPH release).<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2011-08-11T12:36:52Z<p>Chak: /* Project status */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
<center><br />
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png<br />
</center><br />
<br />
''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''<br />
<br />
__TOC__<br />
<br />
<br />
<br />
<br />
=== Project status ===<br />
<br />
Data Parallel Haskell (DPH) is available as an add-on for [http://haskell.org/ghc/download_ghc_7_2_1 GHC 7.2] in the form of a few separate cabal package. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).<br />
<br />
The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used and certain idioms are avoided. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].<br />
<br />
DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.<br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following three import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE ParallelArrays #-}<br />
{-# OPTIONS_GHC -fvectorise #-}<br />
<br />
module DotP (dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:<br />
<haskell><br />
import Data.Array.Parallel.PArray (PArray, fromList)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= let v = fromList [1..10] -- convert lists...<br />
w = fromList [1,2..20] -- ...to parallel arrays<br />
result = dotp_wrapper v w -- invoke vectorised code<br />
in<br />
print result -- print the result<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par Main.hs</code><br />
</blockquote><br />
and finally link the two modules into an executable <code>dotp</code> with<br />
<blockquote><br />
<code>ghc -o dotp -threaded -fdph-par -rtsopts DotP.o Main.o</code><br />
</blockquote><br />
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend. We include <code>-rtsopts</code> to be able to explicitly determine the number of OS threads used to execute our code.<br />
<br />
==== Generating input data ====<br />
<br />
To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile and link the program as described above.<br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose. For a variant of the dot-product example code that determines the CPU time consumed by <hask>dotp_wrapper</hask>, see [[GHC/Data Parallel Haskell/MainTimed|timed dot product]].<br />
<br />
==== Parallel execution ====<br />
<br />
By default, a Haskell program uses only one OS thread, and hence, also only one CPU core for execution. To use multiple cores, we need to invoke the executable with an explicit RTS command line option — e.g., <code>./dotp +RTS -N2</code> uses two cores. (Strictly speaking, it uses two OS threads, which will be scheduled on two separate cores if available.) To determine the runtime of parallel code, measuring CPU time, as demonstrated in the [[GHC/Data Parallel Haskell/MainTimed|timed variant of the dot product example]], is not sufficient anymore. We need to measure wall clock times instead. For proper benchmarking, it is advisable to use a library, such as [http://hackage.haskell.org/package/criterion criterion].<br />
<br />
A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with just one core and to move to multiple cores only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is often bad as GHC's runtime makes no effort at optimising placement.<br />
<br />
=== Further examples and documentation ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/packages/dph/dph-examples/ examples directory of the package dph source]. This code also includes reference implementations for some of the example that are useful for benchmarking. <br />
<br />
The interfaces of the various components of the DPH library are in the [http://hackage.haskell.org/package/dph library documentation] on Hackage (which will be uploaded with the GHC 7.2 DPH release).<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2011-03-31T09:46:10Z<p>Chak: /* Compiling vectorised code */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
<center><br />
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png<br />
</center><br />
<br />
''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''<br />
<br />
__TOC__<br />
<br />
<br />
<br />
<br />
=== Project status ===<br />
<br />
We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.2. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).<br />
<br />
The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].<br />
<br />
DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.<br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following three import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE ParallelArrays #-}<br />
{-# OPTIONS_GHC -fvectorise #-}<br />
<br />
module DotP (dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:<br />
<haskell><br />
import Data.Array.Parallel.PArray (PArray, fromList)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= let v = fromList [1..10] -- convert lists...<br />
w = fromList [1,2..20] -- ...to parallel arrays<br />
result = dotp_wrapper v w -- invoke vectorised code<br />
in<br />
print result -- print the result<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par Main.hs</code><br />
</blockquote><br />
and finally link the two modules into an executable <code>dotp</code> with<br />
<blockquote><br />
<code>ghc -o dotp -threaded -fdph-par -rtsopts DotP.o Main.o</code><br />
</blockquote><br />
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend. We include <code>-rtsopts</code> to be able to explicitly determine the number of OS threads used to execute our code.<br />
<br />
==== Generating input data ====<br />
<br />
To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile and link the program as described above.<br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose. For a variant of the dot-product example code that determines the CPU time consumed by <hask>dotp_wrapper</hask>, see [[GHC/Data Parallel Haskell/MainTimed|timed dot product]].<br />
<br />
==== Parallel execution ====<br />
<br />
By default, a Haskell program uses only one OS thread, and hence, also only one CPU core for execution. To use multiple cores, we need to invoke the executable with an explicit RTS command line option — e.g., <code>./dotp +RTS -N2</code> uses two cores. (Strictly speaking, it uses two OS threads, which will be scheduled on two separate cores if available.) To determine the runtime of parallel code, measuring CPU time, as demonstrated in the [[GHC/Data Parallel Haskell/MainTimed|timed variant of the dot product example]], is not sufficient anymore. We need to measure wall clock times instead. For proper benchmarking, it is advisable to use a library, such as [http://hackage.haskell.org/package/criterion criterion].<br />
<br />
A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with just one core and to move to multiple cores only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is often bad as GHC's runtime makes no effort at optimising placement.<br />
<br />
=== Further examples and documentation ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/packages/dph/dph-examples/ examples directory of the package dph source]. This code also includes reference implementations for some of the example that are useful for benchmarking. <br />
<br />
The interfaces of the various components of the DPH library are in the [http://hackage.haskell.org/package/dph library documentation] on Hackage (which will be uploaded with the GHC 7.2 DPH release).<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2011-01-25T05:50:02Z<p>Chak: /* Parallel execution */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
<center><br />
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png<br />
</center><br />
<br />
''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''<br />
<br />
__TOC__<br />
<br />
<br />
<br />
<br />
=== Project status ===<br />
<br />
We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.2. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).<br />
<br />
The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].<br />
<br />
DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.<br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following three import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE ParallelArrays #-}<br />
{-# GHC_OPTIONS -fvectorise #-}<br />
<br />
module DotP (dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:<br />
<haskell><br />
import Data.Array.Parallel.PArray (PArray, fromList)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= let v = fromList [1..10] -- convert lists...<br />
w = fromList [1,2..20] -- ...to parallel arrays<br />
result = dotp_wrapper v w -- invoke vectorised code<br />
in<br />
print result -- print the result<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par Main.hs</code><br />
</blockquote><br />
and finally link the two modules into an executable <code>dotp</code> with<br />
<blockquote><br />
<code>ghc -o dotp -threaded -fdph-par -rtsopts DotP.o Main.o</code><br />
</blockquote><br />
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend. We include <code>-rtsopts</code> to be able to explicitly determine the number of OS threads used to execute our code.<br />
<br />
==== Generating input data ====<br />
<br />
To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile and link the program as described above.<br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose. For a variant of the dot-product example code that determines the CPU time consumed by <hask>dotp_wrapper</hask>, see [[GHC/Data Parallel Haskell/MainTimed|timed dot product]].<br />
<br />
==== Parallel execution ====<br />
<br />
By default, a Haskell program uses only one OS thread, and hence, also only one CPU core for execution. To use multiple cores, we need to invoke the executable with an explicit RTS command line option — e.g., <code>./dotp +RTS -N2</code> uses two cores. (Strictly speaking, it uses two OS threads, which will be scheduled on two separate cores if available.) To determine the runtime of parallel code, measuring CPU time, as demonstrated in the [[GHC/Data Parallel Haskell/MainTimed|timed variant of the dot product example]], is not sufficient anymore. We need to measure wall clock times instead. For proper benchmarking, it is advisable to use a library, such as [http://hackage.haskell.org/package/criterion criterion].<br />
<br />
A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with just one core and to move to multiple cores only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is often bad as GHC's runtime makes no effort at optimising placement.<br />
<br />
=== Further examples and documentation ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/packages/dph/dph-examples/ examples directory of the package dph source]. This code also includes reference implementations for some of the example that are useful for benchmarking. <br />
<br />
The interfaces of the various components of the DPH library are in the [http://hackage.haskell.org/package/dph library documentation] on Hackage (which will be uploaded with the GHC 7.2 DPH release).<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2011-01-25T05:37:26Z<p>Chak: /* Further examples */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
<center><br />
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png<br />
</center><br />
<br />
''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''<br />
<br />
__TOC__<br />
<br />
<br />
<br />
<br />
=== Project status ===<br />
<br />
We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.2. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).<br />
<br />
The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].<br />
<br />
DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.<br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following three import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE ParallelArrays #-}<br />
{-# GHC_OPTIONS -fvectorise #-}<br />
<br />
module DotP (dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:<br />
<haskell><br />
import Data.Array.Parallel.PArray (PArray, fromList)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= let v = fromList [1..10] -- convert lists...<br />
w = fromList [1,2..20] -- ...to parallel arrays<br />
result = dotp_wrapper v w -- invoke vectorised code<br />
in<br />
print result -- print the result<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par Main.hs</code><br />
</blockquote><br />
and finally link the two modules into an executable <code>dotp</code> with<br />
<blockquote><br />
<code>ghc -o dotp -threaded -fdph-par -rtsopts DotP.o Main.o</code><br />
</blockquote><br />
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend. We include <code>-rtsopts</code> to be able to explicitly determine the number of OS threads used to execute our code.<br />
<br />
==== Generating input data ====<br />
<br />
To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile and link the program as described above.<br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose. For a variant of the dot-product example code that determines the CPU time consumed by <hask>dotp_wrapper</hask>, see [[GHC/Data Parallel Haskell/MainTimed|timed dot product]].<br />
<br />
==== Parallel execution ====<br />
<br />
By default, a Haskell program uses only one OS thread, and hence, also only one CPU core for execution. To use multiple cores, we need to invoke the executable with an explicit RTS command line option — e.g., <code>./dotp +RTS -N2</code> uses two cores. (Strictly speaking, it uses two OS threads, which will be scheduled on two separate cores if available.)<br />
<br />
A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with just one core and to move to multiple cores only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is often bad as GHC's runtime makes no effort at optimising placement.<br />
<br />
=== Further examples and documentation ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/packages/dph/dph-examples/ examples directory of the package dph source]. This code also includes reference implementations for some of the example that are useful for benchmarking. <br />
<br />
The interfaces of the various components of the DPH library are in the [http://hackage.haskell.org/package/dph library documentation] on Hackage (which will be uploaded with the GHC 7.2 DPH release).<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2011-01-25T05:27:48Z<p>Chak: /* Parallel execution */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
<center><br />
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png<br />
</center><br />
<br />
''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''<br />
<br />
__TOC__<br />
<br />
<br />
<br />
<br />
=== Project status ===<br />
<br />
We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.2. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).<br />
<br />
The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].<br />
<br />
DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.<br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following three import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE ParallelArrays #-}<br />
{-# GHC_OPTIONS -fvectorise #-}<br />
<br />
module DotP (dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:<br />
<haskell><br />
import Data.Array.Parallel.PArray (PArray, fromList)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= let v = fromList [1..10] -- convert lists...<br />
w = fromList [1,2..20] -- ...to parallel arrays<br />
result = dotp_wrapper v w -- invoke vectorised code<br />
in<br />
print result -- print the result<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par Main.hs</code><br />
</blockquote><br />
and finally link the two modules into an executable <code>dotp</code> with<br />
<blockquote><br />
<code>ghc -o dotp -threaded -fdph-par -rtsopts DotP.o Main.o</code><br />
</blockquote><br />
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend. We include <code>-rtsopts</code> to be able to explicitly determine the number of OS threads used to execute our code.<br />
<br />
==== Generating input data ====<br />
<br />
To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile and link the program as described above.<br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose. For a variant of the dot-product example code that determines the CPU time consumed by <hask>dotp_wrapper</hask>, see [[GHC/Data Parallel Haskell/MainTimed|timed dot product]].<br />
<br />
==== Parallel execution ====<br />
<br />
By default, a Haskell program uses only one OS thread, and hence, also only one CPU core for execution. To use multiple cores, we need to invoke the executable with an explicit RTS command line option — e.g., <code>./dotp +RTS -N2</code> uses two cores. (Strictly speaking, it uses two OS threads, which will be scheduled on two separate cores if available.)<br />
<br />
A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with just one core and to move to multiple cores only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is often bad as GHC's runtime makes no effort at optimising placement.<br />
<br />
=== Further examples ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.<br />
<br />
The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2011-01-25T04:54:44Z<p>Chak: /* Using vectorised code */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
<center><br />
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png<br />
</center><br />
<br />
''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''<br />
<br />
__TOC__<br />
<br />
<br />
<br />
<br />
=== Project status ===<br />
<br />
We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.2. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).<br />
<br />
The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].<br />
<br />
DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.<br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following three import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE ParallelArrays #-}<br />
{-# GHC_OPTIONS -fvectorise #-}<br />
<br />
module DotP (dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:<br />
<haskell><br />
import Data.Array.Parallel.PArray (PArray, fromList)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= let v = fromList [1..10] -- convert lists...<br />
w = fromList [1,2..20] -- ...to parallel arrays<br />
result = dotp_wrapper v w -- invoke vectorised code<br />
in<br />
print result -- print the result<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par Main.hs</code><br />
</blockquote><br />
and finally link the two modules into an executable <code>dotp</code> with<br />
<blockquote><br />
<code>ghc -o dotp -threaded -fdph-par -rtsopts DotP.o Main.o</code><br />
</blockquote><br />
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend. We include <code>-rtsopts</code> to be able to explicitly determine the number of OS threads used to execute our code.<br />
<br />
==== Generating input data ====<br />
<br />
To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile and link the program as described above.<br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose. For a variant of the dot-product example code that determines the CPU time consumed by <hask>dotp_wrapper</hask>, see [[GHC/Data Parallel Haskell/MainTimed|timed dot product]].<br />
<br />
==== Parallel execution ====<br />
<br />
The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code. <br />
<br />
In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can as well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.<br />
<br />
=== Further examples ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.<br />
<br />
The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_Haskell/MainTimedGHC/Data Parallel Haskell/MainTimed2011-01-25T04:34:57Z<p>Chak: New page: The following variant of the main module for the dot product example determines and prints the runtime of the dot product kernel in microseconds. <haskell> import System.CPUTime (getCPUTim...</p>
<hr />
<div>The following variant of the main module for the dot product example determines and prints the runtime of the dot product kernel in microseconds.<br />
<haskell><br />
import System.CPUTime (getCPUTime)<br />
import System.Random (newStdGen)<br />
import Control.Exception (evaluate)<br />
import Data.Array.Parallel.PArray (PArray, randomRs, nf)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
-- generate random input vectors<br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
<br />
-- force the evaluation of the input vectors<br />
evaluate $ nf v<br />
evaluate $ nf w<br />
<br />
-- timed computations<br />
start <- getCPUTime<br />
let result = dotp_wrapper v w<br />
evaluate result<br />
end <- getCPUTime<br />
<br />
-- print the result<br />
putStrLn $ show result ++ " in " ++ show ((end - start) `div` 1000000) ++ "us"<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell></div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2011-01-25T04:32:51Z<p>Chak: /* Generating input data */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
<center><br />
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png<br />
</center><br />
<br />
''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''<br />
<br />
__TOC__<br />
<br />
<br />
<br />
<br />
=== Project status ===<br />
<br />
We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.2. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).<br />
<br />
The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].<br />
<br />
DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.<br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following three import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE ParallelArrays #-}<br />
{-# GHC_OPTIONS -fvectorise #-}<br />
<br />
module DotP (dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:<br />
<haskell><br />
import Data.Array.Parallel.PArray (PArray, fromList)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= let v = fromList [1..10] -- convert lists...<br />
w = fromList [1,2..20] -- ...to parallel arrays<br />
result = dotp_wrapper v w -- invoke vectorised code<br />
in<br />
print result -- print the result<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par Main.hs</code><br />
</blockquote><br />
and finally link the two modules into an executable <code>dotp</code> with<br />
<blockquote><br />
<code>ghc -o dotp -threaded -fdph-par DotP.o Main.o</code><br />
</blockquote><br />
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend.<br />
<br />
==== Generating input data ====<br />
<br />
To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile and link the program as described above.<br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose. For a variant of the dot-product example code that determines the CPU time consumed by <hask>dotp_wrapper</hask>, see [[GHC/Data Parallel Haskell/MainTimed|timed dot product]].<br />
<br />
==== Parallel execution ====<br />
<br />
The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code. <br />
<br />
In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can as well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.<br />
<br />
=== Further examples ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.<br />
<br />
The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2011-01-25T04:30:07Z<p>Chak: /* Generating input data */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
<center><br />
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png<br />
</center><br />
<br />
''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''<br />
<br />
__TOC__<br />
<br />
<br />
<br />
<br />
=== Project status ===<br />
<br />
We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.2. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).<br />
<br />
The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].<br />
<br />
DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.<br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following three import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE ParallelArrays #-}<br />
{-# GHC_OPTIONS -fvectorise #-}<br />
<br />
module DotP (dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:<br />
<haskell><br />
import Data.Array.Parallel.PArray (PArray, fromList)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= let v = fromList [1..10] -- convert lists...<br />
w = fromList [1,2..20] -- ...to parallel arrays<br />
result = dotp_wrapper v w -- invoke vectorised code<br />
in<br />
print result -- print the result<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par Main.hs</code><br />
</blockquote><br />
and finally link the two modules into an executable <code>dotp</code> with<br />
<blockquote><br />
<code>ghc -o dotp -threaded -fdph-par DotP.o Main.o</code><br />
</blockquote><br />
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend.<br />
<br />
==== Generating input data ====<br />
<br />
To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile and link the program as described above.<br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose. A variant of the dot-product example code that determines the CPU time consumed by <hask>dotp_wrapper</hask> is at [wiki:GHC/Data Parallel Haskell/MainTimed].<br />
<br />
==== Parallel execution ====<br />
<br />
The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code. <br />
<br />
In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can as well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.<br />
<br />
=== Further examples ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.<br />
<br />
The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2011-01-25T03:41:06Z<p>Chak: /* Project status */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
<center><br />
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png<br />
</center><br />
<br />
''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''<br />
<br />
__TOC__<br />
<br />
<br />
<br />
<br />
=== Project status ===<br />
<br />
We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.2. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).<br />
<br />
The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].<br />
<br />
DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.<br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following three import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE ParallelArrays #-}<br />
{-# GHC_OPTIONS -fvectorise #-}<br />
<br />
module DotP (dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:<br />
<haskell><br />
import Data.Array.Parallel.PArray (PArray, fromList)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= let v = fromList [1..10] -- convert lists...<br />
w = fromList [1,2..20] -- ...to parallel arrays<br />
result = dotp_wrapper v w -- invoke vectorised code<br />
in<br />
print result -- print the result<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par Main.hs</code><br />
</blockquote><br />
and finally link the two modules into an executable <code>dotp</code> with<br />
<blockquote><br />
<code>ghc -o dotp -threaded -fdph-par DotP.o Main.o</code><br />
</blockquote><br />
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend.<br />
<br />
==== Generating input data ====<br />
<br />
To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile and link the program as described above.<br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.<br />
<br />
==== Parallel execution ====<br />
<br />
The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code. <br />
<br />
In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can as well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.<br />
<br />
=== Further examples ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.<br />
<br />
The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2011-01-25T03:23:18Z<p>Chak: /* Using vectorised code */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
<center><br />
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png<br />
</center><br />
<br />
''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''<br />
<br />
__TOC__<br />
<br />
<br />
<br />
<br />
=== Project status ===<br />
<br />
We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).<br />
<br />
The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].<br />
<br />
DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.<br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following three import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE ParallelArrays #-}<br />
{-# GHC_OPTIONS -fvectorise #-}<br />
<br />
module DotP (dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:<br />
<haskell><br />
import Data.Array.Parallel.PArray (PArray, fromList)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= let v = fromList [1..10] -- convert lists...<br />
w = fromList [1,2..20] -- ...to parallel arrays<br />
result = dotp_wrapper v w -- invoke vectorised code<br />
in<br />
print result -- print the result<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par Main.hs</code><br />
</blockquote><br />
and finally link the two modules into an executable <code>dotp</code> with<br />
<blockquote><br />
<code>ghc -o dotp -threaded -fdph-par DotP.o Main.o</code><br />
</blockquote><br />
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend.<br />
<br />
==== Generating input data ====<br />
<br />
To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile and link the program as described above.<br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.<br />
<br />
==== Parallel execution ====<br />
<br />
The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code. <br />
<br />
In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can as well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.<br />
<br />
=== Further examples ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.<br />
<br />
The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2011-01-25T03:15:11Z<p>Chak: /* Compiling vectorised code */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
<center><br />
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png<br />
</center><br />
<br />
''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''<br />
<br />
__TOC__<br />
<br />
<br />
<br />
<br />
=== Project status ===<br />
<br />
We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).<br />
<br />
The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].<br />
<br />
DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.<br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following three import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE ParallelArrays #-}<br />
{-# GHC_OPTIONS -fvectorise #-}<br />
<br />
module DotP (dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:<br />
<blockquote><br />
<code>ghc -c -Odph -fdph-par DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:<br />
<haskell><br />
import Data.Array.Parallel.PArray (PArray, fromList)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= let v = fromList [1..10] -- convert lists...<br />
w = fromList [1,2..20] -- ...to parallel arrays<br />
result = dotp_wrapper v w -- invoke vectorised code<br />
in<br />
print result -- print the result<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -Odph Main.hs</code><br />
</blockquote><br />
and finally link the two modules into an executable <code>dotp</code> with<br />
<blockquote><br />
<code>ghc -o dotp -threaded DotP.o Main.o</code><br />
</blockquote><br />
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime.<br />
<br />
==== Generating input data ====<br />
<br />
To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile and link the program as described above.<br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.<br />
<br />
==== Parallel execution ====<br />
<br />
The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code. <br />
<br />
In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can as well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.<br />
<br />
=== Further examples ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.<br />
<br />
The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2011-01-25T02:57:49Z<p>Chak: /* Compiling vectorised code */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
<center><br />
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png<br />
</center><br />
<br />
''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''<br />
<br />
__TOC__<br />
<br />
<br />
<br />
<br />
=== Project status ===<br />
<br />
We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).<br />
<br />
The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].<br />
<br />
DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.<br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following three import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE ParallelArrays #-}<br />
{-# GHC_OPTIONS -fvectorise #-}<br />
<br />
module DotP (dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it was follows:<br />
<blockquote><br />
<code>ghc -c -Odph DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code.<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:<br />
<haskell><br />
import Data.Array.Parallel.PArray (PArray, fromList)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= let v = fromList [1..10] -- convert lists...<br />
w = fromList [1,2..20] -- ...to parallel arrays<br />
result = dotp_wrapper v w -- invoke vectorised code<br />
in<br />
print result -- print the result<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -Odph Main.hs</code><br />
</blockquote><br />
and finally link the two modules into an executable <code>dotp</code> with<br />
<blockquote><br />
<code>ghc -o dotp -threaded DotP.o Main.o</code><br />
</blockquote><br />
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime.<br />
<br />
==== Generating input data ====<br />
<br />
To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile and link the program as described above.<br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.<br />
<br />
==== Parallel execution ====<br />
<br />
The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code. <br />
<br />
In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can as well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.<br />
<br />
=== Further examples ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.<br />
<br />
The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2010-12-12T09:59:30Z<p>Chak: /* Feedback */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
<center><br />
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png<br />
</center><br />
<br />
''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''<br />
<br />
__TOC__<br />
<br />
<br />
<br />
<br />
=== Project status ===<br />
<br />
We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).<br />
<br />
The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].<br />
<br />
DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.<br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following three import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE ParallelArrays #-}<br />
{-# OPTIONS -fvectorise #-}<br />
<br />
module DotP (dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it was follows:<br />
<blockquote><br />
<code>ghc -c -Odph DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code.<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:<br />
<haskell><br />
import Data.Array.Parallel.PArray (PArray, fromList)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= let v = fromList [1..10] -- convert lists...<br />
w = fromList [1,2..20] -- ...to parallel arrays<br />
result = dotp_wrapper v w -- invoke vectorised code<br />
in<br />
print result -- print the result<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -Odph Main.hs</code><br />
</blockquote><br />
and finally link the two modules into an executable <code>dotp</code> with<br />
<blockquote><br />
<code>ghc -o dotp -threaded DotP.o Main.o</code><br />
</blockquote><br />
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime.<br />
<br />
==== Generating input data ====<br />
<br />
To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile and link the program as described above.<br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.<br />
<br />
==== Parallel execution ====<br />
<br />
The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code. <br />
<br />
In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can as well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.<br />
<br />
=== Further examples ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.<br />
<br />
The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2010-12-12T07:23:32Z<p>Chak: /* Generating input data */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
<center><br />
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png<br />
</center><br />
<br />
''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''<br />
<br />
__TOC__<br />
<br />
<br />
<br />
<br />
=== Project status ===<br />
<br />
We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).<br />
<br />
The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].<br />
<br />
DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.<br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following three import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE ParallelArrays #-}<br />
{-# OPTIONS -fvectorise #-}<br />
<br />
module DotP (dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it was follows:<br />
<blockquote><br />
<code>ghc -c -Odph DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code.<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:<br />
<haskell><br />
import Data.Array.Parallel.PArray (PArray, fromList)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= let v = fromList [1..10] -- convert lists...<br />
w = fromList [1,2..20] -- ...to parallel arrays<br />
result = dotp_wrapper v w -- invoke vectorised code<br />
in<br />
print result -- print the result<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -Odph Main.hs</code><br />
</blockquote><br />
and finally link the two modules into an executable <code>dotp</code> with<br />
<blockquote><br />
<code>ghc -o dotp -threaded DotP.o Main.o</code><br />
</blockquote><br />
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime.<br />
<br />
==== Generating input data ====<br />
<br />
To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile and link the program as described above.<br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.<br />
<br />
==== Parallel execution ====<br />
<br />
The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code. <br />
<br />
In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can as well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.<br />
<br />
=== Further examples ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.<br />
<br />
The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2010-12-12T07:22:56Z<p>Chak: /* Generating input data */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
<center><br />
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png<br />
</center><br />
<br />
''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''<br />
<br />
__TOC__<br />
<br />
<br />
<br />
<br />
=== Project status ===<br />
<br />
We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).<br />
<br />
The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].<br />
<br />
DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.<br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following three import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE ParallelArrays #-}<br />
{-# OPTIONS -fvectorise #-}<br />
<br />
module DotP (dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it was follows:<br />
<blockquote><br />
<code>ghc -c -Odph DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code.<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:<br />
<haskell><br />
import Data.Array.Parallel.PArray (PArray, fromList)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= let v = fromList [1..10] -- convert lists...<br />
w = fromList [1,2..20] -- ...to parallel arrays<br />
result = dotp_wrapper v w -- invoke vectorised code<br />
in<br />
print result -- print the result<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -Odph Main.hs</code><br />
</blockquote><br />
and finally link the two modules into an executable <code>dotp</code> with<br />
<blockquote><br />
<code>ghc -o dotp -threaded DotP.o Main.o</code><br />
</blockquote><br />
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime.<br />
<br />
==== Generating input data ====<br />
<br />
To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data sets. Hence, instead of two small constant vectors, we might want to generate some larger input data:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile and link the program as described above.<br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.<br />
<br />
==== Parallel execution ====<br />
<br />
The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code. <br />
<br />
In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can as well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.<br />
<br />
=== Further examples ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.<br />
<br />
The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2010-12-12T06:08:42Z<p>Chak: /* Using vectorised code */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
<center><br />
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png<br />
</center><br />
<br />
''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''<br />
<br />
__TOC__<br />
<br />
<br />
<br />
<br />
=== Project status ===<br />
<br />
We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).<br />
<br />
The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].<br />
<br />
DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.<br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following three import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE ParallelArrays #-}<br />
{-# OPTIONS -fvectorise #-}<br />
<br />
module DotP (dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it was follows:<br />
<blockquote><br />
<code>ghc -c -Odph DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code.<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:<br />
<haskell><br />
import Data.Array.Parallel.PArray (PArray, fromList)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= let v = fromList [1..10] -- convert lists...<br />
w = fromList [1,2..20] -- ...to parallel arrays<br />
result = dotp_wrapper v w -- invoke vectorised code<br />
in<br />
print result -- print the result<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -Odph Main.hs</code><br />
</blockquote><br />
and finally link the two modules into an executable <code>dotp</code> with<br />
<blockquote><br />
<code>ghc -o dotp -threaded DotP.o Main.o</code><br />
</blockquote><br />
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime.<br />
<br />
==== Generating input data ====<br />
<br />
In this simple example, this is just the <hask>Main</hask> module that generates two random vectors and computes their dot product:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -O -fdph-seq Main.hs</code><br />
</blockquote><br />
and finally link with<br />
<blockquote><br />
<code>ghc -o dotp -fdph-seq -threaded DotP.o Main.o</code><br />
</blockquote><br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.<br />
<br />
==== Parallel execution ====<br />
<br />
The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code. <br />
<br />
In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can as well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.<br />
<br />
=== Further examples ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.<br />
<br />
The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2010-12-12T06:07:40Z<p>Chak: /* Using vectorised code */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
<center><br />
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png<br />
</center><br />
<br />
''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''<br />
<br />
__TOC__<br />
<br />
<br />
<br />
<br />
=== Project status ===<br />
<br />
We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).<br />
<br />
The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].<br />
<br />
DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.<br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following three import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE ParallelArrays #-}<br />
{-# OPTIONS -fvectorise #-}<br />
<br />
module DotP (dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it was follows:<br />
<blockquote><br />
<code>ghc -c -Odph DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code.<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:<br />
<haskell><br />
import Data.Array.Parallel.PArray (PArray, fromList)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= let v = fromList [1..10] -- convert lists...<br />
w = fromList [1,2..20] -- ...to parallel arrays<br />
result = dotp_wrapper v w -- invoke vectorised code<br />
in<br />
print result -- print the result<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -Odph Main.hs</code><br />
</blockquote><br />
and finally link the two modules into an executable `dotp` with<br />
<blockquote><br />
<code>ghc -o dotp -threaded DotP.o Main.o</code><br />
</blockquote><br />
We need the `-threaded` option to link with GHC's multi-threaded runtime.<br />
<br />
==== Generating input data ====<br />
<br />
In this simple example, this is just the <hask>Main</hask> module that generates two random vectors and computes their dot product:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -O -fdph-seq Main.hs</code><br />
</blockquote><br />
and finally link with<br />
<blockquote><br />
<code>ghc -o dotp -fdph-seq -threaded DotP.o Main.o</code><br />
</blockquote><br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.<br />
<br />
==== Parallel execution ====<br />
<br />
The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code. <br />
<br />
In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can as well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.<br />
<br />
=== Further examples ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.<br />
<br />
The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2010-12-12T06:03:22Z<p>Chak: /* Generating input data */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
<center><br />
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png<br />
</center><br />
<br />
''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''<br />
<br />
__TOC__<br />
<br />
<br />
<br />
<br />
=== Project status ===<br />
<br />
We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).<br />
<br />
The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].<br />
<br />
DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.<br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following three import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE ParallelArrays #-}<br />
{-# OPTIONS -fvectorise #-}<br />
<br />
module DotP (dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it was follows:<br />
<blockquote><br />
<code>ghc -c -Odph DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code.<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a wrapper module that calls the vectorised code, but is itself not vectorised. In this simple example, this is just the <hask>Main</hask> module that generates two random vectors and computes their dot product:<br />
<haskell><br />
import Data.Array.Parallel.PArray (PArray, fromList)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= let v = fromList [1..10] -- convert lists...<br />
w = fromList [1,2..20] -- ...to parallel arrays<br />
result = dotp_wrapper v w -- invoke vectorised code<br />
in<br />
print result -- print the result<br />
</haskell><br />
<br />
==== Generating input data ====<br />
<br />
In this simple example, this is just the <hask>Main</hask> module that generates two random vectors and computes their dot product:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -O -fdph-seq Main.hs</code><br />
</blockquote><br />
and finally link with<br />
<blockquote><br />
<code>ghc -o dotp -fdph-seq -threaded DotP.o Main.o</code><br />
</blockquote><br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.<br />
<br />
==== Parallel execution ====<br />
<br />
The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code. <br />
<br />
In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can as well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.<br />
<br />
=== Further examples ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.<br />
<br />
The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2010-12-12T05:55:10Z<p>Chak: /* Using vectorised code */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
<center><br />
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png<br />
</center><br />
<br />
''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''<br />
<br />
__TOC__<br />
<br />
<br />
<br />
<br />
=== Project status ===<br />
<br />
We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).<br />
<br />
The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].<br />
<br />
DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.<br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following three import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE ParallelArrays #-}<br />
{-# OPTIONS -fvectorise #-}<br />
<br />
module DotP (dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it was follows:<br />
<blockquote><br />
<code>ghc -c -Odph DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code.<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a wrapper module that calls the vectorised code, but is itself not vectorised. In this simple example, this is just the <hask>Main</hask> module that generates two random vectors and computes their dot product:<br />
<haskell><br />
import Data.Array.Parallel.PArray (PArray, fromList)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= let v = fromList [1..10] -- convert lists...<br />
w = fromList [1,2..20] -- ...to parallel arrays<br />
result = dotp_wrapper v w -- invoke vectorised code<br />
in<br />
print result -- print the result<br />
</haskell><br />
<br />
==== Generating input data ====<br />
<br />
Finally, we need a wrapper module that calls the vectorised code, but is itself not vectorised. In this simple example, this is just the <hask>Main</hask> module that generates two random vectors and computes their dot product:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -O -fdph-seq Main.hs</code><br />
</blockquote><br />
and finally link with<br />
<blockquote><br />
<code>ghc -o dotp -fdph-seq -threaded DotP.o Main.o</code><br />
</blockquote><br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.<br />
<br />
==== Parallel execution ====<br />
<br />
The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code. <br />
<br />
In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can as well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.<br />
<br />
=== Further examples ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.<br />
<br />
The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2010-12-10T08:09:47Z<p>Chak: /* Compiling vectorised code */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
<center><br />
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png<br />
</center><br />
<br />
''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''<br />
<br />
__TOC__<br />
<br />
<br />
<br />
<br />
=== Project status ===<br />
<br />
We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).<br />
<br />
The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].<br />
<br />
DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.<br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following three import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE ParallelArrays #-}<br />
{-# OPTIONS -fvectorise #-}<br />
<br />
module DotP (dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it was follows:<br />
<blockquote><br />
<code>ghc -c -Odph DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code.<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a wrapper module that calls the vectorised code, but is itself not vectorised. In this simple example, this is just the <hask>Main</hask> module that generates two random vectors and computes their dot product:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -O -fdph-seq Main.hs</code><br />
</blockquote><br />
and finally link with<br />
<blockquote><br />
<code>ghc -o dotp -fdph-seq -threaded DotP.o Main.o</code><br />
</blockquote><br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.<br />
<br />
==== Parallel execution ====<br />
<br />
The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code. <br />
<br />
In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can as well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.<br />
<br />
=== Further examples ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.<br />
<br />
The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2010-12-10T07:47:41Z<p>Chak: /* Impedance matching */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
<center><br />
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png<br />
</center><br />
<br />
''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''<br />
<br />
__TOC__<br />
<br />
<br />
<br />
<br />
=== Project status ===<br />
<br />
We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).<br />
<br />
The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].<br />
<br />
DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.<br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following three import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The definition of <hask>dotp_double</hask> requires two language extensions, namely <hask>PArr</hask> to enable the syntax of parallel arrays and <hask>ParallelListComp</hask> for the parallel comprehension. Furthermore, we need to explicitly tell GHC which modules we want to vectorise.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
The compiler option to enable vectorisation is <code>-fvectorise</code>. Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE PArr, ParallelListComp #-}<br />
{-# OPTIONS -fvectorise #-}<br />
<br />
module DotP (dotp_double,dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it was follows:<br />
<blockquote><br />
<code>ghc -c -Odph -fcpr-off -fdph-seq DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that is geared at optimising DPH code. Moreover, we use <code>-fcpr-off</code> as GHC's CPR phase doesn't play nice with type families at the moment, which in turn are heavily used in the DPH library. We shall discuss <code>-fdph-seq</code> below.<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a wrapper module that calls the vectorised code, but is itself not vectorised. In this simple example, this is just the <hask>Main</hask> module that generates two random vectors and computes their dot product:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -O -fdph-seq Main.hs</code><br />
</blockquote><br />
and finally link with<br />
<blockquote><br />
<code>ghc -o dotp -fdph-seq -threaded DotP.o Main.o</code><br />
</blockquote><br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.<br />
<br />
==== Parallel execution ====<br />
<br />
The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code. <br />
<br />
In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can as well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.<br />
<br />
=== Further examples ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.<br />
<br />
The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2010-12-10T07:43:36Z<p>Chak: /* Special Prelude */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
<center><br />
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png<br />
</center><br />
<br />
''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''<br />
<br />
__TOC__<br />
<br />
<br />
<br />
<br />
=== Project status ===<br />
<br />
We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).<br />
<br />
The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].<br />
<br />
DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.<br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following three import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays '''cannot''' be passed. Instead, we can pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The definition of <hask>dotp_double</hask> requires two language extensions, namely <hask>PArr</hask> to enable the syntax of parallel arrays and <hask>ParallelListComp</hask> for the parallel comprehension. Furthermore, we need to explicitly tell GHC which modules we want to vectorise.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
The compiler option to enable vectorisation is <code>-fvectorise</code>. Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE PArr, ParallelListComp #-}<br />
{-# OPTIONS -fvectorise #-}<br />
<br />
module DotP (dotp_double,dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it was follows:<br />
<blockquote><br />
<code>ghc -c -Odph -fcpr-off -fdph-seq DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that is geared at optimising DPH code. Moreover, we use <code>-fcpr-off</code> as GHC's CPR phase doesn't play nice with type families at the moment, which in turn are heavily used in the DPH library. We shall discuss <code>-fdph-seq</code> below.<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a wrapper module that calls the vectorised code, but is itself not vectorised. In this simple example, this is just the <hask>Main</hask> module that generates two random vectors and computes their dot product:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -O -fdph-seq Main.hs</code><br />
</blockquote><br />
and finally link with<br />
<blockquote><br />
<code>ghc -o dotp -fdph-seq -threaded DotP.o Main.o</code><br />
</blockquote><br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.<br />
<br />
==== Parallel execution ====<br />
<br />
The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code. <br />
<br />
In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can as well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.<br />
<br />
=== Further examples ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.<br />
<br />
The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2010-12-10T04:26:07Z<p>Chak: /* No type classes */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
<center><br />
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png<br />
</center><br />
<br />
''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''<br />
<br />
__TOC__<br />
<br />
<br />
<br />
<br />
=== Project status ===<br />
<br />
We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).<br />
<br />
The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].<br />
<br />
DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.<br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude.html Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://www.haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude-Double.html Data.Array.Parallel.Prelude.Double], [http://www.haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude-Int.html Data.Array.Parallel.Prelude.Int], and [http://www.haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude-Word8.html Data.Array.Parallel.Prelude.Word8]. These three modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). If your code needs any other numeric types or functions that are not implemented in the these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following three import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays '''cannot''' be passed. Instead, we can pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The definition of <hask>dotp_double</hask> requires two language extensions, namely <hask>PArr</hask> to enable the syntax of parallel arrays and <hask>ParallelListComp</hask> for the parallel comprehension. Furthermore, we need to explicitly tell GHC which modules we want to vectorise.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
The compiler option to enable vectorisation is <code>-fvectorise</code>. Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE PArr, ParallelListComp #-}<br />
{-# OPTIONS -fvectorise #-}<br />
<br />
module DotP (dotp_double,dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it was follows:<br />
<blockquote><br />
<code>ghc -c -Odph -fcpr-off -fdph-seq DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that is geared at optimising DPH code. Moreover, we use <code>-fcpr-off</code> as GHC's CPR phase doesn't play nice with type families at the moment, which in turn are heavily used in the DPH library. We shall discuss <code>-fdph-seq</code> below.<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a wrapper module that calls the vectorised code, but is itself not vectorised. In this simple example, this is just the <hask>Main</hask> module that generates two random vectors and computes their dot product:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -O -fdph-seq Main.hs</code><br />
</blockquote><br />
and finally link with<br />
<blockquote><br />
<code>ghc -o dotp -fdph-seq -threaded DotP.o Main.o</code><br />
</blockquote><br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.<br />
<br />
==== Parallel execution ====<br />
<br />
The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code. <br />
<br />
In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can as well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.<br />
<br />
=== Further examples ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.<br />
<br />
The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2010-12-10T04:25:15Z<p>Chak: /* Running DPH programs */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
<center><br />
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png<br />
</center><br />
<br />
''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''<br />
<br />
__TOC__<br />
<br />
<br />
<br />
<br />
=== Project status ===<br />
<br />
We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).<br />
<br />
The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].<br />
<br />
DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.<br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not properly handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude.html Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://www.haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude-Double.html Data.Array.Parallel.Prelude.Double], [http://www.haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude-Int.html Data.Array.Parallel.Prelude.Int], and [http://www.haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude-Word8.html Data.Array.Parallel.Prelude.Word8]. These three modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). If your code needs any other numeric types or functions that are not implemented in the these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following three import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays '''cannot''' be passed. Instead, we can pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The definition of <hask>dotp_double</hask> requires two language extensions, namely <hask>PArr</hask> to enable the syntax of parallel arrays and <hask>ParallelListComp</hask> for the parallel comprehension. Furthermore, we need to explicitly tell GHC which modules we want to vectorise.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
The compiler option to enable vectorisation is <code>-fvectorise</code>. Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE PArr, ParallelListComp #-}<br />
{-# OPTIONS -fvectorise #-}<br />
<br />
module DotP (dotp_double,dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it was follows:<br />
<blockquote><br />
<code>ghc -c -Odph -fcpr-off -fdph-seq DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that is geared at optimising DPH code. Moreover, we use <code>-fcpr-off</code> as GHC's CPR phase doesn't play nice with type families at the moment, which in turn are heavily used in the DPH library. We shall discuss <code>-fdph-seq</code> below.<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a wrapper module that calls the vectorised code, but is itself not vectorised. In this simple example, this is just the <hask>Main</hask> module that generates two random vectors and computes their dot product:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -O -fdph-seq Main.hs</code><br />
</blockquote><br />
and finally link with<br />
<blockquote><br />
<code>ghc -o dotp -fdph-seq -threaded DotP.o Main.o</code><br />
</blockquote><br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.<br />
<br />
==== Parallel execution ====<br />
<br />
The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code. <br />
<br />
In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can as well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.<br />
<br />
=== Further examples ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.<br />
<br />
The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2010-12-10T04:19:45Z<p>Chak: /* Overview */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
<center><br />
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png<br />
</center><br />
<br />
''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''<br />
<br />
__TOC__<br />
<br />
<br />
<br />
<br />
=== Project status ===<br />
<br />
We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).<br />
<br />
The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].<br />
<br />
DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.<br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code called ''vectorisation'' that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but there it raises the level of expressiveness dramatically.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not properly handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude.html Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://www.haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude-Double.html Data.Array.Parallel.Prelude.Double], [http://www.haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude-Int.html Data.Array.Parallel.Prelude.Int], and [http://www.haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude-Word8.html Data.Array.Parallel.Prelude.Word8]. These three modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). If your code needs any other numeric types or functions that are not implemented in the these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following three import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays '''cannot''' be passed. Instead, we can pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The definition of <hask>dotp_double</hask> requires two language extensions, namely <hask>PArr</hask> to enable the syntax of parallel arrays and <hask>ParallelListComp</hask> for the parallel comprehension. Furthermore, we need to explicitly tell GHC which modules we want to vectorise.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
The compiler option to enable vectorisation is <code>-fvectorise</code>. Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE PArr, ParallelListComp #-}<br />
{-# OPTIONS -fvectorise #-}<br />
<br />
module DotP (dotp_double,dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it was follows:<br />
<blockquote><br />
<code>ghc -c -Odph -fcpr-off -fdph-seq DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that is geared at optimising DPH code. Moreover, we use <code>-fcpr-off</code> as GHC's CPR phase doesn't play nice with type families at the moment, which in turn are heavily used in the DPH library. We shall discuss <code>-fdph-seq</code> below.<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a wrapper module that calls the vectorised code, but is itself not vectorised. In this simple example, this is just the <hask>Main</hask> module that generates two random vectors and computes their dot product:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -O -fdph-seq Main.hs</code><br />
</blockquote><br />
and finally link with<br />
<blockquote><br />
<code>ghc -o dotp -fdph-seq -threaded DotP.o Main.o</code><br />
</blockquote><br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.<br />
<br />
==== Parallel execution ====<br />
<br />
The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code. <br />
<br />
In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can as well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.<br />
<br />
=== Further examples ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.<br />
<br />
The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2010-12-10T04:13:59Z<p>Chak: /* Where to get it */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
<center><br />
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png<br />
</center><br />
<br />
''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''<br />
<br />
__TOC__<br />
<br />
<br />
<br />
<br />
=== Project status ===<br />
<br />
We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).<br />
<br />
The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].<br />
<br />
DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.<br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines analogs of most list operations from the Haskell prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code called ''vectorisation'' that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but there it raises the level of expressiveness dramatically.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not properly handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude.html Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://www.haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude-Double.html Data.Array.Parallel.Prelude.Double], [http://www.haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude-Int.html Data.Array.Parallel.Prelude.Int], and [http://www.haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude-Word8.html Data.Array.Parallel.Prelude.Word8]. These three modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). If your code needs any other numeric types or functions that are not implemented in the these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following three import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays '''cannot''' be passed. Instead, we can pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The definition of <hask>dotp_double</hask> requires two language extensions, namely <hask>PArr</hask> to enable the syntax of parallel arrays and <hask>ParallelListComp</hask> for the parallel comprehension. Furthermore, we need to explicitly tell GHC which modules we want to vectorise.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
The compiler option to enable vectorisation is <code>-fvectorise</code>. Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE PArr, ParallelListComp #-}<br />
{-# OPTIONS -fvectorise #-}<br />
<br />
module DotP (dotp_double,dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it was follows:<br />
<blockquote><br />
<code>ghc -c -Odph -fcpr-off -fdph-seq DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that is geared at optimising DPH code. Moreover, we use <code>-fcpr-off</code> as GHC's CPR phase doesn't play nice with type families at the moment, which in turn are heavily used in the DPH library. We shall discuss <code>-fdph-seq</code> below.<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a wrapper module that calls the vectorised code, but is itself not vectorised. In this simple example, this is just the <hask>Main</hask> module that generates two random vectors and computes their dot product:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -O -fdph-seq Main.hs</code><br />
</blockquote><br />
and finally link with<br />
<blockquote><br />
<code>ghc -o dotp -fdph-seq -threaded DotP.o Main.o</code><br />
</blockquote><br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.<br />
<br />
==== Parallel execution ====<br />
<br />
The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code. <br />
<br />
In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can as well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.<br />
<br />
=== Further examples ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.<br />
<br />
The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2010-12-10T04:10:32Z<p>Chak: /* Project status */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
<center><br />
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png<br />
</center><br />
<br />
''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''<br />
<br />
__TOC__<br />
<br />
<br />
<br />
<br />
=== Project status ===<br />
<br />
We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).<br />
<br />
The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].<br />
<br />
DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
DPH is available in the current stable release GHC 6.10.1, which is [http://haskell.org/ghc/download_ghc_6_10_1.html available in source and binary form] for many architectures. If you are compiling 6.10.1 ''from source,'' please ensure that you include the <code>ghc-6.10.1-src-extralibs.tar.bz2</code> archive as it supplies important libraries. GHC distribution binaries should include these libraries by default.<br />
<br />
'''Update [March 2009]:''' The 6.10.1 release has now fallen considerably behind the current development version in the HEAD repository, not only with respect to DPH support, but generally concerning support for multi-core parallelism in the GHC runtime system. Hence, if you are interested in performance and scalability, you need to use the development compiler – with the usual caveats. We are planning a more mature stable release for 6.12. (Due to the scale of the changes involved, we are not able to backport the latest changes to the 6.10.2 release.) To use the code in the HEAD repository, please follow [http://hackage.haskell.org/trac/ghc/wiki/Building/QuickStart the standard build instructions.] Important is that you download ''package dph'' before you build and install the system; you can achieve that with<br />
<br />
./darcs-all --dph get<br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines analogs of most list operations from the Haskell prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code called ''vectorisation'' that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but there it raises the level of expressiveness dramatically.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not properly handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude.html Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://www.haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude-Double.html Data.Array.Parallel.Prelude.Double], [http://www.haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude-Int.html Data.Array.Parallel.Prelude.Int], and [http://www.haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude-Word8.html Data.Array.Parallel.Prelude.Word8]. These three modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). If your code needs any other numeric types or functions that are not implemented in the these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following three import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays '''cannot''' be passed. Instead, we can pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The definition of <hask>dotp_double</hask> requires two language extensions, namely <hask>PArr</hask> to enable the syntax of parallel arrays and <hask>ParallelListComp</hask> for the parallel comprehension. Furthermore, we need to explicitly tell GHC which modules we want to vectorise.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
The compiler option to enable vectorisation is <code>-fvectorise</code>. Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE PArr, ParallelListComp #-}<br />
{-# OPTIONS -fvectorise #-}<br />
<br />
module DotP (dotp_double,dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it was follows:<br />
<blockquote><br />
<code>ghc -c -Odph -fcpr-off -fdph-seq DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that is geared at optimising DPH code. Moreover, we use <code>-fcpr-off</code> as GHC's CPR phase doesn't play nice with type families at the moment, which in turn are heavily used in the DPH library. We shall discuss <code>-fdph-seq</code> below.<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a wrapper module that calls the vectorised code, but is itself not vectorised. In this simple example, this is just the <hask>Main</hask> module that generates two random vectors and computes their dot product:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -O -fdph-seq Main.hs</code><br />
</blockquote><br />
and finally link with<br />
<blockquote><br />
<code>ghc -o dotp -fdph-seq -threaded DotP.o Main.o</code><br />
</blockquote><br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.<br />
<br />
==== Parallel execution ====<br />
<br />
The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code. <br />
<br />
In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can as well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.<br />
<br />
=== Further examples ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.<br />
<br />
The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/User:ChakUser:Chak2010-03-22T04:10:03Z<p>Chak: </p>
<hr />
<div>Nothing to see here, really, but check out<br />
<br />
* my [http://www.cse.unsw.edu.au/~chak/ webpage] and<br />
* my [http://justtesting.org blog].<br />
<br />
On [[IRC channel|#haskell]] and #ghc, I am TacticalGrace.</div>Chakhttps://wiki.haskell.org/AusHac2010AusHac20102010-03-18T00:19:00Z<p>Chak: /* Possible Projects */</p>
<hr />
<div>If you've found this page, you use Haskell, ''and'' live in Australia (or at the very least able and willing to travel here), then you're in the right place! We're looking into organising a Haskell [[Hackathon]] some time during the middle of 2010, and this where it shall be organised.<br />
<br />
If you're interested in coming, '''please''' put your name down on the list below, along with your IRC nickname if you're on #haskell, and possibly your email (We'll use this to let you know of any progress we've made, but it's not mandatory). Also, if you've got something to discuss, feel free to add it to the bottom of the page in the Discussion section (just to keep the rest of the page clean and helpful).<br />
<br />
== What we've got so far ==<br />
<br />
===Why===<br />
<br />
Because we miss out on all the fun they have up north, and we've got something to offer. It's also a great chance to meet all these people you talk to on IRC, or read their blogs, and just have a good time, while getting some (potentially) useful work done!<br />
<br />
===When===<br />
<br />
A few dates have been discussed, mainly taking into account when the university holidays are for various universities:<br />
<br />
* ANU: 7 June -> 18 July<br />
* UNSW: 29 June -> 18 July<br />
<br />
So so far we need a weekend between the 28th of June and the 18th of July.<br />
<br />
We're looking at organising it over a weekend, and I (Axman6) would quite like to have it start on a Friday, ending on Sunday. This does not at all mean that those who can’t make the Friday will miss out, the more people we have, the better. But I think that having more time will mean that we can get more done (which is the point right?).<br />
<br />
===Where===<br />
<br />
Manuel Chakravarty and Ben Lippmeier have said there should be no problem finding a room at UNSW, with the only possible problem being Internet access for everyone, but hopefully something can be arranged by that time.<br />
<br />
===Who===<br />
<br />
If you're interested in coming, please show your interest by adding your details to the list below (if you don't have an account, please email me (Axman6) your details and I'll add you).<br />
<br />
<table border="1px"><br />
<tr><br />
<td>Name</td><br />
<td>IRC Nickname</td><br />
<td>Email</td><br />
<td>Availability</td><br />
<td>Preferred date</td><br />
<td>Comment</td><br />
</tr><br />
<br />
<tr><br />
<td>Alex Mason</td><br />
<td>Axman6</td><br />
<td>axman6@gmail.com</td><br />
<td>Probably any weekend during the ANU holidays</td><br />
<td>-</td><br />
<td>Organiser... sort of</td><br />
</tr><br />
<br />
<br />
<tr><br />
<td>[[:User:ivanm|Ivan Miljenovic]]</td><br />
<td>ivanm</td><br />
<td>Ivan <dot> Miljenovic <at> gmail <dot> com</td><br />
<td>*shrug* lazy PhD student, so whenever</td><br />
<td>&nbsp;&nbsp; <=== </td><br />
<td>ditto</td><br />
</tr><br />
<br />
<tr><br />
<td>Tony Morris</td><br />
<td>dibblego</td><br />
<td>code@tmorris.net</td><br />
<td>Nothing specific</td><br />
<td>-</td><br />
<td>Tentative, depending on health</td><br />
</tr><br />
<br />
<tr><br />
<td>Manuel Chakravarty</td><br />
<td>TacticalGrace</td><br />
<td>chak@justtesting.org</td><br />
<td>I'm away 4-11 July; will probably not be able to attend all of it regardless of date</td><br />
<td>Probably weekend of the 18th July</td><br />
<td>Will help getting a room at UNSW</td><br />
</tr><br />
</table><br />
<br />
== Discussion ==<br />
<br />
=== Possible Projects ===<br />
<br />
====Generic graph class====<br />
'''What:''' I (Ivan) last year floated the idea of replacing the current default array-based Graph data type with an extensible set of classes with default instances. There's various interest about this around and I've done some work on it, but if there's anyone else coming it'd be better to bounce ideas together about how to define such classes.<br />
<br />
'''Who:''' Ivan M<br />
<br />
====Gloss-based plots====<br />
'''What:''' Either an alternative graphing back end to Criterion that only relies on OpenGL (through the use of Gloss), or a library for plotting. At the moment Gloss looks like it may only be suitable for bar type graphs, but we'll see. (We may look into writing some other library that's better suited than Gloss, as Gloss is aimed at students learning haskell, and wanting to just get something drawn)<br />
<br />
'''Who:''' Ivan M, Alex M<br />
<br />
====GHC LLVM backend====<br />
'''What:''' The recent work dome by David Terei on an LLVM backend for GHC has shown some fantastic results, and getting it to a point where it could become the default GHC backend is something a lot of people would really like to see.<br />
<br />
'''Who:''' Alex M, Manuel<br />
<br />
====Accelerate====<br />
'''What:''' [http://hackage.haskell.org/package/accelerate Accelerate] is a Haskell EDSL for regular array computations. The aim is to make it generate so blindingly fast code that the C folks start to cry. An LLVM backend is in very early stages of development and a CUDA GPU backend is good enough to run some first small Accelerate programs.<br />
<br />
'''Who:''' Manuel<br />
<br />
=== Dates ===<br />
<br />
<br />
== Related Links ==<br />
<br />
* [[OzHaskell]]</div>Chakhttps://wiki.haskell.org/AusHac2010AusHac20102010-03-18T00:10:47Z<p>Chak: /* Who */</p>
<hr />
<div>If you've found this page, you use Haskell, ''and'' live in Australia (or at the very least able and willing to travel here), then you're in the right place! We're looking into organising a Haskell [[Hackathon]] some time during the middle of 2010, and this where it shall be organised.<br />
<br />
If you're interested in coming, '''please''' put your name down on the list below, along with your IRC nickname if you're on #haskell, and possibly your email (We'll use this to let you know of any progress we've made, but it's not mandatory). Also, if you've got something to discuss, feel free to add it to the bottom of the page in the Discussion section (just to keep the rest of the page clean and helpful).<br />
<br />
== What we've got so far ==<br />
<br />
===Why===<br />
<br />
Because we miss out on all the fun they have up north, and we've got something to offer. It's also a great chance to meet all these people you talk to on IRC, or read their blogs, and just have a good time, while getting some (potentially) useful work done!<br />
<br />
===When===<br />
<br />
A few dates have been discussed, mainly taking into account when the university holidays are for various universities:<br />
<br />
* ANU: 7 June -> 18 July<br />
* UNSW: 29 June -> 18 July<br />
<br />
So so far we need a weekend between the 28th of June and the 18th of July.<br />
<br />
We're looking at organising it over a weekend, and I (Axman6) would quite like to have it start on a Friday, ending on Sunday. This does not at all mean that those who can’t make the Friday will miss out, the more people we have, the better. But I think that having more time will mean that we can get more done (which is the point right?).<br />
<br />
===Where===<br />
<br />
Manuel Chakravarty and Ben Lippmeier have said there should be no problem finding a room at UNSW, with the only possible problem being Internet access for everyone, but hopefully something can be arranged by that time.<br />
<br />
===Who===<br />
<br />
If you're interested in coming, please show your interest by adding your details to the list below (if you don't have an account, please email me (Axman6) your details and I'll add you).<br />
<br />
<table border="1px"><br />
<tr><br />
<td>Name</td><br />
<td>IRC Nickname</td><br />
<td>Email</td><br />
<td>Availability</td><br />
<td>Preferred date</td><br />
<td>Comment</td><br />
</tr><br />
<br />
<tr><br />
<td>Alex Mason</td><br />
<td>Axman6</td><br />
<td>axman6@gmail.com</td><br />
<td>Probably any weekend during the ANU holidays</td><br />
<td>-</td><br />
<td>Organiser... sort of</td><br />
</tr><br />
<br />
<br />
<tr><br />
<td>[[:User:ivanm|Ivan Miljenovic]]</td><br />
<td>ivanm</td><br />
<td>Ivan <dot> Miljenovic <at> gmail <dot> com</td><br />
<td>*shrug* lazy PhD student, so whenever</td><br />
<td>&nbsp;&nbsp; <=== </td><br />
<td>ditto</td><br />
</tr><br />
<br />
<tr><br />
<td>Tony Morris</td><br />
<td>dibblego</td><br />
<td>code@tmorris.net</td><br />
<td>Nothing specific</td><br />
<td>-</td><br />
<td>Tentative, depending on health</td><br />
</tr><br />
<br />
<tr><br />
<td>Manuel Chakravarty</td><br />
<td>TacticalGrace</td><br />
<td>chak@justtesting.org</td><br />
<td>I'm away 4-11 July; will probably not be able to attend all of it regardless of date</td><br />
<td>Probably weekend of the 18th July</td><br />
<td>Will help getting a room at UNSW</td><br />
</tr><br />
</table><br />
<br />
== Discussion ==<br />
<br />
=== Possible Projects ===<br />
<br />
====Generic graph class====<br />
'''What:''' I (Ivan) last year floated the idea of replacing the current default array-based Graph data type with an extensible set of classes with default instances. There's various interest about this around and I've done some work on it, but if there's anyone else coming it'd be better to bounce ideas together about how to define such classes.<br />
<br />
'''Who:''' Ivan M<br />
<br />
====Gloss-based plots====<br />
'''What:''' Either an alternative graphing back end to Criterion that only relies on OpenGL (through the use of Gloss), or a library for plotting. At the moment Gloss looks like it may only be suitable for bar type graphs, but we'll see. (We may look into writing some other library that's better suited than Gloss, as Gloss is aimed at students learning haskell, and wanting to just get something drawn)<br />
<br />
'''Who:''' Ivan M, Alex M<br />
<br />
====GHC LLVM backend====<br />
'''What:''' The recent work dome by David Terei on an LLVM backend for GHC has shown some fantastic results, and getting it to a point where it could become the default GHC backend is something a lot of people would really like to see.<br />
<br />
'''Who:''' Alex M<br />
<br />
=== Dates ===<br />
<br />
<br />
== Related Links ==<br />
<br />
* [[OzHaskell]]</div>Chakhttps://wiki.haskell.org/IPhoneIPhone2009-06-19T20:28:15Z<p>Chak: </p>
<hr />
<div>If you are working with Haskell and making iPhone apps, or if you intend to soon, please fill in your info below.<br />
By helping each other out, we can work more productively and have more fun.<br />
<br />
{|width="80%" border="1" cellpadding="2" cellspacing="0"<br />
|-<br />
!Name<br />
!Contact info<br />
!Haskell-fu (0-5)<br />
!iPhone-fu (0-5)<br />
!Have (to share)<br />
!Need<br />
!Intended iPhone apps<br />
|-<br />
| Conal Elliott<br />
| [http://conal.net Home], [http://conal.net/blog blog], [http://haskell.org/haskellwiki/User:Conal wiki user], [http://twitter.com/conal Twitter], [http://www.facebook.com/profile.php?id=685783314&ref=name Facebook], [http://www.linkedin.com/profile?&key=4476842 Linkedin], IRC: conal<br />
| 5<br />
| 0<br />
| Functional graphics & GUI, misc Haskell libs, design skills<br />
| iPhone basics, Haskell-to-iPhone compiler<br />
| Interactive graphics toys<br />
|-<br />
| Chris Eidhof<br />
| [http://eidhof.nl Home], [http://tupil.com Tupil], [http://haskell.org/haskellwiki/User:ChrisEidhof wiki user], [http://twitter.com/chriseidhof Twitter], [http://www.linkedin.com/pub/chris-eidhof/3/b6/2b6 Linkedin], IRC: chr1s<br />
| 4<br />
| 3<br />
| iPhone experience, web programming experience, dependent types experience<br />
| Haskell-to-iPhone compiler (either as DSL or GHC Core -> iPhone)<br />
| Navigation-based apps (think of things like iTunes, Facebook, etc.), Games (maybe using a combination of FRP and something like arrowlets)<br />
|-<br />
| Daniel Peebles<br />
| [http://pumpkinpat.ch Home], [http://twitter.com/copumpkin Twitter]<br />
| 3<br />
| 4<br />
| Extensive iPhone platform knowledge<br />
| GHC cross-compiling to ARM Mach-O<br />
| Nothing in particular yet<br />
|-<br />
| John Meacham<br />
| [http://repetae.net Home], [http://notanumber.net/ blog]<br />
| -<br />
| -<br />
| Working Haskell to iPhone compiler (jhc)<br />
| Testers and Feedback to make cross compilation smoother. HOC integration with jhc.<br />
| Symbolic Algebra Application, Equation Editor<br />
|-<br />
| Eelco Lempsink<br />
| [http://eelco.lempsink.nl Home], [http://tupil.com Tupil], [http://haskell.org/haskellwiki/User:eelco wiki user], [http://twitter.com/eelco Twitter], [http://www.linkedin.com/in/lempsink Linkedin], IRC: eelco<br />
| 4<br />
| 3<br />
| iPhone and web experience<br />
| Haskell-to-iPhone with (Cocoa Touch) API intergration<br />
| Nothing in particular, looking for a good Haskell use-case :)<br />
|-<br />
| Bernd Brassel<br />
| [http://www-ps.informatik.uni-kiel.de/~bbr Home],[http://www.art2guide.com/index_en.html art2guide]<br />
| 5<br />
| 4<br />
| Haskell experience, iPhone developer<br />
| iPhone embedding into Haskell, good programmers<br />
| audio-visual guiding systems<br />
|-<br />
| Martin Kudlvasr<br />
| [http://trinpad.eu not exactly home],[http://www.linkedin.com/in/martinkudlvasr LinkedIn], irc: trin_cz, xmpp: trin@jabbim.cz<br />
| 3<br />
| 0<br />
| year of haskell experience in OpenGL and project euler<br />
| iPhone basics, Haskell-to-iPhone compiler<br />
| fascinated by reactive, game development<br />
|-<br />
| Sebastiaan Visser<br />
| [http://github.com/sebastiaanvisser Projects], [http://haskell.org/haskellwiki/User:Sebastiaan wiki user], [http://twitter.com/sfvisser Twitter]<br />
| 4<br />
| 0<br />
| Some experience/ideas about building EDSLs.<br />
| Deep EDSL Haskell-to-ObjectiveC, high-level to target GUI/animation. <br />
| Nothing in particular yet. Want to have objective C backend for [http://github.com/sebastiaanvisser/frp-js/tree/master this] EDSL.<br />
|-<br />
| Manuel Chakravarty<br />
| [http://www.cse.unsw.edu.au/~chak/ Home], [http://justtesting.org blog], [http://haskell.org/haskellwiki/User:chak wiki user], [http://twitter.com/TacticalGrace Twitter], [http://www.linkedin.com/in/manuelchakravarty LinkedIn], IRC: Chilli<br />
| 5<br />
| 2<br />
| Haskell EDSL & compiler know how; Objective-C and Cocoa Touch basics<br />
| Haskell tools for iphone dev<br />
| games & productivity apps<br />
|-<br />
|}<br />
<br />
There are at least two ways to use Haskell to make iPhone apps.<br />
One is having a Haskell-to-iPhone compiler, which would probably cross-compile from another host environment (probably Mac OS X).<br />
Another way is to write Haskell programs that ''generate'' iPhone-compatible code when run (rather than when compiled), based on an embedded DSL, similarly to [http://conal.net/papers/jfp-saig/ ''Compiling Embedded Languages''].<br />
<br />
Some helpful resources:<br />
<br />
* [http://iphoneideas.tumblr.com/ Free iPhone ideas] (blog by Chris Eidhof)<br />
* [http://hoc.sourceforge.net/ HOC Haskell to Objective-C binding]<br />
* [http://github.com/sebastiaanvisser/frp-js/tree/master Reactive DSL currently with JS backend]. We might be working on Objective-C backend during Hack-ɸ.<br />
* Stanford course: [http://www.stanford.edu/class/cs193p/ iPhone Application Programming], with online notes, code, and lecture video.<br />
* [http://hackage.haskell.org/trac/ghc/wiki/ObjectiveC Haskell Objective-C FFI proposal] (work-in-progress)</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2009-03-20T00:15:39Z<p>Chak: /* Where to get it */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
__TOC__<br />
<br />
=== Project status ===<br />
<br />
A first ''technology preview'' of Data Parallel Haskell (DPH) is included in the 6.10.1 release of GHC. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance).<br />
<br />
The purpose of this technology preview is twofold. Firstly, it gives interested early adopters the opportunity to see where the project is headed and enables them to experiment with simple DPH programs. Secondly, we hope to get user feedback that helps us to guide the project and prioritise those features that our users are most interested in.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
DPH is available in the current stable release GHC 6.10.1, which is [http://haskell.org/ghc/download_ghc_6_10_1.html available in source and binary form] for many architectures. If you are compiling 6.10.1 ''from source,'' please ensure that you include the <code>ghc-6.10.1-src-extralibs.tar.bz2</code> archive as it supplies important libraries. GHC distribution binaries should include these libraries by default.<br />
<br />
'''Update [March 2009]:''' The 6.10.1 release has now fallen considerably behind the current development version in the HEAD repository, not only with respect to DPH support, but generally concerning support for multi-core parallelism in the GHC runtime system. Hence, if you are interested in performance and scalability, you need to use the development compiler – with the usual caveats. We are planning a more mature stable release for 6.12. (Due to the scale of the changes involved, we are not able to backport the latest changes to the 6.10.2 release.) To use the code in the HEAD repository, please follow [http://hackage.haskell.org/trac/ghc/wiki/Building/QuickStart the standard build instructions.] Important is that you download ''package dph'' before you build and install the system; you can achieve that with<br />
<br />
./darcs-all --dph get<br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines analogs of most list operations from the Haskell prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code called ''vectorisation'' that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but there it raises the level of expressiveness dramatically.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not properly handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-Prelude.html Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-Prelude-Double.html Data.Array.Parallel.Prelude.Double], [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-Prelude-Int.html Data.Array.Parallel.Prelude.Int], and [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-Prelude-Word8.html Data.Array.Parallel.Prelude.Word8]. These three modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). If your code needs any other numeric types or functions that are not implemented in the these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following three import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays '''cannot''' be passed. Instead, we can pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The definition of <hask>dotp_double</hask> requires two language extensions, namely <hask>PArr</hask> to enable the syntax of parallel arrays and <hask>ParallelListComp</hask> for the parallel comprehension. Furthermore, we need to explicitly tell GHC which modules we want to vectorise.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
The compiler option to enable vectorisation is <code>-fvectorise</code>. Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE PArr, ParallelListComp #-}<br />
{-# OPTIONS -fvectorise #-}<br />
<br />
module DotP (dotp_double,dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it was follows:<br />
<blockquote><br />
<code>ghc -c -Odph -fcpr-off -fdph-seq DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that is geared at optimising DPH code. Moreover, we use <code>-fcpr-off</code> as GHC's CPR phase doesn't play nice with type families at the moment, which in turn are heavily used in the DPH library. We shall discuss <code>-fdph-seq</code> below.<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a wrapper module that calls the vectorised code, but is itself not vectorised. In this simple example, this is just the <hask>Main</hask> module that generates two random vectors and computes their dot product:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -O -fdph-seq Main.hs</code><br />
</blockquote><br />
and finally link with<br />
<blockquote><br />
<code>ghc -o dotp -fdph-seq -threaded DotP.o Main.o</code><br />
</blockquote><br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.<br />
<br />
==== Parallel execution ====<br />
<br />
The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code. <br />
<br />
In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can as well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.<br />
<br />
=== Further examples ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.<br />
<br />
The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2009-03-09T10:14:43Z<p>Chak: /* Where to get it */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
__TOC__<br />
<br />
=== Project status ===<br />
<br />
A first ''technology preview'' of Data Parallel Haskell (DPH) is included in the 6.10.1 release of GHC. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance).<br />
<br />
The purpose of this technology preview is twofold. Firstly, it gives interested early adopters the opportunity to see where the project is headed and enables them to experiment with simple DPH programs. Secondly, we hope to get user feedback that helps us to guide the project and prioritise those features that our users are most interested in.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
Currently, we recommend to use the implementation in GHC 6.10.1. It is [http://haskell.org/ghc/download_ghc_6_10_1.html available in source and binary form] for many architectures. (Please use the version in the HEAD repository of GHC only if you are a GHC developer or a very experienced GHC user and if you know the current status of the DPH code – intermediate versions may well be broken while we implement major changes.)<br />
<br />
Note that in addition to the compiler you will need the <code>dph</code> package; this is included in the <code>ghc-XXX-src-extralibs.tar.bz2</code> archive, but can also be retrieved by invoking:<br />
<br />
./sync-all --dph get<br />
<br />
from within the compiler source tree.<br />
<br />
'''Note on versions:''' The 6.10.1 release has now fallen considerably behind the current development version in the HEAD repository, not only with respect to DPH support, but generally concerning support for multi-core parallelism in the GHC runtime system. Hence, if you are interested in performance and scalability, you need to use development compiler – with the usual caveats. We are planning a more mature stable release for 6.12. (Due to the scale of the changes involved, we are not able to backport the latest changes to the 6.10.2 release.)<br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines analogs of most list operations from the Haskell prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code called ''vectorisation'' that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but there it raises the level of expressiveness dramatically.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not properly handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-Prelude.html Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-Prelude-Double.html Data.Array.Parallel.Prelude.Double], [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-Prelude-Int.html Data.Array.Parallel.Prelude.Int], and [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-Prelude-Word8.html Data.Array.Parallel.Prelude.Word8]. These three modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). If your code needs any other numeric types or functions that are not implemented in the these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following three import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays '''cannot''' be passed. Instead, we can pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The definition of <hask>dotp_double</hask> requires two language extensions, namely <hask>PArr</hask> to enable the syntax of parallel arrays and <hask>ParallelListComp</hask> for the parallel comprehension. Furthermore, we need to explicitly tell GHC which modules we want to vectorise.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
The compiler option to enable vectorisation is <code>-fvectorise</code>. Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE PArr, ParallelListComp #-}<br />
{-# OPTIONS -fvectorise #-}<br />
<br />
module DotP (dotp_double,dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it was follows:<br />
<blockquote><br />
<code>ghc -c -Odph -fcpr-off -fdph-seq DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that is geared at optimising DPH code. Moreover, we use <code>-fcpr-off</code> as GHC's CPR phase doesn't play nice with type families at the moment, which in turn are heavily used in the DPH library. We shall discuss <code>-fdph-seq</code> below.<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a wrapper module that calls the vectorised code, but is itself not vectorised. In this simple example, this is just the <hask>Main</hask> module that generates two random vectors and computes their dot product:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -O -fdph-seq Main.hs</code><br />
</blockquote><br />
and finally link with<br />
<blockquote><br />
<code>ghc -o dotp -fdph-seq -threaded DotP.o Main.o</code><br />
</blockquote><br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.<br />
<br />
==== Parallel execution ====<br />
<br />
The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code. <br />
<br />
In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can as well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.<br />
<br />
=== Further examples ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.<br />
<br />
The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/GHC/Data_Parallel_HaskellGHC/Data Parallel Haskell2009-03-09T10:13:14Z<p>Chak: /* Where to get it */</p>
<hr />
<div>[[Category:GHC|Data Parallel Haskell]]<br />
== Data Parallel Haskell ==<br />
<br />
''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus to utilise multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell]. <br />
<br />
__TOC__<br />
<br />
=== Project status ===<br />
<br />
A first ''technology preview'' of Data Parallel Haskell (DPH) is included in the 6.10.1 release of GHC. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance).<br />
<br />
The purpose of this technology preview is twofold. Firstly, it gives interested early adopters the opportunity to see where the project is headed and enables them to experiment with simple DPH programs. Secondly, we hope to get user feedback that helps us to guide the project and prioritise those features that our users are most interested in.<br />
<br />
'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.<br />
<br />
=== Where to get it ===<br />
<br />
Currently, we recommend to use the implementation in GHC 6.10.1. It is [http://haskell.org/ghc/download_ghc_6_10_1.html available in source and binary form] for many architectures. (Please use the version in the HEAD repository of GHC only if you are a GHC developer or a very experienced GHC user and if you know the current status of the DPH code – intermediate versions may well be broken while we implement major changes.)<br />
<br />
Note that in addition to the compiler you will need the <code>dph</code> package; this is included in the <code>ghc-XXX-src-extralibs.tar.bz2</code> archive, but can also be retrieved by invoking:<br />
<br />
./sync-all --dph get<br />
<br />
from within the compiler source tree.<br />
<br />
'''Note on versions:''' The 6.10.1 release has now fallen considerably behind the current development version in the HEAD repository, not only with respect to DPH support, but generally concerning support for multi-core parallelism in the GHC runtime system. Hence, if you are interested in performance and scalability, you need to use development compiler – with the usual caveats. We are planning a more mature stable release for 6.12.<br />
<br />
=== Overview ===<br />
<br />
From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analog manner. Moreover, the array library of DPH defines analogs of most list operations from the Haskell prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).<br />
<br />
The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)<br />
<br />
As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that absent from the standard list library, such as <hask>permuteP</hask>.<br />
<br />
=== A simple example ===<br />
<br />
As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:<br />
<haskell><br />
dotp :: Num a => [:a:] -> [:a:] -> a<br />
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.<br />
<br />
=== Running DPH programs ===<br />
<br />
Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code called ''vectorisation'' that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but there it raises the level of expressiveness dramatically.<br />
<br />
==== No type classes ====<br />
<br />
Unfortunately, vectorisation does not properly handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.<br />
<haskell><br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
</haskell><br />
<br />
==== Special Prelude ====<br />
<br />
As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-Prelude.html Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-Prelude-Double.html Data.Array.Parallel.Prelude.Double], [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-Prelude-Int.html Data.Array.Parallel.Prelude.Int], and [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-Prelude-Word8.html Data.Array.Parallel.Prelude.Word8]. These three modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). If your code needs any other numeric types or functions that are not implemented in the these Prelude modules, you currently need to implement and vectorise that functionality yourself.<br />
<br />
To compile <hask>dotp_double</hask>, we add the following three import statements:<br />
<haskell><br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
</haskell><br />
<br />
==== Impedance matching ====<br />
<br />
Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays '''cannot''' be passed. Instead, we can pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code). <br />
<br />
Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.<br />
<haskell><br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.<br />
<br />
==== Compiling vectorised code ====<br />
<br />
The definition of <hask>dotp_double</hask> requires two language extensions, namely <hask>PArr</hask> to enable the syntax of parallel arrays and <hask>ParallelListComp</hask> for the parallel comprehension. Furthermore, we need to explicitly tell GHC which modules we want to vectorise.<br />
<br />
Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.<br />
<br />
The compiler option to enable vectorisation is <code>-fvectorise</code>. Overall, we get the following complete module definition for the dot-product code:<br />
<haskell><br />
{-# LANGUAGE PArr, ParallelListComp #-}<br />
{-# OPTIONS -fvectorise #-}<br />
<br />
module DotP (dotp_double,dotp_wrapper)<br />
where<br />
<br />
import qualified Prelude<br />
import Data.Array.Parallel.Prelude<br />
import Data.Array.Parallel.Prelude.Double<br />
<br />
dotp_double :: [:Double:] -> [:Double:] -> Double<br />
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]<br />
<br />
dotp_wrapper :: PArray Double -> PArray Double -> Double<br />
{-# NOINLINE dotp_wrapper #-}<br />
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)<br />
</haskell><br />
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it was follows:<br />
<blockquote><br />
<code>ghc -c -Odph -fcpr-off -fdph-seq DotP.hs</code><br />
</blockquote><br />
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that is geared at optimising DPH code. Moreover, we use <code>-fcpr-off</code> as GHC's CPR phase doesn't play nice with type families at the moment, which in turn are heavily used in the DPH library. We shall discuss <code>-fdph-seq</code> below.<br />
<br />
==== Using vectorised code ====<br />
<br />
Finally, we need a wrapper module that calls the vectorised code, but is itself not vectorised. In this simple example, this is just the <hask>Main</hask> module that generates two random vectors and computes their dot product:<br />
<haskell><br />
import System.Random (newStdGen)<br />
import Data.Array.Parallel.PArray (PArray, randomRs)<br />
<br />
import DotP (dotp_wrapper) -- import vectorised code<br />
<br />
main :: IO ()<br />
main<br />
= do <br />
gen1 <- newStdGen<br />
gen2 <- newStdGen<br />
let v = randomRs n range gen1<br />
w = randomRs n range gen2<br />
print $ dotp_wrapper v w -- invoke vectorised code and print the result<br />
where<br />
n = 10000 -- vector length<br />
range = (-100, 100) -- range of vector elements<br />
</haskell><br />
We compile this module with<br />
<blockquote><br />
<code>ghc -c -O -fdph-seq Main.hs</code><br />
</blockquote><br />
and finally link with<br />
<blockquote><br />
<code>ghc -o dotp -fdph-seq -threaded DotP.o Main.o</code><br />
</blockquote><br />
<br />
'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.<br />
<br />
==== Parallel execution ====<br />
<br />
The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code. <br />
<br />
In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can as well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.<br />
<br />
Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.<br />
<br />
=== Further examples ===<br />
<br />
Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.<br />
<br />
The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].<br />
<br />
=== Designing parallel programs ===<br />
<br />
Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.<br />
<br />
DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:<br />
<haskell><br />
data RTree a = RNode [:RTree a:]<br />
</haskell><br />
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]<br />
<br />
For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]<br />
<br />
=== Further reading and information on the implementation ===<br />
<br />
DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].<br />
<br />
For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].<br />
<br />
=== Feedback ===<br />
<br />
Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:<br />
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]<br />
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]<br />
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]<br />
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]</div>Chakhttps://wiki.haskell.org/Gtk2HsGtk2Hs2009-03-04T02:45:54Z<p>Chak: /* Using the GTK+ OS X Framework */</p>
<hr />
<div>[[Category:User interfaces]]<br />
== What is it? ==<br />
<br />
Gtk2Hs is a Haskell binding to Gtk+ 2.x.<br />
Using it, one can write Gtk+ based applications with GHC.<br />
<br />
== Homepage ==<br />
<br />
http://haskell.org/gtk2hs/<br />
<br />
== Status ==<br />
<br />
It currently works with Gtk+ 2.0 through to 2.8 on Unix, Win32 and MacOS X.<br />
The widget function coverage is almost complete, only a few minor bits and pieces are missing.<br />
<br />
It currently builds with ghc 5.04.3 through to 6.8.2<br />
<br />
== Installation Notes ==<br />
=== Mac OS X ===<br />
<br />
==== Using the GTK+ OS X Framework ====<br />
<br />
This explains how to install Gtk2Hs on Macs using the native [http://www.gtk-osx.org/ GTK+ OS X Framework], a port of GTK+ to the Mac that does '''not''' depend on X11, and hence, is better integrated into the Mac desktop - i.e., menus actually appear in the menu bar, where they belong. It also avoids the often tedious installation of GTK+ via MacPorts. However, it misses support for optional Gtk2Hs packages that are currently not supported by the [http://www.gtk-osx.org/ GTK+ OS X Framework], most notably support for Glade. It does include support for Cairo, though.<br />
<br />
Here is how to install the library:<br />
# Download and install [http://www.gtk-osx.org/ GTK+ OS X Framework] (this uses the standard Mac package installer).<br />
# Install [http://pkg-config.freedesktop.org/ pkg-config], either by compiling it from source or via MacPorts.<br />
# Download and unpack the Gtk2Hs tar ball from the [http://www.haskell.org/gtk2hs/downloads/ Gtk2Hs download page] (I tested 0.10.0).<br />
# Configure with (you may want to remove the two backslashes and put everything on one line)<br />
env PKG_CONFIG_PATH=/Library/Frameworks/Cairo.framework/Resources/dev/lib/pkgconfig:\ <br />
/Library/Frameworks/GLib.framework/Resources/dev/lib/pkgconfig:\ <br />
/Library/Frameworks/Gtk.framework/Resources/dev/lib/pkgconfig ./configure --disable-gio<br />
# Build with<br />
make<br />
# Install (to <tt>/usr/local/</tt> unless a <tt>--prefix</tt> option was passed to <tt>configure</tt>) with<br />
sudo make install<br />
<br />
The library is now registered with the package database of the GHC you used for compiling.<br />
<br />
NB: Thanks to Ross Mellgren for his post on the gtk2hs-users list that outlined the use of <tt>PKG_CONFIG_PATH</tt>.<br />
<br />
==== Article as of Mid 2008 ====<br />
Installing Gtk2Hs on Mac requires some finesse, at least until Haskell Libary Platform is built or ghc-6.8.3 is <br />
available in macports. (These are planned for late 2008.)<br />
<br />
* Install [http://macports.org MacPorts]<br />
* Install dependencies:<br />
sudo port instll glade3 libglade2 gstreamer gst-plugins-base gtksourceview cairo librsvg gtkglext firefox<br />
* Update PKG_CONFIG_PATH (for libraries)<br />
export PKG_CONFIG_PATH=/usr/lib/pkgconfig:/usr/local/lib/pkgconfig:/opt/local/lib/pkgconfig<br />
* Update ghc to use macports libs: Edit your main <tt>ghc</tt> driver program and change the last line to:<br />
exec $GHCBIN $TOPDIROPT ${1+"$@"} -L/opt/local/lib -I/opt/local/include<br />
* Download Gtk2Hs following instructions at [http://www.haskell.org/gtk2hs/downloads/ Gtk2Hs Download page]<br />
* Check configuration:<br />
./configure --enable-docs --enable-profiling<br />
<br />
...<br />
<br />
**************************************************<br />
* Configuration completed successfully. <br />
* <br />
* The following packages will be built: <br />
* <br />
* glib : yes <br />
* gtk : yes <br />
* glade : yes <br />
* cairo : yes <br />
* svgcairo : yes <br />
* gtkglext : yes <br />
* gconf : yes <br />
* sourceview : yes <br />
* mozembed : yes <br />
* soegtk : yes <br />
* gnomevfs : yes <br />
* gstreamer : yes <br />
* documentation : yes <br />
* <br />
* Now do "(g)make" followed by "(g)make install"<br />
**************************************************<br />
* Build and Install:<br />
make <br />
sudo make install<br />
==== Recent experiences ====<br />
I successfully installed the latest version on Mac OS 10.5 by:<br />
* Installing Macports.<br />
* <tt>sudo port install ghc</tt><br />
* <tt>sudo port install gtk2hs</tt> - which does not complete successfully. It does however, install the appropriate dependencies. Note that there are so many, you may need to install a couple of times due to time outs etc.. The build of Gtk2HS will fail, but that is ok - continue as below.<br />
* Remove the build directory under <tt>/opt/.../build/gtk2hs</tt><br />
* Download Gtk2Hs via darcs as per [http://haskell.org/gtk2hs/development/#darcs the gtk2hs download instructions]<br />
* do a <tt>sudo port install automake</tt><br />
* do a <tt>sudo port install alex</tt><br />
* do a <tt>sudo port install happy</tt> (Note this also fails and must be built from source. See the [[Happy]] page for details.)<br />
* Follow the build instructions on the [http://haskell.org/gtk2hs/development/#darcs the gtk2hs download page]. I would suggest using <tt>./configure --prefix=/opt/local</tt> to get it in the same place as ports - personal preference though.<br />
Good luck - as usual, your mileage may vary.<br />
<br />
== Demos ==<br />
<br />
=== OpenGL and Gtk2Hs ===<br />
<br />
[[Gtk2Hs/Demos/GtkGLext/hello.hs]]<br />
<br />
[[Gtk2Hs/Demos/GtkGLext/terrain.hs]] requires [[Gtk2Hs/Demos/GtkGLext/terrain.xpm]]<br />
<br />
==FAQs==<br />
These are links to FAQS on the main site.<br />
*[http://haskell.org/gtk2hs/archives/2005/06/23/hiding-the-console-on-windows/#more-26 Hiding the console on windows]<br />
*[http://haskell.org/gtk2hs/archives/2005/07/24/writing-multi-threaded-guis/#more-38 Writing multi-threaded GUIs]<br />
*[http://haskell.org/gtk2hs/archives/2005/06/24/building-from-source-on-windows/#more-15 Building on Windows]<br />
*[http://haskell.org/gtk2hs/development/#darcs Checkout instructions]. Also see [[Darcs]]<br />
<br />
[[Category:Applications]]</div>Chakhttps://wiki.haskell.org/Gtk2HsGtk2Hs2009-03-04T02:45:19Z<p>Chak: Use of the GTK+ OS X Framework</p>
<hr />
<div>[[Category:User interfaces]]<br />
== What is it? ==<br />
<br />
Gtk2Hs is a Haskell binding to Gtk+ 2.x.<br />
Using it, one can write Gtk+ based applications with GHC.<br />
<br />
== Homepage ==<br />
<br />
http://haskell.org/gtk2hs/<br />
<br />
== Status ==<br />
<br />
It currently works with Gtk+ 2.0 through to 2.8 on Unix, Win32 and MacOS X.<br />
The widget function coverage is almost complete, only a few minor bits and pieces are missing.<br />
<br />
It currently builds with ghc 5.04.3 through to 6.8.2<br />
<br />
== Installation Notes ==<br />
=== Mac OS X ===<br />
<br />
=== Using the GTK+ OS X Framework ===<br />
<br />
This explains how to install Gtk2Hs on Macs using the native [http://www.gtk-osx.org/ GTK+ OS X Framework], a port of GTK+ to the Mac that does '''not''' depend on X11, and hence, is better integrated into the Mac desktop - i.e., menus actually appear in the menu bar, where they belong. It also avoids the often tedious installation of GTK+ via MacPorts. However, it misses support for op