HaskellWiki - User contributions [en]

Game Development

2016-09-21T00:01:57Z

Chak: /* Videos */

[[Category:Games]] [[Category:Community]]

This page and the #haskell-game [[IRC channel]] are the starting points for everyone interested in doing game development with Haskell. You may also wish to join the [http://www.haskellers.com/teams/7 Games group] on haskellers.com, or [http://www.reddit.com/r/haskellgamedev the Haskell game development subreddit].

There are quite a lot of games, unfinished libraries, and interested people out there - please gather links here and join us on '''[irc://irc.freenode.net/#haskell-game #haskell-game]''' !

== Games and game engines ==

* [[Applications and libraries/Games]] lists [[Applications and libraries/Games#Games|games]] and [[Applications and libraries/Games#Game_Engines_and_Libraries|game engines/libs]]

* See also Hackage categories: [http://hackage.haskell.org/packages/#cat:Game Game], [http://hackage.haskell.org/packages/#cat:Game%20Engine Game Engine], [http://hackage.haskell.org/packages/#cat:Graphics Graphics], [http://hackage.haskell.org/packages/#cat:Sound Sound], [http://hackage.haskell.org/packages/#cat:Physics Physics], [http://hackage.haskell.org/packages/#cat:FRP FRP]

* Other game-related wiki pages: [[:category:Games]]

=== Other supporting software ===

* [http://hackage.haskell.org/package/grid grid] provides tools for working with regular arrangements of tiles, such as might be used in a board game or self-organising map (SOM). Grid currently supports triangular, square, and hexagonal tiles, with various 2D and toroidal layouts ([https://github.com/mhwombat/grid/wiki description]).
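
To illustrate what a toroidal layout of square tiles means, here is a minimal standalone sketch. It does not use the grid package's API; the names <code>Tile</code>, <code>torusNeighbours</code> and <code>torusDistance</code> are hypothetical and chosen only for this example. Coordinates are assumed to lie in the range 0..w-1 and 0..h-1.

```haskell
-- A tile is addressed by its (column, row) coordinates.
-- Hypothetical type for this sketch, not taken from the grid package.
type Tile = (Int, Int)

-- The four edge-adjacent neighbours of a tile on a w-by-h torus:
-- stepping off one edge wraps around to the opposite edge.
torusNeighbours :: Int -> Int -> Tile -> [Tile]
torusNeighbours w h (x, y) =
  [ ((x + dx) `mod` w, (y + dy) `mod` h)
  | (dx, dy) <- [(1, 0), (-1, 0), (0, 1), (0, -1)]
  ]

-- Minimum number of single-tile moves between two tiles on the torus.
-- Along each axis the shorter of the direct and the wrapped route is taken.
torusDistance :: Int -> Int -> Tile -> Tile -> Int
torusDistance w h (x1, y1) (x2, y2) =
  wrap w (abs (x1 - x2)) + wrap h (abs (y1 - y2))
  where
    wrap n d = min d (n - d)
```

On a 4-by-3 torus the corner tile <code>(0, 0)</code> has four neighbours like any other tile, because the edges wrap. For grids narrower than three tiles the neighbour list can contain repeats.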

== Articles and blog posts ==



* [http://blog.haskellformac.com/blog/writing-games-in-haskell-with-spritekit Writing Games in Haskell with SpriteKit]

* [http://free-idea-monoid.blogspot.ca/2015/09/skeletal-animation-for-games-in-haskell.html Skeletal animation for games in Haskell]

* [http://free-idea-monoid.blogspot.ca/2014/03/experimenting-with-game-engine-concepts.html Experimenting with game engine concepts in Haskell]

* [http://fho.f12n.de/posts/2014-10-25-easily-extensible-entity-enigma.html The easily extensible entity enigma]

* [http://www.gamedev.net/page/resources/_/technical/game-programming/haskell-game-object-design-or-how-functions-can-get-you-apples-r3204 Haskell Game Object Design - Or How Functions Can Get You Apples]

* [http://keera.co.uk/blog/2014/10/15/from-60-fps-to-500/ From 60 Frames per Second to 500 in Haskell]

* [http://www.reddit.com/r/haskell/comments/2f9w0p/is_it_practical_to_write_a_strong_chess_engine_in/ Is it practical to write a strong chess engine in Haskell?]

* [https://www.youtube.com/watch?v=1MNTerD8IuI Writing a game in Haskell] (video)

* [https://ocharles.org.uk/blog/posts/2013-08-18-asteroids-in-netwire.html Asteroids & Netwire]

* [https://ocharles.org.uk/blog/posts/2013-08-01-getting-started-with-netwire-and-sdl.html Getting Started with Netwire and SDL]

* [https://github.com/alexander-b/master-thesis The Quest for Programming Nirvana: On Programming Game Systems in Haskell] (master's thesis)

* [http://www.cse.unsw.edu.au/~pls/thesis/munc-thesis.pdf Functional Programming and 3D Games] (thesis, PDF)

* [http://blog.chucklefish.org/?p=154 Wayward Tide: Technical Info]

* [http://jshaskell.blogspot.nl/2012/09/breakout.html Writing JavaScript games in Haskell - Breakout]

* [https://github.com/leonidas/codeblog/blob/master/2012/2012-01-17-declarative-game-logic-afrp.md Purely Functional, Declarative Game Logic Using Reactive Programming]

* [http://folk.uio.no/carljsv/computergames/computergames.pdf Computer Games'] - trying to implement the game flow of a computer game

* [http://lambda-the-ultimate.org/node/1277 The Next Mainstream Programming Languages: A Game Developer's Perspective] (PPT, PDF) presentation by Tim Sweeney

* [http://prog21.dadgum.com/23.html Purely Functional Retrogames]

* [http://prog21.dadgum.com/36.html Accidentally Introducing Side Effects into Purely Functional Code]

* [[media:wxhaskell.pdf | wxHaskell - A Portable and Concise GUI Library for Haskell]] (PDF) - describes an implementation of an asteroids game, [[wxAsteroids]]

* [http://www.palgorithm.co.uk/2009/08/haskell-for-games/ Haskell for Games!] Blog post, with PDF slides from AngloHaskell talk.

* [http://www.gamasutra.com/view/feature/2985/postmortem_naughty_dogs_jak_and_.php Postmortem: Naughty Dog's Jak and Daxter: the Precursor Legacy]; an article about a game developed with a [http://en.wikipedia.org/wiki/Domain-specific_language DSL] compiler written in Lisp

* [http://lambdor.net/ Lambdor] blog mostly about Yampa FRP and game development in Haskell with some tutorials

* [http://jshaskell.blogspot.de/ Writing JavaScript games in Haskell]

* [http://lambdacube3d.wordpress.com/ LambdaCube 3D] is a domain specific language and library that makes it possible to program GPUs in a purely functional style.

== Videos ==

* [https://www.youtube.com/watch?v=9dk7_GDNocQ Playing with Graphics and Animations in Haskell]

* [http://www.youtube.com/watch?v=AJQZg3Po-Ag bloxors: an OpenGL Logic Game written in Haskell]

* [http://www.youtube.com/watch?v=XoE5CKLLnaM LambdaCube 3D - Stunts example]

* [http://www.youtube.com/watch?v=JleoASegUlk LambdaCube 3D - Quake 3 example]

== Examples ==



* [http://folk.uio.no/carljsv/gorillabas/ GorillaBAS] - an attempt at defining computer games and building one.

* https://github.com/mlesniak/game - Haskell/OpenGL/Chipmunk game prototypes

* [[wxAsteroids]], a well-documented game, based on [[wxHaskell]]

* [https://github.com/simonmichael/hssdl-mac-example hssdl-mac-example] - how to make an SDL-using package buildable on Mac OS X

* http://codepad.org/LRGEkkDp - initialization for SDL to start rendering OpenGL stuff

* http://hackage.haskell.org/package/stunts - A revival of the classic racing game Stunts to serve as a non-toy-sized example for LambdaCube.

* http://hackage.haskell.org/package/dow - Dungeons of Wor is an homage to the classic arcade game, Wizard of Wor. This game is also an experiment in functional reactive programming, so it might be a useful resource to anyone interested in this topic.

* [https://github.com/sseefried/open-epidemic-game Epidemic]: a small game for Android devices. As an added bonus a complete development environment for the game can be built with Docker using the [https://github.com/sseefried/docker-epidemic-build-env.git docker-epidemic-build-env] repo.

* Possible Hackage categorisation guidelines: upload games to Game, engines and libs to Game Engine, or at least to some category beginning with Game, and check latest categories before uploading

== Forums ==

There are several forums in the Haskell world where game development can be discussed:
* The [https://www.haskell.org/mailman/listinfo/haskell-cafe Haskell Café] mailing list

* This page

* [irc://irc.freenode.net/#haskell-game #haskell-game] ([[IRC]])

* [http://www.haskellers.com/teams/7 Special Interest Groups » Games] at haskellers.com

* [https://github.com/haskell-game/brainstorming haskell-game] at GitHub

* [http://www.reddit.com/r/haskellgamedev Haskell Game Development] at reddit

== Wishlist ==

Is Hackage missing a useful data structure or library for some functionality that would benefit game programming? Suggestions for useful things can be added here as potential projects to hack on.

* [https://hackage.haskell.org/package/Octree Octree], [http://hackage.haskell.org/package/KdTree kd]-[http://hackage.haskell.org/package/kd-tree tree], various space partitioning techniques (maybe start with [http://hackage.haskell.org/package/spacepart spacepart]).

* Binding to [http://www.fmod.org fmod]

* Binding to [http://enet.bespin.org/Features.html enet] for multiplayer games. (jeffz is working on this).

Game Development

2016-09-20T23:56:03Z

Chak: /* Articles and blog posts */

[[Category:Games]] [[Category:Community]]

This page and the #haskell-game [[IRC channel]] are the starting points for everyone interested in doing game development with Haskell. You may also wish to join the [http://www.haskellers.com/teams/7 Games group] on haskellers.com, or [http://www.reddit.com/r/haskellgamedev the Haskell game development subreddit].

There are quite a lot of games, unfinished libraries, and interested people out there - please gather links here and join us on '''[irc://irc.freenode.net/#haskell-game #haskell-game]''' !

== Games and game engines ==

* [[Applications and libraries/Games]] lists [[Applications and libraries/Games#Games|games]] and [[Applications and libraries/Games#Game_Engines_and_Libraries|game engines/libs]]

* See also Hackage categories: [http://hackage.haskell.org/packages/#cat:Game Game], [http://hackage.haskell.org/packages/#cat:Game%20Engine Game Engine], [http://hackage.haskell.org/packages/#cat:Graphics Graphics], [http://hackage.haskell.org/packages/#cat:Sound Sound], [http://hackage.haskell.org/packages/#cat:Physics Physics], [http://hackage.haskell.org/packages/#cat:FRP FRP]

* Other game-related wiki pages: [[:category:Games]]

=== Other supporting software ===

* [http://hackage.haskell.org/package/grid grid] provides tools for working with regular arrangements of tiles, such as might be used in a board game or self-organising map (SOM). Grid currently supports triangular, square, and hexagonal tiles, with various 2D and toroidal layouts ([https://github.com/mhwombat/grid/wiki description]).

== Articles and blog posts ==



* [http://blog.haskellformac.com/blog/writing-games-in-haskell-with-spritekit Writing Games in Haskell with SpriteKit]

* [http://free-idea-monoid.blogspot.ca/2015/09/skeletal-animation-for-games-in-haskell.html Skeletal animation for games in Haskell]

* [http://free-idea-monoid.blogspot.ca/2014/03/experimenting-with-game-engine-concepts.html Experimenting with game engine concepts in Haskell]

* [http://fho.f12n.de/posts/2014-10-25-easily-extensible-entity-enigma.html The easily extensible entity enigma]

* [http://www.gamedev.net/page/resources/_/technical/game-programming/haskell-game-object-design-or-how-functions-can-get-you-apples-r3204 Haskell Game Object Design - Or How Functions Can Get You Apples]

* [http://keera.co.uk/blog/2014/10/15/from-60-fps-to-500/ From 60 Frames per Second to 500 in Haskell]

* [http://www.reddit.com/r/haskell/comments/2f9w0p/is_it_practical_to_write_a_strong_chess_engine_in/ Is it practical to write a strong chess engine in Haskell?]

* [https://www.youtube.com/watch?v=1MNTerD8IuI Writing a game in Haskell] (video)

* [https://ocharles.org.uk/blog/posts/2013-08-18-asteroids-in-netwire.html Asteroids & Netwire]

* [https://ocharles.org.uk/blog/posts/2013-08-01-getting-started-with-netwire-and-sdl.html Getting Started with Netwire and SDL]

* [https://github.com/alexander-b/master-thesis The Quest for Programming Nirvana: On Programming Game Systems in Haskell] (master's thesis)

* [http://www.cse.unsw.edu.au/~pls/thesis/munc-thesis.pdf Functional Programming and 3D Games] (thesis, PDF)

* [http://blog.chucklefish.org/?p=154 Wayward Tide: Technical Info]

* [http://jshaskell.blogspot.nl/2012/09/breakout.html Writing JavaScript games in Haskell - Breakout]

* [https://github.com/leonidas/codeblog/blob/master/2012/2012-01-17-declarative-game-logic-afrp.md Purely Functional, Declarative Game Logic Using Reactive Programming]

* [http://folk.uio.no/carljsv/computergames/computergames.pdf Computer Games'] - trying to implement the game flow of a computer game

* [http://lambda-the-ultimate.org/node/1277 The Next Mainstream Programming Languages: A Game Developer's Perspective] (PPT, PDF) presentation by Tim Sweeney

* [http://prog21.dadgum.com/23.html Purely Functional Retrogames]

* [http://prog21.dadgum.com/36.html Accidentally Introducing Side Effects into Purely Functional Code]

* [[media:wxhaskell.pdf | wxHaskell - A Portable and Concise GUI Library for Haskell]] (PDF) - describes an implementation of an asteroids game, [[wxAsteroids]]

* [http://www.palgorithm.co.uk/2009/08/haskell-for-games/ Haskell for Games!] Blog post, with PDF slides from AngloHaskell talk.

* [http://www.gamasutra.com/view/feature/2985/postmortem_naughty_dogs_jak_and_.php Postmortem: Naughty Dog's Jak and Daxter: the Precursor Legacy]; an article about a game developed with a [http://en.wikipedia.org/wiki/Domain-specific_language DSL] compiler written in Lisp

* [http://lambdor.net/ Lambdor] blog mostly about Yampa FRP and game development in Haskell with some tutorials

* [http://jshaskell.blogspot.de/ Writing JavaScript games in Haskell]

* [http://lambdacube3d.wordpress.com/ LambdaCube 3D] is a domain specific language and library that makes it possible to program GPUs in a purely functional style.

== Videos ==

* [http://www.youtube.com/watch?v=AJQZg3Po-Ag bloxors: an OpenGL Logic Game written in Haskell]

* [http://www.youtube.com/watch?v=XoE5CKLLnaM LambdaCube 3D - Stunts example]

* [http://www.youtube.com/watch?v=JleoASegUlk LambdaCube 3D - Quake 3 example]

== Examples ==



* [http://folk.uio.no/carljsv/gorillabas/ GorillaBAS] - an attempt at defining computer games and building one.

* https://github.com/mlesniak/game - Haskell/OpenGL/Chipmunk game prototypes

* [[wxAsteroids]], a well-documented game, based on [[wxHaskell]]

* [https://github.com/simonmichael/hssdl-mac-example hssdl-mac-example] - how to make an SDL-using package buildable on Mac OS X

* http://codepad.org/LRGEkkDp - initialization for SDL to start rendering OpenGL stuff

* http://hackage.haskell.org/package/stunts - A revival of the classic racing game Stunts to serve as a non-toy-sized example for LambdaCube.

* http://hackage.haskell.org/package/dow - Dungeons of Wor is an homage to the classic arcade game, Wizard of Wor. This game is also an experiment in functional reactive programming, so it might be a useful resource to anyone interested in this topic.

* [https://github.com/sseefried/open-epidemic-game Epidemic]: a small game for Android devices. As an added bonus a complete development environment for the game can be built with Docker using the [https://github.com/sseefried/docker-epidemic-build-env.git docker-epidemic-build-env] repo.

* Possible Hackage categorisation guidelines: upload games to Game, engines and libs to Game Engine, or at least to some category beginning with Game, and check latest categories before uploading

== Forums ==

There are several forums in the Haskell world where game development can be discussed:
* The [https://www.haskell.org/mailman/listinfo/haskell-cafe Haskell Café] mailing list

* This page

* [irc://irc.freenode.net/#haskell-game #haskell-game] ([[IRC]])

* [http://www.haskellers.com/teams/7 Special Interest Groups » Games] at haskellers.com

* [https://github.com/haskell-game/brainstorming haskell-game] at GitHub

* [http://www.reddit.com/r/haskellgamedev Haskell Game Development] at reddit

== Wishlist ==

Is Hackage missing a useful data structure or library for some functionality that would benefit game programming? Suggestions for useful things can be added here as potential projects to hack on.

* [https://hackage.haskell.org/package/Octree Octree], [http://hackage.haskell.org/package/KdTree kd]-[http://hackage.haskell.org/package/kd-tree tree], various space partitioning techniques (maybe start with [http://hackage.haskell.org/package/spacepart spacepart]).

* Binding to [http://www.fmod.org fmod]

* Binding to [http://enet.bespin.org/Features.html enet] for multiplayer games. (jeffz is working on this).

Applications and libraries/Games

2016-09-20T23:48:19Z

Chak: Add Lazy Lambda to games

{{LibrariesPage}}

See also: [[Game Development]]

== Games ==

See also the [http://hackage.haskell.org/packages/archive/pkg-list.html#cat:game Game] category on Hackage.

;{{HackagePackage|id=babylon}}
: An implementation of a simple 2-player board game. Uses wxHaskell.

;[https://www.haskell.org/communities/11-2015/html/report.html#sect7.13.5 Barbarossa]
:A UCI chess engine written completely in Haskell

;[https://github.com/plneappl/BeHaskelled BeHaskelled]
: A Bejeweled clone written completely in Haskell with {{HackagePackage|id=gloss}}.

;{{HackagePackage|id=board-games}}
: Computer player algorithms for three games: Connect Four, Rows&Columns, Mastermind. Intended for running as a web server.

;{{HackagePackage|id=boomslang}}
: A clone of the popular Flash game Boomshine.

;[https://github.com/yairchu/defend Defend The King from Forces of Different]
: A simple multiplayer real time strategy game.

; [http://www.increpare.com/2008/11/endless-cavern/ Endless Cavern]
: A 2D procedurally-generated cave exploration game.

;[http://sourceforge.net/projects/fooengine/?abmode=1 Foo]
:Foo (short for football) is a program that plays [http://en.wikipedia.org/wiki/Paper_Soccer Paper Soccer], a pencil and paper game for two players. It contains a simple interface using the HOpenGL library and provides many playing algorithms.

;[[Frag]]
:Frag is a 3D first person shooting game written in Haskell, by Mun Hon Cheong. It uses Yampa, Quake 3 BSP level format and OpenGL. It is licensed under the GPL.

;[http://mfuglos.github.io/jeopardy Fuglos Jeopardy]
:Fuglos Jeopardy is a free implementation of a game resembling the popular quiz show 'Jeopardy'. It is written using Gtk2Hs as its GUI toolkit. It is quite feature complete and easy to use. It supports LaTeX, so you can, for example, use LaTeX math syntax in your data sheets and thus organize a math Jeopardy event. Licensed under GPL3.

;[[GeBoP]]
:The General Boardgames Player offers a set of board games: Ataxx, Bamp, Halma, Hex, Kram, Nim, Reversi, TicTacToe, and Zenix. It uses wxHaskell.

; [http://folk.uio.no/carljsv/gorillabas/GorillaBAS-0.1.tar.gz GorillaBAS]
: A concrete game from an attempt at defining computer games.

; [https://github.com/ocharles/hadoom hadoom]
:A clone of Doom, using reactive-banana, GTK, and the "diagrams" library.

; [https://github.com/ivanperez-keera/haskanoid haskanoid]
:A breakout game with SDL graphics and Kinect and Wiimote support. Written using FRP; there is also a fork for Android.

;[http://www.informatik.uni-bremen.de/~cxl/lehre/pi3.ws01/asteroids/ Haskell in Space]
:An Asteroids-like game

;[http://www.hedgewars.org/ Hedgewars]
:A turn-based artillery game. The game server is written in Haskell.

;[http://www.cs.ox.ac.uk/people/ian.lynagh/Hetris/ Hetris]
:ASCII Tetris in Haskell

;{{HackagePackage|id=hfiar}}
:Four in a Row in Haskell. Uses wxHaskell.

;{{HackagePackage|id=hinvaders}}
:A simple ANSI-graphics space invaders written entirely in Haskell 98.

;[http://mu.org/~mux/LambdaChess/ LambdaChess]
:GTK chess client

;[https://github.com/mchakravarty/lazy-lambda Lazy Lambda]
:Lazy Lambda is a simple Flappy Bird clone in Haskell, implemented with [https://github.com/mchakravarty/HaskellSpriteKit Haskell SpriteKit]. It was originally developed for the [https://speakerdeck.com/mchakravarty/playing-with-graphics-and-animations-in-haskell Compose :: Melbourne 2016 keynote], where it was live coded in the second half of the presentation.

;[http://quasimal.com/projects/level_0.html Level 0]
:A fun and featureful Snake II clone using SDL.

;[http://www.ncc.up.pt/~pbv/stuff/lostcities/ Lost Cities]
:A two-player card game where each player tries to mount profitable expeditions. It uses wxHaskell.

;{{HackagePackage|id=mage}}
:Nethack clone written in Haskell. (The web site has [http://www.scannedinavian.com/~shae/mage-1.0pre35.tar.gz this mage-1.0.pre35.tar.gz file] containing an older version that used Data.FiniteMap.) There seems to be a problem with newer curses libraries, even with the more recent 1.1.0 version.

;{{HackagePackage|id=MazesOfMonad}}
:Role-Playing Game (influenced by Nethack), complete and fully playable. Console mode only.

;[http://www.geocities.jp/takascience/haskell/monadius_en.html Monadius]
:Monadius is a shoot 'em up with the selection bar power-up system for Windows, written in Haskell (now on Hackage as {{HackagePackage|id=Monadius}}; see also the [http://www.youtube.com/watch?v=zqFgQiPKtOI video])

;[http://mokehehe.blogspot.com/2009/04/super-nario-move-to-github.html Monao]
:A Super Mario clone, using an SDL binding different from the one in Hackage: [https://github.com/mokehehe/monao Monao on github], [https://github.com/keera-studios/monao New maintained version on github]

;[http://joyridelabs.de/game/ Nikki and the Robots]
:A puzzle, platformer game.

;[http://berlinbrowndev.blogspot.com/2007/09/octane-mech-opengl-haskell-based-mech.html Octane Mech]
:Octane Mech, OpenGL Haskell based mech game

;[http://sourceforge.net/projects/puzhs/ puzhs]
:Haskell bindings to [https://code.google.com/p/puz/ libpuz]

;[http://haskell-tetris.pbworks.com/w/page/16967421/Main OpenGL Tetris]
:Tetris in Haskell with OpenGL

;[http://srineet.brinkster.net/para/para.html Paratrooper]
:Paratrooper is a simple action game that runs on Windows and is written in literate Haskell.

;[http://raincat.bysusanlin.com/ Raincat]
:2D puzzle game featuring a fuzzy little cat (uses GLUT)

;[http://roguestar.downstairspeople.org Roguestar]
:Roguestar is a science fiction adventure role playing game using Haskell and OpenGL.

;{{HackagePackage|id=Shu-thing}}
:A 2-D vector graphics upwards-scrolling keyboard-controlled shooter. You shoot the enemies while dodging their bullets until you reach and defeat the enemy.

;{{HackagePackage|id=SpaceInvaders}}
:A video game, based on [[Yampa]]

;{{HackagePackage|id=stunts}}
:A revival of the classic racing game Stunts to serve as a non-toy-sized example for LambdaCube.

;[https://github.com/nbartlomiej/tfoo Tfoo]
:A simple Five in a Row game. Online, with server-sent events, deployed to [http://tfoo.herokuapp.com/ Heroku], open source.

;[http://web.jfet.org/~kwantam/TriHs.tar.gz TriHs] (tar.gz)
:A 1- or 2-player Tetris game using Gtk2Hs and Cairo.

;[[wxAsteroids]]
:Your space ship enters an asteroid belt, try to avoid collisions! wxAsteroids is based on wxHaskell.

;[http://xiangqiboard.blogspot.com/2007/12/gnuxiangqi-angekndigt.html Xiangqiboard]
:An implementation of xiangqi for Unix, using gtk2hs + cairo

;{{HackagePackage | id =Yogurt}}
:A functional MUD client featuring prioritized, regex-based hooks, variables, timers, logging, dynamic loading of Yogurt scripts and more. For example programs, please see [http://code.google.com/p/yogurt-mud/ Yogurt's home page].

=== Commercial games ===
;[https://play.google.com/store/apps/details?id=uk.co.keera.games.magiccookies Magic Cookies!]
:A lights-out clone for Android written in Haskell using SDL2 graphics and the FRP implementation Yampa. Created by [http://facebook.com/keerastudios Keera Studios].

=== Unfinished/in-progress games ===

;[http://allureofthestars.com Allure of the Stars]
:A near-future Sci-Fi roguelike and tactical squad game. Long-term goals are high replayability and auto-balancing through procedural content generation and persistent content modification based on player behaviour. The game is written using the {{HackagePackage|id=LambdaHack}} roguelike game engine.

;[http://ipwnstudios.com/node/4 Bloodknight]
:An action RPG for mobile devices

; [https://github.com/ghulette/haskell-game-of-life haskell-game-of-life]
: Conway's Game of Life

; [https://github.com/EricThoma/hchess hchess]
: Incomplete toy chess engine

;[http://dotat.at/prog/life/hslife.hs HsLife]
:A Haskell implementation of hashlife. It uses GLUT.
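
For readers new to Conway's Game of Life, the rule that the two entries above implement can be sketched in a few lines. This is a naive one-generation step over a list of live cells, not the hashlife algorithm used by HsLife, and the names <code>Cell</code>, <code>neighbours</code> and <code>step</code> are hypothetical. The input list is assumed to contain no duplicates.

```haskell
import Data.List (group, sort)

-- A cell on the unbounded grid, addressed by (x, y).
type Cell = (Int, Int)

-- The eight cells surrounding a given cell.
neighbours :: Cell -> [Cell]
neighbours (x, y) =
  [ (x + dx, y + dy)
  | dx <- [-1, 0, 1]
  , dy <- [-1, 0, 1]
  , (dx, dy) /= (0, 0)
  ]

-- Compute the next generation from a duplicate-free list of live cells.
-- Every live cell "votes" for each of its neighbours; grouping the sorted
-- votes gives each candidate cell's live-neighbour count.
-- A cell is alive next turn if it has exactly 3 live neighbours,
-- or exactly 2 and is alive now. The result is sorted.
step :: [Cell] -> [Cell]
step live =
  [ c
  | votes@(c : _) <- group (sort (concatMap neighbours live))
  , let n = length votes
  , n == 3 || (n == 2 && c `elem` live)
  ]
```

Applied to a horizontal row of three cells (a "blinker"), <code>step</code> yields the vertical row through the same centre, and applying it again restores the original. The list-based membership test keeps the sketch within base; a set would be the usual choice for larger boards.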

== Game Engines and Libraries ==

;[https://github.com/egonSchiele/actionkid actionkid]
:A video game framework, with a [http://vimeo.com/109663514 video tutorial] and [https://github.com/egonSchiele/chips chips], a game based on it.

;[[Bogre-Banana]]
:A 3D game-engine for Haskell. It uses Haskell bindings to the OGRE 3D engine and OIS input system and a library called reactive-banana, to create a "Functional Reactive Programming" game-engine.

;[http://hackage.haskell.org/package/bullet Bullet]
:A wrapper for the Bullet physics engine.

;[http://hackage.haskell.org/package/free-game free-game]
:A GUI/game library based on free monads.

;[http://hackage.haskell.org/package/FunGEn FunGEn]
:FunGEn (Functional Game Engine) is a platform-independent BSD-licensed 2D game engine based on OpenGL and GLUT. Its light dependencies make it easy to install; however, GLUT is reputed to be unsuitable for simultaneous keypresses. As of 2011 it's the only general-purpose game engine, and the quickest way to throw together [https://github.com/haskell-game/fungen/blob/master/examples/hello.hs simple] [https://github.com/haskell-game/fungen/blob/master/examples/pong/pong.hs 2D] [https://github.com/haskell-game/fungen/blob/master/examples/worms/worms.hs games], in Haskell. Example code: [http://joyful.com/fungen/site/example.html A Brief Example]. Forks and patches welcome!

;[http://projects.haskell.org/game-tree/ game-tree]
:game-tree is a purely functional library for searching game trees - useful for zero-sum two player games.

;[http://hackage.haskell.org/package/GLFW-b GLFW-b]
:Bindings to GLFW, a free, open source, multi-platform library for creating OpenGL contexts and managing input, including keyboard, mouse, joystick and time.

;[http://gloss.ouroborus.net/ Gloss]
:An OpenGL abstraction layer supporting game-style main loops.

;[https://github.com/haskell-game haskell-game]
:A project to make game development with Haskell easier to get started with by providing a suite of libraries for covering all sorts of aspects of game development.

;[https://github.com/mchakravarty/HaskellSpriteKit Haskell SpriteKit]
:Haskell SpriteKit provides a purely functional interface to the SpriteKit game engine on Apple platforms. SpriteKit is a state-of-the-art engine for 2D games and includes a versatile animation framework and an integrated physics engine. It is easy to use without the need for low-level programming or advanced concepts, such as FRP.

;[http://helm-engine.org/ Helm]
:A functionally reactive game engine inspired by [http://elm-lang.org/ Elm].

;[http://hackage.haskell.org/package/HGamer3D HGamer3D]
:A game engine for Windows which includes Haskell bindings to a couple of C++ libraries and a Haskell API on top of that. Features include Audio, Joystick, Mouse and Keyboard handling, GUI, Network, Physics, 3D graphics.
:[https://www.youtube.com/watch?v=v_GSbObYRkY Y-Wing flight] is a video of a demonstration of the possibilities of HGamer3D.

;[http://hackage.haskell.org/cgi-bin/hackage-scripts/package/Hipmunk Hipmunk]
:Hipmunk: A Haskell binding for [http://chipmunk-physics.net/ Chipmunk]. Chipmunk is a fast, simple, portable, 2D physics engine. It is completely self-contained. See also [http://hackage.haskell.org/cgi-bin/hackage-scripts/package/HipmunkPlayground HipmunkPlayground]: a simple OpenGL program that allows you to see some of Hipmunk's functions in action.

;[https://github.com/asivitz/Hickory Hickory]
:Hickory is not really a game engine. It's more of a collection of tools and abstractions that can be used to make games. It doesn't have opinions and doesn't force you into a particular paradigm.

;[[Hpysics]]
:Hpysics is a physics engine written using Data Parallel Haskell during Google Summer of Code 2008.

;[http://hackage.haskell.org/package/hogre hogre]
:Haskell bindings to the excellent OGRE 3D rendering engine. Ogre has been used in commercial games such as Torchlight and several books exist documenting the Ogre API. Ogre uses an MIT license making it compatible with many Haskell libraries.

;[http://hackage.haskell.org/package/IrrHaskell IrrHaskell]
:Haskell binding to the [http://irrlicht.sourceforge.net/ Irrlicht game engine]. The Irrlicht Engine is an open source high performance realtime 3D engine

;[http://lambdacube3d.com/ LambdaCube 3D]
:LambdaCube 3D is a domain specific language and library that makes it possible to program GPUs in a purely functional style.

;[http://hackage.haskell.org/package/set-cover set-cover]
:Solver for exact set cover problems. Included examples: [[Sudoku]], [[Mastermind]], [[Nonogram]], Domino tessellation, 8 Queens, Soma Cube, [[Tetris Cube]], Cube of L's, Logika's Baumeister puzzle. The generic algorithm lets you choose between the slow but flexible Set from the containers package and fast but cumbersome bit vectors.
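
As background for the game-tree entry in the list above, the following is a minimal sketch of searching a game tree for a zero-sum two-player game with negamax. It does not use the game-tree library's API; the <code>GameTree</code> type and <code>negamax</code> function are hypothetical and defined here only for illustration. Each leaf is assumed to carry a score from the point of view of the player to move at that leaf.

```haskell
-- A game tree: each node holds a static score and its successor positions.
-- The score is only consulted at leaves in this sketch.
-- Hypothetical type for illustration, not the game-tree package's.
data GameTree = Node Int [GameTree]

-- Value of a position for the player to move.
-- In a zero-sum game a position worth v to one player is worth -v to the
-- other, so the best move is the child whose negated value is largest.
negamax :: GameTree -> Int
negamax (Node score [])    = score
negamax (Node _ children)  = maximum (map (negate . negamax) children)
```

This exhaustive search visits every node; practical engines usually add depth limits and alpha-beta pruning on top of the same recursion.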

=== Unfinished/in-progress game engines/libraries ===

;[https://github.com/adorablepuppy/CurryDog CurryDog]
:Aims to be a 2d and 3d modular game engine.

;[https://github.com/keera-studios/gtk-helpers gtk-helpers]
:A collection of auxiliary operations related to Gtk2hs. See also [http://keera.co.uk/blog/2013/03/19/creating-board-games-in-haskell/ Creating board games in Haskell in 100 lines of code]

;[[HaskGame]]
:An incomplete graphics system abstraction layer.

; [https://bananu7.github.io/Hate Hate]
:Hate is a small framework for graphical Haskell games and applications. It's heavily inspired by Love and aims at similar ease of use, but within the power of Haskell's type and concurrency safety.

; [https://github.com/shicks/hsgame hsgame]
:A framework for network games

;[https://github.com/LambdaHack/LambdaHack LambdaHack]
:A game engine library for roguelike games of arbitrary theme, size and complexity, packaged together with a small example dungeon crawler. When completed, it will let you specify content to be procedurally generated, define the AI behaviour on top of the generic content-independent rules and compile a ready-to-play game binary, using either the supplied or a custom-made main loop. Several frontends are available (GTK is the default) and many other generic engine components are easily overridden, but the fundamental source of flexibility lies in the strict and type-safe separation of code and content.

[[Category:Games|*]]
[[Category:Applications]]
[[Category:Libraries]]

Applications and libraries/Games

2016-09-20T23:39:31Z

Chak: Added Haskell SpriteKit to engines

{{LibrariesPage}}

See also: [[Game Development]]

== Games ==

See also the [http://hackage.haskell.org/packages/archive/pkg-list.html#cat:game Game] category on Hackage.

;{{HackagePackage|id=babylon}}
: An implementation of a simple 2-player board game. Uses wxHaskell.

;[https://www.haskell.org/communities/11-2015/html/report.html#sect7.13.5 Barbarossa]
:A UCI chess engine written completely in Haskell

;[https://github.com/plneappl/BeHaskelled BeHaskelled]
: A Bejeweled clone written completely in Haskell with {{HackagePackage|id=gloss}}.

;{{HackagePackage|id=board-games}}
: Computer player algorithms for three games: Connect Four, Rows&Columns, Mastermind. Intended for running as a web server.

;{{HackagePackage|id=boomslang}}
: A clone of the popular Flash game Boomshine.

;[https://github.com/yairchu/defend Defend The King from Forces of Different]
: A simple multiplayer real time strategy game.

; [http://www.increpare.com/2008/11/endless-cavern/ Endless Cavern]
: A 2D procedurally-generated cave exploration game.

;[http://sourceforge.net/projects/fooengine/?abmode=1 Foo]
:Foo (short for football) is a program that plays [http://en.wikipedia.org/wiki/Paper_Soccer Paper Soccer], a pencil and paper game for two players. It contains a simple interface using the HOpenGL library and provides many playing algorithms.

;[[Frag]]
:Frag is a 3D first person shooting game written in Haskell, by Mun Hon Cheong. It uses Yampa, Quake 3 BSP level format and OpenGL. It is licensed under the GPL.

;[http://mfuglos.github.io/jeopardy Fuglos Jeopardy]
:Fuglos Jeopardy is a free implementation of a game resembling the popular quiz show 'Jeopardy'. It is written using Gtk2Hs as its GUI toolkit. It is quite feature complete and easy to use. It supports LaTeX, so you can, for example, use LaTeX math syntax in your data sheets and thus organize a math Jeopardy event. Licensed under GPL3.

;[[GeBoP]]
:The General Boardgames Player, offers a set of board games: Ataxx, Bamp, Halma, Hex, Kram, Nim, Reversi, TicTacToe, and Zenix. It uses wxHaskell.

; [http://folk.uio.no/carljsv/gorillabas/GorillaBAS-0.1.tar.gz GorillaBAS]
: A concrete game from an attempt at defining computer games.

; [https://github.com/ocharles/hadoom hadoom]
:A clone of Doom, using reactive-banana, GTK, and the "diagrams" library.

; [https://github.com/ivanperez-keera/haskanoid haskanoid]
:A breakout game with SDL graphics and Kinect and Wiimote support. It is written in FRP style, and there is a Haskell fork for Android.

;[http://www.informatik.uni-bremen.de/~cxl/lehre/pi3.ws01/asteroids/ Haskell in Space]
:An Asteroids-like game

;[http://www.hedgewars.org/ Hedgewars]
:A turn-based artillery game. The game server is written in Haskell.

;[http://www.cs.ox.ac.uk/people/ian.lynagh/Hetris/ Hetris]
:ASCII Tetris in Haskell

;{{HackagePackage|id=hfiar}}
:Four in a Row in Haskell. Uses wxHaskell.

;{{HackagePackage|id=hinvaders}}
:A simple ANSI-graphics space invaders written entirely in Haskell 98.

;[http://mu.org/~mux/LambdaChess/ LambdaChess]
:GTK chess client

;[http://quasimal.com/projects/level_0.html Level 0]
:A fun and featureful Snake II clone using SDL.

;[http://www.ncc.up.pt/~pbv/stuff/lostcities/ Lost Cities]
:A two-player card game where each player tries to mount profitable expeditions. It uses wxHaskell.

;{{HackagePackage|id=mage}}
:Nethack clone written in Haskell (The web site has [http://www.scannedinavian.com/~shae/mage-1.0pre35.tar.gz this mage-1.0.pre35.tar.gz file] containing an older version that was using Data.FiniteMap.) There seems to be a problem with newer versions of the curses library, even with the more recent 1.1.0 version.

;{{HackagePackage|id=MazesOfMonad}}
:Role-Playing Game (influenced by Nethack), complete and fully playable. Console mode only.

;[http://www.geocities.jp/takascience/haskell/monadius_en.html Monadius]
:Monadius is a shoot 'em up with the selection bar power-up system for Windows, written in Haskell (now on Hackage as {{HackagePackage|id=Monadius}}; see also the [http://www.youtube.com/watch?v=zqFgQiPKtOI video])

;[http://mokehehe.blogspot.com/2009/04/super-nario-move-to-github.html Monao]
:A Super Mario clone, using an SDL binding different from the one in Hackage: [https://github.com/mokehehe/monao Monao on github], [https://github.com/keera-studios/monao New maintained version on github]

;[http://joyridelabs.de/game/ Nikki and the Robots]
:A puzzle, platformer game.

;[http://berlinbrowndev.blogspot.com/2007/09/octane-mech-opengl-haskell-based-mech.html Octane Mech]
:Octane Mech, OpenGL Haskell based mech game

;[http://sourceforge.net/projects/puzhs/ puzhs]
:Haskell bindings to [https://code.google.com/p/puz/ libpuz]

;[http://haskell-tetris.pbworks.com/w/page/16967421/Main OpenGL Tetris]
:Tetris in Haskell with OpenGL

;[http://srineet.brinkster.net/para/para.html Paratrooper]
:Paratrooper is a simple action game that runs on Windows and is written in literate Haskell.

;[http://raincat.bysusanlin.com/ Raincat]
:2D puzzle game featuring a fuzzy little cat (uses GLUT)

;[http://roguestar.downstairspeople.org Roguestar]
:Roguestar is a science fiction adventure role playing game using Haskell and OpenGL.

;{{HackagePackage|id=Shu-thing}}
:A 2-D vector graphics upwards-scrolling keyboard-controlled shooter. You shoot the enemies while dodging their bullets until you reach and defeat the enemy.

;{{HackagePackage|id=SpaceInvaders}}
:A video game, based on [[Yampa]]

;{{HackagePackage|id=stunts}}
:A revival of the classic racing game Stunts to serve as a non-toy-sized example for LambdaCube.

;[https://github.com/nbartlomiej/tfoo Tfoo]
:A simple Five in a Row game. Online, with server-sent events, deployed to [http://tfoo.herokuapp.com/ Heroku], open source.

;[http://web.jfet.org/~kwantam/TriHs.tar.gz TriHs] (tar.gz)
:A 1- or 2-player Tetris game using Gtk2Hs and Cairo.

;[[wxAsteroids]]
:Your space ship enters an asteroid belt, try to avoid collisions! wxAsteroids is based on wxHaskell.

;[http://xiangqiboard.blogspot.com/2007/12/gnuxiangqi-angekndigt.html Xiangqiboard]
:An implementation of xiangqi for Unix, using gtk2hs + cairo

;{{HackagePackage | id =Yogurt}}
:A functional MUD client featuring prioritized, regex-based hooks, variables, timers, logging, dynamic loading of Yogurt scripts and more. For example programs, please see [http://code.google.com/p/yogurt-mud/ Yogurt's home page].

=== Commercial games ===
;[https://play.google.com/store/apps/details?id=uk.co.keera.games.magiccookies Magic Cookies!]
:A lights-out clone for Android written in Haskell using SDL2 graphics and the FRP implementation Yampa. Created by [http://facebook.com/keerastudios Keera Studios].

=== Unfinished/in-progress games ===

;[http://allureofthestars.com Allure of the Stars]
:A near-future Sci-Fi roguelike and tactical squad game. Long-term goals are high replayability and auto-balancing through procedural content generation and persistent content modification based on player behaviour. The game is written using the {{HackagePackage|id=LambdaHack}} roguelike game engine.

;[http://ipwnstudios.com/node/4 Bloodknight]
:An action RPG for mobile devices

; [https://github.com/ghulette/haskell-game-of-life haskell-game-of-life]
: Conway's Game of Life

; [https://github.com/EricThoma/hchess hchess]
: Incomplete toy chess engine

;[http://dotat.at/prog/life/hslife.hs HsLife]
:A Haskell implementation of hashlife. It uses GLUT.

== Game Engines and Libraries ==

;[https://github.com/egonSchiele/actionkid actionkid]
:A video game framework, with a [http://vimeo.com/109663514 video tutorial] and [https://github.com/egonSchiele/chips chips], a game based on it.

;[[Bogre-Banana]]
:A 3D game-engine for Haskell. It uses Haskell bindings to the OGRE 3D engine and OIS input system and a library called reactive-banana, to create a "Functional Reactive Programming" game-engine.

;[http://hackage.haskell.org/package/bullet Bullet]
:A wrapper for the Bullet physics engine.

;[http://hackage.haskell.org/package/free-game free-game]
:A GUI/game library based on free monads.

;[http://hackage.haskell.org/package/FunGEn FunGEn]
:FunGEn (Functional Game Engine) is a platform-independent BSD-licensed 2D game engine based on OpenGL and GLUT. Its light dependencies make it easy to install; however, GLUT is reputed to be unsuitable for simultaneous keypresses. As of 2011 it's the only general-purpose game engine, and the quickest way to throw together [https://github.com/haskell-game/fungen/blob/master/examples/hello.hs simple] [https://github.com/haskell-game/fungen/blob/master/examples/pong/pong.hs 2D] [https://github.com/haskell-game/fungen/blob/master/examples/worms/worms.hs games], in Haskell. Example code: [http://joyful.com/fungen/site/example.html A Brief Example]. Forks and patches welcome!

;[http://projects.haskell.org/game-tree/ game-tree]
:game-tree is a purely functional library for searching game trees - useful for zero-sum two player games.

;[http://hackage.haskell.org/package/GLFW-b GLFW-b]
:Bindings to GLFW, a free, open source, multi-platform library for creating OpenGL contexts and managing input, including keyboard, mouse, joystick and time.

;[http://gloss.ouroborus.net/ Gloss]
:An OpenGL abstraction layer supporting game-style main loops.

;[https://github.com/haskell-game haskell-game]
:A project to make game development with Haskell easier to get started with by providing a suite of libraries for covering all sorts of aspects of game development.

;[https://github.com/mchakravarty/HaskellSpriteKit Haskell SpriteKit]
:Haskell SpriteKit provides a purely functional interface to the SpriteKit game engine on Apple platforms. SpriteKit is a state-of-the-art engine for 2D games and includes a versatile animation framework and an integrated physics engine. It is easy to use without the need for low-level programming or advanced concepts, such as FRP.

;[http://helm-engine.org/ Helm]
:A functionally reactive game engine inspired by [http://elm-lang.org/ Elm].

;[http://hackage.haskell.org/package/HGamer3D HGamer3D]
:A game engine for Windows which includes Haskell bindings to a couple of C++ libraries and a Haskell API on top of that. Features include Audio, Joystick, Mouse and Keyboard handling, GUI, Network, Physics, 3D graphics.
:[https://www.youtube.com/watch?v=v_GSbObYRkY Y-Wing flight] is a video of a demonstration of the possibilities of HGamer3D.

;[http://hackage.haskell.org/cgi-bin/hackage-scripts/package/Hipmunk Hipmunk]
:Hipmunk: A Haskell binding for [http://chipmunk-physics.net/ Chipmunk]. Chipmunk is a fast, simple, portable, 2D physics engine. It is completely self-contained. See also [http://hackage.haskell.org/cgi-bin/hackage-scripts/package/HipmunkPlayground HipmunkPlayground]: a simple OpenGL program that allows you to see some of Hipmunk's functions in action.

;[https://github.com/asivitz/Hickory Hickory]
:Hickory is not really a game engine. It's more of a collection of tools and abstractions that can be used to make games. It doesn't have opinions and doesn't force you into a particular paradigm.

;[[Hpysics]]
:Hpysics is a physics engine written using Data Parallel Haskell during Google Summer of Code 2008.

;[http://hackage.haskell.org/package/hogre hogre]
:Haskell bindings to the excellent OGRE 3D rendering engine. Ogre has been used in commercial games such as Torchlight and several books exist documenting the Ogre API. Ogre uses an MIT license making it compatible with many Haskell libraries.

;[http://hackage.haskell.org/package/IrrHaskell IrrHaskell]
:Haskell binding to the [http://irrlicht.sourceforge.net/ Irrlicht game engine]. The Irrlicht Engine is an open source high performance realtime 3D engine

;[http://lambdacube3d.com/ LambdaCube 3D]
:LambdaCube 3D is a domain specific language and library that makes it possible to program GPUs in a purely functional style.

;[http://hackage.haskell.org/package/set-cover set-cover]
:Solver for exact set cover problems. Included examples: [[Sudoku]], [[Mastermind]], [[Nonogram]], Domino tessellation, 8 Queens, Soma Cube, [[Tetris Cube]], Cube of L's, Logika's Baumeister puzzle. The generic algorithm lets you choose between the slow but flexible Set from the containers package and fast but cumbersome bitvectors.

=== Unfinished/in-progress game engines/libraries ===

;[https://github.com/adorablepuppy/CurryDog CurryDog]
:Aims to be a 2d and 3d modular game engine.

;[https://github.com/keera-studios/gtk-helpers gtk-helpers]
:A collection of auxiliary operations related to Gtk2hs. See also [http://keera.co.uk/blog/2013/03/19/creating-board-games-in-haskell/ Creating board games in Haskell in 100 lines of code]

;[[HaskGame]]
:An incomplete graphics system abstraction layer.

; [https://bananu7.github.io/Hate Hate]
:Hate is a small framework for graphical haskell games and applications. It's heavily inspired by Love and aims at similar ease of use, but within the power of Haskell's type and concurrency safety.

; [https://github.com/shicks/hsgame hsgame]
:A framework for network games

;[https://github.com/LambdaHack/LambdaHack LambdaHack]
:A game engine library for roguelike games of arbitrary theme, size and complexity, packaged together with a small example dungeon crawler. When completed, it will let you specify content to be procedurally generated, define the AI behaviour on top of the generic content-independent rules and compile a ready-to-play game binary, using either the supplied or a custom-made main loop. Several frontends are available (GTK is the default) and many other generic engine components are easily overridden, but the fundamental source of flexibility lies in the strict and type-safe separation of code and content.

[[Category:Games|*]]
[[Category:Applications]]
[[Category:Libraries]]

Learning Haskell

2016-08-15T02:10:35Z

Chak: /* Online tutorials */ Added Learning Haskell tutorial

[[Category:Tutorials]]

This portal points to places where you can go if you want to learn Haskell.

The [[Introduction|Introduction to Haskell]] on the Haskell website tells you what Haskell gives you: substantially increased programmer productivity, shorter, clearer, and more maintainable code, fewer errors, higher reliability, a smaller semantic gap between the programmer and the language, shorter lead times. There is an old but still relevant paper about [http://www.cse.chalmers.se/~rjmh/Papers/whyfp.html Why Functional Programming Matters] (PDF) by John Hughes. More recently, Sebastian Sylvan wrote an article about [[Why Haskell Matters]].

Join the [http://www.reddit.com/r/haskell Haskell subreddit], where we do regular Q&A threads called [[Hask Anything]] (that's the archive).

There is also a [http://www.haskell.org/haskellwiki/Comparison table comparing Haskell to other functional languages]. Many questions about functional programming are answered by the [http://www.cs.nott.ac.uk/~gmh//faq.html comp.lang.functional FAQ].

You can ask questions to members of the Haskell community on mailing lists, IRC, or StackOverflow. We recommend installing the [http://www.haskell.org/platform/ Haskell Platform].

== Training courses ==

Short training courses aimed at existing programmers

* [http://www.well-typed.com/services_training On-site and public training courses] by Well-Typed (2-day intro, 2-day advanced, custom on-site courses)
* [http://www.nobleprog.co.uk/haskell/training Public training courses] by NobleProg and Nilcons
* [http://www.cs.ox.ac.uk/softeng/subjects/FPR.html Software Engineering course on Functional Programming] at the University of Oxford (1-week course)
* [http://www.cs.uu.nl/wiki/USCS Summerschool on Applied Functional Programming] at Utrecht University (2-week course)

== Material for self-study ==

Below there are links to certain introductory material. If you want to dig deeper, see [[Books and tutorials]].

=== Textbooks ===

* [http://www.haskellbook.com/ Haskell Programming from first principles]
* [http://www.cs.yale.edu/homes/hudak/SOE/ The Haskell School of Expression]
* [http://www.haskellcraft.com/ Haskell: the Craft of Functional Programming]
* [http://www.prenhall.com/allbooks/ptr_0134843460.html Introduction to Functional Programming using Haskell]
* [http://www.cambridge.org/us/knowledge/isbn/item1129654/Introduction%20to%20Functional%20Programming%20Systems%20Using%20Haskell/?site_locale=en_US An Introduction to Functional Programming Systems Using Haskell]
* [http://www.iro.umontreal.ca/~lapalme/Algorithms-functional.html Algorithms: A functional programming approach]
* [http://homepages.cwi.nl/~jve/HR/ The Haskell Road to Logic, Maths, and Programming] (also freely [http://fldit-www.cs.uni-dortmund.de/~peter/PS07/HR.pdf available online]).
* [http://www.cs.nott.ac.uk/~gmh/book.html Programming in Haskell]
* [http://book.realworldhaskell.org/ Real World Haskell]
* [http://nostarch.com/lyah.htm Learn You a Haskell for Great Good!]
* [http://happylearnhaskelltutorial.com Happy Learn Haskell Tutorial]

=== Online tutorials ===

* [[Meta-tutorial]]
* [http://pluralsight.com/training/Courses/Find?highlight=true&searchTerm=haskell Haskell Fundamentals - get started and learn key concepts] at Pluralsight (2-part, 5 hour online course)
* [http://en.wikibooks.org/wiki/Haskell Haskell Wikibook] A thorough textbook with a step-by-step beginners track assuming no programming background. Also includes many advanced concepts, and adaptations of "Yet Another Haskell Tutorial", "Write Yourself a Scheme in 48 Hours", and "All about monads".
* [http://pub.hal3.name/daume02yaht.pdf YAHT - Yet Another Haskell Tutorial] (good tutorial available online)
* [http://www.cs.ou.edu/~rlpage/fpclassCurrent/textbook/haskell.shtml Two dozen short lessons]
* [http://www.haskell.org/tutorial/ A Gentle Introduction to Haskell] - classic text, but not so gentle really :D
* [ftp://ftp.geoinfo.tuwien.ac.at/navratil/HaskellTutorial.pdf Haskell-Tutorial]
* [http://lasche.codingcrew.de/kurse/haskell/hskurs_index.htm Online Haskell Course] (German)
* [http://collection.openlibra.com.s3.amazonaws.com/pdf/haskell_tutorial_for_c_programmers_en.pdf Haskell tutorial for C Programmers]
* [http://learnyouahaskell.com/ Learn You a Haskell for Great Good!] Beautiful, illustrated Haskell tutorial for programmers with less of a functional programming background.
* [http://happylearnhaskelltutorial.com/ Happy Learn Haskell Tutorial] Up-to-date illustrated tutorial for complete beginners that uses many basic examples and exercises, going very slowly step by step.
* [http://www.youtube.com/playlist?list=PL2672EBC57C1F5F9B Learning Haskell] Ongoing tutorial in the form of YouTube videos; updates slowly.
* [https://stevekrouse.github.io/hs.js/ Pattern matching, first-class functions, and abstracting over recursion in Haskell], a simulation of the evaluation of map, foldr and foldl.
* [https://www.schoolofhaskell.com/ School of Haskell]
* [http://learn.hfm.io/ Learning Haskell] — a tutorial combining clear explanations, graphics programming, and hands-on screencasts to teach you the essential concepts of functional programming in Haskell.

=== Advanced tutorials ===

* [[Hitchhikers guide to Haskell]]
* [http://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours Write Yourself a Scheme in 48 Hours]
* [http://research.microsoft.com/en-us/um/people/simonpj/papers/marktoberdorf/ Tackling the Awkward Squad] (on I/O, interfacing to C, concurrency and exceptions)

=== Debugging/profiling/optimization ===

=== Monads ===

* [http://blog.sigfpe.com/2006/08/you-could-have-invented-monads-and.html You Could Have Invented Monads! (And Maybe You Already Have.)]
* [http://homepages.inf.ed.ac.uk/wadler/papers/marktoberdorf/baastad.pdf Monads for Functional Programming]
* [http://www.haskell.org/haskellwiki/All_About_Monads All about monads]
* [[IO inside|IO inside: down the Rabbit Hole]]

=== Type classes ===

* [http://homepages.inf.ed.ac.uk/wadler/papers/class/class.ps.gz The paper that for the first time introduced type classes and their implementation using dictionaries]
* [[Research papers/Type systems#Type classes|More papers on the type classes]]

=== Generic programming ===

* [[Scrap your boilerplate]]

=== Popular libraries ===

* ByteStrings?
* [http://legacy.cs.uu.nl/daan/download/parsec/parsec.html Parsec, a fast combinator parser]
* [[Modern array libraries]]
* [http://www.haskell.org/haskellwiki/Gtk2Hs/Tutorials Gtk2Hs, the GUI library]
* [https://ocharles.org.uk/blog/ 24 Days of Hackage] (blog posts about many popular libraries)

=== Reference ===

* The official language definition: [[Language and library specification]]
* [http://www.letu.edu/people/jaytevis/Programming-Languages/Haskell/tourofprelude.html Tour of the Haskell Prelude]
* [http://zvon.org/other/haskell/Outputglobal/index.html Haskell Reference]
* Haskell [[Reference card]]
* [http://members.chello.nl/hjgtuyl/tourdemonad.html A tour of the Haskell Monad functions]
* [http://www.cs.uu.nl/wiki/bin/view/Helium/ATourOfTheHeliumPrelude Tour of the Helium Prelude]
* [http://www.cs.kent.ac.uk/people/staff/sjt/craft2e/errors/allErrors.html Some common Hugs error messages]
* [http://cheatsheet.codeslower.com/ The Haskell Cheatsheet] - A reference card and mini-tutorial in one.
* A [http://www.haskell.org/haskellwiki/Category:Glossary Glossary] of common terminology.

=== Course material ===
* [http://www.cse.chalmers.se/edu/course/TDA555/ Introduction to Functional Programming, Chalmers] (for beginners at programming)
* [http://www.cse.chalmers.se/edu/course/TDA452/ Functional Programming, Chalmers]
* [http://www.cse.chalmers.se/edu/course/afp/ Advanced Functional Programming, Chalmers]
* [http://www.cse.chalmers.se/edu/course/pfp/ Parallel Functional Programming, Chalmers]
* [http://www.shuklan.com/haskell Introduction to Haskell], University of Virginia CS 1501
* [http://www.cs.caltech.edu/courses/cs11/material/haskell/index.html CS 11 Caltech]
* [http://www.cs.uu.nl/docs/vakken/lfp/ Functional programming]: course notes ([http://www.staff.science.uu.nl/~fokke101/courses/fp-eng.pdf English], [http://www.staff.science.uu.nl/~fokke101/courses/fp-nl.pdf Dutch], [http://www.staff.science.uu.nl/~fokke101/courses/fp-sp.pdf Spanish]), slides in Dutch
* [http://www.cse.unsw.edu.au/~cs1011/05s2/ CS1011]: Tutorials, lab exercises and solutions
* Stanford - [http://www.scs.stanford.edu/11au-cs240h/ Functional Systems in Haskell]
* [http://www.seas.upenn.edu/~cis194/spring13/lectures.html CIS 194 Introduction to Haskell], University of Pennsylvania

== Trying Haskell online ==

There are several websites where you can enter a Haskell program and run it. They are (in no particular order):
* [https://cloud.sagemath.com/ SageMathCloud]
* [https://www.fpcomplete.com/school/using-fphc FP Haskell Center]
* [http://tryhaskell.org/ Try Haskell]
* [http://www.codeworld.info/ Codeworld]
* [http://chrisuehlinger.com/LambdaBubblePop/ Bubble Pop!], the satisfaction of popping bubble wrap, combined with the satisfaction of really elegant functional programming!
* [http://tryplayg.herokuapp.com/ Try Haste & HPlayground client-side framework]; the source code is on [https://github.com/agocorona/tryhplay GitHub]
* [https://koding.com/ Koding] is a cloud based IDE which supports Haskell and several other languages. Free accounts allow one virtual machine.

To create a browser based environment yourself:
* [http://gibiansky.github.io/IHaskell/ IHaskell]

Distributions

2016-03-27T07:02:59Z

Chak: /* Haskell for Mac IDE */

The standard ways to install the Glasgow Haskell Compiler and related tools are given at the main [http://www.haskell.org/downloads Haskell webpage]. However, there are many other ways to install GHC, suited for different purposes.

== [http://haskellformac.com Haskell for Mac IDE] ==

Haskell for Mac is an easy-to-use integrated programming environment for Haskell on OS X. It is a one-click install of a complete Haskell system, including Haskell compiler, editor, many libraries, and a novel form of interactive Haskell playgrounds. Haskell playgrounds support exploration and experimentation with code. They are convenient for learning functional programming, prototyping Haskell code, interactively visualizing data, and creating interactive animations.

Features include the following:
* Avoid dealing with complicated installation instructions.
* Built-in Haskell editor with customisable themes and context-sensitive identifier completion.
* Interactive Haskell playgrounds provide immediate and continuous feedback.
* You can immediately see what your program is doing while you develop it.
* Playground results can be text or images produced by the Rasterific, Diagrams, and Chart packages.
* Add code and multimedia files to a Haskell project with drag'n'drop.
* Haskell binding to Apple's 2D animation and games framework SpriteKit.

Haskell for Mac requires OS X Yosemite or above.

== [http://www.kronosnotebook.com/haskell Kronos Haskell Notebook for Mac] ==

Based on IPython Notebook and IHaskell, Kronos provides
* immediate installation of Haskell and related tools
* a beautiful notebook environment for editing and documenting code
* an easy interface for external package installation
* easy file management and exporting to multiple formats

== [http://nixos.org/nix Nix Package Manager] ==

The Nix package manager (part of NixOS but usable independently) can install GHC, related tools, and Haskell packages across Linux and other Unix systems (including OS X).

The key advantages of adopting Nix as a Haskell distribution are isolation and reproducibility, with environments fully specified. This can simplify dependency management by reducing hidden state.

The users' guide to Haskell infrastructure is the most important reference:
http://nixos.org/nixpkgs/manual/#users-guide-to-the-haskell-infrastructure

Also helpful are blog posts and articles by various users describing their environments:

* http://www.cse.chalmers.se/~bernardy/nix.html
* https://ocharles.org.uk/blog/posts/2014-02-04-how-i-develop-with-nixos.html
* http://www.pavelkogan.com/2014/07/09/haskell-development-with-nix/

Nix support for Haskell is very much under active development, and many users have begun to adopt the new haskell-ng workflow: http://stackoverflow.com/questions/29033580/how-do-i-use-the-new-haskell-ng-infrastructure-on-nixos

== [https://halcyon.sh/ Halcyon] ==

Halcyon is a system for installing Haskell apps and development tools, including GHC and Cabal. It is a simple system which also archives and caches all build products, and can automatically restore archived build products, saving time during development, continuous integration, and deployment. It allows sandbox sources, build tools, and native OS packages to be declared as dependencies and installed together with the app. It can be used to construct deployment systems, such as [https://haskellonheroku.com/ Haskell on Heroku].

== [https://cloud.sagemath.com/ SageMathCloud] ==

SageMathCloud is a platform for collaborative computational mathematics. It provides both free and paid accounts. Among the many tools it provides (including SageMath, R, IPython, Numpy/Scipy/Matplotlib, Octave, Cython, GAP, Pari, Macaulay2, and Singular) is GHC. It allows the editing of Haskell files and the creation of Haskell projects, and interaction with GHC through an embedded terminal in the command line.

With SageMathCloud, developers have access to a shared cloud environment for Haskell, usable from any computer with an internet connection, and requiring no installation.

== Other Online Evaluators ==

There are other tools available that allow the compilation and execution of small amounts of Haskell code for testing, illustration and education purposes. These include:

* http://codepad.org/
* https://www.codechef.com/ide

Additionally, FP Complete's [http://schoolofhaskell.com School of Haskell] allows the embedding of "active" example source code into their blog posts and tutorials.

Distributions

2016-03-27T07:00:34Z

Chak: /* Haskell for Mac */

The standard ways to install the Glasgow Haskell Compiler and related tools are given at the main [http://www.haskell.org/downloads Haskell webpage]. However, there are many other ways to install GHC, suited for different purposes.

== [http://haskellformac.com Haskell for Mac IDE] ==

Haskell for Mac is an easy-to-use integrated programming environment for Haskell on OS X. It is a one-click install of a complete Haskell system, including Haskell compiler, editor, many libraries, and a novel form of interactive Haskell playgrounds. Haskell playgrounds support exploration and experimentation with code. They are convenient for learning functional programming, prototyping Haskell code, interactively visualizing data, and creating interactive animations.

Features include the following:
* Built-in Haskell editor with customisable themes and context-sensitive identifier completion.
* Interactive Haskell playgrounds provide immediate and continuous feedback.
* You can immediately see what your program is doing while you develop it.
* Playground results can be text or images produced by the Rasterific, Diagrams, and Chart packages.
* Add code and multimedia files to a Haskell project with drag'n'drop.
* Haskell binding to Apple's 2D animation and games framework SpriteKit.

Haskell for Mac requires OS X Yosemite or above.

== [http://www.kronosnotebook.com/haskell Kronos Haskell Notebook for Mac] ==

Based on IPython Notebook and IHaskell, Kronos provides
* immediate installation of Haskell and related tools
* a beautiful notebook environment for editing and documenting code
* an easy interface for external package installation
* easy file management and exporting to multiple formats

== [http://nixos.org/nix Nix Package Manager] ==

The Nix package manager (part of NixOS but usable independently) can install GHC, related tools, and Haskell packages across Linux and other Unix systems (including OS X).

The key advantages of adopting Nix as a Haskell distribution are isolation and reproducibility, with environments fully specified. This can simplify dependency management by reducing hidden state.

The users' guide to Haskell infrastructure is the most important reference:
http://nixos.org/nixpkgs/manual/#users-guide-to-the-haskell-infrastructure

Also helpful are blog posts and articles by various users describing their environments:

* http://www.cse.chalmers.se/~bernardy/nix.html
* https://ocharles.org.uk/blog/posts/2014-02-04-how-i-develop-with-nixos.html
* http://www.pavelkogan.com/2014/07/09/haskell-development-with-nix/

Nix support for Haskell is very much under active development, and many users have begun to adopt the new haskell-ng workflow: http://stackoverflow.com/questions/29033580/how-do-i-use-the-new-haskell-ng-infrastructure-on-nixos

== [https://halcyon.sh/ Halcyon] ==

Halcyon is a system for installing Haskell apps and development tools, including GHC and Cabal. It is a simple system which also archives and caches all build products, and can automatically restore archived build products, saving time during development, continuous integration, and deployment. It allows sandbox sources, build tools, and native OS packages to be declared as dependencies and installed together with the app. It can be used to construct deployment systems, such as [https://haskellonheroku.com/ Haskell on Heroku].

== [https://cloud.sagemath.com/ SageMathCloud] ==

SageMathCloud is a platform for collaborative computational mathematics. It provides both free and paid accounts. Among the many tools it provides (including SageMath, R, IPython, Numpy/Scipy/Matplotlib, Octave, Cython, GAP, Pari, Macaulay2, and Singular) is GHC. It allows the editing of Haskell files and the creation of Haskell projects, and interaction with GHC through an embedded terminal in the command line.

With SageMathCloud, developers have access to a shared cloud environment for Haskell, usable from any computer with an internet connection, and requiring no installation.

== Other Online Evaluators ==

There are other tools available that allow the compilation and execution of small amounts of Haskell code for testing, illustration and education purposes. These include:

* http://codepad.org/
* https://www.codechef.com/ide

Additionally, FP Complete's [http://schoolofhaskell.com School of Haskell] allows the embedding of "active" example source code into their blog posts and tutorials.

Mac OS X

2016-02-02T00:35:06Z

Chak: /* Haskell for Mac */

There is also now the [[Mac OS X Strike Force]] that aims to improve using Haskell on OS X.

== The Haskell Platform ==

There are Mac OS X installers of the full Haskell Platform development environment. We recommend it:

[http://haskell.org/platform/ http://haskell.org/platform/icons/button-100.png]

== [http://haskellformac.com Haskell for Mac (IDE)] ==

[http://haskellformac.com Haskell for Mac] is an easy-to-use integrated programming environment for Haskell on OS X. It is a one-click install of a complete Haskell system, including Haskell compiler, editor, many libraries, and a novel form of interactive Haskell playgrounds. Haskell playgrounds support exploration and experimentation with code. They are convenient for learning functional programming, prototyping Haskell code, interactively visualizing data, and creating interactive animations.

Features include the following:
* Built-in Haskell editor with customisable themes, or you can use a separate text editor.
* Interactive Haskell playgrounds evaluate your code as you type.
* Easy to explore type information and to observe the behaviour of your program as you change it.
* Playground results can be text or images produced by the Rasterific, Diagrams, and Chart packages.
* Add code and multimedia files to a Haskell project with drag'n'drop.
* Haskell binding to Apple's 2D animation and games framework SpriteKit.
* Autosaving and automatic project versioning.

Haskell for Mac supports OS X Yosemite or above.

== GHC ==

==== Important notes ====

To get the most out of your GHC environment, you should add <code>~/Library/Haskell/bin</code> to your PATH environment variable, before the path where you have GHC installed. This will allow you to pick up and use updated versions of cabal, as well as of other programs shipped with GHC, such as hsc2hs.

In your ~/.profile, add the line:

<code>export PATH=$HOME/Library/Haskell/bin:$PATH</code>
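
Since the line above prepends the directory, it should end up as the first entry of <code>PATH</code>. As a rough check, a sketch like the following reports whether that is the case; the function name <code>haskell_bin_first</code> is made up for this illustration.

```shell
# Hypothetical helper: succeeds if $HOME/Library/Haskell/bin is the first PATH entry.
haskell_bin_first() {
    # Everything before the first ':' in PATH is the first entry.
    first_entry="${PATH%%:*}"
    [ "$first_entry" = "$HOME/Library/Haskell/bin" ]
}

if haskell_bin_first; then
    echo "Haskell bin directory is first in PATH"
else
    echo "Haskell bin directory is NOT first in PATH"
fi
```

Run it in a new terminal session so that the updated <code>~/.profile</code> has been read.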

=== Mac OS X 10.9 (Mavericks), Mac OS X 10.8 (Mountain Lion) and Xcode 5 ===

Both Mountain Lion and Mavericks support and now use Xcode 5, which no longer provides GCC, only Clang.

This should not be a problem for GHC 7.8 and newer, but if you are using GHC 7.6.* or older, one of several workarounds is needed.

The workaround that the Haskell Platform maintainers are supporting can be found [http://www.haskell.org/pipermail/haskell-cafe/2013-October/111174.html here]. Both that workaround and [http://justtesting.org/post/64947952690/the-glasgow-haskell-compiler-ghc-on-os-x-10-9 this one] work with only the system-provided compilers.

However, if you are still encountering unusual bugs, the GCC-based directions [https://gist.github.com/cartazio/7131371 here] may work out better.

=== Mac OS X 10.5 (Leopard) ===

To install GHC on Mac OS X 10.5 (Leopard), there are the following options:
* install the [http://hackage.haskell.org/platform/ Haskell Platform]
* install the [http://trac.macports.org/browser/trunk/dports/lang/ghc/Portfile ghc] package from [http://www.macports.org MacPorts]

=== Mac OS X 10.6 (Snow Leopard) and 10.7 (Lion) ===

* Install the [http://hackage.haskell.org/platform/ Haskell Platform]

To uninstall GHC, run:
<code>
sudo uninstall-hs
</code>

=== Xcode 4.1 ===

GHC needs Xcode to be installed so that it has access to the bintools, headers, and link libraries of the platform. The latter two are provided by the SDK that comes as part of Xcode. GHC 7.0.2 is compiled against the 10.5 SDK, which Xcode 4.1 no longer ships. <tt>ghci</tt> will work, but linking and some compiles with <tt>ghc</tt> will not. To make those work, you need a copy of the 10.5 SDK. You can get one in several ways:

* Before you install Xcode 4.1, if you have Xcode 3.2 installed, do one of the following:
** Move it aside (renaming <tt>/Developer</tt> to <tt>/Xcode3.2</tt>)
** Move just the sdk aside (moving <tt>/Developer/SDKs/MacOSX10.5.sdk</tt> to, say, <tt>/ExtraSDKs/MacOSX10.5.sdk</tt>)
** Move just the sdk aside, install Xcode 4.1, then move it back into the <tt>/Developer/SDKs</tt> directory.
* If you don't have Xcode 3.2, you can download it from the Apple Developer site and install it in a location other than <tt>/Developer</tt>. If you have already installed Xcode 4.1, ''be sure'' to customize the install so that the "System Tools" and "UNIX Development" packages are not installed.

Building via GHC:

<code>ghc --make -I{loc}/MacOSX10.5.sdk/usr/include/ -L{loc}/MacOSX10.5.sdk/usr/lib</code>

Building via cabal:

<code>cabal --extra-include-dirs={loc}/MacOSX10.5.sdk/usr/include/ --extra-lib-dirs={loc}/MacOSX10.5.sdk/usr/lib</code>

Replace <tt>{loc}</tt> with wherever you put the SDK.
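
To avoid retyping the SDK location, the flags can be kept in shell variables. The following is only a sketch: it assumes the SDK was moved to <tt>/ExtraSDKs</tt> as in the example above, and the source file name <tt>Main.hs</tt> and the <tt>install</tt> subcommand for cabal are assumptions made for illustration. It prints the commands rather than running them, so they can be inspected first.

```shell
# Assumed location; set SDK_LOC to wherever the 10.5 SDK was actually put.
SDK_LOC="${SDK_LOC:-/ExtraSDKs}"
SDK="$SDK_LOC/MacOSX10.5.sdk"

# Include and library flags for a direct GHC build.
GHC_SDK_FLAGS="-I$SDK/usr/include/ -L$SDK/usr/lib"
# The corresponding flags for cabal.
CABAL_SDK_FLAGS="--extra-include-dirs=$SDK/usr/include/ --extra-lib-dirs=$SDK/usr/lib"

# Print the resulting commands (Main.hs and "install" are illustrative assumptions).
echo "ghc --make $GHC_SDK_FLAGS Main.hs"
echo "cabal install $CABAL_SDK_FLAGS"
```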

== HUGS ==

* install the [http://trac.macports.org/browser/trunk/dports/lang/hugs98/Portfile hugs98] package from [http://www.macports.org MacPorts].

== Installing libraries with external C bindings ==

Haskell libraries are installed with the <code>cabal</code> command line tool.

Some libraries depend on external C libraries, which are best installed with [http://macports.org MacPorts]. However, you have to tell cabal to include the <code>/opt/local/</code> directories when searching for external libraries. The following shell script does that by wrapping the <code>cabal</code> utility:

> cat cabal-macports
#!/bin/bash
export CPPFLAGS=-I/opt/local/include
export LDFLAGS=-L/opt/local/lib
cabal "$@" --extra-include-dirs=/opt/local/include \
--extra-lib-dirs=/opt/local/lib

> cabal-macports install foobar

== Editors with Haskell support ==

=== Open Source ===

* [http://aquamacs.org/ AquaMacs] or [http://emacsforosx.com EmacsForOSX], a graphical Emacs version
* [http://eclipsefp.sourceforge.net/ Eclipse] with the [[EclipseFP]] plugin. See [[EclipseOn_Mac_OS_X]]
* [http://www.gnu.org/software/emacs/ Emacs], which is installed on every Mac
* [http://leksah.org/ Leksah]
* [http://code.google.com/p/macvim/ MacVim], a graphical Vim version
* [https://github.com/textmate/textmate TextMate 2], the open source incarnation of TextMate
* [http://www.vim.org/ Vim], which is installed on every Mac
* [http://haskell.org/haskellwiki/Yi Yi] (written in Haskell itself!), which is available through cabal-install

=== Commercial ===

[http://www.codingmonkeys.de/subethaedit/ SubEthaEdit]:

[[Image:SubEthaEdit.png]]

[http://macromates.com/ TextMate]:

[[Image:TextMate.png]]

[http://tuppis.com/smultron/ Smultron]:

[[Image:Smultron.png]]

and [http://www.sublimetext.com/ Sublime Text 2]:
[[Image:SubilmeText2.png]]

TextEdit is the Mac's default text editor. It is very basic but works fine for most uses; you must, however, be careful to put it into plain text mode using the Format menu.

== Shipping Installable Haskell Applications ==

* [http://hackage.haskell.org/cgi-bin/hackage-scripts/package/mkbndl mkbndl] builds installable Mac OS X applications from your Haskell project.

== Links ==
* [[Using Haskell in an Xcode Cocoa project]]; a description of how to add a Haskell module (callable from C) to an Xcode/Cocoa/Interface builder project on your Mac.
* [[Mac OS X Common Installation Paths]]: an effort to standardize where things go on a Mac OS X installation
[[Category:OS]]

IDEs

2016-01-19T22:26:05Z

Chak: Fixed mistake in previous edit, removing commercial section

The IDE world in Haskell is incomplete, but is in motion. There are many choices. When choosing your IDE, there are the following things to consider.

== Notable features of interest to consider ==

This is a list of features that any Haskell IDE could or should have. The IDEs listed below generally support some subset of these features. Please add more to this list if you think of anything. In future this should be expanded into separate headings with more description of how each feature should ideally work. For discussion of IDEs, there is the [https://groups.google.com/forum/#!forum/haskell-ide haskell-ide mailing list] and the [https://github.com/haskell/haskell-ide haskell-ide repository].

* Syntax highlighting (e.g. for Haskell, Cabal, Literate Haskell, Core, etc.)
* Macros (e.g. inserting imports/aligning/sorting imports, aligning up text, transposing/switching/moving things around)
* Type information (e.g. type at point, info at point, type of expression)
* IntelliSense/completion (e.g. jump-to-definition, who-calls, calls-who, search by type, completion, etc.)
* Project management (e.g. understanding of Cabal, configuration/building/installing, package sandboxing)
* Interactive REPL (e.g. GHCi/Hugs interaction, expression evaluation and such)
* Knowledge of Haskell in the GHCi/GHC side (e.g. understanding error types, the REPL, REPL objects, object inspection)
* Indentation support (e.g. tab cycle, simple back-forward indentation, whole area indentation, structured editing, etc.)
* Proper syntactic awareness of Haskell (e.g. with a proper parser and proper editor transpositions a la the structured editors of the 80s and Isabel et al)
* Documentation support (e.g. ability to call up documentation of symbol or module, either in the editor, or in the browser)
* Debugger support (e.g. stepping, breakpoints, etc.)
* Refactoring support (e.g. symbol renaming, hlint, etc.)
* Templates (e.g. snippets, Zen Coding type stuff, filling in all the cases of a case, etc.)

== Open Source ==

=== [https://github.com/rikvdkleij/intellij-haskell IntelliJ plugin for Haskell] ===
:See [http://www.haskell.org/pipermail/haskell-cafe/2014-October/116567.html the announcement of the plugin] and the [http://en.wikipedia.org/wiki/IntelliJ_IDEA Wikipedia article about IntelliJ].

=== [http://eclipsefp.github.com/ EclipseFP plugin for Eclipse IDE] ===
:Eclipse is an open, extensible IDE platform for "everything and nothing in particular". It is implemented in Java and runs on several platforms. The Java IDE built on top of it has already become very popular among Java developers. The Haskell tools extend it to support editing (syntax coloring, code assist), compiling, and running Haskell programs from within the IDE. In more detail, it features:
:* Syntax highlighting and errors/warning highlighting
:* A module browser showing all installed packages, their modules and the contents of the modules (functions, types, etc.)
:* Integration with [http://www.haskell.org/hoogle/ Hoogle]: select an identifier in your code, press F4 and see the results in hoogle
:* Code navigation: from within a Haskell source file, jump to the file where a symbol is declared, or everywhere a symbol is used (type sensitive search, not just a text search)
:* Outline view: quickly jump to definitions in your file
:* Quick fixes on common errors and import management
:* A cabal file editor and integration with Cabal (uses cabal configure, cabal build under the covers), and a graphical view of installed packages
:* Integration with GHCi: launch GHCi inside Eclipse on any module
:* Integration with the GHCi debugger: performs the GHCi debugging commands for you from the standard Eclipse debugging interface
:* Integration with [http://community.haskell.org/~ndm/hlint/ HLint]: gives you HLint warnings on building and allows you to quick-fix them
:* Integration with [https://github.com/jaspervdj/stylish-haskell Stylish-Haskell]: format your code with stylish-haskell
:* Test support: shows results of test-framework based test suite in a graphical format. HTF support to come soon.

=== [http://colorer.sourceforge.net/eclipsecolorer/index.html Colorer plugin for Eclipse IDE] ===
:Syntax highlighting in Eclipse can be achieved using the Colorer plugin. This is more lightweight than using the EclipseFP plugin, which has much more functionality but can be messy to install and has sometimes been a bit shaky.

:Eclipse Colorer is a plugin that enables syntax highlighting for a wide range of languages. It uses its own XML-based language for describing syntactic regions of languages. It does not include support for Haskell by default, but this can be added using the syntax description files attached below.

:Installation instructions
:# Install the Colorer from the update site <code>http://colorer.sf.net/eclipsecolorer/</code> (for more detailed instructions see the project page).
:# Download the Haskell syntax description files in [http://www.haskell.org/wikiupload/1/16/Haskell_Eclipse_Colorer.tar.gz Haskell_Eclipse_Colorer.tar.gz].
:# Extract its contents (haskell.hrc and proto.hrc) into the following directory (overwriting proto.hrc): <code>eclipse_installation_dir/plugins/net.sf.colorer_0.9.9/colorer/hrc</code> (sometimes the wiki seems to create a nested tar file, so you might have to unpack twice).
:# Finished. A restart of Eclipse might be required. .hs files should open with syntax highlighting.
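
:The extraction step above could be scripted roughly as follows. This is a sketch, not part of the Colorer project: the function name <code>install_haskell_hrc</code> is hypothetical, and it assumes the archive unpacks haskell.hrc and proto.hrc at its top level (if the download turns out to be a nested tar file, unpack the outer layer first).

```shell
# Hypothetical helper: unpack the Haskell syntax files into Colorer's hrc directory.
# Usage: install_haskell_hrc ARCHIVE HRC_DIR
install_haskell_hrc() {
    archive="$1"
    hrc_dir="$2"
    # Refuse to continue if either path is missing, rather than guessing.
    [ -f "$archive" ] || { echo "Archive not found: $archive" >&2; return 1; }
    [ -d "$hrc_dir" ] || { echo "hrc directory not found: $hrc_dir" >&2; return 1; }
    # Extract into the hrc directory; an existing proto.hrc there is overwritten.
    tar -xzf "$archive" -C "$hrc_dir"
}
```

:For example: <code>install_haskell_hrc Haskell_Eclipse_Colorer.tar.gz eclipse_installation_dir/plugins/net.sf.colorer_0.9.9/colorer/hrc</code>, with <code>eclipse_installation_dir</code> replaced by the actual Eclipse location.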

:Tips
:* If .hs files open with another kind of syntax highlighting check that they are associated with the Colorer Editor (Preferences -> General -> Editors -> File Associations). Or right click on them and choose Open With -> Other -> Colorer Editor.
:* Sometimes the highlighting gets confused. Then it might help to press Ctrl+R and re-colour the editor.
:* Use the Word Completion feature (Shift+Alt+7) as a poor man's content assist.
:* Use the standard External Tools feature in Eclipse to invoke the compiler from inside the IDE.
:* Use the Block selection feature (Shift+Alt+A) to insert/remove line comments on multiple lines at the same time.
:* Some other useful standard Eclipse features include the Open resource command (Ctrl+Shift+R), the File search command (Ctrl+H) and the bookmarks feature (Edit -> Add bookmark). Make sure to check Include in next/prev navigation box (Windows -> Preferences -> General -> Editors -> Text Editors -> Annotations -> Bookmarks).

=== [[Leksah]] ===
:Leksah is an IDE for Haskell written in Haskell. Leksah is intended as a practical tool to support the Haskell development process. Leksah uses GTK+ as GUI Toolkit with the gtk2hs binding. It is platform independent and should run on any platform where GTK+, gtk2hs and GHC can be installed.

* http://www.leksah.org/
* https://hackage.haskell.org/package/leksah
* https://github.com/leksah/leksah

=== [http://kdevelop.org/ KDevelop] ===
:This IDE supports many languages. For Haskell it currently supports project management, syntax highlighting, building (with GHC) & executing within the IDE.

=== [http://www.vim.org Vim] ===

This may or may not be up to date. A Vim user should update it.

:* [https://github.com/begriffs/haskell-vim-now Haskell-vim-now] -- Full-featured Vim config with install script. Supports type inspection, linting, Hoogle, tagging with codex+hasktags, unicode concealing, and refactoring.
:* [http://projects.haskell.org/haskellmode-vim/ Haskell mode for Vim by Claus Reinke] - These plugins provide Vim integration with GHC and Haddock.
:* [https://github.com/scrooloose/syntastic Syntastic] -- An extremely useful Vim plugin which will interact with ghc_mod (when editing a Haskell file) every time the source file is saved to check for syntax and type errors.
:* [http://www.vim.org/scripts/script.php?script_id=2356 SHIM by Lars Kotthoff] -- Superior Haskell Interaction Mode (SHIM) plugin for Vim providing full GHCi integration (requires Vim compiled with Ruby support).
:* [http://www.vim.org/scripts/script.php?script_id=3200 Haskell Conceal] -- shows Unicode symbols for common Haskell operators such as ++ and other lexical notation in Vim window (source file itself remains unchanged).
:* [http://urchin.earth.li/~ian/vim/ by Ian Lynagh]: distinguishes different literate Haskell styles (Vim 7.0 includes a syntax file which supersedes these plugins).
:* There's a [[Literate programming/Vim|copy of lhaskell.vim]] on the Wiki.
:* [https://github.com/MarcWeber/vim-addon-haskell by Marc Weber] -- Vim script-based function/module completion, cabal support, tagging by one command, context completion ( w<tab> -> where ), module outline, etc
:* [http://www.vim.org/scripts/script.php?script_id=1968 Vim indenting mode for Haskell]
:* [https://github.com/ujihisa/neco-ghc neco-ghc] pragma, module, function completion.
:* [https://github.com/eagletmt/ghcmod-vim Ghcmod-vim]
:* [https://github.com/bitc/vim-hdevtools Hdevtools] - gives type information, quicker reloading and more.
:* [http://blog-mno2.csie.org/blog/2011/11/17/vim-plugins-for-haskell-programmers/ Additional list], including some plugins missing here, with screenshots of many of the above.

=== [http://www.gnu.org/s/emacs/ Emacs] ===

See [[Emacs]].

=== [http://atom.io Atom] ===

Atom is very similar to Sublime Text 2 (which is now discontinued). A huge [http://atom.io/packages package database] exists; two packages important to Haskell developers are:
:* [https://atom.io/packages/language-haskell language-haskell] for Haskell syntax highlighting.
:* [https://atom.io/packages/ide-haskell ide-haskell] for Cabal support, linting and ghc-mod utilities like type previewing.

== Commercial ==

=== [http://haskellformac.com Haskell for Mac] ===

Haskell for Mac is an easy-to-use integrated programming environment for Haskell on OS X. It is a one-click install of a complete Haskell system, including Haskell compiler, editor, many libraries, and a novel form of interactive Haskell playgrounds. Haskell playgrounds support exploration and experimentation with code. They are convenient to learn functional programming, prototype Haskell code, interactively visualize data, and to create interactive animations.

Features include the following:
* Built-in Haskell editor with customisable themes, or you can use a separate text editor.
* Autosaving and automatic project versioning.
* Interactive Haskell playgrounds evaluate your code as you type.
* Playground results can be text or images produced by the Rasterific, Diagrams, and Chart packages.
* Add code and multimedia files to a Haskell project with drag'n'drop.
* Haskell binding to Apple's 2D animation and games framework SpriteKit.

Haskell for Mac requires OS X Yosemite or above.

=== [https://github.com/SublimeHaskell/SublimeHaskell/ Sublime-Haskell] ===

Sublime-Haskell is a plugin for the [http://www.sublimetext.com/ Sublime Text Editor]. It is installed through the [https://sublime.wbond.net/ Sublime Package Controller].

It is built as a plugin to the Sublime text editor, so all the standard editing functionality is there. Here are the Haskell specific features:
* Syntax highlighting and error marking for Haskell and Cabal. Errors provided by interaction with the compiler. The errors are listed in an error pane, and the user can navigate through the errors.
* When working on a project that has a Cabal file, the Cabal file is automatically detected, and the project can be configured, built, run, and tested using Cabal. This also enhances error reporting and auto-completion (all exported symbols from the project can then be matched against). Thus, there is good project management support.
* Rescan/build on file change.
* Can use Cabal-dev for sandboxing/pristine builds.
* Prettification/indentation and alignment via Stylish-Haskell.
* Jump to definition, and show information for a definition (using haskell-docs).
* Type display and insertion
* Fast building and type-inference via hdevtools.
* HLint provided by GHC-Mod.

Thus, Sublime-Haskell satisfies all the requirements listed at the top of this page for a baseline Haskell IDE. Sublime Text is closed source, but the Haskell plugin is open source.

== See also ==

* [http://blog.johantibell.com/2011/08/results-from-state-of-haskell-2011.html Results from the State of Haskell, 2011 Survey].
* [http://nickknowlson.com/blog/2011/09/12/haskell-survey-categorized-weaknesses/ Categorized Weaknesses from the State of Haskell 2011 Survey], which barely touched upon IDEs.
* [[Editors]]
* [[Applications and libraries/Program development#Editor support]]
* [http://code.haskell.org/shim/ Shim]; the aim of the shim (Superior Haskell Interaction Mode) project is to provide better support for editing Haskell code in VIM and Emacs

== Other IDEs and Editors ==

The list below is incomplete. Please add to it with whatever you think of. This list should be expanded into sections, as above, with more details, with links to the actual documentation of the described features.

* Vim — '''PROS:''' Free. Works on Windows. Works in terminal. Decent alignment support. Tag-based completion and jumps. Very good syntax highlighting, flymake (via Syntastic), Cabal integration, Hoogle. Documentation for symbol at point. '''CONS:''' Arcane, difficult for new users. Some complain of bad indentation support.
* [http://www.haskell.org/haskellwiki/Haskell_mode_for_Emacs Emacs] — '''PROS:''' Free. Works on Windows. Works in terminal. Decent alignment, indentation, syntax highlighting. Limited type information (type and info of name at point). Cabal/GHC/GHCi awareness and Haskell-aware REPL. Completion and jump-to-definition (via ETAGS). Documentation for symbol at point. Hoogle. Flymake (error checking on the fly). '''CONS:''' Arcane, difficult for new users.
* Sublime — '''PROS:''' Works on Windows. '''CONS:''' Poor alignment support (though [http://www.reddit.com/r/haskell/comments/ts8fi/haskell_ides_emacs_vim_and_sublime_oh_my_opinions/c4pair1 there are packages] to do indentation a little better). Proprietary.
* [[Yi]] — '''PROS:''' Written in Haskell. Works in terminal. '''CONS:''' Very immature, lacking features. Problems building generally, especially on Windows.
* [http://www.haskell.org/haskellwiki/Leksah Leksah] — '''PROS:''' Syntax highlighting. Understands Cabal, Module browser, dependency knowledge, documentation display inside the IDE, jump-to-definition, flymake (error checking on the fly), limited evaluation of snippets, scratch buffer. Autocompletion. Not an arcane interface a la Emacs/Vim. '''CONS:''' Doesn't have a decent REPL. Are there any other cons? — This should be moved to the section above.
* [[Editors | Other Editors]]
* [http://www.cs.kent.ac.uk/projects/heat/ HEAT:] An Interactive Development Environment for Learning & Teaching Haskell
* [http://www.geany.org/ Geany] '''PROS:''' Free. Works on Windows. Syntax highlighting, REPL. '''CONS:''' After using it for a while, Geany freezes quite often.

== Outdated ==

* [http://web.archive.org/web/20110726153330/http://hoovy.org/HaskellXcodePlugin/ plugin for Xcode] (links to the web archive)

=== [http://www.haskell.org/haskellwiki/HIDE hIDE] ===
:hIDE is a GUI-based Haskell IDE written using gtk+hs. It does not include an editor but instead interfaces with NEdit, vim or GNU emacs.

=== [http://www.haskell.org/haskellwiki/HIDE hIDE-2] ===
:Through the dark ages many a programmer has longed for the ultimate tool. In response to this most unnerving craving, of which we ourselves have had maybe more than our fair share, the dynamic trio of #Haskellaniacs (dons, dcoutts and Lemmih) hereby announce, to the relief of the community, that a fetus has been conceived: ''hIDE - the Haskell Integrated Development Environment''. So far the unborn integrates source code recognition and a chameleon editor, presenting these in a snappy gtk2 environment. Although no seer has yet predicted the date of birth of our hIDEous creature, we hope that the mere knowledge of its existence will spread peace of mind throughout the community as oil on troubled waters. See also: [[HIDE/Screenshots of HIDE]] and [[HIDE]]

=== [http://web.archive.org/web/20060213161530/http://www.students.cs.uu.nl/people/rjchaaft/JCreator/ JCreator with Haskell support] ===
: N.B. The link above is to the Wayback Machine (Web Archive); it seems that JCreator is no longer supported.
:JCreator is a highly customizable Java IDE for Windows. Features include extensive project support, fully customizable toolbars (including the images of user tools) and menus, increase/decrease indent for a selected block of text (tab/shift+tab respectively). The Haskell support module adds syntax highlighting for Haskell files and WinHugs, hugs, a static checker (if you double click on the error message, JCreator will jump to the right file and line and highlight it yellow) and the Haskell 98 Report as tools. Platforms: Win95, Win98, WinNT and Win2000 (only Win95 not tested yet). Size: 6MB. JCreator is a trademark of Xinox Software; Copyright © 2000 Xinox Software. The Haskell support module is made by Rijk-Jan van Haaften.

=== [[haste]] - Haskell TurboEdit ===
:haste - Haskell TurboEdit - was an IDE for the functional programming language Haskell, written in Haskell.

=== [http://www.haskell.org/visualhaskell Visual Haskell] ===
:Visual Haskell is a complete development environment for Haskell software, based on Microsoft's [http://www.microsoft.com/visualstudio/en-us Visual Studio] platform. Visual Haskell integrates with the Visual Studio editor to provide interactive features to aid Haskell development, and it enables the construction of projects consisting of multiple Haskell modules, using the Cabal building/packaging infrastructure.

=== [http://www.cs.kent.ac.uk/projects/vital/ Vital] ===
:Vital is a visual programming environment. It is particularly intended for supporting the open-ended, incremental style of development often preferred by end users (engineers, scientists, analysts, etc.).

=== [http://www.cs.kent.ac.uk/projects/pivotal/ Pivotal] ===
:Pivotal 0.025 is an early prototype of a Vital-like environment for Haskell. Unlike Vital, however, Pivotal is implemented entirely in Haskell. The implementation is based on the use of the hs-plugins library to allow dynamic compilation and evaluation of Haskell expressions together with the gtk2hs library for implementing the GUI.

=== [https://www.fpcomplete.com/business/haskell-center/overview/ FP Haskell Center] ===

:FP Complete has developed a commercial Haskell IDE. (Now [https://www.fpcomplete.com/blog/2015/10/retiring-fphc retired]).

: It's "in the cloud", which has its pros and cons, and comes with all of the libraries on Stackage ready to go. (Basically, the Haskell Platform on steroids.)

: The standard IDE is in your browser, and has integration with Git and Github. Emacs, Sublime and Vim support will be released soon. One particularly cool feature is that you can spin up temporary web servers to test out the Haskell-powered website you might be coding up. It's really easy, and you can pay for FP Complete to host your permanent application, too.

: There's a free trial, with free academic licenses and paid commercial licenses. There will be "personal" licenses in a few weeks (from early Sept 2013) as well, since the commercial pricing is a bit steep for hobbyists.

: Some of the features:

* Auto-completion.
* Hoogle searching of all of Stackage.
* Hoogling in the context of a module and its imports.
* Live typechecking/recompiling / jump to error.
* Hlint suggestions.
* Jump to definition.
* Auto-removal of unnecessary imports.
* Get type of any identifier (globally or locally defined).
* Show documentation of any symbol (via hoogle), or open haddocks.
* Refactoring.
* Build project, run project.
* Auto-code formatting.
* Run a temporary web service for testing web apps.
* Deploy project to an Amazon instance.

IDEs

2015-10-24T06:07:14Z

Chak: /* Haskell for Mac */

The IDE world in Haskell is incomplete, but is in motion. There are many choices. When choosing your IDE, there are the following things to consider.

== Notable features of interest to consider ==

This is a list of features that any Haskell IDE could or should have. The IDEs listed below generally support some subset of these features. Please add more to this list if you think of anything. In future this should be expanded into separate headings with more description of how each feature should ideally work. For discussion of IDEs, there is the [https://groups.google.com/forum/#!forum/haskell-ide haskell-ide mailing list] and the [https://github.com/haskell/haskell-ide haskell-ide repository].

* Syntax highlighting (e.g. for Haskell, Cabal, Literate Haskell, Core, etc.)
* Macros (e.g. inserting imports/aligning/sorting imports, aligning up text, transposing/switching/moving things around)
* Type information (e.g. type at point, info at point, type of expression)
* IntelliSense/completion (e.g. jump-to-definition, who-calls, calls-who, search by type, completion, etc.)
* Project management (e.g. understanding of Cabal, configuration/building/installing, package sandboxing)
* Interactive REPL (e.g. GHCi/Hugs interaction, expression evaluation and such)
* Knowledge of Haskell in the GHCi/GHC side (e.g. understanding error types, the REPL, REPL objects, object inspection)
* Indentation support (e.g. tab cycle, simple back-forward indentation, whole area indentation, structured editing, etc.)
* Proper syntactic awareness of Haskell (e.g. with a proper parser and proper editor transpositions a la the structured editors of the 80s and Isabel et al)
* Documentation support (e.g. ability to call up documentation of symbol or module, either in the editor, or in the browser)
* Debugger support (e.g. stepping, breakpoints, etc.)
* Refactoring support (e.g. symbol renaming, hlint, etc.)
* Templates (e.g. snippets, Zen Coding type stuff, filling in all the cases of a case, etc.)

== Open Source ==

=== [https://github.com/rikvdkleij/intellij-haskell IntelliJ plugin for Haskell] ===
:See [http://www.haskell.org/pipermail/haskell-cafe/2014-October/116567.html the announcement of the plugin] and the [http://en.wikipedia.org/wiki/IntelliJ_IDEA Wikipedia article about IntelliJ].

=== [http://eclipsefp.github.com/ EclipseFP plugin for Eclipse IDE] ===
:Eclipse is an open, extensible IDE platform for "everything and nothing in particular". It is implemented in Java and runs on several platforms. The Java IDE built on top of it has already become very popular among Java developers. The Haskell tools extend it to support editing (syntax coloring, code assist), compiling, and running Haskell programs from within the IDE. In more detail, it features:
:* Syntax highlighting and errors/warning highlighting
:* A module browser showing all installed packages, their modules and the contents of the modules (functions, types, etc.)
:* Integration with [http://www.haskell.org/hoogle/ Hoogle]: select an identifier in your code, press F4 and see the results in hoogle
:* Code navigation: from within a Haskell source file, jump to the file where a symbol is declared, or everywhere a symbol is used (type sensitive search, not just a text search)
:* Outline view: quickly jump to definitions in your file
:* Quick fixes on common errors and import management
:* A cabal file editor and integration with Cabal (uses cabal configure, cabal build under the covers), and a graphical view of installed packages
:* Integration with GHCi: launch GHCi inside Eclipse on any module
:* Integration with the GHCi debugger: performs the GHCi debugging commands for you from the standard Eclipse debugging interface
:* Integration with [http://community.haskell.org/~ndm/hlint/ HLint]: gives you HLint warnings on building and allows you to quick-fix them
:* Integration with [https://github.com/jaspervdj/stylish-haskell Stylish-Haskell]: format your code with stylish-haskell
:* Test support: shows results of test-framework based test suite in a graphical format. HTF support to come soon.

=== [http://colorer.sourceforge.net/eclipsecolorer/index.html Colorer plugin for Eclipse IDE] ===
:Syntax highlighting in Eclipse can be achieved using the Colorer plugin. This is more lightweight than using the EclipseFP plugin, which has much more functionality but can be messy to install and has sometimes been a bit shaky.

:Eclipse Colorer is a plugin that enables syntax highlighting for a wide range of languages. It uses its own XML-based language for describing syntactic regions of languages. It does not include support for Haskell by default, but this can be added using the syntax description files attached below.

:Installation instructions
:# Install the Colorer from the update site <code>http://colorer.sf.net/eclipsecolorer/</code> (for more detailed instructions see the project page).
:# Download the Haskell syntax description files in [http://www.haskell.org/wikiupload/1/16/Haskell_Eclipse_Colorer.tar.gz Haskell_Eclipse_Colorer.tar.gz].
:# Extract its contents (haskell.hrc and proto.hrc) into the following directory (overwriting proto.hrc): <code>eclipse_installation_dir/plugins/net.sf.colorer_0.9.9/colorer/hrc</code> (sometimes the wiki seems to create a nested tar file, so you might have to unpack twice).
:# Finished. A restart of Eclipse might be required. .hs files should open with syntax highlighting.

:Tips
:* If .hs files open with another kind of syntax highlighting check that they are associated with the Colorer Editor (Preferences -> General -> Editors -> File Associations). Or right click on them and choose Open With -> Other -> Colorer Editor.
:* Sometimes the highlighting gets confused. Then it might help to press Ctrl+R and re-colour the editor.
:* Use the Word Completion feature (Shift+Alt+7) as a poor man's content assist.
:* Use the standard External Tools feature in Eclipse to invoke the compiler from inside the IDE.
:* Use the Block selection feature (Shift+Alt+A) to insert/remove line comments on multiple lines at the same time.
:* Some other useful standard Eclipse features include the Open resource command (Ctrl+Shift+R), the File search command (Ctrl+H) and the bookmarks feature (Edit -> Add bookmark). Make sure to check Include in next/prev navigation box (Windows -> Preferences -> General -> Editors -> Text Editors -> Annotations -> Bookmarks).

=== [[Leksah]] ===
:Leksah is an IDE for Haskell written in Haskell. Leksah is intended as a practical tool to support the Haskell development process. Leksah uses GTK+ as GUI Toolkit with the gtk2hs binding. It is platform independent and should run on any platform where GTK+, gtk2hs and GHC can be installed.

* http://www.leksah.org/
* https://hackage.haskell.org/package/leksah
* https://github.com/leksah/leksah

=== [http://kdevelop.org/ KDevelop] ===
:This IDE supports many languages. For Haskell it currently supports project management, syntax highlighting, building (with GHC) & executing within the IDE.

=== [http://www.vim.org Vim] ===

This may or may not be up to date. A Vim user should update it.

:* [https://github.com/begriffs/haskell-vim-now Haskell-vim-now] -- Full-featured Vim config with install script. Supports type inspection, linting, Hoogle, tagging with codex+hasktags, unicode concealing, and refactoring.
:* [http://projects.haskell.org/haskellmode-vim/ Haskell mode for Vim by Claus Reinke] - These plugins provide Vim integration with GHC and Haddock.
:* [https://github.com/scrooloose/syntastic Syntastic] -- An extremely useful Vim plugin which, when editing a Haskell file, calls ghc-mod every time the source file is saved to check for syntax and type errors.
:* [http://www.vim.org/scripts/script.php?script_id=2356 SHIM by Lars Kotthoff] -- Superior Haskell Interaction Mode (SHIM) plugin for Vim providing full GHCi integration (requires Vim compiled with Ruby support).
:* [http://www.vim.org/scripts/script.php?script_id=3200 Haskell Conceal] -- shows Unicode symbols for common Haskell operators such as ++ and other lexical notation in Vim window (source file itself remains unchanged).
:* [http://urchin.earth.li/~ian/vim/ by Ian Lynagh]: distinguishes different literal Haskell styles (Vim 7.0 includes a syntax file which supersedes these plugins).
:* There's a [[Literate programming/Vim|copy of lhaskell.vim]] on the Wiki.
:* [https://github.com/MarcWeber/vim-addon-haskell by Marc Weber] -- Vim script-based function/module completion, cabal support, tagging by one command, context completion ( w<tab> -> where ), module outline, etc
:* [http://www.vim.org/scripts/script.php?script_id=1968 Vim indenting mode for Haskell]
:* [https://github.com/ujihisa/neco-ghc neco-ghc] pragma, module, function completion.
:* [https://github.com/eagletmt/ghcmod-vim Ghcmod-vim]
:* [https://github.com/bitc/vim-hdevtools Hdevtools] - gives type information, quicker reloading and more.
:* [http://blog-mno2.csie.org/blog/2011/11/17/vim-plugins-for-haskell-programmers/ Additional list] covering some plugins missing here, with screenshots of many of the above.

=== [http://www.gnu.org/s/emacs/ Emacs] ===

See [[Emacs]].

=== [http://atom.io Atom] ===

Atom is very similar to Sublime Text 2 (which is now discontinued). A huge [http://atom.io/packages package database] exists, and two packages important to Haskell developers are:
:* [https://atom.io/packages/language-haskell language-haskell] for Haskell syntax highlighting.
:* [https://atom.io/packages/ide-haskell ide-haskell] for Cabal support, linting and ghc-mod utilities like type previewing.

== Commercial ==

=== [https://www.fpcomplete.com/business/haskell-center/overview/ FP Haskell Center] ===

FP Complete has developed a commercial Haskell IDE.

It runs "in the cloud," which has its pros and cons, and comes with all of the libraries on Stackage ready to go. (Basically, the Haskell Platform on steroids.)

The standard IDE is in your browser, and has integration with Git and Github. Emacs, Sublime and Vim support will be released soon. One particularly cool feature is that you can spin up temporary web servers to test out the Haskell-powered website you might be coding up. It's really easy, and you can pay for FP Complete to host your permanent application, too.

There's a free trial, with free academic licenses and paid commercial licenses. There will be "personal" licenses in a few weeks (from early Sept 2013) as well, since the commercial pricing is a bit steep for hobbyists.

==== Feature set ====

Some of the features:

* Auto-completion.
* Hoogle searching of all of Stackage.
* Hoogling in the context of a module and its imports.
* Live typechecking/recompiling / jump to error.
* Hlint suggestions.
* Jump to definition.
* Auto-removal of unnecessary imports.
* Get type of any identifier (globally or locally defined).
* Show documentation of any symbol (via hoogle), or open haddocks.
* Refactoring.
* Build project, run project.
* Auto-code formatting.
* Run a temporary web service for testing web apps.
* Deploy project to an Amazon instance.

=== [http://haskellformac.com Haskell for Mac] ===

Haskell for Mac is an easy-to-use integrated programming environment for Haskell on OS X. It is a one-click install of a complete Haskell system, including Haskell compiler, editor, many libraries, and a novel form of interactive Haskell playgrounds. Haskell playgrounds support exploration and experimentation with code. They are convenient to learn functional programming, prototype Haskell code, interactively visualize data, and to create interactive animations.

Features include the following:
* Built-in Haskell editor with customisable themes, or you can use a separate text editor.
* Autosaving and automatic project versioning.
* Interactive Haskell playgrounds evaluate your code as you type.
* Playground results can be text or images produced by the Rasterific, Diagrams, and Chart packages.
* Add code and multimedia files to a Haskell project with drag'n'drop.
* Haskell binding to Apple's 2D animation and games framework SpriteKit.

Haskell for Mac requires OS X Yosemite or above.

=== [https://github.com/SublimeHaskell/SublimeHaskell/ Sublime-Haskell] ===

Sublime-Haskell is a plugin for the [http://www.sublimetext.com/ Sublime Text Editor]. It is installed through [https://sublime.wbond.net/ Package Control], the Sublime Text package manager.

It is built as a plugin to the Sublime text editor, so all the standard editing functionality is there. Here are the Haskell specific features:
* Syntax highlighting and error marking for Haskell and Cabal. Errors provided by interaction with the compiler. The errors are listed in an error pane, and the user can navigate through the errors.
* When working on a project that has a Cabal file, the Cabal file is detected automatically, and the project can be configured, built, run, and tested using Cabal. This also enhances error reporting and auto-completion (all exported symbols from the project can then be matched against). Thus, there is good project management support.
* Rescan/build on file change.
* Can use Cabal-dev for sandboxing/pristine builds.
* Prettification/indentation and alignment via Stylish-Haskell.
* Jump to definition, and show information for a definition (using haskell-docs).
* Type display and insertion
* Fast building and type-inference via hdevtools.
* HLint provided by GHC-Mod.

Thus, Sublime-Haskell satisfies all the requirements listed at the top of the wiki for a baseline Haskell IDE. Sublime-Text is closed source, but the Haskell plugin is open source.

== See also ==

* [http://blog.johantibell.com/2011/08/results-from-state-of-haskell-2011.html Results from the State of Haskell, 2011 Survey].
* [http://nickknowlson.com/blog/2011/09/12/haskell-survey-categorized-weaknesses/ Categorized Weaknesses from the State of Haskell 2011 Survey], which barely touched upon IDEs.
* [[Editors]]
* [[Applications and libraries/Program development#Editor support]]
* [http://code.haskell.org/shim/ Shim]; the aim of the shim (Superior Haskell Interaction Mode) project is to provide better support for editing Haskell code in VIM and Emacs

== Other IDEs and Editors ==

The list below is incomplete. Please add to it with whatever you think of. This list should be expanded into sections, as above, with more details, with links to the actual documentation of the described features.

* Vim — '''PROS:''' Free. Works on Windows. Works in terminal. Decent alignment support. Tag-based completion and jumps. Very good syntax highlighting, flymake (via Syntastic), Cabal integration, Hoogle. Documentation for symbol at point '''CONS:''' Arcane, difficult for new users. Some complain of bad indentation support.
* [http://www.haskell.org/haskellwiki/Haskell_mode_for_Emacs Emacs] — '''PROS:''' Free. Works on Windows. Works in terminal. Decent alignment, indentation, syntax highlighting. Limited type information (type and info of name at point). Cabal/GHC/GHCi awareness and Haskell-aware REPL. Completion and jump-to-definition (via ETAGS). Documentation of symbol at point. Hoogle. Flymake (error checking on the fly). '''CONS:''' Arcane, difficult for new users.
* Sublime — '''PROS:''' Works on Windows. '''CONS:''' Poor alignment support (though [http://www.reddit.com/r/haskell/comments/ts8fi/haskell_ides_emacs_vim_and_sublime_oh_my_opinions/c4pair1 there are packages] to do indentation a little better). Proprietary.
* [[Yi]] — '''PROS:''' Written in Haskell. Works in terminal. '''CONS:''' Very immature, lacking features. Problems building generally, especially on Windows.
* [http://www.haskell.org/haskellwiki/Leksah Leksah] — '''PROS:''' Syntax highlighting. Understands Cabal, Module browser, dependency knowledge, documentation display inside the IDE, jump-to-definition, flymake (error checking on the fly), limited evaluation of snippets, scratch buffer. Autocompletion. Not an arcane interface a la Emacs/Vim. '''CONS:''' Doesn't have a decent REPL. Are there any other cons? — This should be moved to the section above.
* [[Editors | Other Editors]]
* [http://www.cs.kent.ac.uk/projects/heat/ HEAT:] An Interactive Development Environment for Learning & Teaching Haskell
* [http://www.geany.org/ Geany] '''PROS:''' Free. Works on Windows. Syntax highlighting, REPL. '''CONS:''' After using it for a while, Geany freezes quite often.

== Outdated ==

* [http://web.archive.org/web/20110726153330/http://hoovy.org/HaskellXcodePlugin/ plugin for Xcode] (links to the web archive)

=== [http://www.haskell.org/haskellwiki/HIDE hIDE] ===
:hIDE is a GUI-based Haskell IDE written using gtk+hs. It does not include an editor but instead interfaces with NEdit, vim or GNU emacs.

=== [http://www.haskell.org/haskellwiki/HIDE hIDE-2] ===
:Through the dark ages many a programmer has longed for the ultimate tool. In response to this most unnerving craving, of which we ourselves have had maybe more than our fair share, the dynamic trio of #Haskellaniacs (dons, dcoutts and Lemmih) hereby announce, to the relief of the community, that a fetus has been conceived: ''hIDE - the Haskell Integrated Development Environment''. So far the unborn integrates source code recognition and a chameleon editor, presenting these in a snappy gtk2 environment. Although no seer has yet predicted the date of birth of our hIDEous creature, we hope that the mere knowledge of its existence will spread peace of mind throughout the community as oil on troubled waters. See also: [[HIDE/Screenshots of HIDE]] and [[HIDE]]

=== [http://web.archive.org/web/20060213161530/http://www.students.cs.uu.nl/people/rjchaaft/JCreator/ JCreator with Haskell support] ===
: N.B. The link above is to the Wayback Machine (Web Archive); it seems that JCreator is no longer supported.
:JCreator is a highly customizable Java IDE for Windows. Features include extensive project support, fully customizable toolbars (including the images of user tools) and menus, increase/decrease indent for a selected block of text (tab/shift+tab respectively). The Haskell support module adds syntax highlighting for Haskell files and WinHugs, hugs, a static checker (if you double click on the error message, JCreator will jump to the right file and line and highlight it yellow) and the Haskell 98 Report as tools. Platforms: Win95, Win98, WinNT and Win2000 (only Win95 not tested yet). Size: 6MB. JCreator is a trademark of Xinox Software; Copyright © 2000 Xinox Software. The Haskell support module is made by Rijk-Jan van Haaften.

=== [[haste]] - Haskell TurboEdit ===
:haste - Haskell TurboEdit - was an IDE for the functional programming language Haskell, written in Haskell.

=== [http://www.haskell.org/visualhaskell Visual Haskell] ===
:Visual Haskell is a complete development environment for Haskell software, based on Microsoft's [http://www.microsoft.com/visualstudio/en-us Visual Studio] platform. Visual Haskell integrates with the Visual Studio editor to provide interactive features to aid Haskell development, and it enables the construction of projects consisting of multiple Haskell modules, using the Cabal building/packaging infrastructure.

=== [http://www.cs.kent.ac.uk/projects/vital/ Vital] ===
:Vital is a visual programming environment. It is particularly intended for supporting the open-ended, incremental style of development often preferred by end users (engineers, scientists, analysts, etc.).

=== [http://www.cs.kent.ac.uk/projects/pivotal/ Pivotal] ===
:Pivotal 0.025 is an early prototype of a Vital-like environment for Haskell. Unlike Vital, however, Pivotal is implemented entirely in Haskell. The implementation is based on the use of the hs-plugins library to allow dynamic compilation and evaluation of Haskell expressions together with the gtk2hs library for implementing the GUI.

IDEs

2015-10-24T06:06:22Z

Chak: /* Sublime-Haskell */

The IDE world in Haskell is incomplete, but is in motion. There are many choices. When choosing your IDE, there are the following things to consider.

== Notable features of interest to consider ==

This is a list of features that any Haskell IDE could or should have. The IDEs listed below generally support some subset of these features. Please add more to this list if you think of anything. In future this should be expanded into separate headings with more description of how each feature should ideally work. For discussion of IDEs, see the [https://groups.google.com/forum/#!forum/haskell-ide haskell-ide mailing list] and the [https://github.com/haskell/haskell-ide haskell-ide repository].

* Syntax highlighting (e.g. for Haskell, Cabal, Literate Haskell, Core, etc.)
* Macros (e.g. inserting imports/aligning/sorting imports, aligning up text, transposing/switching/moving things around)
* Type information (e.g. type at point, info at point, type of expression)
* IntelliSense/completion (e.g. jump-to-definition, who-calls, calls-who, search by type, completion, etc.)
* Project management (e.g. understanding of Cabal, configuration/building/installing, package sandboxing)
* Interactive REPL (e.g. GHCi/Hugs interaction, expression evaluation and such)
* Knowledge of Haskell in the GHCi/GHC side (e.g. understanding error types, the REPL, REPL objects, object inspection)
* Indentation support (e.g. tab cycle, simple back-forward indentation, whole area indentation, structured editing, etc.)
* Proper syntactic awareness of Haskell (e.g. with a proper parser and proper editor transpositions a la the structured editors of the 80s and Isabel et al)
* Documentation support (e.g. ability to call up documentation of symbol or module, either in the editor, or in the browser)
* Debugger support (e.g. stepping, breakpoints, etc.)
* Refactoring support (e.g. symbol renaming, hlint, etc.)
* Templates (e.g. snippets, Zen Coding type stuff, filling in all the cases of a case, etc.)
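To make the "Refactoring support" item concrete, here is a minimal sketch of the kind of rewrite an HLint-style hint proposes: replacing <code>concat (map f xs)</code> with <code>concatMap f xs</code>. The function names <code>neighboursBefore</code> and <code>neighboursAfter</code> are hypothetical and exist only for this illustration; how a given IDE surfaces or applies such a hint varies from tool to tool.

```haskell
module Main where

-- Before: the shape that HLint typically flags with a "Use concatMap" hint.
-- (neighboursBefore is a hypothetical name used only for this sketch.)
neighboursBefore :: [Int] -> [Int]
neighboursBefore xs = concat (map (\x -> [x - 1, x + 1]) xs)

-- After: the same function with the suggested rewrite applied.
neighboursAfter :: [Int] -> [Int]
neighboursAfter = concatMap (\x -> [x - 1, x + 1])

main :: IO ()
main = do
  let input = [1, 5, 10]
  -- Show the result of the refactored version.
  print (neighboursAfter input)
  -- Check that the refactoring did not change behaviour on this input.
  print (neighboursBefore input == neighboursAfter input)
```

An IDE with refactoring support would ideally offer this rewrite as a quick fix at the flagged expression, so that the user does not have to retype it by hand.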

== Open Source ==

=== [https://github.com/rikvdkleij/intellij-haskell IntelliJ plugin for Haskell] ===
:See [http://www.haskell.org/pipermail/haskell-cafe/2014-October/116567.html the announcement of the plugin] and the [http://en.wikipedia.org/wiki/IntelliJ_IDEA Wikipedia article about IntelliJ].

=== [http://eclipsefp.github.com/ EclipseFP plugin for Eclipse IDE] ===
:Eclipse is an open, extensible IDE platform for "everything and nothing in particular". It is implemented in Java and runs on several platforms. The Java IDE built on top of it has already become very popular among Java developers. The Haskell tools extend it to support editing (syntax coloring, code assist), compiling, and running Haskell programs from within the IDE. In more detail, it features:
:* Syntax highlighting and errors/warning highlighting
:* A module browser showing all installed packages, their modules and the contents of the modules (functions, types, etc.)
:* Integration with [http://www.haskell.org/hoogle/ Hoogle]: select an identifier in your code, press F4 and see the results in hoogle
:* Code navigation: from within a Haskell source file, jump to the file where a symbol is declared, or to every place where a symbol is used (type-sensitive search, not just a text search)
:* Outline view: quickly jump to definitions in your file
:* Quick fixes on common errors and import management
:* A cabal file editor and integration with Cabal (uses cabal configure, cabal build under the covers), and a graphical view of installed packages
:* Integration with GHCi: launch GHCi inside Eclipse on any module
:* Integration with the GHCi debugger: performs the GHCi debugging commands for you from the standard Eclipse debugging interface
:* Integration with [http://community.haskell.org/~ndm/hlint/ HLint]: gives you HLint warnings when building and lets you apply quick fixes for them
:* Integration with [https://github.com/jaspervdj/stylish-haskell Stylish-Haskell]: format your code with stylish-haskell
:* Test support: shows results of test-framework based test suite in a graphical format. HTF support to come soon.

=== [http://colorer.sourceforge.net/eclipsecolorer/index.html Colorer plugin for Eclipse IDE] ===
:Syntax highlighting in Eclipse can be achieved using the Colorer plugin. This is more lightweight than using the EclipseFP plugin, which has many more features but can be messy to install and has sometimes been a bit shaky.

:Eclipse Colorer is a plugin that enables syntax highlighting for a wide range of languages. It uses its own XML-based language for describing syntactic regions of languages. It does not include support for Haskell by default, but this can be added using the syntax description files attached below.

:Installation instructions
:# Install the Colorer from the update site <code>http://colorer.sf.net/eclipsecolorer/</code> (for more detailed instructions see the project page).
:# Download the Haskell syntax description files in [http://www.haskell.org/wikiupload/1/16/Haskell_Eclipse_Colorer.tar.gz Haskell_Eclipse_Colorer.tar.gz].
:# Extract its contents (haskell.hrc and proto.hrc) into the following directory (overwriting proto.hrc): <code>eclipse_installation_dir/plugins/net.sf.colorer_0.9.9/colorer/hrc</code> (sometimes the wiki seems to create a nested tar file, so you might have to unpack twice).
:# Finished. A restart of Eclipse might be required. .hs files should open with syntax highlighting.

:Tips
:* If .hs files open with another kind of syntax highlighting, check that they are associated with the Colorer Editor (Preferences -> General -> Editors -> File Associations), or right-click on them and choose Open With -> Other -> Colorer Editor.
:* Sometimes the highlighting gets confused. Then it might help to press Ctrl+R and re-colour the editor.
:* Use the Word Completion feature (Shift+Alt+7) as a poor man's content assist.
:* Use the standard External Tools feature in Eclipse to invoke the compiler from inside the IDE.
:* Use the Block selection feature (Shift+Alt+A) to insert/remove line comments on multiple lines at the same time.
:* Some other useful standard Eclipse features include the Open resource command (Ctrl+Shift+R), the File search command (Ctrl+H) and the bookmarks feature (Edit -> Add bookmark). Make sure to check Include in next/prev navigation box (Windows -> Preferences -> General -> Editors -> Text Editors -> Annotations -> Bookmarks).

=== [[Leksah]] ===
:Leksah is an IDE for Haskell written in Haskell. Leksah is intended as a practical tool to support the Haskell development process. Leksah uses GTK+ as GUI Toolkit with the gtk2hs binding. It is platform independent and should run on any platform where GTK+, gtk2hs and GHC can be installed.

* http://www.leksah.org/
* https://hackage.haskell.org/package/leksah
* https://github.com/leksah/leksah

=== [http://kdevelop.org/ KDevelop] ===
:This IDE supports many languages. For Haskell it currently supports project management, syntax highlighting, building (with GHC) & executing within the IDE.

=== [http://www.vim.org Vim] ===

This may or may not be up to date. A Vim user should update it.

:* [https://github.com/begriffs/haskell-vim-now Haskell-vim-now] -- Full-featured Vim config with install script. Supports type inspection, linting, Hoogle, tagging with codex+hasktags, unicode concealing, and refactoring.
:* [http://projects.haskell.org/haskellmode-vim/ Haskell mode for Vim by Claus Reinke] - These plugins provide Vim integration with GHC and Haddock.
:* [https://github.com/scrooloose/syntastic Syntastic] -- An extremely useful Vim plugin which, when editing a Haskell file, calls ghc-mod every time the source file is saved to check for syntax and type errors.
:* [http://www.vim.org/scripts/script.php?script_id=2356 SHIM by Lars Kotthoff] -- Superior Haskell Interaction Mode (SHIM) plugin for Vim providing full GHCi integration (requires Vim compiled with Ruby support).
:* [http://www.vim.org/scripts/script.php?script_id=3200 Haskell Conceal] -- shows Unicode symbols for common Haskell operators such as ++ and other lexical notation in Vim window (source file itself remains unchanged).
:* [http://urchin.earth.li/~ian/vim/ by Ian Lynagh]: distinguishes different literal Haskell styles (Vim 7.0 includes a syntax file which supersedes these plugins).
:* There's a [[Literate programming/Vim|copy of lhaskell.vim]] on the Wiki.
:* [https://github.com/MarcWeber/vim-addon-haskell by Marc Weber] -- Vim script-based function/module completion, cabal support, tagging by one command, context completion ( w<tab> -> where ), module outline, etc
:* [http://www.vim.org/scripts/script.php?script_id=1968 Vim indenting mode for Haskell]
:* [https://github.com/ujihisa/neco-ghc neco-ghc] pragma, module, function completion.
:* [https://github.com/eagletmt/ghcmod-vim Ghcmod-vim]
:* [https://github.com/bitc/vim-hdevtools Hdevtools] - gives type information, quicker reloading and more.
:* [http://blog-mno2.csie.org/blog/2011/11/17/vim-plugins-for-haskell-programmers/ Additional list] covering some plugins missing here, with screenshots of many of the above.

=== [http://www.gnu.org/s/emacs/ Emacs] ===

See [[Emacs]].

=== [http://atom.io Atom] ===

Atom is very similar to Sublime Text 2 (which is now discontinued). A huge [http://atom.io/packages package database] exists, and two packages important to Haskell developers are:
:* [https://atom.io/packages/language-haskell language-haskell] for Haskell syntax highlighting.
:* [https://atom.io/packages/ide-haskell ide-haskell] for Cabal support, linting and ghc-mod utilities like type previewing.

== Commercial ==

=== [https://www.fpcomplete.com/business/haskell-center/overview/ FP Haskell Center] ===

FP Complete has developed a commercial Haskell IDE.

It runs "in the cloud," which has its pros and cons, and comes with all of the libraries on Stackage ready to go. (Basically, the Haskell Platform on steroids.)

The standard IDE is in your browser, and has integration with Git and Github. Emacs, Sublime and Vim support will be released soon. One particularly cool feature is that you can spin up temporary web servers to test out the Haskell-powered website you might be coding up. It's really easy, and you can pay for FP Complete to host your permanent application, too.

There's a free trial, with free academic licenses and paid commercial licenses. There will be "personal" licenses in a few weeks (from early Sept 2013) as well, since the commercial pricing is a bit steep for hobbyists.

==== Feature set ====

Some of the features:

* Auto-completion.
* Hoogle searching of all of Stackage.
* Hoogling in the context of a module and its imports.
* Live typechecking/recompiling / jump to error.
* Hlint suggestions.
* Jump to definition.
* Auto-removal of unnecessary imports.
* Get type of any identifier (globally or locally defined).
* Show documentation of any symbol (via hoogle), or open haddocks.
* Refactoring.
* Build project, run project.
* Auto-code formatting.
* Run a temporary web service for testing web apps.
* Deploy project to an Amazon instance.

=== [http://haskellformac.com Haskell for Mac] ===

Haskell for Mac is an easy-to-use integrated programming environment for Haskell on OS X. It is a one-click install of a complete Haskell system, including Haskell compiler, editor, many libraries, and a novel form of interactive Haskell playgrounds. Haskell playgrounds support exploration and experimentation with code. They are convenient to learn functional programming, prototype Haskell code, interactively visualize data, and to create interactive animations.

Features include the following:
* Built-in Haskell editor with customisable themes, or you can use a separate text editor.
* Autosaving and automatic project versioning.
* Interactive Haskell playgrounds evaluate your code as you type.
* Playground results can be text or images produced by the Rasterific, Diagrams, and Chart packages.
* Add code and multimedia files to a Haskell project with drag'n'drop.
* Haskell binding to Apple's 2D animation and games framework SpriteKit.

Haskell for Mac requires OS X Yosemite or above.

=== [https://github.com/SublimeHaskell/SublimeHaskell/ Sublime-Haskell] ===

Sublime-Haskell is a plugin for the [http://www.sublimetext.com/ Sublime Text Editor]. It is installed through [https://sublime.wbond.net/ Package Control], the Sublime Text package manager.

It is built as a plugin to the Sublime text editor, so all the standard editing functionality is there. Here are the Haskell specific features:
* Syntax highlighting and error marking for Haskell and Cabal. Errors provided by interaction with the compiler. The errors are listed in an error pane, and the user can navigate through the errors.
* When working on a project that has a Cabal file, the Cabal file is detected automatically, and the project can be configured, built, run, and tested using Cabal. This also enhances error reporting and auto-completion (all exported symbols from the project can then be matched against). Thus, there is good project management support.
* Rescan/build on file change.
* Can use Cabal-dev for sandboxing/pristine builds.
* Prettification/indentation and alignment via Stylish-Haskell.
* Jump to definition, and show information for a definition (using haskell-docs).
* Type display and insertion
* Fast building and type-inference via hdevtools.
* HLint provided by GHC-Mod.

Thus, Sublime-Haskell satisfies all the requirements listed at the top of the wiki for a baseline Haskell IDE. Sublime-Text is closed source, but the Haskell plugin is open source.

== See also ==

* [http://blog.johantibell.com/2011/08/results-from-state-of-haskell-2011.html Results from the State of Haskell, 2011 Survey].
* [http://nickknowlson.com/blog/2011/09/12/haskell-survey-categorized-weaknesses/ Categorized Weaknesses from the State of Haskell 2011 Survey], which barely touched upon IDEs.
* [[Editors]]
* [[Applications and libraries/Program development#Editor support]]
* [http://code.haskell.org/shim/ Shim]; the aim of the shim (Superior Haskell Interaction Mode) project is to provide better support for editing Haskell code in VIM and Emacs

== Other IDEs and Editors ==

The list below is incomplete. Please add to it with whatever you think of. This list should be expanded into sections, as above, with more details, with links to the actual documentation of the described features.

* Vim — '''PROS:''' Free. Works on Windows. Works in terminal. Decent alignment support. Tag-based completion and jumps. Very good syntax highlighting, flymake (via Syntastic), Cabal integration, Hoogle. Documentation for symbol at point '''CONS:''' Arcane, difficult for new users. Some complain of bad indentation support.
* [http://www.haskell.org/haskellwiki/Haskell_mode_for_Emacs Emacs] — '''PROS:''' Free. Works on Windows. Works in terminal. Decent alignment, indentation, syntax highlighting. Limited type information (type and info of name at point). Cabal/GHC/GHCi awareness and Haskell-aware REPL. Completion and jump-to-definition (via ETAGS). Documentation of symbol at point. Hoogle. Flymake (error checking on the fly). '''CONS:''' Arcane, difficult for new users.
* Sublime — '''PROS:''' Works on Windows. '''CONS:''' Poor alignment support (though [http://www.reddit.com/r/haskell/comments/ts8fi/haskell_ides_emacs_vim_and_sublime_oh_my_opinions/c4pair1 there are packages] to do indentation a little better). Proprietary.
* [[Yi]] — '''PROS:''' Written in Haskell. Works in terminal. '''CONS:''' Very immature, lacking features. Problems building generally, especially on Windows.
* [http://www.haskell.org/haskellwiki/Leksah Leksah] — '''PROS:''' Syntax highlighting. Understands Cabal, Module browser, dependency knowledge, documentation display inside the IDE, jump-to-definition, flymake (error checking on the fly), limited evaluation of snippets, scratch buffer. Autocompletion. Not an arcane interface a la Emacs/Vim. '''CONS:''' Doesn't have a decent REPL. Are there any other cons? — This should be moved to the section above.
* [[Editors | Other Editors]]
* [http://www.cs.kent.ac.uk/projects/heat/ HEAT:] An Interactive Development Environment for Learning & Teaching Haskell
* [http://www.geany.org/ Geany] '''PROS:''' Free. Works on Windows. Syntax highlighting, REPL. '''CONS:''' After using it for a while, Geany freezes quite often.

== Outdated ==

* [http://web.archive.org/web/20110726153330/http://hoovy.org/HaskellXcodePlugin/ plugin for Xcode] (links to the web archive)

=== [http://www.haskell.org/haskellwiki/HIDE hIDE] ===
:hIDE is a GUI-based Haskell IDE written using gtk+hs. It does not include an editor but instead interfaces with NEdit, vim or GNU emacs.

=== [http://www.haskell.org/haskellwiki/HIDE hIDE-2] ===
:Through the dark ages many a programmer has longed for the ultimate tool. In response to this most unnerving craving, of which we ourselves have had maybe more than our fair share, the dynamic trio of #Haskellaniacs (dons, dcoutts and Lemmih) hereby announce, to the relief of the community, that a fetus has been conceived: ''hIDE - the Haskell Integrated Development Environment''. So far the unborn integrates source code recognition and a chameleon editor, presenting these in a snappy gtk2 environment. Although no seer has yet predicted the date of birth of our hIDEous creature, we hope that the mere knowledge of its existence will spread peace of mind throughout the community as oil on troubled waters. See also: [[HIDE/Screenshots of HIDE]] and [[HIDE]]

=== [http://web.archive.org/web/20060213161530/http://www.students.cs.uu.nl/people/rjchaaft/JCreator/ JCreator with Haskell support] ===
: N.B. The link above is to the Wayback Machine (Web Archive); it seems that JCreator is no longer supported.
:JCreator is a highly customizable Java IDE for Windows. Features include extensive project support, fully customizable toolbars (including the images of user tools) and menus, increase/decrease indent for a selected block of text (tab/shift+tab respectively). The Haskell support module adds syntax highlighting for Haskell files and WinHugs, hugs, a static checker (if you double click on the error message, JCreator will jump to the right file and line and highlight it yellow) and the Haskell 98 Report as tools. Platforms: Win95, Win98, WinNT and Win2000 (only Win95 not tested yet). Size: 6MB. JCreator is a trademark of Xinox Software; Copyright © 2000 Xinox Software. The Haskell support module is made by Rijk-Jan van Haaften.

=== [[haste]] - Haskell TurboEdit ===
:haste - Haskell TurboEdit - was an IDE for the functional programming language Haskell, written in Haskell.

=== [http://www.haskell.org/visualhaskell Visual Haskell] ===
:Visual Haskell is a complete development environment for Haskell software, based on the [http://www.microsoft.com/visualstudio/en-us Microsoft Visual Studio] platform. Visual Haskell integrates with the Visual Studio editor to provide interactive features to aid Haskell development, and it enables the construction of projects consisting of multiple Haskell modules, using the Cabal building/packaging infrastructure.

=== [http://www.cs.kent.ac.uk/projects/vital/ Vital] ===
:Vital is a visual programming environment. It is particularly intended for supporting the open-ended, incremental style of development often preferred by end users (engineers, scientists, analysts, etc.).

=== [http://www.cs.kent.ac.uk/projects/pivotal/ Pivotal] ===
:Pivotal 0.025 is an early prototype of a Vital-like environment for Haskell. Unlike Vital, however, Pivotal is implemented entirely in Haskell. The implementation is based on the use of the hs-plugins library to allow dynamic compilation and evaluation of Haskell expressions together with the gtk2hs library for implementing the GUI.

Tutorials

2015-10-11T03:56:06Z

Chak:

==Introductions to Haskell==

These are the recommended places to start learning, short of buying a [[Books#Textbooks|textbook]].

=== Best places to start ===

;[http://www.seas.upenn.edu/~cis194/lectures.html CIS 194: Introduction to Haskell (Spring 2013)]: An excellent tutorial to Haskell for beginners given as a course at UPenn by the author of the Typeclassopedia and Diagrams, Brent Yorgey. More compact than LYAH and RWH, but still communicates both basics and some notoriously unfamiliar concepts effectively.

;[http://learnyouahaskell.com Learn You a Haskell for Great Good! (LYAH)]
: Nicely illustrated tutorial showing Haskell concepts while interacting in GHCi. Written and drawn by Miran Lipovača.

;[http://book.realworldhaskell.org/ Real World Haskell (RWH)]
: A free online version of the complete book, with numerous reader-submitted comments. RWH is best suited for people who already know the fundamentals of Haskell and can write basic Haskell programs themselves. It makes a great follow-up after finishing LYAH. It can easily be read cover-to-cover, or you can focus on the chapters that interest you most, or dip in when you find an idea you don't yet understand.

;[http://en.wikibooks.org/wiki/Haskell/YAHT Yet Another Haskell Tutorial (YAHT)]
:By Hal Daume III et al. A recommended tutorial for Haskell that is still under construction but already covers much ground. Also a classic text.

;[http://en.wikibooks.org/wiki/Haskell Haskell Wikibook]
:A communal effort by several authors to produce the definitive Haskell textbook. It's very much a work in progress at the moment, and contributions are welcome. For 6 inch e-Readers/tablet computers, there is [http://commons.wikimedia.org/wiki/File:Haskell_eBook_Reader.pdf a PDF version of the book].

;[http://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours Write Yourself a Scheme in 48 Hours in Haskell]
:A Haskell Tutorial, by Jonathan Tang. Most Haskell tutorials on the web seem to take a language-reference-manual approach to teaching. They show you the syntax of the language, a few language constructs, and then have you construct a few simple functions at the interactive prompt. The "hard stuff" of how to write a functioning, useful program is left to the end, or sometimes omitted entirely. This tutorial takes a different tack. You'll start off with command-line arguments and parsing, and progress to writing a fully-functional Scheme interpreter that implements a good-sized subset of R5RS Scheme. Along the way, you'll learn Haskell's I/O, mutable state, dynamic typing, error handling, and parsing features. By the time you finish, you should be fairly fluent in both Haskell and Scheme.

;[http://acm.wustl.edu/functional/haskell.php How to Learn Haskell]
:Some students at Washington University in St. Louis documented the path they took to learning Haskell and put together a nice meta-tutorial to guide beginners through some of the available resources. Experienced programmers looking for some quick code examples may be interested in their [http://acm.wustl.edu/functional/hs-breads.php breadcrumbs].

;[http://ohaskell.dshevchenko.biz/ О Haskell по-человечески]
:About Haskell from a beginner for beginners. Not an academic but a practical tutorial. Written by Denis Shevchenko in Russian.

=== Other tutorials ===

;[http://dev.stephendiehl.com/hask/ What I wish I knew when learning Haskell] :By Stephen Diehl. Does what it says on the tin. See [http://www.reddit.com/r/haskell/comments/23srcm/what_i_wish_i_knew_when_learning_haskell_20/ Reddit appreciation]

;[http://channel9.msdn.com/Series/C9-Lectures-Erik-Meijer-Functional-Programming-Fundamentals C9 Lectures: Erik Meijer - Functional Programming Fundamentals]
:A set of videos of lectures by Erik Meijer

;[http://www.yellosoft.us/evilgenius/ Haskell for the Evil Genius] :By Andrew Pennebaker. An overview of how functional and declarative programming can increase the accuracy and efficiency of digital superweapons, empowering evil geniuses in their supreme goal of taking over the world.

;[http://www.yellosoft.us/parallel-processing-with-haskell Parallel Processing with Haskell] :By Andrew Pennebaker. A short, accelerated introduction to Haskell for coding parallel programs.

;[http://www.yellosoft.us/getoptfu GetOptFu] :By Andrew Pennebaker. A guide to robust command line argument parsing in Haskell. Available online in HTML, and offline in ePUB and MOBI formats.

;[http://www.haskell.org/tutorial/ A [[Gentle]] Introduction to Haskell] :By Paul Hudak, John Peterson, and Joseph H. Fasel. The title is misleading. Some knowledge of another functional programming language is expected. The emphasis is on the type system and those features which are really new in Haskell (compared to other functional programming languages). A classic, but not for the faint of heart (it's not so gentle). Also available in [http://www.haskell.org/wikiupload//5/5e/GentleFR.pdf French] [http://gorgonite.developpez.com/livres/traductions/haskell/gentle-haskell/ from this website] and also [http://www.rsdn.ru/article/haskell/haskell_part1.xml in Russian].

;[[H-99: Ninety-Nine Haskell Problems]]
:A collection of programming puzzles, with Haskell solutions. Solving these is a great way to get into Haskell programming.

;[http://shuklan.com/haskell Undergraduate Haskell Lectures from the University of Virginia]
:An introductory set of slides full of example code for an undergraduate course in Haskell. Topics include basic list manipulations, higher order functions, cabal, the IO Monad, and Category Theory.

;[[Haskell Tutorial for C Programmers]]
:By Eric Etheridge. From the intro: "This tutorial assumes that the reader is familiar with C/C++, Python, Java, or Pascal. I am writing for you because it seems that no other tutorial was written to help students overcome the difficulty of moving from C/C++, Java, and the like to Haskell."

;[http://www.ibm.com/developerworks/linux/tutorials/l-hask/ Beginning Haskell]
:From IBM developerWorks. This tutorial targets programmers of imperative languages wanting to learn about functional programming in the language Haskell. If you have programmed in languages such as C, Pascal, Fortran, C++, Java, Cobol, Ada, Perl, TCL, REXX, JavaScript, Visual Basic, or many others, you have been using an imperative paradigm. This tutorial provides a gentle introduction to the paradigm of functional programming, with specific illustrations in the Haskell 98 language. (Free registration required.)

;[http://www.cse.chalmers.se/~rjmh/tutorials.html Tutorial Papers in Functional Programming].
:A collection of links to other Haskell tutorials, from John Hughes.

;[http://www.cs.ou.edu/~rlpage/fpclassCurrent/textbook/haskell.shtml Two Dozen Short Lessons in Haskell]
:By Rex Page. A draft of a textbook on functional programming, available by ftp. It calls for active participation from readers by omitting material at certain points and asking the reader to attempt to fill in the missing information based on knowledge they have already acquired. The missing information is then supplied on the reverse side of the page.

;[ftp://ftp.geoinfo.tuwien.ac.at/navratil/HaskellTutorial.pdf Haskell-Tutorial]
:By Damir Medak and Gerhard Navratil. The fundamentals of functional languages for beginners.

;[http://video.s-inf.de/#FP.2005-SS-Giesl.(COt).HD_Videoaufzeichnung Video Lectures]
:Lectures (in English) by Jürgen Giesl. About 30 hours in total, and great for learning Haskell. The lectures are 2005-SS-FP.V01 through 2005-SS-FP.V26. Videos 2005-SS-FP.U01 through 2005-SS-FP.U11 are exercise answer sessions, so you probably don't want those.

;[http://www.cs.utoronto.ca/~trebla/fp/ Albert's Functional Programming Course]
:A 15 lesson introduction to most aspects of Haskell.

;[http://www.iceteks.com/articles.php/haskell/1 Introduction to Haskell]
:By Chris Dutton. An "attempt to bring the ideas of functional programming to the masses here, and an experiment in finding ways to make it easy and interesting to follow".

;[http://www.csc.depauw.edu/~bhoward/courses/0203Spring/csc122/haskintro/ An Introduction to Haskell]
:A brief introduction, by Brian Howard.

;[http://www.linuxjournal.com/article/9096 Translating Haskell into English]
:By Shannon Behrens. Gives readers a glimpse of the Zen of Haskell, without requiring that they already be Haskell converts.

;[http://www.shlomifish.org/lecture/Perl/Haskell/slides/ Haskell for Perl Programmers]
:Brief introduction to Haskell, with a view to what Perl programmers are interested in.

;[http://lisperati.com/haskell/ How To Organize a Picnic on a Computer]
:Fun introduction to Haskell, step by step building of a program to seat people at a planned picnic, based on their similarities using data from a survey and a map of the picnic location.

;[http://cs.wallawalla.edu/research/KU/PR/Haskell.html Haskell Tutorial]

;[http://www.lisperati.com/haskell/ Conrad Barski's Haskell tutorial .. with robots]

;[[Media:Introduction.pdf|Frederick Ross's Haskell introduction]]

;[http://de.wikibooks.org/wiki/Haskell Dirk's Haskell Tutorial]
:In German, for beginners by a beginner. Not very deep, but with a lot of examples in very small steps.

;[http://www.crsr.net/Programming_Languages/SoftwareTools/index.html Software Tools in Haskell]
:A tutorial for advanced readers

;[http://learn.hfm.io/ Learning Haskell]
:A comprehensive introduction to Haskell that combines text with screencasts. No previous knowledge of functional programming is required. The tutorial is still a work in progress, with additional chapters being added over time.

See also the discussion [http://www.reddit.com/r/haskell/comments/2blsqa/papers_every_haskeller_should_read/ Papers every haskeller should read].

== Motivation for using Haskell ==

;[http://www.cse.chalmers.se/~rjmh/Papers/whyfp.html Why Functional Programming Matters]
:By [http://www.cse.chalmers.se/~rjmh/ John Hughes], The Computer Journal, Vol. 32, No. 2, 1989, pp. 98 - 107. Also in: David A. Turner (ed.): Research Topics in Functional Programming, Addison-Wesley, 1990, pp. 17 - 42. Exposes the advantages of functional programming languages. Demonstrates how higher-order functions and lazy evaluation enable new forms of modularization of programs.

;[[Why Haskell matters]]
:Discussion of the advantages of using Haskell in particular. An excellent article.

;[http://www.youtube.com/watch?v=Fqi0Xu2Enaw Haskell Introduction]
:A video from FP Complete

;[http://www.cs.kent.ac.uk/pubs/1997/224/index.html Higher-order + Polymorphic = Reusable]
:By [http://www.cs.kent.ac.uk/people/staff/sjt/index.html Simon Thompson]. Unpublished, May 1997. Abstract: This paper explores how certain ideas in object oriented languages have their correspondents in functional languages. In particular we look at the analogue of the iterators of the C++ standard template library. We also give an example of the use of constructor classes which feature in Haskell 1.3 and Gofer.

;[http://www.ibm.com/developerworks/java/library/j-cb07186/index.html Explore functional programming with Haskell]
:Introduction to the benefits of functional programming in Haskell by Bruce Tate.

== Blog articles ==

There are a large number of tutorials covering diverse Haskell topics
published as blogs. Some of the best of these articles are collected
here:

;[[Blog articles]]

==Practical Haskell==

These tutorials examine using Haskell to write complex real-world applications.

;[http://research.microsoft.com/en-us/um/people/simonpj/Papers/marktoberdorf/ Tackling the awkward squad: monadic input/output, concurrency, exceptions, and foreign-language calls in Haskell]
:Simon Peyton Jones. Presented at the 2000 Marktoberdorf Summer School. In "Engineering theories of software construction", ed Tony Hoare, Manfred Broy, Ralf Steinbruggen, IOS Press, ISBN 1-58603-1724, 2001, pp47-96. The standard reference for monadic IO in GHC/Haskell. Abstract: Functional programming may be beautiful, but to write real applications we must grapple with awkward real-world issues: input/output, robustness, concurrency, and interfacing to programs written in other languages.

;[[Hitchhikers Guide to the Haskell]]
: Tutorial for C/Java/OCaml/... programmers by Dmitry Astapov. From the intro: "This text intends to introduce the reader to the practical aspects of Haskell from the very beginning (plans for the first chapters include: I/O, darcs, Parsec, QuickCheck, profiling and debugging, to mention a few)".

;[http://www.haskell.org/haskellwiki/IO_inside Haskell I/O inside: Down the Rabbit's Hole]
:By Bulat Ziganshin (2006). A comprehensive tutorial on using the IO monad.

;[http://web.archive.org/web/20060622030538/http://www.reid-consulting-uk.ltd.uk/docs/ffi.html A Guide to Haskell's Foreign Function Interface]
:A guide to using the foreign function interface extension, using the rich set of functions in the Foreign libraries, design issues, and FFI preprocessors.

;[[Haskell IO for Imperative Programmers]]
:A short introduction to IO from the perspective of an imperative programmer.

;[[A brief introduction to Haskell|A Brief Introduction to Haskell]]
:A translation of the article, [http://www.cs.jhu.edu/~scott/pl/lectures/caml-intro.html Introduction to OCaml], to Haskell.

;[[Roll your own IRC bot]]
:This tutorial is designed as a practical guide to writing real world code in Haskell and hopes to intuitively motivate and introduce some of the advanced features of Haskell to the novice programmer, including monad transformers. Our goal is to write a concise, robust and elegant IRC bot in Haskell.

;[http://projects.haskell.org/gtk2hs/docs/tutorial/glade/ Glade Tutorial (GUI Programming)]
:For the absolute beginner in both Glade and Gtk2Hs. Covers the basics of Glade and how to access a .glade file and widgets in Gtk2Hs. Estimated learning time: 2 hours.
;[http://www.muitovar.com/glade/es-index.html Tutorial de Glade]
:A Spanish translation of the Glade tutorial

;[http://www.muitovar.com/gtk2hs/index.html Gtk2Hs Tutorial]
: An extensive [[Gtk2Hs]] programming guide, based on the GTK+2.0 tutorial by Tony Gale and Ian Main. This tutorial on GUI programming with Gtk2Hs has 22 chapters in 7 sections, plus an appendix on starting drawing with Cairo. A Spanish translation and source code of the examples are also available.

;Applications of Functional Programming
:Colin Runciman and David Wakeling (ed.), UCL Press, 1995, ISBN 1-85728-377-5 HB. From the cover:<blockquote>This book is unique in showcasing real, non-trivial applications of functional programming using the Haskell language. It presents state-of-the-art work from the FLARE project and will be an invaluable resource for advanced study, research and implementation.</blockquote>

;[[DealingWithBinaryData]]
:A guide to ByteStrings, the various <tt>Get</tt> monads and the <tt>Put</tt> monad.

;[[Internationalization of Haskell programs]]
:Short tutorial on how to use the GNU gettext utility to make applications written in Haskell multilingual.

===Testing===

;[http://blog.moertel.com/articles/2006/10/31/introductory-haskell-solving-the-sorting-it-out-kata Small overview of QuickCheck]

;[[Introduction to QuickCheck]]

==Reference material==

;[http://www.haskell.org/haskellwiki/Category:Tutorials A growing list of Haskell tutorials on a diverse range of topics]
:Available on this wiki

;[http://www.haskell.org/haskellwiki/Category:How_to "How to"-style tutorials and information]

;[http://zvon.org/other/haskell/Outputglobal/index.html Haskell Reference]
:By Miloslav Nic.

;[http://members.chello.nl/hjgtuyl/tourdemonad.html A tour of the Haskell Monad functions]
:By Henk-Jan van Tuyl.

;[http://www.cse.unsw.edu.au/~en1000/haskell/inbuilt.html Useful Haskell functions]
:An explanation for beginners of many Haskell functions that are predefined in the Haskell Prelude.

;[http://www.haskell.org/ghc/docs/latest/html/libraries/ Documentation for the standard libraries]
:Complete documentation of the standard Haskell libraries.

;[http://www.haskell.org/haskellwiki/Category:Idioms Haskell idioms]
:A collection of articles describing some common Haskell idioms. Often quite advanced.

;[http://www.haskell.org/haskellwiki/Blow_your_mind Useful idioms]
:A collection of short, useful Haskell idioms.

;[http://www.haskell.org/haskellwiki/Programming_guidelines Programming guidelines]
:Some Haskell programming and style conventions.

;[http://www.cse.chalmers.se/~rjmh/Combinators/LightningTour/index.htm Lightning Tour of Haskell]
:By John Hughes, as part of a Chalmers programming course

;[http://vmg.pp.ua/books/КопьютерыИсети/_ИХТИК31G/single/Hall%20C.The%20little%20Haskeller.pdf The Little Haskeller]
:By Cordelia Hall and John Hughes. 9. November 1993, 26 pages. An introduction using the Chalmers Haskell B interpreter (hbi). Beware that it relies very much on the user interface of hbi, which is quite different from that of other Haskell systems, and the tutorials cover Haskell 1.2, not Haskell 98.

;[http://www.staff.science.uu.nl/~fokke101/courses/fp-eng.pdf Functional Programming]
:By Jeroen Fokker, 1995. (153 pages, 600 KB). Textbook for learning functional programming with Gofer (an older implementation of Haskell). Here without Chapters 6 and 7.

== Comparisons to other languages ==

Articles contrasting features of Haskell with other languages.

;[http://programming.reddit.com/goto?id=nq1k Haskell versus Scheme]
:Mark C. Chu-Carroll, Haskell and Scheme: Which One and Why?

;[http://wiki.python.org/moin/PythonVsHaskell Comparing Haskell and Python]
:A short overview of similarities and differences between Haskell and Python.

;[http://programming.reddit.com/goto?id=nwm2 Monads in OCaml]
:Syntax extension for monads in OCaml

;[http://www.shlomifish.org/lecture/Perl/Haskell/slides/ Haskell for Perl programmers]
:Short intro for perlers

;[[A_brief_introduction_to_Haskell|Introduction to Haskell]] versus [http://www.cs.jhu.edu/~scott/pl/lectures/caml-intro.html Introduction to OCaml].

;[http://www.thaiopensource.com/relaxng/derivative.html An algorithm for RELAX NG validation]
:by James Clark (of RELAX NG fame). Describes an algorithm for validating an XML document against a RELAX NG schema, uses Haskell to describe the algorithm. The algorithm in Haskell and Java is then [http://www.donhopkins.com/drupal/node/117 discussed here].

;[http://blog.prb.io/first-steps-with-haskell-for-web-applications.html Haskell + FastCGI versus Ruby on Rails]
:A short blog entry documenting performance results with Ruby on Rails and Haskell with FastCGI.

;[http://haskell.cs.yale.edu/wp-content/uploads/2011/03/HaskellVsAda-NSWC.pdf Haskell vs. Ada vs. C++ vs. Awk vs. ..., An Experiment in Software Prototyping Productivity] (PDF)
:Paul Hudak and Mark P. Jones, 16 pages.<blockquote>Description of the results of an experiment in which several conventional programming languages, together with the functional language Haskell, were used to prototype a Naval Surface Warfare Center requirement for Geometric Region Servers. The resulting programs and development metrics were reviewed by a committee chosen by the US Navy. The results indicate that the Haskell prototype took significantly less time to develop and was considerably more concise and easier to understand than the corresponding prototypes written in several different imperative languages, including Ada and C++. </blockquote>

;[http://www.osl.iu.edu/publications/prints/2003/comparing_generic_programming03.pdf A Comparative Study of Language Support for Generic Programming] (pdf)
:Ronald Garcia, Jaakko Järvi, Andrew Lumsdaine, Jeremy G. Siek, and Jeremiah Willcock. In Proceedings of the 2003 ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications (OOPSLA'03), October 2003.<blockquote>An interesting comparison of generic programming support across languages, including: Haskell, SML, C++, Java, C#. Haskell supports all constructs described in the paper -- the only language to do so. </blockquote>

;[http://homepages.inf.ed.ac.uk/wadler/realworld/index.html Functional Programming in the Real World]
:A list of functional programs applied to real-world tasks. The main criterion for being real-world is that the program was written primarily to perform some task, not primarily to experiment with functional programming. Functional is used in the broad sense that includes both `pure' programs (no side effects) and `impure' (some use of side effects). Languages covered include CAML, Clean, Erlang, Haskell, Miranda, Scheme, SML, and others.

;[http://www.defmacro.org/ramblings/lisp-in-haskell.html Lisp in Haskell]
:Writing A Lisp Interpreter In Haskell, a tutorial

;[http://bendyworks.com/geekville/articles/2012/12/from-ruby-to-haskell-part-1-testing From Ruby to Haskell, Part 1: Testing]
:A quick comparison between Ruby's and Haskell's BDD.

== Teaching Haskell ==

;[http://www.cs.kent.ac.uk/pubs/1997/208/index.html Where do I begin? A problem solving approach to teaching functional programming]
:By [http://www.cs.kent.ac.uk/people/staff/sjt/index.html Simon Thompson]. In Krzysztof Apt, Pieter Hartel, and Paul Klint, editors, First International Conference on Declarative Programming Languages in Education. Springer-Verlag, September 1997. Abstract: This paper introduces a problem solving method for teaching functional programming, based on Polya's `How To Solve It', an introductory investigation of mathematical method. We first present the language independent version, and then show in particular how it applies to the development of programs in Haskell. The method is illustrated by a sequence of examples and a larger case study.

;[http://www.cs.kent.ac.uk/pubs/1995/214/index.html Functional programming through the curriculum]
:By [http://www.cs.kent.ac.uk/people/staff/sjt/index.html Simon Thompson] and Steve Hill. In Pieter H. Hartel and Rinus Plasmeijer, editors, Functional Programming Languages in Education, LNCS 1022, pages 85-102. Springer-Verlag, December 1995. Abstract: This paper discusses our experience in using a functional language in topics across the computer science curriculum. After examining the arguments for taking a functional approach, we look in detail at four case studies from different areas: programming language semantics, machine architectures, graphics and formal languages.

;[http://www.cse.unsw.edu.au/~chak/papers/CK02a.html The Risks and Benefits of Teaching Purely Functional Programming in First Year]
:By [http://www.cse.unsw.edu.au/~chak/ Manuel M. T. Chakravarty] and [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]. Journal of Functional Programming 14(1), pp 113-123, 2004. An earlier version of this paper was presented at Functional and Declarative Programming in Education (FDPE02). Abstract: We argue that teaching purely functional programming as such in freshman courses is detrimental to both the curriculum as well as to promoting the paradigm. Instead, we need to focus on the more general aims of teaching elementary techniques of programming and essential concepts of computing. We support this viewpoint with experience gained during several semesters of teaching large first-year classes (up to 600 students) in Haskell. These classes consisted of computer science students as well as students from other disciplines. We have systematically gathered student feedback by conducting surveys after each semester. This article contributes an approach to the use of modern functional languages in first year courses and, based on this, advocates the use of functional languages in this setting.

==Using monads==

;[http://www.haskell.org/wikiupload/c/c6/ICMI45-paper-en.pdf How to build a monadic interpreter in one day] (PDF)
:By Dan Popa. A small tutorial on how to build a language in one day, using the Parser Monad in the front end and a monad with state and I/O string in the back end. Read it if you are interested in learning:
:# language construction and
:# interpreter construction

;[[Monad Transformers Explained]]

;[[MonadCont under the hood]]
:A detailed description of the ''Cont'' data type and its monadic operations, including the class ''MonadCont''.

;[http://en.wikipedia.org/wiki/Monads_in_functional_programming Article on monads on Wikipedia]

;[[IO inside]] page
:Explains why I/O in Haskell is implemented with a monad.

;[http://stefan-klinger.de/files/monadGuide.pdf The Haskell Programmer's Guide to the IO Monad - Don't Panic.]
:By Stefan Klinger. This report scratches the surface of category theory, an abstract branch of algebra, just deep enough to find the monad structure. It seems well written.

;[https://karczmarczuk.users.greyc.fr/TEACH/Doc/monads.html Systematic Design of Monads]
:By John Hughes and Magnus Carlsson. Many useful monads can be designed in a systematic way, by successively adding facilities to a trivial monad. The capabilities that can be added in this way include state, exceptions, backtracking, and output. Here we give a brief description of the trivial monad, each kind of extension, and sketches of some interesting operations that each monad supports.

;[[Simple monad examples]]
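
As a rough illustration of what "adding state" to a monad means in the entries above, here is a minimal sketch of a hand-rolled state monad. It is not taken from any of the linked tutorials: the names <code>State</code>, <code>tick</code> and <code>labelAll</code> are chosen here for illustration, and the type is written out in full so that only the Prelude is needed (the <code>mtl</code> and <code>transformers</code> packages provide production versions).

```haskell
-- A state monad written from scratch. The names State, tick and labelAll
-- are illustrative assumptions, not part of any linked tutorial.
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State g) = State $ \s ->
    let (a, s') = g s
    in (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State f <*> State g = State $ \s ->
    let (h, s')  = f s
        (a, s'') = g s'
    in (h a, s'')

instance Monad (State s) where
  State g >>= k = State $ \s ->
    let (a, s') = g s        -- run the first step
    in runState (k a) s'     -- feed its result and new state to the next

-- Return the current counter value and increment the stored state.
tick :: State Int Int
tick = State $ \n -> (n, n + 1)

-- Pair every element with a fresh number; the counter is threaded
-- through the traversal implicitly by the monad.
labelAll :: [a] -> State Int [(Int, a)]
labelAll = mapM (\x -> do { n <- tick; return (n, x) })
```

In GHCi, <code>runState (labelAll "abc") 0</code> returns both the labelled list and the final counter value.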

See also:

* the [[Monad]] HaskellWiki page
* [[Research papers/Monads and arrows]].
* [[Blog articles#Monads |Blog articles]]
* [[Monad tutorials timeline]]

===Tutorials===

''The comprehensive list is available at [[Monad tutorials timeline]].''

;[http://mvanier.livejournal.com/3917.html Mike Vanier's monad tutorial]
:Recommended by David Balaban.

;[[All About Monads]], [http://www.sampou.org/haskell/a-a-monads/html/index.html モナドのすべて]
:By Jeff Newbern. This tutorial aims to explain the concept of a monad and its application to functional programming in a way that is easy to understand and useful to beginning and intermediate Haskell programmers. Familiarity with the Haskell language is assumed, but no prior experience with monads is required.

;[[Monads as computation]]
:A tutorial which gives a broad overview to motivate the use of monads as an abstraction in functional programming and describe their basic features. It makes an attempt at showing why they arise naturally from some basic premises about the design of a library.

;[[Monads as containers]]
:A tutorial describing monads from a rather different perspective: as an abstraction of container-types, rather than an abstraction of types of computation.

;[http://www.grabmueller.de/martin/www/pub/Transformers.en.html Monad Transformers Step by Step]
:By Martin Grabmüller. A small tutorial on using monad transformers. In contrast to others found on the web, it concentrates on using them, not on their implementation.

;[[What a Monad is not]]

;[http://noordering.wordpress.com/2009/03/31/how-you-shouldnt-use-monad/ How you should(n’t) use Monad]

;[http://www-users.mat.uni.torun.pl/~fly/materialy/fp/haskell-doc/Monads.html What the hell are Monads?]
:By Noel Winstanley. A basic introduction to monads, monadic programming and IO. This introduction is presented by means of examples rather than theory, and assumes a little knowledge of Haskell.

;[http://www.engr.mun.ca/~theo/Misc/haskell_and_monads.htm Monads for the Working Haskell Programmer -- a short tutorial]
:By Theodore Norvell.

;[http://blog.sigfpe.com/2006/08/you-could-have-invented-monads-and.html You Could Have Invented Monads! (And Maybe You Already Have.)]
:A short tutorial on monads, introduced from a pragmatic approach, with fewer category theory references.

;[[Meet Bob The Monadic Lover]]
:By Andrea Rossato. A humorous and short introduction to Monads, with code but without any reference to category theory: what monads look like and what they are useful for, from the perspective of a ... lover. (There is also the slightly more serious [[The Monadic Way]] by the same author.)

;[http://www.haskell.org/pipermail/haskell-cafe/2006-November/019190.html Monstrous Monads]
:Andrew Pimlott's humorous introduction to monads, using the metaphor of "monsters".

;[http://strabismicgobbledygook.wordpress.com/2010/03/06/a-state-monad-tutorial/ A State Monad Tutorial]
:A detailed tutorial with simple but practical examples.

;[http://www.reddit.com/r/programming/comments/ox6s/ask_reddit_what_the_hell_are_monads/coxiv Ask Reddit: What the hell are monads? answer by tmoertel] and [http://programming.reddit.com/info/ox6s/comments/coxoh dons].

;[[The Monadic Way]]

;[http://www.alpheccar.org/content/60.html Three kind of monads] : sequencing, side effects or containers

;[http://www.muitovar.com/monad/moncow.html The Greenhorn's Guide to becoming a Monad Cowboy]
:Covers basics, with simple examples, in a ''for dummies'' style. Includes monad transformers and monadic functions. Estimated learning time 2-3 days.

;[http://ertes.de/articles/monads.html Understanding Haskell Monads]

;[http://www.reddit.com/r/programming/comments/64th1/monads_in_python_in_production_code_you_can_and/c02u9mb An explanation by 808140]
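
Two of the perspectives named in the list above, monads as computation and monads as containers, can be sketched side by side using only the Prelude. The function names (<code>safeDiv</code>, <code>ratio</code>, <code>pairs</code>, <code>flatten</code>) are made up for this illustration and do not come from the linked tutorials.

```haskell
-- Monads as computation: Maybe sequences steps that may fail.
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv x y = Just (x `div` y)

-- Look up two keys and divide their values. If any step yields Nothing,
-- the remaining steps are skipped and the whole result is Nothing.
ratio :: [(String, Int)] -> String -> String -> Maybe Int
ratio env a b = do
  x <- lookup a env
  y <- lookup b env
  safeDiv x y

-- Monads as containers: in the list monad, each bind draws every element
-- from a container, and the results are collected into one list.
pairs :: [a] -> [b] -> [(a, b)]
pairs xs ys = do
  x <- xs
  y <- ys
  return (x, y)

-- Flattening a container of containers is bind with the identity function.
flatten :: [[a]] -> [a]
flatten xss = xss >>= id
```

The same <code>do</code> notation serves both readings; only the instance of <code>Monad</code> differs.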

==Workshops on advanced functional programming==

;[http://compilers.iecc.com/comparch/article/95-04-024 Advanced Functional Programming: 1st International Spring School on Advanced Functional Programming Techniques], Bastad, Sweden, May 24 - 30, 1995. Tutorial Text (Lecture Notes in Computer Science)

;[http://alfa.di.uminho.pt/~afp98/ Advanced Functional Programming: 3rd International School], AFP'98, Braga, Portugal, September 12-19, 1998, Revised Lectures (Lecture Notes in Computer Science)

;[http://www.staff.science.uu.nl/~jeuri101/afp/afp4/ Advanced Functional Programming: 4th International School], AFP 2002, Oxford, UK, August 19-24, 2002, Revised Lectures (Lecture Notes in Computer Science)

;[http://www.cs.ut.ee/afp04/ Advanced Functional Programming: 5th International School], AFP 2004, Tartu, Estonia, August 14-21, 2004, Revised Lectures (Lecture Notes in Computer Science)

More advanced materials available from the [[Conferences|conference proceedings]], and the [[Research papers]] collection.

[[Category:Tutorials]]

GHC/Data Parallel Haskell

2012-02-10T01:34:13Z

Chak: /* Overview */

[[Category:GHC|Data Parallel Haskell]]
== Data Parallel Haskell ==

''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus on utilising multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell].

<center>
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png
</center>

''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''

__TOC__

=== Project status ===

Data Parallel Haskell (DPH) is available as an add-on for [http://haskell.org/ghc/download_ghc_7_4_1 GHC 7.4] in the form of a few separate Cabal packages. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude for vectorised code, and a lack of optimisations (leading to poor performance in some cases).

The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed or no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code.

DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://repa.ouroborus.net/ Repa], which builds on the parallel array infrastructure of DPH.

'''Note:''' This page describes version 0.6.* of the DPH libraries. We only support this version of DPH as well as the current development version.

'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.

=== Where to get it ===

To get DPH, install [http://haskell.org/ghc/download_ghc_7_4_1 GHC 7.4] and then install the DPH libraries with <code>cabal install</code> as follows:
<blockquote>
<code>$ cabal update</code> 
<code>$ cabal install dph-examples</code>
</blockquote>
This will install all DPH packages, including a set of simple examples; see [http://hackage.haskell.org/package/dph-examples dph-examples]. (The package [http://hackage.haskell.org/package/dph-examples dph-examples] depends on OpenGL and Gloss, as both are used in a visualiser for the n-body example.)

'''WARNING:''' The vanilla GHC distribution does '''not''' include <code>cabal install</code>. This is in contrast to the Haskell Platform, which does include <code>cabal install</code>. If you want to avoid installing the <code>cabal-install</code> package and its dependencies explicitly, simply install GHC 7.4.1 in addition to your current Haskell Platform installation. (How to do that depends on your platform and personal preferences. One option is to install a bindist into your home directory with symbolic links to the binaries including the version number.) Then, install DPH with the following command:
<blockquote>
<code>cabal install --with-compiler=`which ghc-7.4.1` --with-hc-pkg=`which ghc-pkg-7.4.1` dph-examples</code>
</blockquote>

=== Overview ===

From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three-element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analogous manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).

The two main differences between lists and parallel arrays are that parallel arrays (1) are a strict data structure and (2) are not inductively defined. Parallel arrays are strict in that by using a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions).

As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.
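The logarithmic step count of an undirected fold can be illustrated in plain Haskell, with an ordinary list standing in for a parallel array. This is only a sketch: <hask>foldTree</hask> and <hask>rounds</hask> are hypothetical names, not part of the DPH library, and the code runs sequentially. It combines neighbouring elements round by round; all combinations within one round are independent of each other, which is what a parallel implementation could exploit.
<haskell>
-- Tree-shaped reduction over a non-empty list (a list stands in for a parallel array).
-- foldTree is a hypothetical name for illustration, not a DPH function.
foldTree :: (a -> a -> a) -> [a] -> a
foldTree _ []  = error "foldTree: empty input"
foldTree _ [x] = x
foldTree f xs  = foldTree f (pairUp xs)
  where
    -- Combine neighbouring elements; the combinations in one round are independent.
    pairUp (a : b : rest) = f a b : pairUp rest
    pairUp rest           = rest

-- Number of rounds needed to reduce n elements to one: ceiling (log2 n).
rounds :: Int -> Int
rounds n
  | n <= 1    = 0
  | otherwise = 1 + rounds ((n + 1) `div` 2)

main :: IO ()
main = do
  print (foldTree (+) [1 .. 8 :: Int])   -- 36, the same as foldr (+) 0 [1 .. 8]
  print (rounds 8)                       -- 3
  -- With a non-associative operator the tree-shaped and left-to-right results differ.
  print (foldTree (-) [1 .. 8 :: Int], foldl1 (-) [1 .. 8 :: Int])
</haskell>
Because the grouping of operands changes from round to round, the result is only well defined when the operator is associative, which is why <hask>foldP</hask> imposes that requirement.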

=== A simple example ===

As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:
<haskell>
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.
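For readers without a DPH installation, the same two formulations can be tried on ordinary lists. The following is a minimal sketch using GHC's <hask>ParallelListComp</hask> extension; <hask>dotpList</hask> and <hask>dotpZip</hask> are names chosen here for illustration, and the code is sequential list code, not DPH code.
<haskell>
{-# LANGUAGE ParallelListComp #-}

-- List analogue of dotp, written with a parallel list comprehension.
dotpList :: Num a => [a] -> [a] -> a
dotpList xs ys = sum [x * y | x <- xs | y <- ys]

-- The same computation with an explicit zip.
dotpZip :: Num a => [a] -> [a] -> a
dotpZip xs ys = sum [x * y | (x, y) <- zip xs ys]

main :: IO ()
main = do
  print (dotpList [1, 2, 3] [4, 5, 6 :: Int])  -- 1*4 + 2*5 + 3*6 = 32
  print (dotpZip  [1, 2, 3] [4, 5, 6 :: Int])  -- 32
</haskell>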

=== Running DPH programs ===

Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.
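To give a rough idea of what turning nested into flat data parallelism means, the following plain-Haskell sketch stores a nested array as a list of segment lengths plus one flat data vector. The type <hask>Segmented</hask> and all function names are assumptions made for this illustration; they are not the representation or API that GHC's vectoriser actually uses.
<haskell>
-- A nested array flattened into segment lengths plus one flat data vector.
-- Segmented is a hypothetical type for illustration, not DPH's internal representation.
data Segmented a = Segmented
  { segLens  :: [Int]  -- length of each inner array
  , flatData :: [a]    -- all elements, concatenated
  } deriving (Show, Eq)

flatten :: [[a]] -> Segmented a
flatten xss = Segmented (map length xss) (concat xss)

unflatten :: Segmented a -> [[a]]
unflatten (Segmented lens xs) = go lens xs
  where
    go []       _  = []
    go (n : ns) ys = let (seg, rest) = splitAt n ys in seg : go ns rest

-- A nested map becomes one map over the flat data, whatever the segment sizes.
mapSeg :: (a -> b) -> Segmented a -> Segmented b
mapSeg f (Segmented lens xs) = Segmented lens (map f xs)

-- Sum each inner array by walking the flat data segment by segment.
sumSegments :: Num a => Segmented a -> [a]
sumSegments (Segmented lens xs) = go lens xs
  where
    go []       _  = []
    go (n : ns) ys = let (seg, rest) = splitAt n ys in sum seg : go ns rest

main :: IO ()
main = do
  let nested = [[1, 2, 3], [], [4], [5, 6]] :: [[Int]]
      seg    = flatten nested
  print seg
  print (sumSegments (mapSeg (* 2) seg))  -- [12,0,8,22]
</haskell>
The point of the flat form is that the elementwise work in <hask>mapSeg</hask> can be divided evenly, regardless of how irregular the inner arrays are.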

==== No type classes ====

Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.
<haskell>
dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>

==== Special Prelude ====

As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus four extra modules, each supporting one numeric type: [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.

To compile <hask>dotp_double</hask>, we add the following import statements:
<haskell>
import qualified Prelude
import Data.Array.Parallel
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double
</haskell>

==== Impedance matching ====

Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code).

Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.
<haskell>
dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.

==== Compiling vectorised code ====

The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.

Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.

Overall, we get the following complete module definition for the dot-product code:
<haskell>
{-# LANGUAGE ParallelArrays #-}
{-# OPTIONS_GHC -fvectorise #-}

module DotP (dotp_wrapper)
where

import qualified Prelude
import Data.Array.Parallel
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double

dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]

dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:
<blockquote>
<code>ghc -c -Odph -fdph-par DotP.hs</code>
</blockquote>
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)

==== Using vectorised code ====

Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:
<haskell>
import Data.Array.Parallel
import Data.Array.Parallel.PArray (PArray, fromList)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = let v      = fromList [1..10]    -- convert lists...
        w      = fromList [1,2..20]  -- ...to parallel arrays
        result = dotp_wrapper v w    -- invoke vectorised code
    in
    print result                     -- print the result
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -Odph -fdph-par Main.hs</code>
</blockquote>
and finally link the two modules into an executable <code>dotp</code> with
<blockquote>
<code>ghc -o dotp -threaded -fdph-par -rtsopts DotP.o Main.o</code>
</blockquote>
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend. We include <code>-rtsopts</code> to be able to explicitly determine the number of OS threads used to execute our code.

==== Generating input data ====

To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:
<haskell>
import System.Random (newStdGen)
import Data.Array.Parallel
import Data.Array.Parallel.PArray (PArray, randomRs)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = do
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2
      print $ dotp_wrapper v w   -- invoke vectorised code and print the result
  where
    n     = 10000        -- vector length
    range = (-100, 100)  -- range of vector elements
</haskell>
We compile and link the program as described above.

'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before timing starts. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose. For a variant of the dot-product example code that determines the CPU time consumed by <hask>dotp_wrapper</hask>, see [[GHC/Data Parallel Haskell/MainTimed|timed dot product]].

==== Parallel execution ====

By default, a Haskell program uses only one OS thread, and hence, also only one CPU core for execution. To use multiple cores, we need to invoke the executable with an explicit RTS command line option — e.g., <code>./dotp +RTS -N2</code> uses two cores. (Strictly speaking, it uses two OS threads, which will be scheduled on two separate cores if available.) To determine the runtime of parallel code, measuring CPU time, as demonstrated in the [[GHC/Data Parallel Haskell/MainTimed|timed variant of the dot product example]], is not sufficient anymore. We need to measure wall clock times instead. For proper benchmarking, it is advisable to use a library, such as [http://hackage.haskell.org/package/criterion criterion].
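The difference between CPU time and wall-clock time can be observed with a small helper built only on the <code>base</code> library. This is a sketch under several assumptions: it uses ordinary lists rather than parallel arrays, <hask>timed</hask> and <hask>picosToSeconds</hask> are names invented here, and <hask>getMonotonicTime</hask> requires a reasonably recent <code>base</code> (4.11 or later), so it is not what was available for GHC 7.4. It also forces the input before timing, in the spirit of the note on <hask>nf</hask> above.
<haskell>
import Control.Exception (evaluate)
import GHC.Clock (getMonotonicTime)   -- wall-clock seconds as a Double (base >= 4.11)
import System.CPUTime (getCPUTime)    -- CPU time in picoseconds

-- Convert picoseconds to seconds.
picosToSeconds :: Integer -> Double
picosToSeconds ps = fromIntegral ps / 1.0e12

-- Run an action and report (result, wall-clock seconds, CPU seconds).
-- timed is a hypothetical helper, not part of DPH or criterion.
timed :: IO a -> IO (a, Double, Double)
timed act = do
  wall0 <- getMonotonicTime
  cpu0  <- getCPUTime
  r     <- act
  cpu1  <- getCPUTime
  wall1 <- getMonotonicTime
  return (r, wall1 - wall0, picosToSeconds (cpu1 - cpu0))

main :: IO ()
main = do
  let xs = [1 .. 1000000] :: [Double]
  -- Force the input first so that generating it is not part of the measurement.
  _ <- evaluate (sum xs)
  (r, wall, cpu) <- timed (evaluate (sum (zipWith (*) xs xs)))
  putStrLn ("result = " ++ show r)
  putStrLn ("wall   = " ++ show wall ++ " s")
  putStrLn ("cpu    = " ++ show cpu ++ " s")
</haskell>
When a program runs on several OS threads, the CPU time is accumulated over all of them, so it can exceed the wall-clock time; a single measurement like this is also noisy, which is one reason to prefer a benchmarking library.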

A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with just one core and to move to multiple cores only for performance debugging.

Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is often bad as GHC's runtime makes no effort at optimising placement.

=== Further examples and documentation ===

Further examples are available in the [http://darcs.haskell.org/packages/dph/dph-examples/ examples directory of the dph package source]. This code also includes reference implementations for some of the examples that are useful for benchmarking.

The interfaces of the various components of the DPH library are in the [http://hackage.haskell.org/package/dph-par library documentation] on Hackage.

=== Designing parallel programs ===

Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as deciding when a computation has enough parallelism to make exploiting it worthwhile) are still with the programmer.

DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:
<haskell>
data RTree a = RNode [:RTree a:]
</haskell>
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]
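The traversal pattern described above can be sketched in plain Haskell, with an ordinary list standing in for the parallel array of children. Note the assumptions: <hask>Rose</hask>, <hask>sumRose</hask>, and <hask>depth</hask> are names invented for this illustration, and the node payload is added here so that there is something to compute; the code itself runs sequentially.
<haskell>
-- List-based stand-in for a parallel rose tree.
-- The payload field is an assumption added for this illustration.
data Rose a = Node a [Rose a]

-- Descend level by level; the map over the children is the step
-- that could proceed in parallel if they were held in a parallel array.
sumRose :: Num a => Rose a -> a
sumRose (Node x kids) = x + sum (map sumRose kids)

-- Number of levels, i.e. the length of the sequential part of a traversal.
depth :: Rose a -> Int
depth (Node _ kids) = 1 + maximum (0 : map depth kids)

main :: IO ()
main = do
  let t = Node 1 [Node 2 [Node 4 []], Node 3 []] :: Rose Int
  print (sumRose t)  -- 1 + 2 + 4 + 3 = 10
  print (depth t)    -- 3
</haskell>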

For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]

=== Further reading and information on the implementation ===

DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].

For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and would like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].

=== Feedback ===

Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC users' mailing list], or contact the DPH developers directly:
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]

GHC/Data Parallel Haskell

2012-02-06T03:57:58Z

Chak: /* Project status */

[[Category:GHC|Data Parallel Haskell]]
== Data Parallel Haskell ==

''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus on utilising multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell].

<center>
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png
</center>

''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''

__TOC__

=== Project status ===

Data Parallel Haskell (DPH) is available as an add-on for [http://haskell.org/ghc/download_ghc_7_4_1 GHC 7.4] in the form of a few separate Cabal packages. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude for vectorised code, and a lack of optimisations (leading to poor performance in some cases).

The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed or no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code.

DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://repa.ouroborus.net/ Repa], which builds on the parallel array infrastructure of DPH.

'''Note:''' This page describes version 0.6.* of the DPH libraries. We only support this version of DPH as well as the current development version.

'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.

=== Where to get it ===

To get DPH, install [http://haskell.org/ghc/download_ghc_7_4_1 GHC 7.4] and then install the DPH libraries with <code>cabal install</code> as follows:
<blockquote>
<code>$ cabal update</code> 
<code>$ cabal install dph-examples</code>
</blockquote>
This will install all DPH packages, including a set of simple examples; see [http://hackage.haskell.org/package/dph-examples dph-examples]. (The package [http://hackage.haskell.org/package/dph-examples dph-examples] depends on OpenGL and Gloss, as both are used in a visualiser for the n-body example.)

'''WARNING:''' The vanilla GHC distribution does '''not''' include <code>cabal install</code>. This is in contrast to the Haskell Platform, which does include <code>cabal install</code>. If you want to avoid installing the <code>cabal-install</code> package and its dependencies explicitly, simply install GHC 7.4.1 in addition to your current Haskell Platform installation. (How to do that depends on your platform and personal preferences. One option is to install a bindist into your home directory with symbolic links to the binaries including the version number.) Then, install DPH with the following command:
<blockquote>
<code>cabal install --with-compiler=`which ghc-7.4.1` --with-hc-pkg=`which ghc-pkg-7.4.1` dph-examples</code>
</blockquote>

=== Overview ===

From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three-element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analogous manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).

The two main differences between lists and parallel arrays are that parallel arrays (1) are a strict data structure and (2) are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions).

As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.

=== A simple example ===

As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:
<haskell>
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.

=== Running DPH programs ===

Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.

==== No type classes ====

Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.
<haskell>
dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>

==== Special Prelude ====

As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus four extra modules, each supporting one numeric type: [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.

To compile <hask>dotp_double</hask>, we add the following import statements:
<haskell>
import qualified Prelude
import Data.Array.Parallel
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double
</haskell>

==== Impedance matching ====

Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code).

Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.
<haskell>
dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.

==== Compiling vectorised code ====

The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.

Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.

Overall, we get the following complete module definition for the dot-product code:
<haskell>
{-# LANGUAGE ParallelArrays #-}
{-# OPTIONS_GHC -fvectorise #-}

module DotP (dotp_wrapper)
where

import qualified Prelude
import Data.Array.Parallel
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double

dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]

dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:
<blockquote>
<code>ghc -c -Odph -fdph-par DotP.hs</code>
</blockquote>
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)

==== Using vectorised code ====

Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:
<haskell>
import Data.Array.Parallel
import Data.Array.Parallel.PArray (PArray, fromList)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = let v      = fromList [1..10]    -- convert lists...
        w      = fromList [1,2..20]  -- ...to parallel arrays
        result = dotp_wrapper v w    -- invoke vectorised code
    in
    print result                     -- print the result
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -Odph -fdph-par Main.hs</code>
</blockquote>
and finally link the two modules into an executable <code>dotp</code> with
<blockquote>
<code>ghc -o dotp -threaded -fdph-par -rtsopts DotP.o Main.o</code>
</blockquote>
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend. We include <code>-rtsopts</code> to be able to explicitly determine the number of OS threads used to execute our code.

==== Generating input data ====

To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:
<haskell>
import System.Random (newStdGen)
import Data.Array.Parallel
import Data.Array.Parallel.PArray (PArray, randomRs)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = do
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2
      print $ dotp_wrapper v w   -- invoke vectorised code and print the result
  where
    n     = 10000        -- vector length
    range = (-100, 100)  -- range of vector elements
</haskell>
We compile and link the program as described above.

'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose. For a variant of the dot-product example code that determines the CPU time consumed by <hask>dotp_wrapper</hask>, see [[GHC/Data Parallel Haskell/MainTimed|timed dot product]].

==== Parallel execution ====

By default, a Haskell program uses only one OS thread, and hence, also only one CPU core for execution. To use multiple cores, we need to invoke the executable with an explicit RTS command line option — e.g., <code>./dotp +RTS -N2</code> uses two cores. (Strictly speaking, it uses two OS threads, which will be scheduled on two separate cores if available.) To determine the runtime of parallel code, measuring CPU time, as demonstrated in the [[GHC/Data Parallel Haskell/MainTimed|timed variant of the dot product example]], is not sufficient anymore. We need to measure wall clock times instead. For proper benchmarking, it is advisable to use a library, such as [http://hackage.haskell.org/package/criterion criterion].

A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with just one core and to move to multiple cores only for performance debugging.

Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is often bad as GHC's runtime makes no effort at optimising placement.

=== Further examples and documentation ===

Further examples are available in the [http://darcs.haskell.org/packages/dph/dph-examples/ examples directory of the package dph source]. This code also includes reference implementations for some of the examples, which are useful for benchmarking.

The interfaces of the various components of the DPH library are in the [http://hackage.haskell.org/package/dph-par library documentation] on Hackage.

=== Designing parallel programs ===

Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.

DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:
<haskell>
data RTree a = RNode [:RTree a:]
</haskell>
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]

For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]

=== Further reading and information on the implementation ===

DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].

For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and would like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].

=== Feedback ===

Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC users' mailing list], or contact the DPH developers directly:
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]

GHC/Data Parallel Haskell

2012-02-06T03:54:54Z

Chak: /* Where to get it */

[[Category:GHC|Data Parallel Haskell]]
== Data Parallel Haskell ==

''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus on utilising multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell].

<center>
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png
</center>

''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''

__TOC__

=== Project status ===

Data Parallel Haskell (DPH) is available as an add-on for [http://haskell.org/ghc/download_ghc_7_4_1 GHC 7.4] in the form of a few separate Cabal packages. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude for vectorised code, and a lack of optimisations (leading to poor performance in some cases).

The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed or no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code.

DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://repa.ouroborus.net/ Repa], which builds on the parallel array infrastructure of DPH.

'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.

=== Where to get it ===

To get DPH, install [http://haskell.org/ghc/download_ghc_7_4_1 GHC 7.4] and then install the DPH libraries with <code>cabal install</code> as follows:
<blockquote>
<code>$ cabal update</code> 
<code>$ cabal install dph-examples</code>
</blockquote>
This will install all DPH packages, including a set of simple examples; see [http://hackage.haskell.org/package/dph-examples dph-examples]. (The package [http://hackage.haskell.org/package/dph-examples dph-examples] does depend on OpenGL and Gloss as both are used in a visualiser for the n-body example.)

'''WARNING:''' The vanilla GHC distribution does '''not''' include <code>cabal install</code>. This is in contrast to the Haskell Platform, which does include <code>cabal install</code>. If you want to avoid installing the <code>cabal-install</code> package and its dependencies explicitly, simply install GHC 7.4.1 in addition to your current Haskell Platform installation. (How to do that depends on your platform and personal preferences. One option is to install a bindist into your home directory with symbolic links to the binaries including the version number.) Then, install DPH with the following command:
<blockquote>
<code>cabal install --with-compiler=`which ghc-7.4.1` --with-hc-pkg=`which ghc-pkg-7.4.1` dph-examples</code>
</blockquote>

=== Overview ===

From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three-element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analogous manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).

The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)

As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.
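
To see why an associative function permits a logarithmic number of parallel steps, consider the following sketch on ordinary lists. It is plain Haskell and does not show how <hask>foldP</hask> is actually implemented; <hask>treeFold</hask> and <hask>pairUp</hask> are hypothetical names used only for illustration. Each round combines neighbouring elements, and all combinations within one round are independent of each other, so they could in principle be performed in parallel. The number of rounds is on the order of log ''n''.
<haskell>
-- Combine neighbouring elements; all applications of f in one round are independent.
pairUp :: (a -> a -> a) -> [a] -> [a]
pairUp f (x : y : rest) = f x y : pairUp f rest
pairUp _ xs             = xs

-- Reduce by repeated rounds; z is returned only for the empty list.
treeFold :: (a -> a -> a) -> a -> [a] -> a
treeFold _ z []  = z
treeFold _ _ [x] = x
treeFold f z xs  = treeFold f z (pairUp f xs)

main :: IO ()
main = do
  print (treeFold (+) 0 [1 .. 8 :: Int])  -- agrees with foldr (+) 0, as (+) is associative
  print (treeFold (-) 0 [1 .. 8 :: Int])  -- disagrees with foldr (-) 0, as (-) is not
</haskell>
The second call illustrates why the argument function has to be associative: once the grouping of the elements is left to the implementation, a non-associative function no longer has a well-defined result.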

=== A simple example ===

As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:
<haskell>
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.
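
For comparison, and as a sequential reference when checking results, the same computation on ordinary lists might look as follows. This is a minimal sketch in plain Haskell that does not require DPH; the name <hask>dotpList</hask> is ours and not part of the DPH library.
<haskell>
-- Sequential reference implementation on ordinary lists (not part of DPH).
dotpList :: Num a => [a] -> [a] -> a
dotpList xs ys = sum (zipWith (*) xs ys)

main :: IO ()
main = print (dotpList [1, 2, 3] [4, 5, 6 :: Double])  -- prints 32.0
</haskell>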

=== Running DPH programs ===

Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.

==== No type classes ====

Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.
<haskell>
dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>

==== Special Prelude ====

As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus four extra modules, each supporting one numeric type: [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.

To compile <hask>dotp_double</hask>, we add the following import statements:
<haskell>
import qualified Prelude
import Data.Array.Parallel
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double
</haskell>

==== Impedance matching ====

Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code).

Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.
<haskell>
dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.

==== Compiling vectorised code ====

The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.

Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.

Overall, we get the following complete module definition for the dot-product code:
<haskell>
{-# LANGUAGE ParallelArrays #-}
{-# OPTIONS_GHC -fvectorise #-}

module DotP (dotp_wrapper)
where

import qualified Prelude
import Data.Array.Parallel
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double

dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]

dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:
<blockquote>
<code>ghc -c -Odph -fdph-par DotP.hs</code>
</blockquote>
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)

==== Using vectorised code ====

Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:
<haskell>
import Data.Array.Parallel
import Data.Array.Parallel.PArray (PArray, fromList)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = let v      = fromList [1..10]    -- convert lists...
        w      = fromList [1,2..20]  -- ...to parallel arrays
        result = dotp_wrapper v w    -- invoke vectorised code
    in
    print result                     -- print the result
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -Odph -fdph-par Main.hs</code>
</blockquote>
and finally link the two modules into an executable <code>dotp</code> with
<blockquote>
<code>ghc -o dotp -threaded -fdph-par -rtsopts DotP.o Main.o</code>
</blockquote>
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend. We include <code>-rtsopts</code> to be able to explicitly determine the number of OS threads used to execute our code.

==== Generating input data ====

To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:
<haskell>
import System.Random (newStdGen)
import Data.Array.Parallel
import Data.Array.Parallel.PArray (PArray, randomRs)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = do
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2
      print $ dotp_wrapper v w   -- invoke vectorised code and print the result
  where
    n     = 10000        -- vector length
    range = (-100, 100)  -- range of vector elements
</haskell>
We compile and link the program as described above.

'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose. For a variant of the dot-product example code that determines the CPU time consumed by <hask>dotp_wrapper</hask>, see [[GHC/Data Parallel Haskell/MainTimed|timed dot product]].

==== Parallel execution ====

By default, a Haskell program uses only one OS thread, and hence, also only one CPU core for execution. To use multiple cores, we need to invoke the executable with an explicit RTS command line option — e.g., <code>./dotp +RTS -N2</code> uses two cores. (Strictly speaking, it uses two OS threads, which will be scheduled on two separate cores if available.) To determine the runtime of parallel code, measuring CPU time, as demonstrated in the [[GHC/Data Parallel Haskell/MainTimed|timed variant of the dot product example]], is not sufficient anymore. We need to measure wall clock times instead. For proper benchmarking, it is advisable to use a library, such as [http://hackage.haskell.org/package/criterion criterion].
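
The difference between the two kinds of time can be made visible with a small helper. The following is only a sketch in plain Haskell, not a substitute for a benchmarking library; <hask>timed</hask> is a hypothetical name, the workload is a stand-in for the dot product, and the code assumes that the <code>time</code> package shipped with GHC is available.
<haskell>
import Control.Concurrent (getNumCapabilities)
import Control.Exception  (evaluate)
import Data.Time.Clock    (diffUTCTime, getCurrentTime)
import System.CPUTime     (getCPUTime)

-- Run an action and report both the elapsed wall-clock time and the CPU time.
-- CPU time is accumulated by the whole process, so with several OS threads
-- it can exceed the wall-clock time.
timed :: IO a -> IO a
timed action = do
  wall0  <- getCurrentTime
  cpu0   <- getCPUTime                     -- in picoseconds
  result <- action
  cpu1   <- getCPUTime
  wall1  <- getCurrentTime
  putStrLn ("wall clock: " ++ show (diffUTCTime wall1 wall0))
  putStrLn ("CPU time:   " ++ show (fromIntegral (cpu1 - cpu0) / 1.0e12 :: Double) ++ "s")
  return result

main :: IO ()
main = do
  caps <- getNumCapabilities               -- the value set with +RTS -N
  putStrLn ("capabilities: " ++ show caps)
  -- Stand-in workload; in the dot-product program this would be
  -- evaluate (dotp_wrapper v w) on fully evaluated input vectors.
  s <- timed (evaluate (sum [1 .. 1000000 :: Int]))
  print s
</haskell>
The call to <hask>evaluate</hask> forces the result inside the timed region; without it, laziness could move the actual work to the final <hask>print</hask>.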

A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with just one core and to move to multiple cores only for performance debugging.

Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is often bad as GHC's runtime makes no effort at optimising placement.

=== Further examples and documentation ===

Further examples are available in the [http://darcs.haskell.org/packages/dph/dph-examples/ examples directory of the package dph source]. This code also includes reference implementations for some of the examples, which are useful for benchmarking.

The interfaces of the various components of the DPH library are in the [http://hackage.haskell.org/package/dph-par library documentation] on Hackage.

=== Designing parallel programs ===

Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.

DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:
<haskell>
data RTree a = RNode [:RTree a:]
</haskell>
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]
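
As a sequential point of comparison, the same shape can be modelled with ordinary lists. The following sketch is plain Haskell, not DPH code; <hask>LTree</hask>, <hask>LNode</hask>, <hask>size</hask>, and <hask>depth</hask> are hypothetical names, and unlike <hask>RTree</hask> above, each node carries a payload so that the example has something to store. In a DPH version, the list of children would be a parallel array, and the <hask>map</hask> over the children would become a parallel operation such as <hask>mapP</hask>.
<haskell>
-- List-based analogue of a rose tree: a payload plus a list of subtrees.
data LTree a = LNode a [LTree a]

-- Number of nodes: the recursion descends level by level,
-- while the subtrees of one node are handled by a single map.
size :: LTree a -> Int
size (LNode _ ts) = 1 + sum (map size ts)

-- Height of the tree, i.e., the number of sequential levels in a traversal.
depth :: LTree a -> Int
depth (LNode _ ts) = 1 + maximum (0 : map depth ts)

example :: LTree Char
example = LNode 'a' [LNode 'b' [LNode 'd' []], LNode 'c' []]

main :: IO ()
main = print (size example, depth example)  -- prints (4,3)
</haskell>
The value of <hask>depth</hask> corresponds to the sequential part of a traversal, whereas the work within each level is what a parallel array of children would allow to proceed in parallel.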

For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]

=== Further reading and information on the implementation ===

DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].

For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and would like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].

=== Feedback ===

Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC users' mailing list], or contact the DPH developers directly:
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]

GHC/Data Parallel Haskell

2012-02-06T03:51:04Z

Chak: /* Project status */

[[Category:GHC|Data Parallel Haskell]]
== Data Parallel Haskell ==

''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus on utilising multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell].

<center>
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png
</center>

''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''

__TOC__

=== Project status ===

Data Parallel Haskell (DPH) is available as an add-on for [http://haskell.org/ghc/download_ghc_7_4_1 GHC 7.4] in the form of a few separate Cabal packages. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude for vectorised code, and a lack of optimisations (leading to poor performance in some cases).

The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed or no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code.

DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://repa.ouroborus.net/ Repa], which builds on the parallel array infrastructure of DPH.

'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.

=== Where to get it ===

To get DPH, install [http://haskell.org/ghc/download_ghc_7_2_1 GHC 7.2] and then install the DPH libraries with <code>cabal install</code> as follows:
<blockquote>
<code>$ cabal update</code> 
<code>$ cabal install dph-examples</code>
</blockquote>
This will install all DPH packages, including a set of simple examples; see [http://hackage.haskell.org/package/dph-examples dph-examples]. (The package [http://hackage.haskell.org/package/dph-examples dph-examples] does depend on OpenGL and Gloss as both are used in a visualiser for the n-body example.)

'''WARNING:''' The vanilla GHC distribution does '''not''' include <code>cabal install</code>. This is in contrast to the Haskell Platform, which does include <code>cabal install</code>. If you want to avoid installing the <code>cabal-install</code> package and its dependencies explicitly, simply install GHC 7.2.1 in addition to your current Haskell Platform installation. (How to do that depends on your platform and personal preferences. One option is to install a bindist into your home directory with symbolic links to the binaries including the version number.) Then, install DPH with the following command:
<blockquote>
<code>cabal install --with-compiler=`which ghc-7.2.1` --with-hc-pkg=`which ghc-pkg-7.2.1` dph-examples</code>
</blockquote>

=== Overview ===

From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, except that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three-element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analogous manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).

The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)

As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.
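
To see why an undirected fold needs an associative function and only O(log ''n'') parallel steps, the following sketch models the reduction on ordinary lists. It is a sequential illustration only: the names <hask>pairUp</hask> and <hask>foldTree</hask> are made up for this page and are not part of the DPH library, and the actual implementation of <hask>foldP</hask> may well be organised differently.
<haskell>
-- Sequential list model of an undirected, tree-shaped fold.
-- 'pairUp' and 'foldTree' are illustrative names, not DPH functions.

-- One reduction round: combine neighbouring elements pairwise.
-- All combinations within one round are independent of each other,
-- so a data-parallel implementation could perform them simultaneously.
pairUp :: (a -> a -> a) -> [a] -> [a]
pairUp f (x:y:rest) = f x y : pairUp f rest
pairUp _ xs         = xs              -- zero or one element left over

-- Reduce a list with the function 'f' (and 'z' as the result for the
-- empty list), returning the result together with the number of rounds.
foldTree :: (a -> a -> a) -> a -> [a] -> (a, Int)
foldTree _ z []  = (z, 0)
foldTree f _ xs0 = go 0 xs0
  where
    go rounds [x] = (x, rounds)
    go rounds xs  = go (rounds + 1) (pairUp f xs)
</haskell>
In GHCi, <hask>foldTree (+) 0 [1..8]</hask> yields <hask>(36,3)</hask>: eight elements are reduced in three rounds. With subtraction, which is not associative, <hask>foldTree (-) 0 [1,2,3,4]</hask> yields <hask>(0,2)</hask>, whereas the left-to-right <hask>foldl1 (-) [1,2,3,4]</hask> yields <hask>-8</hask>. The grouping is only irrelevant for associative functions.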

=== A simple example ===

As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:
<haskell>
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.
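
For comparison, here is the same function on ordinary lists, which can be loaded into any GHC without DPH. This is merely a sequential reference sketch; the names <hask>dotpList</hask> and <hask>dotpListZip</hask> are chosen for illustration. It uses the <hask>ParallelListComp</hask> language extension, the list counterpart of the array comprehension above.
<haskell>
{-# LANGUAGE ParallelListComp #-}

-- List-based reference version of the dot product (runs sequentially).
dotpList :: Num a => [a] -> [a] -> a
dotpList xs ys = sum [x * y | x <- xs | y <- ys]

-- The same function written with an explicit zip instead of a
-- parallel comprehension.
dotpListZip :: Num a => [a] -> [a] -> a
dotpListZip xs ys = sum [x * y | (x, y) <- zip xs ys]
</haskell>
For example, <hask>dotpList [1, 2, 3] [4, 5, 6]</hask> evaluates to <hask>32</hask>. On lists, both variants stop at the end of the shorter argument.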

=== Running DPH programs ===

Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.
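
To give a rough intuition for what "turning nested into flat data parallelism" means at the level of data, the following sketch represents a nested list as a list of segment lengths plus one flat data list. It only illustrates the representation idea on ordinary lists; the type <hask>Segmented</hask> and all function names are hypothetical, and the actual transformation in GHC operates on whole programs and is considerably more involved (see the papers referenced further below).
<haskell>
-- A nested array modelled as segment lengths plus one flat data list.
-- All names in this sketch are hypothetical.
data Segmented a = Segmented
  { segLengths :: [Int]   -- length of each inner array
  , flatData   :: [a]     -- all elements, concatenated
  } deriving (Eq, Show)

-- Convert a nested list into the flat representation.
flatten :: [[a]] -> Segmented a
flatten xss = Segmented (map length xss) (concat xss)

-- Recover the nested list from the flat representation.
unflatten :: Segmented a -> [[a]]
unflatten (Segmented lens xs) = go lens xs
  where
    go []     _  = []
    go (n:ns) ys = let (seg, rest) = splitAt n ys
                   in  seg : go ns rest

-- Sum of every inner array, computed from the flat representation.
segmentedSum :: Num a => Segmented a -> [a]
segmentedSum = map sum . unflatten
</haskell>
For instance, <hask>flatten [[1,2,3],[],[4,5]]</hask> gives segment lengths <hask>[3,0,2]</hask> and flat data <hask>[1,2,3,4,5]</hask>. The flat data can be split evenly among threads no matter how irregular the segment lengths are, which is the intuition behind the simplified load balancing.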

==== No type classes ====

Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.
<haskell>
dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>

==== Special Prelude ====

As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus four extra modules to support one numeric type each: [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.

To compile <hask>dotp_double</hask>, we add the following import statements:
<haskell>
import qualified Prelude
import Data.Array.Parallel
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double
</haskell>

==== Impedance matching ====

Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code).

Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.
<haskell>
dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.

==== Compiling vectorised code ====

The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.

Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.

Overall, we get the following complete module definition for the dot-product code:
<haskell>
{-# LANGUAGE ParallelArrays #-}
{-# OPTIONS_GHC -fvectorise #-}

module DotP (dotp_wrapper)
where

import qualified Prelude
import Data.Array.Parallel
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double

dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]

dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:
<blockquote>
<code>ghc -c -Odph -fdph-par DotP.hs</code>
</blockquote>
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)

==== Using vectorised code ====

Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:
<haskell>
import Data.Array.Parallel
import Data.Array.Parallel.PArray (PArray, fromList)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = let v      = fromList [1..10]    -- convert lists...
        w      = fromList [1,2..20]  -- ...to parallel arrays
        result = dotp_wrapper v w    -- invoke vectorised code
    in
    print result                     -- print the result
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -Odph -fdph-par Main.hs</code>
</blockquote>
and finally link the two modules into an executable <code>dotp</code> with
<blockquote>
<code>ghc -o dotp -threaded -fdph-par -rtsopts DotP.o Main.o</code>
</blockquote>
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend. We include <code>-rtsopts</code> to be able to explicitly determine the number of OS threads used to execute our code.

==== Generating input data ====

To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:
<haskell>
import System.Random (newStdGen)
import Data.Array.Parallel
import Data.Array.Parallel.PArray (PArray, randomRs)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = do
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2
      print $ dotp_wrapper v w   -- invoke vectorised code and print the result
  where
    n     = 10000        -- vector length
    range = (-100, 100)  -- range of vector elements
</haskell>
We compile and link the program as described above.

'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose. For a variant of the dot-product example code that determines the CPU time consumed by <hask>dotp_wrapper</hask>, see [[GHC/Data Parallel Haskell/MainTimed|timed dot product]].

==== Parallel execution ====

By default, a Haskell program uses only one OS thread, and hence, also only one CPU core for execution. To use multiple cores, we need to invoke the executable with an explicit RTS command line option — e.g., <code>./dotp +RTS -N2</code> uses two cores. (Strictly speaking, it uses two OS threads, which will be scheduled on two separate cores if available.) To determine the runtime of parallel code, measuring CPU time, as demonstrated in the [[GHC/Data Parallel Haskell/MainTimed|timed variant of the dot product example]], is not sufficient anymore. We need to measure wall clock times instead. For proper benchmarking, it is advisable to use a library, such as [http://hackage.haskell.org/package/criterion criterion].
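
The difference between the two kinds of time can be observed with a small helper that records both. The following is a minimal sketch and not a replacement for a benchmarking library; the names <hask>timed</hask> and <hask>timedEval</hask> are made up for this example, and it assumes the <code>time</code> package that ships with GHC.
<haskell>
import Control.Exception (evaluate)
import Data.Time.Clock (diffUTCTime, getCurrentTime)
import System.CPUTime (getCPUTime)

-- Run an action and report (result, wall-clock seconds, CPU seconds).
-- 'timed' is a hypothetical helper for this sketch.
timed :: IO a -> IO (a, Double, Double)
timed action = do
  wall0  <- getCurrentTime
  cpu0   <- getCPUTime               -- picoseconds used by the program
  result <- action
  wall1  <- getCurrentTime
  cpu1   <- getCPUTime
  let wallSecs = realToFrac (diffUTCTime wall1 wall0)
      cpuSecs  = fromIntegral (cpu1 - cpu0) * 1e-12
  return (result, wallSecs, cpuSecs)

-- Time the evaluation of a pure value to weak head normal form
-- (which is full evaluation for a value of type Double).
timedEval :: a -> IO (a, Double, Double)
timedEval x = timed (evaluate x)
</haskell>
With such a helper, one would write <hask>(r, wall, cpu) <- timedEval (dotp_wrapper v w)</hask> in the main module. When a computation is spread over several cores, the CPU time is expected to stay roughly the same or grow, while the wall-clock time should shrink, which is why only the latter reflects a parallel speedup.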

A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with just one core and to move to multiple cores only for performance debugging.

Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is often bad as GHC's runtime makes no effort at optimising placement.

=== Further examples and documentation ===

Further examples are available in the [http://darcs.haskell.org/packages/dph/dph-examples/ examples directory of the package dph source]. This code also includes reference implementations for some of the examples that are useful for benchmarking.

The interfaces of the various components of the DPH library are in the [http://hackage.haskell.org/package/dph-par library documentation] on Hackage.

=== Designing parallel programs ===

Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.

DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:
<haskell>
data RTree a = RNode [:RTree a:]
</haskell>
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]
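
The traversal pattern can be illustrated with an ordinary list-based rose tree. The following is only a sequential model: the type <hask>RTreeL</hask> (which, unlike the definition above, carries a payload in every node) and the function <hask>levelSums</hask> are hypothetical names for this sketch. The recursion in <hask>go</hask> proceeds level by level, whereas the comprehensions over each level correspond to the aggregate operations that could run in parallel if the children were held in parallel arrays.
<haskell>
-- List-based model of a rose tree with a payload in every node.
-- 'RTreeL' and 'levelSums' are illustrative names.
data RTreeL a = RNodeL a [RTreeL a]

-- Sum of the payloads on each level, root level first.
levelSums :: Num a => RTreeL a -> [a]
levelSums t = go [t]
  where
    go []    = []
    go level = sum [x | RNodeL x _ <- level]               -- aggregate over one level
             : go (concat [kids | RNodeL _ kids <- level]) -- descend to the next level
</haskell>
For the tree <hask>RNodeL 1 [RNodeL 2 [RNodeL 4 []], RNodeL 3 []]</hask>, the function <hask>levelSums</hask> returns <hask>[1,5,4]</hask>.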

For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]

=== Further reading and information on the implementation ===

DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].

For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and would like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].

=== Feedback ===

Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC users' mailing list], or contact the DPH developers directly:
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]

HakkuTaikai/Attendees

2011-08-30T00:34:12Z

Chak: /* HakkuTaikai Attendees */

= HakkuTaikai Attendees =

The venue is not open to the public on the day of the Hackathon, so we must submit a list of names to security. If your name is not on the list, you may not be admitted.

If you do not wish to announce your attendance in public, please email haskathon★liyang.hu instead.

{| class="wikitable"
!Nickname
!Real Name
!Affiliation
!Mobile
|-
| liyang
| Liyang HU
| Tsuru Capital LLC
| +81 80 4361 1307
|-
| kfish
| [[User:ConradParker|Conrad Parker]]
| Tsuru Capital SG Pte Ltd
| +81 80 4162 1307
|-
| erikde/m3ga
| [[User:Erik de Castro Lopo|Erik de Castro Lopo]]
| bCODE Pty Ltd
| +61 400 912 480
|-
| kazu
| Kazu Yamamoto
| IIJ
| not public
|-
| tibbe
| Johan Tibell
| Google
| not public
|-
| lpeterse
| Lars Petersen
| -
| not public
|-
| TacticalGrace
| [[User:chak|Manuel Chakravarty]]
| University of New South Wales
| not public
|}

GHC/Data Parallel Haskell

2011-08-11T13:42:46Z

Chak: /* Where to get it */

[[Category:GHC|Data Parallel Haskell]]
== Data Parallel Haskell ==

''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus on utilising multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell].

<center>
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png
</center>

''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''

__TOC__

=== Project status ===

Data Parallel Haskell (DPH) is available as an add-on for [http://haskell.org/ghc/download_ghc_7_2_1 GHC 7.2] in the form of a few separate cabal packages. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).

The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used and certain idioms are avoided. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].

DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.

'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.

=== Where to get it ===

To get DPH, install [http://haskell.org/ghc/download_ghc_7_2_1 GHC 7.2] and then install the DPH libraries with <code>cabal install</code> as follows:
<blockquote>
<code>$ cabal update</code> 
<code>$ cabal install dph-examples</code>
</blockquote>
This will install all DPH packages, including a set of simple examples; see [http://hackage.haskell.org/package/dph-examples dph-examples]. (The package [http://hackage.haskell.org/package/dph-examples dph-examples] depends on OpenGL and Gloss, as both are used in a visualiser for the n-body example.)

'''WARNING:''' The vanilla GHC distribution does '''not''' include <code>cabal install</code>. This is in contrast to the Haskell Platform, which does include <code>cabal install</code>. If you want to avoid installing the <code>cabal-install</code> package and its dependencies explicitly, simply install GHC 7.2.1 in addition to your current Haskell Platform installation. (How to do that depends on your platform and personal preferences. One option is to install a bindist into your home directory with symbolic links to the binaries including the version number.) Then, install DPH with the following command:
<blockquote>
<code>cabal install --with-compiler=`which ghc-7.2.1` --with-hc-pkg=`which ghc-pkg-7.2.1` dph-examples</code>
</blockquote>

=== Overview ===

From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, except that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three-element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analogous manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).

The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)

As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.
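
As an illustration of what an aggregate permutation operation does, here is a sequential model on ordinary lists. It assumes the usual data-parallel meaning of a permutation, where element ''i'' of the source is moved to the position given by the ''i''-th index. The name <hask>permuteList</hask> and its argument order are chosen for this sketch only; the exact signature and semantics of <hask>permuteP</hask> should be checked against the DPH library documentation.
<haskell>
import Data.List (sortOn)

-- 'permuteList idxs xs' moves element i of 'xs' to position 'idxs !! i'.
-- Assumes that 'idxs' is a permutation of [0 .. length xs - 1].
-- Illustrative list model only, not the DPH implementation.
permuteList :: [Int] -> [a] -> [a]
permuteList idxs xs = map snd (sortOn fst (zip idxs xs))
</haskell>
For example, <hask>permuteList [2, 0, 1] "abc"</hask> yields <hask>"bca"</hask>. The whole array is rearranged in one operation rather than by element-wise recursion, which is what makes such operations a natural fit for parallel arrays.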

=== A simple example ===

As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:
<haskell>
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.

=== Running DPH programs ===

Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.

==== No type classes ====

Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.
<haskell>
dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>

==== Special Prelude ====

As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus four extra modules to support one numeric type each: [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.

To compile <hask>dotp_double</hask>, we add the following three import statements:
<haskell>
import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double
</haskell>

==== Impedance matching ====

Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code).

Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.
<haskell>
dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.

==== Compiling vectorised code ====

The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.

Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.

Overall, we get the following complete module definition for the dot-product code:
<haskell>
{-# LANGUAGE ParallelArrays #-}
{-# OPTIONS_GHC -fvectorise #-}

module DotP (dotp_wrapper)
where

import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double

dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]

dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:
<blockquote>
<code>ghc -c -Odph -fdph-par DotP.hs</code>
</blockquote>
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)

==== Using vectorised code ====

Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:
<haskell>
import Data.Array.Parallel.PArray (PArray, fromList)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = let v      = fromList [1..10]    -- convert lists...
        w      = fromList [1,2..20]  -- ...to parallel arrays
        result = dotp_wrapper v w    -- invoke vectorised code
    in
    print result                     -- print the result
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -Odph -fdph-par Main.hs</code>
</blockquote>
and finally link the two modules into an executable <code>dotp</code> with
<blockquote>
<code>ghc -o dotp -threaded -fdph-par -rtsopts DotP.o Main.o</code>
</blockquote>
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend. We include <code>-rtsopts</code> to be able to explicitly determine the number of OS threads used to execute our code.

==== Generating input data ====

To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:
<haskell>
import System.Random (newStdGen)
import Data.Array.Parallel.PArray (PArray, randomRs)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = do
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2
      print $ dotp_wrapper v w   -- invoke vectorised code and print the result
  where
    n     = 10000        -- vector length
    range = (-100, 100)  -- range of vector elements
</haskell>
We compile and link the program as described above.

'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose. For a variant of the dot-product example code that determines the CPU time consumed by <hask>dotp_wrapper</hask>, see [[GHC/Data Parallel Haskell/MainTimed|timed dot product]].

==== Parallel execution ====

By default, a Haskell program uses only one OS thread, and hence, also only one CPU core for execution. To use multiple cores, we need to invoke the executable with an explicit RTS command line option — e.g., <code>./dotp +RTS -N2</code> uses two cores. (Strictly speaking, it uses two OS threads, which will be scheduled on two separate cores if available.) To determine the runtime of parallel code, measuring CPU time, as demonstrated in the [[GHC/Data Parallel Haskell/MainTimed|timed variant of the dot product example]], is not sufficient anymore. We need to measure wall clock times instead. For proper benchmarking, it is advisable to use a library, such as [http://hackage.haskell.org/package/criterion criterion].

A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with just one core and to move to multiple cores only for performance debugging.

Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is often bad as GHC's runtime makes no effort at optimising placement.

=== Further examples and documentation ===

Further examples are available in the [http://darcs.haskell.org/packages/dph/dph-examples/ examples directory of the dph package source]. This code also includes reference implementations for some of the examples, which are useful for benchmarking.

The interfaces of the various components of the DPH library are in the [http://hackage.haskell.org/package/dph-par library documentation] on Hackage.

=== Designing parallel programs ===

Data Parallel Haskell is a high-level language for coding parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and for many performance considerations (such as whether a computation has enough parallelism to make exploiting it worthwhile) remains with the programmer.

DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:
<haskell>
data RTree a = RNode [:RTree a:]
</haskell>
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]

For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]

=== Further reading and information on the implementation ===

DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].

For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and would like to know the implementation details and internals of the Data Parallel Haskell project, much of this is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].

=== Feedback ===

Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Comments and suggestions are also very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC users' mailing list], or contact the DPH developers directly:
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]

GHC/Data Parallel Haskell

2011-08-11T12:52:27Z

Chak: /* Further examples and documentation */

[[Category:GHC|Data Parallel Haskell]]
== Data Parallel Haskell ==

''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus on utilising multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell].

<center>
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png
</center>

''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''

__TOC__

=== Project status ===

Data Parallel Haskell (DPH) is available as an add-on for [http://haskell.org/ghc/download_ghc_7_2_1 GHC 7.2] in the form of a few separate cabal packages. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).

The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used and certain idioms are avoided. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].

DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.

'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.

=== Where to get it ===

To get DPH, install [http://haskell.org/ghc/download_ghc_7_2_1 GHC 7.2] and then install the DPH libraries with <code>cabal install</code> as follows:
<blockquote>
<code>$ cabal update</code> 
<code>$ cabal install dph-examples</code>
</blockquote>
This will install all DPH packages, including a set of simple examples; see [http://hackage.haskell.org/package/dph-examples dph-examples]. (The package [http://hackage.haskell.org/package/dph-examples dph-examples] depends on OpenGL and Gloss, as both are used in a visualiser for the ''n''-body example.)

=== Overview ===

From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three-element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analogous manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).

The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions).
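
The difference in strictness can be observed without DPH. The following sketch uses an unboxed array from the <hask>array</hask> package as a stand-in for a strict aggregate structure. This is an assumption made for illustration only and says nothing about how parallel arrays are implemented. The names <hask>lazyHead</hask> and <hask>strictFirst</hask> are hypothetical.
<haskell>
import Control.Exception (SomeException, evaluate, try)
import Data.Array.Unboxed (UArray, listArray, (!))

-- A lazy list: taking the head never touches the undefined tail.
lazyHead :: Int
lazyHead = head (1 : undefined)

-- A strict (unboxed) array: building it evaluates every element,
-- so demanding element 0 also demands the undefined element 1.
strictFirst :: Int
strictFirst = arr ! 0
  where
    arr :: UArray Int Int
    arr = listArray (0, 1) [1, undefined]

main :: IO ()
main = do
  print lazyHead
  r <- try (evaluate strictFirst) :: IO (Either SomeException Int)
  putStrLn (either (const "strict array: demanding one element failed")
                   (\x -> "strict array: " ++ show x)
                   r)
</haskell>
The list version succeeds because only the first cons cell is inspected, whereas the array version fails as soon as any element is demanded.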

As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.
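
To see why an undirected fold needs an associative function, consider the following list-based sketch of a reduction that proceeds in rounds. It is not the implementation of <hask>foldP</hask>; the helpers <hask>pairUp</hask> and <hask>treeFold</hask> are hypothetical names used only for illustration. Each round combines neighbouring elements independently, so a list of length ''n'' is reduced in about log ''n'' rounds.
<haskell>
-- One round of the reduction: combine neighbouring elements pairwise.
-- All combinations within a round are independent of each other.
pairUp :: (a -> a -> a) -> [a] -> [a]
pairUp f (x : y : rest) = f x y : pairUp f rest
pairUp _ xs             = xs

-- Reduce a non-empty list in rounds; also return the number of rounds.
treeFold :: (a -> a -> a) -> [a] -> Maybe (a, Int)
treeFold _ []  = Nothing
treeFold f xs0 = Just (go 0 xs0)
  where
    go n [x] = (x, n)
    go n xs  = go (n + 1) (pairUp f xs)

main :: IO ()
main = do
  -- Associative operator: same value as a left fold, in 3 rounds.
  print (treeFold (+) [1 .. 8 :: Int])   -- Just (36,3)
  -- Non-associative operator: the grouping changes the result.
  print (foldl (-) 0 [1 .. 8 :: Int], fmap fst (treeFold (-) [1 .. 8 :: Int]))
</haskell>
With subtraction, the left fold and the round-based reduction disagree, which is why the grouping may only be left unspecified for associative functions.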

=== A simple example ===

As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:
<haskell>
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.
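
The same equivalence holds for ordinary lists, which makes it easy to try out in a standard GHC installation. The following sketch uses plain lists and the <hask>ParallelListComp</hask> extension; the names <hask>dotpComp</hask> and <hask>dotpZip</hask> are chosen for illustration.
<haskell>
{-# LANGUAGE ParallelListComp #-}

-- Dot product with a parallel list comprehension:
-- both generators advance in lockstep.
dotpComp :: [Double] -> [Double] -> Double
dotpComp xs ys = sum [x * y | x <- xs | y <- ys]

-- The same computation with an explicit zip.
dotpZip :: [Double] -> [Double] -> Double
dotpZip xs ys = sum [x * y | (x, y) <- zip xs ys]

main :: IO ()
main = print (dotpComp [1, 2, 3] [4, 5, 6], dotpZip [1, 2, 3] [4, 5, 6])
</haskell>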

=== Running DPH programs ===

Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.
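
The idea of turning nested into flat data can be illustrated with ordinary lists. The sketch below stores a nested list as one flat data vector plus the lengths of its segments. It is only meant to convey the general idea; it does not claim to reproduce the representation or the transformation that GHC actually uses, and the type <hask>Segmented</hask> and its functions are hypothetical.
<haskell>
-- A nested array in flattened form: segment lengths plus one flat data vector.
data Segmented a = Segmented
  { segLengths :: [Int]
  , flatData   :: [a]
  } deriving (Eq, Show)

-- Flatten a nested list into the segmented representation.
flatten :: [[a]] -> Segmented a
flatten xss = Segmented (map length xss) (concat xss)

-- Rebuild the nested structure from the flat representation.
unflatten :: Segmented a -> [[a]]
unflatten (Segmented lens xs) = go lens xs
  where
    go []       _  = []
    go (n : ns) ys = let (seg, rest) = splitAt n ys in seg : go ns rest

-- Segmented sum: one result per segment, derived from the flat data
-- and the segment lengths.
sumSegments :: Num a => Segmented a -> [a]
sumSegments = map sum . unflatten

main :: IO ()
main = print (sumSegments (flatten [[1, 2, 3], [], [4, 5 :: Int]]))
</haskell>
Because all elements live in one flat vector, the total work can be split evenly regardless of how irregular the segment lengths are.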

==== No type classes ====

Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.
<haskell>
dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>

==== Special Prelude ====

As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus four extra modules, each supporting one numeric type: [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types), and if a program needs to use more than one of them, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.

To compile <hask>dotp_double</hask>, we add the following three import statements:
<haskell>
import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double
</haskell>

==== Impedance matching ====

Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code).

Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.
<haskell>
dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.

==== Compiling vectorised code ====

The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.

Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.

Overall, we get the following complete module definition for the dot-product code:
<haskell>
{-# LANGUAGE ParallelArrays #-}
{-# OPTIONS_GHC -fvectorise #-}

module DotP (dotp_wrapper)
where

import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double

dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]

dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:
<blockquote>
<code>ghc -c -Odph -fdph-par DotP.hs</code>
</blockquote>
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)

==== Using vectorised code ====

Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:
<haskell>
import Data.Array.Parallel.PArray (PArray, fromList)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = let v      = fromList [1..10]      -- convert lists...
        w      = fromList [2,4..20]    -- ...to parallel arrays of equal length
        result = dotp_wrapper v w      -- invoke vectorised code
    in
    print result                       -- print the result
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -Odph -fdph-par Main.hs</code>
</blockquote>
and finally link the two modules into an executable <code>dotp</code> with
<blockquote>
<code>ghc -o dotp -threaded -fdph-par -rtsopts DotP.o Main.o</code>
</blockquote>
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend. We include <code>-rtsopts</code> to be able to explicitly determine the number of OS threads used to execute our code.

==== Generating input data ====

To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:
<haskell>
import System.Random (newStdGen)
import Data.Array.Parallel.PArray (PArray, randomRs)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = do
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2
      print $ dotp_wrapper v w   -- invoke vectorised code and print the result
  where
    n     = 10000        -- vector length
    range = (-100, 100)  -- range of vector elements
</haskell>
We compile and link the program as described above.

'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose. For a variant of the dot-product example code that determines the CPU time consumed by <hask>dotp_wrapper</hask>, see [[GHC/Data Parallel Haskell/MainTimed|timed dot product]].
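
The effect of forcing the input before timing can be sketched with plain lists. The helper <hask>forceList</hask> below is a hypothetical stand-in that plays the role <hask>nf</hask> plays for <hask>PArray</hask>; it is not part of the DPH library.
<haskell>
import Control.Exception (evaluate)

-- Force every element of a list of Doubles to weak head normal form,
-- which is full evaluation for a flat type such as Double.
forceList :: [Double] -> ()
forceList = foldr seq ()

main :: IO ()
main = do
  let v = map fromIntegral [1 .. 100000 :: Int] :: [Double]
  -- Evaluate the input here, before any timer would be started.
  evaluate (forceList v)
  -- From this point on, consuming v does not pay for its construction.
  print (sum v)
</haskell>
Without the call to <hask>evaluate</hask>, laziness would defer the construction of <hask>v</hask> until <hask>sum</hask> consumes it, so a timer around <hask>sum</hask> would also measure the data generation.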

==== Parallel execution ====

By default, a Haskell program uses only one OS thread, and hence, also only one CPU core for execution. To use multiple cores, we need to invoke the executable with an explicit RTS command line option — e.g., <code>./dotp +RTS -N2</code> uses two cores. (Strictly speaking, it uses two OS threads, which will be scheduled on two separate cores if available.) To determine the runtime of parallel code, measuring CPU time, as demonstrated in the [[GHC/Data Parallel Haskell/MainTimed|timed variant of the dot product example]], is not sufficient anymore. We need to measure wall clock times instead. For proper benchmarking, it is advisable to use a library, such as [http://hackage.haskell.org/package/criterion criterion].
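
The difference between CPU time and wall clock time can be made visible with a small timing helper that relies only on modules from <hask>base</hask>. This is a minimal sketch, assuming a <hask>base</hask> version that provides <hask>GHC.Clock</hask> (4.11 or later); the names <hask>timed</hask> and <hask>picosToSeconds</hask> are hypothetical, and the summed list merely stands in for a real workload. It is not a substitute for a benchmarking library.
<haskell>
import Control.Exception (evaluate)
import GHC.Clock (getMonotonicTime)
import GHC.Conc (getNumCapabilities)
import System.CPUTime (getCPUTime)

-- getCPUTime reports picoseconds; convert to seconds.
picosToSeconds :: Integer -> Double
picosToSeconds ps = fromIntegral ps / 1.0e12

-- Run an action; return its result with wall-clock and CPU seconds.
timed :: IO a -> IO (a, Double, Double)
timed action = do
  wall0  <- getMonotonicTime   -- seconds on a monotonic clock
  cpu0   <- getCPUTime         -- CPU time of the whole process
  result <- action
  cpu1   <- getCPUTime
  wall1  <- getMonotonicTime
  return (result, wall1 - wall0, picosToSeconds (cpu1 - cpu0))

main :: IO ()
main = do
  caps <- getNumCapabilities   -- the value set with +RTS -N
  (s, wall, cpu) <- timed (evaluate (sum [1 .. 1000000 :: Int]))
  putStrLn ("capabilities: " ++ show caps)
  putStrLn ("result: " ++ show s)
  putStrLn ("wall-clock seconds: " ++ show wall)
  putStrLn ("CPU seconds: " ++ show cpu)
</haskell>
CPU time is accumulated over all threads of the process, so for a program that runs on several cores it can exceed the elapsed time; only the wall clock figure shows whether adding cores made the program faster.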

A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with just one core and to move to multiple cores only for performance debugging.

Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is often bad as GHC's runtime makes no effort at optimising placement.

=== Further examples and documentation ===

Further examples are available in the [http://darcs.haskell.org/packages/dph/dph-examples/ examples directory of the dph package source]. This code also includes reference implementations for some of the examples, which are useful for benchmarking.

The interfaces of the various components of the DPH library are in the [http://hackage.haskell.org/package/dph-par library documentation] on Hackage.

=== Designing parallel programs ===

Data Parallel Haskell is a high-level language for coding parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and for many performance considerations (such as whether a computation has enough parallelism to make exploiting it worthwhile) remains with the programmer.

DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:
<haskell>
data RTree a = RNode [:RTree a:]
</haskell>
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]
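
The level-by-level traversal can be sketched with ordinary lists in place of parallel arrays. In the following illustration, the payload field of <hask>RNode</hask> and the function <hask>levels</hask> are assumptions added so that there is something to traverse; they do not appear in the definition above.
<haskell>
-- List-based stand-in for the parallel rose tree; the payload field is an
-- assumption added for illustration.
data RTree a = RNode a [RTree a]

-- Values of the tree, one list per level. The recursion over levels is
-- sequential; the work within a level is independent per node.
levels :: RTree a -> [[a]]
levels t = go [t]
  where
    go [] = []
    go ts = [x | RNode x _ <- ts] : go (concat [cs | RNode _ cs <- ts])

example :: RTree Int
example = RNode 1 [RNode 2 [RNode 4 []], RNode 3 []]

main :: IO ()
main = print (levels example)   -- [[1],[2,3],[4]]
</haskell>
Each step of <hask>go</hask> handles one whole level at once, which corresponds to the part of the computation that a parallel array of children would allow to run in parallel.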

For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]

=== Further reading and information on the implementation ===

DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].

For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and would like to know the implementation details and internals of the Data Parallel Haskell project, much of this is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].

=== Feedback ===

Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Comments and suggestions are also very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC users' mailing list], or contact the DPH developers directly:
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]

GHC/Data Parallel Haskell

2011-08-11T12:49:32Z

Chak: /* Where to get it */

[[Category:GHC|Data Parallel Haskell]]
== Data Parallel Haskell ==

''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus on utilising multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell].

<center>
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png
</center>

''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''

__TOC__

=== Project status ===

Data Parallel Haskell (DPH) is available as an add-on for [http://haskell.org/ghc/download_ghc_7_2_1 GHC 7.2] in the form of a few separate cabal packages. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).

The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used and certain idioms are avoided. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].

DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.

'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.

=== Where to get it ===

To get DPH, install [http://haskell.org/ghc/download_ghc_7_2_1 GHC 7.2] and then install the DPH libraries with <code>cabal install</code> as follows:
<blockquote>
<code>$ cabal update</code> 
<code>$ cabal install dph-examples</code>
</blockquote>
This will install all DPH packages, including a set of simple examples; see [http://hackage.haskell.org/package/dph-examples dph-examples]. (The package [http://hackage.haskell.org/package/dph-examples dph-examples] depends on OpenGL and Gloss, as both are used in a visualiser for the ''n''-body example.)

=== Overview ===

From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three-element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analogous manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).

The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions).

As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.
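The logarithmic step count of an undirected fold can be illustrated with a list-based model. This is a sketch of the general technique (pairwise, tree-shaped reduction), not of how <hask>foldP</hask> is implemented in the DPH library; the names <hask>pairUp</hask> and <hask>treeFold</hask> are made up for this example.
<haskell>
-- One round combines neighbouring pairs; all combinations within a
-- round are independent of each other and could run in parallel.
pairUp :: (a -> a -> a) -> [a] -> [a]
pairUp f (x:y:rest) = f x y : pairUp f rest
pairUp _ xs         = xs

-- Repeat rounds until a single element remains. The neutral element z
-- covers the empty case. Returns the result and the number of rounds.
treeFold :: (a -> a -> a) -> a -> [a] -> (a, Int)
treeFold _ z []  = (z, 0)
treeFold f _ xs0 = go xs0 0
  where
    go [x] n = (x, n)
    go xs  n = go (pairUp f xs) (n + 1)

main :: IO ()
main = do
  -- Associative operator: 1024 elements are reduced in 10 rounds.
  print (treeFold (+) 0 [1 .. 1024 :: Int])        -- (524800,10)
  -- Non-associative operator: the regrouping changes the result.
  print (fst (treeFold (-) 0 [1 .. 8 :: Int]), foldl (-) 0 [1 .. 8 :: Int])  -- (0,-36)
</haskell>
Each round halves the number of elements, which gives about log<sub>2</sub> ''n'' rounds. As the order in which elements are grouped is not fixed, only an associative function yields a well-defined result.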

=== A simple example ===

As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:
<haskell>
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.
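For comparison, both formulations can be tried on ordinary lists with a standard GHC installation. The sketch below is a sequential list analogue, which may also serve as a reference when checking results; the names <hask>dotpComp</hask> and <hask>dotpZip</hask> are chosen for this example only.
<haskell>
{-# LANGUAGE ParallelListComp #-}

-- List analogue of dotp, using a parallel list comprehension.
dotpComp :: Num a => [a] -> [a] -> a
dotpComp xs ys = sum [x * y | x <- xs | y <- ys]

-- The same function written with an explicit zip.
dotpZip :: Num a => [a] -> [a] -> a
dotpZip xs ys = sum [x * y | (x, y) <- zip xs ys]

main :: IO ()
main = do
  print (dotpComp [1, 2, 3] [4, 5, 6 :: Int])  -- 4 + 10 + 18 = 32
  print (dotpZip  [1, 2, 3] [4, 5, 6 :: Int])  -- 32
</haskell>
On lists, both versions stop at the end of the shorter argument, so surplus elements of the longer list are silently ignored.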

=== Running DPH programs ===

Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.

==== No type classes ====

Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.
<haskell>
dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>

==== Special Prelude ====

As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus four extra modules, each supporting one numeric type: [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types), and if a program needs to use more than one of them, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.

To compile <hask>dotp_double</hask>, we add the following three import statements:
<haskell>
import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double
</haskell>

==== Impedance matching ====

Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code).

Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.
<haskell>
dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.

==== Compiling vectorised code ====

The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.

Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.

Overall, we get the following complete module definition for the dot-product code:
<haskell>
{-# LANGUAGE ParallelArrays #-}
{-# OPTIONS_GHC -fvectorise #-}

module DotP (dotp_wrapper)
where

import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double

dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]

dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:
<blockquote>
<code>ghc -c -Odph -fdph-par DotP.hs</code>
</blockquote>
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)

==== Using vectorised code ====

Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:
<haskell>
import Data.Array.Parallel.PArray (PArray, fromList)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = let v      = fromList [1..10]    -- convert lists...
        w      = fromList [1,2..20]  -- ...to parallel arrays
        result = dotp_wrapper v w    -- invoke vectorised code
    in
    print result                     -- print the result
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -Odph -fdph-par Main.hs</code>
</blockquote>
and finally link the two modules into an executable <code>dotp</code> with
<blockquote>
<code>ghc -o dotp -threaded -fdph-par -rtsopts DotP.o Main.o</code>
</blockquote>
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend. We include <code>-rtsopts</code> to be able to explicitly determine the number of OS threads used to execute our code.

==== Generating input data ====

To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:
<haskell>
import System.Random (newStdGen)
import Data.Array.Parallel.PArray (PArray, randomRs)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = do
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2
      print $ dotp_wrapper v w   -- invoke vectorised code and print the result
  where
    n     = 10000        -- vector length
    range = (-100, 100)  -- range of vector elements
</haskell>
We compile and link the program as described above.

'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose. For a variant of the dot-product example code that determines the CPU time consumed by <hask>dotp_wrapper</hask>, see [[GHC/Data Parallel Haskell/MainTimed|timed dot product]].

==== Parallel execution ====

By default, a Haskell program uses only one OS thread, and hence, also only one CPU core for execution. To use multiple cores, we need to invoke the executable with an explicit RTS command line option — e.g., <code>./dotp +RTS -N2</code> uses two cores. (Strictly speaking, it uses two OS threads, which will be scheduled on two separate cores if available.) To determine the runtime of parallel code, measuring CPU time, as demonstrated in the [[GHC/Data Parallel Haskell/MainTimed|timed variant of the dot product example]], is not sufficient anymore. We need to measure wall clock times instead. For proper benchmarking, it is advisable to use a library, such as [http://hackage.haskell.org/package/criterion criterion].
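The difference between the two kinds of time measurement can be sketched with functions from <hask>base</hask> alone. The following example is not the timed variant linked above. It assumes GHC 8.4 or later (for <hask>GHC.Clock</hask>) and uses a sequential list-based stand-in, <hask>dotpList</hask>, instead of the vectorised <hask>dotp_wrapper</hask>.
<haskell>
import Control.Concurrent (getNumCapabilities)
import Control.Exception (evaluate)
import Data.List (foldl')
import GHC.Clock (getMonotonicTime)
import System.CPUTime (getCPUTime)

-- Stand-in workload (an assumption of this sketch): a sequential
-- dot product on lists instead of the vectorised dotp_wrapper.
dotpList :: [Double] -> [Double] -> Double
dotpList xs ys = foldl' (+) 0 (zipWith (*) xs ys)

main :: IO ()
main = do
  caps <- getNumCapabilities          -- the value set with +RTS -N<k>
  let n  = 200000 :: Int
      xs = map fromIntegral [1 .. n] :: [Double]
      ys = map fromIntegral [n, n - 1 .. 1] :: [Double]
  -- Force both inputs so that generating them is not timed.
  _ <- evaluate (foldl' (+) 0 xs + foldl' (+) 0 ys)
  wall0 <- getMonotonicTime           -- wall clock, in seconds
  cpu0  <- getCPUTime                 -- CPU time, in picoseconds
  r     <- evaluate (dotpList xs ys)
  cpu1  <- getCPUTime
  wall1 <- getMonotonicTime
  putStrLn ("capabilities: " ++ show caps)
  putStrLn ("result:       " ++ show r)
  putStrLn ("wall clock s: " ++ show (wall1 - wall0))
  putStrLn ("CPU time s:   " ++ show (fromIntegral (cpu1 - cpu0) / 1e12 :: Double))
</haskell>
CPU time is accumulated over all threads of the process, so for code that really runs on several cores it can exceed the elapsed wall clock time. This is why it says little about parallel speedup.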

A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with just one core and to move to multiple cores only for performance debugging.

Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is often bad as GHC's runtime makes no effort at optimising placement.

=== Further examples and documentation ===

Further examples are available in the [http://darcs.haskell.org/packages/dph/dph-examples/ examples directory of the package dph source]. This code also includes reference implementations for some of the examples that are useful for benchmarking.

The interfaces of the various components of the DPH library are in the [http://hackage.haskell.org/package/dph library documentation] on Hackage (which will be uploaded with the GHC 7.2 DPH release).

=== Designing parallel programs ===

Data Parallel Haskell is a high-level language for coding parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and for many performance considerations (such as deciding when a computation has enough parallelism to be worth exploiting) still rests with the programmer.

DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:
<haskell>
data RTree a = RNode [:RTree a:]
</haskell>
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]
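The shape of such traversals can be sketched with a list-based stand-in for the rose tree, which runs on any GHC. Here, lists replace the parallel arrays, so everything executes sequentially; the functions <hask>size</hask>, <hask>depth</hask>, and <hask>complete</hask> are made up for this example.
<haskell>
-- List-based stand-in for: data RTree a = RNode [:RTree a:]
data RTree a = RNode [RTree a]

-- Number of nodes. The map over the children is the step that a
-- parallel array would allow to be executed in parallel.
size :: RTree a -> Int
size (RNode ts) = 1 + sum (map size ts)

-- Number of levels, that is, the sequential part of a traversal.
depth :: RTree a -> Int
depth (RNode ts) = 1 + maximum (0 : map depth ts)

-- Hypothetical test input: a complete tree with fan-out k and d levels.
complete :: Int -> Int -> RTree a
complete k d
  | d <= 1    = RNode []
  | otherwise = RNode (replicate k (complete k (d - 1)))

main :: IO ()
main = do
  let t = complete 4 3 :: RTree ()
  print (size t, depth t)  -- (21,3)
</haskell>
In this model, the recursion depth corresponds to the sequential levels of the traversal, and each <hask>map</hask> over the children corresponds to the work that is available for parallel execution within a level.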

For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]

=== Further reading and information on the implementation ===

DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].

For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and would like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].

=== Feedback ===

Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC users' mailing list], or contact the DPH developers directly:
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]

GHC/Data Parallel Haskell

2011-08-11T12:43:17Z

Chak: /* Where to get it */

[[Category:GHC|Data Parallel Haskell]]
== Data Parallel Haskell ==

''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus on utilising multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell].

<center>
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png
</center>

''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts.''

__TOC__

=== Project status ===

Data Parallel Haskell (DPH) is available as an add-on for [http://haskell.org/ghc/download_ghc_7_2_1 GHC 7.2] in the form of a few separate cabal packages. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).

The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used and certain idioms are avoided. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].

DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.

'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.

=== Where to get it ===

To get DPH, install [http://haskell.org/ghc/download_ghc_7_2_1 GHC 7.2] and then install the DPH libraries with <hask>cabal install</hask> as follows:

=== Overview ===

From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, except that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three-element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analogous manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).

The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)

As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.

=== A simple example ===

As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:
<haskell>
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.

=== Running DPH programs ===

Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.

==== No type classes ====

Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.
<haskell>
dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>

==== Special Prelude ====

As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus four extra modules, each supporting one numeric type: [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types), and if a program needs to use more than one of them, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.

To compile <hask>dotp_double</hask>, we add the following three import statements:
<haskell>
import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double
</haskell>

==== Impedance matching ====

Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code).

Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.
<haskell>
dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.

==== Compiling vectorised code ====

The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.

Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.

Overall, we get the following complete module definition for the dot-product code:
<haskell>
{-# LANGUAGE ParallelArrays #-}
{-# OPTIONS_GHC -fvectorise #-}

module DotP (dotp_wrapper)
where

import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double

dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]

dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:
<blockquote>
<code>ghc -c -Odph -fdph-par DotP.hs</code>
</blockquote>
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)

==== Using vectorised code ====

Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:
<haskell>
import Data.Array.Parallel.PArray (PArray, fromList)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = let v      = fromList [1..10]    -- convert lists...
        w      = fromList [1,2..20]  -- ...to parallel arrays
        result = dotp_wrapper v w    -- invoke vectorised code
    in
    print result                     -- print the result
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -Odph -fdph-par Main.hs</code>
</blockquote>
and finally link the two modules into an executable <code>dotp</code> with
<blockquote>
<code>ghc -o dotp -threaded -fdph-par -rtsopts DotP.o Main.o</code>
</blockquote>
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend. We include <code>-rtsopts</code> to be able to explicitly determine the number of OS threads used to execute our code.

==== Generating input data ====

To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:
<haskell>
import System.Random (newStdGen)
import Data.Array.Parallel.PArray (PArray, randomRs)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = do
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2
      print $ dotp_wrapper v w   -- invoke vectorised code and print the result
  where
    n     = 10000        -- vector length
    range = (-100, 100)  -- range of vector elements
</haskell>
We compile and link the program as described above.

'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose. For a variant of the dot-product example code that determines the CPU time consumed by <hask>dotp_wrapper</hask>, see [[GHC/Data Parallel Haskell/MainTimed|timed dot product]].

==== Parallel execution ====

By default, a Haskell program uses only one OS thread, and hence, also only one CPU core for execution. To use multiple cores, we need to invoke the executable with an explicit RTS command line option — e.g., <code>./dotp +RTS -N2</code> uses two cores. (Strictly speaking, it uses two OS threads, which will be scheduled on two separate cores if available.) To determine the runtime of parallel code, measuring CPU time, as demonstrated in the [[GHC/Data Parallel Haskell/MainTimed|timed variant of the dot product example]], is not sufficient anymore. We need to measure wall clock times instead. For proper benchmarking, it is advisable to use a library, such as [http://hackage.haskell.org/package/criterion criterion].

A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with just one core and to move to multiple cores only for performance debugging.

Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is often bad as GHC's runtime makes no effort at optimising placement.

=== Further examples and documentation ===

Further examples are available in the [http://darcs.haskell.org/packages/dph/dph-examples/ examples directory of the package dph source]. This code also includes reference implementations for some of the examples that are useful for benchmarking.

The interfaces of the various components of the DPH library are in the [http://hackage.haskell.org/package/dph library documentation] on Hackage (which will be uploaded with the GHC 7.2 DPH release).

=== Designing parallel programs ===

Data Parallel Haskell is a high-level language for coding parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and for many performance considerations (such as deciding when a computation has enough parallelism to be worth exploiting) still rests with the programmer.

DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:
<haskell>
data RTree a = RNode [:RTree a:]
</haskell>
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]

For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]

=== Further reading and information on the implementation ===

DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].

For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and would like to know the implementation details and internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].

=== Feedback ===

Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Comments and suggestions are also very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC users' mailing list], or contact the DPH developers directly:
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]

GHC/Data Parallel Haskell

2011-08-11T12:36:52Z

Chak: /* Project status */

[[Category:GHC|Data Parallel Haskell]]
== Data Parallel Haskell ==

''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus on utilising multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell].

<center>
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png
</center>

''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts.''

__TOC__

=== Project status ===

Data Parallel Haskell (DPH) is available as an add-on for [http://haskell.org/ghc/download_ghc_7_2_1 GHC 7.2] in the form of a few separate cabal packages. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).

The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used and certain idioms are avoided. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].

DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.

'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.

=== Where to get it ===

To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.

=== Overview ===

From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, except that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three-element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analogous manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).

The two main differences between lists and parallel arrays are that parallel arrays (1) are a strict data structure and (2) are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions).

As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.
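
To illustrate why an undirected fold needs an associative function and how the logarithmic step count arises, here is a small sketch on ordinary lists. It is not how <hask>foldP</hask> is implemented; <hask>pairUp</hask> and <hask>treeFold</hask> are hypothetical names chosen for this example. Each round combines neighbouring elements, so all combinations within a round are independent of each other, and the number of rounds grows with the logarithm of the length.

<haskell>
-- One round of the reduction: combine neighbouring elements pairwise.
-- In a data-parallel setting, the combinations of one round are independent.
pairUp :: (a -> a -> a) -> [a] -> [a]
pairUp f (x : y : rest) = f x y : pairUp f rest
pairUp _ xs             = xs          -- zero or one element left over

-- Tree-shaped fold: repeat rounds until a single element remains.
-- Returns the result together with the number of rounds that were needed.
-- The unit 'z' is only used for the empty list.
treeFold :: (a -> a -> a) -> a -> [a] -> (a, Int)
treeFold _ z []  = (z, 0)
treeFold f _ xs0 = go xs0 0
  where
    go [x] rounds = (x, rounds)
    go xs  rounds = go (pairUp f xs) (rounds + 1)
</haskell>

For an associative operator such as <hask>(+)</hask>, the grouping chosen by <hask>treeFold</hask> does not matter, and the result agrees with a sequential fold. For a non-associative operator such as <hask>(-)</hask>, the tree-shaped grouping generally yields a different value than <hask>foldl</hask>, which is why an undirected fold demands associativity.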

=== A simple example ===

As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:
<haskell>
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.
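
For readers without a DPH installation, the same two formulations can be tried on ordinary lists. This is only a sequential analogue: <hask>dotpList</hask> and <hask>dotpZip</hask> are names made up for this sketch, and the list extension <hask>ParallelListComp</hask> takes the place of the parallel array syntax.

<haskell>
{-# LANGUAGE ParallelListComp #-}

-- List-based analogue of 'dotp' using a parallel list comprehension:
-- both generators advance in lockstep.
dotpList :: Num a => [a] -> [a] -> a
dotpList xs ys = sum [x * y | x <- xs | y <- ys]

-- The same function written with an explicit zip, mirroring the zipP variant.
dotpZip :: Num a => [a] -> [a] -> a
dotpZip xs ys = sum [x * y | (x, y) <- zip xs ys]
</haskell>

Both definitions compute the same value; for instance, applying either to <hask>[1, 2, 3]</hask> and <hask>[4, 5, 6]</hask> yields <hask>32</hask>.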

=== Running DPH programs ===

Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.

==== No type classes ====

Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.
<haskell>
dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>

==== Special Prelude ====

As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus four extra modules that each support one numeric type: [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.

To compile <hask>dotp_double</hask>, we add the following three import statements:
<haskell>
import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double
</haskell>

==== Impedance matching ====

Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code).

Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.
<haskell>
dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.

==== Compiling vectorised code ====

The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.

Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.

Overall, we get the following complete module definition for the dot-product code:
<haskell>
{-# LANGUAGE ParallelArrays #-}
{-# OPTIONS_GHC -fvectorise #-}

module DotP (dotp_wrapper)
where

import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double

dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]

dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:
<blockquote>
<code>ghc -c -Odph -fdph-par DotP.hs</code>
</blockquote>
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)

==== Using vectorised code ====

Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:
<haskell>
import Data.Array.Parallel.PArray (PArray, fromList)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = let v      = fromList [1..10]    -- convert lists...
        w      = fromList [1,2..20]  -- ...to parallel arrays
        result = dotp_wrapper v w    -- invoke vectorised code
    in
    print result                     -- print the result
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -Odph -fdph-par Main.hs</code>
</blockquote>
and finally link the two modules into an executable <code>dotp</code> with
<blockquote>
<code>ghc -o dotp -threaded -fdph-par -rtsopts DotP.o Main.o</code>
</blockquote>
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend. We include <code>-rtsopts</code> to be able to explicitly determine the number of OS threads used to execute our code.

==== Generating input data ====

To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:
<haskell>
import System.Random (newStdGen)
import Data.Array.Parallel.PArray (PArray, randomRs)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = do
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2
      print $ dotp_wrapper v w   -- invoke vectorised code and print the result
  where
    n     = 10000        -- vector length
    range = (-100, 100)  -- range of vector elements
</haskell>
We compile and link the program as described above.

'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose. For a variant of the dot-product example code that determines the CPU time consumed by <hask>dotp_wrapper</hask>, see [[GHC/Data Parallel Haskell/MainTimed|timed dot product]].

==== Parallel execution ====

By default, a Haskell program uses only one OS thread, and hence, also only one CPU core for execution. To use multiple cores, we need to invoke the executable with an explicit RTS command line option — e.g., <code>./dotp +RTS -N2</code> uses two cores. (Strictly speaking, it uses two OS threads, which will be scheduled on two separate cores if available.) To determine the runtime of parallel code, measuring CPU time, as demonstrated in the [[GHC/Data Parallel Haskell/MainTimed|timed variant of the dot product example]], is not sufficient anymore. We need to measure wall clock times instead. For proper benchmarking, it is advisable to use a library, such as [http://hackage.haskell.org/package/criterion criterion].
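
The difference between CPU time and wall clock time can be observed with a small helper built only on the <code>base</code> library. This is a rough sketch for experimentation, not a replacement for criterion; <hask>timed</hask> and <hask>workload</hask> are hypothetical names, and <hask>workload</hask> is a sequential stand-in for a call such as <hask>dotp_wrapper v w</hask>. It assumes a GHC recent enough to provide <hask>GHC.Clock</hask>.

<haskell>
import Control.Exception (evaluate)
import GHC.Clock (getMonotonicTime)
import System.CPUTime (getCPUTime)

-- Run an action and report (result, wall-clock seconds, CPU seconds).
-- The result is forced to weak head normal form inside the timed region,
-- which is enough for a scalar such as a Double.
timed :: IO a -> IO (a, Double, Double)
timed act = do
  wall0 <- getMonotonicTime            -- seconds, monotonic wall clock
  cpu0  <- getCPUTime                  -- picoseconds of CPU time
  r     <- act >>= evaluate
  cpu1  <- getCPUTime
  wall1 <- getMonotonicTime
  let cpuSecs = fromIntegral (cpu1 - cpu0) / 1e12
  return (r, wall1 - wall0, cpuSecs)

-- Hypothetical sequential workload standing in for dotp_wrapper.
workload :: Int -> Double
workload n = sum [fromIntegral i * 0.5 | i <- [1 .. n]]
</haskell>

A call such as <hask>timed (return (workload 1000000))</hask> returns both measurements. With several OS threads, CPU time is accumulated across all of them, so it can exceed the elapsed wall clock time; this is why only the wall clock figure indicates a parallel speed-up.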

A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with just one core and to move to multiple cores only for performance debugging.

Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is often bad as GHC's runtime makes no effort at optimising placement.

=== Further examples and documentation ===

Further examples are available in the [http://darcs.haskell.org/packages/dph/dph-examples/ examples directory of the dph package source]. This code also includes reference implementations for some of the examples that are useful for benchmarking.

The interfaces of the various components of the DPH library are in the [http://hackage.haskell.org/package/dph library documentation] on Hackage (which will be uploaded with the GHC 7.2 DPH release).

=== Designing parallel programs ===

Data Parallel Haskell is a high-level language for coding parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as whether a computation has sufficient parallelism to make exploiting it worthwhile) still rests with the programmer.

DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:
<haskell>
data RTree a = RNode [:RTree a:]
</haskell>
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]
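
How much parallelism such a tree offers depends on how many nodes sit on each level. The following sketch computes that profile for a list-based stand-in of the tree (ordinary lists instead of parallel arrays, so it runs sequentially in plain GHC). The function <hask>widths</hask> and the value <hask>example</hask> are names invented for this illustration.

<haskell>
-- List-based stand-in for the parallel rose tree above.
data RTree a = RNode [RTree a]

-- Number of nodes on each level, from the root downwards. The traversal
-- advances one level per step; all nodes of one level are handled together,
-- which is the part a parallel array version could process in parallel.
widths :: RTree a -> [Int]
widths t = go [t]
  where
    go [] = []
    go ts = length ts : go (concat [cs | RNode cs <- ts])

-- Hypothetical example: a root with three children, one of which has two leaves.
example :: RTree ()
example = RNode [RNode [], RNode [RNode [], RNode []], RNode []]
</haskell>

For <hask>example</hask>, the profile is one node at the root, three on the next level, and two at the bottom. Wide levels offer more work that can be done simultaneously, whereas a deep, narrow tree forces mostly sequential progress.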

For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]

=== Further reading and information on the implementation ===

DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].

For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and would like to know the implementation details and internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].

=== Feedback ===

Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Comments and suggestions are also very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC users' mailing list], or contact the DPH developers directly:
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]

GHC/Data Parallel Haskell

2011-03-31T09:46:10Z

Chak: /* Compiling vectorised code */

[[Category:GHC|Data Parallel Haskell]]
== Data Parallel Haskell ==

''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus on utilising multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell].

<center>
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png
</center>

''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts.''

__TOC__

=== Project status ===

We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.2. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).

The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].

DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.

'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.

=== Where to get it ===

To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.

=== Overview ===

From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, except that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three-element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analogous manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).

The two main differences between lists and parallel arrays are that parallel arrays (1) are a strict data structure and (2) are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions).

As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.

=== A simple example ===

As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:
<haskell>
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.
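
The dot product is flat: it works on one level of arrays. Nesting appears as soon as the elements are themselves collections of varying size, as with the sparse matrices mentioned in the introduction. The following list-based sketch shows the shape of such a computation in plain Haskell; it is a sequential analogue in which ordinary lists stand in for parallel arrays, and the names <hask>SparseRow</hask>, <hask>SparseMatrix</hask>, and <hask>smvm</hask> are chosen for this illustration.

<haskell>
-- A sparse row stores only the non-zero entries as (column index, value).
type SparseRow    = [(Int, Double)]
type SparseMatrix = [SparseRow]

-- Sparse matrix-vector product. The outer comprehension ranges over rows,
-- the inner one over the non-zero entries of a row; rows may differ in length.
-- Column indices are assumed to be valid positions in the dense vector.
smvm :: SparseMatrix -> [Double] -> [Double]
smvm sm v = [sum [x * (v !! i) | (i, x) <- row] | row <- sm]
</haskell>

Each row contributes one inner dot product, and the rows can have very different numbers of entries (including none). This irregularity is what makes load balancing hard in a flat data-parallel model and what nested data parallelism is meant to address.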

=== Running DPH programs ===

Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.

==== No type classes ====

Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.
<haskell>
dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>

==== Special Prelude ====

As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus four extra modules that each support one numeric type: [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.

To compile <hask>dotp_double</hask>, we add the following three import statements:
<haskell>
import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double
</haskell>

==== Impedance matching ====

Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code).

Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.
<haskell>
dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.

==== Compiling vectorised code ====

The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.

Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.

Overall, we get the following complete module definition for the dot-product code:
<haskell>
{-# LANGUAGE ParallelArrays #-}
{-# OPTIONS_GHC -fvectorise #-}

module DotP (dotp_wrapper)
where

import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double

dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]

dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:
<blockquote>
<code>ghc -c -Odph -fdph-par DotP.hs</code>
</blockquote>
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)

==== Using vectorised code ====

Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:
<haskell>
import Data.Array.Parallel.PArray (PArray, fromList)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = let v      = fromList [1..10]    -- convert lists...
        w      = fromList [1,2..20]  -- ...to parallel arrays
        result = dotp_wrapper v w    -- invoke vectorised code
    in
    print result                     -- print the result
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -Odph -fdph-par Main.hs</code>
</blockquote>
and finally link the two modules into an executable <code>dotp</code> with
<blockquote>
<code>ghc -o dotp -threaded -fdph-par -rtsopts DotP.o Main.o</code>
</blockquote>
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend. We include <code>-rtsopts</code> to be able to explicitly determine the number of OS threads used to execute our code.

==== Generating input data ====

To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:
<haskell>
import System.Random (newStdGen)
import Data.Array.Parallel.PArray (PArray, randomRs)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = do
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2
      print $ dotp_wrapper v w   -- invoke vectorised code and print the result
  where
    n     = 10000        -- vector length
    range = (-100, 100)  -- range of vector elements
</haskell>
We compile and link the program as described above.

'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose. For a variant of the dot-product example code that determines the CPU time consumed by <hask>dotp_wrapper</hask>, see [[GHC/Data Parallel Haskell/MainTimed|timed dot product]].
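
As a rough illustration of what "fully evaluated before taking the time" means, the following sketch uses ordinary lists instead of <hask>PArray</hask> values, so it does not depend on DPH at all. The names <hask>forceList</hask> and <hask>prepare</hask> are made up for this example and are not part of the DPH library; only <hask>Control.Exception</hask> from base is assumed.
<haskell>
import Control.Exception (evaluate)

-- | Evaluate every element of a list and hand the list back.
-- For 'Double', weak head normal form is already full evaluation.
-- (Hypothetical helper; a list-based stand-in for forcing a PArray.)
forceList :: [Double] -> IO [Double]
forceList xs = do
  _ <- evaluate (foldr seq () xs)   -- walk the list, forcing each element
  return xs

-- | Build two input vectors and force them, so that later timing code
-- does not accidentally measure their construction.
prepare :: Int -> IO ([Double], [Double])
prepare n = do
  v <- forceList (map fromIntegral [1 .. n])
  w <- forceList (replicate n 2)
  return (v, w)
</haskell>
Any timing would start only after <hask>prepare</hask> has returned; at that point both lists are in memory and contain no unevaluated elements.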

==== Parallel execution ====

By default, a Haskell program uses only one OS thread, and hence, also only one CPU core for execution. To use multiple cores, we need to invoke the executable with an explicit RTS command line option — e.g., <code>./dotp +RTS -N2</code> uses two cores. (Strictly speaking, it uses two OS threads, which will be scheduled on two separate cores if available.) To determine the runtime of parallel code, measuring CPU time, as demonstrated in the [[GHC/Data Parallel Haskell/MainTimed|timed variant of the dot product example]], is not sufficient anymore. We need to measure wall clock times instead. For proper benchmarking, it is advisable to use a library, such as [http://hackage.haskell.org/package/criterion criterion].

A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with just one core and to move to multiple cores only for performance debugging.

Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is often bad as GHC's runtime makes no effort at optimising placement.

=== Further examples and documentation ===

Further examples are available in the [http://darcs.haskell.org/packages/dph/dph-examples/ examples directory of the package dph source]. This code also includes reference implementations for some of the examples that are useful for benchmarking.

The interfaces of the various components of the DPH library are in the [http://hackage.haskell.org/package/dph library documentation] on Hackage (which will be uploaded with the GHC 7.2 DPH release).

=== Designing parallel programs ===

Data Parallel Haskell is a high-level language for coding parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and for many performance considerations (such as deciding when a computation has enough parallelism to make exploiting it worthwhile) remains with the programmer.

DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:
<haskell>
data RTree a = RNode [:RTree a:]
</haskell>
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]
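
To make the traversal pattern concrete, here is a small sketch that models the rose tree with ordinary lists, so that it runs without DPH. It is an illustration under stated assumptions, not DPH code: the element label on each node, and the names <hask>sumTree</hask>, <hask>levels</hask>, and <hask>example</hask>, are additions made for this example. With parallel arrays, the <hask>map</hask> and <hask>sum</hask> over the children would be the corresponding parallel operations.
<haskell>
-- List-based stand-in for the parallel rose tree. The label field is an
-- assumption added for illustration; the definition above has none.
data RTree a = RNode a [RTree a]

-- | Sum all labels. The descent from a node to its children is sequential,
-- but the work over the children of one node is a single aggregate operation.
sumTree :: Num a => RTree a -> a
sumTree (RNode x kids) = x + sum (map sumTree kids)

-- | Labels grouped by depth, making the level-by-level order explicit.
levels :: RTree a -> [[a]]
levels t = go [t]
  where
    go []    = []
    go nodes = [x | RNode x _ <- nodes]
             : go (concat [ks | RNode _ ks <- nodes])

example :: RTree Int
example = RNode 1 [RNode 2 [RNode 4 []], RNode 3 []]
</haskell>
For this tree, <hask>sumTree example</hask> is <hask>10</hask> and <hask>levels example</hask> is <hask>[[1],[2,3],[4]]</hask>.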

For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]

=== Further reading and information on the implementation ===

DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].

For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and would like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].

=== Feedback ===

Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC users' mailing list], or contact the DPH developers directly:
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]

GHC/Data Parallel Haskell

2011-01-25T05:50:02Z

Chak: /* Parallel execution */

[[Category:GHC|Data Parallel Haskell]]
== Data Parallel Haskell ==

''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus on utilising multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell].

<center>
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png
</center>

''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''

__TOC__

=== Project status ===

We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.2. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).

The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].

DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.

'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.

=== Where to get it ===

To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.

=== Overview ===

From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analogous manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).

The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions).

As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.
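
The O(log ''n'') step complexity and the associativity requirement can be seen in a small list-based model of an undirected fold. This is only a sketch of the idea, not how <hask>foldP</hask> is implemented: the names <hask>pairUp</hask> and <hask>treeFold</hask> are invented for this example, and the code runs sequentially on ordinary lists. Each call to <hask>pairUp</hask> stands for one step in which all pairs could be combined at the same time.
<haskell>
-- | Combine neighbouring elements pairwise: one conceptual parallel step.
pairUp :: (a -> a -> a) -> [a] -> [a]
pairUp f (x : y : rest) = f x y : pairUp f rest
pairUp _ xs             = xs        -- zero or one element left over

-- | Reduce a list by repeated pairwise combination. Returns the result
-- together with the number of steps that were needed.
treeFold :: (a -> a -> a) -> a -> [a] -> (a, Int)
treeFold f z xs = go 0 xs
  where
    go n []  = (z, n)
    go n [x] = (x, n)
    go n ys  = go (n + 1) (pairUp f ys)   -- each step halves the length
</haskell>
For example, <hask>treeFold (+) 0 [1..8]</hask> yields <hask>(36, 3)</hask>: eight elements are reduced in three steps. Because the elements are grouped differently than in a left-to-right fold, the result agrees with <hask>foldl</hask> only if the function is associative; with subtraction, <hask>treeFold (-) 0 [1..8]</hask> gives <hask>0</hask>, whereas <hask>foldl1 (-) [1..8]</hask> gives <hask>-34</hask>.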

=== A simple example ===

As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:
<haskell>
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.
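
For readers who want to experiment without a DPH-enabled compiler, the same two formulations can be tried on ordinary lists. This is a sequential stand-in under the assumption that GHC's <hask>ParallelListComp</hask> extension is available; the names <hask>dotpList</hask> and <hask>dotpZip</hask> are made up for this example.
<haskell>
{-# LANGUAGE ParallelListComp #-}

-- | List analogue of the dot product, using a parallel list comprehension.
-- The two generators are drawn from in lockstep, not nested.
dotpList :: Num a => [a] -> [a] -> a
dotpList xs ys = sum [x * y | x <- xs | y <- ys]

-- | The same function written with an explicit zip.
dotpZip :: Num a => [a] -> [a] -> a
dotpZip xs ys = sum [x * y | (x, y) <- zip xs ys]
</haskell>
Both <hask>dotpList [1,2,3] [4,5,6]</hask> and <hask>dotpZip [1,2,3] [4,5,6]</hask> evaluate to <hask>32</hask>. As with <hask>zip</hask>, the list versions stop at the end of the shorter input.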

=== Running DPH programs ===

Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.

==== No type classes ====

Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.
<haskell>
dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>

==== Special Prelude ====

As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus four extra modules, each supporting one numeric type: [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.

To compile <hask>dotp_double</hask>, we add the following three import statements:
<haskell>
import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double
</haskell>

==== Impedance matching ====

Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code).

Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.
<haskell>
dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.

==== Compiling vectorised code ====

The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.

Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.

Overall, we get the following complete module definition for the dot-product code:
<haskell>
{-# LANGUAGE ParallelArrays #-}
{-# OPTIONS_GHC -fvectorise #-}

module DotP (dotp_wrapper)
where

import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double

dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]

dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:
<blockquote>
<code>ghc -c -Odph -fdph-par DotP.hs</code>
</blockquote>
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)

==== Using vectorised code ====

Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:
<haskell>
import Data.Array.Parallel.PArray (PArray, fromList)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = let v      = fromList [1..10]    -- convert lists...
        w      = fromList [1,2..20]  -- ...to parallel arrays
        result = dotp_wrapper v w    -- invoke vectorised code
    in
    print result                     -- print the result
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -Odph -fdph-par Main.hs</code>
</blockquote>
and finally link the two modules into an executable <code>dotp</code> with
<blockquote>
<code>ghc -o dotp -threaded -fdph-par -rtsopts DotP.o Main.o</code>
</blockquote>
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend. We include <code>-rtsopts</code> to be able to explicitly determine the number of OS threads used to execute our code.

==== Generating input data ====

To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:
<haskell>
import System.Random (newStdGen)
import Data.Array.Parallel.PArray (PArray, randomRs)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = do
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2
      print $ dotp_wrapper v w   -- invoke vectorised code and print the result
  where
    n     = 10000        -- vector length
    range = (-100, 100)  -- range of vector elements
</haskell>
We compile and link the program as described above.

'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose. For a variant of the dot-product example code that determines the CPU time consumed by <hask>dotp_wrapper</hask>, see [[GHC/Data Parallel Haskell/MainTimed|timed dot product]].

==== Parallel execution ====

By default, a Haskell program uses only one OS thread, and hence, also only one CPU core for execution. To use multiple cores, we need to invoke the executable with an explicit RTS command line option — e.g., <code>./dotp +RTS -N2</code> uses two cores. (Strictly speaking, it uses two OS threads, which will be scheduled on two separate cores if available.) To determine the runtime of parallel code, measuring CPU time, as demonstrated in the [[GHC/Data Parallel Haskell/MainTimed|timed variant of the dot product example]], is not sufficient anymore. We need to measure wall clock times instead. For proper benchmarking, it is advisable to use a library, such as [http://hackage.haskell.org/package/criterion criterion].
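
The difference between the two kinds of measurement can be sketched with functions from base alone. The helper names <hask>timed</hask> and <hask>timedPure</hask> are invented for this example, and the sketch assumes a base library recent enough to provide <hask>GHC.Clock</hask>; it is meant as a quick experiment, not as a replacement for a benchmarking library.
<haskell>
import Control.Exception (evaluate)
import GHC.Clock (getMonotonicTime)
import System.CPUTime (getCPUTime)

-- | Run an action and report (result, wall-clock seconds, CPU seconds).
timed :: IO a -> IO (a, Double, Double)
timed act = do
  wall0 <- getMonotonicTime      -- seconds on a monotonic clock
  cpu0  <- getCPUTime            -- picoseconds of CPU time for the process
  r     <- act
  cpu1  <- getCPUTime
  wall1 <- getMonotonicTime
  let cpuSecs = fromIntegral (cpu1 - cpu0) * 1e-12
  return (r, wall1 - wall0, cpuSecs)

-- | Time a pure computation by forcing it to weak head normal form.
timedPure :: a -> IO (a, Double, Double)
timedPure x = timed (evaluate x)
</haskell>
CPU time is accumulated over all threads of the process, so when a program runs on several cores the CPU figure can exceed the wall-clock figure; only the latter shows whether adding cores made the program finish sooner.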

A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with just one core and to move to multiple cores only for performance debugging.

Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is often bad as GHC's runtime makes no effort at optimising placement.

=== Further examples and documentation ===

Further examples are available in the [http://darcs.haskell.org/packages/dph/dph-examples/ examples directory of the package dph source]. This code also includes reference implementations for some of the examples that are useful for benchmarking.

The interfaces of the various components of the DPH library are in the [http://hackage.haskell.org/package/dph library documentation] on Hackage (which will be uploaded with the GHC 7.2 DPH release).

=== Designing parallel programs ===

Data Parallel Haskell is a high-level language for coding parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and for many performance considerations (such as deciding when a computation has enough parallelism to make exploiting it worthwhile) remains with the programmer.

DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:
<haskell>
data RTree a = RNode [:RTree a:]
</haskell>
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]

For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]

=== Further reading and information on the implementation ===

DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].

For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and would like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].

=== Feedback ===

Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC users' mailing list], or contact the DPH developers directly:
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]

GHC/Data Parallel Haskell

2011-01-25T05:37:26Z

Chak: /* Further examples */

[[Category:GHC|Data Parallel Haskell]]
== Data Parallel Haskell ==

''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus on utilising multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell].

<center>
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png
</center>

''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''

__TOC__

=== Project status ===

We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.2. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).

The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].

DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.

'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.

=== Where to get it ===

To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.

=== Overview ===

From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analogous manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).

The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)

As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.
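To see why <hask>foldP</hask> insists on an associative function, it can help to model an undirected fold on ordinary lists. The following is only a sequential sketch under that assumption; <hask>pairUp</hask> and <hask>treeFold</hask> are hypothetical names and not part of the DPH library. Each pass combines neighbouring elements, so a list of length ''n'' is reduced in roughly log ''n'' passes, and all combinations within one pass are independent of each other.

<haskell>
-- Combine neighbouring elements once; an odd element at the end is kept.
pairUp :: (a -> a -> a) -> [a] -> [a]
pairUp f (x : y : rest) = f x y : pairUp f rest
pairUp _ xs             = xs

-- Tree-shaped fold: repeat pairUp until a single element is left.
-- The result agrees with a left or right fold only if f is associative.
treeFold :: (a -> a -> a) -> a -> [a] -> a
treeFold _ z []  = z
treeFold _ _ [x] = x
treeFold f z xs  = treeFold f z (pairUp f xs)
</haskell>

With an associative operator the grouping does not matter, so <hask>treeFold (+) 0 [1..10]</hask> and <hask>sum [1..10]</hask> agree. With a non-associative operator such as <hask>(-)</hask>, the tree-shaped and the left-to-right results differ, which is why a directed fold has no meaningful parallel counterpart.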

=== A simple example ===

As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:
<haskell>
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.

=== Running DPH programs ===

Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.
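The idea of turning nested into flat data parallelism can be illustrated without DPH. In the following list-based sketch, a nested array is stored as one flat data array together with the lengths of its subarrays. The names <hask>Segmented</hask>, <hask>flatten</hask>, and <hask>segmentedSum</hask> are hypothetical, and the representation is a simplification for illustration, not the one used by the vectoriser.

<haskell>
-- A nested array modelled as segment lengths plus one flat data array.
data Segmented a = Segmented
  { segLengths :: [Int]  -- length of each subarray
  , flatData   :: [a]    -- all elements, concatenated
  } deriving (Eq, Show)

-- Flatten a list of lists into the segmented representation.
flatten :: [[a]] -> Segmented a
flatten xss = Segmented (map length xss) (concat xss)

-- Sum every subarray by walking the flat data once, segment by segment.
segmentedSum :: Num a => Segmented a -> [a]
segmentedSum (Segmented lens xs) = go lens xs
  where
    go []       _  = []
    go (n : ns) ys = let (seg, rest) = splitAt n ys
                     in  sum seg : go ns rest
</haskell>

Because the work is expressed over the flat array, it can in principle be divided evenly among threads even when the subarrays have very different lengths.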

==== No type classes ====

Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.
<haskell>
dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>

==== Special Prelude ====

As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus four extra modules to support one numeric type each: [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.

To compile <hask>dotp_double</hask>, we add the following three import statements:
<haskell>
import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double
</haskell>

==== Impedance matching ====

Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code).

Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.
<haskell>
dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.

==== Compiling vectorised code ====

The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.

Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.

Overall, we get the following complete module definition for the dot-product code:
<haskell>
{-# LANGUAGE ParallelArrays #-}
{-# OPTIONS_GHC -fvectorise #-}

module DotP (dotp_wrapper)
where

import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double

dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]

dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:
<blockquote>
<code>ghc -c -Odph -fdph-par DotP.hs</code>
</blockquote>
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)

==== Using vectorised code ====

Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:
<haskell>
import Data.Array.Parallel.PArray (PArray, fromList)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = let v      = fromList [1..10]      -- convert lists...
        w      = fromList [1,2..20]    -- ...to parallel arrays
        result = dotp_wrapper v w      -- invoke vectorised code
    in
    print result                       -- print the result
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -Odph -fdph-par Main.hs</code>
</blockquote>
and finally link the two modules into an executable <code>dotp</code> with
<blockquote>
<code>ghc -o dotp -threaded -fdph-par -rtsopts DotP.o Main.o</code>
</blockquote>
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend. We include <code>-rtsopts</code> to be able to explicitly determine the number of OS threads used to execute our code.

==== Generating input data ====

To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:
<haskell>
import System.Random (newStdGen)
import Data.Array.Parallel.PArray (PArray, randomRs)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = do
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2
      print $ dotp_wrapper v w   -- invoke vectorised code and print the result
  where
    n     = 10000        -- vector length
    range = (-100, 100)  -- range of vector elements
</haskell>
We compile and link the program as described above.

'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose. For a variant of the dot-product example code that determines the CPU time consumed by <hask>dotp_wrapper</hask>, see [[GHC/Data Parallel Haskell/MainTimed|timed dot product]].

==== Parallel execution ====

By default, a Haskell program uses only one OS thread, and hence, also only one CPU core for execution. To use multiple cores, we need to invoke the executable with an explicit RTS command line option — e.g., <code>./dotp +RTS -N2</code> uses two cores. (Strictly speaking, it uses two OS threads, which will be scheduled on two separate cores if available.)

A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with just one core and to move to multiple cores only for performance debugging.

Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is often bad as GHC's runtime makes no effort at optimising placement.

=== Further examples and documentation ===

Further examples are available in the [http://darcs.haskell.org/packages/dph/dph-examples/ examples directory of the package dph source]. This code also includes reference implementations for some of the examples, which are useful for benchmarking.

The interfaces of the various components of the DPH library are in the [http://hackage.haskell.org/package/dph library documentation] on Hackage (which will be uploaded with the GHC 7.2 DPH release).

=== Designing parallel programs ===

Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.

DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:
<haskell>
data RTree a = RNode [:RTree a:]
</haskell>
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]
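The traversal pattern described above can be tried out in plain Haskell by replacing the parallel array with a list. This is only a sequential stand-in under that assumption; <hask>size</hask> and <hask>height</hask> are hypothetical helper names. In a DPH version, the <hask>map</hask> and <hask>sum</hask> over the children would correspond to operations such as <hask>mapP</hask> and <hask>sumP</hask>.

<haskell>
-- List-based stand-in for the parallel rose tree; [ ] replaces [: :].
data RTree a = RNode [RTree a]

-- Number of nodes: the recursion over levels is sequential, while the
-- map over the children is the part a parallel array could run in parallel.
size :: RTree a -> Int
size (RNode kids) = 1 + sum (map size kids)

-- Height of the tree, with a single node counting as 1.
height :: RTree a -> Int
height (RNode kids) = 1 + maximum (0 : map height kids)
</haskell>

The number of sequential steps follows the height of the tree, whereas the work available at each step follows the number of children, so wide, shallow trees offer the most parallelism.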

For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]

=== Further reading and information on the implementation ===

DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].

For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].

=== Feedback ===

Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC users' mailing list], or contact the DPH developers directly:
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]

GHC/Data Parallel Haskell

2011-01-25T05:27:48Z

Chak: /* Parallel execution */

[[Category:GHC|Data Parallel Haskell]]
== Data Parallel Haskell ==

''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus on utilising multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell].

<center>
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png
</center>

''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''

__TOC__

=== Project status ===

We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.2. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).

The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].

DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.

'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.

=== Where to get it ===

To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.

=== Overview ===

From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, except that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three-element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analogous manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).

The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)

As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.

=== A simple example ===

As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:
<haskell>
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.
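For readers who want to experiment with the two notations in a stock GHC without DPH, the same computation can be written on ordinary lists. This is a sequential analogue only; <hask>dotpList</hask> and <hask>dotpZip</hask> are hypothetical names, and the list version needs the <hask>ParallelListComp</hask> extension.

<haskell>
{-# LANGUAGE ParallelListComp #-}

-- Dot product with a parallel list comprehension:
-- both generators advance in lockstep.
dotpList :: [Double] -> [Double] -> Double
dotpList xs ys = sum [x * y | x <- xs | y <- ys]

-- The same computation written with an explicit zip.
dotpZip :: [Double] -> [Double] -> Double
dotpZip xs ys = sum [x * y | (x, y) <- zip xs ys]
</haskell>

Both functions pair up elements by position and stop at the end of the shorter list, so they give the same result for any pair of finite inputs.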

=== Running DPH programs ===

Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.

==== No type classes ====

Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.
<haskell>
dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>

==== Special Prelude ====

As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus four extra modules to support one numeric type each: [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.

To compile <hask>dotp_double</hask>, we add the following three import statements:
<haskell>
import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double
</haskell>

==== Impedance matching ====

Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code).

Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.
<haskell>
dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.

==== Compiling vectorised code ====

The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.

Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.

Overall, we get the following complete module definition for the dot-product code:
<haskell>
{-# LANGUAGE ParallelArrays #-}
{-# OPTIONS_GHC -fvectorise #-}

module DotP (dotp_wrapper)
where

import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double

dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]

dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:
<blockquote>
<code>ghc -c -Odph -fdph-par DotP.hs</code>
</blockquote>
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)

==== Using vectorised code ====

Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:
<haskell>
import Data.Array.Parallel.PArray (PArray, fromList)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = let v      = fromList [1..10]      -- convert lists...
        w      = fromList [1,2..20]    -- ...to parallel arrays
        result = dotp_wrapper v w      -- invoke vectorised code
    in
    print result                       -- print the result
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -Odph -fdph-par Main.hs</code>
</blockquote>
and finally link the two modules into an executable <code>dotp</code> with
<blockquote>
<code>ghc -o dotp -threaded -fdph-par -rtsopts DotP.o Main.o</code>
</blockquote>
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend. We include <code>-rtsopts</code> to be able to explicitly determine the number of OS threads used to execute our code.

==== Generating input data ====

To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:
<haskell>
import System.Random (newStdGen)
import Data.Array.Parallel.PArray (PArray, randomRs)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = do
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2
      print $ dotp_wrapper v w   -- invoke vectorised code and print the result
  where
    n     = 10000        -- vector length
    range = (-100, 100)  -- range of vector elements
</haskell>
We compile and link the program as described above.

'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose. For a variant of the dot-product example code that determines the CPU time consumed by <hask>dotp_wrapper</hask>, see [[GHC/Data Parallel Haskell/MainTimed|timed dot product]].
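The general pattern of forcing the input before starting the clock can be sketched with ordinary lists and functions from <code>base</code>. This is not the code of the linked timed dot product; <hask>forceDoubles</hask> and <hask>timed</hask> are hypothetical helpers, and a plain list stands in for a <hask>PArray</hask>.

<haskell>
import Control.Exception (evaluate)
import Data.List (foldl')
import System.CPUTime (getCPUTime)

-- Force every element of a list of Doubles by summing it strictly.
forceDoubles :: [Double] -> IO ()
forceDoubles xs = evaluate (foldl' (+) 0 xs) >> return ()

-- Run an action and return its result together with the CPU time
-- it consumed, in picoseconds.
timed :: IO a -> IO (a, Integer)
timed act = do
  start  <- getCPUTime
  result <- act
  end    <- getCPUTime
  return (result, end - start)
</haskell>

A benchmark would first call <hask>forceDoubles</hask> on each input vector and only then wrap the computation of interest in <hask>timed</hask>, using <hask>evaluate</hask> inside the timed action so that the result is actually computed before the second clock reading.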

==== Parallel execution ====

By default, a Haskell program uses only one OS thread, and hence, also only one CPU core for execution. To use multiple cores, we need to invoke the executable with an explicit RTS command line option — e.g., <code>./dotp +RTS -N2</code> uses two cores. (Strictly speaking, it uses two OS threads, which will be scheduled on two separate cores if available.)

A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with just one core and to move to multiple cores only for performance debugging.
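When debugging performance, it can be useful to have the program report how many threads the runtime was actually given. The following sketch uses <hask>getNumCapabilities</hask> from <hask>Control.Concurrent</hask>, which is part of <code>base</code> in recent GHC versions rather than of DPH; <hask>describeCapabilities</hask> is a hypothetical helper name.

<haskell>
import Control.Concurrent (getNumCapabilities)

-- Report how many Haskell threads can run simultaneously,
-- that is, the value selected with +RTS -N.
describeCapabilities :: IO String
describeCapabilities = do
  n <- getNumCapabilities
  return ("running on " ++ show n ++ " capabilities")
</haskell>

Printing this string at program start makes it easy to confirm that a run intended for several cores was not accidentally started without the RTS option.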

Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is often bad as GHC's runtime makes no effort at optimising placement.

=== Further examples ===

Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.

The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].

=== Designing parallel programs ===

Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.

DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:
<haskell>
data RTree a = RNode [:RTree a:]
</haskell>
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]

For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]

=== Further reading and information on the implementation ===

DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].

For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and would like to know the implementation details and internals of the Data Parallel Haskell project, much of this is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].

=== Feedback ===

Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC users' mailing list], or contact the DPH developers directly:
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]

GHC/Data Parallel Haskell

2011-01-25T04:54:44Z

Chak: /* Using vectorised code */

[[Category:GHC|Data Parallel Haskell]]
== Data Parallel Haskell ==

''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus on utilising multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell].

<center>
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png
</center>

''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''

__TOC__

=== Project status ===

We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.2. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).

The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].

DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.

'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.

=== Where to get it ===

To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.

=== Overview ===

From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, except that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three-element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analogous manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).

The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) they are not inductively defined. Parallel arrays are strict in that demanding a single element demands all elements of the array. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions).
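
The strictness difference can be imitated with ordinary lists. The following is only a sketch in plain Haskell, not DPH code: the helper <hask>forceAll</hask> is a hypothetical name introduced here for illustration, and it forces elements only to weak head normal form.
<haskell>
-- Hypothetical helper (not part of DPH): force every element of a
-- finite list to weak head normal form before handing the list back.
-- This mimics "demanding one element demands all elements".
forceAll :: [a] -> [a]
forceAll xs = foldr seq xs xs

-- A lazy list tolerates an undefined element that is never inspected.
lazyHead :: Int
lazyHead = head [1, undefined]

-- The strict variant would diverge, because the undefined element is
-- forced before the head is returned:
--   head (forceAll [1, undefined])
</haskell>
Note that <hask>forceAll</hask> never returns when applied to an infinite list, since it has to visit every element first.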

As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.
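
To see why an undirected fold needs an associative function, consider a pairwise reduction over an ordinary list. This is a sequential sketch in plain Haskell, not the DPH implementation of <hask>foldP</hask>; the name <hask>treeFold</hask> is an assumption made for this illustration. Each pass combines neighbouring elements, so the list length roughly halves per pass and about log<sub>2</sub> ''n'' passes are needed; within one pass, all combinations are independent and could run in parallel.
<haskell>
-- Hypothetical helper: reduce a list by repeatedly combining
-- neighbouring pairs. The second argument is returned only for the
-- empty list.
treeFold :: (a -> a -> a) -> a -> [a] -> a
treeFold _ z []  = z
treeFold _ _ [x] = x
treeFold f z xs  = treeFold f z (pairUp xs)
  where
    -- One pass: combine elements two at a time; a trailing odd
    -- element is carried over unchanged.
    pairUp (a:b:rest) = f a b : pairUp rest
    pairUp rest       = rest

-- With an associative function, the grouping does not matter:
--   treeFold (+) 0 [1..8]   and   foldl (+) 0 [1..8]   are both 36.
-- With a non-associative function, it does:
--   treeFold (-) 0 [1,2,3,4]  is  (1-2) - (3-4) = 0
--   foldl1   (-)   [1,2,3,4]  is  ((1-2)-3)-4   = -8
</haskell>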

=== A simple example ===

As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:
<haskell>
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.
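
For readers without a DPH-enabled compiler, the same computation can be tried on ordinary lists. The following is a sequential sketch in plain Haskell; the names <hask>dotpList</hask> and <hask>dotpZip</hask> are chosen here for illustration and are not part of DPH.
<haskell>
{-# LANGUAGE ParallelListComp #-}

-- List-based analogue of dotp, using a parallel list comprehension.
dotpList :: Num a => [a] -> [a] -> a
dotpList xs ys = sum [x * y | x <- xs | y <- ys]

-- The same function written with an explicit zip.
dotpZip :: Num a => [a] -> [a] -> a
dotpZip xs ys = sum [x * y | (x, y) <- zip xs ys]
</haskell>
Both definitions compute, for example, <hask>1*4 + 2*5 + 3*6 = 32</hask> for the inputs <hask>[1,2,3]</hask> and <hask>[4,5,6]</hask>.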

=== Running DPH programs ===

Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.

==== No type classes ====

Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.
<haskell>
dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>

==== Special Prelude ====

As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus four extra modules to support one numeric type each: [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types), and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.

To compile <hask>dotp_double</hask>, we add the following three import statements:
<haskell>
import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double
</haskell>

==== Impedance matching ====

Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code).

Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.
<haskell>
dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.

==== Compiling vectorised code ====

The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.

Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.

Overall, we get the following complete module definition for the dot-product code:
<haskell>
{-# LANGUAGE ParallelArrays #-}
{-# OPTIONS_GHC -fvectorise #-}

module DotP (dotp_wrapper)
where

import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double

dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]

dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:
<blockquote>
<code>ghc -c -Odph -fdph-par DotP.hs</code>
</blockquote>
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)

==== Using vectorised code ====

Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:
<haskell>
import Data.Array.Parallel.PArray (PArray, fromList)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = let v      = fromList [1..10]     -- convert lists...
        w      = fromList [2,4..20]   -- ...to parallel arrays of the same length
        result = dotp_wrapper v w     -- invoke vectorised code
    in
    print result                      -- print the result
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -Odph -fdph-par Main.hs</code>
</blockquote>
and finally link the two modules into an executable <code>dotp</code> with
<blockquote>
<code>ghc -o dotp -threaded -fdph-par -rtsopts DotP.o Main.o</code>
</blockquote>
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend. We include <code>-rtsopts</code> to be able to explicitly determine the number of OS threads used to execute our code.

==== Generating input data ====

To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:
<haskell>
import System.Random (newStdGen)
import Data.Array.Parallel.PArray (PArray, randomRs)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = do
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2
      print $ dotp_wrapper v w   -- invoke vectorised code and print the result
  where
    n     = 10000        -- vector length
    range = (-100, 100)  -- range of vector elements
</haskell>
We compile and link the program as described above.

'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose. For a variant of the dot-product example code that determines the CPU time consumed by <hask>dotp_wrapper</hask>, see [[GHC/Data Parallel Haskell/MainTimed|timed dot product]].

==== Parallel execution ====

The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code.

In the above compiler invocations, we used the option <code>-fdph-par</code> to select the <code>dph-par</code> flavour, which generates multi-threaded code. We can just as well compile with <code>-fdph-seq</code> to select the sequential <code>dph-seq</code> flavour. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute a program built with <code>dph-par</code>. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.

Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.

=== Further examples ===

Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.

The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].

=== Designing parallel programs ===

Data Parallel Haskell is a high-level language for coding parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and for many performance considerations (such as deciding when a computation has sufficient parallelism to make exploiting it worthwhile) still rests with the programmer.

DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:
<haskell>
data RTree a = RNode [:RTree a:]
</haskell>
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]
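
The traversal pattern can be sketched with ordinary lists standing in for parallel arrays. This is plain sequential Haskell, not DPH code; the functions <hask>size</hask> and <hask>depth</hask> are hypothetical examples. In a DPH version, the per-node <hask>map</hask> and <hask>sum</hask> would presumably become <hask>mapP</hask> and <hask>sumP</hask>, so that the children of one node are processed in parallel while the recursion itself descends level by level.
<haskell>
-- List-based stand-in for the parallel rose tree defined above.
data RTree a = RNode [RTree a]

-- Number of nodes: one for this node plus the sizes of all children.
size :: RTree a -> Int
size (RNode cs) = 1 + sum (map size cs)

-- Number of levels: a node without children has depth 1.
depth :: RTree a -> Int
depth (RNode cs) = 1 + maximum (0 : map depth cs)

-- A small example tree with four nodes on three levels.
example :: RTree ()
example = RNode [RNode [], RNode [RNode []]]
</haskell>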

For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]

=== Further reading and information on the implementation ===

DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].

For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and would like to know the implementation details and internals of the Data Parallel Haskell project, much of this is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].

=== Feedback ===

Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC users' mailing list], or contact the DPH developers directly:
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]

GHC/Data Parallel Haskell/MainTimed

2011-01-25T04:34:57Z

Chak: New page: The following variant of the main module for the dot product example determines and prints the runtime of the dot product kernel in microseconds. <haskell> import System.CPUTime (getCPUTim...

The following variant of the main module for the dot product example determines and prints the runtime of the dot product kernel in microseconds.
<haskell>
import System.CPUTime (getCPUTime)
import System.Random (newStdGen)
import Control.Exception (evaluate)
import Data.Array.Parallel.PArray (PArray, randomRs, nf)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = do
      -- generate random input vectors
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2

      -- force the evaluation of the input vectors
      evaluate $ nf v
      evaluate $ nf w

      -- timed computations
      start <- getCPUTime
      let result = dotp_wrapper v w
      evaluate result
      end   <- getCPUTime

      -- print the result (getCPUTime reports picoseconds)
      putStrLn $ show result ++ " in " ++ show ((end - start) `div` 1000000) ++ "us"
  where
    n     = 10000        -- vector length
    range = (-100, 100)  -- range of vector elements
</haskell>
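
The timing pattern used above can be factored into a small helper that depends only on the <code>base</code> library. This is a sketch under assumptions: the names <hask>picosToMicros</hask> and <hask>timedEval</hask> are introduced here for illustration, and the helper evaluates its argument only to weak head normal form, which is enough for a <hask>Double</hask> result but not for a lazy data structure.
<haskell>
import Control.Exception (evaluate)
import System.CPUTime (getCPUTime)

-- getCPUTime reports picoseconds; one microsecond is 10^6 picoseconds.
picosToMicros :: Integer -> Integer
picosToMicros ps = ps `div` 1000000

-- Hypothetical helper: evaluate a value to weak head normal form and
-- return it together with the CPU time consumed, in microseconds.
timedEval :: a -> IO (a, Integer)
timedEval x = do
  start  <- getCPUTime
  result <- evaluate x
  end    <- getCPUTime
  return (result, picosToMicros (end - start))
</haskell>
With this helper, the timed part of the main module could be written as <hask>(result, us) <- timedEval (dotp_wrapper v w)</hask>.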

GHC/Data Parallel Haskell

2011-01-25T04:32:51Z

Chak: /* Generating input data */

[[Category:GHC|Data Parallel Haskell]]
== Data Parallel Haskell ==

''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus on utilising multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell].

<center>
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png
</center>

''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''

__TOC__

=== Project status ===

We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.2. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).

The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].

DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.

'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.

=== Where to get it ===

To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.

=== Overview ===

From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, except that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three-element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analogous manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).

The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) they are not inductively defined. Parallel arrays are strict in that demanding a single element demands all elements of the array. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions).

As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.

=== A simple example ===

As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:
<haskell>
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.

=== Running DPH programs ===

Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.

==== No type classes ====

Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.
<haskell>
dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>

==== Special Prelude ====

As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus four extra modules to support one numeric type each: [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types), and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.

To compile <hask>dotp_double</hask>, we add the following three import statements:
<haskell>
import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double
</haskell>

==== Impedance matching ====

Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code).

Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.
<haskell>
dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.

==== Compiling vectorised code ====

The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.

Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.

Overall, we get the following complete module definition for the dot-product code:
<haskell>
{-# LANGUAGE ParallelArrays #-}
{-# OPTIONS_GHC -fvectorise #-}

module DotP (dotp_wrapper)
where

import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double

dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]

dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:
<blockquote>
<code>ghc -c -Odph -fdph-par DotP.hs</code>
</blockquote>
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)

==== Using vectorised code ====

Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:
<haskell>
import Data.Array.Parallel.PArray (PArray, fromList)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = let v      = fromList [1..10]    -- convert lists...
        w      = fromList [1,2..20]  -- ...to parallel arrays
        result = dotp_wrapper v w    -- invoke vectorised code
    in
    print result                     -- print the result
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -Odph -fdph-par Main.hs</code>
</blockquote>
and finally link the two modules into an executable <code>dotp</code> with
<blockquote>
<code>ghc -o dotp -threaded -fdph-par DotP.o Main.o</code>
</blockquote>
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend.

==== Generating input data ====

To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:
<haskell>
import System.Random (newStdGen)
import Data.Array.Parallel.PArray (PArray, randomRs)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = do
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2
      print $ dotp_wrapper v w   -- invoke vectorised code and print the result
  where
    n     = 10000        -- vector length
    range = (-100, 100)  -- range of vector elements
</haskell>
We compile and link the program as described above.

'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose. For a variant of the dot-product example code that determines the CPU time consumed by <hask>dotp_wrapper</hask>, see [[GHC/Data Parallel Haskell/MainTimed|timed dot product]].
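
The idea of forcing the inputs before starting the clock can be sketched without DPH. The following is only a minimal model: it uses ordinary lists instead of <hask>PArray</hask>, and <hask>evaluate (sum xs)</hask> in place of <hask>nf</hask>. The names <hask>dotpList</hask> and <hask>timedDotp</hask> are made up for this illustration and are not part of the DPH API.
<haskell>
import Control.Exception (evaluate)
import System.CPUTime    (getCPUTime)

-- List-based stand-in for dotp_wrapper (assumption: plain lists, not PArray).
dotpList :: [Double] -> [Double] -> Double
dotpList xs ys = sum (zipWith (*) xs ys)

-- Force both inputs first, then measure only the dot product.
-- Returns the result and the CPU time consumed, in picoseconds.
timedDotp :: [Double] -> [Double] -> IO (Double, Integer)
timedDotp v w = do
  _      <- evaluate (sum v)         -- summing demands every element of v
  _      <- evaluate (sum w)         -- likewise for w
  start  <- getCPUTime
  result <- evaluate (dotpList v w)  -- the computation we want to time
  end    <- getCPUTime
  return (result, end - start)
</haskell>
With this structure, the cost of generating the input data is paid before <hask>start</hask> is taken, so it does not show up in the reported time.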

==== Parallel execution ====

The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code.

In the above compiler invocations, we used the option <code>-fdph-par</code> to select the <code>dph-par</code> flavour, which generates multi-threaded code. We can equally compile and link with <code>-fdph-seq</code> to select the single-core <code>dph-seq</code> flavour instead. By invoking the <code>dph-par</code> build as <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.
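
To check that the <code>-N</code> setting has taken effect, a non-vectorised module can ask the runtime how many capabilities (Haskell execution contexts) it is using. The following small sketch uses <hask>getNumCapabilities</hask> from <hask>Control.Concurrent</hask> in base; the helper name <hask>describeCapabilities</hask> is hypothetical.
<haskell>
import Control.Concurrent (getNumCapabilities)

-- Report the number of capabilities the runtime system was started with.
-- A program linked without -threaded always reports 1.
describeCapabilities :: IO String
describeCapabilities = do
  n <- getNumCapabilities
  return ("running on " ++ show n ++ " capabilities")
</haskell>
Calling <hask>describeCapabilities >>= putStrLn</hask> at the start of <hask>main</hask> should report 2 when the program is linked with <code>-threaded</code> and started with <code>+RTS -N2</code>.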

Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.

=== Further examples ===

Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.

The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].

=== Designing parallel programs ===

Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and for many performance considerations (such as deciding when a computation has sufficient parallelism to make exploiting it worthwhile) still rests with the programmer.

DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:
<haskell>
data RTree a = RNode [:RTree a:]
</haskell>
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]

For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]

=== Further reading and information on the implementation ===

DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].

For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and would like to know the implementation details and internals of the Data Parallel Haskell project, much of this is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].

=== Feedback ===

Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC users' mailing list], or contact the DPH developers directly:
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]

GHC/Data Parallel Haskell

2011-01-25T04:30:07Z

Chak: /* Generating input data */

[[Category:GHC|Data Parallel Haskell]]
== Data Parallel Haskell ==

''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus on utilising multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell].

<center>
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png
</center>

''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''

__TOC__

=== Project status ===

We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.2. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).

The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].

DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.

'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.

=== Where to get it ===

To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.

=== Overview ===

From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analogous manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).

The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)

As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.
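
Why an undirected fold needs an associative function, and where the O(log ''n'') step count comes from, can be illustrated with a tree-shaped reduction. The following is a sequential sketch on ordinary lists, not the implementation of <hask>foldP</hask>; the names <hask>treeFold</hask> and <hask>rounds</hask> are invented for this example.
<haskell>
-- Pairwise (tree-shaped) reduction: each round combines neighbouring
-- elements, so the list shrinks to roughly half its length per round.
-- All combinations within one round are independent of each other.
treeFold :: (a -> a -> a) -> a -> [a] -> a
treeFold _ z []  = z
treeFold _ _ [x] = x
treeFold f z xs  = treeFold f z (pairUp xs)
  where
    pairUp (a:b:rest) = f a b : pairUp rest
    pairUp rest       = rest            -- zero or one element left over

-- Number of rounds the reduction needs for a given input.
rounds :: [a] -> Int
rounds xs = length (takeWhile (> 1) (iterate halve (length xs)))
  where
    halve n = (n + 1) `div` 2
</haskell>
For an associative function such as <hask>(+)</hask>, <hask>treeFold</hask> agrees with a left-to-right fold. For a non-associative function such as <hask>(-)</hask> it generally does not, because the grouping of the operands differs; this is why the function argument has to be associative.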

=== A simple example ===

As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:
<haskell>
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.
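
For readers without a DPH-enabled GHC, the two formulations can be compared on ordinary lists. This is only a list-based analogue (it runs sequentially), and the names <hask>dotpComp</hask> and <hask>dotpZip</hask> are chosen for this illustration.
<haskell>
{-# LANGUAGE ParallelListComp #-}

-- Parallel list comprehension: xs and ys are traversed in lockstep.
dotpComp :: Num a => [a] -> [a] -> a
dotpComp xs ys = sum [x * y | x <- xs | y <- ys]

-- Equivalent formulation with an explicit zip.
dotpZip :: Num a => [a] -> [a] -> a
dotpZip xs ys = sum [x * y | (x, y) <- zip xs ys]
</haskell>
Both functions return 32 for the inputs <hask>[1, 2, 3]</hask> and <hask>[4, 5, 6]</hask>. As with <hask>zip</hask>, the list versions ignore surplus elements of the longer input.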

=== Running DPH programs ===

Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.
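
The core idea of turning nested into flat data can be sketched on lists: a nested structure is stored as one flat sequence of elements plus the lengths of its segments. The following is a much-simplified model for illustration, not GHC's actual representation; all names (<hask>Segmented</hask>, <hask>flatten</hask>, <hask>unflatten</hask>, <hask>mapSeg</hask>, <hask>sumSeg</hask>) are hypothetical.
<haskell>
-- A nested list stored as flat data plus one length per segment.
data Segmented a = Segmented { segLens :: [Int], flatData :: [a] }

flatten :: [[a]] -> Segmented a
flatten xss = Segmented (map length xss) (concat xss)

unflatten :: Segmented a -> [[a]]
unflatten (Segmented lens xs) = go lens xs
  where
    go []     _  = []
    go (n:ns) ys = let (seg, rest) = splitAt n ys in seg : go ns rest

-- A nested map becomes a single map over the flat data; the segment
-- lengths are unchanged, so the work no longer depends on the nesting.
mapSeg :: (a -> b) -> Segmented a -> Segmented b
mapSeg f (Segmented lens xs) = Segmented lens (map f xs)

-- One sum per segment.
sumSeg :: Num a => Segmented a -> [a]
sumSeg = map sum . unflatten
</haskell>
Because <hask>mapSeg</hask> touches only the flat data, its elements can be split evenly among threads regardless of how irregular the segment lengths are, which is the load-balancing benefit mentioned above.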

==== No type classes ====

Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.
<haskell>
dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>

==== Special Prelude ====

As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus four extra modules that support one numeric type each: [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types), so if a program needs to use more than one of them, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.

To compile <hask>dotp_double</hask>, we add the following three import statements:
<haskell>
import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double
</haskell>

==== Impedance matching ====

Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code).

Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.
<haskell>
dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.

==== Compiling vectorised code ====

The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.

Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.

Overall, we get the following complete module definition for the dot-product code:
<haskell>
{-# LANGUAGE ParallelArrays #-}
{-# OPTIONS_GHC -fvectorise #-}

module DotP (dotp_wrapper)
where

import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double

dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]

dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:
<blockquote>
<code>ghc -c -Odph -fdph-par DotP.hs</code>
</blockquote>
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)

==== Using vectorised code ====

Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:
<haskell>
import Data.Array.Parallel.PArray (PArray, fromList)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = let v      = fromList [1..10]    -- convert lists...
        w      = fromList [1,2..20]  -- ...to parallel arrays
        result = dotp_wrapper v w    -- invoke vectorised code
    in
    print result                     -- print the result
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -Odph -fdph-par Main.hs</code>
</blockquote>
and finally link the two modules into an executable <code>dotp</code> with
<blockquote>
<code>ghc -o dotp -threaded -fdph-par DotP.o Main.o</code>
</blockquote>
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend.

==== Generating input data ====

To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:
<haskell>
import System.Random (newStdGen)
import Data.Array.Parallel.PArray (PArray, randomRs)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = do
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2
      print $ dotp_wrapper v w   -- invoke vectorised code and print the result
  where
    n     = 10000        -- vector length
    range = (-100, 100)  -- range of vector elements
</haskell>
We compile and link the program as described above.

'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose. A variant of the dot-product example code that determines the CPU time consumed by <hask>dotp_wrapper</hask> is at [[GHC/Data Parallel Haskell/MainTimed]].

==== Parallel execution ====

The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code.

In the above compiler invocations, we used the option <code>-fdph-par</code> to select the <code>dph-par</code> flavour, which generates multi-threaded code. We can equally compile and link with <code>-fdph-seq</code> to select the single-core <code>dph-seq</code> flavour instead. By invoking the <code>dph-par</code> build as <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.

Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.

=== Further examples ===

Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.

The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].

=== Designing parallel programs ===

Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and for many performance considerations (such as deciding when a computation has sufficient parallelism to make exploiting it worthwhile) still rests with the programmer.

DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:
<haskell>
data RTree a = RNode [:RTree a:]
</haskell>
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]
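
The traversal pattern described above can be sketched with ordinary lists standing in for parallel arrays. This is a sequential model only: replacing <hask>[RTree a]</hask> by <hask>[:RTree a:]</hask>, and <hask>map</hask> and <hask>sum</hask> by <hask>mapP</hask> and <hask>sumP</hask>, would give the shape of the parallel version. The function names <hask>size</hask> and <hask>depth</hask> are chosen for this illustration.
<haskell>
-- List-based stand-in for the parallel rose tree.
data RTree a = RNode [RTree a]

-- Number of nodes: the recursion descends level by level, while the
-- children of one node are processed by a single aggregate operation.
size :: RTree a -> Int
size (RNode ts) = 1 + sum (map size ts)

-- Number of levels, i.e. the length of the sequential part of a traversal.
depth :: RTree a -> Int
depth (RNode ts) = 1 + maximum (0 : map depth ts)
</haskell>
For a tree with a root, two children, and two grandchildren below one of the children, <hask>size</hask> yields 5 and <hask>depth</hask> yields 3.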

For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]

=== Further reading and information on the implementation ===

DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].

For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and would like to know the implementation details and internals of the Data Parallel Haskell project, much of this is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].

=== Feedback ===

Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC users' mailing list], or contact the DPH developers directly:
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]

GHC/Data Parallel Haskell

2011-01-25T03:41:06Z

Chak: /* Project status */

[[Category:GHC|Data Parallel Haskell]]
== Data Parallel Haskell ==

''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus on utilising multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell].

<center>
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png
</center>

''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''

__TOC__

=== Project status ===

We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.2. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).

The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].

DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.

'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.

=== Where to get it ===

To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.

=== Overview ===

From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three-element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analogous manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).

The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)

As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.
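
The logarithmic step complexity of an undirected fold can be illustrated with ordinary lists. The following is only a sketch: <hask>foldTree</hask> and <hask>rounds</hask> are hypothetical names that are not part of DPH, and the code runs sequentially. It merely shows that an associative function lets us combine neighbouring elements pairwise, so that ''n'' elements are reduced in about log ''n'' rounds, where all combinations within one round are independent of each other.
<haskell>
-- Pairwise (tree-shaped) reduction over a list.
-- A plain-list model only; DPH's foldP works on parallel arrays.
foldTree :: (a -> a -> a) -> a -> [a] -> a
foldTree _ z []  = z
foldTree _ _ [x] = x
foldTree f z xs  = foldTree f z (pairUp xs)
  where
    -- Combine neighbours; every application of f in one round is independent.
    pairUp (a : b : rest) = f a b : pairUp rest
    pairUp rest           = rest

-- Number of combining rounds needed for n elements.
rounds :: Int -> Int
rounds n
  | n <= 1    = 0
  | otherwise = 1 + rounds ((n + 1) `div` 2)
</haskell>
Because the grouping of the elements differs from that of a left or right fold, the result only agrees with a sequential fold if the function is associative; the order of the elements is preserved, so commutativity is not needed.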

=== A simple example ===

As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:
<haskell>
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.
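
For readers who have not used parallel comprehensions before, the same computation on ordinary lists may help. This is only a list-based analogue (the names <hask>dotpList</hask> and <hask>dotpZip</hask> are made up for this sketch); it requires GHC's <hask>ParallelListComp</hask> extension and runs sequentially.
<haskell>
{-# LANGUAGE ParallelListComp #-}

-- List analogue of dotp using a parallel list comprehension:
-- the two generators are traversed in lockstep.
dotpList :: Num a => [a] -> [a] -> a
dotpList xs ys = sum [x * y | x <- xs | y <- ys]

-- The same computation written with an explicit zip.
dotpZip :: Num a => [a] -> [a] -> a
dotpZip xs ys = sum [x * y | (x, y) <- zip xs ys]
</haskell>
Both functions produce the same result for all inputs, which mirrors the relationship between the parallel array comprehension and its <hask>zipP</hask> formulation.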

=== Running DPH programs ===

Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.
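
To give an intuition for what turning nested into flat data parallelism means, the following sketch models a nested array as one flat array of elements plus the length of each inner array. This is a deliberately simplified model on plain lists, and all names in it are hypothetical; the representation that GHC actually uses is more involved.
<haskell>
-- A nested array modelled as flat data plus one length per segment.
-- Illustrative only: not GHC's actual representation.
data Segmented a = Segmented
  { segLengths :: [Int]  -- length of each inner array
  , flatData   :: [a]    -- all elements, concatenated
  } deriving (Eq, Show)

-- Flatten a nested list into the segmented form.
flatten :: [[a]] -> Segmented a
flatten xss = Segmented (map length xss) (concat xss)

-- Rebuild the nested list from the segmented form.
unflatten :: Segmented a -> [[a]]
unflatten (Segmented lens xs) = go lens xs
  where
    go []       _    = []
    go (n : ns) rest = let (seg, rest') = splitAt n rest
                       in seg : go ns rest'

-- Sum every inner array by walking the flat data segment by segment.
sumSegments :: Num a => Segmented a -> [a]
sumSegments = map sum . unflatten
</haskell>
The point of such a representation is that the flat data can be divided evenly among threads regardless of how irregular the inner arrays are, which is why flattening simplifies load balancing.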

==== No type classes ====

Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.
<haskell>
dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>

==== Special Prelude ====

As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus four extra modules to support one numeric type each: [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.

To compile <hask>dotp_double</hask>, we add the following three import statements:
<haskell>
import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double
</haskell>

==== Impedance matching ====

Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code).

Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.
<haskell>
dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.

==== Compiling vectorised code ====

The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.

Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.

Overall, we get the following complete module definition for the dot-product code:
<haskell>
{-# LANGUAGE ParallelArrays #-}
{-# OPTIONS_GHC -fvectorise #-}

module DotP (dotp_wrapper)
where

import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double

dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]

dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:
<blockquote>
<code>ghc -c -Odph -fdph-par DotP.hs</code>
</blockquote>
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)

==== Using vectorised code ====

Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:
<haskell>
import Data.Array.Parallel.PArray (PArray, fromList)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = let v      = fromList [1..10]    -- convert lists...
        w      = fromList [1,2..20]  -- ...to parallel arrays
        result = dotp_wrapper v w    -- invoke vectorised code
    in
    print result                     -- print the result
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -Odph -fdph-par Main.hs</code>
</blockquote>
and finally link the two modules into an executable <code>dotp</code> with
<blockquote>
<code>ghc -o dotp -threaded -fdph-par DotP.o Main.o</code>
</blockquote>
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend.

==== Generating input data ====

To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:
<haskell>
import System.Random (newStdGen)
import Data.Array.Parallel.PArray (PArray, randomRs)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = do
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2
      print $ dotp_wrapper v w   -- invoke vectorised code and print the result
  where
    n     = 10000        -- vector length
    range = (-100, 100)  -- range of vector elements
</haskell>
We compile and link the program as described above.

'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.

==== Parallel execution ====

The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code.

In the above compiler invocations, we used the option <code>-fdph-par</code> to select the multi-threaded <code>dph-par</code> flavour. We can as well compile with <code>-fdph-seq</code> to generate purely sequential code. When the program is built with <code>dph-par</code>, invoking <code>./dotp +RTS -N2</code> uses two OS threads to execute it. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.
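
The independence of the result from the number of threads can be modelled with a small sequential sketch on plain lists. The functions <hask>chunksOf</hask> and <hask>chunkedSum</hask> below are hypothetical and are not how DPH distributes work; they only show that splitting the data into a different number of pieces, reducing each piece separately, and combining the partial results gives the same answer. <hask>Integer</hask> is used so that addition is exactly associative.
<haskell>
-- Split a list into chunks of at most k elements.
-- For k < 1 the whole list is returned as a single chunk.
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf k xs
  | k < 1     = [xs]
  | otherwise = let (chunk, rest) = splitAt k xs
                in chunk : chunksOf k rest

-- Sum each chunk separately (as one worker per chunk might),
-- then combine the partial sums.
chunkedSum :: Int -> [Integer] -> Integer
chunkedSum k = sum . map sum . chunksOf k
</haskell>
For any chunk size, <hask>chunkedSum k xs</hask> equals <hask>sum xs</hask>; only the amount of independent work per chunk changes.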

Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.

=== Further examples ===

Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.

The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].

=== Designing parallel programs ===

Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as whether a computation has sufficient parallelism to make exploiting it worthwhile) still rests with the programmer.

DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:
<haskell>
data RTree a = RNode [:RTree a:]
</haskell>
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]
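
The traversal pattern can be sketched with ordinary lists standing in for parallel arrays. Note the assumptions: the element payload in <hask>RNode</hask> and the functions <hask>sumTree</hask> and <hask>depth</hask> are additions made for this illustration, and the code runs sequentially. In DPH, the <hask>map</hask> over the children would be an operation on a parallel array.
<haskell>
-- List-based stand-in for the parallel rose tree; the payload field
-- is an addition made for this sketch.
data RTree a = RNode a [RTree a]

-- Sum all payloads. The recursion descends level by level, while the
-- children of one node are processed by a single aggregate operation.
sumTree :: Num a => RTree a -> a
sumTree (RNode x kids) = x + sum (map sumTree kids)

-- Number of levels that a traversal has to step through in sequence.
depth :: RTree a -> Int
depth (RNode _ kids) = 1 + maximum (0 : map depth kids)
</haskell>
In this model, the depth of the tree bounds the number of sequential steps, whereas the width of each level determines how much work is available to execute in parallel.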

For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]

=== Further reading and information on the implementation ===

DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].

For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and would like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].

=== Feedback ===

Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC users' mailing list], or contact the DPH developers directly:
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]

GHC/Data Parallel Haskell

2011-01-25T03:23:18Z

Chak: /* Using vectorised code */

[[Category:GHC|Data Parallel Haskell]]
== Data Parallel Haskell ==

''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus on utilising multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell].

<center>
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png
</center>

''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''

__TOC__

=== Project status ===

We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).

The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].

DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.

'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.

=== Where to get it ===

To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.

=== Overview ===

From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three-element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analogous manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).

The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)

As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.

=== A simple example ===

As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:
<haskell>
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.

=== Running DPH programs ===

Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.

==== No type classes ====

Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.
<haskell>
dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>

==== Special Prelude ====

As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus four extra modules to support one numeric type each: [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.

To compile <hask>dotp_double</hask>, we add the following three import statements:
<haskell>
import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double
</haskell>

==== Impedance matching ====

Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code).

Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.
<haskell>
dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.

==== Compiling vectorised code ====

The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.

Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.

Overall, we get the following complete module definition for the dot-product code:
<haskell>
{-# LANGUAGE ParallelArrays #-}
{-# OPTIONS_GHC -fvectorise #-}

module DotP (dotp_wrapper)
where

import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double

dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]

dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:
<blockquote>
<code>ghc -c -Odph -fdph-par DotP.hs</code>
</blockquote>
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)

==== Using vectorised code ====

Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:
<haskell>
import Data.Array.Parallel.PArray (PArray, fromList)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = let v      = fromList [1..10]    -- convert lists...
        w      = fromList [1,2..20]  -- ...to parallel arrays
        result = dotp_wrapper v w    -- invoke vectorised code
    in
    print result                     -- print the result
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -Odph -fdph-par Main.hs</code>
</blockquote>
and finally link the two modules into an executable <code>dotp</code> with
<blockquote>
<code>ghc -o dotp -threaded -fdph-par DotP.o Main.o</code>
</blockquote>
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime and <code>-fdph-par</code> to link with the standard parallel DPH backend.

==== Generating input data ====

To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:
<haskell>
import System.Random (newStdGen)
import Data.Array.Parallel.PArray (PArray, randomRs)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = do
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2
      print $ dotp_wrapper v w   -- invoke vectorised code and print the result
  where
    n     = 10000        -- vector length
    range = (-100, 100)  -- range of vector elements
</haskell>
We compile and link the program as described above.

'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.
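The idea of forcing the input before starting the clock can be sketched with plain lists. This is only an illustration under stated assumptions: ordinary Haskell lists stand in for <hask>PArray</hask>, the helper names <hask>forceList</hask> and <hask>timeDotp</hask> are hypothetical, and the DPH function <hask>nf</hask> is deliberately not used because its exact signature is not given here.

```haskell
import Control.Exception (evaluate)
import System.CPUTime (getCPUTime)

-- Force every element of a list to weak head normal form.
-- For Double elements this means fully evaluated.
forceList :: [a] -> ()
forceList = foldr seq ()

-- Time a list-based dot product (a stand-in for the vectorised code).
-- Both inputs are forced BEFORE the first clock reading, so input
-- generation is excluded from the measurement.
timeDotp :: [Double] -> [Double] -> IO (Double, Integer)
timeDotp v w = do
  evaluate (forceList v)
  evaluate (forceList w)
  start  <- getCPUTime
  result <- evaluate (sum (zipWith (*) v w))
  end    <- getCPUTime
  return (result, end - start)   -- elapsed CPU time in picoseconds
```

Without the two <hask>evaluate</hask> calls on the inputs, laziness would defer the generation of <hask>v</hask> and <hask>w</hask> into the timed region.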

==== Parallel execution ====

The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code.

In the above compiler invocations, we used the option <code>-fdph-par</code> to select the multi-threaded <code>dph-par</code> flavour. We can just as well compile with <code>-fdph-seq</code> to select the single-core <code>dph-seq</code> flavour instead. When the program is built with <code>dph-par</code>, invoking <code>./dotp +RTS -N2</code> uses two OS threads to execute it. A beautiful property of DPH is that the number of threads used to execute a program affects only its performance, not its result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.
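When experimenting with different <code>-N</code> settings, it can help to have the program report how many capabilities the runtime actually started with. The following is a minimal sketch that is not part of DPH; it assumes a version of <hask>base</hask> that exports <hask>getNumCapabilities</hask> from <hask>Control.Concurrent</hask>, and the helper names are hypothetical.

```haskell
import Control.Concurrent (getNumCapabilities)

-- Render a capability count as a readable line (pure, so easy to check).
describeCapabilities :: Int -> String
describeCapabilities n
  | n == 1    = "running on 1 capability (sequential)"
  | otherwise = "running on " ++ show n ++ " capabilities"

-- Ask the runtime how many capabilities it was started with,
-- i.e. the effective value of the -N option.
reportCapabilities :: IO String
reportCapabilities = fmap describeCapabilities getNumCapabilities
```

Calling <hask>reportCapabilities >>= putStrLn</hask> at the start of <hask>main</hask> makes it easy to confirm that a given run really used the intended number of threads before comparing timings.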

Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.

=== Further examples ===

Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.

The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].

=== Designing parallel programs ===

Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.

DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:
<haskell>
data RTree a = RNode [:RTree a:]
</haskell>
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]
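The level-by-level traversal can be illustrated with ordinary lists in place of parallel arrays. This is a sequential sketch only: the payload field in <hask>RNode</hask> and the functions <hask>levels</hask> and <hask>sumTree</hask> are assumptions made for the example and do not appear in the definition above.

```haskell
-- List-based stand-in for the parallel rose tree; each node carries a
-- payload so that the traversals have something to collect.
data RTree a = RNode a [RTree a]

-- Group the payloads by depth. All nodes in one inner list sit on the
-- same level, which is the unit of work that a parallel-array version
-- could process in a single data-parallel step.
levels :: RTree a -> [[a]]
levels t = go [t]
  where
    go [] = []
    go ts = [x | RNode x _ <- ts] : go (concat [cs | RNode _ cs <- ts])

-- Sum all payloads: recursion over the depth is sequential, while the
-- map over the children is the part that could run in parallel.
sumTree :: Num a => RTree a -> a
sumTree (RNode x cs) = x + sum (map sumTree cs)
```

For the tree <hask>RNode 1 [RNode 2 [RNode 4 []], RNode 3 []]</hask>, <hask>levels</hask> yields <hask>[[1],[2,3],[4]]</hask>: three sequential steps, each of which handles a whole level at once.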

For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]

=== Further reading and information on the implementation ===

DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].

For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and would like to know the implementation details and internals of the Data Parallel Haskell project, much of this is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].

=== Feedback ===

Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]

GHC/Data Parallel Haskell

2011-01-25T03:15:11Z

Chak: /* Compiling vectorised code */

[[Category:GHC|Data Parallel Haskell]]
== Data Parallel Haskell ==

''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus on utilising multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell].

<center>
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png
</center>

''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''

__TOC__

=== Project status ===

We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).

The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].

DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.

'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.

=== Where to get it ===

To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.

=== Overview ===

From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three-element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analogous manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).

The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions).

As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.
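To see why an undirected fold needs an associative function and where the logarithmic step count comes from, consider the following sequential sketch on plain lists. It is not how <hask>foldP</hask> is implemented; the name <hask>treeFold</hask> is hypothetical, and the code only mirrors the shape of a pairwise reduction.

```haskell
-- Pairwise (tree-shaped) reduction. Each round combines neighbouring
-- elements, so the list halves in length and about log2 n rounds are
-- needed. Within one round all applications of f are independent,
-- which is what a parallel implementation would exploit.
treeFold :: (a -> a -> a) -> a -> [a] -> a
treeFold _ z []  = z                        -- empty input: the neutral element
treeFold _ _ [x] = x                        -- one element left: done
treeFold f z xs  = treeFold f z (pairUp xs)
  where
    pairUp (a:b:rest) = f a b : pairUp rest
    pairUp rest       = rest                -- zero or one leftover element
```

With an associative function such as <hask>(+)</hask>, the grouping does not matter, so the result agrees with a left-to-right fold. With a non-associative function such as <hask>(-)</hask>, the grouping shows through: <hask>treeFold (-) 0 [1,2,3,4]</hask> computes <hask>(1-2) - (3-4)</hask>, whereas a left fold computes <hask>((1-2)-3)-4</hask>.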

=== A simple example ===

As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:
<haskell>
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.
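For readers who want to try the two formulations without a DPH-enabled compiler, the same pair of definitions can be written over ordinary lists using GHC's <hask>ParallelListComp</hask> extension. This is a sequential list analogue, not DPH code, and the names <hask>dotpComp</hask> and <hask>dotpZip</hask> are chosen for this sketch only.

```haskell
{-# LANGUAGE ParallelListComp #-}

-- Dot product with a parallel list comprehension: the two generators
-- are drawn from in lock step, not nested.
dotpComp :: Num a => [a] -> [a] -> a
dotpComp xs ys = sum [x * y | x <- xs | y <- ys]

-- The same computation written with an explicit zip.
dotpZip :: Num a => [a] -> [a] -> a
dotpZip xs ys = sum [x * y | (x, y) <- zip xs ys]
```

Both functions pair up elements position by position; for lists, pairing stops at the end of the shorter input.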

=== Running DPH programs ===

Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.
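The core idea behind turning nested into flat data parallelism can be sketched with plain lists: a nested structure is stored as one flat data vector plus the lengths of its segments, and operations on the nested structure become operations on the flat vector. The code below is a deliberately simplified illustration; the type <hask>Segmented</hask> and all function names are assumptions of this sketch and do not reflect GHC's actual internal representation.

```haskell
-- A nested list stored flat: segment lengths plus concatenated data.
data Segmented a = Segmented [Int] [a]
  deriving (Eq, Show)

-- Convert a nested list to the flat representation.
flatten :: [[a]] -> Segmented a
flatten xss = Segmented (map length xss) (concat xss)

-- Recover the nested list from the flat representation.
unflatten :: Segmented a -> [[a]]
unflatten (Segmented lens xs) = go lens xs
  where
    go []     _  = []
    go (n:ns) ys = let (seg, rest) = splitAt n ys in seg : go ns rest

-- A map over every inner element is a single flat map over the data
-- vector; the segment lengths are untouched. However irregular the
-- segments are, the work is one evenly divisible pass.
mapNested :: (a -> b) -> Segmented a -> Segmented b
mapNested f (Segmented lens xs) = Segmented lens (map f xs)

-- Sum each segment (written via unflatten for clarity).
sumSegments :: Num a => Segmented a -> [a]
sumSegments = map sum . unflatten
```

For example, <hask>flatten [[1,2],[],[3,4,5]]</hask> gives <hask>Segmented [2,0,3] [1,2,3,4,5]</hask>. Because the data vector is flat, it can be split evenly across threads regardless of how unevenly sized the original inner lists were, which is why load balancing becomes simple.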

==== No type classes ====

Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.
<haskell>
dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>

==== Special Prelude ====

As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus four extra modules, each supporting one numeric type: [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.

To compile <hask>dotp_double</hask>, we add the following three import statements:
<haskell>
import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double
</haskell>

==== Impedance matching ====

Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code).

Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.
<haskell>
dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.

==== Compiling vectorised code ====

The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.

Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.

Overall, we get the following complete module definition for the dot-product code:
<haskell>
{-# LANGUAGE ParallelArrays #-}
{-# OPTIONS_GHC -fvectorise #-}

module DotP (dotp_wrapper)
where

import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double

dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]

dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:
<blockquote>
<code>ghc -c -Odph -fdph-par DotP.hs</code>
</blockquote>
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code and <code>-fdph-par</code> selects the standard parallel DPH backend library. (This is currently the only relevant backend, but there may be others in the future.)

==== Using vectorised code ====

Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:
<haskell>
import Data.Array.Parallel.PArray (PArray, fromList)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = let v      = fromList [1..10]    -- convert lists...
        w      = fromList [2,4..20]  -- ...to parallel arrays (same length as v)
        result = dotp_wrapper v w    -- invoke vectorised code
    in
    print result                     -- print the result
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -Odph Main.hs</code>
</blockquote>
and finally link the two modules into an executable <code>dotp</code> with
<blockquote>
<code>ghc -o dotp -threaded DotP.o Main.o</code>
</blockquote>
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime.

==== Generating input data ====

To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:
<haskell>
import System.Random (newStdGen)
import Data.Array.Parallel.PArray (PArray, randomRs)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = do
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2
      print $ dotp_wrapper v w   -- invoke vectorised code and print the result
  where
    n     = 10000        -- vector length
    range = (-100, 100)  -- range of vector elements
</haskell>
We compile and link the program as described above.

'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.

==== Parallel execution ====

The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code.

In the above compiler invocations, we used the option <code>-fdph-par</code> to select the multi-threaded <code>dph-par</code> flavour. We can just as well compile with <code>-fdph-seq</code> to select the single-core <code>dph-seq</code> flavour instead. When the program is built with <code>dph-par</code>, invoking <code>./dotp +RTS -N2</code> uses two OS threads to execute it. A beautiful property of DPH is that the number of threads used to execute a program affects only its performance, not its result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.

Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.

=== Further examples ===

Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.

The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].

=== Designing parallel programs ===

Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.

DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:
<haskell>
data RTree a = RNode [:RTree a:]
</haskell>
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]

For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]

=== Further reading and information on the implementation ===

DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].

For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and would like to know the implementation details and internals of the Data Parallel Haskell project, much of this is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].

=== Feedback ===

Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]

GHC/Data Parallel Haskell

2011-01-25T02:57:49Z

Chak: /* Compiling vectorised code */

[[Category:GHC|Data Parallel Haskell]]
== Data Parallel Haskell ==

''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus on utilising multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell].

<center>
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png
</center>

''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''

__TOC__

=== Project status ===

We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).

The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].

DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.

'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.

=== Where to get it ===

To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.

=== Overview ===

From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three-element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analogous manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).

The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions).
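The difference in strictness can be modelled with ordinary lists. In this sketch, <hask>lazyHead</hask> behaves like the usual list <hask>head</hask>, while <hask>strictHead</hask> imitates the behaviour described above by forcing every element before returning the first. All three names are hypothetical helpers for this illustration; none of them is part of DPH.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- Ordinary lazy access: only the first element is ever demanded.
lazyHead :: [a] -> a
lazyHead (x:_) = x
lazyHead []    = error "lazyHead: empty list"

-- Model of a strict container: demanding the first element forces
-- every element of the list to weak head normal form first.
strictHead :: [a] -> a
strictHead []       = error "strictHead: empty list"
strictHead xs@(x:_) = foldr seq x xs

-- True if evaluating the argument to weak head normal form throws.
diverges :: a -> IO Bool
diverges x = do
  r <- try (evaluate x)
  return (case r of
            Left e  -> const True (e :: SomeException)
            Right _ -> False)
```

With the list <hask>[1, undefined]</hask>, <hask>lazyHead</hask> returns <hask>1</hask> without touching the second element, whereas <hask>strictHead</hask> raises an exception because it demands all elements, just as demanding one element of a parallel array demands the whole array.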

As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.

=== A simple example ===

As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:
<haskell>
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.

=== Running DPH programs ===

Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.

==== No type classes ====

Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.
<haskell>
dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>

==== Special Prelude ====

As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus four extra modules to support one numeric type each: [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.

To compile <hask>dotp_double</hask>, we add the following three import statements:
<haskell>
import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double
</haskell>

==== Impedance matching ====

Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code).

Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.
<haskell>
dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.

==== Compiling vectorised code ====

The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.

Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.

Overall, we get the following complete module definition for the dot-product code:
<haskell>
{-# LANGUAGE ParallelArrays #-}
{-# OPTIONS_GHC -fvectorise #-}

module DotP (dotp_wrapper)
where

import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double

dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]

dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:
<blockquote>
<code>ghc -c -Odph DotP.hs</code>
</blockquote>
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code.

==== Using vectorised code ====

Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:
<haskell>
import Data.Array.Parallel.PArray (PArray, fromList)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = let v      = fromList [1..10]     -- convert lists...
        w      = fromList [2,4..20]   -- ...to parallel arrays (same length as v)
        result = dotp_wrapper v w     -- invoke vectorised code
    in
    print result                      -- print the result
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -Odph Main.hs</code>
</blockquote>
and finally link the two modules into an executable <code>dotp</code> with
<blockquote>
<code>ghc -o dotp -threaded DotP.o Main.o</code>
</blockquote>
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime.

==== Generating input data ====

To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:
<haskell>
import System.Random (newStdGen)
import Data.Array.Parallel.PArray (PArray, randomRs)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = do
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2
      print $ dotp_wrapper v w   -- invoke vectorised code and print the result
  where
    n     = 10000                -- vector length
    range = (-100, 100)          -- range of vector elements
</haskell>
We compile and link the program as described above.

'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.

==== Parallel execution ====

The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code.

The compiler invocations above do not name a flavour explicitly. The option <code>-fdph-seq</code> selects the <code>dph-seq</code> flavour, and <code>-fdph-par</code> selects <code>dph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.
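
The independence of the result from the number of threads can be illustrated with a small sketch on ordinary lists. It is not DPH code: <hask>chunksOf</hask> and <hask>chunkedSum</hask> are hypothetical helpers that split the input into one chunk per worker, sum each chunk on its own, and add up the partial sums. The sketch uses <hask>Int</hask>, for which addition is associative; it makes no claim about how DPH treats floating-point rounding.
<haskell>
-- Split a list into consecutive chunks of at most k elements (k >= 1).
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf k xs = let (h, t) = splitAt k xs in h : chunksOf k t

-- Sum a list by summing each chunk separately and then adding the partial
-- sums, roughly one chunk per worker.
chunkedSum :: Int -> [Int] -> Int
chunkedSum workers xs = sum (map sum (chunksOf size xs))
  where
    size = max 1 ((length xs + workers - 1) `div` max 1 workers)
</haskell>
Whatever the worker count, the total is the same: <hask>map (\w -> chunkedSum w [1 .. 10]) [1, 2, 3, 8]</hask> yields <hask>[55, 55, 55, 55]</hask>.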

Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.

=== Further examples ===

Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.

The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].

=== Designing parallel programs ===

Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as whether a computation has sufficient parallelism to make exploiting it worthwhile) still rests with the programmer.

DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:
<haskell>
data RTree a = RNode [:RTree a:]
</haskell>
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]
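
The shape of such a traversal can be sketched with ordinary lists standing in for parallel arrays. The type <hask>LTree</hask>, its payload field, and the functions below are assumptions made for this illustration and are not part of DPH; the code runs sequentially.
<haskell>
-- List-based stand-in for a parallel rose tree; the payload field is an
-- addition for this illustration.
data LTree a = LNode a [LTree a]

-- Sum all payloads. The map over the children is the step that could run
-- in parallel if the children were held in a parallel array.
sumTree :: Num a => LTree a -> a
sumTree (LNode x kids) = x + sum (map sumTree kids)

-- Payloads grouped by depth, visiting the tree level by level.
levels :: LTree a -> [[a]]
levels t = go [t]
  where
    go []    = []
    go nodes = [x | LNode x _ <- nodes]
             : go (concat [ks | LNode _ ks <- nodes])

example :: LTree Int
example = LNode 1 [LNode 2 [LNode 4 []], LNode 3 []]
</haskell>
Here <hask>sumTree example</hask> yields 10 and <hask>levels example</hask> yields <hask>[[1], [2, 3], [4]]</hask>: the recursion descends one level at a time, while all nodes of one level are processed together.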

For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]

=== Further reading and information on the implementation ===

DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].

For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and would like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].

=== Feedback ===

Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC users' mailing list], or contact the DPH developers directly:
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]

GHC/Data Parallel Haskell

2010-12-12T09:59:30Z

Chak: /* Feedback */

[[Category:GHC|Data Parallel Haskell]]
== Data Parallel Haskell ==

''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus on utilising multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell].

<center>
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png
</center>

''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''

__TOC__

=== Project status ===

We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).

The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].

DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.

'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.

=== Where to get it ===

To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.

=== Overview ===

From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, except that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three-element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching work in an analogous manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).

The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions).

As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.

=== A simple example ===

As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:
<haskell>
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.
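
For readers unfamiliar with the list counterpart of this notation, the following sketch shows both forms on ordinary lists using GHC's <hask>ParallelListComp</hask> extension. The function names are made up for this illustration, and despite the extension's name the lists are processed sequentially; only the lockstep pairing of the generators carries over to parallel arrays.
<haskell>
{-# LANGUAGE ParallelListComp #-}

-- Dot product with a parallel list comprehension: the two generators are
-- drawn from in lockstep, not nested.
dotpZipComp :: Num a => [a] -> [a] -> a
dotpZipComp xs ys = sum [x * y | x <- xs | y <- ys]

-- The same function written with an explicit zip.
dotpZip :: Num a => [a] -> [a] -> a
dotpZip xs ys = sum [x * y | (x, y) <- zip xs ys]
</haskell>
Both definitions agree; for example, each maps <hask>[1, 2, 3]</hask> and <hask>[4, 5, 6]</hask> to 32.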

=== Running DPH programs ===

Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.
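
To give an intuition for what turning nested into flat data parallelism means, the following sketch stores a nested list as one flat sequence plus the lengths of its segments. It is an illustration on ordinary lists only: the type <hask>Segmented</hask> and all function names are hypothetical, and GHC's actual representation and transformation are considerably more involved. The point is that an element-wise operation becomes a single pass over the flat data, which can be split evenly regardless of how unevenly sized the inner arrays are.
<haskell>
-- A nested array stored flat: the length of each segment plus all
-- elements in one sequence. (Hypothetical type for illustration only.)
data Segmented a = Segmented
  { segLengths :: [Int]
  , segData    :: [a]
  } deriving (Eq, Show)

-- Flatten a nested list into the segmented form.
flatten :: [[a]] -> Segmented a
flatten xss = Segmented (map length xss) (concat xss)

-- Recover the nested list from the segmented form.
unflatten :: Segmented a -> [[a]]
unflatten (Segmented lens xs) = go lens xs
  where
    go []       _  = []
    go (n : ns) ys = let (h, t) = splitAt n ys in h : go ns t

-- Apply a function to every element of every inner array with a single
-- pass over the flat data; the segment lengths are unchanged.
mapNested :: (a -> b) -> Segmented a -> Segmented b
mapNested f (Segmented lens xs) = Segmented lens (map f xs)

-- Sum each inner array (a segmented sum).
sumSegments :: Num a => Segmented a -> [a]
sumSegments = map sum . unflatten
</haskell>
For example, <hask>flatten [[1, 2, 3], [], [4, 5]]</hask> yields segment lengths <hask>[3, 0, 2]</hask> over the data <hask>[1, 2, 3, 4, 5]</hask>, and <hask>sumSegments</hask> of that value is <hask>[6, 0, 9]</hask>.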

==== No type classes ====

Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.
<haskell>
dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>

==== Special Prelude ====

As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus four extra modules to support one numeric type each: [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.

To compile <hask>dotp_double</hask>, we add the following three import statements:
<haskell>
import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double
</haskell>

==== Impedance matching ====

Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code).

Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.
<haskell>
dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.

==== Compiling vectorised code ====

The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.

Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.

Overall, we get the following complete module definition for the dot-product code:
<haskell>
{-# LANGUAGE ParallelArrays #-}
{-# OPTIONS_GHC -fvectorise #-}

module DotP (dotp_wrapper)
where

import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double

dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]

dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:
<blockquote>
<code>ghc -c -Odph DotP.hs</code>
</blockquote>
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code.

==== Using vectorised code ====

Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:
<haskell>
import Data.Array.Parallel.PArray (PArray, fromList)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = let v      = fromList [1..10]     -- convert lists...
        w      = fromList [2,4..20]   -- ...to parallel arrays (same length as v)
        result = dotp_wrapper v w     -- invoke vectorised code
    in
    print result                      -- print the result
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -Odph Main.hs</code>
</blockquote>
and finally link the two modules into an executable <code>dotp</code> with
<blockquote>
<code>ghc -o dotp -threaded DotP.o Main.o</code>
</blockquote>
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime.

==== Generating input data ====

To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:
<haskell>
import System.Random (newStdGen)
import Data.Array.Parallel.PArray (PArray, randomRs)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = do
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2
      print $ dotp_wrapper v w   -- invoke vectorised code and print the result
  where
    n     = 10000                -- vector length
    range = (-100, 100)          -- range of vector elements
</haskell>
We compile and link the program as described above.

'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.
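
The general pattern of forcing the input before starting the clock can be sketched with standard library functions only. The sketch below uses ordinary lists instead of <hask>PArray</hask>, and <hask>forceList</hask>, <hask>timed</hask>, <hask>dotpList</hask>, and <hask>benchDotp</hask> are hypothetical names introduced for this illustration; it does not use the DPH function <hask>nf</hask>.
<haskell>
import Control.Exception (evaluate)
import Data.List (foldl')
import System.CPUTime (getCPUTime)

-- Force every element of a list to weak head normal form, which is full
-- evaluation for element types such as Double.
forceList :: [a] -> IO ()
forceList = evaluate . foldl' (\() x -> x `seq` ()) ()

-- Run an action and return its result together with the CPU time it
-- took, in picoseconds.
timed :: IO a -> IO (a, Integer)
timed act = do
  t0 <- getCPUTime
  r  <- act
  t1 <- getCPUTime
  return (r, t1 - t0)

-- Sequential list version of the dot product, standing in for the
-- vectorised code.
dotpList :: [Double] -> [Double] -> Double
dotpList xs ys = sum (zipWith (*) xs ys)

-- Generate the input, force it, and only then time the dot product.
benchDotp :: Int -> IO (Double, Integer)
benchDotp n = do
  let xs = map fromIntegral [1 .. n]
      ys = map fromIntegral [n, n - 1 .. 1]
  forceList xs
  forceList ys
  timed (evaluate (dotpList xs ys))
</haskell>
Without the two calls to <hask>forceList</hask>, the lazily generated inputs would be computed inside the timed region and distort the measurement.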

==== Parallel execution ====

The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code.

The compiler invocations above do not name a flavour explicitly. The option <code>-fdph-seq</code> selects the <code>dph-seq</code> flavour, and <code>-fdph-par</code> selects <code>dph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.

Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.

=== Further examples ===

Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.

The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].

=== Designing parallel programs ===

Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as whether a computation has sufficient parallelism to make exploiting it worthwhile) still rests with the programmer.

DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:
<haskell>
data RTree a = RNode [:RTree a:]
</haskell>
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]

For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]

=== Further reading and information on the implementation ===

DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].

For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are curious about implementation details and the internals of the Data Parallel Haskell project, much of this is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].

=== Feedback ===

Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Comments and suggestions are also very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC users' mailing list], or contact the DPH developers directly:
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]
* [http://www.cse.unsw.edu.au/~benl/ Ben Lippmeier]
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]

GHC/Data Parallel Haskell

2010-12-12T07:23:32Z

Chak: /* Generating input data */

[[Category:GHC|Data Parallel Haskell]]
== Data Parallel Haskell ==

''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus on utilising multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell].

<center>
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png
</center>

''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''

__TOC__

=== Project status ===

We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).

The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].

DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.

'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.

=== Where to get it ===

To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.

=== Overview ===

From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, except that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three-element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analogous manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).

The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)

As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.
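
To see why an undirected fold needs an associative function and only O(log ''n'') steps, the following is a minimal sketch on ordinary lists. It is not the DPH implementation of <hask>foldP</hask>; the names <hask>treeFold</hask> and <hask>rounds</hask> are hypothetical and only serve this illustration. Each round combines all adjacent pairs, which is the work a parallel machine could do in a single step.

```haskell
-- Pairwise (tree-shaped) reduction over an ordinary list. Each call to
-- 'pairUp' models one parallel step in which all adjacent pairs are
-- combined at once, so the list roughly halves in length every round.
-- The neutral element 'z' is only returned for an empty input.
treeFold :: (a -> a -> a) -> a -> [a] -> a
treeFold _ z []  = z
treeFold _ _ [x] = x
treeFold f z xs  = treeFold f z (pairUp xs)
  where
    pairUp (a : b : rest) = f a b : pairUp rest
    pairUp rest           = rest

-- Number of combining rounds needed for an input of length n
-- (hypothetical helper; it counts the halving steps of 'treeFold').
rounds :: Int -> Int
rounds n
  | n <= 1    = 0
  | otherwise = 1 + rounds ((n + 1) `div` 2)
```

With an associative function the grouping does not matter: <hask>treeFold (+) 0 [1..8]</hask> gives 36 in three rounds. With a non-associative function it does: <hask>treeFold (-) 0 [1,2,3,4]</hask> computes (1-2)-(3-4) = 0, whereas the left-to-right <hask>foldl1 (-) [1,2,3,4]</hask> gives -8.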

=== A simple example ===

As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:
<haskell>
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.
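
For readers who want to try the comprehension syntax without a DPH-enabled compiler, here is a list-based analogue. It is only a sketch: ordinary lists stand in for parallel arrays, so it runs sequentially, and the names <hask>dotpList</hask> and <hask>dotpZip</hask> are made up for this illustration. It relies on GHC's <hask>ParallelListComp</hask> extension.

```haskell
{-# LANGUAGE ParallelListComp #-}

-- List-based analogue of dotp. The two comprehension branches separated
-- by '|' are drawn from in lockstep, like a zip.
dotpList :: Num a => [a] -> [a] -> a
dotpList xs ys = sum [x * y | x <- xs | y <- ys]

-- The same function with an explicit zip instead of parallel branches.
dotpZip :: Num a => [a] -> [a] -> a
dotpZip xs ys = sum [x * y | (x, y) <- zip xs ys]
```

Both definitions agree on all inputs; for example, <hask>dotpList [1,2,3] [4,5,6]</hask> is 32. For lists, the shorter input determines how many products are summed.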

=== Running DPH programs ===

Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.

==== No type classes ====

Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.
<haskell>
dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>

==== Special Prelude ====

As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus four extra modules that each support one numeric type: [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.

To compile <hask>dotp_double</hask>, we add the following three import statements:
<haskell>
import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double
</haskell>

==== Impedance matching ====

Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code).

Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.
<haskell>
dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.

==== Compiling vectorised code ====

The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.

Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.

Overall, we get the following complete module definition for the dot-product code:
<haskell>
{-# LANGUAGE ParallelArrays #-}
{-# OPTIONS -fvectorise #-}

module DotP (dotp_wrapper)
where

import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double

dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]

dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:
<blockquote>
<code>ghc -c -Odph DotP.hs</code>
</blockquote>
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code.

==== Using vectorised code ====

Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:
<haskell>
import Data.Array.Parallel.PArray (PArray, fromList)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = let v      = fromList [1..10]    -- convert lists...
        w      = fromList [1,2..20]  -- ...to parallel arrays
        result = dotp_wrapper v w    -- invoke vectorised code
    in
    print result                     -- print the result
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -Odph Main.hs</code>
</blockquote>
and finally link the two modules into an executable <code>dotp</code> with
<blockquote>
<code>ghc -o dotp -threaded DotP.o Main.o</code>
</blockquote>
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime.

==== Generating input data ====

To see any benefit from parallel execution, a data-parallel program needs to operate on a sufficiently large data set. Hence, instead of two small constant vectors, we might want to generate some larger input data:
<haskell>
import System.Random (newStdGen)
import Data.Array.Parallel.PArray (PArray, randomRs)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = do
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2
      print $ dotp_wrapper v w   -- invoke vectorised code and print the result
  where
    n     = 10000        -- vector length
    range = (-100, 100)  -- range of vector elements
</haskell>
We compile and link the program as described above.

'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.
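
The idea of forcing the inputs before starting the clock can be sketched in plain Haskell as follows. This is not DPH code: ordinary lists stand in for <hask>PArray</hask>, a small linear congruential generator stands in for <hask>randomRs</hask>, and the helper <hask>forceList</hask> plays the role that the text above assigns to <hask>nf</hask>. All names in the block (<hask>lcg</hask>, <hask>forceList</hask>, <hask>dotpList</hask>, <hask>timedDotp</hask>) are hypothetical.

```haskell
import Control.Exception (evaluate)
import System.CPUTime (getCPUTime)

-- Deterministic pseudo-random input from a linear congruential generator
-- (a stand-in for randomRs; assumes a 64-bit Int).
lcg :: Int -> [Double]
lcg seed = map toRange (tail (iterate step seed))
  where
    step s    = (s * 1103515245 + 12345) `mod` 2147483648
    toRange s = fromIntegral s / 2147483648 * 200 - 100  -- scale to [-100, 100)

-- Force every element to weak head normal form; for Double elements this
-- means the list is fully evaluated.
forceList :: [a] -> ()
forceList = foldr seq ()

-- Sequential dot product on lists.
dotpList :: [Double] -> [Double] -> Double
dotpList xs ys = sum (zipWith (*) xs ys)

-- Build and fully evaluate both inputs first, then time only the dot
-- product. Returns the result and the elapsed CPU time in picoseconds.
timedDotp :: Int -> IO (Double, Integer)
timedDotp n = do
  let v = take n (lcg 1)
      w = take n (lcg 2)
  _     <- evaluate (forceList v)
  _     <- evaluate (forceList w)
  start <- getCPUTime
  r     <- evaluate (dotpList v w)
  end   <- getCPUTime
  return (r, end - start)
```

Without the two <hask>evaluate (forceList ...)</hask> lines, lazy evaluation would defer the input generation until the dot product demands it, and the measured time would include it.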

==== Parallel execution ====

The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code.

The option <code>-fdph-seq</code> selects the <code>dph-seq</code> flavour, and <code>-fdph-par</code> selects <code>dph-par</code> to generate multi-threaded code; the compiler invocations as printed above list neither option, so the desired one has to be added to them. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.

Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.

=== Further examples ===

Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.

The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].

=== Designing parallel programs ===

Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.

DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:
<haskell>
data RTree a = RNode [:RTree a:]
</haskell>
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]
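
The traversal pattern described above can be sketched with ordinary lists in place of parallel arrays. This is an illustration only: it runs sequentially, the payload field of type <hask>a</hask> is an addition for this sketch (the definition above carries no payload), and the function names are hypothetical.

```haskell
-- Rose tree with ordinary lists standing in for the parallel arrays [: :].
-- The payload of type 'a' in each node is an assumption of this sketch.
data RTree a = RNode a [RTree a]

-- Sum all payloads. The recursion descends level by level, which is the
-- sequential part; the 'map' over the children is the part that a
-- parallel array of children would allow to run in parallel.
sumTree :: Num a => RTree a -> a
sumTree (RNode x kids) = x + sum (map sumTree kids)

-- Depth of the tree, i.e. the number of sequential levels in a traversal.
depthTree :: RTree a -> Int
depthTree (RNode _ kids) = 1 + maximum (0 : map depthTree kids)

-- A small example tree with four nodes on three levels.
exampleTree :: RTree Int
exampleTree = RNode 1 [RNode 2 [RNode 4 []], RNode 3 []]
```

For <hask>exampleTree</hask>, <hask>sumTree</hask> yields 10 and <hask>depthTree</hask> yields 3: the work per level grows with the number of children, while the number of sequential steps is bounded by the depth.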

For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]

=== Further reading and information on the implementation ===

DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].

For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are curious about implementation details and the internals of the Data Parallel Haskell project, much of this is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].

=== Feedback ===

Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Comments and suggestions are also very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC users' mailing list], or contact the DPH developers directly:
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]

GHC/Data Parallel Haskell

2010-12-12T07:22:56Z

Chak: /* Generating input data */

[[Category:GHC|Data Parallel Haskell]]
== Data Parallel Haskell ==

''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus on utilising multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell].

<center>
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png
</center>

''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''

__TOC__

=== Project status ===

We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).

The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].

DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.

'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.

=== Where to get it ===

To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.

=== Overview ===

From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, except that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three-element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analogous manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).

The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)

As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.

=== A simple example ===

As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:
<haskell>
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.

=== Running DPH programs ===

Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.

==== No type classes ====

Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.
<haskell>
dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>

==== Special Prelude ====

As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus four extra modules that each support one numeric type: [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.

To compile <hask>dotp_double</hask>, we add the following three import statements:
<haskell>
import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double
</haskell>

==== Impedance matching ====

Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code).

Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.
<haskell>
dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.

==== Compiling vectorised code ====

The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.

Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.

Overall, we get the following complete module definition for the dot-product code:
<haskell>
{-# LANGUAGE ParallelArrays #-}
{-# OPTIONS -fvectorise #-}

module DotP (dotp_wrapper)
where

import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double

dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]

dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:
<blockquote>
<code>ghc -c -Odph DotP.hs</code>
</blockquote>
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code.

==== Using vectorised code ====

Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:
<haskell>
import Data.Array.Parallel.PArray (PArray, fromList)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = let v      = fromList [1..10]    -- convert lists...
        w      = fromList [1,2..20]  -- ...to parallel arrays
        result = dotp_wrapper v w    -- invoke vectorised code
    in
    print result                     -- print the result
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -Odph Main.hs</code>
</blockquote>
and finally link the two modules into an executable <code>dotp</code> with
<blockquote>
<code>ghc -o dotp -threaded DotP.o Main.o</code>
</blockquote>
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime.

==== Generating input data ====

To see any benefit from parallel execution, a data-parallel program needs to operate on sufficiently large data sets. Hence, instead of two small constant vectors, we might want to generate some larger input data:
<haskell>
import System.Random (newStdGen)
import Data.Array.Parallel.PArray (PArray, randomRs)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = do
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2
      print $ dotp_wrapper v w   -- invoke vectorised code and print the result
  where
    n     = 10000        -- vector length
    range = (-100, 100)  -- range of vector elements
</haskell>
We compile and link the program as described above.

'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.

==== Parallel execution ====

The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code.

In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can as well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.

Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.

=== Further examples ===

Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.

The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].

=== Designing parallel programs ===

Data Parallel Haskell is a high-level language for coding parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as deciding when a computation has sufficient parallelism to make exploiting it worthwhile) still rests with the programmer.

DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:
<haskell>
data RTree a = RNode [:RTree a:]
</haskell>
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]

For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]

=== Further reading and information on the implementation ===

DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].

For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and would like to know the implementation details and internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].

=== Feedback ===

Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC users' mailing list], or contact the DPH developers directly:
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]

GHC/Data Parallel Haskell

2010-12-12T06:08:42Z

Chak: /* Using vectorised code */

[[Category:GHC|Data Parallel Haskell]]
== Data Parallel Haskell ==

''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus on utilising multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell].

<center>
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png
</center>

''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''

__TOC__

=== Project status ===

We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).

The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].

DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.

'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.

=== Where to get it ===

To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.

=== Overview ===

From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, except that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three-element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analogous manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).

The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)

As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.
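
The requirement of associativity and the logarithmic step count can be illustrated with a small model on ordinary lists. The following is only a sketch: <hask>pairUp</hask> and <hask>treeFold</hask> are hypothetical names that are not part of the DPH library, and plain lists stand in for parallel arrays. Each round combines neighbouring elements pairwise, which is the work that could be performed in a single parallel step.
<haskell>
-- Illustrative list model only; pairUp and treeFold are NOT DPH functions.

-- | Combine neighbouring elements pairwise.  All applications of 'f' in
-- one round are independent of each other.
pairUp :: (a -> a -> a) -> [a] -> [a]
pairUp f (x : y : rest) = f x y : pairUp f rest
pairUp _ xs             = xs          -- zero or one element left over

-- | Tree-shaped fold.  Returns the result together with the number of
-- rounds, which grows logarithmically with the length of the input.
treeFold :: (a -> a -> a) -> a -> [a] -> (a, Int)
treeFold _ z []  = (z, 0)
treeFold f _ xs0 = go xs0 0
  where
    go [x] rounds = (x, rounds)
    go xs  rounds = go (pairUp f xs) (rounds + 1)
</haskell>
For example, <hask>treeFold (+) 0 [1..8]</hask> yields <hask>(36, 3)</hask>: eight elements are reduced in three rounds. With a non-associative function the grouping matters: <hask>treeFold (-) 0 [1..8]</hask> yields <hask>(0, 3)</hask>, whereas <hask>foldl1 (-) [1..8]</hask> yields <hask>-34</hask>. This is why an undirected fold can only promise a well-defined result for associative functions.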

=== A simple example ===

As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:
<haskell>
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.
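
For readers who do not have a DPH-enabled compiler at hand, the same two formulations can be tried on ordinary lists. The sketch below assumes only GHC's <hask>ParallelListComp</hask> extension; the names <hask>dotpList</hask> and <hask>dotpZip</hask> are chosen here for illustration, and the code runs sequentially on lists rather than on parallel arrays.
<haskell>
{-# LANGUAGE ParallelListComp #-}

-- List analogue of dotp, written with a parallel list comprehension:
-- the two generators are drawn from in lock step.
dotpList :: Num a => [a] -> [a] -> a
dotpList xs ys = sum [x * y | x <- xs | y <- ys]

-- The same computation with an explicit zip.
dotpZip :: Num a => [a] -> [a] -> a
dotpZip xs ys = sum [x * y | (x, y) <- zip xs ys]
</haskell>
Both <hask>dotpList [1,2,3] [4,5,6]</hask> and <hask>dotpZip [1,2,3] [4,5,6]</hask> evaluate to <hask>32</hask>.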

=== Running DPH programs ===

Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.
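
To give an intuition for what turning nested into flat data parallelism means, the following sketch models a nested array as one flat data vector plus the lengths of its sub-arrays. This is a deliberately simplified list model under our own assumptions: the names <hask>Segmented</hask>, <hask>flatten</hask>, and <hask>segmentedSum</hask> are hypothetical, and the representation GHC's vectoriser actually uses is more involved.
<haskell>
-- Simplified list model; these names are illustrative, not part of DPH.

-- | A nested array, represented as flat data plus segment lengths.
data Segmented a = Segmented
  { segLengths :: [Int]   -- length of each sub-array
  , flatData   :: [a]     -- all elements, concatenated
  } deriving (Eq, Show)

-- | Flatten a nested list into the segmented representation.
flatten :: [[a]] -> Segmented a
flatten xss = Segmented (map length xss) (concat xss)

-- | Sum every sub-array while working on the flat representation.
segmentedSum :: Num a => Segmented a -> [a]
segmentedSum (Segmented lens xs) = go lens xs
  where
    go []       _    = []
    go (n : ns) rest =
      let (seg, rest') = splitAt n rest
      in  sum seg : go ns rest'
</haskell>
For example, <hask>flatten [[1,2,3],[],[4,5]]</hask> gives <hask>Segmented [3,0,2] [1,2,3,4,5]</hask>, and applying <hask>segmentedSum</hask> to it gives <hask>[6,0,9]</hask>. The flat data vector can be divided evenly among threads no matter how irregular the sub-array lengths are, which is the intuition behind the simpler load balancing.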

==== No type classes ====

Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.
<haskell>
dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>

==== Special Prelude ====

As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus four extra modules to support one numeric type each: [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types), and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.

To compile <hask>dotp_double</hask>, we add the following three import statements:
<haskell>
import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double
</haskell>

==== Impedance matching ====

Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code).

Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.
<haskell>
dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.

==== Compiling vectorised code ====

The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.

Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.

Overall, we get the following complete module definition for the dot-product code:
<haskell>
{-# LANGUAGE ParallelArrays #-}
{-# OPTIONS -fvectorise #-}

module DotP (dotp_wrapper)
where

import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double

dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]

dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:
<blockquote>
<code>ghc -c -Odph DotP.hs</code>
</blockquote>
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code.

==== Using vectorised code ====

Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:
<haskell>
import Data.Array.Parallel.PArray (PArray, fromList)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = let v      = fromList [1..10]    -- convert lists...
        w      = fromList [1,2..20]  -- ...to parallel arrays
        result = dotp_wrapper v w    -- invoke vectorised code
    in
    print result                     -- print the result
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -Odph Main.hs</code>
</blockquote>
and finally link the two modules into an executable <code>dotp</code> with
<blockquote>
<code>ghc -o dotp -threaded DotP.o Main.o</code>
</blockquote>
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime.

==== Generating input data ====

In this simple example, the <hask>Main</hask> module generates two random vectors and computes their dot product:
<haskell>
import System.Random (newStdGen)
import Data.Array.Parallel.PArray (PArray, randomRs)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = do
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2
      print $ dotp_wrapper v w   -- invoke vectorised code and print the result
  where
    n     = 10000        -- vector length
    range = (-100, 100)  -- range of vector elements
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -O -fdph-seq Main.hs</code>
</blockquote>
and finally link with
<blockquote>
<code>ghc -o dotp -fdph-seq -threaded DotP.o Main.o</code>
</blockquote>

'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.
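
The general pattern of forcing the inputs before starting the clock can be sketched with ordinary lists and modules from <code>base</code> only. This is not the DPH benchmarking code: <hask>forceList</hask>, <hask>dotpSeq</hask>, and <hask>timeDotp</hask> are hypothetical names, the dot product here is sequential, and <hask>forceList</hask> merely plays the role that <hask>nf</hask> plays for <hask>PArray</hask>.
<haskell>
import Control.Exception (evaluate)
import Data.List (foldl')
import System.CPUTime (getCPUTime)

-- | Force every element of a list to weak head normal form
-- (for Double, that is full evaluation).
forceList :: [a] -> ()
forceList = foldr seq ()

-- | Sequential list stand-in for the vectorised dot product.
dotpSeq :: [Double] -> [Double] -> Double
dotpSeq xs ys = foldl' (+) 0 (zipWith (*) xs ys)

-- | Time only the dot product: both inputs are fully evaluated
-- before the first clock reading is taken.
timeDotp :: [Double] -> [Double] -> IO (Double, Integer)
timeDotp xs ys = do
  _      <- evaluate (forceList xs)
  _      <- evaluate (forceList ys)
  start  <- getCPUTime
  result <- evaluate (dotpSeq xs ys)
  end    <- getCPUTime
  return (result, end - start)   -- elapsed CPU time in picoseconds
</haskell>
Without the two calls to <hask>evaluate</hask> on the inputs, lazy evaluation would generate the input elements during the timed section, so the measurement would include the cost of producing the data.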

==== Parallel execution ====

The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code.

In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can as well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.

Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.

=== Further examples ===

Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.

The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].

=== Designing parallel programs ===

Data Parallel Haskell is a high-level language for coding parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as deciding when a computation has sufficient parallelism to make exploiting it worthwhile) still rests with the programmer.

DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:
<haskell>
data RTree a = RNode [:RTree a:]
</haskell>
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]
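
The level-by-level traversal can be sketched with ordinary lists standing in for parallel arrays. The code below is an illustrative model only: it keeps the shape of the <hask>RTree</hask> definition above but replaces <hask>[: :]</hask> with <hask>[ ]</hask>, and the helpers <hask>children</hask> and <hask>levelSizes</hask> are hypothetical names.
<haskell>
-- List model of the rose tree; lists stand in for parallel arrays.
data RTree a = RNode [RTree a]

-- | The subtrees directly below a node.
children :: RTree a -> [RTree a]
children (RNode ts) = ts

-- | Number of nodes on each level, root level first.
-- The outer recursion walks the levels one after another (sequential);
-- the work within one level is a single aggregate operation over all
-- nodes of that level, which is the part DPH could execute in parallel.
levelSizes :: RTree a -> [Int]
levelSizes t = go [t]
  where
    go []    = []
    go level = length level : go (concatMap children level)
</haskell>
For example, <hask>levelSizes (RNode [RNode [], RNode [RNode []]])</hask> evaluates to <hask>[1,2,1]</hask>.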

For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]

=== Further reading and information on the implementation ===

DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].

For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and would like to know the implementation details and internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].

=== Feedback ===

Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC users' mailing list], or contact the DPH developers directly:
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]

GHC/Data Parallel Haskell

2010-12-12T06:07:40Z

Chak: /* Using vectorised code */

[[Category:GHC|Data Parallel Haskell]]
== Data Parallel Haskell ==

''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus on utilising multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell].

<center>
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png
</center>

''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''

__TOC__

=== Project status ===

We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).

The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].

DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.

'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.

=== Where to get it ===

To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.

=== Overview ===

From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, except that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three-element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analogous manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).

The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)

As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.
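
The logarithmic step complexity rests on associativity: an associative operator lets the elements be combined pairwise in a balanced tree rather than strictly from left to right. The following is a minimal sketch of that idea on ordinary lists, not the DPH implementation; the names <hask>treeFold</hask> and <hask>pairUp</hask> are made up for this illustration. Each call to <hask>pairUp</hask> corresponds to one parallel step, and roughly log<sub>2</sub> ''n'' such steps reduce ''n'' elements to one.
<haskell>
-- List-based model of an undirected fold (illustration only; the names
-- treeFold and pairUp are hypothetical and not part of the DPH library).
treeFold :: (a -> a -> a) -> a -> [a] -> a
treeFold _ z []  = z                       -- empty input: return the neutral element
treeFold _ _ [x] = x                       -- one element left: done
treeFold f z xs  = treeFold f z (pairUp xs)
  where
    -- One round: combine neighbours pairwise. All applications of f within
    -- a round are independent of each other, so a parallel implementation
    -- could perform them at the same time.
    pairUp (a : b : rest) = f a b : pairUp rest
    pairUp rest           = rest
</haskell>
For an associative operator, the result agrees with a sequential fold: <hask>treeFold (+) 0 [1..10]</hask> and <hask>foldl (+) 0 [1..10]</hask> both yield <hask>55</hask>. For a non-associative operator such as <hask>(-)</hask>, the two generally differ (<hask>treeFold (-) 0 [1,2,3,4]</hask> is <hask>0</hask>, whereas <hask>foldl (-) 0 [1,2,3,4]</hask> is <hask>-10</hask>), which is why <hask>foldP</hask> demands associativity.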

=== A simple example ===

As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:
<haskell>
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.
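
The equivalence of the two comprehension forms can be tried out on ordinary lists, where GHC offers the same syntax through the <hask>ParallelListComp</hask> extension. The sketch below is a list-based analogue only (the names <hask>dotpList</hask> and <hask>dotpZip</hask> are made up here); it runs sequentially and does not use DPH.
<haskell>
{-# LANGUAGE ParallelListComp #-}

-- List analogue of dotp, using a parallel list comprehension: the two
-- generators are traversed in lockstep.
dotpList :: Num a => [a] -> [a] -> a
dotpList xs ys = sum [x * y | x <- xs | y <- ys]

-- The same computation with an explicit zip instead of the
-- parallel-comprehension syntax.
dotpZip :: Num a => [a] -> [a] -> a
dotpZip xs ys = sum [x * y | (x, y) <- zip xs ys]
</haskell>
For example, <hask>dotpList [1, 2, 3] [4, 5, 6]</hask> and <hask>dotpZip [1, 2, 3] [4, 5, 6]</hask> both evaluate to <hask>32</hask>.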

=== Running DPH programs ===

Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.

==== No type classes ====

Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.
<haskell>
dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>

==== Special Prelude ====

As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus four extra modules to support one numeric type each: [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types), and if a program needs to use more than one of them, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.

To compile <hask>dotp_double</hask>, we add the following three import statements:
<haskell>
import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double
</haskell>

==== Impedance matching ====

Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code).

Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.
<haskell>
dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.

==== Compiling vectorised code ====

The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.

Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.

Overall, we get the following complete module definition for the dot-product code:
<haskell>
{-# LANGUAGE ParallelArrays #-}
{-# OPTIONS -fvectorise #-}

module DotP (dotp_wrapper)
where

import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double

dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]

dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:
<blockquote>
<code>ghc -c -Odph DotP.hs</code>
</blockquote>
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code.

==== Using vectorised code ====

Finally, we need a main module that calls the vectorised code, but is itself not vectorised, so that it may contain I/O. In this simple example, we convert two simple lists to parallel arrays, compute their dot product, and print the result:
<haskell>
import Data.Array.Parallel.PArray (PArray, fromList)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = let v      = fromList [1..10]    -- convert lists...
        w      = fromList [1,2..20]  -- ...to parallel arrays
        result = dotp_wrapper v w    -- invoke vectorised code
    in
    print result                     -- print the result
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -Odph Main.hs</code>
</blockquote>
and finally link the two modules into an executable <code>dotp</code> with
<blockquote>
<code>ghc -o dotp -threaded DotP.o Main.o</code>
</blockquote>
We need the <code>-threaded</code> option to link with GHC's multi-threaded runtime.

==== Generating input data ====

To work with larger inputs, we can use a variant of the <hask>Main</hask> module that generates two random vectors and computes their dot product:
<haskell>
import System.Random (newStdGen)
import Data.Array.Parallel.PArray (PArray, randomRs)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = do
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2
      print $ dotp_wrapper v w   -- invoke vectorised code and print the result
  where
    n     = 10000        -- vector length
    range = (-100, 100)  -- range of vector elements
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -O -fdph-seq Main.hs</code>
</blockquote>
and finally link with
<blockquote>
<code>ghc -o dotp -fdph-seq -threaded DotP.o Main.o</code>
</blockquote>

'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.

==== Parallel execution ====

The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code.

In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can as well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.

Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.

=== Further examples ===

Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.

The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].

=== Designing parallel programs ===

Data Parallel Haskell is a high-level language for coding parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and for many performance considerations (such as deciding when a computation has sufficient parallelism to make exploiting it worthwhile) still rests with the programmer.

DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:
<haskell>
data RTree a = RNode [:RTree a:]
</haskell>
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]
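
To get a feel for the shape of such traversals, here is a sequential model that uses ordinary lists in place of parallel arrays. It is a sketch under assumptions: the payload field of <hask>RNode</hask> and the functions <hask>sumTree</hask> and <hask>depth</hask> are additions for illustration and do not appear in the definition above. In a DPH version, the list of children would be a parallel array, and <hask>map</hask> and <hask>sum</hask> would be replaced by their array counterparts <hask>mapP</hask> and <hask>sumP</hask>, subject to the vectorisation restrictions described earlier.
<haskell>
-- Sequential, list-based model of a rose tree (illustration only).
-- Assumption: each node carries a payload of type a; the definition in the
-- text stores only the children.
data RTree a = RNode a [RTree a]

-- Sum all payloads. The recursion descends level by level, while the
-- children of a single node are independent of each other.
sumTree :: Num a => RTree a -> a
sumTree (RNode x kids) = x + sum (map sumTree kids)

-- Number of levels in the tree; a leaf has depth 1.
depth :: RTree a -> Int
depth (RNode _ kids) = 1 + maximum (0 : map depth kids)
</haskell>
For <hask>t = RNode 1 [RNode 2 [], RNode 3 [RNode 4 []]]</hask>, <hask>sumTree t</hask> is <hask>10</hask> and <hask>depth t</hask> is <hask>3</hask>.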

For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]

=== Further reading and information on the implementation ===

DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].

For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and would like to know implementation details and the internals of the Data Parallel Haskell project, much of this is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].

=== Feedback ===

Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC users' mailing list], or contact the DPH developers directly:
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]

GHC/Data Parallel Haskell

2010-12-12T06:03:22Z

Chak: /* Generating input data */

[[Category:GHC|Data Parallel Haskell]]
== Data Parallel Haskell ==

''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus on utilising multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell].

<center>
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png
</center>

''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''

__TOC__

=== Project status ===

We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).

The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].

DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.

'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.

=== Where to get it ===

To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.

=== Overview ===

From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, except that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three-element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analogous manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).

The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions).

As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.

=== A simple example ===

As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:
<haskell>
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.

=== Running DPH programs ===

Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.

==== No type classes ====

Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.
<haskell>
dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>

==== Special Prelude ====

As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus four extra modules to support one numeric type each: [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types), and if a program needs to use more than one of them, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.

To compile <hask>dotp_double</hask>, we add the following three import statements:
<haskell>
import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double
</haskell>

==== Impedance matching ====

Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code).

Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.
<haskell>
dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.

==== Compiling vectorised code ====

The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.

Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.

Overall, we get the following complete module definition for the dot-product code:
<haskell>
{-# LANGUAGE ParallelArrays #-}
{-# OPTIONS -fvectorise #-}

module DotP (dotp_wrapper)
where

import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double

dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]

dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:
<blockquote>
<code>ghc -c -Odph DotP.hs</code>
</blockquote>
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code.

==== Using vectorised code ====

Finally, we need a wrapper module that calls the vectorised code, but is itself not vectorised. In this simple example, this is just the <hask>Main</hask> module, which converts two lists to parallel arrays and computes their dot product:
<haskell>
import Data.Array.Parallel.PArray (PArray, fromList)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = let v      = fromList [1..10]    -- convert lists...
        w      = fromList [1,2..20]  -- ...to parallel arrays
        result = dotp_wrapper v w    -- invoke vectorised code
    in
    print result                     -- print the result
</haskell>

==== Generating input data ====

To work with larger inputs, we can use a variant of the <hask>Main</hask> module that generates two random vectors and computes their dot product:
<haskell>
import System.Random (newStdGen)
import Data.Array.Parallel.PArray (PArray, randomRs)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = do
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2
      print $ dotp_wrapper v w   -- invoke vectorised code and print the result
  where
    n     = 10000        -- vector length
    range = (-100, 100)  -- range of vector elements
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -O -fdph-seq Main.hs</code>
</blockquote>
and finally link with
<blockquote>
<code>ghc -o dotp -fdph-seq -threaded DotP.o Main.o</code>
</blockquote>

'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.

==== Parallel execution ====

The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code.

In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can as well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.

Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.

=== Further examples ===

Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.

The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].

=== Designing parallel programs ===

Data Parallel Haskell is a high-level language for coding parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and for many performance considerations (such as deciding when a computation has sufficient parallelism to make exploiting it worthwhile) still rests with the programmer.

DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:
<haskell>
data RTree a = RNode [:RTree a:]
</haskell>
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]

For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]

=== Further reading and information on the implementation ===

DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].

For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and would like to know the implementation details and internals of the Data Parallel Haskell project, much of this is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].

=== Feedback ===

Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]

GHC/Data Parallel Haskell

2010-12-12T05:55:10Z

Chak: /* Using vectorised code */

[[Category:GHC|Data Parallel Haskell]]
== Data Parallel Haskell ==

''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus on utilising multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell].

<center>
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png
</center>

''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''

__TOC__

=== Project status ===

We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).

The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].

DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.

'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.

=== Where to get it ===

To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.

=== Overview ===

From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analogous manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).

The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)

As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.
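
To illustrate why an undirected fold needs an associative function, here is a minimal sketch in plain Haskell, with ordinary lists standing in for parallel arrays. The name <hask>foldTree</hask> is hypothetical and is not part of the DPH library; the sketch models only the tree-shaped order of combination, not the actual implementation of <hask>foldP</hask>.
<haskell>
-- Plain-list model of an undirected, tree-shaped fold (NOT the DPH foldP).
-- Adjacent elements are combined pairwise, so n elements need roughly
-- log2 n rounds, and all combinations within one round are independent.
-- The second argument is only returned for the empty list; it is assumed
-- to be a neutral element of the combining function.
foldTree :: (a -> a -> a) -> a -> [a] -> a
foldTree _ z []  = z
foldTree _ _ [x] = x
foldTree f z xs  = foldTree f z (pairUp xs)
  where
    -- One round: combine neighbours, keep a trailing odd element as is.
    pairUp (a : b : rest) = f a b : pairUp rest
    pairUp rest           = rest
</haskell>
For an associative function such as <hask>(+)</hask>, the tree-shaped fold agrees with <hask>foldl</hask> and <hask>foldr</hask>. For a non-associative function such as <hask>(-)</hask>, the grouping changes the result, which is why such functions are not suitable arguments for an undirected fold.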

=== A simple example ===

As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:
<haskell>
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.

=== Running DPH programs ===

Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.

==== No type classes ====

Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.
<haskell>
dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>

==== Special Prelude ====

As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus four extra modules, each supporting one numeric type: [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types), and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.

To compile <hask>dotp_double</hask>, we add the following three import statements:
<haskell>
import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double
</haskell>

==== Impedance matching ====

Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code).

Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.
<haskell>
dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.

==== Compiling vectorised code ====

The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.

Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.

Overall, we get the following complete module definition for the dot-product code:
<haskell>
{-# LANGUAGE ParallelArrays #-}
{-# OPTIONS -fvectorise #-}

module DotP (dotp_wrapper)
where

import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double

dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]

dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:
<blockquote>
<code>ghc -c -Odph DotP.hs</code>
</blockquote>
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code.

==== Using vectorised code ====

Finally, we need a wrapper module that calls the vectorised code, but is itself not vectorised. In this simple example, this is just the <hask>Main</hask> module that converts two lists to parallel arrays and computes their dot product:
<haskell>
import Data.Array.Parallel.PArray (PArray, fromList)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = let v      = fromList [1..10]    -- convert lists...
        w      = fromList [1,2..10]  -- ...to parallel arrays
        result = dotp_wrapper v w    -- invoke vectorised code
    in
    print result                     -- print the result
</haskell>

==== Generating input data ====

Instead of the fixed input lists of the previous section, we can generate the input data randomly. The following variant of the <hask>Main</hask> module generates two random vectors and computes their dot product:
<haskell>
import System.Random (newStdGen)
import Data.Array.Parallel.PArray (PArray, randomRs)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = do
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2
      print $ dotp_wrapper v w   -- invoke vectorised code and print the result
  where
    n     = 10000        -- vector length
    range = (-100, 100)  -- range of vector elements
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -O -fdph-seq Main.hs</code>
</blockquote>
and finally link with
<blockquote>
<code>ghc -o dotp -fdph-seq -threaded DotP.o Main.o</code>
</blockquote>

'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.

==== Parallel execution ====

The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code.

In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can as well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.

Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.

=== Further examples ===

Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.

The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].

=== Designing parallel programs ===

Data Parallel Haskell is a high-level language for coding parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as deciding when a computation has sufficient parallelism to make exploiting it worthwhile) still rests with the programmer.

DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:
<haskell>
data RTree a = RNode [:RTree a:]
</haskell>
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]
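
The traversal pattern described above can be sketched in plain Haskell, with ordinary lists standing in for the parallel arrays. This is a sequential model only. The payload field of <hask>RNode</hask> and the names <hask>sizeT</hask>, <hask>depthT</hask>, and <hask>exampleTree</hask> are assumptions introduced for illustration; they do not appear in the definition above or in the DPH library.
<haskell>
-- Sequential model of a parallel rose tree: a list of children stands in
-- for the parallel array [:RTree a:]. The payload field is an assumption
-- added so that the example tree carries some data.
data RTree a = RNode a [RTree a]

-- Number of nodes. The map over the children is the step that could run
-- in parallel if the children were held in a parallel array.
sizeT :: RTree a -> Int
sizeT (RNode _ ts) = 1 + sum (map sizeT ts)

-- Height of the tree. The recursion descends level by level, which is
-- the inherently sequential part of the traversal.
depthT :: RTree a -> Int
depthT (RNode _ ts) = 1 + maximum (0 : map depthT ts)

-- A small tree with four nodes spread over three levels.
exampleTree :: RTree Int
exampleTree = RNode 1 [RNode 2 [], RNode 3 [RNode 4 []]]
</haskell>
In this model, the work per level grows with the number of children, while the number of sequential steps grows only with the depth of the tree.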

For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]

=== Further reading and information on the implementation ===

DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].

For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and would like to know the implementation details and internals of the Data Parallel Haskell project, much of this is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].

=== Feedback ===

Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC user's mailing list], or contact the DPH developers directly:
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]

GHC/Data Parallel Haskell

2010-12-10T08:09:47Z

Chak: /* Compiling vectorised code */

[[Category:GHC|Data Parallel Haskell]]
== Data Parallel Haskell ==

''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus on utilising multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell].

<center>
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png
</center>

''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''

__TOC__

=== Project status ===

We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).

The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].

DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.

'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.

=== Where to get it ===

To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.

=== Overview ===

From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analogous manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).

The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)

As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.

=== A simple example ===

As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:
<haskell>
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.
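
For comparison, the same computation can be written over ordinary lists using GHC's <hask>ParallelListComp</hask> extension. The following is a sequential reference sketch, not DPH code; the names <hask>dotpList</hask> and <hask>dotpZip</hask> are hypothetical and serve only to show that the two comprehension forms mentioned above compute the same value.
<haskell>
{-# LANGUAGE ParallelListComp #-}

-- List-based reference model of dotp. It runs sequentially and only pins
-- down the value that the parallel-array version is expected to compute.
dotpList :: Num a => [a] -> [a] -> a
dotpList xs ys = sum [x * y | x <- xs | y <- ys]

-- The same computation with an explicit zip in place of the parallel
-- comprehension.
dotpZip :: Num a => [a] -> [a] -> a
dotpZip xs ys = sum [x * y | (x, y) <- zip xs ys]
</haskell>
A list model of this kind can be handy for checking the results of a vectorised function on small inputs.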

=== Running DPH programs ===

Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.

==== No type classes ====

Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.
<haskell>
dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>

==== Special Prelude ====

As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus four extra modules, each supporting one numeric type: [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types), and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.

To compile <hask>dotp_double</hask>, we add the following three import statements:
<haskell>
import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double
</haskell>

==== Impedance matching ====

Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code).

Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.
<haskell>
dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.

==== Compiling vectorised code ====

The syntax for parallel arrays is an extension to Haskell 2010 that needs to be enabled with the language option <hask>ParallelArrays</hask>. Furthermore, we need to explicitly tell GHC if we want to vectorise a module by using the <hask>-fvectorise</hask> option.

Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.

Overall, we get the following complete module definition for the dot-product code:
<haskell>
{-# LANGUAGE ParallelArrays #-}
{-# OPTIONS -fvectorise #-}

module DotP (dotp_wrapper)
where

import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double

dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]

dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:
<blockquote>
<code>ghc -c -Odph DotP.hs</code>
</blockquote>
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that works best for DPH code.

==== Using vectorised code ====

Finally, we need a wrapper module that calls the vectorised code, but is itself not vectorised. In this simple example, this is just the <hask>Main</hask> module that generates two random vectors and computes their dot product:
<haskell>
import System.Random (newStdGen)
import Data.Array.Parallel.PArray (PArray, randomRs)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = do
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2
      print $ dotp_wrapper v w   -- invoke vectorised code and print the result
  where
    n     = 10000        -- vector length
    range = (-100, 100)  -- range of vector elements
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -O -fdph-seq Main.hs</code>
</blockquote>
and finally link with
<blockquote>
<code>ghc -o dotp -fdph-seq -threaded DotP.o Main.o</code>
</blockquote>

'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.

==== Parallel execution ====

The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code.

In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can as well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.

Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.

=== Further examples ===

Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.

The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].

=== Designing parallel programs ===

Data Parallel Haskell is a high-level language for coding parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and for many performance considerations (such as deciding when a computation has enough parallelism to make exploiting it worthwhile) still rests with the programmer.

DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:
<haskell>
data RTree a = RNode [:RTree a:]
</haskell>
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]
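
The shape of such traversals can be sketched in plain Haskell. The following is only an illustration under stated assumptions: ordinary lists stand in for the parallel array <hask>[:RTree a:]</hask>, so nothing runs in parallel here, and the names <hask>size</hask>, <hask>depth</hask>, and <hask>example</hask> are hypothetical. In DPH, the <hask>map</hask> and <hask>sum</hask> over the children would presumably become <hask>mapP</hask> and <hask>sumP</hask>.
<haskell>
-- List-based stand-in for the parallel rose tree (not DPH code).
data RTree a = RNode [RTree a]

-- Number of nodes: this node plus the sizes of all subtrees.
-- The recursion over levels is sequential; the work on the children of
-- one node is independent and is what DPH could perform in parallel.
size :: RTree a -> Int
size (RNode ts) = 1 + sum (map size ts)

-- Height of the tree; a node without children has depth 1.
depth :: RTree a -> Int
depth (RNode ts) = 1 + maximum (0 : map depth ts)

-- A small tree: a root with two children, the second of which has two leaves.
example :: RTree ()
example = RNode [RNode [], RNode [RNode [], RNode []]]
</haskell>
For this tree, <hask>size example</hask> is 5 and <hask>depth example</hask> is 3.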

For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]

=== Further reading and information on the implementation ===

DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].

For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and would like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].

=== Feedback ===

Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC users' mailing list], or contact the DPH developers directly:
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]

GHC/Data Parallel Haskell

2010-12-10T07:47:41Z

Chak: /* Impedance matching */

[[Category:GHC|Data Parallel Haskell]]
== Data Parallel Haskell ==

''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus on utilising multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell].

<center>
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png
</center>

''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''

__TOC__

=== Project status ===

We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).

The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].

DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.

'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.

=== Where to get it ===

To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.

=== Overview ===

From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three-element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analogous manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).

The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions).

As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.
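
To see why an undirected fold needs an associative function and why it takes O(log ''n'') steps, consider the following sketch. It is plain Haskell on lists, not the DPH implementation of <hask>foldP</hask>; the names <hask>pairUp</hask> and <hask>treeFold</hask> are hypothetical. Each round combines neighbouring elements pairwise, and all combinations within one round are independent of each other, so they could in principle run in parallel.
<haskell>
-- Combine neighbouring elements pairwise; a trailing odd element is kept as is.
pairUp :: (a -> a -> a) -> [a] -> [a]
pairUp f (x:y:rest) = f x y : pairUp f rest
pairUp _ xs         = xs

-- Reduce a non-empty list in rounds of pairwise combination.
-- Returns the result together with the number of rounds that were needed.
treeFold :: (a -> a -> a) -> [a] -> (a, Int)
treeFold f = go 0
  where
    go _ []  = error "treeFold: empty list"
    go n [x] = (x, n)
    go n xs  = go (n + 1) (pairUp f xs)
</haskell>
For example, <hask>treeFold (+) [1..8]</hask> yields <hask>(36, 3)</hask>: eight elements are reduced in three rounds. With a non-associative function, the grouping changes the result: <hask>treeFold (-) [1, 2, 3, 4]</hask> yields 0, whereas <hask>foldl1 (-) [1, 2, 3, 4]</hask> yields -8. This is why an undirected fold is only well defined for associative functions.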

=== A simple example ===

As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:
<haskell>
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.
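
For readers without a DPH-enabled GHC, the equivalence of the two formulations can be tried out on ordinary lists. The following sketch assumes only the <hask>ParallelListComp</hask> extension of GHC; the names <hask>dotpList</hask> and <hask>dotpZip</hask> are hypothetical and the code is sequential, not DPH code.
<haskell>
{-# LANGUAGE ParallelListComp #-}

-- List analogue of dotp, using a parallel list comprehension:
-- xs and ys are traversed in lockstep.
dotpList :: Num a => [a] -> [a] -> a
dotpList xs ys = sum [x * y | x <- xs | y <- ys]

-- The same computation with an explicit zip.
dotpZip :: Num a => [a] -> [a] -> a
dotpZip xs ys = sum [x * y | (x, y) <- zip xs ys]
</haskell>
Both <hask>dotpList [1, 2, 3] [4, 5, 6]</hask> and <hask>dotpZip [1, 2, 3] [4, 5, 6]</hask> evaluate to 32.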

=== Running DPH programs ===

Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.

==== No type classes ====

Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.
<haskell>
dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>

==== Special Prelude ====

As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus four extra modules that support one numeric type each: [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types), and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.

To compile <hask>dotp_double</hask>, we add the following three import statements:
<haskell>
import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double
</haskell>

==== Impedance matching ====

Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays (which might be nested) '''cannot''' be passed. Instead, we need to pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code).

Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.
<haskell>
dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.

==== Compiling vectorised code ====

The definition of <hask>dotp_double</hask> requires two language extensions, namely <hask>PArr</hask> to enable the syntax of parallel arrays and <hask>ParallelListComp</hask> for the parallel comprehension. Furthermore, we need to explicitly tell GHC which modules we want to vectorise.

Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.

The compiler option to enable vectorisation is <code>-fvectorise</code>. Overall, we get the following complete module definition for the dot-product code:
<haskell>
{-# LANGUAGE PArr, ParallelListComp #-}
{-# OPTIONS -fvectorise #-}

module DotP (dotp_double,dotp_wrapper)
where

import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double

dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]

dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:
<blockquote>
<code>ghc -c -Odph -fcpr-off -fdph-seq DotP.hs</code>
</blockquote>
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that is geared at optimising DPH code. Moreover, we use <code>-fcpr-off</code> as GHC's CPR phase doesn't play nice with type families at the moment, which in turn are heavily used in the DPH library. We shall discuss <code>-fdph-seq</code> below.

==== Using vectorised code ====

Finally, we need a wrapper module that calls the vectorised code, but is itself not vectorised. In this simple example, this is just the <hask>Main</hask> module that generates two random vectors and computes their dot product:
<haskell>
import System.Random (newStdGen)
import Data.Array.Parallel.PArray (PArray, randomRs)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = do
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2
      print $ dotp_wrapper v w   -- invoke vectorised code and print the result
  where
    n     = 10000        -- vector length
    range = (-100, 100)  -- range of vector elements
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -O -fdph-seq Main.hs</code>
</blockquote>
and finally link with
<blockquote>
<code>ghc -o dotp -fdph-seq -threaded DotP.o Main.o</code>
</blockquote>

'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.
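
The idea of forcing the inputs before starting the clock can be sketched as follows. This is an illustration under stated assumptions, not the DPH API: it uses ordinary lists of <hask>Double</hask> instead of <hask>PArray</hask>, a hand-written <hask>forceList</hask> in place of <hask>nf</hask>, and a sequential dot product in place of <hask>dotp_wrapper</hask>. The names <hask>forceList</hask> and <hask>timedDot</hask> are hypothetical.
<haskell>
import Control.Exception (evaluate)
import System.CPUTime (getCPUTime)

-- Force every element of the list; for Double, forcing to weak head
-- normal form means the element is fully evaluated.
forceList :: [Double] -> ()
forceList = foldr seq ()

-- Compute a dot product and return it together with the CPU time
-- (in picoseconds) spent on the computation itself.
timedDot :: [Double] -> [Double] -> IO (Double, Integer)
timedDot v w = do
  _      <- evaluate (forceList v)           -- generate the inputs before timing
  _      <- evaluate (forceList w)
  start  <- getCPUTime
  result <- evaluate (sum (zipWith (*) v w)) -- only this work is timed
  end    <- getCPUTime
  return (result, end - start)
</haskell>
Without the two <hask>evaluate (forceList ...)</hask> calls, lazy evaluation would delay the generation of the vectors until the dot product demands them, and the measured time would include that generation.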

==== Parallel execution ====

The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code.

In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can as well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.

Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.

=== Further examples ===

Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.

The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].

=== Designing parallel programs ===

Data Parallel Haskell is a high-level language for coding parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and for many performance considerations (such as deciding when a computation has enough parallelism to make exploiting it worthwhile) still rests with the programmer.

DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:
<haskell>
data RTree a = RNode [:RTree a:]
</haskell>
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]

For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]

=== Further reading and information on the implementation ===

DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].

For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and would like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].

=== Feedback ===

Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC users' mailing list], or contact the DPH developers directly:
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]

GHC/Data Parallel Haskell

2010-12-10T07:43:36Z

Chak: /* Special Prelude */

[[Category:GHC|Data Parallel Haskell]]
== Data Parallel Haskell ==

''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus on utilising multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell].

<center>
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png
</center>

''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''

__TOC__

=== Project status ===

We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).

The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].

DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.

'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.

=== Where to get it ===

To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.

=== Overview ===

From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three-element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analogous manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).

The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions).

As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.

=== A simple example ===

As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:
<haskell>
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.

=== Running DPH programs ===

Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.

==== No type classes ====

Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.
<haskell>
dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>

==== Special Prelude ====

As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude.hs Data.Array.Parallel.Prelude] plus four extra modules, each supporting one numeric type: [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Float.hs Data.Array.Parallel.Prelude.Float], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Double.hs Data.Array.Parallel.Prelude.Double], [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Int.hs Data.Array.Parallel.Prelude.Int], and [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Word8.hs Data.Array.Parallel.Prelude.Word8]. These four modules support the same functions (on different types), and if a program needs to use more than one of them, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). Moreover, we have [http://darcs.haskell.org/packages/dph/dph-common/Data/Array/Parallel/Prelude/Bool.hs Data.Array.Parallel.Prelude.Bool]. If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.

To compile <hask>dotp_double</hask>, we add the following three import statements:
<haskell>
import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double
</haskell>

==== Impedance matching ====

Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays '''cannot''' be passed. Instead, we can pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code).

Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.
<haskell>
dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.

==== Compiling vectorised code ====

The definition of <hask>dotp_double</hask> requires two language extensions, namely <hask>PArr</hask> to enable the syntax of parallel arrays and <hask>ParallelListComp</hask> for the parallel comprehension. Furthermore, we need to explicitly tell GHC which modules we want to vectorise.

Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.

The compiler option to enable vectorisation is <code>-fvectorise</code>. Overall, we get the following complete module definition for the dot-product code:
<haskell>
{-# LANGUAGE PArr, ParallelListComp #-}
{-# OPTIONS -fvectorise #-}

module DotP (dotp_double,dotp_wrapper)
where

import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double

dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]

dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:
<blockquote>
<code>ghc -c -Odph -fcpr-off -fdph-seq DotP.hs</code>
</blockquote>
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that is geared at optimising DPH code. Moreover, we use <code>-fcpr-off</code> as GHC's CPR phase doesn't play nice with type families at the moment, which in turn are heavily used in the DPH library. We shall discuss <code>-fdph-seq</code> below.

==== Using vectorised code ====

Finally, we need a wrapper module that calls the vectorised code, but is itself not vectorised. In this simple example, this is just the <hask>Main</hask> module that generates two random vectors and computes their dot product:
<haskell>
import System.Random (newStdGen)
import Data.Array.Parallel.PArray (PArray, randomRs)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = do
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2
      print $ dotp_wrapper v w   -- invoke vectorised code and print the result
  where
    n     = 10000        -- vector length
    range = (-100, 100)  -- range of vector elements
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -O -fdph-seq Main.hs</code>
</blockquote>
and finally link with
<blockquote>
<code>ghc -o dotp -fdph-seq -threaded DotP.o Main.o</code>
</blockquote>

'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.

==== Parallel execution ====

The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code.

In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can as well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.

Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.

=== Further examples ===

Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.

The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].

=== Designing parallel programs ===

Data Parallel Haskell is a high-level language for coding parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and for many performance considerations (such as whether a computation has sufficient parallelism to make exploiting it worthwhile) still lies with the programmer.

DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:
<haskell>
data RTree a = RNode [:RTree a:]
</haskell>
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]

For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]

=== Further reading and information on the implementation ===

DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].

For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and would like to know the implementation details and internals of the Data Parallel Haskell project, much of this is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].

=== Feedback ===

Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC users' mailing list], or contact the DPH developers directly:
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]

GHC/Data Parallel Haskell

2010-12-10T04:26:07Z

Chak: /* No type classes */

[[Category:GHC|Data Parallel Haskell]]
== Data Parallel Haskell ==

''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus on utilising multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell].

<center>
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png
</center>

''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''

__TOC__

=== Project status ===

We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).

The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].

DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.

'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.

=== Where to get it ===

To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.

=== Overview ===

From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, except that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three-element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analogous manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).

The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) they are not inductively defined. Parallel arrays are strict in that demanding a single element demands all elements of the array. Hence, all elements of a parallel array may be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions).

As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.
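
The need for an associative argument and the logarithmic step count can be illustrated with ordinary lists. The following is only a sketch, not the DPH implementation: <hask>pairUp</hask> and <hask>treeFold</hask> are hypothetical names, and each call to <hask>pairUp</hask> models one parallel step in which all neighbouring pairs would be combined simultaneously.
<haskell>
-- Plain-list model of an undirected fold (illustration only, not DPH code).

-- Combine neighbouring elements pairwise; one call models one parallel step.
pairUp :: (a -> a -> a) -> [a] -> [a]
pairUp f (x:y:rest) = f x y : pairUp f rest
pairUp _ xs         = xs

-- Reduce by repeated pairwise combination. Returns the result together with
-- the number of rounds, which is ceiling (log2 n) for n >= 1 elements.
-- The second argument is only used for the empty list.
treeFold :: (a -> a -> a) -> a -> [a] -> (a, Int)
treeFold _ z []  = (z, 0)
treeFold f _ xs0 = go xs0 0
  where
    go [x] rounds = (x, rounds)
    go xs  rounds = go (pairUp f xs) (rounds + 1)
</haskell>
For example, <hask>treeFold (+) 0 [1..8]</hask> needs three rounds for eight elements. With an associative operator the grouping of the combinations does not affect the result; with a non-associative operator such as <hask>(-)</hask>, the tree-shaped grouping generally yields a different value from <hask>foldl</hask> or <hask>foldr</hask>, which is why an undirected fold requires associativity.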

=== A simple example ===

As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:
<haskell>
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.
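
For readers who want to experiment without a DPH-enabled compiler, the same two formulations can be written over ordinary lists. This is merely a sequential analogue under that assumption; the names <hask>dotpList</hask> and <hask>dotpZip</hask> are hypothetical and not part of DPH.
<haskell>
{-# LANGUAGE ParallelListComp #-}

-- List analogue of dotp, using a parallel list comprehension.
-- Lists are processed sequentially; only the notation matches the array code.
dotpList :: Num a => [a] -> [a] -> a
dotpList xs ys = sum [x * y | x <- xs | y <- ys]

-- The same computation with an explicit zip instead of the parallel comprehension.
dotpZip :: Num a => [a] -> [a] -> a
dotpZip xs ys = sum [x * y | (x, y) <- zip xs ys]
</haskell>
Both functions pair up elements at equal positions and stop at the end of the shorter list, so <hask>dotpList [1, 2, 3] [4, 5, 6]</hask> and <hask>dotpZip [1, 2, 3] [4, 5, 6]</hask> both evaluate to 32.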

=== Running DPH programs ===

Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.

==== No type classes ====

Unfortunately, vectorisation does not handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.
<haskell>
dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>

==== Special Prelude ====

As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude.html Data.Array.Parallel.Prelude] plus three extra modules, each supporting one numeric type: [http://www.haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude-Double.html Data.Array.Parallel.Prelude.Double], [http://www.haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude-Int.html Data.Array.Parallel.Prelude.Int], and [http://www.haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude-Word8.html Data.Array.Parallel.Prelude.Word8]. These three modules support the same functions (on different types), and if a program needs to use more than one of them, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.

To compile <hask>dotp_double</hask>, we add the following three import statements:
<haskell>
import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double
</haskell>

==== Impedance matching ====

Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays '''cannot''' be passed. Instead, we can pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code).

Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.
<haskell>
dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.

==== Compiling vectorised code ====

The definition of <hask>dotp_double</hask> requires two language extensions, namely <hask>PArr</hask> to enable the syntax of parallel arrays and <hask>ParallelListComp</hask> for the parallel comprehension. Furthermore, we need to explicitly tell GHC which modules we want to vectorise.

Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.

The compiler option to enable vectorisation is <code>-fvectorise</code>. Overall, we get the following complete module definition for the dot-product code:
<haskell>
{-# LANGUAGE PArr, ParallelListComp #-}
{-# OPTIONS -fvectorise #-}

module DotP (dotp_double,dotp_wrapper)
where

import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double

dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]

dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:
<blockquote>
<code>ghc -c -Odph -fcpr-off -fdph-seq DotP.hs</code>
</blockquote>
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that is geared at optimising DPH code. Moreover, we use <code>-fcpr-off</code> as GHC's CPR phase doesn't play nice with type families at the moment, which in turn are heavily used in the DPH library. We shall discuss <code>-fdph-seq</code> below.

==== Using vectorised code ====

Finally, we need a wrapper module that calls the vectorised code, but is itself not vectorised. In this simple example, this is just the <hask>Main</hask> module that generates two random vectors and computes their dot product:
<haskell>
import System.Random (newStdGen)
import Data.Array.Parallel.PArray (PArray, randomRs)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = do
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2
      print $ dotp_wrapper v w   -- invoke vectorised code and print the result
  where
    n     = 10000        -- vector length
    range = (-100, 100)  -- range of vector elements
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -O -fdph-seq Main.hs</code>
</blockquote>
and finally link with
<blockquote>
<code>ghc -o dotp -fdph-seq -threaded DotP.o Main.o</code>
</blockquote>

'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.

==== Parallel execution ====

The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code.

In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can as well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.

Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.

=== Further examples ===

Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.

The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].

=== Designing parallel programs ===

Data Parallel Haskell is a high-level language for coding parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and for many performance considerations (such as whether a computation has sufficient parallelism to make exploiting it worthwhile) still lies with the programmer.

DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:
<haskell>
data RTree a = RNode [:RTree a:]
</haskell>
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]
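
The traversal pattern can be sketched with a list-based stand-in for the rose tree. This is an illustration under stated assumptions rather than DPH code: <hask>RTreeL</hask>, <hask>sizeT</hask>, and <hask>depthT</hask> are hypothetical names, and ordinary lists replace the parallel array of children.
<haskell>
-- List-based stand-in for RTree: [RTreeL a] replaces [:RTree a:].
data RTreeL a = RNodeL [RTreeL a]

-- Count the nodes of a tree. The map over the children is the step that the
-- array version could execute in parallel (using mapP and sumP there);
-- the recursion itself descends level by level.
sizeT :: RTreeL a -> Int
sizeT (RNodeL kids) = 1 + sum (map sizeT kids)

-- Number of levels; this bounds the length of the sequential part of a traversal.
depthT :: RTreeL a -> Int
depthT (RNodeL kids) = 1 + maximum (0 : map depthT kids)
</haskell>
For a tree with few levels and many children per node, most of the work sits in the per-node maps, which is the situation in which holding the children in a parallel array pays off.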

For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]

=== Further reading and information on the implementation ===

DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].

For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and would like to know the implementation details and internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].

=== Feedback ===

Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC users' mailing list], or contact the DPH developers directly:
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]

GHC/Data Parallel Haskell

2010-12-10T04:25:15Z

Chak: /* Running DPH programs */

[[Category:GHC|Data Parallel Haskell]]
== Data Parallel Haskell ==

''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus on utilising multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell].

<center>
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png
</center>

''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''

__TOC__

=== Project status ===

We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).

The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].

DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.

'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.

=== Where to get it ===

To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.

=== Overview ===

From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three-element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analogous manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).

The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)

As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.
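To see why an undirected fold needs an associative function and why it takes O(log ''n'') parallel steps, the following is a minimal sketch that models the reduction on ordinary lists. It is not the DPH implementation of <hask>foldP</hask>; the names <hask>pairUp</hask> and <hask>foldTree</hask> are hypothetical and only serve the illustration. Each call to <hask>pairUp</hask> models one parallel step in which all neighbouring pairs are combined at once.

<haskell>
-- List-based model of an undirected, tree-shaped fold.
-- Assumption: 'pairUp' and 'foldTree' are illustrative names, not DPH functions.

-- | Combine neighbouring elements pairwise; one call models one parallel step.
pairUp :: (a -> a -> a) -> [a] -> [a]
pairUp f (x : y : rest) = f x y : pairUp f rest
pairUp _ xs             = xs   -- zero or one element is left over

-- | Repeat the pairwise step until a single element remains.
-- Returns the result together with the number of combining rounds.
foldTree :: (a -> a -> a) -> a -> [a] -> (a, Int)
foldTree _ z []  = (z, 0)
foldTree f _ xs0 = go 0 xs0
  where
    go steps [x] = (x, steps)
    go steps xs  = go (steps + 1) (pairUp f xs)
</haskell>

For eight elements the sketch needs three rounds, as the list halves in every round. With an associative function such as <hask>(+)</hask> the grouping does not matter, so the result agrees with a left fold; with a non-associative function such as <hask>(-)</hask> the tree-shaped grouping generally yields a different value than <hask>foldl</hask>, which is why a directed fold cannot simply be replaced by an undirected one.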

=== A simple example ===

As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:
<haskell>
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.
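For readers without a DPH-enabled compiler at hand, the same definition can be tried out on ordinary lists. The sketch below assumes nothing beyond the <hask>ParallelListComp</hask> extension; <hask>dotpList</hask> and <hask>dotpZip</hask> are hypothetical names, and lists merely stand in for parallel arrays, so the code runs sequentially.

<haskell>
{-# LANGUAGE ParallelListComp #-}

-- List-based analogue of 'dotp'; lists stand in for parallel arrays.
-- Assumption: 'dotpList' and 'dotpZip' are illustrative names only.

-- | Dot product using a parallel list comprehension (two generators
-- separated by '|' are drawn from in lock-step).
dotpList :: Num a => [a] -> [a] -> a
dotpList xs ys = sum [x * y | x <- xs | y <- ys]

-- | The same function with the zip made explicit.
dotpZip :: Num a => [a] -> [a] -> a
dotpZip xs ys = sum [x * y | (x, y) <- zip xs ys]
</haskell>

Both variants pair up elements position by position and stop at the end of the shorter list, which mirrors the relationship between the array comprehension and <hask>zipP</hask> mentioned above.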

=== Running DPH programs ===

Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code, called ''vectorisation'', that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but for parallel code it dramatically simplifies load balancing.

==== No type classes ====

Unfortunately, vectorisation does not properly handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.
<haskell>
dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>

==== Special Prelude ====

As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude.html Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each: [http://www.haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude-Double.html Data.Array.Parallel.Prelude.Double], [http://www.haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude-Int.html Data.Array.Parallel.Prelude.Int], and [http://www.haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude-Word8.html Data.Array.Parallel.Prelude.Word8]. These three modules support the same functions (on different types), and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.

To compile <hask>dotp_double</hask>, we add the following three import statements:
<haskell>
import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double
</haskell>

==== Impedance matching ====

Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays '''cannot''' be passed. Instead, we can pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code).

Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.
<haskell>
dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.

==== Compiling vectorised code ====

The definition of <hask>dotp_double</hask> requires two language extensions, namely <hask>PArr</hask> to enable the syntax of parallel arrays and <hask>ParallelListComp</hask> for the parallel comprehension. Furthermore, we need to explicitly tell GHC which modules we want to vectorise.

Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.

The compiler option to enable vectorisation is <code>-fvectorise</code>. Overall, we get the following complete module definition for the dot-product code:
<haskell>
{-# LANGUAGE PArr, ParallelListComp #-}
{-# OPTIONS -fvectorise #-}

module DotP (dotp_double,dotp_wrapper)
where

import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double

dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]

dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:
<blockquote>
<code>ghc -c -Odph -fcpr-off -fdph-seq DotP.hs</code>
</blockquote>
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that is geared at optimising DPH code. Moreover, we use <code>-fcpr-off</code> as GHC's CPR phase doesn't play nice with type families at the moment, which in turn are heavily used in the DPH library. We shall discuss <code>-fdph-seq</code> below.

==== Using vectorised code ====

Finally, we need a wrapper module that calls the vectorised code, but is itself not vectorised. In this simple example, this is just the <hask>Main</hask> module that generates two random vectors and computes their dot product:
<haskell>
import System.Random (newStdGen)
import Data.Array.Parallel.PArray (PArray, randomRs)

import DotP (dotp_wrapper)  -- import vectorised code

main :: IO ()
main
  = do
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2
      print $ dotp_wrapper v w   -- invoke vectorised code and print the result
  where
    n     = 10000        -- vector length
    range = (-100, 100)  -- range of vector elements
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -O -fdph-seq Main.hs</code>
</blockquote>
and finally link with
<blockquote>
<code>ghc -o dotp -fdph-seq -threaded DotP.o Main.o</code>
</blockquote>

'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.
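The general pattern of forcing the inputs before starting the clock can be sketched with plain lists and functions from <code>base</code> only. This is not the DPH API: <hask>forceDoubles</hask>, <hask>timePure</hask>, <hask>dotpList</hask>, and <hask>benchDotp</hask> are hypothetical names, and summing a list of doubles is used here merely as a stand-in for what <hask>nf</hask> is meant to achieve on a <hask>PArray</hask>.

<haskell>
import Control.Exception (evaluate)
import System.CPUTime (getCPUTime)

-- Assumption: all names below are illustrative; lists stand in for PArray.

-- | Force every element of a list of doubles by summing it.
forceDoubles :: [Double] -> IO ()
forceDoubles xs = evaluate (sum xs) >> pure ()

-- | Evaluate a function on its argument and report the CPU time
-- (in picoseconds) spent on that evaluation alone.
timePure :: (a -> Double) -> a -> IO (Double, Integer)
timePure f x = do
  start  <- getCPUTime
  result <- evaluate (f x)
  end    <- getCPUTime
  pure (result, end - start)

-- | List-based stand-in for the vectorised dot product.
dotpList :: [Double] -> [Double] -> Double
dotpList xs ys = sum (zipWith (*) xs ys)

-- | Force both inputs first, then time only the dot product.
benchDotp :: [Double] -> [Double] -> IO (Double, Integer)
benchDotp v w = do
  forceDoubles v   -- input generation is paid for here, outside the timed region
  forceDoubles w
  timePure (uncurry dotpList) (v, w)
</haskell>

Without the two forcing steps, lazy evaluation would defer the generation of the inputs until the dot product demands them, and the measured time would include the sequential generation.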

==== Parallel execution ====

The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code.

In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can as well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.
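The independence of the result from the number of threads can be illustrated with a small list-based model in which the input is split into one chunk per worker, each chunk is reduced on its own, and the partial results are combined. This is only a sketch under our own assumptions, not how <code>dph-par</code> distributes work; <hask>chunksOfN</hask> and <hask>chunkedSum</hask> are hypothetical names. We use <hask>Integer</hask> so that addition is exactly associative; with floating-point numbers, a different grouping can change the rounding of the last digits.

<haskell>
-- Assumption: 'chunksOfN' and 'chunkedSum' are illustrative names only.

-- | Split a list into chunks of at most n elements (n is clamped to >= 1).
chunksOfN :: Int -> [a] -> [[a]]
chunksOfN _ [] = []
chunksOfN n xs = let (h, t) = splitAt (max 1 n) xs
                 in  h : chunksOfN n t

-- | Sum a list by dividing it among the given number of workers,
-- summing each chunk separately, and combining the partial sums.
chunkedSum :: Int -> [Integer] -> Integer
chunkedSum workers xs = sum (map sum (chunksOfN size xs))
  where
    w    = max 1 workers
    size = (length xs + w - 1) `div` w   -- ceiling division
</haskell>

Whatever worker count is passed as the first argument, the sum is the same; only the shape of the intermediate computation changes.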

Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.

=== Further examples ===

Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.

The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].

=== Designing parallel programs ===

Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as when does a computation have sufficient parallelism to make it worthwhile to exploit that parallelism) are still with the programmer.

DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:
<haskell>
data RTree a = RNode [:RTree a:]
</haskell>
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]
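The division of labour between the sequential, inductive part and the parallel part can be sketched with ordinary lists standing in for parallel arrays. Note that, unlike the definition above, the sketch adds a payload to each node so that there is something to compute; this payload field as well as the names <hask>sumTree</hask> and <hask>levels</hask> are our own assumptions for illustration.

<haskell>
-- List-based stand-in for the parallel rose tree.
-- Assumption: the payload field and all function names are illustrative.
data RTree a = RNode a [RTree a]

-- | Sum all payloads. The recursion proceeds sequentially in depth;
-- the 'map' over the children is the step that would operate on a
-- parallel array in DPH.
sumTree :: Num a => RTree a -> a
sumTree (RNode x kids) = x + sum (map sumTree kids)

-- | Payloads grouped level by level, mirroring the level-wise traversal.
levels :: RTree a -> [[a]]
levels t = go [t]
  where
    go []    = []
    go nodes = [x | RNode x _ <- nodes]
             : go (concat [ks | RNode _ ks <- nodes])
</haskell>

In the list version everything runs sequentially, of course; the point is merely that the only place where sibling subtrees are processed is an aggregate operation over the collection of children.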

For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]

=== Further reading and information on the implementation ===

DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].

For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and would like to know the implementation details and internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].

=== Feedback ===

Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC users' mailing list], or contact the DPH developers directly:
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]

GHC/Data Parallel Haskell

2010-12-10T04:19:45Z

Chak: /* Overview */

[[Category:GHC|Data Parallel Haskell]]
== Data Parallel Haskell ==

''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus on utilising multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell].

<center>
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png
</center>

''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''

__TOC__

=== Project status ===

We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).

The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].

DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.

'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.

=== Where to get it ===

To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.

=== Overview ===

From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, only that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three-element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analogous manner. Moreover, the array library of DPH defines variants of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).

The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)

As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.

=== A simple example ===

As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:
<haskell>
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.

=== Running DPH programs ===

Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code called ''vectorisation'' that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but there it raises the level of expressiveness dramatically.

==== No type classes ====

Unfortunately, vectorisation does not properly handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.
<haskell>
dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>

==== Special Prelude ====

As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude.html Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each: [http://www.haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude-Double.html Data.Array.Parallel.Prelude.Double], [http://www.haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude-Int.html Data.Array.Parallel.Prelude.Int], and [http://www.haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude-Word8.html Data.Array.Parallel.Prelude.Word8]. These three modules support the same functions (on different types), and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.

To compile <hask>dotp_double</hask>, we add the following three import statements:
<haskell>
import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double
</haskell>

==== Impedance matching ====

Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays '''cannot''' be passed. Instead, we can pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code).

Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.
<haskell>
dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.

==== Compiling vectorised code ====

The definition of <hask>dotp_double</hask> requires two language extensions, namely <hask>PArr</hask> to enable the syntax of parallel arrays and <hask>ParallelListComp</hask> for the parallel comprehension. Furthermore, we need to explicitly tell GHC which modules we want to vectorise.

Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.

The compiler option to enable vectorisation is <code>-fvectorise</code>. Overall, we get the following complete module definition for the dot-product code:
<haskell>
{-# LANGUAGE PArr, ParallelListComp #-}
{-# OPTIONS -fvectorise #-}

module DotP (dotp_double,dotp_wrapper)
where

import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double

dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]

dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:
<blockquote>
<code>ghc -c -Odph -fcpr-off -fdph-seq DotP.hs</code>
</blockquote>
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that is geared at optimising DPH code. Moreover, we use <code>-fcpr-off</code> as GHC's CPR phase doesn't play nice with type families at the moment, which in turn are heavily used in the DPH library. We shall discuss <code>-fdph-seq</code> below.

==== Using vectorised code ====

Finally, we need a wrapper module that calls the vectorised code, but is itself not vectorised. In this simple example, this is just the <hask>Main</hask> module that generates two random vectors and computes their dot product:
<haskell>
import System.Random (newStdGen)
import Data.Array.Parallel.PArray (PArray, randomRs)

import DotP (dotp_wrapper)  -- import vectorised code

main :: IO ()
main
  = do
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2
      print $ dotp_wrapper v w   -- invoke vectorised code and print the result
  where
    n     = 10000        -- vector length
    range = (-100, 100)  -- range of vector elements
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -O -fdph-seq Main.hs</code>
</blockquote>
and finally link with
<blockquote>
<code>ghc -o dotp -fdph-seq -threaded DotP.o Main.o</code>
</blockquote>

'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.

==== Parallel execution ====

The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code.

In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can equally well compile with <code>-fdph-par</code> to generate multi-threaded code. Invoking the resulting program as <code>./dotp +RTS -N2</code> then executes it on two OS threads. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.

Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.

=== Further examples ===

Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.

The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].

=== Designing parallel programs ===

Data Parallel Haskell is a high-level language for coding parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as whether a computation has sufficient parallelism to make exploiting it worthwhile) still rests with the programmer.

DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:
<haskell>
data RTree a = RNode [:RTree a:]
</haskell>
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]

For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]

=== Further reading and information on the implementation ===

DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].

For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and would like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].

=== Feedback ===

Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC users' mailing list], or contact the DPH developers directly:
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]

GHC/Data Parallel Haskell

2010-12-10T04:13:59Z

Chak: /* Where to get it */

[[Category:GHC|Data Parallel Haskell]]
== Data Parallel Haskell ==

''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus on utilising multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell].

<center>
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png
</center>

''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''

__TOC__

=== Project status ===

We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).

The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].

DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.

'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.

=== Where to get it ===

To get DPH, you currently need to get the development version of GHC, which automatically includes DPH. We are in the process of preparing a DPH release for GHC 7.0, the current stable release of GHC.

=== Overview ===

From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, except that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three-element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analogous manner. Moreover, the array library of DPH defines analogs of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).

The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)

As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.
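
The logarithmic step count of an undirected fold can be illustrated with ordinary lists. The following is a minimal sequential sketch and not DPH code: <hask>pairUp</hask> and <hask>treeFold</hask> are hypothetical names introduced here, and each round of <hask>pairUp</hask> merely stands in for one parallel step in which all adjacent pairs could be combined at once.
<haskell>
-- Sequential model of a tree-shaped (undirected) fold on plain lists.
-- 'pairUp' and 'treeFold' are illustrative names, not part of DPH.

-- | Combine adjacent pairs; one call models a single parallel step.
pairUp :: (a -> a -> a) -> [a] -> [a]
pairUp f (x : y : rest) = f x y : pairUp f rest
pairUp _ xs             = xs

-- | Reduce a list pairwise and report how many rounds were needed.
-- The second argument is only returned for the empty list.
treeFold :: (a -> a -> a) -> a -> [a] -> (a, Int)
treeFold _ z [] = (z, 0)
treeFold f _ xs = go 0 xs
  where
    go steps [x] = (x, steps)
    go steps ys  = go (steps + 1) (pairUp f ys)

main :: IO ()
main = do
  print (treeFold (+) 0 [1 .. 8 :: Int])        -- (36,3): three rounds for eight elements
  print (foldl (-) 0 [1 .. 8 :: Int])           -- -36: left-to-right grouping
  print (fst (treeFold (-) 0 [1 .. 8 :: Int]))  -- 0: tree-shaped grouping
</haskell>
For an associative function such as <hask>(+)</hask>, the tree-shaped grouping agrees with a left-to-right fold. For a non-associative function such as <hask>(-)</hask>, the two groupings disagree, which is why a fold without a fixed direction has to insist on associativity.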

=== A simple example ===

As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:
<haskell>
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.
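
For readers unfamiliar with parallel comprehensions, the same equivalence can be tried out on ordinary lists. This is a sketch on plain lists using only the <hask>ParallelListComp</hask> extension; the names <hask>dotpPar</hask> and <hask>dotpZip</hask> are introduced here for illustration, and nothing in it runs in parallel.
<haskell>
{-# LANGUAGE ParallelListComp #-}

-- List analogue of the dot product, written with a parallel comprehension.
dotpPar :: Num a => [a] -> [a] -> a
dotpPar xs ys = sum [x * y | x <- xs | y <- ys]

-- The same function with the zip made explicit.
dotpZip :: Num a => [a] -> [a] -> a
dotpZip xs ys = sum [x * y | (x, y) <- zip xs ys]

main :: IO ()
main = do
  print (dotpPar [1, 2, 3] [4, 5, 6 :: Int])  -- 32
  print (dotpZip [1, 2, 3] [4, 5, 6 :: Int])  -- 32
</haskell>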

=== Running DPH programs ===

Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code called ''vectorisation'' that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but there it raises the level of expressiveness dramatically.

==== No type classes ====

Unfortunately, vectorisation does not properly handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.
<haskell>
dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>

==== Special Prelude ====

As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude.html Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each: [http://www.haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude-Double.html Data.Array.Parallel.Prelude.Double], [http://www.haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude-Int.html Data.Array.Parallel.Prelude.Int], and [http://www.haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude-Word8.html Data.Array.Parallel.Prelude.Word8]. These three modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.

To compile <hask>dotp_double</hask>, we add the following three import statements:
<haskell>
import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double
</haskell>

==== Impedance matching ====

Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays '''cannot''' be passed. Instead, we can pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code).

Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.
<haskell>
dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.

==== Compiling vectorised code ====

The definition of <hask>dotp_double</hask> requires two language extensions, namely <hask>PArr</hask> to enable the syntax of parallel arrays and <hask>ParallelListComp</hask> for the parallel comprehension. Furthermore, we need to explicitly tell GHC which modules we want to vectorise.

Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.

The compiler option to enable vectorisation is <code>-fvectorise</code>. Overall, we get the following complete module definition for the dot-product code:
<haskell>
{-# LANGUAGE PArr, ParallelListComp #-}
{-# OPTIONS -fvectorise #-}

module DotP (dotp_double,dotp_wrapper)
where

import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double

dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]

dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:
<blockquote>
<code>ghc -c -Odph -fcpr-off -fdph-seq DotP.hs</code>
</blockquote>
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that is geared at optimising DPH code. Moreover, we use <code>-fcpr-off</code> as GHC's CPR phase doesn't play nice with type families at the moment, which in turn are heavily used in the DPH library. We shall discuss <code>-fdph-seq</code> below.

==== Using vectorised code ====

Finally, we need a wrapper module that calls the vectorised code, but is itself not vectorised. In this simple example, this is just the <hask>Main</hask> module that generates two random vectors and computes their dot product:
<haskell>
import System.Random (newStdGen)
import Data.Array.Parallel.PArray (PArray, randomRs)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = do
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2
      print $ dotp_wrapper v w   -- invoke vectorised code and print the result
  where
    n     = 10000        -- vector length
    range = (-100, 100)  -- range of vector elements
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -O -fdph-seq Main.hs</code>
</blockquote>
and finally link with
<blockquote>
<code>ghc -o dotp -fdph-seq -threaded DotP.o Main.o</code>
</blockquote>

'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before taking the time. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.
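
The idea of forcing the inputs before starting the clock can be sketched with plain lists and the <code>base</code> library alone. This is only an analogy: it does not use <hask>PArray</hask> or its <hask>nf</hask> function, the helper <hask>forceList</hask> is a hypothetical name, and deterministic inputs replace the random vectors so that no extra package is needed.
<haskell>
import Control.Exception (evaluate)
import Data.List (foldl')
import System.CPUTime (getCPUTime)

-- | Evaluate every element of the list (hypothetical helper; it plays the
-- role that full evaluation of the input vectors plays in a real benchmark).
forceList :: [Double] -> IO ()
forceList = mapM_ evaluate

-- | Sequential list dot product used as the workload to be timed.
dotp :: [Double] -> [Double] -> Double
dotp xs ys = foldl' (+) 0 (zipWith (*) xs ys)

main :: IO ()
main = do
  let n  = 10000 :: Int
      xs = [fromIntegral i | i <- [1 .. n]]
      ys = [1 / fromIntegral i | i <- [1 .. n]]
  -- Build and evaluate the inputs first, so that their construction
  -- is not included in the measurement.
  forceList xs
  forceList ys
  start <- getCPUTime
  r     <- evaluate (dotp xs ys)
  end   <- getCPUTime
  putStrLn ("result: " ++ show r)
  putStrLn ("CPU time in picoseconds: " ++ show (end - start))
</haskell>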

==== Parallel execution ====

The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code.

In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can equally well compile with <code>-fdph-par</code> to generate multi-threaded code. Invoking the resulting program as <code>./dotp +RTS -N2</code> then executes it on two OS threads. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.
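
Why the result does not depend on the number of workers can be modelled sequentially: split the work into chunks, compute a partial result per chunk, and combine the partial results. The sketch below runs on plain lists and says nothing about how DPH actually distributes work; <hask>chunksOf</hask> and <hask>dotpChunked</hask> are hypothetical helpers, and <hask>Integer</hask> is used so that regrouping the sum is exact.
<haskell>
-- | Split a list into pieces of at most k elements.
chunksOf :: Int -> [a] -> [[a]]
chunksOf k xs
  | k <= 0    = [xs]
  | null xs   = []
  | otherwise = take k xs : chunksOf k (drop k xs)

-- | Dot product computed as a sum of per-chunk partial sums, where the
-- first argument models the number of workers sharing the input.
dotpChunked :: Int -> [Integer] -> [Integer] -> Integer
dotpChunked workers xs ys = sum (map partial (chunksOf size pairs))
  where
    pairs   = zip xs ys
    w       = max 1 workers
    size    = max 1 ((length pairs + w - 1) `div` w)
    partial = sum . map (uncurry (*))

main :: IO ()
main =
  -- The same value is printed for every worker count.
  print [dotpChunked k [1 .. 100] [1 .. 100] | k <- [1, 2, 4, 8]]
  -- [338350,338350,338350,338350]
</haskell>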

Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.

=== Further examples ===

Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.

The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].

=== Designing parallel programs ===

Data Parallel Haskell is a high-level language for coding parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as whether a computation has sufficient parallelism to make exploiting it worthwhile) still rests with the programmer.

DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:
<haskell>
data RTree a = RNode [:RTree a:]
</haskell>
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]
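
The level-by-level traversal described here can be sketched with ordinary lists standing in for parallel arrays. This is a sequential model only: the list-based <hask>RTree</hask> mirrors the shape of the definition above, while <hask>children</hask>, <hask>levelSizes</hask> and <hask>example</hask> are names introduced for illustration. The bulk operation applied to one whole level is the place where a parallel array would offer parallelism.
<haskell>
-- List-based stand-in for the parallel rose tree ([..] instead of [:..:]).
data RTree a = RNode [RTree a]

children :: RTree a -> [RTree a]
children (RNode cs) = cs

-- | Number of nodes on each level, from the root downwards.
-- Levels are visited one after another (the sequential part); each step
-- processes all nodes of the current level in a single bulk operation.
levelSizes :: RTree a -> [Int]
levelSizes t =
  map length (takeWhile (not . null) (iterate (concatMap children) [t]))

-- A small example tree: a root with two children, one of which has two leaves.
example :: RTree ()
example = RNode [RNode [], RNode [RNode [], RNode []]]

main :: IO ()
main = print (levelSizes example)  -- [1,2,2]
</haskell>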

For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]

=== Further reading and information on the implementation ===

DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].

For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and would like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].

=== Feedback ===

Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC users' mailing list], or contact the DPH developers directly:
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]

GHC/Data Parallel Haskell

2010-12-10T04:10:32Z

Chak: /* Project status */

[[Category:GHC|Data Parallel Haskell]]
== Data Parallel Haskell ==

''Data Parallel Haskell'' is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html nested data parallelism] with a focus on utilising multicore CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper [http://www.cse.unsw.edu.au/~chak/papers/papers.html#ndp-haskell Nepal – Nested Data-Parallelism in Haskell].

<center>
http://17.media.tumblr.com/VtG26AnzIklk0sh6YkZSLYNPo1_400.png
</center>

''This is the performance of a dot product of two vectors of 10 million doubles each using Data Parallel Haskell. Both machines have 8 cores. Each core of the T2 has 8 hardware thread contexts. ''

__TOC__

=== Project status ===

We are currently preparing for a release of Data Parallel Haskell (DPH) for GHC 7.0. All major components of DPH are implemented, including code vectorisation and parallel execution on multicore systems. However, the implementation has many limitations and probably also many bugs. Major limitations include the inability to mix vectorised and non-vectorised code in a single Haskell module, the need to use a feature-deprived, special-purpose Prelude in vectorised code, and a lack of optimisations (leading to poor performance in some cases).

The current implementation should work well for code with nested parallelism, where the depth of nesting is statically fixed. It should also perform reasonably when nesting is recursive as long as no user-defined nested-parallel datatypes are used. Support for user-defined nested-parallel datatypes is still rather experimental and will likely result in inefficient code. For concrete examples of the various classes of parallelism, please refer to the [http://hackage.haskell.org/trac/ghc/wiki/DataParallel/BenchmarkStatus DPH benchmark status page].

DPH focuses on irregular data parallelism. For regular data parallel code in Haskell, please consider using the companion library [http://trac.haskell.org/repa/ Repa], which builds on the parallel array infrastructure of DPH.

'''Disclaimer:''' Data Parallel Haskell is very much '''work in progress.''' Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and functionality may change during development.

=== Where to get it ===

DPH is available in the current stable release GHC 6.10.1, which is [http://haskell.org/ghc/download_ghc_6_10_1.html available in source and binary form] for many architectures. If you are compiling 6.10.1 ''from source,'' please ensure that you include the <code>ghc-6.10.1-src-extralibs.tar.bz2</code> archive as it supplies important libraries. GHC distribution binaries should include these libraries by default.

'''Update [March 2009]:''' The 6.10.1 release has now fallen considerably behind the current development version in the HEAD repository, not only with respect to DPH support, but generally concerning support for multi-core parallelism in the GHC runtime system. Hence, if you are interested in performance and scalability, you need to use the development compiler – with the usual caveats. We are planning a more mature stable release for 6.12. (Due to the scale of the changes involved, we are not able to backport the latest changes to the 6.10.2 release.) To use the code in the HEAD repository, please follow [http://hackage.haskell.org/trac/ghc/wiki/Building/QuickStart the standard build instructions.] It is important that you download ''package dph'' before you build and install the system; you can achieve that with

./darcs-all --dph get

=== Overview ===

From a user's point of view, Data Parallel Haskell adds a new data type to Haskell –namely, ''parallel arrays''– as well as operations on parallel arrays. Syntactically, parallel arrays are like lists, except that instead of square brackets <hask>[</hask> and <hask>]</hask>, parallel arrays use square brackets with a colon <hask>[:</hask> and <hask>:]</hask>. In particular, <hask>[:e:]</hask> is the type of parallel arrays with elements of type <hask>e</hask>; the expression <hask>[:x, y, z:]</hask> denotes a three-element parallel array with elements <hask>x</hask>, <hask>y</hask>, and <hask>z</hask>; and <hask>[:x + 1 | x <- xs:]</hask> represents a simple array comprehension. More sophisticated array comprehensions (including the equivalent of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions]) as well as enumerations and pattern matching proceed in an analogous manner. Moreover, the array library of DPH defines analogs of most list operations from the Haskell Prelude and the standard <hask>List</hask> library (e.g., we have <hask>lengthP</hask>, <hask>sumP</hask>, <hask>mapP</hask>, and so on).

The two main differences between lists and parallel arrays are that (1) parallel arrays are a strict data structure and (2) that they are not inductively defined. Parallel arrays are strict in that by demanding a single element, all elements of an array are demanded. Hence, all elements of a parallel array might be evaluated in parallel. To facilitate such parallel evaluation, operations on parallel arrays should treat arrays as aggregate structures that are manipulated in their entirety (instead of the inductive, element-wise processing that is the foundation of all Haskell list functions.)

As a consequence, parallel arrays are always finite, and standard functions that yield infinite lists, such as <hask>enumFrom</hask> and <hask>repeat</hask>, have no corresponding array operation. Moreover, parallel arrays only have an undirected fold function <hask>foldP</hask> that requires an associative function as an argument – such a fold function has a parallel step complexity of O(log ''n'') for arrays of length ''n''. Parallel arrays also come with some aggregate operations that are absent from the standard list library, such as <hask>permuteP</hask>.
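
The strictness property behind this ("demanding a single element demands all elements") can be contrasted with lazy lists in a small sketch. It uses plain lists and <hask>seq</hask> to model the behaviour and does not involve DPH; <hask>firstLazy</hask> and <hask>firstStrict</hask> are hypothetical names.
<haskell>
import Control.Exception (ErrorCall, evaluate, try)

-- | Lazy access: only the first element is demanded.
firstLazy :: [Int] -> Int
firstLazy (x : _) = x
firstLazy []      = error "empty list"

-- | Strict access: every element is forced before the first one is
-- returned, modelling a structure that is strict in all its elements.
firstStrict :: [Int] -> Int
firstStrict xs = foldr seq (firstLazy xs) xs

main :: IO ()
main = do
  let xs = [1, error "second element demanded"]
  -- The lazy version never touches the second element.
  print (firstLazy xs)
  -- The strict version forces it and therefore fails.
  r <- try (evaluate (firstStrict xs)) :: IO (Either ErrorCall Int)
  putStrLn (either (const "forcing one element forced them all") show r)
</haskell>
Applied to an infinite list, <hask>firstStrict</hask> would not terminate, which mirrors why a structure that is strict in all of its elements has to be finite.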

=== A simple example ===

As a simple example of a DPH program, consider the following code that computes the dot product of two vectors given as parallel arrays:
<haskell>
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>
This code uses an array variant of [http://www.haskell.org/ghc/docs/latest/html/users_guide/syntax-extns.html#parallel-list-comprehensions parallel list comprehensions], which could alternatively be written as <hask>[:x * y | (x, y) <- zipP xs ys:]</hask>, but should otherwise be self-explanatory to any Haskell programmer.
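
For readers who want to experiment without DPH, the same two formulations can be written for ordinary lists. This is only a list analogue (it runs sequentially), and the names <hask>dotpList</hask> and <hask>dotpListZip</hask> are made up for this sketch.
<haskell>
{-# LANGUAGE ParallelListComp #-}

-- List analogue of 'dotp', written with a parallel list comprehension.
-- The name 'dotpList' is made up for this sketch.
dotpList :: Num a => [a] -> [a] -> a
dotpList xs ys = sum [x * y | x <- xs | y <- ys]

-- The same function written with an explicit zip.
dotpListZip :: Num a => [a] -> [a] -> a
dotpListZip xs ys = sum [x * y | (x, y) <- zip xs ys]
</haskell>
Both functions compute the same value; for example, applying either to <hask>[1, 2, 3]</hask> and <hask>[4, 5, 6]</hask> yields <hask>32</hask>.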

=== Running DPH programs ===

Unfortunately, we cannot use the above implementation of <hask>dotp</hask> directly in the current preliminary implementation of DPH. In the following, we will discuss how the code needs to be modified and how it needs to be compiled and run for parallel execution. GHC applies an elaborate transformation to DPH code called ''vectorisation'' that turns nested into flat data parallelism. This transformation is only useful for code that is executed in parallel (i.e., code that manipulates parallel arrays), but there it raises the level of expressiveness dramatically.
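
The idea of turning nested into flat data parallelism can be pictured with ordinary lists. The following sketch is a conceptual illustration only, not GHC's actual vectorisation transformation: it represents a nested collection as a list of segment lengths plus one flat list of elements. All names in it are hypothetical.
<haskell>
-- A nested collection represented flatly: segment lengths plus all
-- elements in one flat sequence. Hypothetical type for illustration only.
data Segmented a = Segmented
  { segLengths :: [Int]  -- length of each inner collection
  , segData    :: [a]    -- all elements, concatenated
  } deriving (Eq, Show)

-- Flatten a nested list into the segmented representation.
flatten :: [[a]] -> Segmented a
flatten xss = Segmented (map length xss) (concat xss)

-- Recover the nested list by cutting the flat data at the segment lengths.
unflatten :: Segmented a -> [[a]]
unflatten (Segmented lens xs) = go lens xs
  where
    go []       _  = []
    go (n : ns) ys = let (seg, rest) = splitAt n ys in seg : go ns rest

-- Segmented sum: one result per inner collection, obtained by walking
-- the segment lengths over the flat data.
sumSegments :: Num a => Segmented a -> [a]
sumSegments = map sum . unflatten
</haskell>
For instance, <hask>flatten [[1, 2], [], [3, 4, 5]]</hask> stores the lengths <hask>[2, 0, 3]</hask> next to the flat data <hask>[1, 2, 3, 4, 5]</hask>, and <hask>sumSegments</hask> then yields <hask>[3, 0, 12]</hask>.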

==== No type classes ====

Unfortunately, vectorisation does not properly handle type classes at the moment. Hence, we currently need to avoid overloaded operations in parallel code. To account for that limitation, we specialise <hask>dotp</hask> on doubles.
<haskell>
dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]
</haskell>

==== Special Prelude ====

As the current implementation of vectorisation cannot handle some language constructs, we cannot use it to vectorise those parts of the standard Prelude that might be used in parallel code (such as arithmetic operations). Instead, DPH comes with its own (rather limited) Prelude in [http://haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude.html Data.Array.Parallel.Prelude] plus three extra modules to support one numeric type each: [http://www.haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude-Double.html Data.Array.Parallel.Prelude.Double], [http://www.haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude-Int.html Data.Array.Parallel.Prelude.Int], and [http://www.haskell.org/ghc/docs/6.12-latest/html/libraries/dph-par-0.4.0/Data-Array-Parallel-Prelude-Word8.html Data.Array.Parallel.Prelude.Word8]. These three modules support the same functions (on different types) and if a program needs to use more than one, they need to be imported qualified (as we cannot use type classes in vectorised code in the current version). If your code needs any other numeric types or functions that are not implemented in these Prelude modules, you currently need to implement and vectorise that functionality yourself.

To compile <hask>dotp_double</hask>, we add the following three import statements:
<haskell>
import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double
</haskell>

==== Impedance matching ====

Special care is needed at the interface between vectorised and non-vectorised code. Currently, only simple types can be passed between these different kinds of code. In particular, parallel arrays '''cannot''' be passed. Instead, we can pass flat arrays of type <hask>PArray</hask>. This type is exported by our special-purpose Prelude together with a conversion function <hask>fromPArrayP</hask> (which is specific to the element type due to the lack of type classes in vectorised code).

Using this conversion function, we define a wrapper function for <hask>dotp_double</hask> that we export and use from non-vectorised code.
<haskell>
dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
It is important to mark this function as <hask>NOINLINE</hask> as we don't want it to be inlined into non-vectorised code.

==== Compiling vectorised code ====

The definition of <hask>dotp_double</hask> requires two language extensions, namely <hask>PArr</hask> to enable the syntax of parallel arrays and <hask>ParallelListComp</hask> for the parallel comprehension. Furthermore, we need to explicitly tell GHC which modules we want to vectorise.

Currently, GHC either vectorises all code in a module or none. This can be inconvenient as some parts of a program cannot be vectorised – for example, code in the <hask>IO</hask> monad (the radical re-ordering of computations performed by the vectorisation transformation is only valid for pure code). As a consequence, the programmer currently needs to partition vectorised and non-vectorised code carefully over different modules.

The compiler option to enable vectorisation is <code>-fvectorise</code>. Overall, we get the following complete module definition for the dot-product code:
<haskell>
{-# LANGUAGE PArr, ParallelListComp #-}
{-# OPTIONS -fvectorise #-}

module DotP (dotp_double,dotp_wrapper)
where

import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Double

dotp_double :: [:Double:] -> [:Double:] -> Double
dotp_double xs ys = sumP [:x * y | x <- xs | y <- ys:]

dotp_wrapper :: PArray Double -> PArray Double -> Double
{-# NOINLINE dotp_wrapper #-}
dotp_wrapper v w = dotp_double (fromPArrayP v) (fromPArrayP w)
</haskell>
Assuming the module is in a file <hask>DotP.hs</hask>, we compile it as follows:
<blockquote>
<code>ghc -c -Odph -fcpr-off -fdph-seq DotP.hs</code>
</blockquote>
The option <code>-Odph</code> enables a predefined set of GHC optimisation options that is geared at optimising DPH code. Moreover, we use <code>-fcpr-off</code> as GHC's CPR phase doesn't play nice with type families at the moment, which in turn are heavily used in the DPH library. We shall discuss <code>-fdph-seq</code> below.

==== Using vectorised code ====

Finally, we need a wrapper module that calls the vectorised code, but is itself not vectorised. In this simple example, this is just the <hask>Main</hask> module that generates two random vectors and computes their dot product:
<haskell>
import System.Random (newStdGen)
import Data.Array.Parallel.PArray (PArray, randomRs)

import DotP (dotp_wrapper) -- import vectorised code

main :: IO ()
main
  = do
      gen1 <- newStdGen
      gen2 <- newStdGen
      let v = randomRs n range gen1
          w = randomRs n range gen2
      print $ dotp_wrapper v w   -- invoke vectorised code and print the result
  where
    n     = 10000        -- vector length
    range = (-100, 100)  -- range of vector elements
</haskell>
We compile this module with
<blockquote>
<code>ghc -c -O -fdph-seq Main.hs</code>
</blockquote>
and finally link with
<blockquote>
<code>ghc -o dotp -fdph-seq -threaded DotP.o Main.o</code>
</blockquote>

'''NOTE:''' The code as presented is unsuitable for benchmarking as we wouldn't want to measure the purely sequential random number generation (that dominates this simple program). For benchmarking, we would want to guarantee that the generated vectors are fully evaluated before timing starts. The module [http://www.haskell.org/ghc/docs/latest/html/libraries/dph-par/Data-Array-Parallel-PArray.html Data.Array.Parallel.PArray] exports the function <hask>nf</hask> for this purpose.
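
The principle of forcing the inputs before starting the clock can be sketched with ordinary lists and functions from <code>base</code> only. This is not DPH code: <hask>forceAll</hask>, <hask>testVector</hask>, and <hask>timedDotp</hask> are hypothetical helpers, a deterministic vector replaces the random one, and <hask>forceAll</hask> merely plays the role that <hask>nf</hask> has for <hask>PArray</hask>.
<haskell>
import Control.Exception (evaluate)
import System.CPUTime (getCPUTime)

-- Force every element of a list of Doubles. Hypothetical list stand-in
-- for forcing a PArray before a measurement.
forceAll :: [Double] -> IO ()
forceAll xs = evaluate (foldr seq () xs)

-- Deterministic test vector with values in [-100, 100]; it replaces the
-- random vectors of the Main module so that the sketch needs only base.
testVector :: Int -> Int -> [Double]
testVector seed n =
  [ fromIntegral ((seed * i) `mod` 201) - 100 | i <- [1 .. n] ]

-- Time only the dot product, after both inputs are fully evaluated.
-- Returns the result and the elapsed CPU time in picoseconds.
timedDotp :: Int -> IO (Double, Integer)
timedDotp n = do
  let v = testVector 7 n
      w = testVector 13 n
  forceAll v
  forceAll w
  start  <- getCPUTime
  result <- evaluate (sum (zipWith (*) v w))
  end    <- getCPUTime
  return (result, end - start)
</haskell>
Without the two calls to <hask>forceAll</hask>, lazy evaluation would defer the construction of both vectors until the dot product demands them, so the measured interval would include vector generation.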

==== Parallel execution ====

The array library of DPH comes in two flavours: <code>dph-seq</code> and <code>dph-par</code>. The former supports the whole DPH stack, but only executes on a single core. In contrast, <code>dph-par</code> implements multi-threaded code.

In the above compiler invocations, we used the option <code>-fdph-seq</code> to select the <code>dph-seq</code> flavour. We can equally well compile with <code>-fdph-par</code> to generate multi-threaded code. By invoking <code>./dotp +RTS -N2</code>, we use two OS threads to execute the program. A beautiful property of DPH is that the number of threads used to execute a program only affects its performance, but not the result. So, it is fine to do all debugging concerning correctness with <code>dph-seq</code> and to switch to <code>dph-par</code> only for performance debugging.
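
The following list-based sketch gives an intuition for why the number of threads need not influence the result. It is a sequential model with hypothetical names (<hask>chunksOf</hask>, <hask>chunkedSum</hask>) and does not reflect how the DPH library actually distributes work: each simulated worker sums one contiguous chunk, and the partial sums are combined afterwards. The sketch uses <hask>Int</hask> so that regrouping the additions is exact.
<haskell>
-- Split a list into chunks of the given size (the last may be shorter).
-- The size is expected to be at least 1.
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf k xs = let (chunk, rest) = splitAt k xs in chunk : chunksOf k rest

-- Sequential model of summing with a given number of workers: each worker
-- sums one contiguous chunk, then the partial sums are combined.
chunkedSum :: Int -> [Int] -> Int
chunkedSum workers xs = sum (map sum (chunksOf size xs))
  where
    -- Chunk size, rounded up, and never smaller than 1.
    size = max 1 ((length xs + workers - 1) `div` max 1 workers)
</haskell>
For every worker count, <hask>chunkedSum</hask> returns the same value as <hask>sum</hask>; only the grouping of the additions changes.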

Data Parallel Haskell –and more generally, GHC's multi-threading support– currently only aims at multicore processors or uniform memory access (UMA) multi-processors. Performance on non-uniform memory access (NUMA) machines is generally bad as GHC's runtime makes no effort at optimising placement. Some people have reported that the parallel garbage collector (as included in GHC 6.10.1) should not be used with parallel programs; i.e., it is advisable to start parallel programs with <code>my_program +RTS -N2 -g1</code> to run on two cores (and different arguments to <code>-N</code> for other core counts). This problem has been addressed in the development version of GHC.

=== Further examples ===

Further examples are available in the [http://darcs.haskell.org/ghc-6.10/packages/dph/examples/ examples directory of the package dph source]. In addition to code using vectorisation (as described above), these examples also contain code that directly targets the two array libraries contained in <code>-package dph-seq</code> and <code>-package dph-par</code>, respectively. For more complex programs, targeting the DPH array libraries directly can lead to much faster code than using vectorisation, as GHC currently doesn't optimise vectorised code very well. However, code targeting the DPH libraries directly can only use flat data parallelism.

The interfaces of the various components of the DPH library are specified in GHC's [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html hierarchical libraries documentation].

=== Designing parallel programs ===

Data Parallel Haskell is a high-level language to code parallel algorithms. Like plain Haskell, DPH frees the programmer from many low-level operational considerations (such as thread creation, thread synchronisation, critical sections, and deadlock avoidance). Nevertheless, the full responsibility for parallel algorithm design and many performance considerations (such as whether a computation has sufficient parallelism to make exploiting it worthwhile) are still with the programmer.

DPH encourages a data-driven style of parallel programming and, in good Haskell tradition, puts the choice of data types first. Specifically, the choice between using lists or parallel arrays for a data structure determines whether operations on the structure will be executed sequentially or in parallel. In addition to suitably combining standard lists and parallel arrays, it is often also useful to embed parallel arrays in a user-defined inductive structure, such as the following definition of parallel rose trees:
<haskell>
data RTree a = RNode [:RTree a:]
</haskell>
The tree is inductively defined; hence, tree traversals will proceed sequentially, level by level. However, the children of each node are held in parallel arrays, and hence, may be traversed in parallel. This structure is, for example, useful in parallel adaptive algorithms based on a hierarchical decomposition, such as the Barnes-Hut algorithm for solving the ''N''-body problem as discussed in more detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.]
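
To make the traversal pattern concrete, here is a list-based stand-in for the rose tree that can be tried without DPH. The type <hask>RTreeL</hask> and the functions <hask>sizeT</hask> and <hask>depthT</hask> are hypothetical names for this sketch; in a DPH version, the children would be held in a parallel array and the mapping over them would be the data-parallel step.
<haskell>
-- List-based stand-in for the parallel rose tree from the text: the
-- children are held in an ordinary list instead of a parallel array.
data RTreeL a = RNodeL [RTreeL a]

-- Number of nodes. The 'map' over the children is the step that would
-- operate on a parallel array in a DPH version.
sizeT :: RTreeL a -> Int
sizeT (RNodeL ts) = 1 + sum (map sizeT ts)

-- Depth of the tree; the recursion descends one level per call.
depthT :: RTreeL a -> Int
depthT (RNodeL ts) = 1 + maximum (0 : map depthT ts)

-- Small example tree: a root with two children, one of which has a child.
exampleT :: RTreeL ()
exampleT = RNodeL [RNodeL [], RNodeL [RNodeL []]]
</haskell>
Here <hask>sizeT exampleT</hask> is <hask>4</hask> and <hask>depthT exampleT</hask> is <hask>3</hask>.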

For a general introduction to nested data parallelism and its cost model, see Blelloch's [http://www.cs.cmu.edu/~scandal/cacm/cacm2.html Programming Parallel Algorithms.]

=== Further reading and information on the implementation ===

DPH has two major components: (1) the ''vectorisation transformation'' and (2) the ''generic DPH library for flat parallel arrays''. The vectorisation transformation turns nested into flat data-parallelism and is described in detail in the paper [http://www.cse.unsw.edu.au/~chak/papers/PLKC08.html Harnessing the Multicores: Nested Data Parallelism in Haskell.] The generic array library maps flat data-parallelism to GHC's multi-threaded multicore support and is described in the paper [http://www.cse.unsw.edu.au/~chak/papers/CLPKM06.html Data Parallel Haskell: a status report]. The same topics are also covered in the slides for the two talks [http://research.microsoft.com/~simonpj/papers/ndp/NdpSlides.pdf Nested data parallelism in Haskell] and [http://dataparallel.googlegroups.com/web/UNSW%20CGO%20DP%202007.pdf Compiling nested data parallelism by program transformation].

For further reading, consult this [[GHC/Data Parallel Haskell/References|collection of background papers, and pointers to other people's work]]. If you are really curious and would like to know implementation details and the internals of the Data Parallel Haskell project, much of it is described on the GHC developer wiki on the pages covering [http://hackage.haskell.org/trac/ghc/wiki/DataParallel data parallelism] and [http://hackage.haskell.org/trac/ghc/wiki/TypeFunctions type families].

=== Feedback ===

Please file bug reports at [http://hackage.haskell.org/trac/ghc/ GHC's bug tracker]. Moreover, comments and suggestions are very welcome. Please post them to the [mailto:glasgow-haskell-users@haskell.org GHC users' mailing list], or contact the DPH developers directly:
* [http://www.cse.unsw.edu.au/~chak/ Manuel Chakravarty]
* [http://www.cse.unsw.edu.au/~keller/ Gabriele Keller]
* [http://www.cse.unsw.edu.au/~rl/ Roman Leshchinskiy]
* [http://research.microsoft.com/~simonpj/ Simon Peyton Jones]

User:Chak

2010-03-22T04:10:03Z

Chak:

Nothing to see here, really, but check out

* my [http://www.cse.unsw.edu.au/~chak/ webpage] and
* my [http://justtesting.org blog].

On [[IRC channel|#haskell]] and #ghc, I am TacticalGrace.