GHC/Data Parallel Haskell

Data Parallel Haskell is the codename for an extension to the Glasgow Haskell Compiler and its libraries to support nested data parallelism, with a focus on utilising multi-core CPUs. Nested data parallelism extends the programming model of flat data parallelism, as known from parallel Fortran dialects, to irregular parallel computations (such as divide-and-conquer algorithms) and irregular data structures (such as sparse matrices and tree structures). An introduction to nested data parallelism in Haskell, including some examples, can be found in the paper Nepal -- Nested Data-Parallelism in Haskell.
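
As a flavour of the programming model, here is a sketch of the parallel quicksort used as a running divide-and-conquer example in the Nepal paper, written with the parallel-array syntax and GHC.PArr combinators described further below (lengthP, mapP, (!:), and (+:+) are assumptions about that module's interface; with the current sequential implementation, the parallelism is purely semantic):

{-# OPTIONS -fparr -fglasgow-exts #-}
module QSort (qsort)
where
import GHC.PArr

qsort :: [:Double:] -> [:Double:]
qsort xs
  | lengthP xs <= 1 = xs
  | otherwise       =
      let p      = xs !: (lengthP xs `div` 2)
          smalls = [: x | x <- xs, x <  p :]
          equals = [: x | x <- xs, x == p :]
          bigs   = [: x | x <- xs, x >  p :]
          -- both recursive calls are data parallel in the model
          sorted = mapP qsort [: smalls, bigs :]
      in  (sorted !: 0) +:+ equals +:+ (sorted !: 1)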

Overview

The design, some implementation details, and first results of Data Parallel Haskell are described in the paper Data Parallel Haskell: a status report. The same topic is covered in the slides for the two talks Nested data parallelism in Haskell and Compiling nested data parallelism by program transformation.

Download

Disclaimer: Data Parallel Haskell is very much work in progress. Some components are already usable, and we explain here how to use them. However, please be aware that APIs are still in flux and some functionality may be broken at times during development.

Further information

For further reading, refer to this collection of background papers and pointers to other people's work. If you are really curious and would like to know the implementation details and internals of the Data Parallel Haskell project, it is all on the GHC developer wiki on the pages covering data parallelism and associated types.

Implementation of nested data parallelism in two stages

As Data Parallel Haskell is not fully implemented at this stage, use of the existing components requires some knowledge of the structure of the implementation, which essentially consists of two parts:

  1. A concurrent, high-performance library of strict, segmented arrays. In contrast to Haskell 98 arrays, these arrays are strict in that when one element of an array is evaluated, all of them are - we also call this a parallel evaluation semantics. Moreover, all operations come in two flavours: one for plain, flat arrays and one for segmented arrays, which are a particular representation of arrays with one level of nesting. For example, summing a segmented array of numbers computes one sum for each segment (see the sketch after this list). The library has a purely functional interface, but internally uses monadic low-level array operations, array fusion, and many standard GHC optimisations to produce highly optimised code. Finally, the library uses GHC's SMP concurrency to parallelise array processing on hardware with multiple processing elements.
  2. A vectorising program transformation (in the compiler) that maps nested array code (of arbitrary nesting depth) to code using segmented arrays. Nested parallel arrays are as convenient to use as finite, eagerly evaluated nested lists. Code vectorisation transforms a nested parallel array program such that it operates on the segmented arrays described before. The resulting code would be tedious to write manually, but is much more efficient than a direct implementation of nested arrays. (As segmented arrays correspond to arrays with one level of nesting, vectorisation collapses arbitrary nesting to one level of nesting.)

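To illustrate the representation Part 1 is built on, here is a minimal sketch of segmented arrays in plain Haskell (ordinary lists stand in for the library's unboxed arrays, and the names Segd, SegArr, and sumSeg are made up for this illustration):

-- A segmented array is a flat data array plus a segment descriptor
-- recording the length of each segment.
data Segd     = Segd [Int]               -- segment lengths
data SegArr a = SegArr Segd [a]          -- descriptor + flat data

-- Segmented sum: one result per segment.
sumSeg :: Num a => SegArr a -> [a]
sumSeg (SegArr (Segd lens) xs) = go lens xs
  where
    go []     _  = []
    go (n:ns) ys = let (seg, rest) = splitAt n ys
                   in  sum seg : go ns rest

-- The nested array [[1,2,3],[4,5]] is represented as
-- SegArr (Segd [3,2]) [1,2,3,4,5]; sumSeg yields [6,9].
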
So far, we have implemented large parts of the concurrent array library (Part 1), but not the vectorisation transformation (Part 2). Hence, users must choose between the following two options: (1) the convenient programming model of arbitrarily nested, irregular array computations with a reasonably efficient, but not highly optimised and purely sequential implementation; or (2) an aggressively optimising, concurrent array library with a less expressive API of segmented (instead of nested) arrays, but which is still purely functional.

Convenience without the speed: nested arrays in Haskell

Any recent stable version of GHC (e.g., version 6.6) includes syntactic support for array comprehensions and a library of frequently used array operations. To use these, you need to pass GHC the command line option -fparr (to enable the extra syntax) and import the module GHC.PArr (a Prelude extension for arrays). (If you would like to use parallel array comprehensions, you also need -fglasgow-exts.) For example, the following module implements a dot product using arrays:

{-# OPTIONS -fparr -fglasgow-exts #-}
module DotP (dotp)
where
import GHC.PArr

-- Dot product: the two generators separated by '|' are zipped
-- (parallel comprehension syntax, enabled by -fglasgow-exts).
dotp :: Num a => [:a:] -> [:a:] -> a
dotp xs ys = sumP [:x * y | x <- xs | y <- ys:]

You can use this module in an interactive GHCi session as follows:

Prelude> :set -fparr -fglasgow-exts
Prelude> :load DotP
[1 of 1] Compiling DotP             ( code/haskell/DotP.hs, interpreted )
Ok, modules loaded: DotP.
*DotP> dotp [:1..3:] [:4..6:]
32
*DotP>

(NB: The :set is needed despite the OPTIONS pragma in DotP.hs, so that you can use array syntax on the interactive command line of GHCi.)

Unfortunately, the current version of Haddock does not grok the special array syntax, so there is no nice HTML version of the interface for GHC.PArr. Instead, please consult the source code of GHC.PArr.

Speed with less convenience: package ndp

The concurrent, high-performance library of strict, segmented arrays mentioned above takes the form of a GHC package called ndp. This package is under development and only available in source form. The simplest way to build it is to first get and build a source distribution of GHC (preferably the current development version) - see the documentation on how to get the sources and how to build them. Then, in the source tree, do the following:

% cd libraries
% darcs get http://darcs.haskell.org/packages/ndp/
% cd ndp
% make boot
% make

Now, the option -package ndp is available for use with the inplace compiler (i.e., compiler/ghc-inplace). Alternatively, you can install it by invoking make install on the GHC source root and within libraries/ndp/. Then, the option -package ndp can be used in the installed compiler.

For example, the following module implements the dot product with package ndp:

module DotP_ndp (dotp)
where

import Data.Array.Parallel.Unlifted

-- Dot product over strict, unboxed arrays (UArr); UA is the class
-- of element types that can be stored in such arrays.
dotp :: (Num a, UA a) => UArr a -> UArr a -> a
dotp xs ys = sumU (zipWithU (*) xs ys)

We can also use that in an interactive GHCi session:

Prelude> :set -package ndp
package flags have changed, resetting and loading new packages...
Loading package ndp-1.0 ... linking ... done.
Prelude> :l /home/chak/code/haskell/DotP_ndp
[1 of 1] Compiling DotP_ndp         ( /home/chak/code/haskell/DotP_ndp.hs, interpreted )
Ok, modules loaded: DotP_ndp.
*DotP_ndp> dotp (toU [1..3]) (toU [4..6])
Loading package haskell98 ... linking ... done.
32.0
*DotP_ndp>

The difference between the package ndp and the -fparr version of the dot product is just a fairly small amount of sugar. However, for programs using arrays of more complex structure (including nested arrays), the difference is much bigger, as the example below illustrates. Nevertheless, many programs can be implemented quite easily with just package ndp. The speed difference between the two options is dramatic.
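
For instance, sparse-matrix/vector multiplication - the running example of the status report paper - is a one-liner with nested parallel arrays, whereas a package ndp version has to manage the segment structure explicitly. The following sketch follows the example in that paper, using the -fparr syntax and the GHC.PArr combinators sumP and (!:):

{-# OPTIONS -fparr -fglasgow-exts #-}
module SMVM (smvm)
where
import GHC.PArr

-- A sparse matrix is an array of rows; each row stores only its
-- non-zero elements as (column index, value) pairs.
type SparseMatrix = [:[:(Int, Float):]:]

smvm :: SparseMatrix -> [:Float:] -> [:Float:]
smvm m v = [: sumP [: x * (v !: i) | (i, x) <- row :] | row <- m :]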

Most of the functions under Data.Array.Parallel.Unlifted are still purely sequential, albeit much more efficient than GHC.PArr. In addition, the (currently few) functions from Data.Array.Parallel.Unlifted.Parallel transparently use multiple processing elements if GHC was compiled with SMP multiprocessor support (see the sketch below).
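
For example, assuming that Data.Array.Parallel.Unlifted.Parallel exports parallel variants of the flat-array combinators under UP-suffixed names (an assumption based on the package's naming scheme - check the actual interface), the dot product could be switched to the multi-threaded code path like this:

module DotP_par (dotp_par)
where

import Data.Array.Parallel.Unlifted
import Data.Array.Parallel.Unlifted.Parallel

-- Hypothetical: sumUP and zipWithUP are assumed names for the
-- parallel counterparts of sumU and zipWithU.
dotp_par :: (Num a, UA a) => UArr a -> UArr a -> a
dotp_par xs ys = sumUP (zipWithUP (*) xs ys)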

A number of examples of using package ndp can be found in the package's examples directory.

References

Data Parallel Haskell:

  • Data Parallel Haskell: a status report. Manuel M. T. Chakravarty, Roman Leshchinskiy, Simon Peyton Jones, Gabriele Keller, and Simon Marlow. In DAMP 2007: Workshop on Declarative Aspects of Multicore Programming, ACM Press, 2007. Summary: Illustrates our approach to implementing nested data parallelism by way of the example of multiplying a sparse matrix with a vector, and gives first performance figures. It also includes an overview of the implementation and references to our previous work in the area. Here are the slides of a talk about the paper.
  • Nepal -- Nested Data-Parallelism in Haskell. Manuel M. T. Chakravarty, Gabriele Keller, Roman Lechtchinsky, and Wolf Pfannenstiel. In Euro-Par 2001: Parallel Processing, 7th International Euro-Par Conference, Springer-Verlag, LNCS 2150, pages 524-534, 2001. Summary: Illustrates the language design of integrating support for nested data parallelism into Haskell; in particular, it introduces the semantics of parallel arrays and the idea of distinguishing between the parallel and sequential components of a data structure and algorithm by type. These concepts are illustrated by a parallel version of quicksort, the Barnes-Hut algorithm for solving the n-body problem, and Wang's algorithm for solving tridiagonal systems of linear equations.


Implementing nested data parallelism by program transformation:

  • Higher Order Flattening. Roman Leshchinskiy, Manuel M. T. Chakravarty, and Gabriele Keller. In Third International Workshop on Practical Aspects of High-level Parallel Programming (PAPP 2006), Springer-Verlag, LNCS 3992, 2006. Summary: This paper explains how the flattening transformation can be extended to higher-order functions by way of closure conversion and closure inspection. This method was one of the central contributions of Roman Leshchinskiy's PhD thesis.
  • Associated Types with Class. Manuel M. T. Chakravarty, Gabriele Keller, Simon Peyton Jones, and Simon Marlow. In Proceedings of The 32nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'05), pages 1-13, ACM Press, 2005. Summary: Introduces the idea and type theory of type-indexed data types as type members of Haskell type classes. These associated data types are an essential element of our optimising, non-parametric array implementation.
  • More Types for Nested Data Parallel Programming. Manuel M. T. Chakravarty and Gabriele Keller. In Proceedings of the Fifth ACM SIGPLAN International Conference on Functional Programming, pages 94-105, ACM Press, 2000. Summary: Extends Blelloch's flattening transformation for nested data parallelism to languages supporting full algebraic data types, including sum types and recursive types. This paper extends flattening for recursive types as introduced in Gabriele Keller's PhD thesis.
  • On the Distributed Implementation of Aggregate Data Structures by Program Transformation. Gabriele Keller and Manuel M. T. Chakravarty. In Fourth International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS'99), pages 108-122, Springer Verlag, LNCS 1586, 1999. Summary: Presents the idea of supporting transformation-based optimisations, and in particular array and communication fusion, by distinguishing between distributed and local data by type. This method was one of the main contributions of Gabriele Keller's PhD thesis.
  • An approach to fast arrays in Haskell, Manuel M. T. Chakravarty and Gabriele Keller. In Johan Jeuring and Simon Peyton Jones, editors, lecture notes for The Summer School and Workshop on Advanced Functional Programming 2002. LNCS 2638, Springer-Verlag, pages 27-58, 2003. Summary: This tutorial paper illustrates the main challenges in implementing sequential high-performance arrays in a lazy functional language. It includes a step-by-step illustration of first-order flattening, discusses implementing non-parametric arrays without associated types, and illustrates a simple approach to equational array fusion. (Data Parallel Haskell uses a more powerful fusion framework based on stream fusion.)


Other languages with nested data parallelism:

  • Programming Parallel Algorithms. Guy E. Blelloch. In Communications of the ACM, 39(3), March, 1996. Summary: This seminal article illustrates the flexibility and high level of abstraction of nested data parallelism. It also describes the language-based cost model of nested data parallelism.
  • NESL: A Parallel Programming Language. Summary: This is the main NESL page with many links to programming examples and implementation techniques. The work on NESL laid the foundations for the programming model of nested data parallelism and is one of the most influential precursors of our work.
  • The Manticore Project. Summary: This is the main page of the Manticore project with many further links. Manticore is a recent effort to develop a heterogeneous parallel programming language targeting multi-core processors, which also includes nested data parallelism in the style of NESL and Data Parallel Haskell.
  • Publications of the Proteus project. Summary: Proteus was an effort to develop a heterogeneous parallel language during the high-performance computing era. Most of the work on Proteus was concerned with its nested data-parallel sub-language.