Parallel GHC Project

From HaskellWiki
Revision as of 09:39, 13 November 2012 by Mikolaj (talk | contribs) (TS&perf update)


Overview

The Parallel GHC Project is an MSR-funded project to push the real-world use of parallel Haskell. The aim is to demonstrate that parallel Haskell can be employed successfully in industrial projects.

In the last few years GHC has gained impressive support for parallel programming on commodity multi-core systems. In addition to traditional threads and shared variables, it supports pure parallelism, software transactional memory (STM), and data parallelism. With much of this research and development complete, the next stage is to get the technology into more widespread use.
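
As a rough illustration of the pure parallelism mentioned above, the following sketch sums a list by sparking one half while the current thread evaluates the other. It is not taken from the project: the name parSum is made up for this example. par and pseq are imported here from GHC.Conc in base; the parallel package exposes the same combinators from Control.Parallel.

```haskell
import GHC.Conc (par, pseq)

-- | Sum a list by evaluating its two halves in parallel.
-- 'parSum' is a hypothetical name used only for this sketch.
parSum :: [Int] -> Int
parSum xs = left `par` (right `pseq` (left + right))
  where
    (as, bs) = splitAt (length xs `div` 2) xs
    left     = sum as   -- sparked: may be evaluated on another core
    right    = sum bs   -- evaluated by the current thread first

main :: IO ()
main = print (parSum [1 .. 1000000])
```

The result is the same with or without parallel evaluation, which is the point of pure parallelism. To actually use several cores, the program would need to be compiled with -threaded and run with +RTS -N.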

This project aims to do the engineering work to solve whatever remaining practical problems are blocking organisations from making serious use of parallelism with GHC. The driving force is the applications rather than the technology.

The project involves a partnership with six groups from commercial and scientific organisations. Over the course of two years these groups are applying parallel Haskell in their specific domains. They are being supported by GHC HQ and Well-Typed, who are providing advice on Haskell tools and techniques and applying engineering effort to resolve any issues that are hindering these groups' progress.

The project is being coordinated by Well-Typed, who are providing the bulk of the support and engineering effort. The project started in the summer of 2010.

Project News

ThreadScope and friends

We have been continuing our work to make ThreadScope more helpful and informative in tracking down parallel and concurrent Haskell performance problems. We can now collect heap statistics (and some other statistics) from the GHC runtime system and present them in ThreadScope for a selected runtime interval. These features are available to users of GHC 7.6 or newer. Based on user feedback, we have improved support for user events of different granularity. We have also released a preliminary version of tools for collecting information from hardware performance counters, specifically Linux perf events. This can be useful for studying IO-heavy programs, the idea being to visualise system calls as distinct from the actual execution of Haskell code. The perf events support will be available to users of a recent development GHC (7.7.x) or the eventual 7.8 release.
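
To make the user events mentioned above more concrete, here is a minimal sketch of how a program might emit them. It assumes traceEventIO from Debug.Trace in base; the helper withUserEvent and the "START"/"STOP" label convention are invented for this example and are not something ThreadScope prescribes.

```haskell
import Control.Exception (evaluate)
import Debug.Trace (traceEventIO)

-- | Run an action bracketed by a pair of user events.
-- 'withUserEvent' and the START/STOP labels are hypothetical;
-- any string can be used as the event text.
withUserEvent :: String -> IO a -> IO a
withUserEvent label action = do
  traceEventIO ("START " ++ label)
  result <- action
  traceEventIO ("STOP " ++ label)
  return result

main :: IO ()
main = do
  -- 'evaluate' forces the sum inside the bracketed region, so the work
  -- happens between the two events rather than later at 'print'.
  total <- withUserEvent "sum" (evaluate (sum [1 .. 1000000 :: Int]))
  print total
```

The events are only recorded when event logging is switched on. With the GHC versions discussed here, that would typically mean compiling with something like ghc -threaded -eventlog -rtsopts Main.hs and running ./Main +RTS -N -l, which writes Main.eventlog for ThreadScope to open. Without those flags the calls do nothing visible.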

Cloud Haskell

The reimplementation of Cloud Haskell is now available from Hackage and is ready for serious experiments. Compared to the prototype, it is much faster; it can run on multiple kinds of networks; it has backends to support different environments (such as cluster or cloud, with a proof-of-concept backend for Azure); it has a new system for dealing with node disconnect and reconnect, and more precisely defined semantics; it supports composable, polymorphic serialisable closures; and internally the code is better structured and easier to work with.

The wiki lists some important open issues of the implementation; the semantics document mentioned above lists some important open semantic issues.

Project artefacts

Some of the work by our project partners is available to the public:

Project | Partner | Description | Status
mighttpd2 | IIJ | File/CGI server on top of Warp | version 2.5.7 released 2012-04-05
webserver | IIJ | HTTP server library | version 0.4.6 released 2011-10-05
wai-app-file-cgi | IIJ | File/CGI WAI application (used by Mighttpd) | version 0.5.8 released 2012-04-05
wai-logger | IIJ | Logging system for WAI (used by Mighttpd) | version 0.1.4 released 2012-02-13
http-date | IIJ | Fast parser and formatter for HTTP Date | version 0.0.2 released 2012-02-17
dns | IIJ | DNS library | version 0.2.0 released 2011-08-31
iproute | IIJ | IP routing table | version 1.2.5 released 2012-04-02
domain-auth | IIJ | Library for Sender Policy Framework, SenderID, DomainKeys and DKIM | version 0.2.0 released 2011-08-31
RPF | IIJ | Receiver Policy Framework (milter) | version 0.2.0 released 2011-08-31

In addition to helping the participating organisations, the project will whenever possible make improvements to libraries and tools that are useful to Haskell users more generally.

Project | Description | Status
multiprocess ThreadScope | profiling of multi-process or distributed Haskell systems such as client/server or MPI programs | in progress
LFG | Haskell implementation of some pseudo random number generators from the SPRNG library | testing
SPRNG binding | Haskell wrapper around SPRNG | in progress
ThreadScope improvements | new spark profiling tools, heap statistics, GUI enhancements, bug fixes | version 0.2.2 released 2012-11-02
ghc-events improvements | spark events support, eventlog verification, improved eventlog merging | version 0.4.2.0 released 2012-11-01
linux-perf | Library for manipulating the data file output of the Linux perf command, with special support for displaying GHC RTS perf events in ghc-events and ThreadScope | version 0.3 released 2012-11-03
gtk2hs maintenance & release | GHC 7.2 support | version 0.12.2 released 2011-11-13
Haskell-MPI | Haskell bindings to C MPI library | version 1.2.1 released 2012-02-15
GHC RTS improvements | #4449 - GHC 7 can't do IO when daemonized | fixed in 7.0.x branch
GHC RTS improvements | #4504 - "awaitSignal Nothing" does not block thread with -threaded | fixed in 7.0.2
GHC RTS improvements | #4512 - EventLog does not play well with forkProcess | fixed in 7.0.x branch
GHC RTS improvements | #4514 - IO manager can deadlock if a file descriptor is closed behind its back | fixed in 7.0.x branch
GHC RTS improvements | #4854 - Validating on a PPC Mac OS X: Fix miscellaneous errors and warnings | fixed in 7.0.x branch
c2hs improvements | marshalling functions can now have arguments supplied to them | version 0.16.3 released 2011-03-24
libssh2 | Major refactoring of the libssh2 bindings for Haskell to use the Haskell I/O manager to deal with blocking, rather than block in C land (necessary for the Cloud Haskell Azure backend) | version 0.2 released 2012-10-14
http-conduit | Added support for client SSL certificates (necessary for the Cloud Haskell Azure backend) | version 1.8.1 released 2012-10-24
network-transport / network-transport-tcp | Generic Network Transport interface and TCP instantiation, used by Cloud Haskell, HdpH, and MetaPar | version 0.3.0 released 2012-10-03 / version 0.3.1 released 2012-10-19
distributed-process / distributed-static / distributed-process-simplelocalnet | Reimplementation of Cloud Haskell | version 0.4.0.2 released 2012-10-23 / version 0.2.1 released 2012-10-03 / version 0.2.0.7 released 2012-10-23
distributed-process-azure / azure-service-api | Proof-of-concept Cloud Haskell backend for Microsoft Azure | version 0.1.0 released 2012-11-05

The project will also aim to document existing tools and parallel programming practices, making them accessible to a wider public.

Project | Description | Status
ThreadScope Tour | a short guide to using ThreadScope to help analyse parallel program performance | unveiled 2012-01-14
submissions to TMR 19 | Mighttpd – a High Performance Web Server in Haskell (Kazu Yamamoto) | submitted
submissions to TMR 19 | High Performance Haskell with MPI (Bernie Pope and Dmitry Astapov) | submitted
Parallel Haskell Portal | one-stop resource for users of parallelism and concurrency in Haskell | unveiled 2011-04-20
Cloud Haskell wiki | Collection of Cloud Haskell resources, as well as links to a number of new blog posts about Cloud Haskell | last update 2012-11-12

The Parallel Haskell Digest

We have been publishing a regular newsletter containing project news, other parallelism news from around the Haskell community, and short "Word of the Month" articles giving brief introductions to important concepts in parallelism.

The back issues are here:

Getting involved

Progress reports will be posted to the parallel Haskell mailing list and to the Well-Typed blog.

The best starting point to get involved is to join the mailing list. Note that the list is for parallel Haskell generally, not just the Parallel GHC Project.

Participating organisations

Dragonfly
Cloudy Bayes: Hierarchical Bayesian modeling in Haskell
The Cloudy Bayes project aims to develop a fast Bayesian model fitter that takes advantage of modern multiprocessor machines. It will support model descriptions in the BUGS model description language (WinBUGS, OpenBUGS, and JAGS). It will be implemented as an embedded domain specific language (EDSL) within Haskell. A wide range of hierarchical Bayesian model structures will be possible, including many of the models used in medical, ecological, and biological sciences.
Cloudy Bayes will provide an easy-to-use interface for describing models, running Markov chain Monte Carlo (MCMC) fitters, diagnosing performance and convergence criteria as it runs, and collecting output for post-processing. Haskell's strong type system will be used to ensure that model descriptions make sense, providing a fast, safe development cycle.
IIJ Innovation Institute Inc.
Haskell is suitable for many kinds of domain, and GHC's support for lightweight threads makes it attractive for concurrent applications. An exception has been network server programming, because GHC 6.12 and earlier have an IO manager that is limited to 1024 network sockets. GHC 7 has a new IO manager implementation that removes this limitation.
This project will implement several network servers to demonstrate that Haskell is suitable for network servers that handle a massive number of concurrent connections.
Los Alamos National Laboratory
This project will use parallel Haskell to implement high-performance Monte Carlo algorithms, a class of algorithms which use randomness to sample large or otherwise intractable solution spaces. The initial goal is a particle-based MC algorithm suitable for modeling the flow of radiation, with application to problems in astrophysics. From this, the project is expected to move to identification of suitable abstractions for expressing a wider variety of Monte Carlo algorithms, and using models for different physical phenomena.
Willow Garage Inc.
Distributed Rigid Body Dynamics in ROS
Willow Garage seeks a high-level representation for a distributed rigid body dynamics simulation, capable of excellent parallel speedup on current and foreseeable hardware, yet linking to existing optimized libraries for low-level message passing and matrix math.
This project will drive API, performance, and profiling tool requirements for Haskell's interface to the Message Passing Interface (MPI) specification, an industry standard in High Performance Computing (HPC), as used on clusters of many nodes.
Competing internal initiatives use C++/MPI and CUDA directly.
Willow Garage aims to lay the groundwork for personal robotics applications in everyday life. ROS (Robot Operating System) is an open source, meta-operating system for your robot.
Telefónica I+D
This project aims to demonstrate parallel Haskell technology using graph algorithms on large graphs that represent social networks. The current work is on parallel versions of the Bron-Kerbosch algorithm for finding maximal cliques in a graph. The initial goal is to demonstrate good speedups on multi-core machines, and the overall aim is to demonstrate good speedups of a distributed version of the algorithm using Cloud Haskell.
VETT UK
VETT are working on a transaction processing application using Cloud Haskell. More details will be available shortly.
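
For readers unfamiliar with the Bron-Kerbosch algorithm named in the Telefónica I+D entry above, the following is a minimal sequential sketch of its basic form (without pivoting), written against Data.Map and Data.Set from the containers package. It is not the project's code: the names Graph, fromEdges, neighbours and maximalCliques are invented here, and the graph is assumed to be undirected with no self-loops.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type Vertex = Int
-- | Adjacency sets; a hypothetical representation chosen for this sketch.
type Graph = Map.Map Vertex (Set.Set Vertex)

-- | Build an undirected graph from an edge list (assumes no self-loops).
fromEdges :: [(Vertex, Vertex)] -> Graph
fromEdges es = Map.fromListWith Set.union
  [ (a, Set.singleton b) | (u, v) <- es, (a, b) <- [(u, v), (v, u)] ]

neighbours :: Graph -> Vertex -> Set.Set Vertex
neighbours g v = Map.findWithDefault Set.empty v g

-- | Basic Bron-Kerbosch: r is the clique built so far, p the candidates
-- that could still extend it, x the vertices already fully explored.
maximalCliques :: Graph -> [Set.Set Vertex]
maximalCliques g = go Set.empty (Map.keysSet g) Set.empty
  where
    go r p x
      | Set.null p = [r | Set.null x]   -- maximal only if nothing was excluded
      | otherwise  = step (Set.toList p) p x
      where
        step [] _ _ = []
        step (v : vs) p' x' =
          let nv = neighbours g v
          in go (Set.insert v r) (Set.intersection p' nv) (Set.intersection x' nv)
               ++ step vs (Set.delete v p') (Set.insert v x')

main :: IO ()
main = mapM_ (print . Set.toList)
             (maximalCliques (fromEdges [(1, 2), (1, 3), (2, 3), (3, 4)]))
```

For the small graph in main, the maximal cliques are {1,2,3} and {3,4}. Each recursive call to go works on its own sets, so the branches are independent; that independence is what a parallel or distributed version could exploit, although this sketch makes no attempt to do so.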