Performance/Parallel

Latest revision as of 15:29, 23 May 2009

Tips and tricks for better multicore parallel performance from your Haskell.

Which GHC version to use

The recommended version of GHC for parallel programming at the moment (Apr 2009) is GHC 6.12 (aka the HEAD branch of GHC), which has had extensive tuning.

Affinity

Parallel GC

Increase the default heap size. +RTS -H500M for example.
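For example, assuming a program Main.hs (the file name is hypothetical; the flags are standard GHC/RTS options), compiling with the threaded runtime and requesting a larger default heap looks like:

```shell
# Compile with optimisation and the threaded runtime
ghc -O2 -threaded --make Main.hs

# Run on 2 cores with a 500MB default heap
./Main +RTS -N2 -H500M -RTS
```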

Using n-1 cores

GHC has had parallel garbage collection since 2007. However, recent work has shown that the parallel GC sometimes hampers performance, depending on various factors, particularly on Linux systems. See this thread for more: http://www.haskell.org/pipermail/glasgow-haskell-users/2009-April/017050.html

GHC 6.10.2 contains some slightly bogus heuristics about when to turn on the parallel GC, and it just so happens that 8 processors tips it over the point where the parallel GC is enabled for young-generation collections. In 6.10.2 the parallel GC really didn't help most of the time, but it has undergone a lot of tuning since then, and in the HEAD things are much better (see the results from our ICFP submission).

Disabling parallel GC

In the meantime you might get somewhere by disabling the parallel GC altogether (+RTS -g1), but as the results in our paper show, the parallel GC is sometimes essential for retaining locality in parallel programs.

Best flags for parallel GC

Don't forget the -qg0 -qb flags with HEAD; these usually give the best parallel GC performance at the moment (Apr 2009).
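With a HEAD build, a run using those GC flags might look like this (the program name is hypothetical):

```shell
# -qg0: use parallel GC from generation 0; -qb: disable load-balancing
./Main +RTS -N8 -qg0 -qb -RTS
```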

Sparks

Sparks let us speculate on parallelism via the `par` function, which hints that its first argument would be good to evaluate in parallel. A library of combinators for parallelism can be built up this way.
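A minimal sketch of the idea, using `par` and `pseq` from GHC.Conc (in base; the `parallel` package builds combinator libraries on top of them). `parFib` is an illustrative example, not from the page:

```haskell
import GHC.Conc (par, pseq)

-- Spark the first recursive call for parallel evaluation, then
-- evaluate the second before combining. `par x y` hints that x is
-- worth evaluating in parallel and returns y; `pseq` makes sure we
-- work on y here rather than waiting on the spark.
parFib :: Int -> Integer
parFib n
  | n < 2     = fromIntegral n
  | otherwise = x `par` (y `pseq` (x + y))
  where
    x = parFib (n - 1)
    y = parFib (n - 2)

main :: IO ()
main = print (parFib 20)  -- prints 6765
```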

Cross-module parallelism loss

"If I move the `parallelize' definition (a combinator using `par`) into another module and import that module, the performance is completely lost"

This problem is not entirely understood yet. The first piece of advice is to upgrade to GHC HEAD, which handles the sparks created by `par` much better. The loss is almost always due to lack of inlining: add an {-# INLINE #-} pragma to the definition of `parallelize`.
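A sketch of that advice. The `parallelize` definition below is hypothetical (the question does not show its actual body); the point is the pragma, which lets GHC inline the combinator at call sites in importing modules so the spark creation stays visible:

```haskell
import GHC.Conc (par, pseq)

-- In a real program this definition would live in its own module
-- and be imported. Without the INLINE pragma, GHC may compile it
-- opaquely and the parallelism can be lost across the module boundary.
{-# INLINE parallelize #-}
parallelize :: (a -> b -> c) -> a -> b -> c
parallelize f x y = x `par` (y `pseq` f x y)

main :: IO ()
main = print (parallelize (+) (sum [1 .. 1000 :: Int]) (product [1 .. 10]))
```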

Data Parallel Arrays

Data Parallelism in Haskell: Manuel Chakravarty's slides from the ICFP PC Workshop, April 2009.

Tool Support

Tools for parallel performance tuning:

ThreadScope
GHC (HEAD, 2009) supports visual post-mortem analysis, and a graphical tool, ThreadScope, has been developed by Satnam Singh and others.
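A typical workflow, assuming ThreadScope is installed (the program name Main.hs is hypothetical; -eventlog and +RTS -ls are the relevant GHC/RTS flags):

```shell
# Compile with the threaded runtime and eventlog support
ghc -O2 -threaded -eventlog --make Main.hs

# Run with event logging; this produces Main.eventlog
./Main +RTS -N2 -ls -RTS

# Inspect the log in ThreadScope
threadscope Main.eventlog
```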

More information

Runtime Support for Multicore Haskell (Simon Marlow, Simon Peyton Jones, Satnam Singh) Submitted to ICFP'09, March 2009