Benchmarks Game/Parallel/BinaryTreesDPH

From HaskellWiki
Revision as of 20:26, 7 October 2008 by Thoughtpolice (talk | contribs)

Binary Trees

Data Parallel Haskell

  • Not submitted; requires GHC 6.10 beta or later

Hardware:

  • 2.2 GHz Core 2 Duo, MacBook Pro

Build and run:

   $ ~/ghc-6.10/bin/ghc --make -fcpr-off -threaded -fdph-par -package dph-base -Odph binarytrees.hs
   $ ./binarytrees 20 +RTS -N3 -sstderr -A350M

Times

Note: these times were measured with a 350 MB allocation area (+RTS -A350M).

Single core vs. dual core

  • Single core (no -N3): 18.945 s
  • Dual core (with -N3): 14.338 s (see below)

DPH vs. parallel strategies

  • Parallel strategies, dual core: 11.471 s (see below)
  • DPH, dual core: 14.338 s (see below)
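
The wall-clock figures above are easier to compare as ratios. The following is a minimal sketch that only redoes the arithmetic on the three times quoted on this page; the helper name speedup is illustrative and is not part of the benchmark program.

```haskell
import Text.Printf (printf)

-- | Ratio of a baseline time to an improved time (both in seconds).
speedup :: Double -> Double -> Double
speedup baseline improved = baseline / improved

main :: IO ()
main = do
  -- Wall-clock times copied from the lists above.
  let singleCore = 18.945  -- DPH, no -N3
      dphDual    = 14.338  -- DPH, with -N3
      stratDual  = 11.471  -- parallel strategies, with -N3
  printf "DPH with -N3 vs. single core:    %.2fx\n" (speedup singleCore dphDual)
  printf "Strategies vs. DPH, both at -N3: %.2fx\n" (speedup dphDual stratDual)
```

On these numbers, -N3 makes the DPH version roughly 1.3 times faster than the single-core run, and the parallel strategies version is in turn roughly 1.25 times faster than DPH.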

Comparison with the parallel strategies version of parallel binary trees:

Data Parallel Haskell

time ./binarytrees 20 +RTS -N3 -sstderr -A350M -RTS
./binarytrees 20 +RTS -N3 -sstderr -A350M 
stretch tree of depth 21	 check: -1
2097152	 trees of depth 4	 check: -2097152
524288	 trees of depth 6	 check: -524288
131072	 trees of depth 8	 check: -131072
32768	 trees of depth 10	 check: -32768
8192	 trees of depth 12	 check: -8192
2048	 trees of depth 14	 check: -2048
512	 trees of depth 16	 check: -512
128	 trees of depth 18	 check: -128
32	 trees of depth 20	 check: -32
long lived tree of depth 20	 check: -1
 14,584,449,064 bytes allocated in the heap
    215,093,024 bytes copied during GC
     34,674,344 bytes maximum residency (2 sample(s))
         51,688 bytes maximum slop
           1083 MB total memory in use (9 MB lost due to fragmentation)
 Generation 0:    16 collections,    16 parallel,  1.97s,  1.13s elapsed
 Generation 1:     2 collections,     2 parallel,  0.83s,  0.47s elapsed
 Parallel GC work balance: 2.34 (53773189 / 22969253, ideal 3)
 Task  0 (worker) :  MUT time:  23.18s  ( 12.54s elapsed)
                     GC  time:   0.37s  (  0.21s elapsed)
 Task  1 (worker) :  MUT time:  21.46s  ( 12.54s elapsed)
                     GC  time:   2.09s  (  1.19s elapsed)
 Task  2 (worker) :  MUT time:  23.53s  ( 12.54s elapsed)
                     GC  time:   0.02s  (  0.02s elapsed)
 Task  3 (worker) :  MUT time:  23.56s  ( 12.57s elapsed)
                     GC  time:   0.00s  (  0.00s elapsed)
 Task  4 (worker) :  MUT time:  23.24s  ( 12.57s elapsed)
                     GC  time:   0.32s  (  0.18s elapsed)
 INIT  time    0.02s  (  0.03s elapsed)
 MUT   time   20.75s  ( 12.54s elapsed)
 GC    time    2.80s  (  1.60s elapsed)
 EXIT  time    0.00s  (  0.02s elapsed)
 Total time   23.57s  ( 14.17s elapsed)
 %GC time      11.9%  (11.3% elapsed)
 Alloc rate    702,222,726 bytes per MUT second
 Productivity  88.1% of total user, 146.4% of total elapsed
recordMutableGen_sync: 0
gc_alloc_block_sync: 14318885
whitehole_spin: 0
gen[0].steps[0].sync_todo: 0
gen[0].steps[0].sync_large_objects: 0
gen[0].steps[1].sync_todo: 354
gen[0].steps[1].sync_large_objects: 0
gen[1].steps[0].sync_todo: 10112
gen[1].steps[0].sync_large_objects: 0
./binarytrees 20 +RTS -N3 -sstderr -A350M -RTS  23.57s user 2.13s system 179% cpu 14.338 total

Parallel Strategies

time ./binarytrees 20 +RTS -N3 -sstderr -A350M -RTS
./binarytrees 20 +RTS -N3 -sstderr -A350M 
stretch tree of depth 21	 check: -1
2097152	 trees of depth 4	 check: -2097152
524288	 trees of depth 6	 check: -524288
131072	 trees of depth 8	 check: -131072
32768	 trees of depth 10	 check: -32768
8192	 trees of depth 12	 check: -8192
2048	 trees of depth 14	 check: -2048
512	 trees of depth 16	 check: -512
128	 trees of depth 18	 check: -128
32	 trees of depth 20	 check: -32
long lived tree of depth 20	 check: -1
9,719,681,300 bytes allocated in the heap
164,038,148 bytes copied during GC (scavenged)
       160 bytes copied during GC (not scavenged)
33,718,272 bytes maximum residency (2 sample(s))
        11 collections in generation 0 (  0.33s)
         2 collections in generation 1 (  0.20s)
      1094 Mb total memory in use
 Task  0 (worker) :  MUT time:  18.43s  ( 10.65s elapsed)
                     GC  time:   0.17s  (  0.19s elapsed)
 Task  1 (worker) :  MUT time:  18.54s  ( 10.65s elapsed)
                     GC  time:   0.06s  (  0.06s elapsed)
 Task  2 (worker) :  MUT time:  18.32s  ( 10.65s elapsed)
                     GC  time:   0.29s  (  0.36s elapsed)
 Task  3 (worker) :  MUT time:  18.60s  ( 10.67s elapsed)
                     GC  time:   0.01s  (  0.01s elapsed)
 Task  4 (worker) :  MUT time:  18.61s  ( 10.68s elapsed)
                     GC  time:   0.00s  (  0.00s elapsed)
 INIT  time    0.01s  (  0.03s elapsed)
 MUT   time   18.08s  ( 10.65s elapsed)
 GC    time    0.53s  (  0.63s elapsed)
 EXIT  time    0.00s  (  0.01s elapsed)
 Total time   18.62s  ( 11.31s elapsed)
 %GC time       2.8%  (5.6% elapsed)
 Alloc rate    537,291,716 bytes per MUT second
 Productivity  97.1% of total user, 159.8% of total elapsed
./binarytrees 20 +RTS -N3 -sstderr -A350M -RTS  18.62s user 2.15s system 181% cpu 11.471 total
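
One way to read the two reports side by side is to compare a few of their summary lines directly. The sketch below copies three figures by hand from each -sstderr report above and prints the DPH value as a multiple of the parallel strategies value. The Run record and the ratio helper are hypothetical names introduced only for this illustration, and the two runs appear to come from different RTS versions (the report layouts differ), so the ratios are indicative only.

```haskell
import Text.Printf (printf)

-- | Figures copied by hand from the two -sstderr reports above.
data Run = Run
  { allocBytes :: Double  -- "bytes allocated in the heap"
  , gcSecs     :: Double  -- "GC time" (CPU seconds)
  , mutElapsed :: Double  -- "MUT time" (elapsed seconds)
  }

dph, strategies :: Run
dph        = Run 14584449064 2.80 12.54
strategies = Run  9719681300 0.53 10.65

-- | A field of the DPH run expressed as a multiple of the strategies run.
ratio :: (Run -> Double) -> Double
ratio field = field dph / field strategies

main :: IO ()
main = do
  printf "allocation:      %.2fx\n" (ratio allocBytes)
  printf "GC CPU time:     %.2fx\n" (ratio gcSecs)
  printf "mutator elapsed: %.2fx\n" (ratio mutElapsed)
```

In these two runs the DPH version allocates roughly 1.5 times as much and spends about five times as long in the garbage collector, which fits its slower overall time.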