Difference between revisions of "Benchmarks Game/Parallel/BinaryTreesDPH"
< Benchmarks Game | Parallel
Jump to navigation
Jump to search
Line 7: | Line 7: | ||
* Not submitted, requires GHC 6.10 beta or above |
* Not submitted, requires GHC 6.10 beta or above |
||
+ | Info: |
||
− | Hardware: |
||
* 2.2gHz core2duo, Macbook Pro |
* 2.2gHz core2duo, Macbook Pro |
||
+ | * The Glorious Glasgow Haskell Compilation System, version 6.10.0.20080921 |
||
Build and run: |
Build and run: |
||
Line 20: | Line 21: | ||
<b>Note, these times have a 350mb heap (+RTS -A350M)</b> |
<b>Note, these times have a 350mb heap (+RTS -A350M)</b> |
||
− | Single core vs. dual core |
+ | DPH - Single core vs. dual core |
* Single core (no -N3): 18.945 |
* Single core (no -N3): 18.945 |
||
* Dual core (with -N3): 14.338 (see below) |
* Dual core (with -N3): 14.338 (see below) |
||
− | DPH vs. parallel strategies |
||
− | * Parallel strategies dual core: 11.471 (see below) |
||
− | * DPH Dual core: 14.338 (see below) |
||
⚫ | |||
− | Comparison with parallel strategies version of parallel binary trees: |
||
+ | <b>DPH</b>: |
||
⚫ | |||
time ./binarytrees 20 +RTS -N3 -sstderr -A350M -RTS |
time ./binarytrees 20 +RTS -N3 -sstderr -A350M -RTS |
||
Line 96: | Line 94: | ||
./binarytrees 20 +RTS -N3 -sstderr -A350M -RTS 23.57s user 2.13s system 179% cpu 14.338 total |
./binarytrees 20 +RTS -N3 -sstderr -A350M -RTS 23.57s user 2.13s system 179% cpu 14.338 total |
||
+ | |||
− | + | <b>Parallel Strategies</b>: |
|
time ./binarytrees 20 +RTS -N3 -sstderr -A350M -RTS |
time ./binarytrees 20 +RTS -N3 -sstderr -A350M -RTS |
||
Line 149: | Line 148: | ||
./binarytrees 20 +RTS -N3 -sstderr -A350M -RTS 18.62s user 2.15s system 181% cpu 11.471 total |
./binarytrees 20 +RTS -N3 -sstderr -A350M -RTS 18.62s user 2.15s system 181% cpu 11.471 total |
||
+ | |||
+ | ===Code=== |
||
+ | <haskell> |
||
+ | {-# LANGUAGE BangPatterns, TypeOperators #-} |
||
+ | {-# OPTIONS -funbox-strict-fields #-} |
||
+ | -- |
||
+ | -- The Computer Language Shootout |
||
+ | -- http://shootout.alioth.debian.org/ |
||
+ | -- |
||
+ | -- Contributed by Don Stewart and Thomas Davie, modified for DPH |
||
+ | -- usage by Austin Seipp |
||
+ | -- |
||
+ | -- This uses the Data Parallel Haskell beta available in ghc 6.10, to |
||
+ | -- exploit multi-core machines. |
||
+ | -- |
||
+ | |||
+ | import System |
||
+ | import Data.Bits |
||
+ | import Text.Printf |
||
+ | import Data.Array.Parallel.Base |
||
+ | import Data.Array.Parallel.Unlifted as U |
||
+ | import Data.Array.Parallel.Unlifted.Parallel as UP |
||
+ | -- |
||
+ | -- an artificially strict tree. |
||
+ | -- |
||
+ | -- normally you would ensure the branches are lazy, but this benchmark |
||
+ | -- requires strict allocation. |
||
+ | -- |
||
+ | data Tree = Nil | Node !Int !Tree !Tree |
||
+ | |||
+ | minN = 4 |
||
+ | |||
+ | io s n t = printf "%s of depth %d\t check: %d\n" s n t |
||
+ | |||
+ | main = do |
||
+ | n <- getArgs >>= readIO . head |
||
+ | let maxN = max (minN + 2) n |
||
+ | stretchN = maxN + 1 |
||
+ | |||
+ | -- stretch memory tree |
||
+ | let c = check' (make 0 stretchN) |
||
+ | io "stretch tree" stretchN c |
||
+ | |||
+ | -- allocate a long lived tree |
||
+ | let !long = make 0 maxN |
||
+ | |||
+ | -- allocate, walk, and deallocate many bottom-up binary trees |
||
+ | let vs = UP.mapUP (depth' maxN) $ UP.enumFromThenToUP minN (minN+2) maxN |
||
+ | mapM_ (\((m :*:d :*: i)) -> |
||
+ | io (show m ++ "\t trees") d i) (U.toList vs) |
||
+ | |||
+ | -- confirm the the long-lived binary tree still exists |
||
+ | io "long lived tree" maxN (check' long) |
||
+ | |||
+ | -- generate many trees |
||
+ | depth' :: Int -> Int -> (Int :*: Int :*: Int) |
||
+ | depth' m d = |
||
+ | (2*n) :*: d :*: (sumT d n 0) |
||
+ | where |
||
+ | n = 1 `shiftL` (m - d + minN) |
||
+ | |||
+ | -- allocate and check lots of trees |
||
+ | sumT :: Int -> Int -> Int -> Int |
||
+ | sumT d 0 t = t |
||
+ | sumT d i t = sumT d (i-1) (t + a + b) |
||
+ | where a = check' (make i d) |
||
+ | b = check' (make (-i) d) |
||
+ | |||
+ | -- traverse the tree, counting up the nodes |
||
+ | check' :: Tree -> Int |
||
+ | check' Nil = 0 |
||
+ | check' (Node i l r) = i + check' l - check' r |
||
+ | |||
+ | -- build a tree |
||
+ | make :: Int -> Int -> Tree |
||
+ | make i 0 = Node i Nil Nil |
||
+ | make i d = Node i (make (i2-1) d2) (make i2 d2) |
||
+ | where i2 = 2*i; d2 = d-1 |
||
+ | </haskell> |
||
+ | |||
+ | ===Notes=== |
||
+ | |||
+ | I'm taking a guess (core is tremendous) that an issue potentially lies here: |
||
+ | <haskell> mapM_ (\((m :*:d :*: i)) -> |
||
+ | io (show m ++ "\t trees") d i) (U.toList vs)</haskell> |
||
+ | Not sure if U.toList fuses properly - but how can we traverse it otherwise? |
Revision as of 20:48, 7 October 2008
Binary Trees
Data Parallel Haskell
- Not submitted, requires GHC 6.10 beta or above
Info:
- 2.2gHz core2duo, Macbook Pro
- The Glorious Glasgow Haskell Compilation System, version 6.10.0.20080921
Build and run:
$ ~/ghc-6.10/bin/ghc --make -fcpr-off -threaded -fdph-par -package dph-base -Odph binarytrees.hs $ ./binarytrees 20 +RTS -N3 -sstderr -A350M
Times
Note, these times have a 350mb heap (+RTS -A350M)
DPH - Single core vs. dual core
- Single core (no -N3): 18.945
- Dual core (with -N3): 14.338 (see below)
Parallel Strategies vs. Data Parallel Haskell
DPH:
time ./binarytrees 20 +RTS -N3 -sstderr -A350M -RTS ./binarytrees 20 +RTS -N3 -sstderr -A350M stretch tree of depth 21 check: -1 2097152 trees of depth 4 check: -2097152 524288 trees of depth 6 check: -524288 131072 trees of depth 8 check: -131072 32768 trees of depth 10 check: -32768 8192 trees of depth 12 check: -8192 2048 trees of depth 14 check: -2048 512 trees of depth 16 check: -512 128 trees of depth 18 check: -128 32 trees of depth 20 check: -32 long lived tree of depth 20 check: -1 14,584,449,064 bytes allocated in the heap 215,093,024 bytes copied during GC 34,674,344 bytes maximum residency (2 sample(s)) 51,688 bytes maximum slop 1083 MB total memory in use (9 MB lost due to fragmentation)
Generation 0: 16 collections, 16 parallel, 1.97s, 1.13s elapsed Generation 1: 2 collections, 2 parallel, 0.83s, 0.47s elapsed
Parallel GC work balance: 2.34 (53773189 / 22969253, ideal 3)
Task 0 (worker) : MUT time: 23.18s ( 12.54s elapsed) GC time: 0.37s ( 0.21s elapsed)
Task 1 (worker) : MUT time: 21.46s ( 12.54s elapsed) GC time: 2.09s ( 1.19s elapsed)
Task 2 (worker) : MUT time: 23.53s ( 12.54s elapsed) GC time: 0.02s ( 0.02s elapsed)
Task 3 (worker) : MUT time: 23.56s ( 12.57s elapsed) GC time: 0.00s ( 0.00s elapsed)
Task 4 (worker) : MUT time: 23.24s ( 12.57s elapsed) GC time: 0.32s ( 0.18s elapsed)
INIT time 0.02s ( 0.03s elapsed) MUT time 20.75s ( 12.54s elapsed) GC time 2.80s ( 1.60s elapsed) EXIT time 0.00s ( 0.02s elapsed) Total time 23.57s ( 14.17s elapsed)
%GC time 11.9% (11.3% elapsed)
Alloc rate 702,222,726 bytes per MUT second
Productivity 88.1% of total user, 146.4% of total elapsed
recordMutableGen_sync: 0 gc_alloc_block_sync: 14318885 whitehole_spin: 0 gen[0].steps[0].sync_todo: 0 gen[0].steps[0].sync_large_objects: 0 gen[0].steps[1].sync_todo: 354 gen[0].steps[1].sync_large_objects: 0 gen[1].steps[0].sync_todo: 10112 gen[1].steps[0].sync_large_objects: 0 ./binarytrees 20 +RTS -N3 -sstderr -A350M -RTS 23.57s user 2.13s system 179% cpu 14.338 total
Parallel Strategies:
time ./binarytrees 20 +RTS -N3 -sstderr -A350M -RTS ./binarytrees 20 +RTS -N3 -sstderr -A350M stretch tree of depth 21 check: -1 2097152 trees of depth 4 check: -2097152 524288 trees of depth 6 check: -524288 131072 trees of depth 8 check: -131072 32768 trees of depth 10 check: -32768 8192 trees of depth 12 check: -8192 2048 trees of depth 14 check: -2048 512 trees of depth 16 check: -512 128 trees of depth 18 check: -128 32 trees of depth 20 check: -32 long lived tree of depth 20 check: -1 9,719,681,300 bytes allocated in the heap 164,038,148 bytes copied during GC (scavenged) 160 bytes copied during GC (not scavenged) 33,718,272 bytes maximum residency (2 sample(s))
11 collections in generation 0 ( 0.33s) 2 collections in generation 1 ( 0.20s)
1094 Mb total memory in use
Task 0 (worker) : MUT time: 18.43s ( 10.65s elapsed) GC time: 0.17s ( 0.19s elapsed)
Task 1 (worker) : MUT time: 18.54s ( 10.65s elapsed) GC time: 0.06s ( 0.06s elapsed)
Task 2 (worker) : MUT time: 18.32s ( 10.65s elapsed) GC time: 0.29s ( 0.36s elapsed)
Task 3 (worker) : MUT time: 18.60s ( 10.67s elapsed) GC time: 0.01s ( 0.01s elapsed)
Task 4 (worker) : MUT time: 18.61s ( 10.68s elapsed) GC time: 0.00s ( 0.00s elapsed)
INIT time 0.01s ( 0.03s elapsed) MUT time 18.08s ( 10.65s elapsed) GC time 0.53s ( 0.63s elapsed) EXIT time 0.00s ( 0.01s elapsed) Total time 18.62s ( 11.31s elapsed)
%GC time 2.8% (5.6% elapsed)
Alloc rate 537,291,716 bytes per MUT second
Productivity 97.1% of total user, 159.8% of total elapsed
./binarytrees 20 +RTS -N3 -sstderr -A350M -RTS 18.62s user 2.15s system 181% cpu 11.471 total
Code
{-# LANGUAGE BangPatterns, TypeOperators #-}
{-# OPTIONS -funbox-strict-fields #-}
--
-- The Computer Language Shootout
-- http://shootout.alioth.debian.org/
--
-- Contributed by Don Stewart and Thomas Davie, modified for DPH
-- usage by Austin Seipp
--
-- This uses the Data Parallel Haskell beta available in ghc 6.10, to
-- exploit multi-core machines.
--
import System
import Data.Bits
import Text.Printf
import Data.Array.Parallel.Base
import Data.Array.Parallel.Unlifted as U
import Data.Array.Parallel.Unlifted.Parallel as UP
--
-- an artificially strict tree.
--
-- normally you would ensure the branches are lazy, but this benchmark
-- requires strict allocation.
--
data Tree = Nil | Node !Int !Tree !Tree
minN = 4
io s n t = printf "%s of depth %d\t check: %d\n" s n t
main = do
n <- getArgs >>= readIO . head
let maxN = max (minN + 2) n
stretchN = maxN + 1
-- stretch memory tree
let c = check' (make 0 stretchN)
io "stretch tree" stretchN c
-- allocate a long lived tree
let !long = make 0 maxN
-- allocate, walk, and deallocate many bottom-up binary trees
let vs = UP.mapUP (depth' maxN) $ UP.enumFromThenToUP minN (minN+2) maxN
mapM_ (\((m :*:d :*: i)) ->
io (show m ++ "\t trees") d i) (U.toList vs)
-- confirm the the long-lived binary tree still exists
io "long lived tree" maxN (check' long)
-- generate many trees
depth' :: Int -> Int -> (Int :*: Int :*: Int)
depth' m d =
(2*n) :*: d :*: (sumT d n 0)
where
n = 1 `shiftL` (m - d + minN)
-- allocate and check lots of trees
sumT :: Int -> Int -> Int -> Int
sumT d 0 t = t
sumT d i t = sumT d (i-1) (t + a + b)
where a = check' (make i d)
b = check' (make (-i) d)
-- traverse the tree, counting up the nodes
check' :: Tree -> Int
check' Nil = 0
check' (Node i l r) = i + check' l - check' r
-- build a tree
make :: Int -> Int -> Tree
make i 0 = Node i Nil Nil
make i d = Node i (make (i2-1) d2) (make i2 d2)
where i2 = 2*i; d2 = d-1
Notes
I'm taking a guess (core is tremendous) that an issue potentially lies here:
mapM_ (\((m :*:d :*: i)) ->
io (show m ++ "\t trees") d i) (U.toList vs)
Not sure if U.toList fuses properly - but how can we traverse it otherwise?