Difference between revisions of "GHC/SIMD"
== Overview ==

This page is initially meant to provide a place to discuss extending GHC to take advantage of CPU SIMD instructions, such as the SSE (x86) and AltiVec (PowerPC) instruction sets.

SSE provides 'packed' data types of floats and integers that fit into the 128-bit xmm registers.

The operations on these data types include the standard arithmetic operations (add, multiply, ...). There are also additional mathematical operations (reciprocal, reciprocal square root) and packed-specific operations such as dot product, horizontal add/subtract, and alternating add-subtract.

Also, to support data-streaming operations, there are non-temporal memory operations that move data between memory and the xmm registers while bypassing the cache.

The xmm registers are 128 bits wide and can hold either packed integer or packed float types. I suggest that a new `PackedReg` data constructor be added.

In terms of an implementation plan:

* Add new packed data types and 'standard' operations on those types to Cmm and primops.txt.pp
** Int32Packed4#, ...
** Width = ... | W32_4 | ...
* Implement the new types and operations in the backends (C, LLVM and the native assembly code generator)

So far this is straightforward.

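The Width bullet in the plan can be illustrated with a small sketch of how packed widths might sit alongside scalar ones. Apart from the `W32_4` spelling taken from the plan, every constructor and function here (`W8_16`, `W16_8`, `W64_2`, `laneBits`, `laneCount`, `widthInBits`, `isPacked`) is hypothetical, and the type is deliberately simplified: it is not GHC's actual Cmm `Width` type.

```haskell
-- Hypothetical, simplified Width: scalar widths plus packed widths that
-- describe how one 128-bit xmm register is split into lanes.
data Width = W8 | W16 | W32 | W64
           | W8_16 | W16_8 | W32_4 | W64_2
  deriving (Eq, Show, Enum, Bounded)

-- Width of a single lane (or of the whole value, for scalars), in bits.
laneBits :: Width -> Int
laneBits w
  | w `elem` [W8, W8_16]  = 8
  | w `elem` [W16, W16_8] = 16
  | w `elem` [W32, W32_4] = 32
  | otherwise             = 64   -- W64, W64_2

-- Number of lanes; scalar widths have exactly one.
laneCount :: Width -> Int
laneCount W8_16 = 16
laneCount W16_8 = 8
laneCount W32_4 = 4
laneCount W64_2 = 2
laneCount _     = 1

-- Total size of a value of this width.
widthInBits :: Width -> Int
widthInBits w = laneBits w * laneCount w

isPacked :: Width -> Bool
isPacked w = laneCount w > 1
```

Keeping the lane size and lane count derivable from the width means a backend can check that every packed width fills exactly one 128-bit register (for example, `widthInBits W32_4` is 4 × 32 = 128).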
* As has been mentioned on the GHC developers' Trac ([http://hackage.haskell.org/trac/ghc/ticket/3557 ticket #3557]), a 'packed-size'-agnostic optimising layer of vector operations would be great. It seems that this could be implemented on top of the CPU-specific primops, without adding new primops.
* What mechanism should be used for constructing and accessing the elements of a packed data type? (LLVM has a vector type, written `<n x type>`, whose elements are read and written with the `extractelement` and `insertelement` instructions.)
* Stream fusion would allow complex combinations of 'map'ped and 'zip'ped operations over vectors of Floats, etc., to be fused and optimised to make use of the CPU's vector instructions.
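The size-agnostic layer and the element-access question above can be sketched together. This is a minimal model, assuming a 4-lane packed type: the names `Float4`, `pack4`, `unpack4`, `lane`, `packedZipWith`, `zipWithV` and `mapV` are all hypothetical, `packedZipWith` merely stands in for a CPU-specific packed primop, and ordinary lists are used for simplicity where a real library would presumably work on unboxed arrays. Stream fusion itself is not modelled.

```haskell
-- Hypothetical 4-lane packed value (stand-in for a 128-bit xmm register).
data Float4 = Float4 !Float !Float !Float !Float
  deriving (Eq, Show)

-- Construction and element access; for LLVM's <4 x float> this role is
-- played by the insertelement/extractelement instructions.
pack4 :: [Float] -> Maybe Float4
pack4 [a, b, c, d] = Just (Float4 a b c d)
pack4 _            = Nothing

unpack4 :: Float4 -> [Float]
unpack4 (Float4 a b c d) = [a, b, c, d]

lane :: Int -> Float4 -> Float
lane i v = unpack4 v !! i   -- partial: only valid for lanes 0..3

-- Stand-in for a CPU-specific packed primop (e.g. a 4-wide add or multiply).
packedZipWith :: (Float -> Float -> Float) -> Float4 -> Float4 -> Float4
packedZipWith f (Float4 a0 a1 a2 a3) (Float4 b0 b1 b2 b3) =
  Float4 (f a0 b0) (f a1 b1) (f a2 b2) (f a3 b3)

-- Size-agnostic zipWith: full chunks of 4 go through the packed operation;
-- a leftover tail of fewer than 4 elements falls back to scalar code.
zipWithV :: (Float -> Float -> Float) -> [Float] -> [Float] -> [Float]
zipWithV f xs ys =
  case (pack4 (take 4 xs), pack4 (take 4 ys)) of
    (Just a, Just b) ->
      unpack4 (packedZipWith f a b) ++ zipWithV f (drop 4 xs) (drop 4 ys)
    _ -> zipWith f xs ys

-- map expressed through the same chunked path.
mapV :: (Float -> Float) -> [Float] -> [Float]
mapV f xs = zipWithV (\x _ -> f x) xs xs
```

Callers of `zipWithV` and `mapV` never mention the packing width, so, under these assumptions, moving to a wider register would only change the chunking and the packed primop underneath, not user code.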