| == Overview ==
| |
|
| |
|
| This page is a starting point for discussing how to extend GHC to take advantage of CPU SIMD instructions, such as SSE and AltiVec.
| |
|
| |
| SSE provides 'packed' data types of floats and integers that fit into 128-bit xmm registers.
| |
|
| |
| The operations on these data types include the standard mathematical operations (Add/Mul/...). There are also additional mathematical operations (reciprocal, reciprocal-square-root) and packed-specific operations such as dot-product, horizontal add/sub/add-sub.
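To make the semantics of these operations concrete, here is a minimal pure-Haskell model of a register holding four floats. It is only a sketch: `FloatX4`, `zipLanes`, `addX4`, `mulX4`, `haddX4` and `dotX4` are hypothetical names used for illustration and do not exist in GHC. The horizontal add follows the lane ordering of the SSE3 `HADDPS` instruction.

```haskell
-- Hypothetical pure-Haskell model of a 128-bit register holding four Floats.
-- None of these names exist in GHC; they only illustrate the semantics.
data FloatX4 = FloatX4 !Float !Float !Float !Float
  deriving (Eq, Show)

-- Apply a binary operation lane by lane (the shape of a packed add or multiply).
zipLanes :: (Float -> Float -> Float) -> FloatX4 -> FloatX4 -> FloatX4
zipLanes f (FloatX4 a0 a1 a2 a3) (FloatX4 b0 b1 b2 b3) =
  FloatX4 (f a0 b0) (f a1 b1) (f a2 b2) (f a3 b3)

addX4, mulX4 :: FloatX4 -> FloatX4 -> FloatX4
addX4 = zipLanes (+)
mulX4 = zipLanes (*)

-- Horizontal add in the style of HADDPS: adjacent pairs are summed,
-- first from the left operand, then from the right operand.
haddX4 :: FloatX4 -> FloatX4 -> FloatX4
haddX4 (FloatX4 a0 a1 a2 a3) (FloatX4 b0 b1 b2 b3) =
  FloatX4 (a0 + a1) (a2 + a3) (b0 + b1) (b2 + b3)

-- Dot product: lane-wise multiply followed by a sum across the lanes.
dotX4 :: FloatX4 -> FloatX4 -> Float
dotX4 a b =
  let FloatX4 p0 p1 p2 p3 = mulX4 a b
  in (p0 + p1) + (p2 + p3)
```

A real implementation would map each of these functions to a single instruction (or a short sequence) rather than to four scalar operations; the model only pins down which lanes are combined.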
| |
|
| |
| Also, to support data-streaming operations, there are memory operations that bypass the cache and move data directly between memory and the xmm registers.
| |
|
| |
| xmm registers are 128 bits wide and hold both packed integer and packed floating-point types. I suggest that a new `PackedReg` data constructor be added.
| |
|
| |
| A possible implementation plan:
| |
|
| |
| * Add new packed data types and 'standard' operations on those types to Cmm and primops.txt.pp
| |
|
| |
| ** Int32Packed4#, ...
| |
|
| |
| ** Width = ... | W32_4 | ...
| |
|
| |
| * Implement the new types and operations in the backends (C/LLVM/ASM)
| |
|
| |
| So far this is straightforward.
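As a rough illustration of the `Width` extension in the plan above, the sketch below models a cut-down version of the type with the proposed `W32_4` constructor. It is an assumption-laden stand-in, not GHC's actual Cmm definition: the scalar constructors are abbreviated, and `widthLanes` and `laneWidth` are hypothetical helpers introduced here.

```haskell
-- Simplified stand-in for Cmm's Width type. The scalar constructors are
-- abbreviated, and W32_4 is the packed width proposed in the plan.
data Width
  = W8 | W16 | W32 | W64   -- scalar widths
  | W32_4                  -- proposed: four 32-bit lanes in one xmm register
  deriving (Eq, Show)

-- Number of lanes a value of this width carries (1 for scalars).
-- Hypothetical helper, not part of GHC.
widthLanes :: Width -> Int
widthLanes W32_4 = 4
widthLanes _     = 1

-- Width of a single lane; a packed width maps back to its element width.
-- Hypothetical helper, not part of GHC.
laneWidth :: Width -> Width
laneWidth W32_4 = W32
laneWidth w     = w

-- Total size in bits, which a backend would need to pick a register class.
widthInBits :: Width -> Int
widthInBits W8    = 8
widthInBits W16   = 16
widthInBits W32   = 32
widthInBits W64   = 64
widthInBits W32_4 = widthLanes W32_4 * widthInBits (laneWidth W32_4)
```

Keeping the lane count and lane width recoverable from the constructor means a backend can fall back to per-lane scalar code when it has no packed instruction for an operation.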
| |
|
| |
| * As has been mentioned on the developers' [http://hackage.haskell.org/trac/ghc/ticket/3557 wiki], an optimising layer of vector operations that is agnostic to the packed size would be great. It seems that this could be built on top of the CPU-specific primops without adding further primops.
| |
|
| |
| * What mechanism should be used for constructing/accessing elements of a packed data type? (LLVM has a `<n x type>` vector type with `insertelement` and `extractelement` instructions.)
| |
|
| |
| * Stream fusion would allow complex operations over 'map'ed and 'zip'ed vectors of Floats, etc., to be optimised into code that uses the CPU's vector instructions.
| |
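One possible shape for the open questions above, sketched in ordinary Haskell: a pack with explicit construct/extract functions, and a size-agnostic `zipWith` that feeds four elements at a time to a packed operation and handles any remainder with the scalar one. All names (`X4`, `pack4`, `unpack4`, `extractLane`, `zipWithPacked`, `addPacked`) are hypothetical; a real version would work over unboxed primitive types and arrays rather than lists.

```haskell
-- Hypothetical 4-lane pack. In a real implementation this would be an
-- unboxed primitive type such as the proposed Int32Packed4#.
data X4 a = X4 a a a a
  deriving (Eq, Show)

-- Construct a pack from four scalars.
pack4 :: a -> a -> a -> a -> X4 a
pack4 = X4

-- Take a pack apart again, lane 0 first.
unpack4 :: X4 a -> [a]
unpack4 (X4 a b c d) = [a, b, c, d]

-- Read one lane by index; Nothing for an out-of-range index.
extractLane :: Int -> X4 a -> Maybe a
extractLane 0 (X4 a _ _ _) = Just a
extractLane 1 (X4 _ b _ _) = Just b
extractLane 2 (X4 _ _ c _) = Just c
extractLane 3 (X4 _ _ _ d) = Just d
extractLane _ _            = Nothing

-- Size-agnostic zipWith: consume four elements at a time through the packed
-- operation, then finish any remainder with the scalar operation.
zipWithPacked :: (X4 a -> X4 a -> X4 a) -> (a -> a -> a) -> [a] -> [a] -> [a]
zipWithPacked vop sop = go
  where
    go (a0:a1:a2:a3:as) (b0:b1:b2:b3:bs) =
      unpack4 (vop (X4 a0 a1 a2 a3) (X4 b0 b1 b2 b3)) ++ go as bs
    go as bs = zipWith sop as bs

-- Lane-wise addition standing in for a packed add primop.
addPacked :: Num a => X4 a -> X4 a -> X4 a
addPacked (X4 a0 a1 a2 a3) (X4 b0 b1 b2 b3) =
  X4 (a0 + b0) (a1 + b1) (a2 + b2) (a3 + b3)
```

For example, `zipWithPacked addPacked (+) xs ys` gives the same result as `zipWith (+) xs ys`, but routes every full group of four elements through the packed operation. The caller never mentions the pack size, which is the property the agnostic layer is after.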