GHC/SIMD: Difference between revisions

Line 1:

~~== Overview ==~~

~~This page initially provides a place to discuss extending GHC to take advantage of CPU SIMD instructions, including SSE and AltiVec instructions.~~

~~SSE provides 'packed' data types of floats and integers that fit into 128-bit xmm registers.~~

The operations on these data types include the standard mathematical operations (Add/Mul/...). There are also additional mathematical operations (reciprocal, reciprocal-square-root) and packed-specific operations such as dot-product, horizontal add/sub/add-sub.
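To make the distinction between lane-wise and horizontal operations concrete, here is a minimal pure-Haskell model. It is only a sketch: `FloatX4` and every function name below are hypothetical and are not GHC primitives, the horizontal add is assumed to follow the lane order of the SSE3 `haddps` instruction, and the reciprocal is exact here whereas the hardware instruction returns an approximation.

```haskell
-- A pure-Haskell model of a 4-lane packed float.
-- Hypothetical: this is NOT a GHC primitive type.
data FloatX4 = FloatX4 !Float !Float !Float !Float
  deriving (Eq, Show)

-- Apply a binary operation lane by lane ("vertical" operation).
zipLanes :: (Float -> Float -> Float) -> FloatX4 -> FloatX4 -> FloatX4
zipLanes f (FloatX4 a0 a1 a2 a3) (FloatX4 b0 b1 b2 b3) =
  FloatX4 (f a0 b0) (f a1 b1) (f a2 b2) (f a3 b3)

-- Standard mathematical operations, one result per lane.
addX4, mulX4 :: FloatX4 -> FloatX4 -> FloatX4
addX4 = zipLanes (+)
mulX4 = zipLanes (*)

-- Exact lane-wise reciprocal; the hardware instruction only approximates this.
recipX4 :: FloatX4 -> FloatX4
recipX4 (FloatX4 a0 a1 a2 a3) =
  FloatX4 (recip a0) (recip a1) (recip a2) (recip a3)

-- Horizontal add: sums adjacent pairs within each operand
-- (assumed lane order: pairs of the first operand, then of the second).
haddX4 :: FloatX4 -> FloatX4 -> FloatX4
haddX4 (FloatX4 a0 a1 a2 a3) (FloatX4 b0 b1 b2 b3) =
  FloatX4 (a0 + a1) (a2 + a3) (b0 + b1) (b2 + b3)

-- Dot product: multiply lane-wise, then sum across the lanes.
dotX4 :: FloatX4 -> FloatX4 -> Float
dotX4 a b =
  let FloatX4 p0 p1 p2 p3 = mulX4 a b
  in (p0 + p1) + (p2 + p3)
```

Loaded into GHCi, `addX4 (FloatX4 1 2 3 4) (FloatX4 10 20 30 40)` combines matching lanes, while `haddX4` on the same arguments combines neighbouring lanes within each argument.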

~~Also, to support data-streaming operations, there are memory operations that bypass the cache and transfer data directly between memory and the xmm registers.~~

~~xmm registers are 128 bits and hold both packed integer and packed float types. I suggest that a new `PackedReg` data constructor be added.~~

~~In terms of an implementation plan:~~

* Add new packed data types and 'standard' operations on those types to Cmm and primops.txt.pp

** Int32Packed4#, ...

** Width = ... | W32_4 | ...

* Implement the new types and operations in the backends (C/LLVM/ASM)
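The `Width` extension proposed above could look roughly like the following. This is a standalone sketch rather than GHC's actual Cmm definition: the scalar constructors are a reduced subset chosen for illustration, only `W32_4` comes from the proposal, and the helper names `widthLanes`, `laneBits` and `widthInBits` are defined here as assumptions.

```haskell
-- Simplified stand-in for Cmm's Width type (hypothetical, for illustration).
data Width
  = W8 | W16 | W32 | W64   -- scalar widths (reduced subset)
  | W32_4                  -- proposed: four packed 32-bit lanes
  deriving (Eq, Show)

-- Number of lanes held by a value of this width.
widthLanes :: Width -> Int
widthLanes W32_4 = 4
widthLanes _     = 1

-- Width of a single lane in bits.
laneBits :: Width -> Int
laneBits W8    = 8
laneBits W16   = 16
laneBits W32   = 32
laneBits W64   = 64
laneBits W32_4 = 32

-- Total width in bits: lanes times bits per lane.
widthInBits :: Width -> Int
widthInBits w = widthLanes w * laneBits w
```

Under these assumptions `widthInBits W32_4` comes to 128, matching the size of an xmm register, and a backend could use `widthLanes` to decide whether a value needs a packed register at all.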

~~So far this is straightforward.~~

* As mentioned on the developers' [http://hackage.haskell.org/trac/ghc/ticket/3557 wiki], a 'packed-size'-agnostic optimising layer of vector operations would be great. It seems that this could be implemented on top of the CPU-specific primops, without adding new primops.

* What mechanism should be used for constructing/accessing elements of a packed data type? (LLVM has a `<n x type>` vector type, with `extractelement` and `insertelement` instructions for element access.)

* Stream fusion would allow complex operations over mapped and zipped vectors of Floats, etc., to be optimised to make use of CPU vector instructions.
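Two of the open points above, element construction/access and a 'packed-size'-agnostic layer built on the CPU-specific operations, can be sketched together in pure Haskell. Everything here is an assumption made for illustration: `FloatX4`, `pack4`, `unpack4`, `addX4` and `addVec` are hypothetical names, `addX4` merely stands in for a real packed primop, and lists are used in place of unboxed arrays to keep the example short.

```haskell
-- Pure model of a 4-lane packed float (hypothetical; not a GHC type).
data FloatX4 = FloatX4 !Float !Float !Float !Float
  deriving (Eq, Show)

-- Construct a packed value from four scalars.
pack4 :: (Float, Float, Float, Float) -> FloatX4
pack4 (a, b, c, d) = FloatX4 a b c d

-- Access the elements of a packed value, in lane order.
unpack4 :: FloatX4 -> [Float]
unpack4 (FloatX4 a b c d) = [a, b, c, d]

-- Stand-in for a CPU-specific packed primop.
addX4 :: FloatX4 -> FloatX4 -> FloatX4
addX4 (FloatX4 a0 a1 a2 a3) (FloatX4 b0 b1 b2 b3) =
  FloatX4 (a0 + b0) (a1 + b1) (a2 + b2) (a3 + b3)

-- Size-agnostic element-wise addition: full groups of four go through the
-- packed operation, and any remaining elements fall back to scalar addition.
addVec :: [Float] -> [Float] -> [Float]
addVec (a0:a1:a2:a3:as) (b0:b1:b2:b3:bs) =
  unpack4 (addX4 (pack4 (a0, a1, a2, a3)) (pack4 (b0, b1, b2, b3)))
    ++ addVec as bs
addVec as bs = zipWith (+) as bs
```

Callers of `addVec` never see the lane count, so the same interface could sit on top of a wider or narrower packed type; the result is intended to agree with `zipWith (+)` for inputs of any length.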

Revision as of 23:23, 3 November 2010