Difference between revisions of "Numeric Haskell: A Repa Tutorial"
DonStewart (talk  contribs) (→Importing the library) 
m (Moved code in <haskell> brackets to the left margin to avoid confusing the parser) 

Line 31:  Line 31:  
Download the `repa` package: 
Download the `repa` package: 

−  +  $ cabal install repa 

and import it qualified: 
and import it qualified: 

−  +  import qualified Data.Array.Repa as R 

The library needs to be imported qualified as it shares the same function names as list operations in the Prelude. 
The library needs to be imported qualified as it shares the same function names as list operations in the Prelude. 

Line 64:  Line 64:  
<haskell> 
<haskell> 

−  +  type DIM0 = Z 

−  +  type DIM1 = DIM0 :. Int 

−  +  type DIM2 = DIM1 :. Int 

−  +  type DIM3 = DIM2 :. Int 

−  +  type DIM4 = DIM3 :. Int 

−  +  type DIM5 = DIM4 :. Int 

</haskell> 
</haskell> 

Line 75:  Line 75:  
<haskell> 
<haskell> 

−  +  Array DIM2 Double 

</haskell> 
</haskell> 

Line 81:  Line 81:  
<haskell> 
<haskell> 

−  +  Array Z Double 

</haskell> 
</haskell> 

Line 96:  Line 96:  
<haskell> 
<haskell> 

−  +  > Z  the zerodimension 

−  +  Z 

</haskell> 
</haskell> 

Line 103:  Line 103:  
<haskell> 
<haskell> 

−  +  > :t Z :. 10 

−  +  Z :. 10 :: Num head => Z :. head 

</haskell> 
</haskell> 

Line 114:  Line 114:  
<haskell> 
<haskell> 

−  +  > :t Z :. (10 :: Int) 

−  +  Z :. (10 :: Int) :: Z :. Int 

</haskell> 
</haskell> 

Line 128:  Line 128:  
New repa arrays ("arrays" from here on) can be generated in many ways, and we always begin by importing the <code>Data.Array.Repa</code> module: 
New repa arrays ("arrays" from here on) can be generated in many ways, and we always begin by importing the <code>Data.Array.Repa</code> module: 

⚫  
+  <code> 

⚫  
⚫  
−  Loading package ghcprim ... linking ... done. 

⚫  
−  +  Loading package ghcprim ... linking ... done. 

−  +  Loading package integergmp ... linking ... done. 

−  +  Loading package base ... linking ... done. 

−  +  Loading package ffi1.0 ... linking ... done. 

+  Prelude> :m + Data.Array.Repa 

+  </code> 

They may be constructed from lists, for example. Here is a one dimensional array of length 10, here, given the shape `(Z :. 10)`: 
They may be constructed from lists, for example. Here is a one dimensional array of length 10, here, given the shape `(Z :. 10)`: 

<haskell> 
<haskell> 

−  +  > let x = fromList (Z :. (10::Int)) [1..10] 

−  +  > x 

−  +  [1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0] 

</haskell> 
</haskell> 

Line 147:  Line 147:  
<haskell> 
<haskell> 

−  +  > :t x 

−  +  x :: Array (Z :. Int) Double 

</haskell> 
</haskell> 

Line 156:  Line 156:  
<haskell> 
<haskell> 

−  +  x :: Array DIM1 Double 

</haskell> 
</haskell> 

Line 162:  Line 162:  
<haskell> 
<haskell> 

−  +  > let x = fromList (Z :. (5::Int) :. (2::Int)) [1..10] 

−  +  > x 

−  +  [1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0] 

</haskell> 
</haskell> 

Line 170:  Line 170:  
<haskell> 
<haskell> 

−  +  x :: Array ((Z :. Int) :. Int) Double 

</haskell> 
</haskell> 

Line 176:  Line 176:  
<haskell> 
<haskell> 

−  +  x :: Array DIM2 Double 

</haskell> 
</haskell> 

Line 184:  Line 184:  
<haskell> 
<haskell> 

−  +  fromVector :: Shape sh => sh > Vector a > Array sh a 

</haskell> 
</haskell> 

Line 190:  Line 190:  
<haskell> 
<haskell> 

−  +  import Data.Vector.Unboxed 

−  +  > let x = fromVector (Z :. (10::Int)) (enumFromN 0 10) 

−  +  [0.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0] 

</haskell> 
</haskell> 

Line 200:  Line 200:  
<haskell> 
<haskell> 

−  +  > let x = fromVector (Z :. (3::Int) :. (3::Int)) (enumFromN 0 9) 

−  +  > x 

−  +  [0.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0] 

−  +  > :t x 

−  +  x :: Array ((Z :. Int) :. Int) Double 

</haskell> 
</haskell> 

Line 303:  Line 303:  
<haskell> 
<haskell> 

−  +  (!) :: (Shape sh, Elt a) => Array sh a > sh > a 

</haskell> 
</haskell> 

Line 309:  Line 309:  
<haskell> 
<haskell> 

−  +  > let x = fromList (Z :. (10::Int)) [1..10] 

−  +  > x ! (Z :. 2) 

−  +  3.0 

</haskell> 
</haskell> 

Line 317:  Line 317:  
<haskell> 
<haskell> 

−  +  > x ! 2 

−  +  No instance for (Num (Z :. Int)) 

−  +  arising from the literal `2' 

</haskell> 
</haskell> 

Line 328:  Line 328:  
<haskell> 
<haskell> 

−  +  > x ! (Z :. 11) 

−  +  *** Exception: ./Data/Vector/Generic.hs:222 ((!)): index out of bounds (11,10) 

</haskell> 
</haskell> 

Line 335:  Line 335:  
<haskell> 
<haskell> 

−  +  (!?) :: (Shape sh, Elt a) => Array sh a > sh > Maybe a 

</haskell> 
</haskell> 

Line 341:  Line 341:  
<haskell> 
<haskell> 

−  +  > x !? (Z :. 9) 

−  +  Just 10.0 

−  +  > x !? (Z :. 11) 

−  +  Nothing 

</haskell> 
</haskell> 

Revision as of 02:53, 17 May 2011
Repa is a Haskell library for high performance, regular, multidimensional parallel arrays. All numeric data is stored unboxed and functions written with the Repa combinators are automatically parallel (provided you supply "+RTS N" on the command line when running the program).
This document provides a tutorial on array programming in Haskell using the repa package.
Note: a companion tutorial to this is provided as the vector tutorial.
Contents
Quick Tour
Repa (REgular PArallel arrays) is an advanced, multidimensional parallel arrays library for Haskell, with a number of distinct capabilities:
 The arrays are "regular" (i.e. dense and rectangular); and
 Functions may be written that are polymorphic in the shape of the array;
 Many operations on arrays are accomplished by changing only the shape of the array (without copying elements);
 The library will automatically parallelize operations over arrays.
This is a quick start guide for the package. For further information, consult:
 The Haddock Documentation
 Regular, Shapepolymorphic, Parallel Arrays in Haskell.
 Efﬁcient Parallel Stencil Convolution in Haskell
Importing the library
Download the `repa` package:
$ cabal install repa
and import it qualified:
import qualified Data.Array.Repa as R
The library needs to be imported qualified as it shares the same function names as list operations in the Prelude.
Note: Operations that involve writing new index types for Repa arrays will require the 'XTypeOperators' language extension.
For noncore functionality, a number of related packages are available:
 repabytestring
 repaio
 repaalgorithms
 repadevil (image loading)
and example algorithms in:
Index types and shapes
Before we can get started manipulating arrays, we need a grasp of repa's notion of array shape. Much like the classic 'array' library in Haskell, repabased arrays are parameterized via a type which determines the dimension of the array, and the type of its index. However, while classic arrays take tuples to represent multiple dimensions, Repa arrays use a richer type language for describing multidimensional array indices and shapes.
Index types consist of two parts:
 a dimension component; and
 an index type
The most common dimensions are given by the shorthand names:
type DIM0 = Z
type DIM1 = DIM0 :. Int
type DIM2 = DIM1 :. Int
type DIM3 = DIM2 :. Int
type DIM4 = DIM3 :. Int
type DIM5 = DIM4 :. Int
thus,
Array DIM2 Double
is the type of a twodimensional array of doubles, indexed via `Int` keys, while
Array Z Double
is a zerodimension object (i.e. a point) holding a Double.
Many operations over arrays are polymorphic in the shape / dimension
component. Others require operating on the shape itself, rather than
the array. A typeclass, Shape
, lets us operate uniformally
over arrays with different shape.
Shapes
To build values of `shape` type, we can use the Z
and :.
constructors:
> Z  the zerodimension
Z
For arrays of nonzero dimension, we must give a size. Note: a common error is to leave off the type of the size.
> :t Z :. 10
Z :. 10 :: Num head => Z :. head
leading to annoying type errors about unresolved instances, such as:
No instance for (Shape (Z :. head0))
To select the correct instance, you will need to annotate the size literals with their concrete type:
> :t Z :. (10 :: Int)
Z :. (10 :: Int) :: Z :. Int
is the shape of 1D arrays of length 10, indexed via Ints.
Given an array, you can always find its shape by calling extent
.
Additional convenience types for selecting particular parts of a shape are also provided (All, Any, Slice
etc.) which are covered later in the tutorial.
Generating arrays
New repa arrays ("arrays" from here on) can be generated in many ways, and we always begin by importing the Data.Array.Repa
module:
$ ghci
GHCi, version 7.0.3: http://www.haskell.org/ghc/ :? for help
Loading package ghcprim ... linking ... done.
Loading package integergmp ... linking ... done.
Loading package base ... linking ... done.
Loading package ffi1.0 ... linking ... done.
Prelude> :m + Data.Array.Repa
They may be constructed from lists, for example. Here is a one dimensional array of length 10, here, given the shape `(Z :. 10)`:
> let x = fromList (Z :. (10::Int)) [1..10]
> x
[1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0]
The type of `x` is inferred as:
> :t x
x :: Array (Z :. Int) Double
which we can read as "an array of dimension 1, indexed via Int keys, holding elements of type Double"
We could also have written the type as:
x :: Array DIM1 Double
The same data may also be treated as a two dimensional array, by changing the shape parameter:
> let x = fromList (Z :. (5::Int) :. (2::Int)) [1..10]
> x
[1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0]
which has the type:
x :: Array ((Z :. Int) :. Int) Double
or, more simply:
x :: Array DIM2 Double
Building arrays from vectors
It is also possible to build arrays from unboxed vectors, from the 'vector' package:
fromVector :: Shape sh => sh > Vector a > Array sh a
New arrays are built by applying a shape to the vector. For example:
import Data.Vector.Unboxed
> let x = fromVector (Z :. (10::Int)) (enumFromN 0 10)
[0.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0]
is a onedimensional array of doubles. As usual, we can also impose other shapes:
> let x = fromVector (Z :. (3::Int) :. (3::Int)) (enumFromN 0 9)
> x
[0.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0]
> :t x
x :: Array ((Z :. Int) :. Int) Double
to create a 3x3 array.
Reading arrays from files
Using the repaio package, arrays may be written and read from files in a number of formats:
 as BMP files; and
 in a number of text formats.
with other formats rapidly appearing. For the special case of arrays of Word8 values, the repabytestring library supports generating bytestrings in memory.
An example: to write an 2D array to an ascii file:
writeMatrixToTextFile "/tmp/test.dat" x
This will result in a file containing:
MATRIX 2 5 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0
In turn, this file may be read back in via readMatrixFromTextFile
.
To process .bmp files, use Data.Array.Repa.IO.BMP, as follows (currently reading only works for 24 bit .bmp):
Data.Array.Repa.IO.BMP> x < readImageFromBMP "/tmp/test24.bmp"
Reads this .bmp image:
as a 3D array of Word8, which can be further processed.
Note: at the time of writing, there are no binary instances for repa arrays
For image IO in many, many formats, use the repadevil library.
Copying arrays from pointers
You can also generate new repa arrays by copying data from a pointer, using the repabytestring package. Here is an example, using "copyFromPtrWord8":
import Data.Word
import Foreign.Ptr
import qualified Data.Vector.Storable as V
import qualified Data.Array.Repa as R
import Data.Array.Repa
import qualified Data.Array.Repa.ByteString as R
import Data.Array.Repa.IO.DevIL
i, j, k :: Int
(i, j, k) = (255, 255, 4 {RGBA})
 1d vector, filled with pretty colors
v :: V.Vector Word8
v = V.fromList . take (i * j * k) . cycle $ concat
[ [ r, g, b, 255 ]
 r < [0 .. 255]
, g < [0 .. 255]
, b < [0 .. 255]
]
ptr2repa :: Ptr Word8 > IO (R.Array R.DIM3 Word8)
ptr2repa p = R.copyFromPtrWord8 (Z :. i :. j :. k) p
main = do
 copy our 1d vector to a repa 3d array, via a pointer
r < V.unsafeWith v ptr2repa
runIL $ writeImage "test.png" r
return ()
This fills a vector, converts it to a pointer, then copies that pointer to a 3d array, before writing the result out as this image:
Indexing arrays
To access elements in repa arrays, you provide an array and a shape, to access the element:
(!) :: (Shape sh, Elt a) => Array sh a > sh > a
So:
> let x = fromList (Z :. (10::Int)) [1..10]
> x ! (Z :. 2)
3.0
Note that we can't give just a bare literal as the shape, even for onedimensional arrays, :
> x ! 2
No instance for (Num (Z :. Int))
arising from the literal `2'
as the Z type isn't in the Num class, and Haskell's numeric literals are overloaded.
What if the index is out of bounds, though?
> x ! (Z :. 11)
*** Exception: ./Data/Vector/Generic.hs:222 ((!)): index out of bounds (11,10)
an exception is thrown. An altnerative is to indexing functions that return a Maybe:
(!?) :: (Shape sh, Elt a) => Array sh a > sh > Maybe a
An example:
> x !? (Z :. 9)
Just 10.0
> x !? (Z :. 11)
Nothing
Operations on arrays
Besides indexing, there are many regular, listlike operations on arrays.
Maps, zips, filters and folds
We can map over multidimensional arrays:
> let x = fromList (Z :. (3::Int) :. (3::Int)) [1..9] > x [1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0]
since `map` conflicts with the definition in the Prelude, we have to use it qualified:
> Data.Array.Repa.map (^2) x [1.0,4.0,9.0,16.0,25.0,36.0,49.0,64.0,81.0]
Maps leave the dimension unchanged.
Folding reduces the inner dimension of the array.
fold :: (Shape sh, Elt a) => (a > a > a) > a > Array (sh :. Int) a > Array sh a
So if 'x' is a 3D array:
> let x = fromList (Z :. (3::Int) :. (3::Int)) [1..9] > x [1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0]
We can sum each row, to yield a 2D array:
> fold (+) 0 x [6.0,15.0,24.0]
Two arrays may be combined via zipWith
:
zipWith :: (Shape sh, Elt b, Elt c, Elt a) => (a > b > c) > Array sh a > Array sh b > Array sh c
an example:
> zipWith (*) x x [1.0,4.0,9.0,16.0,25.0,36.0,49.0,64.0,81.0]
Numeric operations: negation, addition, subtraction, multiplication
Repa arrays are instances of the Num
. This means that
operations on numerical elements are lifted automagically onto arrays of
such elements. For example, (+)
on two double values corresponds to
elementwise addition, (+)
, of the two arrays of doubles:
> let x = fromList (Z :. (10::Int)) [1..10] > x + x [2.0,4.0,6.0,8.0,10.0,12.0,14.0,16.0,18.0,20.0]
Other operations from the Num class work just as well:
> x [1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0,10.0]
> x ^ 3 [1.0,8.0,27.0,64.0,125.0,216.0,343.0,512.0,729.0,1000.0]
> x * x [1.0,4.0,9.0,16.0,25.0,36.0,49.0,64.0,81.0,100.0]
Changing the shape of an array
One of the main advantages of repastyle arrays over other arrays in Haskell is the ability to reshape data without copying. This is achieved via *indexspace transformations*.
An example: transposing a 2D array (this example taken from the repa paper). First, the type of the transformation:
transpose2D :: Elt e => Array DIM2 e > Array DIM2 e
Note that this transform will work on DIM2 arrays holding any elements. Now, to swap rows and columns, we have to modify the shape:
transpose2D a = backpermute (swap e) swap a where e = extent a swap (Z :. i :. j) = Z :. j :. i
The swap function reorders the index space of the array. To do this, we extract the current shape of the array, and write a function that maps the index space from the old array to the new array. That index space function is then passed to backpermute which actually constructs the new array from the old one.
backpermute generates a new array from an old, when given the new shape, and a function that translates between the index space of each array (i.e. a shape transformer).
backpermute :: (Shape sh, Shape sh', Elt a) => sh' > (sh' > sh) > Array sh a > Array sh' a
Note that the array created is not actually evaluated (we only modified the index space of the array).
Transposition is such a common operation that it is provided by the library:
transpose :: (Shape sh, Elt a) => Array ((sh :. Int) :. Int) a > Array ((sh :. Int) :. Int)
the type indicate that it works on the lowest two dimensions of the array.
Other operations on index spaces include taking slices and joining arrays into larger ones.
Examples
Following are some examples of useful functions that exercise the API.
Example: matrixmatrix multiplication
A more advanced example from the Repa paper: matrixmatrix multiplication: the result of matrix multiplication is a matrix whose elements are found by multiplying the elements of each row from the first matrix by the associated elements of the same column from the second matrix and summing the result.
if and
then
So we take two, 2D arrays and generate a new array, using our transpose function from earlier:
mmMult :: (Num e, Elt e) => Array DIM2 e > Array DIM2 e > Array DIM2 e
mmMult a b = sum (zipWith (*) aRepl bRepl) where t = transpose2D b aRepl = extend (Z :.All :.colsB :.All) a bRepl = extend (Z :.rowsA :.All :.All) t (Z :.colsA :.rowsA) = extent a (Z :.colsB :.rowsB) = extent b
The idea is to expand both 2D argument arrays into 3D arrays by
replicating them across a new axis. The front face of the cuboid that
results represents the array a
, which we replicate as often
as b
has columns (colsB)
, producing
aRepl
.
The top face represents t
(the transposed b), which we
replicate as often as a has rows (rowsA)
, producing
bRepl,
. The two replicated arrays have the same extent,
which corresponds to the index space of matrix multiplication
Optimized implementations of this function are available in the repaalgorithms package.
Example: parallel image desaturation
To convert an image from color to greyscale, we can use the luminosity method to averge RGB pixels into a common grey value, where the average is weighted for human perception of green
The formula for luminosity is 0.21 R + 0.71 G + 0.07 B.
We can write a parallel image desaturation tool using repa and the repadevil image library:
import Data.Array.Repa.IO.DevIL
import Data.Array.Repa hiding ((++))
import Data.Word
import System.Environment

 Read an image, desaturate, write out with new name.

main = do
[f] < getArgs
runIL $ do
a < readImage f
let b = traverse a id luminosity
writeImage ("grey" ++ f) b
And now the luminosity transform itself, which averages the 3 RGB colors based on preceived weight:

 (Parallel) desaturation of an image via the luminosity method.

luminosity :: (DIM3 > Word8) > DIM3 > Word8
luminosity _ (Z :. _ :. _ :. 3) = 255  alpha channel
luminosity f (Z :. i :. j :. _) = ceiling $ 0.21 * r + 0.71 * g + 0.07 * b
where
r = fromIntegral $ f (Z :. i :. j :. 0)
g = fromIntegral $ f (Z :. i :. j :. 1)
b = fromIntegral $ f (Z :. i :. j :. 2)
And that's it! The result is a parallel image desaturator, when compiled with
$ ghc O threaded rtsopts make A.hs fforcerecomp
which we can run, to use two cores:
$ time ./A sunflower.png +RTS N2 H ./A sunflower.png +RTS N2 H 0.19s user 0.03s system 135% cpu 0.165 total
Given an image like this:
The desaturated result from Haskell: