# Difference between revisions of "Numeric Haskell: A Vector Tutorial"

DonStewart (talk | contribs) |
DonStewart (talk | contribs) |
||

Line 287: | Line 287: | ||

./vector data 0.08s user 0.01s system 98% cpu 0.088 total |
./vector data 0.08s user 0.01s system 98% cpu 0.088 total |
||

− | == |
+ | === Random numbers === |

+ | |||

+ | If we can parse from a file, we can also fill a vector with random numbers. |
||

+ | We'll use the [http://hackage.haskell.org/package/mersenne-random mersenne-random] package: |
||

+ | |||

+ | $ cabal install mersenne-random |
||

+ | |||

+ | We can then use MTRandom class to generate random vectors of different types: |
||

+ | |||

+ | <haskell> |
||

+ | import qualified Data.Vector.Unboxed as U |
||

+ | import System.Random.Mersenne |
||

+ | import Control.Monad |
||

+ | |||

+ | main = do |
||

+ | -- create a new source of randomness |
||

+ | -- andan infinite list of randoms |
||

+ | g <- newMTGen Nothing |
||

+ | rs <- randoms g |
||

+ | |||

+ | -- fill a vector with the first 10 random Ints |
||

+ | let a = U.fromList (take 10 rs) :: U.Vector Int |
||

+ | |||

+ | -- print the sum |
||

+ | print (U.sum a) |
||

+ | |||

+ | -- print each element |
||

+ | forM_ (U.toList a) print |
||

+ | </haskell> |
||

+ | |||

+ | Running it: |
||

+ | |||

+ | <haskell> |
||

+ | $ runhaskell B.hs |
||

+ | -56426044567146682 |
||

+ | -2144043065897064806 |
||

+ | -2361203915295681429 |
||

+ | 1023751035988638668 |
||

+ | -5145147152582103336 |
||

+ | 6545758323081548799 |
||

+ | -7630294342751488332 |
||

+ | -5861937811333342436 |
||

+ | -3198510304070719259 |
||

+ | 8949914511577398116 |
||

+ | -8681457396993884283 |
||

+ | </haskell> |
||

+ | |||

+ | == Transformations on Vectors == |
||

+ | |||

+ | A primary operation on arrays are the zip class of functions. |
||

+ | |||

+ | <haskell> |
||

+ | > let a = fromList [20,30,40,50] |
||

+ | |||

+ | > let b = enumFromN 0 4 |
||

+ | |||

+ | > a |
||

+ | fromList [20,30,40,50] |
||

+ | |||

+ | > b |
||

+ | fromList [0,1,2,3] |
||

+ | |||

+ | > Data.Vector.zipWith (-) a b |
||

+ | fromList [20,29,38,47] |
||

+ | </haskell> |
||

+ | |||

+ | We can also, of course, apply a function to each element of a vector (map): |
||

+ | |||

+ | <haskell> |
||

+ | > Data.Vector.map (^2) b |
||

+ | fromList [0,1,4,9] |
||

+ | |||

+ | > Data.Vector.map (\e -> 10 * sin (fromIntegral e)) a |
||

+ | fromList [9.129452507276277,-9.880316240928618,7.451131604793488,-2.6237485370392877] |
||

+ | |||

+ | > Data.Vector.map (< 35) a |
||

+ | fromList [True,True,False,False] |
||

+ | </haskell> |
||

+ | |||

+ | === Folds: Sums, Products, Min, Max === |
||

+ | |||

+ | Many special purpose folds (reductions) on vectors are available: |
||

+ | |||

+ | <haskell> |
||

+ | > let a = enumFromN 1 100 |
||

+ | |||

+ | > Data.Vector.sum a |
||

+ | 5050 |
||

+ | |||

+ | > Data.Vector.product a |
||

+ | 93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000 |
||

+ | |||

+ | > Data.Vector.minimum a |
||

+ | 1 |
||

+ | |||

+ | > Data.Vector.maximum a |
||

+ | 100 |
||

+ | </haskell> |
||

+ | |||

== Indexing, Slicing and Iterating == |
== Indexing, Slicing and Iterating == |
||

+ | |||

+ | One dimensional arrays can be indexed, sliced and iterated over pretty much like lists. |
||

+ | |||

+ | Because Haskell values are by default immutable, all slice operations are zero-copying. |
||

+ | |||

+ | <haskell> |
||

+ | > let a = enumFromN 0 10 |
||

+ | |||

+ | > a |
||

+ | fromList [0,1,2,3,4,5,6,7,8,9] |
||

+ | |||

+ | > let b = Data.Vector.map (^3) a |
||

+ | |||

+ | > b ! 2 |
||

+ | 8 |
||

+ | |||

+ | > slice 2 3 b |
||

+ | fromList [8,27,64] |
||

+ | </haskell> |
||

+ | |||

+ | slice takes 3 arguments: the initial index to slice from, the number of elements to slice, and the vector to operate on. |
||

+ | |||

+ | A number of special-purpose slices are also available: |
||

+ | |||

+ | <haskell> |
||

+ | > Data.Vector.init b |
||

+ | fromList [0,1,8,27,64,125,216,343,512] |
||

+ | |||

+ | > Data.Vector.tail b |
||

+ | fromList [1,8,27,64,125,216,343,512,729] |
||

+ | |||

+ | > Data.Vector.take 3 b |
||

+ | fromList [0,1,8] :: Data.Vector.Vector |
||

+ | |||

+ | > Data.Vector.drop 3 b |
||

+ | fromList [27,64,125,216,343,512,729] :: Data.Vector.Vector |
||

+ | </haskell> |
||

+ | |||

== Bulk operations == |
== Bulk operations == |
||

== Stacking together different arrays == |
== Stacking together different arrays == |

## Revision as of 02:11, 20 February 2010

Vector is a Haskell library for working with arrays, with an emphasis on raw performance, whilst retaining a rich interface. The main data types are boxed and unboxed arrays, and arrays may be immutable (pure), or mutable. Arrays are indexed by non-negative `Int`

values.

The vector library has an API similar to the famous Haskell list library, with many of the same names.

This tutorial is modelled on the NumPy tutorial.

## Contents

- 1 Quick Tour
- 2 The Tutorial
- 2.1 Simple example
- 2.2 Array Types
- 2.3 Array Creation
- 2.4 Transformations on Vectors
- 2.5 Indexing, Slicing and Iterating
- 2.6 Bulk operations
- 2.7 Stacking together different arrays
- 2.8 Splitting one array into several smaller ones
- 2.9 Copies and Views
- 2.10 No Copy at All
- 2.11 Indexing with Arrays of Indices
- 2.12 Indexing with Boolean Arrays
- 2.13 Permutations
- 2.14 Randoms
- 2.15 IO
- 2.16 References

# Quick Tour

Here is a quick overview to get you started.

## Importing the library

Download the vector package:

$ cabal install vector

and import it as, for boxed arrays:

```
import qualified Data.Vector as V
```

or:

```
import qualified Data.Vector.Unboxed as V
```

for unboxed arrays. The library needs to be imported qualified as it shares the same function names as list operations in the Prelude.

## Generating Vectors

New vectors can be generated in many ways:

```
$ ghci
GHCi, version 6.12.1: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Loading package ffi-1.0 ... linking ... done.
Prelude> :m + Data.Vector
-- Generating a vector from a list:
Prelude Data.Vector> let a = fromList [10, 20, 30, 40]
Prelude Data.Vector> a
fromList [10,20,30,40] :: Data.Vector.Vector
-- Or filled from a sequence
Prelude Data.Vector> enumFromStepN 10 10 4
fromList [10,20,30,40] :: Data.Vector.Vector
-- A vector created from four consecutive values
Prelude Data.Vector> enumFromN 10 4
fromList [10,11,12,13] :: Data.Vector.Vector
```

You can also build vectors using operations similar to lists:

```
-- The empty vector
Prelude Data.Vector> empty
fromList [] :: Data.Vector.Vector
-- A vector of length one
Prelude Data.Vector> singleton 2
fromList [2] :: Data.Vector.Vector
-- A vector of length 10, filled with the value '2'
-- Note that to disambiguate names,
-- and avoid a clash with the Prelude,
-- with use the full path to the Vector module
Prelude Data.Vector> Data.Vector.replicate 10 2
fromList [2,2,2,2,2,2,2,2,2,2] :: Data.Vector.Vector
```

In general, you may construct new vectors by applying a function to the index space:

```
Prelude Data.Vector> generate 10 (^2)
fromList [0,1,4,9,16,25,36,49,64,81] :: Data.Vector.Vector
```

Vectors may have more than one dimension:

```
-- Here we create a two dimensional vector, 10 columns,
-- each row filled with the row index.
Prelude Data.Vector> let x = generate 10 (\n -> Data.Vector.replicate 10 n)
-- The type is "Vector of Vector of Ints"
Prelude Data.Vector> :t x
x :: Vector (Vector Int)
```

Vectors may be grown or shrunk arbitrarily:

```
Prelude Data.Vector> let y = Data.Vector.enumFromTo 0 11
Prelude Data.Vector> y
fromList [0,1,2,3,4,5,6,7,8,9,10,11] :: Data.Vector.Vector
-- Take the first 3 elements as a new vector
Prelude Data.Vector> Data.Vector.take 3 y
fromList [0,1,2] :: Data.Vector.Vector
-- Duplicate and join the vector
Prelude Data.Vector> y Data.Vector.++ y
fromList [0,1,2,3,4,5,6,7,8,9,10,11,0,1,2,3,4,5,6,7,8,9,10,11] :: Data.Vector.Vector
```

## Modifying vectors

Just as with lists, you can iterate (map) over arrays, reduce them (fold), filter them, or join them in various ways:

```
-- mapping a function over the elements of a vector
Prelude Data.Vector> Data.Vector.map (^2) y
fromList [0,1,4,9,16,25,36,49,64,81,100,121] :: Data.Vector.Vector
-- Extract only the odd elements from a vector
Prelude Data.Vector> Data.Vector.filter odd y
fromList [1,3,5,7,9,11] :: Data.Vector.Vector
-- Reduce a vector
Prelude Data.Vector> Data.Vector.foldl (+) 0 y
66
-- Take a scan (partial results from a reduction):
Prelude Data.Vector> Data.Vector.scanl (+) 0 y
fromList [0,0,1,3,6,10,15,21,28,36,45,55,66] :: Data.Vector.Vector
-- Zip two arrays pairwise, into an array of pairs
Prelude Data.Vector> Data.Vector.zip y y
fromList [(0,0),(1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8),(9,9),(10,10),(11,11)] :: Data.Vector.Vector
```

## Indexing vectors

And like all good arrays, you can index them in various ways:

```
-- Take the first element
Prelude Data.Vector> Data.Vector.head y
0
-- Take the last element
Prelude Data.Vector> Data.Vector.tail y
fromList [1,2,3,4,5,6,7,8,9,10,11] :: Data.Vector.Vector
-- Take an arbitrary element
Prelude Data.Vector> y ! 4
4
```

# The Tutorial

The vector package provides a several types of array. The most general interface is via Data.Vector, which provides for boxed arrays, holding any type.

There are also more specialized array types:

which provide unboxed arrays (i.e. no closures) and storable arrays (data that is pinned, and may be passed to and from C via a Ptr).

In all cases, the operations are subject to loop fusion. That is, if you compose two functions,

map f . map g

the compiler will rewrite it into a single traversal:

map (f . g)

saving time and space.

## Simple example

You can create the arrays in many ways, for example, from a regular Haskell list:

```
let a = fromList [2,3,4]
Prelude Data.Vector> a
fromList [2,3,4] :: Data.Vector.Vector
Prelude Data.Vector> :t a
a :: Vector Integer
```

GHCi will print the contents of the vector as executable code.

To create a multidimensional array, you can use a nested list generator to fill it:

```
Prelude Data.Vector> let x = fromList [ fromList [1 .. x] | x <- [1..10] ]
Prelude Data.Vector> :t x
x :: Vector (Vector Integer)
```

-- XXX TODO need a better printing function for multidimensional arrays.

You can also just create arrays filled with zeroes:

```
Prelude Data.Vector> Data.Vector.replicate 10 0
fromList [0,0,0,0,0,0,0,0,0,0] :: Data.Vector.Vector
```

And you can fill arrays from a sequence generator:

```
Prelude Data.Vector> enumFromN 1 10
fromList [1,2,3,4,5,6,7,8,9,10] :: Data.Vector.Vector
Prelude Data.Vector> enumFromStepN 0 10 10
fromList [0,10,20,30,40,50,60,70,80,90] :: Data.Vector.Vector
```

## Array Types

### Boxed Arrays

### Unboxed Arrays

### Pure Arrays

### Impure Arrays

### Some examples

The most important attributes of an array are available in O(1) time, such as the size (length),

```
-- how big is the array?
Prelude Data.Vector> let a = fromList [1,2,3,4,5,6,7,8,9,10]
Prelude Data.Vector> Data.Vector.length a
10
-- is the array empty?
Prelude Data.Vector> Data.Vector.null a
False
```

## Array Creation

### An example: filling a vector from a file

We often want to populate a vector using a external data file. The easiest way to do this is with bytestring IO, and Data.Vector.unfoldr (or the equivalent functions in Data.Vector.Unboxed or Data.Vector.Storable:

```
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Lazy.Char8 as L
import qualified Data.Vector as U
import System.Environment
main = do
[f] <- getArgs
s <- L.readFile f
print . U.sum . parse $ s
-- Fill a new vector from a file containing a list of numbers.
parse = U.unfoldr step
where
step !s = case L.readInt s of
Nothing -> Nothing
Just (!k, !t) -> Just (k, L.tail t)
```

Note the use of bang patterns to ensure the parsing accumulated state is produced strictly.

Create a data file filled with 1 million integers:

$ seq 1 1000000 > data

Compile with -Odph (enables special optimizations to help fusion):

$ ghc -Odph --make vector.hs

Run:

$ time ./vector data 500000500000 ./vector data 0.08s user 0.01s system 98% cpu 0.088 total

### Random numbers

If we can parse from a file, we can also fill a vector with random numbers. We'll use the mersenne-random package:

$ cabal install mersenne-random

We can then use MTRandom class to generate random vectors of different types:

```
import qualified Data.Vector.Unboxed as U
import System.Random.Mersenne
import Control.Monad
main = do
-- create a new source of randomness
-- andan infinite list of randoms
g <- newMTGen Nothing
rs <- randoms g
-- fill a vector with the first 10 random Ints
let a = U.fromList (take 10 rs) :: U.Vector Int
-- print the sum
print (U.sum a)
-- print each element
forM_ (U.toList a) print
```

Running it:

```
$ runhaskell B.hs
-56426044567146682
-2144043065897064806
-2361203915295681429
1023751035988638668
-5145147152582103336
6545758323081548799
-7630294342751488332
-5861937811333342436
-3198510304070719259
8949914511577398116
-8681457396993884283
```

## Transformations on Vectors

A primary operation on arrays are the zip class of functions.

```
> let a = fromList [20,30,40,50]
> let b = enumFromN 0 4
> a
fromList [20,30,40,50]
> b
fromList [0,1,2,3]
> Data.Vector.zipWith (-) a b
fromList [20,29,38,47]
```

We can also, of course, apply a function to each element of a vector (map):

```
> Data.Vector.map (^2) b
fromList [0,1,4,9]
> Data.Vector.map (\e -> 10 * sin (fromIntegral e)) a
fromList [9.129452507276277,-9.880316240928618,7.451131604793488,-2.6237485370392877]
> Data.Vector.map (< 35) a
fromList [True,True,False,False]
```

### Folds: Sums, Products, Min, Max

Many special purpose folds (reductions) on vectors are available:

```
> let a = enumFromN 1 100
> Data.Vector.sum a
5050
> Data.Vector.product a
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
> Data.Vector.minimum a
1
> Data.Vector.maximum a
100
```

## Indexing, Slicing and Iterating

One dimensional arrays can be indexed, sliced and iterated over pretty much like lists.

Because Haskell values are by default immutable, all slice operations are zero-copying.

```
> let a = enumFromN 0 10
> a
fromList [0,1,2,3,4,5,6,7,8,9]
> let b = Data.Vector.map (^3) a
> b ! 2
8
> slice 2 3 b
fromList [8,27,64]
```

slice takes 3 arguments: the initial index to slice from, the number of elements to slice, and the vector to operate on.

A number of special-purpose slices are also available:

```
> Data.Vector.init b
fromList [0,1,8,27,64,125,216,343,512]
> Data.Vector.tail b
fromList [1,8,27,64,125,216,343,512,729]
> Data.Vector.take 3 b
fromList [0,1,8] :: Data.Vector.Vector
> Data.Vector.drop 3 b
fromList [27,64,125,216,343,512,729] :: Data.Vector.Vector
```