# Numeric Haskell: A Vector Tutorial

### From HaskellWiki

DonStewart (Talk | contribs) |
DonStewart (Talk | contribs) |
||

Line 166: | Line 166: | ||

There are also more specialized array types: | There are also more specialized array types: | ||

− | + | * [http://hackage.haskell.org/packages/archive/vector/0.5/doc/html/Data-Vector-Unboxed.html Unboxed] | |

− | + | * [http://hackage.haskell.org/packages/archive/vector/0.5/doc/html/Data-Vector-Storable.html Storable] | |

which provide unboxed arrays (i.e. no closures) and storable arrays (data that is pinned, and may be passed to and from C via a Ptr). | which provide unboxed arrays (i.e. no closures) and storable arrays (data that is pinned, and may be passed to and from C via a Ptr). | ||

+ | |||

+ | In all cases, the operations are subject to loop fusion. That is, if you compose two functions, | ||

+ | |||

+ | map f . map g | ||

+ | |||

+ | the compiler will rewrite it into a single traversal: | ||

+ | |||

+ | map (f . g) | ||

+ | |||

+ | saving time and space. | ||

== Simple example == | == Simple example == | ||

Line 214: | Line 224: | ||

fromList [0,10,20,30,40,50,60,70,80,90] :: Data.Vector.Vector | fromList [0,10,20,30,40,50,60,70,80,90] :: Data.Vector.Vector | ||

</haskell> | </haskell> | ||

− | |||

== Array Types == | == Array Types == | ||

Line 222: | Line 231: | ||

=== Impure Arrays === | === Impure Arrays === | ||

=== Some examples === | === Some examples === | ||

+ | |||

+ | The most important attributes of an array are available in O(1) time, such as the size (length), | ||

+ | |||

+ | <haskell> | ||

+ | -- how big is the array? | ||

+ | Prelude Data.Vector> let a = fromList [1,2,3,4,5,6,7,8,9,10] | ||

+ | Prelude Data.Vector> Data.Vector.length a | ||

+ | 10 | ||

+ | |||

+ | -- is the array empty? | ||

+ | Prelude Data.Vector> Data.Vector.null a | ||

+ | False | ||

+ | </haskell> | ||

+ | |||

== Array Creation == | == Array Creation == | ||

+ | |||

+ | === An example: filling a vector from a file === | ||

+ | |||

+ | We often want to populate a vector using a external data file. The easiest way to do this is with bytestring IO, and Data.Vector.unfoldr (or the equivalent functions in Data.Vector.Unboxed or Data.Vector.Storable: | ||

+ | |||

+ | <haskell> | ||

+ | {-# LANGUAGE BangPatterns #-} | ||

+ | |||

+ | import qualified Data.ByteString.Lazy.Char8 as L | ||

+ | import qualified Data.Vector as U | ||

+ | import System.Environment | ||

+ | |||

+ | main = do | ||

+ | [f] <- getArgs | ||

+ | s <- L.readFile f | ||

+ | print . U.sum . parse $ s | ||

+ | |||

+ | -- Fill a new vector from a file containing a list of numbers. | ||

+ | parse = U.unfoldr step | ||

+ | where | ||

+ | step !s = case L.readInt s of | ||

+ | Nothing -> Nothing | ||

+ | Just (!k, !t) -> Just (k, L.tail t) | ||

+ | </haskell> | ||

+ | |||

+ | Note the use of bang patterns to ensure the parsing accumulated state is produced strictly. | ||

+ | |||

+ | Create a data file filled with 1 million integers: | ||

+ | |||

+ | $ seq 1 1000000 > data | ||

+ | |||

+ | Compile with -Odph (enables special optimizations to help fusion): | ||

+ | |||

+ | $ ghc -Odph --make vector.hs | ||

+ | |||

+ | Run: | ||

+ | |||

+ | $ time ./vector data | ||

+ | 500000500000 | ||

+ | ./vector data 0.08s user 0.01s system 98% cpu 0.088 total | ||

+ | |||

== Basic Operations == | == Basic Operations == | ||

== Indexing, Slicing and Iterating == | == Indexing, Slicing and Iterating == |

## Revision as of 01:38, 20 February 2010

The vector library has an API similar to the famous Haskell list library, with many of the same names.

This tutorial is modelled on the NumPy tutorial.

# 1 Quick Tour

Here is a quick overview to get you started.

## 1.1 Importing the library

Download the vector package:

$ cabal install vector

and import it as, for boxed arrays:

import qualified Data.Vector as V

or:

import qualified Data.Vector.Unboxed as V

for unboxed arrays. The library needs to be imported qualified as it shares the same function names as list operations in the Prelude.

## 1.2 Generating Vectors

New vectors can be generated in many ways:

$ ghci GHCi, version 6.12.1: http://www.haskell.org/ghc/ :? for help Loading package ghc-prim ... linking ... done. Loading package integer-gmp ... linking ... done. Loading package base ... linking ... done. Loading package ffi-1.0 ... linking ... done. Prelude> :m + Data.Vector -- Generating a vector from a list: Prelude Data.Vector> let a = fromList [10, 20, 30, 40] Prelude Data.Vector> a fromList [10,20,30,40] :: Data.Vector.Vector -- Or filled from a sequence Prelude Data.Vector> enumFromStepN 10 10 4 fromList [10,20,30,40] :: Data.Vector.Vector -- A vector created from four consecutive values Prelude Data.Vector> enumFromN 10 4 fromList [10,11,12,13] :: Data.Vector.Vector

You can also build vectors using operations similar to lists:

-- The empty vector Prelude Data.Vector> empty fromList [] :: Data.Vector.Vector -- A vector of length one Prelude Data.Vector> singleton 2 fromList [2] :: Data.Vector.Vector -- A vector of length 10, filled with the value '2' -- Note that to disambiguate names, -- and avoid a clash with the Prelude, -- with use the full path to the Vector module Prelude Data.Vector> Data.Vector.replicate 10 2 fromList [2,2,2,2,2,2,2,2,2,2] :: Data.Vector.Vector

In general, you may construct new vectors by applying a function to the index space:

Prelude Data.Vector> generate 10 (^2) fromList [0,1,4,9,16,25,36,49,64,81] :: Data.Vector.Vector

Vectors may have more than one dimension:

-- Here we create a two dimensional vector, 10 columns, -- each row filled with the row index. Prelude Data.Vector> let x = generate 10 (\n -> Data.Vector.replicate 10 n) -- The type is "Vector of Vector of Ints" Prelude Data.Vector> :t x x :: Vector (Vector Int)

Vectors may be grown or shrunk arbitrarily:

Prelude Data.Vector> let y = Data.Vector.enumFromTo 0 11 Prelude Data.Vector> y fromList [0,1,2,3,4,5,6,7,8,9,10,11] :: Data.Vector.Vector -- Take the first 3 elements as a new vector Prelude Data.Vector> Data.Vector.take 3 y fromList [0,1,2] :: Data.Vector.Vector -- Duplicate and join the vector Prelude Data.Vector> y Data.Vector.++ y fromList [0,1,2,3,4,5,6,7,8,9,10,11,0,1,2,3,4,5,6,7,8,9,10,11] :: Data.Vector.Vector

## 1.3 Modifying vectors

Just as with lists, you can iterate (map) over arrays, reduce them (fold), filter them, or join them in various ways:

-- mapping a function over the elements of a vector Prelude Data.Vector> Data.Vector.map (^2) y fromList [0,1,4,9,16,25,36,49,64,81,100,121] :: Data.Vector.Vector -- Extract only the odd elements from a vector Prelude Data.Vector> Data.Vector.filter odd y fromList [1,3,5,7,9,11] :: Data.Vector.Vector -- Reduce a vector Prelude Data.Vector> Data.Vector.foldl (+) 0 y 66 -- Take a scan (partial results from a reduction): Prelude Data.Vector> Data.Vector.scanl (+) 0 y fromList [0,0,1,3,6,10,15,21,28,36,45,55,66] :: Data.Vector.Vector -- Zip two arrays pairwise, into an array of pairs Prelude Data.Vector> Data.Vector.zip y y fromList [(0,0),(1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8),(9,9),(10,10),(11,11)] :: Data.Vector.Vector

## 1.4 Indexing vectors

And like all good arrays, you can index them in various ways:

-- Take the first element Prelude Data.Vector> Data.Vector.head y 0 -- Take the last element Prelude Data.Vector> Data.Vector.tail y fromList [1,2,3,4,5,6,7,8,9,10,11] :: Data.Vector.Vector -- Take an arbitrary element Prelude Data.Vector> y ! 4 4

# 2 The Tutorial

The vector package provides a several types of array. The most general interface is via Data.Vector, which provides for boxed arrays, holding any type.

There are also more specialized array types:

which provide unboxed arrays (i.e. no closures) and storable arrays (data that is pinned, and may be passed to and from C via a Ptr).

In all cases, the operations are subject to loop fusion. That is, if you compose two functions,

map f . map g

the compiler will rewrite it into a single traversal:

map (f . g)

saving time and space.

## 2.1 Simple example

You can create the arrays in many ways, for example, from a regular Haskell list:

let a = fromList [2,3,4] Prelude Data.Vector> a fromList [2,3,4] :: Data.Vector.Vector Prelude Data.Vector> :t a a :: Vector Integer

GHCi will print the contents of the vector as executable code.

To create a multidimensional array, you can use a nested list generator to fill it:

Prelude Data.Vector> let x = fromList [ fromList [1 .. x] | x <- [1..10] ] Prelude Data.Vector> :t x x :: Vector (Vector Integer)

-- XXX TODO need a better printing function for multidimensional arrays.

You can also just create arrays filled with zeroes:

Prelude Data.Vector> Data.Vector.replicate 10 0 fromList [0,0,0,0,0,0,0,0,0,0] :: Data.Vector.Vector

And you can fill arrays from a sequence generator:

Prelude Data.Vector> enumFromN 1 10 fromList [1,2,3,4,5,6,7,8,9,10] :: Data.Vector.Vector Prelude Data.Vector> enumFromStepN 0 10 10 fromList [0,10,20,30,40,50,60,70,80,90] :: Data.Vector.Vector

## 2.2 Array Types

### 2.2.1 Boxed Arrays

### 2.2.2 Unboxed Arrays

### 2.2.3 Pure Arrays

### 2.2.4 Impure Arrays

### 2.2.5 Some examples

The most important attributes of an array are available in O(1) time, such as the size (length),

-- how big is the array? Prelude Data.Vector> let a = fromList [1,2,3,4,5,6,7,8,9,10] Prelude Data.Vector> Data.Vector.length a 10 -- is the array empty? Prelude Data.Vector> Data.Vector.null a False

## 2.3 Array Creation

### 2.3.1 An example: filling a vector from a file

We often want to populate a vector using a external data file. The easiest way to do this is with bytestring IO, and Data.Vector.unfoldr (or the equivalent functions in Data.Vector.Unboxed or Data.Vector.Storable:

{-# LANGUAGE BangPatterns #-} import qualified Data.ByteString.Lazy.Char8 as L import qualified Data.Vector as U import System.Environment main = do [f] <- getArgs s <- L.readFile f print . U.sum . parse $ s -- Fill a new vector from a file containing a list of numbers. parse = U.unfoldr step where step !s = case L.readInt s of Nothing -> Nothing Just (!k, !t) -> Just (k, L.tail t)

Note the use of bang patterns to ensure the parsing accumulated state is produced strictly.

Create a data file filled with 1 million integers:

$ seq 1 1000000 > data

Compile with -Odph (enables special optimizations to help fusion):

$ ghc -Odph --make vector.hs

Run:

$ time ./vector data 500000500000 ./vector data 0.08s user 0.01s system 98% cpu 0.088 total