Difference between revisions of "Numeric Haskell: A Vector Tutorial"
DonStewart (talk  contribs) 
DonStewart (talk  contribs) 

Line 166:  Line 166:  
There are also more specialized array types: 
There are also more specialized array types: 

−  +  * [http://hackage.haskell.org/packages/archive/vector/0.5/doc/html/DataVectorUnboxed.html Unboxed] 

−  +  * [http://hackage.haskell.org/packages/archive/vector/0.5/doc/html/DataVectorStorable.html Storable] 

which provide unboxed arrays (i.e. no closures) and storable arrays (data that is pinned, and may be passed to and from C via a Ptr). 
which provide unboxed arrays (i.e. no closures) and storable arrays (data that is pinned, and may be passed to and from C via a Ptr). 

+  
+  In all cases, the operations are subject to loop fusion. That is, if you compose two functions, 

+  
+  map f . map g 

+  
+  the compiler will rewrite it into a single traversal: 

+  
+  map (f . g) 

+  
+  saving time and space. 

== Simple example == 
== Simple example == 

Line 214:  Line 224:  
fromList [0,10,20,30,40,50,60,70,80,90] :: Data.Vector.Vector 
fromList [0,10,20,30,40,50,60,70,80,90] :: Data.Vector.Vector 

</haskell> 
</haskell> 

−  
== Array Types == 
== Array Types == 

Line 222:  Line 231:  
=== Impure Arrays === 
=== Impure Arrays === 

=== Some examples === 
=== Some examples === 

+  
+  The most important attributes of an array are available in O(1) time, such as the size (length), 

+  
+  <haskell> 

+   how big is the array? 

+  Prelude Data.Vector> let a = fromList [1,2,3,4,5,6,7,8,9,10] 

+  Prelude Data.Vector> Data.Vector.length a 

+  10 

+  
+   is the array empty? 

+  Prelude Data.Vector> Data.Vector.null a 

+  False 

+  </haskell> 

+  
== Array Creation == 
== Array Creation == 

+  
+  === An example: filling a vector from a file === 

+  
+  We often want to populate a vector using a external data file. The easiest way to do this is with bytestring IO, and Data.Vector.unfoldr (or the equivalent functions in Data.Vector.Unboxed or Data.Vector.Storable: 

+  
+  <haskell> 

+  {# LANGUAGE BangPatterns #} 

+  
+  import qualified Data.ByteString.Lazy.Char8 as L 

+  import qualified Data.Vector as U 

+  import System.Environment 

+  
+  main = do 

+  [f] < getArgs 

+  s < L.readFile f 

+  print . U.sum . parse $ s 

+  
+   Fill a new vector from a file containing a list of numbers. 

+  parse = U.unfoldr step 

+  where 

+  step !s = case L.readInt s of 

+  Nothing > Nothing 

+  Just (!k, !t) > Just (k, L.tail t) 

+  </haskell> 

+  
+  Note the use of bang patterns to ensure the parsing accumulated state is produced strictly. 

+  
+  Create a data file filled with 1 million integers: 

+  
+  $ seq 1 1000000 > data 

+  
+  Compile with Odph (enables special optimizations to help fusion): 

+  
+  $ ghc Odph make vector.hs 

+  
+  Run: 

+  
+  $ time ./vector data 

+  500000500000 

+  ./vector data 0.08s user 0.01s system 98% cpu 0.088 total 

+  
== Basic Operations == 
== Basic Operations == 

== Indexing, Slicing and Iterating == 
== Indexing, Slicing and Iterating == 
Revision as of 01:38, 20 February 2010
Vector is a Haskell library for working with arrays, with an emphasis on raw performance, whilst retaining a rich interface. The main data types are boxed and unboxed arrays, and arrays may be immutable (pure), or mutable. Arrays are indexed by nonnegative Int
values.
The vector library has an API similar to the famous Haskell list library, with many of the same names.
This tutorial is modelled on the NumPy tutorial.
Contents
 1 Quick Tour
 2 The Tutorial
 2.1 Simple example
 2.2 Array Types
 2.3 Array Creation
 2.4 Basic Operations
 2.5 Indexing, Slicing and Iterating
 2.6 Bulk operations
 2.7 Stacking together different arrays
 2.8 Splitting one array into several smaller ones
 2.9 Copies and Views
 2.10 No Copy at All
 2.11 Indexing with Arrays of Indices
 2.12 Indexing with Boolean Arrays
 2.13 Permutations
 2.14 Randoms
 2.15 IO
 2.16 References
Quick Tour
Here is a quick overview to get you started.
Importing the library
Download the vector package:
$ cabal install vector
and import it as, for boxed arrays:
import qualified Data.Vector as V
or:
import qualified Data.Vector.Unboxed as V
for unboxed arrays. The library needs to be imported qualified as it shares the same function names as list operations in the Prelude.
Generating Vectors
New vectors can be generated in many ways:
$ ghci
GHCi, version 6.12.1: http://www.haskell.org/ghc/ :? for help
Loading package ghcprim ... linking ... done.
Loading package integergmp ... linking ... done.
Loading package base ... linking ... done.
Loading package ffi1.0 ... linking ... done.
Prelude> :m + Data.Vector
 Generating a vector from a list:
Prelude Data.Vector> let a = fromList [10, 20, 30, 40]
Prelude Data.Vector> a
fromList [10,20,30,40] :: Data.Vector.Vector
 Or filled from a sequence
Prelude Data.Vector> enumFromStepN 10 10 4
fromList [10,20,30,40] :: Data.Vector.Vector
 A vector created from four consecutive values
Prelude Data.Vector> enumFromN 10 4
fromList [10,11,12,13] :: Data.Vector.Vector
You can also build vectors using operations similar to lists:
 The empty vector
Prelude Data.Vector> empty
fromList [] :: Data.Vector.Vector
 A vector of length one
Prelude Data.Vector> singleton 2
fromList [2] :: Data.Vector.Vector
 A vector of length 10, filled with the value '2'
 Note that to disambiguate names,
 and avoid a clash with the Prelude,
 with use the full path to the Vector module
Prelude Data.Vector> Data.Vector.replicate 10 2
fromList [2,2,2,2,2,2,2,2,2,2] :: Data.Vector.Vector
In general, you may construct new vectors by applying a function to the index space:
Prelude Data.Vector> generate 10 (^2)
fromList [0,1,4,9,16,25,36,49,64,81] :: Data.Vector.Vector
Vectors may have more than one dimension:
 Here we create a two dimensional vector, 10 columns,
 each row filled with the row index.
Prelude Data.Vector> let x = generate 10 (\n > Data.Vector.replicate 10 n)
 The type is "Vector of Vector of Ints"
Prelude Data.Vector> :t x
x :: Vector (Vector Int)
Vectors may be grown or shrunk arbitrarily:
Prelude Data.Vector> let y = Data.Vector.enumFromTo 0 11
Prelude Data.Vector> y
fromList [0,1,2,3,4,5,6,7,8,9,10,11] :: Data.Vector.Vector
 Take the first 3 elements as a new vector
Prelude Data.Vector> Data.Vector.take 3 y
fromList [0,1,2] :: Data.Vector.Vector
 Duplicate and join the vector
Prelude Data.Vector> y Data.Vector.++ y
fromList [0,1,2,3,4,5,6,7,8,9,10,11,0,1,2,3,4,5,6,7,8,9,10,11] :: Data.Vector.Vector
Modifying vectors
Just as with lists, you can iterate (map) over arrays, reduce them (fold), filter them, or join them in various ways:
 mapping a function over the elements of a vector
Prelude Data.Vector> Data.Vector.map (^2) y
fromList [0,1,4,9,16,25,36,49,64,81,100,121] :: Data.Vector.Vector
 Extract only the odd elements from a vector
Prelude Data.Vector> Data.Vector.filter odd y
fromList [1,3,5,7,9,11] :: Data.Vector.Vector
 Reduce a vector
Prelude Data.Vector> Data.Vector.foldl (+) 0 y
66
 Take a scan (partial results from a reduction):
Prelude Data.Vector> Data.Vector.scanl (+) 0 y
fromList [0,0,1,3,6,10,15,21,28,36,45,55,66] :: Data.Vector.Vector
 Zip two arrays pairwise, into an array of pairs
Prelude Data.Vector> Data.Vector.zip y y
fromList [(0,0),(1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8),(9,9),(10,10),(11,11)] :: Data.Vector.Vector
Indexing vectors
And like all good arrays, you can index them in various ways:
 Take the first element
Prelude Data.Vector> Data.Vector.head y
0
 Take the last element
Prelude Data.Vector> Data.Vector.tail y
fromList [1,2,3,4,5,6,7,8,9,10,11] :: Data.Vector.Vector
 Take an arbitrary element
Prelude Data.Vector> y ! 4
4
The Tutorial
The vector package provides a several types of array. The most general interface is via Data.Vector, which provides for boxed arrays, holding any type.
There are also more specialized array types:
which provide unboxed arrays (i.e. no closures) and storable arrays (data that is pinned, and may be passed to and from C via a Ptr).
In all cases, the operations are subject to loop fusion. That is, if you compose two functions,
map f . map g
the compiler will rewrite it into a single traversal:
map (f . g)
saving time and space.
Simple example
You can create the arrays in many ways, for example, from a regular Haskell list:
let a = fromList [2,3,4]
Prelude Data.Vector> a
fromList [2,3,4] :: Data.Vector.Vector
Prelude Data.Vector> :t a
a :: Vector Integer
GHCi will print the contents of the vector as executable code.
To create a multidimensional array, you can use a nested list generator to fill it:
Prelude Data.Vector> let x = fromList [ fromList [1 .. x]  x < [1..10] ]
Prelude Data.Vector> :t x
x :: Vector (Vector Integer)
 XXX TODO need a better printing function for multidimensional arrays.
You can also just create arrays filled with zeroes:
Prelude Data.Vector> Data.Vector.replicate 10 0
fromList [0,0,0,0,0,0,0,0,0,0] :: Data.Vector.Vector
And you can fill arrays from a sequence generator:
Prelude Data.Vector> enumFromN 1 10
fromList [1,2,3,4,5,6,7,8,9,10] :: Data.Vector.Vector
Prelude Data.Vector> enumFromStepN 0 10 10
fromList [0,10,20,30,40,50,60,70,80,90] :: Data.Vector.Vector
Array Types
Boxed Arrays
Unboxed Arrays
Pure Arrays
Impure Arrays
Some examples
The most important attributes of an array are available in O(1) time, such as the size (length),
 how big is the array?
Prelude Data.Vector> let a = fromList [1,2,3,4,5,6,7,8,9,10]
Prelude Data.Vector> Data.Vector.length a
10
 is the array empty?
Prelude Data.Vector> Data.Vector.null a
False
Array Creation
An example: filling a vector from a file
We often want to populate a vector using a external data file. The easiest way to do this is with bytestring IO, and Data.Vector.unfoldr (or the equivalent functions in Data.Vector.Unboxed or Data.Vector.Storable:
{# LANGUAGE BangPatterns #}
import qualified Data.ByteString.Lazy.Char8 as L
import qualified Data.Vector as U
import System.Environment
main = do
[f] < getArgs
s < L.readFile f
print . U.sum . parse $ s
 Fill a new vector from a file containing a list of numbers.
parse = U.unfoldr step
where
step !s = case L.readInt s of
Nothing > Nothing
Just (!k, !t) > Just (k, L.tail t)
Note the use of bang patterns to ensure the parsing accumulated state is produced strictly.
Create a data file filled with 1 million integers:
$ seq 1 1000000 > data
Compile with Odph (enables special optimizations to help fusion):
$ ghc Odph make vector.hs
Run:
$ time ./vector data 500000500000 ./vector data 0.08s user 0.01s system 98% cpu 0.088 total