Difference between revisions of "Arrays"

From HaskellWiki
Jump to navigation Jump to search
(formatted code)
(editing)
Line 1: Line 1:
Haskell'98 supports just one array constructor type, namely Array (see
+
Haskell'98 supports just one array constructor type, namely [http://haskell.org/onlinereport/array.html Array], which gives you immutable
http://haskell.org/onlinereport/array.html). It creates immutable
 
 
boxed arrays. "Immutable" means that these arrays, like any other pure
 
boxed arrays. "Immutable" means that these arrays, like any other pure
functional data structures, have contents fixed at construction time -
+
functional data structure, have contents fixed at construction time.
you can't modify it, only query. There is a "modification" operations,
+
You can't modify them, only query. There are "modification" operations,
but they just return new array and don't modify an original one. This
+
but they just return new array and don't modify the original one. This
makes possible using Arrays in pure functional code along with lists.
+
makes it possible using Arrays in pure functional code along with lists.
 
"Boxed" means that array elements are just ordinary Haskell (lazy)
 
"Boxed" means that array elements are just ordinary Haskell (lazy)
 
values, which are evaluated on need, and even can contain bottom
 
values, which are evaluated on need, and even can contain bottom
 
(undefined) value. You can learn how to use these arrays at
 
(undefined) value. You can learn how to use these arrays at
http://haskell.org/tutorial/arrays.html and i recommend you to read
+
http://haskell.org/tutorial/arrays.html and I recommend that you read
 
this before proceeding to rest of this page
 
this before proceeding to rest of this page
   
Nowadays three Haskell compilers - GHC, Hugs and NHC - shipped with
+
Nowadays three Haskell compilers - GHC, Hugs and NHC - ship with
the same set of Hierarchical Libraries
+
the same set of [http://www.haskell.org/ghc/docs/latest/html/libraries/index.html Hierarchical Libraries],
 
and these libraries contain a new implementation of arrays which is
(http://www.haskell.org/ghc/docs/latest/html/libraries/index.html),
 
 
backward compatible with the Haskell'98 one, but which has far more features.
and these libraries contains new implementation of arrays, which is
 
 
Suffice it to say that these libraries support 9 types of array
backward compatible with the Haskell'98 one, but contains far more features.
 
Suffice to say that these libraries supports 9 types of array
 
 
constructors: Array, UArray, IOArray, IOUArray, STArray, STUArray,
 
constructors: Array, UArray, IOArray, IOUArray, STArray, STUArray,
DiffArray, DiffUArray and StorableArray. It is no wonder that new
+
DiffArray, DiffUArray and StorableArray. It is no wonder that the
  +
array libraries are a source of so much confusion for new Haskellers. However, they are actually very simple - each provides just one of two interfaces, and one of these you already know.
arrays library make so much confusion for haskellers, although
 
basically it is very simple - it provides only two interfaces, one of
 
that you already know.
 
   
== Immutable arrays (module Data.Array.IArray) ==
+
== Immutable arrays (module [http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Array-IArray.html Data.Array.IArray]) ==
   
The first interface, provided by the new arrays library, is defined
+
The first interface provided by the new array library, is defined
 
by type class IArray (which stands for "immutable array" and defined
 
by type class IArray (which stands for "immutable array" and defined
in module Data.Array.IArray - see
+
in the module [http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Array-IArray.html Data.Array.IArray]
 
and defines the same operations that were defined for Array in
http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Array-IArray.html)
 
 
Haskell'98. The big difference is that it is now a typeclass and there are 4
and contains just the same operations that was defined for Array in
 
 
array type constructors, each of which implements this interface: Array,
Haskell'98. The only difference is that now it is a typeclass and there are 4
 
 
UArray, DiffArray, and DiffUArray. We will later describe the differences
array type constructors, which implement this interface: Array,
 
 
between them and the cases when these other types are preferable to use instead
UArray, DiffArray, DiffUArray. We will describe later differences
 
 
of the good old Array. Also note that to use Array type constructor
between them and cases when other types are preferred to use instead
 
of good old Array. Also note that to use Array type constructor
 
 
together with other new array types, you need to import
 
together with other new array types, you need to import
 
Data.Array.IArray module instead of Data.Array
 
Data.Array.IArray module instead of Data.Array
Line 41: Line 36:
   
   
== Mutable IO arrays (module Data.Array.IO) ==
+
== Mutable IO arrays (module [http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Array-IO.html Data.Array.IO]) ==
   
Second interface defined by the type class MArray (which stands for
+
The second interface is defined by the type class MArray (which stands for
"mutable array" and defined in module Data.Array.MArray - see
+
"mutable array" and is defined in the module [http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Array-MArray.html Data.Array.MArray]
http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Array-MArray.html)
 
 
and contains operations to update array elements in-place. Mutable
 
and contains operations to update array elements in-place. Mutable
arrays are very like to IORefs, only containing multiple values. Type
+
arrays are very similar to IORefs, only they contain multiple values. Type
 
constructors for mutable arrays are IOArray and IOUArray and
 
constructors for mutable arrays are IOArray and IOUArray and
operations which create, update and query these arrays all belongs to
+
operations which create, update and query these arrays all belong to the
 
IO monad:
 
IO monad:
   
Line 58: Line 52:
 
readArray arr 1 >>= print
 
readArray arr 1 >>= print
   
This program creates array of 10 elements with 37 as initial
+
This program creates an array of 10 elements with all values initially set to 37. Then it reads and prints the first element of the array. After that, the
values. Then it reads and prints first element of array. After that
+
program modifies the first element of the array and then reads and prints it
 
again. The type declaration in the second line is necessary because our little
program modifies first element of array and then reads and prints it
 
 
program doesn't provide enough context to allow the compiler to determine the concrete type of `arr`.
again. Type definition in second line is necessary because our little
 
program don't allow compiler to determine concrete type of `arr`.
 
   
   
   
== Mutable arrays in ST monad (module Data.Array.ST) ==
+
== Mutable arrays in ST monad (module [http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Array-ST.html Data.Array.ST]) ==
   
Just like IORef has more general cousine - STRef, IOArray has more
+
In the same way that IORef has its more general cousin STRef, IOArray has a more
general version - STArray (and IOUArray dubbed by STUArray). These
+
general version STArray (and similarly, IOUArray is parodied by STUArray). These
array types allows to work with modifiable arrays in state monad:
+
array types allow one to work with mutable arrays in the ST monad:
   
 
import Control.Monad.ST
 
import Control.Monad.ST
 
import Data.Array.ST
 
import Data.Array.ST
main = print $ runST
 
(do arr <- newArray (1,10) 127 :: ST s (STArray s Int Int)
 
a <- readArray arr 1
 
writeArray arr 1 216
 
b <- readArray arr 1
 
return (a,b)
 
)
 
   
 
buildPair = do arr <- newArray (1,10) 127 :: ST s (STArray s Int Int)
Believe you or not, but now you know all that needed to _use_ any
 
 
a <- readArray arr 1
 
writeArray arr 1 216
 
b <- readArray arr 1
 
return (a,b)
  +
 
main = print $ runST buildPair
  +
 
Believe it or not, now you know all that is needed to <i>use</i> any
 
array type. Unless you are interested in speed issues, just use Array,
 
array type. Unless you are interested in speed issues, just use Array,
 
IOArray and STArray where appropriate. The following topics are almost
 
IOArray and STArray where appropriate. The following topics are almost
exclusively about selecting proper array type to make program run
+
exclusively about selecting the proper array type to make programs run
 
faster.
 
faster.
   
   
   
== DiffArray (module Data.Array.Diff) ==
+
== DiffArray (module [http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Array-Diff.html Data.Array.Diff]) ==
   
As we already stated, update operation on immutable arrays (IArray)
+
As we already stated, the update operation on immutable arrays (IArray)
just creates new copy of array, what is very inefficient, but it is
+
just creates a new copy of the array, which is very inefficient, but it is a
pure operation what can be used in pure functions. On the other hand,
+
pure operation which can be used in pure functions. On the other hand,
 
updates on mutable arrays (MArray) are efficient but can be done only
 
updates on mutable arrays (MArray) are efficient but can be done only
 
in monadic code. DiffArray combines the best of both worlds - it
 
in monadic code. DiffArray combines the best of both worlds - it
supports interface of IArray and therefore can be used in pure
+
supports interface of IArray and therefore can be used in a pure
functional way, but internally used an efficient updating of MArrays.
+
functional way, but internally uses the efficient update of MArrays.
   
How that trick works? DiffArray has pure external interface, but
+
How does this trick work? DiffArray has a pure external interface, but
internally it represented as the reference to IOArray.
+
internally it represented as a reference to an IOArray.
   
 
When the '//' operator is applied to a diff array, its contents
 
When the '//' operator is applied to a diff array, its contents
Line 110: Line 104:
 
 
 
So if a diff array is used in a single-threaded style,
 
So if a diff array is used in a single-threaded style,
i.e. after '//' application the old version is no longer used,
+
that is, after '//' application the old version is no longer used,
 
a!i takes O(1) time and a//d takes O(length d).
 
a!i takes O(1) time and a//d takes O(length d).
 
Accessing elements of older versions gradually becomes slower.
 
Accessing elements of older versions gradually becomes slower.
Line 119: Line 113:
 
thus have fast element access by a//[].
 
thus have fast element access by a//[].
   
Library provides two "differential" array costructors - DiffArray,
+
The library provides two "differential" array constructors - DiffArray,
made internally from IOArray, and DiffUArray, based on IOUArray. But
+
made internally from IOArray, and DiffUArray, based on IOUArray. If you really need to, you can construct new "differential" array types from any
 
'MArray' types living in the 'IO' monad. See the [http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Array-Diff.html module documentation] for further details.
if you need, you can construct new "differential" array types from any
 
'MArray' types living in the 'IO' monad. See the module internals for
 
further details
 
   
   
Line 129: Line 121:
 
== Unboxed arrays ==
 
== Unboxed arrays ==
   
  +
In most implementations of lazy evaluation, values are represented at runtime as pointers to either their value, or code for computing their value. This extra level of indirection, together with any extra tags needed by the runtime, is known as a box. The default "boxed" arrays consist of many of these boxes, each of which may compute its value separately. This allows for many neat tricks, like recursively defining an array's elements in terms of one another, or only computing the specific elements of the array which are ever needed. However, for large arrays, it costs a lot in terms of overhead, and if the entire array is always needed, it can be a waste.
Unboxed arrays are like arrays in C - they contains just the plain
 
  +
values without extra level of indirection, so that, for example, array
 
 
Unboxed arrays are more like arrays in C - they contain just the plain
of 1024 values of type Int32 will use only 4 kb of memory. Moreover,
 
 
values without this extra level of indirection, so that, for example,
indexing of such arrays works significantly faster.
 
 
an array of 1024 values of type Int32 will use only 4 kb of memory.
 
Moreover, indexing of such arrays can be significantly faster.
   
 
Of course, unboxed arrays have their own disadvantages. First, unboxed
 
Of course, unboxed arrays have their own disadvantages. First, unboxed
arays can be made only of plain values having fixed size - Int, Word,
+
arrays can be made only of plain values having a fixed size - Int, Word,
Char, Bool, Ptr, Double (see full list on
+
Char, Bool, Ptr, Double, etc. (see the full list in the [http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Array-Unboxed.html Data.Array.Unboxed] module).
 
You can even implement unboxed arrays yourself for other
http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Array-Unboxed.html).
 
You can even implement yourself unboxed arrays interface for other
 
 
simple types, including enumerations. But Integer, String and any
 
simple types, including enumerations. But Integer, String and any
other types defined with variants cannot form the unboxed arrays.
+
other types defined with variable size cannot be elements of unboxed arrays.
  +
Second, without that extra level of indirection, all of the elements in an unboxed array must be evaluated when the array is evaluated, so you lose the benefits of lazy evaluation. Indexing the array to read just one element will construct the entire array. This is not much of a loss if you will eventually need the whole array, but it does prevent recursively defining the array elements in terms of each other, and may be too expensive if you only ever need specific values. Nevertheless, unboxed arrays are a very useful optimization
Second, all elements in unboxed array are evaluated when array is
 
 
instrument, and I recommend using them as much as possible.
created, so you can't use benefits of lazy evaluation for elements of
 
such array. Nevertheless, unboxed arrays are very useful optimization
 
instrument, so i recommend to use them as much as possible.
 
   
All main array types in this library has their unboxed counterparts:
+
All main array types in the library have unboxed counterparts:
   
 
Array - UArray (module Data.Array.Unboxed)
 
Array - UArray (module Data.Array.Unboxed)
Line 154: Line 145:
   
 
So, basically replacing boxed arrays in your program with unboxed ones
 
So, basically replacing boxed arrays in your program with unboxed ones
is very simple - just add 'U' to type signatures and you are done! If
+
is very simple - just add 'U' to the type signatures, and you are done! Of course, if you change Array to UArray, you also need to add "Data.Array.Unboxed"
 
to your imports list.
you changed Array to UArray, you also need to add "Data.Array.Unboxed"
 
to your imports list
 
   
   
   
== StorableArray (module Data.Array.Storable) ==
+
== StorableArray (module [http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-Array-Storable.html Data.Array.Storable]) ==
   
 
A storable array is an IO-mutable array which stores its
 
A storable array is an IO-mutable array which stores its
Line 169: Line 159:
   
 
It is similar to 'IOUArray' (in particular, it implements the same
 
It is similar to 'IOUArray' (in particular, it implements the same
MArray interface) but slower. Its advantage is that it's compatible
+
MArray interface) but slower. The advantage is that it's compatible
  +
with C through the foreign function interface. The memory addresses of
with C. Memory address of storable arrays are fixed, so you can pass
 
them to C routines.
+
storable arrays are fixed, so you can pass them to C routines.
   
 
The pointer to the array contents is obtained by 'withStorableArray'.
 
The pointer to the array contents is obtained by 'withStorableArray'.
 
The idea is similar to 'ForeignPtr' (used internally here).
 
The idea is similar to 'ForeignPtr' (used internally here).
 
The pointer should be used only during execution of the 'IO' action
 
The pointer should be used only during execution of the 'IO' action
retured by the function passed as argument to 'withStorableArray'.
+
returned by the function passed as argument to 'withStorableArray'.
   
 
{-# OPTIONS_GHC -fglasgow-exts #-}
 
{-# OPTIONS_GHC -fglasgow-exts #-}
Line 201: Line 191:
 
== The Haskell Array Preprocessor (STPP) ==
 
== The Haskell Array Preprocessor (STPP) ==
   
Using in Haskell mutable arrays (IO and ST ones) is not very handy.
+
Using mutable arrays in Haskell (IO and ST ones) is not very handy.
 
But there is one tool which adds syntax sugar and makes using of such
 
But there is one tool which adds syntax sugar and makes using of such
 
arrays very close to that in imperative languages. It is written by
 
arrays very close to that in imperative languages. It is written by

Revision as of 13:28, 9 January 2006

Haskell'98 supports just one array constructor type, namely Array, which gives you immutable boxed arrays. "Immutable" means that these arrays, like any other pure functional data structure, have contents fixed at construction time. You can't modify them, only query. There are "modification" operations, but they just return new array and don't modify the original one. This makes it possible using Arrays in pure functional code along with lists. "Boxed" means that array elements are just ordinary Haskell (lazy) values, which are evaluated on need, and even can contain bottom (undefined) value. You can learn how to use these arrays at http://haskell.org/tutorial/arrays.html and I recommend that you read this before proceeding to rest of this page

Nowadays three Haskell compilers - GHC, Hugs and NHC - ship with the same set of Hierarchical Libraries, and these libraries contain a new implementation of arrays which is backward compatible with the Haskell'98 one, but which has far more features. Suffice it to say that these libraries support 9 types of array constructors: Array, UArray, IOArray, IOUArray, STArray, STUArray, DiffArray, DiffUArray and StorableArray. It is no wonder that the array libraries are a source of so much confusion for new Haskellers. However, they are actually very simple - each provides just one of two interfaces, and one of these you already know.

Immutable arrays (module Data.Array.IArray)

The first interface provided by the new array library, is defined by type class IArray (which stands for "immutable array" and defined in the module Data.Array.IArray and defines the same operations that were defined for Array in Haskell'98. The big difference is that it is now a typeclass and there are 4 array type constructors, each of which implements this interface: Array, UArray, DiffArray, and DiffUArray. We will later describe the differences between them and the cases when these other types are preferable to use instead of the good old Array. Also note that to use Array type constructor together with other new array types, you need to import Data.Array.IArray module instead of Data.Array


Mutable IO arrays (module Data.Array.IO)

The second interface is defined by the type class MArray (which stands for "mutable array" and is defined in the module Data.Array.MArray and contains operations to update array elements in-place. Mutable arrays are very similar to IORefs, only they contain multiple values. Type constructors for mutable arrays are IOArray and IOUArray and operations which create, update and query these arrays all belong to the IO monad:

import Data.Array.IO
main = do arr <- newArray (1,10) 37 :: IO (IOArray Int Int)
          readArray  arr 1 >>= print
          writeArray arr 1 64
          readArray  arr 1 >>= print

This program creates an array of 10 elements with all values initially set to 37. Then it reads and prints the first element of the array. After that, the program modifies the first element of the array and then reads and prints it again. The type declaration in the second line is necessary because our little program doesn't provide enough context to allow the compiler to determine the concrete type of `arr`.


Mutable arrays in ST monad (module Data.Array.ST)

In the same way that IORef has its more general cousin STRef, IOArray has a more general version STArray (and similarly, IOUArray is parodied by STUArray). These array types allow one to work with mutable arrays in the ST monad:

import Control.Monad.ST
import Data.Array.ST
buildPair = do arr <- newArray (1,10) 127 :: ST s (STArray s Int Int)
               a <- readArray arr 1
               writeArray arr 1 216
               b <- readArray arr 1
               return (a,b)
main = print $ runST buildPair

Believe it or not, now you know all that is needed to use any array type. Unless you are interested in speed issues, just use Array, IOArray and STArray where appropriate. The following topics are almost exclusively about selecting the proper array type to make programs run faster.


DiffArray (module Data.Array.Diff)

As we already stated, the update operation on immutable arrays (IArray) just creates a new copy of the array, which is very inefficient, but it is a pure operation which can be used in pure functions. On the other hand, updates on mutable arrays (MArray) are efficient but can be done only in monadic code. DiffArray combines the best of both worlds - it supports interface of IArray and therefore can be used in a pure functional way, but internally uses the efficient update of MArrays.

How does this trick work? DiffArray has a pure external interface, but internally it represented as a reference to an IOArray.

When the '//' operator is applied to a diff array, its contents are physically updated in place. The old array silently changes its representation without changing the visible behavior: it stores a link to the new current array along with the difference to be applied to get the old contents.

So if a diff array is used in a single-threaded style, that is, after '//' application the old version is no longer used, a!i takes O(1) time and a//d takes O(length d). Accessing elements of older versions gradually becomes slower.

Updating an array which is not current makes a physical copy. The resulting array is unlinked from the old family. So you can obtain a version which is guaranteed to be current and thus have fast element access by a//[].

The library provides two "differential" array constructors - DiffArray, made internally from IOArray, and DiffUArray, based on IOUArray. If you really need to, you can construct new "differential" array types from any 'MArray' types living in the 'IO' monad. See the module documentation for further details.


Unboxed arrays

In most implementations of lazy evaluation, values are represented at runtime as pointers to either their value, or code for computing their value. This extra level of indirection, together with any extra tags needed by the runtime, is known as a box. The default "boxed" arrays consist of many of these boxes, each of which may compute its value separately. This allows for many neat tricks, like recursively defining an array's elements in terms of one another, or only computing the specific elements of the array which are ever needed. However, for large arrays, it costs a lot in terms of overhead, and if the entire array is always needed, it can be a waste.

Unboxed arrays are more like arrays in C - they contain just the plain values without this extra level of indirection, so that, for example, an array of 1024 values of type Int32 will use only 4 kb of memory. Moreover, indexing of such arrays can be significantly faster.

Of course, unboxed arrays have their own disadvantages. First, unboxed arrays can be made only of plain values having a fixed size - Int, Word, Char, Bool, Ptr, Double, etc. (see the full list in the Data.Array.Unboxed module). You can even implement unboxed arrays yourself for other simple types, including enumerations. But Integer, String and any other types defined with variable size cannot be elements of unboxed arrays. Second, without that extra level of indirection, all of the elements in an unboxed array must be evaluated when the array is evaluated, so you lose the benefits of lazy evaluation. Indexing the array to read just one element will construct the entire array. This is not much of a loss if you will eventually need the whole array, but it does prevent recursively defining the array elements in terms of each other, and may be too expensive if you only ever need specific values. Nevertheless, unboxed arrays are a very useful optimization instrument, and I recommend using them as much as possible.

All main array types in the library have unboxed counterparts:

Array - UArray          (module Data.Array.Unboxed)
IOArray - IOUArray      (module Data.Array.IO)
STArray - STUArray      (module Data.Array.ST)
DiffArray - DiffUArray  (module Data.Array.Diff)

So, basically replacing boxed arrays in your program with unboxed ones is very simple - just add 'U' to the type signatures, and you are done! Of course, if you change Array to UArray, you also need to add "Data.Array.Unboxed" to your imports list.


StorableArray (module Data.Array.Storable)

A storable array is an IO-mutable array which stores its contents in a contiguous memory block living in the C heap. Elements are stored according to the class 'Storable'. You can obtain the pointer to the array contents to manipulate elements from languages like C.

It is similar to 'IOUArray' (in particular, it implements the same MArray interface) but slower. The advantage is that it's compatible with C through the foreign function interface. The memory addresses of storable arrays are fixed, so you can pass them to C routines.

The pointer to the array contents is obtained by 'withStorableArray'. The idea is similar to 'ForeignPtr' (used internally here). The pointer should be used only during execution of the 'IO' action returned by the function passed as argument to 'withStorableArray'.

{-# OPTIONS_GHC -fglasgow-exts #-}
import Data.Array.Storable
import Foreign.Ptr
import Foreign.C.Types

main = do arr <- newArray (1,10) 37 :: IO (StorableArray Int Int)
          readArray arr 1 >>= print
          withStorableArray arr $ \ptr ->
              memset ptr 0 40
          readArray arr 1 >>= print

foreign import ccall unsafe "string.h" 
    memset  :: Ptr a -> CInt -> CSize -> IO ()


If you want to use this pointer afterwards, ensure that you call 'touchStorableArray' AFTER the last use of the pointer, so that the array will be not freed too early.


The Haskell Array Preprocessor (STPP)

Using mutable arrays in Haskell (IO and ST ones) is not very handy. But there is one tool which adds syntax sugar and makes using of such arrays very close to that in imperative languages. It is written by Hal Daume III and you can get it as http://www.isi.edu/~hdaume/STPP/stpp.tar.gz

Using this tool, you can index array elements in arbitrary complex expressions with just "arr[|i|]" notation and this preprocessor will automatically convert such syntax forms to appropriate calls to 'readArray' and 'writeArray'. Multi-dimensional arrays are also supported, with indexing in the form "arr[|i|][|j|]". See further descriptions at http://www.isi.edu/~hdaume/STPP/


Unsafe indexing, freezing/thawing, running over array elements

GHC-specific topics

Parallel arrays (module GHC.PArr)

Welcome to machine: Array#, MutableArray#, ByteArray#, MutableByteArray#, pinned and moveable byte arrays

Notes for contributors to this page

if you have any questions, please ask at the IRC/maillist. if you have any answers, please submit them directly to this page. please don't sign your contributions, so that anyone will feel free to further improve this page. but if you are compiler/Array libraries author - please sign your text to let us know that it is the Last Word of Truth :-)

disclaimer: i'm not native english speaker, so that this page can contain all sorts of spelling errors. moreover, i never used arrays in my programs (except for parallel arrays), so i can't guarantee that this page don't contains any other types of bugs :-)