# Difference between revisions of "Applications and libraries/Data structures"

From HaskellWiki

m (fixed typos; smoothed some wording) |
m (Updated link to Edison) |
||

Line 4: | Line 4: | ||

==General libraries and tools for data structures== |
==General libraries and tools for data structures== |
||

− | ;[http://www. |
+ | ;[http://www.cs.princeton.edu/~rdockins/edison/home/ Edison] |

:This is the latest version of the Edison library of efficient data structures. There are also [http://www.haskell.org/ghc/docs/edison/ earlier version of Edison] by Chris Okasaki. It provides sequences, finite maps, priority queues, and sets/bags. ([http://www.eecs.usma.edu/Personnel/okasaki/pubs.html#hw00 overview paper]). |
:This is the latest version of the Edison library of efficient data structures. There are also [http://www.haskell.org/ghc/docs/edison/ earlier version of Edison] by Chris Okasaki. It provides sequences, finite maps, priority queues, and sets/bags. ([http://www.eecs.usma.edu/Personnel/okasaki/pubs.html#hw00 overview paper]). |
||

## Revision as of 23:37, 16 May 2009

*The copyright status of this work is not known. Please help resolve this on the talk page.*

*This page contains a list of libraries and tools in a certain category. For a comprehensive list of such pages, see Applications and libraries.*

## Contents

## General libraries and tools for data structures

- Edison
- This is the latest version of the Edison library of efficient data structures. There are also earlier version of Edison by Chris Okasaki. It provides sequences, finite maps, priority queues, and sets/bags. (overview paper).

- Library/New collections
- This is a package of many useful collections types. Advantages include:
- It's easy to migrate from standard Lists/Sets/Maps to the new package. The package is an evolution (rather than a revolution) of the collections currently in the base package.
- Each collection type fits in a consistent framework (thanks to classes)
- An extensive QuickCheck test suite also serves as detailed specification for the collections.

- This package includes and supercedes the following libraries:

- Data.Tree.AVL
- An implementation of AVL trees and related utilities.

- Data.StringMap
- A library providing maps from String keys to values, based on Tries.

- This package includes the following libraries, but they are maintained separately:

- Ranged Sets
- A ranged set is a list of non-overlapping ranges. The ranges have upper and lower boundaries, and a boundary divides the base type into values above and below. No value can ever sit on a boundary. So you can have the set {2.0 < x <= 3.0, 5.3 < x < 6}

- Strafunski
- A bundle for generic programming. It provides programming support for generic traversal as useful in the implementation of program transformations.

- Partial v0.1
- The Partial library provides a partial order class. It also provides routines for generating a Hasse diagram from a set and a partial order. Renderers are provided for the abstract Hasse diagram representation into LaTeX (via Xy-pic) and into dot, the format for AT&T's Graphviz tools. Since no horizontal sorting is done, the Xy-pic output is rather poor at present; dot does a much better job with its layout optimisation algorithm.

- Discrete Interval Encoding Trees
- The discrete interval encoding tree (DIET) is a structure for storing subsets of types having a total order and a predecessor and a successor function.

- HList
- A heterogeneous collection is a datatype that is capable of storing data of different types, while providing operations for look-up, update, iteration, and others. There are various kinds of heterogeneous collections, differing in representation, invariants, and access operations.

- monadLib
- Iavor Diatchki's library of monad transformers for Haskell. It enables the quick construction of monads --- abstract data types that capture common programming idioms, such as throwing and catching exceptions or continuations. In many programming languages such features are built into the language (if they're provided at all), but in Haskell they are user-programmable.

- Pointless Haskell
- Pointless Haskell is library for point-free programming with recursion patterns defined as hylomorphisms. It also allows the visualization of the intermediate data structure of the hylomorphisms with GHood. This feature together with the DrHylo tool allows us to easily visualize recursion trees of Haskell functions.

- rhaskell : Reactive Objects
- Stefan Wehr's reactive objects library. Reactive objects are a convenient abstraction for writing programs which have to interact with a concurrent environment. A reactive object has two characteristics: the abandonment of all blocking operations and the unification of the concepts state and process. The former allows a reactive object to accept input from multiple sources without imposing any ordering on the input events. The latter prevents race conditions because the state of an object is only manipulated from the process belonging to the object.

- GenUtil
- A collection of random useful utility functions written in pure Haskell 98. In general, it tries to conform to the naming scheme put forth in the Haskell prelude and fill in the obvious omissions, as well as providing useful routines in general.

- PersistentDocument
*The link is dead, somebody please either update it or remove it.* - The persistent document abstraction takes care of dealing with a document you want to open from and save to disk and that supports undo. This functionality can be used by editors of arbitrary documents and saves you a lot of quite subtle coding.

- Zipper monad
- A generic monad for navigating around arbitrary data structures

## Graphs

- FGL - A Functional Graph Library
- The functional graph library provides a collection of graph operations.

- Data.Relation
- Part of the UMinho Haskell libraries, this library provides a representation and operations on relations. A special case of relations are graphs. The operations include graph chopping and slicing, strong connected component analysis, graphs metrics, and more.

- Haskell Graph Automorphism Library
- Jean-Philippe Bernardy's implementation of Brendan McKay's algorithm for graph canonical labeling and automorphism group (Nauty).

## IO

- Streams
- Streams is a feature-rich, flexible, extensible, backward-compatible and fast I/O library. It supports various stream types: files and legacy Handle type, string and memory buffers, pipes. There is also common functionality available for any stream: buffering, Char encoding, locking.

## Mutable data

- The Haskell STate Preprocessor
- This is a short preprocessor for stateful Haskell programs. It aleviates the pain of performing single array lookup/write/update functions with some syntax to support it. It also supports hash table operations based on the HashTable implementation available from the author. Finally, it supports monadic if and monadic case constructions. It is lightweight in the sense that it performs no syntactic analysis and is essentially a character transformer.

- Arrays & References Library
- Featuring:

- Unboxed references in IO and ST monads
- Monad-independent interfaces to boxed and unboxed references
- Syntax sugar to make using of mutable objects easier (=:, +=, -=,..)
- Reimplemented Arrays library with the following improvements:
- Unboxed arrays now can be used in polymorphic functions
- The MArray class now supports arrays with dynamic bounds
- Implementation of dynamic (resizable) arrays

- Data.StorableVector.ST
- StorableVector also support mutation in ST monad.

See also Modern array libraries

## Lists

- SList
*(broken link)* - Sized lists for Haskell

- dlist
- Difference lists (supporting O(1) append and snoc)

## Strings

- Data.ByteString
- The FPS library provides mmapped and malloc'd packed strings (byte arrays held by a ForeignPtr), along with a list interface to these strings. It lets you do extremely fast I/O in Haskell; in some cases, even faster than typical C implementations, as well as conserving space.

- Data.StorableVector
- This is very much the same like Data.ByteString extended to any Storable element type. There is a data structure for monolithic blocks and chunky sequences. Mutable access is also possible in ST monad.

- MissingH
- MissingH is a library of pure-Haskell utility functions relating to strings, logging, and I/O.

- HsLocale
- A locale-aware replacement for the standard IO routines, and support for
*wide*strings

- VariableExpansion
- A library for variable expansion inside strings

- i18n strings
- At sourceforge

## Regular expressions

There are many libraries for regular expressions available. By default GHC comes with:

- regex-base
- This defines the type classes for the new API

- regex-compat
- This defines the old Text.Regex.Posix API, using regex-posix

- regex-posix
- This is the interface to the old (and slow) posix backend (C's regex.h)

All backends provide String and ByteString interfaces.

Additional backend libraries are available for (hopefully) more efficient regular expression implementations:

- regex-pcre
- PCRE-based regexes. This requires libpcre (currently works against version 6.6.0 as of January 2007). This Haskell library and libpcre are both BSD. This is very fast but has left branch semantics instead of POSIX leftmost longest semantics.

- regex-tre
- TRE-based regexes. This requires libtre (currently works against version 0.7.5 as of January 2007). Libtre aims to be POSIX compatible and is a much faster implementation that used by regex-posix. Note that libtre does have an outstanding bug in correctly interpreting some regular expressions. This Haskell package is BSD-3 licensed and libtre is LGPL.

- regex-parsec
- Pure Haskell regexes implemented via parsec. The leftmost longest subexpression capture semantics of this library differ from the POSIX standard. This also has an option to prefer the left branch semantics that libpcre uses.

- regex-dfa
- Pure Haskell DFA for regexes. This is licensed under LPGL (the above packages are all under BSD-3) due to derived code. This is faster than regex-parsec but does not return captured subexpressions. It also is being updated to handle repeated patterns that can match empty strings (the older version hangs, the new version rewrites the pattern internally).

- There is a regex-tdfa packages in development (not released) that will be a pure Haskell and BSD-3 licensed implementation inspired by libtre. It aims to be a replacement for regex-posix.

- Development versions (possible unstable or broken) of the above packages are also available under regex-unstable. For support, please use the haskell-cafe mailing list.

## Serialising data

- Data.Binary (darcs repository)
- A library for serialising binary values to and from lazy ByteStrings.

- NewBinary
- A port of Malcolm Wallace's Binary library from NHC, offering facilities for heap compression and binary I/O. The de-facto standard for binary I/O in Haskell

- SerTH
- SerTH is a binary serialization library for Haskell. It supports serializing cyclic datatypes in a fast binary format. SerTH uses template haskell for deriving the serializing interface for new datatypes.

- AltBinary
- AltBinary is an exhaustive library that support binary I/O and serialization. It's part of Streams library, so serialization is possible to any I/O source, from String to memory-mapped file. It's also backward compatible with NewBinary library what makes translation of old code trivial. Very fast, very feature-rich, Hugs/GHC compatible, etc, etc...

- HsSyck
- YAML is a straightforward machine parsable data serialization format designed for human readability and interaction with dynamic languages. It is optimized for data serialization, configuration settings, log files, Internet messaging and filtering. Syck is an extension, written in C, for reading and writing YAML swiftly in popular scripting languages. It is part of core Ruby, and also has bindings for Perl 5, Python, Lua, Cocoa, and Perl 6. HsSyck provides Data.Yaml.Syck as an interface to YAML structures, using Data.ByteString for efficient textual data representation. Additionally, we provide a set of DrIFT rules to dump and load arbitrary Haskell data types in the YAML format.

- GenericSerialize
- GenericSerialize is a library which serializes data using the "Scrap Your Boilerplate" infrastructure. This means that while it cannot easily support data-structure-specific serialization, it can support many different data formats cheaply.

## Compressing data

- Compressions-2005
- Features of the Compression-2005 Library (homepage)

- easy and uniform access to most competitive free compression algorithms as of April'05: LZMA, PPMd and GRZip
- all input/output performed via user-supplied functions (callbacks), so you can compress data in memory, files, pipes, sockets and anything else
- all parameters of compression algorithm are defined with a single string, for example "lzma:8mb:fast:hc4:fb32".
- Using this library, you can write a bzip-like utility in one line, with better compression results than bzip2 itself.

- Zlib
- Zlib bindings for ByteStrings. darcs get http://code.haskell.org/zlib/ (docs)

- BZip2
- BZip2 bindings for ByteStrings. darcs get http://code.haskell.org/bzlib/ (docs)

## Benchmarking data structures

- Auburn
- Auburn is Graeme Moss's kit for benchmarking implementations of lazy data structures. Give it several implementations of an ADT (abstract data type) and it will tell you which one is best for your particular application.

- Bench
- Simple time and space benchmarking for various list-like data structures. Easily adapted to arbitrary structures

## Generic traversals

- RecLib A Recursion and Traversal Library for Haskell
- The Recursion Library for Haskell provides a rich set of generic traversal strategies to facilitate the flexible specification of generic term traversals. The underlying mechanism is the Scrap Your Boilerplate (SYB) approach. Most of the strategies that are used to implement recursion operators are taken from Stratego.

## Typesafe variants of `IntSet`

and `IntMap`

for enumerations

- ChrisKuklewicz wrote a bit of boilerplate to re-use IntSet and IntMap with any Enum type: EnumSet EnumMap
- an efficient implementation of EnumSet for sets which fit into a machine word can be found in Edison (see above)