HaskellImplementorsWorkshop/2011/Tibell

From HaskellWiki
Jump to navigation Jump to search

Faster persistent data structures through hashing

Johan Tibell

The most commonly used map (dictionary) data types in Haskell are implemented using some kind of binary tree, typically a size balanced tree or a Patricia tree. While binary trees provide good asymptotic performance, their real world performance is not stellar, especially when used with keys which are expensive to compare, such as strings.

In this talk I will describe a new map data type that uses a recently developed data structure, a hash array mapped trie, to achieve better real world performance. I will describe the design and implementation of this new data structure, improvements made to GHC to improve its performance, benchmark results, and finally conclude with a discussion on compiler improvements that could further improve the performance of this data type.

I've given a version of this talk before at Galois, but I've made some progress since: at the Galois talk I described that while HAMTs have great lookup performance, there insert performance wasn't the greatest (slightly slower than Data.Map). I've now addressed this problem and my HAMT implementation is not only faster than Data.Map but also faster than Data.IntMap, the main contender for insert performance. The lookup performance is still 3x better than Data.IntMap.

I also plan to spend a few minutes talking about ideas on how we could improve unboxing/memory representation in GHC to better support this kind of data structures.