-
Notifications
You must be signed in to change notification settings - Fork 91
Proposals unordered containers
This is a proposal for the unordered-containers package version 0.2.3.0 and the hashable package version 1.1.2.5 to be included in the next major release of the Haskell platform.
Everyone is invited to review this proposal, following the standard procedure for proposing and reviewing packages.
Review comments should be sent to the libraries mailing list before Monday March 25th, 2013, which is the discussion deadline.
Proposal author: Johan Tibell
Package maintainer: Johan Tibell
The following individuals contributed to the review process:
* ...
The unordered-container' package implements hashing-based containers that trade some functionality (lack of ordering) for speed. The hashable package provides a type class that, like Ord, is implemented by all types that can be used as keys in these containers.
Documentation and tarballs from Hackage:
* http://hackage.haskell.org/package/unordered-containers * http://hackage.haskell.org/package/hashable
Development repos:
* https://github.com/tibbe/unordered-containers * https://github.com/tibbe/hashable
Many applications that analyse large data sets ("Big Data") hold large lookup tables in memory. unordered-containers simultaneously targets such applications while also offering a faster drop-in replacement for e.g. `Data.Map`.
unordered-containers trades some functionality (lack of key-order related function) for [speed](http://blog.johantibell.com/2012/03/announcing-unordered-containers-02.html). Lookup operations are more than twice as fast and insertion is between 33-50% faster than `Data.Map`. The memory overhead is ~4.5 words per key-value pair while `Data.IntMap` uses 8 (includes key) and `Data.Map` uses 6.
The unordered-containers API should be familiar to any Haskeller that has used the containers package.
The unordered-containers library is very widely used: according to http://packdeps.haskellers.com/, it has 122 reverse dependencies on Hackage.
The unordered-containers API should be familiar to any Haskeller that has used the containers package. It contains a subset of the functions in e.g. `Data.Map`. The idea is to eventually grow the API to be as rich as the the containers one (minus the operations that rely on key ordering).
The hashable API contains a single type class, which is in spirit similar to `Ord`:
`hashWithSalt` computes a hash value for its argument, mixing in the given salt. The use of a salt allows the user to produce different hash values from the same input, which is important in some applications, like cuckoo hashing.
Instead of using a type class to create hash values one could use a higher-order argument to the container constructor
as done in `Data.HashTable`. This approach was not chosen for the following reasons:
* It's less convenient: users rarely want to vary the hash function used for a given key type. * It's more error prone: hash functions are a specialist topic and letting the user choosing which function to use might lead to he/she using a bad one. * It's algorithmically less efficient for some operations: e.g. set-union can be done more efficiently if we know both sets used the same hash function. * It's faster: GHC does a better job specializing functions that use dictionaries (via `INLIABLE`) than functions that use higher-order arguments.
The data structure (a hash array mapped trie) that's used to implement unordered-containers trades some insert performance for a much better lookup performance. This is a sensible trade-off for the majority of applications, but there are a few programs where using e.g. `Data.IntMap` might be better.
* Q: Why use an older version of hashable? * A: The latest version of hashable has some performance issues, mostly due to it design, that need to be addressed before it can be included in the platform.