Skip to content
This repository has been archived by the owner on Feb 3, 2022. It is now read-only.

Proposals unordered containers

Robert M. Lefkowitz edited this page Jun 23, 2014 · 2 revisions

PageOutline

Table of Contents

Proposal: Add the unordered-containers package and the hashable package to the Haskell Platform

This is a proposal for the unordered-containers package version 0.2.3.0 and the hashable package version 1.1.2.5 to be included in the next major release of the Haskell platform.

Everyone is invited to review this proposal, following the standard procedure for proposing and reviewing packages.

Review comments should be sent to the libraries mailing list before Monday March 25th, 2013, which is the discussion deadline.

Credits

Proposal author: Johan Tibell

Package maintainer: Johan Tibell

The following individuals contributed to the review process:

  * ...

Abstract

The unordered-container' package implements hashing-based containers that trade some functionality (lack of ordering) for speed. The hashable package provides a type class that, like Ord, is implemented by all types that can be used as keys in these containers.

Documentation and tarballs from Hackage:

 * http://hackage.haskell.org/package/unordered-containers
 * http://hackage.haskell.org/package/hashable

Development repos:

 * https://github.com/tibbe/unordered-containers
 * https://github.com/tibbe/hashable

Rationale

Many applications that analyse large data sets ("Big Data") hold large lookup tables in memory. unordered-containers simultaneously targets such applications while also offering a faster drop-in replacement for e.g. `Data.Map`.

unordered-containers trades some functionality (lack of key-order related function) for [speed](http://blog.johantibell.com/2012/03/announcing-unordered-containers-02.html). Lookup operations are more than twice as fast and insertion is between 33-50% faster than `Data.Map`. The memory overhead is ~4.5 words per key-value pair while `Data.IntMap` uses 8 (includes key) and `Data.Map` uses 6.

The unordered-containers API should be familiar to any Haskeller that has used the containers package.

The unordered-containers library is very widely used: according to http://packdeps.haskellers.com/, it has 122 reverse dependencies on Hackage.

Introduction to the API

The unordered-containers API should be familiar to any Haskeller that has used the containers package. It contains a subset of the functions in e.g. `Data.Map`. The idea is to eventually grow the API to be as rich as the the containers one (minus the operations that rely on key ordering).

The hashable API contains a single type class, which is in spirit similar to `Ord`:

`hashWithSalt` computes a hash value for its argument, mixing in the given salt. The use of a salt allows the user to produce different hash values from the same input, which is important in some applications, like cuckoo hashing.

Design decisions

Type class instead of higher-order function to generate hash values

Instead of using a type class to create hash values one could use a higher-order argument to the container constructor

as done in `Data.HashTable`. This approach was not chosen for the following reasons:

 * It's less convenient: users rarely want to vary the hash function used for a given key type.
 * It's more error prone: hash functions are a specialist topic and letting the user choosing which function to use might lead to he/she using a bad one.
 * It's algorithmically less efficient for some operations: e.g. set-union can be done more efficiently if we know both sets used the same hash function.
 * It's faster: GHC does a better job specializing functions that use dictionaries (via `INLIABLE`) than functions that use higher-order arguments.

Insert vs lookup performance

The data structure (a hash array mapped trie) that's used to implement unordered-containers trades some insert performance for a much better lookup performance. This is a sensible trade-off for the majority of applications, but there are a few programs where using e.g. `Data.IntMap` might be better.

Open issues

 * Q: Why use an older version of hashable?
 * A: The latest version of hashable has some performance issues, mostly due to it design, that need to be addressed before it can be included in the platform.
Clone this wiki locally