Ideas and plans for hash table / map / set #30

fingolfin · 2017-08-03T15:57:52Z

For the future, I'd like to rewrite the hash set/map/table code once again. Some of the things I'd like to do in the new new code would include:

implement it so that it can be used as a true hashset, i.e. with no unnecessay memory or performance overhead (with the current code, you can implement a hashset as a hashmap that maps its elements to an arbitrary value, e.g. true, but this wastes memory and CPU cycles)
I'd like to implement it as an ordered hash set, which is the perfect datastructure for orbit enumerations, and in practice can even be faster than a classic hash set (as was implemented here). For an explanation on this, see e.g. here: https://morepypy.blogspot.de/2015/01/faster-more-memory-efficient-and-more.html
Another things I'd like to experiment with is Robin Hood hashing, see e.g.
Another possible interesting link: https://nikic.github.io/2014/12/22/PHPs-new-hashtable-implementation.html

Also,

collect (more) statistics e.g. about collisions -- this is very helpful for debugging, and helps identify bad hash functions and also bugs in the hash table code
experiment with the load factor at which we resize the table (currently hard coded to 70 percent in the C implementation)
our open addressing currently uses this probe function (with PERTURB_SHIFT equal to 5):

    perturb = INT_INTOBJ(hash);
    idx = perturb & mask;
...
        idx = (5 * idx + perturb + 1) & mask;
        perturb >>= PERTURB_SHIFT;

Experiment with others, e.g. linear or quadratic probing

More ideas and TODOs:

implement iterators: KeysIterator, ValuesIterator, KeyValuesIterator (the last one would return pairs [key, value])
use one of them to implement for x in hashmap do ... -- I think this should iterate over keys
make a HashSet -- this could use almost the exact same code, it would simply omit the values plist
perhaps provide functions to get all keys or values as plists (but don't just return the internal lists!! instead, return compacted copies)

UPDATE: more stuff:

add a LookupWithDefault(hashmap, key, default), which works like the usual lookup, but if key is not present in the hashmap, it returns default.
provide a high-level interface for DS_Hash_AccumulateValue

The text was updated successfully, but these errors were encountered:

stevelinton · 2017-08-04T11:34:51Z

@fingolfin mentioned yesterday the idea of an hashmap which also preserves the order in which elements were added. As I understood it this essentially a PLIST to which new elements were added at the end, combined with a hashtable storing indices into the PLIST.

A few thoughts about this:

The way to present this at GAP level is probably as a List with Add but no assignment and a super-fast Position method (and consequently also an \in method). That very neatly meets the needs of orbit algorithms.

If you don't plan to delete much then this approach may always be correct. In the hash table you store the index of the key-value pair (if you have values) and some bits of the hash value of the key. Depending on the size of the hash table, you could get all of that into 32 or 64 bits per entry for all but the most enormous tables. With linear probing and Robin Hood hashing, for instance, you will basically need just one cache line from the hash table plus one from the PLIST plus whatever you have to do to compare entries for almost all lookups.

Making one of these would replace the common idiom of sorting a PLIST when you have finished making it and before you start doing a lot of Position or \in tests on it. Provided you can find a hash function, this is strictly better.

fingolfin mentioned this issue Aug 4, 2017

Write up design ideas and specs #10

Open

fingolfin mentioned this issue Aug 11, 2017

Add a new common base type, as replacement for IsCollection ? #53

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ideas and plans for hash table / map / set #30

Ideas and plans for hash table / map / set #30

fingolfin commented Aug 3, 2017 •

edited

Loading

stevelinton commented Aug 4, 2017

Ideas and plans for hash table / map / set #30

Ideas and plans for hash table / map / set #30

Comments

fingolfin commented Aug 3, 2017 • edited Loading

stevelinton commented Aug 4, 2017

fingolfin commented Aug 3, 2017 •

edited

Loading