Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ideas and plans for hash table / map / set #30

Open
fingolfin opened this issue Aug 3, 2017 · 1 comment
Open

Ideas and plans for hash table / map / set #30

fingolfin opened this issue Aug 3, 2017 · 1 comment

Comments

@fingolfin
Copy link
Member

fingolfin commented Aug 3, 2017

For the future, I'd like to rewrite the hash set/map/table code once again. Some of the things I'd like to do in the new new code would include:

Also,

  • collect (more) statistics e.g. about collisions -- this is very helpful for debugging, and helps identify bad hash functions and also bugs in the hash table code
  • experiment with the load factor at which we resize the table (currently hard coded to 70 percent in the C implementation)
  • our open addressing currently uses this probe function (with PERTURB_SHIFT equal to 5):
    perturb = INT_INTOBJ(hash);
    idx = perturb & mask;
...
        idx = (5 * idx + perturb + 1) & mask;
        perturb >>= PERTURB_SHIFT;

Experiment with others, e.g. linear or quadratic probing

More ideas and TODOs:

  • implement iterators: KeysIterator, ValuesIterator, KeyValuesIterator (the last one would return pairs [key, value])
  • use one of them to implement for x in hashmap do ... -- I think this should iterate over keys
  • make a HashSet -- this could use almost the exact same code, it would simply omit the values plist
  • perhaps provide functions to get all keys or values as plists (but don't just return the internal lists!! instead, return compacted copies)

UPDATE: more stuff:

  • add a LookupWithDefault(hashmap, key, default), which works like the usual lookup, but if key is not present in the hashmap, it returns default.

  • provide a high-level interface for DS_Hash_AccumulateValue

@stevelinton
Copy link
Contributor

@fingolfin mentioned yesterday the idea of an hashmap which also preserves the order in which elements were added. As I understood it this essentially a PLIST to which new elements were added at the end, combined with a hashtable storing indices into the PLIST.

A few thoughts about this:

The way to present this at GAP level is probably as a List with Add but no assignment and a super-fast Position method (and consequently also an \in method). That very neatly meets the needs of orbit algorithms.

If you don't plan to delete much then this approach may always be correct. In the hash table you store the index of the key-value pair (if you have values) and some bits of the hash value of the key. Depending on the size of the hash table, you could get all of that into 32 or 64 bits per entry for all but the most enormous tables. With linear probing and Robin Hood hashing, for instance, you will basically need just one cache line from the hash table plus one from the PLIST plus whatever you have to do to compare entries for almost all lookups.

Making one of these would replace the common idiom of sorting a PLIST when you have finished making it and before you start doing a lot of Position or \in tests on it. Provided you can find a hash function, this is strictly better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants