Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

iterate over hashes #120

Open
maartensandbox opened this issue Mar 25, 2023 · 2 comments
Open

iterate over hashes #120

maartensandbox opened this issue Mar 25, 2023 · 2 comments

Comments

@maartensandbox
Copy link

I have two dictionaries. I iterate over the keyvalue pairs of the first one, and if the second dict also contains that key then I assign it a new value.

It would be nice if I was able (maybe I already am?) to iterate over the (key,hash,value) pairs, and have a faster lookup in the second dict that avoids rehashing the key.

@andyferris
Copy link
Owner

andyferris commented Mar 25, 2023

Hmm, yes that is an interesting use case. The "inner join" and "left join" patterns are pretty common.

Note this is getting very low level. Maybe something like the "tokens" dictionary defined for hash-based dictionaries, that iterates (token, hash), could be helpful, together with an extension of gettoken where you provide the key and precomputed hash. For example, if keyword arguments are fast we enough we could try for a signature like gettoken(key; hash), or else create a new generic functions.

We also overwrite and utilize a bit or two of the hash for internal purposes, so we need to take care with that (ideally this would work between Dictionary and UnorderedDictionary without problems).

Do you see any other nice ways of implementing this?

@maartensandbox
Copy link
Author

One could also add a specialized gettoken that dispatches on Pair{I,Uint} for Indices{I}? It will anyway probably only be something implemented for Dictionary, as those are the only indices that currently store the hash. Although one could probably do something similar for unordereddictionary, the index reveals the first n bits of the hash, and a lookup in a different unordereddictionary can then sometimes avoid having to recompute the hash, but I don't think it's worth the hassle.

The internal bits can be somewhat confusing when iterating (token,hash), and then not actually having hash(token) == hash, but I don't think that's a real problem. When users start supplying their own hash to get faster lookups, then they probably will also have read the documentation, and the lookup function will anyway "&hashmask" whatever gets passed in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants