Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Future enhancements #14

Open
mcauley-penney opened this issue Feb 20, 2023 · 0 comments
Open

Future enhancements #14

mcauley-penney opened this issue Feb 20, 2023 · 0 comments
Labels
enhancement New feature or request

Comments

@mcauley-penney
Copy link
Owner

This tool could be improved in serious ways. As of right now, its decent as a beginner's project: open addressing, linear probing, usage of a large value that may be prime as the size of the hash table. We do not use resizing right now and rely upon the hash table size to be enough. With these features in mind:

  1. consider using a tighter method of finding a prime size above our input size. Last I've seen on my Linux machine, our maximum input size will be around 75,000. Skarupke1 uses a list of primes to find the next valid size for the array, but also uses resizing. Skiena2 mentions on p. 93 that one should use a large prime that is not too close to $2^i - 1$, but what $i$ is in the presented formula isn't clear to me yet. I did not notice any information that stood out about sizing in chapter 6, about hashing and randomized algorithms but my math is so weak that, if that information was there in a veiled or implied way, I missed it. We should first determine if this can improve our memory usage, which we haven't profiled. A harder rule about the range of useful sizes of a hash table, provided by a higher source, would allow us to know how far off we are. To test prime values, we could use the Miller-Rabin primality test3.

  2. consider the Robin Hood heuristic4 for reorganization during linear probing. Kahssay5 expounds the value of it (and in fact discusses how to improve upon it and hash tables in general, such as avoiding wraparound by using a buffer of spaces at the end of the array). Our method, the standard way, is fine, but I suspect that reorganization would be more valuable to us if our hash table was more dense. More research is necessary.

Footnotes

  1. Malte Skarupke. I Wrote The Fastest Hashtable. 2016. url: https://probablydance.com/2017/02/26/i-wrote-the-fastest-hashtable.

  2. S. S. Skiena, “Hashing”, The Algorithm Design Manual, 3rd ed., Springer, 2020, pp. 93–98.

  3. https://kconrad.math.uconn.edu/blurbs/ugradnumthy/millerrabin.pdf

  4. https://www.cs.cornell.edu/courses/JavaAndDS/files/hashing_RobinHood.pdf

  5. E. Kahssay, “A Fast Concurrent and Resizable Robin Hood Hash Table,” thesis, 2021.

@mcauley-penney mcauley-penney added the enhancement New feature or request label Feb 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant