This tool could be improved in serious ways. As of right now, it's decent as a beginner's project: open addressing, linear probing, and a table size set to a large value that may or may not be prime. We do not resize right now; we rely on the initial table size being large enough. With these features in mind:
- Consider a tighter method of finding a prime size above our input size. The last time I checked on my Linux machine, our maximum input size was around 75,000. Skarupke [1] uses a list of primes to find the next valid size for the array, but also resizes. Skiena [2] mentions on p. 93 that one should use a large prime that is not too close to $2^i - 1$, but what $i$ is in the presented formula is not clear to me yet. I did not notice anything about sizing that stood out in chapter 6 (on hashing and randomized algorithms), but my math is weak enough that, if that information was there in a veiled or implied way, I missed it. We should first determine whether this can actually improve our memory usage, which we have not yet profiled. A firmer rule about the range of useful hash table sizes, from a more authoritative source, would tell us how far off we are. To test candidate sizes for primality, we could use the Miller-Rabin primality test [3] (see the sketch after this list).
- Consider the Robin Hood heuristic [4] for reorganization during linear probing (a sketch of the insertion step also follows below). Kahssay [5] expounds on its value (and in fact discusses how to improve on it and on hash tables in general, such as avoiding wraparound by keeping a buffer of spare slots at the end of the array). Our method, the standard one, is fine, but I suspect reorganization would be more valuable to us if our hash table were denser. More research is necessary.
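
Here is a minimal sketch of the sizing idea, written in C for illustration: pick the smallest prime at or above the requested capacity, testing candidates with a deterministic Miller-Rabin. The names `next_prime_size` and `is_prime` are hypothetical, not taken from our existing code; this assumes table sizes fit comfortably in 32 bits, which holds if our inputs top out around 75,000.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Compute (base^exp) mod m without overflow, assuming m < 2^32. */
static uint64_t pow_mod(uint64_t base, uint64_t exp, uint64_t m) {
    uint64_t result = 1;
    base %= m;
    while (exp > 0) {
        if (exp & 1)
            result = (result * base) % m;
        base = (base * base) % m;
        exp >>= 1;
    }
    return result;
}

/* Miller-Rabin with bases 2, 3, 5, 7: deterministic for n < 3,215,031,751,
 * which covers every 32-bit table size we would realistically use. */
static bool is_prime(uint32_t n) {
    static const uint32_t small[] = {2, 3, 5, 7};
    if (n < 2) return false;
    for (size_t i = 0; i < 4; i++) {
        if (n == small[i]) return true;
        if (n % small[i] == 0) return false;
    }

    uint32_t d = n - 1;
    int r = 0;
    while ((d & 1) == 0) { d >>= 1; r++; }   /* n - 1 = d * 2^r, d odd */

    for (size_t i = 0; i < 4; i++) {
        uint64_t x = pow_mod(small[i], d, n);
        if (x == 1 || x == n - 1) continue;
        bool composite = true;
        for (int j = 0; j < r - 1; j++) {
            x = (x * x) % n;
            if (x == n - 1) { composite = false; break; }
        }
        if (composite) return false;
    }
    return true;
}

/* Smallest prime at or above the requested capacity. */
static uint32_t next_prime_size(uint32_t want) {
    if (want <= 2) return 2;
    if ((want & 1) == 0) want++;          /* only try odd candidates */
    while (!is_prime(want)) want += 2;
    return want;
}
```

With bases 2, 3, 5, and 7 the test is fully deterministic below 3,215,031,751, so no probabilistic rounds are needed at our sizes; `next_prime_size(75000)` style calls are cheap because prime gaps in that range are tiny.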
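And a minimal sketch of Robin Hood insertion under linear probing, again in C, assuming a hypothetical slot layout (`key`, `value`, `occupied`) and integer keys; our actual slot type and hash function will differ. The rule is simply: whenever the entry already in a slot is closer to its home bucket than the entry we are inserting, swap them and keep probing with the displaced entry.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint32_t key;
    uint32_t value;
    bool     occupied;
} slot_t;

/* Distance of the entry in slot i from its home bucket, with wraparound. */
static uint32_t probe_dist(uint32_t home, uint32_t i, uint32_t cap) {
    return (i + cap - home) % cap;
}

/* Robin Hood insert of (key, value); assumes the table never fills up. */
static void rh_insert(slot_t *table, uint32_t cap,
                      uint32_t key, uint32_t value,
                      uint32_t (*hash)(uint32_t)) {
    uint32_t i = hash(key) % cap;
    uint32_t dist = 0;                  /* how far we are from key's home */

    for (;;) {
        if (!table[i].occupied) {       /* empty slot: take it */
            table[i] = (slot_t){ key, value, true };
            return;
        }
        if (table[i].key == key) {      /* existing key: update in place */
            table[i].value = value;
            return;
        }
        uint32_t home = hash(table[i].key) % cap;
        uint32_t existing = probe_dist(home, i, cap);
        if (existing < dist) {
            /* The resident is "richer" (closer to home) than us: swap,
             * then keep probing to reseat the displaced entry. */
            slot_t tmp = table[i];
            table[i] = (slot_t){ key, value, true };
            key = tmp.key;
            value = tmp.value;
            dist = existing;
        }
        i = (i + 1) % cap;
        dist++;
    }
}
```

Note this sketch keeps the usual wraparound via `% cap`; Kahssay's buffer-of-spare-slots idea would replace those wraparound steps and is worth evaluating separately.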
Footnotes
1. M. Skarupke, "I Wrote The Fastest Hashtable," 2017. https://probablydance.com/2017/02/26/i-wrote-the-fastest-hashtable
2. S. S. Skiena, "Hashing," The Algorithm Design Manual, 3rd ed., Springer, 2020, pp. 93–98.
3. K. Conrad, "The Miller–Rabin Test." https://kconrad.math.uconn.edu/blurbs/ugradnumthy/millerrabin.pdf
4. "Robin Hood Hashing," Cornell University CS course notes. https://www.cs.cornell.edu/courses/JavaAndDS/files/hashing_RobinHood.pdf
5. E. Kahssay, "A Fast Concurrent and Resizable Robin Hood Hash Table," thesis, 2021.