This tool could be improved in serious ways. As of right now, it's decent as a beginner's project: open addressing, linear probing, and a table size set to a large value that may or may not be prime. We do not resize right now; we rely on the initial table size being large enough. With these features in mind:
- Consider a tighter method of finding a prime size above our input size. The last time I checked on my Linux machine, our maximum input size was around 75,000. Skarupke [1] uses a list of primes to find the next valid size for the array, but also resizes. Skiena [2] mentions on p. 93 that one should use a large prime that is not too close to $2^i - 1$, but what $i$ is in the presented formula is not clear to me yet. I did not notice anything about sizing that stood out in chapter 6 (on hashing and randomized algorithms), but my math is weak enough that, if that information was there in a veiled or implied way, I missed it. We should first determine whether this can actually improve our memory usage, which we have not yet profiled. A firmer rule about the range of useful hash table sizes, from a more authoritative source, would tell us how far off we are. To test candidate sizes for primality, we could use the Miller-Rabin primality test [3] (see the sketch after this list).
- Consider the Robin Hood heuristic [4] for reorganization during linear probing (a sketch of the insertion step also follows below). Kahssay [5] expounds on its value (and in fact discusses how to improve on it and on hash tables in general, such as avoiding wraparound by keeping a buffer of spare slots at the end of the array). Our method, the standard one, is fine, but I suspect reorganization would be more valuable to us if our hash table were denser. More research is necessary.
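
Here is a minimal sketch of the sizing idea, written in C for illustration: pick the smallest prime at or above the requested capacity, testing candidates with a deterministic Miller-Rabin. The names `next_prime_size` and `is_prime` are hypothetical, not taken from our existing code; this assumes table sizes fit comfortably in 32 bits, which holds if our inputs top out around 75,000.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Compute (base^exp) mod m without overflow, assuming m < 2^32. */
static uint64_t pow_mod(uint64_t base, uint64_t exp, uint64_t m) {
    uint64_t result = 1;
    base %= m;
    while (exp > 0) {
        if (exp & 1)
            result = (result * base) % m;
        base = (base * base) % m;
        exp >>= 1;
    }
    return result;
}

/* Miller-Rabin with bases 2, 3, 5, 7: deterministic for n < 3,215,031,751,
 * which covers every 32-bit table size we would realistically use. */
static bool is_prime(uint32_t n) {
    static const uint32_t small[] = {2, 3, 5, 7};
    if (n < 2) return false;
    for (size_t i = 0; i < 4; i++) {
        if (n == small[i]) return true;
        if (n % small[i] == 0) return false;
    }

    uint32_t d = n - 1;
    int r = 0;
    while ((d & 1) == 0) { d >>= 1; r++; }   /* n - 1 = d * 2^r, d odd */

    for (size_t i = 0; i < 4; i++) {
        uint64_t x = pow_mod(small[i], d, n);
        if (x == 1 || x == n - 1) continue;
        bool composite = true;
        for (int j = 0; j < r - 1; j++) {
            x = (x * x) % n;
            if (x == n - 1) { composite = false; break; }
        }
        if (composite) return false;
    }
    return true;
}

/* Smallest prime at or above the requested capacity. */
static uint32_t next_prime_size(uint32_t want) {
    if (want <= 2) return 2;
    if ((want & 1) == 0) want++;          /* only try odd candidates */
    while (!is_prime(want)) want += 2;
    return want;
}
```

With bases 2, 3, 5, and 7 the test is fully deterministic below 3,215,031,751, so no probabilistic rounds are needed at our sizes; `next_prime_size(75000)` style calls are cheap because prime gaps in that range are tiny.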
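And a minimal sketch of Robin Hood insertion under linear probing, again in C, assuming a hypothetical slot layout (`key`, `value`, `occupied`) and integer keys; our actual slot type and hash function will differ. The rule is simply: whenever the entry already in a slot is closer to its home bucket than the entry we are inserting, swap them and keep probing with the displaced entry.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint32_t key;
    uint32_t value;
    bool     occupied;
} slot_t;

/* Distance of the entry in slot i from its home bucket, with wraparound. */
static uint32_t probe_dist(uint32_t home, uint32_t i, uint32_t cap) {
    return (i + cap - home) % cap;
}

/* Robin Hood insert of (key, value); assumes the table never fills up. */
static void rh_insert(slot_t *table, uint32_t cap,
                      uint32_t key, uint32_t value,
                      uint32_t (*hash)(uint32_t)) {
    uint32_t i = hash(key) % cap;
    uint32_t dist = 0;                  /* how far we are from key's home */

    for (;;) {
        if (!table[i].occupied) {       /* empty slot: take it */
            table[i] = (slot_t){ key, value, true };
            return;
        }
        if (table[i].key == key) {      /* existing key: update in place */
            table[i].value = value;
            return;
        }
        uint32_t home = hash(table[i].key) % cap;
        uint32_t existing = probe_dist(home, i, cap);
        if (existing < dist) {
            /* The resident is "richer" (closer to home) than us: swap,
             * then keep probing to reseat the displaced entry. */
            slot_t tmp = table[i];
            table[i] = (slot_t){ key, value, true };
            key = tmp.key;
            value = tmp.value;
            dist = existing;
        }
        i = (i + 1) % cap;
        dist++;
    }
}
```

Note this sketch keeps the usual wraparound via `% cap`; Kahssay's buffer-of-spare-slots idea would replace those wraparound steps and is worth evaluating separately.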
Footnotes
1. M. Skarupke, "I Wrote The Fastest Hashtable," 2017. https://probablydance.com/2017/02/26/i-wrote-the-fastest-hashtable
2. S. S. Skiena, "Hashing," The Algorithm Design Manual, 3rd ed., Springer, 2020, pp. 93–98.
3. K. Conrad, "The Miller–Rabin Test." https://kconrad.math.uconn.edu/blurbs/ugradnumthy/millerrabin.pdf
4. "Robin Hood Hashing," Cornell University CS course notes. https://www.cs.cornell.edu/courses/JavaAndDS/files/hashing_RobinHood.pdf
5. E. Kahssay, "A Fast Concurrent and Resizable Robin Hood Hash Table," thesis, 2021.