This is an autograder for an in-class exercise on hash functions that I created for CS 2510: Fundamentals of Computer Science 2 at Mills College at Northeastern University. Its purpose is to:
- illustrate how user-written hash functions affect hash set performance
- challenge students to think about what makes a good hash function for a given corpus
The recommended usage is:
- Going over the implementation of HashSet.java
with students, showing them how the
hashCode()
function is used. - Discussing the contract for
hashCode()
, which requiresequals()
objects to have equal hash codes. Point out that this is a legal implementation but that it will lead to maximal collisions, reducing the set table's performance to that of a linked list:
@Override
public int hashCode(){
return 42;
}
- Providing students with MyString.java and
challenge them to write a better hash function without calling any existing
implementations of
hashCode()
. (It is up to you whether to provide any examples of good hash functions or allow them access to online resources.) - Having students submit their files to a Gradescope assignment you create for immediate feedback on the number of collisions their function causes on a hidden (or visible) corpus.
- Sharing the implementations that perform the best, either in that class session or the next one.
Follow video instructions to create a Gradescope assignment, which can be summarized as follows:
- Download
autograder.zip
from the latest release or build your zip file by cloning this repository and running./make_autograder.sh
. - Create a Programming Assignment on Gradescope, enabling a leaderboard.
- Configuring the autograder, which includes:
- Selecting the latest Ubuntu image.
- Selecting JDK 17.
- Uploading
autograder.zip
. - Clicking on
Update Autograder
. - Uploading a test submission from a sample-submissions subdirectory.
Here are ideas for customizing the assignment:
- Replacing the corpus (currently a list of winners of SIGCSE prizes) with another corpus, such as the names of your students.
- De-generifying
HashSet
to simplify the code. - Varying the capacity of the hash set to distinguish among different types
of collisions, such as those due to the number buckets and those due to
identical
hashCode()
values.
Anyone interested in hash functions, especially for Java, should read Item 11: Always override hashCode when you override equals in Effective Java (3rd edition) by Joshua Bloch (Pearson, 2018).
The Java 1 String.hashCode()
implementation
is an interesting historical relic that has the curious
property of O(1) runtime.