Creating a test set with a hash (Issue 71 was closed) #613

minertom · 2021-01-05T00:43:53Z

Hi, I did read issue #71 "Creating test set with hash" and I only had one question concerning your explanation.

During the hashing, only the last byte of the hash is used to decide whether a given instance belongs to the test set. Yes, the whole hash is a unique value (unless a collision happens), but only the last byte (0-255) determines membership in the test set. So, are you saying that because the hashing algorithm provides a "uniform distribution", about 20% of the last-byte values will be less than 51 (20% of 256)?

Thank You
Tom

BTW, I purchased your book. Love it so far.

ageron · 2021-05-04T03:57:15Z

Hi @minertom ,

Thanks for your question, and for your kind words (I'm very glad you enjoy my book!).

You guessed right: I'm assuming that the last byte of the hash follows a uniform distribution over all possible byte values, so about 20% will be lower than 51, since 20% is about 51/256. Note that 51/256=19.92%, while 52/256=20.31%, so there's no easy way to get precisely 20% with just one byte. If this granularity is not sufficient, you could convert the whole hash to a very large integer, and check whether it's smaller than 20% of the max possible value. I felt that the added complexity wasn't worth the effort, but as this code has confused quite a few readers, I'm not sure that was a good call.
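To make the two options concrete, here is a minimal sketch of both checks. It is only an illustration under stated assumptions, not necessarily the exact code from the book: the function names `last_byte_in_test_set` and `full_hash_in_test_set` are hypothetical, MD5 is an arbitrary choice of hash, and identifiers are assumed to be integers that fit in 8 bytes.

```python
import hashlib


def _digest(identifier: int) -> bytes:
    # Assumption: identifiers are integers that fit in 8 signed bytes.
    return hashlib.md5(int(identifier).to_bytes(8, "little", signed=True)).digest()


def last_byte_in_test_set(identifier: int, test_ratio: float = 0.2) -> bool:
    # One-byte check: only 256 possible values, so the achievable ratios
    # move in steps of 1/256 (51/256 = 19.92%, 52/256 = 20.31%).
    # With test_ratio=0.2 the threshold here is 51.2, so bytes 0..51 pass.
    return _digest(identifier)[-1] < 256 * test_ratio


def full_hash_in_test_set(identifier: int, test_ratio: float = 0.2) -> bool:
    # Finer-grained check: read the whole digest as one large integer and
    # compare it with test_ratio times the number of possible digest values.
    digest = _digest(identifier)
    value = int.from_bytes(digest, "big")
    return value < test_ratio * 2 ** (8 * len(digest))


if __name__ == "__main__":
    ids = range(10_000)
    print(sum(last_byte_in_test_set(i) for i in ids) / len(ids))
    print(sum(full_hash_in_test_set(i) for i in ids) / len(ids))
```

Both functions are deterministic, so a given identifier always lands on the same side of the split. On a large set of identifiers, both fractions should come out close to 20%, with the one-byte version limited to multiples of 1/256.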

Anyway, I hope it's clearer now?
