Hi, I read issue #71, "Creating test set with hash," and I have just one question about your explanation.
When hashing, only the last byte of the hash is checked to decide whether an instance belongs to the test set. Yes, the whole hash is a unique value (unless a collision happens), but only its last byte, a value from 0 to 255, determines membership in the test set. So are you saying that, because the hashing algorithm produces a "uniform distribution," about 20% of the last-byte values will be less than 51 (20% of 256)?
Thank you,
Tom
BTW, I purchased your book. Love it so far.
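The last-byte test Tom describes can be sketched as follows. This is a hypothetical helper, not the book's code: it assumes MD5 as the hash function and hashes each integer identifier's decimal string. It then checks empirically that roughly 51/256 of the identifiers land in the test set.

```python
import hashlib

def in_test_set_last_byte(identifier: int, test_ratio: float = 0.2) -> bool:
    # Hypothetical helper: assumes MD5 over the identifier's decimal string.
    digest = hashlib.md5(str(identifier).encode("utf-8")).digest()
    last_byte = digest[-1]  # indexing bytes yields an int in 0..255
    # With test_ratio=0.2 the threshold is int(51.2) = 51, so bytes 0..50
    # qualify: 51 of 256 values, about 19.92% if the byte is uniform.
    return last_byte < int(256 * test_ratio)

ids = range(100_000)
fraction = sum(in_test_set_last_byte(i) for i in ids) / len(ids)
print(f"{fraction:.4f}")  # should be close to 51/256
```

Because the decision depends only on the identifier, the same instance always lands on the same side of the split, even after new rows are added to the dataset.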
Thanks for your question, and for your kind words (I'm very glad you enjoy my book!).
You guessed right: I'm assuming that the last byte of the hash follows a uniform distribution over all possible byte values, so about 20% of those values will be lower than 51, since 20% is about 51/256. Note that 51/256 = 19.92%, while 52/256 = 20.31%: because 256 × 0.2 = 51.2 is not an integer, there's no way to get exactly 20% with a single byte. If this granularity is not sufficient, you could convert the whole hash to a very large integer and check whether it's smaller than 20% of the maximum possible value. I felt that the added complexity wasn't worth the effort, but since this code has confused quite a few readers, I'm not sure that was a good call.
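A minimal sketch of the whole-hash alternative, assuming MD5 over the identifier's decimal string (the helper name is hypothetical). All 16 digest bytes are read as one unsigned integer, so the threshold can be placed at 20% of a 2**128-value range instead of at one of only 256 steps.

```python
import hashlib

def in_test_set_full_hash(identifier: int, test_ratio: float = 0.2) -> bool:
    # Hypothetical helper: assumes MD5 over the identifier's decimal string.
    digest = hashlib.md5(str(identifier).encode("utf-8")).digest()
    # Interpret all 16 bytes as one unsigned integer in [0, 2**128).
    value = int.from_bytes(digest, byteorder="big")
    max_value = 2 ** (8 * len(digest))  # number of possible values: 2**128
    # Threshold at test_ratio of the full range (step size 1/2**128, not 1/256).
    threshold = int(test_ratio * max_value)
    return value < threshold

# A single byte cannot hit 20% exactly: 256 * 0.2 = 51.2 is not an integer.
print(51 / 256, 52 / 256)  # 0.19921875 0.203125

ids = range(100_000)
fraction = sum(in_test_set_full_hash(i) for i in ids) / len(ids)
print(f"{fraction:.4f}")  # should be close to 0.2
```

The extra cost is small: one `int.from_bytes` call per instance. With 100,000 instances, however, random sampling noise (roughly ±0.1%) already exceeds the 0.08% gap between 51/256 and 20%, which supports the view that one byte is usually good enough.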