[Bugfix] row hash tries to match position 0 #3548
40f3edc to b636dfa
Are there any benchmark results to illustrate the impact?
The performance impact is very minor (see the updated spreadsheet with the bugfix impact).
I understand that the changes are very minor,
This description implies that fewer matches were found because of the position 0 issue. However, the regression test shows mostly a reduction in compression ratio? Another possible misunderstanding:
Is the "1st tag position" the issue here?
Both the regression tests and my benchmarks show the compression ratio mostly increasing or staying neutral. The increase is very slight, though, most probably because on these input files the bug rarely caused a significantly better match to be missed.
Position 0 is never the first one to be checked, because it can never be the head position.
The same issue will persist. It will not change much, as we will still use one tag slot for the head position instead of an actual tag. One last note: one impact of this bug was that compression wasn't deterministic when the tag space wasn't initialized. That is what led me to find the bug, and it also helped verify that the fix works properly.
The reproducibility argument is indeed very important, and should definitely be near the top of the list of properties that justify this PR.
#3543 decreases the size of the tagTable by a factor of 2, which requires using the first tag position in each row to store the head position instead of a tag. Although position 0 stopped being a valid match position, it was still included in the mask calculation, so the match loop could terminate earlier than it should have. The fix skips position 0 to solve this problem.
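To illustrate the description above, here is a minimal sketch of a match loop that skips position 0. It is not the zstd implementation: `ROW_ENTRIES`, `row_match_mask`, `collect_candidates`, and the attempt budget are hypothetical names and simplifications chosen for this example. The assumption is that slot 0 of a row holds the head position, so a tag comparison against it can succeed by coincidence and must not be treated as a match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { ROW_ENTRIES = 16 };  /* hypothetical row width, for illustration only */

/* Bit i is set when row[i] equals tag. Slot 0 is compared too, to mimic a
 * mask computed over the whole row, including the head-position slot. */
static uint32_t row_match_mask(const uint8_t row[ROW_ENTRIES], uint8_t tag)
{
    uint32_t mask = 0;
    for (int i = 0; i < ROW_ENTRIES; i++) {
        if (row[i] == tag) mask |= (uint32_t)1 << i;
    }
    return mask;
}

/* Walk the set bits of mask and write candidate slots to out.
 * Slot 0 stores the head position rather than a tag, so it is skipped
 * without consuming an attempt. Returns the number of candidates written. */
static size_t collect_candidates(uint32_t mask, int max_attempts,
                                 int out[ROW_ENTRIES])
{
    size_t n = 0;
    assert(max_attempts >= 0);
    for (int slot = 0; slot < ROW_ENTRIES && max_attempts > 0; slot++) {
        if (!(mask & ((uint32_t)1 << slot))) continue;  /* tag did not match */
        if (slot == 0) continue;  /* head-position slot: never a valid match */
        out[n++] = slot;
        max_attempts--;
    }
    return n;
}
```

In this sketch, if the value in slot 0 happens to equal the tag and the budget is two attempts, the loop still reports the two real candidates. Without the `slot == 0` check, the spurious hit would use up one of the attempts and a real candidate would be dropped.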