Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bugfix] row hash tries to match position 0 #3548

Merged
merged 1 commit into from
Mar 13, 2023

Conversation

yoniko
Copy link
Contributor

@yoniko yoniko commented Mar 11, 2023

#3543 decreases the size of the tagTable by a factor of 2, which requires using the first tag position in each row for head position instead of a tag. Although position 0 stopped being a valid match, it still persisted in mask calculation resulting in the matches loops possibly terminating before it should have. The fix skips position 0 to solve this problem.

…ys invalid

facebook#3543 decreases the size of the tagTable by a factor of 2, which requires using the first tag position in each row for head position instead of a tag.
Although position 0 stopped being a valid match, it still persisted in mask calculation resulting in the matches loops possibly terminating before it should have.
The fix skips position 0 to solve this problem.
@yoniko yoniko force-pushed the bugfix-row-hash-pos0-match branch from 40f3edc to b636dfa Compare March 11, 2023 02:00
@yoniko yoniko changed the title [Bugfix] row hash match in position 0 is always invalid [Bugfix] row hash tries to match position 0 Mar 11, 2023
@yoniko yoniko requested a review from terrelln March 11, 2023 02:27
@Cyan4973
Copy link
Contributor

Any benchmark result for illustration ?

@yoniko
Copy link
Contributor Author

yoniko commented Mar 11, 2023

Any benchmark result for illustration ?

Performance impact is very minor (see updated spreadsheet with bugfix impact).

@Cyan4973
Copy link
Contributor

Cyan4973 commented Mar 11, 2023

I understand that the changes are very minor,
but trying to understand the impact :

Although position 0 stopped being a valid match, it still persisted in mask calculation resulting in the matches loops possibly terminating before it should have

This description implies that less matches were found due to the position 0 issue.
I would therefore expect the fix to translate into more matches found, which would, on average, lead to larger matches selected, with better compression ratio.

However, the regression test shows mostly compression ratio reduction ?

Another possible misunderstanding :
if position 0 results in the matches loop being terminated, and if position 0 is the first one being checked, it would result in the entire list of candidates being invalidated ? In which case, I would expect the impact on searches to be pretty significant, and compression ratio to be visibly impacted. But that's not the case, because differences are very minor. So is the explanation correct ?

which requires using the first tag position in each row for head position instead of a tag

Is the "1st tag position" the issue here ?
If yes, could the head counter occupy the last position instead ?

@yoniko
Copy link
Contributor Author

yoniko commented Mar 11, 2023

However, the regression test shows mostly compression ratio reduction ?

Both the regression tests and my benchmarks show mostly compression ratio increase or neutrality. However, the increase is very slight, most probably due to the probability of missing a significantly better match due to this bug being small on the input files.

position 0 is the first one being checked

Position 0 is never the first one to be checked, because it can never be the head position.

If yes, could the head counter occupy the last position instead ?

The same issue will persist, it will not change much as we will still use one tag slot for head position instead of an actual tag.

One last note is that one impact this bug had was that compression wasn't deterministic when tag space wasn't initialized, this issue is what caused me to find the bug and also helped verify that the fix is working properly.

@Cyan4973
Copy link
Contributor

The reproducibility argument is indeed very important, and should definitely be part of the top list of properties which justify this PR.

@yoniko yoniko merged commit a91e91d into facebook:dev Mar 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants