
Fix bad integer wraparound in repcode index for fast, dfast, lazy #2610

Merged 1 commit into facebook:dev on May 5, 2021

Conversation

senhuang42 (Contributor)

The recently improved fuzz coverage of overflow correction has discovered a long-standing bug in ZSTD_compressBlock_lazy_extDict_generic(). There's a bad condition that could cause underflows if a repcode existed in a different segment than ip.

The original comment says /* intentional overflow */, but that only correctly applies to the first condition tested; the overflow is not intentional for the second one.

The transformation, step-by-step, to show correctness (substituting repIndex = repCurrent - offset_2):
repIndex > windowLow
repCurrent - offset_2 > windowLow
-offset_2 > windowLow - repCurrent
offset_2 < repCurrent - windowLow

Test Plan:

  • Unit tests, check that this fixes OSS-Fuzz failure.

Cyan4973 (Contributor) commented May 4, 2021

Any measurable performance impact?

senhuang42 (Contributor, Author)

> Any measurable performance impact?

No, according to fullbench at level 5 on silesia:

dev: 12#compressContinue_extDict     :   120.2 MB/s  (63807944)
fix: 12#compressContinue_extDict     :   120.2 MB/s  (63807944)

terrelln (Contributor) commented May 4, 2021

@senhuang42 Does this bug exist in other extDict match finders?

senhuang42 (Contributor, Author)

> @senhuang42 Does this bug exist in other extDict match finders?

Looking at the ZSTD_fast one now, and it's there too (and is the cause of the oss-fuzz bug as well). I'll do a pass through all the matchfinders.

senhuang42 (Contributor, Author) commented May 4, 2021

This problem exists in zstd_fast.c, zstd_double_fast.c, and zstd_lazy.c.

I did many runs on ZSTD_fast at level 1, and the general pattern suggests the fix may have caused a very minor slowdown, though I'm not entirely sure. dfast and lazy pretty consistently showed essentially no speed change.

Benchmarks to check for speed perturbations:

ZSTD_fast (level 1):

dev:
12#compressContinue_extDict     :   346.2 MB/s  (73515441)
25,401,048,003      cycles
54,934,433,837      instructions
fix:
12#compressContinue_extDict     :   345.1 MB/s  (73515441)
25,473,392,276      cycles
54,950,999,865      instructions

ZSTD_dfast (level 3):
dev: 12#compressContinue_extDict     :   177.0 MB/s (66685466) 
fix: 12#compressContinue_extDict     :   177.4 MB/s  (66685466) 

ZSTD_greedy (level 5):
dev: 12#compressContinue_extDict     :   120.0 MB/s  (63807944)
fix: 12#compressContinue_extDict     :   120.5 MB/s  (63807944)

senhuang42 force-pushed the lazy_underflow_fix branch 3 times, most recently from 0a6e67a to 8baf068, on May 4, 2021 21:12
senhuang42 changed the title from "Fix bad integer wraparound in lazy algorithm" to "Fix bad integer wraparound in repcode index for fast, dfast, lazy" on May 4, 2021
Cyan4973 (Contributor) commented May 4, 2021

The speed difference is very minor and entirely acceptable,
but what scenario was employed for the benchmark?
Does it ensure usage of extDict, like (for example) dictionary compression?

terrelln (Contributor) commented May 4, 2021

Why are you comparing offset_1 - 1 in most places, but offset_1 in zstd_lazy.c, and always offset_2?

I interpret comparing offset_1 - 1 as a way to ensure that offset_1 == 0 fails the comparison. But wouldn't that also apply to offset_2?

If you end up going the offset_1 - 1 and offset_2 - 1 route, you could check for too-large offsets at the beginning of the block, like the single-segment functions do, and set those offsets to zero.

senhuang42 (Contributor, Author) commented May 4, 2021

> Does it ensure usage of extDict, like (for example) dictionary compression?

I'm using fullbench with silesia.tar. I'm directly testing compressContinue_extDict() which calls ZSTD_compressBlock_*_extDict_generic() and definitely hits the changed code according to logs. Would that be sufficient to measure speed changes, or is there something else about dictionaries in particular?

> Why are you comparing offset_1 - 1 in most places? And in zstd_lazy.c using offset_1. And always comparing offset_2.

I was directly converting from the previous repIndex expression to try to keep the same behavior. So if repIndex = curr+1 - offset_1, we'd compare against offset_1 - 1, and sometimes repIndex = curr - offset_1, in which case we'd compare against offset_1; same with offset_2. Is that reasonable?

I guess my actual question then is: why is repIndex = curr+1 - offset_1; sometimes, and repIndex = curr - offset_1; at other times?

Cyan4973 (Contributor) commented May 4, 2021

> I'm directly testing compressContinue_extDict() which (...) definitely hits the changed code according to logs. Would that be sufficient (...)?

Yes, totally

terrelln (Contributor) commented May 4, 2021

> I guess my actual question then is: why is repIndex = curr+1 - offset_1; sometimes, and repIndex = curr - offset_1; at other times?

Ah good point, sometimes we search at current+1 to "favor" repcodes. The conversion to offset_1 - 1 is a bit confusing to me, since it isn't super clear that it is because we are searching one position ahead.

Could you add some comments explaining why we are comparing the offsets instead of the indices?

senhuang42 (Contributor, Author)

Ah, makes sense. I'll just change the comparison to curr+1 instead since that conveys the idea better, and add a comment.

terrelln (Contributor) left a comment

This looks correct to me, thanks for fixing this! Just one minor comment for consistency, and make sure the tests pass, and it looks good to me!

Review comment on the diff:

  && (MEM_read32(repMatch) == MEM_read32(ip+1)) ) {
- const BYTE* repMatchEnd = repIndex < prefixStartIndex ? dictEnd : iend;
+ const BYTE* repMatchEnd = offset_1 > curr+1 - prefixStartIndex ? dictEnd : iend;
terrelln (Contributor)
This shouldn't be necessary to change, here and elsewhere. We should only need to change the comparison on line 412. If the comparison succeeds, then we know it won't underflow.

Also repBase and repMatch are calculated using repIndex so it is a bit confusing to see repMatchEnd calculated differently.

terrelln (Contributor) left a comment

LGTM assuming the tests all pass!

terrelln (Contributor) commented May 5, 2021

I'm going to go ahead and merge this so OSS-Fuzz can pick up these changes in the morning and continue fuzzing.

@terrelln terrelln merged commit d40f55c into facebook:dev May 5, 2021
4 participants