-
Notifications
You must be signed in to change notification settings - Fork 5.7k
SERVER-116125 Add zlib decompression validation #1628
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
SERVER-116125 Add zlib decompression validation #1628
Conversation
|
@spencerjackson @BillyDonahue I have implemented few additional defense-in-depth hardening around zlib compression just to reinforce the CVE-2025-14847 fix. When you get a chance, please review so we can move forward with hardening MongoDB for wider adoption. |
| // FIX: Use actual decompressed length for metrics | ||
| // Previous code incorrectly used output.length() (buffer size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this fix. Actually this should be the headline and focus of its own PR and ticket.
History of what a line of code used to be is essentially irrelevant. Remove comments about it.
Just fix it and commit it.
| * CVE-2025-14847 defense-in-depth: minimum payload size validation | ||
| */ | ||
| TEST(ZlibMessageCompressor, CVE2025_14847_RejectsUndersizedPayload) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to just test to assert properties without referencing a CVE code.
The name (with prefix removed) describes the test sufficiently. No comment needed at all.
| * Test: Excessive decompression ratio is rejected (zip bomb protection) | ||
| * CVE-2025-14847 defense-in-depth: ratio limit validation | ||
| */ | ||
| TEST(ZlibMessageCompressor, CVE2025_14847_RejectsZipBomb) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would an output buffer that happens to have a large capacity be cause for alarm?
The output capacity isn't determined by the user-specified payload AFAICT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please answer this. I don't understand the problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, this ratio check creates false positives without adding meaningful security. I have removed it entirely.
|
To accept this code, we'd need to get a contributor's agreement from you. |
- Update assertZlibDecompressBadValue to accept optional output buffer size parameter - Simplify RejectsExcessiveDecompressionRatio test using the enhanced helper - Eliminates code duplication, makes test intent clearer
- Use char* for header validation, hide Bytef abstraction - Replace magic number 8 with Z_DEFLATED constant - Remove numbered comments and braces from one-liner if statements - Fix counterHitDecompress to use actual decompressed length (not buffer size) - Compact error messages
|
| // Per RFC1950: validate first two bytes (CMF, FLG). CM (CMF & 0x0F) must be | ||
| // Z_DEFLATED (8) and (CMF*256 + FLG) must be a multiple of 31. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're saying that already below. I don't think this level of detail should be in the function docs.
We can not have any comment above this function, as its name is sufficient.
| // Maximum allowed decompression ratio to prevent zip bombs | ||
| // A ratio above 1024:1 is suspicious for typical MongoDB messages |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still disagree with this check. I don't understand the ratio being suspicious if it isn't controlled by incoming queries at all. Isn't output capacity just chosen by the calling function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your reasoning makes complete sense. The output buffer size is allocated by the calling code based on message metadata, not controlled by malicious input. Checking the ratio creates a false sense of security and could cause false positives when legitimate buffers are reused or pre-allocated for worst-case scenarios. The real protection is already in place through header validation and minimum size checks. I am proceeding with removing decompression ratio check altogether.
| if (output.length() > input.length() * kMaxDecompressionRatio) | ||
| return Status{ErrorCodes::BadValue, "Decompression ratio too large"}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need an explanation of why this ratio should be enforced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have a valid point. Than you for catching. The output buffer size is allocated by the calling code based on message metadata, not controlled by malicious input. Checking the ratio creates a false sense of security and could cause false positives when legitimate buffers are reused or pre-allocated for worst-case scenarios. The real protection is already in place through header validation and minimum size checks. We would not need to enforce this check.
| ASSERT_EQ(ErrorCodes::BadValue, swm.getStatus()); | ||
| } | ||
|
|
||
| // Helper: assert that decompression of payload fails with BadValue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this line of the comment as it's restating the function name.
The following line should be commented with /** */ comments
| void assertZlibDecompressBadValue(const std::vector<char>& payload, size_t outputSize = 1024) { | ||
| auto compressor = std::make_unique<ZlibMessageCompressor>(); | ||
| ConstDataRange input(payload.data(), payload.size()); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no blank lines in this function. I'm not sure what they're delineating. Seems unnecessary.
| ASSERT_NOT_OK(result); | ||
| ASSERT_EQ(result.getStatus().code(), ErrorCodes::BadValue); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The callers should do the ASSERT calls.
Then they can determine what they want to check for.
The caller should also be given an opportunity to pass in an output vector (rather than just its size), so they can examine the decompressed data.
Then this function could be used for non-failure cases.
I think we just need something to swallow the decompression call syntax noise. But take a vector of input and produce a vector of output and a status and an output size.
If we did an output.resize(result.getValue()), that should be a way to get the output size back to the caller. We can just return a StatusWith<std::vector> output vector and they can see its .size().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for highlighting this aspect. Totally makes sense.
|
|
||
| TEST(ZlibMessageCompressor, RejectsUndersizedPayload) { | ||
| // Payload smaller than kMinZlibPayloadSize (8 bytes) | ||
| assertZlibDecompressBadValue(std::vector<char>{0x78, 0x9c, 0x03, 0x00}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need the vector type to be mentioned. Just use its initializer list constructor.
| assertZlibDecompressBadValue(std::vector<char>{0x78, 0x9c, 0x03, 0x00}); | |
| assertZlibDecompressBadValue({0x78, 0x9c, 0x03, 0x00}); |
| TEST(ZlibMessageCompressor, RejectsInvalidHeader) { | ||
| // Invalid header: wrong compression method (not deflate) | ||
| assertZlibDecompressBadValue(std::vector<char>{ | ||
| 0x70, 0x9c, // Invalid CMF (method != Z_DEFLATED) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CMF is the 0x70, right? So the comment should be next to the 0x70.
But I think the comment should probably move up above the statement, rather than interrupting the data.
| } | ||
|
|
||
| TEST(ZlibMessageCompressor, RejectsInvalidHeader) { | ||
| // Invalid header: wrong compression method (not deflate) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
redundant. omit
|
|
||
| TEST(ZlibMessageCompressor, RejectsExcessiveDecompressionRatio) { | ||
| // Valid zlib header but request an unreasonably large output capacity (>1024x input). | ||
| std::vector<char> smallPayload = {0x78, 0x9c, 0x63, 0x60, 0x00, 0x00, 0x00, 0x02, 0x00, 0x01}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just inline this into the decompress call
| } | ||
|
|
||
| TEST(ZlibMessageCompressor, RejectsEmptyPayload) { | ||
| assertZlibDecompressBadValue(std::vector<char>{}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| assertZlibDecompressBadValue(std::vector<char>{}); | |
| assertZlibDecompressBadValue({}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing assertZlibDecompressBadValue to zlibDecompress instead so that it returns StatusWith<size_t> for caller flexibility and callers can do their own assertions and can be leveraged for both success and failure cases.
| DataRange decompressedRange(decompressed.data(), decompressed.size()); | ||
|
|
||
| ConstDataRange compressedInput(compressed.data(), compressResult.getValue()); | ||
| auto decompressResult = compressor->decompressData(compressedInput, decompressedRange); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have a helper function to handle this decompression.
|
I'm generating a lot of feedback here. In the interest of time, I'm going to take this PR and make an internal one as a rebase of it. I'll omit the "large output buffer" check but keep the others. |
- Fix operator precedence bug in header validation (cmf & 0xf) - Remove decompression ratio check per reviewer feedback - Refactor test helper to return StatusWith for caller flexibility - Clean up style: remove verbose comments, blank lines - Use initializer list syntax in tests
|
@BillyDonahue Thank you for your thoughtful review and for taking the time. I have addressed the comments. Please re-review and let me know if you have any further suggestions. Also, feel free to commander and make any direct updates so we can collaborate. |
This looks good. I have put it with modifications into internal review and we should be able to commit soon. Thx. |
Summary
Adds input validation and security hardening to
ZlibMessageCompressor::decompressData()as defense-in-depth for CVE-2025-14847. Fixes metrics bug incounterHitDecompress().Changes
counterHitDecompress()to use actual lengthTesting
References
Contributor: Ravi Sastry Kadali (@ravisastryk)