Skip to content

Conversation

@ravisastryk
Copy link

@ravisastryk ravisastryk commented Jan 5, 2026

Summary

Adds input validation and security hardening to ZlibMessageCompressor::decompressData() as defense-in-depth for CVE-2025-14847. Fixes metrics bug in counterHitDecompress().

Changes

  • Add minimum payload size check (≥8 bytes)
  • Add zlib header validation (CMF/FLG checksum)
  • Add zip bomb protection (max 1024:1 ratio)
  • Fix counterHitDecompress() to use actual length
  • Add additional regression tests to make sure there are no breaking changes

Testing

  • Existing tests pass
  • New regression tests pass
  • Valid compression unaffected

References

Contributor: Ravi Sastry Kadali (@ravisastryk)

@ravisastryk ravisastryk marked this pull request as ready for review January 5, 2026 02:05
@ravisastryk
Copy link
Author

@spencerjackson @BillyDonahue I have implemented few additional defense-in-depth hardening around zlib compression just to reinforce the CVE-2025-14847 fix. When you get a chance, please review so we can move forward with hardening MongoDB for wider adoption.

Comment on lines 133 to 134
// FIX: Use actual decompressed length for metrics
// Previous code incorrectly used output.length() (buffer size)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this fix. Actually this should be the headline and focus of its own PR and ticket.
History of what a line of code used to be is essentially irrelevant. Remove comments about it.
Just fix it and commit it.

Comment on lines 516 to 518
* CVE-2025-14847 defense-in-depth: minimum payload size validation
*/
TEST(ZlibMessageCompressor, CVE2025_14847_RejectsUndersizedPayload) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to just test to assert properties without referencing a CVE code.
The name (with prefix removed) describes the test sufficiently. No comment needed at all.

* Test: Excessive decompression ratio is rejected (zip bomb protection)
* CVE-2025-14847 defense-in-depth: ratio limit validation
*/
TEST(ZlibMessageCompressor, CVE2025_14847_RejectsZipBomb) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would an output buffer that happens to have a large capacity be cause for alarm?
The output capacity isn't determined by the user-specified payload AFAICT.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please answer this. I don't understand the problem.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, this ratio check creates false positives without adding meaningful security. I have removed it entirely.

@BillyDonahue
Copy link
Contributor

BillyDonahue commented Jan 5, 2026

To accept this code, we'd need to get a contributor's agreement from you.
https://github.com/mongodb/mongo/wiki#contributing-to-mongodb

- Update assertZlibDecompressBadValue to accept optional output buffer size parameter
- Simplify RejectsExcessiveDecompressionRatio test using the enhanced helper
- Eliminates code duplication, makes test intent clearer
- Use char* for header validation, hide Bytef abstraction
- Replace magic number 8 with Z_DEFLATED constant
- Remove numbered comments and braces from one-liner if statements
- Fix counterHitDecompress to use actual decompressed length (not buffer size)
- Compact error messages
@ravisastryk
Copy link
Author

To accept this code, we'd need to get a contributor's agreement from you. https://github.com/mongodb/mongo/wiki#contributing-to-mongodb
Thank you @BillyDonahue for reviewing the PR and offering suggestions.
I have signed contributor's agreement. Please re-review when you get a chance

Comment on lines 56 to 57
// Per RFC1950: validate first two bytes (CMF, FLG). CM (CMF & 0x0F) must be
// Z_DEFLATED (8) and (CMF*256 + FLG) must be a multiple of 31.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're saying that already below. I don't think this level of detail should be in the function docs.
We can not have any comment above this function, as its name is sufficient.

Comment on lines 52 to 53
// Maximum allowed decompression ratio to prevent zip bombs
// A ratio above 1024:1 is suspicious for typical MongoDB messages
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still disagree with this check. I don't understand the ratio being suspicious if it isn't controlled by incoming queries at all. Isn't output capacity just chosen by the calling function?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your reasoning makes complete sense. The output buffer size is allocated by the calling code based on message metadata, not controlled by malicious input. Checking the ratio creates a false sense of security and could cause false positives when legitimate buffers are reused or pre-allocated for worst-case scenarios. The real protection is already in place through header validation and minimum size checks. I am proceeding with removing decompression ratio check altogether.

Comment on lines 107 to 108
if (output.length() > input.length() * kMaxDecompressionRatio)
return Status{ErrorCodes::BadValue, "Decompression ratio too large"};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need an explanation of why this ratio should be enforced.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have a valid point. Than you for catching. The output buffer size is allocated by the calling code based on message metadata, not controlled by malicious input. Checking the ratio creates a false sense of security and could cause false positives when legitimate buffers are reused or pre-allocated for worst-case scenarios. The real protection is already in place through header validation and minimum size checks. We would not need to enforce this check.

ASSERT_EQ(ErrorCodes::BadValue, swm.getStatus());
}

// Helper: assert that decompression of payload fails with BadValue.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this line of the comment as it's restating the function name.
The following line should be commented with /** */ comments

void assertZlibDecompressBadValue(const std::vector<char>& payload, size_t outputSize = 1024) {
auto compressor = std::make_unique<ZlibMessageCompressor>();
ConstDataRange input(payload.data(), payload.size());

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no blank lines in this function. I'm not sure what they're delineating. Seems unnecessary.

Comment on lines 222 to 223
ASSERT_NOT_OK(result);
ASSERT_EQ(result.getStatus().code(), ErrorCodes::BadValue);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The callers should do the ASSERT calls.
Then they can determine what they want to check for.
The caller should also be given an opportunity to pass in an output vector (rather than just its size), so they can examine the decompressed data.

Then this function could be used for non-failure cases.
I think we just need something to swallow the decompression call syntax noise. But take a vector of input and produce a vector of output and a status and an output size.

If we did an output.resize(result.getValue()), that should be a way to get the output size back to the caller. We can just return a StatusWith<std::vector> output vector and they can see its .size().

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for highlighting this aspect. Totally makes sense.


TEST(ZlibMessageCompressor, RejectsUndersizedPayload) {
// Payload smaller than kMinZlibPayloadSize (8 bytes)
assertZlibDecompressBadValue(std::vector<char>{0x78, 0x9c, 0x03, 0x00});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need the vector type to be mentioned. Just use its initializer list constructor.

Suggested change
assertZlibDecompressBadValue(std::vector<char>{0x78, 0x9c, 0x03, 0x00});
assertZlibDecompressBadValue({0x78, 0x9c, 0x03, 0x00});

TEST(ZlibMessageCompressor, RejectsInvalidHeader) {
// Invalid header: wrong compression method (not deflate)
assertZlibDecompressBadValue(std::vector<char>{
0x70, 0x9c, // Invalid CMF (method != Z_DEFLATED)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CMF is the 0x70, right? So the comment should be next to the 0x70.
But I think the comment should probably move up above the statement, rather than interrupting the data.

}

TEST(ZlibMessageCompressor, RejectsInvalidHeader) {
// Invalid header: wrong compression method (not deflate)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redundant. omit


TEST(ZlibMessageCompressor, RejectsExcessiveDecompressionRatio) {
// Valid zlib header but request an unreasonably large output capacity (>1024x input).
std::vector<char> smallPayload = {0x78, 0x9c, 0x63, 0x60, 0x00, 0x00, 0x00, 0x02, 0x00, 0x01};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just inline this into the decompress call

}

TEST(ZlibMessageCompressor, RejectsEmptyPayload) {
assertZlibDecompressBadValue(std::vector<char>{});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assertZlibDecompressBadValue(std::vector<char>{});
assertZlibDecompressBadValue({});

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing assertZlibDecompressBadValue to zlibDecompress instead so that it returns StatusWith<size_t> for caller flexibility and callers can do their own assertions and can be leveraged for both success and failure cases.

DataRange decompressedRange(decompressed.data(), decompressed.size());

ConstDataRange compressedInput(compressed.data(), compressResult.getValue());
auto decompressResult = compressor->decompressData(compressedInput, decompressedRange);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have a helper function to handle this decompression.

@BillyDonahue
Copy link
Contributor

I'm generating a lot of feedback here. In the interest of time, I'm going to take this PR and make an internal one as a rebase of it. I'll omit the "large output buffer" check but keep the others.

@BillyDonahue BillyDonahue changed the title [SERVER-116125] Defense-in-depth hardening for zlib decompression (CVE-2025-14847) SERVER-116125 Add zlib decompression validation Jan 6, 2026
- Fix operator precedence bug in header validation (cmf & 0xf)
- Remove decompression ratio check per reviewer feedback
- Refactor test helper to return StatusWith for caller flexibility
- Clean up style: remove verbose comments, blank lines
- Use initializer list syntax in tests
@ravisastryk
Copy link
Author

@BillyDonahue Thank you for your thoughtful review and for taking the time. I have addressed the comments. Please re-review and let me know if you have any further suggestions. Also, feel free to commander and make any direct updates so we can collaborate.

@BillyDonahue
Copy link
Contributor

@BillyDonahue Thank you for your thoughtful review and for taking the time. I have addressed the comments. Please re-review and let me know if you have any further suggestions. Also, feel free to commander and make any direct updates so we can collaborate.

This looks good. I have put it with modifications into internal review and we should be able to commit soon. Thx.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants