[v24.2.x] compression: correct endianness in snappy_java_compressor (Manual backport) #25112

Merged

@WillemKauf WillemKauf commented Feb 19, 2025

Cherry-pick conflict in setup.py.

Also removed test_upgrade_java_compression from java_compression_test.py in backports.

Closes issue #25107

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

Bug Fixes

  • Fix the endianness of snappy_java_compressor headers to match that of snappy-java.

The versions in the snappy header are written using big-endian format in the
`snappy-java` client used by kafka.

Mistakenly, `redpanda` would write them using little-endian format in our
`snappy_java_compressor` implementation.

Correct this by encoding and decoding the `version` and `compatible_version`
headers using big-endian format in `snappy_java_compressor`.

For references to `snappy-java`'s big-endian implementation, see:

* https://github.com/xerial/snappy-java/blob/65e1ec3de1a0d447b137c6dd6393629aa3d75b8b/src/main/java/org/xerial/snappy/SnappyOutputStream.java#L343-L349
* https://github.com/xerial/snappy-java/blob/65e1ec3de1a0d447b137c6dd6393629aa3d75b8b/src/main/java/org/xerial/snappy/SnappyCodec.java#L78-L81
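
The difference between the two encodings can be illustrated with a minimal Python sketch. It is not the `redpanda` implementation: the 8-byte magic value and the version numbers (both `1`) are assumptions based on `snappy-java`'s `SnappyCodec` and should be checked against the sources linked above, and `encode_header` is a hypothetical helper.

```python
import struct

# Assumed header layout: 8-byte magic followed by two 32-bit integers
# (version, compatible_version). Values are assumptions, see lead-in.
MAGIC = b"\x82SNAPPY\x00"
VERSION = 1
COMPATIBLE_VERSION = 1


def encode_header(byte_order: str) -> bytes:
    """Build a 16-byte header; byte_order is ">" (big) or "<" (little)."""
    return MAGIC + struct.pack(byte_order + "ii", VERSION, COMPATIBLE_VERSION)


fixed = encode_header(">")   # big-endian, as snappy-java writes it
legacy = encode_header("<")  # little-endian, as written before this fix

print(fixed[8:].hex())   # 0000000100000001
print(legacy[8:].hex())  # 0100000001000000
```

Only the eight bytes after the magic differ between the two headers; the compressed chunks that follow are unaffected by this change.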

(cherry picked from commit 1c1b006)

Most `snappy` clients do not perform this version check, and furthermore,
it is implemented incorrectly here.

(cherry picked from commit 5723eb4)
(cherry picked from commit 72d02ee)

The two committed files in `snappy_payload` are a raw uncompressed data file,
and a `snappy` compressed data file generated by `redpanda` using the
incorrect little-endian encoding for the version fields in the `snappy`
header.

They are used in a unit test to ensure that with the big-endian fix for
`snappy`, we are still able to decompress the buffer and get the same
decompressed data as before the fix.
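
A rough sketch of why a legacy payload can still be read is shown here. It assumes the reader validates only the magic bytes and does not check the version fields; `parse_header` is a hypothetical helper, not the function in `snappy_java_compressor`, and the magic value is assumed from `snappy-java`.

```python
import struct

MAGIC = b"\x82SNAPPY\x00"    # assumed snappy-java magic bytes
HEADER_LEN = len(MAGIC) + 8  # magic + version + compatible_version


def parse_header(buf: bytes) -> tuple:
    """Return (version, compatible_version, payload_offset).

    The version fields are decoded as big-endian but deliberately not
    validated, so a legacy little-endian header is still accepted.
    """
    if len(buf) < HEADER_LEN or not buf.startswith(MAGIC):
        raise ValueError("not a snappy-java framed buffer")
    version, compatible = struct.unpack_from(">ii", buf, len(MAGIC))
    return version, compatible, HEADER_LEN


fixed = MAGIC + struct.pack(">ii", 1, 1)   # header written after the fix
legacy = MAGIC + struct.pack("<ii", 1, 1)  # header written before the fix

print(parse_header(fixed))   # (1, 1, 16)
print(parse_header(legacy))  # (16777216, 16777216, 16)
```

A legacy header decodes to a nonsensical version number when read as big-endian, so a strict version comparison would reject it. In this sketch the payload offset is the same in both cases, which is what lets decompression proceed.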

(cherry picked from commit a84252d)

In order to allow `kafka-python` to use these compression types,
we must be able to import the respective module.
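
One way to confirm that the codec modules are importable in a test environment is sketched below. The module names are assumptions about which packages back each compression type and may differ from what `setup.py` installs; `available_codecs` is a hypothetical helper.

```python
import importlib.util

# Assumed top-level module names for each compression type;
# adjust to match the packages actually installed.
CODEC_MODULES = {
    "gzip": "gzip",       # standard library
    "snappy": "snappy",
    "lz4": "lz4",
    "zstd": "zstandard",
}


def available_codecs() -> dict:
    """Map each compression type to whether its module can be found."""
    return {
        codec: importlib.util.find_spec(module) is not None
        for codec, module in CODEC_MODULES.items()
    }


for codec, ok in available_codecs().items():
    print(f"{codec}: {'importable' if ok else 'missing'}")
```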

(cherry picked from commit 17a2e55)

These tests check compression compatibility with Java-based Kafka
consumers/producers.

They are parameterized for all compression types, but they most notably
serve as reproducers for an outstanding header-field encoding bug in
`snappy_java_compressor.cc`.

(cherry picked from commit 379f380)

@vbotbuildovich

CI test results

test results on build#61991

| test_id | test_kind | job_url | test_status | passed |
| --- | --- | --- | --- | --- |
| gtest_raft_rpunit.gtest_raft_rpunit | unit | https://buildkite.com/redpanda/redpanda/builds/61991#01951bf8-3c6e-448e-bb6b-3847c823e188 | FLAKY | 1/2 |

@WillemKauf WillemKauf enabled auto-merge February 19, 2025 17:55
@lf-rep lf-rep disabled auto-merge February 20, 2025 00:20
@lf-rep lf-rep merged commit 855abeb into redpanda-data:v24.2.x Feb 20, 2025
17 of 20 checks passed