[v24.1.x] compression: correct endianness in snappy_java_compressor (Manual backport) #25137

Merged

WillemKauf
Contributor

Cherry-pick conflict in setup.py and bazel BUILD files.

Also removed test_upgrade_java_compression from java_compression_test.py in backports.

Closes issue #25136.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

Bug Fixes

  • Fix the endianness of snappy_java_compressor headers to match that of snappy-java.

The versions in the snappy header are written using big-endian format in the
`snappy-java` client used by kafka.

`redpanda` mistakenly wrote them in little-endian format in its
`snappy_java_compressor` implementation.

Correct this by encoding and decoding the `version` and `compatible_version`
headers using big-endian format in `snappy_java_compressor`.
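
As a rough illustration of the byte-order difference described above, here is a minimal Python sketch. The magic bytes and the two 32-bit fields are assumptions based on the linked `SnappyCodec` source; `encode_header` and `decode_versions` are hypothetical helper names, not part of `redpanda` or `snappy-java`.

```python
import struct

# Assumed layout (from the linked snappy-java SnappyCodec source):
# 8 magic bytes, then two 32-bit integers: version, compatible_version.
MAGIC = b"\x82SNAPPY\x00"


def encode_header(version: int, compatible_version: int, big_endian: bool = True) -> bytes:
    """Hypothetical helper: build a header with the chosen byte order."""
    fmt = ">ii" if big_endian else "<ii"
    return MAGIC + struct.pack(fmt, version, compatible_version)


def decode_versions(header: bytes, big_endian: bool = True) -> tuple:
    """Hypothetical helper: read the two version fields back out."""
    fmt = ">ii" if big_endian else "<ii"
    return struct.unpack_from(fmt, header, len(MAGIC))


correct = encode_header(1, 1, big_endian=True)
legacy = encode_header(1, 1, big_endian=False)

# A big-endian reader sees the expected versions in the corrected header,
# but misreads the little-endian one as a very large version number.
print(decode_versions(correct))  # read as big-endian
print(decode_versions(legacy))   # little-endian bytes read as big-endian
```

A reader that validates the version fields strictly would likely reject the little-endian header, which is one reason the byte order matters for interoperability.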

For references to `snappy-java`'s big-endian implementation, see:

* https://github.com/xerial/snappy-java/blob/65e1ec3de1a0d447b137c6dd6393629aa3d75b8b/src/main/java/org/xerial/snappy/SnappyOutputStream.java#L343-L349
* https://github.com/xerial/snappy-java/blob/65e1ec3de1a0d447b137c6dd6393629aa3d75b8b/src/main/java/org/xerial/snappy/SnappyCodec.java#L78-L81

(cherry picked from commit 1c1b006)

Most `snappy` clients do not perform this version check, and furthermore,
it is implemented incorrectly here.

(cherry picked from commit 5723eb4)
(cherry picked from commit 72d02ee)

The two committed files in `snappy_payload` are a raw uncompressed data file,
and a `snappy` compressed data file generated by `redpanda` using the
incorrect little-endian encoding for the version fields in the `snappy`
header.

They are used in a unit test to ensure that with the big-endian fix for
`snappy`, we are still able to decompress the buffer and get the same
decompressed data as before the fix.
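
One way a reader could accept both old and new payloads is to check only the magic bytes and skip the version fields without validating them. The sketch below shows that idea under stated assumptions: the header layout is taken from the linked `SnappyCodec` source, `strip_header` is a hypothetical name, and whether `redpanda`'s decompressor works exactly this way is not confirmed by this PR text.

```python
import struct

MAGIC = b"\x82SNAPPY\x00"      # assumed snappy-java magic bytes
HEADER_SIZE = len(MAGIC) + 8   # magic + two 32-bit version fields


def strip_header(buf: bytes) -> bytes:
    """Hypothetical helper: return the bytes after the header.

    The version fields are skipped, not validated, so a header written
    in either byte order is accepted.
    """
    if len(buf) < HEADER_SIZE or not buf.startswith(MAGIC):
        raise ValueError("not a snappy-java framed buffer")
    return buf[HEADER_SIZE:]


body = b"compressed-chunks-go-here"  # placeholder, not real snappy data
old_payload = MAGIC + struct.pack("<ii", 1, 1) + body  # pre-fix encoding
new_payload = MAGIC + struct.pack(">ii", 1, 1) + body  # post-fix encoding

# Both encodings yield the same body for the decompressor to consume.
same_body = strip_header(old_payload) == strip_header(new_payload) == body
```

A compatibility test in this style would then decompress the body and compare it against the committed uncompressed file.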

(cherry picked from commit a84252d)

In order to allow `kafka-python` to use these compression types,
we must be able to import the respective module.
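
A quick way to check whether the codec modules are importable in a test environment is sketched below. The module names (`snappy`, `lz4`, `zstandard`) are assumptions about which Python bindings `kafka-python` looks for, and `available_codecs` is a hypothetical helper.

```python
import importlib.util

# Assumed top-level module names for each codec's Python binding.
CODEC_MODULES = {
    "snappy": "snappy",
    "lz4": "lz4",
    "zstd": "zstandard",
}


def available_codecs() -> dict:
    """Hypothetical helper: map codec name -> whether its module is importable."""
    return {
        codec: importlib.util.find_spec(module) is not None
        for codec, module in CODEC_MODULES.items()
    }


status = available_codecs()
for codec, ok in status.items():
    print(f"{codec}: {'available' if ok else 'missing'}")
```

Any codec reported as missing would need its package added to the test dependencies before a client could produce or consume with that compression type.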

(cherry picked from commit 17a2e55)

To test compression compatibility with Java-based Kafka consumers/producers.

These tests are parameterized for all compression types, but they most notably
serve as reproducers for an outstanding header-field encoding bug in
`snappy_java_compressor.cc`.

(cherry picked from commit 379f380)

@vbotbuildovich
Collaborator

CI test results

test results on build#62123
| test_id | test_kind | job_url | test_status | passed |
| --- | --- | --- | --- | --- |
| rptest.tests.delete_records_test.DeleteRecordsTest.test_delete_records_concurrent_truncations.cloud_storage_enabled=True.truncate_point=at_high_watermark | ducktape | https://buildkite.com/redpanda/redpanda/builds/62123#01952bae-8820-492c-954f-9e031376519f | FLAKY | 5/6 |

@WillemKauf WillemKauf enabled auto-merge February 22, 2025 05:13
@WillemKauf WillemKauf merged commit a26ca70 into redpanda-data:v24.1.x Feb 22, 2025
19 checks passed
3 participants