Add support for 64-bit Schema Registry IDs (Id64) in arrow-avro #8575
Which issue does this PR close?
We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax.
Rationale for this change
Many Kafka deployments use Confluent Schema Registry (4‑byte, big‑endian ID) and/or Apicurio Registry (commonly 8‑byte, big‑endian global ID).
`arrow-avro` already supported the Confluent wire format; this PR adds first‑class support for Apicurio's 64‑bit ID, enabling seamless decode/encode of streams that carry an 8‑byte ID after the magic byte. This improves interoperability with ecosystems that standardize on Apicurio or Red Hat Event Streams.
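Here is a minimal sketch of the two framings described above: a `0x00` magic byte followed by either a 4‑byte (Confluent) or an 8‑byte (Apicurio) big‑endian schema ID. The names `RegistryId`, `IdWidth`, `encode_prefix`, and `decode_prefix` are hypothetical and illustrate only the byte layout. They are not the `arrow-avro` API.

```rust
// Hypothetical sketch of registry wire-format prefixes.
// These names are illustrative only; they are NOT part of arrow-avro.

/// Magic byte that starts every registry-framed message.
const MAGIC: u8 = 0x00;

/// A schema ID as carried on the wire.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RegistryId {
    /// Confluent-style 4-byte ID.
    Id(u32),
    /// Apicurio-style 8-byte global ID.
    Id64(u64),
}

/// How many ID bytes the reader expects after the magic byte.
/// The prefix itself does not encode this, so it must be configured.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IdWidth {
    Four,
    Eight,
}

/// Emit the prefix: magic byte, then the ID in big-endian byte order.
pub fn encode_prefix(id: RegistryId) -> Vec<u8> {
    let mut out = vec![MAGIC];
    match id {
        RegistryId::Id(v) => out.extend_from_slice(&v.to_be_bytes()),
        RegistryId::Id64(v) => out.extend_from_slice(&v.to_be_bytes()),
    }
    out
}

/// Split a frame into its ID and the remaining Avro body.
/// Returns `None` if the magic byte is wrong or the frame is too short.
pub fn decode_prefix(frame: &[u8], width: IdWidth) -> Option<(RegistryId, &[u8])> {
    let (&magic, rest) = frame.split_first()?;
    if magic != MAGIC {
        return None;
    }
    match width {
        IdWidth::Four => {
            if rest.len() < 4 {
                return None;
            }
            let mut buf = [0u8; 4];
            buf.copy_from_slice(&rest[..4]);
            Some((RegistryId::Id(u32::from_be_bytes(buf)), &rest[4..]))
        }
        IdWidth::Eight => {
            if rest.len() < 8 {
                return None;
            }
            let mut buf = [0u8; 8];
            buf.copy_from_slice(&rest[..8]);
            Some((RegistryId::Id64(u64::from_be_bytes(buf)), &rest[8..]))
        }
    }
}
```

Decoding an 8‑byte frame with `IdWidth::Four` does not fail. It returns a different ID and leaves the remaining four ID bytes at the front of the body. Because the magic byte does not indicate how wide the ID is, a reader has to be told which width to expect. This is presumably why the PR selects between `FingerprintAlgorithm::Id` and `Id64` by configuration rather than detecting it from the bytes.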
What changes are included in this PR?
New 64‑bit ID support
- `Fingerprint::Id64(u64)` and `FingerprintAlgorithm::Id64`.
- `FingerprintStrategy::Id64(u64)` and helper `Fingerprint::load_fingerprint_id64`.
- `Fingerprint::serialized_prefix` updated to emit/read an 8‑byte big‑endian ID after the `0x00` magic byte.

Clarify/align numeric‑ID algorithm names
- Replace `FingerprintAlgorithm::None` (numeric ID) with `FingerprintAlgorithm::Id` (4‑byte) and introduce `Id64` (8‑byte). All examples and call sites are updated accordingly (e.g., `SchemaStore::new_with_type(FingerprintAlgorithm::Id)`).

Reader/Writer plumbing
- `Decoder` now understands both `Id` (4‑byte) and `Id64` (8‑byte) prefixes.
- `WriterBuilder` accepts `FingerprintStrategy::Id64` to write frames with a 64‑bit ID.

SchemaStore behavior
- `SchemaStore::register` now errors for the `Id`/`Id64` algorithms (as those IDs come from a registry); callers should use `set(Fingerprint::Id(_)|Id64(_), ...)` to associate schemas by registry ID.

Docs & examples
- Docs updated to reference `FingerprintAlgorithm::Id`. Benchmarks and examples are updated to reflect the new variants.

Are these changes tested?
Yes. This PR adds/updates unit tests that exercise the new path end‑to‑end, including:
- `test_stream_writer_with_id64_fingerprint_rt` (writer round‑trip with 64‑bit ID).
- `test_two_messages_same_schema_id64` (decoder round‑trip with 64‑bit ID).
- Existing tests updated to use `FingerprintAlgorithm::Id` instead of `None`.

Are there any user-facing changes?
N/A because `arrow-avro` isn't public yet.