
@jecsand838
Contributor

Which issue does this PR close?

We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax.

Rationale for this change

Many Kafka deployments use Confluent Schema Registry (4‑byte, big‑endian ID) and/or Apicurio Registry (commonly 8‑byte, big‑endian global ID). arrow-avro already supported the Confluent wire format; this PR adds first‑class support for Apicurio’s 64‑bit ID, enabling seamless decode/encode of streams that carry an 8‑byte ID after the magic byte.
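To make the two framings concrete, here is a minimal standalone sketch that builds and parses the prefix for each format. It assumes only what is described above: a 0x00 magic byte followed by a big-endian schema ID of 4 bytes (Confluent) or 8 bytes (Apicurio). The names `RegistryId`, `prefix`, and `parse_prefix` are hypothetical and are not part of the arrow-avro API.

```rust
use std::convert::TryInto;

// Hypothetical helpers for illustration only; not the arrow-avro API.

/// Magic byte that precedes the schema ID in both wire formats (assumed 0x00).
const MAGIC: u8 = 0x00;

/// A registry-assigned schema ID of either width.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RegistryId {
    /// Confluent-style 4-byte ID.
    Id(u32),
    /// Apicurio-style 8-byte global ID.
    Id64(u64),
}

/// Build the frame prefix: the magic byte followed by the big-endian ID.
fn prefix(id: RegistryId) -> Vec<u8> {
    let mut out = vec![MAGIC];
    match id {
        RegistryId::Id(v) => out.extend_from_slice(&v.to_be_bytes()),
        RegistryId::Id64(v) => out.extend_from_slice(&v.to_be_bytes()),
    }
    out
}

/// Split a frame into its registry ID and the remaining Avro-encoded body.
/// `wide` selects the 8-byte ID; returns None if the magic byte is wrong
/// or the frame is too short to hold the ID.
fn parse_prefix(frame: &[u8], wide: bool) -> Option<(RegistryId, &[u8])> {
    let (&magic, rest) = frame.split_first()?;
    if magic != MAGIC {
        return None;
    }
    if wide {
        let bytes: [u8; 8] = rest.get(..8)?.try_into().ok()?;
        Some((RegistryId::Id64(u64::from_be_bytes(bytes)), &rest[8..]))
    } else {
        let bytes: [u8; 4] = rest.get(..4)?.try_into().ok()?;
        Some((RegistryId::Id(u32::from_be_bytes(bytes)), &rest[4..]))
    }
}
```

For example, `prefix(RegistryId::Id(42))` yields the 5 bytes `00 00 00 00 2a`, while `prefix(RegistryId::Id64(42))` yields 9 bytes. Because both formats share the same magic byte, the ID width cannot be inferred from the frame itself; in this sketch the caller passes it in (`wide`), much as a reader must be configured for either the 4-byte or the 8-byte variant.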

This improves interoperability with ecosystems that standardize on Apicurio or Red Hat Event Streams.

What changes are included in this PR?

  • New 64‑bit ID support

    • Add Fingerprint::Id64(u64) and FingerprintAlgorithm::Id64.
    • Add FingerprintStrategy::Id64(u64) and helper Fingerprint::load_fingerprint_id64.
    • Extend Fingerprint::serialized_prefix to emit/read an 8‑byte big‑endian ID after the 0x00 magic byte.
  • Clarify/align numeric‑ID algorithm names

    • Replace the prior FingerprintAlgorithm::None (numeric ID) with FingerprintAlgorithm::Id (4‑byte) and introduce FingerprintAlgorithm::Id64 (8‑byte). All examples and call sites are updated accordingly (e.g., SchemaStore::new_with_type(FingerprintAlgorithm::Id)).
  • Reader/Writer plumbing

    • Decoder now understands both Id (4‑byte) and Id64 (8‑byte) prefixes.
    • WriterBuilder accepts FingerprintStrategy::Id64 to write frames with a 64‑bit ID.
  • SchemaStore behavior

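    • SchemaStore::register now returns an error for the Id/Id64 algorithms, because those IDs are assigned by a registry rather than computed from the schema; callers should instead use SchemaStore::set(Fingerprint::Id(_) | Fingerprint::Id64(_), ...) to associate schemas with their registry IDs.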
  • Docs & examples

    • Reader docs expanded to call out Confluent (4‑byte) and Apicurio (8‑byte) formats; examples switched to FingerprintAlgorithm::Id. Bench and example updates reflect the new variants.
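The SchemaStore rule listed above (register refuses ID-based algorithms; registry IDs are attached with set) can be modeled with a toy store. Everything here (`ToyStore`, `Algo`, `Key`, `toy_hash`) is hypothetical and far simpler than arrow-avro's actual SchemaStore; in particular, `toy_hash` is a placeholder FNV-1a hash swapped in to keep the sketch dependency-free, not the Rabin fingerprint defined by the Avro specification.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins; not the arrow-avro types.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Algo {
    /// Fingerprint computed locally from the schema text.
    Hashed,
    /// 4-byte registry ID (Confluent style).
    Id,
    /// 8-byte registry ID (Apicurio style).
    Id64,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Key {
    Hash(u64),
    Id(u32),
    Id64(u64),
}

struct ToyStore {
    algo: Algo,
    schemas: HashMap<Key, String>,
}

impl ToyStore {
    fn new(algo: Algo) -> Self {
        ToyStore { algo, schemas: HashMap::new() }
    }

    /// Derives the key from the schema itself, which only works for locally
    /// computed fingerprints; registry IDs cannot be derived, so ID-based
    /// stores reject this call.
    fn register(&mut self, schema: &str) -> Result<Key, String> {
        match self.algo {
            Algo::Hashed => {
                let key = Key::Hash(toy_hash(schema));
                self.schemas.insert(key, schema.to_string());
                Ok(key)
            }
            Algo::Id | Algo::Id64 => Err("registry IDs must be supplied via set()".into()),
        }
    }

    /// Associates a schema with a key obtained elsewhere (e.g. from a registry).
    fn set(&mut self, key: Key, schema: &str) {
        self.schemas.insert(key, schema.to_string());
    }

    /// Looks up the schema for a key decoded from a frame prefix.
    fn lookup(&self, key: &Key) -> Option<&str> {
        self.schemas.get(key).map(String::as_str)
    }
}

/// Placeholder 64-bit FNV-1a hash so the sketch needs no dependencies.
fn toy_hash(s: &str) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for b in s.bytes() {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}
```

With an ID-based store, a caller would look up the ID from the registry, call `set(Key::Id64(id), schema)`, and then resolve each incoming frame by the ID parsed from its prefix. Keying the map by an enum rather than a bare integer keeps a 4-byte ID and an 8-byte ID with the same numeric value from colliding.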

Are these changes tested?

Yes. This PR adds/updates unit tests that exercise the new path end‑to‑end, including:

  • test_stream_writer_with_id64_fingerprint_rt (writer round‑trip with 64‑bit ID).
  • test_two_messages_same_schema_id64 (decoder round‑trip with 64‑bit ID).
  • Adjustments to existing tests and benches to use FingerprintAlgorithm::Id instead of None.

Are there any user-facing changes?

N/A because arrow-avro isn't public yet.

…ling

# Changes
- Introduced `Id64` variant for `Fingerprint` and `FingerprintAlgorithm` to support 64-bit Schema Registry IDs.
- Updated `SchemaStore`, `Reader`, and `Writer` to handle `Id64` fingerprints.
- Added tests for `Id64` functionality, including round-trip and schema lookup scenarios.
- Adjusted documentation to cover the newly supported `Id64` format alongside the existing `Id`.
@github-actions bot added the arrow (Changes to the arrow crate) and arrow-avro (arrow-avro crate) labels on Oct 8, 2025
@jecsand838
Contributor Author

@alamb @mbrobbel Do you think we'd be able to get this into v57.0.0?

It's a small diff, but relatively high impact.

@mbrobbel added this to the 57.0.0 milestone on Oct 9, 2025
Member

@mbrobbel left a comment


Thanks @jecsand838

@mbrobbel merged commit 348ae91 into apache:main on Oct 10, 2025
23 checks passed
@mbrobbel
Member

Thanks @jecsand838

@jecsand838 deleted the add-8byte-id-support branch on October 11, 2025 02:13
