
@fresh-borzoni (Contributor) commented on Jan 12, 2026

Purpose

Linked issue: close #111

This PR implements the complete KV (Key-Value) record batch format for Fluss-Rust, enabling serialization and deserialization of KV records compatible with the Java implementation.

Brief change log

Core Implementation:

  • KvRecord: Implements an immutable key-value record format with variable-length encoding for keys and optional values (an absent value is a tombstone marking a deletion)
  • KvRecordBatch: Provides read-only access to serialized KV record batches with CRC32C checksum validation, schema versioning, and iterator support
  • KvRecordBatchBuilder: Implements batch building with configurable write limits, exactly-once semantics (writer ID + batch sequence), and direct CompactedRow integration
  • Varint utilities: Complete variable-length integer encoding/decoding matching the Protocol Buffers format, with optimized variants for Write, BufMut, and raw slices
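The varint utilities follow the Protocol Buffers unsigned varint scheme: 7 data bits per byte, least-significant group first, with the high bit set on every byte except the last. Below is a minimal std-only sketch of that scheme, assuming u32 values; the function names are illustrative and are not claimed to match the signatures in crates/fluss/src/util/varint.rs.

```rust
/// Number of bytes `value` occupies as an unsigned varint.
fn size_of_unsigned_varint(mut value: u32) -> usize {
    let mut size = 1;
    while value >= 0x80 {
        value >>= 7;
        size += 1;
    }
    size
}

/// Appends `value` as an unsigned varint: low 7-bit group first,
/// continuation bit (0x80) set on every byte except the last.
fn write_unsigned_varint(mut value: u32, out: &mut Vec<u8>) {
    while value >= 0x80 {
        out.push((value as u8 & 0x7F) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Decodes an unsigned varint from the start of `buf`.
/// Returns `(value, bytes_read)`, or `None` if the input ends before the
/// final byte or runs past the 5 bytes a u32 can need. This sketch does not
/// reject a fifth byte whose upper bits would overflow 32 bits.
fn read_unsigned_varint(buf: &[u8]) -> Option<(u32, usize)> {
    let mut result: u32 = 0;
    for (i, &byte) in buf.iter().enumerate().take(5) {
        result |= ((byte & 0x7F) as u32) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((result, i + 1));
        }
    }
    None
}
```

For example, 300 encodes to the two bytes `0xAC 0x02`. Because lengths are never negative, no ZigZag step is needed: ZigZag would map a length of 64 to 128, which takes two varint bytes instead of one.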

Record Format:
KvRecord: [Length:I32][KeyLength:VarInt][Key:bytes][Value:bytes?]
KvBatch:  [Length:I32][Magic:I8][CRC:U32][SchemaId:I16][Attributes:I8]
          [WriterId:I64][BatchSequence:I32][RecordCount:I32][Records...]
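To make the KvRecord layout concrete, here is a rough encode/decode sketch. It assumes the I32 Length field is little-endian and counts only the bytes after the length field itself; both are illustrative choices to verify against the Java implementation. The KvRecord struct and the helper functions are hypothetical, and the value is treated as opaque bytes rather than a CompactedRow.

```rust
/// Hypothetical owned record used only in this sketch.
#[derive(Debug, PartialEq)]
struct KvRecord {
    key: Vec<u8>,
    /// `None` marks a tombstone (deletion of `key`).
    value: Option<Vec<u8>>,
}

fn write_uvarint(mut value: u32, out: &mut Vec<u8>) {
    while value >= 0x80 {
        out.push((value as u8 & 0x7F) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

fn read_uvarint(buf: &[u8]) -> Option<(u32, usize)> {
    let mut result = 0u32;
    for (i, &byte) in buf.iter().enumerate().take(5) {
        result |= ((byte & 0x7F) as u32) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((result, i + 1));
        }
    }
    None
}

/// Appends [Length:I32][KeyLength:VarInt][Key][Value?] to `out`.
fn encode(record: &KvRecord, out: &mut Vec<u8>) {
    let mut body = Vec::new();
    write_uvarint(record.key.len() as u32, &mut body);
    body.extend_from_slice(&record.key);
    if let Some(value) = &record.value {
        body.extend_from_slice(value);
    }
    // Assumed: Length is little-endian and excludes its own 4 bytes.
    out.extend_from_slice(&(body.len() as i32).to_le_bytes());
    out.extend_from_slice(&body);
}

/// Decodes one record from the front of `buf`; returns it with the bytes consumed.
fn decode(buf: &[u8]) -> Option<(KvRecord, usize)> {
    // A negative Length is rejected by the usize conversion.
    let length = usize::try_from(i32::from_le_bytes(buf.get(..4)?.try_into().ok()?)).ok()?;
    let body = buf.get(4..4 + length)?;
    let (key_len, n) = read_uvarint(body)?;
    let key_end = n + key_len as usize;
    let key = body.get(n..key_end)?.to_vec();
    // Nothing after the key means no value: the record is a tombstone.
    let value = (key_end < body.len()).then(|| body[key_end..].to_vec());
    Some((KvRecord { key, value }, 4 + length))
}
```

A record whose body ends right after the key decodes as a tombstone, so in this sketch an empty value would be indistinguishable from a deletion.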

API Updates:

  • Exposed write_unsigned_varint_to_slice() for CompactedRowWriter integration
  • Updated CompactedRowWriter::finish() to return a slice of its buffer instead of a copy
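Writing varints straight into a preallocated slice lets a row writer grow a single buffer and hand back a borrowed view from finish() without an extra copy. This is a sketch under assumptions: the signature of write_unsigned_varint_to_slice shown here (value, destination slice, returns bytes written) is a guess, and RowWriter is a hypothetical stand-in for CompactedRowWriter.

```rust
/// Writes `value` as an unsigned varint at the start of `buf` and returns the
/// number of bytes written. Panics if `buf` is shorter than the encoding
/// (a u32 never needs more than 5 bytes).
fn write_unsigned_varint_to_slice(mut value: u32, buf: &mut [u8]) -> usize {
    let mut i = 0;
    while value >= 0x80 {
        buf[i] = (value as u8 & 0x7F) | 0x80;
        value >>= 7;
        i += 1;
    }
    buf[i] = value as u8;
    i + 1
}

/// Hypothetical row writer that owns one growable buffer.
struct RowWriter {
    buf: Vec<u8>,
    pos: usize,
}

impl RowWriter {
    fn new() -> Self {
        Self { buf: vec![0; 16], pos: 0 }
    }

    /// Grows the buffer (at least doubling) so `extra` more bytes fit.
    fn ensure_capacity(&mut self, extra: usize) {
        let needed = self.pos + extra;
        if self.buf.len() < needed {
            let new_len = needed.max(self.buf.len() * 2);
            self.buf.resize(new_len, 0);
        }
    }

    fn write_length(&mut self, len: u32) {
        self.ensure_capacity(5); // worst case for a u32 varint
        let written = write_unsigned_varint_to_slice(len, &mut self.buf[self.pos..]);
        self.pos += written;
    }

    fn write_bytes(&mut self, bytes: &[u8]) {
        self.ensure_capacity(bytes.len());
        let end = self.pos + bytes.len();
        self.buf[self.pos..end].copy_from_slice(bytes);
        self.pos = end;
    }

    /// Returns the written bytes as a borrowed slice; no copy is made.
    fn finish(&self) -> &[u8] {
        &self.buf[..self.pos]
    }
}
```

The trade-off is that the returned slice borrows the writer, so the caller must finish using it (or copy it) before the writer is reused.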

API and Format

  pub struct KvRecord { ... }
  pub struct KvRecordBatch { ... }
  pub struct KvRecordBatchBuilder { ... }
  pub mod varint { ... }
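A simplified builder following the KvBatch header layout above: it refuses records that would push the batch past a byte write limit, and fills in the header, including a CRC-32C checksum, only when the batch is built. Assumptions to check against the Java format: little-endian fields, Length counting the bytes after itself, the CRC covering SchemaId through the end of the batch, and -1 as the "no writer" sentinel. BatchBuilder and its methods are hypothetical and do not reproduce the crate's KvRecordBatchBuilder API.

```rust
// Offsets derived from the KvBatch header layout in the PR description.
const LENGTH_OFFSET: usize = 0;
const MAGIC_OFFSET: usize = 4;
const CRC_OFFSET: usize = 5;
const SCHEMA_ID_OFFSET: usize = 9;
const ATTRIBUTES_OFFSET: usize = 11;
const WRITER_ID_OFFSET: usize = 12;
const BATCH_SEQUENCE_OFFSET: usize = 20;
const RECORD_COUNT_OFFSET: usize = 24;
const HEADER_SIZE: usize = 28;
const CURRENT_KV_MAGIC_VALUE: u8 = 0;
// Assumed sentinels for batches written without exactly-once state.
const NO_WRITER_ID: i64 = -1;
const NO_BATCH_SEQUENCE: i32 = -1;

/// Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
fn crc32c(data: &[u8]) -> u32 {
    let mut crc = !0u32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All ones when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

/// Hypothetical builder; records are appended already encoded.
struct BatchBuilder {
    buf: Vec<u8>,
    write_limit: usize,
    schema_id: i16,
    writer_id: i64,
    batch_sequence: i32,
    record_count: i32,
}

impl BatchBuilder {
    fn new(schema_id: i16, write_limit: usize) -> Self {
        Self {
            buf: vec![0; HEADER_SIZE], // header bytes are filled in by build()
            write_limit,
            schema_id,
            writer_id: NO_WRITER_ID,
            batch_sequence: NO_BATCH_SEQUENCE,
            record_count: 0,
        }
    }

    /// Records the writer identity used for idempotent (exactly-once) writes.
    fn set_writer_state(&mut self, writer_id: i64, batch_sequence: i32) {
        self.writer_id = writer_id;
        self.batch_sequence = batch_sequence;
    }

    /// Appends one encoded record; returns false if it would exceed the limit.
    fn append(&mut self, encoded_record: &[u8]) -> bool {
        if self.buf.len() + encoded_record.len() > self.write_limit {
            return false;
        }
        self.buf.extend_from_slice(encoded_record);
        self.record_count += 1;
        true
    }

    /// Writes the header and returns the finished batch bytes.
    fn build(mut self) -> Vec<u8> {
        let length = (self.buf.len() - 4) as i32; // bytes after the Length field
        self.buf[LENGTH_OFFSET..MAGIC_OFFSET].copy_from_slice(&length.to_le_bytes());
        self.buf[MAGIC_OFFSET] = CURRENT_KV_MAGIC_VALUE;
        self.buf[SCHEMA_ID_OFFSET..ATTRIBUTES_OFFSET].copy_from_slice(&self.schema_id.to_le_bytes());
        self.buf[ATTRIBUTES_OFFSET] = 0; // no attribute flags in this sketch
        self.buf[WRITER_ID_OFFSET..BATCH_SEQUENCE_OFFSET].copy_from_slice(&self.writer_id.to_le_bytes());
        self.buf[BATCH_SEQUENCE_OFFSET..RECORD_COUNT_OFFSET]
            .copy_from_slice(&self.batch_sequence.to_le_bytes());
        self.buf[RECORD_COUNT_OFFSET..HEADER_SIZE].copy_from_slice(&self.record_count.to_le_bytes());
        // The CRC is written last because it covers SchemaId..end (assumed range).
        let crc = crc32c(&self.buf[SCHEMA_ID_OFFSET..]);
        self.buf[CRC_OFFSET..SCHEMA_ID_OFFSET].copy_from_slice(&crc.to_le_bytes());
        self.buf
    }
}
```

Deferring the header keeps append cheap. The CRC has to be computed after SchemaId, Attributes, WriterId, BatchSequence and RecordCount are in place, since those fields fall inside the assumed checksummed range.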

Storage Format:

  • Introduces the KV record batch format, binary-compatible with the Java implementation
  • Uses unsigned varint encoding (not zigzag) for length fields, which are never negative, saving space
  • Format version tracked via magic byte (CURRENT_KV_MAGIC_VALUE = 0)
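On the read side, the magic byte and checksum can be validated before any record is parsed. A minimal sketch, assuming a 28-byte header laid out as in the KvBatch format above, little-endian fields, a Length that counts the bytes after itself, and a CRC-32C covering SchemaId through the end of the batch; BatchError and validate_batch are hypothetical names.

```rust
const CURRENT_KV_MAGIC_VALUE: u8 = 0;
const HEADER_SIZE: usize = 28;
const MAGIC_OFFSET: usize = 4;
const CRC_OFFSET: usize = 5;
const SCHEMA_ID_OFFSET: usize = 9;
const RECORD_COUNT_OFFSET: usize = 24;

/// Hypothetical error type for header validation failures.
#[derive(Debug, PartialEq)]
enum BatchError {
    TooShort,
    LengthMismatch { declared: usize, actual: usize },
    UnsupportedMagic(u8),
    CrcMismatch { stored: u32, computed: u32 },
}

/// Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
fn crc32c(data: &[u8]) -> u32 {
    let mut crc = !0u32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0x82F6_3B78 & mask);
        }
    }
    !crc
}

/// Reads a little-endian u32 at `offset` (caller guarantees bounds).
fn read_u32_le(buf: &[u8], offset: usize) -> u32 {
    u32::from_le_bytes(buf[offset..offset + 4].try_into().unwrap())
}

/// Checks size, magic byte and checksum; returns the record count on success.
fn validate_batch(batch: &[u8]) -> Result<i32, BatchError> {
    if batch.len() < HEADER_SIZE {
        return Err(BatchError::TooShort);
    }
    // Length counts the bytes after the 4-byte length field (assumed).
    let declared = read_u32_le(batch, 0) as usize + 4;
    if declared != batch.len() {
        return Err(BatchError::LengthMismatch { declared, actual: batch.len() });
    }
    let magic = batch[MAGIC_OFFSET];
    if magic != CURRENT_KV_MAGIC_VALUE {
        return Err(BatchError::UnsupportedMagic(magic));
    }
    let stored = read_u32_le(batch, CRC_OFFSET);
    let computed = crc32c(&batch[SCHEMA_ID_OFFSET..]);
    if stored != computed {
        return Err(BatchError::CrcMismatch { stored, computed });
    }
    Ok(read_u32_le(batch, RECORD_COUNT_OFFSET) as i32)
}
```

Checking the magic byte before the CRC means a batch written in a newer format version is reported as unsupported rather than as corrupt.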

Breaking Changes: None (new module)

Compatibility: The Rust implementation produces byte-identical output to the Java implementation for all field types and edge cases.

Documentation

New Feature: Yes

Note: The reading-context part is factored out into a separate task.

@fresh-borzoni (Contributor, Author)

@luoyuxia @leekeiabstraction @Kelvinyu1117
PTAL 🙏

I decided to use BytesMut, as it seemed the most appropriate choice for the Java-to-Rust translation, and moved the varint logic into a separate module so it can be reused.

@fresh-borzoni (Contributor, Author)

FYI: we may want to consolidate some tests; I used them mostly for development and edge-case checks.

@luoyuxia (Contributor)

@fresh-borzoni Thanks for the great work. I'll have a look when I get some time.


Copilot AI left a comment


Pull request overview

This PR implements a complete KV (Key-Value) record batch format for Fluss-Rust, providing binary-compatible serialization and deserialization with the Java implementation. The implementation includes variable-length integer encoding utilities, KvRecord for individual key-value pairs with optional tombstone support, KvRecordBatch for read-only batch access with CRC32C validation, and KvRecordBatchBuilder for constructing batches with configurable limits and exactly-once semantics.

Changes:

  • Added complete varint encoding/decoding utilities with multiple variants for different use cases (Write trait, BufMut, raw slices)
  • Implemented KvRecord, KvRecordBatch, and KvRecordBatchBuilder for managing KV record batches with header validation and checksum verification
  • Refactored CompactedRowWriter and CompactedRowReader to use shared varint utilities, reducing code duplication

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 11 comments.

File summary:
  • crates/fluss/src/util/varint.rs: New module implementing variable-length integer encoding/decoding in a Protocol Buffers-compatible format
  • crates/fluss/src/util/mod.rs: Exports the new varint module
  • crates/fluss/src/row/mod.rs: Changes the compacted module's visibility from private to public
  • crates/fluss/src/row/compacted/compacted_row_writer.rs: Refactored to use the varint utilities, removing duplicate encoding logic
  • crates/fluss/src/row/compacted/compacted_row_reader.rs: Refactored to use the varint utilities, removing duplicate decoding logic
  • crates/fluss/src/row/compacted/compacted_key_writer.rs: Added a Default trait implementation
  • crates/fluss/src/record/mod.rs: Exports the new kv module
  • crates/fluss/src/record/kv/mod.rs: Module structure for KV record functionality
  • crates/fluss/src/record/kv/kv_record.rs: Implements the immutable KV record with variable-length encoding
  • crates/fluss/src/record/kv/kv_record_batch.rs: Implements read-only batch access with CRC validation and iteration
  • crates/fluss/src/record/kv/kv_record_batch_builder.rs: Implements batch building with write limits and writer state management


@leekeiabstraction (Contributor) left a comment


Left some comments. PTAL!

@fresh-borzoni (Contributor, Author)

@leekeiabstraction Thank you for the review.

Addressed, PTAL 🙏

@leekeiabstraction (Contributor) left a comment


Added further comments. Thank you!

@fresh-borzoni (Contributor, Author)

@leekeiabstraction Thank you for the review!
Addressed the comments, PTAL.

Added a TODO and task #162.

@luoyuxia (Contributor) left a comment


@fresh-borzoni Thanks for the PR. LGTM overall. I just left some minor comments.

@luoyuxia (Contributor) left a comment


@fresh-borzoni Thanks for the quick update. LGTM!

@luoyuxia merged commit 49d0c93 into apache:main on Jan 15, 2026
13 checks passed

Linked issue: Introduce KvRecordBatchBuilder

3 participants