-
Notifications
You must be signed in to change notification settings - Fork 21
[TASK-111] KvRecordBatchBuilder implementation #156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@luoyuxia @leekeiabstraction @Kelvinyu1117 I decided to use BytesMut as it seems appropriate for Java -> Rust translation and moved varint logic into separate module to be reused. |
|
FYI |
a0b3d41 to
c22497a
Compare
|
@fresh-borzoni Thanks for the greate work. I'll have a look when I got some time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements a complete KV (Key-Value) record batch format for Fluss-Rust, providing binary-compatible serialization and deserialization with the Java implementation. The implementation includes variable-length integer encoding utilities, KvRecord for individual key-value pairs with optional tombstone support, KvRecordBatch for read-only batch access with CRC32C validation, and KvRecordBatchBuilder for constructing batches with configurable limits and exactly-once semantics.
Changes:
- Added complete varint encoding/decoding utilities with multiple variants for different use cases (Write trait, BufMut, raw slices)
- Implemented KvRecord, KvRecordBatch, and KvRecordBatchBuilder for managing KV record batches with header validation and checksum verification
- Refactored CompactedRowWriter and CompactedRowReader to use shared varint utilities, reducing code duplication
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| crates/fluss/src/util/varint.rs | New module implementing variable-length integer encoding/decoding with Protocol Buffers-compatible format |
| crates/fluss/src/util/mod.rs | Exports the new varint module |
| crates/fluss/src/row/mod.rs | Changes compacted module visibility from private to public |
| crates/fluss/src/row/compacted/compacted_row_writer.rs | Refactored to use varint utilities, removing duplicate encoding logic |
| crates/fluss/src/row/compacted/compacted_row_reader.rs | Refactored to use varint utilities, removing duplicate decoding logic |
| crates/fluss/src/row/compacted/compacted_key_writer.rs | Added Default trait implementation |
| crates/fluss/src/record/mod.rs | Exports the new kv module |
| crates/fluss/src/record/kv/mod.rs | Module structure for KV record functionality |
| crates/fluss/src/record/kv/kv_record.rs | Implements immutable KV record with variable-length encoding |
| crates/fluss/src/record/kv/kv_record_batch.rs | Implements read-only batch access with CRC validation and iteration |
| crates/fluss/src/record/kv/kv_record_batch_builder.rs | Implements batch building with write limits and writer state management |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
leekeiabstraction
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments. PTAL!
|
@leekeiabstraction Thank you for the review. Addressed, PTAL 🙏 |
leekeiabstraction
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added further comments. Thank you!
|
@leekeiabstraction Thank you for the review! added todo and task #162 |
luoyuxia
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fresh-borzoni Thanks for the pr. LGTM overall. Just left minor comments.
luoyuxia
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fresh-borzoni Thanks for quick update. LGTM!
Purpose
Linked issue: close #111
This PR implements the complete KV (Key-Value) record batch format for Fluss-Rust, enabling serialization and deserialization of KV records compatible with the Java implementation.
Brief change log
Core Implementation:
Write,BufMut, and raw slicesRecord Format:
KvRecord: [Length:I32][KeyLength:VarInt][Key:bytes]Value:bytes?]
KvBatch: [Length:I32][Magic:I8][CRC:U32][SchemaId:I16][Attributes:I8]
[WriterId:I64][BatchSequence:I32][RecordCount:I32][Records...]
API Updates:
write_unsigned_varint_to_slice()forCompactedRowWriterintegrationCompactedRowWriter::finish()to return slice instead of copyingAPI and Format
Storage Format:
Breaking Changes: None (new module)
Compatibility: Rust implementation produces byte-identical output to Java implementation for all field types and edge cases.
Documentation
New Feature: Yes
Note: Reading Context part is factored out to separate Task.