-
-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: drop use of MergeReverseRecordReader
#1213
Conversation
WalkthroughThe changes remove the Changes
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Tip CodeRabbit's docstrings feature is now available as part of our Pro Plan! Simply use the command ✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
src/parseable/streams.rs (2)
472-472
: Revisit partial file handling.
Switching fromMergedReverseRecordReader
toMergedRecordReader
works, but thistry_new
method silently logs invalid files without signaling the caller. Consider providing a way (e.g., returning a list of skipped files) to track which files were skipped for better observability.
532-532
: Note on silent failures.
Removing the.unwrap()
means the construction ofMergedRecordReader
no longer panics. However, the code will also quietly skip invalid files. If partial merges matter, consider a strategy to surface these issues to the caller.src/parseable/staging/reader.rs (1)
40-40
: Consider explicit error reporting.
Changing the signature to returnSelf
discards file-level errors internally. If it's important to track or handle invalid files, returning aResult
or a summary of skipped files might be more transparent.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/parseable/staging/reader.rs
(2 hunks)src/parseable/streams.rs
(4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (10)
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: coverage
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
🔇 Additional comments (8)
src/parseable/streams.rs (2)
60-60
: Smooth import transition.
ReplacingMergedReverseRecordReader
withMergedRecordReader
in the import list aligns with the updated code structure.
489-489
: Confirm iteration order and transformations.
Usingrecord_reader.merged_iter(schema)
is straightforward. However, ensure that removing reverse iteration does not affect any desired ordering logic, and confirm thatadapt_batch
consistently applies all needed transformations.src/parseable/staging/reader.rs (6)
22-22
: Efficient buffering.
UsingBufReader
ensures efficiently buffered file reads.
27-27
: Straightforward import.
Bringing inRecordBatch
is necessary for the reading logic.
28-28
: Correct reader import.
StreamReader
fromarrow_ipc
is the proper choice to handle Arrow IPC streams.
32-32
: Good reuse of utility.
Adoptingadapt_batch
centralizes the batch transformation logic for consistency.
72-72
: Clear merging approach.
merged_iter
provides a unified iterator of record batches across multiple streams, simplifying reading logic.
73-76
: Check ordering requirements.
Flattening the readers and mapping each batch viaadapt_batch
is straightforward, but verify whether preserving any original ordering is necessary or if the new approach suffices for all use cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/parseable/staging/reader.rs (1)
72-78
: New merged_iter method replaces reverse reader functionalityThis new method is the key implementation replacing
MergeReverseRecordReader
. It uses a flat mapping approach to iterate through all readers, appropriately reversing and adapting batches.Consider adding documentation to this method explaining its purpose and behavior, especially since it's replacing functionality from the removed
MergeReverseRecordReader
. This would help future developers understand the design decisions.+ /// Iterates through all readers, reversing and adapting each batch to the provided schema. + /// This method replaces the functionality previously provided by MergeReverseRecordReader. pub fn merged_iter(self, schema: Arc<Schema>) -> impl Iterator<Item = RecordBatch> {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/parseable/staging/reader.rs
(6 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (10)
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: coverage
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
🔇 Additional comments (5)
src/parseable/staging/reader.rs (5)
20-33
: Import updates aligned with PR goalsThe import changes reflect the removal of
MergeReverseRecordReader
and the update ofMergedRecordReader
to handle the functionality directly. The addition ofarrow_array::RecordBatch
and streamlined imports support the new implementation.
124-126
: Test helper function simplifies reader creationThe
get_reverse_reader
helper function provides a clean way to create readers for testing, improving test code clarity.Consider adding a brief comment to explain why this function exists and what it does.
+ /// Creates a StreamReader for testing purposes, wrapping the provided reader with BufReader. fn get_reverse_reader<T: Read + Seek>(reader: T) -> StreamReader<BufReader<T>> {
154-162
: Ensure tests validate expected behaviorTests have been updated to work with the new implementation. The assertions are checking that records are being read in the expected order.
216-216
: Test adapts to use new helper methodTest now uses the
get_reverse_reader
helper function and callsflatten()
to convert to an iterator, matching the new implementation approach.
40-61
:❓ Verification inconclusive
Return type change impacts error handling strategy
The
try_new
method has been modified to returnSelf
directly instead ofResult<Self, ()>
, which changes how errors are communicated to callers. The function now logs errors and continues rather than propagating them.Consider documenting this error handling strategy change, as callers previously expecting errors to be propagated will need to be updated. They now need to check if any readers were actually created rather than handling an error result.
🏁 Script executed:
#!/bin/bash # Check if there are any callers of try_new that expect a Result rg "try_new\(.*\)\?\.(?:ok|unwrap|expect)" --type rustLength of output: 56
Document Updated Error Handling Approach
The change in the
try_new
method now returnsSelf
directly rather than propagating errors via aResult
. This new approach logs error messages and skips problematic files instead of using error propagation. Given that no immediate caller patterns indicating an expectation of aResult
were detected by the automated search, please manually verify that all call-sites are updated to check whether any valid readers have been created, rather than relying on explicit error handling.
- Action: Update the function's documentation to highlight that errors during file processing are now handled by logging and omitting invalid files.
- Recommendation: Ensure callers of
try_new
verify the returned instance (e.g., checking an empty readers list) to manage error scenarios appropriately.
We won't be proceeding this route because:
|
Fixes #XXXX.
Description
This PR has:
Summary by CodeRabbit
New Features
Refactor
MergedRecordReader
and removing legacy reverse processing.MergedRecordReader
and improved error handling.