-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multiple row-layout support, part-1: Restructure code for clearness #2189
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't review all of this code carefully, but I did review the new structure (which is 🏅 👍 ) as well as spot checked some of the code
Thank you @yjshen
@@ -18,71 +18,33 @@ | |||
//! Accessing row from raw bytes | |||
|
|||
use crate::error::{DataFusionError, Result}; | |||
#[cfg(feature = "jit")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
|
||
for offset in offsets.iter().take(row_num) { | ||
row.point_to(*offset); | ||
row.point_to(*offset, data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems strange to update row.data
on each new offset as data
isn't changing from iteration to iteration here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main reason for this change is to support the use case in sort payload output, where we need to chase compositeIndex pointers and output rows that belongs to different input batches/pages. So we could therefore point to a record, append its cell to output record batch buffer, and ponit to the next record.
Since it's just a field assignment without expensive calculations, I think it's acceptable here.
|
||
/// Read `data` of raw-bytes rows starting at `offsets` out to a record batch | ||
|
||
pub fn read_as_batch_jit( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if over the long term we can hide all the reading/write as jit / not as jit within the RowReader / RowWriter -- so that most code in DataFusion will simply use RowReader/RowWriter and the use of jit would be an implementation detail
This may be where you are headed anyways, I just wanted to say it explicitly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, RowReader and RowWriter is meant to be used outside this row module. the underneath implementation could be chosen based on whether the jit feature gate is enabled I think.
I am merging this to unlock further row layout implementations. Thanks @alamb! |
Which issue does this PR close?
The first part for #2188.
Rationale for this change
#[cfg(feature = "jit")]
is scattered all over the code for row reader/writer.What changes are included in this PR?
Moving codes around for clearness.
Are there any user-facing changes?
No