Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple row-layout support, part-1: Restructure code for clearness #2189

Merged
merged 1 commit into from
Apr 11, 2022

Conversation

yjshen
Copy link
Member

@yjshen yjshen commented Apr 10, 2022

Which issue does this PR close?

The first part for #2188.

Rationale for this change

  • JIT feature gate #[cfg(feature = "jit")] is scattered all over the code for row reader/writer.
  • I'm trying to figure out how to use the existing cell getter/setters for different row layouts. Moving unrelated parts to their own module would make further refactoring easier.

What changes are included in this PR?

Moving codes around for clearness.

Are there any user-facing changes?

No

@github-actions github-actions bot added the datafusion Changes in the datafusion crate label Apr 10, 2022
@yjshen yjshen mentioned this pull request Apr 10, 2022
3 tasks
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't review all of this code carefully, but I did review the new structure (which is 🏅 👍 ) as well as spot checked some of the code

Thank you @yjshen

@@ -18,71 +18,33 @@
//! Accessing row from raw bytes

use crate::error::{DataFusionError, Result};
#[cfg(feature = "jit")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️


for offset in offsets.iter().take(row_num) {
row.point_to(*offset);
row.point_to(*offset, data);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems strange to update row.data on each new offset as data isn't changing from iteration to iteration here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main reason for this change is to support the use case in sort payload output, where we need to chase compositeIndex pointers and output rows that belongs to different input batches/pages. So we could therefore point to a record, append its cell to output record batch buffer, and ponit to the next record.

Since it's just a field assignment without expensive calculations, I think it's acceptable here.


/// Read `data` of raw-bytes rows starting at `offsets` out to a record batch

pub fn read_as_batch_jit(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if over the long term we can hide all the reading/write as jit / not as jit within the RowReader / RowWriter -- so that most code in DataFusion will simply use RowReader/RowWriter and the use of jit would be an implementation detail

This may be where you are headed anyways, I just wanted to say it explicitly

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, RowReader and RowWriter is meant to be used outside this row module. the underneath implementation could be chosen based on whether the jit feature gate is enabled I think.

@yjshen
Copy link
Member Author

yjshen commented Apr 11, 2022

I am merging this to unlock further row layout implementations. Thanks @alamb!

@yjshen yjshen merged commit c46c91f into apache:master Apr 11, 2022
@yjshen yjshen deleted the row_refine branch April 22, 2022 08:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants