Multiple row-layout support, part-1: Restructure code for clearness #2189

yjshen · 2022-04-10T04:42:10Z

Which issue does this PR close?

The first part of #2188.

Rationale for this change

The JIT feature gate #[cfg(feature = "jit")] is scattered all over the row reader/writer code.
I'm trying to figure out how to reuse the existing cell getters/setters for different row layouts. Moving unrelated parts into their own modules would make further refactoring easier.

What changes are included in this PR?

Moving code around for clarity.

Are there any user-facing changes?

No

alamb

I didn't review all of this code carefully, but I did review the new structure (which is 🏅 👍 ) and spot-checked some of the code.

Thank you @yjshen

alamb · 2022-04-10T12:08:52Z

datafusion/core/src/row/reader.rs

@@ -18,71 +18,33 @@
 //! Accessing row from raw bytes

 use crate::error::{DataFusionError, Result};
-#[cfg(feature = "jit")]


alamb · 2022-04-10T12:11:07Z

datafusion/core/src/row/reader.rs


    for offset in offsets.iter().take(row_num) {
-        row.point_to(*offset);
+        row.point_to(*offset, data);


It seems strange to update row.data on each new offset, as data isn't changing from iteration to iteration here.

The main reason for this change is to support the sort payload output use case, where we need to chase compositeIndex pointers and output rows that belong to different input batches/pages. We can then point to a record, append its cells to the output record batch buffer, and point to the next record.

Since it's just a field assignment without expensive calculations, I think it's acceptable here.
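To illustrate the point above, here is a minimal sketch of a reader whose point_to takes both an offset and the backing buffer. Everything in it is hypothetical (SketchRowReader, get_u32, gather_first_cells, and the fixed-width layout of little-endian u32 cells); the real RowReader in DataFusion also handles schemas, null bits, and variable-length data.

```rust
/// Hypothetical, simplified reader over fixed-width rows of little-endian u32 cells.
/// It is not the DataFusion RowReader; it only shows the shape of `point_to`.
pub struct SketchRowReader<'a> {
    /// Buffer the reader currently looks at (may change between rows).
    data: &'a [u8],
    /// Byte offset of the current row inside `data`.
    base_offset: usize,
}

impl<'a> SketchRowReader<'a> {
    pub fn new() -> Self {
        Self { data: &[], base_offset: 0 }
    }

    /// Re-target the reader. This is two field assignments; no bytes are copied.
    pub fn point_to(&mut self, offset: usize, data: &'a [u8]) {
        self.base_offset = offset;
        self.data = data;
    }

    /// Read the `idx`-th u32 cell of the current row.
    pub fn get_u32(&self, idx: usize) -> u32 {
        let start = self.base_offset + idx * 4;
        let mut bytes = [0u8; 4];
        bytes.copy_from_slice(&self.data[start..start + 4]);
        u32::from_le_bytes(bytes)
    }
}

/// Follow (page index, row offset) pairs, assumed here to play the role of the
/// composite index, and collect the first cell of each row, even though
/// consecutive rows may live in different pages.
pub fn gather_first_cells(pages: &[Vec<u8>], indices: &[(usize, usize)]) -> Vec<u32> {
    let mut reader = SketchRowReader::new();
    let mut out = Vec::with_capacity(indices.len());
    for &(page, offset) in indices {
        reader.point_to(offset, &pages[page]);
        out.push(reader.get_u32(0));
    }
    out
}
```

In the single-buffer loop from the diff, the data argument is the same slice on every iteration, so the extra assignment is redundant but cheap. In the multi-page case it is what lets one reader be reused across buffers.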

alamb · 2022-04-10T12:13:52Z

datafusion/core/src/row/jit/reader.rs

+
+/// Read `data` of raw-bytes rows starting at `offsets` out to a record batch
+
+pub fn read_as_batch_jit(


I wonder if, over the long term, we can hide all the reading/writing as jit / not as jit within the RowReader / RowWriter, so that most code in DataFusion will simply use RowReader/RowWriter and the use of jit would be an implementation detail.

This may be where you are headed anyway; I just wanted to say it explicitly.

Yes, RowReader and RowWriter are meant to be used outside this row module. I think the underlying implementation could be chosen based on whether the jit feature gate is enabled.
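One way this idea could look is sketched below. It is an assumption about a possible shape, not the structure this PR introduces: FacadeRowReader, the backend module, and read_cells are hypothetical names, and the jit variant is a placeholder that reuses the plain decoding logic. The sketch also assumes a jit feature is declared in the crate's Cargo.toml.

```rust
/// Fallback backend, compiled when the `jit` feature is off.
#[cfg(not(feature = "jit"))]
mod backend {
    /// Decode a row of fixed-width little-endian u32 cells.
    pub fn read_cells(row: &[u8]) -> Vec<u32> {
        row.chunks_exact(4)
            .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect()
    }
}

/// JIT backend, compiled only when the `jit` feature is on.
/// Placeholder: a real version would call into generated code.
#[cfg(feature = "jit")]
mod backend {
    pub fn read_cells(row: &[u8]) -> Vec<u32> {
        row.chunks_exact(4)
            .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect()
    }
}

/// Hypothetical facade: callers never mention `jit`; the feature gate lives
/// in exactly one place (the `backend` module selection above).
pub struct FacadeRowReader;

impl FacadeRowReader {
    pub fn read_cells(row: &[u8]) -> Vec<u32> {
        backend::read_cells(row)
    }
}
```

With this shape, code outside the row module calls FacadeRowReader::read_cells and compiles unchanged whether or not the feature is enabled.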

yjshen · 2022-04-11T06:30:17Z

I am merging this to unlock further row layout implementations. Thanks @alamb!

first round: move jit_row to submodule, separate validity and layout

6a37022

github-actions bot added the datafusion (Changes in the datafusion crate) label Apr 10, 2022

yjshen mentioned this pull request Apr 10, 2022

Support Multiple row layout #2188

Closed

3 tasks

alamb approved these changes Apr 10, 2022

yjshen merged commit c46c91f into apache:master Apr 11, 2022

yjshen mentioned this pull request Apr 18, 2022

Introduce RowLayout to represent rows for different purposes #2261

Merged

yjshen deleted the row_refine branch April 22, 2022 08:30
