Improve performance reading ByteViewArray from parquet by removing an implicit copy #6031
```diff
@@ -71,7 +71,6 @@ struct ByteViewArrayReader {
 }
 
 impl ByteViewArrayReader {
-    #[allow(unused)]
     fn new(
         pages: Box<dyn PageIterator>,
         data_type: ArrowType,
```
```diff
@@ -316,7 +315,10 @@ impl ByteViewArrayDecoderPlain {
     }
 
     pub fn read(&mut self, output: &mut ViewBuffer, len: usize) -> Result<usize> {
-        let block_id = output.append_block(self.buf.clone().into());
+        // Here we convert `bytes::Bytes` into `arrow_buffer::Bytes`, which is zero copy
+        // Then we convert `arrow_buffer::Bytes` into `arrow_buffer::Buffer`, which is also zero copy
+        let buf = arrow_buffer::Buffer::from_bytes(self.buf.clone().into());
```
Review comments on this change:

I think we should at least add a comment explaining the rationale for this non-obvious code. Maybe it would make sense to pull it into its own function (that could be commented, and more easily discoverable).

Thinking about creating a

I recommend we split the code into multiple PRs: this one to improve performance of the parquet reader, and one to make it harder to misuse the API (which I suspect will be a breaking API change).

Indeed, the breakage is much larger than I thought; I will continue on this tomorrow.

Filed #6033 to track the idea.
```diff
+        let block_id = output.append_block(buf);
 
         let to_read = len.min(self.max_remaining_values);
 
```
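The removed line shows where the implicit copy sat: `self.buf.clone().into()` only bumps a reference count on the `bytes::Bytes`, but the `.into()` into `arrow_buffer::Buffer` presumably resolves to a generic conversion from any byte slice, which allocates a new region and copies the whole page. Going through `arrow_buffer::Bytes` and `Buffer::from_bytes` instead wraps the existing allocation. The sketch below reproduces that difference using only std stand-ins; `SharedBytes`, `Block`, `copy_from_slice` and `from_shared` are hypothetical names for illustration, not arrow or bytes APIs.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a reference-counted byte buffer such as
/// `bytes::Bytes`: cloning it only bumps a reference count.
#[derive(Clone)]
struct SharedBytes(Arc<[u8]>);

/// Hypothetical stand-in for an output block such as `arrow_buffer::Buffer`.
struct Block {
    data: Arc<[u8]>,
}

impl Block {
    /// Copying constructor: allocates a new region and copies every byte,
    /// like a generic conversion from any byte slice.
    fn copy_from_slice(src: &[u8]) -> Self {
        Block { data: Arc::from(src) }
    }

    /// Zero-copy constructor: takes over the existing shared allocation,
    /// analogous to `Buffer::from_bytes(bytes.into())` in the diff.
    fn from_shared(src: SharedBytes) -> Self {
        Block { data: src.0 }
    }

    /// Address of the first byte, used to check whether memory is shared.
    fn as_ptr(&self) -> *const u8 {
        self.data.as_ptr()
    }
}

fn main() {
    let page = SharedBytes(Arc::from(&b"parquet page bytes"[..]));

    // Old pattern: cheap clone, then a conversion that copies the bytes.
    let copied = Block::copy_from_slice(&page.clone().0);
    // New pattern: cheap clone, then a conversion that reuses the allocation.
    let shared = Block::from_shared(page.clone());

    // prints "copied shares page memory: false"
    println!("copied shares page memory: {}", copied.as_ptr() == page.0.as_ptr());
    // prints "shared shares page memory: true"
    println!("shared shares page memory: {}", shared.as_ptr() == page.0.as_ptr());
}
```

Both patterns start with an equally cheap `clone()`; the cost difference is entirely in the conversion, which is why the copy was easy to miss behind `.into()`.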
```diff
@@ -546,7 +548,10 @@ impl ByteViewArrayDecoderDeltaLength {
 
         let src_lengths = &self.lengths[self.length_offset..self.length_offset + to_read];
 
-        let block_id = output.append_block(self.data.clone().into());
+        // Here we convert `bytes::Bytes` into `arrow_buffer::Bytes`, which is zero copy
+        // Then we convert `arrow_buffer::Bytes` into `arrow_buffer::Buffer`, which is also zero copy
+        let bytes = arrow_buffer::Buffer::from_bytes(self.data.clone().into());
+        let block_id = output.append_block(bytes);
 
         let mut current_offset = self.data_offset;
         let initial_offset = current_offset;
```
👍
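The review suggested pulling this conversion into a named, documented function, and #6033 tracks the separate idea of making the API harder to misuse. One possible shape for both is sketched below with hypothetical std-only stand-ins (`PageBytes`, `Block`, `page_to_block` are illustrative names, and this is not necessarily what #6033 proposes): the only `From` conversion into the block type is the zero-copy one, copying requires an explicitly named constructor, and the plain and delta-length decoders would share one helper instead of repeating the conversion inline.

```rust
use std::sync::Arc;

/// Hypothetical reference-counted page buffer (stands in for `bytes::Bytes`).
#[derive(Clone)]
struct PageBytes(Arc<[u8]>);

/// Hypothetical output block (stands in for `arrow_buffer::Buffer`).
struct Block(Arc<[u8]>);

/// The only `From` conversion is the zero-copy one, so `.into()` cannot
/// silently allocate and copy.
impl From<PageBytes> for Block {
    fn from(page: PageBytes) -> Self {
        Block(page.0)
    }
}

impl Block {
    /// Copying has to be requested by name, which keeps its cost visible.
    fn copy_from_slice(src: &[u8]) -> Self {
        Block(Arc::from(src))
    }
}

/// Hypothetical helper both decoders could call: wraps the page's
/// existing allocation instead of copying it.
fn page_to_block(page: &PageBytes) -> Block {
    page.clone().into()
}

fn main() {
    let page = PageBytes(Arc::from(&b"plain-encoded values"[..]));

    let shared = page_to_block(&page);
    let copied = Block::copy_from_slice(&page.0);

    println!("{}", shared.0.as_ptr() == page.0.as_ptr()); // prints "true"
    println!("{}", copied.0.as_ptr() == page.0.as_ptr()); // prints "false"
}
```

Removing an existing copying conversion from a widely used public type would break callers that rely on it, which fits the review's expectation that this change belongs in a separate, breaking PR.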