Simplify null mask preservation in parquet reader #2116
Conversation
I intend to run benchmarks for this shortly to confirm no regression
Codecov Report

@@            Coverage Diff             @@
##           master    #2116      +/-   ##
==========================================
+ Coverage   83.73%   83.77%   +0.04%
==========================================
  Files         225      225
  Lines       59412    59474      +62
==========================================
+ Hits        49748    49826      +78
+ Misses       9664     9648      -16
Force-pushed from d4a689a to 4b40729
For some reason this represents a performance regression... More investigation needed 🤔
I hadn't realized this 😂 I have one question: the DefinitionLevelBuffer type is decided in build_primitive_reader, which runs before any data has been read. So I think the type of packed decoder is already known before reading data.
I ran the integration tests on my modified version, and it fails at skipping the first records! 😭
Correct, this is just a plumbing exercise to get that knowledge into the decoder at construction time
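To illustrate what "at construction time" buys here, below is a minimal, self-contained sketch. The types are invented stand-ins rather than the crate's real ones; the point is only that the decoder carries an explicit packed flag from the moment it is built, instead of having to discover its mode from the buffer it writes into once data starts arriving.

```rust
/// Hypothetical stand-in for the two ways definition levels can be buffered.
enum LevelBuffer {
    /// Levels kept directly as a packed null mask
    Packed(Vec<bool>),
    /// Levels kept as full i16 values
    Full(Vec<i16>),
}

/// Hypothetical decoder that carries the decision explicitly.
struct LevelDecoder {
    packed: bool, // known at construction, before any page is read
}

impl LevelDecoder {
    fn new(packed: bool) -> Self {
        Self { packed }
    }

    fn decode_into(&self, out: &mut LevelBuffer) {
        // The decoder no longer has to peek at `out` to discover its own mode,
        // which is what made "skip the first records" awkward before.
        match (self.packed, out) {
            (true, LevelBuffer::Packed(mask)) => mask.push(true),
            (false, LevelBuffer::Full(levels)) => levels.push(1),
            _ => panic!("decoder mode and buffer variant disagree"),
        }
    }
}

fn main() {
    // Nullable leaf column: the flag says "packed" up front.
    let decoder = LevelDecoder::new(true);
    let mut buffer = LevelBuffer::Packed(Vec::new());
    decoder.decode_into(&mut buffer);

    // A nested column would instead get a Full buffer and an unpacked decoder.
    let mut nested = LevelBuffer::Full(Vec::new());
    LevelDecoder::new(false).decode_into(&mut nested);
}
```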
Perplexingly if you run the arrow_reader benchmark from the crate root, this does not represent a performance regression, but if you run it from within the parquet crate, it does... I'm not really sure what to make of this
@@ -195,7 +195,6 @@ where
///
/// `values` will be contiguously populated with the non-null values. Note that if the column
/// is not required, this may be less than either `batch_size` or the number of levels read
#[inline]
It would appear that this can result in sub-optimal inlining behaviour; in particular, when compiling the parquet crate there is a noticeable performance degradation. Unfortunately the inlined code is so mangled that I've been unable to determine exactly what is going on, but I may revisit this at a later date
as in "when you leave `#[inline]` the benchmarks get slower"?
When you leave `#[inline]`, certain benchmarks get slower when compiled from the parquet crate, although there is no difference when compiled from the workspace level... It is incredibly strange...
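For readers unfamiliar with the attribute under discussion, here is a tiny, hypothetical example (not the crate's code) of the kind of small, hot method where `#[inline]` can matter. The attribute is only a hint, but adding or removing it can change which call sites the optimizer chooses to inline, which may be part of why the benchmark results differ depending on how the crate is compiled.

```rust
pub struct Counter(u64);

impl Counter {
    // With the hint, the compiler is strongly encouraged to inline this into
    // hot loops; without it, inlining is left entirely to optimizer heuristics.
    #[inline]
    pub fn bump(&mut self) -> u64 {
        self.0 += 1;
        self.0
    }
}

fn main() {
    let mut c = Counter(0);
    // In a loop like this, whether `bump` is inlined can be measurable.
    for _ in 0..1_000 {
        c.bump();
    }
    println!("total = {}", c.0);
}
```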
self.def_levels_buffer
    .as_ref()
    .map(|buf| buf.typed_data())
self.def_levels_buffer.as_ref().map(|buf| buf.typed_data())
it is not entirely clear to me why the formatting changed on these lines -- not that it is a bad change, but it seems like it wasn't a semantic change either 🤷
I don't know either...
@@ -277,25 +288,25 @@ enum LevelDecoderInner {
impl ColumnLevelDecoder for ColumnLevelDecoderImpl {
    type Slice = [i16];

    fn new(max_level: i16, encoding: Encoding, data: ByteBufferPtr) -> Self {
        let bit_width = num_required_bits(max_level as u64);
    fn set_data(&mut self, encoding: Encoding, data: ByteBufferPtr) {
I am not an expert in this area, but the new code structure seems to make sense to me
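For context, a rough sketch of the restructuring shown in the hunk above, with simplified names and without the `Encoding` parameter: everything derivable from the column metadata (such as the bit width from the max level) is fixed at construction, and `set_data` is then called once per page on the same decoder instead of rebuilding it each time data arrives.

```rust
// Stand-in for parquet's ByteBufferPtr.
type Bytes = Vec<u8>;

trait LevelDecoder {
    /// Feed the decoder the level data for the next page.
    fn set_data(&mut self, data: Bytes);
}

struct I16LevelDecoder {
    bit_width: u8,       // derived from max_level once, at construction
    data: Option<Bytes>, // replaced on every call to set_data
}

impl I16LevelDecoder {
    fn new(max_level: i16) -> Self {
        // Rough equivalent of num_required_bits: bits needed to hold max_level.
        let bit_width = (16 - (max_level as u16).leading_zeros()) as u8;
        Self { bit_width, data: None }
    }
}

impl LevelDecoder for I16LevelDecoder {
    fn set_data(&mut self, data: Bytes) {
        self.data = Some(data);
    }
}

fn main() {
    // One decoder per column chunk; each page just swaps in new data.
    let mut decoder = I16LevelDecoder::new(1);
    decoder.set_data(vec![0b0000_0011]);
    decoder.set_data(vec![0b0000_0001]);
    println!("bit width = {}", decoder.bit_width);
    println!("last page size = {:?}", decoder.data.as_ref().map(|d| d.len()));
}
```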
let def_levels = (desc.max_def_level() > 0)
    .then(|| DefinitionLevelBuffer::new(&desc, null_mask_only));
    .then(|| DefinitionLevelBuffer::new(&desc, packed_null_mask(&desc)));
Is this the key change in this PR -- that the decision to use a null mask is pushed down to this level?
Correct 👍
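To make the pushed-down decision concrete, here is a hedged sketch of the kind of predicate involved. The real `packed_null_mask` takes the parquet column descriptor; the condition below (a nullable leaf with no repetition, i.e. `max_def_level == 1` and `max_rep_level == 0`) is my reading of when definition levels can be kept directly as a packed null mask, and may not match the crate's exact check.

```rust
// Simplified stand-in for the column descriptor.
struct ColumnDesc {
    max_def_level: i16,
    max_rep_level: i16,
}

// Assumed condition: the column is a nullable leaf with no repeated/nested
// ancestors, so a definition level of 1 simply means "value present" and the
// levels collapse to a validity bitmask.
fn packed_null_mask(desc: &ColumnDesc) -> bool {
    desc.max_def_level == 1 && desc.max_rep_level == 0
}

fn main() {
    // Nullable top-level primitive column: levels are just a validity mask.
    assert!(packed_null_mask(&ColumnDesc { max_def_level: 1, max_rep_level: 0 }));
    // Nested or repeated column: full i16 levels are required.
    assert!(!packed_null_mask(&ColumnDesc { max_def_level: 2, max_rep_level: 1 }));
    println!("ok");
}
```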
CI Failure should be fixed by #2121
Benchmark runs are scheduled for baseline = 5e3facf and contender = a9fa1b4. a9fa1b4 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Which issue does this PR close?

Part of #2107

Rationale for this change

The original logic added in #1054 is very confusing, as it determines whether to use a packed decoder based on the type of `DefinitionLevelBuffer` passed to `DefinitionLevelBufferDecoder`. Not only is this confusing, but it creates a problem when skipping the first records in a column chunk, as the type of decoder is not known until data has been read 😱

This largely dated from a time when `GenericRecordReader` was generic over the levels in addition to the values. In the end I removed this prior to merge as it was unnecessary complexity.

What changes are included in this PR?

Explicitly construct the decoders in `GenericRecordReader` and pass them to `GenericColumnReader::new_with_decoders`. This allows adding an additional constructor parameter to `DefinitionLevelBufferDecoder` to instruct it whether to decode a packed null mask or not.

Are there any user-facing changes?

No, all these traits are crate private.
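A simplified sketch of the wiring described above. The type and constructor names follow the PR description, but the struct bodies and signatures are invented for illustration; the real types carry considerably more state.

```rust
struct DefinitionLevelBufferDecoder {
    packed: bool, // the new constructor parameter described above
}

impl DefinitionLevelBufferDecoder {
    fn new(max_level: i16, packed: bool) -> Self {
        let _ = max_level; // the real decoder also derives state from this
        Self { packed }
    }
}

struct GenericColumnReader {
    def_level_decoder: Option<DefinitionLevelBufferDecoder>,
}

impl GenericColumnReader {
    /// Mirrors the role of `new_with_decoders`: decoders arrive ready-made.
    fn new_with_decoders(def_level_decoder: Option<DefinitionLevelBufferDecoder>) -> Self {
        Self { def_level_decoder }
    }
}

struct GenericRecordReader;

impl GenericRecordReader {
    fn build_column_reader(max_def_level: i16, packed_null_mask: bool) -> GenericColumnReader {
        // Decoder construction now happens here, where the null-mask decision
        // is already known, rather than inside the column reader after data
        // has started flowing.
        let decoder = (max_def_level > 0)
            .then(|| DefinitionLevelBufferDecoder::new(max_def_level, packed_null_mask));
        GenericColumnReader::new_with_decoders(decoder)
    }
}

fn main() {
    let reader = GenericRecordReader::build_column_reader(1, true);
    let packed = reader.def_level_decoder.map(|d| d.packed).unwrap_or(false);
    println!("packed null mask decoding: {packed}");
}
```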