-
Notifications
You must be signed in to change notification settings - Fork 810
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix NullArrayReader (#1245) #1246
Conversation
@@ -214,6 +214,10 @@ where | |||
// save definition and repetition buffers | |||
self.def_levels_buffer = self.record_reader.consume_def_levels()?; | |||
self.rep_levels_buffer = self.record_reader.consume_rep_levels()?; | |||
|
|||
// Must consume bitmap buffer | |||
self.record_reader.consume_bitmap_buffer()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this is even getting computed at all is definitely a valid question, but at the same time a workload where the speed of reading arrays of nulls is the bottleneck feels decidedly... niche 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One place it comes up is where you have multiple Parquet files representing a "table", with one or more columns being relatively sparse. If you have a file dropped every hour, then it may be that some of the files have a null column while others have a few values.
It doesn't seem likely it would be the bottleneck (the other columns in the file probably would be), but that's at least how it's come up for us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I didn't mean to suggest NullArrayReader itself is niche, but rather there is probably a limited benefit to optimising it to not compute the bitmask unnecessarily 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tustvold -- I'll get this out in arrow 9.0.0 next week; I can try to release a patchset sooner if you need @bjchambers
No need to rush. Next week should be fine. I can just wait to upgrade until
this is out. Thanks for the speedy fix!
…On Sat, Jan 29, 2022 at 3:19 AM Andrew Lamb ***@***.***> wrote:
***@***.**** approved this pull request.
Thanks @tustvold <https://github.com/tustvold> -- I'll get this out in
arrow 9.0.0 next week; I can try to release a patchset sooner if you need
@bjchambers <https://github.com/bjchambers>
—
Reply to this email directly, view it on GitHub
<#1246 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAIY6HXTGYKE5TH6CONQVDUYPEKTANCNFSM5NCAZKAQ>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Which issue does this PR close?
Closes #1245.
Rationale for this change
Originally reported by @bjchambers, a sanity check added in #1054 fires when reading NullArrays.
This highlighted two problems:
NullArrayReader
has always been somewhat broken as it would never flush the accumulated null buffer, instead just growing it indefinitelyWhat changes are included in this PR?
Fixes the panic, and adds a basic test of the
NullArrayReader
Are there any user-facing changes?
NullArrayReader should work again