Process parquet bools with microkernels #17157
…lable column code
…attione-nvidia/cudf into mukernels_fixedwidth_optimize
Co-authored-by: nvdbaranec <56695930+nvdbaranec@users.noreply.github.com>
Co-authored-by: Vukasin Milovanovic <vmilovanovic@nvidia.com>
Looks great. Only some questions/nits
Co-authored-by: Yunsong Wang <yunsongw@nvidia.com>
LGTM
Looks great. I like how the meat of the implementation is so small. The generic kernel continues to dissolve :)
great stuff!
/merge
This adds support for the bool type to the parquet reader microkernels. Both plain (bit-packed) and RLE-encoded bool decoding are supported, each through its own code path. This PR also removes a large amount of boilerplate, since most of the template information needed is already encoded in the kernel mask. The superfluous level_t template parameter on rle_run has been removed, and bools have been added to the parquet benchmarks.
Performance: register count drops from 62 to 56, and both plain and RLE-encoded bool decoding are now 46% faster (uncompressed). Reading sample customer data shows no change. NDS tests show no change.
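For readers unfamiliar with the plain (bit-packed) bool layout mentioned above, here is a minimal host-side sketch of the unpacking step. It is not the microkernel code from this PR: `decode_plain_bools` is a hypothetical helper name, and the only thing it assumes is the Parquet PLAIN boolean layout (one bit per value, least-significant bit first within each byte). The output uses one byte per value, which is an illustrative choice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper for illustration only; not part of cudf.
// Unpacks PLAIN-encoded Parquet booleans: one bit per value, LSB first.
// Each decoded value is written as a single byte (0 or 1).
std::vector<std::uint8_t> decode_plain_bools(std::vector<std::uint8_t> const& data,
                                             std::size_t num_values)
{
  // The input must hold at least num_values bits.
  assert(data.size() * 8 >= num_values);

  std::vector<std::uint8_t> out(num_values);
  for (std::size_t i = 0; i < num_values; ++i) {
    std::size_t const byte_idx = i / 8;  // which input byte holds value i
    std::size_t const bit_idx  = i % 8;  // bit position within that byte
    out[i] = static_cast<std::uint8_t>((data[byte_idx] >> bit_idx) & 1u);
  }
  return out;
}
```

In a GPU kernel each thread would typically compute the same byte and bit index for its own value rather than looping, but the index arithmetic is the same. The RLE-encoded path is different (runs of repeated or bit-packed values with bit width 1) and is not covered by this sketch.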
Checklist