Commit
Fix unnecessarily strict check in parquet chunked reader for choosing split locations. (rapidsai#16099)

This is a fix that somehow didn't make it into the initial wave of bug fixes for the parquet chunked reader earlier this year. The code that determines where to do splits needs to be sure it always chooses a location such that the selected pages enclose at least one full row for a list column. This means that you need to see at least 1 full row (2 row boundaries) in the group of pages. The previous logic was only checking whether you had 1 full row within the very last page in the selection, which is unnecessarily strict. We actually ran into some data out in the wild where this was hit. This PR changes the logic to include all pages within the chunk when doing the check instead of just the last one.

Authors:
  - https://github.com/nvdbaranec
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Muhammad Haseeb (https://github.com/mhaseeb123)
  - Vukasin Milovanovic (https://github.com/vuule)

URL: rapidsai#16099
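
The difference between the two checks can be illustrated with a small sketch. This is not the reader's actual code: it is a simplified Python model that assumes each page is summarized by the number of row boundaries it contains, and the function names and the per-page counts are hypothetical.

```python
from typing import Sequence


def encloses_full_row_last_page_only(boundaries_per_page: Sequence[int]) -> bool:
    """Stricter check (hypothetical model of the old behavior).

    Only the last selected page is examined; it must contain
    2 row boundaries (1 full row) by itself.
    """
    return len(boundaries_per_page) > 0 and boundaries_per_page[-1] >= 2


def encloses_full_row_all_pages(boundaries_per_page: Sequence[int]) -> bool:
    """Relaxed check (hypothetical model of the new behavior).

    Row boundaries are counted across every page in the selection;
    2 boundaries anywhere in the group mean 1 full row is enclosed.
    """
    return sum(boundaries_per_page) >= 2


# A list row that spans several pages: one boundary in the first page,
# none in the middle page, and the closing boundary in the last page.
pages = [1, 0, 1]

strict_ok = encloses_full_row_last_page_only(pages)
relaxed_ok = encloses_full_row_all_pages(pages)
```

In this model the selection `[1, 0, 1]` encloses a full row, yet the last-page-only check rejects it because the final page holds a single boundary. Counting over all pages accepts it, which is the kind of case the commit message describes.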