Expose libcudf filter expression in read_parquet #15028

Merged

Conversation

Contributor

@wence- wence- commented Feb 12, 2024

Description

libcudf's parquet reader supports filtering rows of the input dataset based on a (restricted subset of) libcudf Expression. Previously this functionality was not exposed in Python-land; this PR exposes it.
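
For readers unfamiliar with the idea, here is a minimal sketch of expression-based row filtering in plain Python. It is not libcudf or Cython code: every name in it (`ColumnRef`, `Literal`, `BinaryOp`, `evaluate`, `filter_rows`) is hypothetical and chosen only for illustration. The real reader applies the expression inside libcudf rather than row by row in Python.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Union


# Hypothetical node types for a tiny expression tree. These are NOT
# the libcudf or pylibcudf classes; they only mirror the general shape
# of "column reference", "literal", and "binary operation" nodes.
@dataclass(frozen=True)
class ColumnRef:
    name: str


@dataclass(frozen=True)
class Literal:
    value: Any


@dataclass(frozen=True)
class BinaryOp:
    op: Callable[[Any, Any], Any]
    left: "Expr"
    right: "Expr"


Expr = Union[ColumnRef, Literal, BinaryOp]


def evaluate(expr, row):
    """Recursively evaluate an expression tree against one row (a dict)."""
    if isinstance(expr, ColumnRef):
        return row[expr.name]
    if isinstance(expr, Literal):
        return expr.value
    if isinstance(expr, BinaryOp):
        return expr.op(evaluate(expr.left, row), evaluate(expr.right, row))
    raise TypeError(f"unsupported expression node: {expr!r}")


def filter_rows(rows, predicate):
    """Keep only the rows for which the predicate evaluates to a truthy value."""
    return [row for row in rows if evaluate(predicate, row)]


rows = [{"a": 1, "b": 10}, {"a": 3, "b": 20}, {"a": 5, "b": 5}]

# Predicate: (a > 2) AND (b < 15)
predicate = BinaryOp(
    operator.and_,
    BinaryOp(operator.gt, ColumnRef("a"), Literal(2)),
    BinaryOp(operator.lt, ColumnRef("b"), Literal(15)),
)

filtered = filter_rows(rows, predicate)
print(filtered)  # [{'a': 5, 'b': 5}]
```

Building the predicate as data, instead of as a Python callable, is what lets a reader hand it to a lower layer for evaluation, which is the general motivation for passing an expression object through to the parquet reader.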

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@wence- wence- requested a review from a team as a code owner February 12, 2024 17:58
@github-actions github-actions bot added the Python (Affects Python cuDF API) label Feb 12, 2024
Contributor

@shwina shwina left a comment

LGTM!

Contributor

@bdice bdice left a comment

Seems fine to me. Since this is a Cython-only change, I assume we have no tests for it? @vyasr What is the pylibcudf testing plan?

@shwina shwina added the non-breaking (Non-breaking change) and improvement (Improvement / enhancement to an existing function) labels Feb 12, 2024
Contributor

shwina commented Feb 12, 2024

Going to go ahead and merge this one. The broader question of pylibcudf testing is an important one - let's discuss in a separate issue!

Contributor

shwina commented Feb 12, 2024

/merge

Contributor

shwina commented Feb 12, 2024

I have no power here.

Contributor

bdice commented Feb 13, 2024

> I have no power here.

The devcontainers build is failing, which blocks merge. I am working on a fix to libkvikio.

Contributor

vyasr commented Feb 13, 2024

Left a comment on the pylibcudf story issue about testing, feel free to respond there and continue that conversation.

@wence- wence- force-pushed the wence/fea/read_parquet-filter branch from 37ad3a7 to b9c195f Compare February 13, 2024 15:54
@wence- wence- force-pushed the wence/fea/read_parquet-filter branch from b9c195f to a696ed2 Compare February 14, 2024 18:11
@rapids-bot rapids-bot bot merged commit 99ed8b9 into rapidsai:branch-24.04 Feb 15, 2024
69 checks passed
@wence- wence- deleted the wence/fea/read_parquet-filter branch February 15, 2024 12:13
@vyasr vyasr added the pylibcudf (Issues specific to the pylibcudf package) label May 28, 2024
Labels
  • improvement: Improvement / enhancement to an existing function
  • non-breaking: Non-breaking change
  • pylibcudf: Issues specific to the pylibcudf package
  • Python: Affects Python cuDF API.
Projects
Status: Done
4 participants