fix(rust): Fix null handling in collect_statistics of parquet reader #20362

Open
wants to merge 3 commits into
base: main

codesorcery

Fixes #20361

github-actions bot added the fix (Bug fix) and rust (Related to Rust Polars) labels on Dec 19, 2024

codecov bot commented Dec 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.90%. Comparing base (6e4d717) to head (c7808d3).
Report is 12 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #20362      +/-   ##
==========================================
- Coverage   79.13%   78.90%   -0.23%     
==========================================
  Files        1572     1567       -5     
  Lines      219839   219920      +81     
  Branches     2462     2465       +3     
==========================================
- Hits       173970   173537     -433     
- Misses      45301    45815     +514     
  Partials      568      568              

@ritchie46
Member

@coastalwhite is this ok?

@coastalwhite
Collaborator

I will look at it. I would really like to make a test with an MRE before merging this. I am pretty sure if I look at the code for a second I can find one.

Collaborator

@coastalwhite coastalwhite left a comment

This is an okay fix for now. It should really be handled in #20203 by filling in a null.

coastalwhite force-pushed the fix-null-handling-parquet-collect-statistics branch from fd89933 to 45e5cfa on December 20, 2024 10:30
coastalwhite force-pushed the fix-null-handling-parquet-collect-statistics branch from 45e5cfa to 08d86af on December 20, 2024 10:38
Collaborator

@coastalwhite coastalwhite left a comment

Actually, I think this should just be fixed by #20203. This kind of makes it impossible to do predicate pushdown with missing columns.
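
One way to read this concern, as a rough sketch rather than Polars' actual implementation: a reader can only skip a row group when the column statistics prove that no row matches the predicate. If a missing column yields no statistics at all, nothing can be proven and the group must be read. If the missing column is instead modelled as an all-null column, an equality predicate can be ruled out. The names `ColumnStats` and `can_skip_eq` below are hypothetical and exist only for this illustration.

```python
from typing import NamedTuple, Optional


class ColumnStats(NamedTuple):
    """Hypothetical per-row-group statistics for one integer column."""

    min_value: Optional[int]
    max_value: Optional[int]
    null_count: int
    num_rows: int


def can_skip_eq(stats: Optional[ColumnStats], literal: int) -> bool:
    """Return True if no row in the group can satisfy `column == literal`."""
    if stats is None:
        # No statistics at all: nothing can be proven, so the group must be read.
        return False
    if stats.null_count == stats.num_rows:
        # Every value is null, and null == literal is never true.
        return True
    if stats.min_value is None or stats.max_value is None:
        # Incomplete statistics: stay conservative and read the group.
        return False
    return literal < stats.min_value or literal > stats.max_value


# Missing column with no statistics: the row group cannot be skipped.
assert can_skip_eq(None, 5) is False
# Missing column modelled as all nulls: the row group can be skipped.
assert can_skip_eq(ColumnStats(None, None, 100, 100), 5) is True
# Regular column whose value range excludes or includes the literal.
assert can_skip_eq(ColumnStats(10, 20, 0, 100), 5) is True
assert can_skip_eq(ColumnStats(1, 20, 0, 100), 5) is False
```

The sketch is deliberately conservative: any case it cannot prove falls back to reading the row group.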

codesorcery force-pushed the fix-null-handling-parquet-collect-statistics branch from ae28fd2 to c7808d3 on December 20, 2024 12:35
@codesorcery
Author

@coastalwhite thanks for the quick review and the MRE!
I think there was a mistake in the test, though. It filtered on pl.col.a == pl.col.b, which is true for the rows of the first data frame and false for the rows of the second, yet the assertion expected the result to contain the data of both data frames.
I updated the test so that both data frames contain rows that match the filter.
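
To illustrate the point about the test, here is a minimal sketch in plain Python. It is not the test from this PR, and the row values are made up. Each list of dicts stands in for one data frame, and the hypothetical helper `keep` mimics a filter on a == b.

```python
# Hypothetical rows; the real test data is not shown in this thread.
first = [{"a": 1, "b": 1}, {"a": 2, "b": 2}]
second = [{"a": 1, "b": 3}, {"a": 2, "b": 4}]


def keep(rows):
    """Mimic filtering on a == b: keep rows where both values are equal."""
    return [row for row in rows if row["a"] == row["b"]]


# Original shape of the test: the second frame contributes no rows,
# so the filtered result cannot equal the data of both frames.
filtered = keep(first + second)
assert filtered == first
assert filtered != first + second

# Updated shape: both frames contain at least one matching row.
second_fixed = [{"a": 1, "b": 3}, {"a": 5, "b": 5}]
filtered_fixed = keep(first + second_fixed)
assert filtered_fixed == first + [{"a": 5, "b": 5}]
```

With the updated data, the filter actually has to evaluate rows from both inputs, so the assertion exercises both frames.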

Labels
fix Bug fix rust Related to Rust Polars
Development

Successfully merging this pull request may close these issues.

polars.scan_delta(..).filter(..).collect() fails for some datasets
3 participants