Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Update Parquet row filtering to handle type coercion #10716

Merged
merged 5 commits into from
Jun 5, 2024

Conversation

jeffreyssmith2nd
Copy link
Contributor

@jeffreyssmith2nd jeffreyssmith2nd commented May 29, 2024

Which issue does this PR close?

Closes #7925.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the core Core DataFusion crate label May 29, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is looking definitely on the right path

datafusion/core/src/datasource/schema_adapter.rs Outdated Show resolved Hide resolved
@jeffreyssmith2nd jeffreyssmith2nd marked this pull request as ready for review May 31, 2024 14:17
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jeffreyssmith2nd

I went over the code and test carefully and I think it looks good. I took the liberty of merging up to resolve a conflict.

I also verified that the test covers the code change by removing the code change locally and running the test. When I did so the test fails like this:


assertion failed: filtered.is_ok()
thread 'datasource::physical_plan::parquet::row_filter::test::test_filter_type_coercion' panicked at datafusion/core/src/datasource/physical_plan/parquet/row_filter.rs:549:9:
assertion failed: filtered.is_ok()
stack backtrace:
   0: rust_begin_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_fmt
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:72:14
   2: core::panicking::panic
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:145:5
   3: datafusion::datasource::physical_plan::parquet::row_filter::test::test_filter_type_coercion
             at ./src/datasource/physical_plan/parquet/row_filter.rs:549:9
   4: datafusion::datasource::physical_plan::parquet::row_filter::test::test_filter_type_coercion::{{closure}}
             at ./src/datasource/physical_plan/parquet/row_filter.rs:486:35
   5: core::ops::function::FnOnce::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
   6: core::ops::function::FnOnce::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

error: test failed, to rerun pass `--lib`
error: 1 target failed:
    `--lib`

I have some suggestions for the test, but otherwise I think this PR is ready to go.

@Ted-Jiang and @thinkharderdev I wonder if you have a moment to review this too as I believe you actively use the filter pushdown into parquet

datafusion/core/src/datasource/schema_adapter.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- thank you @jeffreyssmith2nd

I plan to leave this open for another day in case there are additional comments

Copy link
Contributor

@thinkharderdev thinkharderdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice. lgtm

@alamb alamb merged commit 70256ba into apache:main Jun 5, 2024
23 checks passed
@alamb
Copy link
Contributor

alamb commented Jun 5, 2024

Thank you @jeffreyssmith2nd and @thinkharderdev

@alamb alamb deleted the smith/rowfilter branch June 5, 2024 16:17
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
)

* test: Add a failing test to show the lack of type coercion in row filters

* feat: update parquet row filter to handle type coercion

* chore: lint/fmt

* chore: test improvements and cleanup
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parquet Predicate Pushdown Does Not Handle Type Coercion
3 participants