Enable parquet filter pushdown by default #3463
Comments
I plan to review the parquet test coverage over the next day or two. My basic plan is to:
|
Update here: I am working on some fuzz testing for the predicate pushdown code |
I found a bug in pushdown by writing a test: #4005 |
I found another one with my test. I will keep the list on this ticket updated |
I think we are getting close |
Update: I think once we get #3976 merged, I'll put up the PR to enable the feature by default |
@alamb What's the remaining work that needs to be done to enable it by default? Anything I can do to get this over the finish line? |
@Dandandan ❤️ thank you. I know of two major items:
For item 1, I tried a few times, but am currently blocked by #5104 and haven't had time to return to it.
For item 2, I think someone needs to look at some benchmark results (perhaps TPCH) and figure out what we can do to avoid the regression (or measure and determine it isn't significant).
I keep hoping to have time to work on item 1, as I am pretty sure I know what the problem is, but other things keep coming up :( I haven't had a chance to work on item 2 yet. |
Update: I am working on a larger benchmarking story, part of which would give us more confidence to merge changes like this. I hope to have that done early next week. |
With the introduction of |
apache/arrow-rs#5523 might help mitigate the impact of pushing down predicates that turn out to not be very selective |
It would be really nice to (finally) be able to turn this optimization on -- thank you @tustvold cc @Dandandan |
@alamb I wonder if you already have some further insights? |
I am just reading the latest messages on apache/arrow-rs#5523, thanks for the updates |
The writeup here from @tustvold is quite good: apache/arrow-rs#6454 (comment) |
In #3380 @thinkharderdev added support for evaluating filters during the parquet scan via the RowFilter mechanism 🎉
This feature is currently enabled via a feature flag, which is disabled by default.
This ticket tracks enabling this feature by default.
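For readers new to the idea, here is a minimal toy sketch of what evaluating a filter during the scan buys. It is not DataFusion or arrow-rs code: `Column`, `decode_all`, `decode_selected`, `scan_with_pushdown` and `scan_without_pushdown` are all hypothetical names made up for illustration, and the `decoded` counter is a stand-in for real decode cost.

```rust
/// Toy column chunk (hypothetical, for illustration only).
/// `decoded` counts how many values have been "decoded" so far.
struct Column {
    values: Vec<i64>,
    decoded: usize,
}

impl Column {
    fn new(values: Vec<i64>) -> Self {
        Self { values, decoded: 0 }
    }

    /// Decode every value, as a scan without pushdown would.
    fn decode_all(&mut self) -> Vec<i64> {
        self.decoded += self.values.len();
        self.values.clone()
    }

    /// Decode only the rows at the given indices.
    fn decode_selected(&mut self, rows: &[usize]) -> Vec<i64> {
        self.decoded += rows.len();
        rows.iter().map(|&i| self.values[i]).collect()
    }
}

/// Filter evaluated during the scan: decode the filter column,
/// build a row selection, then decode the payload only for selected rows.
fn scan_with_pushdown(
    filter_col: &mut Column,
    payload: &mut Column,
    pred: impl Fn(i64) -> bool,
) -> Vec<i64> {
    let keys = filter_col.decode_all();
    let mut selection = Vec::new();
    for (row, key) in keys.iter().enumerate() {
        if pred(*key) {
            selection.push(row);
        }
    }
    payload.decode_selected(&selection)
}

/// Filter evaluated after the scan: decode both columns fully, then filter.
fn scan_without_pushdown(
    filter_col: &mut Column,
    payload: &mut Column,
    pred: impl Fn(i64) -> bool,
) -> Vec<i64> {
    let keys = filter_col.decode_all();
    let vals = payload.decode_all();
    let mut out = Vec::new();
    for (key, val) in keys.iter().zip(vals.iter()) {
        if pred(*key) {
            out.push(*val);
        }
    }
    out
}
```

Both functions return the same rows; the difference is how many payload values are decoded. With a selective predicate the pushdown path decodes far fewer of them. With a predicate that keeps most rows it saves little, and in a real reader the extra bookkeeping can cost more than it saves, which is the regression concern raised in the comments.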
Currently known items are: