Parquet limit pushdown (#5404) #5416
Thank you @tustvold -- it would be nice to have a test for this, but I am not sure there is any way to test this really (other than performance benchmarks).
A nice job. Thank you @tustvold
There is tpch in the current
It is a good idea @jackwener -- I have it on my list this week to do some more organizing related to benchmarking. I definitely agree that clickbench would be super helpful
Benchmark runs are scheduled for baseline = 58cd1bf and contender = 4118076. 4118076 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Which issue does this PR close?
Relates to #5404
Rationale for this change
apache/arrow-rs#3633 added the ability to push down limits to the parquet reader. This is particularly important when filter pushdown is enabled on ParquetOptions (soon to be default), as it allows the limit to be applied before late materialization, which has significant performance benefits.
What changes are included in this PR?
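To make the rationale above concrete, here is a minimal, std-only sketch of why applying a limit before late materialization helps. It is not the code in this PR and does not use the parquet or DataFusion APIs: the `Table` type and the `select_rows`, `materialize`, and `demo` functions are hypothetical names invented for illustration. The idea is that the predicate column is evaluated first, the limit truncates the row selection, and only then are the remaining columns decoded.

```rust
/// Hypothetical two-column table: `keys` is the predicate column and
/// `payload` stands in for the wider columns that are materialized late.
pub struct Table {
    pub keys: Vec<i64>,
    pub payload: Vec<String>,
}

/// Evaluate the predicate on the predicate column only, and stop as soon
/// as `limit` matching row indices have been collected (if a limit is set).
pub fn select_rows(
    keys: &[i64],
    predicate: impl Fn(i64) -> bool,
    limit: Option<usize>,
) -> Vec<usize> {
    let matches = keys
        .iter()
        .enumerate()
        .filter(|(_, k)| predicate(**k))
        .map(|(i, _)| i);
    match limit {
        Some(n) => matches.take(n).collect(),
        None => matches.collect(),
    }
}

/// Late materialization: decode payload values only for the selected rows.
/// Returns the decoded rows and the number of payload values decoded.
pub fn materialize(table: &Table, selection: &[usize]) -> (Vec<String>, usize) {
    let rows: Vec<String> = selection
        .iter()
        .map(|&i| table.payload[i].clone())
        .collect();
    let decoded = rows.len();
    (rows, decoded)
}

/// Compare decode counts for a LIMIT 10 query over 1000 rows, where the
/// (assumed) predicate keeps even keys.
pub fn demo() -> (usize, usize) {
    let table = Table {
        keys: (0..1000).collect(),
        payload: (0..1000).map(|i| format!("row-{}", i)).collect(),
    };
    let is_even = |k: i64| k % 2 == 0;

    // Limit applied after materialization: every match is decoded, then truncated.
    let all = select_rows(&table.keys, is_even, None);
    let (mut late, decoded_without_pushdown) = materialize(&table, &all);
    late.truncate(10);

    // Limit pushed down: the selection is truncated before decoding.
    let limited = select_rows(&table.keys, is_even, Some(10));
    let (early, decoded_with_pushdown) = materialize(&table, &limited);

    // Both strategies return the same rows; only the decode work differs.
    assert_eq!(late, early);
    (decoded_without_pushdown, decoded_with_pushdown)
}
```

In this toy setup both strategies return the same ten rows, but the pushed-down variant decodes far fewer payload values. The real reader works on pages and row selections rather than individual values, so the sketch only conveys the shape of the saving.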
Are these changes tested?
Are there any user-facing changes?