Spark 4.0, Core: Add Limit pushdown to Scan #14615
```java
public ThisT minRowsRequested(int numRows) {
  return newRefinedScan(table, schema, context.minRowsRequested(numRows));
}
```
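The method follows a refinement pattern: each call returns a new scan built over a new context rather than mutating the current scan. Below is a minimal, self-contained sketch of that pattern. It assumes a simplified context that holds only the row hint. `ScanContext`, `SimpleScan`, and `minRowsRequestedHint()` are hypothetical stand-ins, not Iceberg's actual classes, and the real `newRefinedScan` call also passes `schema`, which is omitted here.

```java
// Hypothetical, simplified stand-in for a scan context (not Iceberg's class).
final class ScanContext {
  private final Integer minRowsRequested; // null means "no hint"

  ScanContext(Integer minRowsRequested) {
    this.minRowsRequested = minRowsRequested;
  }

  static ScanContext empty() {
    return new ScanContext(null);
  }

  // Returns a copy with the hint set; this context is left unchanged.
  ScanContext minRowsRequested(int numRows) {
    return new ScanContext(numRows);
  }

  Integer minRowsRequested() {
    return minRowsRequested;
  }
}

// Hypothetical, simplified scan that is refined by creating new instances.
final class SimpleScan {
  private final String table;
  private final ScanContext context;

  SimpleScan(String table, ScanContext context) {
    this.table = table;
    this.context = context;
  }

  // Loosely mirrors newRefinedScan(table, schema, context) without the schema.
  private SimpleScan newRefinedScan(String table, ScanContext newContext) {
    return new SimpleScan(table, newContext);
  }

  SimpleScan minRowsRequested(int numRows) {
    return newRefinedScan(table, context.minRowsRequested(numRows));
  }

  Integer minRowsRequestedHint() {
    return context.minRowsRequested();
  }
}
```

Because the base scan is never mutated, it can be reused to derive other scans with different hints.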
Suggested change:

```diff
-public ThisT minRowsRequested(int numRows) {
+public ThisT minRowsRequested(Integer numRows) {
   return newRefinedScan(table, schema, context.minRowsRequested(numRows));
 }
```
Why would we want to make this an `Integer` instead of an `int`?
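For context on the trade-off: a boxed `Integer` parameter can carry `null`, which a caller could use to mean "no hint", while a primitive `int` always carries a value, and passing a `null` `Integer` where an `int` is expected throws a `NullPointerException` on unboxing. The sketch below only illustrates that Java behavior. `RowHint` and its methods are hypothetical names, and the sketch does not claim which option the PR should use.

```java
// Hypothetical helpers contrasting a boxed vs. a primitive row-count hint.
final class RowHint {
  // Boxed: null can represent "no hint was provided".
  static String describe(Integer numRows) {
    return numRows == null ? "no hint" : "at least " + numRows + " rows";
  }

  // Primitive: always has a value, so "unset" cannot be expressed.
  static String describePrimitive(int numRows) {
    return "at least " + numRows + " rows";
  }
}
```

With `int`, an "unset" state has to live elsewhere (for example, in the scan context simply not having the hint applied), which keeps the method signature free of `null` handling.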
```java
/**
 * Create a new scan that returns files with at least the given number of rows. This is used as a
 * hint during server-side scan planning to not have to return more rows than necessary. It is not
```
Why only server-side scan planning? We could extend this to any scan.
If the intention is strictly server-side scan planning, I would recommend a separate interface (`LimitAwareScan`?) that implementations can implement alongside `Scan`.
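The alternative raised in this comment could look roughly like the sketch below: an opt-in capability interface that only some scan implementations add, with callers applying the hint only when it is supported. `LimitAwareScan` is the reviewer's tentative name; `BaseScan`, `LocalScan`, `RemoteScan`, and `ScanHints` are hypothetical, simplified stand-ins, not Iceberg's `Scan` hierarchy.

```java
// Hypothetical, simplified scan interface (not Iceberg's Scan).
interface BaseScan {
  String describe();
}

// Opt-in capability, named after the reviewer's tentative LimitAwareScan.
interface LimitAwareScan<T extends BaseScan> {
  T minRowsRequested(int numRows);
}

// A scan that cannot use the hint simply does not implement LimitAwareScan.
final class LocalScan implements BaseScan {
  @Override
  public String describe() {
    return "local";
  }
}

// A scan that can forward the hint, e.g. to a remote planning service.
final class RemoteScan implements BaseScan, LimitAwareScan<RemoteScan> {
  private final Integer minRows; // null means no hint

  RemoteScan(Integer minRows) {
    this.minRows = minRows;
  }

  @Override
  public String describe() {
    return minRows == null ? "remote" : "remote, min rows " + minRows;
  }

  @Override
  public RemoteScan minRowsRequested(int numRows) {
    return new RemoteScan(numRows);
  }
}

final class ScanHints {
  // Apply the hint only when the scan opts in; otherwise return it unchanged.
  static BaseScan withLimitHint(BaseScan scan, int limit) {
    if (scan instanceof LimitAwareScan) {
      return ((LimitAwareScan<?>) scan).minRowsRequested(limit);
    }
    return scan;
  }
}
```

The trade-off is an `instanceof` check at every call site versus adding the method to the base interface, which the thread ultimately leaned toward by making the hint apply to any scan.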
Yeah, but this is also a lot lighter than the open PR #13451, which adds some local optimizations for non-REST catalogs.
I've removed that wording so as not to limit this to server-side scan planning.
Branch updated from 91e0b16 to bc87a63, then from bc87a63 to 501993f.
This pushes down the `LIMIT` from Spark to the underlying `Scan`. Spark still applies the `LIMIT` itself, but the value is also passed down through the `Scan` and used as `min-rows-requested` (introduced by #14565) for server-side scan planning. There it serves as a hint so the server does not have to return more rows than necessary. The server is not required to return that many rows, since the scan may not produce that many rows, and it may also return more rows than requested.
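The end-to-end flow can be sketched as follows, loosely modeled on Spark's DataSource V2 limit pushdown (Spark's `SupportsPushDownLimit` exposes `pushLimit(int)`). Everything below is a hypothetical, self-contained simplification: `LimitPushdownBuilder`, `HintedPlanner`, and the idea of planning by whole files with known row counts are assumptions for illustration, not Iceberg's or Spark's implementation. The planner stops adding whole files once the hint is met, so it can return more rows than requested, or fewer when the table is small; the engine still applies the `LIMIT`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical builder loosely modeled on Spark DSv2 limit pushdown.
final class LimitPushdownBuilder {
  private Integer pushedLimit; // null until the engine pushes a limit

  // Record the limit as a hint; returning true tells the caller it was accepted.
  boolean pushLimit(int limit) {
    this.pushedLimit = limit;
    return true;
  }

  // The limit is only a hint, so the engine must still apply LIMIT itself.
  boolean isPartiallyPushed() {
    return true;
  }

  // Value that would be forwarded as min-rows-requested (null = no hint).
  Integer minRowsRequested() {
    return pushedLimit;
  }
}

// Hypothetical planner that selects whole files until the hint is satisfied.
final class HintedPlanner {
  static List<Integer> planFileRowCounts(List<Integer> fileRowCounts, Integer minRowsRequested) {
    if (minRowsRequested == null) {
      return new ArrayList<>(fileRowCounts); // no hint: plan every file
    }
    List<Integer> selected = new ArrayList<>();
    long rows = 0;
    for (int count : fileRowCounts) {
      if (rows >= minRowsRequested) {
        break; // enough rows planned; remaining files are skipped
      }
      selected.add(count);
      rows += count;
    }
    return selected;
  }
}
```

For example, with a pushed limit of 150 and three files of 100 rows each, this planner returns two files (200 rows), and Spark's own `LIMIT` then trims the result to 150 rows. With a single 40-row file, it returns that file even though it holds fewer rows than requested.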