
Conversation

@nastra nastra commented Nov 18, 2025

This pushes the LIMIT down from Spark to the underlying Scan. Spark is still expected to apply the LIMIT, but its value is pushed through the Scan and used as min-rows-requested (introduced by #14565) for server-side scan planning. It serves as a hint so that planning does not have to return more rows than necessary: the server is not required to return that many rows, since the scan may not produce that many, and it may also return more rows than requested.
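
For context, here is a minimal sketch (not this PR's actual code) of the Spark side of such a pushdown: Spark's SupportsPushDownLimit connector interface hands the LIMIT to the ScanBuilder, which can remember it and forward it to the Iceberg scan when building. The class name, the pushedLimit field, and the idea of passing it to minRowsRequested(int) later in build() are illustrative assumptions based on the description above.

import org.apache.spark.sql.connector.read.SupportsPushDownLimit;

// Sketch only: a Spark ScanBuilder that accepts the pushed-down LIMIT and keeps it
// around so that build() can forward it to the Iceberg scan as a hint.
abstract class LimitAwareScanBuilderSketch implements SupportsPushDownLimit {

  // LIMIT pushed down by Spark, or null if the query has no pushable LIMIT (assumed field)
  protected Integer pushedLimit = null;

  @Override
  public boolean pushLimit(int limit) {
    this.pushedLimit = limit;
    return true;
  }

  @Override
  public boolean isPartiallyPushed() {
    // The scan may still produce more rows than the limit, so Spark keeps its own
    // LIMIT operator; the pushed-down value is only a planning hint.
    return true;
  }
}

build() (inherited from ScanBuilder and not shown here) would then pass pushedLimit into the Iceberg scan, e.g. via the new minRowsRequested(int) refinement, while Spark applies the LIMIT on the result as before.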

@nastra nastra changed the title from "Spark, Core: Add Limit pushdown to Scan" to "Spark 4.0, Core: Add Limit pushdown to Scan" on Nov 18, 2025
Comment on lines 297 to 299
public ThisT minRowsRequested(int numRows) {
return newRefinedScan(table, schema, context.minRowsRequested(numRows));
}
Contributor

Suggested change:
-  public ThisT minRowsRequested(int numRows) {
-    return newRefinedScan(table, schema, context.minRowsRequested(numRows));
-  }
+  public ThisT minRowsRequested(Integer numRows) {
+    return newRefinedScan(table, schema, context.minRowsRequested(numRows));
+  }

Contributor Author

Why would we want to make this an Integer instead of an int?


/**
* Create a new scan that returns files with at least the given number of rows. This is used as a
* hint during server-side scan planning to not have to return more rows than necessary. It is not
Contributor

Why only server-side scan planning? We could extend this to any scan.

If the intention is strictly for server-side scans, I would recommend a separate interface so that implementations can implement both Scan and LimitAwareScan (?)

Contributor
@geruh geruh Nov 19, 2025

Yeah, but this is also a lot lighter than the open PR #13451, which adds some local optimizations for non-REST catalogs.

Contributor Author

I've removed that wording so that this isn't limited to server-side scan planning.
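
For illustration, a minimal sketch of how the hint might be used when planning a table scan, assuming the minRowsRequested(int) refinement added by this PR; the helper name below is hypothetical, and planning may still return more or fewer rows than requested.

import org.apache.iceberg.FileScanTask;
import org.apache.iceberg.Table;
import org.apache.iceberg.TableScan;
import org.apache.iceberg.io.CloseableIterable;

class MinRowsRequestedSketch {
  // Plan a scan with a LIMIT-derived hint: the planner may stop adding files once
  // roughly this many rows are covered, but it is free to return more or fewer.
  static CloseableIterable<FileScanTask> planWithLimit(Table table, int limit) {
    TableScan scan = table.newScan().minRowsRequested(limit);
    return scan.planFiles();
  }
}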

@nastra nastra closed this Nov 19, 2025
@nastra nastra reopened this Nov 19, 2025
@nastra nastra force-pushed the limit-pushdown-from-spark branch from 91e0b16 to bc87a63 on November 19, 2025 08:05
@nastra nastra force-pushed the limit-pushdown-from-spark branch from bc87a63 to 501993f on November 19, 2025 08:37
