
Conversation

@nastra nastra commented Nov 18, 2025

This pushes the LIMIT down from Spark to the underlying Scan. Spark is still expected to apply the LIMIT, but its value is pushed through the Scan and used as min-rows-requested (introduced by #14565) for server-side scan planning. It serves as a hint so that planning does not have to return more rows than necessary: the server is not required to return that many rows, since the scan may not produce that many, and it may also return more rows than requested.
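
For context, here is a minimal sketch (not this PR's actual code) of the Spark side of such a pushdown: Spark's SupportsPushDownLimit connector interface hands the LIMIT to the ScanBuilder, which can remember it and forward it to the Iceberg scan when building. The class name, the pushedLimit field, and the idea of passing it to minRowsRequested(int) later in build() are illustrative assumptions based on the description above.

import org.apache.spark.sql.connector.read.SupportsPushDownLimit;

// Sketch only: a Spark ScanBuilder that accepts the pushed-down LIMIT and keeps it
// around so that build() can forward it to the Iceberg scan as a hint.
abstract class LimitAwareScanBuilderSketch implements SupportsPushDownLimit {

  // LIMIT pushed down by Spark, or null if the query has no pushable LIMIT (assumed field)
  protected Integer pushedLimit = null;

  @Override
  public boolean pushLimit(int limit) {
    this.pushedLimit = limit;
    return true;
  }

  @Override
  public boolean isPartiallyPushed() {
    // The scan may still produce more rows than the limit, so Spark keeps its own
    // LIMIT operator; the pushed-down value is only a planning hint.
    return true;
  }
}

build() (inherited from ScanBuilder and not shown here) would then pass pushedLimit into the Iceberg scan, e.g. via the new minRowsRequested(int) refinement, while Spark applies the LIMIT on the result as before.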

@nastra nastra changed the title from "Spark, Core: Add Limit pushdown to Scan" to "Spark 4.0, Core: Add Limit pushdown to Scan" on Nov 18, 2025
Comment on lines 297 to 299
public ThisT minRowsRequested(int numRows) {
return newRefinedScan(table, schema, context.minRowsRequested(numRows));
}
Contributor

Suggested change:
-  public ThisT minRowsRequested(int numRows) {
-    return newRefinedScan(table, schema, context.minRowsRequested(numRows));
-  }
+  public ThisT minRowsRequested(Integer numRows) {
+    return newRefinedScan(table, schema, context.minRowsRequested(numRows));
+  }

Contributor Author

Why would we want to make this an Integer instead of an int?


/**
* Create a new scan that returns files with at least the given number of rows. This is used as a
* hint during server-side scan planning to not have to return more rows than necessary. It is not
Contributor

Why only server-side scan planning? We could extend this to any scan.

If the intention is strictly for server-side scans, I would recommend a separate interface so that implementations can implement both Scan and LimitAwareScan (?)

Contributor
@geruh geruh Nov 19, 2025

Yeah, but this is also a lot lighter than the open PR #13451, which adds some local optimizations for non-REST catalogs.

Contributor Author

I've removed that wording so that this isn't limited to server-side scan planning.
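
For illustration, a minimal sketch of how the hint might be used when planning a table scan, assuming the minRowsRequested(int) refinement added by this PR; the helper name below is hypothetical, and planning may still return more or fewer rows than requested.

import org.apache.iceberg.FileScanTask;
import org.apache.iceberg.Table;
import org.apache.iceberg.TableScan;
import org.apache.iceberg.io.CloseableIterable;

class MinRowsRequestedSketch {
  // Plan a scan with a LIMIT-derived hint: the planner may stop adding files once
  // roughly this many rows are covered, but it is free to return more or fewer.
  static CloseableIterable<FileScanTask> planWithLimit(Table table, int limit) {
    TableScan scan = table.newScan().minRowsRequested(limit);
    return scan.planFiles();
  }
}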

@nastra nastra closed this Nov 19, 2025
@nastra nastra reopened this Nov 19, 2025
@nastra nastra force-pushed the limit-pushdown-from-spark branch from 91e0b16 to bc87a63 on November 19, 2025 08:05
@nastra nastra force-pushed the limit-pushdown-from-spark branch from bc87a63 to 501993f on November 19, 2025 08:37
