
Conversation

@ajreid21 (Contributor) commented Nov 12, 2025

Proposal to add to the spec a way for the client to give the server a hint indicating the number of rows being requested, so that the server does not need to return more results than necessary. To be clear, this does not mean the server must return X rows, nor that it should return at most X rows.

ML Discussion: https://lists.apache.org/thread/m51fxlsbt5yk219ypk2dhj07tlk3407b

@singhpk234 singhpk234 self-requested a review November 12, 2025 03:45
$ref: '#/components/schemas/Expression'
min-rows-requested:
description:
The minimum number of rows requested for the scan
@singhpk234 (Contributor) commented Nov 12, 2025
[doubt] How is the server supposed to fulfill this request when equality deletes are present?

Additional questions -

  • maybe we should be explicit that it's a hint?
  • what would the server do if it's certain the table doesn't have the minimum number of rows: a 400 Bad Request?
  • we might need to define the Iceberg scan API like we do for projection and filter so that the end-to-end plumbing works

@ajreid21 (Contributor, author) commented Nov 12, 2025

Yes, this is intended as a "hint" or "indicator" from the client so that the server does not have to return more than is necessary. The server is not required to return that many results (the scan may not produce that many rows, which is why it's named "min-rows-requested" and not "min-rows-required"). It's not a maximum either; the server can return more than the requested number.

I'm open to a different name and a better description for this.

@nastra (Contributor) commented Nov 12, 2025

What about: "The minimum number of rows requested for the scan. This is used as a hint to the server to not have to return more rows than necessary. It is not required for the server to return that many rows since the scan may not have that many rows. The server can also return more rows than requested."

This should make it easier to understand that this is a hint and what the expectations are
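To make those expectations concrete, here is a minimal sketch (not from the spec; `DataFile` and `plan_files` are hypothetical names and the file counts are made up) of how a server could use the hint while planning: it stops adding files once the cumulative record count covers the hint, but never fails when the table has fewer rows, and may overshoot because whole files are kept.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DataFile:
    path: str
    record_count: int

def plan_files(files: list[DataFile], min_rows_requested: Optional[int]) -> list[DataFile]:
    """Return a prefix of files whose combined record count satisfies the hint.

    The hint is advisory: with no hint (or an unsatisfiable one) every file is
    returned, and whole files are kept, so the result may exceed the hint.
    """
    if min_rows_requested is None:
        return files
    planned, total = [], 0
    for f in files:
        planned.append(f)
        total += f.record_count
        if total >= min_rows_requested:
            break  # enough rows covered; stop planning early
    return planned

files = [DataFile("a.parquet", 100), DataFile("b.parquet", 100), DataFile("c.parquet", 100)]
print([f.path for f in plan_files(files, 150)])  # → ['a.parquet', 'b.parquet']
```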

Contributor

I think @nastra's description is pretty reasonable, +1. This is fundamentally a hint to the server, and I would almost certainly not want to fail if a table doesn't have enough records.

Contributor

+1 from my end too! It addresses my questions above.

Contributor (author)

Thanks, I updated the description.

@nastra nastra changed the title OpenAPI: Add minimum rows requested field to PlanTableScanRequest OpenAPI: Add min-rows-requested field to PlanTableScanRequest Nov 12, 2025
filter: Optional[Expression] = Field(
None, description='Expression used to filter the table data'
)
min_rows_requested: Optional[int] = Field(
Contributor

I find the name and description a bit confusing. To me it is obvious that it might not return exactly the number of rows that are requested; for example, the table might be empty. I would much rather go with something analogous to SQL:

Suggested change
min_rows_requested: Optional[int] = Field(
limit: Optional[int] = Field(

Of course, it might still return more rows since the Parquet file contains more rows.

Contributor

I guess I look at things the other way, since I feel like limit connotes a strict upper bound (that is, if there are sufficient matching rows), as it does in SQL. Another idea: limit-hint?

Member

I agree with @Fokko; what about soft_limit?

Contributor

Just to jump in on the bikeshedding: how about max_rows_required? I think it lacks the ambiguity for someone who isn't familiar with SQL, and hopefully makes it clearer that this is in fact an upper bound.

Contributor

Just to pile on: target-scan-task-number.

Contributor

@nastra a lower bound may not always be valid, e.g. the table may simply not have enough rows to satisfy it.

target is probably a good name for the desired plan size. Wondering why not use bytes (instead of rows) to express the target, similar to read.split.target-size?

Anyway, I would suggest target-plan-size-rows or target-plan-size-bytes.

Contributor

> @nastra a lower bound may not always be valid, e.g. the table may simply not have enough rows to satisfy it.

That's why we added wording for this case: "It is not required for the server to return that many rows since the scan may not produce that many rows"

> target is probably a good name for the desired plan size. Wondering why not use bytes (instead of rows) to express the target, similar to read.split.target-size?

We eventually want to push down the LIMIT from the engine to this property to indicate the number of rows. The server can then do its optimization and return at least that number of rows. As a final step, the engine would then apply the LIMIT on top.
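A minimal sketch of that end-to-end flow (none of these function names come from the spec; they are hypothetical): the engine pushes its SQL LIMIT down as the hint in the plan request, the server may plan and return more rows than that, and the engine still applies the exact LIMIT on top.

```python
def build_plan_request(limit: int) -> dict:
    # The engine's LIMIT becomes the hint in the PlanTableScanRequest body.
    return {"min-rows-requested": limit}

def apply_limit(rows: list, limit: int) -> list:
    # The hint is not a strict upper bound, so the engine enforces the
    # exact LIMIT itself on whatever the scan produced.
    return rows[:limit]

rows_from_scan = [{"id": i} for i in range(10)]  # server happened to plan 10 rows
print(build_plan_request(3))       # → {'min-rows-requested': 3}
print(apply_limit(rows_from_scan, 3))  # → [{'id': 0}, {'id': 1}, {'id': 2}]
```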

Contributor

Yeah, I don't think we want to do bytes; this is ultimately about giving engines an easy way to express a limit so that we limit the amount of work that needs to be done on the server.

IMO, after going through this thread, I still find the most compelling option to be what I proposed earlier, limit-hint. I think this naming best indicates to a client what the intent is ("to express a limit to the server") and that it's a hint. I also found some prior art for this naming: the Delta Sharing protocol calls this concept limitHint. https://github.com/delta-io/delta-sharing/blob/main/PROTOCOL.md#request-body

I'm also good with the current PR's min-rows-requested.

@stevenzwu (Contributor) commented Nov 17, 2025

I misunderstood the purpose of this config. I thought it was for the target size of the PlanTasks from the PlanTableScan, similar to the read.split.target-size.

Didn't realize it is for limit push-down. I am fine with either min-rows-requested or limit-hint, with a slight preference for min-rows-requested since it is more descriptive.

Contributor

Thanks everyone. I would suggest keeping the name min-rows-requested, as it precisely describes its purpose and intent.

for the server to return that many rows since the scan may not produce that
many rows. The server can also return more rows than requested.
type: integer
format: int64
Contributor

I believe this should be int32 because the LIMIT that Spark pushes down is also an int (https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/connector/read/SupportsPushDownLimit.html)

Contributor

I think I would keep this as int64; yes, it's super unlikely to want a limit beyond int max, but I believe the protocol should be open, and the client implementation can then be bounded however it likes.

Contributor

agree with Amogh here

Contributor

int64 sounds good! Just checked: PostgreSQL supports LIMIT values as int64.

Comment on lines +4407 to +4410
The minimum number of rows requested for the scan. This is used as a hint
to the server to not have to return more rows than necessary. It is not required
for the server to return that many rows since the scan may not produce that
many rows. The server can also return more rows than requested.
Contributor

Wondering if we should bake an assertion into the spec that it must not be < 0? IMHO it's not necessary, as the server can always respond with a 400; just putting it out here in case other folks think otherwise.
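A minimal sketch of the server-side check described above (the function name and error shape are made up, not from the spec): since the spec does not assert min-rows-requested >= 0, a server can reject negative values itself with a 400.

```python
from typing import Optional, Tuple

def validate_min_rows_requested(value: Optional[int]) -> Tuple[int, Optional[str]]:
    """Return an (http_status, error_message) pair for the hint value."""
    if value is not None and value < 0:
        return 400, "min-rows-requested must be non-negative"
    return 200, None  # absent or non-negative values are accepted

print(validate_min_rows_requested(-5))  # → (400, 'min-rows-requested must be non-negative')
```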

9 participants