[fix](paimon) Align incremental query behavior with Spark Paimon for single snapshot scenario #58239
run buildall
TPC-H: Total hot run time: 35341 ms
TPC-DS: Total hot run time: 187853 ms
ClickBench: Total hot run time: 27.96 s
FE UT Coverage Report: Increment line coverage
FE Regression Coverage Report: Increment line coverage
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
Pull request overview
This PR fixes incremental query validation for Paimon tables to align with Spark Paimon behavior. Previously, Doris rejected queries with startSnapshotId=0 or startSnapshotId=endSnapshotId, preventing users from querying single-snapshot tables. The fix relaxes validation constraints to allow startSnapshotId >= 0 (instead of > 0) and startSnapshotId <= endSnapshotId (instead of < endSnapshotId).
Key changes:
- Modified validation to accept `startSnapshotId=0`, enabling queries from the initial state to a specific snapshot
- Allowed equal snapshot IDs (`startSnapshotId=endSnapshotId`), which return empty results consistent with Spark Paimon
- Updated error messages to reflect the new validation rules
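The relaxed rules can be summarized as a small validation routine. The following is a minimal, hypothetical sketch: the class `IncrReadRangeValidator`, its methods, and the exact error-message wording are assumptions for illustration and do not reproduce the code in `PaimonScanNode.java`.

```java
/**
 * Hypothetical sketch of the relaxed incremental-read validation.
 * Not the actual PaimonScanNode code; names and messages are illustrative.
 */
public final class IncrReadRangeValidator {

    private IncrReadRangeValidator() {
    }

    /** Throws IllegalArgumentException if the snapshot range violates the relaxed rules. */
    public static void validate(long startSnapshotId, long endSnapshotId) {
        // Previously startSnapshotId > 0 was required; 0 is now accepted.
        if (startSnapshotId < 0) {
            throw new IllegalArgumentException(
                    "startSnapshotId must be greater than or equal to 0");
        }
        // endSnapshotId >= 0 (previously > 0). Implied by the other two checks,
        // but reported separately here for a clearer error message.
        if (endSnapshotId < 0) {
            throw new IllegalArgumentException(
                    "endSnapshotId must be greater than or equal to 0");
        }
        // Previously startSnapshotId < endSnapshotId; equality is now allowed.
        if (startSnapshotId > endSnapshotId) {
            throw new IllegalArgumentException(
                    "startSnapshotId must be less than or equal to endSnapshotId");
        }
    }

    /** Returns true if validate() accepts the range, false if it throws. */
    public static boolean isValid(long startSnapshotId, long endSnapshotId) {
        try {
            validate(startSnapshotId, endSnapshotId);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // (start, end) pairs: start=0, equal IDs, a normal range, a negative start, a reversed range.
        long[][] cases = {{0, 1}, {1, 1}, {1, 2}, {-1, 1}, {2, 1}};
        for (long[] c : cases) {
            System.out.println(c[0] + "," + c[1] + " -> " + isValid(c[0], c[1]));
        }
    }
}
```

Under these assumptions, the two single-snapshot queries from the problem summary (`0,1` and `1,1`) are accepted, while negative IDs and reversed ranges are still rejected.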
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| fe/fe-core/src/main/java/org/apache/doris/datasource/paimon/source/PaimonScanNode.java | Updated validation logic to allow startSnapshotId >= 0, endSnapshotId >= 0, and startSnapshotId <= endSnapshotId; updated error messages accordingly |
| fe/fe-core/src/test/java/org/apache/doris/datasource/paimon/source/PaimonScanNodeTest.java | Modified unit tests to validate new behavior, including test for equal snapshot IDs and negative snapshot ID validation |
| regression-test/suites/external_table_p0/paimon/paimon_incr_read.groovy | Added regression tests for startSnapshotId=0 scenarios and equal snapshot IDs; removed obsolete test that rejected equal snapshot IDs |
| regression-test/data/external_table_p0/paimon/paimon_incr_read.out | Added expected outputs showing empty results for equal snapshot IDs and correct data for startSnapshotId=0 queries |
…single snapshot scenario (#58239)

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

When a Paimon table has only 1 snapshot, users cannot perform incremental queries. The validation logic in Doris has two issues:

1. It rejects queries where `startSnapshotId = endSnapshotId`:

   ```sql
   SELECT * FROM tb_simple@incr('startSnapshotId'='1', 'endSnapshotId'='1');
   -- Error: startSnapshotId must be less than endSnapshotId
   ```

2. It rejects queries where `startSnapshotId = 0` (which is needed to query all data from a single snapshot):

   ```sql
   SELECT * FROM tb_simple@incr('startSnapshotId'='0', 'endSnapshotId'='1');
   -- Error: startSnapshotId must be greater than 0
   ```

This behavior is inconsistent with Spark Paimon, which:

- Allows `startSnapshotId = endSnapshotId` (returns empty result)
- Allows `startSnapshotId = 0` to query all data from the initial state to the specified snapshot

## Solution

Align Doris incremental query behavior with Spark Paimon:

1. **Allow `startSnapshotId = 0`**: This enables querying all data from a single snapshot by using `startSnapshotId=0, endSnapshotId=1`
2. **Allow `startSnapshotId = endSnapshotId`**: This matches Spark Paimon behavior (returns empty result when querying the same snapshot)
3. **Update validation**: Allow `startSnapshotId >= 0` and `endSnapshotId >= 0` (previously `> 0`)
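Why `startSnapshotId = endSnapshotId` returns nothing while `startSnapshotId = 0` returns everything can be seen by modeling an incremental read as covering the half-open snapshot range `(startSnapshotId, endSnapshotId]`. That interval semantics is an assumption consistent with the behavior described above, not a quote of Doris or Paimon code. The sketch models a table's history as an in-memory map from snapshot ID to the rows that snapshot added; `IncrementalRangeDemo` and `readIncremental` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical model of an incremental read over (startSnapshotId, endSnapshotId]. */
public final class IncrementalRangeDemo {

    private IncrementalRangeDemo() {
    }

    /**
     * Collects rows added by snapshots with start < id <= end.
     * Callers must ensure startSnapshotId <= endSnapshotId first;
     * TreeMap.subMap throws IllegalArgumentException otherwise.
     */
    public static List<String> readIncremental(
            TreeMap<Long, List<String>> snapshots, long startSnapshotId, long endSnapshotId) {
        List<String> rows = new ArrayList<>();
        // Start is exclusive, end is inclusive: the half-open range (start, end].
        for (List<String> added : snapshots.subMap(startSnapshotId, false, endSnapshotId, true).values()) {
            rows.addAll(added);
        }
        return rows;
    }

    public static void main(String[] args) {
        // A table with a single snapshot (ID 1) that added two rows.
        TreeMap<Long, List<String>> snapshots = new TreeMap<>();
        snapshots.put(1L, List.of("row-a", "row-b"));

        System.out.println(readIncremental(snapshots, 0, 1)); // range (0, 1]
        System.out.println(readIncremental(snapshots, 1, 1)); // range (1, 1]
    }
}
```

With this model, `(0, 1]` yields both rows of the only snapshot and `(1, 1]` yields an empty list, which is the behavior the PR aligns with Spark Paimon.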
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
When a Paimon table has only 1 snapshot, users cannot perform incremental queries. The validation logic in Doris has two issues:
1. It rejects queries where `startSnapshotId = endSnapshotId`.
2. It rejects queries where `startSnapshotId = 0` (which is needed to query all data from a single snapshot).

This behavior is inconsistent with Spark Paimon, which:
- Allows `startSnapshotId = endSnapshotId` (returns an empty result)
- Allows `startSnapshotId = 0` to query all data from the initial state to the specified snapshot

Solution
Align Doris incremental query behavior with Spark Paimon:
1. Allow `startSnapshotId = 0`: This enables querying all data from a single snapshot by using `startSnapshotId=0, endSnapshotId=1`
2. Allow `startSnapshotId = endSnapshotId`: This matches Spark Paimon behavior (returns an empty result when querying the same snapshot)
3. Update validation: Allow `startSnapshotId >= 0` and `endSnapshotId >= 0` (previously `> 0`)

Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merges this PR)