@suxiaogang223 suxiaogang223 commented Nov 21, 2025

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:
When a Paimon table has only 1 snapshot, users cannot perform incremental queries. The validation logic in Doris has two issues:

  1. It rejects queries where startSnapshotId = endSnapshotId:
SELECT * FROM tb_simple@incr('startSnapshotId'='1', 'endSnapshotId'='1');
-- Error: startSnapshotId must be less than endSnapshotId
  2. It rejects queries where startSnapshotId = 0 (which is needed to query all data from a single snapshot):
SELECT * FROM tb_simple@incr('startSnapshotId'='0', 'endSnapshotId'='1');
-- Error: startSnapshotId must be greater than 0

This behavior is inconsistent with Spark Paimon, which:

  • Allows startSnapshotId = endSnapshotId (returns empty result)
  • Allows startSnapshotId = 0 to query all data from the initial state to the specified snapshot
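Both Spark Paimon behaviors listed above are consistent with reading the incremental range as start-exclusive and end-inclusive. The following is a minimal sketch of that reading, not Doris or Paimon code: the class `IncrRangeModel`, the method `readIncremental`, and the idea of one batch of rows per snapshot are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative model only: hypothetical names, not Doris or Paimon APIs. */
public class IncrRangeModel {
    /**
     * Returns the rows committed by snapshots in (startSnapshotId, endSnapshotId].
     * The start-exclusive, end-inclusive reading is an assumption of this sketch.
     */
    public static List<String> readIncremental(TreeMap<Long, List<String>> rowsBySnapshot,
                                               long startSnapshotId, long endSnapshotId) {
        List<String> result = new ArrayList<>();
        // subMap(from, false, to, true) selects keys in (from, to].
        for (Map.Entry<Long, List<String>> e
                : rowsBySnapshot.subMap(startSnapshotId, false, endSnapshotId, true).entrySet()) {
            result.addAll(e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // A table with a single snapshot (ID 1) holding two rows.
        TreeMap<Long, List<String>> table = new TreeMap<>();
        table.put(1L, List.of("row-a", "row-b"));

        // Nothing lies in (1, 1], so the result is empty.
        System.out.println(readIncremental(table, 1, 1));
        // Snapshot 1 lies in (0, 1], so both rows are returned.
        System.out.println(readIncremental(table, 0, 1));
    }
}
```

Under this model, startSnapshotId = 0 is the only way to include the first snapshot, which is why rejecting 0 made single-snapshot tables unqueryable.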

Solution

Align Doris incremental query behavior with Spark Paimon:

  1. Allow startSnapshotId = 0: This enables querying all data from a single snapshot by using startSnapshotId=0, endSnapshotId=1
  2. Allow startSnapshotId = endSnapshotId: This matches Spark Paimon behavior (returns empty result when querying the same snapshot)
  3. Update validation: Allow startSnapshotId >= 0 and endSnapshotId >= 0 (previously > 0)
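The relaxed rules can be sketched as a standalone check. This is only an illustration under stated assumptions: the actual logic lives in `PaimonScanNode` and may be structured differently, and the class name, method name, exception type, and error messages below are hypothetical.

```java
/** Illustrative sketch only: the real check in PaimonScanNode may differ. */
public class IncrSnapshotValidation {
    /**
     * Accepts startSnapshotId >= 0, endSnapshotId >= 0, and
     * startSnapshotId <= endSnapshotId. Messages are hypothetical.
     */
    public static void validate(long startSnapshotId, long endSnapshotId) {
        if (startSnapshotId < 0) {
            throw new IllegalArgumentException(
                    "startSnapshotId must be greater than or equal to 0");
        }
        if (endSnapshotId < 0) {
            throw new IllegalArgumentException(
                    "endSnapshotId must be greater than or equal to 0");
        }
        if (startSnapshotId > endSnapshotId) {
            throw new IllegalArgumentException(
                    "startSnapshotId must be less than or equal to endSnapshotId");
        }
    }

    public static void main(String[] args) {
        validate(0, 1); // previously rejected: start = 0
        validate(1, 1); // previously rejected: start = end
        try {
            validate(2, 1); // still rejected: start > end
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Negative IDs and reversed ranges remain errors; only the two boundary cases described above change from rejected to accepted.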

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@suxiaogang223 changed the title from "[fix](paimon) fix paimon increment query" to "[fix](paimon) Allow incremental query with startSnapshotId equals endSnapshotId for Paimon single snapshot scenario" Nov 21, 2025
@suxiaogang223 changed the title from "[fix](paimon) Allow incremental query with startSnapshotId equals endSnapshotId for Paimon single snapshot scenario" to "[fix](paimon) Align incremental query behavior with Spark Paimon for single snapshot scenario" Nov 21, 2025
@suxiaogang223
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 35341 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 47f197e8a589238a524a487aec70979ab397f970, data reload: false

------ Round 1 ----------------------------------
q1	17632	5080	4927	4927
q2	2088	313	215	215
q3	10175	1367	735	735
q4	10227	875	365	365
q5	7528	2408	2298	2298
q6	186	167	136	136
q7	925	774	628	628
q8	9370	1362	1155	1155
q9	7108	5404	5435	5404
q10	6829	2237	1828	1828
q11	490	311	286	286
q12	337	373	230	230
q13	17799	3647	3036	3036
q14	228	228	221	221
q15	574	527	505	505
q16	997	994	957	957
q17	581	872	373	373
q18	7484	7605	7995	7605
q19	1242	974	598	598
q20	365	361	238	238
q21	4598	3487	2557	2557
q22	1213	1077	1044	1044
Total cold run time: 107976 ms
Total hot run time: 35341 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5535	5260	5165	5165
q2	330	425	308	308
q3	2610	2962	2422	2422
q4	1502	1896	1484	1484
q5	4559	4413	4432	4413
q6	213	167	130	130
q7	2006	1993	1757	1757
q8	2669	2642	2570	2570
q9	7618	7532	7615	7532
q10	3140	3266	2821	2821
q11	588	552	518	518
q12	678	784	635	635
q13	3589	3829	3014	3014
q14	284	275	256	256
q15	527	487	484	484
q16	1016	1045	1010	1010
q17	1118	1454	1310	1310
q18	7200	7342	6961	6961
q19	763	799	965	799
q20	1918	1968	1878	1878
q21	4738	4398	4340	4340
q22	1119	1041	989	989
Total cold run time: 53720 ms
Total hot run time: 50796 ms

@doris-robot

TPC-DS: Total hot run time: 187853 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 47f197e8a589238a524a487aec70979ab397f970, data reload: false

query1	1045	465	394	394
query2	6594	1652	1634	1634
query3	6753	230	225	225
query4	26716	22980	22648	22648
query5	4402	660	469	469
query6	339	228	217	217
query7	4647	490	302	302
query8	298	272	245	245
query9	8693	2560	2579	2560
query10	468	373	283	283
query11	15303	15245	14895	14895
query12	171	114	113	113
query13	1700	567	457	457
query14	10816	9201	9104	9104
query15	206	186	172	172
query16	7666	703	541	541
query17	1213	749	636	636
query18	2051	415	320	320
query19	206	200	168	168
query20	131	131	116	116
query21	221	139	113	113
query22	4343	4290	4133	4133
query23	33939	33055	32929	32929
query24	8387	2382	2372	2372
query25	583	517	448	448
query26	1234	277	159	159
query27	2747	497	359	359
query28	4375	2181	2199	2181
query29	785	610	492	492
query30	305	224	196	196
query31	913	820	715	715
query32	86	75	69	69
query33	602	387	317	317
query34	807	872	529	529
query35	804	842	790	790
query36	942	976	894	894
query37	120	114	85	85
query38	3495	3520	3507	3507
query39	1490	1441	1427	1427
query40	223	133	121	121
query41	67	63	63	63
query42	126	111	111	111
query43	485	499	485	485
query44	1233	774	759	759
query45	185	190	171	171
query46	869	993	638	638
query47	1764	1780	1716	1716
query48	400	409	325	325
query49	777	507	401	401
query50	673	680	411	411
query51	3894	4106	3926	3926
query52	117	120	110	110
query53	253	273	200	200
query54	323	306	311	306
query55	92	94	89	89
query56	339	343	373	343
query57	1191	1197	1104	1104
query58	312	291	294	291
query59	2643	2599	2613	2599
query60	382	389	367	367
query61	218	201	191	191
query62	787	719	665	665
query63	236	202	203	202
query64	4588	1302	989	989
query65	4038	3953	3954	3953
query66	1142	450	362	362
query67	15560	15140	15110	15110
query68	7669	945	638	638
query69	516	326	303	303
query70	1319	1260	1352	1260
query71	429	349	325	325
query72	6020	4947	4891	4891
query73	659	590	373	373
query74	8875	8862	8884	8862
query75	3311	3323	2824	2824
query76	3341	1136	734	734
query77	498	408	324	324
query78	9625	9603	8814	8814
query79	2550	846	602	602
query80	767	565	516	516
query81	515	270	241	241
query82	455	157	127	127
query83	304	266	247	247
query84	301	119	93	93
query85	918	476	463	463
query86	382	320	312	312
query87	3774	3712	3662	3662
query88	3749	2255	2259	2255
query89	389	335	297	297
query90	1907	218	219	218
query91	187	175	143	143
query92	84	69	62	62
query93	2187	993	688	688
query94	756	451	340	340
query95	416	315	321	315
query96	475	574	280	280
query97	2920	2934	2935	2934
query98	252	221	210	210
query99	1419	1455	1278	1278
Total cold run time: 275482 ms
Total hot run time: 187853 ms

@doris-robot

ClickBench: Total hot run time: 27.96 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 47f197e8a589238a524a487aec70979ab397f970, data reload: false

query1	0.05	0.04	0.05
query2	0.10	0.05	0.05
query3	0.26	0.09	0.09
query4	1.61	0.11	0.11
query5	0.28	0.27	0.26
query6	1.16	0.67	0.66
query7	0.03	0.03	0.03
query8	0.06	0.04	0.04
query9	0.59	0.52	0.52
query10	0.59	0.57	0.56
query11	0.16	0.11	0.12
query12	0.16	0.12	0.12
query13	0.64	0.62	0.62
query14	1.03	1.01	1.00
query15	0.86	0.85	0.83
query16	0.40	0.40	0.41
query17	1.06	1.05	1.05
query18	0.23	0.20	0.20
query19	1.97	1.81	1.84
query20	0.02	0.01	0.02
query21	15.43	0.21	0.14
query22	4.95	0.08	0.05
query23	15.64	0.26	0.11
query24	2.74	0.85	0.74
query25	0.07	0.06	0.08
query26	0.15	0.14	0.13
query27	0.07	0.06	0.05
query28	4.79	1.23	0.93
query29	12.60	3.93	3.22
query30	0.29	0.16	0.13
query31	2.82	0.61	0.41
query32	3.24	0.60	0.51
query33	3.18	3.06	3.15
query34	15.82	5.31	4.50
query35	4.60	4.55	4.64
query36	0.68	0.51	0.49
query37	0.10	0.07	0.08
query38	0.06	0.05	0.04
query39	0.04	0.03	0.03
query40	0.17	0.14	0.14
query41	0.08	0.03	0.03
query42	0.04	0.03	0.03
query43	0.04	0.03	0.04
Total cold run time: 98.86 s
Total hot run time: 27.96 s

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 100.00% (6/6) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 33.33% (2/6) 🎉
Increment coverage report
Complete coverage report

@github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) Nov 22, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

Copilot AI left a comment

Pull request overview

This PR fixes incremental query validation for Paimon tables to align with Spark Paimon behavior. Previously, Doris rejected queries with startSnapshotId=0 or startSnapshotId=endSnapshotId, preventing users from querying single-snapshot tables. The fix relaxes validation constraints to allow startSnapshotId >= 0 (instead of > 0) and startSnapshotId <= endSnapshotId (instead of < endSnapshotId).

Key changes:

  • Modified validation to accept startSnapshotId=0, enabling queries from initial state to a specific snapshot
  • Allow equal snapshot IDs (startSnapshotId=endSnapshotId), which returns empty results consistent with Spark Paimon
  • Updated error messages to reflect the new validation rules

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| fe/fe-core/src/main/java/org/apache/doris/datasource/paimon/source/PaimonScanNode.java | Updated validation logic to allow startSnapshotId >= 0, endSnapshotId >= 0, and startSnapshotId <= endSnapshotId; updated error messages accordingly |
| fe/fe-core/src/test/java/org/apache/doris/datasource/paimon/source/PaimonScanNodeTest.java | Modified unit tests to validate new behavior, including test for equal snapshot IDs and negative snapshot ID validation |
| regression-test/suites/external_table_p0/paimon/paimon_incr_read.groovy | Added regression tests for startSnapshotId=0 scenarios and equal snapshot IDs; removed obsolete test that rejected equal snapshot IDs |
| regression-test/data/external_table_p0/paimon/paimon_incr_read.out | Added expected outputs showing empty results for equal snapshot IDs and correct data for startSnapshotId=0 queries |

@morningman morningman merged commit a5f36a1 into apache:master Nov 22, 2025
40 of 42 checks passed
github-actions bot pushed a commit that referenced this pull request Nov 22, 2025
…single snapshot scenario (#58239)

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:
When a Paimon table has only 1 snapshot, users cannot perform
incremental queries. The validation logic in Doris has two issues:

1. It rejects queries where `startSnapshotId = endSnapshotId`:
```sql
SELECT * FROM tb_simple@incr('startSnapshotId'='1', 'endSnapshotId'='1');
-- Error: startSnapshotId must be less than endSnapshotId
```

2. It rejects queries where `startSnapshotId = 0` (which is needed to
query all data from a single snapshot):
```sql
SELECT * FROM tb_simple@incr('startSnapshotId'='0', 'endSnapshotId'='1');
-- Error: startSnapshotId must be greater than 0
```

This behavior is inconsistent with Spark Paimon, which:
- Allows `startSnapshotId = endSnapshotId` (returns empty result)
- Allows `startSnapshotId = 0` to query all data from the initial state
to the specified snapshot

## Solution

Align Doris incremental query behavior with Spark Paimon:

1. **Allow `startSnapshotId = 0`**: This enables querying all data from
a single snapshot by using `startSnapshotId=0, endSnapshotId=1`
2. **Allow `startSnapshotId = endSnapshotId`**: This matches Spark
Paimon behavior (returns empty result when querying the same snapshot)
3. **Update validation**: Allow `startSnapshotId >= 0` and
`endSnapshotId >= 0` (previously `> 0`)
github-actions bot pushed a commit that referenced this pull request Nov 22, 2025
…single snapshot scenario (#58239)
morrySnow pushed a commit that referenced this pull request Nov 25, 2025
… Paimon for single snapshot scenario #58239 (#58253)

Cherry-picked from #58239

Co-authored-by: Socrates <suyiteng@selectdb.com>
yiguolei pushed a commit that referenced this pull request Dec 2, 2025
… Paimon for single snapshot scenario #58239 (#58254)

Cherry-picked from #58239

Co-authored-by: Socrates <suyiteng@selectdb.com>
@suxiaogang223 suxiaogang223 deleted the fix_paimon_incr branch December 5, 2025 06:22
nagisa-kunhah pushed a commit to nagisa-kunhah/doris that referenced this pull request Dec 14, 2025
…single snapshot scenario (apache#58239)
Labels

approved Indicates a PR has been approved by one committer. dev/3.1.4-merged dev/4.0.2-merged reviewed
