[feat][iceberg] Support Iceberg Meta Procedure implementations #56257

Merged
morningman merged 15 commits into apache:master from vinlee19:support_iceberg_meta_procedure
Oct 9, 2025

@vinlee19 vinlee19 commented Sep 20, 2025

What problem does this PR solve?

This PR extends the OPTIMIZE TABLE framework introduced in #55679 by implementing additional Iceberg meta procedure actions. Building on that foundation, it adds snapshot management operations (cherry-pick, fast-forward, rollback, and set-current-snapshot) that support more advanced Iceberg table maintenance workflows.

New Iceberg Actions Implemented

This PR introduces 5 new Iceberg meta procedure actions:

  1. cherrypick_snapshot - Cherry-picks the changes from a specific snapshot
  2. fast_forward - Fast-forwards one branch to match another branch's latest snapshot
  3. rollback_to_snapshot - Rolls the table back to a specific snapshot
  4. rollback_to_timestamp - Rolls the table back to a specific timestamp
  5. set_current_snapshot - Sets a specific snapshot as the current snapshot

Example Usage

-- Cherry-pick changes from a snapshot
OPTIMIZE TABLE iceberg_catalog.db.table
PROPERTIES("action" = "cherrypick_snapshot", "snapshot_id" = "123456789");

-- Fast-forward branch to match another branch
OPTIMIZE TABLE iceberg_catalog.db.table
PROPERTIES("action" = "fast_forward", "branch" = "feature", "to" = "main");

-- Rollback to specific snapshot
OPTIMIZE TABLE iceberg_catalog.db.table
PROPERTIES("action" = "rollback_to_snapshot", "snapshot_id" = "987654321");
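
For rollback_to_timestamp, the description does not say how a timestamp maps to a snapshot. As a rough illustration only, the Java sketch below assumes the rule that Iceberg documents for ManageSnapshots.rollbackToTime: pick the last snapshot committed before the given timestamp. SnapshotInfo and snapshotBefore are hypothetical names, not code from this PR, and the sketch ignores branch ancestry.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class RollbackToTimestampSketch {

    /** Hypothetical stand-in for an Iceberg snapshot: only id and commit time are modeled. */
    static final class SnapshotInfo {
        final long snapshotId;
        final long timestampMillis;

        SnapshotInfo(long snapshotId, long timestampMillis) {
            this.snapshotId = snapshotId;
            this.timestampMillis = timestampMillis;
        }
    }

    /**
     * Returns the id of the most recent snapshot committed strictly before targetMillis,
     * or empty if no snapshot is older than that time.
     * The "strictly before" rule is an assumption, not something the PR confirms.
     */
    static Optional<Long> snapshotBefore(List<SnapshotInfo> history, long targetMillis) {
        SnapshotInfo best = null;
        for (SnapshotInfo candidate : history) {
            boolean olderThanTarget = candidate.timestampMillis < targetMillis;
            boolean newerThanBest = best == null || candidate.timestampMillis > best.timestampMillis;
            if (olderThanTarget && newerThanBest) {
                best = candidate;
            }
        }
        return best == null ? Optional.empty() : Optional.of(best.snapshotId);
    }

    public static void main(String[] args) {
        List<SnapshotInfo> history = Arrays.asList(
                new SnapshotInfo(1L, 1_000L),
                new SnapshotInfo(2L, 2_000L),
                new SnapshotInfo(3L, 3_000L));
        // A target between the second and third commit selects the second snapshot.
        System.out.println(snapshotBefore(history, 2_500L));
    }
}
```

Under this assumed rule, a timestamp older than every snapshot yields no rollback target, which an action would presumably have to report as an error.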

The regression tests use internal Iceberg catalog operations for table creation, data insertion, and branch/tag management. This keeps the tests stable and removes the dependency on external tools such as Spark SQL for test data preparation.

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@vinlee19
Contributor Author

run buildall

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 0.00% (0/110) 🎉
Increment coverage report
Complete coverage report

@vinlee19
Contributor Author

run performance

@vinlee19
Contributor Author

run p0

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 56.36% (62/110) 🎉
Increment coverage report
Complete coverage report

@vinlee19
Contributor Author

run performance

@vinlee19
Contributor Author

run p0

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 56.36% (62/110) 🎉
Increment coverage report
Complete coverage report

@vinlee19
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 1487 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit be10a6eb26571d82f106dc59739e5220d89d147a, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	6393	32	22	22
q2	593	26	25	25
q3	873	17	17	17
q4	934	16	15	15
q5	2216	15	16	15
q6	224	15	14	14
q7	976	22	21	21
q8	1182	14	12	12
q9	16853	19	12	12
q10	4786	15	12	12
q11	455	24	24	24
q12	322	13	12	12
q13	17630	13	11	11
q14	238	11	10	10
q15	623	13	10	10
q16	997	1002	925	925
q17	531	12	11	11
q18	7730	13	12	12
q19	1073	12	11	11
q20	350	364	262	262
q21	5037	22	21	21
q22	1076	13	13	13
Total cold run time: 71092 ms
Total hot run time: 1487 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	12	11	11	11
q2	20	19	19	19
q3	11	11	11	11
q4	12	12	11	11
q5	11	10	10	10
q6	11	11	10	10
q7	20	19	18	18
q8	11	11	11	11
q9	11	11	10	10
q10	10	10	10	10
q11	19	19	20	19
q12	10	9	10	9
q13	9	10	10	10
q14	10	10	12	10
q15	10	11	10	10
q16	1086	1113	1043	1043
q17	12	10	9	9
q18	10	9	9	9
q19	10	10	11	10
q20	1915	2002	1831	1831
q21	20	19	19	19
q22	10	10	10	10
Total cold run time: 3250 ms
Total hot run time: 3110 ms

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 0.00% (0/100) 🎉
Increment coverage report
Complete coverage report

@vinlee19
Contributor Author

run p0

@vinlee19
Contributor Author

run cloud_p0

@vinlee19
Contributor Author

run performance

@doris-robot

TPC-H: Total hot run time: 1512 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit be10a6eb26571d82f106dc59739e5220d89d147a, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	6292	26	18	18
q2	639	26	28	26
q3	874	17	16	16
q4	969	16	17	16
q5	2261	15	14	14
q6	219	13	12	12
q7	946	22	21	21
q8	1194	15	12	12
q9	17068	13	12	12
q10	4796	17	15	15
q11	463	26	22	22
q12	344	13	12	12
q13	17618	13	12	12
q14	260	13	12	12
q15	620	11	11	11
q16	1012	980	958	958
q17	536	12	11	11
q18	7614	14	12	12
q19	1335	13	12	12
q20	350	372	257	257
q21	5587	21	20	20
q22	1116	11	11	11
Total cold run time: 72113 ms
Total hot run time: 1512 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	11	10	13	10
q2	21	20	20	20
q3	11	10	11	10
q4	12	11	11	11
q5	11	13	12	12
q6	12	11	11	11
q7	20	19	18	18
q8	10	10	10	10
q9	10	10	11	10
q10	10	11	10	10
q11	19	19	19	19
q12	10	10	10	10
q13	9	9	10	9
q14	10	11	11	11
q15	10	11	10	10
q16	1058	1088	1032	1032
q17	12	12	10	10
q18	10	10	9	9
q19	10	10	10	10
q20	1877	1961	1829	1829
q21	22	19	19	19
q22	11	10	10	10
Total cold run time: 3186 ms
Total hot run time: 3100 ms

@doris-robot

TPC-DS: Total hot run time: 2757 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit be10a6eb26571d82f106dc59739e5220d89d147a, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	1088	19	11	11
query2	7141	19	15	15
query3	7637	12	11	11
query4	26563	12	11	11
query5	4465	13	11	11
query6	375	12	11	11
query7	5439	12	11	11
query8	366	18	17	17
query9	9186	12	11	11
query10	728	11	10	10
query11	15945	11	11	11
query12	178	11	11	11
query13	1722	11	10	10
query14	10942	15	14	14
query15	423	11	9	9
query16	7810	11	11	11
query17	2163	11	10	10
query18	2967	11	11	11
query19	249	11	9	9
query20	139	11	10	10
query21	220	11	9	9
query22	4193	11	10	10
query23	33820	20	14	14
query24	10872	14	12	12
query25	758	11	9	9
query26	1828	10	10	10
query27	3371	11	10	10
query28	6252	12	10	10
query29	1740	12	10	10
query30	689	12	10	10
query31	1727	13	12	12
query32	125	11	10	10
query33	1313	12	10	10
query34	1618	830	524	524
query35	998	11	10	10
query36	974	10	9	9
query37	255	10	9	9
query38	3603	10	9	9
query39	1485	753	733	733
query40	322	11	11	11
query41	94	12	10	10
query42	156	12	10	10
query43	492	10	9	9
query44	1323	9	8	8
query45	378	10	9	9
query46	1167	11	10	10
query47	1830	10	9	9
query48	413	10	9	9
query49	1306	11	10	10
query50	772	11	9	9
query51	4000	18	9	9
query52	115	10	10	10
query53	239	12	12	12
query54	782	11	10	10
query55	103	11	11	11
query56	349	10	10	10
query57	1237	9	10	9
query58	378	11	10	10
query59	2676	9	8	8
query60	379	9	10	9
query61	183	9	7	7
query62	837	9	8	8
query63	262	10	10	10
query64	5441	10	11	10
query65	4149	11	9	9
query66	1597	12	11	11
query67	17006	29	10	10
query68	4759	9	9	9
query69	616	11	9	9
query70	1412	9	7	7
query71	615	356	330	330
query72	7673	11	11	11
query73	850	10	10	10
query74	9490	11	9	9
query75	3973	11	10	10
query76	4015	10	9	9
query77	968	11	11	11
query78	9788	11	9	9
query79	1531	9	8	8
query80	774	10	10	10
query81	933	10	10	10
query82	308	10	8	8
query83	332	9	9	9
query84	276	9	8	8
query85	1517	11	10	10
query86	825	10	9	9
query87	3887	10	9	9
query88	2865	15	11	11
query89	406	9	9	9
query90	2216	11	10	10
query91	231	9	9	9
query92	104	9	9	9
query93	1688	9	8	8
query94	2381	10	9	9
query95	497	11	9	9
query96	427	13	11	11
query97	3167	10	9	9
query98	248	227	248	227
query99	1625	9	9	9
Total cold run time: 299750 ms
Total hot run time: 2757 ms

@doris-robot

ClickBench: Total hot run time: 0.06 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit be10a6eb26571d82f106dc59739e5220d89d147a, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.07	0.02	0.00
query2	0.11	0.01	0.00
query3	0.27	0.01	0.00
query4	1.76	0.01	0.01
query5	0.29	0.00	0.01
query6	1.66	0.00	0.01
query7	0.05	0.00	0.01
query8	0.07	0.01	0.00
query9	0.64	0.00	0.00
query10	0.60	0.01	0.01
query11	0.17	0.00	0.00
query12	0.15	0.00	0.00
query13	0.64	0.00	0.01
query14	1.09	0.01	0.00
query15	0.88	0.00	0.00
query16	0.41	0.01	0.00
query17	1.12	0.00	0.00
query18	0.22	0.01	0.00
query19	2.29	0.00	0.00
query20	0.02	0.00	0.00
query21	15.92	0.00	0.00
query22	6.77	0.00	0.01
query23	15.77	0.00	0.01
query24	1.36	0.00	0.01
query25	0.22	0.00	0.00
query26	0.17	0.01	0.00
query27	0.12	0.01	0.01
query28	1.30	0.01	0.00
query29	13.21	0.00	0.00
query30	0.31	0.00	0.01
query31	2.25	0.00	0.01
query32	5.89	0.00	0.00
query33	4.42	0.01	0.01
query34	7.61	0.01	0.00
query35	6.28	0.01	0.00
query36	0.69	0.00	0.00
query37	0.11	0.01	0.01
query38	0.08	0.00	0.00
query39	0.05	0.00	0.00
query40	0.20	0.01	0.01
query41	0.10	0.00	0.01
query42	0.06	0.01	0.00
query43	0.05	0.00	0.00
Total cold run time: 95.45 s
Total hot run time: 0.06 s

@vinlee19
Contributor Author

run external

@suxiaogang223 suxiaogang223 left a comment

LGTM

@github-actions
Contributor

PR approved by anyone and no changes requested.

@@ -34,6 +43,7 @@
 * at a specific timestamp.
 */
public class IcebergRollbackToTimestampAction extends BaseIcebergAction {
    private static final DateTimeFormatter DATETIME_MS_FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");
Contributor

Must the timestamp include milliseconds?
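
On the question above: a formatter built with the pattern "yyyy-MM-dd HH:mm:ss.SSS" rejects input that has no fractional seconds, so as written the milliseconds appear to be required. The sketch below demonstrates this with java.time and shows one possible lenient alternative. OPTIONAL_MS, parses, and the class name are hypothetical and not part of this PR.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoField;

final class TimestampParsingSketch {

    // Same pattern as DATETIME_MS_FORMAT in the diff: the ".SSS" part is mandatory.
    static final DateTimeFormatter STRICT_MS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    // Hypothetical lenient variant (not from the PR): a 1-3 digit fraction is optional.
    static final DateTimeFormatter OPTIONAL_MS = new DateTimeFormatterBuilder()
            .appendPattern("yyyy-MM-dd HH:mm:ss")
            .optionalStart()
            .appendFraction(ChronoField.MILLI_OF_SECOND, 1, 3, true)
            .optionalEnd()
            .toFormatter();

    /** Returns true if the text parses as a LocalDateTime with the given formatter. */
    static boolean parses(DateTimeFormatter formatter, String text) {
        try {
            LocalDateTime.parse(text, formatter);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String withMillis = "2025-10-09 12:00:00.000";
        String withoutMillis = "2025-10-09 12:00:00";
        System.out.println("strict, with ms:      " + parses(STRICT_MS, withMillis));
        System.out.println("strict, without ms:   " + parses(STRICT_MS, withoutMillis));
        System.out.println("optional, with ms:    " + parses(OPTIONAL_MS, withMillis));
        System.out.println("optional, without ms: " + parses(OPTIONAL_MS, withoutMillis));
    }
}
```

Whether the action should accept second-precision input is a design choice for the PR author. The lenient formatter is only one way to do it.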

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Oct 9, 2025
github-actions bot commented Oct 9, 2025

PR approved by at least one committer and no changes requested.

@morningman morningman merged commit b9c48f4 into apache:master Oct 9, 2025
28 of 31 checks passed
github-actions bot pushed a commit that referenced this pull request Oct 9, 2025
### What problem does this PR solve?

This PR extends the OPTIMIZE TABLE framework introduced in #55679 by
implementing additional Iceberg meta procedure actions. Building upon
the foundation established for Iceberg
table optimization, this enhancement adds critical snapshot management
operations that enable more sophisticated Iceberg table maintenance
workflows.


#### New Iceberg Actions Implemented

This PR introduces **5 new Iceberg meta procedure actions**:

1. **`cherrypick_snapshot`** - Cherry-picks changes from a specific snapshot
2. **`fast_forward`** - Fast-forwards one branch to match another branch's latest snapshot
3. **`rollback_to_snapshot`** - Rolls back table to a specific snapshot
4. **`rollback_to_timestamp`** - Rolls back table to a specific timestamp
5. **`set_current_snapshot`** - Sets a specific snapshot as current

#### Example Usage
```sql
-- Cherry-pick changes from a snapshot
OPTIMIZE TABLE iceberg_catalog.db.table
PROPERTIES("action" = "cherrypick_snapshot", "snapshot_id" = "123456789");

-- Fast-forward branch to match another branch
OPTIMIZE TABLE iceberg_catalog.db.table
PROPERTIES("action" = "fast_forward", "branch" = "feature", "to" = "main");

-- Rollback to specific snapshot
OPTIMIZE TABLE iceberg_catalog.db.table
PROPERTIES("action" = "rollback_to_snapshot", "snapshot_id" = "987654321");
```

The regression testing strategy utilizes internal Iceberg catalog operations for table creation, data insertion, and branch/tag management, ensuring test stability and eliminating dependencies on external tools like Spark SQL for test data preparation.
yiguolei pushed a commit that referenced this pull request Oct 11, 2025
…tions #56257 (#56732)

Cherry-picked from #56257

Co-authored-by: Petrichor <vinleexiao@163.com>
Labels

approved (Indicates a PR has been approved by one committer.), dev/4.0.1-merged, reviewed

6 participants