@morningman morningman commented Mar 27, 2025

What problem does this PR solve?

Problem Summary:

2025-03-27 11:35:33,694 ERROR (stateListener|95) [EditLog.loadJournal():1231] Operation Type 142
org.apache.doris.common.DdlException: errCode = 2, detailMessage = Failed to find enough backend for ssd storage medium. When setting dynamic_partition.hot_partition_num>0, the hot partitions will store in ssd. Please check the replication num, replication tag and storage medium.
	at org.apache.doris.common.util.DynamicPartitionUtil.checkReplicaAllocation(DynamicPartitionUtil.java:254) ~[doris-fe.jar:1.2-SNAPSHOT]
	at org.apache.doris.common.util.DynamicPartitionUtil.checkDynamicPartition(DynamicPartitionUtil.java:190) ~[doris-fe.jar:1.2-SNAPSHOT]
	at org.apache.doris.catalog.ColocateTableIndex.replayModifyReplicaAlloc(ColocateTableIndex.java:914) ~[doris-fe.jar:1.2-SNAPSHOT]
	at org.apache.doris.catalog.ColocateTableIndex.replayModifyReplicaAlloc(ColocateTableIndex.java:655) ~[doris-fe.jar:1.2-SNAPSHOT]
	at org.apache.doris.catalog.Env.replayJournal(Env.java:2759) ~[doris-fe.jar:1.2-SNAPSHOT]
	at org.apache.doris.catalog.Env$JournalObserver.runOneCycle(Env.java:116) ~[doris-fe.jar:1.2-SNAPSHOT]
	at org.apache.doris.common.util.Daemon.run(Daemon.java:116) [doris-fe.jar:1.2-SNAPSHOT]

We should not throw any exception when checking properties in the replay logic.
This PR skips the check during replay.
I am not sure how to reproduce this situation. My guess is that it can happen when a user modifies the colocation property
of a table and some backend properties are changed afterwards.
This PR has been tested by the user, and it solves the problem.
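
The idea of skipping validation during replay can be illustrated with a minimal, self-contained sketch. It is not the actual Doris code: `ReplaySketch`, `ValidationException`, `modifyReplicaAlloc`, and the `isReplay` flag are hypothetical names chosen for illustration, and the backend count is passed in as a plain integer instead of being read from cluster state. The sketch only shows the general pattern: a check that depends on the current cluster state runs when the operation is first executed, and is bypassed when the same operation is re-applied from the journal.

```java
import java.util.HashMap;
import java.util.Map;

public class ReplaySketch {
    /** Hypothetical stand-in for a DDL validation failure (not Doris's DdlException). */
    static class ValidationException extends Exception {
        ValidationException(String message) {
            super(message);
        }
    }

    /** Hypothetical metadata: table name -> number of replicas required on SSD. */
    private final Map<String, Integer> ssdReplicasByTable = new HashMap<>();

    /** A check that depends on the cluster state at the moment it runs. */
    static void checkReplicaAllocation(int requiredReplicas, int availableSsdBackends)
            throws ValidationException {
        if (availableSsdBackends < requiredReplicas) {
            throw new ValidationException("not enough ssd backends: need "
                    + requiredReplicas + ", have " + availableSsdBackends);
        }
    }

    /**
     * Applies a replica allocation change. On the original execution path the
     * change is validated first. On the replay path (isReplay == true) the
     * operation was already accepted when it was logged, so the check is skipped
     * and the metadata is updated unconditionally.
     */
    void modifyReplicaAlloc(String table, int replicas, int availableSsdBackends,
                            boolean isReplay) throws ValidationException {
        if (!isReplay) {
            checkReplicaAllocation(replicas, availableSsdBackends);
        }
        ssdReplicasByTable.put(table, replicas);
    }

    /** Returns the recorded replica count, or null if the table is unknown. */
    Integer replicasOf(String table) {
        return ssdReplicasByTable.get(table);
    }

    public static void main(String[] args) throws Exception {
        ReplaySketch meta = new ReplaySketch();
        // Original execution: three SSD backends are available, so the check passes.
        meta.modifyReplicaAlloc("t1", 3, 3, false);
        // Later replay: only one SSD backend is left, but replay must not fail.
        meta.modifyReplicaAlloc("t1", 3, 1, true);
        System.out.println(meta.replicasOf("t1"));
    }
}
```

The design choice in this sketch is that replay treats the journal as the source of truth: an entry was valid when it was written, so re-validating it against a cluster that may have changed since then can only produce spurious failures.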

Release note

None

@morningman morningman added usercase Important user case type label dev/2.1.x dev/3.0.x labels Mar 27, 2025
@morningman morningman changed the title [fix] do not check replica allocation when replay [fix](meta) do not check replica allocation when replay Mar 27, 2025
@morningman
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 33933 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 45b548260489726d62c3bdb7074d1f58f3f5bf73, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	25661	5175	4993	4993
q2	2073	285	161	161
q3	10409	1250	685	685
q4	10211	985	514	514
q5	7514	2306	2323	2306
q6	187	162	133	133
q7	905	741	606	606
q8	9297	1225	1045	1045
q9	6891	5142	5187	5142
q10	6804	2313	1920	1920
q11	484	272	255	255
q12	343	354	222	222
q13	17765	3697	3057	3057
q14	236	224	207	207
q15	522	504	493	493
q16	629	608	602	602
q17	549	861	333	333
q18	7595	7265	7171	7171
q19	1214	971	568	568
q20	324	345	191	191
q21	3797	2600	2349	2349
q22	1041	1050	980	980
Total cold run time: 114451 ms
Total hot run time: 33933 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5143	5109	5102	5102
q2	239	324	235	235
q3	2126	2636	2274	2274
q4	1440	1795	1364	1364
q5	4447	4419	4398	4398
q6	216	168	125	125
q7	1968	1912	1727	1727
q8	2608	2605	2546	2546
q9	7281	7208	7031	7031
q10	2989	3181	2737	2737
q11	581	501	486	486
q12	689	745	596	596
q13	3533	3880	3246	3246
q14	283	297	279	279
q15	556	511	494	494
q16	647	675	631	631
q17	1134	1563	1388	1388
q18	7695	7621	7459	7459
q19	811	815	843	815
q20	1998	1977	1813	1813
q21	5197	4767	4700	4700
q22	1105	1047	1057	1047
Total cold run time: 52686 ms
Total hot run time: 50493 ms

@doris-robot

TPC-DS: Total hot run time: 193259 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 45b548260489726d62c3bdb7074d1f58f3f5bf73, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	1373	1057	1036	1036
query2	6099	1887	1856	1856
query3	11000	4498	4560	4498
query4	51597	25557	23683	23683
query5	5168	617	496	496
query6	334	188	181	181
query7	4879	505	283	283
query8	308	237	219	219
query9	5725	2508	2521	2508
query10	417	310	256	256
query11	15242	15034	15026	15026
query12	161	112	113	112
query13	1047	525	387	387
query14	10830	6247	6373	6247
query15	198	178	179	178
query16	7099	657	483	483
query17	1101	730	598	598
query18	1583	397	303	303
query19	190	195	159	159
query20	131	131	120	120
query21	208	124	104	104
query22	4491	4466	4560	4466
query23	34080	33464	33381	33381
query24	5824	2464	2468	2464
query25	457	450	415	415
query26	681	285	148	148
query27	1816	481	331	331
query28	2922	2450	2472	2450
query29	582	563	430	430
query30	278	247	197	197
query31	848	895	776	776
query32	69	63	62	62
query33	467	390	314	314
query34	776	846	532	532
query35	785	815	746	746
query36	940	989	915	915
query37	122	107	81	81
query38	4173	4241	4201	4201
query39	1507	1461	1437	1437
query40	206	116	106	106
query41	54	56	50	50
query42	124	113	109	109
query43	494	517	495	495
query44	1327	821	835	821
query45	182	176	169	169
query46	851	1037	648	648
query47	1840	1881	1806	1806
query48	394	458	315	315
query49	724	546	454	454
query50	725	772	442	442
query51	4227	4211	4255	4211
query52	114	107	106	106
query53	236	267	191	191
query54	510	513	437	437
query55	90	87	86	86
query56	321	284	276	276
query57	1153	1191	1136	1136
query58	255	247	254	247
query59	2821	2929	2550	2550
query60	290	283	258	258
query61	134	132	166	132
query62	730	732	690	690
query63	229	191	186	186
query64	1471	1025	704	704
query65	4487	4401	4314	4314
query66	780	396	302	302
query67	15910	15725	15239	15239
query68	7044	905	511	511
query69	534	307	265	265
query70	1184	1071	1086	1071
query71	503	294	263	263
query72	5595	4765	4791	4765
query73	1447	639	366	366
query74	8916	8970	8694	8694
query75	4076	3202	2707	2707
query76	4234	1200	770	770
query77	759	386	288	288
query78	10078	10159	9311	9311
query79	2738	823	645	645
query80	618	521	452	452
query81	483	251	227	227
query82	471	126	96	96
query83	224	177	160	160
query84	293	96	74	74
query85	793	362	312	312
query86	355	301	258	258
query87	4369	4588	4420	4420
query88	3421	2237	2262	2237
query89	403	309	271	271
query90	1972	216	223	216
query91	155	142	117	117
query92	74	59	59	59
query93	1247	1088	587	587
query94	677	423	310	310
query95	358	273	274	273
query96	493	569	275	275
query97	3179	3272	3204	3204
query98	224	209	201	201
query99	1412	1441	1295	1295
Total cold run time: 294516 ms
Total hot run time: 193259 ms

@doris-robot

ClickBench: Total hot run time: 30.8 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 45b548260489726d62c3bdb7074d1f58f3f5bf73, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.04	0.03	0.03
query2	0.12	0.11	0.11
query3	0.26	0.20	0.18
query4	1.59	0.19	0.19
query5	0.59	0.58	0.60
query6	1.19	0.72	0.71
query7	0.02	0.02	0.02
query8	0.04	0.03	0.03
query9	0.58	0.51	0.54
query10	0.57	0.57	0.57
query11	0.16	0.11	0.10
query12	0.15	0.12	0.12
query13	0.61	0.59	0.60
query14	2.79	2.74	2.72
query15	0.90	0.86	0.84
query16	0.38	0.39	0.39
query17	1.05	1.06	1.07
query18	0.21	0.20	0.20
query19	1.89	1.99	1.86
query20	0.01	0.01	0.02
query21	15.36	0.93	0.56
query22	0.76	1.15	0.71
query23	14.91	1.36	0.62
query24	6.90	1.88	0.34
query25	0.30	0.27	0.12
query26	0.65	0.16	0.13
query27	0.06	0.05	0.05
query28	9.02	0.90	0.44
query29	12.52	3.94	3.29
query30	0.24	0.09	0.07
query31	2.83	0.57	0.38
query32	3.23	0.54	0.46
query33	3.16	3.08	3.04
query34	15.55	5.15	4.50
query35	4.56	4.48	4.49
query36	0.65	0.49	0.47
query37	0.09	0.07	0.07
query38	0.05	0.04	0.03
query39	0.03	0.02	0.03
query40	0.17	0.14	0.13
query41	0.09	0.03	0.02
query42	0.04	0.03	0.02
query43	0.04	0.03	0.03
Total cold run time: 104.36 s
Total hot run time: 30.8 s

@deardeng deardeng left a comment

LGTM

@github-actions

PR approved by anyone and no changes requested.

@yujun777

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Mar 28, 2025
@github-actions

PR approved by at least one committer and no changes requested.

@morningman morningman merged commit 765296f into apache:master Mar 28, 2025
33 of 35 checks passed
github-actions bot pushed a commit that referenced this pull request Mar 28, 2025
github-actions bot pushed a commit that referenced this pull request Mar 28, 2025
yiguolei pushed a commit that referenced this pull request Mar 29, 2025
yiguolei pushed a commit that referenced this pull request Apr 3, 2025
…49569 (#49604)

Cherry-picked from #49569

Co-authored-by: Mingyu Chen (Rayner) <morningman@163.com>
dataroaring pushed a commit that referenced this pull request Apr 22, 2025
…49569 (#49603)

Cherry-picked from #49569

Co-authored-by: Mingyu Chen (Rayner) <morningman@163.com>
@yiguolei yiguolei mentioned this pull request May 13, 2025
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
@gavinchou gavinchou mentioned this pull request Jun 11, 2025

Labels

approved Indicates a PR has been approved by one committer. dev/2.1.10-merged dev/3.0.6-merged reviewed usercase Important user case type label
