@yiguolei
Contributor

@yiguolei yiguolei commented Jun 16, 2025

During graceful exit, the async result writer does not exit and hangs, because it keeps trying to pull data blocks from the queue endlessly.
The AsyncResultWriter thread should exit when the fragment manager (FragmentMgr) wants to exit.

Thread 652 (Thread 0x7fd20c1b0700 (LWP 434882) "FragmentMgrAsyn"):
#0 futex_abstimed_wait_cancelable (private=, abstime=0x7fca3442f560, clockid=, expected=0, futex_word=0x61b000c009f8) at ../sysdeps/nptl/futex-internal.h:320
#1 __pthread_cond_wait_common (abstime=0x7fca3442f560, clockid=, mutex=0x61b000c009a8, cond=0x61b000c009d0) at pthread_cond_wait.c:520
#2 __pthread_cond_timedwait (cond=0x61b000c009d0, mutex=0x61b000c009a8, abstime=0x7fca3442f560) at pthread_cond_wait.c:665
#3 0x00005566ce61f350 in __gthread_cond_timedwait (__cond=0x61b000c009d0, __mutex=0x61b000c009a8, __abs_timeout=0x7fca3442f560) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:872
#4 std::__condvar::wait_until (this=0x61b000c009d0, __m=..., __abs_time=...) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_mutex.h:162
#5 std::condition_variable::__wait_until_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=, __lock=..., __atime=...) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/condition_variable:222
#6 0x00005566ce61f021 in std::condition_variable::wait_until<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=, __lock=..., __atime=...) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/condition_variable:135
#7 0x000055670b59b0be in std::condition_variable::wait_for<long, std::ratio<1l, 1l> > (this=0x61b000c009d0, __lock=..., __rtime=...) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/condition_variable:163
#8 doris::vectorized::AsyncResultWriter::process_block (this=, state=0x61e001b3c080, profile=) at /root/doris/be/src/vec/sink/writer/async_result_writer.cpp:149
#9 0x000055670b59eb1f in doris::vectorized::AsyncResultWriter::start_writer(doris::RuntimeState*, doris::RuntimeProfile*)::$_0::operator()() const (this=0x60400146b110) at /root/doris/be/src/vec/sink/writer/async_result_writer.cpp:103
#10 std::__invoke_impl<void, doris::vectorized::AsyncResultWriter::start_writer(doris::RuntimeState*, doris::RuntimeProfile*)::$_0&>(std::__invoke_other, doris::vectorized::AsyncResultWriter::start_writer(doris::RuntimeState*, doris::RuntimeProfile*)::$_0&) (__f=...) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61
#11 std::__invoke_r<void, doris::vectorized::AsyncResultWriter::start_writer(doris::RuntimeState*, doris::RuntimeProfile*)::$_0&>(doris::vectorized::AsyncResultWriter::start_writer(doris::RuntimeState*, doris::RuntimeProfile*)::$_0&) (__fn=...) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:111
#12 std::_Function_handler<void (), doris::vectorized::AsyncResultWriter::start_writer(doris::RuntimeState*, doris::RuntimeProfile*)::$_0>::_M_invoke(std::_Any_data const&) (__functor=...) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
#13 0x00005566d2f2270b in doris::ThreadPool::dispatch_thread (this=0x616000415580) at /root/doris/be/src/util/threadpool.cpp:615
#14 0x00005566d2ef973a in doris::Thread::supervise_thread (arg=0x611005af0540) at /root/doris/be/src/util/thread.cpp:468
#15 0x00007fd5c2034609 in start_thread (arg=) at pthread_create.c:477
#16 0x00007fd5c22e1133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
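
For readers unfamiliar with the pattern, here is a minimal sketch of the kind of wait loop the trace points at, with a stop flag added so the consumer can leave. This is not the actual patch and not Doris code: `BlockQueue`, `drain`, and `stop_requested` are hypothetical names, blocks are plain `int`s, and the 10 ms timeout is arbitrary.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>

// Hypothetical stand-in for the writer's block queue; NOT the Doris class.
class BlockQueue {
public:
    void push(int block) {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _blocks.push_back(block);
        }
        _cv.notify_one();
    }

    // Waits for a block. Returns false once the queue is empty and
    // stop_requested is set, so the consumer can leave its loop.
    bool pop(int* block, const std::atomic<bool>& stop_requested) {
        assert(block != nullptr);
        std::unique_lock<std::mutex> lock(_mutex);
        while (_blocks.empty()) {
            if (stop_requested.load()) {
                return false;
            }
            // Bounded wait: wake up periodically to re-check the flag.
            _cv.wait_for(lock, std::chrono::milliseconds(10));
        }
        *block = _blocks.front();
        _blocks.pop_front();
        return true;
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    std::deque<int> _blocks;
};

// Consumer loop: drains the queue and returns how many blocks it handled.
int drain(BlockQueue& queue, const std::atomic<bool>& stop_requested) {
    int handled = 0;
    int block = 0;
    while (queue.pop(&block, stop_requested)) {
        ++handled;
    }
    return handled;
}
```

Without the `stop_requested` check, an empty queue keeps the loop in `wait_for` forever, which is the shape of the hang in the trace above. Because the wait is bounded, whoever drives shutdown only has to set the flag; no extra notify is required in this sketch.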

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@yiguolei
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 34832 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ccd0d9f2a62321198e22a0fb1f900727de2207f6, data reload: false

------ Round 1 ----------------------------------
q1	17857	5155	5138	5138
q2	1949	283	193	193
q3	10294	1348	750	750
q4	10226	1051	541	541
q5	7550	2452	2389	2389
q6	185	173	134	134
q7	931	760	642	642
q8	9342	1302	1114	1114
q9	6920	5211	5188	5188
q10	6886	2339	1949	1949
q11	503	304	297	297
q12	369	370	240	240
q13	17772	3750	3147	3147
q14	246	232	223	223
q15	571	483	498	483
q16	440	460	400	400
q17	637	872	413	413
q18	7976	7230	7259	7230
q19	1621	975	580	580
q20	348	357	240	240
q21	4271	3492	2522	2522
q22	1139	1065	1019	1019
Total cold run time: 108033 ms
Total hot run time: 34832 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5111	5061	5092	5061
q2	246	321	223	223
q3	2184	2682	2386	2386
q4	1439	1848	1408	1408
q5	4341	4559	4465	4465
q6	220	170	129	129
q7	1979	1953	1768	1768
q8	2650	2695	2553	2553
q9	7299	7293	7364	7293
q10	3036	3225	2777	2777
q11	602	516	507	507
q12	704	794	640	640
q13	3584	3903	3322	3322
q14	275	290	299	290
q15	539	501	485	485
q16	456	505	461	461
q17	1179	1582	1380	1380
q18	7873	7677	7548	7548
q19	845	886	1047	886
q20	2026	2065	1898	1898
q21	5165	4640	4542	4542
q22	1149	1051	1052	1051
Total cold run time: 52902 ms
Total hot run time: 51073 ms

@doris-robot

TPC-DS: Total hot run time: 190384 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ccd0d9f2a62321198e22a0fb1f900727de2207f6, data reload: false

query1	1039	413	436	413
query2	6591	1972	1966	1966
query3	6741	241	238	238
query4	26111	23820	23508	23508
query5	4372	653	490	490
query6	334	228	211	211
query7	4634	522	321	321
query8	282	251	228	228
query9	8628	2921	2943	2921
query10	470	375	305	305
query11	15931	15149	14801	14801
query12	171	119	116	116
query13	1678	577	458	458
query14	9739	6364	6254	6254
query15	208	208	179	179
query16	7526	656	511	511
query17	1203	749	593	593
query18	2085	423	337	337
query19	205	215	177	177
query20	134	131	125	125
query21	219	134	125	125
query22	4147	4314	4188	4188
query23	34276	33468	33394	33394
query24	8513	2388	2465	2388
query25	559	480	426	426
query26	1242	280	161	161
query27	2698	526	374	374
query28	4276	2404	2358	2358
query29	770	573	448	448
query30	286	224	214	214
query31	946	848	776	776
query32	75	66	69	66
query33	568	404	327	327
query34	866	887	572	572
query35	809	840	754	754
query36	1043	1062	939	939
query37	118	102	82	82
query38	4228	4193	4079	4079
query39	1506	1478	1431	1431
query40	210	122	116	116
query41	66	63	67	63
query42	158	126	116	116
query43	542	552	516	516
query44	1427	927	910	910
query45	193	182	177	177
query46	894	1057	659	659
query47	1777	1862	1769	1769
query48	406	452	343	343
query49	741	530	415	415
query50	698	720	436	436
query51	4110	4200	4119	4119
query52	124	124	114	114
query53	251	271	199	199
query54	634	619	561	561
query55	90	87	91	87
query56	337	322	305	305
query57	1182	1224	1152	1152
query58	275	279	278	278
query59	2742	2952	2732	2732
query60	351	343	337	337
query61	130	133	139	133
query62	810	740	647	647
query63	238	205	200	200
query64	4312	1090	723	723
query65	4287	4162	4179	4162
query66	1095	423	333	333
query67	16128	16043	15865	15865
query68	8194	971	620	620
query69	496	345	286	286
query70	1251	1214	1191	1191
query71	527	357	325	325
query72	5683	4866	4876	4866
query73	703	669	389	389
query74	9191	9229	8834	8834
query75	3922	3292	2785	2785
query76	3726	1221	787	787
query77	799	397	309	309
query78	10212	10254	9473	9473
query79	2017	878	628	628
query80	610	562	469	469
query81	483	259	228	228
query82	460	135	104	104
query83	268	267	256	256
query84	272	112	98	98
query85	821	437	331	331
query86	395	315	305	305
query87	4565	4432	4449	4432
query88	3892	2505	2488	2488
query89	396	341	296	296
query90	1946	225	216	216
query91	156	151	124	124
query92	83	64	66	64
query93	1676	1034	651	651
query94	675	415	296	296
query95	401	316	307	307
query96	543	591	305	305
query97	2750	2787	2687	2687
query98	247	220	217	217
query99	1648	1422	1296	1296
Total cold run time: 278443 ms
Total hot run time: 190384 ms

@doris-robot

ClickBench: Total hot run time: 29.11 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit ccd0d9f2a62321198e22a0fb1f900727de2207f6, data reload: false

query1	0.04	0.03	0.03
query2	0.08	0.04	0.04
query3	0.25	0.07	0.07
query4	1.59	0.11	0.10
query5	0.44	0.43	0.42
query6	1.17	0.67	0.69
query7	0.02	0.02	0.02
query8	0.05	0.04	0.03
query9	0.60	0.54	0.52
query10	0.59	0.60	0.58
query11	0.16	0.12	0.11
query12	0.16	0.12	0.13
query13	0.62	0.61	0.60
query14	0.81	0.82	0.83
query15	0.91	0.88	0.88
query16	0.39	0.39	0.38
query17	1.05	1.04	1.05
query18	0.24	0.23	0.22
query19	1.95	1.82	1.89
query20	0.01	0.01	0.01
query21	15.42	0.94	0.57
query22	0.76	1.44	0.71
query23	14.75	1.42	0.66
query24	7.32	1.10	0.55
query25	0.51	0.23	0.10
query26	0.64	0.17	0.14
query27	0.06	0.06	0.05
query28	10.08	0.97	0.46
query29	12.78	4.13	3.43
query30	0.26	0.10	0.06
query31	2.83	0.62	0.41
query32	3.24	0.57	0.48
query33	3.09	3.14	3.17
query34	15.81	5.21	4.51
query35	4.61	4.60	4.55
query36	0.66	0.51	0.49
query37	0.10	0.07	0.06
query38	0.06	0.05	0.04
query39	0.03	0.03	0.03
query40	0.17	0.14	0.12
query41	0.08	0.03	0.02
query42	0.04	0.02	0.03
query43	0.04	0.04	0.03
Total cold run time: 104.47 s
Total hot run time: 29.11 s

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 14.29% (1/7) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 56.40% (15074/26726)
Line Coverage 45.17% (134778/298412)
Region Coverage 44.28% (67836/153189)
Branch Coverage 38.84% (34808/89612)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 71.43% (5/7) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.80% (20989/26303)
Line Coverage 72.74% (216941/298254)
Region Coverage 71.00% (127918/180162)
Branch Coverage 64.63% (66189/102418)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 71.43% (5/7) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.75% (20976/26303)
Line Coverage 72.72% (216904/298254)
Region Coverage 70.96% (127851/180162)
Branch Coverage 64.65% (66216/102418)

@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Jun 16, 2025
@github-actions
Contributor

PR approved by anyone and no changes requested.

Contributor

@BiteTheDDDDt BiteTheDDDDt left a comment

LGTM

@yiguolei yiguolei merged commit 1030fe3 into apache:master Jun 16, 2025
25 of 28 checks passed

Labels

approved (Indicates a PR has been approved by one committer.), reviewed
