Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bugfix](core) child block is shared between operator and node, it should be shared ptr #28106

Merged
merged 9 commits into from
Dec 8, 2023

Conversation

yiguolei
Copy link
Contributor

@yiguolei yiguolei commented Dec 7, 2023

Proposed changes

_child_block in nest loop join , table value function, repeat node will be shared between ExecNode and related operator, but it should not be a unique ptr in operator, it belongs to exec node.

It will double free the block, if operator's close method is not called correctly.

It should be a shared ptr, then it will not core even if the opeartor's close method is not called.

==30143==ERROR: AddressSanitizer: heap-use-after-free on address 0x61b0029a31d8 at pc 0x557c0b9cbf45 bp 0x7f3032029ed0 sp 0x7f3032029ec8
00:03:16  READ of size 8 at 0x61b0029a31d8 thread T284 (WithoutGroupTas)
00:03:16  #0 0x557c0b9cbf44 in std::_Bvector_base<std::allocator >::_M_deallocate() /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_bvector.h:555:23
00:03:16  #1 0x557c0b9cbf44 in std::_Bvector_base<std::allocator >::~_Bvector_base() /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_bvector.h:543:15
00:03:16  #2 0x557c0b9cbf44 in doris::vectorized::Block::~Block() /root/doris/be/src/vec/core/block.h:71:7
00:03:16  #3 0x557c0b9cbf44 in std::default_deletedoris::vectorized::Block::operator()(doris::vectorized::Block*) const /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:85:2
00:03:16  #4 0x557c0b9cbf44 in std::unique_ptr<doris::vectorized::Block, std::default_deletedoris::vectorized::Block >::unique_ptr() /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:361:4
00:03:16  #5 0x557c2b94d409 in doris::pipeline::StatefulOperatordoris::pipeline::NestLoopJoinProbeOperatorBuilder::StatefulOperator() /root/doris/be/src/pipeline/exec/operator.h:441:41
00:03:16  #6 0x557c2b94d409 in void std::destroy_atdoris::pipeline::NestLoopJoinProbeOperator(doris::pipeline::NestLoopJoinProbeOperator*) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_construct.h:88:15
00:03:16  #7 0x557c2b94d409 in void std::allocator_traits<std::allocatordoris::pipeline::NestLoopJoinProbeOperator >::destroydoris::pipeline::NestLoopJoinProbeOperator(std::allocatordoris::pipeline::NestLoopJoinProbeOperator&, doris::pipeline::NestLoopJoinProbeOperator*) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/alloc_traits.h:533:4
00:03:16  #8 0x557c2b94d409 in std::_Sp_counted_ptr_inplace<doris::pipeline::NestLoopJoinProbeOperator, std::allocatordoris::pipeline::NestLoopJoinProbeOperator, (__gnu_cxx::_Lock_policy)2>::_M_dispose() /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:528:2
00:03:16  #9 0x557c2b8bae0d in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:168:6
00:03:16  #10 0x557c2b8bae0d in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::
__shared_count() /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:702:11
00:03:16  #11 0x557c2b8bae0d in std::__shared_ptr<doris::pipeline::OperatorBase, (__gnu_cxx::_Lock_policy)2>::
__shared_ptr() /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:1149:31
00:03:16  #12 0x557c2b8bae0d in void std::destroy_at<std::shared_ptrdoris::pipeline::OperatorBase >(std::shared_ptrdoris::pipeline::OperatorBase) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_construct.h:88:15
00:03:16  #13 0x557c2b8bae0d in void std::_Destroy<std::shared_ptrdoris::pipeline::OperatorBase >(std::shared_ptrdoris::pipeline::OperatorBase
) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_construct.h:138:7
00:03:16  #14 0x557c2b8bae0d in void std::_Destroy_aux::__destroy<std::shared_ptrdoris::pipeline::OperatorBase>(std::shared_ptrdoris::pipeline::OperatorBase, std::shared_ptrdoris::pipeline::OperatorBase) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_construct.h:152:6
00:03:16  #15 0x557c2b8bae0d in void std::_Destroy<std::shared_ptrdoris::pipeline::OperatorBase
>(std::shared_ptrdoris::pipeline::OperatorBase, std::shared_ptrdoris::pipeline::OperatorBase) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_construct.h:184:7
00:03:16  #16 0x557c2b8bae0d in void std::_Destroy<std::shared_ptrdoris::pipeline::OperatorBase, std::shared_ptrdoris::pipeline::OperatorBase >(std::shared_ptrdoris::pipeline::OperatorBase, std::shared_ptrdoris::pipeline::OperatorBase, std::allocator<std::shared_ptrdoris::pipeline::OperatorBase >&) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/alloc_traits.h:746:7
00:03:16  #17 0x557c2b8bae0d in std::vector<std::shared_ptrdoris::pipeline::OperatorBase, std::allocator<std::shared_ptrdoris::pipeline::OperatorBase > >::vector() /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_vector.h:680:2
00:03:16  #18 0x557c2b8bae0d in doris::pipeline::Pipeline::Pipeline() /root/doris/be/src/pipeline/pipeline.h:46:7
00:03:16  #19 0x557c2b87bf30 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:168:6
00:03:16  #20 0x557c2b87bf30 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::
__shared_count() /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:702:11
00:03:16  #21 0x557c2b87bf30 in std::__shared_ptr<doris::pipeline::Pipeline, (__gnu_cxx::_Lock_policy)2>::
__shared_ptr() /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:1149:31
00:03:16  #22 0x557c2b87bf30 in void std::destroy_at<std::shared_ptrdoris::pipeline::Pipeline >(std::shared_ptrdoris::pipeline::Pipeline
) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_construct.h:88:15
00:03:16  #23 0x557c2b87bf30 in void std::_Destroy<std::shared_ptrdoris::pipeline::Pipeline >(std::shared_ptrdoris::pipeline::Pipeline) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_construct.h:138:7
00:03:16  #24 0x557c2b87bf30 in void std::_Destroy_aux::__destroy<std::shared_ptrdoris::pipeline::Pipeline
>(std::shared_ptrdoris::pipeline::Pipeline, std::shared_ptrdoris::pipeline::Pipeline) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_construct.h:152:6
00:03:16  #25 0x557c2b87bf30 in void std::_Destroy<std::shared_ptrdoris::pipeline::Pipeline>(std::shared_ptrdoris::pipeline::Pipeline, std::shared_ptrdoris::pipeline::Pipeline) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_construct.h:184:7
00:03:16  #26 0x557c2b87bf30 in void std::_Destroy<std::shared_ptrdoris::pipeline::Pipeline
, std::shared_ptrdoris::pipeline::Pipeline >(std::shared_ptrdoris::pipeline::Pipeline, std::shared_ptrdoris::pipeline::Pipeline, std::allocator<std::shared_ptrdoris::pipeline::Pipeline >&) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/alloc_traits.h:746:7
00:03:16  #27 0x557c2b87bf30 in std::vector<std::shared_ptrdoris::pipeline::Pipeline, std::allocator<std::shared_ptrdoris::pipeline::Pipeline > >::vector() /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_vector.h:680:2
00:03:16  #28 0x557c2b87bf30 in doris::pipeline::PipelineFragmentContext::PipelineFragmentContext() /root/doris/be/src/pipeline/pipeline_fragment_context.cpp:148:1
00:03:16  #29 0x557c2b9aecd7 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:168:6
00:03:16  #30 0x557c2b9aecd7 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::
__shared_count() /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:702:11
00:03:16  #31 0x557c2b9aecd7 in std::__shared_ptr<doris::pipeline::PipelineFragmentContext, (__gnu_cxx::_Lock_policy)2>::
__shared_ptr() /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:1149:31
00:03:16  #32 0x557c2b9aecd7 in doris::pipeline::TaskScheduler::_try_close_task(doris::pipeline::PipelineTask*, doris::pipeline::PipelineTaskState) /root/doris/be/src/pipeline/task_scheduler.cpp:359:1
00:03:16  #33 0x557c2b9ac721 in doris::pipeline::TaskScheduler::_do_work(unsigned long) /root/doris/be/src/pipeline/task_scheduler.cpp:242:13
00:03:16  #34 0x557c0c61772a in doris::ThreadPool::dispatch_thread() /root/doris/be/src/util/threadpool.cpp:533:24
00:03:16  #35 0x557c0c5f825d in std::function<void ()>::operator()() const /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:560:9
00:03:16  #36 0x557c0c5f825d in doris::Thread::supervise_thread(void*) /root/doris/be/src/util/thread.cpp:498:5
00:03:16  #37 0x7f32137f2608 in start_thread /build/glibc-SzIz7B/glibc-2.31/nptl/pthread_create.c:477:8
00:03:16  #38 0x7f3213a9f132 in __clone /build/glibc-SzIz7B/glibc-2.31/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
00:03:16 
00:03:16  0x61b0029a31d8 is located 1112 bytes inside of 1472-byte region [0x61b0029a2d80,0x61b0029a3340)
00:03:16  freed by thread T284 (WithoutGroupTas) here:
00:03:16  #0 0x557c09fdb80d in operator delete(void*) (/mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/P0/Cluster0/be/lib/doris_be+0xe8c780d) (BuildId: 3c7aebf9dab039d1)
00:03:16  #1 0x557c0c153a61 in doris::ObjectPool::clear() /root/doris/be/src/common/object_pool.h:57:13
00:03:16  #2 0x557c0c153a61 in doris::RuntimeState::RuntimeState() /root/doris/be/src/runtime/runtime_state.cpp:191:16
00:03:16  #3 0x557c2b87b41d in std::default_deletedoris::RuntimeState::operator()(doris::RuntimeState*) const /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:85:2
00:03:16  #4 0x557c2b87b41d in std::__uniq_ptr_impl<doris::RuntimeState, std::default_deletedoris::RuntimeState >::reset(doris::RuntimeState*) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:182:4
00:03:16  #5 0x557c2b87b41d in std::unique_ptr<doris::RuntimeState, std::default_deletedoris::RuntimeState >::reset(doris::RuntimeState*) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:456:7
00:03:16  #6 0x557c2b87b41d in doris::pipeline::PipelineFragmentContext::PipelineFragmentContext() /root/doris/be/src/pipeline/pipeline_fragment_context.cpp:143:24
00:03:16  #7 0x557c2b9aecd7 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:168:6
00:03:16  #8 0x557c2b9aecd7 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::
__shared_count() /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:702:11
00:03:16  #9 0x557c2b9aecd7 in std::__shared_ptr<doris::pipeline::PipelineFragmentContext, (__gnu_cxx::_Lock_policy)2>::
__shared_ptr() /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:1149:31
00:03:16  #10 0x557c2b9aecd7 in doris::pipeline::TaskScheduler::_try_close_task(doris::pipeline::PipelineTask*, doris::pipeline::PipelineTaskState) /root/doris/be/src/pipeline/task_scheduler.cpp:359:1
00:03:16  #11 0x557c2b9ac721 in doris::pipeline::TaskScheduler::_do_work(unsigned long) /root/doris/be/src/pipeline/task_scheduler.cpp:242:13
00:03:16  #12 0x557c0c61772a in doris::ThreadPool::dispatch_thread() /root/doris/be/src/util/threadpool.cpp:533:24
00:03:16  #13 0x557c0c5f825d in std::function<void ()>::operator()() const /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:560:9
00:03:16  #14 0x557c0c5f825d in doris::Thread::supervise_thread(void*) /root/doris/be/src/util/thread.cpp:498:5
00:03:16  #15 0x7f32137f2608 in start_thread /build/glibc-SzIz7B/glibc-2.31/nptl/pthread_create.c:477:8

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

Copy link
Contributor

github-actions bot commented Dec 7, 2023

clang-tidy review says "All clean, LGTM! 👍"

@yiguolei
Copy link
Contributor Author

yiguolei commented Dec 7, 2023

run buildall

@yiguolei yiguolei added the p0_c label Dec 7, 2023
@doris-robot
Copy link

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 43.78 seconds
stream load tsv: 588 seconds loaded 74807831229 Bytes, about 121 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.3 seconds inserted 10000000 Rows, about 341K ops/s
storage size: 17195212098 Bytes

@doris-robot
Copy link

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit fba36e59a3ec1879b64e91f13949bff8249f1b58, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4751	4462	4510	4462
q2	360	164	158	158
q3	1461	1255	1182	1182
q4	1120	939	925	925
q5	3162	3190	3179	3179
q6	251	132	128	128
q7	966	494	492	492
q8	2159	2298	2203	2203
q9	6666	6660	6646	6646
q10	3209	3262	3267	3262
q11	326	213	211	211
q12	362	208	208	208
q13	4558	3777	3856	3777
q14	244	213	211	211
q15	569	516	511	511
q16	450	386	394	386
q17	1013	634	547	547
q18	7747	7219	7499	7219
q19	1504	1364	1389	1364
q20	540	333	339	333
q21	3065	2648	2693	2648
q22	367	296	300	296
Total cold run time: 44850 ms
Total hot run time: 40348 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4428	4399	4389	4389
q2	269	161	172	161
q3	3550	3535	3528	3528
q4	2397	2375	2381	2375
q5	5740	5730	5704	5704
q6	238	121	124	121
q7	2366	1869	1852	1852
q8	3516	3517	3522	3517
q9	9087	8962	8996	8962
q10	3898	4002	4005	4002
q11	502	382	382	382
q12	760	611	608	608
q13	4268	3529	3548	3529
q14	284	256	267	256
q15	565	527	515	515
q16	504	443	478	443
q17	1893	1862	1864	1862
q18	8703	8321	8275	8275
q19	1706	1731	1739	1731
q20	2267	1948	1936	1936
q21	6505	6208	6140	6140
q22	515	420	435	420
Total cold run time: 63961 ms
Total hot run time: 60708 ms

@yiguolei
Copy link
Contributor Author

yiguolei commented Dec 7, 2023

run buildall

Copy link
Contributor

github-actions bot commented Dec 7, 2023

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot
Copy link

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit cb140715c7ebd274d910c3a550f1a5a077d0d323, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4704	4450	4421	4421
q2	390	163	160	160
q3	1449	1277	1250	1250
q4	1107	934	923	923
q5	3173	3143	3188	3143
q6	246	130	130	130
q7	1003	489	489	489
q8	2202	2209	2197	2197
q9	6685	6698	6654	6654
q10	3228	3256	3265	3256
q11	328	197	205	197
q12	351	209	210	209
q13	4547	3771	3817	3771
q14	244	215	215	215
q15	564	532	521	521
q16	444	393	397	393
q17	1002	600	574	574
q18	7546	7083	6876	6876
q19	1513	1373	1387	1373
q20	529	585	338	338
q21	3125	2621	2659	2621
q22	359	289	297	289
Total cold run time: 44739 ms
Total hot run time: 40000 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4381	4371	4396	4371
q2	271	165	173	165
q3	3541	3531	3511	3511
q4	2391	2373	2377	2373
q5	5737	5719	5715	5715
q6	241	122	122	122
q7	2362	1846	1873	1846
q8	3495	3498	3506	3498
q9	9003	8939	8957	8939
q10	3902	3960	4005	3960
q11	497	372	368	368
q12	758	597	589	589
q13	4313	3547	3526	3526
q14	289	252	256	252
q15	561	516	520	516
q16	506	464	504	464
q17	1851	1848	1895	1848
q18	8689	8189	8249	8189
q19	1702	1755	1747	1747
q20	2240	1936	1934	1934
q21	6496	6171	6109	6109
q22	505	419	432	419
Total cold run time: 63731 ms
Total hot run time: 60461 ms

@doris-robot
Copy link

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.45 seconds
stream load tsv: 586 seconds loaded 74807831229 Bytes, about 121 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 28.9 seconds inserted 10000000 Rows, about 346K ops/s
storage size: 17194467608 Bytes

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Dec 7, 2023
Copy link
Contributor

github-actions bot commented Dec 7, 2023

PR approved by at least one committer and no changes requested.

Copy link
Contributor

github-actions bot commented Dec 7, 2023

PR approved by anyone and no changes requested.

Copy link
Contributor

@xinyiZzz xinyiZzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yiguolei yiguolei merged commit abc802b into apache:master Dec 8, 2023
26 of 28 checks passed
@yiguolei yiguolei removed the dev/2.0.4 label Dec 8, 2023
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023
…ould be shared ptr (apache#28106)

_child_block in nest loop join , table value function, repeat node will be shared between ExecNode and related operator, but it should not be a unique ptr in operator, it belongs to exec node.

It will double free the block, if operator's close method is not called correctly.

It should be a shared ptr, then it will not core even if the opeartor's close method is not called.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by one committer. p0_c reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants