Skip to content

Conversation

@xinyiZzz
Copy link
Contributor

@xinyiZzz xinyiZzz commented Apr 12, 2023

Proposed changes

Issue Number: close #xxx

Problem summary

  1. No check mem tracker limit and no cancel task in mem hook, only in Allocator. This helps in clearer analysis of memory issues and reduces performance loss.
    PODArray/hash table/arena memory allocation will use Allocator.

  2. Optimize mem limit exceeded log printing

  3. Optimize compilation time

Checklist(Required)

  • Does it affect the original behavior
  • Has unit tests been added
  • Has document been added or modified
  • Does it need to update dependencies
  • Is this PR support rollback (If NO, please explain WHY)

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@xinyiZzz
Copy link
Contributor Author

run buildall

@xinyiZzz xinyiZzz marked this pull request as draft April 12, 2023 09:31
@xinyiZzz xinyiZzz marked this pull request as ready for review April 13, 2023 13:36
@xinyiZzz xinyiZzz force-pushed the 202304011_fix_mem_exceed_cancel branch from 4c7c951 to 02f61d8 Compare April 13, 2023 13:36
@github-actions github-actions bot added the area/sql/function Issues or PRs related to the SQL functions label Apr 13, 2023
@xinyiZzz xinyiZzz changed the title [enhancement](memory) Optimize memory limit exceeded behavior [enhancement](memory) Refactor memory limit exceeded behavior Apr 13, 2023
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

@xinyiZzz
Copy link
Contributor Author

run buildall

Copy link
Contributor

@yiguolei yiguolei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yiguolei yiguolei merged commit c704351 into apache:master Apr 14, 2023
gnehil pushed a commit to gnehil/doris that referenced this pull request Apr 21, 2023
…#18590)

No check mem tracker limit and no cancel task in mem hook, only in Allocator. This helps in clearer analysis of memory issues and reduces performance loss.
PODArray/hash table/arena memory allocation will use Allocator.

Optimize mem limit exceeded log printing

Optimize compilation time
Reminiscent pushed a commit to Reminiscent/doris that referenced this pull request May 15, 2023
…#18590)

No check mem tracker limit and no cancel task in mem hook, only in Allocator. This helps in clearer analysis of memory issues and reduces performance loss.
PODArray/hash table/arena memory allocation will use Allocator.

Optimize mem limit exceeded log printing

Optimize compilation time
lide-reed pushed a commit that referenced this pull request May 28, 2025
### What problem does this PR solve?

In branch-2.0, we already refractor these codes in pr:
#18590

Deadlock stack:
1、While load, we alloc MemTracker and need lock TrackerGroup.group_lock
```
   NodeChannel::NodeChannel
     _node_channel_tracker = std::make_shared<MemTracker>
       MemTracker::bind_parent
         std::lock_guard<std::mutex> l(mem_tracker_pool[_parent_group_num].group_lock);
         _tracker_group_it = mem_tracker_pool[_parent_group_num].trackers.insert(
                mem_tracker_pool[_parent_group_num].trackers.end(), this);
```

2、but while we try to call std::list::insert, we need alloc
std::_List_node,here new_hook (in file tcmalloc_hook.h) is triggered,
then we lock the same TrackerGroup.group_lock, make it deadlock
```
new_hook
doris::ThreadMemTrackerMgr::consume
  doris::ThreadMemTrackerMgr::flush_untracked_mem<true, true>
    doris::ThreadMemTrackerMgr::exceeded
      doris::MemTrackerLimiter::print_log_usage
         doris::MemTracker::make_group_snapshot
           std::lock_guard<std::mutex> l(mem_tracker_pool[group_num].group_lock);
```

Full stack info:
```
(gdb) bt
#0  0x00007f219772454d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f219771fe9b in _L_lock_883 () from /lib64/libpthread.so.0
#2  0x00007f219771fd68 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000563ca7a8a824 in __gthread_mutex_lock (__mutex=0x563cb18cbc58)
    at /var/local/ldb-toolchain/include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:749
#4  std::mutex::lock (this=0x563cb18cbc58) at /var/local/ldb-toolchain/include/c++/11/bits/std_mutex.h:100
#5  std::lock_guard<std::mutex>::lock_guard (__m=..., this=<synthetic pointer>) at /var/local/ldb-toolchain/include/c++/11/bits/std_mutex.h:229
#6  doris::MemTracker::make_group_snapshot (snapshots=0x7f1f7fe06ea0, group_num=<optimized out>, parent_label=...)
    at /data/TCHouse-D-1.2/be/src/runtime/memory/mem_tracker.cpp:115
#7  0x0000563ca7a7eeb4 in doris::MemTrackerLimiter::print_log_usage (this=0x5640ab7956c0, msg=...)
    at /data/TCHouse-D-1.2/be/src/runtime/memory/mem_tracker_limiter.cpp:198
#8  0x0000563ca7a8d1e4 in doris::ThreadMemTrackerMgr::exceeded (this=this@entry=0x563ce6a2c820, size=1048584)
    at /data/TCHouse-D-1.2/be/src/runtime/memory/thread_mem_tracker_mgr.cpp:59
#9  0x0000563ca78cfeb4 in doris::ThreadMemTrackerMgr::flush_untracked_mem<true, true> (this=0x563ce6a2c820)
    at /data/TCHouse-D-1.2/be/src/runtime/memory/thread_mem_tracker_mgr.h:223
#10 doris::ThreadMemTrackerMgr::consume (size=<optimized out>, this=0x563ce6a2c820)
    at /data/TCHouse-D-1.2/be/src/runtime/memory/thread_mem_tracker_mgr.h:188
#11 doris::ThreadMemTrackerMgr::consume (size=<optimized out>, this=0x563ce6a2c820)
    at /data/TCHouse-D-1.2/be/src/runtime/memory/thread_mem_tracker_mgr.h:178
#12 new_hook (ptr=<optimized out>, size=24) at /data/TCHouse-D-1.2/be/src/runtime/memory/tcmalloc_hook.h:39
#13 0x0000563caf3dfa78 in MallocHook::InvokeNewHookSlow (p=p@entry=0x56422d713b40, s=s@entry=24) at src/malloc_hook.cc:498
#14 0x0000563caf55f2c1 in MallocHook::InvokeNewHook (s=24, p=0x56422d713b40) at src/malloc_hook-inl.h:127
#15 tcmalloc::do_allocate_full<tcmalloc::cpp_throw_oom> (size=size@entry=24) at src/tcmalloc.cc:1805
#16 tcmalloc::allocate_full_cpp_throw_oom (size=size@entry=24) at src/tcmalloc.cc:1815
#17 0x0000563caf55f429 in tcmalloc::dispatch_allocate_full<tcmalloc::cpp_throw_oom> (size=24) at src/tcmalloc.cc:1822
#18 0x0000563ca7a8ab9a in __gnu_cxx::new_allocator<std::_List_node<doris::MemTracker*> >::allocate (__n=1, this=0x563cb18cbc40)
    at /var/local/ldb-toolchain/include/c++/11/ext/new_allocator.h:103
#19 std::allocator_traits<std::allocator<std::_List_node<doris::MemTracker*> > >::allocate (__n=1, __a=...)
    at /var/local/ldb-toolchain/include/c++/11/bits/alloc_traits.h:460
#20 std::__cxx11::_List_base<doris::MemTracker*, std::allocator<doris::MemTracker*> >::_M_get_node (this=0x563cb18cbc40)
    at /var/local/ldb-toolchain/include/c++/11/bits/stl_list.h:442
#21 std::__cxx11::list<doris::MemTracker*, std::allocator<doris::MemTracker*> >::_M_create_node<doris::MemTracker*> (this=0x563cb18cbc40)
    at /var/local/ldb-toolchain/include/c++/11/bits/stl_list.h:634
#22 std::__cxx11::list<doris::MemTracker*, std::allocator<doris::MemTracker*> >::emplace<doris::MemTracker*> (__position=..., this=0x563cb18cbc40)
    at /var/local/ldb-toolchain/include/c++/11/bits/list.tcc:92
#23 std::__cxx11::list<doris::MemTracker*, std::allocator<doris::MemTracker*> >::insert (__x=<optimized out>, __position=..., this=0x563cb18cbc40)
    at /var/local/ldb-toolchain/include/c++/11/bits/stl_list.h:1309
#24 doris::MemTracker::bind_parent (this=0x563eff46ad10, parent=<optimized out>) at /data/TCHouse-D-1.2/be/src/runtime/memory/mem_tracker.cpp:79
#25 0x0000563ca7a8b608 in doris::MemTracker::MemTracker (this=this@entry=0x563eff46ad10, label=..., parent=parent@entry=0x0)
    at /data/TCHouse-D-1.2/be/src/runtime/memory/mem_tracker.cpp:66
#26 0x0000563ca7967467 in __gnu_cxx::new_allocator<doris::MemTracker>::construct<doris::MemTracker, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (__p=0x563eff46ad10, this=<optimized out>) at /var/local/ldb-toolchain/include/c++/11/ext/new_allocator.h:154
#27 std::allocator_traits<std::allocator<doris::MemTracker> >::construct<doris::MemTracker, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (__p=0x563eff46ad10, __a=...) at /var/local/ldb-toolchain/include/c++/11/bits/alloc_traits.h:512
#28 std::_Sp_counted_ptr_inplace<doris::MemTracker, std::allocator<doris::MemTracker>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (__a=..., this=0x563eff46ad00)
    at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:519
#29 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<doris::MemTracker, std::allocator<doris::MemTracker>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (__a=..., __p=<optimized out>, this=<optimized out>)
    at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:650
#30 std::__shared_ptr<doris::MemTracker, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<doris::MemTracker>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (__tag=..., this=<optimized out>)
    at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:1337
#31 std::shared_ptr<doris::MemTracker>::shared_ptr<std::allocator<doris::MemTracker>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (__tag=..., this=<optimized out>) at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr.h:409
#32 std::allocate_shared<doris::MemTracker, std::allocator<doris::MemTracker>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > (__a=...) at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr.h:861
#33 std::make_shared<doris::MemTracker, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > ()
    at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr.h:877
#34 doris::stream_load::NodeChannel::NodeChannel (this=this@entry=0x563d0ca7e110, parent=<optimized out>, 
    index_channel=index_channel@entry=0x563d4b43f200, node_id=<optimized out>) at /data/TCHouse-D-1.2/be/src/exec/tablet_sink.cpp:49
#35 0x0000563cac74a82d in doris::stream_load::VNodeChannel::VNodeChannel (this=this@entry=0x563d0ca7e110, parent=<optimized out>, 
    index_channel=index_channel@entry=0x563d4b43f200, node_id=<optimized out>) at /data/TCHouse-D-1.2/be/src/vec/sink/vtablet_sink.cpp:37
#36 0x0000563ca7967eaa in __gnu_cxx::new_allocator<doris::stream_load::VNodeChannel>::construct<doris::stream_load::VNodeChannel, doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&> (__p=0x563d0ca7e110, this=<optimized out>)
    at /var/local/ldb-toolchain/include/c++/11/ext/new_allocator.h:154
#37 std::allocator_traits<std::allocator<doris::stream_load::VNodeChannel> >::construct<doris::stream_load::VNodeChannel, doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&> (__p=0x563d0ca7e110, __a=...) at /var/local/ldb-toolchain/include/c++/11/bits/alloc_traits.h:512
#38 std::_Sp_counted_ptr_inplace<doris::stream_load::VNodeChannel, std::allocator<doris::stream_load::VNodeChannel>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&> (__a=..., this=0x563d0ca7e100)
    at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:519
#39 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<doris::stream_load::VNodeChannel, std::allocator<doris::stream_load::VNodeChannel>, doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&> (__a=..., __p=<optimized out>, this=<optimized out>)
    at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:650
#40 std::__shared_ptr<doris::stream_load::VNodeChannel, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<doris::stream_load::VNodeChannel>, doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&> (__tag=..., this=<optimized out>)
    at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:1337
#41 std::shared_ptr<doris::stream_load::VNodeChannel>::shared_ptr<std::allocator<doris::stream_load::VNodeChannel>, doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&> (__tag=..., this=<optimized out>) at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr.h:409
#42 std::allocate_shared<doris::stream_load::VNodeChannel, std::allocator<doris::stream_load::VNodeChannel>, doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&> (__a=...) at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr.h:861
#43 std::make_shared<doris::stream_load::VNodeChannel, doris::stream_load::OlapTableSink*&, doris::stream_load::IndexChannel*, long&> ()
    at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr.h:877
#44 doris::stream_load::IndexChannel::init (this=this@entry=0x563d4b43f200, state=state@entry=0x563e027a3500, tablets=...)
    at /data/TCHouse-D-1.2/be/src/exec/tablet_sink.cpp:705
#45 0x0000563ca79698d8 in doris::stream_load::OlapTableSink::prepare (this=this@entry=0x5644b9efe880, state=state@entry=0x563e027a3500)
    at /var/local/ldb-toolchain/include/c++/11/bits/shared_ptr_base.h:1290
#46 0x0000563cac74db65 in doris::stream_load::VOlapTableSink::prepare (this=0x5644b9efe880, state=0x563e027a3500)
    at /data/TCHouse-D-1.2/be/src/vec/sink/vtablet_sink.cpp:450
#47 0x0000563ca7946c21 in doris::PlanFragmentExecutor::prepare (this=this@entry=0x563f55efd280, request=..., fragments_ctx=<optimized out>)
    at /var/local/ldb-toolchain/include/c++/11/bits/unique_ptr.h:173
#48 0x0000563ca791f17c in doris::FragmentExecState::prepare (this=this@entry=0x563f55efd200, params=...)
--Type <RET> for more, q to quit, c to continue without paging--
    at /data/TCHouse-D-1.2/be/src/runtime/fragment_mgr.cpp:238
#49 0x0000563ca7926629 in doris::FragmentMgr::exec_plan_fragment(doris::TExecPlanFragmentParams const&, std::function<void (doris::PlanFragmentExecutor*)>) (this=this@entry=0x563cb6e93400, params=..., cb=...) at /data/TCHouse-D-1.2/be/src/runtime/fragment_mgr.cpp:720
#50 0x0000563ca7928dab in doris::FragmentMgr::exec_plan_fragment (this=0x563cb6e93400, params=...)
    at /data/TCHouse-D-1.2/be/src/runtime/fragment_mgr.cpp:564
#51 0x0000563ca7aaf507 in doris::PInternalServiceImpl::_exec_plan_fragment_impl (this=this@entry=0x563cb577b880, ser_request=..., 
    version=<optimized out>, compact=<optimized out>) at /data/TCHouse-D-1.2/be/src/service/internal_service.cpp:480
#52 0x0000563ca7aaf703 in doris::PInternalServiceImpl::_exec_plan_fragment_in_pthread (this=0x563cb577b880, controller=<optimized out>, 
    request=0x563d6c5bb9b0, response=0x564203b92ce0, done=0x563d885a60c0) at /data/TCHouse-D-1.2/be/src/service/internal_service.cpp:254
#53 0x0000563ca78d6dbd in std::function<void ()>::operator()() const (this=0x7f1f7fe090c8)
    at /var/local/ldb-toolchain/include/c++/11/bits/std_function.h:560
#54 doris::PriorityThreadPool::work_thread (this=0x563cb577ba38, thread_id=<optimized out>)
    at /data/TCHouse-D-1.2/be/src/util/priority_thread_pool.hpp:145
#55 0x0000563caf526b00 in std::execute_native_thread_routine (__p=0x563cbfc81d10) at ../../../../../libstdc++-v3/src/c++11/thread.cc:82
#56 0x00007f219771dea5 in start_thread () from /lib64/libpthread.so.0
#57 0x00007f2197a309fd in clone () from /lib64/libc.so.6
```

### Release note

None

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
- [ ] Yes. <!-- Add document PR link here. eg:
apache/doris-website#1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/sql/function Issues or PRs related to the SQL functions area/vectorization

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants