-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[fix](join)Consider mark join when computing right_col_idx #50720
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
TPC-H: Total hot run time: 34116 ms |
TPC-DS: Total hot run time: 184729 ms |
ClickBench: Total hot run time: 28.95 s |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
run buildall |
TPC-H: Total hot run time: 33737 ms |
TPC-DS: Total hot run time: 185397 ms |
ClickBench: Total hot run time: 29.08 s |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
| template <int JoinOpType> | ||
| Status ProcessHashTableProbe<JoinOpType>::do_right_half_mark_join_conjuncts( | ||
| vectorized::Block* output_block) { | ||
| DCHECK(JoinOpType == TJoinOp::RIGHT_SEMI_JOIN || JoinOpType == TJoinOp::RIGHT_ANTI_JOIN) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return error status to replace dcheck
| DCHECK_EQ(filter.size(), other_conjunct_filter.size()); | ||
| const auto* other_filter_data = other_conjunct_filter.data(); | ||
| for (size_t i = 0; i != filter.size(); ++i) { | ||
| // null & any(true or false) => null => false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we can abstract part which similar with do_mark_join_conjuncts to reuse code ?
| _probe_side_output_timer(parent->_probe_side_output_timer), | ||
| _finish_probe_phase_timer(parent->_finish_probe_phase_timer), | ||
| _right_col_idx((_parent_operator->_is_right_semi_anti && !_have_other_join_conjunct) | ||
| _right_col_idx((_parent_operator->_is_right_semi_anti && !_have_other_join_conjunct && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better directly get right_col_idx from parent operator, and use left+right=intermediate to do double check
BiteTheDDDDt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
|
run buildall |
ClickBench: Total hot run time: 28.89 s |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
) ### What problem does this PR solve? If there is a mark join condition, then even in a right semi join, the columns from the left table involved in the mark join condition will still appear in the intermediate tuple. This PR also completes the missing handling logic for right semi joins with mark join condition. ``` ==2126730==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x503004ee02e0 at pc 0x55f4a7d3f354 bp 0x7f6b6b587e70 sp 0x7f6b6b587e68 READ of size 8 at 0x503004ee02e0 thread T1296 (brpc_light) #0 0x55f4a7d3f353 in std::__shared_ptr<doris::vectorized::IDataType const, (__gnu_cxx::_Lock_policy)2>::__shared_ptr(std::__shared_ptr<doris::vectorized::IDataType const, (__gnu_cxx::_Lock_policy)2> const&) /root/ldb_toolchain_robin/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/shared_ptr_base.h:1522:7 apache#1 0x55f4a7d3f01e in std::shared_ptr<doris::vectorized::IDataType const>::shared_ptr(std::shared_ptr<doris::vectorized::IDataType const> const&) /root/ldb_toolchain_robin/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/shared_ptr.h:204:7 apache#2 0x55f4f8aa231c in doris::pipeline::HashJoinProbeOperatorX::prepare(doris::RuntimeState*) /root/doris/be/src/pipeline/exec/hashjoin_probe_operator.cpp:577:33 apache#3 0x55f4fb1e4a28 in doris::pipeline::Pipeline::prepare(doris::RuntimeState*) /root/doris/be/src/pipeline/pipeline.cpp:89:5 apache#4 0x55f4fb089d3f in doris::pipeline::PipelineFragmentContext::prepare(doris::TPipelineFragmentParams const&, doris::ThreadPool*) /root/doris/be/src/pipeline/pipeline_fragment_context.cpp:352:9 apache#5 0x55f4ad615a5c in doris::FragmentMgr::exec_plan_fragment(doris::TPipelineFragmentParams const&, doris::QuerySource, std::function<void (doris::RuntimeState*, doris::Status*)> const&, doris::TPipelineFragmentParamsList const&) /root/doris/be/src/runtime/fragment_mgr.cpp:855:9 apache#6 0x55f4ad613ca6 in doris::FragmentMgr::exec_plan_fragment(doris::TPipelineFragmentParams const&, doris::QuerySource, doris::TPipelineFragmentParamsList const&) /root/doris/be/src/runtime/fragment_mgr.cpp:634:16 apache#7 0x55f4ae2e867c in doris::PInternalService::_exec_plan_fragment_impl(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, doris::PFragmentRequestVersion, bool, std::function<void (doris::RuntimeState*, doris::Status*)> const&) /root/doris/be/src/service/internal_service.cpp:613:17 apache#8 0x55f4ae2e4b67 in doris::PInternalService::_exec_plan_fragment_in_pthread(google::protobuf::RpcController*, doris::PExecPlanFragmentRequest const*, doris::PExecPlanFragmentResult*, google::protobuf::Closure*) /root/doris/be/src/service/internal_service.cpp:343:14 apache#9 0x55f4ae31d4e1 in doris::PInternalService::exec_plan_fragment_prepare(google::protobuf::RpcController*, doris::PExecPlanFragmentRequest const*, doris::PExecPlanFragmentResult*, google::protobuf::Closure*)::$_0::operator()() const /root/doris/be/src/service/internal_service.cpp:367:9 apache#10 0x55f4ae31d33e in void std::__invoke_impl<void, doris::PInternalService::exec_plan_fragment_prepare(google::protobuf::RpcController*, doris::PExecPlanFragmentRequest const*, doris::PExecPlanFragmentResult*, google::protobuf::Closure*)::$_0&>(std::__invoke_other, doris::PInternalService::exec_plan_fragment_prepare(google::protobuf::RpcController*, doris::PExecPlanFragmentRequest const*, doris::PExecPlanFragmentResult*, google::protobuf::Closure*)::$_0&) /root/ldb_toolchain_robin/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/invoke.h:61:14 apache#11 0x55f4ae31d27e in std::enable_if<is_invocable_r_v<void, doris::PInternalService::exec_plan_fragment_prepare(google::protobuf::RpcController*, doris::PExecPlanFragmentRequest const*, doris::PExecPlanFragmentResult*, google::protobuf::Closure*)::$_0&>, void>::type std::__invoke_r<void, doris::PInternalService::exec_plan_fragment_prepare(google::protobuf::RpcController*, doris::PExecPlanFragmentRequest const*, doris::PExecPlanFragmentResult*, google::protobuf::Closure*)::$_0&>(doris::PInternalService::exec_plan_fragment_prepare(google::protobuf::RpcController*, doris::PExecPlanFragmentRequest const*, doris::PExecPlanFragmentResult*, google::protobuf::Closure*)::$_0&) /root/ldb_toolchain_robin/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/invoke.h:111:2 apache#12 0x55f4ae31cf55 in std::_Function_handler<void (), doris::PInternalService::exec_plan_fragment_prepare(google::protobuf::RpcController*, doris::PExecPlanFragmentRequest const*, doris::PExecPlanFragmentResult*, google::protobuf::Closure*)::$_0>::_M_invoke(std::_Any_data const&) /root/ldb_toolchain_robin/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/std_function.h:290:9 apache#13 0x55f4a74175af in std::function<void ()>::operator()() const /root/ldb_toolchain_robin/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/std_function.h:591:9 apache#14 0x55f4ae3c21c4 in doris::WorkThreadPool<false>::work_thread(int) /root/doris/be/src/util/work_thread_pool.hpp:158:17 apache#15 0x55f4ae3c4cc8 in void std::__invoke_impl<void, void (doris::WorkThreadPool<false>::* const&)(int), doris::WorkThreadPool<false>*&, int&>(std::__invoke_memfun_deref, void (doris::WorkThreadPool<false>::* const&)(int), doris::WorkThreadPool<false>*&, int&) /root/ldb_toolchain_robin/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/invoke.h:74:14 apache#16 0x55f4ae3c4a92 in std::__invoke_result<void (doris::WorkThreadPool<false>::* const&)(int), doris::WorkThreadPool<false>*&, int&>::type std::__invoke<void (doris::WorkThreadPool<false>::* const&)(int), doris::WorkThreadPool<false>*&, int&>(void (doris::WorkThreadPool<false>::* const&)(int), doris::WorkThreadPool<false>*&, int&) /root/ldb_toolchain_robin/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/invoke.h:96:14 apache#17 0x55f4ae3c49f8 in decltype(std::__invoke((*this)._M_pmf, std::forward<doris::WorkThreadPool<false>*&>(fp), std::forward<int&>(fp))) std::_Mem_fn_base<void (doris::WorkThreadPool<false>::*)(int), true>::operator()<doris::WorkThreadPool<false>*&, int&>(doris::WorkThreadPool<false>*&, int&) const /root/ldb_toolchain_robin/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/functional:170:11 apache#18 0x55f4ae3c4942 in void std::__invoke_impl<void, std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)>&, doris::WorkThreadPool<false>*&, int&>(std::__invoke_other, std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)>&, doris::WorkThreadPool<false>*&, int&) /root/ldb_toolchain_robin/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/invoke.h:61:14 apache#19 0x55f4ae3c4752 in std::enable_if<is_invocable_r_v<void, std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)>&, doris::WorkThreadPool<false>*&, int&>, void>::type std::__invoke_r<void, std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)>&, doris::WorkThreadPool<false>*&, int&>(std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)>&, doris::WorkThreadPool<false>*&, int&) /root/ldb_toolchain_robin/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/invoke.h:111:2 apache#20 0x55f4ae3c4664 in void std::_Bind_result<void, std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)> (doris::WorkThreadPool<false>*, int)>::__call<void, 0ul, 1ul>(std::tuple<>&&, std::_Index_tuple<0ul, 1ul>) /root/ldb_toolchain_robin/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/functional:654:11 apache#21 0x55f4ae3c4395 in void std::_Bind_result<void, std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)> (doris::WorkThreadPool<false>*, int)>::operator()<>() /root/ldb_toolchain_robin/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/functional:713:17 apache#22 0x55f4ae3c428e in void std::__invoke_impl<void, std::_Bind_result<void, std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)> (doris::WorkThreadPool<false>*, int)>>(std::__invoke_other, std::_Bind_result<void, std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)> (doris::WorkThreadPool<false>*, int)>&&) /root/ldb_toolchain_robin/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/invoke.h:61:14 apache#23 0x55f4ae3c41ce in std::__invoke_result<std::_Bind_result<void, std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)> (doris::WorkThreadPool<false>*, int)>>::type std::__invoke<std::_Bind_result<void, std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)> (doris::WorkThreadPool<false>*, int)>>(std::_Bind_result<void, std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)> (doris::WorkThreadPool<false>*, int)>&&) /root/ldb_toolchain_robin/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/invoke.h:96:14 apache#24 0x55f4ae3c417b in void std::thread::_Invoker<std::tuple<std::_Bind_result<void, std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)> (doris::WorkThreadPool<false>*, int)>>>::_M_invoke<0ul>(std::_Index_tuple<0ul>) /root/ldb_toolchain_robin/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/std_thread.h:292:13 apache#25 0x55f4ae3c40f6 in std::thread::_Invoker<std::tuple<std::_Bind_result<void, std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)> (doris::WorkThreadPool<false>*, int)>>>::operator()() /root/ldb_toolchain_robin/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/std_thread.h:299:11 apache#26 0x55f4ae3c3f34 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::_Bind_result<void, std::_Mem_fn<void (doris::WorkThreadPool<false>::*)(int)> (doris::WorkThreadPool<false>*, int)>>>>::_M_run() /root/ldb_toolchain_robin/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/std_thread.h:244:13 apache#27 0x55f4fea05d2e in execute_native_thread_routine pthread_atfork.c apache#28 0x55f4a7162e0a in asan_thread_start(void*) crtstuff.c apache#29 0x7f7149c421c9 in start_thread (/lib64/libpthread.so.0+0x81c9) (BuildId: 7c4add5c7a885e6ff4ce17867d6a2286e4420eec) apache#30 0x7f714a6318d2 in clone (/lib64/libc.so.6+0x398d2) (BuildId: 4ee3325955e3b55b6805f33959b7cb77745ad625) ```
What problem does this PR solve?
If there is a mark join condition, then even in a right semi join, the columns from the left table involved in the mark join condition will still appear in the intermediate tuple.
This PR also completes the missing handling logic for right semi joins with mark join condition.
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)