[Bug] tpcds sf1000 core dump #8815
Comments
I want to reproduce this, but I can't get it running. Could you tell me which TPC-DS query this is?
It should be q77.
It is a mistake to cast ColumnNullable* to ColumnVectorHelper*. UBSAN reports the following message: runtime error: member call on address 0x00002af70680 which does not point to an object of type 'doris::vectorized::ColumnVectorHelper'. Without UBSAN, the BE core dumps when running TPC-DS q77 on the 1 TB dataset. Fix apache#8815.
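The general shape of this bug can be illustrated with a minimal, self-contained sketch. The classes below (`IColumn`, `ColumnInt64`, `ColumnNullable`, `get_nested_column`) and the helpers `sum_int64` and `make_nullable` are hypothetical, heavily simplified stand-ins loosely modeled on vectorized column naming; they are not the real Doris types, and this is not the actual patch. The point is the pattern: a column that may be wrapped in a nullable column is unwrapped, and its concrete type checked, before being treated as a vector column, instead of being cast directly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for vectorized column classes.
struct IColumn {
    virtual ~IColumn() = default;
    virtual std::size_t size() const = 0;
};

struct ColumnInt64 : IColumn {
    std::vector<int64_t> data;
    std::size_t size() const override { return data.size(); }
};

// A nullable column wraps a nested data column plus a null map (1 = NULL row).
struct ColumnNullable : IColumn {
    std::unique_ptr<IColumn> nested;
    std::vector<uint8_t> null_map;
    std::size_t size() const override { return null_map.size(); }
    IColumn& get_nested_column() { return *nested; }
};

// BUGGY pattern, for contrast only:
//   auto& ints = static_cast<ColumnInt64&>(col);
// compiles even when `col` is really a ColumnNullable; using `ints` is then
// undefined behavior, which UBSAN (-fsanitize=vptr) reports as
// "... does not point to an object of type ...".

// Safer pattern: unwrap a nullable column first, then verify the concrete type.
int64_t sum_int64(IColumn& col) {
    const std::vector<uint8_t>* null_map = nullptr;
    IColumn* data_col = &col;
    if (auto* nullable = dynamic_cast<ColumnNullable*>(&col)) {
        null_map = &nullable->null_map;
        data_col = &nullable->get_nested_column();
    }
    auto* ints = dynamic_cast<ColumnInt64*>(data_col);
    if (ints == nullptr) {
        throw std::invalid_argument("sum_int64: column is not Int64");
    }
    assert(null_map == nullptr || null_map->size() == ints->data.size());

    int64_t total = 0;
    for (std::size_t i = 0; i < ints->data.size(); ++i) {
        if (null_map != nullptr && (*null_map)[i] != 0) continue;  // skip NULL rows
        total += ints->data[i];
    }
    return total;
}

// Helper for the sketch: wraps `values` in a nullable column.
ColumnNullable make_nullable(std::vector<int64_t> values, std::vector<uint8_t> nulls) {
    ColumnNullable col;
    auto inner = std::make_unique<ColumnInt64>();
    inner->data = std::move(values);
    col.nested = std::move(inner);
    col.null_map = std::move(nulls);
    return col;
}
```

With a checked `dynamic_cast`, a wrong column type surfaces as an exception instead of silent memory corruption that crashes somewhere unrelated (as in the tcmalloc frames of the trace below). Code on a hot path might instead check the type once and then use `static_cast`.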
…san enabled (#8831)
When I built the Doris BE with UBSAN and vectorization enabled, the BE core dumped in doris::DecimalV2Value::operator long(). It crashed because an SSE instruction accessed a non-aligned address: with UBSAN enabled, the compiler generates different assembly code, including SSE instructions. A sender serializes tuples into a contiguous memory area, and a receiver just copies it, so each tuple offset should be aligned to 16 bytes. For compatibility, this should be controlled by a config option. BTW: with tools like UBSAN, ASAN, and TSAN we can find bugs more easily, e.g. #8815, which would have been difficult to find without UBSAN. We should use modern tools to be more productive.
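A minimal sketch of the alignment idea, assuming a simple sender that appends tuples back to back into one byte buffer. `TupleBuffer`, `append`, `align_tuples`, `kTupleAlignment`, and `load_int128` are hypothetical names for illustration; `align_tuples` only stands in for the compatibility config mentioned above and is not a real Doris option. Aligning offsets only helps if the receiver's copy of the buffer also starts at a 16-byte-aligned address; reading through `memcpy` is shown as an alternative that is safe at any address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Round `offset` up to the next multiple of `alignment` (a power of two).
constexpr std::size_t align_up(std::size_t offset, std::size_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}

constexpr std::size_t kTupleAlignment = 16;  // SSE registers are 16 bytes wide

// Hypothetical sender-side buffer: tuples are serialized into one contiguous
// byte array, and `offsets` records where each tuple starts.
struct TupleBuffer {
    std::vector<uint8_t> bytes;
    std::vector<std::size_t> offsets;

    // `align_tuples` stands in for the compatibility config: when false, the
    // old packed layout is kept.
    void append(const void* tuple, std::size_t len, bool align_tuples) {
        std::size_t start = bytes.size();
        if (align_tuples) start = align_up(start, kTupleAlignment);
        assert(!align_tuples || start % kTupleAlignment == 0);
        bytes.resize(start + len);  // any padding gap is zero-filled
        if (len > 0) std::memcpy(bytes.data() + start, tuple, len);
        offsets.push_back(start);
    }
};

// Receiver-side read of a 16-byte value (e.g. a 16-byte decimal).
// memcpy is safe at any address; dereferencing a misaligned __int128* is
// undefined behavior and may fault if the compiler emits an aligned SSE load.
inline __int128 load_int128(const uint8_t* p) {
    __int128 v;
    std::memcpy(&v, p, sizeof v);
    return v;
}
```

Padding costs at most 15 bytes per tuple. Because it changes the serialized layout, sender and receiver must agree on it, which is presumably why the change is gated behind a config.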
Version
dev-1.0.1
What's Wrong?
```
#0 tcmalloc::SLL_Next (t=0x61000000163) at ./src/linked_list.h:45
#1 tcmalloc::SLL_TryPop (list=0x9b9d440, rv=) at ./src/linked_list.h:69
#2 tcmalloc::ThreadCache::FreeList::TryPop (this=0x9b9d440, rv=) at ./src/thread_cache.h:220
#3 tcmalloc::ThreadCache::Allocate (this=0x9b9d3c0, size=48, cl=, oom_handler=) at ./src/thread_cache.h:379
#4 malloc_fast_path<&tcmalloc::cpp_throw_oom> (size=) at src/tcmalloc.cc:1898
#5 tc_new (size=) at src/tcmalloc.cc:2019
#6 0x0000000001dccf81 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char const*> (
    this=0x7fddcbd738c0, __beg=0x2a3cc080 "doris::vectorized::IColumn::clone_empty() const", __end=)
    at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.tcc:219
#7 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct_aux<char const*> (this=0x7fddcbd738c0,
    __beg=0x2a3cc080 "doris::vectorized::IColumn::clone_empty() const", __end=)
    at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:247
#8 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char const*> (this=0x7fddcbd738c0,
    __beg=0x2a3cc080 "doris::vectorized::IColumn::clone_empty() const", __end=)
    at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:266
#9 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string<std::allocator<char> > (this=0x7fddcbd738c0,
    __s=0x2a3cc080 "doris::vectorized::IColumn::clone_empty() const", __a=...)
    at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:527
#10 boost::core::demangle[abi:cxx11](char const*) (name=)
    at /root/regression/incubator-doris/thirdparty/installed/include/boost/core/demangle.hpp:99
#11 0x0000000001dccb06 in boost::stacktrace::detail::to_string_impl_base<boost::stacktrace::detail::to_string_using_backtrace>::operator()[abi:cxx11](void const*) (this=this@entry=0x7fddcbd73950, addr=0x1f76451 <doris::vectorized::IColumn::clone_empty() const+17>)
    at /root/regression/incubator-doris/thirdparty/installed/include/boost/stacktrace/detail/frame_unwind.ipp:41
#12 0x0000000001dcc948 in boost::stacktrace::detail::to_string[abi:cxx11](boost::stacktrace::frame const*, unsigned long) (frames=0x1b031b00, size=18)
    at /root/regression/incubator-doris/thirdparty/installed/include/boost/stacktrace/detail/frame_unwind.ipp:76
#13 0x0000000001dc95a6 in boost::stacktrace::to_string<std::allocator<boost::stacktrace::frame> > (bt=...)
    at /root/regression/incubator-doris/thirdparty/installed/include/boost/stacktrace/stacktrace.hpp:402
#14 boost::stacktrace::operator<< <char, std::char_traits<char>, std::allocator<boost::stacktrace::frame> > (bt=..., os=...)
    at /root/regression/incubator-doris/thirdparty/installed/include/boost/stacktrace/stacktrace.hpp:408
#15 doris::signal::(anonymous namespace)::FailureSignalHandler (signal_number=, signal_info=, ucontext=)
    at /root/regression/incubator-doris/be/src/common/signal_handler.h:420
#16
#17 tcmalloc::SLL_Next (t=0x61000000163) at ./src/linked_list.h:45
#18 tcmalloc::SLL_TryPop (list=0x9b9d440, rv=) at ./src/linked_list.h:69
#19 tcmalloc::ThreadCache::FreeList::TryPop (this=0x9b9d440, rv=) at ./src/thread_cache.h:220
#20 tcmalloc::ThreadCache::Allocate (this=0x9b9d3c0, size=48, cl=, oom_handler=) at ./src/thread_cache.h:379
#21 malloc_fast_path<&tcmalloc::cpp_throw_oom> (size=) at src/tcmalloc.cc:1898
#22 tc_new (size=, size@entry=40) at src/tcmalloc.cc:2019
#23 0x0000000002537e42 in COWHelper<doris::vectorized::ColumnVectorHelper, doris::vectorized::ColumnVector >::create<>() ()
    at /root/regression/incubator-doris/be/src/vec/common/cow.h:413
```
What You Expected?
The BE should not core dump.
How to Reproduce?
```sql
WITH ss AS (
    SELECT s_store_sk, sum(ss_ext_sales_price) sales, sum(ss_net_profit) profit
    FROM store_sales, date_dim, store
    WHERE (ss_sold_date_sk = d_date_sk)
      AND (d_date BETWEEN CAST('2000-08-23' AS DATE) AND (CAST('2000-08-23' AS DATE) + INTERVAL '30' DAY))
      AND (ss_store_sk = s_store_sk)
    GROUP BY s_store_sk
),
sr AS (
    SELECT s_store_sk, sum(sr_return_amt) returns, sum(sr_net_loss) profit_loss
    FROM store_returns, date_dim, store
    WHERE (sr_returned_date_sk = d_date_sk)
      AND (d_date BETWEEN CAST('2000-08-23' AS DATE) AND (CAST('2000-08-23' AS DATE) + INTERVAL '30' DAY))
      AND (sr_store_sk = s_store_sk)
    GROUP BY s_store_sk
),
cs AS (
    SELECT cs_call_center_sk, sum(cs_ext_sales_price) sales, sum(cs_net_profit) profit
    FROM catalog_sales, date_dim
    WHERE (cs_sold_date_sk = d_date_sk)
      AND (d_date BETWEEN CAST('2000-08-23' AS DATE) AND (CAST('2000-08-23' AS DATE) + INTERVAL '30' DAY))
    GROUP BY cs_call_center_sk
),
cr AS (
    SELECT cr_call_center_sk, sum(cr_return_amount) returns, sum(cr_net_loss) profit_loss
    FROM catalog_returns, date_dim
    WHERE (cr_returned_date_sk = d_date_sk)
      AND (d_date BETWEEN CAST('2000-08-23' AS DATE) AND (CAST('2000-08-23' AS DATE) + INTERVAL '30' DAY))
    GROUP BY cr_call_center_sk
),
ws AS (
    SELECT wp_web_page_sk, sum(ws_ext_sales_price) sales, sum(ws_net_profit) profit
    FROM web_sales, date_dim, web_page
    WHERE (ws_sold_date_sk = d_date_sk)
      AND (d_date BETWEEN CAST('2000-08-23' AS DATE) AND (CAST('2000-08-23' AS DATE) + INTERVAL '30' DAY))
      AND (ws_web_page_sk = wp_web_page_sk)
    GROUP BY wp_web_page_sk
),
wr AS (
    SELECT wp_web_page_sk, sum(wr_return_amt) returns, sum(wr_net_loss) profit_loss
    FROM web_returns, date_dim, web_page
    WHERE (wr_returned_date_sk = d_date_sk)
      AND (d_date BETWEEN CAST('2000-08-23' AS DATE) AND (CAST('2000-08-23' AS DATE) + INTERVAL '30' DAY))
      AND (wr_web_page_sk = wp_web_page_sk)
    GROUP BY wp_web_page_sk
)
SELECT channel, id, sum(sales) sales, sum(returns) returns, sum(profit) profit
FROM (
    SELECT 'store channel' channel, ss.s_store_sk id, sales, COALESCE(returns, 0) returns, (profit - COALESCE(profit_loss, 0)) profit
    FROM ss LEFT JOIN sr ON (ss.s_store_sk = sr.s_store_sk)
    UNION ALL
    SELECT 'catalog channel' channel, cs_call_center_sk id, sales, returns, (profit - profit_loss) profit
    FROM cs, cr
    UNION ALL
    SELECT 'web channel' channel, ws.wp_web_page_sk id, sales, COALESCE(returns, 0) returns, (profit - COALESCE(profit_loss, 0)) profit
    FROM ws LEFT JOIN wr ON (ws.wp_web_page_sk = wr.wp_web_page_sk)
) x
GROUP BY ROLLUP (channel, id)
ORDER BY channel ASC, id ASC, sales ASC
LIMIT 100
```
Anything Else?
No response