[fix](parquet)fix when hive_parquet_use_column_names=false && read partition tb cause be core. #49966
run buildall
TPC-H: Total hot run time: 34324 ms
TPC-DS: Total hot run time: 193969 ms
ClickBench: Total hot run time: 31.25 s
run buildall
TPC-H: Total hot run time: 34216 ms
TPC-DS: Total hot run time: 185219 ms
ClickBench: Total hot run time: 31.22 s
morningman
left a comment
LGTM
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
…rtition tb cause be core. (#49966)

### What problem does this PR solve?

related pr : #38432

Problem Summary:
When you query a Hive partitioned table in Parquet format with `set hive_parquet_use_column_names = false`, the BE may crash with:

```
*** SIGABRT unknown detail explain (@0x2f59de) received by PID 3103198 (TID 3110278 OR 0x7f51c8e63640) from PID 3103198; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_master/doris/be/src/common/signal_handler.h:421
 1# 0x00007F55DFB45520 in /lib/x86_64-linux-gnu/libc.so.6
 2# pthread_kill at ./nptl/pthread_kill.c:89
 3# raise at ../sysdeps/posix/raise.c:27
 4# abort at ./stdlib/abort.c:81
 5# __gnu_cxx::__verbose_terminate_handler() [clone .cold] at ../../../../libstdc++-v3/libsupc++/vterminate.cc:75
 6# __cxxabiv1::__terminate(void (*)()) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
 7# 0x000055C8BD4E2041 in /mnt/disk1/doris-clusters/doris-master/output/be/lib/doris_be
 8# 0x000055C8BD4E2194 in /mnt/disk1/doris-clusters/doris-master/output/be/lib/doris_be
 9# 0x000055C8BD4E2586 in /mnt/disk1/doris-clusters/doris-master/output/be/lib/doris_be
10# std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.tcc:265
11# doris::vectorized::ParquetReader::get_next_block(doris::vectorized::Block*, unsigned long*, bool*) at /home/zcp/repo_center/doris_master/doris/be/src/vec/exec/format/parquet/vparquet_reader.cpp:586
```

The reason is that an out-of-bounds access occurs when `get_next_block` replaces the column name.
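To illustrate the kind of out-of-bounds access described in the problem summary, here is a minimal sketch of a bounds-checked positional rename. It is not the Doris code and not the actual fix in this PR. It assumes, purely for illustration, that the block holds more columns than the Parquet file does (for example because partition columns are not stored in the file), so matching columns by position can index past the end of the file schema. The helper names `file_name_by_position` and `rename_block_columns`, and the column names in the usage note, are hypothetical.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper, not part of Doris: returns the file-side name for the
// column at position `idx`, or std::nullopt when the file has no column at
// that position (assumed here to be the case for partition columns).
std::optional<std::string> file_name_by_position(
        const std::vector<std::string>& file_columns, std::size_t idx) {
    if (idx >= file_columns.size()) {
        return std::nullopt;  // guard: never index past the file schema
    }
    return file_columns[idx];
}

// Hypothetical helper: replaces each block column name with the file column
// name at the same position, leaving columns without a positional match
// untouched instead of reading out of bounds.
std::vector<std::string> rename_block_columns(
        const std::vector<std::string>& block_columns,
        const std::vector<std::string>& file_columns) {
    std::vector<std::string> result = block_columns;
    for (std::size_t i = 0; i < result.size(); ++i) {
        if (auto name = file_name_by_position(file_columns, i)) {
            result[i] = *name;
        }
    }
    return result;
}
```

With a block of `{"id", "name", "dt"}` and a file schema of `{"_col0", "_col1"}`, the first two names are replaced and `dt` is left as-is, because position 2 does not exist in the file schema.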
What problem does this PR solve?
related pr : #38432
Problem Summary:
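When you query a Hive partitioned table in Parquet format with `set hive_parquet_use_column_names = false`, the BE may abort with SIGABRT; the stack trace ends in `doris::vectorized::ParquetReader::get_next_block` (`vparquet_reader.cpp:586`).
The reason is that an out-of-bounds access occurs when `get_next_block` replaces the column name.
Release note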
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)