-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[feature](load) new insgestion load #45937
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
TPC-H: Total hot run time: 32378 ms |
|
TeamCity be ut coverage result: |
TPC-DS: Total hot run time: 189506 ms |
ClickBench: Total hot run time: 31.49 s |
|
run buildall |
TPC-H: Total hot run time: 32528 ms |
TPC-DS: Total hot run time: 190941 ms |
|
TeamCity be ut coverage result: |
ClickBench: Total hot run time: 31.55 s |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
dataroaring
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
### What problem does this PR solve? Problem Summary: Ingestion Load is used to load pre-processed data into doris. Preprocessing refers to writing the result data to an external storage system after the data is processed according to the partitioning, bucketing and aggregation methods defined by the doris table. The preprocessing is completed by the external system, and then the BE reads the data and converts it into segment files and saves it. The basic flow is as follows:  ### Release note [feature](load) new insgestion load
### What problem does this PR solve? Problem Summary: Ingestion Load is used to load pre-processed data into doris. Preprocessing refers to writing the result data to an external storage system after the data is processed according to the partitioning, bucketing and aggregation methods defined by the doris table. The preprocessing is completed by the external system, and then the BE reads the data and converts it into segment files and saves it. The basic flow is as follows:  ### Release note [feature](load) new insgestion load (cherry picked from commit 6580f6b)
### What problem does this PR solve? Problem Summary: Ingestion Load is used to load pre-processed data into doris. Preprocessing refers to writing the result data to an external storage system after the data is processed according to the partitioning, bucketing and aggregation methods defined by the doris table. The preprocessing is completed by the external system, and then the BE reads the data and converts it into segment files and saves it. The basic flow is as follows:  ### Release note [feature](load) new insgestion load (cherry picked from commit 6580f6b)
Problem Summary: Ingestion Load is used to load pre-processed data into doris. Preprocessing refers to writing the result data to an external storage system after the data is processed according to the partitioning, bucketing and aggregation methods defined by the doris table. The preprocessing is completed by the external system, and then the BE reads the data and converts it into segment files and saves it. The basic flow is as follows:  [feature](load) new insgestion load
…55500) ### What problem does this PR solve? Related PR: #45937 Problem Summary: Fix the error case on ingestion load and the core in parquet reader. ==8898==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x62f0020603fc at pc 0x55f634e64ded bp 0x7fba0d03c410 sp 0x7fba0d03bbd8 READ of size 4 at 0x62f0020603fc thread T768 (PUSH-9699) #0 0x55f634e64dec in __asan_memcpy (/mnt/hdd01/ci/doris-deploy-branch-3.1-local/be/lib/doris_be+0x39a24dec) (BuildId: 9b04e7f7d3075dac) #1 0x55f634eca93f in std::char_traits::copy(char*, char const*, unsigned long) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/char_traits.h:409:33 #2 0x55f634eca93f in std::__cxx11::basic_string, std::allocator>::_S_copy(char*, char const*, unsigned long) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:351:4 #3 0x55f634eca93f in std::__cxx11::basic_string, std::allocator>::_S_copy_chars(char*, char const*, char const*) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:398:9 #4 0x55f634eca93f in void std::__cxx11::basic_string, std::allocator>::_M_construct(char const*, char const*, std::forward_iterator_tag) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.tcc:225:6 #5 0x55f654a4f74d in void std::__cxx11::basic_string, std::allocator>::_M_construct_aux(char const*, char const*, std::__false_type) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:247:11 #6 0x55f654a4f74d in void std::__cxx11::basic_string, std::allocator>::_M_construct(char const*, char const*) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:266:4 #7 0x55f654a4f74d in std::__cxx11::basic_string, std::allocator>::basic_string(char const*, unsigned long, std::allocator const&) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:513:9 #8 0x55f654a4f74d in doris::vectorized::parse_thrift_footer(std::shared_ptr, doris::vectorized::FileMetaData**, unsigned long*, doris::io::IOContext*) /home/zcp/repo_center/doris_branch-3.1/doris/be/src/vec/exec/format/parquet/parquet_thrift_util.h:55:17
…pache#55500) Related PR: apache#45937 Problem Summary: Fix the error case on ingestion load and the core in parquet reader. ==8898==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x62f0020603fc at pc 0x55f634e64ded bp 0x7fba0d03c410 sp 0x7fba0d03bbd8 READ of size 4 at 0x62f0020603fc thread T768 (PUSH-9699) #0 0x55f634e64dec in __asan_memcpy (/mnt/hdd01/ci/doris-deploy-branch-3.1-local/be/lib/doris_be+0x39a24dec) (BuildId: 9b04e7f7d3075dac) apache#1 0x55f634eca93f in std::char_traits::copy(char*, char const*, unsigned long) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/char_traits.h:409:33 apache#2 0x55f634eca93f in std::__cxx11::basic_string, std::allocator>::_S_copy(char*, char const*, unsigned long) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:351:4 apache#3 0x55f634eca93f in std::__cxx11::basic_string, std::allocator>::_S_copy_chars(char*, char const*, char const*) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:398:9 apache#4 0x55f634eca93f in void std::__cxx11::basic_string, std::allocator>::_M_construct(char const*, char const*, std::forward_iterator_tag) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.tcc:225:6 apache#5 0x55f654a4f74d in void std::__cxx11::basic_string, std::allocator>::_M_construct_aux(char const*, char const*, std::__false_type) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:247:11 apache#6 0x55f654a4f74d in void std::__cxx11::basic_string, std::allocator>::_M_construct(char const*, char const*) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:266:4 apache#7 0x55f654a4f74d in std::__cxx11::basic_string, std::allocator>::basic_string(char const*, unsigned long, std::allocator const&) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:513:9 apache#8 0x55f654a4f74d in doris::vectorized::parse_thrift_footer(std::shared_ptr, doris::vectorized::FileMetaData**, unsigned long*, doris::io::IOContext*) /home/zcp/repo_center/doris_branch-3.1/doris/be/src/vec/exec/format/parquet/parquet_thrift_util.h:55:17
Related PR: #45937 branch-3.1: #55500 Problem Summary: Fix the error case on ingestion load and the core in parquet reader. ``` ==8898==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x62f0020603fc at pc 0x55f634e64ded bp 0x7fba0d03c410 sp 0x7fba0d03bbd8 READ of size 4 at 0x62f0020603fc thread T768 (PUSH-9699) #0 0x55f634e64dec in __asan_memcpy (/mnt/hdd01/ci/doris-deploy-branch-3.1-local/be/lib/doris_be+0x39a24dec) (BuildId: 9b04e7f7d3075dac) #1 0x55f634eca93f in std::char_traits::copy(char*, char const*, unsigned long) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/char_traits.h:409:33 #2 0x55f634eca93f in std::__cxx11::basic_string, std::allocator>::_S_copy(char*, char const*, unsigned long) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:351:4 #3 0x55f634eca93f in std::__cxx11::basic_string, std::allocator>::_S_copy_chars(char*, char const*, char const*) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:398:9 #4 0x55f634eca93f in void std::__cxx11::basic_string, std::allocator>::_M_construct(char const*, char const*, std::forward_iterator_tag) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.tcc:225:6 #5 0x55f654a4f74d in void std::__cxx11::basic_string, std::allocator>::_M_construct_aux(char const*, char const*, std::__false_type) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:247:11 #6 0x55f654a4f74d in void std::__cxx11::basic_string, std::allocator>::_M_construct(char const*, char const*) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:266:4 #7 0x55f654a4f74d in std::__cxx11::basic_string, std::allocator>::basic_string(char const*, unsigned long, std::allocator const&) /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:513:9 #8 0x55f654a4f74d in doris::vectorized::parse_thrift_footer(std::shared_ptr, doris::vectorized::FileMetaData**, unsigned long*, doris::io::IOContext*) /home/zcp/repo_center/doris_branch-3.1/doris/be/src/vec/exec/format/parquet/parquet_thrift_util.h:55:17 ```
What problem does this PR solve?
Issue Number: close #47932
Related PR: #xxx
Problem Summary:
Ingestion Load is used to load pre-processed data into doris.
Preprocessing refers to writing the result data to an external storage system after the data is processed according to the partitioning, bucketing and aggregation methods defined by the doris table.
The preprocessing is completed by the external system, and then the BE reads the data and converts it into segment files and saves it.
The basic flow is as follows:

Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)