Skip to content

Conversation

@xinyiZzz
Copy link
Contributor

@xinyiZzz xinyiZzz commented Nov 14, 2024

What problem does this PR solve?

Problem Summary:

The representation of NULL columns in Doris is special, which is DataTypeNull<DataTypeNumber::Uint8>. Uint8 uses arrow::BooleanBuilder when serializing into arrow batch, which does not match the expected arrow::NullBuilder.

Fix:

*** Query id: fd32741526804c1e-bc016473fd8f3aa3 ***
*** is nereids: 1 ***
*** tablet id: 0 ***
*** Aborted at 1731327262 (unix time) try "date -d @1731327262" if you are using GNU date ***
*** Current BE git commitID: 653e315ba5 ***
*** SIGSEGV address not mapped to object (@0x100000024) received by PID 1442863 (TID 1443456 OR 0x7f8b8cdea700) from PID 36; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/common/signal_handler.h:421
 1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in /mnt/disk2/liyifan/doris/jdk-17.0.2/lib/server/libjvm.so
 2# JVM_handle_linux_signal in /mnt/disk2/liyifan/doris/jdk-17.0.2/lib/server/libjvm.so
 3# 0x00007F8CA1F38B50 in /lib64/libc.so.6
 4# 0x000055FC45E5B2D3 in /mnt/disk2/liyifan/doris/doris_2.1/doris/output_run/be/lib/doris_be
 5# arrow::BooleanBuilder::AppendValues(unsigned char const*, long, unsigned char const*) in /mnt/disk2/liyifan/doris/doris_2.1/doris/output_run/be/lib/doris_be
 6# doris::vectorized::DataTypeNumberSerDe<unsigned char>::write_column_to_arrow(doris::vectorized::IColumn const&, doris::vectorized::PODArray<unsigned char, 4096ul, Allocator<false, false, false, DefaultMemoryAllocator>, 15ul, 16ul> const*, arrow::ArrayBuilder*, int, int, cctz::time_zone const&) const at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/vec/data_types/serde/data_type_number_serde.cpp:86
 7# doris::FromBlockConverter::convert(std::shared_ptr<arrow::RecordBatch>*) at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/util/arrow/block_convertor.cpp:390
 8# doris::convert_to_arrow_batch(doris::vectorized::Block const&, std::shared_ptr<arrow::Schema> const&, arrow::MemoryPool*, std::shared_ptr<arrow::RecordBatch>*, cctz::time_zone const&) in /mnt/disk2/liyifan/doris/doris_2.1/doris/output_run/be/lib/doris_be
 9# doris::vectorized::VArrowFlightResultWriter::write(doris::vectorized::Block&) at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/vec/sink/varrow_flight_result_writer.cpp:76
10# doris::vectorized::VResultSink::send(doris::RuntimeState*, doris::vectorized::Block*, bool) at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/vec/sink/vresult_sink.cpp:149
11# doris::PlanFragmentExecutor::open_vectorized_internal() at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/runtime/plan_fragment_executor.cpp:341
12# doris::PlanFragmentExecutor::open() at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/runtime/plan_fragment_executor.cpp:273

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@xinyiZzz
Copy link
Contributor Author

BE UT will be refactored soon.

@xinyiZzz
Copy link
Contributor Author

run buildall

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 37.95% (9888/26056)
Line Coverage: 29.15% (82642/283531)
Region Coverage: 28.27% (42468/150220)
Branch Coverage: 24.85% (21542/86676)
Coverage Report: http://coverage.selectdb-in.cc/coverage/30ce839f005bfd2e0736d6ee7232e8b122a7b7e2_30ce839f005bfd2e0736d6ee7232e8b122a7b7e2/report/index.html

@xinyiZzz xinyiZzz force-pushed the 20241113_fix_arrow_null branch from 30ce839 to 7e4941e Compare November 18, 2024 07:04
@xinyiZzz
Copy link
Contributor Author

run buildall

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 38.01% (9897/26040)
Line Coverage: 29.18% (82749/283592)
Region Coverage: 28.31% (42490/150070)
Branch Coverage: 24.89% (21553/86598)
Coverage Report: http://coverage.selectdb-in.cc/coverage/7e4941e14ca15fa927b330f5401ca4da3595cd85_7e4941e14ca15fa927b330f5401ca4da3595cd85/report/index.html

Copy link
Contributor

@yiguolei yiguolei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 18, 2024
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

Copy link
Contributor

@wangbo wangbo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGBTM

@xinyiZzz xinyiZzz merged commit 1dcd96d into apache:master Nov 19, 2024
github-actions bot pushed a commit that referenced this pull request Nov 19, 2024
…ch (#43929)

### What problem does this PR solve?

Problem Summary:

The representation of NULL columns in Doris is special, which is
`DataTypeNull<DataTypeNumber::Uint8>`. `Uint8` uses
`arrow::BooleanBuilder` when serializing into arrow batch, which does
not match the expected `arrow::NullBuilder`.

Fix:

```
*** Query id: fd32741526804c1e-bc016473fd8f3aa3 ***
*** is nereids: 1 ***
*** tablet id: 0 ***
*** Aborted at 1731327262 (unix time) try "date -d @1731327262" if you are using GNU date ***
*** Current BE git commitID: 653e315 ***
*** SIGSEGV address not mapped to object (@0x100000024) received by PID 1442863 (TID 1443456 OR 0x7f8b8cdea700) from PID 36; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/common/signal_handler.h:421
 1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in /mnt/disk2/liyifan/doris/jdk-17.0.2/lib/server/libjvm.so
 2# JVM_handle_linux_signal in /mnt/disk2/liyifan/doris/jdk-17.0.2/lib/server/libjvm.so
 3# 0x00007F8CA1F38B50 in /lib64/libc.so.6
 4# 0x000055FC45E5B2D3 in /mnt/disk2/liyifan/doris/doris_2.1/doris/output_run/be/lib/doris_be
 5# arrow::BooleanBuilder::AppendValues(unsigned char const*, long, unsigned char const*) in /mnt/disk2/liyifan/doris/doris_2.1/doris/output_run/be/lib/doris_be
 6# doris::vectorized::DataTypeNumberSerDe<unsigned char>::write_column_to_arrow(doris::vectorized::IColumn const&, doris::vectorized::PODArray<unsigned char, 4096ul, Allocator<false, false, false, DefaultMemoryAllocator>, 15ul, 16ul> const*, arrow::ArrayBuilder*, int, int, cctz::time_zone const&) const at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/vec/data_types/serde/data_type_number_serde.cpp:86
 7# doris::FromBlockConverter::convert(std::shared_ptr<arrow::RecordBatch>*) at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/util/arrow/block_convertor.cpp:390
 8# doris::convert_to_arrow_batch(doris::vectorized::Block const&, std::shared_ptr<arrow::Schema> const&, arrow::MemoryPool*, std::shared_ptr<arrow::RecordBatch>*, cctz::time_zone const&) in /mnt/disk2/liyifan/doris/doris_2.1/doris/output_run/be/lib/doris_be
 9# doris::vectorized::VArrowFlightResultWriter::write(doris::vectorized::Block&) at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/vec/sink/varrow_flight_result_writer.cpp:76
10# doris::vectorized::VResultSink::send(doris::RuntimeState*, doris::vectorized::Block*, bool) at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/vec/sink/vresult_sink.cpp:149
11# doris::PlanFragmentExecutor::open_vectorized_internal() at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/runtime/plan_fragment_executor.cpp:341
12# doris::PlanFragmentExecutor::open() at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/runtime/plan_fragment_executor.cpp:273
```
github-actions bot pushed a commit that referenced this pull request Nov 19, 2024
…ch (#43929)

### What problem does this PR solve?

Problem Summary:

The representation of NULL columns in Doris is special, which is
`DataTypeNull<DataTypeNumber::Uint8>`. `Uint8` uses
`arrow::BooleanBuilder` when serializing into arrow batch, which does
not match the expected `arrow::NullBuilder`.

Fix:

```
*** Query id: fd32741526804c1e-bc016473fd8f3aa3 ***
*** is nereids: 1 ***
*** tablet id: 0 ***
*** Aborted at 1731327262 (unix time) try "date -d @1731327262" if you are using GNU date ***
*** Current BE git commitID: 653e315 ***
*** SIGSEGV address not mapped to object (@0x100000024) received by PID 1442863 (TID 1443456 OR 0x7f8b8cdea700) from PID 36; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/common/signal_handler.h:421
 1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in /mnt/disk2/liyifan/doris/jdk-17.0.2/lib/server/libjvm.so
 2# JVM_handle_linux_signal in /mnt/disk2/liyifan/doris/jdk-17.0.2/lib/server/libjvm.so
 3# 0x00007F8CA1F38B50 in /lib64/libc.so.6
 4# 0x000055FC45E5B2D3 in /mnt/disk2/liyifan/doris/doris_2.1/doris/output_run/be/lib/doris_be
 5# arrow::BooleanBuilder::AppendValues(unsigned char const*, long, unsigned char const*) in /mnt/disk2/liyifan/doris/doris_2.1/doris/output_run/be/lib/doris_be
 6# doris::vectorized::DataTypeNumberSerDe<unsigned char>::write_column_to_arrow(doris::vectorized::IColumn const&, doris::vectorized::PODArray<unsigned char, 4096ul, Allocator<false, false, false, DefaultMemoryAllocator>, 15ul, 16ul> const*, arrow::ArrayBuilder*, int, int, cctz::time_zone const&) const at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/vec/data_types/serde/data_type_number_serde.cpp:86
 7# doris::FromBlockConverter::convert(std::shared_ptr<arrow::RecordBatch>*) at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/util/arrow/block_convertor.cpp:390
 8# doris::convert_to_arrow_batch(doris::vectorized::Block const&, std::shared_ptr<arrow::Schema> const&, arrow::MemoryPool*, std::shared_ptr<arrow::RecordBatch>*, cctz::time_zone const&) in /mnt/disk2/liyifan/doris/doris_2.1/doris/output_run/be/lib/doris_be
 9# doris::vectorized::VArrowFlightResultWriter::write(doris::vectorized::Block&) at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/vec/sink/varrow_flight_result_writer.cpp:76
10# doris::vectorized::VResultSink::send(doris::RuntimeState*, doris::vectorized::Block*, bool) at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/vec/sink/vresult_sink.cpp:149
11# doris::PlanFragmentExecutor::open_vectorized_internal() at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/runtime/plan_fragment_executor.cpp:341
12# doris::PlanFragmentExecutor::open() at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/runtime/plan_fragment_executor.cpp:273
```
xinyiZzz added a commit to xinyiZzz/incubator-doris that referenced this pull request Nov 19, 2024
…ch (apache#43929)

### What problem does this PR solve?

Problem Summary:

The representation of NULL columns in Doris is special, which is
`DataTypeNull<DataTypeNumber::Uint8>`. `Uint8` uses
`arrow::BooleanBuilder` when serializing into arrow batch, which does
not match the expected `arrow::NullBuilder`.

Fix:

```
*** Query id: fd32741526804c1e-bc016473fd8f3aa3 ***
*** is nereids: 1 ***
*** tablet id: 0 ***
*** Aborted at 1731327262 (unix time) try "date -d @1731327262" if you are using GNU date ***
*** Current BE git commitID: 653e315 ***
*** SIGSEGV address not mapped to object (@0x100000024) received by PID 1442863 (TID 1443456 OR 0x7f8b8cdea700) from PID 36; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/common/signal_handler.h:421
 1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in /mnt/disk2/liyifan/doris/jdk-17.0.2/lib/server/libjvm.so
 2# JVM_handle_linux_signal in /mnt/disk2/liyifan/doris/jdk-17.0.2/lib/server/libjvm.so
 3# 0x00007F8CA1F38B50 in /lib64/libc.so.6
 4# 0x000055FC45E5B2D3 in /mnt/disk2/liyifan/doris/doris_2.1/doris/output_run/be/lib/doris_be
 5# arrow::BooleanBuilder::AppendValues(unsigned char const*, long, unsigned char const*) in /mnt/disk2/liyifan/doris/doris_2.1/doris/output_run/be/lib/doris_be
 6# doris::vectorized::DataTypeNumberSerDe<unsigned char>::write_column_to_arrow(doris::vectorized::IColumn const&, doris::vectorized::PODArray<unsigned char, 4096ul, Allocator<false, false, false, DefaultMemoryAllocator>, 15ul, 16ul> const*, arrow::ArrayBuilder*, int, int, cctz::time_zone const&) const at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/vec/data_types/serde/data_type_number_serde.cpp:86
 7# doris::FromBlockConverter::convert(std::shared_ptr<arrow::RecordBatch>*) at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/util/arrow/block_convertor.cpp:390
 8# doris::convert_to_arrow_batch(doris::vectorized::Block const&, std::shared_ptr<arrow::Schema> const&, arrow::MemoryPool*, std::shared_ptr<arrow::RecordBatch>*, cctz::time_zone const&) in /mnt/disk2/liyifan/doris/doris_2.1/doris/output_run/be/lib/doris_be
 9# doris::vectorized::VArrowFlightResultWriter::write(doris::vectorized::Block&) at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/vec/sink/varrow_flight_result_writer.cpp:76
10# doris::vectorized::VResultSink::send(doris::RuntimeState*, doris::vectorized::Block*, bool) at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/vec/sink/vresult_sink.cpp:149
11# doris::PlanFragmentExecutor::open_vectorized_internal() at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/runtime/plan_fragment_executor.cpp:341
12# doris::PlanFragmentExecutor::open() at /mnt/disk2/liyifan/doris/doris_2.1/doris/be/src/runtime/plan_fragment_executor.cpp:273
```
yiguolei pushed a commit that referenced this pull request Nov 19, 2024
morningman pushed a commit that referenced this pull request Nov 30, 2024
### What problem does this PR solve?
Problem Summary:

add regression test for this pr: #43929

TODO:
Support to export data of doris NULL type to orc file format.
github-actions bot pushed a commit that referenced this pull request Nov 30, 2024
### What problem does this PR solve?
Problem Summary:

add regression test for this pr: #43929

TODO:
Support to export data of doris NULL type to orc file format.
github-actions bot pushed a commit that referenced this pull request Nov 30, 2024
### What problem does this PR solve?
Problem Summary:

add regression test for this pr: #43929

TODO:
Support to export data of doris NULL type to orc file format.
xinyiZzz added a commit that referenced this pull request Dec 3, 2024
…to arrow batch #43929 (#44231)

Cherry-picked from #43929

Co-authored-by: Xinyi Zou <zouxinyi@selectdb.com>
deardeng pushed a commit to deardeng/incubator-doris that referenced this pull request Jan 6, 2025
* [pick](branch-2.1) pick apache#38215 (apache#43386)

pick apache#38215

---------

Co-authored-by: Zou Xinyi <zouxinyi@selectdb.com>

* pick apache#43929

* pick apache#43960
@yiguolei yiguolei mentioned this pull request Jan 19, 2025
@gavinchou gavinchou mentioned this pull request Feb 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. dev/2.1.8-merged dev/3.0.4-merged reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants