Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-34315: [C++] Correct is_null kernel for Union and RunEndEncoded logical nulls #35036

Open
wants to merge 17 commits into
base: main
Choose a base branch
from
Open
Changes from 1 commit
Commits
Show all changes
17 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 24 additions & 2 deletions cpp/src/arrow/array/data.cc
Original file line number Diff line number Diff line change
Expand Up @@ -236,6 +236,19 @@ void SetOffsetsForScalar(ArraySpan* span, offset_type* buffer, int64_t value_siz
span->buffers[buffer_index].size = 2 * sizeof(offset_type);
}

template <typename ArrowType>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: To be consistent with REE code elsewhere: RunEndType.

void SetRunForScalar(ArraySpan* span, std::shared_ptr<DataType> type,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FillRunEndsArrayFromScalar? s/type/run_end_type

uint64_t* scratch_space) {
// Create a lenght-1 array with the value 1 for a REE scalar
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

length typo

Copy link
Contributor

@felipecrv felipecrv May 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the comment can be removed after the other changes that clarify the code are made.

using run_type = typename ArrowType::c_type;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run_end_type (run_type is ambiguous -- could be the type of the value in the run).

auto buffer = reinterpret_cast<run_type*>(scratch_space);
buffer[0] = static_cast<run_type>(1);
auto data_buffer =
std::make_shared<Buffer>(reinterpret_cast<uint8_t*>(buffer), sizeof(run_type));
auto data = ArrayData::Make(type, 1, {nullptr, data_buffer}, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::move(type) and std::move(data_buffer)

span->child_data[0].SetMembers(*data);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better if the caller passes this->child_data[0] to this instead of the span. Making it obvious at the callsite how both child_data are filled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could derive the scratch_space parameter from that as well: making you use the child_data[0].scratch_space instead of the parent's scratch_space.

}

int GetNumBuffers(const DataType& type) {
switch (type.id()) {
case Type::NA:
Expand Down Expand Up @@ -430,8 +443,17 @@ void ArraySpan::FillFromScalar(const Scalar& value) {
const auto& scalar = checked_cast<const RunEndEncodedScalar&>(value);
this->child_data.resize(2);
auto& run_end_type = scalar.run_end_type();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: style

Suggested change
auto& run_end_type = scalar.run_end_type();
const auto& run_end_type = scalar.run_end_type();

auto run_end = MakeScalar(run_end_type, 1).ValueOrDie();
this->child_data[0].FillFromScalar(*run_end);
switch (run_end_type->id()) {
case Type::INT16:
SetRunForScalar<Int16Type>(this, run_end_type, this->scratch_space);
break;
case Type::INT32:
SetRunForScalar<Int32Type>(this, run_end_type, this->scratch_space);
break;
default:
DCHECK_EQ(run_end_type->id(), Type::INT64);
SetRunForScalar<Int64Type>(this, run_end_type, this->scratch_space);
Copy link
Contributor

@felipecrv felipecrv May 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're getting that stack trace you posted above from this new implementation?

I think this will be less risky if you use the this->child_data[0].scratch_space instead of this->scratch_space for the run-ends array buffer. I think that's the contract: an ArraySpan is free to use the scratch_space it directly owns.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're getting that stack trace you posted above from this new implementation?

No, that was with the previous implementation. This last commit was to fix that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But thanks for the comments! Will clean-up

}
this->child_data[1].FillFromScalar(*scalar.value);
} else if (type_id == Type::EXTENSION) {
// Pass through storage
Expand Down