Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-41114: [C++] Add is_validity_defined_by_bitmap() predicate #41115

Merged
merged 5 commits into from
Apr 23, 2024

Conversation

felipecrv
Copy link
Contributor

@felipecrv felipecrv commented Apr 10, 2024

Rationale for this change

To make it easier to find bugs that are very likely to be silent in the codebas because users rarely use unions and REE types.

What changes are included in this PR?

Adding the type predicate and two usages in builder_nested.h.

Are these changes tested?

By the compilation process, since they are both static_asserts.

…alid()

The constexpr is_validity_defined_by_bitmap() predicate guarantees
statically that it's enough to look only at the bitmap for the types
these templates are used with.
@felipecrv felipecrv requested a review from pitrou April 10, 2024 02:37
Copy link

⚠️ GitHub issue #41114 has been automatically assigned in GitHub to PR creator.

@pitrou
Copy link
Member

pitrou commented Apr 10, 2024

Is this conceptually different from internal::HasValidityBitmap?

@felipecrv
Copy link
Contributor Author

Is this conceptually different from internal::HasValidityBitmap?

The definition is the same, but the naming is unclear and ambiguous:

internal::HasValidityBitmap(type) = true iff array instances of this type might have bitmap buffers
Array(Data|Span)?::HasValidityBitmap() = true iff this specific array instance has a bitmap buffer

This reads as if the array.HasValidityBitmap() check is redundant:

static_assert(internal::HasValidityBitmap(ArrayType::type_id));
const uint8_t* validity = array.HasValidityBitmap() ? array.buffers[0].data : NULLPTR;

vs

static_assert(internal::is_validity_defined_by_bitmap(ArrayType::type_id));
const uint8_t* validity = array.HasValidityBitmap() ? array.buffers[0].data : NULLPTR;

@mapleFU
Copy link
Member

mapleFU commented Apr 10, 2024

+1 for this, so it replace MayHaveLogicalNulls() and validity buffer check to just reject the type like "NA", and just check the validity buffer when it might contains physical null?

@felipecrv
Copy link
Contributor Author

+1 for this, so it replace MayHaveLogicalNulls() and validity buffer check to just reject the type like "NA", and just check the validity buffer when it might contains physical null?

In that case, it's better to call MayHaveLogicalNulls() directly. But if you're writing kernels to a subset of types and want to have a local guarantee that accessing the bitmap is enough, you use this check, access the bitmap directly, and never even call IsValid, MayHaveLogicalNulls etc.

Copy link
Member

@mapleFU mapleFU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for me

@pitrou
Copy link
Member

pitrou commented Apr 10, 2024

The definition is the same, but the naming is unclear and ambiguous:

Regardless of the name, we shouldn't have two functions doing the same thing. So at least this should be renaming the existing function (but I find your proposition rather wordy).

@felipecrv
Copy link
Contributor Author

The definition is the same, but the naming is unclear and ambiguous:

Regardless of the name, we shouldn't have two functions doing the same thing.

They could evolve in different ways in the future with the addition of new types. It's a coincidence that the intention of both matches today's implementations.

So at least this should be renaming the existing function (but I find your proposition rather wordy).

I also think it's a long name, but I couldn't come up with a shorter one, do you have a name suggestion?

@pitrou
Copy link
Member

pitrou commented Apr 10, 2024

They could evolve in different ways in the future with the addition of new types.

How come? Nobody has ever suggested a different validity representation than the validity bitmap.

I also think it's a long name, but I couldn't come up with a shorter one, do you have a name suggestion?

I would suggest can_have_validity_bitmap or may_have_validity_bitmap.

Copy link
Contributor Author

@felipecrv felipecrv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to merge this based on the +1 from @mapleFU and the fact that I acted on all the feedback posted by @pitrou.

@felipecrv felipecrv merged commit a78760f into apache:main Apr 23, 2024
36 checks passed
@felipecrv felipecrv removed the awaiting committer review Awaiting committer review label Apr 23, 2024
@github-actions github-actions bot added the awaiting changes Awaiting changes label Apr 23, 2024
@felipecrv felipecrv deleted the by_bitmap branch April 23, 2024 18:04
Copy link

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit a78760f.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 31 possible false positives for unstable benchmarks that are known to sometimes produce them.

zanmato1984 pushed a commit to zanmato1984/arrow that referenced this pull request Apr 24, 2024
…pache#41115)

### Rationale for this change

To make it easier to find bugs that are very likely to be silent in the codebas because users rarely use unions and REE types.

### What changes are included in this PR?

Adding the type predicate and two usages in `builder_nested.h`.

### Are these changes tested?

By the compilation process, since they are both `static_asserts`.
* GitHub Issue: apache#41114

Authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Signed-off-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
tolleybot pushed a commit to tmct/arrow that referenced this pull request May 2, 2024
…pache#41115)

### Rationale for this change

To make it easier to find bugs that are very likely to be silent in the codebas because users rarely use unions and REE types.

### What changes are included in this PR?

Adding the type predicate and two usages in `builder_nested.h`.

### Are these changes tested?

By the compilation process, since they are both `static_asserts`.
* GitHub Issue: apache#41114

Authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Signed-off-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
vibhatha pushed a commit to vibhatha/arrow that referenced this pull request May 25, 2024
…pache#41115)

### Rationale for this change

To make it easier to find bugs that are very likely to be silent in the codebas because users rarely use unions and REE types.

### What changes are included in this PR?

Adding the type predicate and two usages in `builder_nested.h`.

### Are these changes tested?

By the compilation process, since they are both `static_asserts`.
* GitHub Issue: apache#41114

Authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Signed-off-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants