Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++] A way to assert that arrays of a certain type have the validity of the values defined by the validity bitmap #41114

Closed
felipecrv opened this issue Apr 10, 2024 · 1 comment

Comments

@felipecrv
Copy link
Contributor

Describe the enhancement requested

Certain types don't have the validity defined by the bitmap buffer's presence and bit values:

  • NA
  • SPARSE_UNION
  • DENSE_UNION
  • RUN_END_ENCODED

This means that recently inline bool ArrayData::IsValid() had to grow:

bool IsValid(int64_t i) const {
if (buffers[0] != NULLPTR) {
return bit_util::GetBit(buffers[0]->data(), i + offset);
}
const auto type = this->type->id();
if (type == Type::SPARSE_UNION) {
return !internal::IsNullSparseUnion(*this, i);
}
if (type == Type::DENSE_UNION) {
return !internal::IsNullDenseUnion(*this, i);
}
if (type == Type::RUN_END_ENCODED) {
return !internal::IsNullRunEndEncoded(*this, i);
}
return null_count.load() != length;
}

Many type-specific loops in Arrow don't use IsValid and check the bitmap directly instead (good to reduce the branching in hot loops or to do lower-level operations with the bitmaps).

To be future-proof, these loops would have to migrate to using IsValid, but since doing that would be overkill for most cases (unions and REEs are rare), I propose the creation of a function close to the special validity checking functions that checks if a type allows for validity checks to be performed solely through the bitmap:

constexpr bool is_validity_defined_by_bitmap(Type::type id) {
  switch (id) {
    case Type::NA:
    case Type::SPARSE_UNION:
    case Type::DENSE_UNION:
    case Type::RUN_END_ENCODED:
      return false;
    default:
      return true;
  }
}

This means that if a template or function that only looks at the validity bitmap starts being used for these special types we can get static_assert, assert and DCHECK violations.

Component(s)

C++

felipecrv added a commit that referenced this issue Apr 23, 2024
### Rationale for this change

To make it easier to find bugs that are very likely to be silent in the codebas because users rarely use unions and REE types.

### What changes are included in this PR?

Adding the type predicate and two usages in `builder_nested.h`.

### Are these changes tested?

By the compilation process, since they are both `static_asserts`.
* GitHub Issue: #41114

Authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Signed-off-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
@felipecrv
Copy link
Contributor Author

Issue resolved by pull request 41115
#41115

@felipecrv felipecrv added this to the 17.0.0 milestone Apr 23, 2024
zanmato1984 pushed a commit to zanmato1984/arrow that referenced this issue Apr 24, 2024
…pache#41115)

### Rationale for this change

To make it easier to find bugs that are very likely to be silent in the codebas because users rarely use unions and REE types.

### What changes are included in this PR?

Adding the type predicate and two usages in `builder_nested.h`.

### Are these changes tested?

By the compilation process, since they are both `static_asserts`.
* GitHub Issue: apache#41114

Authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Signed-off-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
tolleybot pushed a commit to tmct/arrow that referenced this issue May 2, 2024
…pache#41115)

### Rationale for this change

To make it easier to find bugs that are very likely to be silent in the codebas because users rarely use unions and REE types.

### What changes are included in this PR?

Adding the type predicate and two usages in `builder_nested.h`.

### Are these changes tested?

By the compilation process, since they are both `static_asserts`.
* GitHub Issue: apache#41114

Authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Signed-off-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
vibhatha pushed a commit to vibhatha/arrow that referenced this issue May 25, 2024
…pache#41115)

### Rationale for this change

To make it easier to find bugs that are very likely to be silent in the codebas because users rarely use unions and REE types.

### What changes are included in this PR?

Adding the type predicate and two usages in `builder_nested.h`.

### Are these changes tested?

By the compilation process, since they are both `static_asserts`.
* GitHub Issue: apache#41114

Authored-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Signed-off-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant