GH-35141: [C++] Versions of IsNull/IsValid that don't branch on type #35149
Please suggest a better name for these functions if you can think of one.
8506111 to 6c75516
template <typename ArrowType>
inline bool IsValidFast(int64_t i) const {
  if constexpr (ArrowType::type_id == Type::NA) {
Do we need to add a DCHECK to check that ArrowType is equal to the real type in the Array?
I was tempted, but it adds too much overhead for this function, which is used in tight loops and is expected to be fully inlined, even considering that it would affect only debug builds.
Maybe we only need to care about release performance? Using an ArrowType supplied from outside the array without checking it is quite dangerous...
Debug perf can make CI run much slower. Imagine changing the code from IsNull to IsNullFast and getting a slower build in return. I'm counting on people defaulting to IsNull (safe) instead of IsNullFast. The latter is for people working on kernels, which usually do many unsafe things and have more carefully written code.
@mapleFU note that all the special branches perform runtime type checks, so this is not completely unchecked.
Okay, though it still feels a bit weird to me. Let's hear what others think.
To be clearer: the bodies of these member functions in the .cc file, where logging.h can be used, contain runtime checks.
} else if constexpr (ArrowType::type_id == Type::SPARSE_UNION) {
return !IsNullSparseUnion(i);
} else if constexpr (ArrowType::type_id == Type::DENSE_UNION) {
return !IsNullDenseUnion(i);
} else if constexpr (ArrowType::type_id == Type::RUN_END_ENCODED) {
return !IsNullRunEndEncoded(i);
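For readers following the thread, the trade-off between a runtime type check and an `if constexpr` dispatch can be sketched roughly as below. This is only an illustration, not Arrow's code: `TypeId`, `NullType`, `Int32Type`, and `MiniArrayData` are hypothetical stand-ins, and returning `false` in the `NA` branch is an assumption about what that branch does.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for Arrow's type ids and type classes (not the real API).
enum class TypeId { NA, INT32 };
struct NullType { static constexpr TypeId type_id = TypeId::NA; };
struct Int32Type { static constexpr TypeId type_id = TypeId::INT32; };

// Simplified stand-in for ArrayData: an optional validity bitmap plus counters.
struct MiniArrayData {
  TypeId type_id;
  std::vector<uint8_t> validity;  // empty means "no validity bitmap"
  int64_t offset;
  int64_t length;
  int64_t null_count;

  // Reads bit i of an LSB-first bitmap.
  static bool GetBit(const std::vector<uint8_t>& bits, int64_t i) {
    return ((bits[static_cast<std::size_t>(i / 8)] >> (i % 8)) & 1) != 0;
  }

  // Runtime version: safe for any array, decides on the type at each call.
  bool IsValid(int64_t i) const {
    if (!validity.empty()) return GetBit(validity, i + offset);
    return null_count != length;
  }

  // Compile-time version: the caller promises that ArrowType matches type_id,
  // so the branch on the type is resolved when the template is instantiated.
  template <typename ArrowType>
  bool IsValidFast(int64_t i) const {
    if constexpr (ArrowType::type_id == TypeId::NA) {
      return false;  // assumption: every slot of a null-typed array is null
    } else {
      if (!validity.empty()) return GetBit(validity, i + offset);
      return null_count != length;
    }
  }
};
```

Nothing in the sketch verifies that the template argument matches `type_id`, which is exactly the concern raised in the review.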
if (buffers[0] != NULLPTR) {
  return bit_util::GetBit(buffers[0]->data(), i + offset);
}
const auto type = this->type->id();
if (type == Type::SPARSE_UNION) {
  return !internal::IsNullSparseUnion(*this, i);
}
if (type == Type::DENSE_UNION) {
} else if (type == Type::DENSE_UNION) {
nit: clang-tidy seems to complain about this changed line.
Not in my editor. Where did you see this?
But this check is not enabled on Arrow's .clang-tidy and I don't think putting an else after a return is always bad.
In this specific function, the use of else is not creating unnecessary indentation levels.
Sounds good. I was overwhelmed by this kind of warning in the internal toolchain from my employer. :)
if (buffers[0] != NULLPTR) {
  return bit_util::GetBit(buffers[0]->data(), i + offset);
}
return null_count.load() != length;
Maybe a dumb question: if the validity bitmap does not exist, shouldn't null_count be either 0 or -1 instead of other values?
NA is a known type that has no validity bitmap and null_count set to length. So in theory, this is very possible.
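The fallback discussed here can be illustrated with a small sketch. `NoBitmapArray` is a hypothetical, simplified stand-in for the two fields involved, not Arrow's `ArrayData`; it only shows how `null_count != length` behaves for the values mentioned in the question (0, -1, and `length`).

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for an array that has no validity bitmap.
struct NoBitmapArray {
  int64_t length;
  std::atomic<int64_t> null_count;

  NoBitmapArray(int64_t len, int64_t nulls) : length(len), null_count(nulls) {}

  // With no bitmap there is no per-slot information, so every slot gets the
  // same answer: all null when null_count == length, all valid otherwise.
  bool IsValid(int64_t /*i*/) const { return null_count.load() != length; }
};
```

With `null_count` equal to 0 or -1 (not yet computed) the expression reports every slot as valid; with `null_count == length`, as for an `NA` array, it reports every slot as null.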
6c75516 to c8d7891
c8d7891 to 6bf55b5
@@ -69,16 +69,37 @@ class ARROW_EXPORT Array {
// a potential inner-branch removal.
if (type_id() == Type::SPARSE_UNION) {
  return !internal::IsNullSparseUnion(*data_, i);
}
Why change it to if/else? I don't think it's necessary here.
clang-tidy has a check for this style ( https://clang.llvm.org/extra/clang-tidy/checks/readability/else-after-return.html ). Although we don't enable that check, I think a plain return without the else is fine.
More compact. In sync with the if constexpr used in the other cases. Arrow's .clang-tidy doesn't complain about this. No extra indentation levels are introduced because of this.
@westonpace what do you think about this change? cc @zeroshade @benibus kernels that call
I'm skeptical about this. Did you find any kernels that call (this is also handled internally by
I'm aware, but these don't exist/work for REE and Union arrays (and potentially new formats in the future) as described in I added a fix that required adding branches to
Yes, but only as fallback for the types that don't have validity defined by a single bitmap. For instance, my fix of the For REEs, I wrote a completely custom counting loop. Since I already have a
REE and Union are two different cases. For REE, it is better to iterate on runs, not logical values (exactly what you did for the "hash_count" kernel).
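To make the "iterate on runs, not logical values" point concrete, here is a minimal sketch. `MiniRunEndEncoded` is a hypothetical simplified layout (no offset, validity stored per run), not Arrow's REE representation, and neither function is taken from the "hash_count" kernel.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplified run-end encoded layout: run_ends[k] is the exclusive
// logical end of run k, and values_valid[k] says whether that run is non-null.
struct MiniRunEndEncoded {
  std::vector<int64_t> run_ends;
  std::vector<bool> values_valid;
};

// Visits each run once: cost proportional to the number of runs.
int64_t CountValidByRuns(const MiniRunEndEncoded& ree) {
  int64_t count = 0;
  int64_t run_start = 0;
  for (std::size_t k = 0; k < ree.run_ends.size(); ++k) {
    if (ree.values_valid[k]) count += ree.run_ends[k] - run_start;
    run_start = ree.run_ends[k];
  }
  return count;
}

// Visits each logical value and searches for its run every time, which is
// what a per-element validity check on an REE array would amount to.
int64_t CountValidByLogicalIndex(const MiniRunEndEncoded& ree) {
  const int64_t length = ree.run_ends.empty() ? 0 : ree.run_ends.back();
  int64_t count = 0;
  for (int64_t i = 0; i < length; ++i) {
    // First run whose exclusive end is greater than i contains index i.
    const auto it = std::upper_bound(ree.run_ends.begin(), ree.run_ends.end(), i);
    const auto k = static_cast<std::size_t>(it - ree.run_ends.begin());
    if (ree.values_valid[k]) ++count;
  }
  return count;
}
```

Both functions return the same count; the run-based loop simply avoids the per-value lookup.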
I'm not sure we care about compile-time guarantees here, since it's just a performance concern.
The compile-time guarantee being that the generated code has a certain shape affecting performance and binary size. But as I see it, my case is weak. I will rebase, remove the first commit, add some use-cases and close the PR that I might re-open later.
6bf55b5 to e80c59b
Rationale for this change
See #35141.
What changes are included in this PR?
ARROW_EXPORT
IsNull/IsValid member functions
IsNullFast<ArrowType> and IsValidFast<ArrowType>
Are these changes tested?
Yes.
Are there any user-facing changes?
New member functions added to ArrayData/Array/ArraySpan.