-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-35581: [C++] Store offsets in scalars #36018
Conversation
cpp/CMakePresets.json
Outdated
@@ -11,6 +11,7 @@ | |||
"hidden": true, | |||
"generator": "Ninja", | |||
"cacheVariables": { | |||
"CMAKE_EXPORT_COMPILE_COMMANDS": "ON", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to merge this and generate the huge file in CI builds, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's less than a megabyte on my machine; I think it's not expensive and it's worthwhile to enable clang tools by default
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the generated file is not huge, but I also think these presets should enable just what's necessary for development. They serve as documentation and starting points for users. It's easy to enable additional options on top of an existing preset, AFAIK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But which tools need this in CI? I use this file on my machine for LSP on my editor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The presets aren't just for CI builds, as @pitrou says it's a starting point for users. I would say this is an easy and useful default to provide for users- myself included. I'll move this to a different issue, though, since it's not directly pertinent to offset storage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM on the principle. |
There are CI failures though. |
This PR should probably wait until #35036 is merged |
077168a
to
52bfbab
Compare
It's fine for this to be merged first, will update my PR later afterwards then (since I am using the scratch space in the other PR, fixing that first here also make sense) |
64ee6c4
to
c2f1e40
Compare
@pitrou PTAL |
@github-actions crossbow submit -g cpp |
Revision: 4ea421a Submitted crossbow builds: ursacomputing/crossbow @ actions-8dcf3ee3c6 |
case Type::RUN_END_ENCODED: | ||
return 0; | ||
return 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I merged a bit quickly, why does RUN_END_ENCODED
need one buffer here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many places in the codebase assume that buffers.size() >= 1
, even if buffers[0] == nullptr
. When I added test cases which exercised REE scalars those places segfaulted. I thought that requiring buffers.size() >= 1
for REE (as we do for union) was the most expeditious fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@felipecrv What do you think here? Should we require a one-element buffers
vector for REE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RUN_END_ENCODED
doesn't need any buffer but NA
also does not and we return 1
here. 🤔
It's a "once you start lying you can't stop lying" kind of problem for GetNumBuffers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, this was the only place in the codebase which didn't already give REE at least one buffer; the constructors, the builder, ... already did so
ARROW_CHECK_LE(off, length) << "Slice offset (" << off | ||
<< ") greater than array length (" << length << ")"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this has to always perform int to string conversion. Even when the check passes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't. The operator<<
is only invoked when the assertion fails. Though operator precedence makes me slightly uneasy:
#define ARROW_CHECK_OR_LOG(condition, level) \
ARROW_PREDICT_TRUE(condition) \
? ARROW_IGNORE_EXPR(0) \
: ::arrow::util::Voidify() & ARROW_LOG(level) << " Check failed: " #condition " "
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could actually add a test for that, IMHO.
After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 96ac514. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them. |
### Rationale for this change Bring back the problematic test case of random `case_when` on union(bool, string) type. This case used to fail. However #36018 already addressed the issue. More information about how it used to fail, please refer to #15192 (comment). ### What changes are included in this PR? Bring back the test code. ### Are these changes tested? Yes, the change is the test. ### Are there any user-facing changes? No. * Closes: #15192 Authored-by: zanmato <zanmato1984@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
…pache#39308) ### Rationale for this change Bring back the problematic test case of random `case_when` on union(bool, string) type. This case used to fail. However apache#36018 already addressed the issue. More information about how it used to fail, please refer to apache#15192 (comment). ### What changes are included in this PR? Bring back the test code. ### Are these changes tested? Yes, the change is the test. ### Are there any user-facing changes? No. * Closes: apache#15192 Authored-by: zanmato <zanmato1984@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
…pache#39308) ### Rationale for this change Bring back the problematic test case of random `case_when` on union(bool, string) type. This case used to fail. However apache#36018 already addressed the issue. More information about how it used to fail, please refer to apache#15192 (comment). ### What changes are included in this PR? Bring back the test code. ### Are these changes tested? Yes, the change is the test. ### Are there any user-facing changes? No. * Closes: apache#15192 Authored-by: zanmato <zanmato1984@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
ArraySpan contained scratch space inside itself for storing offsets when viewing a scalar as a length=1 array. This could lead to dangling pointers in copies of the ArraySpan since copies' pointers will always refer to the original's scratch space, which may have been destroyed.
This patch moves that scratch space into Scalar itself, which is more conformant to the contract of a view since the scratch space will be valid as long as the Scalar is alive, regardless of how ArraySpans are copied.