respect offset in utf8 and list casts #335
Codecov Report
@@ Coverage Diff @@
## master #335 +/- ##
==========================================
+ Coverage 82.52% 82.53% +0.01%
==========================================
Files 162 162
Lines 44021 44037 +16
==========================================
+ Hits 36328 36348 +20
+ Misses 7693 7689 -4
I can't say I am super familiar with this code, but the tests look good to me, and so did the code I reviewed.
Thanks a lot @ritchie46. I can't fully evaluate this because I am uncertain about the spec here: do we require the offset buffer to start at 0, or is the requirement that the last value minus the first value must be equal to the length of the values buffer? @nevi-me, do you know what the spec says here? I am asking because IPC does not support the
Yes, I simply respected the API at this point. We throw an error if offsets do not start at zero, but we may be stricter than the spec is.
This might be a bit tricky. AFAIK the spec doesn't prescribe what happens when a compute kernel interacts with sliced data.
I would interpret "generally" as 'most implementations will expect to start at 0'. In which case, I would prefer a solution that carries the offset of the array and starts the offset buffer at 0. I think carrying the offset of the input into the output is, in this case, the most performant and compatible solution. Otherwise we'd have to recalculate the offsets to make sure that they start from 0.
[0] https://arrow.apache.org/docs/format/Columnar.html#variable-size-binary-layout
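To make the two options concrete, here is a minimal sketch in plain Rust. It does not use the arrow crate: `SlicedUtf8`, `value`, and `rebased_offsets` are hypothetical names made up for illustration, and the layout is only assumed to mirror the variable-size binary layout linked above. `value` reads through the carried array offset while leaving the offsets buffer untouched (it still starts at 0); `rebased_offsets` shows the extra pass that would be needed if the sliced output had to own an offsets buffer starting at 0.

```rust
/// Toy stand-in for a sliced UTF-8 array (hypothetical, not the arrow API).
/// `offsets` is the full, unsliced offsets buffer and starts at 0;
/// `offset` and `len` describe the logical slice in elements.
struct SlicedUtf8<'a> {
    values: &'a [u8],
    offsets: &'a [i32],
    offset: usize,
    len: usize,
}

impl<'a> SlicedUtf8<'a> {
    /// Option 1: carry the array offset and index through it.
    /// No offsets are copied or rewritten.
    fn value(&self, i: usize) -> &'a str {
        assert!(i < self.len, "index out of bounds");
        let start = self.offsets[self.offset + i] as usize;
        let end = self.offsets[self.offset + i + 1] as usize;
        std::str::from_utf8(&self.values[start..end]).expect("valid UTF-8")
    }

    /// Option 2: materialize a new offsets buffer for the slice that
    /// starts at 0. This costs one pass over `len + 1` offsets.
    fn rebased_offsets(&self) -> Vec<i32> {
        let window = &self.offsets[self.offset..=self.offset + self.len];
        let first = window[0];
        window.iter().map(|o| o - first).collect()
    }
}
```

With values `b"helloworldfoo"`, offsets `[0, 5, 10, 13]`, and a slice of `offset = 1`, `len = 2`, option 1 reads `"world"` and `"foo"` directly, while option 2 produces the offsets `[0, 5, 8]` and would additionally require the values to be read starting at byte 5.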
LGTM based on my other comment
Which issue does this PR close?
This is a fix for #334.
Offsets were not taken into account in casts: