fix `Vector{UInt8}` writing #419
If the user wants a byte string, they should use
CI is waiting for #424.
```julia
    Meta.largUtf8Start(b)
    return Meta.LargeUtf8, Meta.largUtf8End(b), nothing
end
elseif eltype(A) == UInt8 && ArrowTypes.isstringtype(T)
```
The only additional type cathed here is `Base.CodeUnits`. Ideally, we'd want the first branch (above) used for long strings and this one for short strings, but that's not a well-defined concept, so for now we just say: "use `CodeUnits` if you want to output a byte string in Arrow."
What does "cathed" mean?
Sorry, "caught" (I probably tried to type "catched", which isn't a word), as in caught by the condition.
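Under the "use `CodeUnits` if you want a byte string" convention from the review comment, a caller holding raw `Vector{UInt8}` values would wrap them before writing. A minimal plain-Julia sketch (no Arrow.jl calls; the variable names `raw` and `as_codeunits` are illustrative, and whether the writer then emits Binary depends on the behavior discussed in this PR):

```julia
# Illustrative column of raw bytes: each element is a plain Vector{UInt8}.
raw = [UInt8[0x31, 0x32], UInt8[], UInt8[0x33]]

# Re-wrap each element as Base.CodeUnits{UInt8, String}. We copy first because
# String(v) is allowed to take ownership of (and empty) the vector it is given.
as_codeunits = map(v -> codeunits(String(copy(v))), raw)

# The bytes are unchanged; only the element type differs.
println(eltype(raw))           # plain byte vectors
println(eltype(as_codeunits))  # code units of a String
println(as_codeunits == [b"12", b"", b"3"])  # same values as the b"..." literals
```

Going through `String` is just one way to obtain `CodeUnits`; `b"..."` literals already produce that type directly.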
@quinnj 👀 gentle bump?
@ericphanson or @baumgold, would you be willing/able to review this?
Looks like some of the tests are failing. Would you mind investigating and fixing? Thanks.
Some tests are failing because this PR changes both code bases; see the monorepo test I added in #424: https://github.com/apache/arrow-julia/actions/runs/4685294460/jobs/8302259625?pr=419
I see. Probably we should split this up into 2 smaller PRs: one for ArrowTypes and one for Arrow. Assuming we make a new release of ArrowTypes, then in the follow-on Arrow PR we can bump the compat of ArrowTypes to the newly released version.
Making a standalone PR that adds a convenience function which doesn't get used and has no direct test makes zero sense to me. All the needed changes are in Arrow.jl; the "normal" tests fail on the convenience function only because of how the tests are set up (which brings us back to the question: isn't the whole point of a monorepo that, when we have 2 packages, we don't have to make rapid releases to fix tests for the other package?)
I’m not sure of all the reasons for using this setup in this case, but that definitely isn’t always a goal. See, e.g., https://discourse.julialang.org/t/how-beacon-packages-julia-code-in-a-monorepo/90822.
So what do people think: should I make a separate PR to add the convenience function?
If this is what it takes to keep the ball moving, I would suggest doing so, but I am not a maintainer, just an observer interested in seeing this package continue to improve.
Just to be clear, in that case we will be doing:
@Moelf, do you mind if I make an alternative PR to fix the original issue here?
Go ahead.
Fixes #411. Alternative to #419. This PR should be compatible with or without the ArrowTypes changes. I think it's fine to do compat things in Arrow like this as long as they don't get out of hand and we can eventually remove them as we bump required ArrowTypes versions and such. The PR stops treating `Vector{UInt8}` as the Arrow Binary type, which is meant for "binary strings". Julia has a pretty good match for that in `Base.CodeUnits`, so instead we use that to write Binary, and `Vector{UInt8}` is treated as a regular List of the primitive UInt8 type.
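The rule in this description amounts to a mapping from Julia element type to Arrow type. Below is a minimal sketch of that mapping using a hypothetical `arrow_kind` helper whose string labels stand in for Arrow types. It is not how Arrow.jl actually dispatches (that goes through ArrowTypes and the writer code), only an illustration of which element type lands in which bucket.

```julia
# Hypothetical helper, NOT an Arrow.jl or ArrowTypes API: returns a label for
# the Arrow type an element type would be written as under the rule above.
arrow_kind(::Type{<:AbstractString}) = "Utf8"
# Base.CodeUnits{UInt8} is a subtype of AbstractVector{UInt8}, so this method
# is more specific than the one below and wins for b"..." values.
arrow_kind(::Type{<:Base.CodeUnits{UInt8}}) = "Binary"
arrow_kind(::Type{<:AbstractVector{UInt8}}) = "List<UInt8>"

for col in (["12", "", "3"], [b"12", b"", b"3"], [UInt8[0x31, 0x32], UInt8[], UInt8[0x33]])
    println(eltype(col), " => ", arrow_kind(eltype(col)))
end
```

The design point is that the two byte columns hold identical bytes, so only the element type can distinguish "binary string" from "list of bytes".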
Ok, PR up: #439. Sorry to have been so MIA lately; I've been tied up in some other heavy projects, and it's been too hard to context-switch back here. Most of that (webstack-related) work has wrapped up, so I'm hoping to have more time to help review and fix stuff here. This PR was definitely in the right direction, @Moelf; thanks for the contribution. It was a great starting point. A couple of specific points:
fix #411
Setup:
Before:
Note:
```julia
data = (; x = [b"12", b"", b"3"])
```
Writing this from Julia produces exactly the same output, which is IMO why this was wrong.

After: