-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
change UUID <-> Arrow mapping to (de)serialize to/from 16-byte FixedSizeBinary #103
Conversation
Failing tests? |
Codecov Report
@@ Coverage Diff @@
## main #103 +/- ##
==========================================
- Coverage 84.07% 84.02% -0.05%
==========================================
Files 23 23
Lines 2669 2680 +11
==========================================
+ Hits 2244 2252 +8
- Misses 425 428 +3
Continue to review full report at Codecov.
|
Ah, was missing some methods for FixedSizeListType types it seems. Fixed IIUC. It's a shame that this entails a significant perf hit for
I imagine we could optimize further to lessen/remove the overhead, though I think it'd preferable to do that in a follow-up |
Co-authored-by: SimonDanisch <sdanisch@protonmail.com>
Thanks to @SimonDanisch the UInt128 <-> NTuple{16,UInt8} cast employed here is now non-allocating 🔥 Reduces allocations for
I'll dig into it in a separate PR when I get a chance but AFAICT this is now good to go pending green CI. |
@quinnj This is now ready for review, recommend looking at #104 at the same time (ref #104 (comment), which I can take care of once either is merged) 🙂 thanks! |
* add isbitstype optimized path for FixedSizeList getindex * reuse _unsafe_cast strategy from #103 * rebase + DRY _unsafe_cast
…izeBinary (apache#103) * change UUID <-> Arrow mapping to (de)serialize to/from 16-byte FixedSizeBinary * fix tests * optimize UInt128 <-> NTuple{16,UInt8} casting Co-authored-by: SimonDanisch <sdanisch@protonmail.com> Co-authored-by: SimonDanisch <sdanisch@protonmail.com>
* add isbitstype optimized path for FixedSizeList getindex * reuse _unsafe_cast strategy from apache#103 * rebase + DRY _unsafe_cast
BTW the PR title here says FixedSizeBinary, but what was implemented was FixedSizeList; why is that? Asking since arrow is considering a canonical extension for uuids that uses FixedSizeBinary. |
IIRC back when this PR was authored, Arrow.jl would (de)serialize Julia values with type idk if that is still true nowadays though here's the relevant
|
...to make tables written out by Arrow.jl more portable.
I'm dumb and didn't realize e.g.
pyarrow
doesn't support 128-bit ints, so currently UUID-containing tables will fail to be read by e.g.pa.feather.read_feather(...)
.There's even pyarrow documentation that suggests to Python users that UUIDs can be written as FixedSizeBinary columns, so this seems like a more friendly/correct way to do this.