ARROW-264: File format #123
ec856f0 to b426e9f
b426e9f to b0bf6bc
A version that works for flat schemas.
This is getting close.
Cool, anything I can do to help? I'll try to find some time to work on the C++ side of this so we can get a passing build with the new metadata and possibly also the file layout. We can use the existing IPC code to make things simpler. I haven't looked yet, but we should align each of the buffer writes in the file layout at word boundaries -- the data buffers should already be padded / aligned, but I'm not sure if the serialized flatbuffers are padded at 8-byte boundaries. We just dealt with this in Feather (wesm/feather@002c798).
@wesm I haven't done it yet, but it should be easy to pad things to stay on 8-byte boundaries.
@wesm added alignment.
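For readers following the alignment discussion, here is a minimal sketch of padding each write to an 8-byte boundary. It is not the code from this patch: the class `AlignmentSketch` and the helpers `paddedLength` and `writeAligned` are hypothetical names chosen for illustration, and filling the gap with zero bytes is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical illustration only; not the writer implemented in this PR.
public class AlignmentSketch {
    // Assumed alignment: 8 bytes, as discussed in the comments above.
    static final int ALIGNMENT = 8;

    // Round a length up to the next multiple of ALIGNMENT.
    static long paddedLength(long length) {
        return (length + ALIGNMENT - 1) & ~(long) (ALIGNMENT - 1);
    }

    // Write a block, then zero-fill up to the next 8-byte boundary.
    // Returns the number of bytes written, including padding.
    static long writeAligned(OutputStream out, byte[] block) throws IOException {
        out.write(block);
        long padded = paddedLength(block.length);
        for (long i = block.length; i < padded; i++) {
            out.write(0);
        }
        return padded;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long offset = 0;
        offset += writeAligned(out, new byte[13]); // e.g. a serialized metadata block
        offset += writeAligned(out, new byte[8]);  // e.g. an already aligned data buffer
        System.out.println("next write starts at offset " + offset);
    }
}
```

Because every write returns a padded length, the running offset stays a multiple of 8 as long as the first block starts on an aligned offset.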
UInt4Vector offsets;
final UInt4Vector offsets;// TODO: THis masks the same vector in the parent which is assigned to this in the constructor.
TODO (for JIRA?): lists in the current arrow spec use signed int32
I'll remove the TODO. That was more of a personal note. I initially thought that the masked field had different content, but the two actually have exactly the same content, and I made it immutable so that there's no uncertainty about whether it stays that way.
I'll open a separate bug for the unsigned/signed type of the offset vector.
https://issues.apache.org/jira/browse/ARROW-273 for the type of the offset vector
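To make the signed/unsigned question concrete, here is a small sketch of how the same 4-byte offset reads under each interpretation. It is not taken from the patch: the class `OffsetSignednessSketch` and its helpers are hypothetical, and the little-endian byte order in the usage is an assumption. The two readings agree for every offset up to 2^31 - 1 and differ only beyond that.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; not the vector classes touched by this PR.
public class OffsetSignednessSketch {
    // Length of list element i: the gap between two consecutive offsets.
    static int listLength(int[] offsets, int i) {
        return offsets[i + 1] - offsets[i];
    }

    // Read the 4-byte offset at `index` as a signed int32.
    static int readSigned(ByteBuffer buf, int index) {
        return buf.getInt(index * 4);
    }

    // Read the same 4 bytes as an unsigned 32-bit value, widened to long.
    static long readUnsigned(ByteBuffer buf, int index) {
        return Integer.toUnsignedLong(buf.getInt(index * 4));
    }

    public static void main(String[] args) {
        // Three lists with lengths 2, 0 and 3.
        int[] offsets = {0, 2, 2, 5};
        for (int i = 0; i < offsets.length - 1; i++) {
            System.out.println("list " + i + " has length " + listLength(offsets, i));
        }

        // Byte order is an assumption made for this sketch.
        ByteBuffer buf = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(0, 0x80000000); // high bit set
        System.out.println("signed:   " + readSigned(buf, 0));
        System.out.println("unsigned: " + readUnsigned(buf, 0));
    }
}
```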
I'm about halfway through skimming the patch; I'll finish tomorrow, and then I think we can merge this quite soon.
To me this is good to go.
+1 -- just finished reviewing, nice that you put this together so quickly! Fine by me to merge; we should try to reconcile the outstanding metadata patches soon. There are other minor things, like names in the flatbuffer metadata, as discussed elsewhere.
* Add perf. test cases
* Remove unnecessary copy
* Handle zero input case
* Fix bugs
This is work in progress