Reading an Arrow file with no message batches after the schema seems to produce a partly initialized Table? #158
A reproduction is actually quite easy. I think this is close to the Arrow message I deal with - schema/metadata, but zero record batches:
Digging in now (sorry for the slow reply); it looks like in Arrow.jl, we're writing an empty record batch whereas pyarrow doesn't.
Fixes #158. While the Julia implementation currently doesn't provide a way to avoid writing any record batches, the pyarrow implementation has more fine-grained control over writing and allows closing an IPC stream without writing any record batches. On the Julia side, when reading, we just need to check for this case specifically and, if it occurs, populate some empty columns, since we currently rely on the columns being populated when record batches are read.
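The check described above can be illustrated with a toy model. This is only a sketch in plain Python, not Arrow.jl's actual code: `Field`, `Table`, and `read_table` are hypothetical names invented here, and batches are modelled as plain dicts rather than real Arrow record batches. It shows the idea that columns are normally created while batches are read, so a schema-only stream needs an explicit fallback that creates one empty column per schema field.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Field:
    # Hypothetical stand-in for a schema field (name plus a type label).
    name: str
    type_name: str


@dataclass
class Table:
    # Hypothetical stand-in for a table: column names come from the schema,
    # column data is filled in only as record batches are read.
    names: List[str]
    columns: List[List[Any]] = field(default_factory=list)

    def column(self, name: str) -> List[Any]:
        # Raises IndexError if the columns were never populated.
        return self.columns[self.names.index(name)]


def read_table(schema: List[Field], batches: List[Dict[str, list]]) -> Table:
    table = Table(names=[f.name for f in schema])
    for batch in batches:
        if not table.columns:
            # First batch: create the columns.
            table.columns = [list(batch[f.name]) for f in schema]
        else:
            # Later batches: append to the existing columns.
            for col, f in zip(table.columns, schema):
                col.extend(batch[f.name])
    # The case from this issue: a schema but zero record batches.
    # Without this check the table would keep its names but have no columns.
    if schema and not table.columns:
        table.columns = [[] for _ in schema]
    return table
```

With the fallback in place, a schema-only input yields a table whose columns exist and are empty, so column access behaves the same way as for a table that has rows.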
Ok, fix is up: #175
Thanks! This fixes the example I gave above, but seems to choke when the field supplies
Ah, thanks for checking that out. There was something in my brain that was telling me that just passing
* Fix case when ipc stream has no record batches, only schema

Fixes #158. While the Julia implementation currently doesn't provide way to avoid writing any record batches, the pyarrow implementation has more fine-grained control over writing and allows closing an ipc stream without writing any record batches. In that case, on the Julia side when reading, we just need to check for this case specifically and if so, populate some empty columns, since we're currently relying on them being populated when record batches are read.

* fix metadata
I've tried to put together the above demonstration, though I can't share the exact files unfortunately. The first example, t1, has no rows / message batches, but otherwise has the same schema as t2. t1 errors when trying to access any column (or show it, etc), but t2 is fine. Arrow.schema(t1) looks healthy, and identical to Arrow.schema(t2). Both files read fine with pyarrow.
If you're able to make sense of it, it would be much appreciated. Thanks!
Versions:
julia 1.6.0
Arrow v1.2.4