Missing type gets lost when writing partitions of DataFrame #403
Comments
I think I know where it's coming from. The issue happens here
In other words, we do
I don't know enough about the Tables API/contract to know whether this is an Arrow problem, a Tables problem, or a DataFrames problem. Does this issue belong somewhere else? It would be an easy fix to get schema info from the parent object, but are all Tables-compatible sources required to keep that? E.g.,
Should I open a PR? Illustration
EDIT: I suspect this will affect other partitioners that rely on Iterators over
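To make the schema point above concrete, here is a minimal sketch, assuming DataFrames.jl and Tables.jl are installed. The four-row table and the variable names are made up for illustration and are not the MWE from the report. It compares the value types visible in each row partition with the schema still held by the parent `DataFrame`.

```julia
using DataFrames, Tables

# Hypothetical table: the only `missing` in column `b` sits in the last row,
# so it lands in the second partition.
df = DataFrame(a = [1, 2, 3, 4], b = ["x", "y", "z", missing])

# Row-based partitioning as in the report: chunks of two rows each.
# For every chunk, record which concrete value types occur in column `b`.
observed = [unique(typeof(row.b) for row in part)
            for part in Iterators.partition(Tables.rows(df), 2)]

# Column types recorded in the schema of the parent table.
parent_types = Tables.schema(df).types

println("types seen in first chunk:  ", observed[1])
println("types seen in second chunk: ", observed[2])
println("parent schema types:        ", parent_types)
```

By construction the first chunk holds only `String` values in `b`, so anything that infers column types from the values in that chunk alone cannot see `Missing`, whereas the parent schema still carries the `Union` type. Whether a writer is allowed to consult the parent schema is exactly the open question about the Tables contract.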
At the moment, a similar thing is blocking #477.
This is an odd one and likely to be a PICNIC...
Problem: Missingness in a string column is lost after saving/loading an Arrow file.
When it happens: When a column in my dataset has type `Union{Missing,String}`, I partition it, and the missing item appears only in the later partitions. It's easily reproducible (see below).
Debugging: it happens when partitioning with `Iterators.partition(Tables.rows(df), 2)`. If partitioned as `Iterators.partition(df, 2)`, available from version >1.5.0, it is fine.
MWE
Versioninfo: