Yes, this is expected. We tried to convey in the documentation that Arrow.Tables are immutable, as designed in the official Arrow spec. Arrow data in general is meant as an "analytical" format, so that analysis workloads can process and share data between implementations at the highest possible speeds.
That said, when Arrow is used as a storage format, it's often desirable to further process and mutate the data. You can make copies of the Arrow data as mutable structures by doing something like df = DataFrame(Arrow.columntable(Arrow.Table(file))) to get normal Vector arrays.
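As a rough sketch of the copy-then-mutate idea above (not necessarily the exact call suggested in this thread): the snippet below writes a small, made-up table to a temporary Arrow file, reads it back, and uses `mapcols(collect, ...)` from DataFrames to materialize every column as an ordinary `Vector` before replacing the missing dates. The sample data, the temporary path, and the replacement date are assumptions for illustration; only the column name `V2` comes from the issue.

```julia
using Arrow, DataFrames, Dates

# Hypothetical sample data written to a temporary file (assumption for illustration)
path = tempname() * ".arrow"
Arrow.write(path, (V1 = [1, 2, 3],
                   V2 = [Date(2020, 1, 1), missing, Date(2020, 3, 1)]))

# Depending on package versions, these columns may be read-only Arrow arrays
df_arrow = DataFrame(Arrow.Table(path))

# collect copies each column into a regular, mutable Julia Vector
df = mapcols(collect, df_arrow)

# In-place replacement now works, because V2 is a plain Vector
replace!(df.V2, missing => Date(1900, 1, 1))
```

The explicit `collect` trades memory for mutability: the data is no longer a zero-copy view of the file, which is usually acceptable when the goal is to edit values rather than only analyze them.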
I have encountered a small issue reading an Arrow file as a DataFrame.
This might not be a bug, but I think it's worth clarifying if it is expected behavior.
tl;dr:
Reading an Arrow file into a DataFrame seems to produce results that are not entirely consistent with other DataFrames.
I start by reading an Arrow table into a DataFrame.
I am interested in replacing the missing values of V2 with something different (another date). I looked carefully at the DataFrame and realized that the column was still a list from Arrow.
Applying another DataFrame conversion, I got the "correct" type, such that the initial transformation worked. So one solution would be to apply the DataFrame conversion twice, which accomplishes the desired change in place.
Is this expected behavior?