Losing type in unnamed column #138

Closed
jtrakk opened this issue Feb 28, 2021 · 1 comment

jtrakk commented Feb 28, 2021

`Arrow.write` loses the units (the Unitful `Quantity` element type) if I pass the vector directly instead of wrapping it in a `Dict`.

```julia
julia> using Arrow, Unitful

julia> Arrow.write("/tmp/foo.arrow", rand(10) .* u"°C")
"/tmp/foo.arrow"

# No units
julia> Arrow.Table("/tmp/foo.arrow")
Arrow.Table: (val = [0.386558450155964, 0.40953309052489084, 0.49596392007703627, 0.39207582590844714, 0.4577667338357032, 0.9492173362476377, 0.6055765801265836, 0.5689677312338461, 0.3226151344243675, 0.7348948643833466],)

julia> Arrow.write("/tmp/foo.arrow", Dict(:col => rand(10) .* u"°C"))
"/tmp/foo.arrow"

julia> Arrow.Table("/tmp/foo.arrow")
Arrow.Table: (col = Quantity{Float64,𝚯,Unitful.FreeUnits{(K,),𝚯,Unitful.Affine{-5463//20}}}[0.25224281438640905 °C, 0.9380070034984105 °C, 0.685889711996005 °C, 0.5695905396443204 °C, 0.14096895576083734 °C, 0.17337106741309194 °C, 0.950146887395434 °C, 0.3233302961350404 °C, 0.2054310111402502 °C, 0.3122890619977092 °C],)
```

Package versions:

```
  [69666777] Arrow v1.2.4
  [1986cc42] Unitful v1.5.0
```

quinnj (Member) commented Mar 6, 2021

When you call `Arrow.write` by just passing `rand(10) .* u"°C"`, it's not doing what you probably expect. `Arrow.write` expects a Tables.jl-compatible source to write out. A `Vector{T}` where `T` is some kind of struct is considered a valid table by default (each element is treated as a row and each of its fields as a column), but in your case a `Quantity{Float64,...}` isn't really a record-like struct; it behaves more like a `Float64` with some additional metadata (the units).

More specifically, `Arrow.write` calls `cols = Tables.columns(rand(10) .* u"°C")`, and you can see that this conversion drops the `Quantity{Float64, ...}` wrapper: `Quantity` stores its number in a single field, `val`, so the resulting table has one column named `val` holding bare `Float64` values. A `Dict(:col => ...)`, on the other hand, is already a "column table" by default, so no extra conversion happens and the column is serialized as-is, preserving the type metadata.
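To make the row-table versus column-table difference concrete, here is a dependency-free sketch in Base Julia. It assumes a hypothetical `FakeQuantity` struct as a stand-in for `Unitful.Quantity` (which likewise keeps its number in one field named `val`), and a hypothetical helper `columns_from_rows` that imitates, in simplified form, treating each struct element as a row and each field as a column. It is an illustration of the idea, not the actual Tables.jl or Arrow.jl code.

```julia
# Hypothetical stand-in for Unitful.Quantity: one numeric field named `val`.
# (Real Quantity values carry their units in type parameters; omitted here.)
struct FakeQuantity
    val::Float64
end

# Hypothetical helper, a simplified imitation of the row-table fallback:
# each element is a row and each field becomes a column.
# This is NOT the Tables.jl implementation.
function columns_from_rows(rows::AbstractVector{T}) where {T}
    fields = fieldnames(T)                                   # e.g. (:val,)
    cols = map(n -> [getfield(r, n) for r in rows], fields)  # one Vector per field
    return NamedTuple{fields}(cols)
end

x = [FakeQuantity(v) for v in (0.1, 0.2, 0.3)]

# Row-table path: the wrapper type disappears and only its field remains.
rowcols = columns_from_rows(x)
println(keys(rowcols))        # (:val,)
println(eltype(rowcols.val))  # Float64

# Column-table path: a NamedTuple (or Dict) of vectors is already columns,
# so the element type is kept as-is.
coltable = (col = x,)
println(keys(coltable))       # (:col,)
println(eltype(coltable.col)) # FakeQuantity
```

In practice, passing an explicit column table such as `(col = x,)` or `Dict(:col => x)` to `Arrow.write` avoids the row-table conversion. Whether the `Quantity` element type then round-trips on read may depend on the Arrow.jl version and on how it handles custom Julia types.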

Hopefully that makes sense. If you have any suggestions on how we can improve the docs, I'm always trying to find ways to clarify.

quinnj closed this as completed Mar 6, 2021