use of setmetadata! on generic tables basically requires calling Tables.columns on the input #211

Closed
jrevels opened this issue Jun 13, 2021 · 3 comments
Labels
bug Something isn't working

Comments

@jrevels
Contributor

jrevels commented Jun 13, 2021

I thought there was already an issue for this, but couldn't find it:

```julia
julia> using Arrow, Tables

julia> t = Arrow.Table(Arrow.tobuffer((; x = [1,2,3], y = ["a", "b", "c"])))
Arrow.Table with 3 rows, 2 columns, and schema:
 :x  Int64
 :y  String

julia> Arrow.setmetadata!(t, Dict("key" => "value"))

julia> Arrow.getmetadata(t)
Dict{String,String} with 1 entry:
  "key" => "value"

# where did my metadata go?
julia> Arrow.getmetadata(Arrow.Table(Arrow.tobuffer(t))) isa Nothing
true

julia> t_cols = Tables.columns(t);

julia> Arrow.setmetadata!(t_cols, Dict("key" => "value"));

# but this works
julia> Arrow.getmetadata(Arrow.Table(Arrow.tobuffer(t_cols)))
Dict{String,String} with 1 entry:
  "key" => "value"
```

IIUC the reason for this is that Arrow.write(_, t) internally calls Tables.columns(t) and then tries to grab THAT object's metadata instead of the original t's metadata (IIRC the exact place where this happens is here; I'd have to double-check though, it's been a while since I tracked it down).
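The suspected mechanism can be reproduced without Arrow.jl. The sketch below is minimal and rests on two assumptions. The first is that metadata lives in a global cache keyed by object identity, which matches the "metadata cache" mentioned later in this thread. The second is that the writer looks metadata up on the result of its internal columns conversion, as described above. Every name here (`TOY_META`, `ToyTable`, `toy_columns`, `toy_write`, ...) is a hypothetical stand-in and is not Arrow.jl or Tables.jl API.

```julia
# Hypothetical stand-ins for Arrow.Table, Tables.columns, Arrow.setmetadata!,
# Arrow.getmetadata and Arrow.write; NOT Arrow.jl's real implementation.
# Assumption: metadata lives in a global cache keyed by object identity.
const TOY_META = IdDict{Any,Dict{String,String}}()

toy_setmetadata!(x, meta::Dict{String,String}) = (TOY_META[x] = meta; nothing)
toy_getmetadata(x) = get(TOY_META, x, nothing)

# Mutable structs, so identity (===) means "the same object in memory".
mutable struct ToyTable
    cols::NamedTuple
end

# The object returned by the columns accessor is a *different* object.
mutable struct ToyColumns
    cols::NamedTuple
end

toy_columns(t::ToyTable) = ToyColumns(t.cols)
toy_columns(c::ToyColumns) = c   # already columnar: returned as-is

# Mirrors the suspected write path: convert first, then look up metadata
# on the converted object rather than on the caller's original input.
function toy_write(x)
    cols = toy_columns(x)
    return (columns = cols.cols, metadata = toy_getmetadata(cols))
end

t = ToyTable((x = [1, 2, 3], y = ["a", "b", "c"]))
toy_setmetadata!(t, Dict("key" => "value"))
@show toy_getmetadata(t)          # the Dict attached above
@show toy_write(t).metadata       # nothing: lookup used a fresh ToyColumns

t_cols = toy_columns(t)
toy_setmetadata!(t_cols, Dict("key" => "value"))
@show toy_write(t_cols).metadata  # found: toy_columns(t_cols) === t_cols
```

The metadata set on `t` is still in the cache. The lookup simply happens on a different key, which is why setting it on the columns object "works".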

jrevels added the bug (Something isn't working) label on Jun 13, 2021
@haberdashPI

If I understand this problem correctly, what's ultimately needed is a change in Tables that provides support for metadata (e.g. by defining an interface for it that types can implement, or by transferring metadata to new objects upon a call to Tables.columns etc., so that the new objects are also associated with the metadata in a global table).
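The first option above, an interface that types implement, can be sketched with plain multiple dispatch. This is a hypothetical illustration only. `tablemetadata`, `MetaTable`, `MetaTableColumns`, `metacolumns` and `metawrite` are invented names, and `tablemetadata` is not a function that existed in Tables.jl at the time of this discussion. In this design, the object returned by the columns-style accessor keeps a reference to its parent and forwards the metadata query, so no global cache is involved.

```julia
# Hypothetical metadata interface; `tablemetadata` is NOT an existing
# Tables.jl function, only an illustration of the idea.
tablemetadata(x) = nothing  # fallback: types without metadata return nothing

struct MetaTable
    cols::NamedTuple
    meta::Dict{String,String}
end
tablemetadata(t::MetaTable) = t.meta

# The columns-style view keeps a reference to its parent table, so it can
# forward the metadata query instead of losing it.
struct MetaTableColumns
    parent::MetaTable
end
metacolumns(t::MetaTable) = MetaTableColumns(t)
tablemetadata(c::MetaTableColumns) = tablemetadata(c.parent)

# A writer that converts first and queries metadata afterwards still
# finds the metadata, because the query is answered by dispatch.
function metawrite(x::MetaTable)
    cols = metacolumns(x)
    return (columns = cols.parent.cols, metadata = tablemetadata(cols))
end

t = MetaTable((x = [1, 2, 3], y = ["a", "b", "c"]), Dict("key" => "value"))
@show metawrite(t).metadata
@show tablemetadata([1, 2, 3])   # nothing: only the fallback applies
```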

@ericphanson
Member

I think an alternative solution would be #90 -- just removing the metadata cache altogether, and only supporting it at read and write time. I think something like table, metadata = Arrow.read("file.arrow") and Arrow.write("file.arrow", table, metadata) would be simple and clear, and avoid these issues. Personally, I don't think Arrow should be in the business of trying to persist metadata through various Julia types and should just stick to (de)serialization, since there isn't a fully established metadata story in Julia's data ecosystem in general.
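The shape of this proposal, where metadata is an explicit argument at write time and an explicit return value at read time, can be sketched without Arrow.jl. In this hypothetical sketch, Julia's `Serialization` standard library and an in-memory `IOBuffer` stand in for the Arrow IPC format and `"file.arrow"`. The function names `write_with_metadata` and `read_with_metadata` are illustrative and are not Arrow.jl API.

```julia
using Serialization  # stdlib; used here only as a stand-in for the Arrow format

# Metadata is passed alongside the table rather than attached to it, so any
# internal conversion of `table` cannot "lose" it.
function write_with_metadata(io::IO, table, metadata::Dict{String,String})
    serialize(io, (table = table, metadata = metadata))
    return io
end

# Reading hands the metadata back explicitly, next to the table.
function read_with_metadata(io::IO)
    payload = deserialize(io)
    return payload.table, payload.metadata
end

io = IOBuffer()
write_with_metadata(io, (x = [1, 2, 3], y = ["a", "b", "c"]), Dict("key" => "value"))
seekstart(io)  # rewind before reading back
table, metadata = read_with_metadata(io)
@show metadata
```

This design avoids any global state. The trade-off is that callers must carry the metadata themselves between reading and writing.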

@jrevels
Contributor Author

jrevels commented Sep 14, 2021

closed by #238

jrevels closed this as completed on Sep 14, 2021