Loss of parametric type information for custom types #134
OK, with PR #156 this will now correctly error.
* Start work on overhauling type serialization architecture
* More work; serialization is pretty much done but not tested
* fix timetype ArrowTypes definitions
* more work to get tests passing
* get tests passing?
* fix
* Fix #75 by supporting Set serialization/deserialization
* Fix #85 by supporting tuple serialization/deserialization
* Lots of cleanup
* few more fixes
* Update src/arrowtypes.jl
  Co-authored-by: Jarrett Revels <jarrettrevels@gmail.com>
* Update src/arrowtypes.jl
  Co-authored-by: Jarrett Revels <jarrettrevels@gmail.com>
* fix NullKind reading
* Fix #134 by requiring concrete or union of concrete element types for all columns when serializing
* Add new ArrowTypes.arrowmetadata method for providing additional extension type metadata that can be used in JuliaType
* Update manual
* tests

Co-authored-by: Jarrett Revels <jarrettrevels@gmail.com>
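The rule in the "Fix #134" bullet (every column's element type must be concrete, or a union of concrete types) can be sketched as a small predicate. This is a minimal illustration under stated assumptions: `concrete_or_concrete_union` is a hypothetical helper, not Arrow.jl's actual implementation, and `Iv`, `Bound`, `Closed`, `Open` and `Unbounded` are stand-ins for Intervals.jl's types (run it in a session without Intervals.jl loaded).

```julia
# Hypothetical helper (not Arrow.jl's implementation): true when `T` is a
# concrete type, or a Union whose members are all concrete.
function concrete_or_concrete_union(T)
    if T isa Union
        # A Union value stores its member types as a nested pair in fields `a` and `b`.
        return concrete_or_concrete_union(T.a) && concrete_or_concrete_union(T.b)
    end
    return isconcretetype(T)
end

# Hypothetical stand-ins for Intervals.jl's bound types and Interval.
abstract type Bound end
struct Closed <: Bound end
struct Open <: Bound end
struct Unbounded <: Bound end
struct Iv{T,L<:Bound,R<:Bound}
    first::T
    last::T
end

concrete_or_concrete_union(Int)                  # true
concrete_or_concrete_union(Union{Int,Missing})   # true
concrete_or_concrete_union(Real)                 # false

# Mixing bound parameters in one vector widens the element type to a UnionAll
# (Iv{Int64, L, R} where {L, R}), which this rule rejects.
mixed = [Iv{Int,Closed,Unbounded}(1, 0), Iv{Int,Unbounded,Closed}(0, 2)]
concrete_or_concrete_union(eltype(mixed))        # false
```

Under this rule a column like `mixed` errors at write time instead of silently dropping `L` and `R`.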
Using:

table = (col = [
    Interval{Closed,Unbounded}(1, nothing),
    Interval{Unbounded,Closed}(nothing, 2),
],)
I attempted to work around this on Arrow 1.6 (not quite released yet) by storing the parametric information as part of each value:

using Arrow, ArrowTypes, Intervals
table = (;
    col=[
        Interval{Closed,Unbounded}(1, nothing),
        Interval{Unbounded,Closed}(nothing, 2),
    ]
)

# Register an Arrow extension name for each bound type, and map that name back to the type on read.
for T in (Closed, Open, Unbounded)
    name = QuoteNode(Symbol("JuliaLang.Intervals.$(string(T))"))
    @eval begin
        ArrowTypes.arrowname(::Type{$T}) = $name
        ArrowTypes.JuliaType(::Val{$name}) = $T
    end
end

# Serialize each endpoint as (bound name, value) so L and R can be rebuilt on read.
let name = Symbol("JuliaLang.Intervals.Interval")
    ArrowTypes.arrowname(::Type{<:Interval{T}}) where T = name
    ArrowTypes.ArrowType(::Type{<:Interval{T}}) where T = NamedTuple{(:left, :right), Tuple{Tuple{String, T}, Tuple{String, T}}}
    function ArrowTypes.toarrow(x::Interval{T,L,R}) where {T,L,R}
        return (; left=(string(ArrowTypes.arrowname(L)), x.first), right=(string(ArrowTypes.arrowname(R)), x.last))
    end
    ArrowTypes.JuliaType(::Val{name}) = Interval
    function ArrowTypes.fromarrow(::Type{Interval}, left, right)
        T = typeof(left[2])
        L = ArrowTypes.JuliaType(Val(Symbol(left[1])))
        R = ArrowTypes.JuliaType(Val(Symbol(right[1])))
        return Interval{T,L,R}(
            L === Unbounded ? nothing : left[2],
            R === Unbounded ? nothing : right[2],
        )
    end
end

# ArrowTypes.fromarrow(Interval, ArrowTypes.toarrow(table.col[1]))
table.col
t = Arrow.Table(Arrow.tobuffer(table))
t.col

julia> table.col
2-element Vector{Interval{Int64, L, R} where {L<:Bound, R<:Bound}}:
Interval{Int64, Closed, Unbounded}(1, nothing)
Interval{Int64, Unbounded, Closed}(nothing, 2)
julia> t = Arrow.Table(Arrow.tobuffer(table))
Arrow.Table with 2 rows, 1 columns, and schema:
:col Interval
julia> t.col
2-element Arrow.Struct{Interval, Tuple{Arrow.Struct{Tuple{String, Int64}, Tuple{Arrow.List{String, Int32, Vector{UInt8}}, Arrow.Primitive{Int64, Vector{Int64}}}}, Arrow.Struct{Tuple{String, Int64}, Tuple{Arrow.List{String, Int32, Vector{UInt8}}, Arrow.Primitive{Int64, Vector{Int64}}}}}}:
Interval{Int64, Closed, Unbounded}(1, nothing)
Interval{Int64, Unbounded, Closed}(nothing, 2)

The main issue I had with implementing this is that the serialized instance as defined by …
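For readers without Arrow.jl at hand, here is a dependency-free sketch of the same (bound name, value) encoding used in the workaround above. It is an illustration under assumptions, not Arrow.jl code: `Iv` and the bound types are hypothetical stand-ins for Intervals.jl, `encode`/`decode`/`bound_name`/`BOUNDS` are hypothetical names, and the `Val`-based `ArrowTypes.JuliaType` lookup is swapped for a plain `Dict`.

```julia
# Hypothetical stand-ins for Intervals.jl's bound types and Interval.
abstract type Bound end
struct Closed <: Bound end
struct Open <: Bound end
struct Unbounded <: Bound end
struct Iv{T,L<:Bound,R<:Bound}
    first::T
    last::T
end

# Name -> type registry; stands in for the per-bound arrowname/JuliaType pairs.
const BOUNDS = Dict("Closed" => Closed, "Open" => Open, "Unbounded" => Unbounded)
bound_name(::Type{B}) where {B<:Bound} = string(nameof(B))

# Encode: pair each endpoint with its bound's name, so the encoded type
# depends only on T, not on L or R.
encode(x::Iv{T,L,R}) where {T,L,R} =
    (left=(bound_name(L), x.first), right=(bound_name(R), x.last))

# Decode: look the bound names back up and rebuild the fully parameterized type.
function decode(nt::NamedTuple{(:left, :right),Tuple{Tuple{String,T},Tuple{String,T}}}) where {T}
    L = BOUNDS[nt.left[1]]
    R = BOUNDS[nt.right[1]]
    return Iv{T,L,R}(nt.left[2], nt.right[2])
end

# Unbounded endpoints hold a placeholder value (0) in this toy type.
col = [Iv{Int,Closed,Unbounded}(1, 0), Iv{Int,Unbounded,Closed}(0, 2)]
encoded = map(encode, col)      # every element has the same concrete NamedTuple type
decoded = map(decode, encoded)
decoded[2] isa Iv{Int,Unbounded,Closed}   # true
```

Because every encoded element shares one concrete type, the stored column is homogeneous; the bound parameters only reappear at decode time.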
Closing, as I think we have all the tools in place to support this kind of use case, even if it's not the most convenient. The Arrow format is really built for fairly homogeneous data within individual columns; beyond that, it doesn't fare very well with mixed-type columns.
When using a table where a column contains a variety of types that differ in their type parameters, this information can be lost:
For the particular Interval type the problem is worse, as the undefined type parameters are inferred from the arguments.
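A minimal sketch of why inferring missing type parameters from the arguments makes the loss silent rather than an error. `Iv` and the bound types are hypothetical stand-ins, and the convenience constructor that defaults both bounds to `Closed` is an assumption for illustration, not Intervals.jl's actual behavior.

```julia
# Hypothetical stand-ins for Intervals.jl's bound types and Interval.
abstract type Bound end
struct Closed <: Bound end
struct Open <: Bound end
struct Unbounded <: Bound end
struct Iv{T,L<:Bound,R<:Bound}
    first::T
    last::T
end

# Hypothetical convenience constructor: T is inferred from the arguments and,
# as an assumption for this sketch, both bounds default to Closed.
Iv(first::T, last::T) where {T} = Iv{T,Closed,Closed}(first, last)

original = Iv{Int,Open,Closed}(1, 2)

# A reader that only knows the column holds "some Iv" can rebuild a value
# from the stored fields alone; L and R are then filled in, not recovered.
rebuilt = Iv(original.first, original.last)

typeof(original) == Iv{Int,Open,Closed}     # true
typeof(rebuilt) == Iv{Int,Closed,Closed}    # true: the Open bound is gone
```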