-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add accepted_field_type
for abstract vectors
#89
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a stop-gap I think we should do this. I think there might be might be a more general solution, and maybe we can use Tables.is_equivalent_schema
as Jacob linked in the Arrow issue somehow. I also think on the left-hand sides we might be able to make this conversion more general too somehow, instead of only widening it for Vector
. Would be great to have @jrevels perspective.
But I would say we should merge this now to fix the breakage and take a step towards making Legolas more flexible.
Maybe we should also file an issue to add more tests for this as a followup, to cover the changed lines and ensure we don't regress |
src/schemas.jl
Outdated
@@ -250,6 +253,9 @@ otherwise constitutes type piracy and should be avoided. | |||
@inline accepted_field_type(::SchemaVersion, T::Type) = T | |||
accepted_field_type(::SchemaVersion, ::Type{UUID}) = Union{UUID,UInt128} | |||
accepted_field_type(::SchemaVersion, ::Type{Symbol}) = Union{Symbol,String} | |||
accepted_field_type(::SchemaVersion, ::Type{Vector{String}}) = AbstractVector{<:AbstractString} | |||
accepted_field_type(::SchemaVersion, ::Type{Vector{T}}) where T = AbstractVector{<:T} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
accepted_field_type(::SchemaVersion, ::Type{Vector{T}}) where T = AbstractVector{<:T} | |
accepted_field_type(sv::SchemaVersion, ::Type{<:Vector{T}}) where T = AbstractVector{<:(accepted_field_type(sv, T))} |
?
i think this should also obviate the need for the other accepted_field_type
definition for Vector
Vector{String}
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this should also obviate the need for the other accepted_field_type definitions for Vector here
When accepted_field_type(::SchemaVersion, ::Type{Vector}) = AbstractVector
is removed, we still hit error in this test:
┌ Warning: The `Tables.Schema` of the `Arrow.Table` read via `Legolas.read(io_or_path)` does not appear to
│ comply with the `Legolas.SchemaVersion` indicated by the table's metadata (`SchemaVersion("test.parent", 1)`). Try invoking
│ `Legolas.read(io_or_path; validate=false)` to inspect the table.
└ @ Legolas ~/Code/Legolas.jl/src/tables.jl:171
miscellaneous Legolas/src/tables.jl tests: Error During Test at /Users/hrobertson/Code/Legolas.jl/test/runtests.jl:468
Test threw exception
Expression: t == [NamedTuple(ParentV1(r)) for r = Tables.rows(Legolas.read(path))]
ArgumentError: Tables.Schema violates Legolas schema `test.parent@1`:
- Incorrect type: `x` expected `<:Vector`, found `SubArray{Int64, 1, Arrow.Primitive{Int64, Vector{Int64}}, Tuple{UnitRange{Int64}}, true}`
Provided Tables.Schema:
:x SubArray{Int64, 1, Arrow.Primitive{Int64, Vector{Int64}}, Tuple{UnitRange{Int64}}, true}
:y String
Stacktrace:
[1] validate(ts::Tables.Schema{(:x, :y), Tuple{SubArray{Int64, 1, Arrow.Primitive{Int64, Vector{Int64}}, Tuple{UnitRange{Int64}}, true}, String}}, sv::SchemaVersion{Symbol("test.parent"), 1})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, have to keep that one actually, my bad
Co-authored-by: Jarrett Revels <jarrettrevels@gmail.com>
|
PR #89 fixed compatibility with Arrow v2.6 except when the schema field's declared type is e.g. `Union{Vector{T},Missing}` and what's passed for validation is some `AbstractVector`. This adds a method to `accepted_field_type` that recurses on the non-`Missing` type in a `Union` with `Missing`. Note that this also necessitated adding an explicit `accepted_field_type` method for `Any` to avoid infinite recursion with the fallback definition, since `(Union{T,Missing} where {T}) == Any`.
PR #89 fixed compatibility with Arrow v2.6 except when the schema field's declared type is e.g. `Union{Vector{T},Missing}` and what's passed for validation is some `AbstractVector`. This adds a method to `accepted_field_type` that recurses on the non-`Missing` type in a `Union` with `Missing`. Note that this also necessitated adding an explicit `accepted_field_type` method for `Any` to avoid infinite recursion with the fallback definition, since `(Union{T,Missing} where {T}) == Any`.
ref apache/arrow-julia#452
Add
accepted_field_type
s as proposed by @ericphanson in above issue. No additional tests added BUTmain
is currently failing when run on latest (v2.6) Arrow.jl (e.g. in #88), and no longer fails with this fix.