Use standard Tables.Schema constructor instead of constructing directly #2797
This is part of fixing errors like JuliaData/CSV.jl#635, alongside the changes to support really wide tables in JuliaData/Tables.jl#241. Luckily, I've found few places across Tables.jl implementations that make working with really wide tables impossible, but this was a key one: for really wide tables, we want the names/types to be stored as `Vector`s instead of `Tuple`/`Tuple{}` in `Tables.Schema`. This shouldn't have any noticeable effect on non-wide DataFrames and should be covered by existing tests.
I understand this change does not need JuliaData/Tables.jl#241 (it also works on the current release of Tables.jl). Right?
@@ -21,7 +21,7 @@ end
 Tables.columnindex(df::Union{AbstractDataFrame, DataFrameRow}, idx::AbstractString) =
     columnindex(df, Symbol(idx))

-Tables.schema(df::AbstractDataFrame) = Tables.Schema{Tuple(_names(df)), Tuple{[eltype(col) for col in eachcol(df)]...}}()
+Tables.schema(df::AbstractDataFrame) = Tables.Schema(_names(df), [eltype(col) for col in eachcol(df)])
`_names(df)` returns a `SubArray` if `df` is a `SubDataFrame`. Is this accepted by `Tables.Schema`?
Also, `_names(df)` for a `DataFrame` returns an internal vector (without copying). Is this safe?
In short: maybe it should be `propertynames(df)` instead (which guarantees a `Vector{Symbol}` that is always a copy).
Nah, Tables.jl will make its own copy; it accepts any iterable and makes its own `Vector{Symbol}`.
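For reference, a minimal sketch of the two construction paths touched by the diff, assuming a Tables.jl release that provides the two-argument `Tables.Schema(names, types)` constructor. The column names and types here are made up for illustration, and the wide-table behaviour shown at the end is an assumption about releases that include the wide-table support, not something confirmed in this thread.

```julia
using Tables

# Hypothetical column names and element types, passed as plain vectors
# (any iterable of names should work, per the comment above).
names = [:a, :b]
types = [Int, String]

# Standard constructor, as used in the new version of Tables.schema.
sch = Tables.Schema(names, types)

# Direct construction with type parameters, as in the old version.
direct = Tables.Schema{Tuple(names), Tuple{types...}}()

# For a narrow table both routes should yield the same schema type.
println(typeof(sch) == typeof(direct))
println(sch.names)
println(sch.types)

# Assumption: on releases with wide-table support, a very large column
# count makes the constructor keep names/types in vectors instead of
# encoding them in type parameters. 70_000 is an arbitrary large count.
wide_names = [Symbol("x", i) for i in 1:70_000]
wide = Tables.Schema(wide_names, fill(Float64, 70_000))
println(wide.names isa Vector{Symbol})
```

Going through the constructor leaves the choice of storage to Tables.jl, which is why the caller does not need to convert names to a `Tuple` first.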
Then I just "approve" :).
Correct. The goal in the Tables.jl change was no user-visible change. In practice this should affect a very small % of tables. But yes,
Not urgent to release, but should be safe to include in a patch release at any time.
The 1.2 release is on the way. It depends on the availability of @nalimilan (I know he is busy). What I need is merging:
and we can make the 1.2 release. Optionally we could include (these are relatively simple things to implement; the only open question is the decision, but we can equally well add them in 1.3, so they are non-blocking):