Use standard Tables.Schema constructor instead of constructing directly #2797

Merged 1 commit on Jun 22, 2021

quinnj commented Jun 22, 2021

This is part of fixing errors like
JuliaData/CSV.jl#635, in addition to the
changes to support really wide tables in
JuliaData/Tables.jl#241. Luckily, I've found few
places across Tables.jl implementations that make working
with really wide tables impossible, but this was a key one: for
really wide tables, we want the names/types to be stored as `Vector`s
instead of `Tuple`/`Tuple{}` in `Tables.Schema`. This shouldn't have any
noticeable effect on non-wide DataFrames and should be covered
by existing tests.

bkamins left a comment

I understand this change does not need JuliaData/Tables.jl#241 (it also works on the current release of Tables.jl). Right?

@@ -21,7 +21,7 @@ end
 Tables.columnindex(df::Union{AbstractDataFrame, DataFrameRow}, idx::AbstractString) =
     columnindex(df, Symbol(idx))
 
-Tables.schema(df::AbstractDataFrame) = Tables.Schema{Tuple(_names(df)), Tuple{[eltype(col) for col in eachcol(df)]...}}()
+Tables.schema(df::AbstractDataFrame) = Tables.Schema(_names(df), [eltype(col) for col in eachcol(df)])
Member

`_names(df)` returns a `SubArray` if `df` is a `SubDataFrame`. Is this acceptable to `Tables.schema`?

Also, `_names(df)` for a `DataFrame` returns an internal vector (without copying). Is this safe?

In short, maybe it should be `propertynames(df)` instead, which guarantees a `Vector{Symbol}` that is always a copy.

Member Author

Nah, Tables.jl will make its own copy; it accepts any iterable and makes its own Vector{Symbol}.
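
The point about copying can be checked with a small sketch. It assumes Tables.jl is installed; `allnames` and `sub` are hypothetical stand-ins for a data frame's internal name vector and a `SubArray` of it, not DataFrames.jl internals.

```julia
using Tables

# Hypothetical backing vector of column names (stand-in for an internal vector).
allnames = [:id, :x, :y]

# A SubArray over the last two names, similar to what a column subset would expose.
sub = view(allnames, 2:3)

# The standard constructor accepts the view directly, along with a vector of types.
sch = Tables.Schema(sub, [Float64, Float64])

# Mutate the original vector after the schema has been built.
allnames[2] = :changed

# The schema still reports the names it was constructed with.
println(sch.names)
```

For a table this small, the names end up inside the schema rather than as a reference to `allnames`, so the later mutation is not visible through `sch`.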

Member

Then I just "approve" :).

quinnj commented Jun 22, 2021

> I understand this change does not need JuliaData/Tables.jl#241 (it also works on the current release of Tables.jl). Right?

Correct. The goal in the Tables.jl change was no user-visible change. In practice this should affect a very small percentage of tables. But yes, `Tables.Schema(names, types)` has always been the preferred `Schema` constructor; I think we've just supported Tables.jl in DataFrames.jl for so long that we originally took the shortcut before the constructor was standardized.
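
As a rough illustration of the two constructions discussed here, the sketch below builds the same schema both ways. It assumes Tables.jl is installed; `colnames` and `coltypes` are made-up values standing in for a data frame's column names and element types.

```julia
using Tables

# Hypothetical column names and element types for a narrow table.
colnames = [:a, :b, :c]
coltypes = [Int, Float64, String]

# Direct construction (the old shortcut): names and types are always
# encoded as type parameters, regardless of how many columns there are.
direct = Tables.Schema{Tuple(colnames), Tuple{coltypes...}}()

# Standard constructor: Tables.jl decides how to store names and types.
standard = Tables.Schema(colnames, coltypes)

# For a table this narrow, both schemas expose the same names and types.
println(direct.names == standard.names)
println(direct.types == standard.types)
```

Going through the standard constructor leaves the storage decision to Tables.jl, which is what lets really wide tables be handled differently without DataFrames.jl needing to know about it.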

quinnj merged commit e258354 into main on Jun 22, 2021
quinnj deleted the jq/dfschema branch on June 22, 2021, 19:58
quinnj commented Jun 22, 2021

Not urgent to release, but should be safe to include in a patch release at any time.

bkamins commented Jun 22, 2021

The 1.2 release is on the way. It depends on the availability of @nalimilan (I know he is busy).

What I need is merging:

and then we can make the 1.2 release. Optionally, we could include (these are relatively simple things to implement; the only open question is the decision, but we can equally well add them in 1.3, so they are non-blocking):
