Use standard Tables.Schema constructor instead of constructing directly #2797

quinnj · 2021-06-22T18:35:00Z

This is part of fixing errors like JuliaData/CSV.jl#635, in addition to the changes to support really wide tables in JuliaData/Tables.jl#241. Luckily, I've found few cases across Tables.jl implementations that make working with really wide tables impossible, but this was a key one: for really wide tables, we want the names/types to be stored as `Vector`s instead of `Tuple`/`Tuple{}` in `Tables.Schema`. This shouldn't have any noticeable effect for non-wide DataFrames and should be covered by existing tests.
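
To make the difference concrete, here is a minimal sketch comparing the two construction styles. The column names and element types are made up for illustration, and the example assumes a Tables.jl release that provides the two-argument `Tables.Schema(names, types)` constructor and the `names`/`types` properties.

```julia
using Tables

# Hypothetical column names and element types, for illustration only.
colnames = [:a, :b]
coltypes = [Int, String]

# Direct construction: names and types are forced into type parameters.
direct = Tables.Schema{Tuple(colnames), Tuple{coltypes...}}()

# Standard constructor: Tables.jl decides how to store names and types.
standard = Tables.Schema(colnames, coltypes)

# Both expose the same information through the `names` and `types` properties.
@show direct.names standard.names
@show direct.types standard.types
```

For a small table like this one, both forms should describe the same schema; the point of going through the constructor is that Tables.jl is then free to choose a different storage strategy for very wide tables.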

bkamins

I understand this change does not need JuliaData/Tables.jl#241 (it also works on the current release of Tables.jl). Right?

bkamins · 2021-06-22T19:53:37Z

src/other/tables.jl

@@ -21,7 +21,7 @@ end
 Tables.columnindex(df::Union{AbstractDataFrame, DataFrameRow}, idx::AbstractString) =
    columnindex(df, Symbol(idx))

-Tables.schema(df::AbstractDataFrame) = Tables.Schema{Tuple(_names(df)), Tuple{[eltype(col) for col in eachcol(df)]...}}()
+Tables.schema(df::AbstractDataFrame) = Tables.Schema(_names(df), [eltype(col) for col in eachcol(df)])


`_names(df)` returns a `SubArray` if `df` is a `SubDataFrame`. Is this accepted by `Tables.Schema`?

Also, `_names(df)` for a `DataFrame` returns an internal vector (without copying). Is this safe?

In short, maybe it should be `propertynames(df)` instead (which guarantees a `Vector{Symbol}` that is always a copy)?

Nah, Tables.jl will make its own copy; it accepts any iterable and makes its own Vector{Symbol}.

Then I just "approve" :).
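
The copying behavior discussed in this thread can be checked with a small sketch. It does not use DataFrames.jl internals; the `allnames` vector is a hypothetical stand-in for an internal vector of column names, and the view plays the role of the `SubArray` mentioned above.

```julia
using Tables

# Hypothetical stand-in for a data frame's internal vector of column names.
allnames = [:a, :b, :c]

# A view over the first two names (a SubArray, not a Vector).
subnames = view(allnames, 1:2)

# The constructor accepts the view like any other iterable of names.
sch = Tables.Schema(subnames, [Int, Float64])

# Mutating the original vector afterwards should not change the schema.
allnames[1] = :z
@show sch.names
```

If the schema aliased the input, the mutation would leak into `sch.names`; under the behavior described in the thread, it should not.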

quinnj · 2021-06-22T19:56:54Z

> I understand this change does not need JuliaData/Tables.jl#241 (it works both on the current release of Tables.jl). Right?

Correct. The goal in the Tables.jl change was no user-visible change. In practice this should affect a very small % of tables. But yes, Tables.Schema(names, types) has always been the preferred Schema constructor; I think we've just supported Tables.jl in DataFrames.jl for so long that we originally did the shortcut before the constructor was standardized.
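
As a rough illustration of "no user-visible change", the following sketch queries the schema of a small data frame and of a view over it. The data frame contents are made up, and the example assumes DataFrames.jl and Tables.jl are both installed.

```julia
using DataFrames, Tables

# Hypothetical two-column data frame, for illustration only.
df = DataFrame(a = 1:2, b = ["x", "y"])

# A SubDataFrame restricted to a single column.
sub = view(df, :, [:b])

sch = Tables.schema(df)
subsch = Tables.schema(sub)

# Names and element types are available the same way for both.
@show sch.names sch.types
@show subsch.names subsch.types
```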

quinnj · 2021-06-22T19:58:55Z

Not urgent to release, but should be safe to include in a patch release at any time.

bkamins · 2021-06-22T20:11:47Z

The 1.2 release is on the way. It depends on the availability of @nalimilan (I know he is busy).

What I need merged:

- Fix float grouping #2791 (a decision by @nalimilan is needed here)
- bump version to 1.7.0 DataAPI.jl#38 (and a release), plus the follow-up Sync with DataAPI.jl 1.7 release #2788
- Clean up precompile statements #2792

Then we can make the 1.2 release. Optionally, we could include the following (they are relatively simple to implement; the only open question is the decision, but we can add them in 1.3 equally well, so they are non-blocking):

- Support adding columns to views #2794 (unfinished and will probably need discussion)
- Add a method to add/insert empty columns #2783 (a decision is needed on whether pseudo-broadcasting on 0-dimensional objects should retain their eltype; I am pretty convinced we should go this way, but the discussion is open)

bkamins approved these changes Jun 22, 2021

bkamins reviewed Jun 22, 2021

quinnj merged commit e258354 into main Jun 22, 2021

quinnj deleted the jq/dfschema branch June 22, 2021 19:58

bkamins mentioned this pull request Jun 23, 2021

allow :col => AsTable and :col => cols #2780

Merged
