
CSV.read on empty buffer fails when transforms is specified #74

@spurll


This happens specifically when the transforms dictionary includes functions that have no known return type for String inputs. (DataStreams.transform attempts to infer the function's return type and gets back Union{}.)
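The inference failure can be reproduced without CSV.jl at all. The sketch below assumes Julia 1.x and uses `Base.promote_op`, an internal Base function that asks type inference for a call's return type. DataStreams may have used a different inference call. It also uses a hypothetical Real-only function, `floor_real`, in place of `floor` itself, because the set of `floor` methods varies between Julia versions.

```julia
# Hypothetical stand-in for `floor`: it only has a method for Real inputs,
# so no method accepts a String argument.
floor_real(x::Real) = floor(x)

# Ask type inference for the return type of floor_real on each input type.
T_float  = Base.promote_op(floor_real, Float64)  # Float64
T_string = Base.promote_op(floor_real, String)   # Union{}: no method matches

println(T_float)
println(T_string)

# Union{} is the bottom type, not a DataType, so code that expects the
# inferred type to be a DataType cannot convert it.
println(T_string isa DataType)  # false
```

This matches the `Cannot convert an object of type Type{Union{}} to an object of type DataType` error reported in this issue.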

For example, say we want to floor the values in the second column:

julia> CSV.read(IOBuffer("a,b,c\n1.1,2.2,3.3\n4.4,5.5,6.6"); nullable=false, transforms=Dict{Int, Function}(2 => floor))
2×3 DataFrames.DataFrame
│ Row │ a   │ b   │ c   │
├─────┼─────┼─────┼─────┤
│ 1   │ 1.1 │ 2.0 │ 3.3 │
│ 2   │ 4.4 │ 5.0 │ 6.6 │

When there are zero rows, we get:

julia> CSV.read(IOBuffer("a,b,c"); nullable=false, transforms=Dict{Int, Function}(2 => floor))
ERROR: MethodError: Cannot `convert` an object of type Type{Union{}} to an object of type DataType
This may have arisen from a call to the constructor DataType(...),
since type constructors fall back to convert methods.
 in transform(::DataStreams.Data.Schema{true}, ::Dict{Int64,Function}) at /Users/gem/.julia/v0.5/DataStreams/src/DataStreams.jl:77
 in #stream!#5(::Array{Any,1}, ::Function, ::CSV.Source, ::Type{DataFrames.DataFrame}, ::Bool, ::Dict{Int64,Function}) at /Users/gem/.julia/v0.5/DataStreams/src/DataStreams.jl:149
 in #read#30(::Bool, ::Dict{Int64,Function}, ::Array{Any,1}, ::Function, ::Base.AbstractIOBuffer{Array{UInt8,1}}, ::Type{DataFrames.DataFrame}) at /Users/gem/.julia/v0.5/CSV/src/Source.jl:302
 in (::CSV.#kw##read)(::Array{Any,1}, ::CSV.#read, ::Base.AbstractIOBuffer{Array{UInt8,1}}, ::Type{DataFrames.DataFrame}) at ./<missing>:0 (repeats 2 times)

Note that this occurs when there is a header row but no data (or when the buffer is empty but a header is specified).

Currently the user can work around this problem by using a duck-typed wrapper function:

julia> CSV.read(IOBuffer("a,b,c"); nullable=false, transforms=Dict{Int, Function}(2 => x->floor(x)))
0×3 DataFrames.DataFrame

...but that seems like a bit of a kludge.

One way to solve this would be to have CSV.read ignore the transforms dictionary when the buffer is empty (I'm preparing a PR that does just that; it only needs tests). Another approach, I suppose, would be to make DataStreams.transform a little more resilient to Union{} popping up.
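The second approach could look roughly like this. It is a minimal sketch under stated assumptions. `transformed_eltype`, `apply_transform`, and `floor_real` are hypothetical helpers, not part of DataStreams or CSV.jl, and the code assumes Julia 1.x with `Base.promote_op` (an internal Base function) as the inference query. When inference returns Union{}, it falls back to `Any` instead of failing.

```julia
# Hypothetical helpers (not DataStreams/CSV.jl API) sketching a transform
# that tolerates Union{} from type inference.

# Infer the output element type of `f` applied to values of type `T`.
# Union{} means inference found no way for f(::T) to return normally
# (e.g. no matching method), so fall back to Any instead of erroring.
function transformed_eltype(f, T::Type)
    R = Base.promote_op(f, T)
    return R === Union{} ? Any : R
end

# Apply `f` elementwise to a column, allocating the output with the
# inferred element type. For an empty column, `f` is never called.
function apply_transform(f, col::AbstractVector)
    out = Vector{transformed_eltype(f, eltype(col))}(undef, length(col))
    for (j, x) in enumerate(col)
        out[j] = f(x)
    end
    return out
end

floor_real(x::Real) = floor(x)  # hypothetical Real-only function

println(apply_transform(floor_real, [2.2, 5.5]))           # [2.0, 5.0]
println(eltype(apply_transform(floor_real, String[])))     # Any
```

A non-empty String column would still raise a MethodError when `f` is actually called, which is the desired behavior. Only the zero-row case, where `f` never runs, stops failing during type inference.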
