This is specifically an issue when the transforms dictionary includes functions whose return type cannot be inferred for String inputs. (DataStreams.transform attempts to infer a return type for the function and gets back Union{}.)
For example, say we want to floor the values in the second column:
julia> CSV.read(IOBuffer("a,b,c\n1.1,2.2,3.3\n4.4,5.5,6.6"); nullable=false, transforms=Dict{Int, Function}(2 => floor))
2×3 DataFrames.DataFrame
│ Row │ a │ b │ c │
├─────┼─────┼─────┼─────┤
│ 1 │ 1.1 │ 2.0 │ 3.3 │
│ 2 │ 4.4 │ 5.0 │ 6.6 │
When there are zero rows, we get:
julia> CSV.read(IOBuffer("a,b,c"); nullable=false, transforms=Dict{Int, Function}(2 => floor))
ERROR: MethodError: Cannot `convert` an object of type Type{Union{}} to an object of type DataType
This may have arisen from a call to the constructor DataType(...),
since type constructors fall back to convert methods.
in transform(::DataStreams.Data.Schema{true}, ::Dict{Int64,Function}) at /Users/gem/.julia/v0.5/DataStreams/src/DataStreams.jl:77
in #stream!#5(::Array{Any,1}, ::Function, ::CSV.Source, ::Type{DataFrames.DataFrame}, ::Bool, ::Dict{Int64,Function}) at /Users/gem/.julia/v0.5/DataStreams/src/DataStreams.jl:149
in #read#30(::Bool, ::Dict{Int64,Function}, ::Array{Any,1}, ::Function, ::Base.AbstractIOBuffer{Array{UInt8,1}}, ::Type{DataFrames.DataFrame}) at /Users/gem/.julia/v0.5/CSV/src/Source.jl:302
in (::CSV.#kw##read)(::Array{Any,1}, ::CSV.#read, ::Base.AbstractIOBuffer{Array{UInt8,1}}, ::Type{DataFrames.DataFrame}) at ./<missing>:0 (repeats 2 times)
Note that this occurs when there is a header row but no data (or when the buffer is empty but a header is specified).
Currently the user can work around this problem by using a duck-typed wrapper function:
julia> CSV.read(IOBuffer("a,b,c"); nullable=false, transforms=Dict{Int, Function}(2 => x->floor(x)))
0×3 DataFrames.DataFrame
...but that seems like a bit of a kludge.
One way to solve this would be to have CSV.read ignore the transforms dictionary when the buffer is empty (I'm prepping a PR that will do just that; it just needs tests). Another approach, I suppose, would be to make DataStreams.transform a little more resistant to having Union{} pop up.
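To illustrate the second approach, here is a rough sketch of what a Union{}-tolerant type lookup might look like. The helper name `transformed_eltype` is hypothetical and is not part of DataStreams or CSV. It also assumes a recent Julia where `Base.promote_op` is available. That function is internal and inference-dependent, so its results can differ between Julia versions, and this is not how DataStreams.transform is actually written.

```julia
# Hypothetical helper, not part of DataStreams or CSV: pick the element type
# a column would have after applying `f` to a column of element type `T`.
function transformed_eltype(f, T::Type)
    # Ask type inference for a guess at the return type of f(::T).
    # Base.promote_op is internal; its answer may change between Julia versions.
    R = Base.promote_op(f, T)
    # Union{} means inference found nothing that `f` could return for `T`
    # (for example, no applicable method). Keep the original type in that case
    # instead of passing Union{} on to the schema.
    return R === Union{} ? T : R
end

transformed_eltype(floor, Float64)  # uses the inferred type
transformed_eltype(floor, String)   # never returns Union{}; falls back to String
```

With a fallback like this, a zero-row source would keep its original column type rather than raising the MethodError above. Whether keeping the original type is the right default is a design question for the package.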