Problems with threads and LazyArrays #536

Closed
Rudi79 opened this issue Nov 21, 2019 · 3 comments

Comments

@Rudi79

Rudi79 commented Nov 21, 2019

Thanks for your great work on this package.
I discovered an odd behaviour I wanted to share:

using CSV, DataFrames
dt = DataFrame(a = rand(10000))
dt.a .== dt.a .== dt.a # works
CSV.write("test.csv", dt)

df = CSV.File("test.csv", threaded=false) |> DataFrame!
df.a .== df.a .== df.a # works
dt==df # true

df2 = CSV.File("test.csv", threaded=true) |> DataFrame!
df2.a .== df2.a # works
df2.a .== df2.a .== df2.a # does not work
df2==df # true 

Tested on Julia 1.3-rc4 and 1.3-rc5 with threading enabled, on Win10 and Linux.
The problem seems to be related to broadcasting and LazyArrays.
I already posted on discourse (https://discourse.julialang.org/t/csv-dataframes-problems-with-threads/31318)
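
For readers unfamiliar with the failing expression, here is a minimal sketch in Base Julia only (no CSV or LazyArrays). It shows what a dotted comparison chain computes on a plain Vector, and one possible workaround. The workaround rests on an assumption that this thread does not confirm: that the failure comes from broadcasting over the lazy column type, so that materializing the column with collect sidesteps it.

```julia
# Base Julia only; `a` stands in for a DataFrame column.
a = rand(10_000)

# A dotted comparison chain is evaluated as the elementwise AND
# of the pairwise comparisons.
chained = a .== a .== a
explicit = (a .== a) .& (a .== a)

# Possible workaround (assumption, not confirmed in this thread):
# materialize the column into a plain Vector before broadcasting.
# `collect` returns a Vector for any AbstractVector.
col = collect(a)
materialized = col .== col .== col
```

On a column read with threaded=true, the same idea would be to call collect(df2.a) first and broadcast over the result.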

@daschw

daschw commented Nov 28, 2019

I also ran into an issue with threads and LazyArrays (similar to #529):

using CSV, DataFrames

function write_read_and_append_dfs(n)

    df1 = DataFrame(x = rand(n))
    df2 = DataFrame(x = rand(n))

    CSV.write("df1.csv", df1)
    CSV.write("df2.csv", df2)

    df1 = CSV.read("df1.csv", copycols = true)
    df2 = CSV.read("df2.csv", copycols = true)

    append!(df1, df2)
end

This works for small n but fails for larger n. On my system, the threshold was:

julia> write_read_and_append_dfs(4482)
8964×1 DataFrame
│ Row  │ x          │
│      │ Float64    │
├──────┼────────────┤
│ 1    │ 0.834451   │
│ 2    │ 0.00230709 │
│ 3    │ 0.632514   │
│ 4    │ 0.576571   │
│ 5    │ 0.406809   │
⋮
│ 8959 │ 0.691459   │
│ 8960 │ 0.519632   │
│ 8961 │ 0.968999   │
│ 8962 │ 0.0041048  │
│ 8963 │ 0.109562   │
│ 8964 │ 0.528489   │

julia> write_read_and_append_dfs(4483)
ERROR: MethodError: no method matching resize!(::LazyArrays.ApplyArray{Float64,1,typeof(vcat),Tuple{Array{Float64,1},Array{Float64,1}}}, ::Int64)
Closest candidates are:
  resize!(::Array{T,1} where T, ::Integer) at array.jl:1017
  resize!(::BitArray{1}, ::Integer) at bitarray.jl:773
  resize!(::JSON.Parser.PushVector, ::Integer) at C:\Users\Daniel\.julia\packages\JSON\d89fA\src\pushvector.jl:30
  ...
Stacktrace:
 [1] append!(::DataFrame, ::DataFrame) at C:\Users\Daniel\.julia\packages\DataFrames\yH0f6\src\dataframe\dataframe.jl:1142
 [2] write_read_and_append_dfs(::Int64) at C:\Users\Daniel\Cloud\TU Wien\Julia\prices.jl:49
 [3] top-level scope at none:0

I found that passing threaded=false avoids this. This is probably intended, but it may still confuse users. A warning that points to the threaded keyword could help.
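
To show the mechanism behind the MethodError without CSV or LazyArrays, here is a minimal sketch. LazyVcat is a hypothetical stand-in, not the LazyArrays type. It assumes only what the error message shows: the column is an AbstractVector that lazily concatenates two arrays and has no resize! method.

```julia
# Hypothetical stand-in for a lazily concatenated column
# (not the real LazyArrays.ApplyArray).
struct LazyVcat{T} <: AbstractVector{T}
    head::Vector{T}
    tail::Vector{T}
end
Base.size(v::LazyVcat) = (length(v.head) + length(v.tail),)
function Base.getindex(v::LazyVcat, i::Int)
    n = length(v.head)
    return i <= n ? v.head[i] : v.tail[i - n]
end

lazy = LazyVcat([1.0, 2.0], [3.0, 4.0])

# resize! has no method for this type, so growing it in place fails
# with a MethodError, as in the stack trace above.
failed = try
    resize!(lazy, 6)
    false
catch err
    err isa MethodError
end

# Materializing the lazy vector gives a plain Vector, which can grow.
dense = collect(lazy)
append!(dense, [5.0, 6.0])
```

If the columns returned by the threaded read behave like this stand-in, calling collect on them before append! would be another way around the error besides threaded=false.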

@nalimilan
Member

@Rudi79 Can you file an issue in LazyArrays? The bug is probably there.

@daschw That sounds like #539.

@quinnj
Member

quinnj commented Dec 12, 2019

Closing in favor of #539

@quinnj quinnj closed this as completed Dec 12, 2019