Julia crashes with "Bus error: 10" after CSV.write call #180

Closed
crbinz opened this issue Mar 15, 2018 · 7 comments

@crbinz
crbinz commented Mar 15, 2018

test.csv contains:

A,B,C,D,E
12345,5,2018-03-10T19:48:06.0,abcdefg,hijklmn

Then:

   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.6.2 (2017-12-13 18:08 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  x86_64-apple-darwin14.5.0

julia> using CSV, DataFrames

julia> db = CSV.read("test.csv", dateformat=DateFormat("yyyy-mm-dd HH:MM:SS"), nullable = true);

julia> open("test.csv","w") do fh
       CSV.write(fh,db)
       end
Bus error: 10
julia> Pkg.status("CSV")
 - CSV                           0.2.2

Edit: the crash does not occur if I set weakrefstrings=false.

@nalimilan
Member

I can reproduce on Linux. It also happens without nullable=true.

@davidanthoff

Isn't this kind of expected if you use weakrefstrings=true and memory-mapped files? The arrays in db will reference memory locations that are directly mapped onto test.csv on disk. If you now start to change the file on disk, all those pointers to the various weak ref strings in db will point to who-knows-what.
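
To illustrate the mechanism described above, here is a minimal sketch that uses only the Mmap standard library rather than CSV itself. It assumes Julia 0.7 or later (on 0.6, as in the original report, Mmap lives in Base) and a POSIX system. The file name and contents are made up for the example. The bytes are overwritten in place with "r+" so the file keeps its length: truncating a mapped file, which is what open(path, "w") does, is the kind of change that can lead to a bus error when the mapping is touched afterwards.

```julia
using Mmap

# Hypothetical scratch file standing in for test.csv
path = tempname()
write(path, "A,B\nabc,def\n")

# Map the file read-only, roughly as an mmap-based parser might
io = open(path, "r")
bytes = Mmap.mmap(io, Vector{UInt8})
close(io)                      # the mapping stays valid after closing the handle

before = String(bytes[5:7])    # indexing a range copies the bytes: "abc"

# Overwrite three bytes in place; "r+" does not truncate the file
open(path, "r+") do fh
    seek(fh, 4)                # zero-based offset of the first data byte
    write(fh, "XYZ")
end

# The same mapped memory now shows the new file contents
after = String(bytes[5:7])
println(before, " -> ", after)

rm(path)
```

Anything that merely points into `bytes`, as a weak ref string does, sees the changed data without any warning.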

@nalimilan
Member

I'd have imagined open("test.csv","w") would have created a new copy of the file and that the OS would have kept the old file around until we close the handle. But apparently the file is modified in place.

Anyway this direct modification of the file is indeed possible and inevitably leads to corruption. We should probably avoid using both weakrefstrings=true and use_mmap=true by default: it's fine to use mmap to speed up parsing, but once it's done we shouldn't rely on the file being intact for the whole life of the data frame (and on Windows the file is locked, which isn't convenient either). I wonder what's the best default combination though: weakrefstrings=false, use_mmap=true or weakrefstrings=true, use_mmap=false? The former would have the advantage that we would return standard Vector{String} arrays, which are easier to understand for users.

See also #170 and #140.

@nalimilan
Member

Fixed by #204: the new default is equivalent to weakrefstrings=false.

@nalimilan
Member

For reference, a way to avoid this problem is to delete the mmapped file instead of writing into it. That way the OS keeps it available to the process which uses it (just making it invisible from the filesystem) and no crash happens. See JuliaData/Feather.jl#94.
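
A minimal sketch of that delete-then-write approach, again using only the Mmap standard library (Julia 0.7 or later assumed) on a POSIX system. The file name and contents are hypothetical, and this is not taken from the Feather.jl change referenced above.

```julia
using Mmap

# Hypothetical scratch file standing in for the mmapped CSV
path = tempname()
write(path, "a,b\n1,hi\n")

io = open(path, "r")
bytes = Mmap.mmap(io, Vector{UInt8})
close(io)

# Unlink the file first: the mapping keeps the old data alive for this process
rm(path)

# Writing now creates a brand-new file at the same path
write(path, "a,b\n2,bye\n")

old = String(copy(bytes))      # still the original contents
new = read(path, String)       # contents of the newly created file
println(repr(old))
println(repr(new))

rm(path)
```

This relies on POSIX unlink semantics. On Windows, where the thread notes that the mapped file is locked, the rm call would be expected to fail instead.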

@ararslan
Member

> Fixed by #204

Not entirely, it seems:

julia> open("x.csv", "w") do io
           print(io, """
               a,b
               1,"hi"
               2,"bye"
               """)
       end

julia> using CSV

julia> df = CSV.read("x.csv")
2×2 DataFrames.DataFrame
│ Row │ a     │ b      │
│     │ Int64 │ String │
├─────┼───────┼────────┤
│ 1   │ 1     │ hi     │
│ 2   │ 2     │ bye    │

julia> CSV.write("x.csv", df)
[1]    13059 bus error  julia

This is with the latest CSV release.

@ararslan
Member

Ah, mmapping by default is only disabled on Windows.
