CSV.read is still incredibly slow on Windows with 0.4.1 #15
Should be closed by JuliaLang/METADATA.jl#4014
(i.e. just do a
This is not resolved:
julia> f="c:\\temp\\julia1k.csv"
julia>
julia>
julia>
julia> size(f1)
julia> size(df)
I apologize to the user
@kafisatz, can you share some more details around the slowness you were seeing? Namely:
I tried to dig into this again this morning, but I'm seeing comparable parsing times between my Mac and a Windows machine on Julia 0.4.1 and latest CSV master (
Hi quinn. It is very fast now. However, CSV.csv takes far longer than readtable (DataFrames); see below.
julia> @time f1=readcsv(f);
julia> @time f2=CSV.read(f);
julia> @time dt=CSV.csv(f,rows_for_type_detect=10000)
julia> @time df=readtable(f);
hey @kafisatz, a couple of things here:
using DataFrames
using CSV
dt = CSV.csv("myfile.csv")
df = DataFrame(dt) # converts our Data.Table `dt` to a DataFrame without copying
@quinnj
I just ran a test on a 1000-row dataset.
I needed to set rows_for_type_detect to 1000, which may be slightly "unfair" compared to readcsv, which does no type detection. Still, CSV.read is far too slow:
**It takes 13 seconds instead of 0.04 seconds.**
I note that the functions were already compiled in the example below.
I was hoping that this works better now, as you indicated here:
https://groups.google.com/forum/#!searchin/julia-users/csv/julia-users/IFkPso4JUac/lNLgLoCqAwAJ
Any hints?
julia> f="T:\\temp\\julia1k.csv"
"T:\\temp\\julia1k.csv"
julia> @time f1=readcsv(f);
0.043854 seconds (239.86 k allocations: 8.536 MB)
julia> @time df=readtable(f);
0.039639 seconds (221.93 k allocations: 10.359 MB, 15.51% gc time)
julia> @time f2=CSV.read(f,rows_for_type_detect=1000);
13.760476 seconds (1.79 M allocations: 73.616 MB, 0.12% gc time)
julia> @show size(f1),size(f2),size(df)
(size(f1),size(f2),size(df)) = ((1000,77),(999,77),(999,77))
((1000,77),(999,77),(999,77))
julia> versioninfo(true)
Julia Version 0.4.1
Commit cbe1bee* (2015-11-08 10:33 UTC)
Platform Info:
System: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
WORD_SIZE: 64
Microsoft Windows [Version 6.1.7601]
uname: MSYS_NT-6.1 2.3.0(0.290/5/3) 2015-09-29 10:48 x86_64 unknown
Memory: 31.694698333740234 GB (26403.6875 MB free)
Uptime: 1.1864766877332e6 sec
Load Avg: 0.0 0.0 0.0
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz:
speed user nice sys idle irq ticks
#1 3410 MHz 4675942 0 2720907 1179080330 152865 ticks
#2 3410 MHz 609105 0 854667 1185013080 87454 ticks
#3 3410 MHz 6070357 0 9145699 1171260702 124348 ticks
#4 3410 MHz 786603 0 1347911 1184342104 18033 ticks
#5 3410 MHz 6533228 0 11563262 1168380019 145923 ticks
#6 3410 MHz 106033 0 37487 1186332833 1404 ticks
#7 3410 MHz 5059143 0 8723326 1172693743 114894 ticks
#8 3410 MHz 2913786 0 1522086 1182040247 29094 ticks
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.3
Environment:
.CLASSPATH = C:\Users\workstation\Documents\mongojdbcdriver
CLASSPATH = C:\Users\workstation\Documents\mongojdbcdriver
GROOVY_HOME = C:\Program Files (x86)\Groovy\Groovy-2.2.2
HOMEDRIVE = C:
HOMEPATH = \Users\workstation
JAVA_HOME = C:\Program Files\Java\jre8
JULIA_HOME = C:\Program Files\Juno\resources\app\julia\bin
PATHEXT = .COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC;.groovy;.gy
Package Directory: C:\Users\workstation.julia
27 required packages:
57 additional packages:
julia>
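A note on the size output above: readcsv reports 1000 rows while readtable and CSV.read report 999, which is consistent with readcsv keeping the header line as a data row. The sketch below reproduces that difference and shows one way to time a call only after it has been compiled. It is an illustration under assumptions, not the code from this report: it targets a current Julia 1.x, uses `readdlm` from the `DelimitedFiles` standard library as the present-day counterpart of `readcsv`, generates its own stand-in file (the original julia1k.csv is not available), and `warm_time` is a hypothetical helper name.

```julia
using DelimitedFiles  # standard library; readdlm stands in for the old readcsv

# Build a stand-in file (assumption: one header line plus 999 data rows, 77 columns).
path = tempname() * ".csv"
open(path, "w") do io
    println(io, join(("col$(j)" for j in 1:77), ','))
    for i in 1:999
        println(io, join((i + j for j in 1:77), ','))
    end
end

# Hypothetical helper: call `f` once so compilation is excluded, then time a second call.
function warm_time(f)
    f()
    return @elapsed f()
end

raw = readdlm(path, ',')                        # header line is kept as a data row
data, header = readdlm(path, ','; header=true)  # header line is split off

println(size(raw))   # (1000, 77)
println(size(data))  # (999, 77)
println("warm readdlm: ", warm_time(() -> readdlm(path, ',')), " s")
```

Passing any reader as a closure to `warm_time`, for example one that calls CSV.read on the same file, gives timings that exclude first-call compilation, which is the comparison the numbers above are meant to make.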