Slower materialization Feather vs Arrow #131
Comments
Sorry for the slow response; looks like it has to do w/ our nullability check, since the non-nullable case is actually faster in Arrow.jl:

    julia> for (fn, name, k) in [(Arrow.Table, tmp_A, (:A,)), (Feather.read, tmp_F, (!, :A))]
               vec = fn(name)[k...]
               @info typeof(vec)
               for i in 1:3
                   out = Vector{Union{Float64, Missing}}(undef, length(vec))
                   @time out .= vec .+ 0.1
               end
           end
    [ Info: Arrow.Primitive{Float64, Vector{Float64}}
      0.001771 seconds (2 allocations: 160 bytes)
      0.000543 seconds (2 allocations: 160 bytes)
      0.000323 seconds (2 allocations: 160 bytes)
    [ Info: Feather.Arrow.Primitive{Float64}
      0.002012 seconds (2 allocations: 96 bytes)
      0.000795 seconds (2 allocations: 96 bytes)
      0.000607 seconds (2 allocations: 96 bytes)
quinnj added a commit that referenced this issue on Apr 14, 2021:
Fixes #131. Dip me in mustard and call me a hotdog, cuz I can't tell how/why `divrem(i - 1, 8) .+ (1, 1)` ends up being ~30% faster than `fldmod1(i, 8)`. It'd probably be worth looking into it more, but it works for now. The divrem code is @ExpandingMan 's code from his arrow/feather code.
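For readers unfamiliar with the two expressions in the commit message, here is a minimal sketch of why they are interchangeable: both map a 1-based element index to a 1-based (byte, bit) position when 8 validity flags are packed per byte. The helper names `bitpos_fldmod1`, `bitpos_divrem`, and `isvalid_sketch` are hypothetical and are not Arrow.jl's API; the lookup assumes the least-significant-bit-first packing used by the Arrow format. The sketch only shows equivalence for positive indices and makes no claim about the ~30% timing difference.

```julia
# Two ways to map a 1-based element index to a 1-based (byte, bit) position
# in a validity bitmap that packs 8 flags per byte.
bitpos_fldmod1(i::Integer) = fldmod1(i, 8)
bitpos_divrem(i::Integer) = divrem(i - 1, 8) .+ (1, 1)

# Hypothetical helper (not Arrow.jl's API): check whether element `i` is valid,
# assuming least-significant-bit-first packing within each byte.
function isvalid_sketch(bitmap::Vector{UInt8}, i::Integer)
    byte, bit = bitpos_divrem(i)
    return (bitmap[byte] >> (bit - 1)) & 0x01 == 0x01
end

# Both mappings agree for every positive index checked here.
@assert all(bitpos_fldmod1(i) == bitpos_divrem(i) for i in 1:10_000)

# Elements 1, 3, and 9 are marked valid; everything else is missing.
bitmap = UInt8[0b00000101, 0b00000001]
@assert isvalid_sketch(bitmap, 1)
@assert !isvalid_sketch(bitmap, 2)
@assert isvalid_sketch(bitmap, 9)
```

Since the two forms return the same tuples, swapping one for the other changes only how the index arithmetic is compiled, not which bit is read.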
I'm currently transitioning from Feather.jl to Arrow.jl, and I've noticed that reading and modifying a 525600x400 DataFrame, with mostly columns of Union{Missing, Float}/Union{Missing,Bool}, takes nearly twice as long.
Consider:
We get the following results:
using Julia v1.5.3, Feather v0.5.7, Arrow v1.2.4, and DataFrames v0.22.5, with similar results in Julia v1.6.0-rc1.
Is there a better way to do this with Arrow that wouldn't run slower than its predecessor?