Performance of Union{T1, T2} #22097
Someone correct me if I'm wrong, but I believe I'm actually reporting the same thing @johnmyleswhite did in #11699, albeit that issue was a tad more "nullable" focused, whereas I believe the actual problem just comes down to plain unions, nullable or not. In my studies, I also came across #10805 and wondered if that would potentially help the Union codegen situation. I'll leave @yuyichao to comment, but it seems like the core idea addressed the issue here.
Another update here: I've found a work-around for the inlining issue. It turns out that putting in my own explicit isa branch does it, i.e. going from this

    function get_then_set2(A)
        @simd for i = 1:10
            @inbounds A[i] = g2(i)
        end
        return A
    end

to this

    function get_then_set2(A)
        @simd for i = 1:10
            val = g2(i)
            if val isa Void
                @inbounds A[i] = g2(i)
            else
                @inbounds A[i] = g2(i)
            end
        end
        return A
    end

I do, however, seem to be running into a tricky boxed value issue; I can't reproduce it using the examples I've given here, but I'll try to find a way to reproduce it easily. Basically it involves inlining a function 2 levels deep that returns a
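For readers on a current Julia (1.x), the explicit-branch pattern can be sketched as below. This is only an illustration under stated assumptions: `Void` is spelled `Nothing` in current Julia, the definition of `g2` is a hypothetical stand-in (the thread never shows the real one), and `get_then_set_branch!` is a name invented here.

```julia
# Hypothetical stand-in for `g2`; the real definition is not shown in the thread.
# Returns `nothing` for multiples of 3 and the integer itself otherwise.
g2(i::Int) = i % 3 == 0 ? nothing : i

# Store each result after an explicit `isa` check, so each branch sees a concrete type.
function get_then_set_branch!(A::Vector{Union{Int, Nothing}})
    for i in eachindex(A)
        val = g2(i)
        if val isa Nothing
            # In this branch `val` is known to be `nothing`.
            @inbounds A[i] = val
        else
            # In this branch `val` has been narrowed to `Int`.
            @inbounds A[i] = val
        end
    end
    return A
end

A = get_then_set_branch!(Vector{Union{Int, Nothing}}(undef, 10))
println(A)
```

Both branches perform the same store; the point of the sketch is only that the branch condition gives the compiler a concrete type to work with on each side.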
If anyone feels brave enough, you should be able to reproduce the allocation I'm seeing by doing the following (probably works on 0.6 or current master, I'm personally on the

    Pkg.checkout("CSV", "jq/gangy")
    Pkg.checkout("DataStreams", "jq/gangy")
    Pkg.checkout("DataFrames", "jq/gangy")
    Pkg.checkout("WeakRefStrings", "jq/gangy")
    Pkg.checkout("Nulls")

Then I usually run the following code to generate a dummy file

    open("randoms_Int64.csv", "w") do f
        for j = 1:1_000_000
            write(f, string(1001))
            write(f, "\n")
        end
    end

Then I run this to see the roughly one allocation per line

    @time CSV.read("/Users/jacobquinn/Downloads/randoms_$(T).csv"; rows=999999)
    # can run the below to see no allocations if we treat the column as `Vector{Int}` instead of `Vector{Union{Int, Null}}`
    # @time CSV.read("/Users/jacobquinn/Downloads/randoms_$(T).csv"; nullable=false, rows=999999)

And finally the code I'm using to inspect the generated code

    @time source = CSV.Source("/Users/jacobquinn/Downloads/randoms_Int64.csv"; )
    sink = Si = DataFrames.DataFrame
    transforms = Dict{Int,Function}()
    append = false
    args = kwargs = ()
    source_schema = DataStreams.Data.schema(source)
    sink_schema, transforms2 = DataStreams.Data.transform(source_schema, transforms, true)
    sinkstreamtype = DataStreams.Data.Field
    sink = Si(sink_schema, sinkstreamtype, append, args...; kwargs...)
    columns = []
    filter = x->true
    @code_warntype DataStreams.Data.stream!(source, sinkstreamtype, sink, source_schema, sink_schema, transforms2, filter, columns)
An additional note on performance that I believe is the specific purpose of

    julia> @benchmark Vector{Int}(1000000)
    BenchmarkTools.Trial:
      memory estimate:  7.63 MiB
      allocs estimate:  2
      --------------
      minimum time:     66.183 μs (88.69% GC)
      median time:      200.744 μs (59.07% GC)
      mean time:        2.601 ms (96.94% GC)
      maximum time:     17.754 ms (99.03% GC)
      --------------
      samples:          322
      evals/sample:     6

    julia> @benchmark Vector{Union{Nulls.Null,Int}}(1000000)
    BenchmarkTools.Trial:
      memory estimate:  7.63 MiB
      allocs estimate:  2
      --------------
      minimum time:     979.422 μs (0.00% GC)
      median time:      1.208 ms (0.00% GC)
      mean time:        2.051 ms (42.53% GC)
      maximum time:     5.859 ms (68.86% GC)
      --------------
      samples:          2433
      evals/sample:     1
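To repeat the comparison above on a current Julia, a rough sketch follows. It assumes Julia 1.x, where the uninitialized constructor takes `undef`, and it uses `Missing` as a stand-in for `Nulls.Null`. The single-shot `@elapsed` timings are only a crude substitute for `@benchmark`, and the numbers will not match the ones quoted above.

```julia
n = 1_000_000

# Modern spelling of the two constructors benchmarked above.
# `Missing` stands in for `Nulls.Null` (an assumption of this sketch).
plain = Vector{Int}(undef, n)
unioned = Vector{Union{Missing, Int}}(undef, n)

# Crude single-shot timings; BenchmarkTools gives far more reliable numbers.
t_plain = @elapsed Vector{Int}(undef, n)
t_union = @elapsed Vector{Union{Missing, Int}}(undef, n)

println("Vector{Int}:                 ", t_plain, " s")
println("Vector{Union{Missing, Int}}: ", t_union, " s")
```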
unlike codegen, only bitstypes (!isptr) fields are permitted in the union and the offset count starts from 0 instead of 1 but otherwise the tindex counter is compatible
Alrighty, an update from everyone's favorite Union pest: I took a journey down the dark & scary path of inspecting walls of LLVM code and our own codegen.cpp, and I think I've got a better handle on the specific performance problems I'm seeing. To summarize:
Really interesting stuff. FYI I get a slight performance improvement from

    function get_then_set2(A)
        @simd for i = 1:10
            val = g2(i)
            if val isa Void
                val = val::Void
                @inbounds A[i] = val
            else
                val = val::Int
                @inbounds A[i] = val
            end
        end
        return A
    end

I suspect that getting vectorization will be quite a challenge, but improving the efficiency even without it would be a great thing.
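As a small illustration of what the `val::Int` assertions in that version do, here is a sketch for current Julia (1.x, with `Nothing` in place of `Void`). The names `maybe_int` and `sum_present` are hypothetical and exist only for this example. A type assertion returns the value unchanged when the type matches and throws a `TypeError` otherwise.

```julia
# Hypothetical producer of a small Union, standing in for `g2` from the thread.
maybe_int(i::Int) = isodd(i) ? i : nothing

# Sum only the values that are present, asserting the narrowed type explicitly.
function sum_present(n::Int)
    total = 0
    for i in 1:n
        val = maybe_int(i)
        if val isa Nothing
            continue
        end
        # `val::Int` yields `val` itself here; it would throw if `val` were not an Int.
        total += val::Int
    end
    return total
end

println(sum_present(5))
```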
Discussion has been scattered across various repos, mostly around the best and/or most performant representation of missingness (null values), but I wanted to open up the discussion here specifically to talk about efforts to improve both code generation & memory layout when Union{T1, T2} types are involved, and more specifically when T1 and T2 are both isbits (I'm not sure if that's a hard requirement, but it at least would seem to be important for the memory layout piece).

I've put together a few examples that are (hopefully) representative of some core operations that we'd like to improve.

(*Note: I've run these examples against Julia 0.5, 0.6, and @vtjnash's jn/union-bits-layout branch w/o significant difference)

I've aimed for these examples to be somewhat representative of actual code uses I anticipate (e.g. in CSV.jl, we're interested in calling a parse(...)::Union{T, Null} and subsequently setting the return value in a Vector{Union{T, Null}} index).

Inspecting the generated code in both @code_warntype and @code_llvm, it seems to me the main issues the Union case runs into are:

unless (Core.isa)(a::Union{Int64,Void},Void)::Any goto 14 at every call point for a Union
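For anyone who wants to see how a Union return type shows up during inspection, the following sketch asks inference directly. It assumes current Julia (1.x), where `Void` is `Nothing`. The function `maybe_even` is hypothetical, and `Base.return_types` is an unexported helper, so treat its use here as a convenience rather than a stable API.

```julia
# Hypothetical function whose return type is a two-member Union.
maybe_even(i::Int) = iseven(i) ? i : nothing

# Ask inference what it concludes for an `Int` argument.
inferred = Base.return_types(maybe_even, (Int,))
println(inferred)
```

At the REPL, `@code_warntype maybe_even(1)` and `@code_llvm maybe_even(1)` show the corresponding typed and LLVM code; in a script they need `using InteractiveUtils` first.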
So my main questions revolve around how we can address these performance issues and, assuming we succeed in doing so, how I can improve my mental model as a user/dev in terms of how to handle code that may involve Unions.

Also happy to help any way I can; providing or refining examples, diving into /src, though I'm afraid I may be a bit less efficient than others, or helping write tests. Cheers!
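The parse-then-store pattern described above (a parse call returning a small Union whose result is written into a Union-typed vector) might look like this on a current Julia. This is a sketch under assumptions: it uses Base's `tryparse` and `Missing` rather than the CSV.jl internals and `Nulls.Null` from the thread, and `parse_column` is a hypothetical name.

```julia
# Parse each field, mapping empty or unparseable input to `missing`.
function parse_column(fields::Vector{String})
    col = Vector{Union{Int, Missing}}(undef, length(fields))
    for (i, s) in enumerate(fields)
        parsed = tryparse(Int, s)   # returns an Int or `nothing`
        col[i] = parsed === nothing ? missing : parsed
    end
    return col
end

col = parse_column(["1001", "", "42"])
println(col)
```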