Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance of value type construction #36384

Closed
mtfishman opened this issue Jun 22, 2020 · 13 comments
Closed

Performance of value type construction #36384

mtfishman opened this issue Jun 22, 2020 · 13 comments
Assignees
Labels
performance Must go faster

Comments

@mtfishman
Copy link
Contributor

I've already posted about this issue on discourse here, but I didn't get any responses so I thought I would try here.

I noticed the following performance discrepancy between making a value type and making the instance of a value type using a runtime value:

julia> using BenchmarkTools

julia> x = "x"
"x"

julia> f(x::String) = Val{Symbol(x)}
f (generic function with 1 method)

julia> @btime f($x)
  138.326 ns (0 allocations: 0 bytes)
Val{:x}

julia> g(x::String) = Val{Symbol(x)}()
g (generic function with 1 method)

julia> @btime g($x)
  4.037 μs (0 allocations: 0 bytes)
Val{:x}()

This is with Julia v1.4.2:

julia> versioninfo()
Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) E-2176M  CPU @ 2.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)

but I see a similar discrepancy with Julia v1.5 (though it is a bit faster in v1.5, which is great!).

Is there a fundamental reason for this discrepancy? We are using a custom value type to do “runtime dispatch” (take a runtime value like a string, turn it into a value type, and then dispatch on that type). Originally we wanted to dispatch on the instances of the type. However, the 4 microsecond constructor overhead creates a bottleneck in the function (once the type is created, the dispatch and function that are being overloaded are very fast, and we are dispatching on multiple value types so the overhead adds up). We can dispatch on the type instead of the instance, but we were wondering if the constructor can (in principle) be made fast.

Sorry if this topic came up before, I tried searching for it but didn’t find anything addressing this particular question.

@KristofferC
Copy link
Member

KristofferC commented Jun 22, 2020

Dup of #21730 (and #29887).

Workaround is to use f(x::String) = Val{Symbol(x)}.instance (I think that should work at least).

@mtfishman
Copy link
Contributor Author

That's really helpful, thanks for the quick response. We will try that out. Feel free to close in favor of the previous issues.

@vtjnash
Copy link
Member

vtjnash commented Jun 22, 2020

We don't really want people depending on implementation details of that sort though. What's interesting here is that something is blocking inlining:

julia> @code_typed g("")
CodeInfo(
    @ REPL[11]:1 within `g'
   ┌ @ boot.jl:456 within `Symbol'
1 ─│ %1 = $(Expr(:foreigncall, :(:jl_string_ptr), Ptr{UInt8}, svec(Any), 0, :(:ccall), Core.Argument(2)))::Ptr{UInt8}
│  │ %2 = Core.sizeof(x)::Int64
│  │ %3 = $(Expr(:foreigncall, :(:jl_symbol_n), Ref{Symbol}, svec(Ptr{UInt8}, Int64), 0, :(:ccall), :(%1), :(%2), :(%2), :(%1)))::Symbol
│  └
│   %4 = Core.apply_type(Main.Val, %3)::Type{Val{_A}} where _A
│   %5 = (%4)()::Val{_A} where _A
└──      return %5
) => Val{_A} where _A

I think it's the unbound TypeVar in the representation of the method signature after intersection. This hack does much better (though we then have to write the error checking into this method ourself):

julia> @eval struct Val2{x}; (f::Type{<:Val2})() = (Base.isconcretetype(f) ? $(Expr(:new, :f)) : throw(UndefVarError(:x))); end

julia> Val2{T}() where T
ERROR: UndefVarError: x not defined
Stacktrace:
 [1] Val2{T}() at ./REPL[2]:1
 [2] top-level scope at REPL[3]:1

julia> @noinline g(x::String) = Val2{Symbol(x)}()
g (generic function with 1 method)

julia> @code_typed g("")
CodeInfo(
    @ REPL[4]:1 within `g'
   ┌ @ boot.jl:456 within `Symbol'
1 ─│ %1 = $(Expr(:foreigncall, :(:jl_string_ptr), Ptr{UInt8}, svec(Any), 0, :(:ccall), Core.Argument(2)))::Ptr{UInt8}
│  │ %2 = Core.sizeof(x)::Int64
│  │ %3 = $(Expr(:foreigncall, :(:jl_symbol_n), Ref{Symbol}, svec(Ptr{UInt8}, Int64), 0, :(:ccall), :(%1), :(%2), :(%2), :(%1)))::Symbol
│  └
│   %4 = Core.apply_type(Main.Val2, %3)::Type{Val2{_A}} where _A
│  ┌ @ REPL[2]:1 within `Val2'
│  │┌ @ reflection.jl:530 within `isconcretetype'
│  ││┌ @ Base.jl:28 within `getproperty'
│  │││ %5 = Base.getfield(%4, :isconcretetype)::Any
│  │└└
└──│      goto #3 if not %5
2 ─│ %7 = %new(%4)::Val2{_A} where _A
└──│      goto #4
   │┌ @ boot.jl:262 within `UndefVarError'
3 ─││ %9 = %new(Core.UndefVarError, :x)::UndefVarError
│  │└
│  │      Main.throw(%9)::Union{}
└──│      unreachable
   └
4return %7
) => Val2{_A} where _A

julia> @btime g("1") # with Val2
  175.574 ns (0 allocations: 0 bytes)
Val2{Symbol("1")}()

julia> @btime g("1") # with .instance
  172.044 ns (0 allocations: 0 bytes)
Val{Symbol("1")}()

julia> @btime g("1") # with Val
  12.810 μs (0 allocations: 0 bytes)
Val{Symbol("1")}()

@mtfishman
Copy link
Contributor Author

Great, we are happy to use that approach and wait for a native Julia fix.

@JeffBezanson JeffBezanson added the performance Must go faster label Jun 22, 2020
@JeffBezanson
Copy link
Member

This is interesting --- I think we could change constructor lowering to that, since new will already error if given a non-concrete type.

@vtjnash
Copy link
Member

vtjnash commented Jun 22, 2020

Ah, that makes it much easier. I didn't realize it did that, so I was trying to manually throw the same error. I suppose I should have tested it, haha.

@vtjnash
Copy link
Member

vtjnash commented Jun 22, 2020

More drastically, I think it'd be interesting for us to try to bind any function name as a local variable, so that we could detect any code with this common pattern during lowering:

Array{T,1}(::UndefInitializer, m::Int) where {T} =
    ccall(:jl_alloc_array_1d, Array{T,1}, (Any, Int), Array{T,1}, m)

And statically rewrite it to use the argument value, instead of preserving that static param and the apply_type calls (though that might complicate handling the isdefined check on usage and recursive closures, so ymmv).

But for starters, handling it as a pattern for new seems like it could be quite good already.

@mtfishman
Copy link
Contributor Author

I'm not able to follow all of the details, but is the conclusion that the best approach for defining a faster custom value type right now is:

@eval struct Val2{x}
  (f::Type{<:Val2})() = $(Expr(:new, :f))
end

while we wait for a native fix?

@JeffBezanson
Copy link
Member

Yes that should do it.

@JeffBezanson JeffBezanson self-assigned this Jul 1, 2020
@JeffBezanson
Copy link
Member

On master this takes 105ns and 235ns, so it seems to be largely fixed.

@vtjnash
Copy link
Member

vtjnash commented Nov 5, 2020

Probably #35384 (and thus by extension #35983), though that lowering change would still simplify the IR by removing the extra dispatch step, and make this 40% faster.

@JeffBezanson
Copy link
Member

I'll look into the lowering change. I think the unknown static parameter will still prevent inlining (the signature can be written without static parameters, but the user might not write it that way of course), but we could fix that (at least in the case where the static parameters are not used in the body).

@vtjnash
Copy link
Member

vtjnash commented Oct 19, 2022

addressed by #44664

@vtjnash vtjnash closed this as completed Oct 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

No branches or pull requests

4 participants