Compiler support for optimizing PersistentDict #51993

Keno · 2023-11-02T09:49:07Z

This is part of the work to address #51352 by attempting to allow the compiler to perform SRAO on persistent data structures like PersistentDict as if they were regular immutable data structures. These sorts of data structures have very complicated internals (with lots of mutation, memory sharing, etc.), but a relatively simple interface. As such, it is unlikely that our compiler will have sufficient power to optimize this interface by analyzing the implementation.

We thus need to come up with some other mechanism that gives the compiler license to perform the requisite optimization. One way would be to just hardcode PersistentDict into the compiler, optimizing it like any of the other builtin datatypes. However, this is of course very unsatisfying. At the other end of the spectrum would be something like a generic rewrite rule system (e-graphs anyone?) that would let the PersistentDict implementation declare its interface to the compiler and the compiler would use this for optimization (in a perfect world, the actual rewrite would then be checked using some sort of formal methods). I think that would be interesting, but we're very far from even being able to design something like that (at least in Base - experiments with external AbstractInterpreters in this direction are encouraged).

This PR tries to come up with a reasonable middle ground, where the compiler gets some knowledge of the protocol hardcoded without having to know about the implementation details of the data structure.

The basic ideas is that Core provides some magic generic functions that implementations can extend. Semantically, they are not special. They dispatch as usual, and implementations are expected to work properly even in the absence of any compiler optimizations.

However, the compiler is semantically permitted to perform structural optimization using these magic generic functions. In the concrete case, this PR introduces the KeyValue interface which consists of two generic functions, get and set. The core optimization is that the compiler is allowed to rewrite any occurrence of get(set(x, k, v), k) into v without additional legality checks. In particular, the compiler performs no type checks, conversions, etc. The higher level implementation code is expected to do all that.

This approach closely matches the general direction we've been taking in external AbstractInterpreters for embedding additional semantics and optimization opportunities into Julia code (although we generally use methods there, rather than full generic functions), so I think we have some evidence that this sort of approach works reasonably well.

Nevertheless, this is certainly an experiment and the interface is explicitly declared unstable.

Current Status

This is fully working and implemented, but the optimization currently bails on anything but the simplest cases. Filling all those cases in is not particularly hard, but should be done along with a more invasive refactoring of SROA, so we should figure out the general direction here first and then we can finish all that up in a follow-up cleanup.

Obligatory benchmark

Before:

julia> using BenchmarkTools

julia> function foo()
           a = Base.PersistentDict(:a => 1)
           return a[:a]
       end
foo (generic function with 1 method)

julia> @benchmark foo()
BenchmarkTools.Trial: 10000 samples with 993 evaluations.
 Range (min … max):  32.940 ns …  28.754 μs  ┊ GC (min … max):  0.00% … 99.76%
 Time  (median):     49.647 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):   57.519 ns ± 333.275 ns  ┊ GC (mean ± σ):  10.81% ±  2.22%

        ▃█▅               ▁▃▅▅▃▁                ▁▃▂   ▂
  ▁▂▄▃▅▇███▇▃▁▂▁▁▁▁▁▁▁▁▂▂▅██████▅▂▁▁▁▁▁▁▁▁▁▁▂▃▃▇███▇▆███▆▄▃▃▂▂ ▃
  32.9 ns         Histogram: frequency by time         68.6 ns <

 Memory estimate: 128 bytes, allocs estimate: 4.

julia> @code_typed foo()
CodeInfo(
1 ─ %1  = invoke Vector{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}}(Base.HashArrayMappedTries.undef::UndefInitializer, 1::Int64)::Vector{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}}
│   %2  = %new(Base.HashArrayMappedTries.HAMT{Symbol, Int64}, %1, 0x00000000)::Base.HashArrayMappedTries.HAMT{Symbol, Int64}
│   %3  = %new(Base.HashArrayMappedTries.Leaf{Symbol, Int64}, :a, 1)::Base.HashArrayMappedTries.Leaf{Symbol, Int64}
│   %4  = Base.getfield(%2, :data)::Vector{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}}
│   %5  = $(Expr(:boundscheck, true))::Bool
└──       goto #5 if not %5
2 ─ %7  = Base.sub_int(1, 1)::Int64
│   %8  = Base.bitcast(UInt64, %7)::UInt64
│   %9  = Base.getfield(%4, :size)::Tuple{Int64}
│   %10 = $(Expr(:boundscheck, true))::Bool
│   %11 = Base.getfield(%9, 1, %10)::Int64
│   %12 = Base.bitcast(UInt64, %11)::UInt64
│   %13 = Base.ult_int(%8, %12)::Bool
└──       goto #4 if not %13
3 ─       goto #5
4 ─ %16 = Core.tuple(1)::Tuple{Int64}
│         invoke Base.throw_boundserror(%4::Vector{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}}, %16::Tuple{Int64})::Union{}
└──       unreachable
5 ┄ %19 = Base.getfield(%4, :ref)::MemoryRef{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}}
│   %20 = Base.memoryref(%19, 1, false)::MemoryRef{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}}
│         Base.memoryrefset!(%20, %3, :not_atomic, false)::MemoryRef{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}}
└──       goto #6
6 ─ %23 = Base.getfield(%2, :bitmap)::UInt32
│   %24 = Base.or_int(%23, 0x00010000)::UInt32
│         Base.setfield!(%2, :bitmap, %24)::UInt32
└──       goto #7
7 ─ %27 = %new(Base.PersistentDict{Symbol, Int64}, %2)::Base.PersistentDict{Symbol, Int64}
└──       goto #8
8 ─ %29 = invoke Base.getindex(%27::Base.PersistentDict{Symbol, Int64}, :a::Symbol)::Int64
└──       return %29

After:

julia> using BenchmarkTools

julia> function foo()
           a = Base.PersistentDict(:a => 1)
           return a[:a]
       end
foo (generic function with 1 method)

julia> @benchmark foo()
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  2.459 ns … 11.320 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.460 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.469 ns ±  0.183 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂    █                                              ▁    █ ▂
  █▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁█ █
  2.46 ns      Histogram: log(frequency) by time     2.47 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @code_typed foo()
CodeInfo(
1 ─     return 1

JeffBezanson · 2023-11-02T18:29:44Z

LGTM

vchuravy · 2023-11-02T18:35:41Z

this is really cool. A little bit scary but I prefer it over moving the definition into the compiler wholesale.

base/optimized_generics.jl

topolarity · 2023-11-07T16:28:51Z

What are the consequences of actions that break interface assumptions?

julia> dict = Base.PersistentDict(0=>false)
Base.PersistentDict{Int64, Bool} with 1 entry:
  0 => 0

julia> dict.trie.data[1] = Base.HashArrayMappedTries.Leaf{Int64, Bool}(0, true)
Base.HashArrayMappedTries.Leaf{Int64, Bool}(0, true)

julia> dict
Base.PersistentDict{Int64, Bool} with 1 entry:
  0 => 1

Obviously a user "shouldn't" do this, but would this now trigger UB?

Keno · 2023-11-09T04:34:59Z

Obviously a user "shouldn't" do this, but would this now trigger UB?

Yes, any misuse is undefined behavior. There are some basic sanity checks that avoid applying the optimization if you things are obviously wrong, but we can't really say anything else.

Keno · 2023-11-09T09:09:10Z

I've given this some further thought, and I don't think the callback design I have here is all that good. I'm planning to rejigger this with an optional return for the getter, having it return Union{Nothing, Tuple{V}}. That currently has some performance implications, because in general a tuple return with arbitrary values is a bad idea (there's a reason we turn this into a struct all over the compiler), but we can fix that with a better ABI, which I'll put up as a separate PR.

topolarity · 2023-11-09T13:32:14Z

base/compiler/ssair/passes.jl

+    #length(stmt.args) == 4 || return
+
+    collection = stmt.args[end-1]
+    key = stmt.args[end]


I think we need some kind of a mutability check on key, since PersistentDict allows mutable keys but does not behave very sensibly for them.

Can you expand what you mean by doesn't behave sensible? Or open a separate issue?

The main problem for this optimization is:

julia> x = Int64[0]; julia> d = Base.PersistentDict(x => false, [1] => true); julia> x[1] = 1 julia> d[x] true

We could make it an IdDict equivalent.

That still leaves types like IdDict out in the cold, since it is an immutable struct, that references mutable contents to compute the hash. There is isidentityfree that solves that issue though, if desired.

aviatesk · 2023-11-16T12:34:20Z

base/compiler/ssair/passes.jl

+                             leaves::Vector{Any}, 𝕃ₒ::AbstractLattice)
+    # For every leaf, the lifted value
+    lifted_leaves = LiftedLeaves()
+    maybe_undef = false


Suggested change

maybe_undef = false

aviatesk · 2023-11-16T12:34:55Z

base/compiler/ssair/passes.jl

+    lifted_result = lift_leaves_keyvalue(compact, key, leaves, 𝕃ₒ)
+    lifted_result === nothing && return
+    lifted_leaves, any_undef = lifted_result


Suggested change

lifted_result = lift_leaves_keyvalue(compact, key, leaves, 𝕃ₒ)

lifted_result === nothing && return

lifted_leaves, any_undef = lifted_result

lifted_leaves = lift_leaves_keyvalue(compact, key, leaves, 𝕃ₒ)

lifted_leaves === nothing && return

aviatesk · 2023-11-16T12:43:57Z

I don't think the callback design I have here is all that good. I'm planning to rejigger this with an optional return for the getter, having it return Union{Nothing, Tuple{V}}

What is the main reason for this? Is it because it makes debugging difficult when users are developing code to utilize this interface?
I'm a bit concerned about the performance impact of @noinlines that are necessary for SROA to recognize these interface primitives in cases when SROA can't be performed.

aviatesk · 2023-11-16T12:44:15Z

Rebased against the latest master.

Keno · 2023-11-18T08:19:21Z

What is the main reason for this?

The callbacks are a bit awkward, since it's not clear when to infer them, if you want to optimize them.

Keno · 2023-11-20T19:06:26Z

Review addressed, rebased, and updated in line with what I said I would do above. Needs #52193 to be sound, but other than that, I don't see much reason not to try this. The one remaining concern is that because of the @noinline the performance is slightly worse for loosely typed ScopedValues, requiring an extra allocation. However, as indicated above, that can definitely be addressed with some codegen ABI improvements. That said, I'm not planning on finishing those up immediately unless it becomes a problem - I think there's bigger fish to fry elsewhere.

As discussed in #51352, this gives `EnterNode` the ability to set (and restore on leave or catch edge) jl_current_task->scope. Manual modifications of the task field after the task has started are considered undefined behavior. In addition, we gain a new intrinsic to access current_task->scope and both inference and the optimizer will forward scopes from EnterNodes to this intrinsic (non-interprocedurally). Together with #51993 this is sufficient to fully optimize ScopedValues (non-interprocedurally at least).

@benchmark

This is part of the work to address #51352 by attempting to allow the compiler to perform SRAO on persistent data structures like `PersistentDict` as if they were regular immutable data structures. These sorts of data structures have very complicated internals (with lots of mutation, memory sharing, etc.), but a relatively simple interface. As such, it is unlikely that our compiler will have sufficient power to optimize this interface by analyzing the implementation. We thus need to come up with some other mechanism that gives the compiler license to perform the requisite optimization. One way would be to just hardcode `PersistentDict` into the compiler, optimizing it like any of the other builtin datatypes. However, this is of course very unsatisfying. At the other end of the spectrum would be something like a generic rewrite rule system (e-graphs anyone?) that would let the PersistentDict implementation declare its interface to the compiler and the compiler would use this for optimization (in a perfect world, the actual rewrite would then be checked using some sort of formal methods). I think that would be interesting, but we're very far from even being able to design something like that (at least in Base - experiments with external AbstractInterpreters in this direction are encouraged). This PR tries to come up with a reasonable middle ground, where the compiler gets some knowledge of the protocol hardcoded without having to know about the implementation details of the data structure. The basic ideas is that `Core` provides some magic generic functions that implementations can extend. Semantically, they are not special. They dispatch as usual, and implementations are expected to work properly even in the absence of any compiler optimizations. However, the compiler is semantically permitted to perform structural optimization using these magic generic functions. In the concrete case, this PR introduces the `KeyValue` interface which consists of two generic functions, `get` and `set`. The core optimization is that the compiler is allowed to rewrite any occurrence of `get(set(x, k, v), k)` into `v` without additional legality checks. In particular, the compiler performs no type checks, conversions, etc. The higher level implementation code is expected to do all that. This approach closely matches the general direction we've been taking in external AbstractInterpreters for embedding additional semantics and optimization opportunities into Julia code (although we generally use methods there, rather than full generic functions), so I think we have some evidence that this sort of approach works reasonably well. Nevertheless, this is certainly an experiment and the interface is explicitly declared unstable. This is fully working and implemented, but the optimization currently bails on anything but the simplest cases. Filling all those cases in is not particularly hard, but should be done along with a more invasive refactoring of SROA, so we should figure out the general direction here first and then we can finish all that up in a follow-up cleanup. Before: ``` julia> using BenchmarkTools julia> function foo() a = Base.PersistentDict(:a => 1) return a[:a] end foo (generic function with 1 method) julia> @benchmark foo() BenchmarkTools.Trial: 10000 samples with 993 evaluations. Range (min … max): 32.940 ns … 28.754 μs ┊ GC (min … max): 0.00% … 99.76% Time (median): 49.647 ns ┊ GC (median): 0.00% Time (mean ± σ): 57.519 ns ± 333.275 ns ┊ GC (mean ± σ): 10.81% ± 2.22% ▃█▅ ▁▃▅▅▃▁ ▁▃▂ ▂ ▁▂▄▃▅▇███▇▃▁▂▁▁▁▁▁▁▁▁▂▂▅██████▅▂▁▁▁▁▁▁▁▁▁▁▂▃▃▇███▇▆███▆▄▃▃▂▂ ▃ 32.9 ns Histogram: frequency by time 68.6 ns < Memory estimate: 128 bytes, allocs estimate: 4. julia> @code_typed foo() CodeInfo( 1 ─ %1 = invoke Vector{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}}(Base.HashArrayMappedTries.undef::UndefInitializer, 1::Int64)::Vector{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}} │ %2 = %new(Base.HashArrayMappedTries.HAMT{Symbol, Int64}, %1, 0x00000000)::Base.HashArrayMappedTries.HAMT{Symbol, Int64} │ %3 = %new(Base.HashArrayMappedTries.Leaf{Symbol, Int64}, :a, 1)::Base.HashArrayMappedTries.Leaf{Symbol, Int64} │ %4 = Base.getfield(%2, :data)::Vector{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}} │ %5 = $(Expr(:boundscheck, true))::Bool └── goto #5 if not %5 2 ─ %7 = Base.sub_int(1, 1)::Int64 │ %8 = Base.bitcast(UInt64, %7)::UInt64 │ %9 = Base.getfield(%4, :size)::Tuple{Int64} │ %10 = $(Expr(:boundscheck, true))::Bool │ %11 = Base.getfield(%9, 1, %10)::Int64 │ %12 = Base.bitcast(UInt64, %11)::UInt64 │ %13 = Base.ult_int(%8, %12)::Bool └── goto #4 if not %13 3 ─ goto #5 4 ─ %16 = Core.tuple(1)::Tuple{Int64} │ invoke Base.throw_boundserror(%4::Vector{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}}, %16::Tuple{Int64})::Union{} └── unreachable 5 ┄ %19 = Base.getfield(%4, :ref)::MemoryRef{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}} │ %20 = Base.memoryref(%19, 1, false)::MemoryRef{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}} │ Base.memoryrefset!(%20, %3, :not_atomic, false)::MemoryRef{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}} └── goto #6 6 ─ %23 = Base.getfield(%2, :bitmap)::UInt32 │ %24 = Base.or_int(%23, 0x00010000)::UInt32 │ Base.setfield!(%2, :bitmap, %24)::UInt32 └── goto #7 7 ─ %27 = %new(Base.PersistentDict{Symbol, Int64}, %2)::Base.PersistentDict{Symbol, Int64} └── goto #8 8 ─ %29 = invoke Base.getindex(%27::Base.PersistentDict{Symbol, Int64}, 🅰️:Symbol)::Int64 └── return %29 ``` After: ``` julia> using BenchmarkTools julia> function foo() a = Base.PersistentDict(:a => 1) return a[:a] end foo (generic function with 1 method) julia> @benchmark foo() BenchmarkTools.Trial: 10000 samples with 1000 evaluations. Range (min … max): 2.459 ns … 11.320 ns ┊ GC (min … max): 0.00% … 0.00% Time (median): 2.460 ns ┊ GC (median): 0.00% Time (mean ± σ): 2.469 ns ± 0.183 ns ┊ GC (mean ± σ): 0.00% ± 0.00% ▂ █ ▁ █ ▂ █▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁█ █ 2.46 ns Histogram: log(frequency) by time 2.47 ns < Memory estimate: 0 bytes, allocs estimate: 0. julia> @code_typed foo() CodeInfo( 1 ─ return 1 ```

Keno · 2023-11-26T16:40:44Z

Rebased. I think this is basically good to go. I'm gonna do some more experiments with this, but those might go into a follow up PR, depending.

Keno · 2023-11-27T17:01:47Z

Working well in my downstream testing. It does still not fully optimize ScopedValue, because we do not yet have the ability to remove all the setup code, which will need some effect tweaks, but that's independent of this, so I'll go ahead and merge this.

As discussed in #51352, this gives `EnterNode` the ability to set (and restore on leave or catch edge) jl_current_task->scope. Manual modifications of the task field after the task has started are considered undefined behavior. In addition, we gain a new intrinsic to access current_task->scope and both inference and the optimizer will forward scopes from EnterNodes to this intrinsic (non-interprocedurally). Together with #51993 this is sufficient to fully optimize ScopedValues (non-interprocedurally at least).

@benchmark

This is part of the work to address JuliaLang#51352 by attempting to allow the compiler to perform SRAO on persistent data structures like `PersistentDict` as if they were regular immutable data structures. These sorts of data structures have very complicated internals (with lots of mutation, memory sharing, etc.), but a relatively simple interface. As such, it is unlikely that our compiler will have sufficient power to optimize this interface by analyzing the implementation. We thus need to come up with some other mechanism that gives the compiler license to perform the requisite optimization. One way would be to just hardcode `PersistentDict` into the compiler, optimizing it like any of the other builtin datatypes. However, this is of course very unsatisfying. At the other end of the spectrum would be something like a generic rewrite rule system (e-graphs anyone?) that would let the PersistentDict implementation declare its interface to the compiler and the compiler would use this for optimization (in a perfect world, the actual rewrite would then be checked using some sort of formal methods). I think that would be interesting, but we're very far from even being able to design something like that (at least in Base - experiments with external AbstractInterpreters in this direction are encouraged). This PR tries to come up with a reasonable middle ground, where the compiler gets some knowledge of the protocol hardcoded without having to know about the implementation details of the data structure. The basic ideas is that `Core` provides some magic generic functions that implementations can extend. Semantically, they are not special. They dispatch as usual, and implementations are expected to work properly even in the absence of any compiler optimizations. However, the compiler is semantically permitted to perform structural optimization using these magic generic functions. In the concrete case, this PR introduces the `KeyValue` interface which consists of two generic functions, `get` and `set`. The core optimization is that the compiler is allowed to rewrite any occurrence of `get(set(x, k, v), k)` into `v` without additional legality checks. In particular, the compiler performs no type checks, conversions, etc. The higher level implementation code is expected to do all that. This approach closely matches the general direction we've been taking in external AbstractInterpreters for embedding additional semantics and optimization opportunities into Julia code (although we generally use methods there, rather than full generic functions), so I think we have some evidence that this sort of approach works reasonably well. Nevertheless, this is certainly an experiment and the interface is explicitly declared unstable. ## Current Status This is fully working and implemented, but the optimization currently bails on anything but the simplest cases. Filling all those cases in is not particularly hard, but should be done along with a more invasive refactoring of SROA, so we should figure out the general direction here first and then we can finish all that up in a follow-up cleanup. ## Obligatory benchmark Before: ``` julia> using BenchmarkTools julia> function foo() a = Base.PersistentDict(:a => 1) return a[:a] end foo (generic function with 1 method) julia> @benchmark foo() BenchmarkTools.Trial: 10000 samples with 993 evaluations. Range (min … max): 32.940 ns … 28.754 μs ┊ GC (min … max): 0.00% … 99.76% Time (median): 49.647 ns ┊ GC (median): 0.00% Time (mean ± σ): 57.519 ns ± 333.275 ns ┊ GC (mean ± σ): 10.81% ± 2.22% ▃█▅ ▁▃▅▅▃▁ ▁▃▂ ▂ ▁▂▄▃▅▇███▇▃▁▂▁▁▁▁▁▁▁▁▂▂▅██████▅▂▁▁▁▁▁▁▁▁▁▁▂▃▃▇███▇▆███▆▄▃▃▂▂ ▃ 32.9 ns Histogram: frequency by time 68.6 ns < Memory estimate: 128 bytes, allocs estimate: 4. julia> @code_typed foo() CodeInfo( 1 ─ %1 = invoke Vector{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}}(Base.HashArrayMappedTries.undef::UndefInitializer, 1::Int64)::Vector{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}} │ %2 = %new(Base.HashArrayMappedTries.HAMT{Symbol, Int64}, %1, 0x00000000)::Base.HashArrayMappedTries.HAMT{Symbol, Int64} │ %3 = %new(Base.HashArrayMappedTries.Leaf{Symbol, Int64}, :a, 1)::Base.HashArrayMappedTries.Leaf{Symbol, Int64} │ %4 = Base.getfield(%2, :data)::Vector{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}} │ %5 = $(Expr(:boundscheck, true))::Bool └── goto JuliaLang#5 if not %5 2 ─ %7 = Base.sub_int(1, 1)::Int64 │ %8 = Base.bitcast(UInt64, %7)::UInt64 │ %9 = Base.getfield(%4, :size)::Tuple{Int64} │ %10 = $(Expr(:boundscheck, true))::Bool │ %11 = Base.getfield(%9, 1, %10)::Int64 │ %12 = Base.bitcast(UInt64, %11)::UInt64 │ %13 = Base.ult_int(%8, %12)::Bool └── goto JuliaLang#4 if not %13 3 ─ goto JuliaLang#5 4 ─ %16 = Core.tuple(1)::Tuple{Int64} │ invoke Base.throw_boundserror(%4::Vector{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}}, %16::Tuple{Int64})::Union{} └── unreachable 5 ┄ %19 = Base.getfield(%4, :ref)::MemoryRef{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}} │ %20 = Base.memoryref(%19, 1, false)::MemoryRef{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}} │ Base.memoryrefset!(%20, %3, :not_atomic, false)::MemoryRef{Union{Base.HashArrayMappedTries.HAMT{Symbol, Int64}, Base.HashArrayMappedTries.Leaf{Symbol, Int64}}} └── goto JuliaLang#6 6 ─ %23 = Base.getfield(%2, :bitmap)::UInt32 │ %24 = Base.or_int(%23, 0x00010000)::UInt32 │ Base.setfield!(%2, :bitmap, %24)::UInt32 └── goto JuliaLang#7 7 ─ %27 = %new(Base.PersistentDict{Symbol, Int64}, %2)::Base.PersistentDict{Symbol, Int64} └── goto JuliaLang#8 8 ─ %29 = invoke Base.getindex(%27::Base.PersistentDict{Symbol, Int64}, 🅰️:Symbol)::Int64 └── return %29 ``` After: ``` julia> using BenchmarkTools julia> function foo() a = Base.PersistentDict(:a => 1) return a[:a] end foo (generic function with 1 method) julia> @benchmark foo() BenchmarkTools.Trial: 10000 samples with 1000 evaluations. Range (min … max): 2.459 ns … 11.320 ns ┊ GC (min … max): 0.00% … 0.00% Time (median): 2.460 ns ┊ GC (median): 0.00% Time (mean ± σ): 2.469 ns ± 0.183 ns ┊ GC (mean ± σ): 0.00% ± 0.00% ▂ █ ▁ █ ▂ █▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁█ █ 2.46 ns Histogram: log(frequency) by time 2.47 ns < Memory estimate: 0 bytes, allocs estimate: 0. julia> @code_typed foo() CodeInfo( 1 ─ return 1 ```

As discussed in #51352, this gives `EnterNode` the ability to set (and restore on leave or catch edge) jl_current_task->scope. Manual modifications of the task field after the task has started are considered undefined behavior. In addition, we gain a new intrinsic to access current_task->scope and both inference and the optimizer will forward scopes from EnterNodes to this intrinsic (non-interprocedurally). Together with #51993 this is sufficient to fully optimize ScopedValues (non-interprocedurally at least).

Keno requested review from vchuravy, JeffBezanson and aviatesk November 2, 2023 09:49

JeffBezanson added the compiler:optimizer Optimization passes (mostly in base/compiler/ssair/) label Nov 2, 2023

topolarity reviewed Nov 7, 2023

View reviewed changes

base/optimized_generics.jl Show resolved Hide resolved

topolarity reviewed Nov 9, 2023

View reviewed changes

aviatesk reviewed Nov 16, 2023

View reviewed changes

aviatesk force-pushed the kf/dictopt branch from f5e821f to 69b582a Compare November 16, 2023 12:44

vchuravy mentioned this pull request Nov 16, 2023

Make PersistentDict behave like an IdDict #52193

Merged

Keno force-pushed the kf/dictopt branch from 69b582a to d4241a7 Compare November 20, 2023 19:04

Keno mentioned this pull request Nov 26, 2023

Make EnterNode save/restore dynamic scope #52309

Merged

Keno force-pushed the kf/dictopt branch from d4241a7 to 252716e Compare November 26, 2023 16:39

Keno changed the title ~~RFC: Compiler support for optimizing PersistentDict~~ Compiler support for optimizing PersistentDict Nov 26, 2023

Keno merged commit 5b2fcb6 into master Nov 27, 2023

Keno deleted the kf/dictopt branch November 27, 2023 17:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Compiler support for optimizing PersistentDict #51993

Compiler support for optimizing PersistentDict #51993

Keno commented Nov 2, 2023

JeffBezanson commented Nov 2, 2023

vchuravy commented Nov 2, 2023

topolarity commented Nov 7, 2023 •

edited

Loading

Keno commented Nov 9, 2023

Keno commented Nov 9, 2023

topolarity Nov 9, 2023

vchuravy Nov 9, 2023

topolarity Nov 9, 2023 •

edited

Loading

vchuravy Nov 9, 2023

vtjnash Nov 16, 2023

aviatesk Nov 16, 2023

aviatesk Nov 16, 2023

aviatesk commented Nov 16, 2023

aviatesk commented Nov 16, 2023 •

edited

Loading

Keno commented Nov 18, 2023

Keno commented Nov 20, 2023

Keno commented Nov 26, 2023

Keno commented Nov 27, 2023

Compiler support for optimizing PersistentDict #51993

Compiler support for optimizing PersistentDict #51993

Conversation

Keno commented Nov 2, 2023

Current Status

Obligatory benchmark

JeffBezanson commented Nov 2, 2023

vchuravy commented Nov 2, 2023

topolarity commented Nov 7, 2023 • edited Loading

Keno commented Nov 9, 2023

Keno commented Nov 9, 2023

topolarity Nov 9, 2023

Choose a reason for hiding this comment

vchuravy Nov 9, 2023

Choose a reason for hiding this comment

topolarity Nov 9, 2023 • edited Loading

Choose a reason for hiding this comment

vchuravy Nov 9, 2023

Choose a reason for hiding this comment

vtjnash Nov 16, 2023

Choose a reason for hiding this comment

aviatesk Nov 16, 2023

Choose a reason for hiding this comment

aviatesk Nov 16, 2023

Choose a reason for hiding this comment

aviatesk commented Nov 16, 2023

aviatesk commented Nov 16, 2023 • edited Loading

Keno commented Nov 18, 2023

Keno commented Nov 20, 2023

Keno commented Nov 26, 2023

Keno commented Nov 27, 2023

topolarity commented Nov 7, 2023 •

edited

Loading

topolarity Nov 9, 2023 •

edited

Loading

aviatesk commented Nov 16, 2023 •

edited

Loading