Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimizer: Julia-level escape analysis #43800

Merged
merged 1 commit into from
Feb 16, 2022
Merged

Conversation

aviatesk
Copy link
Member

@aviatesk aviatesk commented Jan 13, 2022

This commit ports EscapeAnalysis.jl into Julia base.
You can find the documentation of this escape analysis at this GitHub page1.

This escape analysis will hopefully be an enabling technology for various
memory-related optimizations at Julia's high level compilation pipeline.
Possible target optimization includes alias aware SROA (#43888),
array SROA (#43909), mutating_arrayfreeze optimization (#42465),
stack allocation of mutables, finalizer elision and so on2.

The primary motivation for porting EA in this PR is to check its impact
on latency as well as to get feedbacks from a broader range of developers.
The plan is that we first introduce EA in this commit, and then merge the
depending PRs built on top of this commit like #43888, #43909 and #42465

This commit simply defines EA inside Julia base compiler and
enables the existing test suite with it. In this commit, we just run EA
before inlining to generate IPO cache. The depending PRs, EA will be
reran after inlining to be used for various local optimizations.


  • IPO cache generation
  • improve performance (reduce additional allocations to ~105% or such)

Footnotes

  1. The same documentation will be included into Julia's developer
    documentation by this commit.

  2. It would be also interesting if LLVM-level optimizations can consume
    IPO information derived by this escape analysis to broaden
    optimization possibilities.

@aviatesk
Copy link
Member Author

@nanosoldier runtests(ALL, vs = ":master")

@aviatesk aviatesk added compiler:optimizer Optimization passes (mostly in base/compiler/ssair/) needs nanosoldier run This PR should have benchmarks run on it labels Jan 13, 2022
@nanosoldier
Copy link
Collaborator

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.

@KristofferC
Copy link
Member

@nanosoldier runtests(["AdvancedHMC", "Cambrian", "DelayDiffEq", "Evolutionary", "FastFloat16s", "GalacticOptim", "HydrophoneCalibrations", "ITensors", "IonSim", "LoopVectorization", "MatrixPencils", "MemPool", "MetaArrays", "NarrativeTest", "POMDPPolicies", "Pfam", "PotentialFlow", "Qaintessent", "QuadEig", "RuntimeGeneratedFunctions", "SparseDiffTools", "SpatialJackknife", "StochasticPrograms", "SymbolicRegression", "VoronoiGraph", "WavePropBase"], vs = ":master")


const IR_FLAG_NULL = 0x00
# This statement is marked as @inbounds by user.
# Ff replaced by inlining, any contained boundschecks may be removed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ff?

# If this is not set, all the uses are (statically) dominated by the defs.
# In particular, if a slot has `AssignedOnce && !StaticUndef`, it is an SSA.
const SLOT_STATICUNDEF = 1 # slot might be used before it is defined (structurally)
const SLOT_ASSIGNEDONCE = 16 # slot is assigned to only once
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also do 0x01 << 0

base/compiler/ssair/EscapeAnalysis/EAUtils.jl Outdated Show resolved Hide resolved
base/compiler/ssair/EscapeAnalysis/EscapeAnalysis.jl Outdated Show resolved Hide resolved
@nanosoldier
Copy link
Collaborator

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.

@aviatesk aviatesk force-pushed the avi/EscapeAnalysis branch 3 times, most recently from a07cfa3 to 2556a1a Compare January 17, 2022 10:48
const IR_FLAG_EFFECT_FREE = 0x01 << 4

# known to be always effect-free (in particular nothrow)
const _PURE_BUILTINS = Any[tuple, svec, ===, typeof, nfields]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this usage of typeof in ImmutableArrays PR have relevance / problems here?
https://github.com/JuliaLang/julia/pull/42465/files#diff-b6cb5d410b973b5987bc13b1eba0f156fcaece59cfbd6cb12ac503ff44fd13caR1696-R1698

I know there was prior conversation about that kind of usage being purity-violating

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The primary concern at the comment was the use of jl_set_typeof and not uses of typeof queries.
Basically the runtime system may have made some assumptions based on type of already allocated objects, and that use of jl_set_typeof (after allocating object) might cause some problem.

@aviatesk aviatesk force-pushed the avi/EscapeAnalysis branch 2 times, most recently from 0a36203 to eccc077 Compare January 21, 2022 15:15
aviatesk added a commit that referenced this pull request Jan 21, 2022
Unleashes the potential for SROA of mutables using the novel Julia-level
escape analysis (on top of #43800):
1. alias-aware SROA, mutable ϕ-node elimination
2. `isdefined` check elimination
3. load-forwarding for non-eliminable but non-escaping mutables

---

1. alias-aware SROA, mutable ϕ-node elimination

EA's alias analysis allows this new SROA to handle nested mutables allocations
pretty well. Now we can eliminate the heap allocations completely from
this insanely nested examples by the single analysis/optimization pass:
```julia
julia> function refs(x)
           (Ref(Ref(Ref(Ref(Ref(Ref(Ref(Ref(Ref(Ref((x))))))))))))[][][][][][][][][][]
       end
refs (generic function with 1 method)

julia> refs("julia"); @allocated refs("julia")
0
```

EA can also analyze escape of ϕ-node as well as its aliasing.
Mutable ϕ-nodes would be eliminated even for a very tricky case as like:
```julia
julia> code_typed((Bool,String,)) do cond, x
           # these allocation form multiple ϕ-nodes
           if cond
               ϕ2 = ϕ1 = Ref{Any}("foo")
           else
               ϕ2 = ϕ1 = Ref{Any}("bar")
           end
           ϕ2[] = x
           y = ϕ1[] # => x
           return y
       end
1-element Vector{Any}:
 CodeInfo(
1 ─     goto #3 if not cond
2 ─     goto #4
3 ─     nothing::Nothing
4 ┄     return x
) => Any
```

Combined with the powerful alias analysis and ϕ-node handling,
the following realistic examples will be fully optimized:
```julia
julia> # demonstrate the power of our field / alias analysis with realistic end to end examples
       # adapted from http://wiki.luajit.org/Allocation-Sinking-Optimization#implementation%5B
       abstract type AbstractPoint{T} end

julia> struct Point{T} <: AbstractPoint{T}
           x::T
           y::T
       end

julia> mutable struct MPoint{T} <: AbstractPoint{T}
           x::T
           y::T
       end

julia> add(a::P, b::P) where P<:AbstractPoint = P(a.x + b.x, a.y + b.y)
add (generic function with 1 method)

julia> function compute_point(T, n, ax, ay, bx, by)
           a = T(ax, ay)
           b = T(bx, by)
           for i in 0:(n-1)
               a = add(add(a, b), b)
           end
           a.x, a.y
       end
compute_point (generic function with 2 methods)

julia> function compute_point(n, a, b)
           for i in 0:(n-1)
               a = add(add(a, b), b)
           end
           a.x, a.y
       end
compute_point (generic function with 2 methods)

julia> function compute_point!(n, a, b)
           for i in 0:(n-1)
               a′ = add(add(a, b), b)
               a.x = a′.x
               a.y = a′.y
           end
       end
compute_point! (generic function with 1 method)

julia> compute_point(MPoint, 10, 1+.5, 2+.5, 2+.25, 4+.75);

julia> compute_point(MPoint, 10, 1+.5im, 2+.5im, 2+.25im, 4+.75im);

julia> @test @allocated(compute_point(MPoint, 10000, 1+.5, 2+.5, 2+.25, 4+.75)) == 0
Test Passed

julia> @test @allocated(compute_point(MPoint, 10000, 1+.5im, 2+.5im, 2+.25im, 4+.75im)) == 0
Test Passed

julia> compute_point(10, MPoint(1+.5, 2+.5), MPoint(2+.25, 4+.75));

julia> compute_point(10, MPoint(1+.5im, 2+.5im), MPoint(2+.25im, 4+.75im));

julia> @test @allocated(compute_point(10000, MPoint(1+.5, 2+.5), MPoint(2+.25, 4+.75))) == 0
Test Passed

julia> @test @allocated(compute_point(10000, MPoint(1+.5im, 2+.5im), MPoint(2+.25im, 4+.75im))) == 0
Test Passed

julia> af, bf = MPoint(1+.5, 2+.5), MPoint(2+.25, 4+.75);

julia> ac, bc = MPoint(1+.5im, 2+.5im), MPoint(2+.25im, 4+.75im);

julia> compute_point!(10, af, bf);

julia> compute_point!(10, ac, bc);

julia> @test @allocated(compute_point!(10000, af, bf)) == 0
Test Passed

julia> @test @allocated(compute_point!(10000, ac, bc)) == 0
Test Passed
```

2. `isdefined` check elimination

This commit also implements a simple optimization to eliminate
`isdefined` call by checking load-fowardability.
This optimization may be especially useful to eliminate extra allocation
involved with a capturing closure, e.g.:
```julia
julia> callit(f, args...) = f(args...);

julia> function isdefined_elim()
           local arr::Vector{Any}
           callit() do
               arr = Any[]
           end
           return arr
       end;

julia> code_typed(isdefined_elim)
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Vector{Any}, svec(Any, Int64), 0, :(:ccall), Vector{Any}, 0, 0))::Vector{Any}
└──      goto #3 if not true
2 ─      goto #4
3 ─      $(Expr(:throw_undef_if_not, :arr, false))::Any
4 ┄      return %1
) => Vector{Any}
```

3. load-forwarding for non-eliminable but non-escaping mutables

EA also allows us to forward loads even when the mutable allocation
can't be eliminated but still its fields are known precisely.
The load forwarding might be useful since it may derive new type information
that succeeding optimization passes can use (or just because it allows
simpler code transformations down the load):
```julia
julia> code_typed((Bool,String,)) do c, s
           r = Ref{Any}(s)
           if c
               return r[]::String # adce_pass! will further eliminate this type assert call also
           else
               return r
           end
       end
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = %new(Base.RefValue{Any}, s)::Base.RefValue{Any}
└──      goto #3 if not c
2 ─      return s
3 ─      return %1
) => Union{Base.RefValue{Any}, String}
```

---

Please refer to the newly added test cases for more examples.
Also, EA's alias analysis already succeeds to reason about arrays, and
so this EA-based SROA will hopefully be generalized for array SROA as well.
aviatesk added a commit that referenced this pull request Jan 21, 2022
Unleashes the potential for SROA of mutables using the novel Julia-level
escape analysis (on top of #43800):
1. alias-aware SROA, mutable ϕ-node elimination
2. `isdefined` check elimination
3. load-forwarding for non-eliminable but non-escaping mutables

---

1. alias-aware SROA, mutable ϕ-node elimination

EA's alias analysis allows this new SROA to handle nested mutables allocations
pretty well. Now we can eliminate the heap allocations completely from
this insanely nested examples by the single analysis/optimization pass:
```julia
julia> function refs(x)
           (Ref(Ref(Ref(Ref(Ref(Ref(Ref(Ref(Ref(Ref((x))))))))))))[][][][][][][][][][]
       end
refs (generic function with 1 method)

julia> refs("julia"); @allocated refs("julia")
0
```

EA can also analyze escape of ϕ-node as well as its aliasing.
Mutable ϕ-nodes would be eliminated even for a very tricky case as like:
```julia
julia> code_typed((Bool,String,)) do cond, x
           # these allocation form multiple ϕ-nodes
           if cond
               ϕ2 = ϕ1 = Ref{Any}("foo")
           else
               ϕ2 = ϕ1 = Ref{Any}("bar")
           end
           ϕ2[] = x
           y = ϕ1[] # => x
           return y
       end
1-element Vector{Any}:
 CodeInfo(
1 ─     goto #3 if not cond
2 ─     goto #4
3 ─     nothing::Nothing
4 ┄     return x
) => Any
```

Combined with the powerful alias analysis and ϕ-node handling,
the following realistic examples will be fully optimized:
```julia
julia> # demonstrate the power of our field / alias analysis with realistic end to end examples
       # adapted from http://wiki.luajit.org/Allocation-Sinking-Optimization#implementation%5B
       abstract type AbstractPoint{T} end

julia> struct Point{T} <: AbstractPoint{T}
           x::T
           y::T
       end

julia> mutable struct MPoint{T} <: AbstractPoint{T}
           x::T
           y::T
       end

julia> add(a::P, b::P) where P<:AbstractPoint = P(a.x + b.x, a.y + b.y)
add (generic function with 1 method)

julia> function compute_point(T, n, ax, ay, bx, by)
           a = T(ax, ay)
           b = T(bx, by)
           for i in 0:(n-1)
               a = add(add(a, b), b)
           end
           a.x, a.y
       end
compute_point (generic function with 2 methods)

julia> function compute_point(n, a, b)
           for i in 0:(n-1)
               a = add(add(a, b), b)
           end
           a.x, a.y
       end
compute_point (generic function with 2 methods)

julia> function compute_point!(n, a, b)
           for i in 0:(n-1)
               a′ = add(add(a, b), b)
               a.x = a′.x
               a.y = a′.y
           end
       end
compute_point! (generic function with 1 method)

julia> compute_point(MPoint, 10, 1+.5, 2+.5, 2+.25, 4+.75);

julia> compute_point(MPoint, 10, 1+.5im, 2+.5im, 2+.25im, 4+.75im);

julia> @test @allocated(compute_point(MPoint, 10000, 1+.5, 2+.5, 2+.25, 4+.75)) == 0
Test Passed

julia> @test @allocated(compute_point(MPoint, 10000, 1+.5im, 2+.5im, 2+.25im, 4+.75im)) == 0
Test Passed

julia> compute_point(10, MPoint(1+.5, 2+.5), MPoint(2+.25, 4+.75));

julia> compute_point(10, MPoint(1+.5im, 2+.5im), MPoint(2+.25im, 4+.75im));

julia> @test @allocated(compute_point(10000, MPoint(1+.5, 2+.5), MPoint(2+.25, 4+.75))) == 0
Test Passed

julia> @test @allocated(compute_point(10000, MPoint(1+.5im, 2+.5im), MPoint(2+.25im, 4+.75im))) == 0
Test Passed

julia> af, bf = MPoint(1+.5, 2+.5), MPoint(2+.25, 4+.75);

julia> ac, bc = MPoint(1+.5im, 2+.5im), MPoint(2+.25im, 4+.75im);

julia> compute_point!(10, af, bf);

julia> compute_point!(10, ac, bc);

julia> @test @allocated(compute_point!(10000, af, bf)) == 0
Test Passed

julia> @test @allocated(compute_point!(10000, ac, bc)) == 0
Test Passed
```

2. `isdefined` check elimination

This commit also implements a simple optimization to eliminate
`isdefined` call by checking load-fowardability.
This optimization may be especially useful to eliminate extra allocation
involved with a capturing closure, e.g.:
```julia
julia> callit(f, args...) = f(args...);

julia> function isdefined_elim()
           local arr::Vector{Any}
           callit() do
               arr = Any[]
           end
           return arr
       end;

julia> code_typed(isdefined_elim)
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Vector{Any}, svec(Any, Int64), 0, :(:ccall), Vector{Any}, 0, 0))::Vector{Any}
└──      goto #3 if not true
2 ─      goto #4
3 ─      $(Expr(:throw_undef_if_not, :arr, false))::Any
4 ┄      return %1
) => Vector{Any}
```

3. load-forwarding for non-eliminable but non-escaping mutables

EA also allows us to forward loads even when the mutable allocation
can't be eliminated but still its fields are known precisely.
The load forwarding might be useful since it may derive new type information
that succeeding optimization passes can use (or just because it allows
simpler code transformations down the load):
```julia
julia> code_typed((Bool,String,)) do c, s
           r = Ref{Any}(s)
           if c
               return r[]::String # adce_pass! will further eliminate this type assert call also
           else
               return r
           end
       end
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = %new(Base.RefValue{Any}, s)::Base.RefValue{Any}
└──      goto #3 if not c
2 ─      return s
3 ─      return %1
) => Union{Base.RefValue{Any}, String}
```

---

Please refer to the newly added test cases for more examples.
Also, EA's alias analysis already succeeds to reason about arrays, and
so this EA-based SROA will hopefully be generalized for array SROA as well.
aviatesk added a commit that referenced this pull request Jan 21, 2022
Unleashes the potential for SROA of mutables using the novel Julia-level
escape analysis (on top of #43800):
1. alias-aware SROA, mutable ϕ-node elimination
2. `isdefined` check elimination
3. load-forwarding for non-eliminable but non-escaping mutables

---

1. alias-aware SROA, mutable ϕ-node elimination

EA's alias analysis allows this new SROA to handle nested mutables allocations
pretty well. Now we can eliminate the heap allocations completely from
this insanely nested examples by the single analysis/optimization pass:
```julia
julia> function refs(x)
           (Ref(Ref(Ref(Ref(Ref(Ref(Ref(Ref(Ref(Ref((x))))))))))))[][][][][][][][][][]
       end
refs (generic function with 1 method)

julia> refs("julia"); @allocated refs("julia")
0
```

EA can also analyze escape of ϕ-node as well as its aliasing.
Mutable ϕ-nodes would be eliminated even for a very tricky case as like:
```julia
julia> code_typed((Bool,String,)) do cond, x
           # these allocation form multiple ϕ-nodes
           if cond
               ϕ2 = ϕ1 = Ref{Any}("foo")
           else
               ϕ2 = ϕ1 = Ref{Any}("bar")
           end
           ϕ2[] = x
           y = ϕ1[] # => x
           return y
       end
1-element Vector{Any}:
 CodeInfo(
1 ─     goto #3 if not cond
2 ─     goto #4
3 ─     nothing::Nothing
4 ┄     return x
) => Any
```

Combined with the powerful alias analysis and ϕ-node handling,
the following realistic examples will be fully optimized:
```julia
julia> # demonstrate the power of our field / alias analysis with realistic end to end examples
       # adapted from http://wiki.luajit.org/Allocation-Sinking-Optimization#implementation%5B
       abstract type AbstractPoint{T} end

julia> struct Point{T} <: AbstractPoint{T}
           x::T
           y::T
       end

julia> mutable struct MPoint{T} <: AbstractPoint{T}
           x::T
           y::T
       end

julia> add(a::P, b::P) where P<:AbstractPoint = P(a.x + b.x, a.y + b.y)
add (generic function with 1 method)

julia> function compute_point(T, n, ax, ay, bx, by)
           a = T(ax, ay)
           b = T(bx, by)
           for i in 0:(n-1)
               a = add(add(a, b), b)
           end
           a.x, a.y
       end
compute_point (generic function with 2 methods)

julia> function compute_point(n, a, b)
           for i in 0:(n-1)
               a = add(add(a, b), b)
           end
           a.x, a.y
       end
compute_point (generic function with 2 methods)

julia> function compute_point!(n, a, b)
           for i in 0:(n-1)
               a′ = add(add(a, b), b)
               a.x = a′.x
               a.y = a′.y
           end
       end
compute_point! (generic function with 1 method)

julia> compute_point(MPoint, 10, 1+.5, 2+.5, 2+.25, 4+.75);

julia> compute_point(MPoint, 10, 1+.5im, 2+.5im, 2+.25im, 4+.75im);

julia> @test @allocated(compute_point(MPoint, 10000, 1+.5, 2+.5, 2+.25, 4+.75)) == 0
Test Passed

julia> @test @allocated(compute_point(MPoint, 10000, 1+.5im, 2+.5im, 2+.25im, 4+.75im)) == 0
Test Passed

julia> compute_point(10, MPoint(1+.5, 2+.5), MPoint(2+.25, 4+.75));

julia> compute_point(10, MPoint(1+.5im, 2+.5im), MPoint(2+.25im, 4+.75im));

julia> @test @allocated(compute_point(10000, MPoint(1+.5, 2+.5), MPoint(2+.25, 4+.75))) == 0
Test Passed

julia> @test @allocated(compute_point(10000, MPoint(1+.5im, 2+.5im), MPoint(2+.25im, 4+.75im))) == 0
Test Passed

julia> af, bf = MPoint(1+.5, 2+.5), MPoint(2+.25, 4+.75);

julia> ac, bc = MPoint(1+.5im, 2+.5im), MPoint(2+.25im, 4+.75im);

julia> compute_point!(10, af, bf);

julia> compute_point!(10, ac, bc);

julia> @test @allocated(compute_point!(10000, af, bf)) == 0
Test Passed

julia> @test @allocated(compute_point!(10000, ac, bc)) == 0
Test Passed
```

2. `isdefined` check elimination

This commit also implements a simple optimization to eliminate
`isdefined` call by checking load-fowardability.
This optimization may be especially useful to eliminate extra allocation
involved with a capturing closure, e.g.:
```julia
julia> callit(f, args...) = f(args...);

julia> function isdefined_elim()
           local arr::Vector{Any}
           callit() do
               arr = Any[]
           end
           return arr
       end;

julia> code_typed(isdefined_elim)
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Vector{Any}, svec(Any, Int64), 0, :(:ccall), Vector{Any}, 0, 0))::Vector{Any}
└──      goto #3 if not true
2 ─      goto #4
3 ─      $(Expr(:throw_undef_if_not, :arr, false))::Any
4 ┄      return %1
) => Vector{Any}
```

3. load-forwarding for non-eliminable but non-escaping mutables

EA also allows us to forward loads even when the mutable allocation
can't be eliminated but still its fields are known precisely.
The load forwarding might be useful since it may derive new type information
that succeeding optimization passes can use (or just because it allows
simpler code transformations down the load):
```julia
julia> code_typed((Bool,String,)) do c, s
           r = Ref{Any}(s)
           if c
               return r[]::String # adce_pass! will further eliminate this type assert call also
           else
               return r
           end
       end
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = %new(Base.RefValue{Any}, s)::Base.RefValue{Any}
└──      goto #3 if not c
2 ─      return s
3 ─      return %1
) => Union{Base.RefValue{Any}, String}
```

---

Please refer to the newly added test cases for more examples.
Also, EA's alias analysis already succeeds to reason about arrays, and
so this EA-based SROA will hopefully be generalized for array SROA as well.
@JeffBezanson
Copy link
Member

I get this during build:

Generating REPL precompile statements... 36/36
Executing precompile statements... 121/1349[EA] bad EA: Array{Any, (2,)}[
  LibGit2.var"#148#149"{LibGit2.GitRepo, String, String},
  LibGit2.GitReference] at 134
[EA] bad EA: Array{Any, (2,)}[
  LibGit2.var"#148#149"{LibGit2.GitRepo, String, String},
  LibGit2.GitReference] at 162

base/Base.jl Outdated Show resolved Hide resolved
base/compiler/ssair/EscapeAnalysis/EscapeAnalysis.jl Outdated Show resolved Hide resolved
base/compiler/ssair/EscapeAnalysis/EscapeAnalysis.jl Outdated Show resolved Hide resolved
base/compiler/ssair/EscapeAnalysis/EscapeAnalysis.jl Outdated Show resolved Hide resolved
base/compiler/ssair/EscapeAnalysis/EscapeAnalysis.jl Outdated Show resolved Hide resolved
base/compiler/ssair/EscapeAnalysis/EscapeAnalysis.jl Outdated Show resolved Hide resolved
base/compiler/ssair/EscapeAnalysis/EscapeAnalysis.jl Outdated Show resolved Hide resolved
# end

# linear scan to find regions in which potential throws will be caught
function compute_tryregions(ir::IRCode)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this use Core.Compiler.compute_trycatch now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Core.Compiler.compute_trycatch somehow sometimes throws unbalanced try/catch assertion error when applied to IRCode (should be easily reproduced). It may be worth digging into what's happening there, but for escape analysis we only need to know the locations of :enter expressions and that analysis can be done as a simple linear scan of compute_frameinfo.

arg = args[i]
if i-1 ≤ nargs
argi = i-1
else # handle isva signature: COMBAK will this be invalid once we take alias information into account ?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, this this invalid now?

doc/src/devdocs/EscapeAnalysis/README.md Outdated Show resolved Hide resolved
@aviatesk aviatesk force-pushed the avi/EscapeAnalysis branch 2 times, most recently from e343279 to 007fa58 Compare January 22, 2022 05:15
@aviatesk
Copy link
Member Author

I get this during build:

Generating REPL precompile statements... 36/36
Executing precompile statements... 121/1349[EA] bad EA: Array{Any, (2,)}[
  LibGit2.var"#148#149"{LibGit2.GitRepo, String, String},
  LibGit2.GitReference] at 134
[EA] bad EA: Array{Any, (2,)}[
  LibGit2.var"#148#149"{LibGit2.GitRepo, String, String},
  LibGit2.GitReference] at 162

This is because this PR uses the simple is_load_forwardable to check if EA covers all load-forwardable structs that the currently existing sroa_pass! can find, but that simple is_load_forwardable check doesn't account for ccall-preserve uses, which sroa_pass! can sometimes rewrite with forwarded loads by some extra efforts. #43888 will rewrite mutable SROA pass based on EA, and it is able to account for such ccall-preserve uses and we won't miss this optimization opportunity.

aviatesk added a commit that referenced this pull request Jan 22, 2022
Unleashes the potential for SROA of mutables using the novel Julia-level
escape analysis (on top of #43800):
1. alias-aware SROA, mutable ϕ-node elimination
2. `isdefined` check elimination
3. load-forwarding for non-eliminable but analyzable mutables

---

1. alias-aware SROA, mutable ϕ-node elimination

EA's alias analysis allows this new SROA to handle nested mutables allocations
pretty well. Now we can eliminate the heap allocations completely from
this insanely nested examples by the single analysis/optimization pass:
```julia
julia> function refs(x)
           (Ref(Ref(Ref(Ref(Ref(Ref(Ref(Ref(Ref(Ref((x))))))))))))[][][][][][][][][][]
       end
refs (generic function with 1 method)

julia> refs("julia"); @allocated refs("julia")
0
```

EA can also analyze escape of ϕ-node as well as its aliasing.
Mutable ϕ-nodes would be eliminated even for a very tricky case as like:
```julia
julia> code_typed((Bool,String,)) do cond, x
           # these allocation form multiple ϕ-nodes
           if cond
               ϕ2 = ϕ1 = Ref{Any}("foo")
           else
               ϕ2 = ϕ1 = Ref{Any}("bar")
           end
           ϕ2[] = x
           y = ϕ1[] # => x
           return y
       end
1-element Vector{Any}:
 CodeInfo(
1 ─     goto #3 if not cond
2 ─     goto #4
3 ─     nothing::Nothing
4 ┄     return x
) => Any
```

Combined with the alias analysis and ϕ-node handling above,
allocations in the following "realistic" examples will be optimized:
```julia
julia> # demonstrate the power of our field / alias analysis with realistic end to end examples
       # adapted from http://wiki.luajit.org/Allocation-Sinking-Optimization#implementation%5B
       abstract type AbstractPoint{T} end

julia> struct Point{T} <: AbstractPoint{T}
           x::T
           y::T
       end

julia> mutable struct MPoint{T} <: AbstractPoint{T}
           x::T
           y::T
       end

julia> add(a::P, b::P) where P<:AbstractPoint = P(a.x + b.x, a.y + b.y);

julia> function compute_point(T, n, ax, ay, bx, by)
           a = T(ax, ay)
           b = T(bx, by)
           for i in 0:(n-1)
               a = add(add(a, b), b)
           end
           a.x, a.y
       end;

julia> function compute_point(n, a, b)
           for i in 0:(n-1)
               a = add(add(a, b), b)
           end
           a.x, a.y
       end;

julia> function compute_point!(n, a, b)
           for i in 0:(n-1)
               a′ = add(add(a, b), b)
               a.x = a′.x
               a.y = a′.y
           end
       end;

julia> compute_point(MPoint, 10, 1+.5, 2+.5, 2+.25, 4+.75);

julia> compute_point(MPoint, 10, 1+.5im, 2+.5im, 2+.25im, 4+.75im);

julia> @allocated compute_point(MPoint, 10000, 1+.5, 2+.5, 2+.25, 4+.75)
0

julia> @allocated compute_point(MPoint, 10000, 1+.5im, 2+.5im, 2+.25im, 4+.75im)
0

julia> compute_point(10, MPoint(1+.5, 2+.5), MPoint(2+.25, 4+.75));

julia> compute_point(10, MPoint(1+.5im, 2+.5im), MPoint(2+.25im, 4+.75im));

julia> @allocated compute_point(10000, MPoint(1+.5, 2+.5), MPoint(2+.25, 4+.75))
0

julia> @allocated compute_point(10000, MPoint(1+.5im, 2+.5im), MPoint(2+.25im, 4+.75im))
0

julia> af, bf = MPoint(1+.5, 2+.5), MPoint(2+.25, 4+.75);

julia> ac, bc = MPoint(1+.5im, 2+.5im), MPoint(2+.25im, 4+.75im);

julia> compute_point!(10, af, bf);

julia> compute_point!(10, ac, bc);

julia> @allocated compute_point!(10000, af, bf)
0

julia> @allocated compute_point!(10000, ac, bc)
0
```

2. `isdefined` check elimination

This commit also implements a simple optimization to eliminate
`isdefined` call by checking load-fowardability.
This optimization may be especially useful to eliminate extra allocation
involved with a capturing closure, e.g.:
```julia
julia> callit(f, args...) = f(args...);

julia> function isdefined_elim()
           local arr::Vector{Any}
           callit() do
               arr = Any[]
           end
           return arr
       end;

julia> code_typed(isdefined_elim)
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Vector{Any}, svec(Any, Int64), 0, :(:ccall), Vector{Any}, 0, 0))::Vector{Any}
└──      goto #3 if not true
2 ─      goto #4
3 ─      $(Expr(:throw_undef_if_not, :arr, false))::Any
4 ┄      return %1
) => Vector{Any}
```

3. load-forwarding for non-eliminable but analyzable mutables

EA also allows us to forward loads even when the mutable allocation
can't be eliminated but still its fields are known precisely.
The load forwarding might be useful since it may derive new type information
that succeeding optimization passes can use (or just because it allows
simpler code transformations down the load):
```julia
julia> code_typed((Bool,String,)) do c, s
           r = Ref{Any}(s)
           if c
               return r[]::String # adce_pass! will further eliminate this type assert call also
           else
               return r
           end
       end
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = %new(Base.RefValue{Any}, s)::Base.RefValue{Any}
└──      goto #3 if not c
2 ─      return s
3 ─      return %1
) => Union{Base.RefValue{Any}, String}
```

---

Please refer to the newly added test cases for more examples.
Also, EA's alias analysis already succeeds to reason about arrays, and
so this EA-based SROA will hopefully be generalized for array SROA as well.
aviatesk added a commit to aviatesk/Revise.jl that referenced this pull request Feb 15, 2022
JuliaLang/julia#43800 will define new module `EscapeAnalysis`
inside `Core.Compiler`, and it requires this minor special handling.
@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@nanosoldier
Copy link
Collaborator

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.

@ianatol ianatol removed the needs nanosoldier run This PR should have benchmarks run on it label Feb 16, 2022
@ianatol
Copy link
Member

ianatol commented Feb 16, 2022

This needed a rebase, but I can't push to avi/EscapeAnalysis, so I opened #44195 in case this can still be merged prior to the freeze

@aviatesk aviatesk force-pushed the avi/EscapeAnalysis branch 2 times, most recently from bad5206 to 05f22d5 Compare February 16, 2022 02:41
@aviatesk
Copy link
Member Author

@nanosoldier runtests(["DataDeps", "DynamicAxisWarping", "ExponentialUtilities", "FileIO", "Geotherm", "HorseML", "InformationGeometry", "JuliaCon", "PSDMatrices", "Pluto", "ProgressiveHedging", "QuantumTomography", "Reactive", "SetIntersectionProjection"], vs = ":master")

@aviatesk aviatesk added this to the 1.8 milestone Feb 16, 2022
@aviatesk
Copy link
Member Author

aviatesk commented Feb 16, 2022

We discussed that we are going to merge and include this PR for 1.8 without enabling the escape analysis at all, but still allowing developers and external consumers to test and experiment with it.
Since this PR doesn't run EA at all, it shouldn't affect the compilation behavior or latency in any way. It is even not be bootstrapped and so it doesn't affect bootstrapping time neither.

I will merge this once CI turns green.

@KristofferC KristofferC added the backport 1.8 Change should be backported to release-1.8 label Feb 16, 2022
@nanosoldier
Copy link
Collaborator

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.

This commit ports [EscapeAnalysis.jl](https://github.com/aviatesk/EscapeAnalysis.jl) into Julia base.
You can find the documentation of this escape analysis at [this GitHub page](https://aviatesk.github.io/EscapeAnalysis.jl/dev/)[^1].

[^1]: The same documentation will be included into Julia's developer
      documentation by this commit.

This escape analysis will hopefully be an enabling technology for various
memory-related optimizations at Julia's high level compilation pipeline.
Possible target optimization includes alias aware SROA (#43888),
array SROA (#43909), `mutating_arrayfreeze` optimization (#42465),
stack allocation of mutables, finalizer elision and so on[^2].

[^2]: It would be also interesting if LLVM-level optimizations can consume
      IPO information derived by this escape analysis to broaden
      optimization possibilities.

The primary motivation for porting EA by this PR is to check its impact
on latency as well as to get feedbacks from a broader range of developers.
The plan is that we first introduce EA to Julia Base by this commit, and
then merge the depending PRs built on top of this commit later.

This commit simply defines EA inside Julia base compiler and enables the
existing test suite with it. In this commit we don't run EA at all, and
so this commit shouldn't affect Julia-level compilation latency.

In the depending PRs, EA will run in two stages:
- `IPO EA`: run EA on pre-inlining state to generate IPO-valid cache
- `Local EA`: run EA on post-inlining state to generate local escape
              information used for various optimizations

In order to integrate `IPO EA` with our compilation cache system,
this commit also implements a new `CodeInstance.argescapes` field that
keeps the IPO-valid cache generated by `IPO EA`.
@KristofferC KristofferC mentioned this pull request Feb 16, 2022
7 tasks
@JeffBezanson JeffBezanson merged commit 8207f5d into master Feb 16, 2022
@JeffBezanson JeffBezanson deleted the avi/EscapeAnalysis branch February 16, 2022 16:04
@KristofferC KristofferC removed the backport 1.8 Change should be backported to release-1.8 label Feb 17, 2022
antoine-levitt pushed a commit to antoine-levitt/julia that referenced this pull request Feb 17, 2022
This commit ports [EscapeAnalysis.jl](https://github.com/aviatesk/EscapeAnalysis.jl) into Julia base.
You can find the documentation of this escape analysis at [this GitHub page](https://aviatesk.github.io/EscapeAnalysis.jl/dev/)[^1].

[^1]: The same documentation will be included into Julia's developer
      documentation by this commit.

This escape analysis will hopefully be an enabling technology for various
memory-related optimizations at Julia's high level compilation pipeline.
Possible target optimization includes alias aware SROA (JuliaLang#43888),
array SROA (JuliaLang#43909), `mutating_arrayfreeze` optimization (JuliaLang#42465),
stack allocation of mutables, finalizer elision and so on[^2].

[^2]: It would be also interesting if LLVM-level optimizations can consume
      IPO information derived by this escape analysis to broaden
      optimization possibilities.

The primary motivation for porting EA by this PR is to check its impact
on latency as well as to get feedbacks from a broader range of developers.
The plan is that we first introduce EA to Julia Base by this commit, and
then merge the depending PRs built on top of this commit later.

This commit simply defines EA inside Julia base compiler and enables the
existing test suite with it. In this commit we don't run EA at all, and
so this commit shouldn't affect Julia-level compilation latency.

In the depending PRs, EA will run in two stages:
- `IPO EA`: run EA on pre-inlining state to generate IPO-valid cache
- `Local EA`: run EA on post-inlining state to generate local escape
              information used for various optimizations

In order to integrate `IPO EA` with our compilation cache system,
this commit also implements a new `CodeInstance.argescapes` field that
keeps the IPO-valid cache generated by `IPO EA`.
timholy pushed a commit to timholy/Revise.jl that referenced this pull request Feb 18, 2022
JuliaLang/julia#43800 will define new module `EscapeAnalysis`
inside `Core.Compiler`, and it requires this minor special handling.
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Feb 22, 2022
This commit ports [EscapeAnalysis.jl](https://github.com/aviatesk/EscapeAnalysis.jl) into Julia base.
You can find the documentation of this escape analysis at [this GitHub page](https://aviatesk.github.io/EscapeAnalysis.jl/dev/)[^1].

[^1]: The same documentation will be included into Julia's developer
      documentation by this commit.

This escape analysis will hopefully be an enabling technology for various
memory-related optimizations at Julia's high level compilation pipeline.
Possible target optimization includes alias aware SROA (JuliaLang#43888),
array SROA (JuliaLang#43909), `mutating_arrayfreeze` optimization (JuliaLang#42465),
stack allocation of mutables, finalizer elision and so on[^2].

[^2]: It would be also interesting if LLVM-level optimizations can consume
      IPO information derived by this escape analysis to broaden
      optimization possibilities.

The primary motivation for porting EA by this PR is to check its impact
on latency as well as to get feedbacks from a broader range of developers.
The plan is that we first introduce EA to Julia Base by this commit, and
then merge the depending PRs built on top of this commit later.

This commit simply defines EA inside Julia base compiler and enables the
existing test suite with it. In this commit we don't run EA at all, and
so this commit shouldn't affect Julia-level compilation latency.

In the depending PRs, EA will run in two stages:
- `IPO EA`: run EA on pre-inlining state to generate IPO-valid cache
- `Local EA`: run EA on post-inlining state to generate local escape
              information used for various optimizations

In order to integrate `IPO EA` with our compilation cache system,
this commit also implements a new `CodeInstance.argescapes` field that
keeps the IPO-valid cache generated by `IPO EA`.
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Mar 8, 2022
This commit ports [EscapeAnalysis.jl](https://github.com/aviatesk/EscapeAnalysis.jl) into Julia base.
You can find the documentation of this escape analysis at [this GitHub page](https://aviatesk.github.io/EscapeAnalysis.jl/dev/)[^1].

[^1]: The same documentation will be included into Julia's developer
      documentation by this commit.

This escape analysis will hopefully be an enabling technology for various
memory-related optimizations at Julia's high level compilation pipeline.
Possible target optimization includes alias aware SROA (JuliaLang#43888),
array SROA (JuliaLang#43909), `mutating_arrayfreeze` optimization (JuliaLang#42465),
stack allocation of mutables, finalizer elision and so on[^2].

[^2]: It would be also interesting if LLVM-level optimizations can consume
      IPO information derived by this escape analysis to broaden
      optimization possibilities.

The primary motivation for porting EA by this PR is to check its impact
on latency as well as to get feedbacks from a broader range of developers.
The plan is that we first introduce EA to Julia Base by this commit, and
then merge the depending PRs built on top of this commit later.

This commit simply defines EA inside Julia base compiler and enables the
existing test suite with it. In this commit we don't run EA at all, and
so this commit shouldn't affect Julia-level compilation latency.

In the depending PRs, EA will run in two stages:
- `IPO EA`: run EA on pre-inlining state to generate IPO-valid cache
- `Local EA`: run EA on post-inlining state to generate local escape
              information used for various optimizations

In order to integrate `IPO EA` with our compilation cache system,
this commit also implements a new `CodeInstance.argescapes` field that
keeps the IPO-valid cache generated by `IPO EA`.
aviatesk added a commit that referenced this pull request Mar 23, 2022
Enhances SROA of mutables using the novel Julia-level escape analysis (on top of #43800):
1. alias-aware SROA, mutable ϕ-node elimination
2. `isdefined` check elimination
3. load-forwarding for non-eliminable but analyzable mutables

---

1. alias-aware SROA, mutable ϕ-node elimination

EA's alias analysis allows this new SROA to handle nested mutables allocations
pretty well. Now we can eliminate the heap allocations completely from
this insanely nested examples by the single analysis/optimization pass:
```julia
julia> function refs(x)
           (Ref(Ref(Ref(Ref(Ref(Ref(Ref(Ref(Ref(Ref((x))))))))))))[][][][][][][][][][]
       end
refs (generic function with 1 method)

julia> refs("julia"); @allocated refs("julia")
0
```

EA can also analyze escape of ϕ-node as well as its aliasing.
Mutable ϕ-nodes would be eliminated even for a very tricky case as like:
```julia
julia> code_typed((Bool,String,)) do cond, x
           # these allocation form multiple ϕ-nodes
           if cond
               ϕ2 = ϕ1 = Ref{Any}("foo")
           else
               ϕ2 = ϕ1 = Ref{Any}("bar")
           end
           ϕ2[] = x
           y = ϕ1[] # => x
           return y
       end
1-element Vector{Any}:
 CodeInfo(
1 ─     goto #3 if not cond
2 ─     goto #4
3 ─     nothing::Nothing
4 ┄     return x
) => Any
```

Combined with the alias analysis and ϕ-node handling above,
allocations in the following "realistic" examples will be optimized:
```julia
julia> # demonstrate the power of our field / alias analysis with realistic end to end examples
       # adapted from http://wiki.luajit.org/Allocation-Sinking-Optimization#implementation%5B
       abstract type AbstractPoint{T} end

julia> struct Point{T} <: AbstractPoint{T}
           x::T
           y::T
       end

julia> mutable struct MPoint{T} <: AbstractPoint{T}
           x::T
           y::T
       end

julia> add(a::P, b::P) where P<:AbstractPoint = P(a.x + b.x, a.y + b.y);

julia> function compute_point(T, n, ax, ay, bx, by)
           a = T(ax, ay)
           b = T(bx, by)
           for i in 0:(n-1)
               a = add(add(a, b), b)
           end
           a.x, a.y
       end;

julia> function compute_point(n, a, b)
           for i in 0:(n-1)
               a = add(add(a, b), b)
           end
           a.x, a.y
       end;

julia> function compute_point!(n, a, b)
           for i in 0:(n-1)
               a′ = add(add(a, b), b)
               a.x = a′.x
               a.y = a′.y
           end
       end;

julia> compute_point(MPoint, 10, 1+.5, 2+.5, 2+.25, 4+.75);

julia> compute_point(MPoint, 10, 1+.5im, 2+.5im, 2+.25im, 4+.75im);

julia> @allocated compute_point(MPoint, 10000, 1+.5, 2+.5, 2+.25, 4+.75)
0

julia> @allocated compute_point(MPoint, 10000, 1+.5im, 2+.5im, 2+.25im, 4+.75im)
0

julia> compute_point(10, MPoint(1+.5, 2+.5), MPoint(2+.25, 4+.75));

julia> compute_point(10, MPoint(1+.5im, 2+.5im), MPoint(2+.25im, 4+.75im));

julia> @allocated compute_point(10000, MPoint(1+.5, 2+.5), MPoint(2+.25, 4+.75))
0

julia> @allocated compute_point(10000, MPoint(1+.5im, 2+.5im), MPoint(2+.25im, 4+.75im))
0

julia> af, bf = MPoint(1+.5, 2+.5), MPoint(2+.25, 4+.75);

julia> ac, bc = MPoint(1+.5im, 2+.5im), MPoint(2+.25im, 4+.75im);

julia> compute_point!(10, af, bf);

julia> compute_point!(10, ac, bc);

julia> @allocated compute_point!(10000, af, bf)
0

julia> @allocated compute_point!(10000, ac, bc)
0
```

2. `isdefined` check elimination

This commit also implements a simple optimization to eliminate
`isdefined` call by checking load-fowardability.
This optimization may be especially useful to eliminate extra allocation
involved with a capturing closure, e.g.:
```julia
julia> callit(f, args...) = f(args...);

julia> function isdefined_elim()
           local arr::Vector{Any}
           callit() do
               arr = Any[]
           end
           return arr
       end;

julia> code_typed(isdefined_elim)
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Vector{Any}, svec(Any, Int64), 0, :(:ccall), Vector{Any}, 0, 0))::Vector{Any}
└──      goto #3 if not true
2 ─      goto #4
3 ─      $(Expr(:throw_undef_if_not, :arr, false))::Any
4 ┄      return %1
) => Vector{Any}
```

3. load-forwarding for non-eliminable but analyzable mutables

EA also allows us to forward loads even when the mutable allocation
can't be eliminated but still its fields are known precisely.
The load forwarding might be useful since it may derive new type information
that succeeding optimization passes can use (or just because it allows
simpler code transformations down the load):
```julia
julia> code_typed((Bool,String,)) do c, s
           r = Ref{Any}(s)
           if c
               return r[]::String # adce_pass! will further eliminate this type assert call also
           else
               return r
           end
       end
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = %new(Base.RefValue{Any}, s)::Base.RefValue{Any}
└──      goto #3 if not c
2 ─      return s
3 ─      return %1
) => Union{Base.RefValue{Any}, String}
```

---

Please refer to the newly added test cases for more examples.
Also, EA's alias analysis already succeeds to reason about arrays, and
so this EA-based SROA will hopefully be generalized for array SROA as well.
aviatesk added a commit that referenced this pull request Mar 23, 2022
Enhances SROA of mutables using the novel Julia-level escape analysis (on top of #43800):
1. alias-aware SROA, mutable ϕ-node elimination
2. `isdefined` check elimination
3. load-forwarding for non-eliminable but analyzable mutables

---

1. alias-aware SROA, mutable ϕ-node elimination

EA's alias analysis allows this new SROA to handle nested mutables allocations
pretty well. Now we can eliminate the heap allocations completely from
this insanely nested examples by the single analysis/optimization pass:
```julia
julia> function refs(x)
           (Ref(Ref(Ref(Ref(Ref(Ref(Ref(Ref(Ref(Ref((x))))))))))))[][][][][][][][][][]
       end
refs (generic function with 1 method)

julia> refs("julia"); @allocated refs("julia")
0
```

EA can also analyze escape of ϕ-node as well as its aliasing.
Mutable ϕ-nodes would be eliminated even for a very tricky case as like:
```julia
julia> code_typed((Bool,String,)) do cond, x
           # these allocation form multiple ϕ-nodes
           if cond
               ϕ2 = ϕ1 = Ref{Any}("foo")
           else
               ϕ2 = ϕ1 = Ref{Any}("bar")
           end
           ϕ2[] = x
           y = ϕ1[] # => x
           return y
       end
1-element Vector{Any}:
 CodeInfo(
1 ─     goto #3 if not cond
2 ─     goto #4
3 ─     nothing::Nothing
4 ┄     return x
) => Any
```

Combined with the alias analysis and ϕ-node handling above,
allocations in the following "realistic" examples will be optimized:
```julia
julia> # demonstrate the power of our field / alias analysis with realistic end to end examples
       # adapted from http://wiki.luajit.org/Allocation-Sinking-Optimization#implementation%5B
       abstract type AbstractPoint{T} end

julia> struct Point{T} <: AbstractPoint{T}
           x::T
           y::T
       end

julia> mutable struct MPoint{T} <: AbstractPoint{T}
           x::T
           y::T
       end

julia> add(a::P, b::P) where P<:AbstractPoint = P(a.x + b.x, a.y + b.y);

julia> function compute_point(T, n, ax, ay, bx, by)
           a = T(ax, ay)
           b = T(bx, by)
           for i in 0:(n-1)
               a = add(add(a, b), b)
           end
           a.x, a.y
       end;

julia> function compute_point(n, a, b)
           for i in 0:(n-1)
               a = add(add(a, b), b)
           end
           a.x, a.y
       end;

julia> function compute_point!(n, a, b)
           for i in 0:(n-1)
               a′ = add(add(a, b), b)
               a.x = a′.x
               a.y = a′.y
           end
       end;

julia> compute_point(MPoint, 10, 1+.5, 2+.5, 2+.25, 4+.75);

julia> compute_point(MPoint, 10, 1+.5im, 2+.5im, 2+.25im, 4+.75im);

julia> @allocated compute_point(MPoint, 10000, 1+.5, 2+.5, 2+.25, 4+.75)
0

julia> @allocated compute_point(MPoint, 10000, 1+.5im, 2+.5im, 2+.25im, 4+.75im)
0

julia> compute_point(10, MPoint(1+.5, 2+.5), MPoint(2+.25, 4+.75));

julia> compute_point(10, MPoint(1+.5im, 2+.5im), MPoint(2+.25im, 4+.75im));

julia> @allocated compute_point(10000, MPoint(1+.5, 2+.5), MPoint(2+.25, 4+.75))
0

julia> @allocated compute_point(10000, MPoint(1+.5im, 2+.5im), MPoint(2+.25im, 4+.75im))
0

julia> af, bf = MPoint(1+.5, 2+.5), MPoint(2+.25, 4+.75);

julia> ac, bc = MPoint(1+.5im, 2+.5im), MPoint(2+.25im, 4+.75im);

julia> compute_point!(10, af, bf);

julia> compute_point!(10, ac, bc);

julia> @allocated compute_point!(10000, af, bf)
0

julia> @allocated compute_point!(10000, ac, bc)
0
```

2. `isdefined` check elimination

This commit also implements a simple optimization to eliminate
`isdefined` call by checking load-fowardability.
This optimization may be especially useful to eliminate extra allocation
involved with a capturing closure, e.g.:
```julia
julia> callit(f, args...) = f(args...);

julia> function isdefined_elim()
           local arr::Vector{Any}
           callit() do
               arr = Any[]
           end
           return arr
       end;

julia> code_typed(isdefined_elim)
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Vector{Any}, svec(Any, Int64), 0, :(:ccall), Vector{Any}, 0, 0))::Vector{Any}
└──      goto #3 if not true
2 ─      goto #4
3 ─      $(Expr(:throw_undef_if_not, :arr, false))::Any
4 ┄      return %1
) => Vector{Any}
```

3. load-forwarding for non-eliminable but analyzable mutables

EA also allows us to forward loads even when the mutable allocation
can't be eliminated but still its fields are known precisely.
The load forwarding might be useful since it may derive new type information
that succeeding optimization passes can use (or just because it allows
simpler code transformations down the load):
```julia
julia> code_typed((Bool,String,)) do c, s
           r = Ref{Any}(s)
           if c
               return r[]::String # adce_pass! will further eliminate this type assert call also
           else
               return r
           end
       end
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = %new(Base.RefValue{Any}, s)::Base.RefValue{Any}
└──      goto #3 if not c
2 ─      return s
3 ─      return %1
) => Union{Base.RefValue{Any}, String}
```

---

Please refer to the newly added test cases for more examples.
Also, EA's alias analysis already succeeds to reason about arrays, and
so this EA-based SROA will hopefully be generalized for array SROA as well.
@maleadt maleadt mentioned this pull request Dec 31, 2022
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:optimizer Optimization passes (mostly in base/compiler/ssair/)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants