Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

inlining: relax finalizer inlining control-flow restriction #46651

Merged
merged 1 commit into from
Sep 21, 2022

Conversation

aviatesk
Copy link
Member

@aviatesk aviatesk commented Sep 6, 2022

Eager finalizer inlining (#45272) currently has a restriction that
requires all the def/uses to be in a same basic block.

This commit relaxes that restriction a bit by allowing def/uses to
involve control flow when all of them are dominated by a finalizer
call to be inlined, since in that case it is safe to insert the body of
finalizer at the end of all the def/uses, e.g.

const FINALIZATION_COUNT = Ref(0)
init_finalization_count!() = FINALIZATION_COUNT[] = 0
get_finalization_count() = FINALIZATION_COUNT[]
@noinline add_finalization_count!(x) = FINALIZATION_COUNT[] += x
@noinline Base.@assume_effects :nothrow safeprint(io::IO, x...) = (@nospecialize; print(io, x...))
mutable struct DoAllocWithFieldInter
    x::Int
end
function register_finalizer!(obj::DoAllocWithFieldInter)
    finalizer(obj) do this
        add_finalization_count!(this.x)
    end
end

function cfg_finalization3(io)
    for i = -999:1000
        o = DoAllocWithFieldInter(i)
        register_finalizer!(o)
        if i == 1000
            safeprint(io, o.x, '\n')
        elseif i > 0
            safeprint(io, o.x)
        end
    end
end
let src = code_typed1(cfg_finalization3, (IO,))
    @test count(isinvoke(:add_finalization_count!), src.code) == 1
end
let
    init_finalization_count!()
    cfg_finalization3(IOBuffer())
    @test get_finalization_count() == 1000
end

There is a room for further improvement though -- if we have a way to
compute the post-domination, we can also inline a finalizer call
in a case when a finalizer block post-dominateds all the def/uses e.g.

function cfg_finalization5(io)
    for i = -999:1000
        o = DoAllocWithFieldInter(i)
        if i == 1000
            safeprint(io, o.x, '\n')
        elseif i > 0
            safeprint(io, o.x)
        end
        register_finalizer!(o)
    end
end
let src = code_typed1(cfg_finalization5, (IO,))
    # TODO we can fix this case by checking a case when a finalizer block is post-dominated by all the def/uses
    @test_broken count(isinvoke(:nothrow_side_effect), src.code) == 1
end

@aviatesk aviatesk requested a review from Keno September 6, 2022 15:10
@aviatesk aviatesk added compiler:optimizer Optimization passes (mostly in base/compiler/ssair/) don't squash Don't squash merge labels Sep 6, 2022
Comment on lines 1304 to 1306
@test count(isnew, src.code) == 0
# FIXME
@test_broken count(isnew, src.code) == 0 && count(iscall(DoAllocNoEscapeSparam), src.code) == 0
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not because of this PR, but I found that this test case has been broken.
On master:

julia> src = code_typed1(Tuple{Any}) do x
           for i = 1:1000
               DoAllocNoEscapeSparam(x)
           end
       end
CodeInfo(
1 ─       goto #7 if not true
2%2  = φ (#1 => 1, #6 => %9)::Int64
│         Main.DoAllocNoEscapeSparam(x)::Any%4  = (%2 === 1000)::Bool
└──       goto #4 if not %4
3 ─       goto #5
4%7  = Base.add_int(%2, 1)::Int64
└──       goto #5
5%9  = φ (#4 => %7)::Int64%10 = φ (#3 => true, #4 => false)::Bool%11 = Base.not_int(%10)::Bool
└──       goto #7 if not %11
6 ─       goto #2
7return nothing
)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe because of #45062? /cc @ianatol

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing, I will take a look today

@aviatesk aviatesk force-pushed the avi/inline-cfg-finalizer branch from b112600 to 28f12c5 Compare September 6, 2022 16:20
Copy link
Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm

all(check_in_range, defuse.uses) || return nothing
all(check_in_range, defuse.defs) || return nothing
all(check_defuse, defuse.uses) || return nothing
all(check_defuse, defuse.defs) || return nothing

# For now: Require all statements in the basic block range to be nothrow.
all(minval:maxval) do idx::Int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably only look at the statements in blocks on the domination path, but we can do that in the followup. I'd also like a error_is_fatal optimizer setting that assumes that any errors are fatal and thus finalizers don't need to run after errors, implying that we can skip this legality check.

@Keno
Copy link
Member

Keno commented Sep 7, 2022

With this restriction lifted, it's no longer sufficient to just inline the finalizer after the numerically last SSA value that uses it. Something needs to ensure the finalizer happens after the last use on any path to the exit:

julia> @noinline nothrow_side_effect(x) =
           Base.@assume_effects :total !:effect_free @ccall jl_(x::Any)::Cvoid
nothrow_side_effect (generic function with 1 method)

julia> mutable struct DoAllocWithField
           x
           function DoAllocWithField(x)
               finalizer(new(x)) do this
                   nothrow_side_effect(this.x)
               end
           end
       end

julia> function foo(b::Bool)
           s = DoAllocWithField(1)
           if b
               nothrow_side_effect(s.x)
               return 1
           else
               return 0
           end
       end
foo (generic function with 1 method)

julia> code_typed(foo, Tuple{Bool})
1-element Vector{Any}:
 CodeInfo(
1 ─ %1 = %new(Main.:(var"#3#4"))::var"#3#4"
│   %2 = %new(Main.DoAllocWithField, 1)::DoAllocWithField
└──      goto #3 if not b
2 ─ %4 = Base.getfield(%2, :x)::Any
│        invoke %1(%2::DoAllocWithField)::Nothing
│        Main.nothrow_side_effect(%4)::Any
└──      return 1
3 ─      return 0
) => Int64

julia> foo(true)
1
1
1

julia> foo(false)
0

Keno
Keno previously requested changes Sep 7, 2022
Copy link
Member

@Keno Keno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the right direction overall, but see the correctness issue mentioned in the main issue body.

@aviatesk aviatesk force-pushed the avi/inline-cfg-finalizer branch 3 times, most recently from bb2e304 to 252b159 Compare September 8, 2022 01:51
@aviatesk aviatesk removed the don't squash Don't squash merge label Sep 8, 2022
@aviatesk aviatesk force-pushed the avi/inline-cfg-finalizer branch 2 times, most recently from c742a32 to 7dcc2b5 Compare September 8, 2022 14:30
@aviatesk aviatesk changed the title inlining: relax finalizer inlining control-flow restriction wip: inlining: relax finalizer inlining control-flow restriction Sep 8, 2022
@staticfloat
Copy link
Member

staticfloat commented Sep 8, 2022

Just FYI, multiple windows buildbots hardlocked on this PR, and I had to manually reboot them. I'm not sure if that's a coincidence or not, but you can see the builds here:

So perhaps something on this branch causes things to jam up in a bad way? Of course, it is a failing of the current windows buildbots that they do not more forcefully kill the windows processes and get stuck like this, but that's something we are working to fix in the Buildkite era.

@Keno Keno force-pushed the avi/inline-cfg-finalizer branch from 7dcc2b5 to b01d02b Compare September 21, 2022 03:21
@Keno Keno changed the title wip: inlining: relax finalizer inlining control-flow restriction inlining: relax finalizer inlining control-flow restriction Sep 21, 2022
@Keno Keno force-pushed the avi/inline-cfg-finalizer branch 2 times, most recently from 6d143ca to 18bd85e Compare September 21, 2022 03:24
@Keno
Copy link
Member

Keno commented Sep 21, 2022

I've implemented the post-dominator analysis and updated this. I think this should be reasonably complete now. @aviatesk take a look, please.

@aviatesk
Copy link
Member Author

Looks great, thanks sooo much for implementing the post-domination analysis, that is very exciting itself.
I think there seems to be a problem in finalizer insertion point though, MRE would be:

const FINALIZATION_COUNT = Ref(0)
init_finalization_count!() = FINALIZATION_COUNT[] = 0
get_finalization_count() = FINALIZATION_COUNT[]
@noinline add_finalization_count!(x) = FINALIZATION_COUNT[] += x
@noinline Base.@assume_effects :nothrow safeprint(io::IO, x...) = (@nospecialize; print(io, x...))
@test Core.Compiler.is_finalizer_inlineable(Base.infer_effects(add_finalization_count!, (Int,)))

mutable struct DoAllocWithFieldInter
    x::Int
end
function register_finalizer!(obj::DoAllocWithFieldInter)
    finalizer(obj) do this
        add_finalization_count!(this.x)
    end
end

function cfg_finalization6(io)
    for i = -999:1000
        o = DoAllocWithFieldInter(0)
        register_finalizer!(o)
        if i == 1000
            o.x = i # with `setfield!`
        elseif i > 0
            safeprint(io, o.x, '\n')
        end
        # <= shouldn't the finalizer be inlined here?
    end
end
let src = code_typed1(cfg_finalization6, (IO,))
    @test count(isinvoke(:add_finalization_count!), src.code) == 1
end
let
    init_finalization_count!()
    cfg_finalization6(IOBuffer())
    @test get_finalization_count() == 1000 # this fails
end

Currently the finalizer is inlined after the allocation site, so can't observe the field value changed by setfield!:

julia> src = code_typed1(cfg_finalization6, (IO,))
CodeInfo(
1 ──       goto #11 if not true
2 ┄─ %2  = φ (#1 => -999, #10 => %20)::Int64%3  = φ (#1 => -999, #10 => %21)::Int64%4  = %new(Main.DoAllocWithFieldInter, 0)::DoAllocWithFieldInter%5  = Base.getfield(%4, :x)::Int64
│          invoke Main.add_finalization_count!(%5::Int64)::Int64%7  = (%2 === 1000)::Bool
└───       goto #4 if not %7
3 ──       Base.setfield!(%4, :x, %2)::Int64
└───       goto #6
4 ── %11 = Base.slt_int(0, %2)::Bool
└───       goto #6 if not %11
5 ── %13 = Base.getfield(%4, :x)::Int64
└───       invoke Main.safeprint(io::IO, %13::Any, '\n'::Vararg{Any})::Any
6 ┄─ %15 = (%3 === 1000)::Bool
└───       goto #8 if not %15
7 ──       goto #9
8 ── %18 = Base.add_int(%3, 1)::Int64
└───       goto #9
9 ┄─ %20 = φ (#8 => %18)::Int64%21 = φ (#8 => %18)::Int64%22 = φ (#7 => true, #8 => false)::Bool%23 = Base.not_int(%22)::Bool
└───       goto #11 if not %23
10 ─       goto #2
11return nothing
)

Copy link
Member Author

@aviatesk aviatesk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just minor style nits.

test/compiler/inline.jl Outdated Show resolved Hide resolved
base/compiler/ssair/passes.jl Outdated Show resolved Hide resolved
base/compiler/ssair/passes.jl Outdated Show resolved Hide resolved
base/compiler/ssair/domtree.jl Outdated Show resolved Hide resolved
@Keno
Copy link
Member

Keno commented Sep 21, 2022

I think there seems to be a problem in finalizer insertion point though, MRE would be:

Should be fixed by the following, if you want to apply that and add the test

diff --git a/base/compiler/ssair/passes.jl b/base/compiler/ssair/passes.jl
index 00577526ac..1e9d616a96 100644
--- a/base/compiler/ssair/passes.jl
+++ b/base/compiler/ssair/passes.jl
@@ -1105,8 +1105,8 @@ end
 is_nothrow(ir::IRCode, pc::Int) = (ir.stmts[pc][:flag] & IR_FLAG_NOTHROW) ≠ 0

 function reachable_blocks(cfg::CFG, from_bb, to_bb = nothing)
-    worklist = Int[]
-    visited = BitSet()
+    worklist = Int[from_bb]
+    visited = BitSet(from_bb)
     if to_bb !== nothing
         push!(visited, to_bb)
     end
@@ -1116,7 +1116,6 @@ function reachable_blocks(cfg::CFG, from_bb, to_bb = nothing)
             push!(worklist, bb)
         end
     end
-    visit!(from_bb)
     while !isempty(worklist)
         foreach(visit!, cfg.blocks[pop!(worklist)].succs)
     end
@@ -1148,7 +1147,15 @@ function try_resolve_finalizer!(ir::IRCode, idx::Int, finalizer_idx::Int, defuse
     bb_insert_idx = finalizer_idx
     function note_block_use!(bb, idx)
         new_bb_insert_block = nearest_common_dominator(get!(lazypostdomtree), bb_insert_block, bb)
-        bb_insert_idx = new_bb_insert_block == bb_insert_block ? max(idx, bb_insert_idx) : nothing
+        if new_bb_insert_block == bb
+            if bb == bb_insert_block
+                bb_insert_idx = max(bb_insert_idx, idx)
+            else
+                bb_insert_idx = idx
+            end
+        else
+            bb_insert_idx = nothing
+        end
         bb_insert_block = new_bb_insert_block
         nothing
     end

Plus an appropriate adjustment to the other use of reachable_blocks:

-         blocks = reachable_blocks(ir.cfg, finalizer_bb, bb_insert_block)
+        # Collect all reachable blocks between the finalizer registration and the
+        # insertion point
+        blocks = finalizer_bb == bb_insert_block ? Int[finalizer_bb] :
+            reachable_blocks(ir.cfg, finalizer_bb, bb_insert_block)

@aviatesk aviatesk force-pushed the avi/inline-cfg-finalizer branch from bd8e4a1 to 56c8a9c Compare September 21, 2022 13:02
Comment on lines 1186 to 1187
function check_range_nothrow(range)
return all(finalizer_idx:maxval) do sidx::Int
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function seems to do nothing actually (maxval is actually set to 0). I will give a try to enable it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks. It should obviously check the range

Eager `finalizer` inlining (#45272) currently has a restriction that
requires all the def/uses to be in a same basic block.

This commit relaxes that restriction a bit by allowing def/uses to
involve control flow when all of them are dominated by a `finalizer`
call to be inlined, since in that case it is safe to insert the body of
`finalizer` at the end of all the def/uses, e.g.
```julia
const FINALIZATION_COUNT = Ref(0)
init_finalization_count!() = FINALIZATION_COUNT[] = 0
get_finalization_count() = FINALIZATION_COUNT[]
@noinline add_finalization_count!(x) = FINALIZATION_COUNT[] += x
@noinline Base.@assume_effects :nothrow safeprint(io::IO, x...) = (@nospecialize; print(io, x...))
mutable struct DoAllocWithFieldInter
    x::Int
end
function register_finalizer!(obj::DoAllocWithFieldInter)
    finalizer(obj) do this
        add_finalization_count!(this.x)
    end
end

function cfg_finalization3(io)
    for i = -999:1000
        o = DoAllocWithFieldInter(i)
        register_finalizer!(o)
        if i == 1000
            safeprint(io, o.x, '\n')
        elseif i > 0
            safeprint(io, o.x)
        end
    end
end
let src = code_typed1(cfg_finalization3, (IO,))
    @test count(isinvoke(:add_finalization_count!), src.code) == 1
end
let
    init_finalization_count!()
    cfg_finalization3(IOBuffer())
    @test get_finalization_count() == 1000
end
```

To support this transformation, the domtree code also gains the ability
to represent post-dominator trees, which is generally useful.
@aviatesk aviatesk force-pushed the avi/inline-cfg-finalizer branch from 56c8a9c to 64f7e99 Compare September 21, 2022 13:35
Copy link
Member

@ianatol ianatol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand all of the domination logic here, but in general it seems to make sense. LGTM!

@@ -109,10 +109,16 @@ end

length(D::DFSTree) = length(D.from_pre)

function DFS!(D::DFSTree, blocks::Vector{BasicBlock})
function DFS!(D::DFSTree, blocks::Vector{BasicBlock}, is_post_dominator::Bool)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
function DFS!(D::DFSTree, blocks::Vector{BasicBlock}, is_post_dominator::Bool)
function DFS!(D::DFSTree, blocks::Vector{BasicBlock}, is_postdominator::Bool)

Per #46651 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair callout, but I don't like this variable name that much anyway, so rather than going through a new CI cycle for this, I'll probably just rename this on the next refactor. DFS is really independent of domination, so this should probably just be is_reversed.

@Keno Keno merged commit 62ac26a into master Sep 21, 2022
@Keno Keno deleted the avi/inline-cfg-finalizer branch September 21, 2022 22:59
Keno added a commit that referenced this pull request Sep 22, 2022
Found while testing bigger examples.
Keno added a commit that referenced this pull request Sep 22, 2022
Found while testing bigger examples.
Keno added a commit that referenced this pull request Sep 22, 2022
Found while testing bigger examples.
Keno added a commit that referenced this pull request Sep 22, 2022
Found while testing bigger examples. Fixes the logic introduced in #46651
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:optimizer Optimization passes (mostly in base/compiler/ssair/)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants