-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
inlining: relax finalizer inlining control-flow restriction #46651
Conversation
test/compiler/inline.jl
Outdated
@test count(isnew, src.code) == 0 | ||
# FIXME | ||
@test_broken count(isnew, src.code) == 0 && count(iscall(DoAllocNoEscapeSparam), src.code) == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not because of this PR, but I found that this test case has been broken.
On master:
julia> src = code_typed1(Tuple{Any}) do x
for i = 1:1000
DoAllocNoEscapeSparam(x)
end
end
CodeInfo(
1 ─ goto #7 if not true
2 ┄ %2 = φ (#1 => 1, #6 => %9)::Int64
│ Main.DoAllocNoEscapeSparam(x)::Any
│ %4 = (%2 === 1000)::Bool
└── goto #4 if not %4
3 ─ goto #5
4 ─ %7 = Base.add_int(%2, 1)::Int64
└── goto #5
5 ┄ %9 = φ (#4 => %7)::Int64
│ %10 = φ (#3 => true, #4 => false)::Bool
│ %11 = Base.not_int(%10)::Bool
└── goto #7 if not %11
6 ─ goto #2
7 ┄ return nothing
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing, I will take a look today
b112600
to
28f12c5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sgtm
base/compiler/ssair/passes.jl
Outdated
all(check_in_range, defuse.uses) || return nothing | ||
all(check_in_range, defuse.defs) || return nothing | ||
all(check_defuse, defuse.uses) || return nothing | ||
all(check_defuse, defuse.defs) || return nothing | ||
|
||
# For now: Require all statements in the basic block range to be nothrow. | ||
all(minval:maxval) do idx::Int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably only look at the statements in blocks on the domination path, but we can do that in the followup. I'd also like a error_is_fatal
optimizer setting that assumes that any errors are fatal and thus finalizers don't need to run after errors, implying that we can skip this legality check.
With this restriction lifted, it's no longer sufficient to just inline the finalizer after the numerically last SSA value that uses it. Something needs to ensure the finalizer happens after the last use on any path to the exit:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the right direction overall, but see the correctness issue mentioned in the main issue body.
bb2e304
to
252b159
Compare
c742a32
to
7dcc2b5
Compare
Just FYI, multiple windows buildbots hardlocked on this PR, and I had to manually reboot them. I'm not sure if that's a coincidence or not, but you can see the builds here:
So perhaps something on this branch causes things to jam up in a bad way? Of course, it is a failing of the current windows buildbots that they do not more forcefully kill the windows processes and get stuck like this, but that's something we are working to fix in the Buildkite era. |
7dcc2b5
to
b01d02b
Compare
6d143ca
to
18bd85e
Compare
I've implemented the post-dominator analysis and updated this. I think this should be reasonably complete now. @aviatesk take a look, please. |
Looks great, thanks sooo much for implementing the post-domination analysis, that is very exciting itself. const FINALIZATION_COUNT = Ref(0)
init_finalization_count!() = FINALIZATION_COUNT[] = 0
get_finalization_count() = FINALIZATION_COUNT[]
@noinline add_finalization_count!(x) = FINALIZATION_COUNT[] += x
@noinline Base.@assume_effects :nothrow safeprint(io::IO, x...) = (@nospecialize; print(io, x...))
@test Core.Compiler.is_finalizer_inlineable(Base.infer_effects(add_finalization_count!, (Int,)))
mutable struct DoAllocWithFieldInter
x::Int
end
function register_finalizer!(obj::DoAllocWithFieldInter)
finalizer(obj) do this
add_finalization_count!(this.x)
end
end
function cfg_finalization6(io)
for i = -999:1000
o = DoAllocWithFieldInter(0)
register_finalizer!(o)
if i == 1000
o.x = i # with `setfield!`
elseif i > 0
safeprint(io, o.x, '\n')
end
# <= shouldn't the finalizer be inlined here?
end
end
let src = code_typed1(cfg_finalization6, (IO,))
@test count(isinvoke(:add_finalization_count!), src.code) == 1
end
let
init_finalization_count!()
cfg_finalization6(IOBuffer())
@test get_finalization_count() == 1000 # this fails
end Currently the finalizer is inlined after the allocation site, so can't observe the field value changed by julia> src = code_typed1(cfg_finalization6, (IO,))
CodeInfo(
1 ── goto #11 if not true
2 ┄─ %2 = φ (#1 => -999, #10 => %20)::Int64
│ %3 = φ (#1 => -999, #10 => %21)::Int64
│ %4 = %new(Main.DoAllocWithFieldInter, 0)::DoAllocWithFieldInter
│ %5 = Base.getfield(%4, :x)::Int64
│ invoke Main.add_finalization_count!(%5::Int64)::Int64
│ %7 = (%2 === 1000)::Bool
└─── goto #4 if not %7
3 ── Base.setfield!(%4, :x, %2)::Int64
└─── goto #6
4 ── %11 = Base.slt_int(0, %2)::Bool
└─── goto #6 if not %11
5 ── %13 = Base.getfield(%4, :x)::Int64
└─── invoke Main.safeprint(io::IO, %13::Any, '\n'::Vararg{Any})::Any
6 ┄─ %15 = (%3 === 1000)::Bool
└─── goto #8 if not %15
7 ── goto #9
8 ── %18 = Base.add_int(%3, 1)::Int64
└─── goto #9
9 ┄─ %20 = φ (#8 => %18)::Int64
│ %21 = φ (#8 => %18)::Int64
│ %22 = φ (#7 => true, #8 => false)::Bool
│ %23 = Base.not_int(%22)::Bool
└─── goto #11 if not %23
10 ─ goto #2
11 ┄ return nothing
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just minor style nits.
Should be fixed by the following, if you want to apply that and add the test
Plus an appropriate adjustment to the other use of reachable_blocks:
|
bd8e4a1
to
56c8a9c
Compare
base/compiler/ssair/passes.jl
Outdated
function check_range_nothrow(range) | ||
return all(finalizer_idx:maxval) do sidx::Int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function seems to do nothing actually (maxval
is actually set to 0
). I will give a try to enable it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, thanks. It should obviously check the range
Eager `finalizer` inlining (#45272) currently has a restriction that requires all the def/uses to be in a same basic block. This commit relaxes that restriction a bit by allowing def/uses to involve control flow when all of them are dominated by a `finalizer` call to be inlined, since in that case it is safe to insert the body of `finalizer` at the end of all the def/uses, e.g. ```julia const FINALIZATION_COUNT = Ref(0) init_finalization_count!() = FINALIZATION_COUNT[] = 0 get_finalization_count() = FINALIZATION_COUNT[] @noinline add_finalization_count!(x) = FINALIZATION_COUNT[] += x @noinline Base.@assume_effects :nothrow safeprint(io::IO, x...) = (@nospecialize; print(io, x...)) mutable struct DoAllocWithFieldInter x::Int end function register_finalizer!(obj::DoAllocWithFieldInter) finalizer(obj) do this add_finalization_count!(this.x) end end function cfg_finalization3(io) for i = -999:1000 o = DoAllocWithFieldInter(i) register_finalizer!(o) if i == 1000 safeprint(io, o.x, '\n') elseif i > 0 safeprint(io, o.x) end end end let src = code_typed1(cfg_finalization3, (IO,)) @test count(isinvoke(:add_finalization_count!), src.code) == 1 end let init_finalization_count!() cfg_finalization3(IOBuffer()) @test get_finalization_count() == 1000 end ``` To support this transformation, the domtree code also gains the ability to represent post-dominator trees, which is generally useful.
56c8a9c
to
64f7e99
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully understand all of the domination logic here, but in general it seems to make sense. LGTM!
@@ -109,10 +109,16 @@ end | |||
|
|||
length(D::DFSTree) = length(D.from_pre) | |||
|
|||
function DFS!(D::DFSTree, blocks::Vector{BasicBlock}) | |||
function DFS!(D::DFSTree, blocks::Vector{BasicBlock}, is_post_dominator::Bool) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
function DFS!(D::DFSTree, blocks::Vector{BasicBlock}, is_post_dominator::Bool) | |
function DFS!(D::DFSTree, blocks::Vector{BasicBlock}, is_postdominator::Bool) |
Per #46651 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair callout, but I don't like this variable name that much anyway, so rather than going through a new CI cycle for this, I'll probably just rename this on the next refactor. DFS is really independent of domination, so this should probably just be is_reversed
.
Found while testing bigger examples.
Found while testing bigger examples.
Found while testing bigger examples.
Found while testing bigger examples. Fixes the logic introduced in #46651
Eager
finalizer
inlining (#45272) currently has a restriction thatrequires all the def/uses to be in a same basic block.
This commit relaxes that restriction a bit by allowing def/uses to
involve control flow when all of them are dominated by a
finalizer
call to be inlined, since in that case it is safe to insert the body of
finalizer
at the end of all the def/uses, e.g.There is a room for further improvement though -- if we have a way to
compute the post-domination, we can also inline a
finalizer
callin a case when a
finalizer
block post-dominateds all the def/uses e.g.