-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Invalid IR causes OOB, trampling over LLVM objects, causing segfaults #51561
Comments
Looks like a bug in compaction, not renumbering SSA values that occur in
MWE: using Base.Meta
using Core.IR
let m = Meta.@lower 1 + 1
@assert Meta.isexpr(m, :thunk)
src = m.args[1]::CodeInfo
src.code = Any[
GotoIfNot(false, 1),
Expr(:something),
Expr(:something),
Expr(:something),
Expr(:something),
Expr(:something),
Expr(:enter, 8),
Expr(:leave),
Expr(:pop_exception, SSAValue(7)),
ReturnNode(),
]
nstmts = length(src.code)
src.ssavaluetypes = nstmts
src.codelocs = fill(Int32(1), nstmts)
src.ssaflags = fill(Int32(0), nstmts)
ir = Core.Compiler.inflate_ir(src)
Core.Compiler.verify_ir(ir)
println("IR at start:")
display(ir)
ir = Core.Compiler.compact!(ir, true)
Core.Compiler.verify_ir(ir)
println("After compaction:")
display(ir)
end |
Thanks for the great MWE - I'll take a look at this 👍 |
The issue here is that The verifier claims this is OK - we are allowed to have invalid IR in unreachable blocks, but codegen is not respecting that. Depending on which code we decide is "right", this is either a bug in codegen for not checking reachability or in |
As a fun side-bug, the MWE should not have passed the verifier to begin with. An error-handling A fixed version is: src.code = Any[
GotoIfNot(false, 12),
Expr(:something),
Expr(:something),
Expr(:something),
Expr(:something),
Expr(:something),
Expr(:enter, 10),
Expr(:leave),
GotoNode(12),
Expr(:leave),
Expr(:pop_exception, SSAValue(7)),
ReturnNode(nothing),
] which still shows the same problem. |
FWIW, the original MWE probably wasn't invalid, but this IR is the result of running |
) This should catch issues like #51561, at least when running with `FORCE_ASSERTIONS := 1` (as PkgEval does).
With #51579, PkgEval catches additional bugs like this. The GR-related failure reported here is triggered by 11 packages, and there's 2 additional failures that look related: https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/by_date/2023-10/10/JuMP.primary.log and https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/by_date/2023-10/10/ConstrainedShortestPaths.primary.log. I'll verify those once this bug gets fixed, to see if they originate from different IR than what I've reduced here. |
@topolarity saying that this is likely fixed, we just need to make sure that the fix is also on 1.11. |
The MWE in #51561 (comment) does not repro on 1.11 beta1 so I will assume this is fixed. Please reopen if this is still an issue. |
The following snippet (reduced from Plots.jl) produces IR with 87 SSA values, but contains an invalid
pop_exception
expression that references SSA value 101:That causes mayhem during compilation. For example, the
ssavalue_usecount
increment here goes out of bounds and tramples over a llvm::GlobalVariable, causing segfaults during pkgimage emission:julia/src/codegen.cpp
Lines 7786 to 7789 in a988992
The above crash (and others that look similar) has started appearing on PkgEval since we enabled pkgimages, and was caught in an rr trace which helped me track down the root cause, or at least to the point I found corrupt Julia IR.
Bisected to #50805, but it's not clear to me whether this caused the corruption or just exposed it.
Regarding the OOB, I wonder if we shouldn't bounds-check accesses to
ssavalue_usecount
and the like, at least when running with assertions.cc @Keno @aviatesk
The text was updated successfully, but these errors were encountered: