Invalid IR causes OOB, trampling over LLVM objects, causing segfaults #51561

Closed
maleadt opened this issue Oct 3, 2023 · 8 comments
Labels: compiler:optimizer (Optimization passes, mostly in base/compiler/ssair/), regression (Regression in behavior compared to a previous version)

maleadt (Member) commented Oct 3, 2023

The following snippet (reduced from Plots.jl) produces IR with 87 SSA values that nonetheless contains an invalid pop_exception expression referencing SSA value 101:

julia> function _cbar_unique(values, propname)
           out = last(values)
           if any(x != out for x in values)
               @warn "Multiple series with different $propname share a colorbar. " *
                     "Colorbar may not reflect all series correctly."
           end
           out
       end

julia> Base.code_ircode(_cbar_unique,  Tuple{Array{Nothing, 1}, String})
1-element Vector{Any}:
2 1 ── %1   = Base.arraysize(_2, 1)::Int64       │╻╷╷╷╷╷   last
...
  37%78  = $(Expr(:the_exception))::Any       ││
  │           Core._call_latest(Base.CoreLogging.logging_error, %75, $(QuoteNode(Warn)), %76, Symbol("REPL[1]"), :Main_b51b0ccb, %74, 4, %78, true)::Any
  └───        $(Expr(:pop_exception, :(%101)))::Core.Const(nothing)
...
7 43%86  = φ (#42 => %70, #21 => nothing)::Core.Const(nothing)
  └───        return %86
=> Nothing

That causes mayhem during compilation. For example, the ssavalue_usecount increment below goes out of bounds and tramples over an llvm::GlobalVariable, causing segfaults during pkgimage emission:

julia/src/codegen.cpp, lines 7786 to 7789 at a988992:

if (jl_is_ssavalue(val)) {
    ctx.ssavalue_usecount[((jl_ssavalue_t*)val)->id-1] += 1;
    return true;
}

Failed to precompile Plots [91a5bcdd-55d7-5caf-9e0b-520d859cae80] to "/home/pkgeval/.julia/compiled/v1.11/Plots/jl_fbGldF".
[27] signal (11.128): Segmentation fault
in expression starting at none:0
_ZNK4llvm5Value12getValueNameEv at /opt/julia/bin/../lib/julia/libLLVM-15jl.so (unknown line)
_ZNK4llvm5Value7getNameEv at /opt/julia/bin/../lib/julia/libLLVM-15jl.so (unknown line)
jl_create_native_impl at /source/src/aotcompile.cpp:364
jl_precompile_ at /source/src/precompile_utils.c:263
jl_precompile_worklist at /source/src/precompile_utils.c:318 [inlined]
ijl_create_system_image at /source/src/staticdata.c:2740
ijl_write_compiler_output at /source/src/precompile.c:121
ijl_atexit_hook at /source/src/init.c:251
jl_repl_entrypoint at /source/src/jlapi.c:732
main at /source/cli/loader_exe.c:58
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x401098)
Allocations: 59722073 (Pool: 59655895; Big: 66178); GC: 2

The above crash (and similar-looking ones) started appearing on PkgEval after we enabled pkgimages. I caught it in an rr trace, which helped me track down the root cause, or at least trace it back to the point where I found corrupt Julia IR.

Bisected to #50805, but it's not clear to me whether this caused the corruption or just exposed it.

Regarding the OOB, I wonder if we shouldn't bounds-check accesses to ssavalue_usecount and the like, at least when running with assertions.
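To make the bounds-check suggestion concrete, here is a minimal C++ sketch of a checked use-count increment. It assumes the counts live in a plain std::vector indexed by (1-based SSA id) - 1, as in the snippet above; the UseCounts type and its increment method are hypothetical and are not Julia's actual codegen data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for codegen's per-function SSA use counts (not the
// real codegen context field): one slot per statement, indexed by id - 1,
// where id is the 1-based SSA value number.
struct UseCounts {
    std::vector<int> counts;

    explicit UseCounts(std::size_t nstmts) : counts(nstmts, 0) {}

    // Checked increment: an id outside 1..nstmts (e.g. %101 in a body with
    // 87 statements) throws instead of writing past the end of the buffer.
    void increment(long id) {
        if (id < 1 || static_cast<std::size_t>(id) > counts.size())
            throw std::out_of_range("SSA value %" + std::to_string(id) +
                                    " is out of range");
        counts[static_cast<std::size_t>(id - 1)] += 1;
    }
};
```

With UseCounts u(87), calls to u.increment(1) and u.increment(87) succeed, while u.increment(101) throws std::out_of_range. To check only in assertion builds, as suggested above, the throw could be swapped for an assert on the same condition.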

cc @Keno @aviatesk

maleadt added the regression and compiler:optimizer labels Oct 3, 2023
maleadt added this to the 1.11 milestone Oct 3, 2023
maleadt (Member, Author) commented Oct 3, 2023

Looks like a bug in compaction: it doesn't seem to renumber SSA values that occur as the argument of :pop_exception?

IR at start:
 1 ─      goto #1 if not false
 2 ─      $(Expr(:something))::Any
 │        $(Expr(:something))::Any
 │        $(Expr(:something))::Any
 │        $(Expr(:something))::Any
 └──      $(Expr(:something))::Any
 3 ─ %7 = $(Expr(:enter, #4))
 4 ┄      $(Expr(:leave))
 5 ─      $(Expr(:pop_exception, :(%7)))::Any
 └──      unreachable

After compaction:
 1 ─      goto #1
 2 ─      unreachable
 3 ─      unreachable
 4 ─      $(Expr(:leave))
 5 ─      $(Expr(:pop_exception, :(%7)))::Any
 └──      unreachable

MWE:

using Base.Meta
using Core.IR

let m = Meta.@lower 1 + 1
    @assert Meta.isexpr(m, :thunk)
    src = m.args[1]::CodeInfo
    src.code = Any[
        GotoIfNot(false, 1),
        Expr(:something),
        Expr(:something),
        Expr(:something),
        Expr(:something),
        Expr(:something),
        Expr(:enter, 8),
        Expr(:leave),
        Expr(:pop_exception, SSAValue(7)),
        ReturnNode(),
    ]
    nstmts = length(src.code)
    src.ssavaluetypes = nstmts
    src.codelocs = fill(Int32(1), nstmts)
    src.ssaflags = fill(Int32(0), nstmts)
    ir = Core.Compiler.inflate_ir(src)
    Core.Compiler.verify_ir(ir)

    println("IR at start:")
    display(ir)

    ir = Core.Compiler.compact!(ir, true)
    Core.Compiler.verify_ir(ir)

    println("After compaction:")
    display(ir)
end
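To illustrate the renumbering hypothesis, here is a minimal C++ sketch of what a compaction step has to do with operands: every surviving SSA reference, including the one carried by :pop_exception, must be mapped from its old number to its new one. It uses a hypothetical toy IR (Kind, Stmt and compact are invented names), not Julia's IRCode or compact!().

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy IR (hypothetical, far simpler than Julia's IRCode): each statement has a
// kind and at most one SSA operand, stored as a 1-based id (0 = no operand).
enum class Kind { Other, Enter, Leave, PopException, Deleted };

struct Stmt {
    Kind kind;
    int operand;
};

// Drop Deleted statements and renumber every surviving operand through an
// old-id -> new-id map, including the operand of PopException. If an
// operand's defining statement was dropped (or the id was never valid),
// there is no correct new number, so we report failure instead of leaving a
// stale id behind.
std::optional<std::vector<Stmt>> compact(const std::vector<Stmt>& code) {
    std::vector<int> new_id(code.size() + 1, 0); // 0 means "not kept"
    std::vector<Stmt> out;
    for (std::size_t i = 0; i < code.size(); ++i) {
        if (code[i].kind == Kind::Deleted) continue;
        out.push_back(code[i]);
        new_id[i + 1] = static_cast<int>(out.size());
    }
    for (Stmt& s : out) {
        if (s.operand == 0) continue;
        if (s.operand < 0 || static_cast<std::size_t>(s.operand) > code.size())
            return std::nullopt; // id was out of range before compaction
        int mapped = new_id[static_cast<std::size_t>(s.operand)];
        if (mapped == 0) return std::nullopt; // definition was dropped
        s.operand = mapped;
    }
    return out;
}
```

In the compacted dump above, the block containing the :enter that defined %7 is now unreachable, yet :pop_exception still names %7 in a body that no longer has seven statements. In this toy model that case is reported as a failure rather than silently emitted.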

topolarity (Member) commented:

Thanks for the great MWE - I'll take a look at this 👍

topolarity (Member) commented:

The issue here is that compact!() is leaving behind invalid Expr(:leave) and Expr(:pop_exception, ...) statements in what it knows are unreachable BasicBlocks.

The verifier accepts this, since we are allowed to have invalid IR in unreachable blocks, but codegen does not respect that.

Depending on which code we decide is "right", this is either a bug in codegen for not checking reachability or in verify_ir() and compact!() for not handling validity conditions in unreachable basic blocks.
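If codegen is the side that should change, one option is to compute block reachability up front and ignore statements in unreachable blocks. Below is a minimal C++ sketch assuming a plain successor-list CFG with block 0 as the entry; reachable_blocks is a hypothetical name, not a function in Julia's codegen.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical CFG: succs[b] lists the successors of block b (0-based, block 0
// is the entry). Returns, per block, whether it is reachable from the entry
// along ordinary control-flow edges (breadth-first search).
std::vector<bool> reachable_blocks(const std::vector<std::vector<int>>& succs) {
    std::vector<bool> seen(succs.size(), false);
    if (succs.empty()) return seen;
    std::queue<int> work;
    seen[0] = true;
    work.push(0);
    while (!work.empty()) {
        int b = work.front();
        work.pop();
        for (int s : succs[static_cast<std::size_t>(b)]) {
            if (!seen[static_cast<std::size_t>(s)]) {
                seen[static_cast<std::size_t>(s)] = true;
                work.push(s);
            }
        }
    }
    return seen;
}
```

For the compacted MWE (block 1 loops to itself, block 4 falls through to block 5), only the entry block is marked reachable, so a codegen pass that skipped unmarked blocks would never count the stale %7. The other option discussed above, having verify_ir() and compact!() handle validity in unreachable blocks, would leave codegen unchanged.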

topolarity (Member) commented:

As a fun side-bug, the MWE should not have passed the verifier to begin with.

An error-handling Expr(:leave) is not allowed to be reachable by standard control-flow edges, since that would mean the pop_exception does not obey stack discipline.

A fixed version is:

src.code = Any[
    GotoIfNot(false, 12),
    Expr(:something),
    Expr(:something),
    Expr(:something),
    Expr(:something),
    Expr(:something),
    Expr(:enter, 10),
    Expr(:leave),
    GotoNode(12),
    Expr(:leave),
    Expr(:pop_exception, SSAValue(7)),
    ReturnNode(nothing),
]

which still shows the same problem.

maleadt (Member, Author) commented Oct 6, 2023

FWIW, the original MWE probably wasn't invalid, but this IR is the result of running creduce, which may have introduced additional issues.

maleadt added a commit that referenced this issue Oct 10, 2023:

This should catch issues like #51561, at least when running with `FORCE_ASSERTIONS := 1` (as PkgEval does).
maleadt (Member, Author) commented Oct 11, 2023

With #51579, PkgEval catches additional bugs like this. The GR-related failure reported here is triggered by 11 packages, and there are 2 additional failures that look related: https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/by_date/2023-10/10/JuMP.primary.log and https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/by_date/2023-10/10/ConstrainedShortestPaths.primary.log. Once this bug is fixed, I'll check those to see whether they originate from IR other than what I've reduced here.

KristofferC (Member) commented:

@topolarity says this is likely fixed; we just need to make sure the fix is also on 1.11.

KristofferC (Member) commented:

The MWE in #51561 (comment) does not reproduce on 1.11 beta1, so I will assume this is fixed. Please reopen if this is still an issue.
