Crashes in calls of functions generated with Symbolics.build_function #651

Closed
lassepe opened this issue Jul 13, 2022 · 8 comments · May be fixed by #664

Comments

@lassepe

lassepe commented Jul 13, 2022

In some of my research code, I am seeing sporadic crashes when calling functions generated by Symbolics.build_function. The crash is an assertion failure in Julia's codegen.cpp:

julia: /buildworker/worker/package_linux64/build/src/codegen.cpp:3635: jl_cgval_t emit_invoke(jl_codectx_t&, const jl_cgval_t&, const jl_cgval_t*, size_t, jl_value_t*): Assertion `(((jl_value_t*)(((jl_taggedvalue_t*)((char*)(mi) - sizeof(jl_taggedvalue_t)))->header & ~(uintptr_t)15))==(jl_value_t*)(jl_method_instance_type))' failed.
Full backtrace captured with gdb:
julia: /buildworker/worker/package_linux64/build/src/codegen.cpp:3635: jl_cgval_t emit_invoke(jl_codectx_t&, const jl_cgval_t&, const jl_cgval_t*, size_t, jl_value_t*): Assertion `(((jl_value_t*)(((jl_taggedvalue_t*)((char*)(mi) - sizeof(jl_taggedvalue_t)))->header & ~(uintptr_t)15))==(jl_value_t*)(jl_method_instance_type))' failed.

Thread 1 "julia" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) backtrace 
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff7d8d859 in __GI_abort () at abort.c:79
#2  0x00007ffff7d8d729 in __assert_fail_base (fmt=0x7ffff7f23588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=0x7ffff70f5200 "(((jl_value_t*)(((jl_taggedvalue_t*)((char*)(mi) - sizeof(jl_taggedvalue_t)))->header & ~(uintptr_t)15))==(jl_value_t*)(jl_method_instance_type))", 
    file=0x7ffff70f3c48 "/buildworker/worker/package_linux64/build/src/codegen.cpp", line=3635, 
    function=<optimized out>) at assert.c:92
#3  0x00007ffff7d9efd6 in __GI___assert_fail (
    assertion=assertion@entry=0x7ffff70f5200 "(((jl_value_t*)(((jl_taggedvalue_t*)((char*)(mi) - sizeof(jl_taggedvalue_t)))->header & ~(uintptr_t)15))==(jl_value_t*)(jl_method_instance_type))", 
    file=file@entry=0x7ffff70f3c48 "/buildworker/worker/package_linux64/build/src/codegen.cpp", 
    line=line@entry=3635, 
    function=function@entry=0x7ffff70febe0 <emit_invoke(jl_codectx_t&, jl_cgval_t const&, jl_cgval_t const*, unsigned long, _jl_value_t*)::__PRETTY_FUNCTION__> "jl_cgval_t emit_invoke(jl_codectx_t&, const jl_cgval_t&, const jl_cgval_t*, size_t, jl_value_t*)") at assert.c:101
#4  0x00007ffff6f4bf90 in emit_invoke (ctx=..., lival=..., argv=argv@entry=0x7fffffff6f40, nargs=nargs@entry=4, 
    rt=rt@entry=0x7fffe33a8160 <jl_system_image_data+1306272>)
    at /buildworker/worker/package_linux64/build/src/codegen.cpp:3635
#5  0x00007ffff6f751a7 in emit_invoke (ctx=..., rt=0x7fffe33a8160 <jl_system_image_data+1306272>, 
    ex=<optimized out>) at /buildworker/worker/package_linux64/build/src/codegen.cpp:3626
#6  0x00007ffff6f6ec85 in emit_expr (ctx=..., expr=expr@entry=0x7ffed3cb72b0, ssaval=ssaval@entry=2)
    at /buildworker/worker/package_linux64/build/src/codegen.cpp:4585
#7  0x00007ffff6f77e55 in emit_ssaval_assign (ctx=..., idx=idx@entry=2, r=r@entry=0x7ffed3cb72b0)
    at /buildworker/worker/package_linux64/build/src/codegen.cpp:4245
#8  0x00007ffff6f67cca in emit_stmtpos (ssaval_result=2, expr=0x7ffed3cb72b0, ctx=...)
    at /buildworker/worker/package_linux64/build/src/codegen.cpp:4487
#9  emit_function (lam=lam@entry=0x7ffeeb6d3210, src=src@entry=0x7fff54adfc90, 
    jlrettype=jlrettype@entry=0x7fffe33a8160 <jl_system_image_data+1306272>, params=..., 
    vaOverride=vaOverride@entry=false) at /buildworker/worker/package_linux64/build/src/codegen.cpp:7326
#10 0x00007ffff6f7f2b9 in jl_emit_code (li=0x7ffeeb6d3210, src=0x7fff54adfc90, 
    jlrettype=0x7fffe33a8160 <jl_system_image_data+1306272>, params=...)
    at /buildworker/worker/package_linux64/build/src/codegen.cpp:7688
#11 0x00007ffff6f7f721 in jl_emit_codeinst (codeinst=codeinst@entry=0x7ffee5832b30, src=<optimized out>, 
    src@entry=0x7fff54adfc90, params=...) at /buildworker/worker/package_linux64/build/src/codegen.cpp:7733
#12 0x00007ffff7033950 in _jl_compile_codeinst (codeinst=codeinst@entry=0x7ffee5832b30, src=0x7fff54adfc90, 
    world=world@entry=31858) at /buildworker/worker/package_linux64/build/src/jitlayers.cpp:124
#13 0x00007ffff7035082 in jl_generate_fptr (mi=mi@entry=0x7ffeeb6d3210, world=world@entry=31858)
    at /buildworker/worker/package_linux64/build/src/jitlayers.cpp:350
#14 0x00007ffff6fa62dd in jl_compile_method_internal (mi=mi@entry=0x7ffeeb6d3210, world=world@entry=31858)
    at /buildworker/worker/package_linux64/build/src/gf.c:1980
#15 0x00007ffff6fa6c33 in jl_compile_method_internal (world=31858, mi=0x7ffeeb6d3210)
    at /buildworker/worker/package_linux64/build/src/gf.c:2246
#16 _jl_invoke (world=31858, mfunc=0x7ffeeb6d3210, nargs=3, args=0x7fffffff9a60, F=0x7ffee9cbee48)
    at /buildworker/worker/package_linux64/build/src/gf.c:2239
#17 jl_invoke (F=0x7ffee9cbee48, args=0x7fffffff9a60, nargs=3, mfunc=0x7ffeeb6d3210)
    at /buildworker/worker/package_linux64/build/src/gf.c:2254
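
For readers unfamiliar with the macro-expanded assertion text: it appears to check that the type tag of `mi`, read from the header word stored just before the object with its low four bits masked off, equals the method instance type. The following is only a sketch of that masking arithmetic in plain Julia; the addresses and flag bits are hypothetical values chosen for illustration, not taken from the crash.

```julia
# Hypothetical address of a type object (illustration only).
type_ptr = UInt64(0x00007fffe33a8160)

# Hypothetical header word: the type pointer with some low flag bits set.
header = type_ptr | UInt64(0x3)

# The assertion masks off the low four bits (`& ~(uintptr_t)15` in the C source)
# before comparing against the expected type pointer.
masked = header & ~UInt64(15)

println(masked == type_ptr)
```

The assertion in the backtrace fails when this comparison is false, that is, when the object passed as `mi` carries a different type tag than expected.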

It does not happen every time but the frequency seems to correlate with:

  • the "size" of the vector-valued function (it happens more often for large outputs)
  • re-generating the function after a Revise.jl reload (though I have also seen this error on the first try, without any Revise action)

Unfortunately, I have thus far been unable to create a compact reproducer that does not involve a ton of my research code. However, I have a setup to reproduce this issue locally with gdb and am happy to provide more information if necessary.

Update: see the minimal reproducer in a later comment below.

Version Info

  • Julia 1.7.3
  • Symbolics 4.8.3
  • RuntimeGeneratedFunctions 0.5.3
@ChrisRackauckas
Member

Is this a standard generic Julia binary from https://julialang.org/?

@lassepe
Author

lassepe commented Jul 13, 2022

Yes, it is.

@lassepe
Author

lassepe commented Jul 14, 2022

Here's a minimal reproducer that reliably triggers the issue for me:

using Symbolics
using SparseArrays

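function main(; n=10000, m=10000, nsp=10)
    # n scalar symbolic variables x[1], ..., x[n]
    x = begin
        @variables(x[1:n],) |> only |> Symbolics.scalarize
    end

    # m outputs, each a sum of squares of nsp randomly chosen variables
    f = map(1:m) do _
        ind = rand(eachindex(x), nsp)
        sum(x -> x^2, x[ind])
    end

    # symbolic sparse Jacobian of f with respect to x, split into (row, col, value) triplets
    J = Symbolics.sparsejacobian(f, x)
    (J_rows, J_cols, J_vals) = findnz(J)

    # [2] selects the in-place variant; expression=Val{false} returns a callable function instead of an Expr
    J_vals_fn! = Symbolics.build_function(J_vals, x; expression=Val{false})[2]
    sparse_J = (; rows=J_rows, cols=J_cols, (vals_fn!)=J_vals_fn!)

    # calling the generated function compiles it on first use
    result = zeros(length(sparse_J.rows))
    input = rand(length(x))
    sparse_J.vals_fn!(result, input)
    result
end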
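
For context, the reproducer only generates code for the nonzero values of the Jacobian; the sparsity pattern (rows and columns) is computed once and reused. Below is a minimal sketch of that triplet pattern using only SparseArrays. The small numeric matrix is a hypothetical stand-in for the symbolic Jacobian, and copying the values stands in for calling the generated `vals_fn!`.

```julia
using SparseArrays

# Hypothetical stand-in for the symbolic Jacobian: a small numeric sparse matrix.
J = sparse([1, 2, 3, 3], [1, 2, 1, 3], [2.0, 4.0, 6.0, 8.0], 3, 3)

# Same decomposition as in the reproducer: row indices, column indices, stored values.
(J_rows, J_cols, J_vals) = findnz(J)

# In the reproducer, the generated vals_fn! writes freshly evaluated values
# into a buffer of this length; here the known values are simply copied.
result = zeros(length(J_rows))
result .= J_vals

# The fixed sparsity pattern plus the value buffer recover the full matrix.
J_rebuilt = sparse(J_rows, J_cols, result, size(J)...)
println(J_rebuilt == J)
```

With this layout, only the value buffer has to be refilled for each new input, which is why the reproducer returns `result` rather than a matrix.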

@shashi
Member

shashi commented Jul 22, 2022

@JeffBezanson

@lassepe
Author

lassepe commented Jul 22, 2022

I should probably add here that I have opened an issue in Base instead because it seems the more appropriate place for this kind of bug. That issue is also strictly more informative than this one by now, since I've added an rr-trace and git bisect results.

So to the maintainers here: feel free to close this issue if you don't think that there is any value in tracking it here as well.

@lassepe changed the title from "Sporatic crashes in calls of functions generated with Symbolics.build_function" to "Crashes in calls of functions generated with Symbolics.build_function" on Jul 22, 2022
@shashi
Member

shashi commented Jul 23, 2022

I think Jeff found the issue and fixed it today. I also have a fix that makes this work within Symbolics; I'll close this issue with that change.

@JeffBezanson

Should be fixed by JuliaLang/julia#45173

@lassepe
Author

lassepe commented Jul 26, 2022

I tested the reproducer above with JuliaLang/julia#45173 and the bug does not occur anymore. I should also say, however, that on nightly (even without that PR) the problem seemed to be fixed (though probably only symptomatically).

4 participants