Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfault due to MethodError when tracking Base #552

Open
simeonschaub opened this issue Oct 5, 2020 · 26 comments · May be fixed by #556
Open

Segfault due to MethodError when tracking Base #552

simeonschaub opened this issue Oct 5, 2020 · 26 comments · May be fixed by #556
Labels

Comments

@simeonschaub
Copy link
Contributor

Steps to reproduce:

Apply this patch to base/range.jl on the latest Julia master:

diff --git a/base/range.jl b/base/range.jl
index 5f30a31a0b..e004161551 100644
--- a/base/range.jl
+++ b/base/range.jl
@@ -318,7 +318,7 @@ be 1.
 """
 struct OneTo{T<:Integer} <: AbstractUnitRange{T}
     stop::T
-    OneTo{T}(stop) where {T<:Integer} = new(max(zero(T), stop))
+    OneTo{T}(stop) where {T<:Integer} = (@_inline_meta; new(max(zero(T), stop)))
     function OneTo{T}(r::AbstractRange) where {T<:Integer}
         throwstart(r) = (@_noinline_meta; throw(ArgumentError("first element must be 1, got $(first(r))")))
         throwstep(r)  = (@_noinline_meta; throw(ArgumentError("step must be 1, got $(step(r))")))
@@ -327,7 +327,7 @@ struct OneTo{T<:Integer} <: AbstractUnitRange{T}
         return new(max(zero(T), last(r)))
     end
 end
-OneTo(stop::T) where {T<:Integer} = OneTo{T}(stop)
+OneTo(stop::T) where {T<:Integer} = (@_inline_meta; OneTo{T}(stop))
 OneTo(r::AbstractRange{T}) where {T<:Integer} = OneTo{T}(r)
 
 ## Step ranges parameterized by length

Run Julia and do:

julia> using Revise

julia> Revise.track(Base)

julia> mat = zeros(10,10);
error in running finalizer: MethodError(f=Base.OneTo{T} where T<:Integer, args=(16,), world=0x000000000000722e)
error in running finalizer: MethodError(f=Base.OneTo{T} where T<:Integer, args=(16,), world=0x000000000000722e)
error in running finalizer: MethodError(f=Base.OneTo{T} where T<:Integer, args=(16,), world=0x000000000000722e)
error in running finalizer: MethodError(f=Base.OneTo{T} where T<:Integer, args=(16,), world=0x000000000000722e)
error in running finalizer: MethodError(f=Base.OneTo{T} where T<:Integer, args=(16,), world=0x000000000000722e)
┌ Error: Failed to revise /home/simeon/Documents/Julia/julia/base/range.jl
│   exception =
│    LoadError: MethodError: no method matching Base.OneTo(::Int64)
│    Closest candidates are:
│      Base.OneTo(::AbstractRange{T}) where T<:Integer at range.jl:331
│    Stacktrace:
│     [1] map(f::Type{Base.OneTo}, t::Tuple{Int64})
│       @ Base ./tuple.jl:179
│     [2] axes
│       @ ./abstractarray.jl:75 [inlined]
│     [3] _array_for(#unused#::Type{Symbol}, itr::UnitRange{Int64}, #unused#::Base.HasShape{1})
│       @ Base ./array.jl:678
│     [4] vect(::Symbol, ::Vararg{Symbol, N} where N)
│       @ Base ./array.jl:130
│     [5] docm(source::LineNumberNode, mod::Module, meta::Any, ex::Any, define::Bool)
│       @ Base.Docs ./docs/Docs.jl:545
│     [6] @doc(__source__::LineNumberNode, __module__::Module, x::Vararg{Any, N} where N)
│       @ Core ./boot.jl:508
│     [7] lower
│       @ ./meta.jl:124 [inlined]
│     [8] methods_by_execution!(recurse::Any, methodinfo::Revise.CodeTrackingMethodInfo, docexprs::Dict{Module, Vector{Expr}}, mod::Module, ex::Expr; mode::Symbol, disablebp::Bool, always_rethrow::Bool, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
│       @ Revise ~/.julia/packages/Revise/fUuEE/src/lowered.jl:178
│    in expression starting at /home/simeon/Documents/Julia/julia/base/range.jl:312
└ @ Revise ~/.julia/packages/Revise/fUuEE/src/Revise.jl:705
┌ Warning: The running code does not match the saved version for the following files:
│ 
│   /home/simeon/Documents/Julia/julia/base/range.jl
│ 
│ If the error was due to evaluation order, it can sometimes be resolved by calling `Revise.retry()`.
│ Use Revise.errors() to report errors again. Only the first error in each file is shown.
│ Your prompt color may be yellow until the errors are resolved.
└ @ Revise ~/.julia/packages/Revise/fUuEE/src/Revise.jl:799

SYSTEM (REPL): showing an error caused an error

SYSTEM (REPL): caught exception of type MethodError while trying to handle a nested exception; giving up

julia>

###
### I press `a`
###

fatal: error thrown and no exception handler available.
MethodError(f=Base.OneTo{T} where T<:Integer, args=(32,), world=0x000000000000722e)
jl_method_error_bare at /home/simeon/Documents/Julia/julia/src/gf.c:1767
jl_method_error at /home/simeon/Documents/Julia/julia/src/gf.c:1785
jl_lookup_generic_ at /home/simeon/Documents/Julia/julia/src/gf.c:2355 [inlined]
jl_apply_generic at /home/simeon/Documents/Julia/julia/src/gf.c:2370
map at ./tuple.jl:179
axes at ./abstractarray.jl:75 [inlined]
has_offset_axes at ./abstractarray.jl:85
require_one_based_indexing at ./abstractarray.jl:89
GenericIOBuffer at ./iobuffer.jl:19
GenericIOBuffer at ./iobuffer.jl:27
#IOBuffer#348 at ./iobuffer.jl:98
Type##kw at ./iobuffer.jl:91
#IOBuffer#349 at ./iobuffer.jl:114
GenericIOBuffer at ./iobuffer.jl:112
#with_output_color#795 at ./util.jl:71
with_output_color##kw at ./util.jl:71
unknown function (ip: 0x7fb64c1734c4)
#printstyled#796 at ./util.jl:105
printstyled##kw at ./util.jl:105
unknown function (ip: 0x7fb64c1733eb)
display_error at ./client.jl:102
display_error at ./client.jl:107
jl_apply at /home/simeon/Documents/Julia/julia/src/julia.h:1687 [inlined]
do_apply at /home/simeon/Documents/Julia/julia/src/builtins.c:672
jl_f__apply_latest at /home/simeon/Documents/Julia/julia/src/builtins.c:722
#invokelatest#2 at ./essentials.jl:709 [inlined]
invokelatest at ./essentials.jl:708 [inlined]
_start at ./client.jl:487
jfptr__start_31012 at /home/simeon/Documents/Julia/julia/usr/lib/julia/sys.so (unknown line)
jl_apply at /home/simeon/Documents/Julia/julia/src/julia.h:1687 [inlined]
true_main at /home/simeon/Documents/Julia/julia/src/jlapi.c:553
repl_entrypoint at /home/simeon/Documents/Julia/julia/src/jlapi.c:672
main at /home/simeon/Documents/Julia/julia/cli/loader_exe.c:46
__libc_start_main at /usr/lib/libc.so.6 (unknown line)
_start at ./julia (unknown line)

atexit hook threw an error: MethodError(f=Base.OneTo{T} where T<:Integer, args=(1,), world=0x000000000000722e)%

That should work, right? My best guess is that the compiler is trying to inline a method that Revise just deleted.

@timholy
Copy link
Owner

timholy commented Oct 5, 2020

It's a good guess. Probably Revise deleted the method, but Revise itself depends on it, and some place where it's using runtime dispatch ended up getting stuck. This is probably not fixable unless we do something to make Revise un-invalidatable, kind of like Core.Compiler. @c42f's work on running in a particular world age might be a way to do that?

@simeonschaub
Copy link
Contributor Author

simeonschaub commented Oct 5, 2020

I should also mention that this only happens for me if I add both @_inline_meta annotations, it's fine if I add either one of them on their own.

@c42f
Copy link
Collaborator

c42f commented Oct 6, 2020

Yes, sounds like you could try invoke_in_world to isolate the Revise machinery from Base getting broken. Points of note:

  1. You'd probably need to remove some places which Revise currently uses invokelatest, to avoid undoing the invoke_in_world.
  2. The world age isn't preserved across @async task launches (what world should Tasks run in? JuliaLang/julia#35690) so invoking any background tasks in the right world might need to be handled manually.

I had a very quick look, but I think my efforts were undone by point (1) as it seems there's some tricky uses of invokelatest which I didn't have time to stare closely at.

@timholy
Copy link
Owner

timholy commented Oct 6, 2020

The invokelatests are there simply to break the chain of inference. Revise3 is so much faster to launch and has less overhead in loading packages mostly because it uses tricks to delay compilation until it can be delayed no longer. Consequently you only pay the price of big chunks of Revise's functionality when you actually use them.

I'm guessing invoke_in_world would do the same? We could just store the world age in a Ref. I'll poke at this a bit.

so invoking any background tasks in the right world might need to be handled manually.

I already do that (see TaskThunk) as one of many efforts to reduce latency. It's hard to precompile anonymous functions as generated by @async, so if you care about latency it's better to use mechanism designed to achieve the same ends in a more precompile-friendly fashion.

@timholy
Copy link
Owner

timholy commented Oct 6, 2020

An aside: since both of you have made changes to Revise, if you don't know already it's often useful to disable

function trim_toplevel!(bt)
by adding a return bt to the top. It's there to prevent users from seeing long backtraces to Revise's own code, but when you're hacking on Revise you want to see them.

So, with these changes:

diff --git a/src/legacy_loading.jl b/src/legacy_loading.jl
index e310f22..5a6448f 100644
--- a/src/legacy_loading.jl
+++ b/src/legacy_loading.jl
@@ -156,7 +156,7 @@ function parse_pkg_files(id::PkgId)
     # To reduce compiler latency, use runtime dispatch for `queue_includes!`.
     # `queue_includes!` requires compilation of the whole parsing/expression-splitting infrastructure,
     # and it's better to wait to compile it until we actually need it.
-    Base.invokelatest(queue_includes!, pkgdata, id)
+    Base.invoke_in_world(worldage[], queue_includes!, pkgdata, id)
     return pkgdata
 end
 
diff --git a/src/loading.jl b/src/loading.jl
index db8d0b2..0764985 100644
--- a/src/loading.jl
+++ b/src/loading.jl
@@ -48,7 +48,7 @@ function parse_pkg_files(id::PkgId)
     # To reduce compiler latency, use runtime dispatch for `queue_includes!`.
     # `queue_includes!` requires compilation of the whole parsing/expression-splitting infrastructure,
     # and it's better to wait to compile it until we actually need it.
-    Base.invokelatest(queue_includes!, pkgdata, id)
+    Base.invoke_in_world(worldage[], queue_includes!, pkgdata, id)
     return pkgdata
 end
 
diff --git a/src/packagedef.jl b/src/packagedef.jl
index 441c25a..c053b8a 100644
--- a/src/packagedef.jl
+++ b/src/packagedef.jl
@@ -198,6 +198,11 @@ const silence_pkgs = Set{Symbol}()
 const depsdir = joinpath(dirname(@__DIR__), "deps")
 const silencefile = Ref(joinpath(depsdir, "silence.txt"))  # Ref so that tests don't clobber
 
+"""
+    world age
+"""
+const worldage = Ref{Union{Nothing,UInt}}(nothing)
+
 ##
 ## The inputs are sets of expressions found in each file.
 ## Some of those expressions will generate methods which are identified via their signatures.
@@ -1178,7 +1183,7 @@ end
 # This uses invokelatest not for reasons of world age but to ensure that the call is made at runtime.
 # This allows `revise_first` to be compiled without compiling `revise` itself, and greatly
 # reduces the overhead of using Revise.
-revise_first(ex) = Expr(:toplevel, :(isempty($revision_queue) || Base.invokelatest($revise)), ex)
+revise_first(ex) = Expr(:toplevel, :(isempty($revision_queue) || Base.invoke_in_world($(worldage[]), $revise)), ex)
 
 @noinline function run_backend(backend)
     while true
@@ -1277,6 +1282,7 @@ function init_worker(p)
 end
 
 function __init__()
+    worldage[] = Base.get_world_counter()
     run_on_worker = get(ENV, "JULIA_REVISE_WORKER_ONLY", "0")
     if !(myid() == 1 || run_on_worker == "1")
         return nothing
diff --git a/src/pkgs.jl b/src/pkgs.jl
index 95fbaa6..fb515f1 100644
--- a/src/pkgs.jl
+++ b/src/pkgs.jl
@@ -171,7 +171,7 @@ function maybe_add_includes_to_pkgdata!(pkgdata::PkgData, file::AbstractString,
                 parse_source!(fi.modexsigs, fullfile, mod)
                 if eval_now
                     # Use runtime dispatch to reduce latency
-                    Base.invokelatest(instantiate_sigs!, fi.modexsigs; mode=:eval)
+                    Base.invoke_in_world(worldage[], instantiate_sigs!, fi.modexsigs; mode=:eval)
                 end
             end
             # Add to watchlist
@@ -237,7 +237,7 @@ function _add_require(sourcefile, modcaller, idmod, modname, expr)
             end
         end
         if complex
-            Base.invokelatest(eval_require_now, pkgdata, fileidx, filekey, sourcefile, modcaller, expr)
+            Base.invoke_in_world(worldage[], eval_require_now, pkgdata, fileidx, filekey, sourcefile, modcaller, expr)
         end
     finally
         unlock(requires_lock)
diff --git a/src/utils.jl b/src/utils.jl
index 785c4ac..7b97483 100644
--- a/src/utils.jl
+++ b/src/utils.jl
@@ -184,6 +184,7 @@ println_maxsize(io::IO, args...; kwargs...) = printf_maxsize(println, io, args..
 
 # Trimming backtraces
 function trim_toplevel!(bt)
+    return bt
     n = itoplevel = length(bt)
     for (i, t) in enumerate(bt)
         sfs = StackTraces.lookup(t)

I then get

julia> Revise.track(Base)

julia> 1+1
error in running finalizer: MethodError(f=Base.OneTo{T} where T<:Integer, args=(16,), world=0x000000000000722c)
error in running finalizer: MethodError(f=Base.OneTo{T} where T<:Integer, args=(16,), world=0x000000000000722c)
error in running finalizer: MethodError(f=Base.OneTo{T} where T<:Integer, args=(16,), world=0x000000000000722c)
error in running finalizer: MethodError(f=Base.OneTo{T} where T<:Integer, args=(16,), world=0x000000000000722c)
error in running finalizer: MethodError(f=Base.OneTo{T} where T<:Integer, args=(16,), world=0x000000000000722c)
┌ Error: Failed to revise /home/tim/src/julia-master/base/range.jl
│   exception =
│    LoadError: MethodError: no method matching Base.OneTo(::Int64)
│    Closest candidates are:
│      Base.OneTo(::AbstractRange{T}) where T<:Integer at range.jl:331
│    Stacktrace:
│      [1] map(f::Type{Base.OneTo}, t::Tuple{Int64})
│        @ Base ./tuple.jl:179
│      [2] axes
│        @ ./abstractarray.jl:75 [inlined]
│      [3] _array_for(#unused#::Type{Symbol}, itr::UnitRange{Int64}, #unused#::Base.HasShape{1})
│        @ Base ./array.jl:678
│      [4] vect(::Symbol, ::Vararg{Symbol, N} where N)
│        @ Base ./array.jl:130
│      [5] docm(source::LineNumberNode, mod::Module, meta::Any, ex::Any, define::Bool)
│        @ Base.Docs ./docs/Docs.jl:545
│      [6] @doc(__source__::LineNumberNode, __module__::Module, x::Vararg{Any, N} where N)
│        @ Core ./boot.jl:508
│      [7] lower
│        @ ./meta.jl:124 [inlined]
│      [8] methods_by_execution!(recurse::Any, methodinfo::Revise.CodeTrackingMethodInfo, docexprs::Dict{Module, Vector{Expr}}, mod::Module, ex::Expr; mode::Symbol, disablebp::Bool, always_rethrow::Bool, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
│        @ Revise ~/.julia/dev/Revise/src/lowered.jl:178
│      [9] eval_with_signatures(mod::Module, ex::Expr; mode::Symbol, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
│        @ Revise ~/.julia/dev/Revise/src/packagedef.jl:460
│     [10] (::Revise.var"#86#88"{Symbol, Revise.RelocatableExpr, OrderedCollections.OrderedDict{Revise.RelocatableExpr, Union{Nothing, Vector{Any}}}, Module})()
│        @ Revise ~/.julia/dev/Revise/src/packagedef.jl:317
│     [11] with_logstate(f::Function, logstate::Any)
│        @ Base.CoreLogging ./logging.jl:491
│     [12] with_logger
│        @ ./logging.jl:603 [inlined]
│     [13] #eval_rex#85
│        @ ~/.julia/dev/Revise/src/packagedef.jl:310 [inlined]
│     [14] revise(; throw::Bool)
│        @ Revise ~/.julia/dev/Revise/src/packagedef.jl:760
│     [15] revise()
│        @ Revise ~/.julia/dev/Revise/src/packagedef.jl:723
│     [16] #invoke_in_world#4
│        @ ./essentials.jl:744 [inlined]
│     [17] invoke_in_world(::UInt64, ::Any)
│        @ Base ./essentials.jl:743
│     [18] top-level scope
│        @ none:1
│     [19] eval(m::Module, e::Any)
│        @ Core ./boot.jl:360
│     [20] eval_user_input(ast::Any, backend::REPL.REPLBackend)
│        @ REPL ~/src/julia-master/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:139
│     [21] repl_backend_loop(backend::REPL.REPLBackend)
│        @ REPL ~/src/julia-master/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:200
│     [22] start_repl_backend(backend::REPL.REPLBackend, consumer::Any)
│        @ REPL ~/src/julia-master/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:185
│     [23] run_repl(repl::REPL.AbstractREPL, consumer::Any; backend_on_current_task::Bool)
│        @ REPL ~/src/julia-master/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:316
│     [24] run_repl(repl::REPL.AbstractREPL, consumer::Any)
│        @ REPL ~/src/julia-master/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:304
│     [25] (::Base.var"#855#857"{Bool, Bool, Bool})(REPL::Module)
│        @ Base ./client.jl:387
│     [26] #invokelatest#2
│        @ ./essentials.jl:709 [inlined]
│     [27] invokelatest
│        @ ./essentials.jl:708 [inlined]
│     [28] run_main_repl(interactive::Bool, quiet::Bool, banner::Bool, history_file::Bool, color_set::Bool)
│        @ Base ./client.jl:372
│     [29] exec_options(opts::Base.JLOptions)
│        @ Base ./client.jl:302
│     [30] _start()
│        @ Base ./client.jl:485
│    in expression starting at /home/tim/src/julia-master/base/range.jl:312
└ @ Revise ~/.julia/dev/Revise/src/packagedef.jl:698
┌ Warning: The running code does not match the saved version for the following files:
│ 
│   /home/tim/src/julia-master/base/range.jl
│ 
│ If the error was due to evaluation order, it can sometimes be resolved by calling `Revise.retry()`.
│ Use Revise.errors() to report errors again. Only the first error in each file is shown.
│ Your prompt color may be yellow until the errors are resolved.
└ @ Revise ~/.julia/dev/Revise/src/packagedef.jl:794
2

So it seems the invoke_in_world "works" (it appears in the backtrace) but it still ends up hitting the error. lower is just a ccall to jl_expand, so it ends up making it back into Julia code and perhaps it discards the selected world age at that point? I'm a bit stumped.

I've pushed my WIP as the invoke_in_world branch, in case it's useful. Also happy to grant commit access to either of you two if you want it.

@timholy
Copy link
Owner

timholy commented Oct 6, 2020

@simeonschaub, can't resist asking: why would you need to force inlining on those range methods? That should happen automatically.

@simeonschaub
Copy link
Contributor Author

Yes, you are absolutely right, it didn't end up doing anything. I was looking into the allocations in JuliaLang/julia#37888 and thought I'd just try that in case there was anything weird going on. The allocations came from somewhere else though.

@simeonschaub
Copy link
Contributor Author

lower is just a ccall to jl_expand, so it ends up making it back into Julia code and perhaps it discards the selected world age at that point?

The problem seems to be that macro expansion always just uses the latest worldage:

julia> macro m() 1 end
@m (macro with 1 method)

julia> wa = Base.get_world_counter()
0x0000000000007221

julia> macro m() 2 end
@m (macro with 1 method)

julia> Base.get_world_counter()
0x0000000000007222

julia> Base.invoke_in_world(wa, Meta.lower, Main, :(@m))
2

@c42f Is it possible to specify which worldage macro expansion should use?

@simeonschaub
Copy link
Contributor Author

simeonschaub commented Oct 6, 2020

It doesn't seem so: https://github.com/JuliaLang/julia/blob/949caad5f13e60c62a2b774b93057754afc3f0b4/src/ast.c#L971

It would probably make sense to have the ability to pass through an additional worldage argument in jl_expand.

@timholy
Copy link
Owner

timholy commented Oct 6, 2020

So does jl_expand_in_world, together with the invoke_in_world branch, actually fix the problem or is there still more stuff to do?

@simeonschaub
Copy link
Contributor Author

simeonschaub commented Oct 6, 2020

Yes, together with JuliaLang/julia#37902, this works:

diff --git a/src/legacy_loading.jl b/src/legacy_loading.jl
index e310f22..5a6448f 100644
--- a/src/legacy_loading.jl
+++ b/src/legacy_loading.jl
@@ -156,7 +156,7 @@ function parse_pkg_files(id::PkgId)
     # To reduce compiler latency, use runtime dispatch for `queue_includes!`.
     # `queue_includes!` requires compilation of the whole parsing/expression-splitting infrastructure,
     # and it's better to wait to compile it until we actually need it.
-    Base.invokelatest(queue_includes!, pkgdata, id)
+    Base.invoke_in_world(worldage[], queue_includes!, pkgdata, id)
     return pkgdata
 end
 
diff --git a/src/loading.jl b/src/loading.jl
index db8d0b2..0764985 100644
--- a/src/loading.jl
+++ b/src/loading.jl
@@ -48,7 +48,7 @@ function parse_pkg_files(id::PkgId)
     # To reduce compiler latency, use runtime dispatch for `queue_includes!`.
     # `queue_includes!` requires compilation of the whole parsing/expression-splitting infrastructure,
     # and it's better to wait to compile it until we actually need it.
-    Base.invokelatest(queue_includes!, pkgdata, id)
+    Base.invoke_in_world(worldage[], queue_includes!, pkgdata, id)
     return pkgdata
 end
 
diff --git a/src/lowered.jl b/src/lowered.jl
index e754c6b..d3812e6 100644
--- a/src/lowered.jl
+++ b/src/lowered.jl
@@ -128,6 +128,8 @@ function methods_by_execution(mod::Module, ex::Expr; kwargs...)
     return methodinfo, docexprs, frame
 end
 
+_lower(m::Module, ex, world::UInt) = ccall(:jl_expand_in_world, Any, (Any, Module, Cstring, Cint, Csize_t), ex, m, "none", 0, world)
+
 """
     methods_by_execution!(recurse=JuliaInterpreter.Compiled(), methodinfo, docexprs, mod::Module, ex::Expr;
                           mode=:eval, disablebp=true, skip_include=mode!==:eval, always_rethrow=false)
@@ -175,7 +177,7 @@ The other keyword arguments are more straightforward:
 function methods_by_execution!(@nospecialize(recurse), methodinfo, docexprs, mod::Module, ex::Expr;
                                mode::Symbol=:eval, disablebp::Bool=true, always_rethrow::Bool=false, kwargs...)
     mode ∈ (:sigs, :eval, :evalmeth, :evalassign) || error("unsupported mode ", mode)
-    lwr = Meta.lower(mod, ex)
+    lwr = _lower(mod, ex, worldage[])
     isa(lwr, Expr) || return nothing, nothing
     if lwr.head === :error || lwr.head === :incomplete
         error("lowering returned an error, ", lwr)
diff --git a/src/packagedef.jl b/src/packagedef.jl
index 441c25a..c053b8a 100644
--- a/src/packagedef.jl
+++ b/src/packagedef.jl
@@ -198,6 +198,11 @@ const silence_pkgs = Set{Symbol}()
 const depsdir = joinpath(dirname(@__DIR__), "deps")
 const silencefile = Ref(joinpath(depsdir, "silence.txt"))  # Ref so that tests don't clobber
 
+"""
+    world age
+"""
+const worldage = Ref{Union{Nothing,UInt}}(nothing)
+
 ##
 ## The inputs are sets of expressions found in each file.
 ## Some of those expressions will generate methods which are identified via their signatures.
@@ -1178,7 +1183,7 @@ end
 # This uses invokelatest not for reasons of world age but to ensure that the call is made at runtime.
 # This allows `revise_first` to be compiled without compiling `revise` itself, and greatly
 # reduces the overhead of using Revise.
-revise_first(ex) = Expr(:toplevel, :(isempty($revision_queue) || Base.invokelatest($revise)), ex)
+revise_first(ex) = Expr(:toplevel, :(isempty($revision_queue) || Base.invoke_in_world($(worldage[]), $revise)), ex)
 
 @noinline function run_backend(backend)
     while true
@@ -1277,6 +1282,7 @@ function init_worker(p)
 end
 
 function __init__()
+    worldage[] = Base.get_world_counter()
     run_on_worker = get(ENV, "JULIA_REVISE_WORKER_ONLY", "0")
     if !(myid() == 1 || run_on_worker == "1")
         return nothing
diff --git a/src/pkgs.jl b/src/pkgs.jl
index 95fbaa6..fb515f1 100644
--- a/src/pkgs.jl
+++ b/src/pkgs.jl
@@ -171,7 +171,7 @@ function maybe_add_includes_to_pkgdata!(pkgdata::PkgData, file::AbstractString,
                 parse_source!(fi.modexsigs, fullfile, mod)
                 if eval_now
                     # Use runtime dispatch to reduce latency
-                    Base.invokelatest(instantiate_sigs!, fi.modexsigs; mode=:eval)
+                    Base.invoke_in_world(worldage[], instantiate_sigs!, fi.modexsigs; mode=:eval)
                 end
             end
             # Add to watchlist
@@ -237,7 +237,7 @@ function _add_require(sourcefile, modcaller, idmod, modname, expr)
             end
         end
         if complex
-            Base.invokelatest(eval_require_now, pkgdata, fileidx, filekey, sourcefile, modcaller, expr)
+            Base.invoke_in_world(worldage[], eval_require_now, pkgdata, fileidx, filekey, sourcefile, modcaller, expr)
         end
     finally
         unlock(requires_lock)
diff --git a/src/utils.jl b/src/utils.jl
index 785c4ac..7b97483 100644
--- a/src/utils.jl
+++ b/src/utils.jl
@@ -184,6 +184,7 @@ println_maxsize(io::IO, args...; kwargs...) = printf_maxsize(println, io, args..
 
 # Trimming backtraces
 function trim_toplevel!(bt)
+    return bt
     n = itoplevel = length(bt)
     for (i, t) in enumerate(bt)
         sfs = StackTraces.lookup(t)
~/.../Julia/julia2 >>> ./julia --startup=no                                                                   ✓ 522
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.6.0-DEV.1140 (2020-10-06)
 _/ |\__'_|_|_|\__'_|  |  Commit 4ccfd37dc9* (0 days old master)
|__/                   |

julia> using Revise

julia> Revise.track(Base)

julia> 1+1
WARNING: redefinition of constant GIT_VERSION_INFO. This may fail, cause incorrect answers, or produce other errors.
2

@simeonschaub
Copy link
Contributor Author

Wait, I spoke too soon! After trying this again, I get the following output:

julia> Revise.track(Base)

julia> 1+1
error in running finalizer: MethodError(f=Base.OneTo{T} where T<:Integer, args=(16,), world=0x000000000000722a)
error in running finalizer: MethodError(f=Base.OneTo{T} where T<:Integer, args=(16,), world=0x000000000000722a)
error in running finalizer: MethodError(f=Base.OneTo{T} where T<:Integer, args=(16,), world=0x000000000000722a)
error in running finalizer: MethodError(f=Base.OneTo{T} where T<:Integer, args=(16,), world=0x000000000000722a)
error in running finalizer: MethodError(f=Base.OneTo{T} where T<:Integer, args=(16,), world=0x000000000000722a)
WARNING: redefinition of constant GIT_VERSION_INFO. This may fail, cause incorrect answers, or produce other errors.
2

It doesn't seem to be fatal, and I can use the REPL just fine without any error messages afterwards. So this is definitely an improvement, but it looks like some finalizers are still run in the wrong worldage.

@timholy
Copy link
Owner

timholy commented Oct 6, 2020

It doesn't seem to be fatal

That's what happens to me in the original...but I get a crash when I exit Julia. But that's without jl_expand_in_world.

@simeonschaub
Copy link
Contributor Author

With jl_expand_in_world, exiting Julia works fine without crashing, it just complains about the errors in the finalizers once.

@timholy
Copy link
Owner

timholy commented Oct 6, 2020

I wonder if we can find out whether these are Revise-dependent. It's too bad this involves an inner constructor, otherwise we could check whether just @evaling the same methods without Revise produces the same warnings.

@simeonschaub
Copy link
Contributor Author

I applied this patch to my Julia build, in hope to get more info about these errors in the finalizers:

diff --git a/base/gcutils.jl b/base/gcutils.jl
index 51e3943877..93e8623696 100644
--- a/base/gcutils.jl
+++ b/base/gcutils.jl
@@ -45,8 +45,19 @@ function finalizer(@nospecialize(f), @nospecialize(o))
     if !ismutable(o)
         error("objects of type ", typeof(o), " cannot be finalized")
     end
+    function _f(o)
+        try
+            f(o)
+        catch e
+            print("o = ")
+            show(stderr, o)
+            println()
+            Base.showerror(stderr, e)
+            Base.show_backtrace(stderr, backtrace())
+        end
+    end
     ccall(:jl_gc_add_finalizer_th, Cvoid, (Ptr{Cvoid}, Any, Any),
-          Core.getptls(), o, f)
+          Core.getptls(), o, _f)
     return o
 end

It didn't work, but I now got this error:

julia> Revise.track(Base)

julia> 1+1
o = error in running finalizer: ErrorException("task switch not allowed from inside gc finalizer")
o = error in running finalizer: ErrorException("task switch not allowed from inside gc finalizer")
o = error in running finalizer: ErrorException("task switch not allowed from inside gc finalizer")
o = error in running finalizer: ErrorException("task switch not allowed from inside gc finalizer")
o = error in running finalizer: ErrorException("task switch not allowed from inside gc finalizer")
2

I might just be reading this completely wrong, but it leads me to suspect that this may have something to do with tasks Revise spawns? Do you have any ideas how to further debug this?

@timholy
Copy link
Owner

timholy commented Oct 6, 2020

const finalizer_backtraces = []

function finalizer(...

try
    f(o)
catch e
    push!(finalizer_backtraces, backtrace())
    ...
end
...

and then look at them later. (EDIT: the task switch comes from the fact that you're trying to print, so the remedy is to avoid printing. Storing the data somewhere lets you look at it later.)

@simeonschaub
Copy link
Contributor Author

Ok, I think I'm a bit out of ideas:

diff --git a/base/gcutils.jl b/base/gcutils.jl
index 51e3943877..fc674766f6 100644
--- a/base/gcutils.jl
+++ b/base/gcutils.jl
@@ -4,6 +4,8 @@
 ==(w::WeakRef, v) = isequal(w.value, v)
 ==(w, v::WeakRef) = isequal(w, v.value)
 
+const finalizer_bts = Array{Any,1}()
+
 """
     finalizer(f, x)
 
@@ -45,8 +47,16 @@ function finalizer(@nospecialize(f), @nospecialize(o))
     if !ismutable(o)
         error("objects of type ", typeof(o), " cannot be finalized")
     end
+    function _f(o)
+        try
+            f(o)
+        catch e
+            push!(Base.finalizer_bts, backtrace())
+            rethrow()
+        end
+    end
     ccall(:jl_gc_add_finalizer_th, Cvoid, (Ptr{Cvoid}, Any, Any),
-          Core.getptls(), o, f)
+          Core.getptls(), o, _f)
     return o
 end
~/.../Julia/julia2 >>> JULIA_NUM_THREADS=1 ./julia --startup=no                                               ✓ 554
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.6.0-DEV.1141 (2020-10-06)
 _/ |\__'_|_|_|\__'_|  |  jl_expand_in_world/0539d3f9f4* (fork: 1 commits, 0 days)
|__/                   |

julia> using Revise
[ Info: Precompiling Revise [295af30f-e4ad-537b-8983-00126c2a3abe]

julia> Revise.track(Base)

julia> 1+1
error in running finalizer: MethodError(f=Base.OneTo{T} where T<:Integer, args=(1,), world=0x000000000000722b)
error in running finalizer: MethodError(f=Base.OneTo{T} where T<:Integer, args=(1,), world=0x000000000000722b)
error in running finalizer: MethodError(f=Base.OneTo{T} where T<:Integer, args=(1,), world=0x000000000000722b)
error in running finalizer: MethodError(f=Base.OneTo{T} where T<:Integer, args=(1,), world=0x000000000000722b)
error in running finalizer: MethodError(f=Base.OneTo{T} where T<:Integer, args=(1,), world=0x000000000000722b)
2

julia> Base.finalizer_bts
Any[]

My best guess is that it's somehow due to bootstrapping, so we actually have two separate finalizer_bts' and the problematic finalizer is actually defined somewhere in Core, but I have no clue whether that's actually what's going on or where to go from there.

@c42f
Copy link
Collaborator

c42f commented Oct 7, 2020

Nice progress!

Could the finalization problem be due to the GC always running finalizers in the latest world? See here:

https://github.com/JuliaLang/julia/blob/7d3dac44dc917a215607bfa1a6054a21846f02a7/src/gc.c#L275

Combine this with Revise deleting a method which is relied on by the finalizer?

o = error in running finalizer: ErrorException("task switch not allowed from inside gc finalizer")

You can probably work around this by hackily calling blocking IO from the C library rather than using Julia's non-blocking IO. For example,

julia> kill(a) = @ccall jl_safe_printf("finalizing a %s\n  %s"::Cstring; 
                                       repr(typeof(a))::Cstring,
                                       repr(a)::Cstring)::Cvoid
kill (generic function with 1 method)

julia> let
           finalizer(kill, [1,2])
           nothing
       end
       GC.gc()
finalizing a Vector{Int64}
  [1, 2]

@simeonschaub
Copy link
Contributor Author

simeonschaub commented Oct 7, 2020

Ok, we've got an object and something resembling a stacktrace (although I don't think this stacktrace is particularly helpful, because it doesn't tell us anything about where the finalizer was initially created, and for some reason, it's also cut off):

~/.../Julia/julia2 >>> JULIA_NUM_THREADS=1 ./julia --startup=no                                               ✓ 570
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.6.0-DEV.1141 (2020-10-06)
 _/ |\__'_|_|_|\__'_|  |  jl_expand_in_world/0539d3f9f4* (fork: 1 commits, 0 days)
|__/                   |

julia> using Revise
[ Info: Precompiling Revise [295af30f-e4ad-537b-8983-00126c2a3abe]

julia> Revise.track(Base)

julia> 1+1
error in running finalizer: MethodError(f=Base.OneTo{Int64}, args=(0,), world=0x000000000000722b)
error in running finalizer: MethodError(f=Base.OneTo{Int64}, args=(0,), world=0x000000000000722a)
error in running finalizer: MethodError(f=Base.OneTo{Int64}, args=(0,), world=0x0000000000007229)
error in running finalizer: MethodError(f=Base.OneTo{Int64}, args=(0,), world=0x0000000000007228)
Distributed.Future(1, 1, 1, nothing) :: Distributed.Future
ERROR: MethodError: no method matching Base.OneTo(::Int64)
Closest candidates are:
  Base.OneTo(!Matched::AbstractRange{T}) where T<:Integer at range.jl:331

Stacktrace:
  [1] (::Base.var"#6#8"{Distributed.Future, MethodError})()
    @ Base ./gcutils.jl:58
  [2] #invoke_in_world#4
    @ ./essentials.jl:744 [inlined]
  [3] invoke_in_world
    @ ./essentials.jl:743 [inlined]
  [4] (::Base.var"#_f#7"{typeof(Distributed.finalize_ref), UInt64})(o::Distributed.Future)
    @ Base ./gcutils.jl:55
  [5] sort!
    @ ./sort.jl:735 [inlined]
  [6] compute_ssa_mapping_delete_statements!(code::Core.CodeInfo, stmts::Vector{Int64})
    @ JuliaInterpreter ~/.julia/packages/JuliaInterpreter/muncv/src/optimize.jl:72
  [7] optimize!(code::Core.CodeInfo, scope::Method)
    @ JuliaInterpreter ~/.julia/packages/JuliaInterpreter/muncv/src/optimize.jl:196
  [8] JuliaInterpreter.FrameCode(scope::Method, src::Core.CodeInfo; generator::Bool, optimize::Bool)
    @ JuliaInterpreter ~/.julia/packages/JuliaInterpreter/muncv/src/types.jl:101
  [9] prepare_framecode(method::Method, argtypes::Any; enter_generated::Bool)
    @ JuliaInterpreter ~/.julia/packages/JuliaInterpr2

I suspect the finalizer comes from Distributed.test_exisiting_ref in the constructor of Distributed.Future, but I don't quite understand why Distributed is involved here at all. Does Revise actually use Distributed directly? If so, it probably needs to tell the finalizer to use Base.invoke_in_world using Revise's frozen worldage.

I obtained this using the following patch on top of my `jl_expand_in_world` branch:

diff --git a/base/gcutils.jl b/base/gcutils.jl
index 51e3943877..5da0fef06a 100644
--- a/base/gcutils.jl
+++ b/base/gcutils.jl
@@ -45,8 +45,24 @@ function finalizer(@nospecialize(f), @nospecialize(o))
     if !ismutable(o)
         error("objects of type ", typeof(o), " cannot be finalized")
     end
+
+    wa = get_world_counter()
+
+    function _f(o)
+        try
+            f(o)
+        catch e
+            invoke_in_world(wa, function()
+                safe_print("$(repr(o)) :: $(typeof(o))\n")
+                safe_print("ERROR: " * sprint(showerror, e) * "\n")
+                safe_print(sprint(show_backtrace, backtrace()) * "\n")
+            end)
+            nothing
+        end
+    end
+
     ccall(:jl_gc_add_finalizer_th, Cvoid, (Ptr{Cvoid}, Any, Any),
-          Core.getptls(), o, f)
+          Core.getptls(), o, _f)
     return o
 end
 
diff --git a/base/refpointer.jl b/base/refpointer.jl
index 67cec0925f..ae994517d8 100644
--- a/base/refpointer.jl
+++ b/base/refpointer.jl
@@ -194,3 +194,5 @@ e.g., intrinsics that are selected based on the address space, or back-ends that
 pointers to be identifiable by their types.
 """
 Core.LLVMPtr
+
+safe_print(s) = ccall(:jl_safe_printf, Cvoid, (Cstring,), s)

@timholy
Copy link
Owner

timholy commented Oct 7, 2020

Nice progress! I hate to keep asking, but what happens if you put bt = backtrace() right after wa = get_world_counter() and then use safe_print(sprint(show_backtrace, bt) * "\n")? To me that seems more likely to trace back to the creator.

@timholy
Copy link
Owner

timholy commented Oct 7, 2020

Does Revise actually use Distributed directly?

It supports revising workers, e.g.,

Revise.jl/src/packagedef.jl

Lines 264 to 269 in ed86caf

for p in workers()
try # guard against serialization errors if the type isn't defined on the worker
remotecall(Core.eval, p, Main, :(delete_method_by_sig($sig)))
catch
end
end
and similar blocks for new evaluation.

@simeonschaub
Copy link
Contributor Author

Nice progress! I hate to keep asking, but what happens if you put bt = backtrace() right after wa = get_world_counter() and then use safe_print(sprint(show_backtrace, bt) * "\n")? To me that seems more likely to trace back to the creator.

TIL that you can use backtrace() anywhere. 😁 I always thought, it only worked inside catch.

Does Revise actually use Distributed directly?

It supports revising workers, e.g.,

Revise.jl/src/packagedef.jl

Lines 264 to 269 in ed86caf

for p in workers()
try # guard against serialization errors if the type isn't defined on the worker
remotecall(Core.eval, p, Main, :(delete_method_by_sig($sig)))
catch
end
end

and similar blocks for new evaluation.

Funny you should point me to those exact lines, I think they are the problem:

Stacktrace:
 [1] finalizer
   @ ./gcutils.jl:50 [inlined]
 [2] test_existing_ref(r::Distributed.Future)
   @ Distributed ~/Documents/Julia/julia2/usr/share/julia/stdlib/v1.6/Distributed/src/remotecall.jl:81
 [3] Future
   @ ~/Documents/Julia/julia2/usr/share/julia/stdlib/v1.6/Distributed/src/remotecall.jl:32 [inlined]
 [4] Future
   @ ~/Documents/Julia/julia2/usr/share/julia/stdlib/v1.6/Distributed/src/remotecall.jl:31 [inlined]
 [5] Future
   @ ~/Documents/Julia/julia2/usr/share/julia/stdlib/v1.6/Distributed/src/remotecall.jl:111 [inlined]
 [6] Future
   @ ~/Documents/Julia/julia2/usr/share/julia/stdlib/v1.6/Distributed/src/remotecall.jl:112 [inlined]
 [7] remotecall(::Function, ::Distributed.LocalProcess, ::Module, ::Vararg{Any, N} where N; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ Distributed ~/Documents/Julia/julia2/usr/share/julia/stdlib/v1.6/Distributed/src/remotecall.jl:358
 [8] remotecall(::Function, ::Distributed.LocalProcess, ::M
Stacktrace:
 [1] remotecall(::Function, ::Int64, ::Module, ::Vararg{Any, N} where N)
   @ Distributed ~/Documents/Julia/julia2/usr/share/julia/stdlib/v1.6/Distributed/src/remotecall.jl:376
 [2] (::Revise.var"#81#83"{OrderedCollections.OrderedDict{Revise.RelocatableExpr, Union{Nothing, Vector{Any}}}, OrderedCollections.OrderedDict{Revise.RelocatableExpr, Union{Nothing, Vector{Any}}}})()
   @ Revise ~/.julia/dev/Revise/src/packagedef.jl:271
 [3] with_logstate(f::Function, logstate::Any)
   @ Base.CoreLogging ./logging.jl:491
 [4] with_logger
   @ ./logging.jl:603 [inlined]
 [5] delete_missing!
   @ ~/.julia/dev/Revise/src/packagedef.jl:241 [inlined]
 [6] delete_missing!(mod_exs_sigs_old::OrderedCollections.OrderedDict{Module, OrderedCollections.OrderedDict{Revise.RelocatableExpr, Union{Nothing, Vector{Any}}}}, mod_exs_sigs_new::OrderedCollections.OrderedDict{Module, OrderedCollections.OrderedDict{Revise.RelocatableExpr, Union{Nothing, Vector{Any}}}})
   @ Revise ~/.julia/dev/Revise/src
Stacktrace:
  [1] #invoke_in_world#4
    @ ./essentials.jl:744 [inlined]
  [2] invoke_in_world(::UInt64, ::Any)
    @ Base ./essentials.jl:743
  [3] top-level scope
    @ none:1
  [4] eval(m::Module, e::Any)
    @ Core ./boot.jl:360
  [5] eval_user_input(ast::Any, backend::REPL.REPLBackend)
    @ REPL ~/Documents/Julia/julia2/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:139
  [6] repl_backend_loop(backend::REPL.REPLBackend)
    @ REPL ~/Documents/Julia/julia2/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:200
  [7] start_repl_backend(backend::REPL.REPLBackend, consumer::Any)
    @ REPL ~/Documents/Julia/julia2/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:185
  [8] run_repl(repl::REPL.AbstractREPL, consumer::Any; backend_on_current_task::Bool)
    @ REPL ~/Documents/Julia/julia2/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:316
  [9] run_repl(repl::REPL.AbstractREPL, consumer::Any)
    @ REPL ~/Documents/Julia/julia2/usr/share/julia/stdlib/v1.6/REPL/src/REPL.jl:304
 [10] (::Base.var"#8582

We may need to provide a different finalizer for Distributed.Future inside remotecall that uses Base.invoke_in_world. Will investigate further.

@timholy
Copy link
Owner

timholy commented Oct 7, 2020

This is awesome progress. What a team!

@simeonschaub
Copy link
Contributor Author

🎉:

~/.../Julia/julia2 >>> JULIA_NUM_THREADS=1 ./julia --startup=no                                               ✓ 603
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.6.0-DEV.1142 (2020-10-07)
 _/ |\__'_|_|_|\__'_|  |  jl_expand_in_world/0bd0014add* (fork: 2 commits, 1 day)
|__/                   |

julia> using Revise

julia> Revise.track(Base)

julia> 1+1
2

Look ma, no MethodErrors! 😆
Will put together a PR shortly.

@simeonschaub simeonschaub linked a pull request Oct 7, 2020 that will close this issue
@simeonschaub
Copy link
Contributor Author

And thanks for both of your help! I really learned a lot about Julia and Revise internals I didn't know before.

@timholy timholy added the bug label Oct 8, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants