
issues with the semantics of top-level expressions #24569

Open
JeffBezanson opened this issue Nov 11, 2017 · 11 comments
Labels
compiler:inference Type inference compiler:interpreter design Design of APIs or of the language itself needs decision A decision on this change is needed

Comments

@JeffBezanson
Member

Top-level expressions are special because they are able to cause certain side-effects that strongly affect program behavior and that ordinary expressions can't cause. These are:

  1. Binding resolution (`global`, `const`, `import`, `using`)
  2. Method definition

Fortunately, that's all. (Type definitions could be on this list, but the only tricky part of their behavior is assigning a global name, which is equivalent to item 1.)
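
For concreteness, the two effects look like this as plain top-level statements (a minimal sketch; the names `y` and `f` are illustrative):

```julia
# The two top-level-only effects named above, as ordinary statements.
const y = 1      # 1. binding resolution (likewise `global`, `import`, `using`)
f(a) = a + y     # 2. method definition
f(2)             # every later top-level statement sees both effects
```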

The most desirable behavior for any statements not inside methods is "interpreter-like": all statements should see all effects of all previous statements. Unfortunately this is at odds with compiling fast code, which would like to assume a fixed world state (i.e. world counter). This tension has led to several bugs and tricky design decisions, such as:

- Code behaving differently based on whether it gets compiled: #2586 #24566
- Top-level code breaking inference's fixed-world assumption: #24316
- Problems with binding effects: #18933 #22984 #12010
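
For reference, the "interpreter-like" baseline looks like this as a sequence of separate top-level statements (a minimal sketch; `g` is illustrative):

```julia
g() = 1
a = g()    # 1: the definition above is already visible
g() = 2    # a redefinition takes effect immediately at top level
b = g()    # 2: the very next statement sees the new method
```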

Three general kinds of solutions are possible: (1) Change top-level semantics to enable optimizations, (2) make optimizations sound with respect to top-level semantics, or (3) don't optimize top-level expressions. (1) is likely to be unpopular, since it would lead to things like:

```julia
f(x) = 1
begin
    f(x::Int) = 2
    f(1)  # gives 1
end
```

due to inference's usual fixed-world assumption. That leads us to (2). But it wouldn't be able to optimize very much. Consider a case like

```julia
for i = 1:100
    include("file$i.jl")
    f(1)
end
```

where we can't reasonably assume anything about what f(1) does.

That brings us to (3). In practice, it is basically never useful to infer and compile a top-level expression. The only exceptions are casual benchmarking (@time), and maybe an occasional long-running loop in the REPL. So to compromise, for a long time we've been compiling top-level expressions if they have loops. But that's a rather blunt instrument, since compiling loops that e.g. call eval is not worthwhile, and in other cases is unsound (#24316).

I'm not sure what to do, but overall I think we should optimize top-level code much less often (which might improve load times as well). One idea is to add a @compile or @fast macro that turns off top-level semantics, basically equivalent to wrapping the code in a 0-arg function and calling it.
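
A sketch of what such a macro might expand to (the `@fast` macro itself is hypothetical; this just shows the 0-arg-function trick by hand, with an illustrative body):

```julia
# What `@fast begin ... end` might expand to: the body moves into a
# zero-argument function, which fixes the world at the point of the call
# and lets inference compile the loop as ordinary method code.
function _fast_body()
    s = 0
    for i in 1:1000
        s += i
    end
    return s
end
_fast_body()
```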

Thoughts?

@JeffBezanson JeffBezanson added needs decision A decision on this change is needed design Design of APIs or of the language itself compiler:inference Type inference compiler:interpreter labels Nov 11, 2017
@StefanKarpinski
Member

Would it be possible to compile loops only if they don't call anything that does one of the above special actions or results in eval being called? I suspect that requires trying some amount of compilation and bailing out if those conditions are encountered, which is a bit suboptimal.

@yuyichao
Contributor

> if they don't call anything that does one of the above special actions or results in eval being called?

I think that's undecidable.

@StefanKarpinski
Member

No kidding, but we can be conservative here. We don't need a precise answer, we just want to be able to optimize cases where it's obvious that none of the problematic operations happen.

@yuyichao
Contributor

yuyichao commented Nov 11, 2017

> we just want to be able to optimize cases where it's obvious that none of the problematic operations happen.

For a start, that requires no function calls, and I think that pretty much rules out almost all the useful cases.

@StefanKarpinski
Member

It doesn't eliminate cases where the entire loop body is inlined, which is not a useless case. Even if there are function calls but no dynamic dispatch, it would be possible to know that a particular method doesn't itself call any top-level-affecting constructs. Once you have dynamic dispatch, or call a known method that cannot itself be proven not to affect the top level, then sure, the game is up. But I really don't think that's a negligible or useless set of cases.
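
An example of the "obvious" case being discussed, assuming the analysis could verify it (the loop itself is illustrative): the body only calls methods that inference can resolve statically, with no dynamic dispatch, no `eval`, and no method definition.

```julia
s = 0
for i in 1:10
    global s += i * i   # only calls *(::Int, ::Int) and +(::Int, ::Int),
end                     # both single known methods
s
```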

@yuyichao
Contributor

The loop also has to not do any allocation, since a finalizer can modify global state.

@vtjnash
Member

vtjnash commented Nov 12, 2017

> It doesn't eliminate cases where the entire loop body is inlined

That's a tautology. You can't know whether the loop body can be inlined unless you can determine that it can't have side effects. Or we need to implement (2) above: teach inference how to reason about all of the possible side-effects of top-level code.

@StefanKarpinski
Member

That is why I mentioned trying to compile the code first and bailing out if it turns out to be too dynamic or turns out to do something top-level.

@yuyichao
Contributor

That's exactly what I said is impossible. No-allocation isn't even defined in the language semantics, so a finalizer can run anywhere.

@JeffBezanson
Member Author

I don't think we should worry about finalizers here. They should not be adding methods, and we can either say they are not allowed to, or that it's undefined when those effects are seen. Other than that, the problem is the time needed to develop either an analysis for this or deoptimization techniques.

@yuyichao
Contributor

> they are not allowed to

I think this is fine, but in that case we should disallow it explicitly, just like task switches.

> that it's undefined when those effects are seen

This we can certainly do, but it doesn't help, since we still need to satisfy causality. The point is that we don't have to do that, but the user can do the synchronization themselves. As long as we allow finalizers to run on the same thread/task that's currently running code (which can be top-level code), it would be possible for the user to synchronize with that code and expect certain effects to be visible based on transitivity. Also note that such synchronization doesn't even require atomics, since they are running on the same thread, which is an observable that the code can, and sometimes has to, rely on (a good example being a recursive lock). This makes such synchronization impossible to detect (it can be any heap read/write).
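
A sketch of the hazard (the type and the global are illustrative): a finalizer scheduled on the current task can flip global state at a GC point, with no visible call site in the code that observes the change.

```julia
mutable struct Token end            # illustrative mutable type, needed only to carry a finalizer

const saw_finalizer = Ref(false)

let t = Token()
    finalizer(_ -> (saw_finalizer[] = true), t)
end                                 # `t` becomes unreachable here

GC.gc()                             # the finalizer may run here, on this same thread
saw_finalizer[]                     # possibly true, though nothing at top level assigned it
```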

Adding threading to this makes things much worse, since the finalizer can synchronize using proper atomics/locks (which they do need sometimes) with another thread that defines a new method, meaning that the top-level loop can be synchronized with a finalizer that synchronizes with another thread that defines a new method, and none of the above can actually deal with this.

I would also point out that I doubt just being able to compile a loop in the same world is going to help much. The loop has to not reference any non-const global for this to be remotely inferable (otherwise we would have gotten significantly fewer complaints about slow top-level code). Assuming that the types of globals don't change will also not be valid, due to finalizers, and I think forbidding finalizers from mutating global variables is not a good idea.
