
issues with the semantics of top-level expressions #24569

Open
JeffBezanson opened this issue Nov 11, 2017 · 11 comments
Labels
compiler:inference Type inference compiler:interpreter design Design of APIs or of the language itself needs decision A decision on this change is needed

Comments

@JeffBezanson
Member

Top-level expressions are special because they are able to cause certain side-effects that strongly affect program behavior and that ordinary expressions can't cause. These are:

  1. Binding resolution (`global`, `const`, `import`, `using`)
  2. Method definition

Fortunately, that's all. (Type definitions could be on this list, but the only tricky part of their behavior is assigning a global name, which is equivalent to item 1.)
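
For concreteness, the two effects look like this as plain top-level statements (a minimal sketch; the names `y` and `f` are illustrative):

```julia
# The two top-level-only effects named above, as ordinary statements.
const y = 1      # 1. binding resolution (likewise `global`, `import`, `using`)
f(a) = a + y     # 2. method definition
f(2)             # every later top-level statement sees both effects
```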

The most desirable behavior for any statements not inside methods is "interpreter-like": all statements should see all effects of all previous statements. Unfortunately this is at odds with compiling fast code, which would like to assume a fixed world state (i.e. world counter). This tension has led to several bugs and tricky design decisions, such as:

- Code behaving differently based on whether it gets compiled: #2586 #24566
- Top-level code breaking inference's fixed-world assumption: #24316
- Problems with binding effects: #18933 #22984 #12010
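
For reference, the "interpreter-like" baseline looks like this as a sequence of separate top-level statements (a minimal sketch; `g` is illustrative):

```julia
g() = 1
a = g()    # 1: the definition above is already visible
g() = 2    # a redefinition takes effect immediately at top level
b = g()    # 2: the very next statement sees the new method
```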

Three general kinds of solutions are possible: (1) Change top-level semantics to enable optimizations, (2) make optimizations sound with respect to top-level semantics, or (3) don't optimize top-level expressions. (1) is likely to be unpopular, since it would lead to things like:

```julia
f(x) = 1
begin
    f(x::Int) = 2
    f(1)  # gives 1
end
```

due to inference's usual fixed-world assumption. That leads us to (2). But it wouldn't be able to optimize very much. Consider a case like

```julia
for i = 1:100
    include("file$i.jl")
    f(1)
end
```

where we can't reasonably assume anything about what f(1) does.

That brings us to (3). In practice, it is basically never useful to infer and compile a top-level expression. The only exceptions are casual benchmarking (@time), and maybe an occasional long-running loop in the REPL. So to compromise, for a long time we've been compiling top-level expressions if they have loops. But that's a rather blunt instrument, since compiling loops that e.g. call eval is not worthwhile, and in other cases is unsound (#24316).

I'm not sure what to do, but overall I think we should optimize top-level code much less often (which might improve load times as well). One idea is to add a @compile or @fast macro that turns off top-level semantics, basically equivalent to wrapping the code in a 0-arg function and calling it.
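
A sketch of what such a macro might expand to (the `@fast` macro itself is hypothetical; this just shows the 0-arg-function trick by hand, with an illustrative body):

```julia
# What `@fast begin ... end` might expand to: the body moves into a
# zero-argument function, which fixes the world at the point of the call
# and lets inference compile the loop as ordinary method code.
function _fast_body()
    s = 0
    for i in 1:1000
        s += i
    end
    return s
end
_fast_body()
```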

Thoughts?

@JeffBezanson JeffBezanson added needs decision A decision on this change is needed design Design of APIs or of the language itself compiler:inference Type inference compiler:interpreter labels Nov 11, 2017
@StefanKarpinski
Member

Would it be possible to compile loops only if they don't call anything that does one of the above special actions or results in eval being called? I suspect that requires trying some amount of compilation and bailing out if those conditions are encountered, which is a bit suboptimal.

@yuyichao
Contributor

> if they don't call anything that does one of the above special actions or results in eval being called?

I think that's undecidable.

@StefanKarpinski
Member

No kidding, but we can be conservative here. We don't need a precise answer, we just want to be able to optimize cases where it's obvious that none of the problematic operations happen.

@yuyichao
Contributor

yuyichao commented Nov 11, 2017

> we just want to be able to optimize cases where it's obvious that none of the problematic operations happen.

For a start, that requires no function calls, and I think that pretty much rules out almost all the useful cases.

@StefanKarpinski
Member

It doesn't eliminate cases where the entire loop body is inlined, which is not a useless case. Even if there are function calls but no dynamic dispatch, it would be possible to know that a particular method doesn't itself call any top-level-affecting constructs. Once you have dynamic dispatch, or call a known method that cannot itself be proven not to affect the top level, then sure, the game is up. But I really don't think that's a negligible or useless set of cases.
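
An example of the "obvious" case being discussed, assuming the analysis could verify it (the loop itself is illustrative): the body only calls methods that inference can resolve statically, with no dynamic dispatch, no `eval`, and no method definition.

```julia
s = 0
for i in 1:10
    global s += i * i   # only calls *(::Int, ::Int) and +(::Int, ::Int),
end                     # both single known methods
s
```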

@yuyichao
Contributor

The loop also has to not do any allocation, since a finalizer can modify global state.

@vtjnash
Member

vtjnash commented Nov 12, 2017

> It doesn't eliminate cases where the entire loop body is inlined

That's a tautology. You can't know whether the loop body can be inlined unless you can determine that it can't have side effects. Or we need to implement (2) above: teach inference how to reason about all of the possible side-effects of top-level code.

@StefanKarpinski
Member

That is why I mentioned trying to compile the code first and bailing out if it turns out to be too dynamic or turns out to do something top-level.

@yuyichao
Contributor

That's exactly what I said is impossible. No-allocation isn't even defined in the language semantics, so a finalizer can run anywhere.

@JeffBezanson
Member Author

I don't think we should worry about finalizers here. They should not be adding methods, and we can either say they are not allowed to, or that it's undefined when those effects are seen. Other than that, the problem is the time needed to develop either an analysis for this or deoptimization techniques.

@yuyichao
Contributor

> they are not allowed to

I think this is fine, but in that case we should disallow it explicitly, just like task switches.

> that it's undefined when those effects are seen

This we can certainly do, but it doesn't help, since we still need to satisfy causality. The point is that we don't have to do that, but the user can do the synchronization themselves. As long as we allow finalizers to run on the same thread/task that's currently running code (which can be top-level code), it would be possible for the user to synchronize with that code and expect certain effects to be visible based on transitivity. Also note that such synchronization doesn't even require atomics, since they are running on the same thread, which is an observable that the code can, and sometimes has to, rely on (a good example being a recursive lock). This makes such synchronization impossible to detect (it can be any heap read/write).
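
A sketch of the hazard (the type and the global are illustrative): a finalizer scheduled on the current task can flip global state at a GC point, with no visible call site in the code that observes the change.

```julia
mutable struct Token end            # illustrative mutable type, needed only to carry a finalizer

const saw_finalizer = Ref(false)

let t = Token()
    finalizer(_ -> (saw_finalizer[] = true), t)
end                                 # `t` becomes unreachable here

GC.gc()                             # the finalizer may run here, on this same thread
saw_finalizer[]                     # possibly true, though nothing at top level assigned it
```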

Adding threading to this makes things much worse, since the finalizer can synchronize using proper atomics/locks (which they do need sometimes) with another thread that defines a new method, meaning that the top-level loop can be synchronized with a finalizer that synchronizes with another thread that defines a new method, and none of the above can actually deal with this.

I would also point out that I doubt just being able to compile a loop in the same world is going to help much. The loop has to not reference any non-const global for this to be remotely inferable (otherwise we would have gotten significantly fewer complaints about slow top-level code). Assuming that the types of globals don't change will also not be valid, due to finalizers, and I think forbidding finalizers from mutating global variables is not a good idea.
