
Introduction of an inlinable mutable global binding #38588

Open
FedericoStra opened this issue Nov 27, 2020 · 23 comments

Comments

@FedericoStra
Contributor

FedericoStra commented Nov 27, 2020

Introduction of an inlinable mutable global binding

The main purpose of the const qualifier seems to be to allow the compiler to infer the type of global bindings and possibly inline their value when they are referenced inside functions.

For instance, in this code

const c = 0
v = 0
use_globals() = (c, v)
@code_typed use_globals()

the inferred return type is Tuple{Int64,Any}, and c is inlined and replaced by Core.Compiler.Const(0, false), whereas v cannot be.
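The same difference can also be checked programmatically instead of by reading @code_typed output. Below is a minimal sketch using the reflection helper Base.return_types (part of Base but not exported). The exact printing of the type may vary slightly between Julia versions.

```julia
# Same setup as above: one const global and one non-const global.
const c = 0
v = 0
use_globals() = (c, v)

# Base.return_types returns one inferred return type per matching method;
# `only` extracts the single entry.
rt = only(Base.return_types(use_globals, ()))

# The const `c` is inferred as Int; the non-const `v` only as Any.
println(rt)
```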

Despite its name, the const qualifier does not seem to be much concerned with const-correctness (preventing subtle bugs caused by inadvertently modifying a variable): it cannot be applied to local variables to make their bindings immutable, which limits its usefulness in this regard. If const-correctness were a major concern, why not allow const in local scopes too?

In summary, as it stands right now, const seems to be more about performance than correctness.

Qualifying a global variable with const, however, has the (undesirable?) consequence that reassigning the variable is probably undefined behavior (#38584), and these semantics do not seem likely to change.

Especially during an interactive session, a user might feel the need to redefine a global const, perhaps because it was defined by mistake, or because the user is experimenting and wants to compare different definitions. With the current semantics, however, the only option is to restart the session, with the obvious inconveniences: losing everything else in the session, paying again the overhead of the using ... statements, etc. This severely hinders the interactive side of the language.

The natural question to raise is whether it is appropriate to introduce a qualifier that retains the performance characteristics of const, whilst allowing reassignment without invoking undefined behavior.

Proposal

I therefore propose the introduction of the following qualifier: inline.
The precise form of this feature is of course subject to modification; here I'm using a new keyword inline just to exemplify the concept.

The meaning of

inline c = value

is similar to const c = value, but with some crucial differences.

  1. The type of the global variable c is not allowed to change. If this is the first definition of c, then the type of c will be forever restricted to the type of value. If c was already defined without inline, it is an error; if value is not of the frozen type that c already has, it is also an error.
  2. From this moment on, whenever the compiler compiles the specialization of a function that references the global variable c, it is allowed to inline value instead and infer the frozen type described in point 1. In particular, it is legal to reassign c (provided the right-hand side has the correct type), and from then on, whenever the compiler decides to inline c, it uses the new value, until c is reassigned again.

More specifically, let's say that a method foo() is defined at a certain time t_d and called at a later time t_c. At an intermediate time t_s ∈ [t_d, t_c] the method is specialized and compiled. If the method references a global variable inline c, then the compiler must inline the value of c at the time t_s. The exact time at which this happens (or whether it happens at all) can be implementation specific, or even unspecified; but the important difference relative to const is that reassignment is not undefined behavior.

The compiler has the right not to inline the global binding. In particular, if it does so, the function will hence witness any future changes of the global variable. If the value is inlined instead, any future changes of the variable will be ignored and the function will always use the frozen value that the variable had at the time of specialization. (Edit: this paragraph is very questionable, and it is probably better to require that the compiler actually inlines the value. I recognize that my original phrasing was a mistake.)

Comparisons

The proposed feature is extremely similar to how numba treats global variables:

from numba import jit
c = 0
@jit
def foo():
    return c
# foo()
c = 1
foo()

The result can be either 0 or 1 and depends on whether in the commented line we call foo (forcing it to compile at a time when c == 0) or not (forcing it to compile at a time when c == 1).

Remarks

The proposed feature has the same performance characteristics as const, because the type can be inferred and the value can be inlined in the same manner.

At the same time, the proposed feature allows for more flexible interactive use, because the behavior of the following program would be well defined:

inline c = 0
f() = c
print(f()) # prints 0
inline c = 1
f() = c    # this is a new method, hence requires a new specialization
print(f()) # prints 1

whereas the analogous program with const instead of inline requires a restart after line 3.

The described semantics of the proposed feature seem to be extremely close to the current implementation-specific behavior of const, so it is plausible that implementing inline would not require adding intricate new machinery to the internals of Julia.

@yuyichao
Contributor

The compiler has the right not to inline the global binding. In particular, if it does so, the function will hence witness any future changes of the global variable. If the value is inlined instead, any future changes of the variable will be ignored and the function will always use the frozen value that the variable had at the time of specialization.

This is unacceptable.

@tpapp
Contributor

tpapp commented Nov 27, 2020

Perhaps you are looking for

c() = 0

which already does what you want and even allows changing types. Redefine the function, and Julia will recompile what it needs to (which is of course costly, but always guaranteed to do the Right Thing™).

The other standard idiom you can use is a 1-element container, such as

const c = Ref(0) # access and modify c[]
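For concreteness, here is a short sketch of both idioms side by side; the names f, g and r are illustrative only. Redefining c() invalidates the code compiled for its callers, so later top-level calls see the new value. With the Ref, the binding itself stays const and only its contents change, so the element type remains inferable.

```julia
# Idiom 1: a zero-argument function acting as a redefinable "constant".
c() = 0
f() = c() + 1
println(f())      # 1

c() = 41          # redefining c invalidates the compiled code of f
println(f())      # 42

# Idiom 2: a const binding to a mutable one-element container.
const r = Ref(0)  # a Base.RefValue{Int}: the element type is fixed at Int
g() = r[] + 1
r[] = 41          # mutate the contents; the binding r never changes
println(g())      # 42
```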

@FedericoStra
Contributor Author

FedericoStra commented Nov 27, 2020

This is unacceptable.

Fine. I stated it as a right that an implementation has. If an implementation does not like it, it can decide to always inline (and maybe document this choice).

Most importantly, all the above is open to discussion. It is meant as a feature primarily focused at interactive use in order to reduce the need for restarts.

Maybe an argumentation slightly more verbose than "This is unacceptable" would be helpful to clarify why you believe so.

@FedericoStra
Contributor Author

FedericoStra commented Nov 27, 2020

Perhaps you are looking for

c() = 0

No, I'm not. One of the primary intended uses would be to redefine structures

struct S1 ... end
inline S = S1
# play around with S, defining and calling functions that
# accept and return objects of type S, which currently is S1
struct S2 ... end
inline S = S2
# play around with S, defining and calling functions that
# accept and return objects of type S, which currently is S2

while developing a module, without the need for restarts. Currently, with const it is impossible. This would render trivial many issues associated with Revise and its inability to properly reload modules. It's not Revise's fault, it's a shortcoming of the language.

@yuyichao
Contributor

yuyichao commented Nov 27, 2020

it can decide to always inline (and maybe document this choice).

No, that's not the problem. The behavior of well-defined code must not depend on whether the compiler decides to compile something, or whether some code is run.

@FedericoStra
Contributor Author

FedericoStra commented Nov 27, 2020

I can give you an example of how redefining structures would look from the user perspective. This can play the role of a motivation for this feature, but I don't want it to hijack the discussion. Anyway, let me introduce this macro

macro redefinable(struct_def)
    struct_def isa Expr && struct_def.head == :struct || error("struct definition expected")
    is_unionall = false
    if struct_def.args[2] isa Symbol
        name = struct_def.args[2]
        real_name = struct_def.args[2] = gensym(name)
    elseif struct_def.args[2].head == :curly
        is_unionall = true
        name = struct_def.args[2].args[1]
        real_name = struct_def.args[2].args[1] = gensym(name)
    elseif struct_def.args[2].head == :<:
        if struct_def.args[2].args[1] isa Symbol
            name = struct_def.args[2].args[1]
            real_name = struct_def.args[2].args[1] = gensym(name)
        elseif struct_def.args[2].args[1].head == :curly
            is_unionall = true
            name = struct_def.args[2].args[1].args[1]
            real_name = struct_def.args[2].args[1].args[1] = gensym(name)
        else
            error("expected `S <: AbstractType`")
        end
    else
        error("expected `S` or `S <: AbstractType`")
    end
    if is_unionall
        fix_name = :($real_name.body.name.name = $(QuoteNode(name)))
    else
        fix_name = :($real_name.name.name = $(QuoteNode(name)))
    end
    esc(quote
        $struct_def
        $fix_name
        $name = $real_name # this should be `const $name = $real_name`
    end)
end

I know it may look scary, but what it does is quite simple. Let's say that you do

abstract type A end
@redefinable struct S end
@redefinable struct S <: A end
@redefinable struct S{T} end
@redefinable struct S{T} <: A end

At each step it defines a structure with a "secret" name gensym(:S) and then binds the global variable S to this structure. You can see that

Base.remove_linenums!(@macroexpand @redefinable struct S{T} <: A end)

expands to

struct var"##S#262"{T} <: A
end
(var"##S#262").body.name.name = :S
S = var"##S#262"

As written in a comment toward the end of the macro, we would really want const S = var"##S#262" instead of the last assignment, so that from this moment on any usage of S could be inlined. This however is not possible with const.

(Edit: if you don't like the tricky (var"##S#262").body.name.name = :S, it is just a way to have the secret type display its name as S. The same can be achieved by defining Base.show_datatype(io::Base.IO, ::Base.Type{$real_name}) = Base.print(io, $(QuoteNode(name))) and leaving the real name untouched).
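To illustrate why the last line of the expansion matters, here is a minimal sketch (S1, make, make_const and T are illustrative names) comparing a function that uses a non-const alias, as the macro currently produces, with one that uses a const alias. Only the const version lets inference see the concrete type.

```julia
struct S1
    x::Int
end

S = S1            # non-const alias, like the macro's final assignment
make() = S(1)     # S is looked up at run time, so its value could be anything

const T = S1      # const alias: the compiler may treat T as the constant S1
make_const() = T(1)

println(Base.return_types(make, ()))        # non-const alias: inferred as Any
println(Base.return_types(make_const, ()))  # const alias: inferred as S1
```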

@FedericoStra
Contributor Author

FedericoStra commented Nov 27, 2020

No, that's not the problem. The behavior of well-defined code must not depend on whether the compiler decides to compile something, or whether some code is run.

I agree, although at least in C there is the concept of "unspecified value", which is a different notion from undefined behavior. The behavior of the program is well defined, simply the value can be any valid value of the suitable type, and may depend on external factors not under the control of the programmer. With the freedom for the compiler not to inline, the global inline variable would be more similar to this.

Anyway, I don't think this precludes the examination of this feature. I originally stated it in that more flexible way to leave more choice to the implementation. I now recognize that it was almost indisputably an error. It is better to ask that the compiler always inlines the value.

@yuyichao
Contributor

yuyichao commented Nov 27, 2020

The behavior of the program is well defined, simply the value can be any valid value of the suitable type, and may depend on external factors not under the control of the programmer.

And that's exactly what we don't want.

It is better to ask that the compiler always inlines the value.

No, it's not even about asking the compiler to always inline. The compiler isn't a concept that exists as far as the user is concerned. There isn't a "compilation" step. The code can run with or without it, so one must not make different decisions (again, for well-defined code) to do different things depending on if and when the code is compiled.

@FedericoStra
Contributor Author

FedericoStra commented Nov 27, 2020

The compiler isn't a concept that exists as far as the user is concerned. There isn't a "compilation" step.

I'll quote and paraphrase myself to answer this.

More specifically, let's say that a method foo() is defined at a certain time t_d and called at a later time t_c. At an intermediate time t_s ∈ [t_d, t_c] the method is specialized and compiled. If the method references a global variable inline c, then the compiler must inline the value of c at the time t_s. The exact time at which this happens (or whether it happens at all) can be implementation specific, or even unspecified; but the important difference relative to const is that reassignment is not undefined behavior.

Let me put it differently.

  • The user defines a method at time t_d. The user calls the method at time t_c > t_d. From the user's perspective, any reference to an inline c in the called method gets resolved to one of the values that c had between time t_d and t_c.

There is no mention of the compiler in the previous formulation. In my original post I spoke about the compiler to explain roughly how it would work internally. The specification of the meaning does not need to refer to the compiler.

foo() = c # this is time t_d
# time t_s must be somewhere here in between
foo() # this is time t_c

Several inline c = ... statements can occur both before and in between. The value that gets inlined is the value that c has at some time t_s between t_d and t_c.

There are several useful instances where there is a unique choice of the value to inline.

And remember that this feature is particularly targeted at interactive use. I don't mind if it is decided to ban it from packages at an earlier stage. It is a means of working around the current limitations for interactivity. I gave an example where this results in a completely unambiguous program. This program

const c = 0
const c = 1

currently is (almost surely) undefined behavior.

The linked issue (#38584) is about making it defined behavior (basically with the same meaning as inline), or else leaving it undefined behavior and documenting this properly!

This issue is about retaining the meaning of const, but extending the language to make this new program

inline c = 0
inline c = 1
foo() = c
foo() # must return 1

a program with defined behavior.

Or, if you prefer a more useful example:

# 3 lines coming from an include(...)
struct S1 x::Int end
inline S = S1
foo(s::S) = ... s.x ... S(42)

# work with foo and S

# update the included file
struct S2 y::Float64 end
inline S = S2
foo(s::S) = ... s.y ... S(3.14)

# experiment with the new foo and S

The macro I presented above can hide the existence of the names S1 and S2 from the user, so he can just work with foo and S without the need to restart.

@StefanKarpinski
Member

StefanKarpinski commented Nov 28, 2020

If const-correctness were a major concern, then why not allow const to be used in local scopes too?

That's just a missing feature. If someone were to make a pull-request implementing local const support, that would be great. It's a feature that we've always wanted but has never made it to the top of the ever-expanding list of things to be done. Another related feature that would be great to have would be typed globals, declared as x::T = value, which would be non-constant but only allow values assigned with the same type (this, conversely to const locals, already works in local scope but not in global scope).
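For reference, typed globals of this form were later added in Julia 1.8, after this discussion took place. A minimal sketch, assuming Julia 1.8 or newer (readx is an illustrative name): the global is not const, reads are inferred with the declared type, and assignments must be convertible to that type.

```julia
# Requires Julia >= 1.8, where a global variable can carry a type annotation.
x::Int = 0
readx() = x              # inferred as Int even though x is not const

println(readx())         # 0
x = 1                    # reassignment is allowed...
println(readx())         # 1: callers always see the current value

try
    global x = "one"     # ...but the value must be convertible to Int
catch err
    # The conversion to Int fails, so the assignment throws and x keeps its value.
    println("rejected: ", typeof(err))
end
```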

With the current semantics, however, the only option is to restart the session [...] This hinders very much the interactive aspect of the language.

The other option is to just redefine the constant and carry on; it usually works just fine. The reason it is still undefined behavior, even though redefining a constant usually works, is that guaranteeing that nothing bad will happen is very hard and puts quite a burden on the compiler.

One of the primary intended uses would be to redefine structures

That would indeed be useful, but it seems like it would make more sense to request that directly as a feature instead of proposing a new language feature whose correct use is unclear. Would there be any justified usage of the proposed inline global feature? I can't think of any. Every use case in a final working program should actually be one of const, const Ref, or a function returning a value. Adding a language feature that should not be used is kind of strange.

@yuyichao
Contributor

any reference to an inline c in the called method get resolved to one of the possible values that c had between time t_d and t_c.

You just removed the mention of "compilation", but that doesn't fix any problem with this at all. It's just called "resolve" instead of "compile" now. That's not a concept that exists, and it must never have any user-visible effect for well-defined code.

@antoine-levitt
Contributor

@FedericoStra I'm following julia development from the sidelines, so hopefully somebody more knowledgeable will correct me if I'm wrong, but the broader point is that the language is separate from the compiler, and that the language should be designed not for the compiler we have now but for the compiler we wish to have. While the julia language itself is quite stable and looks like it should be around for a good number of years, some of the compiler limitations can hopefully be lifted in future releases. It seems that the features that your proposal is meant to work around (struct redefinition, const-type global variables), which a lot of people want, are implementable in the current state of the language, it's "just" a question of somebody actually doing it. So the answer to your point "It's not Revise's fault, it's a shortcoming of the language" is no, it's "just" a shortcoming of the compiler.

This is putting a lot of weight onto compiler people and is frustrating because in the short term we're missing important features, but it's a principle that seems to have served julia well up to now. It's tempting to introduce language features to address compiler problems, but it's hurtful in the long-term.

@FedericoStra
Contributor Author

FedericoStra commented Nov 28, 2020

@yuyichao

You just removed the mention of "compilation", but that doesn't fix any problem with this at all. It's just called "resolve" instead of "compile" now. That's not a concept that exists, and it must never have any user-visible effect for well-defined code.

"Resolves" means "evaluates to", not "compiles".

Regarding the text in bold, that's just a plain wrong opinion. As I already said, at least in languages like C, there is the concept of unspecified value, which is different from undefined behavior. It means that such an expression can evaluate to a value on which the language specification imposes no conditions apart from being valid for the corresponding type. The behavior is not undefined, because in particular the expression must evaluate to something valid. The semantics of inline I described above are even more restrictive, because they prescribe that the value the variable evaluates to must be one of the values the variable held during some portion of its lifetime. Saying that this is undefined behavior is simply wrong.

You can even witness unspecified values in Julia. Quoting from the docs:

Julia considers some types to be "plain data", meaning all of their data is self-contained and does not reference other objects. The plain data types consist of primitive types (e.g. Int) and immutable structs of other plain data types. The initial contents of a plain data type is undefined:

julia> struct HasPlain
           n::Int
           HasPlain() = new()
       end
julia> HasPlain()
HasPlain(438103441441)

I interpret that "undefined" at the end to mean the same thing as "unspecified value" in the C standard, and not that the previous program exhibits undefined behavior. In fact, the Julia docs make no mention that accessing HasPlain().n is undefined behavior. If it is, the documentation is not clear on this point.


@StefanKarpinski I get your points, and from a "language purity" perspective I agree with you 95%. I would just like to comment on this:

The other option is to just redefine the constant and carry on; it usually works just fine. The reason it is still undefined behavior, even though redefining a constant usually works, is that guaranteeing that nothing bad will happen is very hard and puts quite a burden on the compiler.

Since redefining const/struct is such an integral part of the workflow while developing packages, I feel like it would be better to give a tool to the programmers that guarantees that doing so is not undefined behavior. As it currently stands, the developer must have faith in the interpreter/compiler that this undefined behavior is not so wild and is actually pretty close to what he expects. Having a language guarantee would get rid of this recurring Russian roulette.

I would even go as far as saying that this inlinable concept could be restricted to interactive use and modules under development, and banned from imported packages. I agree that the "final product" shouldn't really need/use inline, but while developing or working in an interactive session it is too restrictive not to allow redefinition of "constants". A good fraction of the interactivity goes out the window if, when we need to redefine a struct/const, we are faced with the choice between restarting and playing daredevil with undefined behavior.


@antoine-levitt Again, I agree at a 95% confidence level.

It seems that the features that your proposal is meant to work around (struct redefinition, const-type global variables), which a lot of people want, are implementable in the current state of the language, it's "just" a question of somebody actually doing it. So the answer to your point "It's not Revise's fault, it's a shortcoming of the language" is no, it's "just" a shortcoming of the compiler.

Maybe I'm misunderstanding what you mean, but if structs and consts are truly meant to be immutable once defined (I mean, from the point of view of what the language specification dictates), then no enhancement of the compiler can get around this constraint. If the language says they are constants, the compiler cannot implement a feature that magically lets us redefine them.

In order to have an interpreter/compiler that gives us the possibility to redefine struct/const, there must be in the language specification a feature which is a mutable (obviously) inlinable (otherwise it has a performance penalty) global binding.

@antoine-levitt
Contributor

antoine-levitt commented Nov 28, 2020

The fundamental problems here are that 1) structs can't be redefined and 2) non-const globals are slow. If you relax 1) (which plausibly can be done without any other change to the language) and you add global-scope type assertions, you can make const error on redefinition (as it probably should have been from the start, if not for the problem that without this there's no workaround for performant globals) and tell people to use type assertions.

@FedericoStra
Contributor Author

FedericoStra commented Nov 28, 2020

Now I agree 99% that this would be satisfactory. But please notice that by relaxing 1) you now allow struct to be redefined also in "final product" packages, and not just during interactive use. Is that something we are sure that we want? What I was trying to propose here was instead a way to give a meaning to the UB inherent in redefining const during interactive use. This inline feature could be restricted to the REPL and modules under development, while banned from "final products". If you make structs redefinable, you get it everywhere.

Also, if I understand correctly, global-scope type assertions have stricter semantics than inline, because when you call a method that references the variable you always want to get the latest value (and in particular it cannot be inlined without forcing recompilation). inline, on the other hand, can resolve to an older value:

c::Int = 0
foo() = c
foo()
c = 1
foo() # must be 1

versus

inline c = -1
inline c = 0
foo() = c
foo()
inline c = 1
foo() # can be 0 or 1

Moreover, implementing global-scope type assertions may require complicated machinery. On the other hand, the semantics of inline are intentionally more relaxed and closer to the current implementation-specific behavior of reassigning constants. This means that it would probably require much less effort to implement. You can think of it as a lower-effort feature that would be sufficient to solve some of the issues that are currently faced.

@yuyichao
Contributor

"Resolves" means "evaluates to", not "compiles".

Of course, but that's what matters anyway. It's the step during "compilation" that does what you want to do. Nothing else in the compilation matters here.

As I already said, at least in languages like C

Which isn't always a good argument.

Saying that this is undefined behavior is simply wrong.

Well, first of all, by definition/as it stands, it is undefined behavior.
Also, what I was stating is the property we would like to have, not anything you can observe in julia or find in other languages. What you are describing is simply the behavior of a simple compiler, and you are basically requesting that it be made a well-defined feature. However, nothing is that simple. For example, for a constant/inline global C, it is entirely possible for f(C); g(C) in the code to use different values of C if C was assigned to, even if the assignment happens before the first call. Because this kind of inconsistency can happen (however unlikely), it becomes very difficult to predict a lot of things, and a smart enough compiler can very reasonably perform some very unexpected transformations because of it. That's why this is left undefined, and that's why it's so important to define exactly what something must do independently of the execution/compilation pipeline.

I interpret that "undefined" at the end to mean actually the same thing as "unspecified value" from the C standard, and not that the previous program exhibits undefined behavior.

This is wrong. It IS undefined behavior.

@antoine-levitt
Contributor

antoine-levitt commented Nov 28, 2020

I think redefining structs unconditionally is a pretty good deal, as it would allow you to redefine any struct, not only those explicitly marked redefinable. It would be analogous to function redefinition in that sense (note I have no idea if it's even feasible to implement...): you can already redefine Base.+(::Int, ::Int) (but it's not a great idea).

Your proposed inline feels to me like a misfeature explicitly introducing hidden state and compiler-dependent behavior, which is a magnet for subtle bugs. In a hypothetical future with redefinable structs and global-scope type assertions, it also doesn't feel too useful (although it would certainly be very useful in the short term). If your code really depends on inlining of a global non-const variable for performance, it's probably not a very good design (and it's explicitly discouraged style in julia).

@FedericoStra
Contributor Author

FedericoStra commented Nov 28, 2020

@yuyichao

As I already said, at least in languages like C

Which isn't always a good argument.

I keep referring to the C standard because, despite being considered a huge mess by the masses, at least it strives to give precise definitions of some concepts. In particular, "unspecified value" != "undefined behavior".

I interpret that "undefined" at the end to mean actually the same thing as "unspecified value" from the C standard, and not that the previous program exhibits undefined behavior.

This is wrong. It IS undefined behavior.

Are you 100% sure that

struct HasPlain
    n::Int
    HasPlain() = new()
end
HasPlain().n
print("hello")

is really meant to be undefined behavior by the language spec, and not an unspecified valid value (again, in the sense of the C standard)? If it really is undefined behavior, then it means that executing it may lead to a crash before printing "hello". On the other hand, if the field n simply has an unspecified value, then "hello" must be printed. Are you really really really sure that the (non-existent) language specification wants this to be undefined behavior, rather than you simply conflating the two concepts?

I'm not saying that I know the answer. I can only read the docs that get published online, and they are not clear.

It may very well be that this really is the intention in the minds of the developers of the language, but for sure it is not what is communicated through the documentation.

@yuyichao
Contributor

is really meant to be undefined behavior by the language spec, and not an unspecified valid value (again, in the sense of the C standard)?

Yes. Because this is mapped to undefined behavior in the compiler.

@FedericoStra
Contributor Author

FedericoStra commented Nov 28, 2020

@antoine-levitt

I think redefining structs unconditionally is a pretty good deal, as it would allow you to redefine any struct, not those explicitly marked redefinable. It would be analogous to function redefinition in that sense (note I have no idea if it's even feasible to implement...) : you can already redefine Base.+(::Int, ::Int) (but it's not a great idea)

Well, sure, it sounds cool, but quite dangerous too, arguably more than redefining inline globals. I'm not saying that I don't want it, though: I would love to crash the interpreter in hilarious ways... :)

Your proposed inline feels to me like a misfeature explicitly introducing hidden state and compiler-dependent behavior, which is a magnet for subtle bugs.

If it (rightfully) feels too dangerous, it can be disallowed in "final products" and enabled only in interactive sessions and modules under development. After all, this is its primary goal.

In a hypothetical future with redefinable structs and global-scope type assertions, it also doesn't feel too useful (although it would certainly be very useful in the short term).

I'm not sure that global variables with type assertions can be inlined, because you always want the newest value; hence, they will probably not have the same performance characteristics as constants. inline would be a bridge between the two.

If your code really depends on inlining of a global non-const variable for performance, it's probably not a very good design (and it's explicitly discouraged style in julia)

Again, I'm not advocating abuse of inline as "good coding practice". It's just meant primarily for interactive use, where you want to change your mind and keep going instead of falling into undefined behavior. Imagine you have a file file.jl

inline c = 42
inline S = struct ... end
f() = ... c ...
g(::S) = ...

At the REPL, you keep reloading it with include("file.jl"). You can play with c and S and change them, and everything will be fine because you keep reloading the associated definitions of the functions, so there is no ambiguity on which value gets used.

@FedericoStra
Contributor Author

FedericoStra commented Nov 28, 2020

@yuyichao

Yes. Because this is mapped to undefined behavior in the compiler.

Where does the language specification say so? Also, accessing an uninitialized field currently isn't mapped to instant UB:

struct HasPlain
    n::Int
    HasPlain() = new()
end
foo() = HasPlain().n
@code_llvm foo()

shows

define i64 @julia_foo_352() {
top:
  ret i64 undef
}

Producing an undef value, or using it in certain ways, is not undefined behavior (despite the misleading name). It plays the role of an unspecified value in the C standard. From the LLVM docs:

The string ‘undef’ can be used anywhere a constant is expected, and indicates that the user of the value may receive an unspecified bit-pattern. Undefined values may be of any type (other than ‘label’ or ‘void’) and be used anywhere a constant is permitted.
Undefined values are useful because they indicate to the compiler that the program is well defined no matter what value is used. This gives the compiler more freedom to optimize.

[plenty of examples omitted]

These examples show the crucial difference between an undefined value and undefined behavior.

Some usages of undef lead to undefined behavior (but others do not):

However, a store to an undefined location could clobber arbitrary memory, therefore, it has undefined behavior.

Branching on an undefined value is undefined behavior.

The Julia docs only say:

The plain data types consist of primitive types (e.g. Int) and immutable structs of other plain data types. The initial contents of a plain data type is undefined.

It doesn't say "reading the contents of an uninitialized field of plain data type is undefined behavior". Furthermore, accessing an uninitialized field which is not of plain data type is surely not undefined behavior:

While you are allowed to create objects with uninitialized fields, any access to an uninitialized reference is an immediate error:

julia> z.data
ERROR: UndefRefError: access to undefined reference

Throwing an exception is most definitely not undefined behavior! You can catch the error and recover:

struct S
  x
  S() = new()
end
try
  S().x
catch e
  e isa UndefRefError || rethrow()  # only swallow the expected error
  print("nevermind")
end

The docs only say that the contents are undefined. And indeed we have:

@code_llvm HasPlain()
define [1 x i64] @julia_HasPlain_364() {
top:
  ret [1 x i64] undef
}

You are interpreting the docs as saying that it must be UB, but that can only be because you are not aware of the difference between undef/poison, or "unspecified value"/"undefined behavior".

If the docs mean that it should be undefined behavior (which they currently do not say clearly), then @code_llvm HasPlain() could compile to this:

define [1 x i64] @julia_HasPlain_364() {
top:
  ret [1 x i64] poison
}

I don't want to debate with you any further about your interpretation of what undefined behavior is. I want to know from a reputable source whether the specification of the Julia language says anywhere that

HasPlain().n
print("hello")

exhibits undefined behavior and could crash without printing anything. I can open a separate issue specifically for this question, because this discussion is getting sidetracked.

@yuyichao
Contributor

You are interpreting the docs as saying that it must be UB, but that can only be because you are not aware of the difference between undef/poison, or "unspecified value"/"undefined behavior".

No, I'm not interpreting any docs; you are the one reading the docs. I am merely telling you what the intended behavior of the compiler and the runtime is.

Yes, an undef value is not undefined behavior, but we are also not statically compiling to LLVM. There's no guarantee (though it usually doesn't happen) that the compiler/runtime won't look at a value in the code that is otherwise unused. It is perfectly legal for the compiler and runtime to look at the undef value and, say, branch on it, which would be UB. Since the user has no way to guarantee that such a thing won't happen, it is UB as soon as the user looks at such values. It does mean that the runtime needs to be careful not to read uninitialized objects, but that's easier to achieve since it's not a problem when the runtime allocates the memory.

@JeffBezanson
Member

That may be the case currently, but I don't think any of us really want it to be that way: we want HasPlain().n to give an unspecified value, and heck, maybe even define it to be 0 sometime soon. If setting it to 0 is the only option LLVM gives us to be safe, then OK, we should just do that.

Changing a constant is a different situation. Any situation where (1) the compiler is allowed to assume the value of something, but (2) that value might change, can lead to unsoundness, i.e. the compiler makes assumptions that are incorrect, which then must be undefined behavior (or a compiler bug). The value of the variable can have unlimited downstream effects, e.g. causing the return type of a function to change, so there's not really any such thing as "just" observing the value of the variable.
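Here is a small illustration of the "unlimited downstream effects" point, assuming current inference behavior. Because `flag` is `const`, inference folds the branch. The inferred return type of `g` therefore depends on the constant's value. If `flag` were later flipped, that cached result would silently be wrong. The names `flag`, `g`, `other_flag` and `h` are hypothetical. `Base.return_types` is an internal reflection utility, so its exact output format may differ between Julia versions.

```julia
const flag = true
g() = flag ? 1 : "one"        # inference uses flag's value and drops the "one" branch

other_flag = true             # ordinary (non-const) global: inferred as Any
h() = other_flag ? 1 : "one"  # both branches must be kept

Base.return_types(g, ())      # inferred return type: Int
Base.return_types(h, ())      # inferred return type: Union{Int, String}
```

Any caller of `g` may itself have been specialized on the `Int` result. This is why "just" changing the value cannot be confined to reads of `flag`.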

The only way out of this would be to track dependencies on constant values the same way we do for method definitions, so we can recompile when the value changes. So far we've felt that would be a waste of effort, since redefining constants is not something a program should do anyway. However, maybe the use case of changing structs with Revise makes it worthwhile.
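Below is a toy model of the dependency-tracking idea. It is not how the Julia compiler works, and every name in it is hypothetical. A binding records which caches were built from its value, and reassigning it empties those caches. The stored closures play the role of code with the constant inlined.

```julia
# Hypothetical toy model: a binding that remembers which caches baked in its value.
mutable struct TrackedBinding{T}
    value::T
    dependents::Vector{Dict{Symbol,Any}}   # caches built from `value`
end
TrackedBinding(v) = TrackedBinding(v, Dict{Symbol,Any}[])

# Return the cached "specialization" for `key`, building it from the current
# value on a miss, and register `cache` as a dependent of the binding.
function specialize!(build, cache::Dict{Symbol,Any}, key::Symbol, b::TrackedBinding)
    any(d -> d === cache, b.dependents) || push!(b.dependents, cache)
    return get!(() -> build(b.value), cache, key)
end

# Reassign the binding and drop every cached result derived from the old value.
function reassign!(b::TrackedBinding{T}, v::T) where {T}
    b.value = v
    foreach(empty!, b.dependents)
    return b
end

const OFFSET = TrackedBinding(42)
const CODE_CACHE = Dict{Symbol,Any}()

# The "compiled" closure captures the value, like an inlined constant.
addc(x) = specialize!(v -> (y -> y + v), CODE_CACHE, :addc, OFFSET)(x)

addc(1)                 # 43: built with 42 baked in
reassign!(OFFSET, 100)  # invalidates CODE_CACHE
addc(1)                 # 101: rebuilt from the new value
```

Without the `foreach(empty!, ...)` step, the second call would still return 43. That stale result is the unsoundness described above.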
