Define unspecified behavior #214
I really like the separation between Undefined Behavior and Unspecified Behavior :) |
#### Unspecified behavior
[unspecified]: #unspecified

*Unspecified behavior* is not an error condition in the abstract machine, but beyond that, the Rust language provides no other guarantees about what behavior these programs have.
This is an odd introduction... the first thing you say is what this is not.
Also, we are moving towards a more general "assumptions made by compiler" definition for UB.
Proposal:
Unspecified behavior is behavior of the Rust Abstract Machine that the Rust language provides no guarantee for. Unspecified behavior always comes with a set of behaviors that the implementation can pick from.
The latter part is important. I don't think "anything but the error state" is a useful spec. And for your example of field offsets, there is such a set: In https://github.com/rust-lang/unsafe-code-guidelines/blob/master/reference/src/layout/structs-and-tuples.md we define what the degrees of freedom are here for the compiler.
This is an odd introduction... the first thing you say is what this is not.
The only thing we guarantee about unspecified behavior is that it is not an error in the abstract machine. I'm open to different ways of wording this guarantee.
Unspecified behavior is behavior of the Rust Abstract Machine that the Rust language provides no guarantee for.
That's incorrect, "the behavior for which the Rust Abstract Machine provides no guarantees for" is undefined behavior. For unspecified behavior we do provide some guarantees, the most important one being that unspecified behavior is not undefined.
I don't think "anything but the error state" is a useful spec. And for your example of field offsets, there is such a set: In https://github.com/rust-lang/unsafe-code-guidelines/blob/master/reference/src/layout/structs-and-tuples.md we define what the degrees of freedom are here for the compiler.
In that document, we define that field offset is a degree-of-freedom that the compiler has when determining struct layout, but that the compiler is "free to re-order field layout as it wishes".
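To make the field-offset example concrete, here is a minimal sketch that contrasts a layout with specified offsets against one with unspecified offsets. The types `DefaultPair` and `CPair` are made up for this illustration, and `std::mem::offset_of!` assumes Rust 1.77 or later.

```rust
use std::mem::{align_of, offset_of, size_of};

// Default representation: the compiler is free to reorder the fields.
#[allow(dead_code)]
struct DefaultPair {
    a: u8,
    b: u32,
}

// `repr(C)`: fields are placed in declaration order, with padding.
#[allow(dead_code)]
#[repr(C)]
struct CPair {
    a: u8,
    b: u32,
}

/// Offsets of (a, b) in `CPair`; these follow from the `repr(C)` rules.
fn c_offsets() -> (usize, usize) {
    (offset_of!(CPair, a), offset_of!(CPair, b))
}

/// Offsets of (a, b) in `DefaultPair`; they can be queried but not predicted.
fn default_offsets() -> (usize, usize) {
    (offset_of!(DefaultPair, a), offset_of!(DefaultPair, b))
}

fn main() {
    // With `repr(C)`, `a` sits at 0 and `b` at the next multiple of its alignment.
    assert_eq!(c_offsets(), (0, align_of::<u32>()));

    // With the default representation we only know both fields lie inside the struct.
    let (a, b) = default_offsets();
    println!("default layout: a at {a}, b at {b}");
    assert!(a < size_of::<DefaultPair>() && b < size_of::<DefaultPair>());
}
```

The printed offsets for `DefaultPair` may differ between compiler versions, which is exactly the freedom being discussed.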
"the behavior for which the Rust Abstract Machine provides no guarantees for" is undefined behavior
Not really -- there's no behavior in UB for the R-AM, it is an error state. But the wording is still not great; I agree with that part.
The only thing we guarantee about unspecified behavior is that it is not an error in the abstract machine.
That's useless. Then the behavior could still be "replace all memory contents by 0x00", making it impossible to program.
We always need to give a bound on what "unspecified behavior" can do, or we might as well declare it UB.
Next attempt:
Unspecified behavior occurs in the Rust language when the implementation is free to pick any one of a given set of possible behaviors of the R-AM. The implementation does not have to document that choice nor commit to it, and the choice it makes can vary even within the execution of a single program.
That's useless. Then the behavior could still be "replace all memory contents by 0x00", making it impossible to program.
We always need to give a bound on what "unspecified behavior" can do, or we might as well declare it UB.
This definition isn't useless since it provides a guarantee over undefined behavior. Text that uses it might provide extra bounds, but I don't think this definition needs to try to provide such bounds nor require them to exist.
To elaborate on "useless", imagine we had a function fn unspec();
and said calling it is unspecified behavior. "Anything except for UB can happen". Well, one possible choice for "anything" is "oops your memory is empty now, we deallocated all of it", so the following program could have UB:
let x = 4;
unspec();
assert!(x == 4); // UB! x might not be allocated any more
We could carefully try to restrict what unspecified behavior can do in general, but that's going to be super painful. So unless there is a strong motivation for having "(almost) unbounded unspecified behavior", we better avoid it.
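The argument above can be sketched as a toy model. Everything here is hypothetical: `Machine` stands in for abstract-machine state, `None` models "deallocated", and the three behaviors are invented candidates for what `unspec()` might do. The point is only that the program can be checked against a bounded set of behaviors, while an unbounded set admits a behavior that breaks it.

```rust
/// Toy stand-in for abstract-machine state (hypothetical).
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq)]
struct Machine {
    x: Option<i32>, // `None` models "x is no longer allocated"
    scratch: u32,   // some state the program does not care about
}

/// One possible behavior of the hypothetical `unspec()` operation.
type Behavior = fn(&mut Machine);

fn do_nothing(_m: &mut Machine) {}

fn touch_scratch(m: &mut Machine) {
    m.scratch += 1;
}

fn free_everything(m: &mut Machine) {
    m.x = None;
}

/// Models `let x = 4; unspec(); assert!(x == 4);` for one chosen behavior.
fn program_is_ok(unspec: Behavior) -> bool {
    let mut m = Machine { x: Some(4), scratch: 0 };
    unspec(&mut m);
    m.x == Some(4)
}

fn main() {
    // A bounded set: every choice leaves `x` alone, so the program is fine.
    let bounded: [Behavior; 2] = [do_nothing, touch_scratch];
    assert!(bounded.iter().all(|&b| program_is_ok(b)));

    // "Anything at all": one admissible choice breaks the program.
    let unbounded: [Behavior; 3] = [do_nothing, touch_scratch, free_everything];
    assert!(!unbounded.iter().all(|&b| program_is_ok(b)));
}
```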
Unless you allow "bounded by a potentially unspecified bound" I'm not sure how you can describe FFI.
But at that point, I'd rather leave the bound to whatever feature decides to be "unspecified behavior", and if that feature decides to provide no bounds, and that in your opinion makes the feature useless, then just make the case against adding such a feature to the language? If a feature provides absolutely no bounds, an RFC would really need to make a good case for it for landing such a feature in the language.
FFI isn't "unspecified behavior"... that would be rather catastrophic as one couldn't program against it.^^ Specifying FFI is hard, but we shouldn't pretend that we can properly handle it by saying "unspecified". We need to define cross-language linking to specify FFI. Without xLTO we could do it on the target/assembly level; with xLTO... TBH at that point we probably have to work on the LLVM level as I doubt we can make C programs run on the R-AM.^^
OTOH, struct field offsets are a good example for unspecified behavior precisely because we can bound the choices but do not want to commit. That's what we should use it for; not as an excuse for "sorry it's hard we don't know what to say".
FFI isn't "unspecified behavior"... that would be rather catastrophic as one couldn't program against it.^^
Really? What is it then? (Notice that for many platforms, the C ABI is unspecified; also notice that some platforms don't have a C ABI at all.)
Maybe this is my C++ talking (since "no guarantee" is such a typical and famous definition of UB over in C++-land), but:
sounds borderline contradictory to me. A set of behaviors that the implementation must pick from is a guarantee that no other behaviors will occur, right? (and it is must, right?) |
Well I guess, maybe "no guarantee" is too strong then. The point is that within that set, no guarantee is provided, and also no stability -- behavior can differ between compiler versions, between multiple runs of the same compiler, between multiple copies of the same code in a single program, and even within execution of a single program. |
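As a library-level analogy only (this is not abstract-machine behavior), `std::collections::HashMap` iterates in an arbitrary order that can differ between runs and between maps. The sketch below shows code written to be correct for every order the implementation might pick; the helper names are invented for this example.

```rust
use std::collections::HashMap;

/// Returns the keys in sorted order, whatever order iteration happens to use.
fn sorted_keys(map: &HashMap<&str, i32>) -> Vec<String> {
    let mut keys: Vec<String> = map.keys().map(|k| k.to_string()).collect();
    keys.sort();
    keys
}

/// A sum does not depend on iteration order at all.
fn total(map: &HashMap<&str, i32>) -> i32 {
    map.values().sum()
}

fn sample() -> HashMap<&'static str, i32> {
    HashMap::from([("a", 1), ("b", 2), ("c", 3)])
}

fn main() {
    let map = sample();
    // Both results hold for any iteration order.
    assert_eq!(sorted_keys(&map), vec!["a", "b", "c"]);
    assert_eq!(total(&map), 6);
    // Printing the raw iteration order, by contrast, may differ from run to run.
    println!("{:?}", map.keys().collect::<Vec<_>>());
}
```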
Foolish Question: Why are we picking a secondary term that also shortens to UB? Have we considered alternative terms to "unspecified"? |
@Lokathor not a foolish question at all but it's what C++ does... |
"Unspecified" is also a bad choice because there'll also always be parts of the spec that we haven't written or that we don't know how to write well, and it sounds like that's what this is about. In fact, with @gnzlbg saying they consider FFI "unspecified", maybe that's even what they mean. But FFI and struct layout are very different in the amount of uncertainty and should certainly not be put into the same bucket! If "unspecified" is for FFI, I propose we define it as "behavior where the spec is unfinished and we do not currently know how to say anything more precise". |
"Unspecified behavior" does not prevent text that uses this term to contain "more words" that delineate the guarantees.
No it isn't. This is about behavior for which we can't say much in practice due to real world limitations, or because we do not want to allow users to rely on implementation details. This has nothing to do with the spec being unfinished; rather, it is for the cases where the spec wants to not say anything by design. |
(a) These are very different cases IMO so I think conflating them in one category of behavior is bad. (b) FFI isn't "unspecified" because of real world limitations but because specifying linking is genuinely hard. At least that's my thinking. Maybe you, less concerned with mathematical definitions, are seeing another barrier for FFI? FFI also has many aspects that have nothing to do with the R-AM itself but with the way it gets translated to assembly -- things like calling conventions or unwinding mechanisms. Maybe that's what you mean? But that is a totally different thing from not committing to struct layout. We shouldn't conflate such unrelated notions. In particular this is not even part of the spec of the R-AM but rather part of the spec for how the R-AM must be compiled down to assembly, to enable its interaction with other assembly code (somewhat comparable with what we called "target UB" elsewhere). As a user, for things like struct layout I can handle that (e.g. when convincing myself that my code is correct) by arguing that "for every possible choice the compiler could make, my code works". This is useful, it is productive and precise -- I then know that for any conforming Rust implementation, my code is fine. But it only works when there is a set to pick from. When there is no set, my experience as a user is totally different. I fundamentally cannot actually convince myself that my code is correct for any conforming implementation, I have to do something else. I think this shows quite clearly that you are conflating two very different beasts under one name here. (C++ does that, but that doesn't mean it is good -- it just means some people's way of thinking is already shaped that way, but I don't think this is a good enough argument to repeat the same mistake in Rust.) |
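The "for every possible choice the compiler could make, my code works" style of reasoning can be illustrated with a small sketch. It checks properties that the default struct representation promises for any field order (fields are aligned, lie inside the struct, and do not overlap). The type `Record` and the helper are invented for this example, and `offset_of!` assumes Rust 1.77 or later.

```rust
use std::mem::{align_of, offset_of, size_of};

// Default representation: field order is up to the compiler.
#[allow(dead_code)]
struct Record {
    tag: u8,
    value: u32,
}

/// Checks properties that hold for every layout the compiler may pick.
fn layout_invariants_hold() -> bool {
    let (tag_at, value_at) = (offset_of!(Record, tag), offset_of!(Record, value));
    let (tag_len, value_len) = (size_of::<u8>(), size_of::<u32>());
    let total = size_of::<Record>();

    // Each field lies entirely inside the struct.
    let in_bounds = tag_at + tag_len <= total && value_at + value_len <= total;
    // The two fields occupy disjoint byte ranges.
    let disjoint = tag_at + tag_len <= value_at || value_at + value_len <= tag_at;
    // Each field is aligned for its type.
    let aligned = tag_at % align_of::<u8>() == 0 && value_at % align_of::<u32>() == 0;

    in_bounds && disjoint && aligned
}

fn main() {
    // True regardless of which field the compiler places first.
    assert!(layout_invariants_hold());
}
```

Code that instead hard-coded `value` at offset 4 would only be correct for some of the permitted layouts.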
On a platform where the "C" ABI does not exist, the toolchain might implement a C ABI in any way, and such an ABI implementation might change on every toolchain version. This would be by design, e.g., to allow the implementation to conform to the platform, were the platform to specify an ABI in the future. That is, there are really no guarantees about how the C ABI is implemented, but calling a C function is not undefined behavior. A user could look up what Rust does, and write C code that uses the same C ABI that Rust implements. The proposed specification of "unspecified behavior" provides the guarantee that "unspecified behavior is not an error in the abstract machine" (or not UB). For some reason, you appear to conclude from that that, if we say that the behavior of something is unspecified, providing bounds for what the behavior is allowed to do is illegal. I don't understand how you arrive at this conclusion. It is perfectly fine to say that "the precise layout of And yes, programs that rely on unspecified behavior might end up exhibiting undefined behavior as a consequence later in their execution, but the "unspecified" operation does not instantaneously exhibit UB. |
Let's turn that argument on its head. If there ever is a legitimate case where something is specified as "anything at all that isn't a R-AM error can happen", then it can simply specify this very large set of possibilities as the bounds. So putting the notion of some bounds on the behavior is not actually a restriction, it just suggests a default that is far more reasonable along many axes: most unspec'd behavior does have a (non-maximal) set of possible behaviors (no guarantees at all is the exception, in fact so far it seems hypothetical and very controversial), and making this concept part of unspec'd behavior both nudges spec writers to think about what a reasonable set of possible behavior would be in each case and tells readers of the glossary to expect such a set. |
@rkruppe I'd be 100% comfortable with adding something like:
Maybe what @RalfJung is looking for is implementation-defined behavior? That's behavior for which we could require the spec to document all possible alternatives and for the implementation to pick one and document what it picks, but that's very different from unspecified behavior. |
So basically we have, in order of "defined-ness":
"Implementation Defined Behavior" seems like a good term for the middle band of situations, and a particular compiler can choose to say "our details here are both implementation defined and unstable across versions, compiler calls, etc". |
I am as confident as one can possibly be when speaking for another when I say that this is obviously not what @RalfJung is looking for. As you defined it yourself, impl-defined behavior requires implementations to make a choice and document it, in contrast to the running example of struct field layout where implementations don't want to commit to any one choice:
|
One bikeshed color I kind of like is to explicitly separate what the R-AM says from what each implementation says, using separate words for each. For example, we could say "implementation-defined and undocumented" to mean what (I think?) C++ calls "unspecified behavior", and we could say "implementation-defined and documented" to mean what (I think?) C++ calls "implementation-defined". Some other ideas: "unstable/stable implementation-defined", "de facto/de jure implementation-defined", "implementation-determined vs -defined". Now that I've written this, I feel like unstable/stable is the best variant. |
I never said or concluded anything like that. The reason we have a spec is to let users argue against the spec that their code is correct. As I explained above, it is fundamentally impossible to do that with your proposed definition. Hence it fails its goal as a specification.
And that's useful... why? For whom? Which program can I write about which I now know more than I did without the "except UB" clause? The way I view it, for all intents and purposes, your definition is equivalent to one that just says "unspec behavior can be anything, including UB". You clearly don't like that, that's why you added the exception -- and you didn't state what exactly the goal is you think you are achieving with the exception, but I think you are not achieving anything.
I understand the difference and that is not what I meant. My last proposal specifically said that the unspec behavior does not have to be documented.
"secret" is not really the right word, but yes. |
I like "Implementation Defined" with the understanding that there needs to be some sort of parameters of what's allowed and that a particular implementation is allowed to say "we fall within the parameters but otherwise are unstable about it" |
Thanks for introducing an example, that helps a lot. These are lots of terms that do not appear in the vocabulary of the R-AM. So what you are talking about has little to do with R-AM error states. ABIs are part of how the compiler implements the R-AM in terms of the target and they are generally not observable by the programmer -- in a pure Rust program. They of course become observable if the programmer looks at the assembly, or puts the program into a tracer, or so. But generally we don't specify what happens on that level, quite deliberately -- if we specified what each R-AM program looked like on the assembly level there would be nothing left for the optimizer to do! So yes, all of that is "unspecified", but not in any way that I think we need to call out anywhere. It is unspecified by omission. We don't guarantee which x86 assembly instruction is used to compile Also in your example, if the C ABI in fact changes, then calling a C function compiled with the new ABI from a previously compiled Rust program will be UB. Compiler assumptions can easily be violated if the calling convention changes. So even under your proposed notion of unspec I don't think we could use that term for FFI. |
This example assumes that we will add an operation to the language that's just specified as "This operation is unspecified" without any other information or any other language feature that allows using it correctly. I think this assumption is incorrect, but if you believe otherwise please do elaborate on how such a feature would make it through the RFC process given that you appear to be strongly against it and probably others would as well (at least I would).
The Rust "call ABI" is unspecified, that is, there is no documented way in the spec to properly call Rust functions, and in fact, if you compile two crates with different Rust toolchains and call one function from one crate from the other, the behavior is undefined if both toolchains disagree on the function ABI (that's a "pure Rust" program). The language does provide the function call operator, which allows you to call a Rust function, as long as it has the same ABI that the current toolchain has, even though this ABI is unspecified. Do you think Rust functions are useless? If not, how would you define in the spec how to call a Rust function? (keeping in mind the existence of other toolchains, FFI, inline assembly which you can use to call a Rust function, etc.). If you wouldn't say that the Rust call ABI is unspecified, what would you say? Also, if under your model the Rust call ABI is somehow "UB", how do you specify what a function call does in Rust? (it calls a function "somehow", but if that "somehow" is undefined behavior, how is a function call not undefined behavior?) The same applies to field offsets of
Can you mention a single specific instance of unspecified behavior in Rust today for which such reasoning is impossible? (e.g. for function calls, panics, struct layout, users appear to be able to write useful code even though we do not say anywhere how any of that precisely works). |
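The point about the function call operator can be illustrated with a short sketch: a function using the default (undocumented) Rust ABI and one using `extern "C"` are both callable from Rust without the caller knowing how arguments are passed. The function names are invented for this example.

```rust
// Default Rust ABI: how arguments are passed is not documented.
fn add_rust(a: i32, b: i32) -> i32 {
    a + b
}

// `extern "C"`: uses the platform's C calling convention.
extern "C" fn add_c(a: i32, b: i32) -> i32 {
    a + b
}

/// Calls both functions through function pointers of their respective ABIs.
fn call_both(a: i32, b: i32) -> (i32, i32) {
    let f: fn(i32, i32) -> i32 = add_rust;
    let g: extern "C" fn(i32, i32) -> i32 = add_c;
    (f(a, b), g(a, b))
}

fn main() {
    // The observable result is the same; only the calling convention differs.
    assert_eq!(call_both(1, 2), (3, 3));
}
```

The two pointer types are distinct, so the compiler always knows which convention to use at a call site; that knowledge is what is missing when code built by a different toolchain is linked in.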
I argued above that all cases of unspecified behavior should come with a restriction. You argued that that would not necessarily be the case, citing FFI as an example. But now you say unspec behavior without restriction would never make it through the RFC process? I am confused.
Yeah 'cause clearly I said that...
We define it in the Rust Abstract Machine. See Miri for a concrete implementation of that -- somewhat messy, but certainly fully specified.
Ah, now you are again going down to the target level! I thought we are talking about unspecified behavior of the R-AM here? There is no FFI or inline assembly in the R-AM. So do we agree that for unspec behavior in the R-AM, there should always be a concrete set of behaviors that the implementation has to choose from? To come back to your question: we don't define exactly which assembly instruction is emitted for an addition operation. We only specify what the Rust Abstract Machine does with it. That's enough. The same is true for function calls. In first approximation, we say nothing about what that means on the target, other than "if you run it, the observable behavior is that of the R-AM". So the addition might become a normal add instruction, it might disappear entirely (through constant folding), or Now, in reality, for reasons you alluded to, we actually do say some things about interaction with the target platform. After all, we want to support FFI. So, while generally the question how the R-AM gets realized is entirely up to us, we do make some promises, such as how function calls to |
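The remark that an addition may compile to an `add` instruction or vanish through constant folding can be illustrated as follows. This is a sketch: `std::hint::black_box` is only a best-effort hint to the optimizer, and nothing here guarantees which instructions are actually emitted.

```rust
use std::hint::black_box;

/// Evaluated at compile time; no addition has to exist in the binary.
const FOLDED: i32 = 2 + 3;

/// `black_box` asks the optimizer (best effort) to treat the inputs as opaque,
/// which makes it more likely that an addition is performed at run time.
fn add_at_runtime(a: i32, b: i32) -> i32 {
    black_box(a) + black_box(b)
}

fn main() {
    // Whatever code is generated, the observable result is the same.
    assert_eq!(FOLDED, add_at_runtime(2, 3));
    println!("both give {}", FOLDED);
}
```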
Just skimmed the thread. It's bringing back bad memories. Definitions are hard. One thing I think I may be able to help clarify, here. @gnzlbg wrote:
I think the point here is that it may be that the "effects" of some particular things are unspecified, but we do specify that a certain pattern will work. To continue with the theme of FFI, you might imagine saying that when a Rust panic unwinds into a C ABI, it is translated to some form of "foreign exception", and that the details of that are unspecified. Similarly, when a foreign exception unwinds into Rust code, that behavior is also unspecified -- except that if the foreign exception originated as a Rust panic, then it must be translated back into a Rust panic that can be caught with But I don't really know how much of a contradiction this is. It seems to me like what Ralf is saying is:
I think that these "parameters" might be a set of possible behaviors, but it could also be that more complex patterns must work (e.g., the translation between Rust panics and foreign exceptions is unspecified, except that Rust panics can be faithfully propagated). Maybe one interesting point is how "local" the parameters are. I.e., in the FFI example, we can't, say, constrain the conversion of "Rust Panic -> Foreign Exception" in isolation, only the behavior of it when coupled with "Foreign Exception -> Rust Panic". In any case, as @rkruppe points out, simply saying that "some parameters must be given" could of course really permit us to write all kinds of possibilities, and is thus more a statement of intent than anything else. The intent is to say that there is useful stuff you can do with this, and we'll try to specify some of those things, whereas with "undefined behavior" the idea is that there is no reason a program might want to do the thing. I think an interesting example is a longjmp over a frame with destructors. On Windows, as I understand it, this is defined behavior (the destructors execute). But on Unix, it's just a bug (destructors do not execute). In the C++ spec, it is UB, and yet if you know your platform is MSVC, you can indeed rely on it. This feels "not quite like" other instances of UB. I could imagine writing it as "unspecified behavior" where the choices include "running destructors" and "aborting the machine". Here there is no useful guarantee but it's "informative" somehow to give that range of options; it hints that one might get stronger guarantees from one's platform. Ok, somehow this turned into a long comment. Not sure if it is helpful but I'll post it anyhow. |
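The "a certain pattern must work" idea for panics can be sketched in pure Rust. The example below assumes the `"C-unwind"` ABI (stable since Rust 1.71) and the default `panic = "unwind"` setting; `may_unwind` and `call_through_boundary` are invented names. It only shows a Rust panic crossing a non-Rust ABI boundary and being caught again, not any real foreign code.

```rust
use std::panic::catch_unwind;

/// A function with a C-compatible ABI that is allowed to unwind.
extern "C-unwind" fn may_unwind(fail: bool) -> i32 {
    if fail {
        panic!("boom");
    }
    7
}

/// Calls across the ABI boundary and turns a propagated panic into an `Err`.
fn call_through_boundary(fail: bool) -> Result<i32, String> {
    catch_unwind(|| may_unwind(fail)).map_err(|payload| {
        // Panic payloads are usually `&str` or `String`.
        if let Some(s) = payload.downcast_ref::<&str>() {
            s.to_string()
        } else if let Some(s) = payload.downcast_ref::<String>() {
            s.clone()
        } else {
            String::from("unknown panic payload")
        }
    })
}

fn main() {
    assert_eq!(call_through_boundary(false), Ok(7));
    // The panic crossed the boundary and arrived as a Rust panic again.
    assert_eq!(call_through_boundary(true), Err("boom".to_string()));
}
```

How the panic is represented while it is "in flight" across the boundary is not something this code can observe, which matches the idea that only the round trip is constrained.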
The defining characteristic of UB, as I understand it, is that it can break completely unrelated well-defined behavior and cause otherwise impossible things to precipitate in execution. Perhaps a definition that avoids these infamous characteristics fits with the spirit of what most people understand unspecified behavior to be and avoids accidentally defining unspecified behavior as a synonym of undefined behavior. So I think that, at a minimum, if something is described as unspecified, it does not affect the correctness of values or behavior that do not depend on it, and the range of possible behaviors and states of the program resulting from an unspecified construct is a (possibly non-strict) subset of those which may be caused by a well-defined construct with the same type. Would there be problems with this approach? |
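A library-level analogy for "does not affect values that do not depend on it": `sort_unstable_by_key` may reorder elements with equal keys, but everything that does not depend on that choice stays well-defined. The helper names below are invented for this sketch.

```rust
/// Sorts by the numeric key only; the relative order of equal keys is not guaranteed.
fn sort_by_key(mut items: Vec<(u32, char)>) -> Vec<(u32, char)> {
    items.sort_unstable_by_key(|&(k, _)| k);
    items
}

/// Extracts just the keys, which do not depend on how ties were broken.
fn keys(items: &[(u32, char)]) -> Vec<u32> {
    items.iter().map(|&(k, _)| k).collect()
}

fn main() {
    let sorted = sort_by_key(vec![(2, 'x'), (1, 'a'), (1, 'b'), (3, 'y')]);

    // Independent of the tie-breaking choice: the key sequence is fixed.
    assert_eq!(keys(&sorted), vec![1, 1, 2, 3]);

    // Dependent on the choice: 'a' and 'b' may appear in either order,
    // but no element is lost or invented.
    let mut letters: Vec<char> = sorted.iter().map(|&(_, c)| c).collect();
    letters.sort();
    assert_eq!(letters, vec!['a', 'b', 'x', 'y']);
}
```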
Closing since we haven't reached consensus around the proposed definitions and probably need to start over. |