Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Making a value for uninhabited type is compiled to SIGILL on Nightly but only in Release mode #50976

Closed
vitalyd opened this issue May 22, 2018 · 14 comments

Comments

@vitalyd
Copy link

vitalyd commented May 22, 2018

https://users.rust-lang.org/t/help-to-find-root-cause-of-illegal-hardware-instruction/17600/7 gets right to the thick of things. Basically, this playground demonstrates the SIGILL due to compiler emitting a ud2 instruction as a result of the UB hit when trying to make a value for libc::timezone (an empty enum). However, this only happens on nightly and release (optimized) mode.

My concern is this type of thing will likely bite users, who may have (unintentionally) done something silly like the example above (why does libc define timezone as an empty enum? I'm not sure). But my questions are mostly:

  1. Is this an expected change in nightly?
  2. Is it expected that only optimized (release) builds hit this? Why isn't a debug building compiling to the same thing?
@cuviper
Copy link
Member

cuviper commented May 22, 2018

The superficial answer is in the name of UB -- undefined behavior should not be expected to behave in any particular way. An immediate ud2 is a somewhat useful outcome, but the optimizer might also just prune the entire control flow that led to this "unreachable" case. (Or trash your hard drive, or eat your lunch, etc.)

@sfackler
Copy link
Member

Empty enums are the standard way of representing opaque types in FFI until extern types stabilize.

@vitalyd
Copy link
Author

vitalyd commented May 23, 2018

@cuviper, I’m aware of what UB implies but that’s not really my question :).

@vitalyd
Copy link
Author

vitalyd commented May 23, 2018

@sfackler, that’s indeed the case although it’s usually used for “incoming” (ie from FFI) raw ptrs, and kept as such.

To reiterate, I’d like to know if the codegen is an intentional change in nightly (stable + beta don’t fault) and why it’s only in release builds.

I do worry that this will catch some people by surprise.

@vitalyd
Copy link
Author

vitalyd commented May 23, 2018

There’s also the optics of this: it’ll remind some people of UB in C or C++ where the compiler will silently just reduce code to rubbish, and that’s always been a hotly debated topic. Granted, this does trap at runtime so at least it doesn’t carry on, but there’s no compiler error or warning and I’d hate for Rust to be viewed in similar light to those languages as a result (or at least for people to use as an example).

@cuviper
Copy link
Member

cuviper commented May 23, 2018

@vitalyd, I expect you do understand UB, but... that's still my answer. Rust does have UB akin to C or C++, but you can only trigger UB with unsafe code (with a few exceptions that are considered bugs).

Granted, this does trap at runtime so at least it doesn’t carry on

That was lucky, but it's still not something to rely on! I don't know if something explicitly changed in rustc codegen recently, but it could equally well be a change in LLVM, or vary between targets, or...

@cuviper
Copy link
Member

cuviper commented May 23, 2018

Looking at what the user called in particular:

int gettimeofday(struct timeval *tv, struct timezone *tz);

struct timezone {
    int tz_minuteswest;     /* minutes west of Greenwich */
    int tz_dsttime;         /* type of DST correction */
};

If tz is not NULL, then gettimeofday is going to write 8 bytes to it. Rust doesn't have the right type definition to reserve the appropriate memory anywhere, so who knows what pointer it's giving -- maybe just a zero-sized stack allocation. Even in the cases where it doesn't hit ud2, with a bogus pointer either some memory is going to get trashed or it will segfault.

I'd be interested to see what pointer Rust produced in the "working" case...

@sfackler
Copy link
Member

I don't really understand how the optics of "I misused unsafe code and triggered UB" are bad.

The entire purpose of undefined behavior is to allow the compiler backend to take advantage of it to do interesting optimizations. It's entirely expected that it would behave differently in optimized and unoptimized builds.

@vitalyd
Copy link
Author

vitalyd commented May 23, 2018

@cuviper, the “working” case sends garbage - %rsp.

To be clear, I’m not suggesting that this is a bad change, per se - clearly trapping is better than proceeding. My question (to reiterate) is whether this is intentional in nightly.

The second question, to @sfackler’s last comment, is whether “clear” violations like this can be caught at compile time or if not, maybe by clippy. The fact the optimizer assumes certain things don’t happen and optimizes around them, is fine on its own. But, Rust should do better (I hope) here given its focus on safety and correctness. Shouldn’t there be at least a warning for let tz: timezone = std::mem::uninitialized();? That line alone, even if no reference is taken and this value is never used, kills the function. Can the compiler (or clippy) do better for at least clear cut (I’d think?) cases?

@sfackler
Copy link
Member

You could certainly make a lint in clippy about creating an uninitialized or zeroed uninhabited type. The ud2 is probably coming from #45920 but that behavior is definitely not guaranteed.

@vitalyd
Copy link
Author

vitalyd commented May 23, 2018

@sfackler, #45920 does seem relevant - thanks. But, it looks to have been merged 6 months ago whereas the codegen issue is nightly only. Did something else change to tickle it?

@sfackler
Copy link
Member

Clearly yes, if the ud2 is only generated on nightly! It could have been an LLVM change, or a codegen change, or something else. I don't think the specific set of conditions that generate this exact version of a function that invokes undefined behavior is all that interesting.

@vitalyd
Copy link
Author

vitalyd commented May 23, 2018

@sfackler, it’s at least interesting from release notes perspective. Once this trickles into stable, it likely warrants a note even if though it’s been UB all along. I’ve generally seen relnotes have a link to the commit that made the observable change.

As I mentioned earlier, even if there are no uses of the uninhabited type beyond the uninitialized() call, the trap is put in place. This means existing user code that doesn’t actually misbehave (because they don’t make use of the value) will start trapping. Now, they shouldn’t have dead code there but that’s beside the point IMO.

If you (and the rest of the compiler team) believes there’s actually nothing to see here, then feel free to close this.

@oli-obk
Copy link
Contributor

oli-obk commented May 24, 2018

I'd totally love to see a clippy lint for calls to zeroed or uninitialized with a never return type. The various pointer functions should probably lint, too (read comes to mind).

I agree that there's no action beyond that, you could have already gotten that trap on older stable compilers and just not on the last (few) stable. If we put all cases of changes in codegen for obvious UB into the changelog, you would drown out the actual changes. There's no end to what trivial changes to codegen can do to UB.

@oli-obk oli-obk closed this as completed May 24, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants