From bb47320806319df85be777d09550d1c01ce37ce3 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Tue, 10 Nov 2020 09:45:19 +0100 Subject: [PATCH 01/18] add const-ub RFC --- text/0000-const-ub.md | 112 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 112 insertions(+) create mode 100644 text/0000-const-ub.md diff --git a/text/0000-const-ub.md b/text/0000-const-ub.md new file mode 100644 index 00000000000..f0b2dc6a2fd --- /dev/null +++ b/text/0000-const-ub.md @@ -0,0 +1,112 @@ +- Feature Name: `const_ub` +- Start Date: 2020-10-10 +- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) +- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) + +# Summary +[summary]: #summary + +Define how UB during const evaluation is treated: +some kinds of UB must be detected, the rest leads to an unspecified result for the affected CTFE query (but does not otherwise "taint" the compilation process). + +# Motivation +[motivation]: #motivation + +So far, nothing is specified about what happens when `unsafe` code leads to UB during CTFE. +This is a major blocker for stabilizing `unsafe` operations in const-contexts. + +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation + +There are some values that Rust needs to compute at compile-time. +This includes the initial value of a `const`/`static`, and array lengths (and more general, const generics). +Computing these initial values is called compile-time function evaluation (CTFE). +CTFE in Rust is very powerful and permits running almost arbitrary Rust code. +This begs the question, what happens when there is `unsafe` code and it causes [Undefined Behavior (UB)][UB]? + +The answer depends on the kind of UB: some kinds of UB are guaranteed to be detected, +while other kinds of UB might either be detected, or else evaluation will continue as if the violated UB condition did not exist (i.e., as if this operation was actually defined). 
+This can change from compiler version to compiler version: CTFE code that causes UB could build fine with one compiler and fail to build with another. +(This is in accordance with the general policy that unsound code is not subject to strict stability guarantees.) + +[UB]: https://doc.rust-lang.org/reference/behavior-considered-undefined.html + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +The following kinds of UB are detected by CTFE, and will cause compilation to stop with an error: +* Incorrect use of compiler intrinsics (e.g., reaching an `unreachable` or violating the assumptions of `exact_div`). +* Dereferencing dangling pointers. +* Using an invalid value in an arithmetic, logical or control-flow operation. + +These kinds of UB have in common that there is nothing sensible evaluation can do besides stopping with an error. + +Other kinds of UB might or might not be detected: +* Dereferencing unaligned pointers. +* Violating Rust's aliasing rules. +* Producing an invalid value (but not using it in one of the ways defined above). +* Any [other UB][UB] not listed here. + +All of this UB has in common that there is an "obvious" way to continue evaluation even though the program has caused UB: +we can just access the underlying memory despite alignment and/or aliasing rules being violated, and we can just ignore the existence of an invalid value as long as it is not used in some arithmetic, logical or control-flow operation. +There is no guarantee that CTFE detects such UB: evaluation may either fail with an error, or continue with the "obvious" result. + +If the compile-time evaluation uses operations that are specified as non-deterministic, +and only some of the non-deterministic choices lead to CTFE-detected UB, +then CTFE may choose any possible execution and thus miss the possible UB. 
+For example, if we end up specifying the value of padding after a typed copy to be non-deterministically chosen, then padding will be initialized in some executions and uninitialized in others. +If the program then performs integer arithmetic on a padding byte, that might or might not be detected as UB, depending on the non-deterministic choice made by CTFE. + +## Note to implementors + +This requirement implies that CTFE must happen on code that was *not subject to UB-exploiting optimizations*. +In general, optimizations of Rust code may assume that the source program does not have UB, so programs that exhibit UB can simply be ignored when arguing for the correctness of an optimization. +However, this can lead to programs with UB being translated into programs without UB, so if constant evaluation runs after such an optimization, it might fail to detect the UB. +The only permissible optimizations are those that preserve all UB and that preserve the behavior of programs whose UB CTFE does not detect. +Formally speaking this means they must be correct optimizations for the abstract machine *that CTFE actually implements*, not just for the abstract machine that specifies Rust; and moreover they must preserve the location and kind of UB that is detected by CTFE. + +# Drawbacks +[drawbacks]: #drawbacks + +To be able to either detect UB or continue evaluation in a well-defined way, CTFE must run on unoptimized code. +This means when compiling a `const fn` in some crate, the unoptimized code needs to be stored. +So either the code is stored twice (optimized and unoptimized), or optimizations can only happen after all CTFE results have been computed. 
+[Experiments in rustc](https://perf.rust-lang.org/compare.html?start=35debd4c111610317346f46d791f32551d449bd8&end=3dbdd3b981f75f965ac04452739653a3d47ff0ed) showed a severe performance impact on CTFE stress-tests, but no impact on real code except for a slowdown of "incr-unchanged" (which are rather fast so small changes lead to large percentages). + +# Rationale and alternatives +[rationale-and-alternatives]: #rationale-and-alternatives + +The most obvious alternative is to say that UB during CTFE will definitely be detected. +However, that is expensive and might even be impossible. +Even Miri does not currently detect all UB, and Miri is already performing many additional checks that would significantly slow down CTFE. +Furthermore, implementing these checks requires a more precise understanding of UB than we currently have; basically, this would block having any potentially-UB operations at const-time on having a spec for Rust that precisely describes their UB in a checkable way. +In particular, this would mean we need to decide on an aliasing model before permitting raw pointers in CTFE. + +To avoid the need for keeping the unoptimized sources of `const fn` around, we could weaken the requirement for detecting UB and instead say that UB might cause arbitrary evaluation results. +Under the assumption that unsound code is not subject to the usual stability guarantees, this is an option we can still move to in the future, should it turn out that the proposal made in this RFC is too expensive. + +Another extreme alternative would be to say that UB during CTFE may have arbitrary effects in the host compiler, including host-level UB. +Basically this would mean that CTFE would be allowed to "leave its sandbox". +This would allow JIT'ing CTFE and running the resulting code unchecked. +While compiling untrusted code should only be done with care (including additional sandboxing), this seems like an unnecessary extra footgun. 
+ +# Prior art +[prior-art]: #prior-art + +C++ requires compilers to detect UB in `constexpr`. +However, the fragment of C++ that is available to `constexpr` excludes pointer casts, pointer arithmetic (beyond array bounds), and union-based type punning, which makes such checks not very complicated and avoids most of the poorly specified parts of UB. +The corresponding type-punning-free fragment of Rust (no raw pointers, no `union`, no `transmute`) can only cause UB that is defined UB to be definitely detected during CTFE. +In that sense, rust achieves feature parity with C++ in terms of UB detection during CTFE. +(Indeed, this was the prime motivation for making such strict UB detection requirements in the first place.) + +# Unresolved questions +[unresolved-questions]: #unresolved-questions + +Currently none. + +# Future possibilities +[future-possibilities]: #future-possibilities + +This RFC provides an easy way forward for "unconst" operations, i.e., operations that are safe at run-time but not at compile-time. +Primary examples of such operations are anything involving the integer representation of pointers, which cannot be known at compile-time. +If this RFC were accepted, we could declare such operations "definitely detected UB" during CTFE (and thus naturally they would only be permitted in an `unsafe` block). From 656a14a16c2e93f8fff33e3114e807a10e551655 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Tue, 10 Nov 2020 09:55:44 +0100 Subject: [PATCH 02/18] typos --- text/0000-const-ub.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0000-const-ub.md b/text/0000-const-ub.md index f0b2dc6a2fd..0e719435de4 100644 --- a/text/0000-const-ub.md +++ b/text/0000-const-ub.md @@ -95,8 +95,8 @@ While compiling untrusted code should only be done with care (including addition C++ requires compilers to detect UB in `constexpr`. 
However, the fragment of C++ that is available to `constexpr` excludes pointer casts, pointer arithmetic (beyond array bounds), and union-based type punning, which makes such checks not very complicated and avoids most of the poorly specified parts of UB. -The corresponding type-punning-free fragment of Rust (no raw pointers, no `union`, no `transmute`) can only cause UB that is defined UB to be definitely detected during CTFE. -In that sense, rust achieves feature parity with C++ in terms of UB detection during CTFE. +The corresponding type-punning-free fragment of Rust (no raw pointers, no `union`, no `transmute`) can only cause UB that is defined to be definitely detected during CTFE. +In that sense, Rust achieves feature parity with C++ in terms of UB detection during CTFE. (Indeed, this was the prime motivation for making such strict UB detection requirements in the first place.) # Unresolved questions From 40abf688b566d7d52f594762cd91da74d3dd918f Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Thu, 12 Nov 2020 19:51:41 +0100 Subject: [PATCH 03/18] be mroe clear about the lack of stability --- text/0000-const-ub.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/text/0000-const-ub.md b/text/0000-const-ub.md index 0e719435de4..d1b514100bb 100644 --- a/text/0000-const-ub.md +++ b/text/0000-const-ub.md @@ -27,7 +27,11 @@ This begs the question, what happens when there is `unsafe` code and it causes [ The answer depends on the kind of UB: some kinds of UB are guaranteed to be detected, while other kinds of UB might either be detected, or else evaluation will continue as if the violated UB condition did not exist (i.e., as if this operation was actually defined). This can change from compiler version to compiler version: CTFE code that causes UB could build fine with one compiler and fail to build with another. -(This is in accordance with the general policy that unsound code is not subject to strict stability guarantees.) 
+ +This RFC does not alter the general policy that unsound code is not subject to strict stability guarantees. +In other words, unsafe code may not rely on all future versions of Rust to implement this RFC. +The RFC only helps *consumers* of unsafe code to be sure that right now, all UB during CTFE will be detected. +It does not grant any new possibilities to *authors* of unsafe code. [UB]: https://doc.rust-lang.org/reference/behavior-considered-undefined.html From 278168332e4c2dbb56008b2b18b0b1c15fac2493 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Sun, 15 Nov 2020 18:23:17 +0100 Subject: [PATCH 04/18] future possibility: a flag to disable UB checking --- text/0000-const-ub.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/text/0000-const-ub.md b/text/0000-const-ub.md index d1b514100bb..b5b3ac40100 100644 --- a/text/0000-const-ub.md +++ b/text/0000-const-ub.md @@ -114,3 +114,6 @@ Currently none. This RFC provides an easy way forward for "unconst" operations, i.e., operations that are safe at run-time but not at compile-time. Primary examples of such operations are anything involving the integer representation of pointers, which cannot be known at compile-time. If this RFC were accepted, we could declare such operations "definitely detected UB" during CTFE (and thus naturally they would only be permitted in an `unsafe` block). + +If UB checks turn out to be expensive, we could consider adding a flag to let users opt-out of UB checking. +This will speed up compilation, and not change behavior of correct code. 
From e78333a0129f2ed595e642a031fb51dc8e99b78c Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Sun, 15 Nov 2020 18:32:06 +0100 Subject: [PATCH 05/18] extend discussion of intrinsics and library UB --- text/0000-const-ub.md | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/text/0000-const-ub.md b/text/0000-const-ub.md index b5b3ac40100..54a2e023b29 100644 --- a/text/0000-const-ub.md +++ b/text/0000-const-ub.md @@ -39,7 +39,6 @@ It does not grant any new possibilities to *authors* of unsafe code. [reference-level-explanation]: #reference-level-explanation The following kinds of UB are detected by CTFE, and will cause compilation to stop with an error: -* Incorrect use of compiler intrinsics (e.g., reaching an `unreachable` or violating the assumptions of `exact_div`). * Dereferencing dangling pointers. * Using an invalid value in an arithmetic, logical or control-flow operation. @@ -55,12 +54,23 @@ All of this UB has in common that there is an "obvious" way to continue evaluati we can just access the underlying memory despite alignment and/or aliasing rules being violated, and we can just ignore the existence of an invalid value as long as it is not used in some arithmetic, logical or control-flow operation. There is no guarantee that CTFE detects such UB: evaluation may either fail with an error, or continue with the "obvious" result. +In particular, the RFC does not mandate whether UB caused by implementation-defined compiler intrinsics (insofar as they are supported by CTFE) is detected. +However, implementations should document for which intrinsics UB is detected, and (if UB is not detected), what the behavior if CTFE will be instead. 
+For rustc, all intrinsic-specific UB (e.g., reaching an `unreachable` or violating the assumptions of `exact_div`) will be detected, but if intrinsics perform memory accesses, they are treated like regular accesses for UB detection (e.g., aliasing or alignment violations are not detected, and execution proceeds just ignoring this check). + +The RFC also does not mandate detecting any library UB, i.e., UB caused by violating the contract of a (standard) library function. +The same conditions as for intrinsics apply: implementations should document which UB is detected. +If library UB is ignored, execution must continue by just following the rules of the Abstract Machine for current implementation of the library function, treating it as if that code had no contract applied to it. +In rustc, no library UB will be detected. + If the compile-time evaluation uses operations that are specified as non-deterministic, and only some of the non-deterministic choices lead to CTFE-detected UB, then CTFE may choose any possible execution and thus miss the possible UB. For example, if we end up specifying the value of padding after a typed copy to be non-deterministically chosen, then padding will be initialized in some executions and uninitialized in others. If the program then performs integer arithmetic on a padding byte, that might or might not be detected as UB, depending on the non-deterministic choice made by CTFE. +This RFC is concerned only with language-UB, not with library-UB, i.e., UB caused by violating the contract of a (standard) library function. + ## Note to implementors This requirement implies that CTFE must happen on code that was *not subject to UB-exploiting optimizations*. 
From a7c303406ded19397d1566b716a2ef15e0d0ea0e Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Sun, 15 Nov 2020 19:57:35 +0100 Subject: [PATCH 06/18] tweak wording --- text/0000-const-ub.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-const-ub.md b/text/0000-const-ub.md index 54a2e023b29..0b11d57fcab 100644 --- a/text/0000-const-ub.md +++ b/text/0000-const-ub.md @@ -55,7 +55,7 @@ we can just access the underlying memory despite alignment and/or aliasing rules There is no guarantee that CTFE detects such UB: evaluation may either fail with an error, or continue with the "obvious" result. In particular, the RFC does not mandate whether UB caused by implementation-defined compiler intrinsics (insofar as they are supported by CTFE) is detected. -However, implementations should document for which intrinsics UB is detected, and (if UB is not detected), what the behavior if CTFE will be instead. +However, implementations should document for each intrinsic whether UB is detected, and (if UB is ignored for an intrinsic), what the behavior of CTFE will be when UB occurs. For rustc, all intrinsic-specific UB (e.g., reaching an `unreachable` or violating the assumptions of `exact_div`) will be detected, but if intrinsics perform memory accesses, they are treated like regular accesses for UB detection (e.g., aliasing or alignment violations are not detected, and execution proceeds just ignoring this check). The RFC also does not mandate detecting any library UB, i.e., UB caused by violating the contract of a (standard) library function. 
From 5045d5fa0aa4007a1542bc5ad458c29025f37535 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Sat, 27 Feb 2021 13:34:45 +0100 Subject: [PATCH 07/18] some clarifications --- text/0000-const-ub.md | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/text/0000-const-ub.md b/text/0000-const-ub.md index 0b11d57fcab..ffe1022c4be 100644 --- a/text/0000-const-ub.md +++ b/text/0000-const-ub.md @@ -7,7 +7,8 @@ [summary]: #summary Define how UB during const evaluation is treated: -some kinds of UB must be detected, the rest leads to an unspecified result for the affected CTFE query (but does not otherwise "taint" the compilation process). +some kinds of UB must be detected, the remaining UB conditions are ignored and evaluation continues in a well-defined way. +However, CTFE queries causing UB are not subject to stability guarantees and thus may fail to build in the future (e.g. when more UB is being detected). # Motivation [motivation]: #motivation @@ -30,7 +31,7 @@ This can change from compiler version to compiler version: CTFE code that causes This RFC does not alter the general policy that unsound code is not subject to strict stability guarantees. In other words, unsafe code may not rely on all future versions of Rust to implement this RFC. -The RFC only helps *consumers* of unsafe code to be sure that right now, all UB during CTFE will be detected. +The RFC only helps *consumers* of unsafe code to be sure that right now, all UB during CTFE will be detected or non-consequential. It does not grant any new possibilities to *authors* of unsafe code. [UB]: https://doc.rust-lang.org/reference/behavior-considered-undefined.html @@ -50,6 +51,10 @@ Other kinds of UB might or might not be detected: * Producing an invalid value (but not using it in one of the ways defined above). * Any [other UB][UB] not listed here. +In rustc, none of this UB will be detected for now. 
+However, code causing any kind of UB is still considered buggy and not subject to stability guarantees. +Hence, rustc may start detecting more UB in the future. + All of this UB has in common that there is an "obvious" way to continue evaluation even though the program has caused UB: we can just access the underlying memory despite alignment and/or aliasing rules being violated, and we can just ignore the existence of an invalid value as long as it is not used in some arithmetic, logical or control-flow operation. There is no guarantee that CTFE detects such UB: evaluation may either fail with an error, or continue with the "obvious" result. @@ -60,7 +65,7 @@ For rustc, all intrinsic-specific UB (e.g., reaching an `unreachable` or violati The RFC also does not mandate detecting any library UB, i.e., UB caused by violating the contract of a (standard) library function. The same conditions as for intrinsics apply: implementations should document which UB is detected. -If library UB is ignored, execution must continue by just following the rules of the Abstract Machine for current implementation of the library function, treating it as if that code had no contract applied to it. +If library UB is ignored, execution must continue by just following the rules of the Abstract Machine for the current implementation of the library function, treating it as if that code had no contract applied to it. In rustc, no library UB will be detected. If the compile-time evaluation uses operations that are specified as non-deterministic, @@ -69,8 +74,6 @@ then CTFE may choose any possible execution and thus miss the possible UB. For example, if we end up specifying the value of padding after a typed copy to be non-deterministically chosen, then padding will be initialized in some executions and uninitialized in others. If the program then performs integer arithmetic on a padding byte, that might or might not be detected as UB, depending on the non-deterministic choice made by CTFE. 
-This RFC is concerned only with language-UB, not with library-UB, i.e., UB caused by violating the contract of a (standard) library function. - ## Note to implementors This requirement implies that CTFE must happen on code that was *not subject to UB-exploiting optimizations*. @@ -127,3 +130,5 @@ If this RFC were accepted, we could declare such operations "definitely detected If UB checks turn out to be expensive, we could consider adding a flag to let users opt-out of UB checking. This will speed up compilation, and not change behavior of correct code. + +The RFC clarifies that there is no *guarantee* that code with UB is evaluated in any particular way, so if we want to detect more UB during CTFE in the future, we are free to do so from a stability perspective. From e1a29a7f9a81b7a8de327843dd3732601e9717c6 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Sat, 27 Feb 2021 14:20:01 +0100 Subject: [PATCH 08/18] more precise wording --- text/0000-const-ub.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/text/0000-const-ub.md b/text/0000-const-ub.md index ffe1022c4be..88d2e634034 100644 --- a/text/0000-const-ub.md +++ b/text/0000-const-ub.md @@ -45,12 +45,13 @@ The following kinds of UB are detected by CTFE, and will cause compilation to st These kinds of UB have in common that there is nothing sensible evaluation can do besides stopping with an error. -Other kinds of UB might or might not be detected: +Other kinds of UB might or might not be detected depending on the implementation: * Dereferencing unaligned pointers. * Violating Rust's aliasing rules. * Producing an invalid value (but not using it in one of the ways defined above). * Any [other UB][UB] not listed here. +Implementations should document which of these kinds of UB they detect. In rustc, none of this UB will be detected for now. However, code causing any kind of UB is still considered buggy and not subject to stability guarantees. 
Hence, rustc may start detecting more UB in the future. From a90a538b49adbe0c2a1cbd20fe4385cb19b7398c Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Sat, 6 Mar 2021 18:04:25 +0100 Subject: [PATCH 09/18] clarify --- text/0000-const-ub.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-const-ub.md b/text/0000-const-ub.md index 88d2e634034..d1d90e1e256 100644 --- a/text/0000-const-ub.md +++ b/text/0000-const-ub.md @@ -129,7 +129,7 @@ This RFC provides an easy way forward for "unconst" operations, i.e., operations Primary examples of such operations are anything involving the integer representation of pointers, which cannot be known at compile-time. If this RFC were accepted, we could declare such operations "definitely detected UB" during CTFE (and thus naturally they would only be permitted in an `unsafe` block). -If UB checks turn out to be expensive, we could consider adding a flag to let users opt-out of UB checking. +If UB checks turn out to be expensive, the RFC leaves the option of adding a flag to let users opt-out of UB checking. This will speed up compilation, and not change behavior of correct code. The RFC clarifies that there is no *guarantee* that code with UB is evaluated in any particular way, so if we want to detect more UB during CTFE in the future, we are free to do so from a stability perspective. From 60bef157df055aa4cae75a7708fd7b336353895f Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Sat, 13 Mar 2021 16:35:25 +0100 Subject: [PATCH 10/18] better language and further clarification --- text/0000-const-ub.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/text/0000-const-ub.md b/text/0000-const-ub.md index d1d90e1e256..52fe9ecefb5 100644 --- a/text/0000-const-ub.md +++ b/text/0000-const-ub.md @@ -23,7 +23,7 @@ There are some values that Rust needs to compute at compile-time. This includes the initial value of a `const`/`static`, and array lengths (and more general, const generics). 
Computing these initial values is called compile-time function evaluation (CTFE). CTFE in Rust is very powerful and permits running almost arbitrary Rust code. -This begs the question, what happens when there is `unsafe` code and it causes [Undefined Behavior (UB)][UB]? +This raises the question, what happens when there is `unsafe` code and it causes [Undefined Behavior (UB)][UB]? The answer depends on the kind of UB: some kinds of UB are guaranteed to be detected, while other kinds of UB might either be detected, or else evaluation will continue as if the violated UB condition did not exist (i.e., as if this operation was actually defined). @@ -31,8 +31,8 @@ This can change from compiler version to compiler version: CTFE code that causes This RFC does not alter the general policy that unsound code is not subject to strict stability guarantees. In other words, unsafe code may not rely on all future versions of Rust to implement this RFC. -The RFC only helps *consumers* of unsafe code to be sure that right now, all UB during CTFE will be detected or non-consequential. -It does not grant any new possibilities to *authors* of unsafe code. +The RFC only helps *consumers* of unsafe code to be sure that right now, all UB during CTFE will be detected or non-consequential (i.e., evaluation will proceed as if there was no UB). +It does not grant any new possibilities to *authors* of unsafe code; in particular, it is still considered a critical bug for CTFE code to raise UB, and no stability guarantees are made for such code (as is the case with regular runtime code raising UB). 
[UB]: https://doc.rust-lang.org/reference/behavior-considered-undefined.html From 9eead862fb7350f1127457b7673b227025c0b8fc Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Tue, 16 Mar 2021 17:24:49 +0100 Subject: [PATCH 11/18] require implementations to document the 'obvious' --- text/0000-const-ub.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/text/0000-const-ub.md b/text/0000-const-ub.md index 52fe9ecefb5..98ba52708a0 100644 --- a/text/0000-const-ub.md +++ b/text/0000-const-ub.md @@ -56,9 +56,10 @@ In rustc, none of this UB will be detected for now. However, code causing any kind of UB is still considered buggy and not subject to stability guarantees. Hence, rustc may start detecting more UB in the future. -All of this UB has in common that there is an "obvious" way to continue evaluation even though the program has caused UB: +All of this UB has in common that there is an easy way to continue evaluation even though the program has caused UB: we can just access the underlying memory despite alignment and/or aliasing rules being violated, and we can just ignore the existence of an invalid value as long as it is not used in some arithmetic, logical or control-flow operation. -There is no guarantee that CTFE detects such UB: evaluation may either fail with an error, or continue with the "obvious" result. +There is no guarantee that CTFE detects such UB: evaluation may either fail with an error, or continue with the some well-defined result. +In the latter case, implementations should document how evaluation will proceed, i.e., how the result is computed. In particular, the RFC does not mandate whether UB caused by implementation-defined compiler intrinsics (insofar as they are supported by CTFE) is detected. However, implementations should document for each intrinsic whether UB is detected, and (if UB is ignored for an intrinsic), what the behavior of CTFE will be when UB occurs. 
From 6e1739f98707922d5a3bd581fe161b04d8c81b2c Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Sun, 28 Mar 2021 14:19:16 +0200 Subject: [PATCH 12/18] rewrite RFC: do not require UB detection --- text/0000-const-ub.md | 105 ++++++++++++++++-------------------------- 1 file changed, 40 insertions(+), 65 deletions(-) diff --git a/text/0000-const-ub.md b/text/0000-const-ub.md index 98ba52708a0..24798e90216 100644 --- a/text/0000-const-ub.md +++ b/text/0000-const-ub.md @@ -6,9 +6,7 @@ # Summary [summary]: #summary -Define how UB during const evaluation is treated: -some kinds of UB must be detected, the remaining UB conditions are ignored and evaluation continues in a well-defined way. -However, CTFE queries causing UB are not subject to stability guarantees and thus may fail to build in the future (e.g. when more UB is being detected). +Define UB during const evaluation to lead to an unspecified result for the affected CTFE query, but not otherwise infect the compilation process. # Motivation [motivation]: #motivation @@ -25,72 +23,55 @@ Computing these initial values is called compile-time function evaluation (CTFE) CTFE in Rust is very powerful and permits running almost arbitrary Rust code. This raises the question, what happens when there is `unsafe` code and it causes [Undefined Behavior (UB)][UB]? -The answer depends on the kind of UB: some kinds of UB are guaranteed to be detected, -while other kinds of UB might either be detected, or else evaluation will continue as if the violated UB condition did not exist (i.e., as if this operation was actually defined). -This can change from compiler version to compiler version: CTFE code that causes UB could build fine with one compiler and fail to build with another. +The answer is that in this case, the final value that is currently being executed is arbitrary. +For example, when UB arises while computing an array length, then the final array length can be any `usize`, or it can be (partially) uninitialized. 
+No guarantees are made about this final value, and it can be different depending on host and target architecture, compiler flags, and more. +However, UB will not otherwise adversely affect the currently running compiler; type-checking and lints and everything else will work correctly given whatever the result of the CTFE computation is. -This RFC does not alter the general policy that unsound code is not subject to strict stability guarantees. -In other words, unsafe code may not rely on all future versions of Rust to implement this RFC. -The RFC only helps *consumers* of unsafe code to be sure that right now, all UB during CTFE will be detected or non-consequential (i.e., evaluation will proceed as if there was no UB). -It does not grant any new possibilities to *authors* of unsafe code; in particular, it is still considered a critical bug for CTFE code to raise UB, and no stability guarantees are made for such code (as is the case with regular runtime code raising UB). +Note, however, that this means compile-time UB can later cause runtime UB when the program is actually executed: +for example, if there is UB while computing the initial value of a `Vec`, the result might be a completely invalid vector that causes UB at runtime when used in the program. -[UB]: https://doc.rust-lang.org/reference/behavior-considered-undefined.html +Sometimes, the compiler might be able to detect such problems and show an error or warning about CTFE computation having gone wrong (for example, the compiler might detect when the array length ends up being uninitialized). +But other times, this might not be the case -- there is no guarantee that UB is reliably detected during CTFE. +This can change from compiler version to compiler version: CTFE code that causes UB could build fine with one compiler and fail to build with another. +(This is in accordance with the general policy that unsound code is not subject to stability guarantees.) 
+Implementations are encouraged to perform as many UB checks as they feasibly can, and they are encouraged to document which UB is and is not detected during CTFE and what the consequences of undetected UB can be, but none of this is required. -# Reference-level explanation -[reference-level-explanation]: #reference-level-explanation +## CTFE UB-checking in `rustc` -The following kinds of UB are detected by CTFE, and will cause compilation to stop with an error: +For `rustc` specifically at the time the RFC is written, a lot of UB will actually be detected reliably: * Dereferencing dangling pointers. -* Using an invalid value in an arithmetic, logical or control-flow operation. - -These kinds of UB have in common that there is nothing sensible evaluation can do besides stopping with an error. +* Using an invalid value in an arithmetic, logical or control-flow operation (e.g. using `3` transmuted to a `bool` value in an `if`, or using an uninitialized integer in `+` or `|`). +* Violating the precondition of an intrinsic (e.g., reaching an `unreachable` or violating the assumptions of `exact_div`). -Other kinds of UB might or might not be detected depending on the implementation: -* Dereferencing unaligned pointers. -* Violating Rust's aliasing rules. -* Producing an invalid value (but not using it in one of the ways defined above). -* Any [other UB][UB] not listed here. +If any of these errors arise during CTFE, they will currently be reliably detected and a CTFE error will be raised. -Implementations should document which of these kinds of UB they detect. -In rustc, none of this UB will be detected for now. -However, code causing any kind of UB is still considered buggy and not subject to stability guarantees. -Hence, rustc may start detecting more UB in the future. +Other kinds of UB are ignored, and evaluation continues as if there was no error. +* Dereferencing unaligned pointers: memory is accessed at the given address even if it is insufficiently aligned. 
+* Violating Rust's aliasing rules: memory is read/written even if that violates aliasing guarantees. +* Producing an invalid value (but not using it in one of the ways defined above): evaluation continues despite the fact that an invalid value was produced. -All of this UB has in common that there is an easy way to continue evaluation even though the program has caused UB: -we can just access the underlying memory despite alignment and/or aliasing rules being violated, and we can just ignore the existence of an invalid value as long as it is not used in some arithmetic, logical or control-flow operation. -There is no guarantee that CTFE detects such UB: evaluation may either fail with an error, or continue with the some well-defined result. -In the latter case, implementations should document how evaluation will proceed, i.e., how the result is computed. +`rustc` also currently makes no attempt at detecting library UB. -In particular, the RFC does not mandate whether UB caused by implementation-defined compiler intrinsics (insofar as they are supported by CTFE) is detected. -However, implementations should document for each intrinsic whether UB is detected, and (if UB is ignored for an intrinsic), what the behavior of CTFE will be when UB occurs. -For rustc, all intrinsic-specific UB (e.g., reaching an `unreachable` or violating the assumptions of `exact_div`) will be detected, but if intrinsics perform memory accesses, they are treated like regular accesses for UB detection (e.g., aliasing or alignment violations are not detected, and execution proceeds just ignoring this check). +No UB-exploiting MIR optimizations are currently being performed for CTFE, so a CTFE execution currently will never go wrong in arbitrary ways: UB is either detected, or evaluation continues in a well-defined manner as described above. -The RFC also does not mandate detecting any library UB, i.e., UB caused by violating the contract of a (standard) library function. 
-The same conditions as for intrinsics apply: implementations should document which UB is detected. -If library UB is ignored, execution must continue by just following the rules of the Abstract Machine for the current implementation of the library function, treating it as if that code had no contract applied to it. -In rustc, no library UB will be detected. +However, this is just a snapshot of what `rustc` currently does. +None of this is *guaranteed*, and `rustc` may relax or otherwise change its UB checking any time. -If the compile-time evaluation uses operations that are specified as non-deterministic, -and only some of the non-deterministic choices lead to CTFE-detected UB, -then CTFE may choose any possible execution and thus miss the possible UB. -For example, if we end up specifying the value of padding after a typed copy to be non-deterministically chosen, then padding will be initialized in some executions and uninitialized in others. -If the program then performs integer arithmetic on a padding byte, that might or might not be detected as UB, depending on the non-deterministic choice made by CTFE. +[UB]: https://doc.rust-lang.org/reference/behavior-considered-undefined.html -## Note to implementors +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation -This requirement implies that CTFE must happen on code that was *not subject to UB-exploiting optimizations*. -In general, optimizations of Rust code may assume that the source program does not have UB, so programs that exhibit UB can simply be ignored when arguing for the correctness of an optimization. -However, this can lead to programs with UB being translated into programs without UB, so if constant evaluation runs after such an optimization, it might fail to detect the UB. -The only permissible optimizations are those that preserve all UB and that preserve the behavior of programs whose UB CTFE does not detect. 
-Formally speaking this means they must be correct optimizations for the abstract machine *that CTFE actually implements*, not just for the abstract machine that specifies Rust; and moreover they must preserve the location and kind of UB that is detected by CTFE. +When UB arises as part of CTFE, the result of this evaluation is an unspecified constant, i.e., it is arbitrary, and might not even be of the right type. +The compiler might be able to detect that UB occurred and raise an error or a warning, but this is not mandated, and absence of lints does not imply absence of UB. +However, the rest of the compiler will continue to function properly, and compilation *itself* will not raise UB. # Drawbacks [drawbacks]: #drawbacks -To be able to either detect UB or continue evaluation in a well-defined way, CTFE must run on unoptimized code. -This means when compiling a `const fn` in some crate, the unoptimized code needs to be stored. -So either the code is stored twice (optimized and unoptimized), or optimizations can only happen after all CTFE results have been computed. -[Experiments in rustc](https://perf.rust-lang.org/compare.html?start=35debd4c111610317346f46d791f32551d449bd8&end=3dbdd3b981f75f965ac04452739653a3d47ff0ed) showed a severe performance impact on CTFE stress-tests, but no impact on real code except for a slowdown of "incr-unchanged" (which are rather fast so small changes lead to large percentages). +This means UB during CTFE can silently "corrupt" the build in a way that the final program has UB when being executed +(but not more so than if the CTFE code would instead have been run at runtime). # Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives @@ -98,25 +79,23 @@ So either the code is stored twice (optimized and unoptimized), or optimizations The most obvious alternative is to say that UB during CTFE will definitely be detected. However, that is expensive and might even be impossible. 
Even Miri does not currently detect all UB, and Miri is already performing many additional checks that would significantly slow down CTFE. -Furthermore, implementing these checks requires a more precise understanding of UB than we currently have; basically, this would block having any potentially-UB operations at const-time on having a spec for Rust that precisely describes their UB in a checkable way. +Furthermore, since optimizations can "hide" UB (an optimization can turn a program with UB into one without), this means we would have to run CTFE on unoptimized MIR. +And finally, implementing these checks requires a more precise understanding of UB than we currently have; basically, this would block having any potentially-UB operations at const-time on having a spec for Rust that precisely describes their UB in a checkable way. In particular, this would mean we need to decide on an aliasing model before permitting raw pointers in CTFE. -To avoid the need for keeping the unoptimized sources of `const fn` around, we could weaken the requirement for detecting UB and instead say that UB might cause arbitrary evaluation results. -Under the assumption that unsound code is not subject to the usual stability guarantees, this is an option we can still move to in the future, should it turn out that the proposal made in this RFC is too expensive. - Another extreme alternative would be to say that UB during CTFE may have arbitrary effects in the host compiler, including host-level UB. Basically this would mean that CTFE would be allowed to "leave its sandbox". This would allow JIT'ing CTFE and running the resulting code unchecked. While compiling untrusted code should only be done with care (including additional sandboxing), this seems like an unnecessary extra footgun. +A possible middle-ground is to guarantee to detect *some UB*. 
+However, what is cheap and/or easy to detect might change over time as the implementation of CTFE evolves, so to avoid drawing Rust into a corner, this RFC avoids making any such guarantees. + # Prior art [prior-art]: #prior-art C++ requires compilers to detect UB in `constexpr`. However, the fragment of C++ that is available to `constexpr` excludes pointer casts, pointer arithmetic (beyond array bounds), and union-based type punning, which makes such checks not very complicated and avoids most of the poorly specified parts of UB. -The corresponding type-punning-free fragment of Rust (no raw pointers, no `union`, no `transmute`) can only cause UB that is defined to be definitely detected during CTFE. -In that sense, Rust achieves feature parity with C++ in terms of UB detection during CTFE. -(Indeed, this was the prime motivation for making such strict UB detection requirements in the first place.) # Unresolved questions [unresolved-questions]: #unresolved-questions @@ -128,9 +107,5 @@ Currently none. This RFC provides an easy way forward for "unconst" operations, i.e., operations that are safe at run-time but not at compile-time. Primary examples of such operations are anything involving the integer representation of pointers, which cannot be known at compile-time. -If this RFC were accepted, we could declare such operations "definitely detected UB" during CTFE (and thus naturally they would only be permitted in an `unsafe` block). - -If UB checks turn out to be expensive, the RFC leaves the option of adding a flag to let users opt-out of UB checking. -This will speed up compilation, and not change behavior of correct code. - -The RFC clarifies that there is no *guarantee* that code with UB is evaluated in any particular way, so if we want to detect more UB during CTFE in the future, we are free to do so from a stability perspective. 
+If this RFC were accepted, we could declare such operations UB during CTFE (and thus naturally they would only be permitted in an `unsafe` block). +This still leaves the door open for providing better guarantees in the future. From 7983e460fe22c236e2037da1b4a42765b25924f9 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Mon, 29 Mar 2021 19:37:44 +0200 Subject: [PATCH 13/18] Update text/0000-const-ub.md Co-authored-by: Oli Scherer --- text/0000-const-ub.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-const-ub.md b/text/0000-const-ub.md index 24798e90216..50e584aaa50 100644 --- a/text/0000-const-ub.md +++ b/text/0000-const-ub.md @@ -79,7 +79,7 @@ This means UB during CTFE can silently "corrupt" the build in a way that the fin The most obvious alternative is to say that UB during CTFE will definitely be detected. However, that is expensive and might even be impossible. Even Miri does not currently detect all UB, and Miri is already performing many additional checks that would significantly slow down CTFE. -Furthermore, since optimizations can "hide" UB (an optimization can turn a program with UB into one without), this means we would have to run CTFE on unoptimized MIR. +Furthermore, since optimizations can "hide" UB (an optimization can turn a program with UB into one without), this means we have to keep running CTFE on unoptimized MIR. And finally, implementing these checks requires a more precise understanding of UB than we currently have; basically, this would block having any potentially-UB operations at const-time on having a spec for Rust that precisely describes their UB in a checkable way. In particular, this would mean we need to decide on an aliasing model before permitting raw pointers in CTFE. 
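Editorial illustration (not part of the patch): the rewritten text lists reaching an `unreachable` and using an invalid `bool` in control flow among the UB that `rustc` currently detects during CTFE. The sketch below shows sound counterparts of both cases, with the UB variants left commented out so the example builds. The names `div_nonzero`, `QUOTIENT`, and `FLAG` are hypothetical and chosen only for this example; whether the commented-out lines are rejected depends on the compiler version, since the RFC guarantees none of this.

```rust
use std::hint::unreachable_unchecked;
use std::mem::transmute;

/// Hypothetical helper: divides `a` by `b`.
///
/// # Safety
/// The caller must guarantee `b != 0`.
const unsafe fn div_nonzero(a: u32, b: u32) -> u32 {
    if b == 0 {
        // Reaching this call violates the precondition of the intrinsic: UB.
        unsafe { unreachable_unchecked() }
    }
    a / b
}

// Sound: the safety contract holds, so CTFE produces a well-defined value.
const QUOTIENT: u32 = unsafe { div_nonzero(12, 4) };

// Sound: 1 is a valid bit pattern for `bool`.
const FLAG: bool = unsafe { transmute::<u8, bool>(1) };

// UB during CTFE. According to the RFC text, rustc currently reports these
// as errors, but that behavior is a snapshot and not a guarantee.
// const BAD_QUOTIENT: u32 = unsafe { div_nonzero(12, 0) };
// const BAD_BRANCH: u32 = if unsafe { transmute::<u8, bool>(3) } { 1 } else { 0 };

fn main() {
    println!("QUOTIENT = {}, FLAG = {}", QUOTIENT, FLAG);
}
```

Uncommenting either `BAD_*` line is a way to observe what a given compiler does with CTFE UB; the RFC only promises that the compiler itself keeps working, not that an error appears.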
From b515180b04730f322ce0279371ed6302e27ac63e Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Mon, 29 Mar 2021 19:40:49 +0200 Subject: [PATCH 14/18] edits --- text/0000-const-ub.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0000-const-ub.md b/text/0000-const-ub.md index 50e584aaa50..5a610b94d5f 100644 --- a/text/0000-const-ub.md +++ b/text/0000-const-ub.md @@ -6,7 +6,7 @@ # Summary [summary]: #summary -Define UB during const evaluation to lead to an unspecified result for the affected CTFE query, but not otherwise infect the compilation process. +Define UB during const evaluation to lead to an unspecified result or hard error for the affected CTFE query, but not otherwise infect the compilation process. # Motivation [motivation]: #motivation @@ -63,7 +63,7 @@ None of this is *guaranteed*, and `rustc` may relax or otherwise change its UB c # Reference-level explanation [reference-level-explanation]: #reference-level-explanation -When UB arises as part of CTFE, the result of this evaluation is an unspecified constant, i.e., it is arbitrary, and might not even be of the right type. +When UB arises as part of CTFE, the result of this evaluation is an unspecified constant, i.e., it is arbitrary, and might not even be valid for the expected return type of this evaluation. The compiler might be able to detect that UB occurred and raise an error or a warning, but this is not mandated, and absence of lints does not imply absence of UB. However, the rest of the compiler will continue to function properly, and compilation *itself* will not raise UB. 
From 3e9cbb529b26f1660aad72ac4e41e89fb617453f Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Tue, 30 Mar 2021 19:46:43 +0200 Subject: [PATCH 15/18] clarify that CTFE remains consistent --- text/0000-const-ub.md | 1 + 1 file changed, 1 insertion(+) diff --git a/text/0000-const-ub.md b/text/0000-const-ub.md index 5a610b94d5f..0ab18a14ef7 100644 --- a/text/0000-const-ub.md +++ b/text/0000-const-ub.md @@ -27,6 +27,7 @@ The answer is that in this case, the final value that is currently being execute For example, when UB arises while computing an array length, then the final array length can be any `usize`, or it can be (partially) uninitialized. No guarantees are made about this final value, and it can be different depending on host and target architecture, compiler flags, and more. However, UB will not otherwise adversely affect the currently running compiler; type-checking and lints and everything else will work correctly given whatever the result of the CTFE computation is. +In particular, when the same constant is used in two different crates, those crates will still definitely see the same value for that constant -- everything else would break the type system. Note, however, that this means compile-time UB can later cause runtime UB when the program is actually executed: for example, if there is UB while computing the initial value of a `Vec`, the result might be a completely invalid vector that causes UB at runtime when used in the program. From b1734f8edb46fc33a21fdaf25cc5d4c5ee0a84f2 Mon Sep 17 00:00:00 2001 From: "Felix S. Klock II" Date: Tue, 4 May 2021 10:01:42 -0400 Subject: [PATCH 16/18] move to final text location. 
--- text/{0000-const-ub.md => 3016-const-ub.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename text/{0000-const-ub.md => 3016-const-ub.md} (100%) diff --git a/text/0000-const-ub.md b/text/3016-const-ub.md similarity index 100% rename from text/0000-const-ub.md rename to text/3016-const-ub.md From 54f728626d34fe0404d46b7b5c7ae3483e59571b Mon Sep 17 00:00:00 2001 From: "Felix S. Klock II" Date: Tue, 4 May 2021 10:10:47 -0400 Subject: [PATCH 17/18] My understanding is that RFC 3016 is just codifying existing behavior. So it does not need a tracking issue. --- text/3016-const-ub.md | 1 - 1 file changed, 1 deletion(-) diff --git a/text/3016-const-ub.md b/text/3016-const-ub.md index 0ab18a14ef7..f5afd4d1aaf 100644 --- a/text/3016-const-ub.md +++ b/text/3016-const-ub.md @@ -1,7 +1,6 @@ - Feature Name: `const_ub` - Start Date: 2020-10-10 - RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) -- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) # Summary [summary]: #summary From fcf400b0433a0e644bc2bb08b4425c424275812d Mon Sep 17 00:00:00 2001 From: "Felix S. Klock II" Date: Tue, 4 May 2021 17:29:11 -0400 Subject: [PATCH 18/18] add link to RFC PR itself. --- text/3016-const-ub.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/3016-const-ub.md b/text/3016-const-ub.md index f5afd4d1aaf..19b6c716009 100644 --- a/text/3016-const-ub.md +++ b/text/3016-const-ub.md @@ -1,6 +1,6 @@ - Feature Name: `const_ub` - Start Date: 2020-10-10 -- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) +- RFC PR: [rust-lang/rfcs#3016](https://github.com/rust-lang/rfcs/pull/3016) # Summary [summary]: #summary
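Editorial illustration (not part of the patch series): the RFC text uses an array length as its example of a CTFE result that becomes arbitrary once UB occurs. A minimal sketch of one way to stay clear of that, assuming the length can be computed in safe code, is to make failure a const panic, which is a hard compile error and not UB. The names `buffer_len`, `LEN`, and `BUF` are hypothetical.

```rust
/// Hypothetical helper: byte length of a buffer holding `elems` elements
/// of `elem_size` bytes each.
const fn buffer_len(elems: usize, elem_size: usize) -> usize {
    match elems.checked_mul(elem_size) {
        Some(n) => n,
        // A panic during CTFE stops compilation with an error; it is not UB.
        None => panic!("buffer length overflows usize"),
    }
}

// Evaluated at compile time and used as an array length.
const LEN: usize = buffer_len(4, 8);
static BUF: [u8; LEN] = [0; LEN];

fn main() {
    println!("LEN = {}, BUF.len() = {}", LEN, BUF.len());
}
```

The design choice here is to keep the computation in safe code: an overflow then surfaces as a build failure on every compiler version, whereas an unchecked `unsafe` variant would fall under the "unspecified result or hard error" rule the RFC describes.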