Deprecate Read::initializer in favor of ptr::freeze #58363
Conversation
This doesn't currently work on asmjs and wasm, since those don't support inline assembly. What's the right way to link an extern function to libcore? It doesn't seem like we currently do anything like that.
/// ```
#[inline]
#[unstable(feature = "ptr_freeze", issue = "0")]
pub unsafe fn freeze<T>(dst: *mut T, count: usize) {
Is this the right interface? It's currently a bit weird in that we don't actually use the count. It could alternatively just take the pointer, and say that it freezes all memory reachable through it?
Also, should this be unsafe in the first place? Since it's not actually modifying any of the pointed-to data, does it matter if it's valid or not?
Since it's already "basically stable", I wonder if this should take `&mut [T]` and move to `std::mem`?
I'd naively think that it could be safe and probably should be, but I'm not an expert!
We also briefly discussed maybe only taking `u8` for now? I'm not sure how useful this would be beyond `u8` and other POD types.
I think it should take raw pointers so people don't have to create references to uninitialized data.
`count` being unused just comes from the fact that LLVM does not support "real" freeze, but I think this is a much better interface than "reachable from".
Could we change this to `T: ?Sized` so that you can pass a slice in? Then the `count` parameter would no longer be necessary.
AFAIK there's currently no way to form a `*mut [T]` without going through a reference first.
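For context, a sketch of how a raw slice pointer can be built without a reference in later Rust versions; `ptr::slice_from_raw_parts_mut` was added after this discussion took place:

```rust
use core::ptr;

// Builds a `*mut [u8]` directly from a raw pointer and a length,
// without ever materializing a `&mut [u8]` to possibly-uninitialized data.
fn raw_slice(data: *mut u8, len: usize) -> *mut [u8] {
    ptr::slice_from_raw_parts_mut(data, len)
}
```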
Read implementations should only write into the buffer passed to them, but have the ability to read from it. Access of uninitialized memory can easily cause UB, so there's then a question of what a user of a reader should do to initialize buffers. Previously, we allowed a Read implementation to promise it wouldn't look at the contents of the buffer, which allows the user to pass uninitialized memory to it. Instead, this PR adds a method to "freeze" undefined bytes into arbitrary-but-defined bytes. This is currently done via an inline assembly directive noting the address as an output, so LLVM no longer knows it's uninitialized. There is a proposed "freeze" operation in LLVM itself that would do this directly, but it hasn't been fully implemented. Some targets don't support inline assembly, so there we instead pass the pointer to an extern "C" function, which is similarly opaque to LLVM. The current approach is very low level. If we stabilize, we'll probably want to add something like `slice.freeze()` to make this easier to use.
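A minimal sketch of the "black box" trick the description refers to, written with today's `asm!` syntax rather than the PR's original code; the actual implementation and the final `ptr::freeze` signature may differ:

```rust
use core::arch::asm;

/// Sketch only: passing the pointer into an empty `asm!` block (whose default
/// options allow arbitrary memory reads and writes) forces LLVM to assume the
/// bytes behind `dst` may have been written, so it can no longer treat them
/// as uninitialized.
pub unsafe fn freeze_sketch<T>(dst: *mut T, _count: usize) {
    // The operand only appears in an assembly comment; no instructions are
    // emitted, but the compiler must still treat the memory as clobbered.
    asm!("/* {0} */", in(reg) dst.cast::<u8>(), options(nostack, preserves_flags));
}
```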
Force-pushed from d92521f to 5e0fb23.
src/libcore/ptr.rs (Outdated)
/// // We're passing this buffer to an arbitrary reader and aren't
/// // guaranteed they won't read from it, so freeze to avoid UB.
/// let mut buf: [u8; 4] = mem::uninitialized();
/// ptr::freeze(&mut buf, 1);
Should this use `buf.as_mut_ptr()` and 4?
I don't think it really matters either way - we're either freezing a single `[u8; 4]` value, or 4 `u8` values.
Oh sure, I just figured it was a bit odd compared to how we'd expect it to be used idiomatically.
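For illustration, the two spellings being compared, written against the PR's proposed `ptr::freeze(dst, count)` signature (a sketch against an unmerged API, not code that compiles today):

```rust
use std::{mem, ptr};

fn example() {
    unsafe {
        let mut buf: [u8; 4] = mem::uninitialized();
        // Freeze one `[u8; 4]` value...
        ptr::freeze(&mut buf as *mut [u8; 4], 1);
        // ...or, equivalently, freeze four `u8` values.
        ptr::freeze(buf.as_mut_ptr(), 4);
    }
}
```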
I think one thing that might be good to add here as well is a few tests that exercise this.
cc @RalfJung
Also, do we have some plan for how to let Miri do this? Miri can actually meaningfully take
Nits :)
src/libcore/ptr.rs (Outdated)
/// Uninitialized memory has undefined contents, and interation with that data
/// can easily cause undefined behavior. This function "freezes" memory
/// contents, converting uninitialized memory to initialized memory with
/// arbitrary conents so that use of it is well defined.
- /// arbitrary conents so that use of it is well defined.
+ /// arbitrary contents so that use of it is well defined.
"arbitrary but fixed contents" might be a better formulation.
Also it might be worth noting that use is only well defined for integer type -- even with arbitrary but fixed contents, using this with bool
or &T
is UB.
> Also it might be worth noting that use is only well defined for integer types -- even with arbitrary but fixed contents, using this with `bool` or `&T` is UB.

That's a great point! Definitely worth noting.
I have this phrased as "Every bit representation of `T` must be a valid value", but I don't think that's the best way of saying that. Ideas?
> I have this phrased as "Every bit representation of `T` must be a valid value"

Is there perhaps an auto-trait waiting to be invented for that? We could ostensibly make the API a bit safer by adding a constraint `T: TheTrait`...?
There's been talk of this kind of thing for quite a while (`pub auto trait Pod {}`), but I think that'd be relevant as part of a safe interface over this specific function.
I don't think this can be an auto trait -- auto traits are always implemented for fieldless enums, but this function is not sufficient to make a fieldless enum valid.
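A sketch of the kind of marker trait being discussed, gating a safe wrapper over the proposed `ptr::freeze`. All names here are hypothetical (nothing like this exists in std), and per the comment above it is a manually implemented `unsafe` trait rather than an auto trait:

```rust
/// Hypothetical marker: "every bit pattern is a valid value of this type".
/// It must be asserted per type, which is why an auto trait doesn't work.
pub unsafe trait FreezeSafe {}

unsafe impl FreezeSafe for u8 {}
unsafe impl FreezeSafe for u32 {}
// Deliberately NOT implemented for bool or &T: not every bit pattern is valid.

/// Hypothetical safe wrapper over the unstable `ptr::freeze` proposed by this PR.
pub fn freeze_slice<T: FreezeSafe>(slice: &mut [T]) {
    unsafe {
        core::ptr::freeze(slice.as_mut_ptr(), slice.len());
    }
}
```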
///
/// * `dst` must be [valid] for reads.
///
/// Note that even if `T` has size `0`, the pointer must be non-NULL and properly aligned.
- /// Note that even if `T` has size `0`, the pointer must be non-NULL and properly aligned.
+ /// Note that even if `size_of::<T>() == 0`, the pointer must be non-NULL and properly aligned.
Note that this formulation is used consistently throughout this file, so I'd prefer that if it gets changed that happens in a separate PR.
Ah; that's a good idea; I'll see if I can remember to make such a PR... :)
Either an intrinsic or a lang item. I think an intrinsic is the Right Thing here, because it would allow the compiler to choose which magic to apply. So on some platforms the intrinsic lowering would emit assembly, on others a call to the opaque extern function. Although I can believe that writing this with cfg in pure Rust is easier (like done by this PR). If that is the preferred way, we can just make the function a lang item, and thus Miri knows when it's calling it and can intercept the call.
Co-Authored-By: sfackler <sfackler@gmail.com>
A cursory look at the LLVM code reveals that an assembly parser exists, which would suggest that wasm does in fact support inline assembly.
It is also the right thing given that
Oh, great! I was just guessing off the fact that test::black_box is cfg'd off on those platforms. I'll update it to an intrinsic tonight.
src/libcore/ptr.rs (Outdated)
@@ -946,6 +946,58 @@ pub unsafe fn write_volatile<T>(dst: *mut T, src: T) {
    intrinsics::volatile_store(dst, src);
}

/// Freezes `count * size_of::<T>()` bytes of memory, converting undefined data into
I think this is the first time we talk about this kind of data in the docs. I usually call it "uninitialized data" as I feel that is easier to understand. It also is further away from LLVM's `undef`, which is good -- Rust's "uninitialized data" is much more like `poison` than `undef`.
src/libcore/ptr.rs (Outdated)

/// arbitrary contents so that use of it is well defined.
///
/// This function has no runtime effect; it is purely an instruction to the
/// compiler. In particular, it does not actually write anything to the memory.
This function does have an effect in the "Rust abstract machine" though, not just in the compiler. And of course it has a run-time effect by inhibiting optimizations.
Maybe a comparison with a compiler fence helps? Those also clearly have an effect on runtime behavior even though they do not have a runtime effect themselves.
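For reference, a compiler fence in Rust looks like this; it emits no machine instructions by itself but still constrains how the compiler may reorder surrounding memory accesses, which is the analogy being drawn:

```rust
use std::sync::atomic::{compiler_fence, Ordering};

fn ordered_writes(flag: &mut bool, data: &mut u32) {
    *data = 42;
    // No instruction is emitted here, but the compiler may not move the
    // write to `data` below this point or the write to `flag` above it.
    compiler_fence(Ordering::SeqCst);
    *flag = true;
}
```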
Oh, also I think we should be clear about this counting as a write access as far as mutability and data races are concerned.
Doing this on a shared reference is UB.
/// unsafe {
/// // We're passing this buffer to an arbitrary reader and aren't
/// // guaranteed they won't read from it, so freeze to avoid UB.
/// let mut buf: [u8; 4] = mem::uninitialized();
Could you add a FIXME somewhere about porting this to `MaybeUninit`?
(This is still open, from what I can see)
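For what it's worth, a sketch of how the doc example might look once ported to `MaybeUninit` (the `ptr::freeze` call still refers to this PR's proposed, unmerged API):

```rust
use std::mem::MaybeUninit;
use std::ptr;

fn example() {
    let mut buf = MaybeUninit::<[u8; 4]>::uninit();
    unsafe {
        // Freeze the uninitialized bytes before handing them out.
        ptr::freeze(buf.as_mut_ptr(), 1);
        // After freezing, every byte holds *some* fixed value, and every bit
        // pattern is a valid [u8; 4], so assume_init is sound here.
        let buf: [u8; 4] = buf.assume_init();
        let _ = buf;
    }
}
```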
Could you add to the PR description an explanation of why you want to move away from the now-deprecated scheme, or add a link to where this was documented?
The docs say:

I think this should be clearly marked as a detail of the current implementation. The specified behavior of this function is to freeze all uninitialized memory and keep initialized memory unchanged. How that is achieved and whether this has any run-time cost is up to the implementation. I think this is like
How does everyone feel about adding a wrapper struct akin to the following?

```rust
pub struct Frozen<T>;

impl<T> Frozen<T> {
    pub fn new(t: T) -> Frozen<T>;
    pub fn into_inner(this: Frozen<T>) -> T;
}

impl<T: ?Sized> Frozen<T> {
    pub fn from_mut(t: &mut T) -> &mut Frozen<T>;
}

impl<T: ?Sized> Deref for Frozen<T> {
    type Target = T;
}

impl<T: ?Sized> DerefMut for Frozen<T> {}
```
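A hypothetical usage sketch for the wrapper proposed above (`Frozen` does not exist in std; this only shows how `from_mut` and the `Deref` impls would fit together with `Read`):

```rust
use std::io::Read;
use std::mem;

fn read_into_uninit(reader: &mut impl Read) -> std::io::Result<usize> {
    let mut buf: [u8; 4] = unsafe { mem::uninitialized() };
    // Hypothetically, `from_mut` would freeze the bytes, so handing the
    // slice to an arbitrary `Read` impl no longer exposes uninitialized data.
    let frozen: &mut Frozen<[u8]> = Frozen::from_mut(&mut buf[..]);
    reader.read(frozen) // deref-coerces &mut Frozen<[u8]> to &mut [u8]
}
```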
@eternaleye If the problem is literally just with the number zero, that seems solvable by making it 42 / 24601 / funny arbitrary-but-not-random-number-of-choice.
@cramertj: The problem is that any fixed choice, precisely by failing to capture the nondeterminism, artificially hides risk. The source of danger is telling the compiler it's allowed to be certain, when the problem is that there may be uncertainty.

EDIT: Nasty case:

```rust
const fn foo(x: u64) -> u64 {
    ...
    ptr::freeze(...);
    ...
}

#[test]
fn check_foo() {
    assert!(foo(3) == 7)
}
```

This test silently became meaningless regarding runtime behavior, when the argument to
@cramertj That might work during CTFE, but does
@Centril From your question I take it you view referential transparency of
One problem that I can foresee is optimizations changing the behavior of code by const evaluating a call to a

That said, we already have such things as unstable const eval features (e.g. comparing pointers or casting pointers to

So we can just mark the
> But then CTFE could yield a different result than run-time code, which is confusing at best.

Well, as long as the behavior at compile-time is one of the possible behaviors at run-time, such an optimization is still correct. If we say "
The following is what I think on

I think referential transparency is nice, but it is a stronger property than I think we can get away with, at least for polymorphic functions, due to the pervasiveness of

Instead of a type system guarantee of referential transparency, my goal is a weaker determinism property roughly as outlined by @RalfJung in their thoughts on compile-time function evaluation and type systems. Particularly, I think we should aspire to "CTFE correctness" in Ralf's terminology.

For the same reason as outlined by Ralf in the post and here (#58363 (comment)). It would be hugely surprising if execution is deterministic at compile-time but not at run-time. Moreover, I believe that the determinism of

I don't think we can bear the complexity cost of another almost-const-fn-but-not and so
One could imagine situations where relying on determinism at run-time is important. If we move beyond const-generics and allow types to depend on run-time computations (this is not something that is on our roadmaps, but it would be sad to take steps to rule out such a long-term future...), e.g.

```rust
fn foo(n: usize) {
    let arr: [u8; dyn n] = ...;
}
```

we cannot say, given

```idris
-- This is fine, we can already fake this today in Rust:
data (=) : a -> b -> Type where
  Refl : x = x

-- With non-determinism, `f` may give varying results for equal inputs
-- and so it would be unsound to claim that `f a = f b`.
cong : {f : t -> u} -> a = b -> f a = f b
```

We do not need β-reduction to be strongly normalizing for such computations to be fine if
☔ The latest upstream changes (presumably #58357) made this pull request unmergeable. Please resolve the merge conflicts.
@oli-obk Even if the behavior of branching on frozen memory is defined, doing so accidentally is probably a bug. So I think I would prefer if Miri would warn or error on branching on frozen memory by default, and if we could offer a way to suppress the lint (e.g.
Maybe also detecting if a function "leaks" frozen memory, and requiring a

Also, I would prefer if the way in which
There's an easy way to do this, which is to make

To actually distinguish uninitialized and frozen data is certainly possible, but very expensive in terms of code complexity and run-time cost. I don't think it will happen any time soon. Extending Miri with non-determinism in a configurable way is on my personal roadmap.
Ping from triage: @sfackler, any updates on this?
This is blocked on an RFC that I'm probably not going to have time to write in the immediate future, so let's close this PR for now.
I was told above that an RFC is not needed to experiment with this on nightly. The reason I didn't r+ is that there are review comments that didn't get addressed yet.
During the discussion about my latest blog post, it was discovered that when the allocator uses

One consequence of this is that it is impossible to implement "by-reference"
AFAIK the C standard does not require anywhere that uninitialized memory must preserve its value, even if no program action modifies it. For example, if the compiler can prove that the padding bytes of a

```c
struct S { char x; int32_t y; };

void foo(char input) {
    char z = 42;
    struct S s = S { 0, 0 };
    char b = read_first_padding_byte(&s);
    assert(z == first_padding_byte);
    // MAYBE-OK: ^^^ the compiler might have stored z inside S
    if (input > 1) {
        z = 13;
    }
    char b2 = read_first_padding_byte(&s);
    assert(b == b2); // MAYBE-FAIL, the z=13 modified the padding of S
}
```

In this example, the program reads the padding bytes of

The same applies to heap memory returned by

In fact, modern compilers are even smarter than that. When you read uninitialized memory returned by malloc, the compiler does not even need to emit a read, whatever is in the registers is as good as any other result, and that can change - since there might not be any reads to the memory, this can allow eliding the allocation completely. For example, clang optimizes this:

```c
static char* global = (char*)malloc(1);
char uninit() { return *global; }
```

to:

```asm
uninit(): # @uninit()
    ret
```

such that you will read different uninitialized memory each time you call

So I don't know either how
@gnzlbg It seems like you are missing some of the context established further up in this thread.

> AFAIK the C standard does not require anywhere that uninitialized memory must preserve its value, even if no program action modifies it.

The C standards committee thinks it is reasonable for uninitialized memory to change without an action of the program even if the standard does not say anywhere that this is the case. I think that's silly; this is a case of the standard being awfully ambiguous at best and the committee knowing about this and still not fixing the problem. But that is besides the point. We always knew that the compiler does not preserve uninitialized memory. The proposal to implement freeze was to basically use the "black box" trick, so the compiler has to assume that memory did, in fact, get initialized. That takes care of everything you wrote.

The problem, however, is that the operating system does not preserve uninitialized memory, and hence using "black box" does not actually work. This is the new bit of information I was relaying above.
It may just not be the right approach, but we could paper over this issue by having freeze write one byte to each page.
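A rough sketch of that idea, under the assumptions that the compiler-level freeze has already been applied to these bytes and that pages are 4 KiB (a real version would query the page size): touching one byte per page forces the OS to commit the pages, so their contents can no longer change behind the program's back.

```rust
unsafe fn touch_pages(dst: *mut u8, len: usize) {
    const PAGE_SIZE: usize = 4096; // assumption for this sketch
    let mut offset = 0;
    while offset < len {
        let p = dst.add(offset);
        // Volatile read-then-write of the (already frozen) byte keeps the
        // access from being optimized out without changing the contents.
        p.write_volatile(p.read_volatile());
        offset += PAGE_SIZE;
    }
}
```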
r? @alexcrichton