Heap allocations in constants #20
Sounds perfect!
Ouch. :( Why are
I don't think we should do this: This means that any difference between compile-time and run-time execution becomes an immediate soundness error. Also, it's not just const B: String = String::from("foo");
let mut b = B;
b.push_str("bar"); // reallocates the "heap"-allocated buffer |
The problem with static B: Mutex<String> = Mutex::new(String::from("foo"));
let mut s = B.lock().unwrap();
s.push_str("bar"); // reallocates the "heap"-allocated buffer |
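For comparison, here is a minimal sketch of what already works on stable Rust when the initial value owns no heap buffer. It uses `String::new()` in place of `String::from("foo")`; the names `SHARED`, `push_to_const_copy` and `push_to_static` are made up for this illustration and do not appear in the thread.

```rust
use std::sync::Mutex;

// `String::new()` is a `const fn` that performs no heap allocation, so both
// items below are accepted on stable Rust today.
const B: String = String::new();
static SHARED: Mutex<String> = Mutex::new(String::new());

/// Every mention of `B` instantiates a fresh empty `String`; growing it
/// allocates a new run-time buffer that the local variable then owns.
pub fn push_to_const_copy() -> String {
    let mut b = B;
    b.push_str("bar");
    b
}

/// Mutating through the `Mutex` is fine for the same reason: the first
/// `push_str` allocates at run time, so any later reallocation only ever
/// frees memory that the run-time allocator handed out.
pub fn push_to_static() -> String {
    let mut s = SHARED.lock().unwrap();
    s.push_str("bar");
    s.clone()
}
```

The trouble discussed in this comment starts only when the initial buffer comes from const eval: the same `push_str` calls would then try to reallocate memory that the run-time allocator never handed out.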
Ugh. Looks like we painted ourselves into a corner. Let's see if we can spiderman our way out. So... new rule. The final value of a constant/static may either be
The analysis happens completely on a constant's value+type combination |
Note that contrary to what I thought, rejecting types with Right now, even if we could change the past, I don't know something we could have done that would help here. |
Yea I realized that from your example, too. I believe that the two step value+type analysis covers all cases. We'd allow |
So just having rule (1) would mean if there is a ptr (value) that is not of type I am not sure I understand what (2) changes now. Does that mean if I encounter a pointer that is not a Btw, I just wondered why we don't rule out pointers to allocations of type "heap". Those are the only ones where deallocation is allowed, so if there are no such pointers, we are good. You say
but the example that follows doesn't do anything on the heap, so I don't understand. We currently do not allow heap allocation, so allowing it but not allowing such pointers in the final value must be fully backwards-compatible -- right? The thing is that you also want to allow const C: &String = &String::from("foo"); // Ok
const D: &str = &String::from("foo"); // Ok and that's where it gets hard. And now what you are trying to exploit is some guarantee of the form "data behind a frozen shared ref cannot be deallocated", and hence allow some heap pointers based on that? I think this is hereditary, meaning I don't understand why you seem to restrict this to 1 level of indirection. Consider const E: &Vec<String> = &vec![String::from("foo")]; // OK? Given that types have free rein over their invariants, I am not convinced this kind of reasoning holds. I do think we could allow (publicly visible) |
Hm... yea, I did not think about this properly. A raw pointer can just be So... we would also allow I'm not sure if it is legal to transmute |
I think we can have a privacy-sensitive value visitor.
Yeah that's why I suggested only going for public fields. I think such a type would be invalid anyway (it would still have a shared reference around, and Stacked Borrows will very quickly get angry at you for modifying what is behind that reference). But that seems somewhat shady, and anyway there doesn't seem to be much benefit from allowing private shared references. OTOH, none of this would allow I think if we want to allow that, we will have to ask for explicit consent from the user: some kind of annotation on the field saying that we will not perform mutation or deallocation on that field on methods taking |
This should intercept calls to
Does this run destructors?
Sounds good in const eval, but as you discovered below, this does not work if run-time code tries to dealloc (or possibly also grow) the
I don't like any of them, so I'd say, ban that. That is: const fn foo() -> String {
const S: String = "foo".to_string(); // OK
let mut s = "foo".to_string(); // OK
s.push('!'); // OK
if true {
S // OK
} else {
s // OK
}
}
fn bar() -> String {
const S: String = foo(); // OK
let s = S.clone(); // OK
if true {
S // ERROR
} else {
s // OK
}
} I think that either we make the unknown_ffi_dealloc_String(bar()); works. That is, an unknown FFI function must be able to deallocate a |
Since we can call
const FOO: () = mem::leak(String::from("foo")); would not run any destructors, but also not keep around the memory because we know there are no pointers to it anymore when const eval for FOO is done. |
Since this feature was just merged into C++20, the paper doing this would probably be useful to read as prior art: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0784r5.html |
The key requirements seem to be
I am a bit puzzled by the hypothetical part about "would be a valid core constant expression and would deallocate". @ubsan do you know what is the purpose of this? (The paper unfortunately just states a bunch of rules with no motivation.) Also, the part at the end about being "immutable" confuses me. Can't I use a constexpr to initialize a static, and then later mutate that static? Or use a constexpr to initialize a local variable and later mutate that? |
@RalfJung These are specifically for Initializing a non- |
@RalfJung These rules are for the initialization of constexpr auto foo = bar(); if
Note that the rules you quote are for non-transient allocations. Transient allocations are those that are created and free'd during constant evaluation and that do not escape it, e.g., constexpr int foo() {
std::vector<int> _v{1, 2, 3};
return 3;
} where the memory allocated by Non-transient allocations are those that escape to the caller, e.g., if That is, in constexpr vector<int> foo = alloc_vec();
static vector<int> bar = foo; the EDIT: In particular, the memory of |
Ok, so basically C++ avoids the issues we've talked about here by using copy constructors whenever it moves to a non-constexpr space. This basically is const A: String = String::new(); // Ok
const B: String = String::from("foo"); // Not OK
const C: &String = &String::from("foo"); // Ok
const D: &str = &String::from("foo"); // Ok because |
I'm not sure, maybe there is some sort of optimization that might be guaranteed to apply here in C++ that would elide this, but I don't know why Rust should do the same. To me Rust is even simpler. The key thing is separating allocations that are free'd within constant evaluation (which C++ calls transient) and allocations that escape constant evaluation (which C++ calls non-transient), which get put in read-only static memory, where this memory can be read, but it cannot be written to, nor free'd. So if const C: String = String::from("foo");
static S: String = C;
let v: String = S.clone(); during constant evaluation When we write That is, allocations that "escape" to
Creating A consequence of this is that: const C: String = String::from("foo");
static S: String = C;
static S2: String = C;
// S === S2 Here |
The problem occurs when you move to a static S: Mutex<String> = Mutex::new(C);
*S.lock() = String::new(); The old value is dropped and a new one is obtained. Now we could state that we simply forbid heap pointers in values with interior mutability, so the above static S: Mutex<Option<String>> = Mutex::new(None); is legal. This rule also is problematic, because when you have e.g. static S: Vec<Mutex<String>> = vec![Mutex::new(C)]; we have a datastructure This is the same problem we'd have with const FOO: &Vec<Mutex<String>> = &vec![Mutex::new(C)]; |
We discussed my previous comment on Discord yesterday, and the summary is that it is 100% flawed because it assumed that this was not possible in stable Rust today: struct S;
impl Drop for S {
fn drop(&mut self) {}
}
const X: S = S;
let _ = X; and also because it did not take into account moving consts into statics with interior mutability. |
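The stable behaviour referred to in this comment can be observed directly. The sketch below extends the snippet with a drop counter; the `DROPS` counter and the `use_const_twice` helper are additions for illustration only.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many times `S::drop` has run.
static DROPS: AtomicUsize = AtomicUsize::new(0);

struct S;

impl Drop for S {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

// A constant of a type with a destructor is legal on stable Rust.
const X: S = S;

/// Uses the constant twice and reports how many destructors ran.
pub fn use_const_twice() -> usize {
    let before = DROPS.load(Ordering::SeqCst);
    // Each mention of `X` creates a fresh value; `let _ =` binds nothing,
    // so the temporary is dropped at the end of the statement.
    let _ = X;
    let _ = X;
    DROPS.load(Ordering::SeqCst) - before
}
```

Each use of the constant produces its own value and runs its own destructor, which is why a constant that owned a const-eval heap buffer would have that buffer freed once per use.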
Oh, so there are special global variables like this that you can only get const pointers to, or so? What would go wrong if the destructor check would not be done? The compiler can easily see all the pointers in the
Oh, I thought there'd be some magic here but this is basically what @oli-obk says, it just calls the copy constructor in the static initializer?
Well, plus it lets you write arbitrary code in a |
constexpr auto x = ...; // this variable can be used when a constant expression is needed
// it cannot be mutated
// one can only access a const lvalue which refers to x
constinit auto x = ...; // this variable is initialized at compile time
// it cannot be used when a constant expression is needed
// it can be mutated The "copy constructors" aren't magical at all - it's simply using a The thing that Rust does is kind of... extremely odd. Basically, it treats Leaking would, in theory, be valid, but I imagine they don't allow it in order to catch bugs. |
If the initializer involves non-transient allocations, @gnzlbg said above that they would become run-time allocations. How does that work, then, to initialize at compile-time a run-time allocation?
Yeah, that's a good way of viewing it. |
If the initializer involves non-transient allocations, the content of the allocation is put into the read-only static memory segment of the binary at compile-time. If you then use that to initialize a static, then the copy constructor is invoked AFAICT, which can heap allocate at run-time, and copy the memory from the static memory segment to the heap. All of this happens in "life before main". |
I'm not 100% sure about this, and it is kind of implicit in the proposal, but AFAICT there is no other way that this could work in C++ because either the copy constructor or move constructor must be invoked, and you can't "move" out of a |
But what if I use that to initialize a |
None of these proposals has been merged into the standard (the heap-allocation one has the "good to go", but there is a long way from there to being merged), and they do not consider each other. That is, the So AFAICT, when heap-allocation in constexpr functions get merged, the it will be I will ask around though. |
So from what @gnzlbg said on Zulip, it seems non-transient constexpr allocations did not make it for C++20, while transient allocations did. And indeed, there is very little concern with transient heap allocations for Rust as well, from what I can see. So how about we start with getting that done? Basically, interning/validation can check whether the pointers we are interning point to the CTFE heap, and reject the constant if they do. |
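A toy model of the check proposed here, assuming the final value has already been walked and every pointer in it has been collected together with the kind of memory it points to. `MemoryKind`, `InternError` and `check_final_value` are hypothetical names for this sketch, not the compiler's actual interning API.

```rust
/// Where an allocation reachable from the final value lives (toy model).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MemoryKind {
    /// Memory that may legitimately back a constant, e.g. a static.
    Static,
    /// Memory obtained through an intercepted allocator call during CTFE.
    CtfeHeap,
}

#[derive(Debug, PartialEq, Eq)]
pub enum InternError {
    /// The final value still points at a CTFE heap allocation.
    HeapPointerInFinalValue { alloc: usize },
}

/// `pointers` holds one `(allocation id, kind)` entry per pointer found in
/// the final value. Transient-only heap support means that any remaining
/// CTFE heap pointer rejects the constant.
pub fn check_final_value(pointers: &[(usize, MemoryKind)]) -> Result<(), InternError> {
    for &(alloc, kind) in pointers {
        if kind == MemoryKind::CtfeHeap {
            return Err(InternError::HeapPointerInFinalValue { alloc });
        }
    }
    Ok(())
}
```

Under this rule a `const fn` may allocate and free as much as it likes internally; only allocations that survive into the final value are an error.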
+1. Those seem very uncontroversial and deliver instant value. It makes no sense to block that on solving how to deal with non-transient allocations. |
As Ralf mentioned. Statically checking for transience is necessary for associated constants in trait declarations (assoc constants may not be evaluable immediately because they depend on other associated consts that the impl needs to define) |
So... @gnzlbg had a discussion on discord that I'm going to summarize here. The TLDR is that we believe a good solution is to have (names bikesheddable!)
Other types may (or may not) appear behind references by implementing the
Additionally values that contain no pointers to heap allocations are allowed as the final value of a constant. Our rationale is that
In order to distinguish these two types, we need to get some information from the user. The user can write unsafe impl ConstRefSafe for String {} and declare that they have read and understood the
Backcompat issue 1
Now one issue with this is that we'd suddenly forbid struct Foo(*mut ());
const FOO: Foo = Foo(std::ptr::null_mut()); which is perfectly sane and legal on stable Rust. The problems only happen once there are pointers to actual heap allocations or to mutable statics in the pointer field. Thus we allow any type directly in the root of a constant, as long as there are no such pointers in there.
Backcompat issue 2
Another issue is that struct Foo(*mut ());
const FOO: &'static Foo = &Foo(std::ptr::null_mut()); is also perfectly sane and legal on stable Rust. Basically as long as there are no heap pointers, we'll just allow any value, but if there are heap pointers, we require |
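As a rough sketch, the opt-in could look like an ordinary unsafe marker trait. `ConstRefSafe` is the (bikesheddable) name from this comment; it is not a real standard-library trait, the exact obligation an implementor takes on is left open in the thread, and `heap_behind_ref_allowed` is a made-up stand-in for the compiler-side check. The two backcompat cases are included to show that they compile on stable today; the second constant is renamed `FOO_REF` so both fit in one file.

```rust
/// Hypothetical marker trait from the proposal. Implementing it is the
/// user's explicit consent that values of the type may sit behind a
/// reference in a constant's final value.
pub unsafe trait ConstRefSafe {}

// The opt-in written out for `String`, as in the comment above.
unsafe impl ConstRefSafe for String {}

/// Stand-in for the check on heap pointers behind `&T`: it only
/// type-checks for types that have opted in.
pub fn heap_behind_ref_allowed<T: ConstRefSafe + ?Sized>(_value: &T) -> bool {
    true
}

// Backcompat issue 1: a raw-pointer field with no heap pointer in it.
pub struct Foo(pub *mut ());
pub const FOO: Foo = Foo(std::ptr::null_mut());

// Backcompat issue 2: the same value behind a `'static` reference.
pub const FOO_REF: &'static Foo = &Foo(std::ptr::null_mut());
```

A trait bound alone cannot express "only when heap pointers are present", so the real rule would still need the value-based part of the analysis described above.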
I like the idea of using a trait or two to make the programmer opt in to this explicitly! I think to follow this approach, we should figure out what exactly is the proof obligation that I think the proof obligation will be something along the lines of: the data can be placed in static memory and the entire safe API surface of this type is still fine. Basically that means there is no deallocation. However, how does this interact with whether data is placed in constant or mutable memory? |
The We could add a |
The thing is, we'd also need |
We could make types with the (Also, I think this could be a lang initiative, which could accelerate the development) |
Might the |
Very layman's perspective here, but is there a reason the allocator can't just ignore any static segment that these allocations would exist in? Such that the free(...) impl would simply be a noop for references in that segment. Then drop could run to completion, and the normal "dealloc" code could run on any heap allocations within that type. |
AFAIK there is no existing allocator used in the real world which does that. In addition you may choose any custom allocator which doesn't need to support it. |
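To make the suggestion concrete, here is a minimal sketch of an allocator wrapper whose `dealloc` is a no-op for pointers into one known read-only region. `SkipStaticDealloc`, `READ_ONLY_BLOB` and `points_into_blob` are invented for this illustration; a real implementation would need to know the bounds of the whole static segment, which this sketch does not attempt.

```rust
use std::alloc::{GlobalAlloc, Layout, System};

// Stand-in for data that const eval placed in static memory.
static READ_ONLY_BLOB: [u8; 16] = *b"compile-time str";

/// Wraps the system allocator and ignores frees of static memory.
pub struct SkipStaticDealloc;

impl SkipStaticDealloc {
    /// True if `ptr` lies inside the blob's address range.
    pub fn points_into_blob(ptr: *const u8) -> bool {
        let start = READ_ONLY_BLOB.as_ptr() as usize;
        let end = start + READ_ONLY_BLOB.len();
        (start..end).contains(&(ptr as usize))
    }
}

unsafe impl GlobalAlloc for SkipStaticDealloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // Pointers into static memory were never handed out by `System`,
        // so freeing them is silently skipped.
        if Self::points_into_blob(ptr) {
            return;
        }
        unsafe { System.dealloc(ptr, layout) }
    }
}
```

As the reply notes, this only helps if every allocator a program might use cooperates, and it adds a range check to every deallocation.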
Current proposal/summary: #20 (comment)
Motivation
In order to totally outdo any other constant evaluators out there, it is desirable to allow things like using serde to deserialize e.g. json or toml files into constants. In order to not duplicate code between const eval and runtime, this will require types like Vec and String. Otherwise every type with a String field would either need to be generic and support &str and String in that field, or just outright have a mirror struct for const eval. Both ways seem too restrictive and not in the spirit of "const eval that just works".
Design
Allocating and Deallocating
Allow allocating and deallocating heap inside const eval. This means Vec, String, Box
* Similar to how panic is handled, we intercept calls to an allocator's alloc method and never actually call that method. Instead the miri-engine runs const eval specific code for producing an allocation that "counts as heap" during const eval, but if it ends up in the final constant, it becomes an unnamed static. If it is leaked without any leftover references to it, the value simply disappears after const eval is finished. If the value is deallocated, the call to dealloc is intercepted and the miri engine removes the allocation. Pointers to dead allocations will cause a const eval error if they end up in the final constant.
Final values of constants and statics
If a constant's final value were of type String, and the string is not empty, it would be very problematic to use such a constant:
While there are a few options that could be considered, all of them are very hard to reason about and easy to get wrong. I'm listing them for completeness:
Box
We cannot ban types that contain heap allocations, because
is perfectly legal stable Rust today. While we could try to come up with a scheme that forbids types that can contain allocations inside, this is very hard to do.
There's a dynamic way to check whether dropping the value is problematic:
Now this seems very dynamic in a way that means changing the code inside a const impl Drop is a breaking change if it causes any deallocations where it did not before. This also means that it's a breaking change to add any allocations to code modifying or creating such values. So if SmallVec (a type not heap allocating for N elements, but allocating for anything beyond that) changes the N, that's a breaking change.
But the rule would give us the best of all worlds:
More alternatives? Ideas? Code snippets to talk about?
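As one snippet to talk about, the interception scheme from "Allocating and Deallocating" can be modelled with a small toy heap. Everything here (`ConstHeap`, `AllocId`, `ConstEvalError`, `finish`) is a hypothetical model rather than the miri engine's real data structures; it tracks only top-level pointers in the final value, and the `DoubleFree` error is an assumption of the sketch.

```rust
use std::collections::{BTreeMap, BTreeSet};

pub type AllocId = usize;

#[derive(Debug, PartialEq, Eq)]
pub enum ConstEvalError {
    /// The final value points at an allocation that was deallocated.
    DanglingPointer(AllocId),
    /// `dealloc` was called on an allocation that is not live.
    DoubleFree(AllocId),
}

/// Toy stand-in for the const-eval heap.
#[derive(Default)]
pub struct ConstHeap {
    next_id: AllocId,
    live: BTreeMap<AllocId, Vec<u8>>,
}

impl ConstHeap {
    /// Intercepted `alloc`: no real allocator is called.
    pub fn alloc(&mut self, size: usize) -> AllocId {
        let id = self.next_id;
        self.next_id += 1;
        self.live.insert(id, vec![0; size]);
        id
    }

    /// Intercepted `dealloc`: the engine simply forgets the allocation.
    pub fn dealloc(&mut self, id: AllocId) -> Result<(), ConstEvalError> {
        match self.live.remove(&id) {
            Some(_) => Ok(()),
            None => Err(ConstEvalError::DoubleFree(id)),
        }
    }

    /// End of const eval. `roots` are the allocations the final value
    /// points to; they become unnamed statics. Live allocations that are
    /// not reachable (leaked) simply disappear with the heap.
    pub fn finish(self, roots: &[AllocId]) -> Result<BTreeSet<AllocId>, ConstEvalError> {
        let mut statics = BTreeSet::new();
        for &id in roots {
            if !self.live.contains_key(&id) {
                return Err(ConstEvalError::DanglingPointer(id));
            }
            statics.insert(id);
        }
        Ok(statics)
    }
}
```

For example, allocating three buffers, freeing one, leaking one and returning a pointer to the third leaves exactly one unnamed static, while returning a pointer to the freed buffer is a const eval error.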