Can we make AllocId actually uniquely "identify" an allocation? #128775
Comments
Fun fact: DefIds are also created on the fly during deserialization. Tho they do handle their crate as you said. A DefId (or specifically the non-CrateNum part) is an interned We can reuse def ids, which is quite convenient, but may cause us to run out of identifiers if we ever end up with const heap, as DefIds assume you don't have more than 2^32 definitions. Probably not gonna be a problem in practice. Or we add a true interning scheme, mirroring the |
So the concrete integer stored in a Or is only the
Looking at the types there I see a
If we only use DefIds during interning, then only the allocations that actually reach the final value count. I hope 2^32 is enough for that... But that means a DefId alone is not enough to give us what we need, right? The crux is that when we encounter a pointer in a constant while lowering to LLVM, we need to be able to reference the pointed-to allocation with some globally unique name. If DefId are also remapped on-the-fly, we can't use that to compute the name, right? OTOH this somehow works for nested statics and I can't figure out where their names get generated... in a quick test, the name was |
The disambiguator is used for cases like
fn foo() {
    {
        fn bar() {}
    }
    {
        fn bar() {}
    }
}
where the two
Yes |
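A minimal sketch of the disambiguation idea, not rustc's actual implementation: a toy table keeps a counter per (parent, name) pair, so the two `bar` functions above end up with distinct paths. The names `ToyDefinitions` and `create_child`, and the `name#N` output format, are hypothetical and chosen only for illustration.

```rust
use std::collections::HashMap;

/// Toy stand-in for a definition table (hypothetical, not rustc's real type).
#[derive(Default)]
pub struct ToyDefinitions {
    /// Next free disambiguator for each (parent path, name) pair.
    next_disambiguator: HashMap<(String, String), u32>,
}

impl ToyDefinitions {
    /// Registers a child definition and returns its disambiguated path.
    pub fn create_child(&mut self, parent: &str, name: &str) -> String {
        let counter = self
            .next_disambiguator
            .entry((parent.to_string(), name.to_string()))
            .or_insert(0);
        let disambiguator = *counter;
        *counter += 1;
        format!("{parent}::{name}#{disambiguator}")
    }
}
```

Note that in this sketch the number a definition receives depends on the order in which children with the same parent and name are created.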
Sure, but is it also used for nested statics? Or how do those get their unique ID? It uses create_def for that which unfortunately does not have a very useful doc comment. |
Yes, |
Ah right, I see there's a name as well. It's So... that should work in general, not just for statics, right? Interning is always in the context of a |
It breaks down when you have generic constants, as you'd then also need the generic params in the name. We don't have the same issues as generic statics, so two different crates independently creating the same constant allocation, naming it the same and giving it the same data would be fine if we use the right kind of linkage. Just using the same linkage as statics would cause linker errors due to duplicate symbols. |
We also cannot create child definitions of a DefId of another crate, so evaluating a constant from a different crate would need some different logic in general |
Ah, right... we want to synthesize a monomorphic
... in the local crate. So it sounds more like we'll want to use |
yes
👍
Nah, this seems fine to me. It doesn't pollute any namespaces or anything like that, there are no concerns that I know of. |
If there already exists a symbol named |
DefIds shouldn't need to worry about naming conflicts. They do disambiguation automatically. For example, all closures are just named "{closure}". |
Ah, maybe we should also use braces then to clearly mark it as a synthesized name. |
@RalfJung this shouldn't be necessary to do explicitly. Presumably if we were to make allocations have def ids, they'd have their own rust/compiler/rustc_hir/src/definitions.rs Line 437 in 3139ff0
|
Ah, okay. For nested statics no new |
Adding a separate |
I don't know what you mean by that. That said, I was wondering one more thing -- doesn't that disambiguator number lead to potential "concurrent rustc" determinism issues? When constants get evaluated in a different order, they will get a different disambiguator number... |
I actually think that const allocs are very different than consts. Const allocations don't have generics, predicates, a param-env, or frankly most other queries that would otherwise need to be fed for const items. Unless I'm misunderstanding what the ask here is actually for. |
Hm.. That sounds generally like a problem with concurrent def id synthesis 🤔 |
Usually it's no problem as you specify a parent, and they don't tend to have many child DefIds that are created out of order (even with infinite parallelism, you currently get the same DefPathData including disambiguators). But that is entirely incidental. No one has created more DefIds of "remote" parents yet. Edit: RPITITs may actually be creating different disambiguators depending on query invocation order, which may be random with more than 1 thread |
Yeah and here we'd possibly create a lot of DefIds in the crate root, which would definitely end up depending on query order. I still don't understand what you mean by "using fields to differentiate nested const allocs". Fields of what? |
The variants of |
So... on the topic at hand... I think the thing we need to answer before looking for a solution is: what is the key we want to have that uniquely identifies an allocation from a constant? Options are
|
We certainly don't want to use the alloc contents as the key. My expectation was that every unique allocation that is generated during interpretation also gets a new unique global identity. We rely on query caching inside a crate, but if the same query is executed multiple times across crates it is okay for those allocations to be duplicated. That's option 3. I didn't realize option 2 is even a possibility. That is probably the semantics people expect: the same constant with the same args has a single unique value everywhere. If that can be done, that sounds amazing. What I wonder is, what happens when multiple codegen units all define the same global symbol? They will all give it the same initial value, but what does it actually do in terms of runtime address equality? I guess for now we can mark these symbols as So, anyway, to get back to your question -- option 2 sounds amazing, option 3 is what I originally was looking for. The determinism issue could possibly be resolved by using the evaluating crate plus the identity of the const (path + generics) plus a per-const disambiguation index. |
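The key proposed at the end of that comment could look roughly like the following. This is an illustration under assumptions, not rustc code: `AllocKey`, its fields, `PerConstCounter`, and `symbol_name` are all hypothetical, and a real implementation would use proper mangling or a stable hash rather than this readable string.

```rust
/// Hypothetical key identifying one allocation produced while evaluating a constant.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct AllocKey {
    /// Crate that ran the evaluation.
    pub evaluating_crate: String,
    /// Path of the constant being evaluated.
    pub const_path: String,
    /// Generic arguments the constant was instantiated with.
    pub generic_args: Vec<String>,
    /// Index of the allocation within this one evaluation.
    pub index: u32,
}

impl AllocKey {
    /// Builds a readable (unmangled) name from the key.
    pub fn symbol_name(&self) -> String {
        format!(
            "{}::{}<{}>::{{alloc#{}}}",
            self.evaluating_crate,
            self.const_path,
            self.generic_args.join(", "),
            self.index
        )
    }
}

/// Hands out keys for the allocations of a single constant evaluation.
pub struct PerConstCounter {
    /// The key that the next call to `next_key` will return.
    next: AllocKey,
}

impl PerConstCounter {
    pub fn new(evaluating_crate: &str, const_path: &str, generic_args: &[&str]) -> Self {
        PerConstCounter {
            next: AllocKey {
                evaluating_crate: evaluating_crate.to_string(),
                const_path: const_path.to_string(),
                generic_args: generic_args.iter().map(|arg| arg.to_string()).collect(),
                index: 0,
            },
        }
    }

    /// Returns the current key and advances the per-constant index.
    pub fn next_key(&mut self) -> AllocKey {
        let key = self.next.clone();
        self.next.index += 1;
        key
    }
}
```

Because the index is counted per constant rather than per crate, the names in this sketch do not change when two different constants are evaluated in a different order. They would still depend on the order of allocations within a single evaluation.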
Not totally sure how we're gonna make (2.) work, though -- we probably can't store generics in a |
We don't need to. We can just keep using AllocId and deduplicate via whatever unique symbol name we come up with. Basically the current system, but with a symbol name in the global alloc |
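One way to read this suggestion, sketched under assumptions: allocation IDs stay crate-local, and duplicates are merged by symbol name when code is emitted. `ToyAlloc` and `dedup_by_symbol` are hypothetical names, not rustc APIs, and the conflict check is an extra safety net added for the sketch.

```rust
use std::collections::btree_map::Entry;
use std::collections::BTreeMap;

/// Toy global allocation: a crate-local ID plus a symbol name (hypothetical).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ToyAlloc {
    pub local_id: u64,
    pub symbol: String,
    pub bytes: Vec<u8>,
}

/// Keeps one allocation per symbol name, ignoring the crate-local IDs.
/// Returns an error if two allocations share a name but differ in contents.
pub fn dedup_by_symbol(allocs: &[ToyAlloc]) -> Result<BTreeMap<String, Vec<u8>>, String> {
    let mut emitted = BTreeMap::new();
    for alloc in allocs {
        match emitted.entry(alloc.symbol.clone()) {
            // First time this symbol is seen: emit it.
            Entry::Vacant(slot) => {
                slot.insert(alloc.bytes.clone());
            }
            // Same symbol, same contents: nothing to do.
            Entry::Occupied(slot) if *slot.get() == alloc.bytes => {}
            // Same symbol, different contents: the naming scheme is broken.
            Entry::Occupied(slot) => {
                return Err(format!("conflicting contents for {}", slot.key()));
            }
        }
    }
    Ok(emitted)
}
```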
Ah, yeah I guess we could mangle the allocation name into a symbol and use that in |
But doesn't that require us to generate new DefIds in other crates, when we evaluate a constant from another crate? Is that possible? |
What I'm saying is ditch the Similar to |
Ah, hm... I kind of liked the idea of statics just not having an AllocId. But maybe it's not worth it. Function pointers already have a "symbol name" so it would not make a ton of sense to store another one. So yeah this field should probably only exist for Does this mean that a |
yes, just like |
The way AllocId works right now is super counter-intuitive: they are entirely a per-crate identifier, and when loading the metadata of another crate, we generate a fresh "local AllocId" for each ID we encounter in the other crate and re-map everything we load. (At least I think that's what happens, @oli-obk please correct me if I am wrong.)
Unfortunately this means that a ConstValue that holds a pointer isn't actually a "value" in the usual sense of the word: if the value is computed in one crate and then used in another crate, its AllocId gets re-mapped. During code generation, when we encounter such an AllocId, we just always generate a local copy of that allocation and point to there. This means the "same" ConstValue, codegen'd in different crates, can result in observably different values! That's extremely confusing for users and compiler devs alike (#84581, #123670). In many cases this will get de-duplicated later but we can't always rely on that.
So... I'd like to consider switching how AllocIds work, with the goal of making ConstValue actually be a value. This will make #121644 unnecessary: we can just evaluate the static once, store its final value, and use that in all crates without running into issues like this. This requires not re-mapping AllocId, and instead when crate B receives a ConstValue from crate A it should be able to point to the allocation already generated by crate A. Unfortunately I am largely unfamiliar with how we manage "cross-crate identity of objects" so I don't know what the possible options here look like.
Some first rough ideas that popped into my head:
1. [...] AllocId uniformly at random and fail when loading two crates that happened to get the same ID. That's fundamentally non-reproducible so either we have to make sure these AllocId don't matter for anything except the question whether they are equal or not (that seems hard to enforce) or we have to pick some deterministic scheme based on this. Also, during codegen, how would we know whether the allocation has been previously already generated or whether it is our job to generate it? We'd have to keep track of which AllocId are "local", or so.
2. [...] AllocId to store the CrateNum of the crate that generated the allocation, and the rest to store some sort of per-crate allocation ID. I guess this still has to be remapped on load, but then during codegen when we encounter another crate's allocation we'd import it instead of generating a copy.
3. [...] DefId. AllocId outside of an interpreter session basically becomes DefId (or a new kind of ID with the same properties). We don't even need an alloc_map in tcx any more, we just have a new kind of "definition" that represents "global allocations" and a query taking a DefId and returning a GlobalAlloc. (That query would mostly, if not exclusively, be computed by feeding, maybe except for statics that it could evaluate directly. I guess if it is exclusively feeding it doesn't make much sense to make this a query rather than a normal hash map.)
   Inside the interpreter, we certainly don't want to generate a DefId for each allocation. I can imagine two schemes here:
   i. [...] CrateNum value to indicate "local interpreter instance" so that we can just make up DefIndexes locally while the interpreter runs and still know which allocations need to be looked up where. During interning, we generate proper DefId inside LOCAL_CRATE and remap everything we encounter.
   ii. [...] AllocId type that we do now, but make it valid only inside an interpreter instance, and track a per-interpreter-instance mapping between global DefId and local AllocId. Unfortunately this means extra work whenever we "import" a global allocation into an interpreter instance as we need to apply that mapping (and then map back during interning).
The last two schemes (2 and 3) seem fairly similar, given that DefId is just CrateNum + per-crate DefIndex. The only difference is whether there's a single shared "index" namespace for everything or a dedicated namespace for allocations. My main concern with the single shared namespace is that we'd quite like to use some bits for other purposes inside AllocId: we want it to have a niche. We also probably need to distinguish allocations inside the current interpreter instance from "global allocations" (and do a remapping during interning), and at least inside an interpreter instance we are using some bits to track whether the pointer is derived from a shared reference and whether that shared reference had interior mutability. Option 2 could possibly entirely avoid doing any kind of mapping during interning, if we think that 2^30 total allocations are enough for every crate -- though I assume interning is already quite expensive so maybe it's not worth optimizing for that. It does seem worth optimizing for "no remapping when accessing previously interned global allocations", which excludes 3ii (which might otherwise be my favorite as it keeps everything fairly clear).
@oli-obk @rust-lang/wg-const-eval any thoughts?
@compiler-errors @wesleywiser I know you're not const-eval experts but maybe you know the query system sufficiently well to provide some helpful input. :)
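The idea of splitting the bits of AllocId between a CrateNum and a per-crate index, while keeping a niche, can be sketched as follows. Everything here is an assumption made for illustration: `PackedAllocId` is a hypothetical type, the 32/32 bit split is arbitrary (the post mentions reserving further bits for other purposes), and the crate number is a plain `u32` rather than rustc's CrateNum.

```rust
use std::num::NonZeroU64;

/// Hypothetical packed ID: the upper 32 bits hold a crate number,
/// the lower 32 bits hold a per-crate allocation index plus one.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct PackedAllocId(NonZeroU64);

impl PackedAllocId {
    /// Packs a crate number and a per-crate index.
    /// The index is stored off by one so the raw value is never zero,
    /// which leaves zero free as the niche.
    pub fn new(crate_num: u32, index: u32) -> Option<Self> {
        // Fails if index + 1 no longer fits in 32 bits.
        let stored = index.checked_add(1)?;
        let raw = ((crate_num as u64) << 32) | stored as u64;
        NonZeroU64::new(raw).map(PackedAllocId)
    }

    /// Crate that generated the allocation.
    pub fn crate_num(self) -> u32 {
        (self.0.get() >> 32) as u32
    }

    /// Per-crate allocation index.
    pub fn index(self) -> u32 {
        (self.0.get() as u32) - 1
    }

    /// Whether codegen for `local_crate` would have to emit this allocation
    /// itself rather than import it from another crate.
    pub fn is_local(self, local_crate: u32) -> bool {
        self.crate_num() == local_crate
    }
}
```

With such a layout, codegen could call something like `is_local` to decide between emitting an allocation and importing it. As the post notes, the crate number part would presumably still need remapping when metadata is loaded.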