-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create and use StableTypeId instead of TypeId #32
Comments
This is even more important now that we're using bevy_ecs instead of legion. Dynamic plugin loading will be broken until we add this. |
How would a hashing algorithm deal with collisions? Obvious question, but I don't see how this issue can be addressed without resolving it 😶 |
If you are discussing hash collisions where "a" and "b" hash to the same thing, i think we can choose one that is effectively "collision free" for the type names we would be using. The world runs on hashing strings. For the case where there are two types that have the same "type name": I don't think thats a problem in practice. Within a crate there cannot be collisions. Across crates, crates.io enforces uniqueness. I'd love your thoughts on this! |
I'm concerned about unsafe code making assumptions that any given I think Rust's safety is best thought of as a binary property, and wouldn't like a solution that just made it really unlikely that you manage to write two types with the same hash (which could then proceed to cause UB). Especially as the algorithm would be public, and crates.io is an amazing attack vector. That said, I could be completely unaware of some way of proving an algorithm "collision free" for type names, or there really is a way of verifying the hashes. |
Back when bevy used a legion fork, i solved the type id problem with
I think the cost/benefit here is quite low. The cost is "very very very unlikely collisions" and the benefit is dynamic plugin loading and no additional runtime cost. On the "malicious code" side, if you are taking a dependency on something, that is already trusted code. We will never be able to protect against malicious dependencies. |
What do you mean by that? There's a lot that can be done to defend from dangerous code, and Rust is an effort to empower developers in that sense. As far as I'm aware, there shouldn't be any way for a A soundness hole allows those otherwise sandboxed dependencies to trigger UB, and get up to whatever they like. Instead, could this be done with a derive macro? The uniqueness of the IDs could then be verified at compile time, and compilation can fail if a collision is found. I do understand why you don't consider a priority - the chances are incredibly low. It's just a disappointing sacrifice to make (Side note: this flaw would mean that properly reviewing dependencies for safety would require a manual verification step anyway since a human can't know that a name is safe for their project) |
I actually wasn't thinking about the "Sandboxed dependencies" is a nice feature thats may be worth pursuing, but Bevy's primary goals are not "running untrusted dependencies" or even "memory safety". Usability and productivity will take priority over almost everything. I don't think the concerns you have will be encountered in the "common case" (or even the top 99% of cases) and the cost of resolving them (appears to be) either major runtime overhead or major ergonomics costs. I do want to resolve these problems, but ideally we find a way to do it that doesn't require high run time costs, macros, or user facing "boilerplate". For example, a blunt force tool would be to add an opt-in cargo feature that uses TypeIds instead of StableTypeId. The cost there is that you wouldn't be able to use dynamic plugin loading. Another option is exploring the "compile time dupe detection", but im afraid that breaks down in a world with dynamic dependency loading. "Run time dupe detection" seems reasonable though. |
It'd be a shame to have to use some kind of boilerplate. The ease of use of components at the moment is just brilliant. And given a decent hashing algorithm, I'd be surprised if this issue ever actually mattered. However much I like the idea, you're probably right that it isn't worth it. Could this decision be added to the documentation if a better solution isn't found? I think it's only fair to users to make the choice visible, since it does contradict the technical meaning of the safe API. Hopefully it'd also give traction to an RFC for a real |
fyi, different versions of a single crate can be included in a dependency tree, so even without plugin loading I believe you can have multiple types with the same one solution to speed up comparison + hashing would be to have a global type name interner at run time. it's not going to be written often so it doesn't need to be particularly fancy. it wouldn't speed up serialization, though. it also doesn't solve the uniqueness issue. you could potentially reduce the chance of conflicts by checking oh, alternative strategy: you could add something to the impl Properties for $T {
fn unique_name() -> &'static str {
concat!(env!("CARGO_PKG_NAME"), "@", env!("CARGO_PKG_VERSION"), "::", std::module_path!(), "::$T"))
}
} or whatever, and then require that all types that are serialized / pass through plugins implement Property. Would just need to make sure that wouldn't slow down incremental compiles. |
I agree with @kazimuth. The docs for |
Relevant to this discussion is #142. In order to have dynamically registered components, |
I opened a forum topic to find out if there is a way, without using specialization, to automatically implement a Unfortunately I think its the exact definition of the use-case for specialization: Any way to do this without specialization?. |
I'm thinking we could do something like this (pseudo-code):
|
Oh, OK. That works. 😄 👍 I think that's easy, then. What do you think of renaming it Also I think it is a good idea to add some of the Cargo env vars to the I'll try to implement this. |
I'd love that! I think "approximately unique" is fine, especially while we are in the experimentation phase. We will soon learn if the problems stated above are actual problems in practice. At the end of the day, I think a good user experience requires a stable version of the TypeId interface. How we generate that can change, but we need to start somewhere. I think I would prefer to keep the name a generic Also definitely try to make the |
I also do like adding the cargo metadata to the path to help with ambiguity (provided it doesnt prevent us from const hashing). |
So btw, using the cargo metadata will require a procedural macro (since Since this will force |
Actually we can just use the
How would the macro check for collisions? I suppose if the procedural macro would log every id to a file as it was compiling hashing out the ids and then checked to make sure every id that it hashed was not one that was previously hashed maybe? I think I'm going to start just by keeping it simple and using |
Oh, well, unfortunately, Any ideas? |
Have you considered a UUID as type ID as per https://docs.rs/type-uuid/0.1.2/type_uuid/ ? |
Yeah, I'm thinking that to get all the features we want we are just going to need to require a Question: do we need to change the At this point it seems like the Maybe something like this: #[derive(Component)]
#[component(generation = 1)]
struct Mycomponent; When you derive component the proc macro will create or update a file in you CARGO_DIR called |
I've considered type_uuid, but I have a strong preference to make things work without additional derives (especially ones that require users to generate UUIDs. theres a lot of friction there). UUIDs are also not human readable, which means if we use them in Scenes that basically eliminates the "html-like" scene composition we're shooting for here: https://gist.github.com/cart/3e77d6537e1a0979a69de5c6749b6bcb. The "moving components" problem is resolvable if we use "short names" by default (which we currently do) and "long names" to resolve ambiguity. To do this properly and avoid accidental name stomping we need runtime collision checks that force developers to resolve ambiguity. I'm fine with adding optional UUID support for folks that need explicit uniqueness and are willing to deal with the overhead, but in my head the cure is worse than the disease for the common case. I will likely be pretty stubborn here, but I'll try to be as reasonable as I can be. Imo if rust paths are suitable for importing types in the rust language (and trusting that those imports are what we want them to be), then they should be suitable for "importing" things into a game's "component namespace". |
@zicklag's "component generation + file" idea is very clever and worth considering, but I'm still biased toward making paths work. |
I tried the Stable TypeId in my organization Kokoro, and while it might not be ideal yet, it worked and performed well. |
How do you like the idea of creating a TypeId not only from the name and path, but also from fields? |
Yes, that is the implementation of stable typeid. |
Hello, an off-topic question, why did github automatically mention my issue? When did I add the comment there? |
How do you like the idea of storing not only the TypeId but also the TypeId type in the StableTypeId? Because if someone wants to use bevy in Java, then the question arises: how to synchronize the TypeId from Java and the TypeId from Rust so that there are as few conflicts as possible, and just It would be possible to indicate which TypeId is used, for Rust, for example, 0, and for Java 1 (or whichever user wants)?
|
TL;DR: stable type resolution might be impossible to make use-case agnostic, opt for a customizable (possible with defaults) and use-case-specific approach. I think a stable type id is a fundamentally ill-based concept, because it relies on a nebulous at best, undefinable at worst notion of type identity. use std::any::*;
mod new_module {
pub struct A { _foo: u32 }
}
struct A { _foo: u32 }
fn main() {
assert_ne!(TypeId::of::<A>(), TypeId::of::<new_module::A>());
} On a programming language level, type identity is well defined! The above snippet compiles and runs perfectly. Now, the given code isn't very interesting: The two types are indeed distinct! But now consider that code is malleable and will change over time. The same type might move from one module two another, possibly across crate boundaries (consider #13945). But the previous sentence defined a notion of type identity? Isn't that contradictory to what I first stated? No! In fact, there are many ways to define type identity.
And there are lots more, like compositons or arbitrary relations (think subtyping, which - when reasoning - can have non-obvious "is type" relations).
I assume the discussed derive macro composes "maybe locational" and structural reasoning. "maybe locational" because if it's anything like All this put together, there might be no good way to uniquely identify a type for multiple different purposes. If this is just for scenes (I assume plugin loading is not important, after all: It was deprecated), that's kinda similar to the serde model of reason: Just try to make it fit somehow (including dynamicly structured value (read: |
Why not just use TypeUuid? Yes, you need to specify the type in the code itself, but it is a universally unique identifier. Then, if some other way of identifying a subset of types is needed, a mapping could be made, like the type path and such. |
After pondering about this more, I suggest a following scheme: |
Drafted an RFC based on those thoughts: bevyengine/rfcs#83 |
Currently this doesn't work. The TypeIds of things in different binaries are different, so all the bevy reflecty things break. Depends on: - bevyengine/bevy#13080 - bevyengine/bevy#8390 - bevyengine/bevy#32
@BERADQ I gave implementing stable id in a PR a shot, and it seems like there are 2 issues: |
this is probably false on further reflection, but the only example provided was a derive and not a regular macro for passing in primitives and std types. |
I have updated the derive macro to actually work for tuple structs. I will continue working on this throughout the week and see what I run into.
needless to say... impl StableID for () {
const _STABLE_ID: &'static StableId = &StableId(0);
} is unbelieveably unsafe and stupid. this will not be in the final implementation. |
this is better for those pesky macro_rules! impl_with_type_name {
($name:ident) => {
impl StableAny for $name {
fn stable_id(&self) -> &'static StableId
where
Self: Sized,
{
$name::_STABLE_ID
}
}
impl StableID for $name {
const _STABLE_ID: &'static StableId =
&StableId(fnv1a_hash_64(type_name::<$name>().as_bytes(), None));
}
};
}
impl_with_type_name!(bool);
impl_with_type_name!(char);
impl_with_type_name!(u8);
impl_with_type_name!(u16);
impl_with_type_name!(u32);
impl_with_type_name!(u64);
impl_with_type_name!(u128);
impl_with_type_name!(usize);
impl_with_type_name!(i8); |
@Lubba-64 relying on |
I did not consider methods. thats going to be an issue. How would we go about primitives then? How would we go about STD types? we need their syntax tree to fulfill these requirements... |
Looks like we're gonna need to hash feature flags as well? |
Ideally we would run the derive macro @BERADQ created (or an analog) directly on the std and foreign types we need to implement
According to bjorn on discord:
Bevy has a finite number of std and foreign types. for those types, I cannot move forward without doing the best I can with what is possible currently. apparently we cannot check features or versions of dependencies without going into a The current state of affairs is quite unacceptable. I can understand why this is the second oldest issue now. I am probably going to finish my branch's implementation because this is a hard requirement for multiple projects I am working on, and I think the additional unsoundness is something I'll have to put up with until the situation improves in my fork. |
Actually I can add |
If I could concat const strings with const fn generics! which you cannot! |
This is not true, you can compile the same bevy version with both a different rustc and different dependencies versions. |
Good to know 🤦 |
Someone would probably have to do something pretty messy to get this to work but I believe it may be possible. |
For plugins really the only thing guaranteed to work is TypeId + ensuring that you use the exact same bevy dylib (and dylib for whatever dependencies and game specific types you have that you want to use in your plugin) in the plugin as in the main game executable. |
I personally haven't felt the need for a "stable type id" in awhile. Iirc I created this issue back when I was first doing dynamic linking and I noticed the TypeIds didn't match across binaries. Turns out that was a sign that the binaries weren't compatible in the first place (see all of @bjorn3's messages). In the context of scenes and networking I'm totally cool with using type path to establish identity. In both of those cases to save space we can exchange full type paths for short temporary ids. Ex: within the context of a specific scene file, create the mapping
In theory this can have correctness issues across binaries. But in practice (1) in the cross-binary context we can / should / do rely on TypeId when working with actual references to the type and (2) for scene / networking contexts, the mismatch is still "sound" because we are (attempting) to translate that "other" context to the current runtime source of truth. If there is a mismatch, there will likely be a runtime error of some kind (ex: because a field is missing / one was provided that doesn't exist). In theory TypeUuid is the "better" fix here. But imo the UX cost of defining that everywhere is not worth it. People defining types in Rust already largely rely on TypePath for identity in practice, and notably we get it for free. Using TypePath in things like scenes and networking also has the benefit of being self-documenting. I'm closing this out to prevent us from trying to tackle something that I don't think is the right path forward. Feel free to keep discussing here / make the case to reopen if you have different opinions. |
substitute after preprocessing
std::any::TypeId
is not stable across binaries. This makes it unsuitable for use in scene files, networking, or dynamic plugins.In the short term we get around this by just using
std::any::type_name
. But this isn't particularly efficient for certain serialization cases andEq
/Hash
traits. Legion uses TypeIds extensively and we've (in the short term) replaced those withtype_name
to allow for dynamic plugin loading. But that probably incurs measurable overhead.Its worth exploring the idea of a
StableTypeId
, which is just a wrapped integer. I think it makes sense to use a const-hashing algorithm on type_name to produceStableTypeIds
The text was updated successfully, but these errors were encountered: