Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Comparison and hashing archetype::IdentifierRefs by address #160

Closed
Anders429 opened this issue Dec 22, 2022 · 3 comments
Closed

Comparison and hashing archetype::IdentifierRefs by address #160

Anders429 opened this issue Dec 22, 2022 · 3 comments
Labels
A - Storage Area: Storage inside a World. C - Enhancement Category: New feature or request. C - Performance Category: Related to performance. P - Low Priority: Not particularly urgent. S - Blocked on Dependency Status: Cannot continue without a dependency implementing a feature. S - Needs Investigation Status: Further investigation is needed.

Comments

@Anders429
Copy link
Owner

Currently, hashing is done by relying on the slice hash implementation: https://github.com/Anders429/brood/blob/master/src/archetype/identifier/mod.rs#L341. This eventually calls directly into hash_slice, which touches every element of the slice: https://doc.rust-lang.org/src/core/hash/mod.rs.html#237-245

This is inefficient for the case of IdentifierRef, for most use cases. Within the Archetypes map, the key is the unique IdentifierRef (see here: https://github.com/Anders429/brood/blob/master/src/archetypes/mod.rs#L80-L84), which will always be unique both value-wise and address-wise. If two IdentifierRefs in this context have different addresses, they will also have different values, invariantly. This is ensured because a new Identifier, and its subsequent IdentifierRef, will only be created if one does not already exist in the table for the given entity signature.

This, however, does not apply to serialization. For example, this test case: https://github.com/Anders429/brood/blob/master/src/archetypes/impl_serde.rs#L406 provides the same archetype identifiers for multiple archetypes, and is expected to fail because they hash to the same thing. In this case, the hashing should be done based on value; if it is done based on address, the current logic can't catch the duplicate.

It seems there are also some edge cases in the Clone implementations, but I don't have time to investigate those right now.

It would be great to change the usages that shouldn't care about the values to just pay attention to the addresses. This would be more efficient, I believe.

@Anders429 Anders429 added C - Enhancement Category: New feature or request. S - Needs Investigation Status: Further investigation is needed. C - Performance Category: Related to performance. P - Low Priority: Not particularly urgent. A - Storage Area: Storage inside a World. labels Dec 22, 2022
@Anders429
Copy link
Owner Author

The relevant part of the Clone implementation comes here:

if let Some(archetype) = self.get_mut(unsafe { source_archetype.identifier() }) {

A possible solution is to have another map in Archetypes mapping from slice values to existing IdentifierRef. Then the serialization and cloning can both check that mapping.

Furthermore, if the hash implementation for IdentifierRef is changed to use pointer value, we can get rid of the dependency on by_address. I would like to have as few dependencies as possible.

@Anders429
Copy link
Owner Author

I've got a lot of this working, but I've hit a bit of a snag with the serialization part. Hashing based on address makes the serde tests super flakey. Like, unusably so. This is because addresses are not constant between runs, unlike values, and therefore the hashing is super inconsistent and the orders of multiple archetypes in serialization is unpredictable. I can get the tests for multiple archetypes to pass each about 4.125% of the time, which makes sense as there are four archetypes to be ordered. That also means that the chance of both flakey tests passing is about 0.0017%, which is basically unusable.

Unfortunately, serde_test doesn't seem to have any way around this. An issue was opened up a few years ago, specifically for the case of testing HashMaps, but it didn't receive any proposed workarounds or solutions. As far as I can tell, that's because there isn't currently a solution built-in to serde directly. It seems some time later hashbrown switched to using fnv in its tests to work around this inconsistency: rust-lang/hashbrown#214. The use of fnv explains why this hasn't been a problem in this library prior to this change.

A possible solution is to test serialization and deserialization in these cases separately. The deserialization is straight-forward: any ordering of archetypes will result in equivalent deserializations. For serialization, we could possibly parse the Tokens directly, and when we get to the archetypes we can stick the tokens into a HashSet to determine their equality. This would be tricky, since Token does not implement Hash, but we could create our own Token and implement From<Token> and Hash on it, and convert the tokens as we go.

It's unfortunate that serde_test doesn't have any other easier way around this issue, but I can understand the desire for a simple interface. Implementing something to resolve this in a generic way would be complicated.

@Anders429
Copy link
Owner Author

I forgot that serde_test doesn't actually expose its internal Serializer, so I can't actually access the serialized tokens directly.

The other solution is to implement my own Serializer, based off of the one serde_test uses. Looks like serde's author has acknowledged that there are limitations in the library, but (understandably) isn't interested in adding complexity to fix it: serde-rs/serde#2325 (review). This isn't the first time I've run into issues testing custom serde implementations, so maybe a new (internal to this library) serializer is the way to go.

@Anders429 Anders429 added the S - Blocked on Dependency Status: Cannot continue without a dependency implementing a feature. label Dec 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A - Storage Area: Storage inside a World. C - Enhancement Category: New feature or request. C - Performance Category: Related to performance. P - Low Priority: Not particularly urgent. S - Blocked on Dependency Status: Cannot continue without a dependency implementing a feature. S - Needs Investigation Status: Further investigation is needed.
Projects
None yet
Development

No branches or pull requests

1 participant