Comparison and hashing `archetype::IdentifierRef`s by address #160

Anders429 · 2022-12-22T01:00:19Z

Currently, hashing is done by relying on the slice hash implementation: https://github.com/Anders429/brood/blob/master/src/archetype/identifier/mod.rs#L341. This eventually calls directly into hash_slice, which touches every element of the slice: https://doc.rust-lang.org/src/core/hash/mod.rs.html#237-245

This is inefficient for the case of IdentifierRef, for most use cases. Within the Archetypes map, the key is the unique IdentifierRef (see here: https://github.com/Anders429/brood/blob/master/src/archetypes/mod.rs#L80-L84), which will always be unique both value-wise and address-wise. If two IdentifierRefs in this context have different addresses, they will also have different values, invariantly. This is ensured because a new Identifier, and its subsequent IdentifierRef, will only be created if one does not already exist in the table for the given entity signature.

This, however, does not apply to serialization. For example, this test case: https://github.com/Anders429/brood/blob/master/src/archetypes/impl_serde.rs#L406 provides the same archetype identifiers for multiple archetypes, and is expected to fail because they hash to the same thing. In this case, the hashing should be done based on value; if it is done based on address, the current logic can't catch the duplicate.

It seems there are also some edge cases in the Clone implementations, but I don't have time to investigate those right now.

It would be great to change the usages that shouldn't care about the values to just pay attention to the addresses. This would be more efficient, I believe.

The text was updated successfully, but these errors were encountered:

Anders429 · 2022-12-22T05:37:25Z

The relevant part of the Clone implementation comes here:

brood/src/archetypes/mod.rs

Line 409 in 9778eb3

 if let Some(archetype) = self.get_mut(unsafe { source_archetype.identifier() }) { 

A possible solution is to have another map in Archetypes mapping from slice values to existing IdentifierRef. Then the serialization and cloning can both check that mapping.

Furthermore, if the hash implementation for IdentifierRef is changed to use pointer value, we can get rid of the dependency on by_address. I would like to have as few dependencies as possible.

Anders429 · 2022-12-22T23:36:36Z

I've got a lot of this working, but I've hit a bit of a snag with the serialization part. Hashing based on address makes the serde tests super flakey. Like, unusably so. This is because addresses are not constant between runs, unlike values, and therefore the hashing is super inconsistent and the orders of multiple archetypes in serialization is unpredictable. I can get the tests for multiple archetypes to pass each about 4.125% of the time, which makes sense as there are four archetypes to be ordered. That also means that the chance of both flakey tests passing is about 0.0017%, which is basically unusable.

Unfortunately, serde_test doesn't seem to have any way around this. An issue was opened up a few years ago, specifically for the case of testing HashMaps, but it didn't receive any proposed workarounds or solutions. As far as I can tell, that's because there isn't currently a solution built-in to serde directly. It seems some time later hashbrown switched to using fnv in its tests to work around this inconsistency: rust-lang/hashbrown#214. The use of fnv explains why this hasn't been a problem in this library prior to this change.

A possible solution is to test serialization and deserialization in these cases separately. The deserialization is straight-forward: any ordering of archetypes will result in equivalent deserializations. For serialization, we could possibly parse the Tokens directly, and when we get to the archetypes we can stick the tokens into a HashSet to determine their equality. This would be tricky, since Token does not implement Hash, but we could create our own Token and implement From<Token> and Hash on it, and convert the tokens as we go.

It's unfortunate that serde_test doesn't have any other easier way around this issue, but I can understand the desire for a simple interface. Implementing something to resolve this in a generic way would be complicated.

Anders429 · 2022-12-23T01:38:46Z

I forgot that serde_test doesn't actually expose its internal Serializer, so I can't actually access the serialized tokens directly.

The other solution is to implement my own Serializer, based off of the one serde_test uses. Looks like serde's author has acknowledged that there are limitations in the library, but (understandably) isn't interested in adding complexity to fix it: serde-rs/serde#2325 (review). This isn't the first time I've run into issues testing custom serde implementations, so maybe a new (internal to this library) serializer is the way to go.

Anders429 added C - Enhancement Category: New feature or request. S - Needs Investigation Status: Further investigation is needed. C - Performance Category: Related to performance. P - Low Priority: Not particularly urgent. A - Storage Area: Storage inside a World. labels Dec 22, 2022

Anders429 added the S - Blocked on Dependency Status: Cannot continue without a dependency implementing a feature. label Dec 25, 2022

This was referenced Jan 14, 2023

Investigate switching to serde_assert #162

Closed

Switch from serde_test to serde_assert. #163

Merged

Hash and compare archetype identifiers by address #164

Merged

Anders429 closed this as completed Jan 18, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comparison and hashing `archetype::IdentifierRef`s by address #160

Comparison and hashing `archetype::IdentifierRef`s by address #160

Anders429 commented Dec 22, 2022

Anders429 commented Dec 22, 2022

Anders429 commented Dec 22, 2022

Anders429 commented Dec 23, 2022

Comparison and hashing archetype::IdentifierRefs by address #160

Comparison and hashing archetype::IdentifierRefs by address #160

Comments

Anders429 commented Dec 22, 2022

Anders429 commented Dec 22, 2022

Anders429 commented Dec 22, 2022

Anders429 commented Dec 23, 2022

Comparison and hashing `archetype::IdentifierRef`s by address #160

Comparison and hashing `archetype::IdentifierRef`s by address #160