Fix `Changed` docs with advantages and drawbacks #3084

djeedai · 2021-11-07T21:01:33Z

Objective

Fix the documentation of the `Changed` filter to detail how its mutation
detection works, and explain the advantages and drawbacks of using it.

Fixes #3082

Solution

As discussed in #3082, integrate remarks from @cart and clarify the drawbacks.


crates/bevy_ecs/src/query/filter.rs

alice-i-cecile

Some minor nits, but generally looks good.

I think this also needs a note on the (very rare) wraparound failure case before I would say it "fixes" this linked issue.

crates/bevy_ecs/src/query/filter.rs

Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>

Co-authored-by: MinerSebas <66798382+MinerSebas@users.noreply.github.com>

djeedai · 2021-11-07T23:10:30Z

> I think this also needs a note on the (very rare) wraparound failure case before I would say it "fixes" this linked issue.

I don't know about that part. @alice-i-cecile, can you help me by describing what happens in that case, or suggest a change? Thanks!

alice-i-cecile · 2021-11-18T07:06:45Z

So, the interesting logic is here. The basic idea is pretty simple:

- We use wrapping subtraction to get a continuous ring buffer of change detection (like you might use for modelling a 24-hour clock).
- We can only measure distances into the past up to some fraction of our storage size (`u32::MAX`), because otherwise the wrapping breaks.
- When this happens, we treat the event as having occurred "as far in the past as possible".

When this very niche edge case occurs, we surface this information in a warning.
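The ring-buffer comparison in the list above can be sketched in std-only Rust. This assumes a component stores the tick at which it was last changed and a system stores the tick at which it last ran; `ticks_since` and `is_changed` are hypothetical helpers for illustration, not Bevy's API.

```rust
/// Ticks elapsed from `then` to `now` on a `u32` ring.
/// `wrapping_sub` stays correct across the `u32::MAX -> 0` wraparound,
/// as long as the real distance is smaller than the ring size.
fn ticks_since(now: u32, then: u32) -> u32 {
    now.wrapping_sub(then)
}

/// Hypothetical change check: a component counts as changed for a system
/// if it was modified more recently than the system's previous run,
/// i.e. the system's last run lies further in the past than the change.
fn is_changed(component_tick: u32, system_last_run: u32, world_tick: u32) -> bool {
    ticks_since(world_tick, system_last_run) > ticks_since(world_tick, component_tick)
}
```

Comparing distances instead of raw tick values is what makes the wraparound harmless: a system that last ran at `u32::MAX - 1` still detects a change made at tick `1`, even though a naive `component_tick > system_last_run` would say no. The distances only stay meaningful while they remain well below `u32::MAX`, which is the limitation described above.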

For the sake of example, let's pretend the change was made at noon.
If we have a change-detecting system that runs at 4 pm and last ran at 11 am, we can be confident that it's seen the changes made at 10 am, but not those made at noon.

But suppose we've skipped that system for so long that it's now 3 days in the future.
Has this system already seen the changes made at noon?

We can't know for certain, since we don't store any information about how many days have passed, only the local, looping time.

The logic here is that we do know that a very long time has elapsed, because while we're not running the system every frame, we are quickly checking if it's been asleep for far too long. As a result, we can warn the users that things are likely to be messed up.
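The clock analogy can be made concrete with a small hypothetical model that stores only the hour of day (0 to 23), never the day itself. `clock_says_new`, `truly_new`, and `check` are illustrative names; `check` takes a history in absolute hours and returns the pair (clock answer, true answer).

```rust
/// Hypothetical 24-hour clock: only the local hour is stored, never the day.
const HOURS_PER_DAY: u32 = 24;

/// Hours from `then` to `now` as read off the clock face (both in 0..24).
/// This is the clock-face analogue of `u32::wrapping_sub`.
fn clock_hours_since(now: u32, then: u32) -> u32 {
    (now + HOURS_PER_DAY - then) % HOURS_PER_DAY
}

/// Clock-based answer: did the change happen after the system last ran?
fn clock_says_new(now: u32, last_run: u32, changed_at: u32) -> bool {
    clock_hours_since(now, last_run) > clock_hours_since(now, changed_at)
}

/// Ground truth using absolute hours counted from midnight on day 0.
fn truly_new(last_run_abs: u32, changed_at_abs: u32) -> bool {
    changed_at_abs > last_run_abs
}

/// Evaluate a history given in absolute hours: (clock answer, true answer).
fn check(now_abs: u32, last_run_abs: u32, changed_at_abs: u32) -> (bool, bool) {
    let clock = clock_says_new(
        now_abs % HOURS_PER_DAY,
        last_run_abs % HOURS_PER_DAY,
        changed_at_abs % HOURS_PER_DAY,
    );
    (clock, truly_new(last_run_abs, changed_at_abs))
}
```

On the same day the two answers agree: with a last run at 11:00 and a change at noon, `check(16, 11, 12)` gives `(true, true)`. Three days later they can diverge: a system that ran at 13:00 on day 0, with a change at noon on day 1, evaluated at 16:00 on day 3, gives `check(3 * 24 + 16, 13, 24 + 12) == (false, true)`, because the clock readings are identical to a history in which the change was already seen.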

However, I'm not sure I like the current design (ping @bd). As we observed, if this occurs, the systems will continue to output junk warnings every frame: the tick will advance by one, immediately pushing the value that was just set to the maximum back out of bounds.

Instead, I think we should perhaps consider a way to set a Changed / Added filter as broken, and report that the first time it happens. Once a filter is broken, it will return true for every matching entity the first time the system is successfully run, and then reset back to the original behavior.

This is more direct and less spammy. The docs should still warn that this is a possibility in extreme cases, and urge users to make their systems idempotent if this is likely to occur (so then double-detecting a change or addition won't break things).

If you're feeling confident, feel free to try your hand at this change (if you want to extend this PR, please update the description to match). Otherwise, I'll try to tackle it since I know this code well, and we can merge this PR as is and leave the issue open.

DJMcNab · 2021-11-18T08:46:23Z

I agree with that suggestion.

However, I think the brokenness would be per-system rather than per-filter. Otherwise, I think that is the best option.

My only concern is that the required branching could be quite expensive. I suppose all it adds is a boolean OR in the hot loop, which is probably cheap enough?
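The "broken" flag idea, combined with the per-system refinement, might look like the sketch below. `SystemChangeState`, `is_changed`, and `finish_run` are hypothetical names, not Bevy types; the point is that the hot-path cost is a single short-circuiting `||` in front of the usual tick comparison.

```rust
/// Hypothetical per-system change-detection state (not Bevy's actual types).
struct SystemChangeState {
    /// Tick at which the system last ran.
    last_run: u32,
    /// Set when a periodic scan finds `last_run` too far in the past to trust.
    broken: bool,
}

impl SystemChangeState {
    /// Hot-loop check: one extra boolean OR on top of the tick comparison.
    /// While `broken` is set, every matching component is reported as changed.
    fn is_changed(&self, component_tick: u32, world_tick: u32) -> bool {
        self.broken
            || world_tick.wrapping_sub(self.last_run) > world_tick.wrapping_sub(component_tick)
    }

    /// Called after the system has run successfully: record the tick and
    /// clear the flag, returning to normal change detection.
    fn finish_run(&mut self, world_tick: u32) {
        self.last_run = world_tick;
        self.broken = false;
    }
}
```

Because a broken system sees every matching entity as changed exactly once, this only stays correct if the system is idempotent, which is why the docs would still need the warning suggested above.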

Davier · 2021-11-18T10:01:38Z

The warning would be emitted at most every u32::MAX / 8 systems, not every frame.

bevy/crates/bevy_ecs/src/schedule/stage.rs

Lines 546 to 549 in 07ed1d0

```rust
// Only check after at least `u32::MAX / 8` counts, and at most `u32::MAX / 4` counts
// since the max number of [System] in a [SystemStage] is limited to `u32::MAX / 8`
// and this function is called at the end of each [SystemStage] loop
const MIN_TIME_SINCE_LAST_CHECK: u32 = u32::MAX / 8;
```

I'd rather not slow down the hot path (all system executions) in favor of the exceptional path (a check performed every u32::MAX / 8).
Also, it should be fine to spam warnings when a system has broken change detection, since it can break invariants and induce (game logic-wise) UB. What's not fine IMO is to emit this warning on systems that do not use change detection. I meant to tackle this in a follow-up PR but completely forgot...
My plan was to have a `detects_change` boolean on each system that is checked before calling `check_system_change_tick`. Alternatively, we could build a list of change-detecting systems and iterate over it directly, but this seems more involved.

Even with this fix, a question remains: what to do with change-detecting systems that are disabled for a long time, for instance because they are in a State that was not entered?
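The periodic scan from the `stage.rs` excerpt, extended with the proposed `detects_change` flag, can be sketched as follows. `MIN_TIME_SINCE_LAST_CHECK` comes from the excerpt; `SystemInfo`, `check_change_ticks`, and `MAX_DELTA` are hypothetical, and the clamp threshold `(u32::MAX / 4) * 3` is assumed from the 75% row of the table in #3956 rather than read from Bevy's source.

```rust
/// From the stage.rs excerpt: minimum ticks between two scans.
const MIN_TIME_SINCE_LAST_CHECK: u32 = u32::MAX / 8;
/// Assumed oldest trusted age for a system's last run (hypothetical name).
const MAX_DELTA: u32 = (u32::MAX / 4) * 3;

/// Hypothetical stand-in for a system carrying the proposed flag.
struct SystemInfo {
    name: &'static str,
    detects_change: bool,
    last_change_tick: u32,
}

/// Called at the end of each stage loop; returns the systems that were warned about.
fn check_change_ticks(
    systems: &mut [SystemInfo],
    change_tick: u32,
    last_check_tick: &mut u32,
) -> Vec<&'static str> {
    let mut warned = Vec::new();
    // Cheap early-out: a real scan happens at most once per `u32::MAX / 8` ticks.
    if change_tick.wrapping_sub(*last_check_tick) < MIN_TIME_SINCE_LAST_CHECK {
        return warned;
    }
    *last_check_tick = change_tick;
    for system in systems.iter_mut() {
        // Proposed: skip systems that never use change detection, so they never warn.
        if !system.detects_change {
            continue;
        }
        if change_tick.wrapping_sub(system.last_change_tick) > MAX_DELTA {
            // Treat the last run as "as far in the past as possible".
            system.last_change_tick = change_tick.wrapping_sub(MAX_DELTA);
            warned.push(system.name);
        }
    }
    warned
}
```

Since the early-out runs once per stage loop rather than once per system execution, the hot path is untouched, which is the trade-off argued for above. The sketch does not answer the open question about systems that stay disabled for a long time.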

alice-i-cecile · 2022-02-04T19:52:25Z

bors try

bors · 2022-02-04T20:09:05Z

try

Build succeeded:

alice-i-cecile · 2022-03-19T23:00:19Z

Closing in favor of #3956.

superdump · 2022-03-20T01:54:40Z

crates/bevy_ecs/src/query/filter.rs

+ /// query is used runs after the system(s) which mutate the component.
+ ///
+ /// To instead retrieve all components without filtering but allow querying if they changed
+ /// or not since last tick, you can use [`ChangeTrackers`](crate::query::ChangeTrackers).


Or just use `Changed<T>` in the query itself rather than as a filter, which will give you a `bool` in your query.

## Objective

- ~~Make absurdly long-lived changes stay detectable for even longer (without leveling up to `u64`).~~
- Give all changes a consistent maximum lifespan.
- Improve code clarity.

## Solution

- ~~Increase the frequency of `check_tick` scans to increase the oldest reliably-detectable change.~~ (Deferred until we can benchmark the cost of a scan.)
- Ignore changes older than the maximum reliably-detectable age.
- General refactoring: name the constants, use them everywhere, and update the docs.
- Update test cases to check for the specified behavior.

## Related

This PR addresses (at least partially) the concerns raised in:

- #3071
- #3082 (and associated PR #3084)

## Background

- #1471

Given the minimum interval between `check_ticks` scans, `N`, the oldest reliably-detectable change is `u32::MAX - (2 * N - 1)` (or `MAX_CHANGE_AGE`). Reducing `N` from ~530 million (current value) to something like ~2 million would extend the lifetime of changes by a billion.

| minimum `check_ticks` interval | oldest reliably-detectable change | usable % of `u32::MAX` |
| --- | --- | --- |
| `u32::MAX / 8` (536,870,911) | `(u32::MAX / 4) * 3` | 75.0% |
| `2_000_000` | `u32::MAX - 3_999_999` | 99.9% |

Similarly, changes are still allowed to be between `MAX_CHANGE_AGE`-old and `u32::MAX`-old in the interim between `check_tick` scans. While we prevent their age from overflowing, the test to detect changes still compares raw values. This makes failure ultimately unreliable, since when ancient changes stop being detected varies depending on when the next scan occurs.

## Open Question

Currently, systems and system states are incorrectly initialized with their `last_change_tick` set to `0`, which doesn't handle wraparound correctly. For consistent behavior, they should either be initialized to the world's `last_change_tick` (and detect no changes) or to `MAX_CHANGE_AGE` behind the world's current `change_tick` (and detect everything as a change). I've currently gone with the latter since that was closer to the existing behavior.

## Follow-up Work (Edited: entire section)

We haven't actually profiled how long a `check_ticks` scan takes on a "large" `World`, so we don't know if it's safe to increase their frequency. However, we are currently relying on play sessions not lasting long enough to trigger a scan and apps not having enough entities/archetypes for it to be "expensive" (our assumption). That isn't a real solution. (Either scanning never costs enough to impact frame times or we provide an option to use `u64` change ticks. Nobody will accept random hiccups.)

To further extend the lifetime of changes, we actually only need to increment the world tick if a system has `Fetch: !ReadOnlySystemParamFetch`. The behavior will be identical because all writes are sequenced, but I'm not sure how to implement that in a way that the compiler can optimize the branch out.

Also, since having no false positives depends on a `check_ticks` scan running at least every `2 * N - 1` ticks, a `last_check_tick` should also be stored in the `World` so that any lull in system execution (like a command flush) could trigger a scan if needed. To be completely robust, all the systems initialized on the world should be scanned, not just those in the current stage.
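The `MAX_CHANGE_AGE` formula and the Background table can be checked with a few lines of std-only Rust. `max_change_age`, `usable_percent`, and `is_changed_clamped` are hypothetical helpers written for this illustration; the clamped comparison is one possible reading of "ignore changes older than the maximum reliably-detectable age", not the code merged in #3956. Note that for `N = u32::MAX / 8` the exact value is 3,221,225,474, which is 5 ticks more than `(u32::MAX / 4) * 3` because of integer division, so the table entry is a rounded form.

```rust
/// Oldest reliably-detectable change age for a minimum scan interval `n`,
/// per the formula above: `u32::MAX - (2 * n - 1)`.
/// Assumes 1 <= n <= u32::MAX / 2 so the arithmetic cannot overflow.
fn max_change_age(n: u32) -> u32 {
    u32::MAX - (2 * n - 1)
}

/// Share of the `u32` tick range that remains usable, as a percentage.
fn usable_percent(n: u32) -> f64 {
    max_change_age(n) as f64 / u32::MAX as f64 * 100.0
}

/// Hypothetical clamped check: ages are capped at `max_age` before comparing,
/// so two ticks that are both "too old" compare as equally old (no change).
fn is_changed_clamped(component_tick: u32, last_run: u32, world_tick: u32, max_age: u32) -> bool {
    let since_change = world_tick.wrapping_sub(component_tick).min(max_age);
    let since_run = world_tick.wrapping_sub(last_run).min(max_age);
    since_run > since_change
}
```

With the clamp, a change at tick `1` seen by a system that last ran at tick `0` is no longer reported once the world tick reaches `u32::MAX`: both ages saturate at `MAX_CHANGE_AGE`, whereas comparing the raw distances would still report a change.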


Fix Changed docs with advantages and drawbacks

5b599d2

Fix the documentation of the `Changed` filter to detail its mutating detection functioning, and explain the advantages and drawbacks of using it. Bug: bevyengine#3082

github-actions bot added the S-Needs-Triage label Nov 7, 2021

djeedai mentioned this pull request Nov 7, 2021

Changed doesn't explain any potential drawback #3082

Closed

alice-i-cecile reviewed Nov 7, 2021


crates/bevy_ecs/src/query/filter.rs Outdated

Fix typo

f71aac0

alice-i-cecile reviewed Nov 7, 2021


crates/bevy_ecs/src/query/filter.rs Outdated

alice-i-cecile reviewed Nov 7, 2021


alice-i-cecile added the A-ECS and C-Docs labels and removed the S-Needs-Triage label Nov 7, 2021

MinerSebas reviewed Nov 7, 2021


crates/bevy_ecs/src/query/filter.rs Outdated

djeedai and others added 2 commits November 7, 2021 23:08

Apply suggestion

1dcebc3

Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>

Apply suggestion

cdf4bf0

Co-authored-by: MinerSebas <66798382+MinerSebas@users.noreply.github.com>

bors bot added a commit that referenced this pull request Feb 4, 2022

Try #3084:

9984bbe

maniwani mentioned this pull request Mar 9, 2022

[Merged by Bors] - Make change lifespan deterministic and update docs #3956

Closed

alice-i-cecile closed this Mar 19, 2022

superdump reviewed Mar 20, 2022

