Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Changed docs with advantages and drawbacks #3084

Closed
wants to merge 4 commits into from

Conversation

djeedai
Copy link
Contributor

@djeedai djeedai commented Nov 7, 2021

Objective

Fix the documentation of the Changed filter to detail its mutating
detection functioning, and explain the advantages and drawbacks of using
it.

Fixes #3082

Solution

Discussed on #3082, integrate remarks from @cart and clarify drawbacks.

Fix the documentation of the `Changed` filter to detail its mutating
detection functioning, and explain the advantages and drawbacks of using
it.

Bug: bevyengine#3082
@github-actions github-actions bot added the S-Needs-Triage This issue needs to be labelled label Nov 7, 2021
Copy link
Member

@alice-i-cecile alice-i-cecile left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor nits, but generally looks good.

I think this also needs a note on the (very rare) wraparound failure case before I would say it "fixes" this linked issue.

@alice-i-cecile alice-i-cecile added A-ECS Entities, components, systems, and events C-Docs An addition or correction to our documentation and removed S-Needs-Triage This issue needs to be labelled labels Nov 7, 2021
djeedai and others added 2 commits November 7, 2021 23:08
Co-authored-by: Alice Cecile <alice.i.cecile@gmail.com>
Co-authored-by: MinerSebas <66798382+MinerSebas@users.noreply.github.com>
@djeedai
Copy link
Contributor Author

djeedai commented Nov 7, 2021

I think this also needs a note on the (very rare) wraparound failure case before I would say it "fixes" this linked issue.

I don't know about that part. Can you help me and describe what's happening in that case @alice-i-cecile or make a suggestion of a change? Thanks!

@alice-i-cecile
Copy link
Member

So, the interesting logic is here. The basic idea is pretty simple:

  1. We use wrapping subtraction to get a continuous ring buffer of change detection (like you might use for modelling a 24 hour clock).
  2. We can only measure distances in the past some fraction of our storage size (u32::MAX), because otherwise the wrapping breaks.
  3. When this happen we treat an event as having occurred "as far in the past as possible".

When this very niche edge case occurs, we surface this information in a warning.

For the sake of example, let's pretend the change was made at noon.
If we have a change-detecting system that runs at 4 pm and last ran at 11 am, we can be confident that it's seen the changes made at 10 am, but not those made at noon.

But suppose we've skipped that system for so long that it's now 3 days in the future.
Has this system already seen the changes made at noon?

We can't know for certain, since we don't store any information about how many days have passed, only the local, looping time.

The logic here is that we do know that a very long time has elapsed, because while we're not running the system every frame, we are quickly checking if it's been asleep for far too long. As a result, we can warn the users that things are likely to be messed up.

However, I'm not sure I like the current design (ping @bd). As we observed, if this occurs, the systems will continue to output junk warnings every frame: the tick will advance by one, immediately out-of-bounds-ing the value that was just set to the maximum value.

Instead, I think we should perhaps consider a way to set a Changed / Added filter as broken, and report that the first time it happens. Once a filter is broken, it will return true for every matching entity the first time the system is successfully run, and then reset back to the original behavior.

This is more direct and less spammy. The docs should still warn that this is a possibility in extreme cases, and urge users to make their systems idempotent if this is likely to occur (so then double-detecting a change or addition won't break things).

If you're feeling confident, feel free to try your hand at this change (if you want to extend this PR, please update the description to match). Otherwise, I'll try to tackle it since I know this code well, and we can merge this PR as is and leave the issue open.

@DJMcNab
Copy link
Member

DJMcNab commented Nov 18, 2021

I agree with that suggestion

However, I think the brokenness would just be per-system, instead of per filter. But otherwise I think that is the best option.

My only concern is that the branching required could be quite expensive? I suppose all it adds is a boolean or to the hot loop, which is probably cheap enough?

@Davier
Copy link
Contributor

Davier commented Nov 18, 2021

The warning would be emitted at most every u32::MAX / 8 systems, not every frame.

// Only check after at least `u32::MAX / 8` counts, and at most `u32::MAX / 4` counts
// since the max number of [System] in a [SystemStage] is limited to `u32::MAX / 8`
// and this function is called at the end of each [SystemStage] loop
const MIN_TIME_SINCE_LAST_CHECK: u32 = u32::MAX / 8;

I'd rather not slow down the hot path (all system executions) in favor of the exceptional path (a check performed every u32::MAX / 8).
Also, it should be fine to spam warnings when a system has broken change detection, since it can break invariants and induce (game logic-wise) UB. What's not fine IMO is to emit this warning on systems that do not use change detection. I meant to tackle this in a follow-up PR but completely forgot...
My plan was to have a detects_change boolean on each system that is checked before calling check_system_change_tick. Alternatively, we could build a list of change detecting systems and iterate that directly, but this seems more involved.

Even with this fix, a question remains: what to do with change-detecting systems that are disabled for a long time, for instance because they are in a State that was not entered?

@alice-i-cecile
Copy link
Member

bors try

bors bot added a commit that referenced this pull request Feb 4, 2022
@alice-i-cecile
Copy link
Member

Closing in favor of #3956.

/// query is used runs after the system(s) which mutate the component.
///
/// To instead retrieve all components without filtering but allow querying if they changed
/// or not since last tick, you can use [`ChangeTrackers`](crate::query::ChangeTrackers).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or just use Changed<T> in the query rather than the filter which will give you a bool in your query.

bors bot pushed a commit that referenced this pull request May 9, 2022
## Objective

- ~~Make absurdly long-lived changes stay detectable for even longer (without leveling up to `u64`).~~
- Give all changes a consistent maximum lifespan.
- Improve code clarity.

## Solution

- ~~Increase the frequency of `check_tick` scans to increase the oldest reliably-detectable change.~~
(Deferred until we can benchmark the cost of a scan.)
- Ignore changes older than the maximum reliably-detectable age.
- General refactoring—name the constants, use them everywhere, and update the docs.
- Update test cases to check for the specified behavior.

## Related

This PR addresses (at least partially) the concerns raised in:

- #3071
- #3082 (and associated PR #3084)

## Background

- #1471

Given the minimum interval between `check_ticks` scans, `N`, the oldest reliably-detectable change is `u32::MAX - (2 * N - 1)` (or `MAX_CHANGE_AGE`). Reducing `N` from ~530 million (current value) to something like ~2 million would extend the lifetime of changes by a billion.

| minimum `check_ticks` interval | oldest reliably-detectable change  | usable % of `u32::MAX` |
| --- | --- | --- |
| `u32::MAX / 8`  (536,870,911) | `(u32::MAX / 4) * 3` | 75.0% |
| `2_000_000` | `u32::MAX - 3_999_999` | 99.9% |

Similarly, changes are still allowed to be between `MAX_CHANGE_AGE`-old and `u32::MAX`-old in the interim between `check_tick` scans. While we prevent their age from overflowing, the test to detect changes still compares raw values. This makes failure ultimately unreliable, since when ancient changes stop being detected varies depending on when the next scan occurs.

## Open Question

Currently, systems and system states are incorrectly initialized with their `last_change_tick` set to `0`, which doesn't handle wraparound correctly.

For consistent behavior, they should either be initialized to the world's `last_change_tick` (and detect no changes) or to `MAX_CHANGE_AGE` behind the world's current `change_tick` (and detect everything as a change). I've currently gone with the latter since that was closer to the existing behavior.

## Follow-up Work

(Edited: entire section)

We haven't actually profiled how long a `check_ticks` scan takes on a "large" `World` , so we don't know if it's safe to increase their frequency. However, we are currently relying on play sessions not lasting long enough to trigger a scan and apps not having enough entities/archetypes for it to be "expensive" (our assumption). That isn't a real solution. (Either scanning never costs enough to impact frame times or we provide an option to use `u64` change ticks. Nobody will accept random hiccups.)

To further extend the lifetime of changes, we actually only need to increment the world tick if a system has `Fetch: !ReadOnlySystemParamFetch`. The behavior will be identical because all writes are sequenced, but I'm not sure how to implement that in a way that the compiler can optimize the branch out.

Also, since having no false positives depends on a `check_ticks` scan running at least every `2 * N - 1` ticks, a `last_check_tick` should also be stored in the `World` so that any lull in system execution (like a command flush) could trigger a scan if needed. To be completely robust, all the systems initialized on the world should be scanned, not just those in the current stage.
bors bot pushed a commit that referenced this pull request May 9, 2022
## Objective

- ~~Make absurdly long-lived changes stay detectable for even longer (without leveling up to `u64`).~~
- Give all changes a consistent maximum lifespan.
- Improve code clarity.

## Solution

- ~~Increase the frequency of `check_tick` scans to increase the oldest reliably-detectable change.~~
(Deferred until we can benchmark the cost of a scan.)
- Ignore changes older than the maximum reliably-detectable age.
- General refactoring—name the constants, use them everywhere, and update the docs.
- Update test cases to check for the specified behavior.

## Related

This PR addresses (at least partially) the concerns raised in:

- #3071
- #3082 (and associated PR #3084)

## Background

- #1471

Given the minimum interval between `check_ticks` scans, `N`, the oldest reliably-detectable change is `u32::MAX - (2 * N - 1)` (or `MAX_CHANGE_AGE`). Reducing `N` from ~530 million (current value) to something like ~2 million would extend the lifetime of changes by a billion.

| minimum `check_ticks` interval | oldest reliably-detectable change  | usable % of `u32::MAX` |
| --- | --- | --- |
| `u32::MAX / 8`  (536,870,911) | `(u32::MAX / 4) * 3` | 75.0% |
| `2_000_000` | `u32::MAX - 3_999_999` | 99.9% |

Similarly, changes are still allowed to be between `MAX_CHANGE_AGE`-old and `u32::MAX`-old in the interim between `check_tick` scans. While we prevent their age from overflowing, the test to detect changes still compares raw values. This makes failure ultimately unreliable, since when ancient changes stop being detected varies depending on when the next scan occurs.

## Open Question

Currently, systems and system states are incorrectly initialized with their `last_change_tick` set to `0`, which doesn't handle wraparound correctly.

For consistent behavior, they should either be initialized to the world's `last_change_tick` (and detect no changes) or to `MAX_CHANGE_AGE` behind the world's current `change_tick` (and detect everything as a change). I've currently gone with the latter since that was closer to the existing behavior.

## Follow-up Work

(Edited: entire section)

We haven't actually profiled how long a `check_ticks` scan takes on a "large" `World` , so we don't know if it's safe to increase their frequency. However, we are currently relying on play sessions not lasting long enough to trigger a scan and apps not having enough entities/archetypes for it to be "expensive" (our assumption). That isn't a real solution. (Either scanning never costs enough to impact frame times or we provide an option to use `u64` change ticks. Nobody will accept random hiccups.)

To further extend the lifetime of changes, we actually only need to increment the world tick if a system has `Fetch: !ReadOnlySystemParamFetch`. The behavior will be identical because all writes are sequenced, but I'm not sure how to implement that in a way that the compiler can optimize the branch out.

Also, since having no false positives depends on a `check_ticks` scan running at least every `2 * N - 1` ticks, a `last_check_tick` should also be stored in the `World` so that any lull in system execution (like a command flush) could trigger a scan if needed. To be completely robust, all the systems initialized on the world should be scanned, not just those in the current stage.
robtfm pushed a commit to robtfm/bevy that referenced this pull request May 10, 2022
## Objective

- ~~Make absurdly long-lived changes stay detectable for even longer (without leveling up to `u64`).~~
- Give all changes a consistent maximum lifespan.
- Improve code clarity.

## Solution

- ~~Increase the frequency of `check_tick` scans to increase the oldest reliably-detectable change.~~
(Deferred until we can benchmark the cost of a scan.)
- Ignore changes older than the maximum reliably-detectable age.
- General refactoring—name the constants, use them everywhere, and update the docs.
- Update test cases to check for the specified behavior.

## Related

This PR addresses (at least partially) the concerns raised in:

- bevyengine#3071
- bevyengine#3082 (and associated PR bevyengine#3084)

## Background

- bevyengine#1471

Given the minimum interval between `check_ticks` scans, `N`, the oldest reliably-detectable change is `u32::MAX - (2 * N - 1)` (or `MAX_CHANGE_AGE`). Reducing `N` from ~530 million (current value) to something like ~2 million would extend the lifetime of changes by a billion.

| minimum `check_ticks` interval | oldest reliably-detectable change  | usable % of `u32::MAX` |
| --- | --- | --- |
| `u32::MAX / 8`  (536,870,911) | `(u32::MAX / 4) * 3` | 75.0% |
| `2_000_000` | `u32::MAX - 3_999_999` | 99.9% |

Similarly, changes are still allowed to be between `MAX_CHANGE_AGE`-old and `u32::MAX`-old in the interim between `check_tick` scans. While we prevent their age from overflowing, the test to detect changes still compares raw values. This makes failure ultimately unreliable, since when ancient changes stop being detected varies depending on when the next scan occurs.

## Open Question

Currently, systems and system states are incorrectly initialized with their `last_change_tick` set to `0`, which doesn't handle wraparound correctly.

For consistent behavior, they should either be initialized to the world's `last_change_tick` (and detect no changes) or to `MAX_CHANGE_AGE` behind the world's current `change_tick` (and detect everything as a change). I've currently gone with the latter since that was closer to the existing behavior.

## Follow-up Work

(Edited: entire section)

We haven't actually profiled how long a `check_ticks` scan takes on a "large" `World` , so we don't know if it's safe to increase their frequency. However, we are currently relying on play sessions not lasting long enough to trigger a scan and apps not having enough entities/archetypes for it to be "expensive" (our assumption). That isn't a real solution. (Either scanning never costs enough to impact frame times or we provide an option to use `u64` change ticks. Nobody will accept random hiccups.)

To further extend the lifetime of changes, we actually only need to increment the world tick if a system has `Fetch: !ReadOnlySystemParamFetch`. The behavior will be identical because all writes are sequenced, but I'm not sure how to implement that in a way that the compiler can optimize the branch out.

Also, since having no false positives depends on a `check_ticks` scan running at least every `2 * N - 1` ticks, a `last_check_tick` should also be stored in the `World` so that any lull in system execution (like a command flush) could trigger a scan if needed. To be completely robust, all the systems initialized on the world should be scanned, not just those in the current stage.
exjam pushed a commit to exjam/bevy that referenced this pull request May 22, 2022
## Objective

- ~~Make absurdly long-lived changes stay detectable for even longer (without leveling up to `u64`).~~
- Give all changes a consistent maximum lifespan.
- Improve code clarity.

## Solution

- ~~Increase the frequency of `check_tick` scans to increase the oldest reliably-detectable change.~~
(Deferred until we can benchmark the cost of a scan.)
- Ignore changes older than the maximum reliably-detectable age.
- General refactoring—name the constants, use them everywhere, and update the docs.
- Update test cases to check for the specified behavior.

## Related

This PR addresses (at least partially) the concerns raised in:

- bevyengine#3071
- bevyengine#3082 (and associated PR bevyengine#3084)

## Background

- bevyengine#1471

Given the minimum interval between `check_ticks` scans, `N`, the oldest reliably-detectable change is `u32::MAX - (2 * N - 1)` (or `MAX_CHANGE_AGE`). Reducing `N` from ~530 million (current value) to something like ~2 million would extend the lifetime of changes by a billion.

| minimum `check_ticks` interval | oldest reliably-detectable change  | usable % of `u32::MAX` |
| --- | --- | --- |
| `u32::MAX / 8`  (536,870,911) | `(u32::MAX / 4) * 3` | 75.0% |
| `2_000_000` | `u32::MAX - 3_999_999` | 99.9% |

Similarly, changes are still allowed to be between `MAX_CHANGE_AGE`-old and `u32::MAX`-old in the interim between `check_tick` scans. While we prevent their age from overflowing, the test to detect changes still compares raw values. This makes failure ultimately unreliable, since when ancient changes stop being detected varies depending on when the next scan occurs.

## Open Question

Currently, systems and system states are incorrectly initialized with their `last_change_tick` set to `0`, which doesn't handle wraparound correctly.

For consistent behavior, they should either be initialized to the world's `last_change_tick` (and detect no changes) or to `MAX_CHANGE_AGE` behind the world's current `change_tick` (and detect everything as a change). I've currently gone with the latter since that was closer to the existing behavior.

## Follow-up Work

(Edited: entire section)

We haven't actually profiled how long a `check_ticks` scan takes on a "large" `World` , so we don't know if it's safe to increase their frequency. However, we are currently relying on play sessions not lasting long enough to trigger a scan and apps not having enough entities/archetypes for it to be "expensive" (our assumption). That isn't a real solution. (Either scanning never costs enough to impact frame times or we provide an option to use `u64` change ticks. Nobody will accept random hiccups.)

To further extend the lifetime of changes, we actually only need to increment the world tick if a system has `Fetch: !ReadOnlySystemParamFetch`. The behavior will be identical because all writes are sequenced, but I'm not sure how to implement that in a way that the compiler can optimize the branch out.

Also, since having no false positives depends on a `check_ticks` scan running at least every `2 * N - 1` ticks, a `last_check_tick` should also be stored in the `World` so that any lull in system execution (like a command flush) could trigger a scan if needed. To be completely robust, all the systems initialized on the world should be scanned, not just those in the current stage.
ItsDoot pushed a commit to ItsDoot/bevy that referenced this pull request Feb 1, 2023
## Objective

- ~~Make absurdly long-lived changes stay detectable for even longer (without leveling up to `u64`).~~
- Give all changes a consistent maximum lifespan.
- Improve code clarity.

## Solution

- ~~Increase the frequency of `check_tick` scans to increase the oldest reliably-detectable change.~~
(Deferred until we can benchmark the cost of a scan.)
- Ignore changes older than the maximum reliably-detectable age.
- General refactoring—name the constants, use them everywhere, and update the docs.
- Update test cases to check for the specified behavior.

## Related

This PR addresses (at least partially) the concerns raised in:

- bevyengine#3071
- bevyengine#3082 (and associated PR bevyengine#3084)

## Background

- bevyengine#1471

Given the minimum interval between `check_ticks` scans, `N`, the oldest reliably-detectable change is `u32::MAX - (2 * N - 1)` (or `MAX_CHANGE_AGE`). Reducing `N` from ~530 million (current value) to something like ~2 million would extend the lifetime of changes by a billion.

| minimum `check_ticks` interval | oldest reliably-detectable change  | usable % of `u32::MAX` |
| --- | --- | --- |
| `u32::MAX / 8`  (536,870,911) | `(u32::MAX / 4) * 3` | 75.0% |
| `2_000_000` | `u32::MAX - 3_999_999` | 99.9% |

Similarly, changes are still allowed to be between `MAX_CHANGE_AGE`-old and `u32::MAX`-old in the interim between `check_tick` scans. While we prevent their age from overflowing, the test to detect changes still compares raw values. This makes failure ultimately unreliable, since when ancient changes stop being detected varies depending on when the next scan occurs.

## Open Question

Currently, systems and system states are incorrectly initialized with their `last_change_tick` set to `0`, which doesn't handle wraparound correctly.

For consistent behavior, they should either be initialized to the world's `last_change_tick` (and detect no changes) or to `MAX_CHANGE_AGE` behind the world's current `change_tick` (and detect everything as a change). I've currently gone with the latter since that was closer to the existing behavior.

## Follow-up Work

(Edited: entire section)

We haven't actually profiled how long a `check_ticks` scan takes on a "large" `World` , so we don't know if it's safe to increase their frequency. However, we are currently relying on play sessions not lasting long enough to trigger a scan and apps not having enough entities/archetypes for it to be "expensive" (our assumption). That isn't a real solution. (Either scanning never costs enough to impact frame times or we provide an option to use `u64` change ticks. Nobody will accept random hiccups.)

To further extend the lifetime of changes, we actually only need to increment the world tick if a system has `Fetch: !ReadOnlySystemParamFetch`. The behavior will be identical because all writes are sequenced, but I'm not sure how to implement that in a way that the compiler can optimize the branch out.

Also, since having no false positives depends on a `check_ticks` scan running at least every `2 * N - 1` ticks, a `last_check_tick` should also be stored in the `World` so that any lull in system execution (like a command flush) could trigger a scan if needed. To be completely robust, all the systems initialized on the world should be scanned, not just those in the current stage.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ECS Entities, components, systems, and events C-Docs An addition or correction to our documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Changed doesn't explain any potential drawback
6 participants