Should signal loudly when internal channel capacity exhausted #209

Open
pnkfelix opened this issue Dec 16, 2021 · 2 comments
Labels
A-instrumentation Area: instrumentation. C-console Crate: console. C-subscriber Crate: console-subscriber. E-medium Effort: medium. S-feature Severity: feature. This is adding a new feature.

Comments

@pnkfelix
Contributor

pnkfelix commented Dec 16, 2021

After writing the docs for the parameters that control the internal channel capacity for the ConsoleLayer -> Server communication, I realized that I don't think the current UI does anything to relay that channel exhaustion may have occurred. The system reacts to this scenario by throwing events away until capacity is freed up.

Thus, if some events have been dropped, the output may be misleading (e.g. showing tasks that never joined, when in fact their join event was dropped). But the console UI doesn't say that this has happened. (I think it's currently up to the human user to notice "something's weird about this output" and dig deeper into why.)

I suggest we attack this in two ways, concurrently:

  1. We keep a single boolean for the channel, on whether it had ever been exhausted or not. When the channel is exhausted and that flag is false, then after capacity is subsequently freed up, the first event we will push to it is a special event signalling "an arbitrary number of events may have been dropped from this point on", so that the client UI can reflect that in some sort of warning about the model's accuracy from that point on in the console output.
  2. We add an extra channel, used solely to communicate, with high priority, that the first channel has been exhausted (while the events currently in that channel have not yet been dropped -- i.e., they still yield an accurate model of the runtime). This way, if the problem is that the app is simply generating events at too high a frequency, the end developer will hopefully at least get a notification about it without first waiting for all the queued events to be processed.

With part 1 in place, the console UI should be able to render sets of tasks (i.e. every task that was still alive at the time of channel exhaustion) as having potentially inaccurate output. (Ideally we'd be able to indicate which outputs should not be relied upon, but I'd be happy with just a general warning at the top.)
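To make part 1 concrete, here is a minimal sketch of the flag-plus-marker idea. It is not the console-subscriber implementation: `LossySender`, `Event`, and `MayHaveDropped` are hypothetical names, and it uses a bounded `std::sync::mpsc::sync_channel` as a stand-in for the real channel. It also tracks a second bit (`marker_sent`) alongside the "ever exhausted" flag, so that the marker is pushed exactly once.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical event type; `MayHaveDropped` is the special marker event.
#[derive(Debug, PartialEq)]
pub enum Event {
    Data(u64),
    /// "An arbitrary number of events may have been dropped from this point on."
    MayHaveDropped,
}

/// Hypothetical wrapper around a bounded channel that drops events when full.
pub struct LossySender {
    tx: SyncSender<Event>,
    ever_exhausted: bool, // set the first time a send fails because the channel is full
    marker_sent: bool,    // set once the marker has been enqueued
}

impl LossySender {
    /// `capacity` is assumed to be greater than zero.
    pub fn new(capacity: usize) -> (Self, Receiver<Event>) {
        let (tx, rx) = sync_channel(capacity);
        (Self { tx, ever_exhausted: false, marker_sent: false }, rx)
    }

    /// Returns `true` if `event` was enqueued, `false` if it was dropped.
    pub fn record(&mut self, event: Event) -> bool {
        // After an exhaustion, the marker must be the first thing pushed
        // once capacity is available again.
        if self.ever_exhausted && !self.marker_sent {
            match self.tx.try_send(Event::MayHaveDropped) {
                Ok(()) => self.marker_sent = true,
                // Still full (or receiver gone): this event is dropped too.
                Err(_) => return false,
            }
        }
        match self.tx.try_send(event) {
            Ok(()) => true,
            Err(TrySendError::Full(_)) => {
                self.ever_exhausted = true;
                false
            }
            Err(TrySendError::Disconnected(_)) => false,
        }
    }
}
```

On the receiving side, a client that sees `MayHaveDropped` could mark every task still alive at that point as potentially inaccurate, or just raise a general warning.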

With part 2 in place, the console UI should be able to signal exhaustion very soon after it occurs, before the model becomes inaccurate. Thus, end developers who are still debugging their app can start investigating the problem without waiting for the events currently in the channel to drain completely.
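Part 2 could look roughly like the sketch below. Again, this is an illustration under assumptions rather than the actual design: `Producer`, `Exhausted`, and `channels` are hypothetical names, the event payload is a bare `u64`, and the alert is raised on the first failed send (the events already queued are still accurate at that moment).

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical payload for the high-priority alert channel.
#[derive(Debug)]
pub struct Exhausted;

/// Hypothetical producer holding both the event channel and the alert channel.
pub struct Producer {
    events: SyncSender<u64>,
    alerts: SyncSender<Exhausted>,
    alerted: bool, // true once an alert has been enqueued
}

/// Builds a bounded event channel plus a capacity-1 alert channel.
pub fn channels(capacity: usize) -> (Producer, Receiver<u64>, Receiver<Exhausted>) {
    let (events, events_rx) = sync_channel(capacity);
    let (alerts, alerts_rx) = sync_channel(1);
    (Producer { events, alerts, alerted: false }, events_rx, alerts_rx)
}

impl Producer {
    /// Returns `true` if `event` was enqueued, `false` if it was dropped.
    pub fn record(&mut self, event: u64) -> bool {
        match self.events.try_send(event) {
            Ok(()) => true,
            Err(TrySendError::Full(_)) => {
                if !self.alerted {
                    // The alert bypasses the backlog in the event channel.
                    self.alerted = self.alerts.try_send(Exhausted).is_ok();
                }
                false
            }
            Err(TrySendError::Disconnected(_)) => false,
        }
    }
}
```

A consumer would check the alert receiver (for example with `try_recv`) before draining the event receiver, so the warning can be shown while the backlog is still being processed.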

@pnkfelix
Contributor Author

I think at least some instances of #189 have this as their root cause. To be concrete: for a demo app I made that spawns a ton of short-lived tasks, I found evidence of #189, but then when I increased the event buffer capacity by 1024x, it seemed to go away.

hawkw added a commit that referenced this issue Dec 16, 2021
See also #209

Currently, we don't really have any way of surfacing potential data loss
due to the event buffer capacity limit --- so, if events are dropped,
the user may not be *aware* that they're seeing an incomplete picture of
their application. It would be better if we had a way to surface this in
the UI.

This branch adds support for counting the number of dropped events in
the `ConsoleLayer`. This data is now included in the `ResourceUpdate`,
`TaskUpdate`, and `AsyncOpUpdate` messages, respectively. We track a
separate counter for dropped events of each type, to make it easier to
determine what data may be missing.

The console UI doesn't currently *display* these counts; that can be
added in a separate PR. We may want to use the warnings interface for
displaying this information?
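
As a rough illustration of the per-type counting described in this commit message, the sketch below keeps one atomic counter per event type. The names `EventKind`, `DroppedCounts`, and `DroppedSnapshot` are hypothetical and are not taken from the `ConsoleLayer` code; how the real counts are attached to the `ResourceUpdate`, `TaskUpdate`, and `AsyncOpUpdate` messages is not shown here.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical classification of events by the update message they feed.
#[derive(Clone, Copy, Debug)]
pub enum EventKind {
    Task,
    Resource,
    AsyncOp,
}

/// One dropped-event counter per event type, safe to bump from any thread.
#[derive(Debug, Default)]
pub struct DroppedCounts {
    tasks: AtomicU64,
    resources: AtomicU64,
    async_ops: AtomicU64,
}

/// Plain-value copy of the counters, suitable for attaching to an update.
#[derive(Debug, PartialEq, Eq)]
pub struct DroppedSnapshot {
    pub tasks: u64,
    pub resources: u64,
    pub async_ops: u64,
}

impl DroppedCounts {
    /// Called wherever an event of the given kind could not be enqueued.
    pub fn record_drop(&self, kind: EventKind) {
        let counter = match kind {
            EventKind::Task => &self.tasks,
            EventKind::Resource => &self.resources,
            EventKind::AsyncOp => &self.async_ops,
        };
        counter.fetch_add(1, Ordering::Relaxed);
    }

    /// Reads the current totals.
    pub fn snapshot(&self) -> DroppedSnapshot {
        DroppedSnapshot {
            tasks: self.tasks.load(Ordering::Relaxed),
            resources: self.resources.load(Ordering::Relaxed),
            async_ops: self.async_ops.load(Ordering::Relaxed),
        }
    }
}
```

Keeping the counters separate lets a UI say which view (tasks, resources, or async ops) may be missing data, rather than only that something was lost.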
@hawkw hawkw added S-feature Severity: feature. This is adding a new feature. E-medium Effort: medium. C-console Crate: console. C-subscriber Crate: console-subscriber. A-instrumentation Area: instrumentation. labels Dec 17, 2021
@bobrik
Contributor

bobrik commented Mar 31, 2022

I added the UI portion in #316.
