Should signal loudly when internal channel capacity exhausted #209

pnkfelix · 2021-12-16T12:52:27Z

After writing the docs for the parameters that control the internal channel capacity for the ConsoleLayer -> Server communication, I realized that I don't think the current UI does anything to relay that channel exhaustion may have occurred. The system reacts to this scenario by throwing events away until capacity is freed up.

Thus, if some events have been dropped, the output may be misleading (e.g. showing tasks that never joined, when in fact their join event was dropped). But the console UI doesn't say that this has happened. (I think it's currently up to the human user to notice "something's weird about this output" and dig deeper into why.)

I suggest we attack this in two ways, concurrently:

1. We keep a single boolean for the channel, recording whether it has ever been exhausted. When the channel is exhausted and that flag is false, then after capacity is subsequently freed up, the first event we push to it is a special event signalling "an arbitrary number of events may have been dropped from this point on", so that the client UI can reflect that in some sort of warning about the model's accuracy from that point on in the console output.
2. We add an extra channel, used solely to communicate with high priority that the first channel is exhausted (while the events currently in the channel have not yet been dropped, i.e., they still yield an accurate model of the runtime). This way, if the problem is that the app is simply generating events at too high a frequency, the end developer will hopefully at least get a notification about it without first waiting for all of the queued events to be processed.
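A minimal sketch of part 1, using only the standard library. It is not how `console-subscriber` is implemented; `Event`, `State`, and `LossySender` are hypothetical names, and the sketch uses a three-state enum rather than the single boolean described above so that it can also remember that the marker is still owed:

```rust
use std::sync::mpsc::{SyncSender, TrySendError};

/// Hypothetical event type; `MayHaveDropped` is the proposed marker event.
#[derive(Debug, PartialEq)]
enum Event {
    Data(u32),
    MayHaveDropped,
}

/// Tracks whether the channel has ever been exhausted.
#[derive(PartialEq)]
enum State {
    Healthy,     // never exhausted
    MarkerOwed,  // exhausted; marker not yet delivered
    MarkerSent,  // marker delivered; the model is flagged from here on
}

/// Wraps a bounded sender and drops events when it is full.
struct LossySender {
    tx: SyncSender<Event>,
    state: State,
}

impl LossySender {
    fn new(tx: SyncSender<Event>) -> Self {
        LossySender { tx, state: State::Healthy }
    }

    /// Returns `true` if `event` was queued, `false` if it was dropped.
    fn send(&mut self, event: Event) -> bool {
        // Once capacity frees up, the marker must be the first thing pushed.
        if self.state == State::MarkerOwed {
            match self.tx.try_send(Event::MayHaveDropped) {
                Ok(()) => self.state = State::MarkerSent,
                Err(_) => return false, // still no room: drop this event too
            }
        }
        match self.tx.try_send(event) {
            Ok(()) => true,
            Err(TrySendError::Full(_)) => {
                if self.state == State::Healthy {
                    self.state = State::MarkerOwed;
                }
                false
            }
            Err(TrySendError::Disconnected(_)) => false,
        }
    }
}
```

With a `std::sync::mpsc::sync_channel(2)`, a third `send` is dropped; after the receiver drains the channel, the next `send` delivers `Event::MayHaveDropped` ahead of the new event, which is the point at which a client could start showing its warning.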

With part 1 in place, the console UI should be able to render sets of tasks (i.e. every task that was still alive at the time of channel exhaustion) as having potentially inaccurate output. (Ideally we'd be able to indicate which outputs should not be relied upon, but I'd be happy with just a general warning at the top.)

With part 2 in place, the console UI should be able to signal exhaustion very soon after it occurs, before the model becomes inaccurate. Thus, end developers who are still debugging their app can start investigating the problem without waiting for the events currently in the channel to drain completely.
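One way part 2 could look, again as a standard-library sketch under assumed names (`Producer`, `exhaustion_signalled`) rather than the real ConsoleLayer code. The alert channel has capacity 1, so a failed alert send simply means an alert is already pending:

```rust
use std::sync::mpsc::{Receiver, SyncSender, TrySendError};

/// Hypothetical producer holding the main event channel plus a
/// small side channel used only to report exhaustion.
struct Producer {
    events: SyncSender<u32>,
    alerts: SyncSender<()>,
}

impl Producer {
    /// Returns `true` if `event` was queued, `false` if it was dropped.
    fn send(&self, event: u32) -> bool {
        match self.events.try_send(event) {
            Ok(()) => true,
            Err(TrySendError::Full(_)) => {
                // Ignore the result: if the alert channel is full,
                // an alert is already waiting to be read.
                let _ = self.alerts.try_send(());
                false
            }
            Err(TrySendError::Disconnected(_)) => false,
        }
    }
}

/// Consumer-side check, meant to be polled before draining queued events,
/// so the warning can be shown while the queued events are still accurate.
fn exhaustion_signalled(alerts: &Receiver<()>) -> bool {
    alerts.try_recv().is_ok()
}
```

The consumer would call `exhaustion_signalled` before working through its backlog, which is what lets the notification arrive ahead of the queued events.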


pnkfelix · 2021-12-16T13:14:37Z

I think at least some instances of #189 have this as their root cause. To be concrete: for a demo app I made that spawns a ton of short-lived tasks, I found evidence of #189, but then when I increased the event buffer capacity by 1024x, it seemed to go away.

See also #209

Currently, we don't really have any way of surfacing potential data loss due to the event buffer capacity limit --- so, if events are dropped, the user may not be *aware* that they're seeing an incomplete picture of their application. It would be better if we had a way to surface this in the UI.

This branch adds support for counting the number of dropped events in the `ConsoleLayer`. This data is now included in the `ResourceUpdate`, `TaskUpdate`, and `AsyncOpUpdate` messages, respectively. We track a separate counter for dropped events of each type, to make it easier to determine what data may be missing.

The console UI doesn't currently *display* these counts; that can be added in a separate PR. We may want to use the warnings interface for displaying this information?
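The per-type counting described here could be sketched roughly as follows. This is an illustration only, not the code from the branch; `Event`, `Dropped`, and `send_or_count` are hypothetical names, and a standard-library bounded channel stands in for whatever channel the `ConsoleLayer` actually uses:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{SyncSender, TrySendError};

/// Hypothetical event kinds, one per update message type.
enum Event {
    Task(u64),
    Resource(u64),
    AsyncOp(u64),
}

/// One dropped-event counter per event type.
#[derive(Default)]
struct Dropped {
    tasks: AtomicUsize,
    resources: AtomicUsize,
    async_ops: AtomicUsize,
}

/// Tries to queue `event`; if the buffer is full, the event is dropped
/// and the counter matching its type is incremented.
fn send_or_count(tx: &SyncSender<Event>, dropped: &Dropped, event: Event) {
    if let Err(TrySendError::Full(event)) = tx.try_send(event) {
        let counter = match event {
            Event::Task(_) => &dropped.tasks,
            Event::Resource(_) => &dropped.resources,
            Event::AsyncOp(_) => &dropped.async_ops,
        };
        counter.fetch_add(1, Ordering::Relaxed);
    }
}
```

Each counter could then be copied into the update message for its type, so a client can tell which kind of data may be incomplete.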

bobrik · 2022-03-31T19:11:59Z

I added the ui portion in #316.

hawkw mentioned this issue Dec 16, 2021

feat(subscriber): count dropped events due to buffer cap #211

Merged

hawkw added the S-feature (Severity: feature), E-medium (Effort: medium), C-console (Crate: console), C-subscriber (Crate: console-subscriber), and A-instrumentation (Area: instrumentation) labels Dec 17, 2021
