feat(eventbus): warn applications whenever events aren't read #2026

Wondertan · 2023-01-28T15:21:14Z

Adds logging for the EventBus, mainly a periodic warning for applications whenever they are slow to read, or do not read, EventBus notifications. It came up after debugging a very tricky case where a buggy server-side application wasn't reading `EvtConnectednessChanged` events. This manifested as the server's inability to accept new streams, because the connection handling logic deadlocked on the _sync_ Notifee `Connected` callbacks, which emit the events.

P.S. This change should save some time for future application developers.
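To make the failure mode concrete, here is a minimal sketch using only the Go standard library, not the actual eventbus code. `emitterStalls` is a hypothetical helper: a plain buffered channel stands in for a subscriber's event channel, and nobody reads from it, so the emitter blocks as soon as the buffer is full.

```go
package main

import (
	"fmt"
	"time"
)

// emitterStalls is a hypothetical demo helper (not part of go-libp2p).
// It emits `events` values into a buffered channel of size bufSize that
// nobody reads, and reports whether the emitter is still blocked after `wait`.
func emitterStalls(bufSize, events int, wait time.Duration) bool {
	sub := make(chan int, bufSize) // stands in for a subscriber's event channel
	done := make(chan struct{})

	go func() {
		for i := 0; i < events; i++ {
			sub <- i // blocks for good once the buffer is full and unread
		}
		close(done)
	}()

	select {
	case <-done:
		return false // every event fit into the buffer
	case <-time.After(wait):
		return true // the emitter is stuck on a full channel
	}
}

func main() {
	fmt.Println("2 events, buffer 2, stalled:", emitterStalls(2, 2, 100*time.Millisecond))
	fmt.Println("3 events, buffer 2, stalled:", emitterStalls(2, 3, 100*time.Millisecond))
}
```

If the blocked emitter runs inside a synchronous callback, as in the case described above, everything waiting on that callback stalls with it.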

marten-seemann

Please don't introduce a timer here. Emitting multiple log entries because you might miss a single entry doesn't sound very convincing to me. This will just spam the log.

Wondertan · 2023-01-29T14:09:49Z

A timer/ticker prevents false warnings. There could be an ordinary case of a connection burst or another event that fills up the buffer to the point where the `default` case is hit. `emitWithWarn` gives the reader some time to process events instead of warning instantly. Hitting `default` does not necessarily mean the reader is slow.
Spamming is not an issue if its goal is to bring attention to something critical. E.g., for CANONICAL logs, we don't produce only one log, but multiple, with the sampling rate being configurable to prevent over-spamming. Similarly, in this case, we can configure how often the warning should be produced (and after how long a blocked event is worth logging).
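A rough sketch of the idea, assuming a plain Go channel per subscriber. This is not the code from the PR: the signature, the `warn` callback and `runDemo` are hypothetical and only illustrate "try a non-blocking send, then keep blocking and warn on a ticker".

```go
package main

import (
	"fmt"
	"time"
)

// emitWithWarn is a hypothetical illustration, not the PR's implementation.
// It first tries a non-blocking send. If the subscriber's buffer is full, it
// keeps blocking and calls warn once per `every` until the send completes.
// It returns the number of warnings produced.
func emitWithWarn(ch chan<- interface{}, evt interface{}, every time.Duration, warn func(blocked time.Duration)) int {
	select {
	case ch <- evt:
		return 0 // fast path: the buffer had room, no ticker is created
	default:
	}

	start := time.Now()
	ticker := time.NewTicker(every)
	defer ticker.Stop()

	warnings := 0
	for {
		select {
		case ch <- evt:
			return warnings
		case <-ticker.C:
			warnings++
			if warn != nil {
				warn(time.Since(start))
			}
		}
	}
}

// runDemo fills a one-slot buffer, starts a reader that only drains it after
// a delay, and returns how many warnings the blocked emit produced.
func runDemo() int {
	sub := make(chan interface{}, 1)
	sub <- "first" // fill the buffer so the next emit takes the slow path

	go func() {
		time.Sleep(350 * time.Millisecond) // a slow reader
		<-sub
	}()

	return emitWithWarn(sub, "second", 100*time.Millisecond, func(d time.Duration) {
		fmt.Printf("subscriber is slow: event blocked for %s\n", d.Round(time.Millisecond))
	})
}

func main() {
	fmt.Println("warnings:", runDemo())
}
```

The ticker is only created once the non-blocking send has failed, so the common case pays nothing for it. The warning interval is the knob mentioned above.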

marten-seemann · 2023-01-29T18:25:08Z

Starting a timer to emit a log message seems very heavy. If that’s needed due to burstiness, I’m beginning to wonder if this is the right approach at all. Maybe building proper monitoring would be the better way?

I’m strongly opposed to spamming the logs. Just because a log entry is important is no justification for repeating it 10 times (per second!).

> E.g., for CANONICAL logs, we don't produce only one log, but multiple, with sampling rate being a configuration to prevent over-spamming.

That case is quite the opposite. These are all separate events, and we sample them to dial down the log frequency. If anything, this is an argument for why we should not spam the log.

aviscode · 2023-01-30T08:05:15Z

> Please don't introduce a timer here. Emitting multiple log entries because you might miss a single entry doesn't sound very convincing to me. This will just spam the log.

i think we can have a retry of 3 times without a timer @marten-seemann what do u think ?

marten-seemann · 2023-01-30T08:10:48Z

> > Please don't introduce a timer here. Emitting multiple log entries because you might miss a single entry doesn't sound very convincing to me. This will just spam the log.
>
> i think we can have a retry of 3 times without a timer @marten-seemann what do u think ?

There's no point in retrying if you don't wait in between.

aviscode · 2023-01-30T08:14:03Z

> > > Please don't introduce a timer here. Emitting multiple log entries because you might miss a single entry doesn't sound very convincing to me. This will just spam the log.
> >
> > i think we can have a retry of 3 times without a timer @marten-seemann what do u think ?
>
> There's no point in retrying if you don't wait in between.

can we wait with a sleep not with a timer ?

marten-seemann · 2023-02-04T21:56:03Z

@Wondertan Can you have a look at the dashboard we're adding in #2038? There are two dashboards that should have helped you detect the bug:

Subscriber Queue Length: You would have seen the queue for one of your subscribers grow monotonically.
Subscriber Queue Filled Up: You would've seen that subscriber turn red in the state timeline.
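As a rough illustration of what these two signals measure, here is a stdlib-only sketch. It is not the metrics code from #2038: `queueStats` and `snapshot` are hypothetical names, and a real setup would export these values to a metrics system rather than print them.

```go
package main

import "fmt"

// queueStats is a hypothetical snapshot of one subscriber's buffered channel.
type queueStats struct {
	Length int  // events waiting to be read ("queue length")
	Full   bool // true once the buffer is at capacity ("queue filled up")
}

// snapshot reads the current fill level of a subscriber channel.
// For an unbuffered channel len and cap are both 0, so Full is always true.
func snapshot(ch chan interface{}) queueStats {
	return queueStats{Length: len(ch), Full: len(ch) == cap(ch)}
}

func main() {
	sub := make(chan interface{}, 3) // a subscriber that never reads

	for i := 0; i < cap(sub); i++ {
		sub <- i
		s := snapshot(sub)
		fmt.Printf("after event %d: length=%d full=%v\n", i+1, s.Length, s.Full)
	}
}
```

A queue length that only ever grows, followed by `Full` staying true, is the pattern that points at a subscriber that has stopped reading.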

marten-seemann · 2023-02-08T04:51:20Z

Closing since this has been addressed by adding metrics.

Wondertan force-pushed the event-bus/slow-reader-warning branch from da366f1 to e974590 on January 28, 2023 15:35

This was referenced Jan 28, 2023

fix(basichost): Emit Connected events asynchronously #2027

Closed

Unstable connectivity and stream negotiation on test networks celestiaorg/celestia-node#1623

Closed

marten-seemann requested changes Jan 28, 2023

marten-seemann closed this Feb 8, 2023

Wondertan mentioned this pull request Jun 14, 2023

eventbus: log a warning if an event channel is full (and continue to block) #2361

Closed
