-
Notifications
You must be signed in to change notification settings - Fork 981
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PushMeterRegistry doesn't align publish() with step boundaries #1218
Comments
This seems to be resolved via #2393. |
For a visual, let me provide a diagram illustrating the problem.
In the above case, the The problem described by this issue is illustrated by the next publish that may happen slightly later than exactly one step interval, pushing it across a step boundary. Therefore the second publication (the second Subsequently, if the next publication happened slightly sooner than scheduled, it could happen in the same step as the previous publication, resulting in publishing the same values as the previous publish. I'm not sure if this happens in practice. I can understand scheduling being late but I'm not sure how it would be earlier than expected. The fix implemented here changed the situation to the following illustration.
This avoids the above mentioned issues. The first publication happens at the start of the next step interval after |
This avoids different instances calling publish at the same time. It will avoid scheduling publishing at the last 20% of a step interval to avoid the issue described in micrometer-metricsgh-1218 where publishing could take long enough to finish in a subsequent step after it started. An additional background thread is added for StepMeterRegistry that will poll all meters at the start of the step, separate from when publishing happens. This is so the values published are the ones recorded in the previous step. We need rollCount to be called for step meters as early as possible in the step to minimize the number of recordings reported for the wrong step. This is not perfect, but it is no worse than the status quo.
PushMeterRegistry uses
scheduledExecutorService.scheduleAtFixedRate(this::publish, config.step().toMillis(), config.step().toMillis(), TimeUnit.MILLISECONDS)
to schedulepublish()
with an initial delay ofconfig.step().toMillis()
. So while the period between invocations is fixed, they may run at any time with respect to the epoch.StepDouble sets step boundaries relative to the epoch with
final long stepTime = now / stepMillis
.As a result, PushMeterRegistry can schedule calls to publish() with arbitrary offsets relative to step boundaries. If we are unlucky, publish() will be scheduled right before a step boundary, so ordinary scheduling variations can lead to publish() sometimes being called before the boundary, and sometimes right after, leading to StepDouble sporadically duplicating or dropping data. Code example here:
https://gist.github.com/john-karp/93b86885b39f9d813345100be2a6496a
The simplest fix for PushMeterRegistry would set the initial delay of the executor service to align it with the steps. (Though I think ScheduledExecutorService cannot be trusted to keep that alignment in the long term when you have a server with ntpd active.)
The text was updated successfully, but these errors were encountered: