Fix backoffSleep() so it jitters the sleep time. #505
Fix publish() so it does not use a negative sleep time.
// Max wait time is 1 minute (jittered)
d := 1 * time.Minute
if n < 5 {
	d = base * time.Duration(1<<int64(n))
No change in behavior.
Just a formatting change.
Codecov Report
@@            Coverage Diff             @@
##           master     #505      +/-   ##
==========================================
+ Coverage   56.71%   56.75%   +0.03%
==========================================
  Files         374      374
  Lines       17733    17736       +3
==========================================
+ Hits        10058    10066       +8
+ Misses       7083     7079       -4
+ Partials      592      591       -1
log.Printf("Got publisherJitter %v", publishJitter)
assert.GreaterOrEqual(t, publishJitter, time.Duration(0))
assert.Less(t, publishJitter, time.Minute)
assert.NotEqual(t, publishJitter, last)
There is effectively no chance that 2 random numbers can be the same considering the range is 0 to 60,000,000,000.
If someone is opposed I can remove the NotEqual check.
assert.True(t, svc.AssertNumberOfCalls(t, "PutMetricData", 5))

cloudWatchOutput.Close()
No real change in TestWriteError().
The type of backoffRetryBase was changed elsewhere.
The extra call to cloudWatchOutput.Close() is just for cleanup.
Closed this PR since it is just a subset of this larger PR:
Description of the issue
When a burst of metrics is forwarded to the output plugin and the number of metrics exceeds the internal buffer size, CWA pushes the metrics immediately. If many hosts run CWA and they all receive a burst of metrics at the beginning of each minute, they will all try to upload at the beginning of the minute. That burst of uploads is likely to exceed API rate limits, and the failed calls will need to be retried.
Currently, when PutMetricData API calls fail, they are retried after deterministic delays rather than jittered ones, so hosts that fail at the same moment also retry at the same moment.
Description of changes
Fix backoffSleep() so it jitters the sleep time.
Fix publish() so it does not use a negative sleep time.
License
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Tests
Requirements
Before committing the code, please complete the following steps.
make fmt and make fmt-sh
make linter
(skipped formatting to keep PR small)