MemoryReporter: make call to runtime.ReadMemStats time bound to avoid lost metrics #1494
4dfe3fc to 25bcdf1
LGTM
	mem := runtime.MemStats{}
	runtime.ReadMemStats(&mem)
	return mem
}, 5*time.Second, 1*time.Minute)
note that all graphite/metrictank systems support down to 1 second resolution.
that's also what metrictank is configured to emit by default.
as such a 5s timeout seems too long for such a setup.
After more discussion we think it will be best to model the timeout after 65% of the set interval. However, we don't know if this is enough time for the rest of the reporters to complete their operations. What are your thoughts on the best way to proceed with fixing this issue? Also of note, once we update it here we will need to update it in a few other projects.
Another option is to launch them all at the same time and wait for results. Any reporter that doesn't return a result within the interval will not get reported for that tick, but all the others will.
We can also add a budget to the launched reporting function and it can decide if it would like to use caching or not. We feel this is the better option, but it does add a bit of overhead which takes away from the overall time allocated for that tick. It still seems best. Thoughts?
So if an operation is consistently slow, we will never report data for it? I don't know what you mean by budget, but I'm sure you guys can easily implement a decent solution for this. We don't have to debate minutiae.
Actually, I think we are spending way too much time on this, given that the fix for ReadMemStats being blocked by GC will be in Go 1.14. I suggest we do nothing more than what was done, unless we have a big customer who uses a 1 second resolution timer.
sounds good
MemoryReporter: make call to runtime.ReadMemStats time bound to avoid lost metrics. If the timeout is reached, use the previous result of the function.
Added a generic decorator to limit function execution time.
Inspired by prometheus/client_golang#568
Fixes #1207