Simple karma points to exclude collectors that keep crashing #50

Open
tsuna opened this issue Jul 24, 2012 · 0 comments

tsuna commented Jul 24, 2012

Sometimes a new collector gets deployed and doesn't work, or, more commonly, it only works on a small subset of hosts and doesn't properly exit(13) on the hosts where it isn't supposed to run. It would be nice to have a dead-simple karma point system:

  • When the collector is first discovered and first started, it gets X karma points.
  • Each time the collector crashes, it loses C karma points.
  • Every N seconds that elapse, the collector gains G karma points, up to an upper bound of Gmax points.
  • Whenever a collector crashes, we check its karma; if it's negative, we mark it as dead and don't restart it anymore.
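
The four rules above could be sketched roughly as follows. This is only an illustration, not existing tcollector code: the `Karma` class, its method names, and the `now` parameter (there to make the clock injectable) are all made up for the example, and it assumes karma accrues on wall-clock time and is only brought up to date when a crash is handled.

```python
import time


class Karma:
    """Karma bookkeeping for one collector (hypothetical sketch)."""

    def __init__(self, initial, crash_cost, gain, interval, max_karma, now=None):
        self.points = initial          # X: karma granted on first start
        self.crash_cost = crash_cost   # C: karma lost per crash
        self.gain = gain               # G: karma gained per interval
        self.interval = interval       # N: seconds per interval
        self.max_karma = max_karma     # Gmax: upper bound on karma
        self.dead = False
        self._last_accrual = time.time() if now is None else now

    def _accrue(self, now):
        """Credit G points for every full N-second interval since the last accrual."""
        intervals = int((now - self._last_accrual) // self.interval)
        if intervals <= 0:
            return
        if self.points < self.max_karma:
            self.points = min(self.max_karma, self.points + intervals * self.gain)
        # Keep the leftover partial interval for the next accrual.
        self._last_accrual += intervals * self.interval

    def on_crash(self, now=None):
        """Record a crash; return True if the collector should be restarted."""
        now = time.time() if now is None else now
        self._accrue(now)
        self.points -= self.crash_cost
        if self.points < 0:
            self.dead = True
        return not self.dead
```

With, say, X=10, C=5, G=1, N=60 and Gmax=50 (arbitrary numbers), a collector that crashes three times right after its first start is marked dead on the third crash, while one that has been up for a couple of hours starts from 50 points and gets many more retries.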

The idea is that if a collector crashes too often, we want to give up on it, instead of spamming the logs. But if a collector has been up for a while, and all of a sudden it starts crashing a few times in a row, it's worth trying some more before giving up on it.
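
To put numbers on that, here is a small sketch that counts how many back-to-back crashes a collector survives before being marked dead, assuming no karma is regained between crashes. The function name and the values used are hypothetical, picked only for illustration.

```python
def crashes_until_dead(karma, crash_cost):
    """Count consecutive crashes until karma goes negative (no regain in between)."""
    if crash_cost <= 0:
        raise ValueError("crash_cost must be positive")
    crashes = 0
    while karma >= 0:
        karma -= crash_cost
        crashes += 1
    return crashes


# A freshly discovered collector with X=10 and C=5 is given up on quickly...
fresh = crashes_until_dead(10, 5)    # 3
# ...while one that has been up long enough to reach Gmax=50 gets more tries.
veteran = crashes_until_dead(50, 5)  # 11
```

So the ratio of Gmax to C is what controls how patient we are with a collector that has been well-behaved for a while.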
