[marathon] Marathon plugin slows down agent when marathon has many apps running #1861
Comments
Hi @Zarkantho, sorry to hear you are having trouble with our Marathon integration. Could you reach out to support AT datadoghq.com with those details and a 'flare' archive, please? That is valuable feedback that helps us understand your needs and improve the check. Thank you.
Hi @yannmh, sorry about this, but we have already replaced the marathon check with a custom check to work around the problem, so the logs would probably not be relevant at this point. All we did was comment out this line: https://github.com/DataDog/dd-agent/blob/5.4.4/checks.d/marathon.py#L53, since we don't care about the "versions" of each application in Marathon. That made the problem go away. Sorry again that we can't use the standard tool here, but I hope this description is at least somewhat helpful for diagnosing the problem.
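The workaround above can be sketched as a check that computes its gauges from a single `/v2/apps` response and never issues the per-app versions requests. This is an illustrative sketch, not the dd-agent check API: the function name, metric names, and payload fields are assumptions based on what Marathon's `/v2/apps` endpoint typically returns.

```python
# Hypothetical sketch of the workaround: derive all metrics from one
# /v2/apps payload, skipping the costly per-app /v2/apps/<id>/versions
# requests. Names and fields here are illustrative assumptions.

def marathon_metrics(apps_payload):
    """Extract (metric, value[, tags]) tuples from a /v2/apps JSON payload."""
    metrics = []
    apps = apps_payload.get("apps", [])
    # One gauge for the total app count.
    metrics.append(("marathon.apps", len(apps)))
    # Per-app gauges, tagged by app id; no extra HTTP requests needed.
    for app in apps:
        tags = ["app_id:%s" % app["id"]]
        for field in ("instances", "cpus", "mem", "tasksRunning", "tasksStaged"):
            if field in app:
                metrics.append(("marathon." + field, app[field], tags))
    return metrics
```

Because everything comes from the one list response the agent already fetches, the check's cost stays constant regardless of how many apps the framework runs.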
Happy to hear you found a solution to this issue. Still, let's keep the ticket open so we can assess it for the 5.6.0 agent release based on your feedback. Thanks!
This metric is not really useful but it’s really costly to collect. Let’s remove it. Fix #1861
We are monitoring a Marathon framework with over 150 apps using Datadog, and the marathon check appears to slow down the entire Datadog process.
After investigating what the plugin actually does, the problem seems to be this loop: https://github.com/DataDog/dd-agent/blob/5.4.4/checks.d/marathon.py#L46. The agent hits the API sequentially, once per app, so with 150 apps it stops reporting metrics long enough to trigger some of our other alerts.
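A back-of-the-envelope sketch shows why the loop above stalls the collector: with one versions request per app issued sequentially, total check time grows linearly with the app count. The 200 ms per-request latency below is an assumption for illustration only.

```python
# Rough cost model of the sequential collection loop described above.
# The per-request latency is an assumed figure, not a measured one.

def sequential_collection_time(n_apps, latency_per_request_s):
    # 1 request for the app list, plus 1 versions request per app.
    return (1 + n_apps) * latency_per_request_s

# With 150 apps and a modest 200 ms per request, a single check run
# takes on the order of 30 seconds, during which the agent is blocked.
total = sequential_collection_time(150, 0.2)
```

At that scale, one check run can exceed the agent's collection interval, which matches the reported symptom of other alerts firing while the marathon check runs.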