[Feature] Telemetry Reporting to Metrics Server and/or Prometheus #896
Feature Request
For analytic and batch workflows, precise telemetry is very important. We need to analyze run time as well as memory and CPU usage so that we can further tune the scheduling of jobs.
It would be great to have a hierarchy of stats posted to metrics-server, the Prometheus Pushgateway, or some configurable endpoint. All of the standard pod stats that metrics-server reports would be a good starting point.
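To make the Pushgateway option concrete, here is a minimal sketch of what posting per-step stats could look like, using only the Python standard library. It is not how Argo does or would do it. The metric names (`workflow_step_duration_seconds` and so on), the `workflow`/`step` labels, the job name, and the gateway address are all hypothetical. The only parts taken from Prometheus itself are the text exposition format and the Pushgateway's `/metrics/job/<job>/<label>/<value>` push path.

```python
import urllib.request
from urllib.parse import quote


def _escape(value):
    """Escape a label value for the Prometheus text exposition format."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(samples, labels):
    """Render {metric_name: value} as text-format gauge lines sharing one label set."""
    label_str = ",".join('{}="{}"'.format(k, _escape(v)) for k, v in sorted(labels.items()))
    suffix = "{" + label_str + "}" if label_str else ""
    lines = []
    for name, value in sorted(samples.items()):
        lines.append("# TYPE {} gauge".format(name))
        lines.append("{}{} {}".format(name, suffix, value))
    return "\n".join(lines) + "\n"


def push_url(gateway, job, grouping):
    """Build the Pushgateway URL; the grouping key carries the hierarchy.

    Assumes grouping values contain no "/" (those need the gateway's
    base64 path encoding, which this sketch does not implement).
    """
    parts = [gateway.rstrip("/"), "metrics", "job", quote(job, safe="")]
    for key, value in sorted(grouping.items()):
        parts += [quote(key, safe=""), quote(value, safe="")]
    return "/".join(parts)


def push(gateway, job, grouping, samples, labels, timeout=5.0):
    """PUT the samples to the gateway, replacing earlier metrics in the same group."""
    request = urllib.request.Request(
        push_url(gateway, job, grouping),
        data=render_metrics(samples, labels).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Hypothetical stats for one finished workflow step.
example_samples = {
    "workflow_step_duration_seconds": 12.5,
    "workflow_step_memory_peak_bytes": 268435456,
    "workflow_step_cpu_seconds": 9.75,
}
```

A caller would invoke something like `push("http://pushgateway:9091", "argo_workflows", {"workflow": "etl"}, example_samples, {"step": "load"})` once a step completes. Putting the workflow name in the grouping key and the step in the metric labels is one way to express the hierarchy; other splits are equally plausible.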
I'm not clear on the mechanics of how metrics get from metrics-server to Prometheus. I'm assuming there is some logic that tells it to report on pods only if they were spun up by Deployments or similar. Maybe the Argo operator could issue something when it creates new workflows to ensure their pods are captured in Prometheus.
I am happy to help implement this, but I don't know where to begin, in particular where to hook in for stat collection and reporting.