Proposal for exposing generic Prometheus metrics in common operator #22
Comments
/cc @terrytangyuan The feature LGTM.
lgtm, +1
Sounds great to me. This would be a good way to standardize metrics collection. We could also expose some utility methods that operators can use to collect operator-specific custom metrics, which would lead to shared best practices and standards across operators.
Sounds great to me. /cc @jlewi
Great. LGTM
Sure. kubebuilder supports this feature, so I think we can also implement it in common-operator if we design it well.
LGTM, this looks so good.
Any progress on this issue?
@yeya24 AFAIK, no one is working on it right now.
Hi all, I added a detailed outline of the Prometheus metrics we plan to cover in common operator in #77. Please take a look; any feedback would be appreciated.
Proposal
Add generic metrics (jobs/pods/...) to the common operator, so that operators built on top of the common operator can enable and use them directly.
Motivation
To track job-level metrics, we currently need to add Prometheus metric code inside each job operator. For example, to know how many tfjobs were created in the last hour, we need to add a Counter inside tf-operator. This kind of request is very common and applies to many different operators. As we're moving common code to the common operator, we could also add the metric-related code there, so that it can be used by all operators built on top of the common one.
Details
For metric definition and registration, we will add a new `metrics` folder where all metrics will be defined. Some preliminary metrics include the number of jobs/pods/services created, durations of various operations, etc.
For metrics updating:
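As an illustration only, here is a minimal sketch of what a shared jobs-created counter could look like. It uses only the Go standard library so it runs as-is; a real implementation would most likely use the Prometheus Go client instead. The `Counter` type, the metric name `common_operator_jobs_created_total`, and the `job_kind` label are all hypothetical names chosen for this example, not something defined by the proposal.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// Counter is a hypothetical stand-in for a Prometheus counter with a
// single label. It only ever increases.
type Counter struct {
	name, help, label string

	mu     sync.Mutex
	values map[string]float64
}

// NewCounter creates a counter; name, help and label are chosen by the caller.
func NewCounter(name, help, label string) *Counter {
	return &Counter{name: name, help: help, label: label, values: map[string]float64{}}
}

// Inc adds one to the series identified by labelValue.
func (c *Counter) Inc(labelValue string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.values[labelValue]++
}

// Expose renders the counter in the Prometheus text exposition format.
// Label values are quoted with %q, which is adequate for plain ASCII
// values such as job kinds but is not a full escaping implementation.
func (c *Counter) Expose() string {
	c.mu.Lock()
	defer c.mu.Unlock()

	// Sort label values so the output is deterministic.
	keys := make([]string, 0, len(c.values))
	for k := range c.values {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var b strings.Builder
	fmt.Fprintf(&b, "# HELP %s %s\n# TYPE %s counter\n", c.name, c.help, c.name)
	for _, k := range keys {
		fmt.Fprintf(&b, "%s{%s=%q} %g\n", c.name, c.label, k, c.values[k])
	}
	return b.String()
}

// JobsCreated is a hypothetical shared metric that every operator built on
// the common operator could increment when it creates a job.
var JobsCreated = NewCounter(
	"common_operator_jobs_created_total",
	"Number of jobs created.",
	"job_kind",
)

func main() {
	// An operator would call Inc from its job-creation code path.
	JobsCreated.Inc("TFJob")
	JobsCreated.Inc("TFJob")
	JobsCreated.Inc("PyTorchJob")
	fmt.Print(JobsCreated.Expose())
}
```

Keeping the job kind as a label rather than part of the metric name means a single definition in the shared package can serve every operator, which is the point of moving the code out of tf-operator.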
As the common project is still under active development, some of the details discussed above may change later. Comments are very much appreciated, @jlewi @richardsliu @gaocegege @jian-he.