Add prometheus metrics for experiment and trial #870

Merged
merged 1 commit into from
Oct 11, 2019

@hougangliu hougangliu commented Oct 10, 2019

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

  1. Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

Release note:



@hougangliu
Member Author

# curl 10.0.47.94:8080/metrics|grep katib_
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 17729    0 17729    0     0   524k      0 --:--:-- --:--:-- --:--:--  541k
# HELP katib_experiment_created Counts number of Experiment created
# TYPE katib_experiment_created counter
katib_experiment_created 1
# HELP katib_experiment_deleted Counts number of Experiment deleted
# TYPE katib_experiment_deleted counter
katib_experiment_deleted 1
# HELP katib_experiment_failed Counts number of Experiment failed
# TYPE katib_experiment_failed counter
katib_experiment_failed 0
# HELP katib_experiment_succeeded Counts number of Experiment succeeded
# TYPE katib_experiment_succeeded counter
katib_experiment_succeeded 1
# HELP katib_trial_created Counts number of Trial created
# TYPE katib_trial_created counter
katib_trial_created 12
# HELP katib_trial_deleted Counts number of Trial deleted
# TYPE katib_trial_deleted counter
katib_trial_deleted 12
# HELP katib_trial_failed Counts number of Trial failed
# TYPE katib_trial_failed counter
katib_trial_failed 0
# HELP katib_trial_succeeded Counts number of Trial succeeded
# TYPE katib_trial_succeeded counter
katib_trial_succeeded 12
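
The output above is in the Prometheus text exposition format: `# HELP` and `# TYPE` comment lines followed by `name value` samples. As a rough sketch of how such output could be checked programmatically instead of with `grep`, the hypothetical helper `parseKatibCounters` below (not part of this PR) collects the `katib_` samples into a map. It assumes samples without labels or timestamps, as in the output shown here.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseKatibCounters is a hypothetical helper, not part of Katib.
// It reads Prometheus text-format output and returns the value of every
// sample whose name starts with "katib_". It only handles the simple
// "name value" form (no labels, no timestamps); comment lines such as
// "# HELP" and "# TYPE", and lines that do not parse, are skipped.
func parseKatibCounters(text string) map[string]float64 {
	out := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) != 2 || !strings.HasPrefix(fields[0], "katib_") {
			continue
		}
		v, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			continue
		}
		out[fields[0]] = v
	}
	return out
}

func main() {
	// A shortened copy of the output pasted in this comment.
	sample := `# HELP katib_trial_created Counts number of Trial created
# TYPE katib_trial_created counter
katib_trial_created 12
# HELP katib_trial_failed Counts number of Trial failed
# TYPE katib_trial_failed counter
katib_trial_failed 0
`
	counters := parseKatibCounters(sample)
	fmt.Println(counters["katib_trial_created"], counters["katib_trial_failed"])
}
```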

@hougangliu
Member Author

/test kubeflow-katib-presubmit
/cc @johnugeorge @gaocegege

		return reconcile.Result{}, err
	} else {
		if !instance.ObjectMeta.DeletionTimestamp.IsZero() {
			utilv1alpha3.IncreaseExperimentsDeletedCount()

Member

Instead of Increase, wouldn't Increment be more appropriate?

Member Author

Do you mean renaming IncreaseExperimentsDeletedCount to IncrementExperimentsDeletedCount? If so, I think Increase is better, since it is a verb.

Member

@hougangliu For counters, the usual term is Increment. See the Inc definition in the Prometheus client: https://github.com/kubeflow/katib/blob/master/vendor/github.com/prometheus/client_golang/prometheus/counter.go#L37

Or should we call experimentsDeletedCount.Inc() directly?

Member Author

Yes, I found it. In fact, I think https://github.com/kubeflow/katib/blob/master/vendor/github.com/prometheus/client_golang/prometheus/counter.go#L37 should use Increase too; it looks like a typo.
The reason I didn't call experimentsDeletedCount.Inc() directly is that another package, "pkg/controller.v1alpha3/experiment/util", also needs to increment it, and experimentsDeletedCount cannot be accessed from other packages.
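
To illustrate the point about package visibility, here is a minimal sketch of the pattern: an unexported counter variable wrapped by an exported function. The `counter` type is a hypothetical stand-in for `prometheus.Counter` (which does provide `Inc()`), used only so the sketch runs without external dependencies; its `Value` method is likewise an assumption for the demo. In the real code the variable and the wrapper would live in a separate package from the callers.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// counter is a hypothetical stand-in for prometheus.Counter so that this
// sketch compiles with the standard library alone.
type counter struct{ n uint64 }

// Inc adds one to the counter, mirroring the Inc() method of prometheus.Counter.
func (c *counter) Inc() { atomic.AddUint64(&c.n, 1) }

// Value returns the current count; it exists only for this demo.
func (c *counter) Value() uint64 { return atomic.LoadUint64(&c.n) }

// experimentsDeletedCount starts with a lower-case letter, so it is
// unexported: code in other packages cannot reference it directly.
var experimentsDeletedCount = &counter{}

// IncreaseExperimentsDeletedCount starts with an upper-case letter, so it is
// exported: other packages can call it to bump the counter without being
// able to reach the variable itself.
func IncreaseExperimentsDeletedCount() {
	experimentsDeletedCount.Inc()
}

func main() {
	IncreaseExperimentsDeletedCount()
	fmt.Println(experimentsDeletedCount.Value())
}
```

An alternative would be to export the counter variable itself, at the cost of letting any package modify or replace it.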

Member

@gaocegege gaocegege left a comment

/lgtm

@johnugeorge
Member

/approve

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnugeorge

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hougangliu
Member Author

/test kubeflow-katib-presubmit

@gaocegege
Member

@hougangliu Rebasing onto upstream master may help with the CI failures.

@k8s-ci-robot k8s-ci-robot removed the lgtm label Oct 11, 2019
@hougangliu
Member Author

> @hougangliu Rebasing onto upstream master may help with the CI failures.

OK, rebased. Thanks.

@gaocegege
Member

/lgtm

@k8s-ci-robot k8s-ci-robot merged commit 030b691 into kubeflow:master Oct 11, 2019

var (
	experimentsDeletedCount = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "katib_experiment_deleted",

Contributor

@yeya24 yeya24 Oct 16, 2019

Do we need to rename the metrics?
