-
Notifications
You must be signed in to change notification settings - Fork 371
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change prometheus metrics type from summary to histogram #1202
Conversation
Thanks for your PR. The following commands are available:
|
Codecov Report
@@ Coverage Diff @@
## master #1202 +/- ##
==========================================
+ Coverage 56.06% 56.26% +0.19%
==========================================
Files 108 108
Lines 11887 12027 +140
==========================================
+ Hits 6665 6767 +102
- Misses 4634 4671 +37
- Partials 588 589 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use "Fixes: #" in the PR description, so that the issue could be linked and closed automatically when PR is merged.
LGTM otherwise.
This reminds me that we could enhance testing for Prometheus metrics at the controller like agent (#799 ).. possibly using network policy controller unit tests (networkpolicy_controller_test.go). I will update the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but I think the PR description should mention the effect (if any) of changing the metrics type on the consumers of these metrics. If I have metrics enabled and I upgrade my version of Antrea, will there be any issue in the Prometheus server, on in a dashboard like Grafana?
For that purpose, K8S metrics wrapper has deprecation capabilities. We could mark the summary metric as deprecated and add the histogram. |
@ksamoray Yes, you are right. Ideally, we should go through the process of K8s deprecation capabilities. As mentioned by @antoninbas, some consumers may get affected, if they have predefined Grafana dashboards or query scripts. They may need to update their scripts. What is RN? |
Release notes |
/test-all |
/skip-hw-offload as it's not related. |
@dreamtalen the commit message is not well wrapped, could you follow https://github.com/golang/go/wiki/CommitMessage to revise?
|
Thanks for pointing out. I will revise it. |
237c921
00e3cd0
to
fe66434
Compare
Sorry for being nit-picking, the commit message seems not wrapped neatly, and a newline is usually inserted between paragraphs.
|
The summary types are tagged for deprecation, Kubernetes recommended to use histograms instead of summaries. The main advantages of histogram types are aggregation and inexpensive. In this commit, we changed three Antrea controller metrics from summary to histogram type. They are DurationAppliedToGroupSyncing, DurationAddressGroupSyncing, and DurationInternalNetworkPolicySyncing. Fixes antrea-io#905
fe66434
to
9f64d5c
Compare
Never mind. Thanks for your advice. |
/test-all |
/test-windows-conformance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, LGTM
/skip-hw-offload as this is not related test. Merging the PR. |
) The summary types are tagged for deprecation, Kubernetes recommended to use histograms instead of summaries. The main advantages of histogram types are aggregation and inexpensive. In this commit, we changed three Antrea controller metrics from summary to histogram type. They are DurationAppliedToGroupSyncing, DurationAddressGroupSyncing, and DurationInternalNetworkPolicySyncing. Fixes antrea-io#905 Co-authored-by: Yongming Ding <dyongming@vmware.com>
Fix #905
The summary types are tagged for deprecation, Kubernetes recommended to use histograms instead of summaries. The main advantages of histogram types are aggregation and inexpensive. In this commit, we changed three Antrea controller metrics from summary to histogram type. They are DurationAppliedToGroupSyncing, DurationAddressGroupSyncing, and DurationInternalNetworkPolicySyncing.
Did the manually test to check the type of these metrics using following commands:
make
./test/e2e/infra/vagrant/push_antrea.sh --prometheus
kubectl exec -it <antrea_controller_pod> -n kube-system -- bash
TOKEN=$(cat /var/run/antrea/apiserver/loopback-client-token)
curl --insecure --header "Authorization: Bearer $TOKEN" https://127.0.0.1:10349/metrics
Test results:
# HELP antrea_controller_applied_to_group_sync_duration_milliseconds [ALPHA] The duration of syncing applied-to-group
# TYPE antrea_controller_applied_to_group_sync_duration_milliseconds histogram
# HELP antrea_controller_address_group_sync_duration_milliseconds [ALPHA] The duration of syncing address-group
# TYPE antrea_controller_address_group_sync_duration_milliseconds histogram
# HELP antrea_controller_network_policy_sync_duration_milliseconds [ALPHA] The duration of syncing internal-networkpolicy
# TYPE antrea_controller_network_policy_sync_duration_milliseconds histogram