-
Notifications
You must be signed in to change notification settings - Fork 371
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deprecate prometheus run time info metrics #1217
Conversation
Thanks for your PR. The following commands are available:
|
Codecov Report
@@ Coverage Diff @@
## master #1217 +/- ##
==========================================
- Coverage 56.29% 54.97% -1.32%
==========================================
Files 109 110 +1
Lines 12045 10573 -1472
==========================================
- Hits 6781 5813 -968
+ Misses 4673 4186 -487
+ Partials 591 574 -17
Flags with carried forward coverage won't be shown. Click here to find out more.
|
I don't believe that this is relevant here as I doubt that anyone uses these metrics, but maybe it's worthwhile to conclude a deprecation policy for metrics which we change or drop. |
My understanding of the deprecated version is to have the future Antrea version (+1 to current version) as deprecated version. This serves as a notice to the consumers of the current version. And finally we could remove the metric in the future Antrea version. Does this process sound good? |
@srikartati could you clarify what you mean here? |
As @srikartati noted, we remove these metrics as their usability is questionable (I'm generous here...). I would remove them without going through deprecation process. As for #1202, maybe we should document the change somewhere beyond slack (e.g release notes document if we have any). I agree that retaining the metrics name is valuable though. |
It will be documented in the release notes no matter what. The advance notice on slack / email groups was to check if anyone had an objection. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@antoninbas In high-level, I meant the same but was not sure which release we should remove the metric after it is being deprecated. I believe we should define a simple custom deprecation policy for Antrea metrics and not follow the Kubernetes metrics specifically the hidden metrics convention. For this purpose, we could employ the policy you suggested and also follow the guidelines here w.r.t name and help text. In addition, we should describe this policy in the Antrea's Prometheus documentation. What do you think? |
@srikartati sounds good to me. Feel free to submit a documentation patch for this and to link to / borrow from the K8s documentation, while adjusting / simplifying a few things as needed. |
e2a16be
e2a16be
to
fbd01af
Compare
docs/prometheus-metrics.md
Outdated
@@ -69,7 +69,7 @@ rules: | |||
|
|||
### Antrea Components Scraping configuration | |||
Add the following jobs to Prometheus scraping configuration to enable metrics | |||
collection from Antrea components | |||
collection from Antrea components. Antrea Agent metrics endpoint is exposed through Antrea apiserver on `apiport` config parameter given in `antrea-agent.conf` (default value is 10350). Antrea Controller metrics endpoint is exposed through Antrea apiserver on `apiport` config parameter given in `antrea-controller.conf` (default value is 10349). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made this change as there were some questions on what ports Prometheus metrics are exposed recently when integrating with Wavefront solution.
Name: "antrea_agent_runtime_info", | ||
Help: "Antrea agent runtime info , defined as labels. The value of the gauge is always set to 1.", | ||
|
||
deprecatedGaugeHost := metrics.NewGauge(&metrics.GaugeOpts{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would leave gauge names as they are - as renaming just adds plenty of unnecessary diffs and merge efforts later on.
Just add maybe DeprecatedVersion: "x.x" to the gauge defs, e.g
metrics.GaugeOpts{
Name: "antrea_agent_runtime_info",
Help: "...",
ConstLabels: "..."
StabilityLevel: metrics.STABLE,
DeprecationVersion: "x.x"}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DeprecatedVersion is to implement the deprecation lifecycle for Kubernetes control plane metrics as given here.
The instrumented version is Kubernetes version. We could employ a simpler policy as mentioned in #1217 (comment)
I would like to not add this field to avoid the confusion of the deprecation policy and version numbers.
Can you elaborate on merging difficulties with the change in the variable name? IMO, this serves as a quick indicator in source code that these metrics have to be removed in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @srikartati,
See deprecation example here.
BTW that metric has been actually removed since then :)
In the above example given by you, there is a change in the variable name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @srikartati, DeprecatedVersion should expose deprecation to users (AFAIK k8s metrics will note that in the metric help) while renaming variables is only visible to developers. I'm not sure that this is enough.
Not really sure which confusion you're trying to avoid by not using it.
Merging wise - that change will popup as a diff on every branch comparison...
My vote is still to remove these metrics immediately, but if we decide to deprecate in a later time, lets use the best tools for this purpose.
@@ -78,18 +78,19 @@ func InitializePrometheusMetrics() { | |||
} | |||
|
|||
klog.Info("Initializing prometheus metrics") | |||
gaugeHost := metrics.NewGauge(&metrics.GaugeOpts{ | |||
deprecatedGaugeHost := metrics.NewGauge(&metrics.GaugeOpts{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above
Hi @srikartati, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that there is disagreement on what to do here?
- It makes sense to me to rename the variable with a "deprecated" prefix
- It also makes sense to use the
DeprecatedVersion
field. If we do that, I think we can just adhere to the K8s metrics deprecation lifecycle. Since this was tagged as aSTABLE
metric, we can do the following for these metrics: Deprecated (0.10.0) -> Hidden (0.11.0) -> Deleted (0.12.0). I believe it is better than removing them altogether in 0.10.0 as it sets a better precedent.
docs/prometheus-metrics.md
Outdated
# Deprecation Policy | ||
|
||
In Antrea, Prometheus metric deprecation policy is to flag them deprecated in `Antrea 0.N.0` by adding `(Deprecated since Antrea 0.N.0)` in the front of metric's help text. The release target of removing the deprecated metrics would be `Antrea 0.N+1.0`. We follow the policy given in the [Kubernetes Metrics Overhaul](https://github.com/kubernetes/enhancements/blob/master/keps/sig-instrumentation/20181106-kubernetes-metrics-overhaul.md#deprecation-plan) document. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's not make this part of this PR, I don't want it to go unnoticed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed this.
@srikartati pointed out to me that "hidden" metrics is a k8s concept and not a Prometheus concept. To implement this, we would need to expose the |
/test-all |
fbd01af
to
26dc5e9
Compare
Yes, we need to expose the show hidden flag and add some instrumentation code like K8s in Antrea to implement the deprecation life cycle given below: Instead, I opted for a simpler deprecation policy provided here in the Kubernetes overhaul metrics documentation: Agree that we can make this decision on Antrea metrics deprecation policy in 0.11.0 time frame. Set the Deprecated version field with Antrea version for now. |
26dc5e9
to
83777d6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, a couple more comments
docs/prometheus-integration.md
Outdated
@@ -69,7 +69,7 @@ rules: | |||
|
|||
### Antrea Components Scraping configuration | |||
Add the following jobs to Prometheus scraping configuration to enable metrics | |||
collection from Antrea components | |||
collection from Antrea components. Antrea Agent metrics endpoint is exposed through Antrea apiserver on `apiport` config parameter given in `antrea-agent.conf` (default value is 10350). Antrea Controller metrics endpoint is exposed through Antrea apiserver on `apiport` config parameter given in `antrea-controller.conf` (default value is 10349). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: line not wrapped
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
StabilityLevel: metrics.STABLE, | ||
deprecatedGaugeHost := metrics.NewGauge(&metrics.GaugeOpts{ | ||
Name: "antrea_agent_runtime_info", | ||
Help: "Antrea agent runtime info (Deprecated since Antrea 0.10.0), defined as labels. The value of the gauge is always set to 1.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my understanding was that by setting DeprecatedVersion
, K8s would automatically add the deprecation information to the help message. Is it not the case after all? @ksamoray @srikartati
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@antoninbas It only happens if metric is initialized through following function:
// Create forces the initialization of metric which has been deferred until
// the point at which this method is invoked. This method will determine whether
// the metric is deprecated or hidden, no-opting if the metric should be considered
// hidden. Furthermore, this function no-opts and returns true if metric is already
// created.
func (r *lazyMetric) Create(version *semver.Version) bool {
if version != nil {
r.determineDeprecationStatus(*version)
}
// let's not create if this metric is slated to be hidden
if r.IsHidden() {
return false
}
r.createOnce.Do(func() {
r.createLock.Lock()
defer r.createLock.Unlock()
r.isCreated = true
if r.IsDeprecated() {
r.self.initializeDeprecatedMetric()
} else {
r.self.initializeMetric()
}
})
return r.IsCreated()
}
Above function is called through only below initialization function:
func NewGaugeFunc(opts GaugeOpts, function func() float64)
go/pkg/mod/k8s.io/component-base@v0.18.4/metrics/gauge.go:189
In K8s 0.18.4 version, this initialization is provided only for Guage type and not for all types of metrics. In addition, most guage type metrics in K8s seemed to be using NewGuage which doesn't support this instrumentation yet. So we need to manually add in "HELP" statement.
May be this will be improved in future K8s version, but this instrumentation code works only for K8s version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks good to know. Let's see how this evolves in the future.
Deprecate run time info metrics that provide node name and pod name from Antrea 0.10.0. These metrics will be removed Antrea 0.11.0. In addition, antrea-prometheus.yaml is changed to add nodename and pod name to instance label instead of default IP:port. This will help with promql queries to search for metrics on a specific node or pod.
83777d6
to
f2628c5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/test-all |
/skip-hw-offload |
As this PR is about deprecating metrics, code coverage is not applicable. Merged this. |
Deprecate run time info metrics that provide node name and pod name
from Antrea 0.10.0. These metrics will be removed Antrea 0.11.0. Added
deprecation policy for the Prometheus metric in Antrea documentation.
In addition, antrea-prometheus.yaml is changed to add nodename and
pod name to the instance label instead of the default IP:port. This will help
with promql queries to search for metrics on a specific node or pod.