-
Notifications
You must be signed in to change notification settings - Fork 367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add agent networkpolicy metrics #770
Add agent networkpolicy metrics #770
Conversation
Thanks for your PR. The following commands are available:
These commands can only be run by members of the vmware-tanzu organization. |
) | ||
|
||
var ( | ||
IngressNetworkPolicyCount = metrics.NewGauge(&metrics.GaugeOpts{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to exclude these metrics from pkg/agent/metrics/prometheus.go? Or at least from pkg/agent/metrics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with Kobi. These should be part of pkg/agent/metrics IMO.
May be a separate directory like pkg/agent/metrics/networkpolicy will be good.
In future, we can possibly add a way to enable/disable specific metrics in antrea agent. What do you think @ksamoray ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason is to avoid import cycle.
pkg/agent/metrics/prometheus.go use pkg/agent/openflow package
I tried that pkg/agent//openflow/network_policy.go use metrics/prometheus.go, which caused the loop.
What was on my mind was
- To implement antrea_agent_ovs_flow_table wihtout using openflow package in prometheus.go
- Use another package not to have import cycle.
I think 1st one is better considering the future use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do the following to resolve the dependency? Have separate directories for definition and callback (if needed) for metrics, and just do initialization in prometheus.go
pkg/agent/metrics/ovs --for pod count (ovs container interface count) and ovs flow table count
pkg/agent/metrics/networkpolicy---for network policy related metrics
pkg/agent/metrics/prometheus.go---for initializing/registration of the metrics. In future, we can selectively register the metrics.
Better to get opinion of @ksamoray too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your comment. Let me check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing that importing pkg/agent/openflow wasn't a good practice on my side.
If possible, I think that it's better to avoid a case where some of the metric implementations import pkg/agent/metrics while others are imported by.
Having said that, it could be painful to reimplement antrea_agent_ovs_flow_table while walking around openflow package. Therefore I think that for now you're solution is valid and so is the proposal by @srikartati
Unless someone has a different proposal - e.g maintaining flow counts as counter metrics instead of using a callback. My openflow package it too limited to conclude this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for registering metrics selectively, that would require a whole lot of configuration flags. Maybe a few individual metrics which their collection is costly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ksamoray I guess it depends on how we group/classify various metrics and interface we provide to enable those groups. Moreover, muxing of metrics is to provide flexibility to the user who is interested in subset of metrics rather than complexity in tracking metrics.
@yktsubo You probably already saw PR#781. I tried to remove the dependency with other packages. You can now define the new metrics in metrics package. Also, ok with not having separate package for network policy metrics in the current patch. Grouping/classification may probably need more discussion and work.
test/e2e/prometheus_test.go
Outdated
@@ -36,6 +36,8 @@ const monitoringNamespace string = "monitoring" | |||
var antreaAgentMetrics = []string{ | |||
"antrea_agent_local_pod_count", | |||
"antrea_agent_ovs_flow_table", | |||
"antrea_agent_ingress_networkpolicy_rule", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had that list sorted, IDK if that has any importance though :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the comment, I'll change based on your suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this.
Wondering how you were able to verify these metrics through e2e test? As in, were mock network policies and rules are added and verified?
) | ||
|
||
var ( | ||
IngressNetworkPolicyCount = metrics.NewGauge(&metrics.GaugeOpts{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with Kobi. These should be part of pkg/agent/metrics IMO.
May be a separate directory like pkg/agent/metrics/networkpolicy will be good.
In future, we can possibly add a way to enable/disable specific metrics in antrea agent. What do you think @ksamoray ?
pkg/agent/openflow/network_policy.go
Outdated
@@ -738,6 +739,14 @@ func (c *client) InstallPolicyRuleFlows(ruleID uint32, rule *types.PolicyRule, n | |||
if err := c.applyConjunctiveMatchFlows(ctxChanges); err != nil { | |||
return err | |||
} | |||
|
|||
// Count up antrea_agent_ingress_networkpolicy_rule or antrea_agent_egress_networkpolicy_rule |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not installOFRule function in pkg/agent/controller/networkpolicy/reconciler.go?
This function could be used to track number of network policies in addition to the current metric, number of network policy rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the suggestion, it sounds great. I'll take a look into it.
pkg/agent/openflow/network_policy.go
Outdated
@@ -944,6 +953,21 @@ func (c *client) UninstallPolicyRuleFlows(ruleID uint32) error { | |||
return err | |||
} | |||
|
|||
isFound := false | |||
for _, ctxChange := range ctxChanges { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ctxChanges seem to be some what complicated.
Is it possible to change following function in reconciler.go to accept Direction or pointer to rule as installOFRule?
func (r *reconciler) uninstallOFRule(ofID uint32)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your suggestion, let me take a look into it.
bd4b623
to
35651c8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any ideas on testing these metrics if they are tracked correctly?
I see e2e test do not have a way for accuracy of metrics. Looks like it is just tracking the list of metrics.
May be we could create some pods and then use createNetworkPolicy/deleteNetworkpolicy and check for the value of metrics in prometheus e2e test. What do you think?
@@ -368,6 +369,14 @@ func (r *reconciler) installOFRule(ofRule *types.PolicyRule, npName, npNamespace | |||
r.idAllocator.release(ofID) | |||
return 0, fmt.Errorf("error installing ofRule %v: %v", ofID, err) | |||
} | |||
|
|||
// Count up antrea_agent_ingress_networkpolicy_rule_count or antrea_agent_egress_networkpolicy_rule_count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at it more closely, installOFRule is also invoked for update of network policy rules (through "update" function). We may double count the rules. Please look at "add" function to get the correct count.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, Actually I wanted to confirm the point whether we have a case to increment or decrement counters in update.
@@ -398,11 +407,19 @@ func (r *reconciler) updateOFRule(ofID uint32, addedFrom []types.Address, addedT | |||
return nil | |||
} | |||
|
|||
func (r *reconciler) uninstallOFRule(ofID uint32) error { | |||
func (r *reconciler) uninstallOFRule(ofID uint32, flowDirection v1beta1.Direction) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here as well, please look at "Forget" function that calls uninstallOFRule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'll work on it.
@srikartati for some of the metrics we can validate with the numbers in the controller, agent CRDs. |
We are adding these metrics by hooking the tracking code in the existing code. IMO testing is necessary to get the code right and will be easily maintainable if there any future changes in existing code. Therefore, I do not completely agree that this is out of scope in this PR. As pointed out in my previous comment, testSetup in Prometheus e2e test can be extended to deploy/delete pods, create/delete network policies etc. At least metrics such as network policy, pod count can be covered. In addition, we may also antctl (tests/e2e/antctl_test) to get the expected values from agent/controller to compare with metrics gathered by API. |
Added a new issue for improving testing: #799 |
@ksamoray @srikartati |
Add ingress and egress networkpolicy metrics Signed-off-by: Yuki Tsuboi <ytsuboi@vmware.com>
Add networkpolicy metrics in agent Signed-off-by: Yuki Tsuboi <ytsuboi@vmware.com>
Update register function based on kubernetes library upgrade Signed-off-by: Yuki Tsuboi <ytsuboi@vmware.com>
Update metric name Signed-off-by: Yuki Tsuboi <ytsuboi@vmware.com>
Signed-off-by: Yuki Tsuboi <ytsuboi@vmware.com>
Signed-off-by: Yuki Tsuboi <ytsuboi@vmware.com>
Signed-off-by: Yuki Tsuboi <ytsuboi@vmware.com>
Use reconcier to implement networkpolicy metrics Signed-off-by: Yuki Tsuboi <ytsuboi@vmware.com>
Change based on rebase Signed-off-by: Yuki Tsuboi <ytsuboi@vmware.com>
Signed-off-by: Yuki Tsuboi <ytsuboi@vmware.com>
Signed-off-by: Yuki Tsuboi <ytsuboi@vmware.com>
Delete unncessary lines Signed-off-by: Yuki Tsuboi <ytsuboi@vmware.com>
Apply gofmt to pkg/agent/metrics/prometheus.go Signed-off-by: Yuki Tsuboi <ytsuboi@vmware.com>
Use add and forget to update agent networkpolicy metrics Signed-off-by: Yuki Tsuboi <ytsuboi@vmware.com>
Delete unnecessary blanks Signed-off-by: Yuki Tsuboi <ytsuboi@vmware.com>
Fix metrics name Signed-off-by: Yuki Tsuboi <ytsuboi@vmware.com>
Add direction in unit test Signed-off-by: Yuki Tsuboi <ytsuboi@vmware.com>
5ed689e
to
80b3c83
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing the comments. Were you able to do any manual testing?
Hi @srikartati If we you need me to put some testing code in networkpolicy unit test, I can think about it. |
As discussed before, we can take this up in follow up patch standalone or other metrics: #799 LGTM. It'd be better to get approval from @ksamoray |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/test-all |
* Add ingress and egress networkpolicy metrics Add ingress and egress networkpolicy metrics Signed-off-by: Yuki Tsuboi <ytsuboi@vmware.com> * Add networkpolicy metrics in agent Add networkpolicy metrics in agent Signed-off-by: Yuki Tsuboi <ytsuboi@vmware.com> * Update register function based on kubernetes library upgrade Update register function based on kubernetes library upgrade Signed-off-by: Yuki Tsuboi <ytsuboi@vmware.com> * Update metric name Update metric name Signed-off-by: Yuki Tsuboi <ytsuboi@vmware.com> * Delete unnecessary blanks Signed-off-by: Yuki Tsuboi <ytsuboi@vmware.com> * Add metrics to testcase Signed-off-by: Yuki Tsuboi <ytsuboi@vmware.com> * Gofmt for agent prometheus.go and metric_cache.go Signed-off-by: Yuki Tsuboi <ytsuboi@vmware.com> * Use reconcier to implement networkpolicy metrics Use reconcier to implement networkpolicy metrics Signed-off-by: Yuki Tsuboi <ytsuboi@vmware.com> * Change based on rebase Change based on rebase Signed-off-by: Yuki Tsuboi <ytsuboi@vmware.com> * Delete unncessary package from go.sum Signed-off-by: Yuki Tsuboi <ytsuboi@vmware.com> * Fix mistake by rebase Signed-off-by: Yuki Tsuboi <ytsuboi@vmware.com> * Delete unncessary lines Delete unncessary lines Signed-off-by: Yuki Tsuboi <ytsuboi@vmware.com> * Apply gofmt to pkg/agent/metrics/prometheus.go Apply gofmt to pkg/agent/metrics/prometheus.go Signed-off-by: Yuki Tsuboi <ytsuboi@vmware.com> * Use add and forget to update agent networkpolicy metrics Use add and forget to update agent networkpolicy metrics Signed-off-by: Yuki Tsuboi <ytsuboi@vmware.com> * Delete unnecessary blanks Delete unnecessary blanks Signed-off-by: Yuki Tsuboi <ytsuboi@vmware.com> * Fix metrics name Fix metrics name Signed-off-by: Yuki Tsuboi <ytsuboi@vmware.com> * Add direction in unit test Add direction in unit test Signed-off-by: Yuki Tsuboi <ytsuboi@vmware.com>
Add agent metrics of total number of ingress/egress network policy rules installed on the node.
The number of ingress networkpolicy rules installed on the node
The number of egress networkpolicy rules installed on the node
This PR is a part of #713.