-
Notifications
You must be signed in to change notification settings - Fork 371
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhance testing for prometheus metrics at agent #916
Conversation
Thanks for your PR. The following commands are available:
These commands can only be run by members of the vmware-tanzu organization. |
/test-all |
ab2abea
to
8509700
Compare
Hi @srikartati, |
These tests do not cover all metrics yet. In addition, e2e test seems to be testing the workflow involving the Prometheus server. Maybe if the above tests are available for all agent and controller metrics, we could remove TestPrometheusMetricsOnController and TestPrometheusMetricsOnAgent, but still keep Prometheus server-based tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, could you add a space between "Fixes" and "#NUM"? otherwise it doesn't link to the issue and cannot auto close it.
/test-all |
/test-conformance |
/test-conformance |
/test-networkpolicy |
@@ -777,6 +783,14 @@ func TestCNIServerChaining(t *testing.T) { | |||
testRequire.Nil(err) | |||
testRequire.Equal(tc.networkConfig, string(cniResp.CniResult)) | |||
|
|||
// Check pod count metric |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am definitely not in favor of piggybacking a metrics test in a highly "specialized" test like TestCNIServerChaining
. There should be a dedicated test case for metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comment. I will try an alternative way.
@@ -510,6 +523,60 @@ func checkConjunctionFlows(t *testing.T, ruleTable uint8, dropTable uint8, allow | |||
} | |||
} | |||
|
|||
func checkOVSFlowMetrics(t *testing.T, installPolicyRules bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this feels like a clunky test to me, and a bit hard to maintain
would the following be at all possible: simply count the flows in all the OVS tables and check that the metrics match?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will check if that can be done.
94e0ea3
to
3f67ae0
Compare
/test-all |
} | ||
for _, tc := range testCases { | ||
t.Run(tc.name, func(t *testing.T) { | ||
if strings.Contains(tc.name, "Prometheus") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why don't we add a new validateMetrics
boolean field to the testCase
struct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just went with slightly hacky approach, since it is only one testcase. Added a boolean in testcase.
@@ -84,6 +88,9 @@ type testConfig struct { | |||
} | |||
|
|||
func TestConnectivityFlows(t *testing.T) { | |||
// Initialize ovs metrics (prometheus) to test them |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/prometheus/Prometheus
same below
@@ -84,6 +88,9 @@ type testConfig struct { | |||
} | |||
|
|||
func TestConnectivityFlows(t *testing.T) { | |||
// Initialize ovs metrics (prometheus) to test them | |||
metrics.InitializeOVSMetrics() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that we call metrics.InitializeOVSMetrics()
in multiple tests. I assume there is no issue with calling the function multiple times?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I made sure by calling this multiple times in same test. If the metric is already registered, the registry ignores subsequent calls.
totalFlowCount := 0 | ||
for _, table := range tableStatus { | ||
expectedFlowCount = expectedFlowCount + | ||
`antrea_agent_ovs_flow_count{table_id="` + strconv.Itoa(int(table.ID)) + `"} ` + strconv.Itoa(int(table.FlowCount)) + "\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would maybe be more readable with fmt.Sprintf
? what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good suggestion. Done.
Added testing for following metrics in integration tests: antrea_agent_ovs_total_flow_count antrea_agent_ovs_flow_count antrea_agent_local_pod_count Fixes#799
When running integration test on MacOS, hit an error because of an old docker image for antrea/openvswitch:2.13.0. Pulling that docker image explicitly which is used as base image for test container image.
3f67ae0
to
2818324
Compare
Thanks for your PR. The following commands are available:
These commands can only be run by members of the vmware-tanzu organization. |
/test-whole-conformance |
/test-conformance |
/test-networkpolicy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one more nit, otherwise LGTM
tableStatus := client.GetFlowTableStatus() | ||
totalFlowCount := 0 | ||
for _, table := range tableStatus { | ||
expectedFlowCount = expectedFlowCount + fmt.Sprintf("antrea_agent_ovs_flow_count{table_id=\"%s\"} %s\n", strconv.Itoa(int(table.ID)), strconv.Itoa(int(table.FlowCount))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use %d
and then you no longer need to the casts to int
and the calls to strconv.Itoa
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Missed this. Thanks.
2818324
to
504f64c
Compare
/test-all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for making the changes
* Enhance testing for prometheus metrics at agent Added testing for following metrics in integration tests: antrea_agent_ovs_total_flow_count antrea_agent_ovs_flow_count antrea_agent_local_pod_count Fixes#799 * Changes to Makefile to run integration test on MacOS When running integration test on MacOS, hit an error because of an old docker image for antrea/openvswitch:2.13.0. Pulling that docker image explicitly which is used as base image for test container image. * Addressed latest comments
Added testing for following metrics in existing integration tests:
antrea_agent_ovs_total_flow_count
antrea_agent_ovs_flow_count
antrea_agent_local_pod_count
Fixes #799