Enhance testing for prometheus metrics at agent #916

Merged (3 commits) on Jul 16, 2020

@srikartati (Member) commented Jul 6, 2020

Added tests for the following metrics to the existing integration tests:

  • antrea_agent_ovs_total_flow_count
  • antrea_agent_ovs_flow_count
  • antrea_agent_local_pod_count

Fixes #799
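To show roughly what checking one of these metrics involves, here is a sketch that reads a single sample out of Prometheus text-format output. `metricValue` is a hypothetical helper, not part of Antrea or the Prometheus client library. It assumes that sample lines carry no timestamps and that label values contain no escaped characters.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// metricValue scans Prometheus text-format output and returns the value of the
// sample whose name-and-labels part exactly matches series, for example
// `antrea_agent_ovs_flow_count{table_id="0"}`. HELP/TYPE comments and blank
// lines are skipped. Hypothetical helper: no timestamp or escape handling.
func metricValue(exposition, series string) (float64, error) {
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// The sample value follows the last space on the line.
		i := strings.LastIndex(line, " ")
		if i < 0 || line[:i] != series {
			continue
		}
		return strconv.ParseFloat(line[i+1:], 64)
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("series %q not found", series)
}
```

The exact-match comparison on the series string keeps the helper simple. It also means the label set must be written in the same order in which the registry emits it.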

@antrea-bot (Collaborator)

Thanks for your PR.
Unit tests and code linters are run automatically every time the PR is updated.
E2e, conformance and network policy tests can only be triggered by a member of the vmware-tanzu organization. Regular contributors to the project should join the org.

The following commands are available:

  • /test-e2e: to trigger e2e tests.
  • /skip-e2e: to skip e2e tests.
  • /test-conformance: to trigger conformance tests.
  • /skip-conformance: to skip conformance tests.
  • /test-whole-conformance: to trigger all conformance tests on linux.
  • /skip-whole-conformance: to skip all conformance tests on linux.
  • /test-networkpolicy: to trigger networkpolicy tests.
  • /skip-networkpolicy: to skip networkpolicy tests.
  • /test-windows-conformance: to trigger windows conformance tests.
  • /skip-windows-conformance: to skip windows conformance tests.
  • /test-all: to trigger all tests (except whole conformance).
  • /skip-all: to skip all tests (except whole conformance).

These commands can only be run by members of the vmware-tanzu organization.

@srikartati (Member Author)

/test-all

@ksamoray (Contributor) commented Jul 7, 2020

Hi @srikartati,
Do we still need the Prometheus tests in e2e?

@srikartati (Member Author)

Quoting @ksamoray: "Do we still need the Prometheus tests in e2e?"

These integration tests do not cover all metrics yet. In addition, the e2e test appears to exercise the workflow involving the Prometheus server. If tests like the ones above become available for all agent and controller metrics, we could perhaps remove TestPrometheusMetricsOnController and TestPrometheusMetricsOnAgent, while still keeping the Prometheus server-based tests.
Since you added the e2e test, you may have more insight into this.
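To illustrate the difference in scope: the integration tests check metric values in-process, whereas an endpoint-level check fetches the metrics over HTTP. The following is a hypothetical sketch that uses only the Go standard library. `scrapeContains` and `newFakeMetricsServer` are illustrative names, and the `/metrics` path is an assumption. The canned `httptest` server stands in for a real agent endpoint. This is not how TestPrometheusMetricsOnAgent or the Prometheus server-based tests are implemented.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// scrapeContains fetches a Prometheus text-format endpoint and reports whether
// a sample line for the given metric name is present.
func scrapeContains(url, metric string) (bool, error) {
	resp, err := http.Get(url)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return false, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return false, err
	}
	for _, line := range strings.Split(string(body), "\n") {
		// A sample line starts with the metric name followed by '{' or ' '.
		if strings.HasPrefix(line, metric+"{") || strings.HasPrefix(line, metric+" ") {
			return true, nil
		}
	}
	return false, nil
}

// newFakeMetricsServer serves a fixed body, standing in for a real metrics endpoint.
func newFakeMetricsServer(body string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "text/plain; version=0.0.4")
		fmt.Fprint(w, body)
	}))
}
```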

@srikartati requested a review from @tnqn on July 8, 2020 23:34
@tnqn (Member) previously approved these changes on Jul 9, 2020 and left a comment:

LGTM. Could you add a space between "Fixes" and "#NUM"? Otherwise it doesn't link to the issue and cannot auto-close it.

@srikartati (Member Author)

/test-all

@srikartati (Member Author)

/test-conformance
/test-networkpolicy
/test-windows-networkpolicy

@srikartati (Member Author)

/test-conformance

@srikartati (Member Author)

/test-networkpolicy

pkg/agent/metrics/prometheus.go (outdated, resolved)
Makefile (resolved)
@@ -777,6 +783,14 @@ func TestCNIServerChaining(t *testing.T) {
testRequire.Nil(err)
testRequire.Equal(tc.networkConfig, string(cniResp.CniResult))

// Check pod count metric
Contributor:

I am definitely not in favor of piggybacking a metrics test onto a highly "specialized" test like TestCNIServerChaining. There should be a dedicated test case for metrics.

Member Author:

Thanks for the comment. I will try an alternative way.

@@ -510,6 +523,60 @@ func checkConjunctionFlows(t *testing.T, ruleTable uint8, dropTable uint8, allow
}
}

func checkOVSFlowMetrics(t *testing.T, installPolicyRules bool) {
Contributor:

This feels like a clunky test to me, and a bit hard to maintain.

Would the following be at all possible: simply count the flows in all the OVS tables and check that the metrics match?

Member Author:

I will check if that can be done.
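The reviewer's suggestion amounts to a consistency check: each table's flow-count metric should equal that table's flow count, and the total metric should equal their sum. Below is a minimal sketch. It assumes a simplified `tableStatus` type whose `ID` and `FlowCount` field names come from the diff excerpts in this PR; the concrete field types are guesses. It also assumes the metric values have already been read into plain integers. `checkFlowCountMetrics` is a hypothetical name.

```go
package main

import "fmt"

// tableStatus is a hypothetical stand-in for the per-table OVS status.
type tableStatus struct {
	ID        uint8
	FlowCount uint32
}

// checkFlowCountMetrics compares per-table and total flow-count metric values
// against the flow counts reported for each OVS table.
func checkFlowCountMetrics(tables []tableStatus, perTable map[uint8]int, total int) error {
	sum := 0
	for _, t := range tables {
		got, ok := perTable[t.ID]
		if !ok {
			return fmt.Errorf("no flow count metric for table %d", t.ID)
		}
		if got != int(t.FlowCount) {
			return fmt.Errorf("table %d: metric %d, OVS %d", t.ID, got, t.FlowCount)
		}
		sum += int(t.FlowCount)
	}
	if total != sum {
		return fmt.Errorf("total flow count metric %d, sum over tables %d", total, sum)
	}
	return nil
}
```

Because the check is derived entirely from the table status, it does not need to be updated when the set of flows installed by a test changes. That addresses the maintainability concern.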

@srikartati (Member Author)

/test-all

}
for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
if strings.Contains(tc.name, "Prometheus") {
Contributor:

Why don't we add a new validateMetrics boolean field to the testCase struct?

Member Author:

I had gone with a slightly hacky approach since there is only one such test case. I have now added a boolean field to the testCase struct.
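Here is a minimal sketch of the reviewer's suggestion. All names are hypothetical. An explicit `validateMetrics` field selects the cases that also check metrics, instead of matching on `strings.Contains(tc.name, "Prometheus")`. The real `testCase` struct in cniserver_test.go has other fields, which are omitted here.

```go
package main

// testCase keeps only the fields relevant to this illustration.
type testCase struct {
	name            string
	validateMetrics bool // set explicitly instead of encoding intent in the name
}

// casesValidatingMetrics returns the names of the cases that would run the
// additional metric checks.
func casesValidatingMetrics(cases []testCase) []string {
	var names []string
	for _, tc := range cases {
		if tc.validateMetrics {
			names = append(names, tc.name)
		}
	}
	return names
}
```

A boolean field keeps test behavior independent of test naming, so renaming a case cannot silently disable its metric checks.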

test/integration/agent/cniserver_test.go (resolved)
@@ -84,6 +88,9 @@ type testConfig struct {
}

func TestConnectivityFlows(t *testing.T) {
// Initialize ovs metrics (prometheus) to test them
Contributor:

s/prometheus/Prometheus

same below

@@ -84,6 +88,9 @@ type testConfig struct {
}

func TestConnectivityFlows(t *testing.T) {
// Initialize ovs metrics (prometheus) to test them
metrics.InitializeOVSMetrics()
Contributor:

I see that we call metrics.InitializeOVSMetrics() in multiple tests. I assume there is no issue with calling the function multiple times?

Member Author:

Yes, I verified this by calling it multiple times in the same test. If the metric is already registered, the registry ignores subsequent calls.
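The reply relies on the registry tolerating repeated registration. Another way to make an initializer safe to call from several tests, independent of registry behavior, is to guard it with `sync.Once`. This is a swapped-in technique shown for comparison, not how `metrics.InitializeOVSMetrics()` is implemented. `initializeMetricsOnce` and `initRuns` are hypothetical.

```go
package main

import "sync"

var (
	initOnce sync.Once
	initRuns int // counts how many times the registration body actually ran
)

// initializeMetricsOnce runs its body on the first call only, so repeated
// calls from different tests are harmless.
func initializeMetricsOnce() {
	initOnce.Do(func() {
		initRuns++
		// Real code would register collectors with a metrics registry here.
	})
}
```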

totalFlowCount := 0
for _, table := range tableStatus {
expectedFlowCount = expectedFlowCount +
`antrea_agent_ovs_flow_count{table_id="` + strconv.Itoa(int(table.ID)) + `"} ` + strconv.Itoa(int(table.FlowCount)) + "\n"
Contributor:

Would this be more readable with fmt.Sprintf? What do you think?

Member Author:

That is a good suggestion. Done.

Added testing for the following metrics in integration tests:
antrea_agent_ovs_total_flow_count
antrea_agent_ovs_flow_count
antrea_agent_local_pod_count

Fixes#799
When running the integration tests on macOS, I hit an error caused by an old local Docker image for antrea/openvswitch:2.13.0. The Makefile now explicitly pulls that image, which is used as the base image for the test container image.
@antrea-bot (Collaborator)

Thanks for your PR.
Unit tests and code linters are run automatically every time the PR is updated.
E2e, conformance and network policy tests can only be triggered by a member of the vmware-tanzu organization. Regular contributors to the project should join the org.

The following commands are available:

  • /test-e2e: to trigger e2e tests.
  • /skip-e2e: to skip e2e tests.
  • /test-conformance: to trigger conformance tests.
  • /skip-conformance: to skip conformance tests.
  • /test-whole-conformance: to trigger all conformance tests on linux.
  • /skip-whole-conformance: to skip all conformance tests on linux.
  • /test-networkpolicy: to trigger networkpolicy tests.
  • /skip-networkpolicy: to skip networkpolicy tests.
  • /test-windows-conformance: to trigger windows conformance tests.
  • /skip-windows-conformance: to skip windows conformance tests.
  • /test-windows-networkpolicy: to trigger windows networkpolicy tests.
  • /skip-windows-networkpolicy: to skip windows networkpolicy tests.
  • /test-all: to trigger all tests (except whole conformance).
  • /skip-all: to skip all tests (except whole conformance).

These commands can only be run by members of the vmware-tanzu organization.

@srikartati (Member Author) commented Jul 15, 2020

/test-whole-conformance
/test-all

@srikartati (Member Author)

/test-conformance
/test-windows-conformance

@srikartati (Member Author)

/test-networkpolicy
/test-windows-networkpolicy

@antoninbas (Contributor) left a comment:

One more nit; otherwise LGTM.

tableStatus := client.GetFlowTableStatus()
totalFlowCount := 0
for _, table := range tableStatus {
expectedFlowCount = expectedFlowCount + fmt.Sprintf("antrea_agent_ovs_flow_count{table_id=\"%s\"} %s\n", strconv.Itoa(int(table.ID)), strconv.Itoa(int(table.FlowCount)))
Contributor:

You can use %d; then you no longer need the casts to int or the calls to strconv.Itoa.

Contributor:

same below

Member Author:

Done. Missed this. Thanks.
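With `%d`, fmt formats the integer fields directly, so the formatting calls no longer need `int(...)` conversions or `strconv.Itoa`. Below is a sketch of building the expected per-table and total flow-count lines this way. It reuses a hypothetical `tableStatus` type whose concrete field types are guesses. It also assumes the expected text contains only sample lines, without the HELP/TYPE lines that a real comparison might include. `expectedFlowMetrics` is an illustrative name.

```go
package main

import (
	"fmt"
	"strings"
)

// tableStatus is a hypothetical stand-in for the per-table OVS status.
type tableStatus struct {
	ID        uint8
	FlowCount uint32
}

// expectedFlowMetrics returns the expected per-table sample lines and the
// expected total sample line. %d formats uint8 and uint32 values directly.
func expectedFlowMetrics(tables []tableStatus) (perTable, total string) {
	var b strings.Builder
	var sum uint32
	for _, t := range tables {
		fmt.Fprintf(&b, "antrea_agent_ovs_flow_count{table_id=\"%d\"} %d\n", t.ID, t.FlowCount)
		sum += t.FlowCount
	}
	return b.String(), fmt.Sprintf("antrea_agent_ovs_total_flow_count %d\n", sum)
}
```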

@srikartati (Member Author)

/test-all

@antoninbas (Contributor) left a comment:

LGTM, thanks for making the changes

@srikartati merged commit 4aadebb into antrea-io:master on Jul 16, 2020
GraysonWu pushed a commit to GraysonWu/antrea that referenced this pull request Sep 22, 2020
* Enhance testing for prometheus metrics at agent

* Changes to Makefile to run integration test on MacOS

* Addressed latest comments
Linked issue closed by this pull request: Robust testing for prometheus metrics in Agent and Controller
6 participants