Wait for 5 seconds when getting of_port number #830

Merged

antoninbas
Contributor

Instead of 1 second on Linux. We have observed on some production
clusters that it sometimes takes more than one second for ovs-vswitchd
to report the port number to OVSDB, although we are not yet sure why.
Because the wait operation returns as soon as the port is available,
this does not increase the execution time of CNI Add in the general case.
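
To illustrate why a longer timeout does not slow down the common case, here is a minimal sketch in Go. It is not Antrea's code: it substitutes a client-side polling loop for the OVSDB wait operation, and every name in it (waitForOFPort, portReader, readyAfter, ofPortTimeout, pollInterval) is hypothetical. The timeout is only an upper bound; the function returns as soon as the port number is reported.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Hypothetical values: the 5 second bound mirrors this PR, the polling
// period is an assumption made for the sketch.
const (
	ofPortTimeout = 5 * time.Second
	pollInterval  = 10 * time.Millisecond
)

var errTimeout = errors.New("timed out waiting for of_port")

// portReader stands in for a query against OVSDB. ok stays false until
// the port number has been reported.
type portReader func() (port int32, ok bool)

// waitForOFPort polls read until it reports a port or the timeout expires.
// It returns immediately once the port is available.
func waitForOFPort(read portReader, timeout, interval time.Duration) (int32, error) {
	deadline := time.Now().Add(timeout)
	for {
		if port, ok := read(); ok {
			return port, nil
		}
		// Give up if the next poll would happen after the deadline.
		if time.Now().Add(interval).After(deadline) {
			return 0, errTimeout
		}
		time.Sleep(interval)
	}
}

// readyAfter returns a fake reader that reports the port on the n-th call.
func readyAfter(n int, port int32) portReader {
	calls := 0
	return func() (int32, bool) {
		calls++
		return port, calls >= n
	}
}

func main() {
	start := time.Now()
	port, err := waitForOFPort(readyAfter(3, 7), ofPortTimeout, pollInterval)
	fmt.Println("port:", port, "err:", err)
	fmt.Println("returned before the timeout:", time.Since(start) < ofPortTimeout)
}
```

With the fake reader above, the call returns after a few polling periods even though the bound is 5 seconds. Only a port that is never reported pays the full timeout.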

@antrea-bot
Collaborator

Thanks for your PR.
Unit tests and code linters are run automatically every time the PR is updated.
E2e, conformance and network policy tests can only be triggered by a member of the vmware-tanzu organization. Regular contributors to the project should join the org.

The following commands are available:

  • /test-e2e: to trigger e2e tests.
  • /skip-e2e: to skip e2e tests.
  • /test-conformance: to trigger conformance tests.
  • /skip-conformance: to skip conformance tests.
  • /test-networkpolicy: to trigger networkpolicy tests.
  • /skip-networkpolicy: to skip networkpolicy tests.
  • /test-windows-conformance: to trigger windows conformance tests.
  • /skip-windows-conformance: to skip windows conformance tests.
  • /test-all: to trigger all tests.
  • /skip-all: to skip all tests.

These commands can only be run by members of the vmware-tanzu organization.

@antoninbas antoninbas requested a review from jianjuns June 12, 2020 00:23
Contributor

@jianjuns jianjuns left a comment

I am fine with this quick solution.
Have we measured how long it takes for the container to start after CNI ADD returns? If it takes long, one possibility is to make the OFPort read asynchronous so that it does not block CNI ADD. But if the container can start quickly, this could increase the risk of application failures right after startup.
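
A rough sketch of what the asynchronous variant could look like, in Go. This only illustrates the idea and is not a proposed implementation: lookupOFPort, cniAddAsync, portResult and lookupDelay are hypothetical names, and the lookup is simulated with a sleep.

```go
package main

import (
	"fmt"
	"time"
)

// lookupDelay is an assumed delay before the port number is reported.
const lookupDelay = 50 * time.Millisecond

// portResult carries the outcome of the deferred lookup.
type portResult struct {
	port int32
	err  error
}

// lookupOFPort stands in for the blocking OVSDB query (simulated here).
func lookupOFPort(delay time.Duration) (int32, error) {
	time.Sleep(delay)
	return 7, nil
}

// cniAddAsync returns right away. The port number is delivered later on
// the returned channel, which is buffered so the goroutine never blocks.
func cniAddAsync(delay time.Duration) <-chan portResult {
	done := make(chan portResult, 1)
	go func() {
		port, err := lookupOFPort(delay)
		done <- portResult{port: port, err: err}
	}()
	return done
}

func main() {
	done := cniAddAsync(lookupDelay)
	fmt.Println("CNI ADD returned; the container may start now")

	// Anything that depends on the port number has to wait here. This is
	// the window in which a fast-starting container has no connectivity yet.
	res := <-done
	fmt.Println("of_port resolved:", res.port, "err:", res.err)
}
```

The gap between the two prints is the risk mentioned above: the container can be running before the port number is known.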
@tnqn

@tnqn
Member

tnqn commented Jun 15, 2020

> I am fine with this quick solution.
> Have we measured how long it takes for the container to start after CNI ADD returns? If it takes long, one possibility is to make the OFPort read asynchronous so that it does not block CNI ADD. But if the container can start quickly, this could increase the risk of application failures right after startup.
> @tnqn

Setting up the network is the last step in creating the sandbox container, and the next step is creating the init container (or a regular container), so the container can start very soon after that.
I tested it with containerd, and the container started about 335 ms after CNI ADD returned:
12:23:51.417207 CNI ADD returns
12:23:51.752115 containerd returns "StartContainer for ... returns successfully"

I feel the original 1 second might be too short to get a response, given that the worker Node could be overloaded. I see that OpenStack sets the vsctl timeout to 10 seconds by default and Ansible sets it to 5 seconds.

I think we could add Prometheus metrics for the durations of CNI requests to understand how long they take.
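
As a rough idea of what such a metric would capture, here is a standard-library-only sketch in Go. It is a stand-in for a Prometheus histogram, not the Prometheus client API: durationHistogram, newCNIHistogram, the bucket bounds and the sample durations are all hypothetical. Unlike Prometheus buckets, the counts here are not cumulative.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Hypothetical sample durations used to exercise the buckets.
const (
	sampleQuick = 200 * time.Millisecond
	sampleSlow  = 3 * time.Second
	sampleStuck = 8 * time.Second
)

// durationHistogram counts observations per bucket. counts[i] holds the
// observations that are <= bounds[i] and above the previous bound; the
// last entry counts everything above the largest bound.
type durationHistogram struct {
	mu     sync.Mutex
	bounds []time.Duration
	counts []int
}

// newCNIHistogram uses assumed bounds of 1s and 5s, the old and new timeouts.
func newCNIHistogram() *durationHistogram {
	bounds := []time.Duration{time.Second, 5 * time.Second}
	return &durationHistogram{bounds: bounds, counts: make([]int, len(bounds)+1)}
}

// observe records one request duration in the matching bucket.
func (h *durationHistogram) observe(d time.Duration) {
	h.mu.Lock()
	defer h.mu.Unlock()
	for i, b := range h.bounds {
		if d <= b {
			h.counts[i]++
			return
		}
	}
	h.counts[len(h.bounds)]++
}

// timed runs fn and records how long it took.
func (h *durationHistogram) timed(fn func()) {
	start := time.Now()
	fn()
	h.observe(time.Since(start))
}

// snapshot returns a copy of the bucket counts.
func (h *durationHistogram) snapshot() []int {
	h.mu.Lock()
	defer h.mu.Unlock()
	return append([]int(nil), h.counts...)
}

func main() {
	h := newCNIHistogram()
	// The sleep stands in for handling one CNI ADD request.
	h.timed(func() { time.Sleep(10 * time.Millisecond) })
	h.observe(sampleSlow)
	h.observe(sampleStuck)
	fmt.Println("bucket counts:", h.snapshot())
}
```

Requests that land in the middle or last bucket would be the ones that failed under the old 1 second timeout, which is the information needed to pick a sensible bound.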

@antoninbas
Contributor Author

/test-all

@antoninbas
Contributor Author

/test-windows-conformance

@antoninbas
Contributor Author

/test-windows-conformance

@antoninbas
Contributor Author

Windows job seems stuck. I will merge this now but will make sure the test passes before releasing 0.7.2.

@antoninbas antoninbas merged commit cd225e1 into antrea-io:master Jun 15, 2020
@antoninbas antoninbas deleted the wait-longer-when-getting-of_port branch June 15, 2020 18:40