Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ETCD-180: Bug 1927942: UPSTREAM: <carry>: *: add support for socket options #70

Merged

Conversation

hexfusion
Copy link

@hexfusion hexfusion commented Mar 9, 2021

This is a manual backport of etcd-io#12702

This PR adds support for setting socket options SO_REUSEADDR and SO_REUSEPORT to etcd listeners via ListenConfig. These options give the flexibility to cluster admins who wish to more explicit control of these features. What we have found is during etcd process restart there can be a considerable time waiting for the port to release as it is held open by TIME_WAIT which on many systems is 60s.

--socket-reuse-port: enables SO_REUSEPORT [1]  allows rebind of a port already in use. 
--socket-reuse-address: enables SO_REUSEADDR [1]  allows binding to an address in `TIME_WAIT` state. 
$ ps aux | grep etcd | wc -l
0

$  ss --numeric -o state time-wait 
tcp    0       0                 10.0.138.12:34642            10.0.138.12:2380   timer:(timewait,50sec,0)                                                       
tcp    0       0                 10.0.138.12:34470            10.0.138.12:2380   timer:(timewait,35sec,0)                                                       
tcp    0       0                 10.0.138.12:34086            10.0.138.12:2380   timer:(timewait,10sec,0)                                                       
tcp    0       0                 10.0.138.12:34232            10.0.138.12:2380   timer:(timewait,20sec,0)                                                       
tcp    0       0                 10.0.138.12:34694            10.0.138.12:2380   timer:(timewait,55sec,0)                                                       
tcp    0       0                 10.0.138.12:34332            10.0.138.12:2380   timer:(timewait,25sec,0)                                                       
tcp    0       0                 10.0.138.12:33966            10.0.138.12:2380   timer:(timewait,845ms,0)                                                       
tcp    0       0                 10.0.138.12:34040            10.0.138.12:2380   timer:(timewait,5.845ms,0)                                                     
tcp    0       0                 10.0.138.12:34184            10.0.138.12:2380   timer:(timewait,15sec,0)                                                       
tcp    0       0                 10.0.138.12:34570            10.0.138.12:2380   timer:(timewait,45sec,0)                                                       
tcp    0       0                 10.0.138.12:34512            10.0.138.12:2380   timer:(timewait,40sec,0)                                                       
tcp    0       0                 10.0.138.12:34398            10.0.138.12:2380   timer:(timewait,30sec,0)  

So we can wait for many seconds even after etcd process is long dead for client and peer ports to become available.

2021-02-19 13:57:25.117257 C | etcdmain: listen tcp 10.0.138.12:2380: bind: address already in use

A similar approach has been taken with the addition of these options in the kube-apiserver[2]

[1] https://man7.org/linux/man-pages/man7/socket.7.html
[2] kubernetes/kubernetes#93861

@openshift-ci-robot
Copy link

@hexfusion: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

In response to this:

UPSTREAM: : *: add support for socket options

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 9, 2021
@openshift-ci-robot
Copy link

@hexfusion: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

In response to this:

UPSTREAM: : *: add support for socket options

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

1 similar comment
@openshift-ci-robot
Copy link

@hexfusion: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

In response to this:

UPSTREAM: : *: add support for socket options

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@hexfusion
Copy link
Author

etcd appears fine

/retest

@hexfusion
Copy link
Author

/test openshift-perfscale-e2e-configmap-scale

@openshift-ci-robot
Copy link

@hexfusion: The specified target(s) for /test were not found.
The following commands are available to trigger jobs:

  • /test configmap-scale
  • /test e2e-aws
  • /test e2e-aws-serial
  • /test e2e-aws-upgrade
  • /test images
  • /test unit

Use /test all to run the following jobs:

  • pull-ci-openshift-etcd-openshift-4.8-e2e-aws
  • pull-ci-openshift-etcd-openshift-4.8-e2e-aws-serial
  • pull-ci-openshift-etcd-openshift-4.8-e2e-aws-upgrade
  • pull-ci-openshift-etcd-openshift-4.8-images
  • pull-ci-openshift-etcd-openshift-4.8-unit

In response to this:

/test openshift-perfscale-e2e-configmap-scale

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@hexfusion
Copy link
Author

/test configmap-scale

@hexfusion
Copy link
Author

/retest

@hexfusion hexfusion changed the title UPSTREAM: <carry>: *: add support for socket options Bug 1927942: UPSTREAM: <carry>: *: add support for socket options Mar 10, 2021
@openshift-ci-robot openshift-ci-robot added the bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. label Mar 10, 2021
@openshift-ci-robot
Copy link

@hexfusion: This pull request references Bugzilla bug 1927942, which is invalid:

  • expected Bugzilla bug 1927942 to depend on a bug targeting a release in 4.9.0 and in one of the following states: MODIFIED, VERIFIED, but no dependents were found

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 1927942: UPSTREAM: : *: add support for socket options

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Mar 10, 2021
@hexfusion hexfusion added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. staff-eng-approved Indicates a release branch PR has been approved by a staff engineer (formerly group/pillar lead). and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Mar 10, 2021
@hexfusion
Copy link
Author

the bug is valid we have issues because we are a fork, manually setting group lead approved as it is not required in 4.8.

@hexfusion
Copy link
Author

graceful term failure is unrelated to this change, unless the flag is enabled we should not see any direct impact on listeners.

https://search.ci.openshift.org/?search=kube-apiserver+terminates+within+graceful+termination+period+&maxAge=48h&context=1&type=bug%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

/test e2e-aws

@hexfusion
Copy link
Author

/refresh

@openshift-ci-robot
Copy link

@hexfusion: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-aws 22e5a7c link /test e2e-aws

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@hexfusion
Copy link
Author

/retest

@hexfusion hexfusion force-pushed the carry-etcd-12702 branch 2 times, most recently from e98f920 to f0be2ed Compare March 10, 2021 22:24
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
Signed-off-by: Sam Batschelet <sbatsche@redhat.com>
@hexfusion
Copy link
Author

/retest

2 similar comments
@hexfusion
Copy link
Author

/retest

@hexfusion
Copy link
Author

/retest

@hexfusion
Copy link
Author

/test unit

@hexfusion hexfusion changed the title Bug 1927942: UPSTREAM: <carry>: *: add support for socket options ETCD-180: Bug 1927942: UPSTREAM: <carry>: *: add support for socket options Mar 22, 2021
@hexfusion
Copy link
Author

/retest

2 similar comments
@hexfusion
Copy link
Author

/retest

@hexfusion
Copy link
Author

/retest

@hexfusion
Copy link
Author

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 29, 2021
@hexfusion
Copy link
Author

/retest

@hexfusion
Copy link
Author

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 29, 2021
@hexfusion
Copy link
Author

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 29, 2021
@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@hexfusion
Copy link
Author

/retest

@hexfusion
Copy link
Author

flake

/retest

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@hexfusion
Copy link
Author

/test e2e-aws

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@hexfusion
Copy link
Author

/retest

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@hexfusion
Copy link
Author

/test e2e-aws-upgrade

@hexfusion
Copy link
Author

/retest

@openshift-bot
Copy link

/retest

Please review the full test history for this PR and help us cut down flakes.

@hexfusion
Copy link
Author

/retest

@openshift-merge-robot openshift-merge-robot merged commit d031eea into openshift:openshift-4.8 Mar 30, 2021
@openshift-ci-robot
Copy link

openshift-ci-robot commented Mar 30, 2021

@hexfusion: Some pull requests linked via external trackers have merged:

The following pull requests linked via external trackers have not merged:

These pull request must merge or be unlinked from the Bugzilla bug in order for it to move to the next state. Once unlinked, request a bug refresh with /bugzilla refresh.

Bugzilla bug 1927942 has not been moved to the MODIFIED state.

In response to this:

ETCD-180: Bug 1927942: UPSTREAM: : *: add support for socket options

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@hexfusion hexfusion deleted the carry-etcd-12702 branch April 1, 2021 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged. staff-eng-approved Indicates a release branch PR has been approved by a staff engineer (formerly group/pillar lead).
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants