Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enables HTTP/2 health check #95981

Merged
merged 3 commits into from
Nov 17, 2020

Conversation

caesarxuchao
Copy link
Member

@caesarxuchao caesarxuchao commented Oct 29, 2020

This PR enables the HTTP/2 connection health check for all the clients that call SetTransportDefaults. This includes all clients created via k8s.io/client-go. Without the health check, a broken transport layer connection can stay in the HTTP/2 connection pool for a long time, causing requests to the same host to fail. For example, a broken TCP connection can linger for 15 minutes before being closed.

The HTTP/2 feature exposes two parameters,

  • ReadIdleTimeout: if the HTTP/2 client has not received any frame from the connection for this amount of time, it starts sending periodic pings to the server. The period of the pings is also ReadIdleTimeout.
  • PingTimeout: if an ACK to the ping is not received by the client after this amount of time, the connection is considered broken and will be closed by the client.

For Kubernetes, I default the ReadIdleTimeout to 30s, which I think is not going to cause performance issues even in large clusters, because

  • the ping is carried by persisted connections,
  • the ping is handled at the http/2 layer, never surfaced to k8s layer.

I'll wait for results from pull-kubernetes-e2e-gce-large-performance to confirm.

I default the PingTimeout to 15s, which is relaxed enough in most network conditions.

Although I think most users don't need to tune these two parameters, I add environment variables to allow user to tune them, or even disable the health check, by setting HTTP2_READ_IDLE_TIMEOUT_SECONDS=0.

Fixes kubernetes/client-go#374 and
#87615.

Ref. #95898
#94844

/kind bug
/sig api-machinery

HTTP/2 connection health check is enabled by default in all Kubernetes clients. The feature should work out-of-the-box. If needed, users can tune the feature via the HTTP2_READ_IDLE_TIMEOUT_SECONDS and HTTP2_PING_TIMEOUT_SECONDS environment variables. The feature is disabled if HTTP2_READ_IDLE_TIMEOUT_SECONDS is set to 0.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. kind/bug Categorizes issue or PR as related to a bug. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/apiserver area/cloudprovider area/code-generation area/dependency Issues or PRs related to dependency changes area/kubectl sig/cli Categorizes an issue or PR as relevant to SIG CLI. labels Oct 29, 2020
@k8s-ci-robot k8s-ci-robot requested review from andrewsykim, andyzhangx and a team October 29, 2020 08:15
@k8s-ci-robot k8s-ci-robot added sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/node Categorizes an issue or PR as relevant to SIG Node. labels Oct 29, 2020
@caesarxuchao
Copy link
Member Author

caesarxuchao commented Oct 29, 2020

I disagree with setting ReadIdleTimeout to 1/2 * IdleTimeout in #95898. The connection is considered "idle" if there is no stream using the connection (See here). And the connection is considered "readIdle" if there is no frame read from the connection (See here). The two concepts are unique to each other so we shouldn't use one value to determine the other value.

update: I think the IdleTimeout in http/1 behaves more like what's described in #95898. A connection is closed if there is no traffic for the IdleTimeout duration.

@caesarxuchao
Copy link
Member Author

@cheftako, this health check is very similar to the GRPC health check we want to add to the apiserver-network-proxy to solve kubernetes-sigs/apiserver-network-proxy#152. Their performance implication is also similar.

@caesarxuchao
Copy link
Member Author

@cheftako
I'm not too worried about the performance cost of sending the periodic pings, because

  • the ping is carried by persisted connections
  • the ping is handled at the http/2 layer, never surfaced to k8s layer
  • the ping is only sent if there is no traffic on a connection for the ReadIdleTimeout (30s) duration.

It would be more convincing if we can test the behavior in a large cluster. @kubernetes/sig-scalability is there an easy way to test this PR in a large cluster?

@JensErat do you mind sharing the size of the clusters where you turned on the http/2 health check feature? Thanks.

@BenTheElder
Copy link
Member

/test
[intentionally triggering list of available tests]

@k8s-ci-robot
Copy link
Contributor

@BenTheElder: The /test command needs one or more targets.
The following commands are available to trigger jobs:

  • /test pull-kubernetes-bazel-build
  • /test pull-kubernetes-bazel-test
  • /test pull-kubernetes-conformance-image-test
  • /test pull-kubernetes-conformance-kind-ipv6-parallel
  • /test pull-kubernetes-dependencies
  • /test pull-kubernetes-dependencies-canary
  • /test pull-kubernetes-e2e-ipvs-azure-dualstack
  • /test pull-kubernetes-e2e-iptables-azure-dualstack
  • /test pull-kubernetes-e2e-aws-eks-1-13-correctness
  • /test pull-kubernetes-files-remake
  • /test pull-kubernetes-e2e-gce
  • /test pull-kubernetes-e2e-gce-no-stage
  • /test pull-kubernetes-e2e-gce-kubetest2
  • /test pull-kubernetes-e2e-gce-canary
  • /test pull-kubernetes-e2e-gce-ubuntu
  • /test pull-kubernetes-e2e-gce-ubuntu-containerd
  • /test pull-kubernetes-e2e-gce-ubuntu-containerd-canary
  • /test pull-kubernetes-e2e-gce-rbe
  • /test pull-kubernetes-e2e-gce-alpha-features
  • /test pull-kubernetes-e2e-gce-device-plugin-gpu
  • /test pull-kubernetes-integration
  • /test pull-kubernetes-cross
  • /test pull-kubernetes-e2e-kind
  • /test pull-kubernetes-e2e-kind-canary
  • /test pull-kubernetes-e2e-kind-ipv6
  • /test pull-kubernetes-e2e-kind-ipv6-canary
  • /test pull-kubernetes-conformance-kind-ga-only
  • /test pull-kubernetes-conformance-kind-ga-only-parallel
  • /test pull-kubernetes-e2e-kops-aws
  • /test pull-kubernetes-bazel-build-canary
  • /test pull-kubernetes-bazel-test-canary
  • /test pull-kubernetes-bazel-test-integration-canary
  • /test pull-kubernetes-local-e2e
  • /test pull-publishing-bot-validate
  • /test pull-kubernetes-e2e-gce-network-proxy-http-connect
  • /test pull-kubernetes-e2e-gce-network-proxy-grpc
  • /test pull-kubernetes-e2e-gci-gce-autoscaling
  • /test pull-kubernetes-e2e-aks-engine-azure
  • /test pull-kubernetes-e2e-azure-disk
  • /test pull-kubernetes-e2e-azure-disk-vmss
  • /test pull-kubernetes-e2e-azure-file
  • /test pull-kubernetes-e2e-kind-dual-canary
  • /test pull-kubernetes-e2e-kind-ipvs-dual-canary
  • /test pull-kubernetes-e2e-gci-gce-ipvs
  • /test pull-kubernetes-node-e2e
  • /test pull-kubernetes-e2e-containerd-gce
  • /test pull-kubernetes-node-e2e-containerd
  • /test pull-kubernetes-node-e2e-alpha
  • /test pull-kubernetes-node-kubelet-serial-cpu-manager
  • /test pull-kubernetes-node-kubelet-serial-topology-manager
  • /test pull-kubernetes-node-kubelet-serial-hugepages
  • /test pull-kubernetes-node-crio-cgrpv2-e2e
  • /test pull-kubernetes-node-crio-e2e
  • /test pull-kubernetes-e2e-gce-100-performance
  • /test pull-kubernetes-e2e-gce-big-performance
  • /test pull-kubernetes-e2e-gce-correctness
  • /test pull-kubernetes-e2e-gce-large-performance
  • /test pull-kubernetes-kubemark-e2e-gce-big
  • /test pull-kubernetes-kubemark-e2e-gce-scale
  • /test pull-kubernetes-e2e-gce-storage-slow
  • /test pull-kubernetes-e2e-gce-storage-snapshot
  • /test pull-kubernetes-e2e-gce-storage-slow-rbe
  • /test pull-kubernetes-e2e-gce-csi-serial
  • /test pull-kubernetes-e2e-gce-iscsi
  • /test pull-kubernetes-e2e-gce-iscsi-serial
  • /test pull-kubernetes-e2e-gce-storage-disruptive
  • /test pull-kubernetes-e2e-aks-engine-azure-windows
  • /test pull-kubernetes-e2e-azure-disk-windows
  • /test pull-kubernetes-e2e-azure-file-windows
  • /test pull-kubernetes-typecheck
  • /test pull-kubernetes-verify
  • /test pull-kubernetes-e2e-windows-gce

Use /test all to run the following jobs:

  • pull-kubernetes-bazel-build
  • pull-kubernetes-bazel-test
  • pull-kubernetes-dependencies
  • pull-kubernetes-e2e-gce-ubuntu-containerd
  • pull-kubernetes-integration
  • pull-kubernetes-e2e-kind
  • pull-kubernetes-e2e-kind-ipv6
  • pull-kubernetes-conformance-kind-ga-only-parallel
  • pull-kubernetes-node-e2e
  • pull-kubernetes-e2e-gce-100-performance
  • pull-kubernetes-e2e-azure-disk-windows
  • pull-kubernetes-e2e-azure-file-windows
  • pull-kubernetes-typecheck
  • pull-kubernetes-verify

In response to this:

/test
[intentionally triggering list of available tests]

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@BenTheElder
Copy link
Member

I think https://github.com/kubernetes/test-infra/blob/b40bcac24d0ec96cc7d440ca5ece690cce7a62fe/config/jobs/kubernetes/sig-scalability/sig-scalability-presubmit-jobs.yaml#L67 is what you want, there are some optional scale presubmit.

On second thought though, even if that were the right one, we should check with someone from the scale team before using these resources 🙃

@caesarxuchao
Copy link
Member Author

Thanks, @BenTheElder ! The pull-kubernetes-e2e-gce-large-performance uses 2000 nodes, that seems to be the right test. I'll check with the scalability sig.

@jkaniuk
Copy link
Contributor

jkaniuk commented Oct 29, 2020

Seems relatively harmless from scalability point of view.

I'm fine with running 2k presubmit to verify, however our approval is not needed:
/test pull-kubernetes-e2e-gce-large-performance
(fingers crossed, it was a while since I've run this)

@tosi3k - for visibility as next oncaller

@liggitt
Copy link
Member

liggitt commented Mar 18, 2021

Backport to 1.18 is riskier, as it is using v0.0.0-20191004110552-13f9640d40b9, not sure if a backport will introduce incompatible changes.

I think we should backport to 1.18, given:

  • how unpredictable the stuck connection issue is
  • how many components it affects
  • how problematic a stuck connection is for informer clients
  • that we've successfully used these sys and net levels in 1.19 for O(months) now
  • that other consumers have successfully backported these changes to 1.18 and used them in production

I opened #100376 to pick this to 1.18

@deads2k
Copy link
Contributor

deads2k commented Mar 24, 2021

Given the consequences when this fails, I'm in favor of backporting this change if we can get an ack (ideally from @caesarxuchao) that the backport is fairly safe. My knowledge of the golang stack at this depth is fairly limited.

Is there a particular minimum golang level that is required or was it just the golang.org/x/net level?

@liggitt
Copy link
Member

liggitt commented Mar 24, 2021

Given the consequences when this fails, I'm in favor of backporting this change if we can get an ack (ideally from @caesarxuchao) that the backport is fairly safe.

@caesarxuchao acked the backport PR (#100376)

Is there a particular minimum golang level that is required or was it just the golang.org/x/net level?

It was contained to golang.org/x/{net,sys}

k8s-ci-robot added a commit that referenced this pull request Mar 25, 2021
Cherrypick #95981 to 1.18, Enables HTTP/2 health check
christarazi added a commit to christarazi/cilium that referenced this pull request Mar 31, 2022
Following up from cilium#19259, this
commit implements the same logic as kubelet [1] to close all idle
connections. We can rely on HTTP/2 to determine which connections are
idle (HTTP/2 pings), instead of closing all connections indiscriminately
(when the heartbeat fails) via the custom dialer in setDialer().

We can assume HTTP/2 support as it has been supported since K8s 1.20
[2]. Users can disable HTTP/2 by setting the DISABLE_HTTP2 environment
variable, however this only really meant from very advanced users, hence
no documentation. Kubernetes has been running with HTTP/2 as the default
for a long time (~since Dec 2020) without major issues.

CloseIdleConnectionsFor() was introduced in K8s 1.23 [3], hence this commit
cannot be backported to any earlier version of Cilium which doesn't
support K8s 1.23.

[1]: kubernetes/kubernetes#104844
[2]: kubernetes/kubernetes#95981
[3]: kubernetes/kubernetes@b9d865a

Suggested-by: André Martins <andre@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
pchaigno pushed a commit to cilium/cilium that referenced this pull request Apr 4, 2022
Following up from #19259, this
commit implements the same logic as kubelet [1] to close all idle
connections. We can rely on HTTP/2 to determine which connections are
idle (HTTP/2 pings), instead of closing all connections indiscriminately
(when the heartbeat fails) via the custom dialer in setDialer().

We can assume HTTP/2 support as it has been supported since K8s 1.20
[2]. Users can disable HTTP/2 by setting the DISABLE_HTTP2 environment
variable, however this only really meant from very advanced users, hence
no documentation. Kubernetes has been running with HTTP/2 as the default
for a long time (~since Dec 2020) without major issues.

CloseIdleConnectionsFor() was introduced in K8s 1.23 [3], hence this commit
cannot be backported to any earlier version of Cilium which doesn't
support K8s 1.23.

[1]: kubernetes/kubernetes#104844
[2]: kubernetes/kubernetes#95981
[3]: kubernetes/kubernetes@b9d865a

Suggested-by: André Martins <andre@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
pchaigno pushed a commit to pchaigno/cilium that referenced this pull request Apr 5, 2022
[ upstream commit 365c788 ]

Following up from cilium#19259, this
commit implements the same logic as kubelet [1] to close all idle
connections. We can rely on HTTP/2 to determine which connections are
idle (HTTP/2 pings), instead of closing all connections indiscriminately
(when the heartbeat fails) via the custom dialer in setDialer().

We can assume HTTP/2 support as it has been supported since K8s 1.20
[2]. Users can disable HTTP/2 by setting the DISABLE_HTTP2 environment
variable, however this only really meant from very advanced users, hence
no documentation. Kubernetes has been running with HTTP/2 as the default
for a long time (~since Dec 2020) without major issues.

CloseIdleConnectionsFor() was introduced in K8s 1.23 [3], hence this commit
cannot be backported to any earlier version of Cilium which doesn't
support K8s 1.23.

[1]: kubernetes/kubernetes#104844
[2]: kubernetes/kubernetes#95981
[3]: kubernetes/kubernetes@b9d865a

Suggested-by: André Martins <andre@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
pchaigno pushed a commit to cilium/cilium that referenced this pull request Apr 6, 2022
[ upstream commit 365c788 ]

Following up from #19259, this
commit implements the same logic as kubelet [1] to close all idle
connections. We can rely on HTTP/2 to determine which connections are
idle (HTTP/2 pings), instead of closing all connections indiscriminately
(when the heartbeat fails) via the custom dialer in setDialer().

We can assume HTTP/2 support as it has been supported since K8s 1.20
[2]. Users can disable HTTP/2 by setting the DISABLE_HTTP2 environment
variable, however this only really meant from very advanced users, hence
no documentation. Kubernetes has been running with HTTP/2 as the default
for a long time (~since Dec 2020) without major issues.

CloseIdleConnectionsFor() was introduced in K8s 1.23 [3], hence this commit
cannot be backported to any earlier version of Cilium which doesn't
support K8s 1.23.

[1]: kubernetes/kubernetes#104844
[2]: kubernetes/kubernetes#95981
[3]: kubernetes/kubernetes@b9d865a

Suggested-by: André Martins <andre@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
serathius pushed a commit to serathius/kubernetes that referenced this pull request Mar 14, 2024
The env vars were implemented in kubernetes#95981.
Effectively no impacts if using the default values here.

BUG=255296578

Change-Id: I2441ef1a181a0fb202af7893c25cff5ff15ef17a
hoskeri pushed a commit to hoskeri/kubernetes that referenced this pull request Jul 23, 2024
The env vars were implemented in kubernetes#95981.
Effectively no impacts if using the default values here.

BUG=255296578

Change-Id: I2441ef1a181a0fb202af7893c25cff5ff15ef17a

Remove GODEBUG=x509sha1=1 workaround

Bug: 227456358a

Revert changes made in I5165e6c2fe73e8e1b2a617ced591133228b6d275

Change-Id: I7dd1ca8f9796c542c89074703e44e6b6f4b6edab
hoskeri pushed a commit to hoskeri/kubernetes that referenced this pull request Jul 23, 2024
The env vars were implemented in kubernetes#95981.
Effectively no impacts if using the default values here.

BUG=255296578

Change-Id: I2441ef1a181a0fb202af7893c25cff5ff15ef17a

Remove GODEBUG=x509sha1=1 workaround

Bug: 227456358a

Revert changes made in I5165e6c2fe73e8e1b2a617ced591133228b6d275

Change-Id: I7dd1ca8f9796c542c89074703e44e6b6f4b6edab
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver area/cloudprovider area/code-generation area/dependency Issues or PRs related to dependency changes area/kubectl cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/cli Categorizes an issue or PR as relevant to SIG CLI. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Client should expose a mechanism to close underlying TCP connections