This repository has been archived by the owner on Mar 18, 2024. It is now read-only.

Accelerate dead peer detection with user timeout option #202

Merged 2 commits on Jan 12, 2016

dongluochen
Contributor

Dockerclient is used by other tools such as Swarm. A Swarm manager normally maintains 2-6 established TCP connections to an engine through HTTP keepalive. When an engine goes offline abruptly (engine crash, host pause, network failure), these connections remain in the ESTABLISHED state. By default, the Swarm manager relies on the operating system's dead-connection detection to fail requests. On a typical Linux system this takes around 13-20 minutes, depending on tcp_retries1, tcp_retries2, and the retransmission timeout value. From the user's perspective, the docker CLI hangs.

RFC 5482 defines the TCP User Timeout option, which Linux exposes as the TCP_USER_TIMEOUT socket option, so a sender can break the connection sooner. Go does not expose this socket option directly, and Linux and Windows provide different options to support a user timeout.

Signed-off-by: Dong Chen <dongluo.chen@docker.com>

Signed-off-by: Dong Chen <dongluo.chen@docker.com>
// if packets are not acknowledged after 20 seconds. This is a
// relatively new TCP option to improve dead peer detection.
// Do not fail newHTTPClient if the OS doesn't support it.
SetTCPUserTimeout(tcpConn, 20*time.Second)
Contributor

I'm not very familiar with this, but I'm wondering: why hard-code 20s instead of using timeout?

Contributor Author

@vieux It's an option. The existing timeout is for TCP connection setup. I considered several options, including timeout, 20s, 2 * timeout, timeout + 10s, and max(timeout, 20s). I hesitated to use timeout because I'm wary of users setting it aggressively. Dead peer detection shouldn't be too aggressive.

Signed-off-by: Dong Chen <dongluo.chen@docker.com>
@dongluochen
Contributor Author

Ping @vieux @abronan.

@vieux
Contributor

vieux commented Jan 12, 2016

LGTM

ping @ehazlett

@ehazlett
Collaborator

LGTM

ehazlett added a commit that referenced this pull request Jan 12, 2016
Accelerate dead peer detection with user timeout option
@ehazlett ehazlett merged commit d72aea4 into samalba:master Jan 12, 2016
This was referenced Jan 13, 2016
rubenv added a commit to rubenv/dockerclient that referenced this pull request Jan 13, 2016
…rTimeout"

This reverts commit d72aea4, reversing
changes made to 73c9581.
dongluochen added a commit to dongluochen/dockerclient that referenced this pull request Jan 13, 2016
…rTimeout"

This reverts commit d72aea4, reversing
changes made to 73c9581.

Signed-off-by: Dong Chen <dongluo.chen@docker.com>
vieux added a commit that referenced this pull request Jan 14, 2016
Revert "Merge pull request #202 from dongluochen/supportTcpUserTimeout"