Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a deadline to TLS handshake. #3921

Merged
merged 7 commits into from
Nov 5, 2018
Merged

Add a deadline to TLS handshake. #3921

merged 7 commits into from
Nov 5, 2018

Conversation

jsha
Copy link
Contributor

@jsha jsha commented Nov 2, 2018

Previously, if a TLS handshake timed out, we would block forever in
conn.Handshake(), leaking both a TCP connection and a goroutine.
This sets a deadline on the underlying TCP connection, ensuring that
conn.Handshake() eventually times out.

Fixes #3915

Previously, if a TLS handshake timed out, we would block forever in
`conn.Handshake()`, leaking both a TCP connection and a goroutine.
This sets a deadline on the underlying TCP connection, ensuring that
`conn.Handshake()` eventually times out.
@jsha jsha requested a review from a team as a code owner November 2, 2018 21:49
va/va.go Outdated
@@ -665,18 +665,13 @@ func tlsDial(ctx context.Context, hostPort string, config *tls.Config) (*tls.Con
if err != nil {
return nil, err
}
if deadline, ok := ctx.Deadline(); ok {
netConn.SetDeadline(deadline)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

errcheck wants you to check the error returned here.

rolandshoemaker
rolandshoemaker previously approved these changes Nov 3, 2018
va/va.go Outdated
@@ -665,18 +665,13 @@ func tlsDial(ctx context.Context, hostPort string, config *tls.Config) (*tls.Con
if err != nil {
return nil, err
}
if deadline, ok := ctx.Deadline(); ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there ever a normal case in which !ok? It seems like we should probably have an audit err log line for the case where we get this far without a Deadline on the context to catch future bugs.

@@ -671,7 +671,7 @@ func TestTLSSNI01TimeoutAfterConnect(t *testing.T) {
t.Fatalf("Connection should've timed out")
}
test.AssertEquals(t, prob.Type, probs.ConnectionProblem)
expected := "Timeout after connect (your server may be slow or overloaded)"
expected := "Timeout during read (your server may be slow or overloaded)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the va tests to include a server that sleeps during the Handshake? I remember you suggesting this when this was in my sprint. Something like overloading a callback for providing a cert and adding a long blocking sleep.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had forgotten that we already have a server like that in these unittests, and that's what's used in this particular test:

boulder/va/va_test.go

Lines 632 to 642 in 8167d87

func slowTLSSrv() *httptest.Server {
server := httptest.NewUnstartedServer(http.DefaultServeMux)
server.TLS = &tls.Config{
GetCertificate: func(*tls.ClientHelloInfo) (*tls.Certificate, error) {
time.Sleep(100 * time.Millisecond)
return makeACert([]string{"nomatter"}), nil
},
}
server.StartTLS()
return server
}

In other words, we were already testing that we do the right thing when we time out during a handshake, in terms of error messages to the user. An ideal test for this fix would check that we haven't leaked goroutines or TCP connections, but I'm not sure how to achieve that reliably. I'll take a quick look at what's available to us.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words, we were already testing that we do the right thing when we time out during a handshake, in terms of error messages to the user.

Ahhh! Ok that makes sense 👍

An ideal test for this fix would check that we haven't leaked goroutines or TCP connections, but I'm not sure how to achieve that reliably. I'll take a quick look at what's available to us.

Knowing that the existing slowTLSSrv is there makes me think its fine to +1 this as-is if it turns out to be fiddly testing the leak.

@rolandshoemaker
Copy link
Contributor

Unit test is broken.

@jsha
Copy link
Contributor Author

jsha commented Nov 5, 2018

Yeah, I added a potentially-flaky test to check for the leak, and it appears to be solidly flaky. :-) I'll back it out.

@cpu cpu merged commit 714457b into master Nov 5, 2018
@cpu cpu deleted the timeout-tls branch November 5, 2018 20:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants