[bsr-383] Fix flaky studio agent test #1306

Merged
merged 9 commits into from
Aug 5, 2022
16 changes: 14 additions & 2 deletions private/bufpkg/bufstudioagent/bufstudioagent_test.go
@@ -20,12 +20,14 @@ import (
"crypto/x509"
"encoding/base64"
"errors"
"fmt"
"io"
"net"
"net/http"
"net/http/httptest"
"strconv"
"testing"
"time"

studiov1alpha1 "github.com/bufbuild/buf/private/gen/proto/go/buf/alpha/studio/v1alpha1"
"github.com/bufbuild/connect-go"
@@ -141,6 +143,7 @@ func testPlainPostHandlerErrors(t *testing.T, upstreamServer *httptest.Server) {
nil,
),
)
agentServer.Client().Timeout = time.Second
defer agentServer.Close()

t.Run("forbidden_header", func(t *testing.T) {
@@ -188,11 +191,16 @@ func testPlainPostHandlerErrors(t *testing.T, upstreamServer *httptest.Server) {
t.Run("invalid_upstream", func(t *testing.T) {
listener, err := net.Listen("tcp", "127.0.0.1:")
require.NoError(t, err)
go func() {
listening := make(chan struct{}, 1)
go func(listening chan<- struct{}) {
listening <- struct{}{}
fmt.Println("signal sent")
conn, err := listener.Accept()
fmt.Println("connection arrived")
require.NoError(t, err)
require.NoError(t, conn.Close())
Member Author

@unmultimedio unmultimedio Aug 2, 2022

This line is inherently flaky because of how HTTP clients handle closed connections. We can instead write a similar test that returns the Unknown code and exercises the same behavior.
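
To illustrate the scenario (a minimal sketch using only the standard library, not the PR's test): an upstream that accepts a connection and closes it without responding makes the client call fail, but the concrete error the client reports can vary with timing, so asserting on one specific outcome is fragile. `postToClosingListener` is a hypothetical helper introduced only for this example.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"strings"
	"time"
)

// postToClosingListener (hypothetical helper) starts a listener that accepts
// one connection and closes it without writing a response, then POSTs to it
// and returns whatever error the HTTP client reports.
func postToClosingListener() error {
	listener, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return fmt.Errorf("listen: %w", err)
	}
	defer listener.Close()

	go func() {
		conn, err := listener.Accept()
		if err != nil {
			return // the listener was closed before a connection arrived
		}
		_ = conn.Close() // hang up without sending any HTTP response
	}()

	// The timeout only guards against the request hanging.
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Post(
		"http://"+listener.Addr().String(),
		"text/plain",
		strings.NewReader("ping"),
	)
	if err != nil {
		return err
	}
	resp.Body.Close()
	return nil
}

func main() {
	err := postToClosingListener()
	// The call fails, but the error text is not guaranteed to be the same
	// on every run.
	fmt.Printf("failed=%t err=%v\n", err != nil, err)
}
```

Because no response is ever written, the request cannot succeed; only the shape of the failure is unpredictable.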

}()
fmt.Println("connection closed")
}(listening)
defer listener.Close()

requestProto := &studiov1alpha1.InvokeRequest{
@@ -205,7 +213,11 @@ func testPlainPostHandlerErrors(t *testing.T, upstreamServer *httptest.Server) {
request, err := http.NewRequest(http.MethodPost, agentServer.URL, bytes.NewReader(requestBytes))
require.NoError(t, err)
request.Header.Set("Content-Type", "text/plain")
fmt.Println("waiting before doing the request")
<-listening
fmt.Println("unblocked, starting request")
response, err := agentServer.Client().Do(request)
fmt.Println("request completed")
require.NoError(t, err)
assert.Equal(t, http.StatusBadGateway, response.StatusCode)
})
7 changes: 5 additions & 2 deletions private/bufpkg/bufstudioagent/plain_post_handler.go
@@ -169,7 +169,7 @@ func (i *plainPostHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
// TODO: should this context be cloned to remove attached values (but keep timeout)?``
response, err := client.CallUnary(r.Context(), request)
if err != nil {
// Connect marks any issues connecting with the Unavailable
// Any issue connecting is mapped by Connect to the Unavailable
// status code. We need to differentiate between server sent
// errors with the Unavailable code and client connection
// errors.
@@ -182,7 +182,10 @@ func (i *plainPostHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
return
}
Member

I wonder if we need https://github.com/bufbuild/connect-go/issues/222 to accurately determine what to return here? I don't like that to get a test passing we're treating any CodeUnavailable errors (whether from client connection or returned from server) as StatusBadGateway.

I'd feel better if this fix was restricted to the unit tests and didn't change this logic.

Member Author

If you look carefully at the code, the only logic I changed was adding connect.CodeUnavailable as another Connect error code that triggers an http.StatusBadGateway. All the other cases (net.OpError, url.Error, and the catch-all in L196) were already doing the same thing: triggering an http.StatusBadGateway. All I did was simplify that code into a single catch-all in new L186.

Member

The main difference from before is that if you had a handler that returned CodeUnavailable (perhaps with error details) and it didn't wrap net.OpError or url.Error, we'd call i.writeProtoMessage with the error metadata.

Now we're treating CodeUnavailable (whether from a network error or from the handler) as StatusBadGateway. This means that people who use CodeUnavailable for whatever reason in their handler won't be able to dig into more details in the error.

Member

As I understand it, this PR is purely attempting to improve test flakiness. Is that correct? If so, it's important but not super urgent.

@pkwarren is correct about the error handling gotchas we're introducing here. If this change can wait a few days, or if we're comfortable reverting it in a few days, let's address bufbuild/connect-go#222 and use that logic here.

Member Author

Just had a Zoom call with @pkwarren, who explained the error logic here to me, thank you! I'll update this PR to just skip the flaky test and improve the comments.

if connectErr := new(connect.Error); errors.As(err, &connectErr) {
if connectErr.Code() == connect.CodeUnknown {
switch connectErr.Code() {
// The server is in some kind of bad and/or unreachable state, forward
// CodeUnavailable information as well.
case connect.CodeUnknown, connect.CodeUnavailable:
http.Error(w, err.Error(), http.StatusBadGateway)
return
}