[bsr-383] Fix flaky studio agent test #1306

Merged
merged 9 commits into from
Aug 5, 2022
26 changes: 12 additions & 14 deletions private/bufpkg/bufstudioagent/bufstudioagent_test.go
@@ -21,7 +21,6 @@ import (
"encoding/base64"
"errors"
"io"
"net"
"net/http"
"net/http/httptest"
"strconv"
@@ -38,8 +37,9 @@ import (
)

const (
echoPath = "/echo.Service/EchoEcho"
errorPath = "/error.Service/Error"
echoPath = "/echo.Service/EchoEcho"
errorPath = "/error.Service/Error"
unknownPath = "/unknown.Service/Unknown"
)

func TestPlainPostHandlerTLS(t *testing.T) {
@@ -185,18 +185,9 @@ func testPlainPostHandlerErrors(t *testing.T, upstreamServer *httptest.Server) {
assert.Equal(t, "something", upstreamResponseHeaders.Get("grpc-message"))
})

t.Run("invalid_upstream", func(t *testing.T) {
listener, err := net.Listen("tcp", "127.0.0.1:")
require.NoError(t, err)
go func() {
conn, err := listener.Accept()
require.NoError(t, err)
require.NoError(t, conn.Close())
Member Author

@unmultimedio unmultimedio Aug 2, 2022

This line is inherently flaky because of how HTTP clients handle closed connections. We can instead write a similar test that returns the Unknown code and exercises the same behavior.
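
To illustrate the idea in this comment, here is a minimal sketch using only the standard library. It is not the code from this PR: `newUnknownUpstream` and `fetchStatus` are hypothetical helpers, and the plain `http.Error` handler is only a stand-in for a Connect handler that returns `CodeUnknown`. The point is that the upstream always sends a complete response, so the client never races a connection that is being closed.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// unknownPath matches the constant added to the test file in this PR.
const unknownPath = "/unknown.Service/Unknown"

// newUnknownUpstream (hypothetical helper) starts a test server whose handler
// always answers with the same error response. Unlike a listener that accepts
// and immediately closes the connection, the outcome does not depend on timing.
func newUnknownUpstream() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc(unknownPath, func(w http.ResponseWriter, _ *http.Request) {
		// Assumption: a plain HTTP 500 stands in for a Connect CodeUnknown error.
		http.Error(w, "unknown", http.StatusInternalServerError)
	})
	return httptest.NewServer(mux)
}

// fetchStatus (hypothetical helper) posts an empty body to the URL and
// returns the HTTP status code of the response.
func fetchStatus(url string) (int, error) {
	resp, err := http.Post(url, "application/grpc", nil)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	// Drain the body so the connection can be reused cleanly.
	_, _ = io.Copy(io.Discard, resp.Body)
	return resp.StatusCode, nil
}

func main() {
	srv := newUnknownUpstream()
	defer srv.Close()
	status, err := fetchStatus(srv.URL + unknownPath)
	fmt.Println(status, err)
}
```

Because the server always completes the exchange, repeated runs should observe the same status rather than an occasional connection-reset error.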

}()
defer listener.Close()

t.Run("unknown_response_bad_gateway", func(t *testing.T) {
requestProto := &studiov1alpha1.InvokeRequest{
Target: "http://" + listener.Addr().String(),
Target: upstreamServer.URL + unknownPath,
Headers: goHeadersToProtoHeaders(http.Header{
"Content-Type": []string{"application/grpc"},
}),
@@ -236,6 +227,13 @@ func newTestConnectServer(t *testing.T, tls bool) *httptest.Server {
},
connect.WithCodec(&bufferCodec{name: "proto"}),
))
mux.Handle(unknownPath, connect.NewUnaryHandler(
errorPath,
Member

Should this be unknownPath?

Member Author

@unmultimedio unmultimedio Aug 5, 2022

Yes! Fixed in the latest commit, thx.

func(ctx context.Context, r *connect.Request[bytes.Buffer]) (*connect.Response[bytes.Buffer], error) {
return nil, connect.NewError(connect.CodeUnknown, errors.New(r.Msg.String()))
},
connect.WithCodec(&bufferCodec{name: "proto"}),
))
if tls {
upstreamServerTLS := httptest.NewUnstartedServer(mux)
upstreamServerTLS.EnableHTTP2 = true
32 changes: 11 additions & 21 deletions private/bufpkg/bufstudioagent/plain_post_handler.go
@@ -166,33 +166,23 @@ func (i *plainPostHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
targetURL.String(),
clientOptions...,
)
// TODO: should this context be cloned to remove attached values (but keep timeout)?``
// TODO(rvanginkel) should this context be cloned to remove attached values (but keep timeout)?
response, err := client.CallUnary(r.Context(), request)
if err != nil {
// Connect marks any issues connecting with the Unavailable
// status code. We need to differentiate between server sent
// errors with the Unavailable code and client connection
// errors.
if netErr := new(net.OpError); errors.As(err, &netErr) {
http.Error(w, err.Error(), http.StatusBadGateway)
return
}
if urlErr := new(url.Error); errors.As(err, &urlErr) {
http.Error(w, err.Error(), http.StatusBadGateway)
return
}
Member

I wonder if we need https://github.com/bufbuild/connect-go/issues/222 to accurately determine what to return here? I don't like that to get a test passing we're treating any CodeUnavailable errors (whether from client connection or returned from server) as StatusBadGateway.

I'd feel better if this fix was restricted to the unit tests and didn't change this logic.

Member Author

If you look carefully at the code, the only logic I changed was adding connect.CodeUnavailable as another Connect error code that triggers http.StatusBadGateway. All the other cases (net.OpError, url.Error, and the catch-all in L196) were already doing the same thing: triggering http.StatusBadGateway. All I did was simplify that code into a single catch-all in the new L186.

Member

The main difference from before is that if you had a handler that returned CodeUnavailable (perhaps with error details) and it didn't wrap net.OpError or url.Error, we'd call i.writeProtoMessage with the error metadata.

Now we're treating CodeUnavailable (whether from a network error or from the handler) as StatusBadGateway. This means that people who use CodeUnavailable for whatever reason in their handler won't be able to dig into more details in the error.
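
The distinction described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the handler's actual code: `codedError` is a hypothetical stand-in for `connect.Error`, and `classify`, `outcome`, `dialErr`, and `handlerErr` are names invented for this example. It follows the pre-PR ordering, where transport errors are checked first so that an "unavailable" error sent by a handler is still forwarded with its metadata.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/url"
)

// codedError is a hypothetical stand-in for connect.Error so that this
// sketch compiles without the connect-go dependency.
type codedError struct {
	code string // for example "unavailable" or "unknown"
	err  error  // wrapped cause
}

func (e *codedError) Error() string { return e.code + ": " + e.err.Error() }
func (e *codedError) Unwrap() error { return e.err }

// outcome names what the proxy would do with an upstream error.
type outcome string

const (
	badGateway      outcome = "bad gateway"
	forwardMetadata outcome = "forward metadata"
)

// classify checks for transport-level errors before looking at the error
// code, so only connection failures (and "unknown") become a bad gateway.
func classify(err error) outcome {
	var opErr *net.OpError
	var urlErr *url.Error
	if errors.As(err, &opErr) || errors.As(err, &urlErr) {
		return badGateway
	}
	var coded *codedError
	if errors.As(err, &coded) && coded.code != "unknown" {
		return forwardMetadata
	}
	return badGateway
}

// Sample errors: the same code, once caused by a failed dial and once
// returned deliberately by a handler.
var (
	dialErr error = &codedError{
		code: "unavailable",
		err:  &net.OpError{Op: "dial", Net: "tcp", Err: errors.New("connection refused")},
	}
	handlerErr error = &codedError{
		code: "unavailable",
		err:  errors.New("try again later"),
	}
)

func main() {
	fmt.Println(classify(dialErr))
	fmt.Println(classify(handlerErr))
}
```

Collapsing both cases into a single check on the code alone would make `classify(handlerErr)` a bad gateway too, which is the loss of error detail this comment is pointing at.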

Member

As I understand it, this PR is purely attempting to improve test flakiness. Is that correct? If so, it's important but not super urgent.

@pkwarren is correct about the error handling gotchas we're introducing here. If this change can wait a few days, or if we're comfortable reverting it in a few days, let's address bufbuild/connect-go#222 and use that logic here.

Member Author

Just had a Zoom call with @pkwarren, who explained the error logic here to me, thank you! I'll update this PR to just skip the flaky test and improve the comments.

if connectErr := new(connect.Error); errors.As(err, &connectErr) {
if connectErr.Code() == connect.CodeUnknown {
http.Error(w, err.Error(), http.StatusBadGateway)
// Any Connect error except these codes can be wrapped in the headers.
if connectErr.Code() != connect.CodeUnknown &&
connectErr.Code() != connect.CodeUnavailable {
i.writeProtoMessage(w, &studiov1alpha1.InvokeResponse{
// connectErr.Meta contains the trailers for the
// caller to find out the error details.
Headers: goHeadersToProtoHeaders(connectErr.Meta()),
})
return
}
i.writeProtoMessage(w, &studiov1alpha1.InvokeResponse{
// connectErr.Meta contains the trailers for the
// caller to find out the error details.
Headers: goHeadersToProtoHeaders(connectErr.Meta()),
})
return
}
// Any issue connecting that is not a Connect error is assumed to be because
// the server is in some kind of bad and/or unreachable state.
Member Author

There's no need to check for net.OpError or url.Error; all of those cases did the same thing as this catch-all. The same goes for the Unknown and CodeUnavailable connect codes.

http.Error(w, err.Error(), http.StatusBadGateway)
return
}