
Special case unary HTTP calls #611

Closed · emcfarlane wants to merge 18 commits
Conversation

@emcfarlane (Contributor) commented Oct 19, 2023

Special-cases the unary request flow to avoid some of the overhead of streaming request bodies. Unary requests now set the http.Request's GetBody field with the fully buffered payload. Buffers are wrapped in a payloadCloser to ensure they can be safely reused.

See issues #609 and #541.
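For context, here is a minimal sketch of the GetBody idea, using only the standard library; newUnaryRequest and its shape are illustrative, not this PR's actual API:

```go
package sketch

import (
	"bytes"
	"io"
	"net/http"
)

// newUnaryRequest buffers the whole payload up front. Body reads from the
// buffer, and GetBody hands the transport a fresh reader over the same bytes,
// so retries never need the io.Pipe plus goroutine used for streaming calls.
func newUnaryRequest(url string, payload []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.ContentLength = int64(len(payload))
	req.GetBody = func() (io.ReadCloser, error) {
		// Each call returns a new reader over the same buffered bytes.
		return io.NopCloser(bytes.NewReader(payload)), nil
	}
	return req, nil
}
```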

Status

Unary call optimization is applied to all supported protocols.

Remaining work will be done in a cleanup PR to optimize the envelope payloads.

Benchmarks

Updated to the latest commit. The benchmarks show some reduction in allocations for unary flows and a small increase in bytes due to the missing optimization on envelopes.

                                │   base.txt   │               new.txt                │
                                │    sec/op    │    sec/op     vs base                │
Connect/connect/unary_big-8       2.013m ± ∞ ¹   1.948m ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/connect/unary_small-8     40.27µ ± ∞ ¹   39.64µ ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/connect/client_stream-8   56.23µ ± ∞ ¹   53.88µ ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/connect/server_stream-8   55.21µ ± ∞ ¹   51.11µ ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/connect/bidi_stream-8     56.27µ ± ∞ ¹   54.70µ ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/grpc/unary_big-8          1.994m ± ∞ ¹   1.968m ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/grpc/unary_small-8        44.49µ ± ∞ ¹   41.24µ ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/grpc/client_stream-8      44.50µ ± ∞ ¹   51.83µ ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/grpc/server_stream-8      44.10µ ± ∞ ¹   40.93µ ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/grpc/bidi_stream-8        46.22µ ± ∞ ¹   44.90µ ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/grpcweb/unary_big-8       2.087m ± ∞ ¹   2.008m ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/grpcweb/unary_small-8     57.98µ ± ∞ ¹   53.48µ ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/grpcweb/client_stream-8   57.83µ ± ∞ ¹   56.83µ ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/grpcweb/server_stream-8   56.03µ ± ∞ ¹   55.27µ ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/grpcweb/bidi_stream-8     56.62µ ± ∞ ¹   55.56µ ± ∞ ¹       ~ (p=1.000 n=1) ²
geomean                           106.4µ         103.5µ        -2.67%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

                                │   base.txt    │                new.txt                │
                                │     B/op      │     B/op       vs base                │
Connect/connect/unary_big-8       5.453Mi ± ∞ ¹   4.945Mi ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/connect/unary_small-8     21.60Ki ± ∞ ¹   21.14Ki ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/connect/client_stream-8   22.65Ki ± ∞ ¹   20.97Ki ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/connect/server_stream-8   22.44Ki ± ∞ ¹   23.98Ki ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/connect/bidi_stream-8     19.90Ki ± ∞ ¹   20.50Ki ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/grpc/unary_big-8          5.537Mi ± ∞ ¹   4.854Mi ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/grpc/unary_small-8        25.16Ki ± ∞ ¹   24.26Ki ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/grpc/client_stream-8      21.57Ki ± ∞ ¹   23.98Ki ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/grpc/server_stream-8      23.79Ki ± ∞ ¹   23.89Ki ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/grpc/bidi_stream-8        21.54Ki ± ∞ ¹   21.73Ki ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/grpcweb/unary_big-8       5.800Mi ± ∞ ¹   5.203Mi ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/grpcweb/unary_small-8     30.97Ki ± ∞ ¹   32.82Ki ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/grpcweb/client_stream-8   27.02Ki ± ∞ ¹   29.21Ki ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/grpcweb/server_stream-8   29.13Ki ± ∞ ¹   32.04Ki ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/grpcweb/bidi_stream-8     19.33Ki ± ∞ ¹   19.66Ki ± ∞ ¹       ~ (p=1.000 n=1) ²
geomean                           70.59Ki         70.54Ki        -0.07%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

                                │  base.txt   │               new.txt               │
                                │  allocs/op  │  allocs/op   vs base                │
Connect/connect/unary_big-8       170.0 ± ∞ ¹   162.0 ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/connect/unary_small-8     152.0 ± ∞ ¹   145.0 ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/connect/client_stream-8   191.0 ± ∞ ¹   188.0 ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/connect/server_stream-8   185.0 ± ∞ ¹   179.0 ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/connect/bidi_stream-8     173.0 ± ∞ ¹   172.0 ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/grpc/unary_big-8          231.0 ± ∞ ¹   220.0 ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/grpc/unary_small-8        210.0 ± ∞ ¹   204.0 ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/grpc/client_stream-8      213.0 ± ∞ ¹   211.0 ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/grpc/server_stream-8      208.0 ± ∞ ¹   201.0 ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/grpc/bidi_stream-8        196.0 ± ∞ ¹   195.0 ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/grpcweb/unary_big-8       218.0 ± ∞ ¹   208.0 ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/grpcweb/unary_small-8     198.0 ± ∞ ¹   192.0 ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/grpcweb/client_stream-8   201.0 ± ∞ ¹   199.0 ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/grpcweb/server_stream-8   195.0 ± ∞ ¹   189.0 ± ∞ ¹       ~ (p=1.000 n=1) ²
Connect/grpcweb/bidi_stream-8     174.0 ± ∞ ¹   173.0 ± ∞ ¹       ~ (p=1.000 n=1) ²
geomean                           193.3         188.2        -2.64%

@emcfarlane emcfarlane self-assigned this Oct 19, 2023
@jhump (Member) left a comment


This didn't actually add a separate entry point to protocolClient as previously discussed. That's a bit awkward, since it means NewConn still has to special-case unary in ways that make the shape of the abstractions more complicated than needed. For example, connectUnaryClientConn seems to be pure bloat. We could get rid of it, and then also improve unaryHTTPCall's interface, if we separated unary from stream calls higher up the stack.

See #609 for a little more context.
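Purely as an illustration of the shape being suggested; none of these type or method names are connect's real internal API, they are placeholders:

```go
package sketch

import (
	"context"
	"net/http"
)

// Placeholder types standing in for connect's internal abstractions; they
// exist only so this sketch is self-contained.
type (
	spec          struct{ Procedure string }
	streamingConn interface{ CloseRequest() error }
	unaryCall     interface {
		Send(ctx context.Context, msg []byte) ([]byte, error)
	}
)

// A hypothetical protocolClient with a dedicated unary entry point: NewConn
// keeps serving the streaming RPC types, while NewUnaryCall takes the fully
// buffered path, so nothing downstream has to special-case unary.
type protocolClient interface {
	NewConn(s spec, header http.Header) streamingConn
	NewUnaryCall(s spec, header http.Header) unaryCall
}
```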

Resolved review threads: .golangci.yml; http_call.go (outdated, 4 threads); protocol_grpc.go (outdated)
@mattrobenolt (Contributor)

The only thing I wanted to add right now is that this should, in theory, apply just as well to server streaming, and I'd suspect it enables some optimizations for client streaming too.

The only special case needing the io.Pipe should be for bidi. So if we can swallow up "everything except bidi", that'd be my dream implementation. 🙏

Only commenting now since I'd hate to be hyper-specific on unary and not catch server streaming, at minimum.

Resolved review thread: http_call.go (outdated)
@jhump (Member) commented Oct 19, 2023

The only thing I wanted to add right now is that this should, in theory, apply just as well to server streaming, and I'd suspect it enables some optimizations for client streaming too.

Hmmm, that feels like hyper-specialization. I think it's worth it for unary, since unary calls are the overwhelming majority of RPCs and the flow of a unary RPC is straightforward enough that a specialized path maximizes readability. But separately optimizing the other streaming types feels like it will either (1) lead to too many permutations (we'd suddenly have four specializations to maintain instead of two) or (2) require substantial re-architecting of protocolClient to make the request-handling abstraction independent of response handling.

That's not really an objective that has been discussed before now. Maybe we should take this discussion to the issue (#609), to better nail down what stakeholders are expecting?

The only special case needing the io.Pipe should be for bidi. So if we can swallow up "everything except bidi", that'd be my dream implementation. 🙏

I don't see how this is true. How does client streaming work without an io.Pipe? The client sends (i.e. writes) request data, but *http.Request.Body is a reader; a pipe is what transforms the writer into a reader. I could see client streaming working without a pipe by buffering the entire request stream, but I'm pretty sure that's not what you're suggesting.
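A minimal sketch of the point being made, using only the standard library; the newStreamingRequest helper is illustrative, not part of the library:

```go
package sketch

import (
	"io"
	"net/http"
)

// newStreamingRequest shows why client streaming needs io.Pipe: the caller
// holds a writer for outgoing messages, while the transport consumes the
// request body as a reader.
func newStreamingRequest(url string) (*http.Request, io.WriteCloser, error) {
	pipeReader, pipeWriter := io.Pipe()
	req, err := http.NewRequest(http.MethodPost, url, pipeReader)
	if err != nil {
		return nil, nil, err
	}
	// The caller writes request messages to pipeWriter, typically from
	// another goroutine, and must Close it so the transport sees io.EOF.
	return req, pipeWriter, nil
}
```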

@mattrobenolt (Contributor)

I understand I'm not the one implementing this, so I don't have the full context here, but there are two sides to this: the client request is either buffered or streamed, the server response is either buffered or streamed, and then there's the permutation where both are streamed.

A simple unary call is a buffered request and a buffered response, while a server stream is a buffered request and a streamed response.

Both pure unary and server stream are the same "write request, read response" with Body being a reader. The choice is to either buffer or stream it.

I might be misspeaking on client streaming since that one is a bit more unique and might be limited by stdlib APIs.

But the io.Pipe is only necessary to facilitate reading and writing asynchronously.

IMO the "unary" implementation can, at minimum, easily cover pure unary and server streaming. If the available APIs make it sensible to lump client streaming in with bidi, I think that's fine too, since client streaming is used even less.

We quite heavily utilize server streaming, and there's no reason that needs the extra goroutine either; the implementation just reads N messages rather than 1.
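A rough sketch of that shape, assuming a 5-byte envelope prefix (1 flag byte plus a big-endian 4-byte length) purely for illustration: the request is fully buffered, with no pipe and no extra goroutine, and the response side simply reads messages until EOF.

```go
package sketch

import (
	"bytes"
	"encoding/binary"
	"errors"
	"io"
	"net/http"
)

// serverStream sends a fully buffered request, then reads N enveloped
// messages from the response body; no io.Pipe or writer goroutine involved.
func serverStream(client *http.Client, url string, payload []byte, onMsg func([]byte) error) error {
	resp, err := client.Post(url, "application/proto", bytes.NewReader(payload))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	for {
		// Illustrative envelope prefix: 1 flag byte + 4-byte message length.
		var prefix [5]byte
		if _, err := io.ReadFull(resp.Body, prefix[:]); err != nil {
			if errors.Is(err, io.EOF) {
				return nil // stream finished cleanly
			}
			return err
		}
		msg := make([]byte, binary.BigEndian.Uint32(prefix[1:]))
		if _, err := io.ReadFull(resp.Body, msg); err != nil {
			return err
		}
		if err := onMsg(msg); err != nil {
			return err
		}
	}
}
```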

@mattrobenolt (Contributor)

Maybe we should take this discussion to the issue (#609), to better nail down what stakeholders are expecting?

Oh, I missed this! Yes, I agree.

Co-authored-by: Matt Robenolt <matt@ydekproductions.com>
@emcfarlane (Contributor, Author) commented Oct 23, 2023

Waiting to merge changes in #594

responseReady chan struct{}
request *http.Request
response *http.Response
requestSent atomic.Bool
@emcfarlane (Contributor, Author):

Switched to atomic.Bool so we can observe the state of the sync.Once and error on a unary call if it has already been sent.
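A small sketch of that guard; the names are hypothetical, and in the PR the flag sits alongside the existing sync.Once rather than replacing it:

```go
package sketch

import (
	"errors"
	"sync/atomic"
)

// unarySender mimics the guard described above using only an atomic flag.
type unarySender struct {
	requestSent atomic.Bool
}

// send performs the request at most once; later calls observe the flag and
// fail loudly instead of silently doing nothing.
func (s *unarySender) send(do func() error) error {
	if s.requestSent.Swap(true) {
		return errors.New("unary request already sent")
	}
	return do()
}
```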

@@ -130,179 +116,196 @@ func (d *duplexHTTPCall) CloseWrite() error {
// forever. To make sure users don't have to worry about this, the generated
// code for unary, client streaming, and server streaming RPCs must call
// CloseWrite automatically rather than requiring the user to do it.
return d.requestBodyWriter.Close()
if c.requestBodyWriter != nil {
return c.requestBodyWriter.Close()
@emcfarlane (Contributor, Author):
Ensures we keep the Close semantics from #594.

wg.Wait()
}

func TestHTTPCallRaces(t *testing.T) {
@emcfarlane (Contributor, Author):

This isn't a very nice test, but it did help recreate the race conditions I was seeing in CI. Any suggestions for improvements?

}

// Close implements io.Closer. It signals that the payload has been fully read.
func (p *payloadCloser) Close() error {
@emcfarlane (Contributor, Author):

Close should be called by the http.Transport once it receives io.EOF and is finished with the request body. Under testing, though, the HTTP/2 transport would sometimes delay the call, leading to a deadlock on Wait(). We therefore also signal completion when all the bytes in the buffer have been read.
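A stripped-down sketch of that dual signal; this is illustrative, not the PR's exact payloadCloser, and the names are placeholders. Completion is flagged when the transport either calls Close or drains the last byte, so Wait never deadlocks on a delayed Close.

```go
package sketch

import (
	"bytes"
	"sync"
)

// payload lets Send block until the transport is done with the buffer,
// whether that is signalled by Close or by draining the final byte.
type payload struct {
	once sync.Once
	done chan struct{}
	buf  *bytes.Reader
}

func newPayload(data []byte) *payload {
	return &payload{done: make(chan struct{}), buf: bytes.NewReader(data)}
}

func (p *payload) Read(data []byte) (int, error) {
	n, err := p.buf.Read(data)
	if p.buf.Len() == 0 {
		p.complete() // fully drained: the buffer is no longer needed
	}
	return n, err
}

func (p *payload) Close() error {
	p.complete()
	return nil
}

// Wait blocks until the transport has either closed or drained the payload.
func (p *payload) Wait() { <-p.done }

func (p *payload) complete() { p.once.Do(func() { close(p.done) }) }
```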

@jhump (Member) left a comment

I'm trying to grok the formulation in #622 as it relates to the changes here. I take it that you'd change the httpCall.Send(*bytes.Buffer) method to instead be httpCall.Send(*envelope), avoiding having to create the unified buffer (which means copying the prefix and the serialized message into yet another buffer). Is that how you see these two fitting together?

p.wait.Wait()
p.mu.Lock()
defer p.mu.Unlock()
p.buf = nil // Discard the buffer
@jhump (Member):

I think this will defeat the GetBody improvement. It seems likely the HTTP framework could drain the entire body (particularly for small bodies) and/or call Close and still need to be able to rewind the buffer and send again.

@emcfarlane (Contributor, Author):

GetBody shouldn't be called after we've received a response. Draining or calling Close allows Send to return and the buffer to be recycled, but GetBody can call Rewind() to reset the payload as many times as needed (5 is the max, I think). The number of Close calls or drains isn't counted.

Comment on lines +235 to +237
// Wait for the payload to be fully drained before we return
// from Send to ensure the buffer can safely be reused.
defer payload.Wait()
@jhump (Member):

Having unbounded blocking here seems excessively dangerous. Is it really 100% certain that the body will always be drained or closed in a timely manner, in the face of all possible errors?

And, even if so, is it worth paying a potential latency hit to wait for this? I would think we'd instead release the buffer (i.e. return it to the pool) asynchronously, instead of holding up the train here.
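A sketch of the asynchronous alternative being floated here, with a hypothetical pool and helper name; wait stands in for whatever signal says the transport is done with the body:

```go
package sketch

import (
	"bytes"
	"sync"
)

var bufferPool = sync.Pool{New: func() any { return new(bytes.Buffer) }}

// releaseWhenDrained hands the buffer back to the pool without blocking the
// caller; wait is expected to return once the transport has drained or
// closed the request body.
func releaseWhenDrained(wait func(), buf *bytes.Buffer) {
	go func() {
		wait()
		buf.Reset()
		bufferPool.Put(buf)
	}()
}
```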

@emcfarlane (Contributor, Author):

Close has to be called, as any reader would leak otherwise. This keeps the behaviour consistent between sendUnary and sendStream: neither modifies the buffer, and once they return the buffer is safe to mutate.

@emcfarlane (Contributor, Author)

Superseded by #649

@emcfarlane emcfarlane closed this Dec 8, 2023
@emcfarlane emcfarlane deleted the ed/unaryHTTPCall branch February 16, 2024 22:56