Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

simplify transport. #296

Merged
merged 8 commits into from
Nov 30, 2022
Merged

simplify transport. #296

merged 8 commits into from
Nov 30, 2022

Conversation

zenhack
Copy link
Contributor

@zenhack zenhack commented Aug 23, 2022

A substantial amount of the complexity of the transport package is due
the interface: we pass contexts to functions that do IO, and then jump
through hoops to respect those contexts. Instead, this patch changes
Transport and Codec's Close() to work like net.Close(): it interrupts
any outstanding IO. We demand the same semantics from the
io.ReadWriteCloser that we pass in, which seems like not a tall order.

Still TODO:

  • rework the rpc subsystem's shutdown logic. The basic idea is we
    spawn an extra goroutine to wait on context.Done() and then call close,
    but the shutdown logic is intricate and I am too tired right now; this
    patch introduces test regressions for now.
  • Tweak the docs on how this changes the interface for e.g. NewConn

Marking this as a draft for now, until I find time to tie up the loose ends.

A substantial amount of the complexity of the transport package is due
the interface: we pass contexts to functions that do IO, and then jump
through hoops to respect those contexts. Instead, this patch changes
Transport and Codec's Close() to work like net.Close(): it interrupts
any outstanding IO. We demand the same semantics from the
io.ReadWriteCloser that we pass in, which seems like not a tall order.

Still TODO:

- [ ] rework the rpc subsystem's shutdown logic. The basic idea is we
spawn an extra goroutine to wait on context.Done() and then call close,
but the shutdown logic is intricate and I am too tired right now; this
patch introduces test regressions for now.
- [ ] Tweak the docs on how this changes the interface for e.g. NewConn
@zenhack zenhack marked this pull request as draft August 23, 2022 05:17
lthibault
lthibault previously approved these changes Aug 23, 2022
Copy link
Collaborator

@lthibault lthibault left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

zenhack added a commit to zenhack/go-capnp that referenced this pull request Nov 28, 2022
This is a self-contained enough change that it seemed worth submitting
independently of capnproto#296; this moves the actual IO calls on the transport
into a new pair of goroutines (one for send, one for receive), so that
the receive loop and the send goroutine can always be responsive to
context cancels. This will become important when we no longer require
the IO operations themselves to respond to a context.
zenhack added a commit to zenhack/go-capnp that referenced this pull request Nov 28, 2022
This is a self-contained enough change that it seemed worth submitting
independently of capnproto#296; this moves the actual RecvMessage() call
on the transport into a new goroutines (one for send, one for receive),
so that the receive loop always be responsive to context cancels. This
will become important when we no longer require the IO operations
themselves to respond to a context.

send() can stay as-is; it would be abnormal for write to block
indefinitely, so we can rely on that one to return in a reasonable
amount of time. Meanwhile, the new reader goroutine is safe to leave
running until late into the shutdown logic, since the shutdown logic
doesn't itself want to read from the transport, and any messages read
won't be acted on since the receive loop has shut down.
@zenhack
Copy link
Contributor Author

zenhack commented Nov 28, 2022

This is mostly working, but there are some intermittent test failures I don't fully understand yet. @lthibault, I think it's worth a look, though we shouldn't hit merge until we've nailed down those issues.

As far as performance data; before:

$ go tool pprof before.mem 
File: rpc.test
Type: alloc_space
Time: Nov 27, 2022 at 5:25pm (EST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 466.55MB, 71.86% of 649.24MB total
Dropped 56 nodes (cum <= 3.25MB)
Showing top 10 nodes out of 67
      flat  flat%   sum%        cum   cum%
  103.51MB 15.94% 15.94%   138.51MB 21.33%  capnproto.org/go/capnp/v3/rpc/transport.(*ctxWriteCloser).write
   82.01MB 12.63% 28.57%   124.01MB 19.10%  capnproto.org/go/capnp/v3/rpc/transport.(*ctxReader).Read
   77.01MB 11.86% 40.44%    77.01MB 11.86%  net.(*pipeDeadline).set
   51.51MB  7.93% 48.37%    60.01MB  9.24%  capnproto.org/go/capnp/v3.NewPromise
   36.50MB  5.62% 53.99%   169.02MB 26.03%  capnproto.org/go/capnp/v3.(*Decoder).Decode
      29MB  4.47% 58.46%       29MB  4.47%  capnproto.org/go/capnp/v3/internal/chanmutex.NewLocked
      28MB  4.31% 62.77%       28MB  4.31%  capnproto.org/go/capnp/v3.NewMessage
   22.50MB  3.47% 66.24%    50.50MB  7.78%  capnproto.org/go/capnp/v3/rpc/transport.(*transport).NewMessage
   18.50MB  2.85% 69.09%    19.50MB  3.00%  capnproto.org/go/capnp/v3.ImmediateAnswer
      18MB  2.77% 71.86%       18MB  2.77%  capnproto.org/go/capnp/v3.(*Metadata).Put

After:

$ go tool pprof after.mem
File: rpc.test
Type: alloc_space
Time: Nov 28, 2022 at 6:21pm (EST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 232.53MB, 61.41% of 378.69MB total
Dropped 42 nodes (cum <= 1.89MB)
Showing top 10 nodes out of 58
      flat  flat%   sum%        cum   cum%
   44.51MB 11.75% 11.75%    50.51MB 13.34%  capnproto.org/go/capnp/v3.NewPromise
      36MB  9.51% 21.26%    41.50MB 10.96%  capnproto.org/go/capnp/v3.(*Decoder).Decode
      27MB  7.13% 28.39%       27MB  7.13%  capnproto.org/go/capnp/v3.NewMessage
   23.50MB  6.21% 34.60%    23.50MB  6.21%  capnproto.org/go/capnp/v3/internal/chanmutex.NewLocked (inline)
      21MB  5.55% 40.14%       48MB 12.68%  capnproto.org/go/capnp/v3/rpc/transport.(*transport).NewMessage
   18.51MB  4.89% 45.03%    18.51MB  4.89%  capnproto.org/go/capnp/v3.(*Metadata).Put
      16MB  4.23% 49.26%       20MB  5.28%  capnproto.org/go/capnp/v3.ImmediateAnswer
      16MB  4.23% 53.48%    39.50MB 10.43%  capnproto.org/go/capnp/v3/exp/mpsc.newNode[...] (inline)
   15.50MB  4.09% 57.58%    38.01MB 10.04%  capnproto.org/go/capnp/v3/server.(*Server).start
   14.50MB  3.83% 61.41%    19.50MB  5.15%  context.propagateCancel

lthibault
lthibault previously approved these changes Nov 28, 2022
Presumably this couldn't happen before we added the timeout during
shutdown, but now I am seeing an occasional panic here without this if.
This fixes some occasional test failures where questions were sent even
though the context is canceled before the relevant functions were
called; these are dependent on sendMessage to check the context before
actually doing the send.
@zenhack zenhack marked this pull request as ready for review November 30, 2022 05:46
@zenhack zenhack changed the title WIP: simplify transport. simplify transport. Nov 30, 2022
@zenhack
Copy link
Contributor Author

zenhack commented Nov 30, 2022

@lthibault, ok, I think this is ready.

@zenhack zenhack mentioned this pull request Nov 30, 2022
@lthibault lthibault merged commit 3dc6f17 into capnproto:main Nov 30, 2022
@zenhack zenhack deleted the transport-contexts branch November 30, 2022 17:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants