Replace ShutdownComplete by Closed and ShutdownRequested #2396
Conversation
I added another fix (and test) to this PR by accident. It's possible to invoke on a server connection that is not connecting (and not fully connected).
src/IceRpc/Internal/ConnectTimeoutProtocolConnectionDecorator.cs
Outdated
@@ -230,36 +189,23 @@ public Task<IncomingResponse> InvokeAsync(OutgoingRequest request, CancellationT
    Debug.Assert(ConnectionClosedException is not null);
    throw new ObjectDisposedException($"{typeof(ProtocolConnection)}", ConnectionClosedException);
}
else if (ConnectionClosedException is not null)
if (ConnectionClosedException is not null)
{
    throw ConnectionClosedException;
Shouldn't we use ExceptionUtil.Throw here?
No, I don't think we should. See also #2399.
It's no longer clear what the semantics of the terminal invoker's DisposeAsync method are. Does it perform a speedy shutdown or an abortive close of the connection?
To me, ShutdownAsync should perform a regular shutdown and DisposeAsync should perform a speedy shutdown (with the shutdown timeout). If DisposeAsync is called while ShutdownAsync is in progress, or if ShutdownAsync failed, DisposeAsync aborts the connection.
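Roughly, a minimal sketch of this contract, assuming decorator-style fields (_decoratee, _shutdownTimeout, _shutdownTask) that are not the actual ProtocolConnection code:

public async ValueTask DisposeAsync()
{
    if (_shutdownTask is null)
    {
        // DisposeAsync called without a prior ShutdownAsync: speedy shutdown bounded by the
        // shutdown timeout.
        using var cts = new CancellationTokenSource(_shutdownTimeout);
        try
        {
            await _decoratee.ShutdownAsync(cts.Token).ConfigureAwait(false);
        }
        catch
        {
            // The speedy shutdown failed or timed out; fall through and abort.
        }
    }

    // Dispose the decoratee; if a shutdown is still in progress or failed, this aborts the connection.
    await _decoratee.DisposeAsync().ConfigureAwait(false);
}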
if (_isShutDown)
{
    CancelDispatchesAndInvocations(); // speed up shutdown
    _ = await Closed.ConfigureAwait(false);
}
else
{
    _ = _closedTcs.TrySetResult(
        new IceRpcException(IceRpcError.OperationAborted, "The connection was disposed."));
So if I understand it correctly, calling DisposeAsync() while ShutdownAsync() is in progress triggers a speedy shutdown? But calling DisposeAsync directly performs an abortive close?
Yes. Calling DisposeAsync while shutdown is in progress merely speeds up the shutdown by canceling outstanding dispatches and invocations.
In ProtocolConnection, DisposeAsync cannot call ShutdownAsync/ShutdownAsyncCore since it does not have a ShutdownTimeout.
I find this contract a bit strange. What about having instead:
- IProtocolConnection.ShutdownAsync(bool speedyShutdown, CancellationToken cancel)
- IProtocolConnection.DisposeAsync is always abortive (like the transport disposal)?

The terminal invoker calls:
- ShutdownAsync(speedyShutdown: false, linkedTokenSource.Token) from ShutdownAsync (client connection or connection cache). Here the linked token source is the shutdown timeout token + the application cancellation token.
- ShutdownAsync(speedyShutdown: true, shutdownTimeoutCts.Token) from DisposeAsync if shutdown wasn't called first. If shutdown was called, it considers that the application wants to abort the connection and the terminal invoker calls protocolConnection.DisposeAsync to abort it.
This would simplify the protocol connection implementation and would be much clearer for the terminal invoker.
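A minimal sketch of this proposal (the signature below is the proposed one, not the interface currently in the repository):

internal interface IProtocolConnection : IAsyncDisposable
{
    // speedyShutdown: cancel outstanding dispatches and invocations before completing the shutdown.
    // DisposeAsync (inherited from IAsyncDisposable) is always abortive, like the transport disposal.
    Task ShutdownAsync(bool speedyShutdown, CancellationToken cancellationToken);
}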
See https://zeroc.atlassian.net/wiki/spaces/~bernard/blog/2022/12/28/1209630729/December+28+2022
I don't think we need this ShutdownAsync(speedyShutdown).
We could consider a more radical change and just get rid of CancelDispatchesAndInvocations. Can't the application implement the speedier shutdown with an interceptor+middleware?
CancelDispatchesAndInvocations is already an implementation detail:
private protected abstract void CancelDispatchesAndInvocations();
As of this PR, it's only used in the implementation of ProtocolConnection.DisposeAsync.
You're right, this PR is not correct in that respect. First, to clarify the terminology:
I agree. The question is: how does a terminal invoker do that?
I don't think this makes sense, and AFAIK that's not what we do on main now. If DisposeAsync is called while ShutdownAsync is in progress, it cancels outstanding invocations and dispatches to speed up the shutdown, but does not abort the connection--it waits for the shutdown to complete.
I see 2 possible fixes:

a) Change IProtocolConnection.ShutdownAsync to take a speedyShutdown flag, where "speedyShutdown" means cancel dispatches and invocations.

or

This is not unreasonable since absolutely all users of ProtocolConnection use this ShutdownTimeoutProtocolConnectionDecorator (unlike the connect timeout one). This way, DisposeAsync can perform a speedy shutdown bounded by ShutdownTimeout.

@bentoi, which option do you prefer? Or do you see any other option?
Yes 🙂.
Ok
I definitely prefer a). To me, the connect and shutdown timeouts are the responsibility of the terminal invoker. For the client connection, it could just manage the shutdown CTS directly, no need for a decorator.
Ok.
Ok, but we really need better terminology here. The existing _shutdownCts / shutdownCancellationToken is about cancelling outstanding ConnectAsync calls during shutdown. It's not the cancellation token given to ShutdownAsync! Also note that when ConnectionCache/Server fulfills a shutdown request from the peer, it needs to create a shutdown timeout CTS for that connection. This shutdown timeout CTS can't be shared.
I reworked this PR based on the reviewers' feedback:
This rework is based on my belief that there is no value in a "speedy shutdown", where speedy shutdown = cancel dispatches and invocations but still send a GoAway frame to get a "clean" shutdown. There are only 2 possibilities:
If you want a "speedy" graceful shutdown, you just configure your ShutdownTimeout to be small, and then abort when it expires.
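For example, a minimal sketch assuming ShutdownTimeout is configured through ClientConnectionOptions (the server address is a placeholder):

await using var connection = new ClientConnection(new ClientConnectionOptions
{
    ServerAddress = new ServerAddress(new Uri("icerpc://hello.example.com")),
    ShutdownTimeout = TimeSpan.FromSeconds(1) // small timeout = "speedy" graceful shutdown
});

try
{
    await connection.ShutdownAsync();
}
catch (TimeoutException)
{
    // Assumption: the graceful shutdown that doesn't complete within ShutdownTimeout surfaces
    // as a TimeoutException.
}
// DisposeAsync (via await using) aborts the connection if it's not shut down by now.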
src/IceRpc/ClientConnection.cs
Outdated
public Task<IncomingResponse> InvokeAsync(OutgoingRequest request, CancellationToken cancellationToken = default)
public Task<IncomingResponse> InvokeAsync(
    OutgoingRequest request,
    CancellationToken cancellationToken = default)
Wouldn't it be more correct to pass the cancellationToken parameter to ConnectAsync below in the implementation of PerformConnectInvokeAsync?
async Task<IncomingResponse> PerformConnectInvokeAsync()
{
_ = await ConnectAsync(cancellationToken).ConfigureAwait(false);
return await _decoratee.InvokeAsync(request, cancellationToken).ConfigureAwait(false);
}
Otherwise, you can end up with an OCE from a token different from the one used to make the call.
No, this would be incorrect. Think about it: you're making 100s of concurrent calls on your ClientConnection, each with its own timeout / deadline. Why would the deadline/timeout of one of these invocations--the one that happens to establish the connection--affect the connection establishment?
Otherwise, you can end up with an OCE from a token different from the one used to make the call
How?
The call is:
_ = await ConnectAsync(CancellationToken.None).WaitAsync(cancellationToken).ConfigureAwait(false);
ConnectAsync(CancellationToken.None) does not throw an OCE.
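Putting the two together, a minimal sketch of the local function with this pattern (same names as the snippet above; not the exact code):

async Task<IncomingResponse> PerformConnectInvokeAsync()
{
    // The connection establishment is not canceled by this invocation's token; WaitAsync only
    // stops this invocation from waiting, so any OCE it throws carries cancellationToken.
    _ = await ConnectAsync(CancellationToken.None).WaitAsync(cancellationToken).ConfigureAwait(false);
    return await _decoratee.InvokeAsync(request, cancellationToken).ConfigureAwait(false);
}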
src/IceRpc/ClientConnection.cs
Outdated
using var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
cts.CancelAfter(_shutdownTimeout);
Is this the preferred pattern for a CTS timeout, or:
using var cts = new CancellationTokenSource(_shutdownTimeout);
using CancellationTokenRegistration _ =
cancellationToken.UnsafeRegister(cts => ((CancellationTokenSource)cts)!.Cancel(), cts);
I don't know.
CreateLinkedTokenSource + CancelAfter creates one less disposable object and doesn't involve UnsafeRegister... so it looks cleaner.
{
    throw ConnectionClosedException;
}
else if (_connectTask is null)
if (_connectTask is null)
Can we merge the two checks like for shutdown?
I don't think we should - the error messages are different.
The ShutdownAsync check is actually incorrect and I'll fix it in a follow-up PR.
_ = _shutdownCompleteSource.TrySetException(newException);
throw newException;
}
ConnectionClosedException ??= new(IceRpcError.ConnectionClosed, message);
The comment in ProtocolConnection says this must be done with the mutex locked
Where is this comment? I only see:
// Derived classes need to be able to set this exception with their mutex locked. We use an atomic
// CompareExchange to avoid locking _mutex and to ensure we only set a single exception, the first one.
able to != must be done
src/IceRpc/Server.cs
Outdated
{
    await _listenTask.ConfigureAwait(false);
}
catch
I think it would be clearer to put this catch block along with the others and have a _listenTask that is never faulted.
...
catch (IceRpcException exception) when (exception.IceRpcError == IceRpcError.OperationAborted)
{
// The AcceptAsync call can fail with OperationAborted during shutdown if it is accepting a connection
// while the listener is disposed.
}
catch
{
// other exceptions thrown by listener.AcceptAsync are logged by listener via a log decorator
}
We want ShutdownAsync to throw this exception:
// from ShutdownAsync
if (_listenTask is not null)
{
await _listenTask.ConfigureAwait(false);
}
So I don't see how your suggestion would work.
Maybe I should add a Debug.Fail to assert on this unexpected listenTask exception?
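For instance, a rough sketch of that idea (assumed placement inside ShutdownAsync; not actual code):

if (_listenTask is not null)
{
    try
    {
        await _listenTask.ConfigureAwait(false);
    }
    catch (Exception exception)
    {
        Debug.Fail($"The listen task failed with an unexpected exception: {exception}");
        throw; // ShutdownAsync still surfaces the unexpected exception.
    }
}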
src/IceRpc/Server.cs
Outdated
}
else if (!shutdownCancellationToken.IsCancellationRequested)
else if (!cts.IsCancellationRequested)
Why do we check cts, instead of shutdownCancellationToken? I think the idea was to not bother refusing the connection if we are shutting down the server.
Here cts can be canceled shortly after ConnectTransportConnectionAsync returns and we will not refuse the connection in this case.
The situation here is the transport connection is connected and shortly after that:
- shutdown is in progress (shutdownCancellationToken.IsCancellationRequested)
- or we have too many connections
- or the connect timeout expired
We want to send "ServerBusy" when we have too many connections.
We don't want to send it when shortly after the transport connection got connected the connect timeout expires. In this case, we just want to abort the connection.
Note that with Quic, without an explicit CloseAsync call, DisposeAsync sends a close frame with the default error code, which is now 1 == MultiplexedConnectionCloseError.Aborted. See also #2225.
[Test]
[Ignore("See #2404")]
public async Task Shutdown_times_out_after_shutdown_timeout()
Here we only hold the server duplex transport shutdown; this should not prevent the client from completing the shutdown.
Indeed, but it does.
I tried this with a fix for #2404 and the test fails from time to time.
{
    await Task.Yield(); // exit mutex lock

    bool shutdownRequested;
Here we shut down the protocol connection upon shutdown being requested; I think this can result in calling the connection's ShutdownAsync twice?
For example, if you call ShutdownAsync on the ClientConnection, this will call ShutdownAsync on the protocol connection, and then after receiving the go-away frame we complete _shutdownRequestedTcs, which will trigger a second call from here.
ConnectProtocolConnectionDecorator handles this, so it shouldn't be a problem if we call it twice.
ConnectProtocolConnectionDecorator.ShutdownAsync is simply:
public Task ShutdownAsync(CancellationToken cancellationToken) => _decoratee.ShutdownAsync(cancellationToken);
If you look more closely at the code, there is only one task/thread that will shut down or dispose the connection. See private async Task CreateConnectionCleanupTask.
Say the client connection initiates the shutdown, the peer replies by sending the go-away frame, the client receives it and completes ShutdownRequested, which triggers a call to ShutdownAsync from this task, and the protocol connection throws InvalidOperationException because shutdown was already called. What am I missing?
When the application calls ClientConnection.ShutdownAsync, it cancels _shutdownCts:
https://github.com/zeroc-ice/icerpc-csharp/blob/48db0cde520e47b392889462dfb7e1eb09d698a0/src/IceRpc/ClientConnection.cs#L273
In doing so, it says "_connection is now mine alone".
Later on, the cleanup task of _connection receives a shutdown request from the server:
https://github.com/zeroc-ice/icerpc-csharp/blob/48db0cde520e47b392889462dfb7e1eb09d698a0/src/IceRpc/ClientConnection.cs#L307
and before taking any action on this connection, it verifies if connection == _connection
and if the ClientConnection is shutting down (and the answer is yes in your scenario):
https://github.com/zeroc-ice/icerpc-csharp/blob/48db0cde520e47b392889462dfb7e1eb09d698a0/src/IceRpc/ClientConnection.cs#L321
I still find the ClientConnection implementation too complex.
To me, it should be a "simplified" connection cache implementation. It should keep track of the pending connect (_pendingConnectTask), the active connection (_activeConnection) and the background connection dispose count (_backgroundConnectionDisposeCount). No need for decorators.
_connectionFactory = previousConnection =>
{
    IProtocolConnection connection = protocolConnectionFactory.CreateConnection(serverAddress);
_connectionFactory = () =>
This function doesn't appear to be really useful anymore.
I think it's useful. Without it, we would need to keep serverAddress and options.ConnectTimeout around.
@@ -186,7 +194,9 @@ async Task<IncomingResponse> PerformInvokeAsync()
{
    try
    {
        connection = await ConnectAsync(mainServerAddress, cancellationToken).ConfigureAwait(false);
        // TODO: this code generates a UTE
Can't it be fixed with this PR? Or is there an issue we could reference here?
I'll create an issue.
try
{
    _ = await connection.ConnectAsync(cts.Token).ConfigureAwait(false);
}
catch (OperationCanceledException) when (!shutdownCancellationToken.IsCancellationRequested)
{
    throw new TimeoutException(
        $"The connection establishment timed out after {_connectTimeout.TotalSeconds} s.");
}
I would get rid of the inner try/catch with something along these lines:
try
{
_ = await connection.ConnectAsync(cts.Token).ConfigureAwait(false);
}
catch (OperationCanceledException) when (shutdownCancellationToken.IsCancellationRequested)
{
throw new IceRpcException(
IceRpcError.OperationAborted,
"The connection cache was shut down or disposed.");
}
catch (Exception exception)
{
lock (_mutex)
{
bool removed = _pendingConnections.Remove(serverAddress);
Debug.Assert(removed);
}
await connection.DisposeAsync().ConfigureAwait(false);
if (exception is OperationCanceledException)
{
throw new TimeoutException(
$"The connection establishment timed out after {_connectTimeout.TotalSeconds} s.");
}
else
{
throw;
}
}
Your suggestion is not correct. Once we lock the mutex, we have to check shutdownCancellationToken.IsCancellationRequested.
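A minimal sketch of what that check could look like inside the catch block (assumed shape; not the actual connection cache code):

lock (_mutex)
{
    if (shutdownCancellationToken.IsCancellationRequested)
    {
        // The cache is being shut down or disposed; that logic owns the pending connections,
        // so we don't touch _pendingConnections here.
    }
    else
    {
        bool removed = _pendingConnections.Remove(serverAddress);
        Debug.Assert(removed);
    }
}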
/// timeout.</description></item>
/// <item><description>The peer shuts down the connection.</description></item>
/// </list>
/// <summary>Gets a task that completes when the peer or the idle monitor requests the shutdown of this connection.
It's not obvious that this can't create leaks: if the task never completes, its continuation will never run, and if this continuation references objects, these might never be collected. It would be good to investigate this to verify that it's safe.
private readonly CancellationTokenSource _shutdownCts = new();
private Task? _shutdownTask;
private readonly TimeSpan _shutdownTimeout;
// The thread that completes this TCS can run the continuations.
It's not obvious what this comment is about. Perhaps explain why it's fine to not use TaskCreationOptions.RunContinuationsAsynchronously instead?
This PR replaces IProtocolConnection.ShutdownComplete by 2 new properties: Closed and ShutdownRequested. These tasks are never faulted or canceled.
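A minimal sketch of the new shape (the member types below are assumed from this PR's diffs and doc comments, not the exact declarations):

internal interface IProtocolConnection : IAsyncDisposable
{
    // Completes once the connection is closed; its result is the exception that closed the
    // connection, if any.
    Task<Exception?> Closed { get; }

    // Completes when the peer or the idle monitor requests the shutdown of this connection.
    Task ShutdownRequested { get; }
}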
In doing so, this PR significantly simplifies ProtocolConnection by moving some responsibilities to its users (Server, ClientConnection and ConnectionCache), namely:
The last 2 points are tied: DisposeAsync can't perform a graceful shutdown without a ShutdownTimeout to ensure DisposeAsync doesn't hang forever.
This PR removes the "connect CTS" and "shutdown CTS" from ProtocolConnection. They are unnecessary: the caller is responsible for canceling ConnectAsync (and, if desired, ShutdownAsync) when it wants to, using its own CTS.
This PR implements connect timeout and shutdown timeout via decorators, and ClientProtocolConnectionFactory manufactures fully decorated protocol connections used by ClientConnection, ConnectionCache or user-code using this factory.
Server uses the ShutdownTimeoutProtocolConnectionDecorator but not the ConnectTimeoutProtocolConnectionDecorator because it wants the "connect timeout" to include the transport connection ConnectAsync performed before the protocol connection ConnectAsync.
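A minimal sketch of the two compositions; the decorator constructor shapes, wrapping order and local names are assumptions, not the exact factory code:

// Client side (ClientProtocolConnectionFactory): both timeout decorators.
IProtocolConnection clientConnection =
    new ConnectTimeoutProtocolConnectionDecorator(
        new ShutdownTimeoutProtocolConnectionDecorator(
            protocolConnectionFactory.CreateConnection(serverAddress),
            connectionOptions.ShutdownTimeout),
        connectionOptions.ConnectTimeout);

// Server side: only the shutdown timeout decorator, since the server's connect timeout must also
// cover the transport connection's ConnectAsync.
IProtocolConnection serverConnection =
    new ShutdownTimeoutProtocolConnectionDecorator(acceptedConnection, connectionOptions.ShutdownTimeout);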
This PR also adds or reworks the shutdown CTS of ClientConnection, ConnectionCache and Server so that a ShutdownAsync or DisposeAsync cancels outstanding ConnectAsync calls.
Fixes #2335
Fixes #2352
Fixes #2364
Fixes #2372
Fixes #2395