Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace ShutdownComplete by Closed and ShutdownRequested #2396

Merged
merged 30 commits into from
Jan 2, 2023
Merged

Replace ShutdownComplete by Closed and ShutdownRequested #2396

merged 30 commits into from
Jan 2, 2023

Conversation

bernardnormier
Copy link
Member

@bernardnormier bernardnormier commented Dec 27, 2022

This PR replaces IProtocolConnection.ShutdownComplete by 2 new properties:

Task<Exception?> Closed { get; }
Task ShutdownRequested { get; }

These tasks are never faulted or canceled.

In doing so, this PR significantly simplifies ProtocolConnection by moving some responsibilities to its users (Server, ClientConnection and ConnectionCache), namely:

  • calling ShutdownAsync when ShutdownRequested completes (ShutdownRequested completes when the idle monitor requests a shutdown or an icerpc peer sends a GoAway frame)
  • the implementation of ConnectTimeout and ShutdownTimeout
  • the logic that "DisposeAsync" attempts a graceful shutdown

The last 2 points are tied: DisposeAsync can't perform a graceful shutdown without a ShutdownTimeout to ensure DisposeAsync doesn't hang forever.

This PR removes the "connect CTS" and "shutdown CTS" from ProtocolConnection. They are unnecessary: the caller is responsible to cancel ConnectAsync (and if desired, ShutdownAsync) when it wants to, using its own CTS.

This PR implements connect timeout and shutdown timeout via decorators, and ClientProtocolConnectionFactory manufactures fully decorated protocol connections used by ClientConnection, ConnectionCache or user-code using this factory.
Server uses the ShutdownTimeoutProtocolConnectionDecorator but not the ConnectTimeoutProtocolConnectionDecorator because it wants the "connect timeout" to include the transport connection ConnectAsync performed before the protocol connection ConnectAsync.

This PR also adds or reworks the shutdown CTS of ClientConnection, ConnectCache and Server so that a ShutdownAsync or DisposeAsync cancels outstanding ConnectAsync.

Fixes #2335
Fixes #2352
Fixes #2364
Fixes #2372
Fixes #2395

@bernardnormier
Copy link
Member Author

bernardnormier commented Dec 27, 2022

I added another fix (and test) to this PR by accident. It's possible to invoke on a server connection that is not connecting (and not fully connected).

@@ -230,36 +189,23 @@ public Task<IncomingResponse> InvokeAsync(OutgoingRequest request, CancellationT
Debug.Assert(ConnectionClosedException is not null);
throw new ObjectDisposedException($"{typeof(ProtocolConnection)}", ConnectionClosedException);
}
else if (ConnectionClosedException is not null)
if (ConnectionClosedException is not null)
{
throw ConnectionClosedException;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we use ExeptionUtil.Throw here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I don't think we should. See also #2399.

Copy link
Contributor

@bentoi bentoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear anymore what are the semantics of the terminal invoker DisposeAsync method. Does it performs a speedy shutdown or an abortive close of the connection?

To me, ShutdownAsync should perform a regular shutdown. DisposeAsync should perform a speedy-shutdown (with shutdown timeout).

If DisposeAsync is called while ShutdownAsync is in progress or if ShutdownAsync is failed, DiposeAsync aborts the connection.

Comment on lines 158 to 166
if (_isShutDown)
{
CancelDispatchesAndInvocations(); // speed up shutdown
_ = await Closed.ConfigureAwait(false);
}
else
{
_ = _closedTcs.TrySetResult(
new IceRpcException(IceRpcError.OperationAborted, "The connection was disposed."));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if I understanding it correctly, calling DisposeAsync() while ShutdownAsync() is in progress triggers a speedy shutdown? But calling DisposeAsync directly performs an abortive close?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Really calling DisposeAsync while shutdown is in progress speeds-up the shutdown by canceling outstanding dispatches and invocations.

In ProtocolConnection, DisposeAsync cannot call ShutdownAsync/ShutdownAsyncCore since it does not have a ShutdownTimeout.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this contract a bit strange. What about having instead:

  • IProtocolConnection.ShutdownAsync(bool speedyShutdown, CancellationToken cancel)
  • IProtocolConnection.DisposeAsync is always abortive (like the transport diposal).
    ?

The terminal invoker calls:

  • ShutdownAsync(speedyShutdown: false, linkedTokenSource.Token); from ShutdownAsync (client connection or connection cache). Here the linked token source is the shutdown timeout token + the application cancellation token.
  • ShutdownAsync(speedyShutdown: true, shutdownTimeoutCts.Token) from DisposeAsync if shutdown wasn't called first. If shutdown was called, it considers the application wants to abort the connection and the terminal invoker calls protocolConnection.DisposeAsync to abort it.

This would simplify the protocol connection implementation and would be much clearer for the terminal invoker.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See https://zeroc.atlassian.net/wiki/spaces/~bernard/blog/2022/12/28/1209630729/December+28+2022

I don't think we need this ShutdownAsync(speedyShutdown).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could consider a more radical change and just get rid of CancelDispatchesAndInvocations. Can't the application implement the speedier shutdown with an interceptor+middleware?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CancelDispatchAndInvocatiions is already an implementation detail:

private protected abstract void CancelDispatchesAndInvocations();

As of this PR, it's only used in the implementation of ProtocolConnection.DisposeAsync.

@bernardnormier
Copy link
Member Author

It's not clear anymore what are the semantics of the terminal invoker DisposeAsync method. Does it performs a speedy shutdown or an abortive close of the connection?

You're right, this PR is not correct in that respect.

First, to clarify the terminology:

  • speedy shutdown = regular shutdown with non-canceled cancellation token but where we cancel outstanding invocation and dispatches
  • abortive close/shutdown = ShutdownAsync call with a canceled cancellation token. It cancels outstanding invocations and dispatches and unlike a regular or speedy shutdown, does not send a GoAway to the peer

To me, ShutdownAsync should perform a regular shutdown. DisposeAsync should perform a speedy-shutdown (with shutdown timeout).

I agree. The question is how does a terminal invoker does that?
Should I add a speedyShutdown parameter to IProtocolConnection.ShutdownAsync?

If DisposeAsync is called while ShutdownAsync is in progress or if ShutdownAsync is failed, DiposeAsync aborts the connection.

I don't think this makes sense, and AFAIK that's not what we do on main now. If DisposeAsync is called while ShutdownAsync is in progress, it cancels outstanding invocation and dispatches to speed up the shutdown, but does not aborts the connection--it waits for the shutdown to complete.

@bernardnormier
Copy link
Member Author

bernardnormier commented Dec 28, 2022

I see 2 possible fixes:

a) Change IProtocolConnection.ShutdownAsync to:

Task ShutdownAsync(bool speedyShutdown = false, CancellationToken cancellationToken = default);

where "speedyShutdown" means cancel dispatches and invocations.

or
b) Eliminate the ShutdownTimeoutProtocolConnectionDecorator and move back the implementation of ShutdownTimeout to ProtocolConnection (or derived classes).

This is not unreasonable since absolutely all users of ProtocolConnection use this ShutdownTimeoutProtocolConnectionDecorator (unlike the connect timeout one).

This way, DisposeAsync can perform a speedy shutdown bounded by ShutdownTimeout

@bentoi, which option do you prefer? Or do you see any other option?

@bentoi
Copy link
Contributor

bentoi commented Dec 28, 2022

Should I add a speedyShutdown parameter to IProtocolConnection.ShutdownAsync?

Yes 🙂.

If DisposeAsync is called while ShutdownAsync is in progress, it cancels outstanding invocation and dispatches to speed up the shutdown, but does not aborts the connection--it waits for the shutdown to complete.

Ok

@bentoi, which option do you prefer? Or do you see any other option?

I definitely prefer a).

To me, the connect and shutdown timeout are the responsibility of the terminal invoker or Server.

For ConnectionCache and Server, there should be a single shutdown cancellation token source used to shutdown all the connections, no need for a ShutdownTimeoutProtocolConnectionDecorator per connection.

For client connection, it could just manage the shutdown CTS directly, no need for a decorator.

@bernardnormier
Copy link
Member Author

bernardnormier commented Dec 28, 2022

I definitely prefer a).

Ok.

For ConnectionCache and Server, there should be a single shutdown cancellation token source used to shutdown all the connections, no need for a ShutdownTimeoutProtocolConnectionDecorator per connection.

Ok, but we really need a better terminology here.

The existing _shutdownCts / shutdownCancellationToken is about cancelling outstanding ConnectAsync during shutdown. It's not the cancellation token give to ShutdownAsync!

Also note that when ConnectionCache/Server fulfills a shutdown request from the peer, it needs to create a shutdown timeout CTS for that connection. This shutdown timeout CTS can't be shared.

@bernardnormier
Copy link
Member Author

I reworked this PR based on the reviewers feedback:

  • I removed the ConnectTimeout and ShutdownTimeout decorators. They are now implemented in Server, ClientConnection and ConnectionCache
  • DisposeAsync on Server/ClientConnection/ConnectionCache no longer shuts down any connection. It's an abortive closure that does not send any GoAway frame
  • ShutdownAsync on Server/ClientConnection/ConnectionCache can only be called on an object that is not shut down or shutting down
  • ProtocolConnection.ShutdownAsync no longer cancels outstanding dispatches and invocations, even when given a canceled cancellation token
  • ProtocolConnection.DisposeAsync cancels outstanding dispatches and invocations when a shutdown is in progress. It does it as an attempt to abort / speed-up this shutdown.

This rework is based on my belief there is no value in a "speedy shutdown", where speedy shutdown = cancel dispatches and invocations but still send a GoAway frame to get a "clean" shutdown.

There are only 2 possibilities:

  • you want to abort all connections and exit ASAP => you call DisposeAsync. Outstanding dispatches and invocations are aborted.
  • you want to shutdown cleanly and give time to outstanding dispatches and invocations to complete => you call ShutdownAsync

If you want a "speedy" graceful shutdown, you just configure your ShutdownTimeout to be small. And then abort when it expires:

try
{
    await server.ShutdownAsync(); // use shutdown timeout, doesn't cancel anything
}
catch (IceRpcException)
{
}
catch (TimeoutException)
{
}
finally
{
    await server.DisposeAsync(); // cancel dispatches and invocations
}

public Task<IncomingResponse> InvokeAsync(OutgoingRequest request, CancellationToken cancellationToken = default)
public Task<IncomingResponse> InvokeAsync(
OutgoingRequest request,
CancellationToken cancellationToken = default)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't be more correct to pass the cancelationToken parameter to ConnectAsync below in the implementation of PerformConnectInvokeAsync

async Task<IncomingResponse> PerformConnectInvokeAsync()
            {
                _ = await ConnectAsync(cancellationToken).ConfigureAwait(false);
                return await _decoratee.InvokeAsync(request, cancellationToken).ConfigureAwait(false);
            }

Otherwise, you can end up with OCE from a token different that the one used to make the call.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this would be incorrect. Think about it: you're making 100s of concurrent calls on your ClientConnection, each with its own timeout / deadline. Why would the deadline/timeout of one of these invocations--the one that happens to establish the connection--affect the connection establishment?

Otherwise, you can end up with OCE from a token different that the one used to make the call

How?

The call is:

_ = await ConnectAsync(CancellationToken.None).WaitAsync(cancellationToken).ConfigureAwait(false);

ConnectAsyn(cancellationToken.None) does not throw an OCE.

Comment on lines 550 to 551
using var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
cts.CancelAfter(_shutdownTimeout);
Copy link
Member

@pepone pepone Dec 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the preferred pattern for CTS timeout or

using var cts = new CancellationTokenSource(_shutdownTimeout);
using CancellationTokenRegistration _ =
    cancellationToken.UnsafeRegister(cts => ((CancellationTokenSource)cts)!.Cancel(), cts);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know.
CreateLinkedTokenSource + CancelAfter creates one less disposable object and doesn't involve UnsafeRegister... so it looks cleaner.

@pepone pepone self-requested a review December 29, 2022 16:16
{
throw ConnectionClosedException;
}
else if (_connectTask is null)
if (_connectTask is null)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we merge the two checks like for shutdown?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should - the error messages are different.

The ShutdownAsync check is actually incorrect and I'll fix it in a follow-up PR.

_ = _shutdownCompleteSource.TrySetException(newException);
throw newException;
}
ConnectionClosedException ??= new(IceRpcError.ConnectionClosed, message);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment in ProtocolConnection says this must be done with the mutex locked

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this comment? I only see:

    // Derived classes need to be able to set this exception with their mutex locked. We use an atomic
    // CompareExchange to avoid locking _mutex and to ensure we only set a single exception, the first one.

able to != must be done

{
await _listenTask.ConfigureAwait(false);
}
catch
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be clear to put this catch block along with the others and have a _listenTask that is never faulted.

...
catch (IceRpcException exception) when (exception.IceRpcError == IceRpcError.OperationAborted)
{
    // The AcceptAsync call can fail with OperationAborted during shutdown if it is accepting a connection
    // while the listener is disposed.
}
catch
{
  // other exceptions thrown by listener.AcceptAsync are logged by listener via a log decorator
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want ShutdownAsync to throw this exception:

// from ShutdownAsync
            if (_listenTask is not null)
            {
                await _listenTask.ConfigureAwait(false);
            }

So I don't see how your suggestion would work.

Maybe I should add a Debug.Fail to assert on this unexpected listenTask exception?

}
else if (!shutdownCancellationToken.IsCancellationRequested)
else if (!cts.IsCancellationRequested)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we check cts, instead of shutdownCancellationToken? I think the idea was to not bother refusing the connection if we are shutting down the server.

Here cts can be canceled shortly after ConnectTransportConnectionAsync returns and we will not refuse the connection in this case.

Copy link
Member Author

@bernardnormier bernardnormier Dec 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The situation here is the transport connection is connected and shortly after that:

  • shutdown is in progress (shutdownCancellationToken.IsCancellationRequested)
  • or we have too many connections
  • or the connect timeout expired

We want to send "ServerBusy" when we have too many connections.
We don't want to send it when shortly after the transport connection got connected the connect timeout expires. In this case, we just want to abort the connection.

Note that with Quic, without an explicit CloseAsync call, DisposeAsync sends a close frame with the default error code, which is now 1 == MultiplexedConnectionCloseError.Aborted. See also #2225.

@pepone pepone self-requested a review December 29, 2022 19:27

[Test]
[Ignore("See #2404")]
public async Task Shutdown_times_out_after_shutdown_timeout()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we only hold the server duplex transport shutdown this should not prevent the client from completing the shutdown.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed but it does

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this with a fix for #2404 and the test fails from time to time.

{
await Task.Yield(); // exit mutex lock

bool shutdownRequested;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we shut down the protocol connection upon shutdown being requested, I think this can result in calling twice connection ShutdownAsync?

For example, if you call Shutdown on the ClientConnection, this will call ShutdownAsync in the protocol connection, and then after receiving the go-away frame we complete _shutdownRequestedTcs which will trigger a second call from here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ConnectProtocolConnectionDecorator handles this, so it shouldn't be a problem if we call it twice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ConnectProtocolConnectionDecorator.ShutdownAsync is simply:

 public Task ShutdownAsync(CancellationToken cancellationToken) => _decoratee.ShutdownAsync(cancellationToken);

If you look more closely at the code, there is only one task/thread that will shutdown or dispose the connection. See private async Task CreateConnectionCleanupTask.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Say the client connection initiates the shutdown, the peer reply by sending the go-away frame, the client receives it and calls ShutdownRequested which triggers a call to ShtudonAsync from this task, and the protocol connection throws InvalidOperationException because shutdown was already called. what am I missing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the application calls ClientConnection.ShutdownAsync, it cancels _shutdownCts:
https://github.com/zeroc-ice/icerpc-csharp/blob/48db0cde520e47b392889462dfb7e1eb09d698a0/src/IceRpc/ClientConnection.cs#L273

In doing so, it says "_connection is now mine alone".

Later on, the cleanup task of _connection receives a shutdown request from the server:
https://github.com/zeroc-ice/icerpc-csharp/blob/48db0cde520e47b392889462dfb7e1eb09d698a0/src/IceRpc/ClientConnection.cs#L307

and before taking any action on this connection, it verifies if connection == _connection and if the ClientConnection is shutting down (and the answer is yes in your scenario):
https://github.com/zeroc-ice/icerpc-csharp/blob/48db0cde520e47b392889462dfb7e1eb09d698a0/src/IceRpc/ClientConnection.cs#L321

Copy link
Contributor

@bentoi bentoi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still find the ClientConnection implementation too complex.

To me, it should be a "simplified" connection cache implementation. It should keep track of the pending connect (_pendingConnectTask), the active connection (_activeConnection) and the background connection dispose count (_backgroundConnectionDisposeCount). No need for decorators.

_connectionFactory = previousConnection =>
{
IProtocolConnection connection = protocolConnectionFactory.CreateConnection(serverAddress);
_connectionFactory = () =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function doesn't appear to be really useful anymore.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's useful. Without it, we would need to keep serverAddress and options.ConnecTimeout around.

@@ -186,7 +194,9 @@ async Task<IncomingResponse> PerformInvokeAsync()
{
try
{
connection = await ConnectAsync(mainServerAddress, cancellationToken).ConfigureAwait(false);
// TODO: this code generates a UTE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't it be fixed with this PR? Or is there an issue we could reference here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll create an issue.

Comment on lines +343 to +351
try
{
_ = await connection.ConnectAsync(cts.Token).ConfigureAwait(false);
}
catch (OperationCanceledException) when (!shutdownCancellationToken.IsCancellationRequested)
{
throw new TimeoutException(
$"The connection establishment timed out after {_connectTimeout.TotalSeconds} s.");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would get rid of the inner try/catch with something along these lines:

try
{
    _ = await connection.ConnectAsync(cts.Token).ConfigureAwait(false);
}
catch (OperationCanceledException) when (shutdownCancellationToken.IsCancellationRequested)
{
    throw new IceRpcException(
        IceRpcError.OperationAborted,
        "The connection cache was shut down or disposed.");
}
catch (Exception exception)
{
    lock (_mutex)
    {
        bool removed = _pendingConnections.Remove(serverAddress);
        Debug.Assert(removed);
    }
    
    await connection.DisposeAsync().ConfigureAwait(false);

    if (exception is OperationCanceledException)
    {
        throw new TimeoutException(
            $"The connection establishment timed out after {_connectTimeout.TotalSeconds} s.");
    }
    else
    {
        throw;
    }
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your suggestion is not correct.

Once we lock the mutex, we have to check shutdownCancellationToken.IsCancellationRequested.

/// timeout.</description></item>
/// <item><description>The peer shuts down the connection.</description></item>
/// </list>
/// <summary>Gets a task that completes when the peer or the idle monitor requests the shutdown of this connection.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not obvious that this can't create leaks since the task continuation will never run. If this continuation references objects, these might never be collected. It would be good to investigate this to verify that it's safe.

private readonly CancellationTokenSource _shutdownCts = new();
private Task? _shutdownTask;
private readonly TimeSpan _shutdownTimeout;
// The thread that completes this TCS can run the continuations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not obvious why this comment is about. Perhaps explain why it's fine to not use TaskCreationOptions.RunContinuationsAsynchronously instead?

Comment on lines 158 to 166
if (_isShutDown)
{
CancelDispatchesAndInvocations(); // speed up shutdown
_ = await Closed.ConfigureAwait(false);
}
else
{
_ = _closedTcs.TrySetResult(
new IceRpcException(IceRpcError.OperationAborted, "The connection was disposed."));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could consider a more radical change and just get rid of CancelDispatchesAndInvocations. Can't the application implement the speedier shutdown with an interceptor+middleware?

@bernardnormier bernardnormier merged commit 48db0cd into icerpc:main Jan 2, 2023
@bernardnormier bernardnormier deleted the shutdown-requested branch January 21, 2023 14:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment