Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a clean way to cancel a long-running Socket operation (e.g. accept or connect) #16236

Closed
Tracked by #93172 ...
GSPP opened this issue Jan 28, 2016 · 12 comments
Closed
Tracked by #93172 ...
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Net
Milestone

Comments

@GSPP
Copy link

GSPP commented Jan 28, 2016

The normal pattern to listen for incoming connections is:

using (var s = new Socket(...)) {
 s.Bind(...);
 s.Listen(...);

 while (true) {
  var connectionSocket = s.Accept();
  Task.Run(() => ProcessConnection(connectionSocket));
 }
}

The only way to reliably cancel that loop is to close the socket. This works, but it causes an ObjectDisposedException which is usually one of the reserved exception types that always indicate a bug such as NRE and ArgumentNullException. Catching that exception might hide real bugs. You might confuse the intended ObjectDisposedException with one that is not supposed to happen but did (bug).

Also, this seems to be racy. What if the handle is closed right before Accept passes it to the native accept function? Is that even thread-safe?! Could be a use of an invalid/unintended handle. Note sure if the BCL has a way to prevent that scenario.

Some people suggest to break the loop by connecting to that socket. That might not be possible depending on what interface you are listening on. There are no guarantees to be able to connect anywhere at all. Also, this feels like such a dirty hack.

There are ideas of using IAsyncResult.WaitHandle with a timeout or in combination with another wait handle but that's quite nasty code and much slower than not using a wait handle. It's much faster to let the kernel wait as part of a synchronous call than to execute an asynchronous call, then create an event object, wait for it and then dispose of all of this. Also, in case you do a WaitHandle.WaitAny(ioHandle, cancelHandle) and you find yourself in the cancellation case you still need to abort the IO using Socket.Close which is the same dirtiness.

It does not matter for this issue whether you're using sync or async IO.

Please add a way to cancel any long-running socket operation such as accept, connect, read, write, shutdown in a clean way. This could take the form of a CancellationToken or a method Socket.AbortPendingOperations(). A cancelled operation should throw an exception that allows to identify the situation so that it can be caught reliably.

Cancelling is not just important for breaking the accept loop but it's useful for all other long-running operations as well. Sometimes you need to be able to shut down processing.

@davidsh
Copy link
Contributor

davidsh commented Jan 28, 2016

cc: @CIPop @SidharthNabar

@CIPop CIPop assigned himadrisarkar and unassigned CIPop Apr 29, 2016
@CIPop
Copy link
Member

CIPop commented Apr 29, 2016

@himadrisarkar PTAL

@GSPP thanks for the suggestion! I agree that a way to cancel operations while preserving the socket object should exist outside of calling Dispose() (or Close() on .Net Desktop).

Regarding the race conditions you've pointed out: you are correct. In some circumstances, calling Close() in parallel with a synchronous Winsock operation will result in an Access Violation caused by the Win32 Winsock code. This means that if you use synchronous calls, there is no safe way to abort operations at this point. You can safely cancel async operations by closing the socket.

@GSPP
Copy link
Author

GSPP commented Apr 29, 2016

@CIPop Are you sure that async operations can be aborted safely? The close might happen right before the OS receives the recv (or whatever else) syscall. The call will happen with a deleted handle. Making this thread safe is hard. It requires a .NET level lock.

@CIPop
Copy link
Member

CIPop commented Apr 29, 2016

@stephentoub @ericeil @pgavlin can answer the *NIX question regarding sync/async cancellation.

The Windows OS doesn't have a similar recv syscall. Winsock's sync implementation is a user-mode wrapper over the Overlapped implementation. Windows async is implemented directly over the Overlapped/IOCP model. I'm not excluding the possibility of issues but so far I'm not aware of any issues regarding async socket cancellation.

@karelz
Copy link
Member

karelz commented Feb 9, 2017

Next step: We need formal API proposal.
On Windows we need to find out if after cancellation you can send still something else on the same connection.
We need to asses Linux implementation - is it doable?

@GSPP
Copy link
Author

GSPP commented Aug 9, 2018

The main point is to way to cancel these operations at all. This is not possible in a truly safe and clean way at the moment (see my lengthy discussion of this in the opening post).

I think it's OK if the socket is destroyed when cancelling. Cancelling a read without taking data is very complicated (unclear if any OS level API supports this and there are race conditions leading to lost data). Cancelling a write would need to make sure that the application knows whether data was written or not (and how much).

All of this is uncertainty and I don't see a need. I don't see a need to cancel a read or write without failing the entire connection. The best practice with socket connections is to write on demand and to have a single read loop busy at all times. User code can use this pattern to emulate cancellable reads and writes.

I also do not propose to add a timeout to individual read/write operations (directly on the socket or on a NetworkStream). No changes to timeouts at all.

Note, that other issues have referenced this one. Higher level APIs need lower level APIs to be cancellable to be cancellable themselves. Making networking cancellable is a fundamental building block. We have a great opportunity here to create a clean cancellation story for networking.


My API proposal: Support CancellationToken on these methods:

  • Send
  • Receive
  • Close
  • Disconnect (low priority)
  • Shutdown
  • Accept
  • NetworkStream.Read/Write
  • TcpClient (all methods where cancellation applies)
  • TcpListener (all methods where cancellation applies)
  • UdpClient (all methods where cancellation applies)
  • UdpListener (all methods where cancellation applies)

I chose this subset because this is what normal socket servers and clients need. They don't need more than that. Note, that Close can block because it contains a flush. So that needs to be cancellable as well. Disconnect is of low priority. I don't know why anyone would use it. Maybe to save a socket handle destroy/create cycle.

We don't touch Dispose. I believe Dispose can block but users can avoid that by calling Close(CancellationToken) first.

There is a Connect(string host, int port) overload that does DNS resolution. If this makes any trouble at all implementing then I'd advocate for leaving this out. This is a bit of an esoteric overload and users can avoid blocking by doing their own DNS first.

I added TcpClient, TcpListener and NetworkStream because these are for the most part forwarders and should be very easy to implement.

Any cancellation must close the connection and dispose the socket.

Cancellation must be thread safe. Cancellations can concurrently arise from parallel reads, writes and close operations on separate tokens and threads. Cancellations might arrive through CancellationToken or through calling Close or Dispose. The OS handle must be destroyed exactly once and no weird exceptions must be thrown. All outstanding operations must complete with a cancellation exception.

@GSPP
Copy link
Author

GSPP commented Aug 12, 2018

The Windows documentation says this about cancelling operations:

In many cases, Winsock overlapped operations using AcceptEx, ConnectEx, WSASend, WSARecv, TransmitFile, and similar functions are cancelable. However, behavior is undefined for the continued use of a socket that has canceled outstanding operations. The closesocket function should be called after canceling an overlapped operation. Therefore, for best results, instead of canceling the I/O directly, the closesocket function should be called to close the socket which will eventually discontinue all pending operations.

This forces a design where any cancellation ends the connection. This should be fine.

Cancellation should be instantaneous in all cases. If it is not then applications cannot rely on cancellation being effective to move the UI state forward. It's also a problem for lingering resource consumption if the application moves on before all resources are freed.

@tmds
Copy link
Member

tmds commented Jul 28, 2019

@stephentoub
Copy link
Member

Triage:
Thanks for the detailed issue.
There are three potential bodies of work here:

  1. Make Dispose safe for canceling all synchronous operations on the socket. This has now been done by @tmds for .NET 5. If we discover additional places where this isn't safe or is insufficient, they should be reported as separate issues and fixed for .NET 5.
  2. CancellationToken/{Value}Task-based overloads for all relevant async operations, with the cancellation token fully respected. That's tracked by various issues, e.g. https://github.com/dotnet/corefx/issues/37118, https://github.com/dotnet/corefx/issues/35861.
  3. CancellationToken-based overloads for all sync operations. We do not plan to do this.

@GSPP
Copy link
Author

GSPP commented Oct 13, 2019

@stephentoub 👍

And regarding (1), Dispose is now safe for all sync and async operations as well as connect, accept, send and receive? It really has to cover all of these because all of them occur during normal program execution and they must instantly respond to cancellation without state corruption.

@stephentoub
Copy link
Member

And regarding (1), Dispose is now safe for all sync and async operations as well as connect, accept, send and receive?

On Windows and Linux. On macOS I believe we simply don't have any reliable way from the OS to interrupt a synchronous connect. @tmds can correct me if I'm wrong.

@GSPP
Copy link
Author

GSPP commented Oct 15, 2019

👍 That is a great improvement. 👍

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Jan 3, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Net
Projects
None yet
Development

No branches or pull requests

8 participants