
Restore blocking mode after successful ConnectAsync on Unix #124200

Open
liveans wants to merge 1 commit into dotnet:main from liveans:socket-blocking-mode-optimization

Conversation

@liveans
Member

liveans commented Feb 9, 2026

Summary

On Unix, after a successful ConnectAsync completion, this PR restores the underlying socket to blocking mode if the user hasn't explicitly set Socket.Blocking = false. This
optimizes subsequent synchronous operations (Send/Receive) by using native blocking syscalls instead of emulating blocking on top of epoll/kqueue.

When SendAsync/ReceiveAsync (or other async I/O operations) are called later, the socket is switched back to non-blocking mode via the existing SetHandleNonBlocking()
mechanism.

Motivation

Currently, once any async operation is performed on a Unix socket, the underlying file descriptor is permanently set to non-blocking mode via fcntl(O_NONBLOCK). Subsequent
synchronous operations must then emulate blocking behavior in managed code using epoll/kqueue, which has performance overhead compared to native blocking syscalls.

This change benefits the common usage pattern:

await socket.ConnectAsync(endpoint);
// Subsequent sync operations now use native blocking syscalls (more efficient)
socket.Receive(buffer);
socket.Send(data);
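
For reference, the state in question is the O_NONBLOCK file status flag on the file descriptor, which the runtime toggles through Interop.Sys.Fcntl.SetIsNonBlocking. A small self-contained probe (hypothetical helper, not part of this PR; Linux flag values assumed) shows the toggle involved:

// Hypothetical diagnostic helper, not part of this PR. Reads and clears the
// O_NONBLOCK file status flag directly via fcntl(2). Command/flag values are
// the Linux ones; they differ on other Unix flavors (e.g. O_NONBLOCK is 0x4 on macOS).
using System.Net.Sockets;
using System.Runtime.InteropServices;

static class NonBlockingProbe
{
    private const int F_GETFL = 3;
    private const int F_SETFL = 4;
    private const int O_NONBLOCK = 0x800;

    [DllImport("libc", SetLastError = true)]
    private static extern int fcntl(int fd, int cmd, int arg);

    // True if the kernel-level handle is currently in non-blocking mode.
    public static bool IsNonBlocking(Socket s) =>
        (fcntl((int)s.Handle, F_GETFL, 0) & O_NONBLOCK) != 0;

    // Clears O_NONBLOCK, i.e. restores native blocking semantics for
    // subsequent synchronous Send/Receive calls.
    public static void RestoreBlocking(Socket s)
    {
        int flags = fcntl((int)s.Handle, F_GETFL, 0);
        fcntl((int)s.Handle, F_SETFL, flags & ~O_NONBLOCK);
    }
}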

@dotnet-policy-service
Contributor

Tagging subscribers to this area: @karelz, @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Copilot AI left a comment (Contributor)
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI left a comment (Contributor)

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

@stephentoub
Member

🤖 Copilot Code Review — PR #124200

Holistic Assessment

Motivation: The optimization goal is valid. After ConnectAsync completes, users commonly switch to synchronous Send/Receive operations. Currently these must emulate blocking behavior via epoll/kqueue when the socket is in non-blocking mode. Restoring native blocking mode could improve performance for this pattern.

Approach: The implementation adds SetHandleBlocking() to toggle the socket back to blocking mode and calls RestoreBlocking() from multiple Connect/Accept completion paths. The design respects user preferences (only restores blocking if Socket.Blocking is true).

Summary: ⚠️ Needs Human Review. The optimization concept is sound, but this change reintroduces exactly the synchronization complexity the original design explicitly avoided. The key findings below require human judgment on whether the race conditions are acceptable in practice.


Detailed Findings

⚠️ Thread Safety — No Synchronization on Blocking Mode Toggle

Location: SocketAsyncContext.Unix.cs, SetHandleBlocking() and the _isHandleNonBlocking field

The original code (lines 1350-1354) explicitly warned:

"We never transition back to blocking mode, to avoid problems synchronizing that transition with the async infrastructure."

While the PR updates this comment, it doesn't address the underlying race condition. Consider:

  1. Thread A: ConnectAsync completes → calls RestoreBlocking() → sets socket to blocking
  2. Thread B: Simultaneously starts ReceiveAsync → calls SetHandleNonBlocking()

The _isHandleNonBlocking field is read/written without any locking in both methods. Additionally, ShouldRetrySyncOperation() (line 1432) reads _isHandleNonBlocking to determine behavior—if stale, it could misclassify EAGAIN as a timeout.

The PR claims safety because ConnectAsync/AcceptAsync completion is "guaranteed by construction to not be used concurrently with any other operation", but this guarantee is not enforced by the Socket API. A user can start a ReceiveAsync on one thread while ConnectAsync is completing on another.

Questions for maintainers:

  • Is the race actually benign in practice (i.e., worst case is an extra fcntl syscall)?
  • Should Volatile.Read/Write be used for _isHandleNonBlocking?
  • Should the restoration be best-effort (catch and ignore fcntl failures)?

(Flagged by: Claude + GPT-5.1)
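
If the maintainers conclude the toggle needs hardening, the Volatile option raised above could take roughly this shape (hypothetical sketch against the existing field, not code from the PR):

// Hypothetical hardening sketch, not the PR's code: publish the flag with
// volatile semantics (System.Threading.Volatile) so readers such as
// ShouldRetrySyncOperation() cannot observe a stale value when
// RestoreBlocking() races with SetHandleNonBlocking().
private bool _isHandleNonBlockingValue;   // illustrative backing field

private bool IsHandleNonBlocking
{
    get => Volatile.Read(ref _isHandleNonBlockingValue);
    set => Volatile.Write(ref _isHandleNonBlockingValue, value);
}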

⚠️ AcceptAsync Restores Blocking on Listener Socket

Location: SocketAsyncContext.Unix.cs — lines 1509, 1524 (per diff line numbers)

RestoreBlocking() is called on the listener socket after AcceptAsync completes. Unlike ConnectAsync (which can only succeed once per socket), a listener socket may:

  • Have multiple AcceptAsync operations pending
  • Be used concurrently with other accept operations

If one accept completes and restores blocking mode while another async accept is pending, could this cause issues with the epoll/kqueue infrastructure?

Suggestion: Verify whether restoring blocking mode on listener sockets is safe given concurrent accept operations, or limit this optimization to connect sockets only.

⚠️ Exception After Successful Operation

Location: SocketAsyncContext.Unix.cs, SetHandleBlocking()

if (Interop.Sys.Fcntl.SetIsNonBlocking(_socket, 0) != 0)
{
    throw new SocketException(...);
}

If the fcntl call fails (e.g., socket is in an unexpected state), this throws after a logically successful connect/accept. The user's await ConnectAsync() would throw even though the connection succeeded.

Suggestion: Consider making this best-effort (catch and ignore failures, or return a bool and let RestoreBlocking be truly non-throwing). The blocking mode is an optimization; failing it shouldn't fail the operation.

(Flagged by: GPT-5.1)
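
One possible best-effort shape (hypothetical sketch, not the PR's code; reuses the existing Interop.Sys.Fcntl.SetIsNonBlocking wrapper and _isHandleNonBlocking field):

// Hypothetical sketch of a non-throwing variant; not the PR's code.
private void TrySetHandleBlocking()
{
    // Best effort: if fcntl fails we simply keep the non-blocking handle and
    // the existing epoll/kqueue emulation path, rather than surfacing an
    // exception from an otherwise successful connect/accept.
    if (Interop.Sys.Fcntl.SetIsNonBlocking(_socket, 0) == 0)
    {
        _isHandleNonBlocking = false;
    }
}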

💡 Multiple RestoreBlocking Call Sites

Location: SocketAsyncContext.Unix.cs — 6+ call sites

RestoreBlocking() is called from many places:

  • AcceptOperation.InvokeCallback()
  • ConnectOperation.InvokeCallback()
  • AcceptAsync synchronous completion (2 paths)
  • ConnectAsync synchronous completion (2 paths)

This is correct for covering all paths, but fragile—future changes to async completion could forget to call RestoreBlocking(). Consider whether there's a single choke point where restoration could happen.

✅ WASI Handling

The early return in SetHandleBlocking() for WASI is correct—WASI sockets are always non-blocking.

✅ Test Coverage

The tests cover the key scenarios:

  • Normal connect with blocking restoration
  • User-set non-blocking mode preserved
  • Subsequent async ops restore non-blocking
  • Failed connect behavior
  • AcceptAsync blocking restoration

However, the tests don't cover the race scenarios (concurrent operations), which is where the risk lies.


Cross-Cutting Analysis

Threading Model: The existing socket code has careful threading considerations. The original design deliberately avoided blocking-mode transitions during async operations. This PR re-introduces that complexity. Human review is needed to determine if the races are acceptable.

Related Sibling Code: The Windows implementation doesn't have this concept (Windows sockets work differently). This is a Unix-only optimization.

Performance: No benchmark data was provided. Given the syscall overhead of calling fcntl on every connect/accept completion, it would be valuable to measure whether this actually improves the target scenario (ConnectAsync followed by sync Send/Receive).
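
A micro-benchmark for that scenario could be shaped like the following BenchmarkDotNet sketch (illustrative only; not part of the PR, loopback setup is an assumption made for the example):

// Illustrative BenchmarkDotNet sketch of the target scenario: ConnectAsync
// followed by a synchronous Receive. Not part of the PR.
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;

public class ConnectThenSyncReceiveBenchmark
{
    private Socket _listener = null!;
    private readonly byte[] _buffer = new byte[1];

    [GlobalSetup]
    public void Setup()
    {
        _listener = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        _listener.Bind(new IPEndPoint(IPAddress.Loopback, 0));
        _listener.Listen(16);

        // Accept loop: send one byte to each client so the sync Receive below completes.
        _ = Task.Run(async () =>
        {
            while (true)
            {
                Socket s = await _listener.AcceptAsync();
                s.Send(new byte[] { 1 });
                s.Dispose();
            }
        });
    }

    [Benchmark]
    public async Task ConnectAsync_Then_SyncReceive()
    {
        using var client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
        await client.ConnectAsync(_listener.LocalEndPoint!);
        client.Receive(_buffer); // the synchronous path this PR aims to speed up
    }

    [GlobalCleanup]
    public void Cleanup() => _listener.Dispose();
}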


Summary

Verdict: ⚠️ Needs Human Review

The optimization motivation is valid, but this change reintroduces synchronization complexity that was explicitly avoided in the original design. Key questions for maintainers:

  1. Thread safety: Is the unsynchronized access to _isHandleNonBlocking acceptable given the new bidirectional transitions?
  2. Listener sockets: Is it safe to restore blocking mode on listener sockets that may have concurrent accept operations?
  3. Exception behavior: Should SetHandleBlocking failures be silently ignored rather than throwing?
  4. Performance evidence: Are there benchmarks showing this optimization provides meaningful improvement?

The code itself is consistent and tests cover the intended behavior. The uncertainty is about the design tradeoffs, not the implementation.


Review generated by Copilot code-review skill. Models contributing: Claude Sonnet 4, GPT-5.1

Copilot AI left a comment (Contributor)

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

src/libraries/System.Net.Sockets/src/System/Net/Sockets/SocketAsyncContext.Unix.cs:688

  • The InvokeCallback override only invokes the callback and calls RestoreBlocking when buffer.Length is 0. However, when a connect operation fails (ErrorCode != Success) and a buffer was provided (via DoOperationConnectEx), the callback will never be invoked and RestoreBlocking won't be called.

This can occur when using ConnectEx with a buffer on Unix systems. The callback must always be invoked when the operation is complete, regardless of whether there's a buffer or whether the operation succeeded or failed. RestoreBlocking should also be called on error paths to restore the socket to blocking mode after a failed connection attempt.

Consider updating the logic to:

  1. Always call RestoreBlocking when the connect fails (ErrorCode != Success), regardless of buffer.Length
  2. Always invoke the callback when the operation is complete and no follow-up Send is needed
            public override void InvokeCallback(bool allowPooling)
            {
                var cb = Callback!;
                int bt = BytesTransferred;
                Memory<byte> sa = SocketAddress;
                SocketError ec = ErrorCode;
                Memory<byte> buffer = Buffer;

                if (buffer.Length == 0)
                {
                    AssociatedContext._socket.RestoreBlocking();

                    // Invoke callback only when we are completely done.
                    // In case data were provided for Connect we may or may not send them all.
                    // If we did not we will need follow-up with Send operation
                    cb(bt, sa, SocketFlags.None, ec);
                }
            }
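
A hypothetical restructuring along the lines suggested above (sketch only, not code from the PR; assumes a follow-up Send is pending only when the connect succeeded with a non-empty buffer):

// Hypothetical sketch of the suggested restructuring; not the PR's code.
public override void InvokeCallback(bool allowPooling)
{
    var cb = Callback!;
    int bt = BytesTransferred;
    Memory<byte> sa = SocketAddress;
    SocketError ec = ErrorCode;
    Memory<byte> buffer = Buffer;

    // Only a successful connect with a data buffer needs a follow-up Send;
    // every other outcome means this operation is completely done.
    bool done = buffer.Length == 0 || ec != SocketError.Success;

    if (done)
    {
        // Restore blocking on failure paths too, so a failed ConnectEx-with-buffer
        // attempt doesn't leave the handle stuck in non-blocking mode.
        AssociatedContext._socket.RestoreBlocking();
        cb(bt, sa, SocketFlags.None, ec);
    }
}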

Comment on lines +100 to +109
[Fact]
public async Task ConnectAsync_Failure_SocketIsRestoredToBlocking()
{
    using Socket client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);

    await Assert.ThrowsAsync<SocketException>(async () =>
        await client.ConnectAsync(new IPEndPoint(IPAddress.Loopback, 1)));

    Assert.False(IsSocketNonBlocking(client));
}
Copilot AI commented Feb 13, 2026

The test suite is missing coverage for a failed ConnectAsync with a data buffer. When using ConnectEx with a buffer (via SocketAsyncEventArgs.SetBuffer), if the connection fails, the callback may not be invoked due to the bug in ConnectOperation.InvokeCallback (lines 671-688 in SocketAsyncContext.Unix.cs).

Consider adding a test that:

  1. Creates a SocketAsyncEventArgs with a buffer via SetBuffer
  2. Attempts to connect to an unreachable endpoint (e.g., port 1)
  3. Verifies that the Completed callback is invoked with an error
  4. Verifies that the socket is restored to blocking mode after the failed connect
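
A test along those lines might look like this sketch (hypothetical test name; IsSocketNonBlocking is the helper already used by the surrounding tests):

// Hypothetical test sketch following the suggestion above; not part of the PR.
[Fact]
public async Task ConnectAsync_WithBuffer_Failure_CallbackInvoked_SocketRestoredToBlocking()
{
    using Socket client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);

    var completed = new TaskCompletionSource<SocketError>(TaskCreationOptions.RunContinuationsAsynchronously);
    using var args = new SocketAsyncEventArgs();
    args.RemoteEndPoint = new IPEndPoint(IPAddress.Loopback, 1); // expected to be unreachable
    args.SetBuffer(new byte[16], 0, 16);                         // forces the connect-with-buffer path
    args.Completed += (_, e) => completed.SetResult(e.SocketError);

    if (!client.ConnectAsync(args))
    {
        completed.SetResult(args.SocketError); // completed synchronously
    }

    SocketError error = await completed.Task;
    Assert.NotEqual(SocketError.Success, error);
    Assert.False(IsSocketNonBlocking(client));
}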

liveans force-pushed the socket-blocking-mode-optimization branch from 49a3f53 to b8c9054 on February 13, 2026 at 16:09
@liveans
Member Author

liveans commented Feb 13, 2026

/azp run runtime-libraries-coreclr outerloop

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).
