Skip to content

Fix deadlock in ServiceEndpointWatcher when disposing change token registration#7255

Merged
ReubenBond merged 1 commit intodotnet:mainfrom
ReubenBond:fix/sd-deadlock
Feb 3, 2026
Merged

Fix deadlock in ServiceEndpointWatcher when disposing change token registration#7255
ReubenBond merged 1 commit intodotnet:mainfrom
ReubenBond:fix/sd-deadlock

Conversation

@ReubenBond
Copy link
Member

@ReubenBond ReubenBond commented Feb 2, 2026

Summary

Fixes a deadlock in ServiceEndpointWatcher that occurs when disposing the change token registration while holding _lock.

The Problem

A deadlock can occur with the following sequence:

  1. Thread A enters RefreshAsyncInternal(), acquires _lock, and calls _changeTokenRegistration?.Dispose()
  2. CancellationTokenRegistration.Dispose() blocks waiting for any in-flight callback to complete
  3. Thread B is executing the change token callback which calls RefreshAsync(force: false), which tries to acquire _lock
  4. Deadlock: Thread A holds _lock and waits for Thread B's callback → Thread B waits for _lock held by Thread A

The Fix

Move _changeTokenRegistration?.Dispose() outside the lock:

  • Capture the registration reference while holding the lock
  • Set the field to null while holding the lock
  • Dispose outside the lock

Applied to both RefreshAsyncInternal() and DisposeAsync() methods.

Testing

  • All existing ServiceDiscovery tests pass (50/50)
Microsoft Reviewers: Open in CodeFlow

Copilot AI review requested due to automatic review settings February 2, 2026 21:04
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a critical deadlock issue in ServiceEndpointWatcher that occurs when disposing change token registrations while holding a lock. The deadlock happens when one thread holds the lock and waits for a callback to complete, while the callback thread is blocked trying to acquire the same lock.

Changes:

  • Moved _changeTokenRegistration?.Dispose() outside the lock in RefreshAsyncInternal() by capturing the registration before disposal
  • Applied the same pattern to DisposeAsync() for both _changeTokenRegistration and _pollingTimer
  • Added explanatory comments documenting the deadlock scenario and why disposal must occur outside the lock

@ReubenBond ReubenBond requested a review from rzikm February 2, 2026 21:05
@ReubenBond ReubenBond removed their assignment Feb 2, 2026
Copy link
Member

@rzikm rzikm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, modulo one comment

@joperezr
Copy link
Member

joperezr commented Feb 3, 2026

/backport to release/10.2

@github-actions
Copy link
Contributor

github-actions bot commented Feb 3, 2026

Started backporting to release/10.2: https://github.com/dotnet/extensions/actions/runs/21645137209

@github-actions
Copy link
Contributor

github-actions bot commented Feb 3, 2026

@joperezr backporting to "release/10.2" failed, the patch most likely resulted in conflicts:

$ git am --3way --empty=keep --ignore-whitespace --keep-non-patch changes.patch

Patch format detection failed.
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

…gistration

Move _changeTokenRegistration.Dispose() outside the lock to avoid deadlock.
CancellationTokenRegistration.Dispose() blocks waiting for any in-flight
callback to complete, but the callback (RefreshAsync) tries to acquire
the same lock, causing a deadlock.

The fix captures the registration reference while holding the lock, then
disposes it after releasing the lock. Applied to both RefreshAsyncInternal
and DisposeAsync methods.
@ReubenBond
Copy link
Member Author

/backport to release/10.2

@github-actions
Copy link
Contributor

github-actions bot commented Feb 3, 2026

Started backporting to release/10.2: https://github.com/dotnet/extensions/actions/runs/21650268580

@ReubenBond ReubenBond enabled auto-merge (squash) February 3, 2026 23:37
@ReubenBond ReubenBond merged commit b26e2d2 into dotnet:main Feb 3, 2026
6 checks passed
This was referenced Feb 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants