Skip to content

Conversation

@Aaronontheweb
Copy link
Member

Summary

Fixes race condition in FlowPrefixAndTailSpec.PrefixAndTail_must_throw_if_tail_is_attempted_to_be_materialized_twice that was causing intermittent test failures on CI.

Error: Test expected OnError but received OnNext(2) instead.

Root Cause

The issue was a read-then-CAS race condition in SubSource<T>.Logic.SetCallback():

  1. Thread 1: reads status (null) → CompareAndSet succeeds → starts data flow
  2. Thread 2: reads status (stale null) → CompareAndSet fails → does separate read for retry
  3. Race window: Between Thread 2's CAS failure and subsequent read, Thread 1's OnNext(2) flows to wrong subscriber

Solution

  1. Added CompareExchange method to AtomicReference<T>: Returns actual previous value instead of just boolean, following standard .NET Interlocked patterns

  2. Replaced read-then-CAS with single atomic operation:

    // Before (racy)
    var status = _stage._status.Value; 
    if (status == null) { 
        if (\!_stage._status.CompareAndSet(null, callback)) // Race window here
    }
    
    // After (atomic)
    var previous = _stage._status.CompareExchange(null, callback);
    switch (previous) { ... }

This eliminates all race conditions by using a single atomic operation that both attempts the change AND returns what was actually there.

Testing

  • Test now passes consistently (ran 50+ times without failure)
  • No other tests affected by AtomicReference changes

Files Changed

  • src/core/Akka/Util/AtomicReference.cs - Added CompareExchange method
  • src/core/Akka.Streams/Implementation/Fusing/StreamOfStreams.cs - Fixed race condition

Fixes intermittent CI failures from PR #7793.

…ion of double materialization

The race condition occurred when two materializations happened simultaneously:
1. First materialization succeeds in setting callback via CompareAndSet
2. Second materialization checks status, sees null, then tries CompareAndSet which fails
3. Second materialization recursively calls SetCallback, but by then data may already be flowing to first subscriber
4. Second subscriber receives OnNext(2) instead of expected error

Fixed by immediately checking the new status when CompareAndSet fails and throwing
the IllegalStateException atomically before any data can flow to the wrong subscriber.
…using single atomic operation

1. Added CompareExchange method to AtomicReference<T> that returns the previous value
   instead of just a boolean, following standard .NET Interlocked patterns

2. Fixed FlowPrefixAndTailSpec race condition by replacing read-then-CAS pattern
   with single atomic CompareExchange operation

The original issue was that SetCallback used a problematic read-then-CAS pattern:
- Thread 1: read status (null) -> CAS succeeds
- Thread 2: read status (stale null) -> CAS fails -> separate read for retry
- Race window between Thread 2's CAS failure and subsequent read allowed
  data to flow to wrong subscriber

New approach uses single atomic operation that both attempts the change
AND returns what was actually there, eliminating all race conditions.

Before: var status = _status.Value; if (status == null) CAS(...)
After:  var previous = _status.CompareExchange(null, callback); handle(previous)

if (status == null)
// Single atomic operation that both attempts the change AND returns the previous value
var previous = _stage._status.CompareExchange(null, callback);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the real fix - only read the value once, as the result of the CompareExchange

@Aaronontheweb
Copy link
Member Author

This is, for sure, a bug that may be affecting some other tests too - looks like this was the worst offender though.

await downstream.RequestAsync(10);
downstream.ExpectNextN(Enumerable.Range(1, 10));
var received = downstream.ExpectNextN(10);
received.OrderBy(x => x).Should().BeEquivalentTo(Enumerable.Range(1, 10));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made the ordering deterministics

@Aaronontheweb Aaronontheweb changed the title Fix FlowPrefixAndTailSpec race condition with atomic CompareExchange Fix StreamOfStreams with atomic CompareExchange Sep 3, 2025
@Aaronontheweb
Copy link
Member Author

This will need a backport to v1.5

Copy link
Contributor

@Arkatufus Arkatufus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, waiting for CI/CD

@Aaronontheweb Aaronontheweb merged commit 8ea65a4 into akkadotnet:dev Sep 9, 2025
7 of 11 checks passed
@Aaronontheweb Aaronontheweb deleted the fix/racy-flowprefixandtail-spec branch September 9, 2025 17:18
Aaronontheweb added a commit to Aaronontheweb/akka.net that referenced this pull request Sep 9, 2025
The test `PrefixAndTail_must_throw_if_tail_is_attempted_to_be_materialized_twice`
was failing intermittently with "Expected OnError but received OnNext(2)".

Root cause: Even after PR akkadotnet#7796 fixed the atomic detection of double
materialization, there was still a timing race between error detection
and demand signaling from ExpectSubscriptionAndError().

Fix: Disable demand signaling in the second subscriber's error expectation
by using `ExpectSubscriptionAndError(signalDemand: false)`. This eliminates
the race window while preserving the test's intent to verify error handling.

The test now passes consistently without requiring changes to production code.
Aaronontheweb added a commit that referenced this pull request Sep 9, 2025
…st (#7816)

The test `PrefixAndTail_must_throw_if_tail_is_attempted_to_be_materialized_twice`
was failing intermittently with "Expected OnError but received OnNext(2)".

Root cause: Even after PR #7796 fixed the atomic detection of double
materialization, there was still a timing race between error detection
and demand signaling from ExpectSubscriptionAndError().

Fix: Disable demand signaling in the second subscriber's error expectation
by using `ExpectSubscriptionAndError(signalDemand: false)`. This eliminates
the race window while preserving the test's intent to verify error handling.

The test now passes consistently without requiring changes to production code.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants