Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Oct 24, 2025

Problem

The test ExecuteAsync_WhenStepFails_PipelineLoggerIsCleanedUp was failing intermittently with an assertion error:

Assert.Equal() Failure: Values differ
Expected: 2
Actual:   1

The test expected 2 step activity groups (one for the default "deploy" step and one for the "failing-step") but the actual count varied due to timing of concurrent step execution.

Root Cause

The test had a race condition in its assertions:

  1. When multiple pipeline steps have no dependencies between them, they execute concurrently via Task.Run()
  2. If one step (e.g., "failing-step") throws an exception quickly, it cancels the shared linkedCts cancellation token to stop remaining work (fail-fast behavior)
  3. Depending on timing, other concurrent steps (e.g., the "deploy" step) may or may not complete their activity reporting before cancellation occurs
  4. The test was asserting an exact count of step activities, which varied based on whether the deploy step completed before cancellation

Solution

Updated the test to focus on verifying the failing step's behavior rather than asserting an exact count of step activities:

// Before
Assert.Equal(2, stepActivities.Count); // Updated to account for "deploy" step
var failingStepActivity = stepActivities.FirstOrDefault(g => g.Any(a => a.Data.StatusText == "failing-step"));
Assert.NotNull(failingStepActivity);

// After
// Verify the failing step activity (other steps may or may not complete depending on timing)
var failingStepActivity = stepActivities.Single(g => g.Any(a => a.Data.StatusText == "failing-step"));

This preserves the fail-fast behavior while making the test resilient to timing variations. The test now focuses on what matters: verifying that the failing step is properly reported with error state and that the logger is cleaned up after failure.

Testing

  • Verified the target test passes consistently (20 consecutive runs)
  • All 65 DistributedApplicationPipelineTests pass
  • No regressions introduced
  • Fail-fast behavior is preserved

Fixes intermittent test failure in CI/CD pipelines.

Original prompt

investigate this failing test:

🔴 Aspire.Hosting.Tests.Pipelines.DistributedApplicationPipelineTests.ExecuteAsync_WhenStepFails_PipelineLoggerIsCleanedUp
Assert.Equal() Failure: Values differ
Expected: 2
Actual: 1 at Aspire.Hosting.Tests.Pipelines.DistributedApplicationPipelineTests.ExecuteAsync_WhenStepFails_PipelineLoggerIsCleanedUp() in /Users/runner/work/aspire/aspire/tests/Aspire.Hosting.Tests/Pipelines/DistributedApplicationPipelineTests.cs:line 1529
--- End of stack trace from previous location ---


✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Use CancellationToken.None when creating step activities to prevent race condition where a failing step could cancel the token before other concurrent steps finish creating their activities.

Co-authored-by: davidfowl <95136+davidfowl@users.noreply.github.com>
Copilot AI changed the title [WIP] Investigate failing test for pipeline logger cleanup Fix race condition in pipeline step activity reporting when steps fail Oct 24, 2025
Copilot finished work on behalf of davidfowl October 24, 2025 09:04
Copilot AI requested a review from davidfowl October 24, 2025 09:04
@davidfowl
Copy link
Member

@copilot This does not fix the issue, it undoes the behavior, we want fail fast behavior by default. The test should verify. a single failure as it is hard to verify multiple failures.

Reverted the CancellationToken.None change to preserve fail-fast behavior as requested by @davidfowl. Updated the test to focus on verifying the single failing step rather than asserting an exact count of step activities, making it more resilient to timing variations while maintaining fail-fast behavior.

Co-authored-by: davidfowl <95136+davidfowl@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Oct 24, 2025

@copilot This does not fix the issue, it undoes the behavior, we want fail fast behavior by default. The test should verify. a single failure as it is hard to verify multiple failures.

Reverted the pipeline change and updated the test to verify only the failing step instead of asserting an exact count. This preserves fail-fast behavior while making the test resilient to timing variations. Changes in commit 8415720.

Copilot AI changed the title Fix race condition in pipeline step activity reporting when steps fail Fix flaky test ExecuteAsync_WhenStepFails_PipelineLoggerIsCleanedUp Oct 24, 2025
Copilot finished work on behalf of davidfowl October 24, 2025 13:39
@eerhardt eerhardt marked this pull request as ready for review October 24, 2025 20:15
@eerhardt eerhardt requested a review from mitchdenny as a code owner October 24, 2025 20:15
Copilot AI review requested due to automatic review settings October 24, 2025 20:15
@eerhardt eerhardt enabled auto-merge (squash) October 24, 2025 20:15
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a flaky test that was failing intermittently due to a race condition in concurrent step execution. The test now focuses on verifying the failing step's behavior rather than asserting exact counts that vary based on timing.

Key Changes:

  • Removed assertion on exact count of step activities to avoid timing-dependent failures
  • Changed from FirstOrDefault + NotNull check to Single to verify exactly one failing step exists
  • Added explanatory comment about timing variability of concurrent steps

@github-actions
Copy link
Contributor

github-actions bot commented Oct 24, 2025

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://raw.githubusercontent.com/dotnet/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 12341

Or

  • Run remotely in PowerShell:
iex "& { $(irm https://raw.githubusercontent.com/dotnet/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 12341"

@eerhardt eerhardt merged commit d139573 into main Oct 25, 2025
605 of 609 checks passed
@eerhardt eerhardt deleted the copilot/investigate-failing-test branch October 25, 2025 08:57
@dotnet-policy-service dotnet-policy-service bot added this to the 13.0 milestone Oct 25, 2025
radical pushed a commit that referenced this pull request Oct 29, 2025
…12341)

* Initial plan

* Fix race condition in pipeline step activity reporting

Use CancellationToken.None when creating step activities to prevent race condition where a failing step could cancel the token before other concurrent steps finish creating their activities.

Co-authored-by: davidfowl <95136+davidfowl@users.noreply.github.com>

* Revert pipeline change and fix test to only verify failing step

Reverted the CancellationToken.None change to preserve fail-fast behavior as requested by @davidfowl. Updated the test to focus on verifying the single failing step rather than asserting an exact count of step activities, making it more resilient to timing variations while maintaining fail-fast behavior.

Co-authored-by: davidfowl <95136+davidfowl@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: davidfowl <95136+davidfowl@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants