Add an azdo failure skill #123913

lewing · 2026-02-02T21:05:57Z

Summary

Adds an AI agent skill for analyzing Azure DevOps and Helix test failures across dotnet repositories. When asked to investigate CI failures, the skill teaches Copilot how to query APIs, extract failure details, and provide actionable recommendations.

Features

Core Functionality

PR and Build Analysis: Query by PR number or Azure DevOps build ID
Multi-repository Support: Works with dotnet/runtime, sdk, aspnetcore, roslyn, and others
Helix Integration: Extracts work item details, console logs, and artifacts
Local Test Support: Handles tests that run on the build agent rather than in Helix (as some dotnet/sdk tests do)
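For build-ID queries, the script reads job records from the build timeline. Later commits in this PR add null checks for `Timeline.records` in `Get-FailedJobs` and `Get-CanceledJobs`. The Python sketch below shows that idea only; the script itself is PowerShell. It assumes the JSON shape returned by the Azure DevOps `_apis/build/builds/{buildId}/timeline` endpoint: a `records` array whose items carry `type`, `name`, and `result`. The function name `jobs_by_result` is hypothetical.

```python
from __future__ import annotations

from typing import Any


def jobs_by_result(timeline: dict[str, Any] | None) -> dict[str, list[str]]:
    """Group Job records of a build timeline into failed and canceled names.

    A missing timeline or a null "records" array yields empty groups
    instead of raising an exception.
    """
    grouped: dict[str, list[str]] = {"failed": [], "canceled": []}
    records = (timeline or {}).get("records") or []
    for record in records:
        if record.get("type") != "Job":
            continue  # skip stages, phases, and individual tasks
        result = record.get("result")
        if result in grouped:
            grouped[result].append(record.get("name", "<unnamed>"))
    return grouped


timeline = {"records": [
    {"type": "Job", "name": "Build Linux x64", "result": "failed"},
    {"type": "Job", "name": "Test Linux x64", "result": "canceled"},
    {"type": "Task", "name": "Run tests", "result": "failed"},
]}
print(jobs_by_result(timeline))
# {'failed': ['Build Linux x64'], 'canceled': ['Test Linux x64']}
```

Keeping canceled jobs in their own group lets a report treat them separately from real failures instead of counting them as extra breakage.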

Intelligent Analysis

Build Analysis Integration: Automatically fetches known issues from the Build Analysis PR check
Known Issue Search: Searches GitHub for issues with "Known Build Error" label
MihuBot Semantic Search: Optional integration with MihuBot's semantic database (-SearchMihuBot)
PR Change Correlation: Correlates failures with files changed in the PR
Canceled vs Failed Jobs: Distinguishes between actual failures and dependency cancellations
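One way to picture PR change correlation is to map changed files to the library they live in, then flag failing work items named after that library. This is a hypothetical heuristic written in Python, not the script's actual logic. It assumes dotnet/runtime's `src/libraries/<Library>/` layout and test work items named like `<Library>.Tests`.

```python
from __future__ import annotations


def library_from_path(path: str) -> str | None:
    """Library folder for a path like 'src/libraries/System.Text.Json/src/Foo.cs'."""
    parts = path.replace("\\", "/").split("/")
    if len(parts) > 2 and parts[0] == "src" and parts[1] == "libraries" and parts[2]:
        return parts[2]
    return None


def correlate(changed_files: list[str], failing_items: list[str]) -> dict[str, list[str]]:
    """Map each failing work item to the changed files of the library it tests."""
    by_library: dict[str, list[str]] = {}
    for path in changed_files:
        library = library_from_path(path)
        if library is not None:
            by_library.setdefault(library, []).append(path)
    matches: dict[str, list[str]] = {}
    for item in failing_items:
        for library, paths in by_library.items():
            # "System.Text.Json.Tests" is treated as testing "System.Text.Json".
            if item == library or item.startswith(library + "."):
                matches.setdefault(item, []).extend(paths)
    return matches
```

The prefix match is deliberately loose. It can over-match: a change under `System.Net.Http` would also flag `System.Net.Http.Json.Tests`. Treat a hit as a hint, not proof.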

Smart Recommendations

Provides actionable guidance at the end of analysis:

Recommendation	Meaning
NO RETRY NEEDED	All failures match known tracked issues
LIKELY PR-RELATED	Failures correlate with PR changes
POSSIBLY TRANSIENT	No clear cause - check main branch
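The table can be read as a small decision procedure. The Python sketch below assumes each failure has already been tagged with two flags: whether it matched a known issue, and whether it touches files changed in the PR. The `Failure` type, the `recommend` function, and the order of the checks are illustrative assumptions, not the script's code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Failure:
    name: str
    known_issue: bool  # matched a tracked known issue
    pr_related: bool   # correlates with files changed in the PR


def recommend(failures: list[Failure]) -> str | None:
    """Map a set of failures to one of the three recommendations."""
    if not failures:
        return None  # nothing failed, nothing to recommend
    if all(f.known_issue for f in failures):
        return "NO RETRY NEEDED"
    # Only failures not explained by a known issue can point at the PR.
    if any(f.pr_related for f in failures if not f.known_issue):
        return "LIKELY PR-RELATED"
    return "POSSIBLY TRANSIENT"
```

Each failure is checked individually, rather than comparing the number of known issues with the number of failures.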

Usage

```powershell
# Analyze PR failures
./scripts/Get-HelixFailures.ps1 -PRNumber 123445 -ShowLogs

# Other repositories
./scripts/Get-HelixFailures.ps1 -PRNumber 12345 -Repository "dotnet/sdk"

# With semantic search
./scripts/Get-HelixFailures.ps1 -PRNumber 123445 -SearchMihuBot
```

Example Prompts

"Analyze the failures on PR #123445"
"Why is this build failing? https://dev.azure.com/dnceng-public/..."
"Investigate the test failures in dotnet/aspnetcore PR #54321"

Structure

.github/skills/azdo-helix-failures/
├── SKILL.md                 # Skill documentation
├── scripts/
│   └── Get-HelixFailures.ps1   # Main PowerShell script
└── references/
    ├── azdo-helix-reference.md    # API and build definition details
    └── manual-investigation.md    # Manual investigation steps

Pull request overview

This PR adds a new skill for retrieving and analyzing test failures from Azure DevOps builds and Helix test runs in the dotnet/runtime CI pipeline.

Changes:

Introduces documentation and tooling to help investigate CI test failures
Provides PowerShell script to query Azure DevOps and Helix APIs for failure information
Enables querying by build ID or PR number with optional detailed log fetching

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
.github/skills/azdo-helix-failures/SKILL.md	Documents the skill's purpose, usage examples, manual investigation steps, and common failure patterns
.github/skills/azdo-helix-failures/Get-HelixFailures.ps1	PowerShell script that queries Azure DevOps for failed jobs and retrieves Helix console logs

.github/skills/azdo-helix-failures/Get-HelixFailures.ps1

- Fix unnecessary backtick escaping in string interpolation
- Rename $matches to $urlMatches/$failureMatches to avoid shadowing the $Matches automatic variable
- Add gh CLI dependency check with helpful error message
- Add -TimeoutSec parameter (default 30s) for API calls
- Add -MaxFailureLines parameter (default 50) for configurable output
- Improve Format-TestFailure to detect end of stack trace via empty lines
- Add Write-Verbose output for debugging
- Update SKILL.md with new parameters, prerequisites, and org/project documentation
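The `-MaxFailureLines` cap and the empty-line heuristic in `Format-TestFailure` combine roughly as shown below. This Python sketch assumes xUnit-style `[FAIL]` markers in the console log. It also applies the later limit of three captured failures. The function name and the marker regex are assumptions.

```python
from __future__ import annotations

import re

# Assumed marker: xUnit console output flags failing tests with "[FAIL]".
FAIL_MARKER = re.compile(r"\[FAIL\]")


def failure_excerpts(log: str, max_lines: int = 50, max_failures: int = 3) -> list[list[str]]:
    """Collect up to max_failures excerpts, each starting at a [FAIL] line and
    ending at the first empty line or after max_lines lines."""
    excerpts: list[list[str]] = []
    current: list[str] | None = None
    for line in log.splitlines():
        if current is not None:
            if line.strip() and len(current) < max_lines:
                current.append(line)
                continue
            # Empty line (end of stack trace) or line cap reached: close the excerpt.
            excerpts.append(current)
            current = None
            if len(excerpts) == max_failures:
                break
        if FAIL_MARKER.search(line):
            current = [line]
    if current is not None:
        excerpts.append(current)
    return excerpts
```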

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

.github/skills/azdo-helix-failures/Get-HelixFailures.ps1

.github/skills/azdo-helix-failures/scripts/Get-HelixFailures.ps1

- Add Extract-BuildErrors function to parse build logs for error patterns
- Add Get-FailureClassification function with known patterns:
  - macOS clang module cache/dsymutil issues
  - NativeAOT size regressions
  - NuGet package errors
  - Device infrastructure issues
  - Helix timeouts
  - C# and MSBuild compilation errors
- Expand Format-TestFailure patterns for better Helix log extraction
- For non-Helix failures, now extracts actual errors and provides classification, suggested action, and transient failure detection

- Add -Repository parameter to support repos other than dotnet/runtime
- Add -ContextLines parameter for error context
- Reorder error patterns (specific before general) to avoid overmatch
- Fix Select-Object ordering (First then Unique)
- Add classification to Helix test failures, not just build failures
- Expand Format-TestFailure to capture multiple failures (up to 3)
- Add new failure patterns:
  - OutOfMemoryException (transient)
  - StackOverflowException
  - Assertion failures
  - Test timeouts
  - Network connectivity issues
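Pattern order matters when classification is first-match-wins: a general pattern listed early would claim lines that a more specific pattern should handle. The Python sketch below uses a few illustrative rules, not the script's actual list. The transient flag on `OutOfMemoryException` follows the commit above. The other flags and the category names are assumptions.

```python
from __future__ import annotations

import re

# Illustrative (pattern, category, transient) rules; first match wins,
# so specific patterns must come before the catch-all at the end.
RULES = [
    (re.compile(r"OutOfMemoryException"), "Out of memory", True),
    (re.compile(r"StackOverflowException"), "Stack overflow", False),
    (re.compile(r"\berror NU\d{4}\b"), "NuGet package error", False),
    (re.compile(r"\berror MSB\d{4}\b"), "MSBuild error", False),
    (re.compile(r"\berror CS\d{4}\b"), "C# compiler error", False),
    (re.compile(r"\berror\b", re.IGNORECASE), "Other error", False),
]


def classify(line: str) -> tuple[str, bool] | None:
    """Return (category, is_transient) for the first matching rule, else None."""
    for pattern, category, transient in RULES:
        if pattern.search(line):
            return category, transient
    return None
```

If the catch-all rule moved to the top, compiler, MSBuild, and NuGet errors would all be classified as "Other error". That is the overmatch the reordering avoids.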

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

.github/skills/azdo-helix-failures/Get-HelixFailures.ps1

- Use ${LogId} syntax to prevent PowerShell parsing $LogId? as ternary
- Normalize line breaks in log content before extracting Helix URLs
- Update URL pattern to handle workitem names with special chars
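Extracting Helix console links from a build log might look like the Python sketch below. It normalizes line endings first and tolerates special characters in work item names. It also drops duplicate URLs while keeping first-seen order, which is the deduplication a later commit adds. The `helix.dot.net/api/<version>/jobs/<job>/workitems/<name>/console` URL shape is an assumption; check it against real logs. The function name is hypothetical.

```python
from __future__ import annotations

import re

# Assumed shape of a Helix work item console URL (job ID is a GUID).
HELIX_CONSOLE = re.compile(
    r"https://helix\.dot\.net/api/[\w.-]+/jobs/(?P<job>[0-9a-fA-F-]{36})"
    r"/workitems/(?P<item>[^/\s'\"]+)/console"
)


def helix_console_urls(log: str) -> list[tuple[str, str, str]]:
    """Return (job, work item, url) triples in first-seen order, without duplicates."""
    # Normalize CRLF and CR line endings to LF before matching.
    text = log.replace("\r\n", "\n").replace("\r", "\n")
    seen: set[str] = set()
    found: list[tuple[str, str, str]] = []
    for match in HELIX_CONSOLE.finditer(text):
        url = match.group(0)
        if url not in seen:
            seen.add(url)
            found.append((match.group("job"), match.group("item"), url))
    return found
```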

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

steveisok · 2026-02-02T23:32:17Z

It's looking like I should close #123863 in favor of yours :-)

Script improvements:
- Add -HelixJob parameter for direct Helix job queries
- Add -WorkItem parameter to query specific work items
- Add Get-HelixJobDetails, Get-HelixWorkItems, Get-HelixWorkItemDetails functions
- Show work item artifacts, machine name, duration, exit code
- List failed work items when querying a job without -WorkItem

Documentation improvements (from PR dotnet#123863):
- Add build definition IDs table (129, 133, 139)
- Add failure classification table with all patterns
- Add Helix API curl examples
- Add artifact download documentation
- Add environment variable extraction examples
- Add links to triaging guide, area owners, Helix swagger
- Document -HelixJob and -WorkItem parameters
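The `-HelixJob` and `-WorkItem` queries presumably correspond to two Helix API resources: a job's work item list, and a single work item. The Python sketch below builds those URLs and picks out failed items. The `2019-06-17` API version segment and the `Name`/`ExitCode` field names are assumptions about the Helix API. Both helpers are hypothetical.

```python
from __future__ import annotations

from urllib.parse import quote

HELIX_API = "https://helix.dot.net/api/2019-06-17"  # assumed base URL and API version


def work_item_url(job: str, work_item: str | None = None) -> str:
    """URL for a job's work item list, or for one work item when a name is given."""
    base = f"{HELIX_API}/jobs/{quote(job, safe='')}/workitems"
    if work_item is None:
        return base
    return f"{base}/{quote(work_item, safe='')}"


def failed_work_items(items: list[dict]) -> list[str]:
    """Names of work items that finished with a non-zero exit code."""
    return [item["Name"] for item in items
            if item.get("ExitCode") not in (0, None)]
```

Fetching is then an HTTP GET on these URLs (for example with `urllib.request.urlopen`) followed by `json.load`. Items without an exit code are treated as not failed, since they may still be running.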

- Add Extract-TestRunUrls function to parse 'Published Test Run' URLs
- Add Get-LocalTestFailures to detect non-Helix test failures
- Add classification for local xUnit test failures
- Update main flow to report local test failures with links
- Update SKILL.md with new documentation
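`Extract-TestRunUrls` looks for 'Published Test Run' lines in the log. The Python sketch below pulls run IDs out of such lines. The exact log line format and the `runId` query parameter are assumptions based on typical Azure DevOps test-run links. The name `published_run_ids` is hypothetical.

```python
from __future__ import annotations

import re
from urllib.parse import parse_qs, urlparse

# Assumed log line shape, for example:
# Published Test Run : https://dev.azure.com/<org>/<project>/_TestManagement/Runs?runId=123&_a=runCharts
PUBLISHED_RUN = re.compile(r"Published Test Run\s*:\s*(?P<url>https://\S+)")


def published_run_ids(log: str) -> list[int]:
    """Return the distinct runId values from 'Published Test Run' lines, in order."""
    ids: list[int] = []
    for match in PUBLISHED_RUN.finditer(log):
        query = parse_qs(urlparse(match.group("url")).query)
        for value in query.get("runId", []):
            if value.isdecimal() and int(value) not in ids:
                ids.append(int(value))
    return ids
```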

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

- Add Get-AzDOTestResults function using az devops invoke
- Fetch actual failed test names when az CLI is available
- Show up to 10 failed test names with count
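Once test results are fetched, showing up to 10 failed test names with a count is plain list handling. The Python sketch below assumes result objects shaped like Azure DevOps test results, with `outcome`, `automatedTestName`, and `testCaseTitle` fields. How the script actually calls `az devops invoke` is not reproduced here.

```python
from __future__ import annotations


def summarize_failed(results: list[dict], limit: int = 10) -> list[str]:
    """Count failed results and list at most `limit` of their names."""
    failed = [r.get("automatedTestName") or r.get("testCaseTitle") or "<unknown>"
              for r in results if r.get("outcome") == "Failed"]
    lines = [f"{len(failed)} failed test(s)"]
    lines.extend(f"  {name}" for name in failed[:limit])
    if len(failed) > limit:
        lines.append(f"  ... and {len(failed) - limit} more")
    return lines
```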

- Add Extract-HelixLogUrls function to parse Helix console log URLs
- Display work item names with direct log links for Helix failures
- Deduplicate URLs to avoid showing duplicates

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

.github/skills/azdo-helix-failures/Get-HelixFailures.ps1

.github/skills/azdo-helix-failures/SKILL.md

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

.github/skills/azdo-helix-failures/references/azdo-helix-reference.md

.github/skills/azdo-helix-failures/scripts/Get-HelixFailures.ps1

.github/skills/azdo-helix-failures/SKILL.md

- Fix SHA256 resource leak: wrap in try/finally with Dispose()
- Use safer gh command syntax with & call operator
- Add note about running from skill directory
- Use cross-platform relative paths (./scripts/...)

- Add documentation about local test failures matching known issues
- Search by task name if no specific build errors found
- Helps repos like dotnet/sdk that run tests locally
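The Python sketch below does three things: it falls back to the task name when no build error was extracted, sanitizes the term, and passes it to `gh` as a separate argument rather than through a shell. `gh search issues` with `--repo` and `--label` is a real GitHub CLI command, but the script may query differently. The allowed character set, the leading-dash strip, the 100-character cap, and the `--json`/`--limit` choices are assumptions.

```python
from __future__ import annotations

import re


def search_term(build_errors: list[str], task_name: str) -> str:
    """First extracted build error, else the task name, reduced to safe characters."""
    raw = build_errors[0] if build_errors else task_name
    # Replace anything outside a conservative set with spaces, and drop leading
    # dashes so the term cannot be mistaken for a command-line flag.
    cleaned = re.sub(r"[^A-Za-z0-9 ._:/-]", " ", raw).lstrip(" -")
    return " ".join(cleaned.split())[:100].rstrip()


def gh_search_args(repo: str, term: str) -> list[str]:
    """argv for a known-issue search; run without a shell so nothing is interpolated."""
    return ["gh", "search", "issues", term,
            "--repo", repo,
            "--label", "Known Build Error",
            "--json", "number,title,url",
            "--limit", "10"]
```

The argument list can be passed directly to `subprocess.run(args, capture_output=True, text=True)`.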

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 10 comments.

.github/skills/azdo-helix-failures/scripts/Get-HelixFailures.ps1

- Extract Get-UrlHash helper to reduce SHA256 duplication
- Simplify retry recommendation (avoid 1:1 mapping assumption)
- Move processedJobs increment inside try block
- Add exception context when re-throwing errors

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 10 comments.

.github/skills/azdo-helix-failures/scripts/Get-HelixFailures.ps1

The Helix API returns Files with a 'FileName' property, not 'Name'. Also deduplicate artifacts, since the same file can appear multiple times for different test runs within a work item.
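The Python sketch below deduplicates artifacts, keyed on `FileName` as the commit describes. Keeping the first entry per name and the `Uri` field used in the example are assumptions. `unique_artifacts` is a hypothetical helper.

```python
from __future__ import annotations


def unique_artifacts(files: list[dict] | None) -> list[dict]:
    """Keep the first entry for each FileName, preserving order."""
    seen: set[str] = set()
    unique: list[dict] = []
    for entry in files or []:
        name = entry.get("FileName")
        if name and name not in seen:
            seen.add(name)
            unique.append(entry)
    return unique
```

Entries that lack `FileName` (for example, ones read with the old `Name` assumption) are skipped rather than reported as blank artifacts.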

Documents how to find and analyze Helix work item artifacts, including:
- Accessing artifacts via script and API
- Common artifact types (binlogs, console logs, crash dumps)
- Binlog analysis with live.msbuildlog.com
- Mobile test artifacts (iOS/Android)
- Artifact retention info

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 11 comments.

.github/skills/azdo-helix-failures/scripts/Get-HelixFailures.ps1

.github/skills/azdo-helix-failures/SKILL.md

.github/skills/azdo-helix-failures/scripts/Get-HelixFailures.ps1

Key insight: artifact structures vary by test type, so always query the work item to see what's available. Document common patterns without being prescriptive about exact layouts.

Security:
- Add repository format validation (prevent command injection)
- Sanitize search terms before passing to gh CLI
- Use full SHA256 hash for cache keys (collision resistance)

Robustness:
- Add null checks for Timeline.records in Get-FailedJobs, Get-CanceledJobs, Get-HelixJobInfo
- Handle ConvertFrom-Json failures in cache gracefully (treat as cache miss)
- Use atomic cache writes (temp file + rename) to avoid race conditions
- Increase JSON serialization depth from 10 to 100

Documentation:
- Fix SKILL.md examples to use correct paths from repo root
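The cache changes can be sketched in Python: full SHA-256 keys, corrupt entries treated as misses, and writes done through a temp file plus rename. The directory layout and the function names are hypothetical. `os.replace` is atomic on POSIX only when source and target share a file system, which is why the temp file is created inside the cache directory.

```python
from __future__ import annotations

import hashlib
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def cache_path(cache_dir: Path, url: str) -> Path:
    """Name the entry after the full (untruncated) SHA-256 hex digest of the URL."""
    return cache_dir / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".json")


def cache_read(cache_dir: Path, url: str) -> Any | None:
    """Return the cached value, or None when the entry is missing or corrupt."""
    try:
        return json.loads(cache_path(cache_dir, url).read_text(encoding="utf-8"))
    except (OSError, ValueError):  # ValueError covers json.JSONDecodeError
        return None  # treat unreadable or invalid JSON as a cache miss


def cache_write(cache_dir: Path, url: str, value: Any) -> None:
    """Write to a temp file in the same directory, then rename over the target."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=cache_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(value, f)
        os.replace(tmp, cache_path(cache_dir, url))
    except BaseException:
        os.unlink(tmp)  # do not leave partial temp files behind
        raise
```

Readers never see a half-written entry: they get either the old file or the complete new one.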

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

.github/skills/azdo-helix-failures/scripts/Get-HelixFailures.ps1

All examples now use ./scripts/Get-HelixFailures.ps1 relative to the skill directory.

- Add try/catch to cache cleanup with verbose logging on failure
- Add comment explaining allowed chars in search term regex
- Use TryParse for build ID parsing instead of direct cast
- Validate headSha format (40 hex chars) before using in API call
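The Python sketch below validates inputs before they reach an API call or a command line. It follows the TryParse approach of returning "not valid" instead of throwing. The accepted character sets and the 10-digit cap on build IDs are assumptions. The 40-hex-character rule for `headSha` comes from the commit above.

```python
from __future__ import annotations

import re

REPO = re.compile(r"[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+")  # owner/name
SHA = re.compile(r"[0-9a-fA-F]{40}")                    # full commit SHA
BUILD_ID = re.compile(r"[0-9]{1,10}")


def parse_build_id(text: str) -> int | None:
    """TryParse-style: return None for anything that is not a positive integer."""
    s = text.strip()
    if not BUILD_ID.fullmatch(s):
        return None
    value = int(s)
    return value if value > 0 else None


def is_valid_repo(repo: str) -> bool:
    return REPO.fullmatch(repo) is not None


def is_valid_sha(sha: str) -> bool:
    return SHA.fullmatch(sha) is not None
```

`fullmatch` is used instead of `^...$` because `$` also matches before a trailing newline, which would let `"dotnet/runtime\n"` through.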

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

- Fix FileName property in manual-investigation.md (was Name)
- Fix dotnet/sdk description: uses both local and Helix tests

Add an azdo failure skill

0365ee9

Copilot AI review requested due to automatic review settings February 2, 2026 21:05

dotnet-policy-service bot assigned lewing Feb 2, 2026

lewing mentioned this pull request Feb 2, 2026

Add triaging-helix-failures agent skill #123863

Closed

Copilot AI reviewed Feb 2, 2026

.github/skills/azdo-helix-failures/Get-HelixFailures.ps1 (outdated, resolved)

.github/skills/azdo-helix-failures/Get-HelixFailures.ps1 (outdated, resolved)

.github/skills/azdo-helix-failures/Get-HelixFailures.ps1 (outdated, resolved)

jkotas added the area-Infrastructure label Feb 2, 2026

github-project-automation bot added this to Runtime Infra Feb 2, 2026

lewing added 2 commits February 2, 2026 15:16

Fix PowerShell variable scope issue

c040d21

Copilot AI review requested due to automatic review settings February 2, 2026 21:54

Copilot AI reviewed Feb 2, 2026

.github/skills/azdo-helix-failures/Get-HelixFailures.ps1 (outdated, resolved)

.github/skills/azdo-helix-failures/scripts/Get-HelixFailures.ps1 (resolved)

lewing added 2 commits February 2, 2026 16:16

Copilot AI review requested due to automatic review settings February 2, 2026 22:56

Copilot AI reviewed Feb 2, 2026

.github/skills/azdo-helix-failures/Get-HelixFailures.ps1 (outdated, resolved)

.github/skills/azdo-helix-failures/Get-HelixFailures.ps1 (outdated, resolved)

.github/skills/azdo-helix-failures/Get-HelixFailures.ps1 (outdated, resolved)

lewing added 2 commits February 2, 2026 17:05

Fix URL parsing issues in Get-HelixFailures.ps1

80911b5

Add Docker image pull failure pattern to classification

94a8925

Copilot AI review requested due to automatic review settings February 2, 2026 23:10

Copilot AI reviewed Feb 2, 2026

lewing added 2 commits February 2, 2026 17:35

Copilot AI review requested due to automatic review settings February 2, 2026 23:56

Copilot started reviewing on behalf of lewing February 2, 2026 23:57

Copilot AI reviewed Feb 3, 2026

lewing added 2 commits February 2, 2026 18:00

Add Azure DevOps CLI support for fetching failed test names

eeec65c

Include Helix console log links in failure output

3c86381

Copilot AI review requested due to automatic review settings February 3, 2026 02:04

Copilot started reviewing on behalf of lewing February 3, 2026 02:05

Copilot AI reviewed Feb 3, 2026

Copilot AI review requested due to automatic review settings February 4, 2026 01:57

Copilot started reviewing on behalf of lewing February 4, 2026 01:57

Copilot AI reviewed Feb 4, 2026

lewing added 2 commits February 3, 2026 20:14

Address additional PR review comments

d2e6302

Improve known issue search for local test failures

b98b0c0

Copilot AI review requested due to automatic review settings February 4, 2026 02:30

Copilot started reviewing on behalf of lewing February 4, 2026 02:31

Copilot AI reviewed Feb 4, 2026

lewing added 2 commits February 3, 2026 20:41

Fix indentation in test failure extraction block

8e43185

Address PR review comments (batch 3)

f789e9e

Copilot AI review requested due to automatic review settings February 4, 2026 03:20

Copilot started reviewing on behalf of lewing February 4, 2026 03:21

Copilot AI reviewed Feb 4, 2026

stephentoub marked this pull request as ready for review February 4, 2026 03:30

lewing added 2 commits February 3, 2026 22:55

Fix artifact file property name (Name -> FileName)

3499f2f

Copilot AI review requested due to automatic review settings February 4, 2026 04:58

Copilot started reviewing on behalf of lewing February 4, 2026 04:59

Copilot AI reviewed Feb 4, 2026

lewing added 2 commits February 3, 2026 23:12

Simplify helix-artifacts.md - focus on patterns not specifics

5fe1450

Copilot AI review requested due to automatic review settings February 4, 2026 05:20

Copilot started reviewing on behalf of lewing February 4, 2026 05:20

Copilot AI reviewed Feb 4, 2026

lewing added 2 commits February 3, 2026 23:28

Use relative paths in skill documentation

a07dc98

Copilot AI review requested due to automatic review settings February 4, 2026 05:30

Copilot started reviewing on behalf of lewing February 4, 2026 05:31

Copilot AI reviewed Feb 4, 2026

Fix documentation errors

da7a287
