Add an azdo failure skill #123913
Pull request overview
This PR adds a new skill for retrieving and analyzing test failures from Azure DevOps builds and Helix test runs in the dotnet/runtime CI pipeline.
Changes:
- Introduces documentation and tooling to help investigate CI test failures
- Provides PowerShell script to query Azure DevOps and Helix APIs for failure information
- Enables querying by build ID or PR number with optional detailed log fetching
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| .github/skills/azdo-helix-failures/SKILL.md | Documents the skill's purpose, usage examples, manual investigation steps, and common failure patterns |
| .github/skills/azdo-helix-failures/Get-HelixFailures.ps1 | PowerShell script that queries Azure DevOps for failed jobs and retrieves Helix console logs |
- Fix unnecessary backtick escaping in string interpolation
- Rename $matches to $urlMatches/$failureMatches to avoid shadowing automatic variable
- Add gh CLI dependency check with helpful error message
- Add -TimeoutSec parameter (default 30s) for API calls
- Add -MaxFailureLines parameter (default 50) for configurable output
- Improve Format-TestFailure to detect end of stack trace via empty lines
- Add Write-Verbose output for debugging
- Update SKILL.md with new parameters, prerequisites, and org/project documentation
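For readers who want the stack-trace trimming spelled out, here is a minimal sketch in Python (not the PowerShell from this PR). It assumes a failure starts at a line containing an xUnit-style `[FAIL]` marker and ends at the first empty line or after a line cap of 50, mirroring the -MaxFailureLines default. The function name, the marker, and the sample log are hypothetical.

```python
def extract_failure_block(log_text, marker="[FAIL]", max_lines=50):
    """Collect the first failure block from a console log.

    Capture starts at the first line containing `marker` (an assumed
    marker, not necessarily what Format-TestFailure looks for) and stops
    at the first empty line or once `max_lines` lines were collected.
    """
    lines = log_text.replace("\r\n", "\n").split("\n")
    block = []
    capturing = False
    for line in lines:
        if capturing:
            # An empty line is treated as the end of the stack trace.
            if not line.strip() or len(block) >= max_lines:
                break
            block.append(line)
        elif marker in line:
            capturing = True
            block.append(line)
    return block


if __name__ == "__main__":
    sample = (
        "build ok\n"
        "    MyTest [FAIL]\n"
        "      Assert.Equal() Failure\n"
        "      at MyTest.Run()\n"
        "\n"
        "unrelated line\n"
    )
    for line in extract_failure_block(sample):
        print(line)
```

Stopping at the blank line keeps unrelated log output out of the report, while the cap bounds the output when a trace has no trailing blank line.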
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
- Add Extract-BuildErrors function to parse build logs for error patterns
- Add Get-FailureClassification function with known patterns:
  - macOS clang module cache/dsymutil issues
  - NativeAOT size regressions
  - NuGet package errors
  - Device infrastructure issues
  - Helix timeouts
  - C# and MSBuild compilation errors
- Expand Format-TestFailure patterns for better Helix log extraction
- For non-Helix failures, now extracts actual errors and provides classification, suggested action, and transient failure detection
- Add -Repository parameter to support repos other than dotnet/runtime
- Add -ContextLines parameter for error context
- Reorder error patterns (specific before general) to avoid overmatch
- Fix Select-Object ordering (First then Unique)
- Add classification to Helix test failures, not just build failures
- Expand Format-TestFailure to capture multiple failures (up to 3)
- Add new failure patterns:
  - OutOfMemoryException (transient)
  - StackOverflowException
  - Assertion failures
  - Test timeouts
  - Network connectivity issues
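To illustrate why pattern order matters for classification, here is a rough Python sketch (not the PR's Get-FailureClassification). The regular expressions, labels, and transient flags are assumptions chosen for illustration; only the OutOfMemoryException entry being transient comes from the commit message above.

```python
import re

# (pattern, label, is_transient). Specific patterns are listed before
# general ones so that a broad pattern cannot shadow a precise one.
# All regexes and labels here are hypothetical.
PATTERNS = [
    (re.compile(r"OutOfMemoryException"), "Out of memory", True),
    (re.compile(r"StackOverflowException"), "Stack overflow", False),
    (re.compile(r"error CS\d{4}"), "C# compilation error", False),
    (re.compile(r"error MSB\d{4}"), "MSBuild error", False),
    (re.compile(r"\berror\b", re.IGNORECASE), "Unclassified error", False),
]


def classify_failure(text):
    """Return (label, is_transient) for the first pattern that matches."""
    for pattern, label, transient in PATTERNS:
        if pattern.search(text):
            return label, transient
    return "Unknown", False


if __name__ == "__main__":
    print(classify_failure("Program.cs(3,1): error CS1002: ; expected"))
    print(classify_failure("System.OutOfMemoryException: error allocating"))
```

If the generic `error` entry were listed first, both sample lines would be reported as "Unclassified error", which is the overmatch the reordering avoids.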
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
- Use ${LogId} syntax to prevent PowerShell parsing $LogId? as ternary
- Normalize line breaks in log content before extracting Helix URLs
- Update URL pattern to handle workitem names with special chars
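As a sketch of the URL extraction these fixes touch, the Python below (not the PR's PowerShell) normalizes line endings and then pulls Helix console log URLs out of a build log. The URL shape in the regex is an assumption based on how Helix console links commonly appear in logs; the job ID and work item name in the sample are made up.

```python
import re

# Assumed URL shape: .../api/<version>/jobs/<job-guid>/workitems/<name>/console
# The work item segment accepts anything except "/" and whitespace, so names
# containing dots, parentheses, or percent-escapes still match.
HELIX_CONSOLE_URL = re.compile(
    r"https://helix\.dot\.net/api/[^/\s]+/jobs/[0-9a-fA-F-]+/workitems/[^/\s]+/console"
)


def extract_helix_urls(log_text):
    """Return unique Helix console log URLs in first-seen order."""
    # Normalize CRLF and bare CR to LF before matching.
    normalized = log_text.replace("\r\n", "\n").replace("\r", "\n")
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(HELIX_CONSOLE_URL.findall(normalized)))


if __name__ == "__main__":
    url = (
        "https://helix.dot.net/api/2019-06-17/jobs/"
        "0a1b2c3d-0000-1111-2222-333344445555/workitems/"
        "System.Net.Tests(x64)/console"
    )
    log = "starting\r\n" + url + "\r\nretry: " + url + "\rdone"
    print(extract_helix_urls(log))
```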
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
It's looking like I should close #123863 in favor of yours :-)
Script improvements:
- Add -HelixJob parameter for direct Helix job queries
- Add -WorkItem parameter to query specific work items
- Add Get-HelixJobDetails, Get-HelixWorkItems, Get-HelixWorkItemDetails functions
- Show work item artifacts, machine name, duration, exit code
- List failed work items when querying a job without -WorkItem

Documentation improvements (from PR dotnet#123863):
- Add build definition IDs table (129, 133, 139)
- Add failure classification table with all patterns
- Add Helix API curl examples
- Add artifact download documentation
- Add environment variable extraction examples
- Add links to triaging guide, area owners, Helix swagger
- Document -HelixJob and -WorkItem parameters
- Add Extract-TestRunUrls function to parse 'Published Test Run' URLs
- Add Get-LocalTestFailures to detect non-Helix test failures
- Add classification for local xUnit test failures
- Update main flow to report local test failures with links
- Update SKILL.md with new documentation
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
- Add Get-AzDOTestResults function using az devops invoke
- Fetch actual failed test names when az CLI is available
- Show up to 10 failed test names with count
- Add Extract-HelixLogUrls function to parse Helix console log URLs
- Display work item names with direct log links for Helix failures
- Deduplicate URLs to avoid showing duplicates
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
- Fix SHA256 resource leak: wrap in try/finally with Dispose()
- Use safer gh command syntax with & call operator
- Add note about running from skill directory
- Use cross-platform relative paths (./scripts/...)
- Add documentation about local test failures matching known issues
- Search by task name if no specific build errors found
- Helps repos like dotnet/sdk that run tests locally
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 10 comments.
- Extract Get-UrlHash helper to reduce SHA256 duplication
- Simplify retry recommendation (avoid 1:1 mapping assumption)
- Move processedJobs increment inside try block
- Add exception context when re-throwing errors
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 10 comments.
The Helix API returns Files with 'FileName' property, not 'Name'. Also deduplicate artifacts since same file can appear multiple times for different test runs within a work item.
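A small Python sketch of the deduplication described above (not the PR's PowerShell). It assumes each entry in the work item's Files list is a dictionary with a 'FileName' key, as this commit states; the 'Uri' key and the sample values are hypothetical.

```python
def unique_artifacts(files):
    """Keep the first entry for each FileName, preserving input order."""
    seen = set()
    result = []
    for entry in files:
        # The property is 'FileName', not 'Name'.
        name = entry.get("FileName")
        if name is None or name in seen:
            continue
        seen.add(name)
        result.append(entry)
    return result


if __name__ == "__main__":
    # 'Uri' is an assumed field name used only for this illustration.
    files = [
        {"FileName": "console.log", "Uri": "https://example.invalid/1"},
        {"FileName": "testResults.xml", "Uri": "https://example.invalid/2"},
        {"FileName": "console.log", "Uri": "https://example.invalid/3"},
    ]
    for artifact in unique_artifacts(files):
        print(artifact["FileName"])
```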
Documents how to find and analyze Helix work item artifacts including:
- Accessing artifacts via script and API
- Common artifact types (binlogs, console logs, crash dumps)
- Binlog analysis with live.msbuildlog.com
- Mobile test artifacts (iOS/Android)
- Artifact retention info
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 11 comments.
Key insight: artifact structures vary by test type, so always query the work item to see what's available. Document common patterns without being prescriptive about exact layouts.
Security:
- Add repository format validation (prevent command injection)
- Sanitize search terms before passing to gh CLI
- Use full SHA256 hash for cache keys (collision resistance)

Robustness:
- Add null checks for Timeline.records in Get-FailedJobs, Get-CanceledJobs, Get-HelixJobInfo
- Handle ConvertFrom-Json failures in cache gracefully (treat as cache miss)
- Use atomic cache writes (temp file + rename) to avoid race conditions
- Increase JSON serialization depth from 10 to 100

Documentation:
- Fix SKILL.md examples to use correct paths from repo root
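The caching changes in this commit can be sketched as follows, in Python rather than the PR's PowerShell. It shows a full SHA-256 digest as the cache key, a corrupt or missing entry treated as a cache miss, and a write that goes to a temporary file before being renamed into place. The function names and the one-file-per-URL layout are assumptions, not the script's actual design.

```python
import hashlib
import json
import os
import tempfile


def cache_key(url):
    """Full SHA-256 hex digest of the URL (64 characters, not truncated)."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def read_cache(cache_dir, url):
    """Return the cached value, or None on a miss or unreadable entry."""
    path = os.path.join(cache_dir, cache_key(url) + ".json")
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return json.load(handle)
    except (OSError, ValueError):
        # Missing file or invalid JSON: treat both as a cache miss.
        return None


def write_cache(cache_dir, url, value):
    """Write via a temp file in the same directory, then rename into place."""
    os.makedirs(cache_dir, exist_ok=True)
    final_path = os.path.join(cache_dir, cache_key(url) + ".json")
    fd, temp_path = tempfile.mkstemp(dir=cache_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(value, handle)
        # os.replace swaps the file in one step, so readers never see a
        # partially written entry.
        os.replace(temp_path, final_path)
    except BaseException:
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise


if __name__ == "__main__":
    directory = tempfile.mkdtemp()
    write_cache(directory, "https://example.invalid/build/1", {"ok": True})
    print(read_cache(directory, "https://example.invalid/build/1"))
```

Creating the temporary file in the cache directory itself keeps the rename on one filesystem, which is what makes the replace step atomic.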
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
All examples now use ./scripts/Get-HelixFailures.ps1 relative to skill directory.
- Add try/catch to cache cleanup with verbose logging on failure
- Add comment explaining allowed chars in search term regex
- Use TryParse for build ID parsing instead of direct cast
- Validate headSha format (40 hex chars) before using in API call
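The input checks named in these commits might look roughly like the Python below (the PR itself uses PowerShell). The 40-hex-character rule for headSha comes from the commit message; the allowed character set for the repository name and the helper names are assumptions.

```python
import re

# Assumed "owner/name" shape; the PR's actual allowed characters may differ.
REPOSITORY = re.compile(r"[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+")
# A full commit SHA: exactly 40 hexadecimal characters.
HEAD_SHA = re.compile(r"[0-9a-fA-F]{40}")
BUILD_ID = re.compile(r"[0-9]+")


def is_valid_repository(value):
    """True only if the whole string looks like owner/name."""
    return REPOSITORY.fullmatch(value) is not None


def is_valid_head_sha(value):
    """True only for exactly 40 hex characters."""
    return HEAD_SHA.fullmatch(value) is not None


def try_parse_build_id(value):
    """Return the build ID as an int, or None instead of raising."""
    if isinstance(value, str) and BUILD_ID.fullmatch(value):
        return int(value)
    return None


if __name__ == "__main__":
    print(is_valid_repository("dotnet/runtime"))
    print(is_valid_repository("dotnet/runtime; echo pwned"))
    print(try_parse_build_id("not-a-number"))
```

Using `fullmatch` rather than `search` is the point of the sketch: a value is rejected if anything follows the expected shape, which is what keeps shell metacharacters out of later CLI and API calls.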
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.
- Fix FileName property in manual-investigation.md (was Name)
- Fix dotnet/sdk description: uses both local and Helix tests
Summary
Adds an AI agent skill for analyzing Azure DevOps and Helix test failures across dotnet repositories. When asked to investigate CI failures, the skill teaches Copilot how to query APIs, extract failure details, and provide actionable recommendations.
Features
Core Functionality
Intelligent Analysis
-SearchMihuBot)
Smart Recommendations
Provides actionable guidance at the end of analysis:
Usage
Example Prompts
Structure
Related
Supersedes #123863 - this PR includes a PowerShell script for automation rather than just documentation, plus additional features like Build Analysis integration, known issue search, and smart retry recommendations.