Closed
Description
Problem Statement
Smoke test workflows are creating duplicate issues, resulting in 15% of all issues being duplicates (9 patterns identified in last 100 issues).
Evidence
Duplicate Patterns Identified
Top 5 duplicate patterns:
- "Smoke Test: Claude - XXXXXX": 15 instances
- "Smoke Test: Copilot - XXXXXX": 13 instances
- "[agentics] Smoke Copilot failed": 4 instances
- "[agentics] agentic workflows out of sync": 3 instances
- "Smoke Claude - Issue Group": 3 instances
Impact
- Noise in issue tracker - harder to find signal in 248 open issues
- Maintenance overhead - need to manually close duplicates
- Reduced credibility - creates perception of low quality
- 15% duplicate rate - approximately 37 duplicate issues out of 248
Root Cause
Smoke tests do not check for existing open issues:
- A new issue is created for each test failure
- There is no deduplication logic
- Resolved issues are not closed before new ones are created
Workflow behavior:
- Test fails
- Workflow creates issue immediately
- No check for existing open issue with same title pattern
- Result: Multiple issues for same problem
Proposed Solution
Add Deduplication Logic to Smoke Tests
Implementation (2-4 hours):
- Before creating a new issue, check for existing open issues:
    // Pseudocode
    const existingIssues = await github.rest.issues.listForRepo({
      owner,
      repo,
      state: 'open',
      labels: ['smoke-test'],
      per_page: 100
    });
    const duplicatePattern = /^Smoke Test: (Copilot|Claude) -/;
    const duplicate = existingIssues.data.find(issue =>
      duplicatePattern.test(issue.title) && issue.title.includes(testName)
    );
    if (duplicate) {
      // Update the existing issue instead of creating a new one
      await github.rest.issues.createComment({
        owner,
        repo,
        issue_number: duplicate.number,
        body: `Test still failing as of ${new Date().toISOString()}\n\n[Latest run](...)`
      });
      return; // Don't create a new issue
    }
- Close resolved issues before creating new ones:
- If test passes, close any open issues for that test
- Add comment indicating test now passes
- Use consistent title patterns:
Smoke Test: [Engine] - [Test Name]
- Makes duplicate detection easier
- Improves searchability
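A consistent title format is easiest to enforce when one helper builds it and another parses it. The sketch below is illustrative only: `buildTitle`, `parseTitle`, and `findDuplicate` are hypothetical names, and the engine list is taken from the pattern in the pseudocode above.

```javascript
// Hypothetical helpers; these names are not part of the existing workflows.
const TITLE_PATTERN = /^Smoke Test: (Copilot|Claude) - (.+)$/;

// Build a title in the "Smoke Test: [Engine] - [Test Name]" format.
function buildTitle(engine, testName) {
  return `Smoke Test: ${engine} - ${testName.trim()}`;
}

// Parse a title back into its parts, or return null if it does not match.
function parseTitle(title) {
  const match = TITLE_PATTERN.exec(title);
  return match ? { engine: match[1], testName: match[2] } : null;
}

// Find an open issue whose title is exactly the one this engine/test would produce.
function findDuplicate(issues, engine, testName) {
  const wanted = buildTitle(engine, testName);
  return issues.find((issue) => issue.title === wanted) || null;
}
```

Matching on the exact generated title avoids the false positives that a substring check such as `includes(testName)` can produce when one test name is a prefix of another.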
Alternative: Issue Groups
Instead of individual issues per run, create issue groups that get updated:
- One issue per failing test that stays open
- Add comments for each failure occurrence
- Close when test passes consistently
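The issue-group behavior can be reduced to a small decision function. This is a minimal sketch: `nextAction` is a hypothetical name, and `passesToClose` is an assumed threshold, since the proposal only says "passes consistently" without giving a number.

```javascript
// Hypothetical decision helper for the issue-group approach.
// passesToClose is an assumption; the proposal does not define "consistently".
function nextAction({ hasOpenIssue, testPassed, consecutivePasses = 0, passesToClose = 3 }) {
  if (!testPassed) {
    // A failure either opens the single tracking issue or adds a comment to it.
    return hasOpenIssue ? 'comment' : 'create';
  }
  if (hasOpenIssue && consecutivePasses >= passesToClose) {
    // Enough passing runs in a row: the tracking issue can be closed.
    return 'close';
  }
  // Passing run with nothing to close yet, or no issue at all.
  return 'none';
}
```

Keeping the decision separate from the API calls makes it easy to unit test without touching the issue tracker.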
Expected Outcomes
Metrics:
- Duplicate rate: From 15% to <5%
- Issue clarity: Easier to identify unique problems
- Maintenance: Less time closing duplicates
User experience:
- Cleaner issue tracker
- Easier to find real problems
- Better signal-to-noise ratio
Implementation Plan
Phase 1: Add Deduplication (2-4 hours)
- Update smoke-copilot.md workflow:
- Add GitHub API query for existing issues
- Implement duplicate detection logic
- Add comment-instead-of-create behavior
- Update smoke-claude.md workflow:
- Same changes as Copilot
- Ensure consistent behavior
- Test changes:
- Trigger smoke tests manually
- Verify no duplicates created
- Verify existing issues get updated
Phase 2: Clean Up Existing Duplicates (1 hour)
- Identify and close duplicates:
- Query for duplicate patterns
- Keep most recent issue open
- Close older duplicates with "duplicate of #XXX" comment
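The cleanup step could be planned offline before any issue is touched. In this sketch, `planCleanup` and `keyOf` are hypothetical names; how titles such as "Smoke Test: Claude - XXXXXX" should be grouped is an assumption left to the caller, and `created_at` is the creation timestamp field on GitHub issue objects.

```javascript
// Group issues by a key and plan which ones to close as duplicates.
// keyOf is a hypothetical normalizer; the default groups by exact title.
function planCleanup(issues, keyOf = (issue) => issue.title) {
  const groups = new Map();
  for (const issue of issues) {
    const key = keyOf(issue);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(issue);
  }

  const plan = [];
  for (const group of groups.values()) {
    if (group.length < 2) continue; // nothing to deduplicate
    // Newest first, so the first element is the issue that stays open.
    const sorted = [...group].sort(
      (a, b) => new Date(b.created_at) - new Date(a.created_at)
    );
    const keeper = sorted[0];
    for (const older of sorted.slice(1)) {
      plan.push({ close: older.number, comment: `Duplicate of #${keeper.number}` });
    }
  }
  return plan;
}
```

Reviewing the returned plan before applying it gives a chance to catch a grouping key that is too broad.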
Phase 3: Monitor (Ongoing)
- Track duplicate rate:
- Agent Performance Analyzer monitors rate
- Alert if rate exceeds 5%
- Iterate on detection logic
Testing
Manual testing:
- Trigger smoke test that fails
- Verify no existing open issue → creates new issue ✅
- Trigger same failing test again
- Verify existing open issue → adds comment ✅
- Fix test, trigger passing test
- Verify existing issue → gets closed ✅
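The manual checks above could also be replayed locally against an in-memory stand-in for the client. This is a sketch under assumptions: `handleSmokeResult` and `makeFakeGithub` are hypothetical names, issues are matched by exact title, and only the first page of 100 open issues is inspected.

```javascript
// Sketch of the create / comment / close flow. `github` is expected to expose the
// Octokit-style rest.issues methods; `title` is the exact issue title for the test.
async function handleSmokeResult(github, { owner, repo, title, passed, runUrl }) {
  const { data: openIssues } = await github.rest.issues.listForRepo({
    owner, repo, state: 'open', labels: 'smoke-test', per_page: 100,
  });
  const existing = openIssues.find((issue) => issue.title === title);

  if (!passed && !existing) {
    await github.rest.issues.create({
      owner, repo, title, labels: ['smoke-test'], body: `Failing run: ${runUrl}`,
    });
    return 'created';
  }
  if (!passed) {
    await github.rest.issues.createComment({
      owner, repo, issue_number: existing.number, body: `Test still failing: ${runUrl}`,
    });
    return 'commented';
  }
  if (existing) {
    await github.rest.issues.createComment({
      owner, repo, issue_number: existing.number, body: `Test now passes: ${runUrl}`,
    });
    await github.rest.issues.update({
      owner, repo, issue_number: existing.number, state: 'closed',
    });
    return 'closed';
  }
  return 'noop';
}

// In-memory stand-in for the client, for local checks only (hypothetical).
function makeFakeGithub() {
  const issues = [];
  const byNumber = (n) => issues.find((i) => i.number === n);
  return {
    issues,
    rest: {
      issues: {
        listForRepo: async ({ state }) => ({ data: issues.filter((i) => i.state === state) }),
        create: async ({ title }) => {
          issues.push({ number: issues.length + 1, title, state: 'open', comments: [] });
        },
        createComment: async ({ issue_number, body }) => {
          byNumber(issue_number).comments.push(body);
        },
        update: async ({ issue_number, state }) => {
          byNumber(issue_number).state = state;
        },
      },
    },
  };
}
```

Calling `handleSmokeResult` three times with `passed` set to `false`, `false`, then `true` mirrors the fail, fail again, fix sequence in the checklist.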
Automated verification:
- Agent Performance Analyzer tracks duplicate rate
- Alert if rate exceeds 5% for 2 consecutive weeks
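The alert rule can be expressed as a small check over weekly rates. This is a minimal sketch: `duplicateRate` and `shouldAlert` are hypothetical names, and how the Agent Performance Analyzer actually stores weekly figures is not described in this issue.

```javascript
// Fraction of issues that are duplicates; returns 0 for an empty tracker.
function duplicateRate(duplicateCount, totalCount) {
  return totalCount === 0 ? 0 : duplicateCount / totalCount;
}

// True when each of the most recent `weeks` weekly rates exceeds the threshold.
function shouldAlert(weeklyRates, threshold = 0.05, weeks = 2) {
  if (weeklyRates.length < weeks) return false;
  return weeklyRates.slice(-weeks).every((rate) => rate > threshold);
}
```

With the figures quoted in this issue, `duplicateRate(37, 248)` is roughly 0.149, which rounds to the reported 15%.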
Success Metrics
- Duplicate rate: Target <5% (from 15%)
- Issue count: Expect 15-20% reduction in total open issues
- Signal quality: Easier to identify unique problems
- Maintenance time: Less time closing duplicates
Timeline
- Phase 1 (Implementation): 2-4 hours
- Phase 2 (Cleanup): 1 hour
- Phase 3 (Monitoring): Ongoing
- Total: 3-5 hours initial, then automated
Priority
P1 (High) because:
- Affects 15% of issues (significant noise)
- Reduces issue tracker credibility
- Easy to fix (2-4 hours)
- High impact on user experience
- Not blocking (lower priority than PR merge crisis)
Related Workflows
- .github/workflows/smoke-copilot.md
- .github/workflows/smoke-claude.md
- Any workflow creating issues for test failures
Recommended Owner
- Smoke test workflow maintainers
- Testing team
AI generated by Agent Performance Analyzer - Meta-Orchestrator