
Add workflow health monitoring runbook #7287

Merged
pelikhan merged 4 commits into main from copilot/document-workflow-health-runbook on Dec 22, 2025


Copilot AI commented Dec 22, 2025

Documents systematic investigation and resolution of workflow failures based on the DeepReport incident response findings (Discussion #7277).

Changes

  • Created .github/aw/runbooks/workflow-health.md - Operational runbook covering:

    • Common failure patterns: missing-tool errors, authentication failures, safe-input/output issues
    • Investigation procedures: log analysis, MCP server verification, permission checks
    • Resolution workflows with configuration examples
    • DeepReport case study: Weekly Issue Summary, Dev workflow, Daily Copilot PR Merged failures
  • Documentation integration:

    • Linked from AGENTS.md Quick Reference (for AI agents)
    • Linked from docs/src/content/docs/troubleshooting/common-issues.md (for users)
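The log-analysis step listed under the investigation procedures could be approximated with a small helper like the one below. It is only a sketch, not taken from the runbook: the function name `scan_missing_tool` is hypothetical, and it assumes that logs have already been downloaded into a directory (for example with `gh aw logs ... -o /tmp/logs`) and that a missing-tool failure appears in the log text as `missing-tool` or `missing_tool`.

```shell
#!/usr/bin/env bash
# Sketch: list downloaded log files that mention a missing-tool error.
# Assumptions (not confirmed by the runbook): logs are plain-text files
# under LOG_DIR, and the error is spelled "missing-tool" or "missing_tool".

scan_missing_tool() {
  local log_dir="${1:-/tmp/logs}"
  # -r recurse, -i ignore case, -l print only names of matching files,
  # -E extended regex. "|| true" keeps a no-match result from being an error.
  { grep -rilE 'missing[-_]tool' "$log_dir" 2>/dev/null || true; } | sort
}
```

Calling `scan_missing_tool /tmp/logs` prints one path per matching file, which gives a short list of runs to open first; an empty result means no file matched the assumed spelling, not that the run was healthy.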

Example Usage

When a workflow fails with missing-tool errors:

# 1. Download and analyze logs
gh aw logs --start-date -1d -o /tmp/logs

# 2. Verify MCP configuration
gh aw mcp inspect <workflow-name>

# 3. Fix by adding the GitHub MCP server to the workflow frontmatter
---
tools:
  github:
    mode: remote
    toolsets: [default]
---

The runbook follows the Diátaxis how-to guide format, with actionable procedures, analysis of a real incident, and quick reference commands.
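Before applying the fix from step 3 of the example, it can help to check whether a workflow file already declares the GitHub tool in its frontmatter. The following is a minimal sketch under stated assumptions: `has_github_tool` is a hypothetical helper, the workflow is assumed to be a markdown file whose YAML frontmatter sits between two `---` lines (as in the example above), and the check is a text heuristic rather than a real YAML parse.

```shell
#!/usr/bin/env bash
# Sketch: succeed (exit 0) if a workflow markdown file has a "github:" entry
# under a top-level "tools:" key in its frontmatter, fail (exit 1) otherwise.
# Heuristic only: it matches indented "github:" lines, it does not parse YAML.

has_github_tool() {
  local workflow_file="$1"
  awk '
    /^---[ \t]*$/ { fences++; if (fences == 2) exit; next }  # frontmatter delimiters
    fences != 1 { next }                       # skip anything outside the frontmatter
    /^tools:[ \t]*$/ { in_tools = 1; next }    # start of the top-level tools block
    /^[^ \t#]/ { in_tools = 0 }                # any other top-level key ends the block
    in_tools && /^[ \t]+github:/ { found = 1 } # indented github entry inside tools
    END { exit(found ? 0 : 1) }
  ' "$workflow_file"
}
```

A typical use would be `has_github_tool path/to/workflow.md || echo "GitHub MCP server not configured"`, where the path is a placeholder. Because the match is indentation-based, a `github:` key nested deeper inside another tool would also count, so a positive result is worth confirming with `gh aw mcp inspect <workflow-name>`.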

Original prompt

This section details the original issue you should resolve.

<issue_title>[plan] Document workflow health monitoring runbook</issue_title>
<issue_description>## Objective

Create a runbook documenting how to investigate and resolve workflow health issues, based on learnings from the DeepReport incident response.

Context

The DeepReport identified several workflow issues (failures, missing-tool errors) that needed investigation. A documented runbook will help future agents and maintainers quickly diagnose and fix similar problems.

Approach

  1. Create a new documentation file (e.g., docs/runbooks/workflow-health.md)
  2. Document common workflow failure patterns:
    • Missing tool configurations
    • Authentication/permission errors
    • Input/secret validation failures
  3. Add investigation steps:
    • How to analyze workflow logs
    • How to identify missing-tool errors
    • How to verify MCP server configurations
  4. Include resolution procedures:
    • Adding GitHub MCP server to workflows
    • Configuring safe-inputs/safe-outputs
    • Testing workflow fixes
  5. Link to relevant documentation and examples

Files to Create

  • docs/runbooks/workflow-health.md (or similar location)

Acceptance Criteria

AI generated by Plan Command for discussion #7277

Comments on the Issue (you are @copilot in this section)

@pelikhan Place run books in .github/aw/runbooks


Copilot AI and others added 3 commits December 22, 2025 16:02
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
- Create .github/aw/runbooks directory
- Document common workflow failure patterns
- Add investigation steps for workflow logs, missing-tool errors, and MCP configuration
- Include resolution procedures with code examples
- Reference DeepReport case study (#7277)
- Provide quick reference commands and configuration patterns

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
- Add Operational Runbooks section to AGENTS.md Quick Reference
- Add Operational Runbooks section to troubleshooting/common-issues.md
- Provide clear navigation to workflow health monitoring procedures

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Create runbook for workflow health monitoring" to "Add workflow health monitoring runbook" Dec 22, 2025
Copilot AI requested a review from pelikhan December 22, 2025 16:10
pelikhan marked this pull request as ready for review December 22, 2025 16:11
pelikhan merged commit 3b1e9d2 into main Dec 22, 2025
4 checks passed
pelikhan deleted the copilot/document-workflow-health-runbook branch December 22, 2025 16:11
Development

Successfully merging this pull request may close these issues.

[plan] Document workflow health monitoring runbook
