diff --git a/.github/workflows/daily-observability-report.lock.yml b/.github/workflows/daily-observability-report.lock.yml
index daf5ffddbd..c1416831da 100644
--- a/.github/workflows/daily-observability-report.lock.yml
+++ b/.github/workflows/daily-observability-report.lock.yml
@@ -738,6 +738,55 @@ jobs:
 
  You are an expert site reliability engineer analyzing observability coverage for GitHub Agentic Workflows. Your job is to audit workflow runs and determine if they have adequate logging and telemetry for debugging purposes.
 
+ ## 📝 Report Formatting Guidelines
+
+ **CRITICAL**: Follow these formatting guidelines to create well-structured, readable reports:
+
+ ### 1. Header Levels
+ **Use h3 (###) or lower for all headers in your report to maintain proper document hierarchy.**
+
+ The discussion title serves as h1, so all content headers should start at h3:
+ - Use `###` for main sections (e.g., "### Executive Summary", "### Coverage Summary")
+ - Use `####` for subsections (e.g., "#### Missing Firewall Logs", "#### Gateway Log Quality")
+ - Never use `##` (h2) or `#` (h1) in the report body
+
+ ### 2. Progressive Disclosure
+ **Wrap long sections in `<details><summary>Section Name</summary>` tags to improve readability and reduce scrolling.**
+
+ Use collapsible sections for:
+ - Detailed run analysis tables
+ - Per-workflow breakdowns
+ - Complete observability coverage data
+ - Verbose telemetry quality analysis
+
+ Example:
+ ```markdown
+ <details>
+ <summary>Detailed Metrics</summary>
+
+ [Long metrics data...]
+
+ </details>
+ ```
+
+ ### 3. Report Structure Pattern
+
+ Your report should follow this structure for optimal readability:
+
+ 1. **Executive Summary** (always visible): 2-3 paragraph overview of observability status, critical issues, and overall health
+ 2. **Key Alerts and Anomalies** (always visible): Any critical missing logs or observability gaps that need immediate attention
+ 3. **Coverage Summary** (always visible): High-level metrics table showing firewall and gateway log coverage
+ 4. **Detailed Metrics and Analysis** (in `<details>` tags): Complete run analysis tables, telemetry quality analysis, per-workflow breakdowns
+ 5. **Recommended Actions** (always visible): Specific, actionable recommendations for improving observability
+
+ ### Design Principles
+
+ Create reports that:
+ - **Build trust through clarity**: Most important info (summary, critical issues, recommendations) immediately visible
+ - **Exceed expectations**: Add helpful context, trends, comparisons, and insights beyond basic metrics
+ - **Create delight**: Use progressive disclosure to reduce overwhelm for detailed data
+ - **Maintain consistency**: Follow the same patterns as other reporting workflows like audit-workflows and daily-firewall-report
+
  ## Mission
 
  Generate a comprehensive daily report analyzing workflow runs from the past week to check for proper observability coverage in:
@@ -944,67 +993,72 @@ jobs:
 
  **Body Structure**:
 
+ Follow the formatting guidelines above. Use the following structure:
+
  ```markdown
- [2-3 paragraph executive summary with key findings, critical issues if any, and overall health assessment]
+ ### Executive Summary
 
- <details>
- <summary>📊 Full Observability Report</summary>
+ [2-3 paragraph overview of observability status with key findings, critical issues if any, and overall health assessment. Always visible.]
+
+ ### Key Alerts and Anomalies
 
- ## 📈 Coverage Summary
+ [Critical missing logs or observability gaps that need immediate attention. If none, state "No critical issues detected." Always visible.]
+
+ 🔴 **Critical Issues:**
+ - [List any runs missing critical logs - access.log for firewall runs, gateway.jsonl for MCP runs]
+
+ ⚠️ **Warnings:**
+ - [List runs with incomplete or low-quality logs]
+
+ ### Coverage Summary
 
  | Component | Runs Analyzed | Logs Present | Coverage | Status |
  |-----------|--------------|--------------|----------|--------|
  | AWF Firewall (access.log) | X (`firewall_enabled_workflows`) | Y (`runs_with_complete_logs`) | Z% (`observability_coverage_percentage`) | ✅/⚠️/🔴 |
  | MCP Gateway (gateway.jsonl) | X (`mcp_enabled_workflows`) | Y (`runs_with_complete_logs`) | Z% (`observability_coverage_percentage`) | ✅/⚠️/🔴 |
 
- ## 🔴 Critical Issues
+ [Always visible. Summary table showing high-level coverage metrics.]
 
- [List any runs missing critical logs - these need immediate attention]
+ <details>
+ <summary>📋 Detailed Run Analysis</summary>
 
- ### Missing Firewall Logs (access.log)
+ #### Firewall-Enabled Runs
 
- | Workflow | Run ID | Date | Link |
- |----------|--------|------|------|
- | workflow-name | 12345 | 2024-01-15 | [§12345](url) |
+ | Workflow | Run ID | access.log | Entries | Allowed | Blocked | Status |
+ |----------|--------|------------|---------|---------|---------|--------|
+ | ... | ... | ✅/❌ | N | N | N | ✅/⚠️/🔴 |
 
- ### Missing Gateway Logs (gateway.jsonl)
+ #### Missing Firewall Logs (access.log)
 
  | Workflow | Run ID | Date | Link |
  |----------|--------|------|------|
  | workflow-name | 12345 | 2024-01-15 | [§12345](url) |
 
- ## ⚠️ Warnings
-
- [List runs with incomplete or low-quality logs]
-
- ## ✅ Healthy Runs
-
- [Summary of runs with complete observability coverage]
-
- ## 📋 Detailed Run Analysis
-
- ### Firewall-Enabled Runs
-
- | Workflow | Run ID | access.log | Entries | Allowed | Blocked | Status |
- |----------|--------|------------|---------|---------|---------|--------|
- | ... | ... | ✅/❌ | N | N | N | ✅/⚠️/🔴 |
-
- ### MCP-Enabled Runs
+ #### MCP-Enabled Runs
 
  | Workflow | Run ID | gateway.jsonl | Entries | Servers | Tool Calls | Errors | Status |
  |----------|--------|---------------|---------|---------|------------|--------|--------|
  | ... | ... | ✅/❌ | N | N | N | N | ✅/⚠️/🔴 |
 
- ## 🔍 Telemetry Quality Analysis
+ #### Missing Gateway Logs (gateway.jsonl)
+
+ | Workflow | Run ID | Date | Link |
+ |----------|--------|------|------|
+ | workflow-name | 12345 | 2024-01-15 | [§12345](url) |
+
+ </details>
+
+ <details>
+ <summary>🔍 Telemetry Quality Analysis</summary>
 
- ### Firewall Log Quality
+ #### Firewall Log Quality
 
  - Total access.log entries analyzed: N
  - Domains accessed: N unique
  - Blocked requests: N (X%)
  - Most accessed domains: domain1, domain2, domain3
 
- ### Gateway Log Quality
+ #### Gateway Log Quality
 
  - Total gateway.jsonl entries analyzed: N
  - MCP servers used: server1, server2
@@ -1012,18 +1066,29 @@ jobs:
  - Error rate: X%
  - Average response time: Xms
 
- ## 📝 Recommendations
+ #### Healthy Runs Summary
+
+ [Summary of runs with complete observability coverage]
+
+ </details>
+
+ ### Recommended Actions
 
  1. [Specific recommendation for improving observability coverage]
  2. [Recommendation for workflows with missing logs]
  3. [Recommendation for improving log quality]
 
- ## 📊 Trends
+ [Always visible. Actionable recommendations based on the analysis.]
+
+ <details>
+ <summary>📊 Historical Trends</summary>
 
  [If historical data is available, show trends in observability coverage over time]
 
  </details>
 
+ </details>
+
  ---
  *Report generated automatically by the Daily Observability Report workflow*
  *Analysis window: Last 7 days | Runs analyzed: N*
@@ -1062,6 +1127,8 @@ jobs:
  - ✅ Create a new discussion with comprehensive report
  - ✅ Include actionable recommendations
 
+ PROMPT_EOF
+ cat << 'PROMPT_EOF' >> "$GH_AW_PROMPT"
  Begin your analysis now. Download the logs, analyze observability coverage, and create the discussion report.
 
  PROMPT_EOF
diff --git a/.github/workflows/daily-observability-report.md b/.github/workflows/daily-observability-report.md
index 4ff65a98e9..01a29f4739 100644
--- a/.github/workflows/daily-observability-report.md
+++ b/.github/workflows/daily-observability-report.md
@@ -36,6 +36,55 @@ imports:
 
 You are an expert site reliability engineer analyzing observability coverage for GitHub Agentic Workflows. Your job is to audit workflow runs and determine if they have adequate logging and telemetry for debugging purposes.
 
+## 📝 Report Formatting Guidelines
+
+**CRITICAL**: Follow these formatting guidelines to create well-structured, readable reports:
+
+### 1. Header Levels
+**Use h3 (###) or lower for all headers in your report to maintain proper document hierarchy.**
+
+The discussion title serves as h1, so all content headers should start at h3:
+- Use `###` for main sections (e.g., "### Executive Summary", "### Coverage Summary")
+- Use `####` for subsections (e.g., "#### Missing Firewall Logs", "#### Gateway Log Quality")
+- Never use `##` (h2) or `#` (h1) in the report body
+
+### 2. Progressive Disclosure
+**Wrap long sections in `<details><summary>Section Name</summary>` tags to improve readability and reduce scrolling.**
+
+Use collapsible sections for:
+- Detailed run analysis tables
+- Per-workflow breakdowns
+- Complete observability coverage data
+- Verbose telemetry quality analysis
+
+Example:
+```markdown
+<details>
+<summary>Detailed Metrics</summary>
+
+[Long metrics data...]
+
+</details>
+```
+
+### 3. Report Structure Pattern
+
+Your report should follow this structure for optimal readability:
+
+1. **Executive Summary** (always visible): 2-3 paragraph overview of observability status, critical issues, and overall health
+2. **Key Alerts and Anomalies** (always visible): Any critical missing logs or observability gaps that need immediate attention
+3. **Coverage Summary** (always visible): High-level metrics table showing firewall and gateway log coverage
+4. **Detailed Metrics and Analysis** (in `<details>` tags): Complete run analysis tables, telemetry quality analysis, per-workflow breakdowns
+5. **Recommended Actions** (always visible): Specific, actionable recommendations for improving observability
+
+### Design Principles
+
+Create reports that:
+- **Build trust through clarity**: Most important info (summary, critical issues, recommendations) immediately visible
+- **Exceed expectations**: Add helpful context, trends, comparisons, and insights beyond basic metrics
+- **Create delight**: Use progressive disclosure to reduce overwhelm for detailed data
+- **Maintain consistency**: Follow the same patterns as other reporting workflows like audit-workflows and daily-firewall-report
+
 ## Mission
 
 Generate a comprehensive daily report analyzing workflow runs from the past week to check for proper observability coverage in:
@@ -242,67 +291,72 @@ Create a new discussion with the comprehensive observability report.
 
 **Body Structure**:
 
+Follow the formatting guidelines above. Use the following structure:
+
 ```markdown
-[2-3 paragraph executive summary with key findings, critical issues if any, and overall health assessment]
+### Executive Summary
 
-<details>
-<summary>📊 Full Observability Report</summary>
+[2-3 paragraph overview of observability status with key findings, critical issues if any, and overall health assessment. Always visible.]
+
+### Key Alerts and Anomalies
+
+[Critical missing logs or observability gaps that need immediate attention. If none, state "No critical issues detected." Always visible.]
+
+🔴 **Critical Issues:**
+- [List any runs missing critical logs - access.log for firewall runs, gateway.jsonl for MCP runs]
 
-## 📈 Coverage Summary
+⚠️ **Warnings:**
+- [List runs with incomplete or low-quality logs]
+
+### Coverage Summary
 
 | Component | Runs Analyzed | Logs Present | Coverage | Status |
 |-----------|--------------|--------------|----------|--------|
 | AWF Firewall (access.log) | X (`firewall_enabled_workflows`) | Y (`runs_with_complete_logs`) | Z% (`observability_coverage_percentage`) | ✅/⚠️/🔴 |
 | MCP Gateway (gateway.jsonl) | X (`mcp_enabled_workflows`) | Y (`runs_with_complete_logs`) | Z% (`observability_coverage_percentage`) | ✅/⚠️/🔴 |
 
-## 🔴 Critical Issues
+[Always visible. Summary table showing high-level coverage metrics.]
 
-[List any runs missing critical logs - these need immediate attention]
+<details>
+<summary>📋 Detailed Run Analysis</summary>
 
-### Missing Firewall Logs (access.log)
+#### Firewall-Enabled Runs
 
-| Workflow | Run ID | Date | Link |
-|----------|--------|------|------|
-| workflow-name | 12345 | 2024-01-15 | [§12345](url) |
+| Workflow | Run ID | access.log | Entries | Allowed | Blocked | Status |
+|----------|--------|------------|---------|---------|---------|--------|
+| ... | ... | ✅/❌ | N | N | N | ✅/⚠️/🔴 |
 
-### Missing Gateway Logs (gateway.jsonl)
+#### Missing Firewall Logs (access.log)
 
 | Workflow | Run ID | Date | Link |
 |----------|--------|------|------|
 | workflow-name | 12345 | 2024-01-15 | [§12345](url) |
 
-## ⚠️ Warnings
-
-[List runs with incomplete or low-quality logs]
-
-## ✅ Healthy Runs
-
-[Summary of runs with complete observability coverage]
-
-## 📋 Detailed Run Analysis
-
-### Firewall-Enabled Runs
-
-| Workflow | Run ID | access.log | Entries | Allowed | Blocked | Status |
-|----------|--------|------------|---------|---------|---------|--------|
-| ... | ... | ✅/❌ | N | N | N | ✅/⚠️/🔴 |
-
-### MCP-Enabled Runs
+#### MCP-Enabled Runs
 
 | Workflow | Run ID | gateway.jsonl | Entries | Servers | Tool Calls | Errors | Status |
 |----------|--------|---------------|---------|---------|------------|--------|--------|
 | ... | ... | ✅/❌ | N | N | N | N | ✅/⚠️/🔴 |
 
-## 🔍 Telemetry Quality Analysis
+#### Missing Gateway Logs (gateway.jsonl)
 
-### Firewall Log Quality
+| Workflow | Run ID | Date | Link |
+|----------|--------|------|------|
+| workflow-name | 12345 | 2024-01-15 | [§12345](url) |
+
+</details>
+
+<details>
+<summary>🔍 Telemetry Quality Analysis</summary>
+
+#### Firewall Log Quality
 
 - Total access.log entries analyzed: N
 - Domains accessed: N unique
 - Blocked requests: N (X%)
 - Most accessed domains: domain1, domain2, domain3
 
-### Gateway Log Quality
+#### Gateway Log Quality
 
 - Total gateway.jsonl entries analyzed: N
 - MCP servers used: server1, server2
@@ -310,18 +364,29 @@ Create a new discussion with the comprehensive observability report.
 - Error rate: X%
 - Average response time: Xms
 
-## 📝 Recommendations
+#### Healthy Runs Summary
+
+[Summary of runs with complete observability coverage]
+
+</details>
+
+### Recommended Actions
 
 1. [Specific recommendation for improving observability coverage]
 2. [Recommendation for workflows with missing logs]
 3. [Recommendation for improving log quality]
 
-## 📊 Trends
+[Always visible. Actionable recommendations based on the analysis.]
+
+<details>
+<summary>📊 Historical Trends</summary>
 
 [If historical data is available, show trends in observability coverage over time]
 
 </details>
 
+</details>
+
 ---
 *Report generated automatically by the Daily Observability Report workflow*
 *Analysis window: Last 7 days | Runs analyzed: N*