25 changes: 25 additions & 0 deletions plugins/must-gather/README.md
@@ -226,6 +226,31 @@ NAMESPACE NAME STATUS VO
openshift-monitoring prometheus-data-prometheus-0 Bound pvc-3d4a0119-b2f2-44fa-9b2f-b11c611c74f2 20Gi
```

#### `analyze_prometheus.py`

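Lists the active (pending and firing) Prometheus alerts recorded in the must-gather's `monitoring/prometheus/rules.json`, followed by a summary count.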

```bash
# Alerts in all namespaces
./analyze_prometheus.py <must-gather-path>

# Alerts from a specific namespace
./analyze_prometheus.py <must-gather-path> --namespace openshift-monitoring
```

Output format:
```
ALERTS
STATE      NAMESPACE                                          NAME                                               SEVERITY   SINCE                LABELS
firing     openshift-monitoring                               Watchdog                                           none       2025-10-06T09:54:21Z {}
firing     openshift-monitoring                               AlertmanagerReceiversNotConfigured                 warning    2025-10-06T09:54:51Z {}

================================================================================
SUMMARY
Active alerts: 2 total (0 pending, 2 firing)
================================================================================
```
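
The summary line is a tally of alert states. A minimal sketch of that count, assuming each alert is a dict with a `state` key as in the script's input; `summarize` is a hypothetical helper used only for illustration and is not part of `analyze_prometheus.py`:

```python
from collections import Counter
from typing import Any, Dict, List


def summarize(alerts: List[Dict[str, Any]]) -> str:
    """Build the summary line from a list of active alerts (hypothetical helper)."""
    states = Counter(alert.get("state", "") for alert in alerts)
    return (
        f"Active alerts: {len(alerts)} total "
        f"({states['pending']} pending, {states['firing']} firing)"
    )


# Sample data mirroring the two alerts in the output above.
sample = [
    {"state": "firing", "labels": {"alertname": "Watchdog"}},
    {"state": "firing", "labels": {"alertname": "AlertmanagerReceiversNotConfigured"}},
]
print(summarize(sample))
```

`Counter` returns 0 for states that never occur, so an empty alert list still yields a well-formed line.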

### Slash Commands

#### `/must-gather:analyze [path] [component]`
6 changes: 6 additions & 0 deletions plugins/must-gather/commands/analyze.md
@@ -25,6 +25,7 @@ The command can analyze:
- Kubernetes events (warnings and errors)
- etcd cluster health and quorum status
- Persistent volume and claim status
- Prometheus alerts

You can request analysis of the entire cluster or focus on a specific component.

@@ -107,6 +108,7 @@ The command performs the following steps:
- "version", "cluster version", "update", "upgrade" → `analyze_clusterversion.py` ONLY
- "events", "warnings", "errors" → `analyze_events.py` ONLY
- "storage", "pv", "pvc", "volumes", "persistent" → `analyze_pvs.py` ONLY
- "alerts", "prometheus", "monitoring" → `analyze_prometheus.py` ONLY
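
The keyword routing above is an instruction to the assistant rather than code, but it can be sketched as a lookup table. This is only an illustration, assuming whole-word matching on the request text; `COMPONENT_SCRIPTS` and `pick_script` are hypothetical names, and only the four mappings shown in this hunk are included:

```python
import re
from typing import Optional

# Keyword groups and the single script each one selects (from the list above).
COMPONENT_SCRIPTS = [
    (("version", "cluster version", "update", "upgrade"), "analyze_clusterversion.py"),
    (("events", "warnings", "errors"), "analyze_events.py"),
    (("storage", "pv", "pvc", "volumes", "persistent"), "analyze_pvs.py"),
    (("alerts", "prometheus", "monitoring"), "analyze_prometheus.py"),
]


def pick_script(request: str) -> Optional[str]:
    """Return the script for the first keyword group matched, or None."""
    text = request.lower()
    for keywords, script in COMPONENT_SCRIPTS:
        # Whole-word match so that short keywords such as "pv" do not
        # fire on unrelated words that merely contain them.
        if any(re.search(rf"\b{re.escape(keyword)}\b", text) for keyword in keywords):
            return script
    return None


print(pick_script("show me the prometheus alerts"))
```

A `None` result corresponds to STEP 2, where no specific component was mentioned.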

**STEP 2: No specific component mentioned**

@@ -119,6 +121,7 @@ The command performs the following steps:
6. Events - warnings only (`analyze_events.py --type Warning --count 50`)
7. etcd (`analyze_etcd.py`)
8. Storage (`analyze_pvs.py`)
9. Monitoring (`analyze_prometheus.py`)

3. **Execute Analysis Scripts**:
```bash
@@ -175,6 +178,9 @@ ETCD CLUSTER HEALTH:
STORAGE (PVs/PVCs):
[output from analyze_pvs.py]
MONITORING (Alerts):
[output from analyze_prometheus.py]
================================================================================
FINDINGS AND RECOMMENDATIONS
================================================================================
13 changes: 13 additions & 0 deletions plugins/must-gather/skills/must-gather-analyzer/SKILL.md
@@ -176,6 +176,19 @@ Shows storage resources:
- Storage classes
- Pending/unbound volumes

#### Monitoring Analysis
```bash
# Alerts in all namespaces
./scripts/analyze_prometheus.py <must-gather-path>

# Alerts in a specific namespace
./scripts/analyze_prometheus.py <must-gather-path> --namespace openshift-monitoring
```

Shows monitoring information:
- Active alerts (state, namespace, name, severity, active since, remaining labels)
- Counts of pending and firing alerts
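
The namespace filter and the ordering of the alert table can be sketched as follows. This is a minimal illustration, assuming each alert carries `namespace`, `alertname`, and `severity` labels; `select_alerts` is a hypothetical name rather than a function in the script, and `ExampleAlert` and `example-namespace` are made-up sample data:

```python
from typing import Any, Dict, List, Optional

Alert = Dict[str, Any]


def select_alerts(alerts: List[Alert], namespace: Optional[str] = None) -> List[Alert]:
    """Keep alerts from `namespace` (all if unset), sorted by namespace, name, severity."""
    def label(alert: Alert, key: str) -> str:
        return alert.get("labels", {}).get(key, "")

    kept = [a for a in alerts if not namespace or label(a, "namespace") == namespace]
    return sorted(
        kept,
        key=lambda a: (label(a, "namespace"), label(a, "alertname"), label(a, "severity")),
    )


alerts = [
    {"state": "firing",
     "labels": {"namespace": "openshift-monitoring", "alertname": "Watchdog", "severity": "none"}},
    {"state": "pending",
     "labels": {"namespace": "example-namespace", "alertname": "ExampleAlert", "severity": "warning"}},
]
for alert in select_alerts(alerts, "openshift-monitoring"):
    print(alert["state"], alert["labels"]["alertname"])
```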

### 3. Interpret and Report

After running the scripts:
@@ -0,0 +1,117 @@
#!/usr/bin/env python3
"""
Analyze Prometheus alerts captured in must-gather data.
Shows active (pending and firing) alerts and a summary count.
"""

import sys
import os
import json
import argparse
from pathlib import Path
from typing import Dict, Any, Optional

def parse_json_file(file_path: Path) -> Optional[Dict[str, Any]]:
    """Parse a JSON file."""
    try:
        with open(file_path, 'r', encoding='utf-8') as f:
            doc = json.load(f)
            return doc
    except (FileNotFoundError, json.JSONDecodeError, OSError) as e:
        print(f"Error: Failed to parse {file_path}: {e}", file=sys.stderr)
        return None

def print_alerts_table(alerts):
    """Print alerts in a table format."""
    if not alerts:
        print("No alerts found.")
        return

    print("ALERTS")
    print(f"{'STATE':<10} {'NAMESPACE':<50} {'NAME':<50} {'SEVERITY':<10} {'SINCE':<20} LABELS")

    for alert in alerts:
        state = alert.get('state', '')
        # Truncate to whole seconds; timestamps are always UTC.
        active_at = alert.get('activeAt', '')
        since = active_at[:19] + 'Z' if active_at else ''
        # Copy the labels so that popping the table columns does not mutate the alert.
        labels = dict(alert.get('labels', {}))
        namespace = labels.pop('namespace', '')[:50]
        name = labels.pop('alertname', '')[:50]
        severity = labels.pop('severity', '')[:10]

        print(f"{state:<10} {namespace:<50} {name:<50} {severity:<10} {since:<20} {labels}")


def analyze_prometheus(must_gather_path: str, namespace: Optional[str] = None):
    """Analyze Prometheus alerts in a must-gather directory."""
    base_path = Path(must_gather_path)

    # Retrieve active alerts.
    rules_path = base_path / "monitoring" / "prometheus" / "rules.json"
    rules = parse_json_file(rules_path)

    if rules is None:
        return 1
    status = rules.get("status", "")
    if status != "success":
        print(f"Error: {rules_path}: unexpected status {status!r}", file=sys.stderr)
        return 1

    if "data" not in rules or "groups" not in rules["data"]:
        print(f"Error: Unexpected JSON structure in {rules_path}", file=sys.stderr)
        return 1

    alerts = []
    for group in rules["data"]["groups"]:
        for rule in group.get("rules", []):
            # Only alerting rules carry alerts; skip recording and inactive rules.
            if rule.get("type") != 'alerting' or rule.get("state") == 'inactive':
                continue
            for alert in rule.get("alerts", []):
                # An unset or empty namespace means no filtering.
                if not namespace or alert.get('labels', {}).get('namespace', '') == namespace:
                    alerts.append(alert)

    # Sort alerts by namespace, alertname and severity.
    alerts.sort(key=lambda x: (x.get('labels', {}).get('namespace', ''), x.get('labels', {}).get('alertname', ''), x.get('labels', {}).get('severity', '')))

    # Print results
    print_alerts_table(alerts)

    # Summary
    total_alerts = len(alerts)
    pending = sum(1 for alert in alerts if alert.get('state') == 'pending')
    firing = sum(1 for alert in alerts if alert.get('state') == 'firing')

    print(f"\n{'='*80}")
    print("SUMMARY")
    print(f"Active alerts: {total_alerts} total ({pending} pending, {firing} firing)")
    print(f"{'='*80}")

    return 0


def main():
    parser = argparse.ArgumentParser(
        description='Analyze Prometheus data from must-gather data',
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Examples:
  %(prog)s ./must-gather
  %(prog)s ./must-gather --namespace openshift-monitoring
"""
    )

    parser.add_argument('must_gather_path', help='Path to must-gather directory')
    parser.add_argument('-n', '--namespace', help='Filter information by namespace')

    args = parser.parse_args()

    if not os.path.isdir(args.must_gather_path):
        print(f"Error: Directory not found: {args.must_gather_path}", file=sys.stderr)
        return 1

    return analyze_prometheus(args.must_gather_path, args.namespace)


if __name__ == '__main__':
    sys.exit(main())
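
For trying the script locally, the input layout it reads can be reproduced with a small fixture. The sketch below is an assumption-laden illustration, not part of the PR: the payload is made-up sample data that follows the structure the script walks (`status`, then `data.groups[].rules[].alerts[]`), and the list comprehension mirrors the script's selection of alerts from non-inactive alerting rules.

```python
import json
import tempfile
from pathlib import Path

# Made-up sample payload with one firing rule, one inactive rule,
# and one recording rule (which carries no alerts).
payload = {
    "status": "success",
    "data": {"groups": [{"name": "example", "rules": [
        {"type": "alerting", "state": "firing", "name": "Watchdog", "alerts": [
            {"state": "firing",
             "activeAt": "2025-10-06T09:54:21.123456789Z",
             "labels": {"alertname": "Watchdog",
                        "namespace": "openshift-monitoring",
                        "severity": "none"}}]},
        {"type": "alerting", "state": "inactive", "name": "Quiet", "alerts": []},
        {"type": "recording", "name": "example:rule"},
    ]}]},
}

with tempfile.TemporaryDirectory() as tmp:
    # Same relative path the script opens under the must-gather directory.
    rules_path = Path(tmp) / "monitoring" / "prometheus" / "rules.json"
    rules_path.parent.mkdir(parents=True)
    rules_path.write_text(json.dumps(payload), encoding="utf-8")

    rules = json.loads(rules_path.read_text(encoding="utf-8"))
    active = [
        alert
        for group in rules["data"]["groups"]
        for rule in group.get("rules", [])
        if rule.get("type") == "alerting" and rule.get("state") != "inactive"
        for alert in rule.get("alerts", [])
    ]

print([a["labels"]["alertname"] for a in active])
```

Only the alert from the firing rule is selected; the inactive and recording rules contribute nothing.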