Research Workflow - Still failing after TAVILY_API_KEY fix (20% success rate, 17 days offline)

### Problem

The Research workflow remains largely non-operational, with only a 20% success rate, despite the TAVILY_API_KEY secret having been added. The workflow has been effectively offline for 17 days.

### Current Status (2026-01-25)

- **Success rate**: 1/5 recent runs (20%)
- **Latest failure**: [§21078189533](https://github.com/githubnext/gh-aw/actions/runs/21078189533) (2026-01-16)
- **Last success**: 2026-01-08 (17 days ago)
- **Failed step**: Suspected to be "Start MCP gateway", as with MCP Inspector
- **Previous issue**: #11434 (auto-closed 2026-01-24)

### Root Cause Analysis

**Key insight**: The Daily News workflow recovered immediately after TAVILY_API_KEY was added (2026-01-22), but the Research workflow did NOT recover. This suggests three possible explanations:

1. **Hypothesis 1**: Workflow needs recompilation
   - The secret may have been added AFTER the last compilation
   - The lock file may not reference the new secret
   - Proposed fix: `make recompile`

2. **Hypothesis 2**: Different MCP Gateway configuration
   - Research may use different MCP server setup than Daily News
   - May need additional configuration beyond TAVILY_API_KEY
   - Review frontmatter differences

3. **Hypothesis 3**: Intermittent MCP Gateway issues
   - 1/5 runs succeeded (20% rate)
   - May be timing/connectivity related
   - Could be caused by transient MCP server unavailability
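Hypothesis 1 can be checked quickly by looking for the secret in the compiled lock file. This is a minimal sketch, assuming the lock file references secrets with the standard GitHub Actions expression syntax (`${{ secrets.NAME }}`); `lock_references_secret` is a hypothetical helper, not part of gh-aw.

```bash
# Hypothetical helper (not part of gh-aw): prints "yes" if the compiled lock
# file references the given secret as `secrets.NAME`, otherwise "no".
lock_references_secret() {
  local lock_file=$1 secret=$2
  # -w rejects longer names such as TAVILY_API_KEY_2
  if grep -qw -- "secrets\.${secret}" "$lock_file"; then
    echo "yes"
  else
    echo "no"
  fi
}
```

Usage: `lock_references_secret .github/workflows/research.lock.yml TAVILY_API_KEY`. A "no" would support Hypothesis 1; a "yes" makes it less likely and shifts attention to Hypotheses 2 and 3.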

### Comparison with Daily News and MCP Inspector

| Aspect | Daily News (✅) | Research (⚠️) | MCP Inspector (❌) |
|--------|----------------|--------------|-------------------|
| TAVILY_API_KEY | Present | Present | Present |
| Recovery | Immediate | Partial (20%) | None (0%) |
| Success rate | 40% (recovering) | 20% (low) | 0% (failing) |
| Last compiled | Unknown | Unknown | Unknown |
| MCP Gateway | Working | Intermittent | Failing |
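The "Last compiled" row could be filled in from git history. This sketch approximates compile time by the last commit that touched each lock file, which is an assumption: a lock file could be recompiled locally without being committed. The lock file names for Daily News and MCP Inspector are assumed to follow the same `<name>.lock.yml` pattern as `research.lock.yml`, and `last_commit_date` is a hypothetical helper.

```bash
# Hypothetical helper: prints "<path><TAB><ISO 8601 committer date>" for the
# last commit touching each path. Paths with no history print an empty date.
last_commit_date() {
  local path
  for path in "$@"; do
    printf '%s\t%s\n' "$path" "$(git log -1 --format=%cI -- "$path")"
  done
}
```

Usage, from the repository root: `last_commit_date .github/workflows/{daily-news,research,mcp-inspector}.lock.yml`. A `research.lock.yml` date earlier than 2026-01-22 would be consistent with Hypothesis 1.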

### Recommended Investigation Steps

#### Step 1: Recompile Workflow
```bash
cd /path/to/repo
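make recompile
# Check that the recompile actually changed the lock file before committing
git diff --stat -- .github/workflows/research.lock.yml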
git add .github/workflows/research.lock.yml
git commit -m "Recompile Research workflow after TAVILY_API_KEY fix"
git push
```

#### Step 2: Compare Frontmatter
Compare configurations:
- `.github/workflows/daily-news.md` (working, 40% success)
- `.github/workflows/research.md` (failing, 20% success)
- `.github/workflows/mcp-inspector.md` (failing, 0% success)

Look for differences in:
- MCP server configuration
- Tool permissions
- Timeout settings
- Environment variables
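One way to run this comparison is to extract only the YAML frontmatter from each file and diff it. This is a sketch, assuming the frontmatter is delimited by `---` lines starting on line 1 (the usual convention for gh-aw workflow files); `frontmatter` is a hypothetical helper name.

```bash
# Hypothetical helper: prints the lines between an opening '---' on line 1
# and the next '---'. Prints nothing if the file has no frontmatter.
frontmatter() {
  awk 'NR == 1 && $0 == "---" { fm = 1; next }
       fm && $0 == "---"      { exit }
       fm                     { print }' "$1"
}
```

Usage (bash process substitution): `diff <(frontmatter .github/workflows/daily-news.md) <(frontmatter .github/workflows/research.md)`, then repeat with `mcp-inspector.md`.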

#### Step 3: Analyze Failed Run Logs
Download artifacts from [run 21078189533](https://github.com/githubnext/gh-aw/actions/runs/21078189533):
- Check the MCP logs (written to `/tmp/gh-aw/mcp-logs/` on the runner) for MCP Gateway errors
- Review agent stdio logs
- Look for timeout or connection issues
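A rough first pass over the downloaded artifacts is a case-insensitive search for error, timeout, and connection-failure patterns. The pattern list is a guess, not a list of known gh-aw log messages, and the artifact layout (including where the MCP logs end up) is an assumption; `scan_logs` is a hypothetical helper.

```bash
# Hypothetical helper: recursively lists "file:line:text" for log lines that
# look like errors, timeouts, or connection failures (case-insensitive).
scan_logs() {
  local dir=$1
  grep -rniE -- 'error|timed? ?out|connection refused|ECONNREFUSED|ECONNRESET' "$dir"
}
```

Usage: `gh run download 21078189533 --dir run-21078189533 && scan_logs run-21078189533`. For the failed step's console output, `gh run view 21078189533 --log-failed` may be quicker.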

#### Step 4: Test Manually Multiple Times
```bash
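# Dispatch 5 manual runs, 60 s apart, to check for intermittent failures.
# sleep only spaces out the dispatches; it does not wait for each run to finish.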
for i in {1..5}; do
  gh workflow run research.lock.yml
  sleep 60
done
```

Track the success rate of the manual runs.
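The success rate can be tallied from `gh run list`. This is a hypothetical sketch that counts run conclusions read from stdin, one per line; runs still in progress have an empty conclusion and are skipped rather than counted as failures.

```bash
# Hypothetical helper: prints "successes/total (percent%)" for the run
# conclusions on stdin. Empty lines (runs without a conclusion) are skipped.
success_rate() {
  awk 'NF == 0 { next }
       { total++ }
       $0 == "success" { ok++ }
       END {
         if (total == 0) { print "0/0 (n/a)"; exit }
         printf "%d/%d (%d%%)\n", ok, total, ok * 100 / total
       }'
}
```

Usage: `gh run list --workflow=research.lock.yml --event workflow_dispatch --limit 5 --json conclusion --jq '.[].conclusion' | success_rate`.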

### Success Criteria

- Research workflow runs successfully
- Success rate returns to >80% over the next 5 runs
- Research and knowledge work capabilities fully operational
- No intermittent failures
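With exactly 5 runs, ">80%" can only be met by 5/5 successes, since 4/5 is exactly 80%; this lines up with the "no intermittent failures" criterion. A hypothetical check encoding the criterion, reading one run conclusion per line on stdin (for example from `gh run list --workflow=research.lock.yml --limit 5 --json conclusion --jq '.[].conclusion'`):

```bash
# Hypothetical check: exits 0 only if at least 5 completed runs were read
# from stdin and strictly more than 80% of them concluded "success".
# Integer arithmetic avoids floating-point rounding at the 80% boundary.
meets_success_criteria() {
  awk -v min_runs=5 -v pct=80 '
    NF == 0 { next }            # skip runs with no conclusion yet
    { total++ }
    $0 == "success" { ok++ }
    END { exit !(total >= min_runs && ok * 100 > pct * total) }'
}
```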

### Priority: P1 (High)

**Impact**: Research capabilities have been severely limited for 17 days, blocking automated research tasks, knowledge work, and investigation workflows.

**Urgency**: High - research functionality is critical for knowledge-based agents and analysis workflows.

**Next steps**: 
1. Recompile workflow (5 min)
2. Test manually 3-5 times (30 min)
3. Analyze intermittent failure pattern (30 min)
4. Apply fix based on findings (variable)

**References:**
- [Latest failed run §21078189533](https://github.com/githubnext/gh-aw/actions/runs/21078189533)
- [Workflow source](https://github.com/githubnext/gh-aw/blob/main/.github/workflows/research.md)
- [Previous issue #11434](https://github.com/githubnext/gh-aw/issues/11434) (auto-closed)
- [Dashboard #11581](https://github.com/githubnext/gh-aw/issues/11581)




> AI generated by [Workflow Health Manager - Meta-Orchestrator](https://github.com/githubnext/gh-aw/actions/runs/21325874708)
> - [x] expires  on Jan 26, 2026, 3:08 AM UTC
