Research workflow already fixed - no changes needed, requires validation #11724

Closed
Copilot wants to merge 1 commit into main from copilot/fix-research-workflow-issues-again

Copilot AI commented Jan 25, 2026

Investigation Results

Research workflow failures (2026-01-08 to 2026-01-16) were caused by ubuntu-slim runner instability during its preview period. The issue has already been resolved by:

  • ubuntu-slim GA release (2026-01-22)
  • TAVILY_API_KEY secret addition (2026-01-22)
  • Workflow recompilation incorporating both fixes (2026-01-24, commit 1bc18eb)

Current State

# research.lock.yml line 380 - TAVILY_API_KEY properly configured
env:
  TAVILY_API_KEY: ${{ secrets.TAVILY_API_KEY }}

Both research.lock.yml and daily-news.lock.yml last compiled 2026-01-24 16:17:22 UTC, after both fixes applied.
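A quick way to re-check this claim is to search the compiled lock files for the secret reference. The sketch below is only illustrative: `lock_references_secret` is a hypothetical helper name, and it assumes the lock files reference the secret in the `secrets.<NAME>` form shown above.

```shell
# Hypothetical helper: exits 0 if the given lock file references the named secret.
# Assumes the reference appears literally as "secrets.<NAME>".
lock_references_secret() {
  local lock_file="$1" secret_name="$2"
  # -F matches a fixed string (no regex); -q suppresses output and only sets the exit code.
  grep -Fq "secrets.${secret_name}" "$lock_file"
}
```

For example, `lock_references_secret .github/workflows/research.lock.yml TAVILY_API_KEY && echo "secret referenced"` would confirm the reference from the repository root.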

Why the Issue Still Appears Unfixed

No workflow runs have occurred since the 2026-01-24 recompilation. The Research workflow is triggered by workflow_dispatch only, so validating the fix requires a manual trigger.
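The "no runs since recompilation" observation can be reproduced with the GitHub CLI. This is a minimal sketch, assuming `gh` is authenticated against this repository; `runs_since` is a hypothetical helper name, and the date filter uses day granularity rather than the exact compile timestamp.

```shell
# Hypothetical helper: prints how many runs of a workflow were created on or after a date.
# Note: gh run list returns at most 20 runs by default, which is enough to tell "zero" from "some".
runs_since() {
  local workflow="$1" since="$2"
  gh run list --workflow "$workflow" --created ">=${since}" \
    --json databaseId --jq 'length'
}
```

Running `runs_since research.lock.yml 2026-01-24` should print `0` if the statement above still holds.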

Recommendation

Close this PR. Validate fix with:

gh workflow run research.lock.yml -f topic="Validation test"

If the run succeeds, close the original issue. If it still fails, investigate the new error messages.
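To turn the recommendation into a pass/fail check, the dispatch can be followed by waiting on the run it creates. The sketch below is an assumption-laden illustration: `validate_workflow` and `DISPATCH_WAIT_SECONDS` are hypothetical names, and it assumes the most recent workflow_dispatch run is the one just triggered, which may not hold if someone else dispatches the workflow at the same moment.

```shell
# Hypothetical helper: dispatch a workflow, then block until the run finishes.
# Exits non-zero if the dispatch fails or the run does not succeed.
validate_workflow() {
  local workflow="$1" topic="$2" run_id
  gh workflow run "$workflow" -f topic="$topic" || return 1
  # Give GitHub a moment to register the new run before listing it.
  sleep "${DISPATCH_WAIT_SECONDS:-10}"
  # Assumption: the newest workflow_dispatch run is the one triggered above.
  run_id=$(gh run list --workflow "$workflow" --event workflow_dispatch \
    --limit 1 --json databaseId --jq '.[0].databaseId') || return 1
  # --exit-status makes gh return non-zero when the run fails.
  gh run watch "$run_id" --exit-status
}
```

Usage: `validate_workflow research.lock.yml "Validation test" && echo "fix validated"`.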

Timeline

| Date | Event |
| --- | --- |
| 2026-01-08 | Last successful run (preview period) |
| 2026-01-16 | Failures (preview instability) |
| 2026-01-22 | ubuntu-slim GA + TAVILY_API_KEY added |
| 2026-01-24 | Workflows recompiled (fix applied) |
| 2026-01-25 | Investigation (no runs since fix) |

Original prompt

This section details the original issue you should resolve

<issue_title>Research Workflow - Still failing after TAVILY_API_KEY fix (20% success rate, 17 days offline)</issue_title>
<issue_description>### Problem

Research workflow remains largely non-operational with only 20% success rate despite TAVILY_API_KEY secret being added. The workflow has been effectively offline for 17 days.

Current Status (2026-01-25)

Root Cause Analysis

Key insight: Daily News workflow recovered immediately after TAVILY_API_KEY was added (2026-01-22), but Research workflow did NOT recover. This suggests:

  1. Hypothesis 1: Workflow needs recompilation

    • Secret was added AFTER last compilation
    • Lock file may not reference the new secret
    • Solution: make recompile
  2. Hypothesis 2: Different MCP Gateway configuration

    • Research may use different MCP server setup than Daily News
    • May need additional configuration beyond TAVILY_API_KEY
    • Review frontmatter differences
  3. Hypothesis 3: Intermittent MCP Gateway issues

    • 1/5 runs succeeded (20% rate)
    • May be timing/connectivity related
    • Could be transient MCP server availability

Comparison with Daily News and MCP Inspector

| Aspect | Daily News (✅) | Research (⚠️) | MCP Inspector (❌) |
| --- | --- | --- | --- |
| TAVILY_API_KEY | Present | Present | Present |
| Recovery | Immediate | Partial (20%) | None (0%) |
| Success rate | 40% recovering | 20% low | 0% failing |
| Last compiled | Unknown | Unknown | Unknown |
| MCP Gateway | Working | Intermittent | Failing |

Recommended Investigation Steps

Step 1: Recompile Workflow

cd /path/to/repo
make recompile
git add .github/workflows/research.lock.yml
git commit -m "Recompile Research workflow after TAVILY_API_KEY fix"
git push

Step 2: Compare Frontmatter

Compare configurations:

  • .github/workflows/daily-news.md (working, 40% success)
  • .github/workflows/research.md (failing, 20% success)
  • .github/workflows/mcp-inspector.md (failing, 0% success)

Look for differences in:

  • MCP server configuration
  • Tool permissions
  • Timeout settings
  • Environment variables
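One way to make this comparison concrete is to extract only the frontmatter from each workflow source and diff the results. The sketch below assumes each `.md` file begins with a YAML frontmatter block delimited by `---` lines; `frontmatter` is a hypothetical helper name.

```shell
# Hypothetical helper: prints the lines between the first two '---' delimiters of a file.
# Assumes the file starts with a '---'-delimited frontmatter block.
frontmatter() {
  awk '
    /^---[[:space:]]*$/ { delims++; next }  # count delimiter lines, do not print them
    delims == 1 { print }                   # inside the frontmatter block
    delims >= 2 { exit }                    # stop once the block has closed
  ' "$1"
}
```

For example, `frontmatter .github/workflows/daily-news.md > /tmp/daily-news.fm`, `frontmatter .github/workflows/research.md > /tmp/research.fm`, then `diff -u /tmp/daily-news.fm /tmp/research.fm` would show only the configuration differences, without the prompt body.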

Step 3: Analyze Failed Run Logs

Download artifacts from run 21078189533:

  • Check /tmp/gh-aw/mcp-logs/ for MCP Gateway errors
  • Review agent stdio logs
  • Look for timeout or connection issues
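After downloading the artifacts (for instance with `gh run download 21078189533 --dir ./run-artifacts`), the logs can be scanned for the failure patterns listed above. This is a rough sketch: `scan_mcp_logs` is a hypothetical helper, the search patterns are guesses at what timeout and connection errors might look like, and the artifact directory layout is not known from the issue.

```shell
# Hypothetical helper: recursively search a log directory for timeout/connection errors.
# The patterns are assumptions; adjust them to the messages the MCP Gateway actually emits.
scan_mcp_logs() {
  local log_dir="$1"
  # -r recursive, -n show line numbers, -i case-insensitive, -E extended regex
  grep -rniE 'timed? ?out|connection (refused|reset)|ECONNREFUSED' "$log_dir"
}
```

The function exits non-zero when nothing matches, so `scan_mcp_logs ./run-artifacts || echo "no timeout or connection errors found"` reports both outcomes.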

Step 4: Test Manually Multiple Times

# Run the workflow 5 times, one minute apart, to check for intermittent issues
for i in {1..5}; do
  gh workflow run research.lock.yml
  sleep 60
done

Monitor success rate of manual runs.
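The success rate can be computed from run conclusions rather than counted by hand. The sketch below assumes one conclusion per line on standard input (for example `success` or `failure`); `success_rate` is a hypothetical helper name.

```shell
# Hypothetical helper: reads one run conclusion per line and prints "ok/total (percent%)".
success_rate() {
  awk '
    { total++ }
    $1 == "success" { ok++ }
    END {
      if (total == 0) { print "no runs"; exit 1 }
      printf "%d/%d (%d%%)\n", ok, total, ok * 100 / total
    }'
}
```

For example, `gh run list --workflow research.lock.yml --limit 5 --json conclusion --jq '.[].conclusion' | success_rate` summarizes the last five runs; one success out of five prints `1/5 (20%)`, matching the rate reported in this issue.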

Success Criteria

  • Research workflow runs successfully
  • Success rate returns to >80% over next 5 runs
  • Research and knowledge work capabilities fully operational
  • No intermittent failures

Priority: P1 (High)

Impact: Research capabilities severely limited for 17 days. This blocks automated research tasks, knowledge work, and investigation workflows.

Urgency: High - research functionality is critical for knowledge-based agents and analysis workflows.

Next steps:

  1. Recompile workflow (5 min)
  2. Test manually 3-5 times (30 min)
  3. Analyze intermittent failure pattern (30 min)
  4. Apply fix based on findings (variable)

References:

AI generated by Workflow Health Manager - Meta-Orchestrator

  • expires on Jan 26, 2026, 3:08 AM UTC

Comments on the Issue (you are @copilot in this section)


Copilot AI changed the title from "[WIP] Fix research workflow failure after TAVILY_API_KEY update" to "Research workflow already fixed - no changes needed, requires validation" on Jan 25, 2026
Copilot AI requested a review from pelikhan January 25, 2026 03:41
@pelikhan pelikhan closed this Jan 25, 2026
Development

Successfully merging this pull request may close these issues.

Research Workflow - Still failing after TAVILY_API_KEY fix (20% success rate, 17 days offline)