[mcp-stress-test] Nightly MCP Stress Test Blocked: Docker-in-Docker Not Available in AWF Environment #626

@github-actions

Critical Blocker for Nightly Stress Test Workflow

The nightly MCP server stress test workflow cannot execute due to a fundamental environment constraint: Docker-in-Docker support is not available in the AWF firewall container.

Test Session Details

  • Test Session: stress-test-20260204-033819
  • Test Date: 2026-02-04T03:42:00Z
  • Workflow: .github/workflows/nightly-mcp-stress-test.md
  • Status: BLOCKED - Cannot Execute

Problem Summary

The stress test attempts to launch 20 MCP servers as Docker containers, but all 20 servers fail immediately because Docker commands are blocked by AWF.

Error Message from MCP Gateway:

ERROR: Docker-in-Docker support was removed in AWF v0.9.1

Docker commands are no longer available inside the firewall container.

If you need to:
- Use MCP servers: Migrate to stdio-based MCP servers (see docs)
- Run Docker: Execute Docker commands outside AWF wrapper
- Build images: Run Docker build before invoking AWF

See PR #205: https://github.com/github/gh-aw-firewall/pull/205

Root Cause

  1. AWF Security Policy: Docker-in-Docker explicitly disabled in AWF v0.9.1 (github/gh-aw-firewall#205)
  2. Test Design: All 20 MCP servers configured as container: "mcp/*" or container: "ghcr.io/*"
  3. Gateway Behavior: Gateway uses docker run to launch container-based servers
  4. Environment: Workflow runs inside AWF firewall container with no Docker access
  5. Result: Zero servers can launch → zero servers can be tested
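
The failure chain above could in principle be detected before any launch is attempted. The following is a minimal sketch, not the gateway's actual logic. It assumes a hypothetical config shape in which each server entry is a dict that may carry a `container` key, and it assumes that a blocked or missing Docker CLI shows up as a missing binary or a non-zero exit from `docker version`. How AWF actually blocks Docker is not specified in this issue, so the probe may need adjusting.

```python
import subprocess


def docker_usable(timeout_s: float = 10.0) -> bool:
    """Probe Docker by running `docker version`.

    Assumption: a blocked or absent Docker CLI either is not on PATH
    or exits non-zero. This is not confirmed AWF behavior.
    """
    try:
        result = subprocess.run(
            ["docker", "version"], capture_output=True, timeout=timeout_s
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def blocked_servers(servers: dict, has_docker: bool) -> list:
    """Names of servers that need `docker run` but cannot get it."""
    if has_docker:
        return []
    return sorted(name for name, cfg in servers.items() if "container" in cfg)


# Three-server excerpt using names and images from this issue;
# the dict layout itself is hypothetical.
servers = {
    "github": {"container": "ghcr.io/github/github-mcp-server:v0.30.2"},
    "filesystem": {"container": "mcp/filesystem"},
    "memory": {"container": "mcp/memory"},
}

# has_docker is passed explicitly here; in a real preflight step it
# would come from docker_usable().
blocked = blocked_servers(servers, has_docker=False)
print(f"{len(blocked)}/{len(servers)} servers blocked: {blocked}")
```

Running a check like this at the start of the workflow would let the job fail once with a single summary instead of 20 identical launch errors.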

Impact

Test Coverage: 0/20 servers tested (0%)

All 20 attempted servers failed with identical Docker availability errors:

  • github (ghcr.io/github/github-mcp-server:v0.30.2)
  • filesystem (mcp/filesystem)
  • memory (mcp/memory)
  • sqlite (mcp/sqlite)
  • postgres (mcp/postgres)
  • brave-search (mcp/brave-search)
  • fetch (mcp/fetch)
  • puppeteer (mcp/puppeteer)
  • slack (mcp/slack)
  • gdrive (mcp/gdrive)
  • google-maps (mcp/google-maps)
  • everart (mcp/everart)
  • sequential-thinking (mcp/sequential-thinking)
  • aws-kb-retrieval (mcp/aws-kb-retrieval)
  • linear (mcp/linear)
  • sentry (mcp/sentry)
  • raygun (mcp/raygun)
  • git (mcp/git)
  • time (mcp/time)
  • axiom (mcp/axiom)

What Actually Worked ✅

The MCP Gateway behaved correctly:

  • Binary compiled successfully
  • Configuration parsed correctly (20 servers loaded)
  • Server started and bound to port 3000
  • Detected AWF environment correctly
  • Provided clear, actionable error messages

This is not a gateway bug - it's an environment incompatibility between the test design and AWF constraints.

Resolution Options

Option 1: Run Workflow Outside AWF (Recommended)

Pros:

  • No code changes needed
  • Tests gateway as designed (with container launching)
  • Quick to implement

Cons:

  • Less security isolation
  • May require different workflow runner

Implementation:

  • Modify workflow to run on standard GitHub runner (not AWF container)
  • OR: Run workflow on self-hosted runner with Docker access

Option 2: Use HTTP-Based MCP Servers

Pros:

  • Servers run outside workflow (no Docker needed)
  • Tests gateway's HTTP proxy capabilities
  • Maintains security boundary

Cons:

  • Requires pre-deployed MCP servers
  • Doesn't test gateway's container launching
  • Complex infrastructure setup

Implementation:

  • Deploy MCP servers externally (e.g., cloud instances)
  • Configure stress test with type: "http" and url instead of container
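
As a rough illustration of the config change for this option, the sketch below rewrites container-based entries into HTTP entries. Only `type: "http"` and `url` come from this issue. The dict layout, the base URL `https://mcp.example.internal`, and the one-path-per-server convention are all hypothetical placeholders.

```python
def to_http_entry(name: str, base_url: str) -> dict:
    """Build an HTTP-type entry that replaces a container-based one."""
    # Hypothetical convention: each server is reachable at <base_url>/<name>.
    return {"type": "http", "url": f"{base_url.rstrip('/')}/{name}"}


def rewrite_for_http(servers: dict, base_url: str) -> dict:
    """Replace every entry that has a `container` key; copy the rest unchanged."""
    return {
        name: to_http_entry(name, base_url) if "container" in cfg else dict(cfg)
        for name, cfg in servers.items()
    }


# Two-server excerpt; server names and images are from this issue.
servers = {
    "fetch": {"container": "mcp/fetch"},
    "time": {"container": "mcp/time"},
}

http_servers = rewrite_for_http(servers, "https://mcp.example.internal/")
print(http_servers)
```

The rewritten entries no longer reference any image, so the gateway would have nothing to pass to `docker run` for them.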

Option 3: Use Stdio-Based Non-Container Servers

Pros:

  • Can run inside AWF
  • Tests gateway stdio capabilities
  • No Docker dependency

Cons:

  • Requires rewriting/rebuilding MCP servers as binaries
  • Most MCP servers distributed as containers only
  • Significant development effort

Implementation:

  • Build or find stdio-compatible MCP server binaries
  • Deploy binaries into workflow environment
  • Configure with command instead of container
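
Before switching entries from `container` to `command`, it may help to verify that each binary is actually present in the workflow environment. A minimal sketch follows, assuming a hypothetical entry shape of `command` plus `args` (only `command` is named in this issue). The binary names `mcp-server-time` and `mcp-server-git` are illustrative and are not confirmed to exist.

```python
import shutil
from typing import Callable, Optional


def stdio_entry(command: str, args: Optional[list] = None) -> dict:
    """Build a stdio-type server entry (hypothetical shape)."""
    return {"command": command, "args": list(args or [])}


def missing_binaries(
    servers: dict,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list:
    """Names of servers whose `command` cannot be resolved on PATH."""
    return sorted(
        name for name, cfg in servers.items() if which(cfg["command"]) is None
    )


# Hypothetical binaries, for illustration only.
servers = {
    "time": stdio_entry("mcp-server-time"),
    "git": stdio_entry("mcp-server-git", ["--repository", "."]),
}

# A fake PATH lookup keeps the demo deterministic; drop the `which`
# argument to check the real environment with shutil.which.
installed = {"mcp-server-time"}
fake_which = lambda cmd: f"/usr/local/bin/{cmd}" if cmd in installed else None

missing = missing_binaries(servers, which=fake_which)
print(f"missing binaries for: {missing}")
```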

Option 4: Hybrid Approach

Pros:

  • Partial test coverage better than none
  • Incremental improvement possible
  • Flexible

Cons:

  • Incomplete coverage
  • Maintains complexity

Implementation:

  • Identify which servers can run as stdio processes
  • Test subset (e.g., 5-10 servers)
  • Document remaining servers as "requires Docker"
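
The partitioning step could be sketched as below. The list of 20 server names is taken from this issue. The `STDIO_CAPABLE` set is purely illustrative and makes no claim about which servers actually have stdio builds; it would need to be filled in from the investigation.

```python
# All 20 server names from the Impact section of this issue.
ALL_SERVERS = [
    "github", "filesystem", "memory", "sqlite", "postgres",
    "brave-search", "fetch", "puppeteer", "slack", "gdrive",
    "google-maps", "everart", "sequential-thinking", "aws-kb-retrieval",
    "linear", "sentry", "raygun", "git", "time", "axiom",
]

# Hypothetical placeholder: replace with the servers verified to run
# as stdio processes inside AWF.
STDIO_CAPABLE = {"filesystem", "memory", "fetch", "git", "time"}


def partition(all_servers: list, stdio_capable: set) -> tuple:
    """Split servers into (testable inside AWF, requires Docker)."""
    testable = [s for s in all_servers if s in stdio_capable]
    needs_docker = [s for s in all_servers if s not in stdio_capable]
    return testable, needs_docker


testable, needs_docker = partition(ALL_SERVERS, STDIO_CAPABLE)
coverage_pct = 100 * len(testable) // len(ALL_SERVERS)

print(f"Testable: {len(testable)}/{len(ALL_SERVERS)} ({coverage_pct}%)")
for name in needs_docker:
    print(f"  {name}: requires Docker")
```

Emitting the "requires Docker" list from the same source as the test subset keeps the documented gap in sync with what the workflow actually runs.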

Option 5: Disable Stress Test

Pros:

  • Acknowledges limitation clearly
  • Frees up workflow resources
  • Simple

Cons:

  • Zero multi-server test coverage
  • No regression detection for scaling issues

Implementation:

  • Disable .github/workflows/nightly-mcp-stress-test.md workflow
  • Document as known limitation in README

Recommendations

Immediate Actions

  1. ✅ Document blocker (this issue)
  2. 🔲 Disable workflow until resolved (prevents nightly failures)
  3. 🔲 Evaluate Option 1 (run outside AWF for nightly tests)

Short-Term (1-2 weeks)

  • Investigate feasibility of running stress test on non-AWF runner
  • If feasible: implement Option 1
  • If not: implement Option 4 (hybrid with available stdio servers)

Long-Term (1-2 months)

  • Consider Option 2 (pre-deployed HTTP servers) for comprehensive testing
  • Evaluate if stress testing is valuable enough to warrant infrastructure

Next Steps

Decision Required: Which resolution option should we pursue?

Once decided, I can:

  • Update workflow configuration
  • Modify test design
  • Create follow-up implementation tasks

Technical Context

AWF PR #205: github/gh-aw-firewall#205
MCP Gateway Config: .github/agentics/nightly-mcp-stress-test.md
Test Session Logs: Available in workflow run artifacts


Labels Suggested:

  • bug (blocks intended functionality)
  • infrastructure (environment/workflow issue)
  • nightly-tests (affects nightly testing)
  • decision-needed (requires team decision on approach)

AI generated by Nightly MCP Server Stress Test
