142 changes: 142 additions & 0 deletions .github/agentics/large-payload-tester.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,142 @@
<!-- This prompt will be imported in the agentic workflow .github/workflows/large-payload-tester.md at runtime. -->
<!-- You can edit this file to modify the agent behavior without recompiling the workflow. -->

# Large MCP Payload Access Test

You are an AI agent testing the MCP Gateway's ability to handle large payloads and make them accessible to agents.

## Your Task

Test that when the MCP Gateway receives large responses from backend MCP servers:
1. It correctly stores payloads to disk with proper session isolation
2. It returns metadata including the payload file path
3. Agents can successfully read the payload files from their mounted session directory

## Test Protocol

This test uses a **secret-based verification approach**:
1. A secret UUID is embedded in a large test file (~500KB) before the test runs
2. You will use the filesystem MCP server to read a large file containing this secret
3. The gateway will intercept the large response, store it to disk, and return metadata with a `payloadPath`
4. You must then read the payload file from the path provided and extract the secret
5. Finally, report whether you successfully retrieved the secret from the payload

## Test Steps

### Step 1: Read the Test Secret
- Read `/workspace/test-data/test-secret.txt` to get the secret UUID that was generated for this test run
- This file contains ONLY the secret UUID (e.g., `abc123-def456-ghi789`)
- Store this secret - you'll need it to verify payload retrieval later
Comment on lines +27 to +29

Copilot AI Feb 5, 2026

The description states the secret file "contains ONLY the secret UUID" (line 28), but the actual implementation in the workflow file shows the secret is prefixed with "test-secret-" (lines 54, 56 in large-payload-tester.md). For example, the actual format is "test-secret-{uuid}" not just "{uuid}".

The example format "abc123-def456-ghi789" in line 28 also doesn't match the actual secret format, which would be something like "test-secret-550e8400-e29b-41d4-a716-446655440000" (if using uuidgen) or "test-secret-1708963200123456789-12345" (if using the timestamp fallback).

This discrepancy could confuse the agent during testing since it needs to know the exact format to properly extract and compare the secret.


### Step 2: Trigger a Large Payload Response
- Use the filesystem MCP server's `read_file` tool to read `/workspace/test-data/large-test-file.json`
- This file is ~500KB and contains the secret embedded in JSON data
- The gateway should intercept this response and store it to disk

### Step 3: Extract Metadata from Gateway Response
The gateway's jqschema middleware should transform the response to include:
- `payloadPath`: Full path to the stored payload file
- `preview`: First 500 characters of the response
- `schema`: JSON schema showing structure
- `originalSize`: Size of the full payload
- `queryID`: Unique identifier for this tool call
- `truncated`: Boolean indicating if preview was truncated

Extract and log:
- The `payloadPath` value
- The `queryID` value
- Whether `truncated` is `true`
- The `originalSize` value
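
As a rough illustration, the extraction above could look like the following Python sketch. It assumes the gateway response arrives as a JSON object using the field names listed above; the sample values, `REQUIRED_FIELDS`, and `extract_metadata` are hypothetical and not part of the gateway.

```python
import json

# Hypothetical gateway response. Field names come from the list above;
# every value here is invented for illustration.
raw_response = json.dumps({
    "payloadPath": "/tmp/jq-payloads/session-abc123/query-def456/payload.json",
    "preview": '{"items": [',
    "schema": {"items": "array"},
    "originalSize": 512000,
    "queryID": "query-def456",
    "truncated": True,
})

# The four values Step 3 asks to extract and log.
REQUIRED_FIELDS = ("payloadPath", "queryID", "truncated", "originalSize")


def extract_metadata(response_text: str) -> dict:
    """Parse the response and return only the values that must be logged."""
    metadata = json.loads(response_text)
    missing = [name for name in REQUIRED_FIELDS if name not in metadata]
    if missing:
        # A missing payloadPath suggests the gateway did not store the payload.
        raise KeyError(f"gateway metadata is missing: {', '.join(missing)}")
    return {name: metadata[name] for name in REQUIRED_FIELDS}


logged = extract_metadata(raw_response)
for name, value in logged.items():
    print(f"{name}: {value}")
```

Raising on a missing field keeps the "no `payloadPath` in response" failure explicit instead of letting it surface later as a confusing file-not-found error.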

### Step 4: Read the Payload File
The payload path will be in the format: `/tmp/jq-payloads/{sessionID}/{queryID}/payload.json`

**IMPORTANT**: The gateway's payload directory is mounted into the agent's container at a different location, so the path you receive from the gateway reflects the gateway's view of the filesystem. To read the file:
- The gateway reports path as: `/tmp/jq-payloads/{sessionID}/{queryID}/payload.json`
- In the agent container, the entire `/tmp/jq-payloads` directory is mounted at: `/workspace/mcp-payloads`
- So translate the path by replacing `/tmp/jq-payloads` with `/workspace/mcp-payloads`
- Example: If gateway returns `/tmp/jq-payloads/session-abc123/query-def456/payload.json`, use `/workspace/mcp-payloads/session-abc123/query-def456/payload.json`
- The `{sessionID}` is the actual session identifier, not the literal word "session"
- Use the filesystem MCP server to read the translated path

Use the filesystem MCP server's `read_file` tool to read the payload file at the translated path.

### Step 5: Verify the Secret
- Parse the payload JSON you retrieved
- Search for the secret UUID in the payload
- Compare it with the secret you read in Step 1
- **Verification passes if**: The secret from the payload matches the secret from test-secret.txt
- **Verification fails if**: The secret is missing, doesn't match, or you couldn't read the payload file
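
The pass/fail rule above can be sketched in Python as follows. This is a simplified illustration, not the agent's actual logic: it treats verification as a substring search over every string value in the parsed payload, and the names `find_secret` and `verify_payload`, the sample payload keys, and the sample secret are all assumptions.

```python
import json


def find_secret(node, expected: str) -> bool:
    """Return True if `expected` occurs in any string value of the parsed JSON."""
    if isinstance(node, str):
        return expected in node
    if isinstance(node, dict):
        return any(find_secret(value, expected) for value in node.values())
    if isinstance(node, list):
        return any(find_secret(item, expected) for item in node)
    return False


def verify_payload(payload_text: str, expected: str) -> str:
    """Return "PASS" or "FAIL" following the two rules of Step 5."""
    expected = expected.strip()  # the secret file may end with a newline
    if not expected:
        return "FAIL"  # an empty secret would match every string
    try:
        payload = json.loads(payload_text)
    except json.JSONDecodeError:
        return "FAIL"  # corrupted or incomplete payload
    return "PASS" if find_secret(payload, expected) else "FAIL"


# Hypothetical payload shape; the real file layout may differ.
sample_payload = json.dumps({
    "secret": "test-secret-1234",
    "items": [{"id": 0, "secret_ref": "test-secret-1234"}],
})
print(verify_payload(sample_payload, "test-secret-1234\n"))  # prints PASS
```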

### Step 6: Report Results
Create a summary of the test results including:
1. ✅ or ❌ for each test step
2. The secret value you expected (from test-secret.txt)
3. The secret value you found (from the payload file)
4. Whether secrets matched (PASS/FAIL)
5. Path information (gateway path and agent path used)
6. Any errors encountered

## Important Notes

- **Keep all outputs concise** - Use brief, factual statements
- **Log all key values** - Secret, paths, sizes, queryID
- **Be explicit about failures** - State exactly what went wrong if any step fails
- **Path translation is critical** - The gateway and agent see different filesystem paths due to volume mounts

## Expected Behavior

**Success scenario:**
1. Gateway receives large response from filesystem server
2. Gateway stores payload to: `/tmp/jq-payloads/{sessionID}/{queryID}/payload.json`
3. Gateway returns metadata with `payloadPath` and `truncated: true`
4. Agent reads payload from mounted path: `/workspace/mcp-payloads/session/{queryID}/payload.json`

Copilot AI Feb 5, 2026

The path in line 93 contains an extra "session/" component that doesn't match the actual payload storage structure. According to lines 88-91 and the rest of the documentation, the path structure is /tmp/jq-payloads/{sessionID}/{queryID}/payload.json where {sessionID} is the actual session identifier. Line 93 incorrectly shows /workspace/mcp-payloads/session/{queryID}/payload.json which adds an extra "session" directory that doesn't exist.

The path should be: /workspace/mcp-payloads/{sessionID}/{queryID}/payload.json (where {sessionID} is replaced with the actual session ID value)

Suggested change
- 4. Agent reads payload from mounted path: `/workspace/mcp-payloads/session/{queryID}/payload.json`
+ 4. Agent reads payload from mounted path: `/workspace/mcp-payloads/{sessionID}/{queryID}/payload.json`

5. Agent extracts secret from payload
6. Secret matches the expected value from test-secret.txt

**Failure scenarios to detect:**
- Gateway doesn't intercept/store large payloads (no payloadPath in response)
- Gateway path is incorrect or inaccessible
- Agent can't read payload file (permission/mount issues)
- Payload is corrupted or incomplete
- Secret doesn't match (data integrity issue)

## Output Format

After running all tests, create an issue with:
- Title: "Large Payload Test - ${{ github.run_id }}"
- Body with test results in this format:

```markdown
# Large MCP Payload Access Test Results

**Run ID:** ${{ github.run_id }}
**Status:** [PASS/FAIL]
**Timestamp:** [current time]

## Test Results

1. ✅/❌ Read test secret from control file
2. ✅/❌ Trigger large payload response (>1KB)

Copilot AI Feb 5, 2026

Line 120 describes step 2 as "Trigger large payload response (>1KB)" but this is inconsistent with the actual test design. The test file is ~500KB (not 1KB), and the middleware applies to all responses regardless of size. The ">1KB" threshold is not relevant to the actual implementation or test behavior.

This should either be removed or changed to reflect that the test file is ~500KB and the middleware processes all responses (with preview truncation at 500 characters).

Suggested change
- 2. ✅/❌ Trigger large payload response (>1KB)
+ 2. ✅/❌ Trigger large payload response (~500KB; middleware processes all responses, preview truncated to 500 chars)

3. ✅/❌ Receive gateway metadata with payloadPath
4. ✅/❌ Translate and access payload file path
5. ✅/❌ Read payload file contents
6. ✅/❌ Extract and verify secret

## Details

- **Expected Secret:** [UUID from test-secret.txt]
- **Found Secret:** [UUID from payload] or "NOT FOUND"
- **Secret Match:** [YES/NO]
- **Gateway Path:** [path from response]
- **Agent Path:** [translated path used]
- **Payload Size:** [originalSize from metadata]
- **Query ID:** [queryID from metadata]

## Conclusion

[Brief summary of what worked and what failed, if anything]

---
Run URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
```
161 changes: 161 additions & 0 deletions .github/workflows/large-payload-tester-README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,161 @@
# Large Payload Tester Workflow

## Purpose

This agentic workflow tests the MCP Gateway's large payload handling feature, specifically:
1. **Payload Storage**: Verifies that large responses are automatically stored to disk (500 characters is the preview truncation length, not a storage threshold)
2. **Metadata Response**: Confirms the gateway returns correct metadata including `payloadPath`, `schema`, `preview`, etc.
3. **Agent Access**: Tests that agents can successfully read the payload files from their mounted directories
4. **Session Isolation**: Validates that payload files are organized by session ID for multi-agent isolation

## How It Works

### Test Architecture

```
┌─────────────────┐          ┌──────────────────┐          ┌────────────────┐
│  Agent          │          │  MCP Gateway     │          │  Filesystem    │
│  Container      │◄────────►│  Container       │◄────────►│  MCP Server    │
└─────────────────┘          └──────────────────┘          └────────────────┘
        │                             │                            │
        │                             │                            │
  Reads payload                Stores payload               Reads large file
  from mounted dir             to /tmp/jq-payloads          from /tmp/mcp-test-fs
        │                             │                            │
        ▼                             ▼                            ▼
  /workspace/                  /tmp/jq-payloads/            /tmp/mcp-test-fs/
  mcp-payloads/                {sessionID}/                 large-test-file.json
                               {queryID}/                   (contains secret)
                               payload.json
```

### Test Protocol

The workflow uses a **secret-based verification** approach:

1. **Setup Phase** (bash step):
   - Generate a unique UUID secret
   - Create `/tmp/mcp-test-fs/test-secret.txt` containing just the secret
   - Create `/tmp/mcp-test-fs/large-test-file.json` (~500KB) with:
     - The secret embedded in JSON data
     - Array of 2000 items each referencing the secret
     - 400KB of padding to ensure size ~500KB

2. **Test Phase** (agent):
   - **Step 1**: Read `/workspace/test-data/test-secret.txt` to get the expected secret
   - **Step 2**: Call filesystem MCP server to read `/workspace/test-data/large-test-file.json`
   - **Step 3**: Gateway intercepts large response, stores to disk, returns metadata
   - **Step 4**: Extract `payloadPath` and `queryID` from metadata
   - **Step 5**: Translate path and read from `/workspace/mcp-payloads/{sessionID}/{queryID}/payload.json`
   - **Step 6**: Extract secret from payload and verify it matches expected secret

3. **Report Phase** (safe-output):
   - Create GitHub issue with test results
   - Report pass/fail for each step
   - Include secret comparison results
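
The setup phase runs as a bash step in the workflow. Purely to illustrate the shape of the data it produces, here is a Python sketch under stated assumptions: the JSON key names (`secret`, `items`, `secret_ref`, `padding`), the `test-secret-` prefix, and the helper `create_test_data` are hypothetical, and the files are written to a temporary directory rather than `/tmp/mcp-test-fs`.

```python
import json
import tempfile
import uuid
from pathlib import Path


def create_test_data(target_dir: Path) -> str:
    """Write the control secret and the large JSON file; return the secret."""
    # Assumed secret format: a fixed prefix followed by a random UUID.
    secret = f"test-secret-{uuid.uuid4()}"
    (target_dir / "test-secret.txt").write_text(secret)

    document = {
        "secret": secret,
        # 2000 items, each referencing the secret.
        "items": [{"id": i, "secret_ref": secret} for i in range(2000)],
        # 400KB of padding so the file lands in the hundreds of kilobytes.
        "padding": "x" * 400_000,
    }
    (target_dir / "large-test-file.json").write_text(json.dumps(document))
    return secret


with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    secret = create_test_data(data_dir)
    size = (data_dir / "large-test-file.json").stat().st_size
    print(f"secret={secret} size={size} bytes")
```

Because the secret appears both in a top-level field and in every array item, a partially written or truncated payload is still likely to fail JSON parsing or lose references, which is what makes the secret comparison a useful integrity check.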

### Volume Mounts

The workflow uses three volume mounts to enable the test:

1. **Test Data Mount** (filesystem MCP server):
   ```yaml
   /tmp/mcp-test-fs:/workspace/test-data:ro
   ```
   - Contains the control secret file and large test file
   - Read-only access for safety
   - Accessible to agent via `/workspace/test-data/`

2. **Payload Mount** (filesystem MCP server):
   ```yaml
   /tmp/jq-payloads:/workspace/mcp-payloads:ro
   ```
   - Allows agent to read stored payloads
   - Read-only to prevent accidental corruption
   - Accessible to agent via `/workspace/mcp-payloads/`

3. **Gateway Payload Mount** (MCP gateway container):
   ```yaml
   /tmp/jq-payloads:/tmp/jq-payloads:rw
   ```
   - Allows gateway to write payload files
   - Read-write for payload storage

### Path Translation

The agent must translate paths between gateway and agent perspectives:

- **Gateway reports**: `/tmp/jq-payloads/{sessionID}/{queryID}/payload.json`
- **Agent uses**: `/workspace/mcp-payloads/{sessionID}/{queryID}/payload.json`
- **Translation rule**: Replace `/tmp/jq-payloads` → `/workspace/mcp-payloads`
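
A minimal Python sketch of the translation rule, for illustration only. The two root paths come from this README; the function name `translate_payload_path` and the extra layout checks (exactly `{sessionID}/{queryID}/payload.json` below the root, no `..` components) are assumptions added here, not behavior of the gateway.

```python
from pathlib import PurePosixPath

GATEWAY_ROOT = PurePosixPath("/tmp/jq-payloads")
AGENT_ROOT = PurePosixPath("/workspace/mcp-payloads")


def translate_payload_path(gateway_path: str) -> str:
    """Map a gateway-side payload path to the agent-side mount."""
    path = PurePosixPath(gateway_path)
    try:
        relative = path.relative_to(GATEWAY_ROOT)
    except ValueError:
        raise ValueError(f"not under {GATEWAY_ROOT}: {gateway_path}") from None

    # Expect exactly {sessionID}/{queryID}/payload.json below the root.
    parts = relative.parts
    if len(parts) != 3 or relative.name != "payload.json" or ".." in parts:
        raise ValueError(f"unexpected payload path layout: {gateway_path}")

    return str(AGENT_ROOT / relative)


print(translate_payload_path(
    "/tmp/jq-payloads/session-abc123/query-def456/payload.json"
))
# prints /workspace/mcp-payloads/session-abc123/query-def456/payload.json
```

Checking the layout before translating turns a wrong path structure into an immediate, descriptive error rather than a later read failure on the mounted directory.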

## Expected Behavior

### Success Scenario

When working correctly:
1. Gateway intercepts the large file read response
2. Gateway stores payload to disk with structure: `{payloadDir}/{sessionID}/{queryID}/payload.json`
3. Gateway returns metadata with `truncated: true` and `payloadPath`
4. Agent translates path and successfully reads payload file
5. Agent extracts secret from payload
6. Secret matches the expected value
7. Test reports **PASS**

### Failure Scenarios

The test is designed to detect:
- **Gateway not intercepting**: No `payloadPath` in response
- **Wrong path structure**: Agent can't find the file
- **Permission issues**: Agent can't read the payload file
- **Mount problems**: Volume mounts not configured correctly
- **Data corruption**: Secret in payload doesn't match expected
- **Session isolation broken**: Wrong session directory used

## Files

- `.github/workflows/large-payload-tester.md` - Workflow definition with frontmatter and setup steps
- `.github/agentics/large-payload-tester.md` - Agent prompt with detailed test instructions
- `.github/workflows/large-payload-tester.lock.yml` - Compiled GitHub Actions workflow

## Triggering

- **Manual**: `workflow_dispatch` - Can be triggered manually from GitHub UI
- **Scheduled**: Runs daily at a scattered time (around 1:12 AM UTC)

## Configuration

Key configuration in frontmatter:
- `strict: true` - Enforces security best practices
- `timeout-minutes: 10` - Reasonable timeout for the test
- `network.allowed: [defaults]` - Minimal network access
- `tools.bash: ["*"]` - Full bash access for setup steps
- `mcp-servers.filesystem` - Configured with two volume mounts

## Related Features

This workflow tests the jqschema middleware feature. The related implementation files are:
- `internal/middleware/jqschema.go` - Middleware that intercepts large responses

Copilot AI Feb 5, 2026

The comment states "Middleware that intercepts large responses" but this is inaccurate. Based on the implementation in internal/middleware/jqschema.go, the middleware intercepts and processes ALL successful tool responses regardless of size. It always stores payloads to disk and returns metadata. The 500 character threshold only determines whether the preview is truncated, not whether the middleware is applied.

The comment should be clarified to say "Middleware that intercepts all responses" or "Middleware that processes tool responses and stores payloads".

Suggested change
- - `internal/middleware/jqschema.go` - Middleware that intercepts large responses
+ - `internal/middleware/jqschema.go` - Middleware that processes tool responses and stores payloads

- `internal/middleware/README.md` - Documentation of the jqschema middleware
- `internal/config/config_payload.go` - Payload directory configuration
- `test/integration/large_payload_test.go` - Unit/integration tests for payload handling

These files already exist in the repository and implement the feature being tested by this workflow.

## Security Considerations

- **Read-only mounts**: Agent has read-only access to payload directory
- **Session isolation**: Each session gets its own subdirectory
- **Payload cleanup**: Old payloads are not automatically cleaned up (manual cleanup needed)
- **File permissions**: Payload files created with `0600` (owner read/write only)
- **Secret handling**: Test secret is only used for this test and is not sensitive
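
As an illustration of the file-permission point, the following Python sketch checks whether a file's permission bits are exactly `0600`. It assumes a POSIX filesystem and uses a throwaway temporary file; `has_owner_only_mode` is a hypothetical helper, not part of the gateway or its tests.

```python
import os
import stat
import tempfile


def has_owner_only_mode(path: str) -> bool:
    """Return True if the file's permission bits are exactly 0600."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == 0o600


# Demonstrate on a temporary file standing in for a payload file.
fd, demo_path = tempfile.mkstemp()
os.close(fd)
os.chmod(demo_path, 0o600)
print(has_owner_only_mode(demo_path))
os.remove(demo_path)
```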

## Future Enhancements

Potential improvements:
1. Test with multiple concurrent sessions
2. Test with very large payloads (>10MB)
3. Test payload cleanup mechanisms
4. Add performance metrics (storage time, read time)
5. Test error handling (disk full, permission denied)
6. Verify jq schema accuracy against complex data structures