Expand smoke-copilot test coverage to match smoke-claude #718
…overage Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>
… and recompile workflow Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>
Pull request overview
This pull request expands the smoke-copilot test suite from 5 to 10 tests to achieve comprehensive parity with smoke-claude coverage (note: smoke-codex has 6 tests). The expansion validates additional MCP servers, safe-input tools, and interaction patterns while maintaining the workflow's core purpose of validating Copilot engine functionality.
Changes:
- Added 4 new shared workflow files providing reusable configurations for MCP pagination patterns, GitHub query tools with jq filtering, Go/Make command wrappers, and GitHub App templates
- Expanded test coverage from 5 to 10 tests, adding validation for safe-input tools (GH CLI, Make), Tavily web search, discussion interactions, and agentic workflows introspection
- Updated workflow configuration with new permissions (discussions: read), increased timeout (5→15 minutes), enhanced safe-outputs settings, and Go 1.25 runtime support
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| .github/workflows/smoke-copilot.md | Main workflow definition: expanded from 5 to 10 tests, added 7 shared workflow imports, updated permissions/tools/runtimes, changed output to create issues + optional PR/discussion comments (was PR-only), increased timeout to 15 minutes |
| .github/workflows/smoke-copilot.lock.yml | Compiled workflow: version bumps (Copilot CLI 0.0.402→0.0.403, awf v0.13.1→v0.13.4, mcpg v0.0.98→v0.0.99), added safe-inputs MCP server with 5 tools, added Tavily/Serena MCP configurations, added Go 1.25 setup, enhanced allowed domains for Go/Tavily |
| .github/workflows/shared/mcp-pagination.md | New shared workflow documenting the MCP 25k token limit, pagination best practices with perPage parameters, and tool-specific guidance for avoiding oversized responses |
| .github/workflows/shared/github-queries-safe-input.md | New shared workflow defining 3 safe-input tools (github-issue-query, github-pr-query, github-discussion-query) with jq filtering support and schema introspection capabilities |
| .github/workflows/shared/go-make.md | New shared workflow defining safeinputs-go and safeinputs-make tools with usage guidelines emphasizing consistent execution over direct bash commands |
| .github/workflows/shared/github-mcp-app.md | New shared workflow providing a commented-out GitHub App configuration template for MCP server authentication (repository-level APP_ID/APP_PRIVATE_KEY) |
| --- | ||
| # Smoke Test: Copilot Engine Validation | ||
| # Smoke Test: Copilot Engine Validation. |
Copilot (AI), Feb 5, 2026
The heading has an extra period at the end: "Copilot Engine Validation." This is inconsistent with the smoke-codex workflow, which uses "Codex Engine Validation" (without a period). For consistency, consider removing the trailing period.
| # Smoke Test: Copilot Engine Validation. | |
| # Smoke Test: Copilot Engine Validation |
| 2. **Safe Inputs GH CLI Testing**: Use the `safeinputs-gh` tool to query 2 pull requests from ${{ github.repository }} (use args: "pr list --repo ${{ github.repository }} --limit 2 --json number,title,author") | ||
| 3. **Serena MCP Testing**: | ||
| - Use the Serena MCP server tool `activate_project` to initialize the workspace at `${{ github.workspace }}` and verify it succeeds (do NOT use bash to run go commands - use Serena's MCP tools or the safeinputs-go/safeinputs-make tools from the go-make shared workflow) | ||
| - After initialization, use the `find_symbol` tool to search for symbols (find which tool to call) and verify that at least 3 symbols are found in the results |
Copilot (AI), Feb 5, 2026
The instruction "use the find_symbol tool to search for symbols (find which tool to call)" is ambiguous. The parenthetical note "(find which tool to call)" could mean:
- Discover which Serena MCP tool is available for finding symbols
- Search for symbols related to the words "tool" or "call"
- Find any symbols to verify the tool works
For clarity, consider either:
- Removing the parenthetical if it's just explanatory noise
- Rephrasing to be more explicit, e.g., "use the `find_symbol` tool with an appropriate search query to find symbols" or "discover and use the appropriate Serena tool for finding symbols"
| - After initialization, use the `find_symbol` tool to search for symbols (find which tool to call) and verify that at least 3 symbols are found in the results | |
| - After initialization, use the `find_symbol` tool with an appropriate search query to find symbols in the project and verify that at least 3 symbols are found in the results |
| 9. **Discussion Interaction Testing**: | ||
| - Use the `github-discussion-query` safe-input tool with params: `limit=1, jq=".[0]"` to get the latest discussion from ${{ github.repository }} | ||
| - Extract the discussion number from the result (e.g., if the result is `{"number": 123, "title": "...", ...}`, extract 123) | ||
| - Use the `add_comment` tool with `discussion_number: <extracted_number>` to add a fun, news-reporter style comment stating that the smoke test agent was here |
Copilot (AI), Feb 5, 2026
The test instruction uses an incorrect parameter name, `discussion_number`, when calling the `add_comment` tool. According to the tool schema (visible in the lock.yml file at lines 294-296), the correct parameter name is `item_number`. This inconsistency could cause the agent to fail when executing test #9.
The instruction should be updated to use `item_number: <extracted_number>` instead of `discussion_number: <extracted_number>` to match the actual tool signature.
| - Use the `add_comment` tool with `discussion_number: <extracted_number>` to add a fun, news-reporter style comment stating that the smoke test agent was here | |
| - Use the `add_comment` tool with `item_number: <extracted_number>` to add a fun, news-reporter style comment stating that the smoke test agent was here |
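A small argument check like the one below would catch this kind of mismatch before the run. It is a minimal Python sketch, not part of gh-aw: `build_add_comment_args` and `unknown_params` are hypothetical helpers, and the allowed-parameter set `{"item_number", "body"}` is an assumption standing in for the real schema compiled into the lock.yml file.

```python
# ASSUMPTION: parameter names accepted by add_comment. The authoritative
# schema is the one compiled into smoke-copilot.lock.yml.
ASSUMED_ADD_COMMENT_PARAMS = {"item_number", "body"}


def build_add_comment_args(body, item_number=None):
    """Build add_comment arguments (hypothetical helper).

    Leaving out item_number lets the tool pick the target, which the
    workflow relies on to auto-target the triggering pull request.
    """
    args = {"body": body}
    if item_number is not None:
        args["item_number"] = item_number
    return args


def unknown_params(args):
    """Return argument names that are not in the assumed schema."""
    return sorted(set(args) - ASSUMED_ADD_COMMENT_PARAMS)


if __name__ == "__main__":
    good = build_add_comment_args("📰 BREAKING NEWS! ...", item_number=123)
    bad = {"discussion_number": 123, "body": "..."}
    print(unknown_params(good))  # []
    print(unknown_params(bad))   # ['discussion_number']
```

Checking names against a fixed set is deliberately simple; a real validator would read the schema from the compiled workflow rather than hard-coding it.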
| 10. **Agentic Workflows MCP Testing**: | ||
| - Use the `agentic-workflows` MCP tool with the `status` method to query the status of the "smoke-copilot" workflow in ${{ github.repository }} | ||
| - Extract key information: total runs, recent success/failure status, last run time | ||
| - Write a summary of the smoke-copilot workflow status to `/tmp/gh-aw/agent/smoke-copilot-status-${{ github.run_id }}.txt` | ||
| - Use bash to display the file contents | ||
| ## Output | ||
| Add a **very brief** comment (max 5-10 lines) to the current pull request with: | ||
| - PR titles only (no descriptions) | ||
| - ✅ or ❌ for each test result | ||
| - Overall status: PASS or FAIL | ||
| - Mention the pull request author and any assignees | ||
| 1. **Create an issue** with a summary of the smoke test run: | ||
| - Title: "Smoke Test: Copilot - ${{ github.run_id }}" | ||
| - Body should include: | ||
| - Test results (✅ or ❌ for each test) | ||
| - Overall status: PASS or FAIL | ||
| - Run URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }} | ||
| - Timestamp | ||
| 2. **Only if this workflow was triggered by a pull_request event**: Use the `add_comment` tool to add a **very brief** comment (max 5-10 lines) to the triggering pull request (omit the `item_number` parameter to auto-target the triggering PR) with: | ||
| - PR titles only (no descriptions) | ||
| - ✅ or ❌ for each test result | ||
| - Overall status: PASS or FAIL | ||
| 3. Use the `add_comment` tool with `item_number` set to the discussion number you extracted in step 9 to add a **fun news-reporter style comment** to that discussion - be playful and use reporter language like "📰 BREAKING NEWS!" |
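The summary issue from Output step 1 (title, per-test ✅/❌, overall status, run URL, timestamp) can be assembled mechanically. Below is a minimal Python sketch under stated assumptions: `build_issue` is a hypothetical helper, the run ID, repository, and test names in the usage are placeholders, and treating the run as PASS only when every test passed is an assumed rule, not one the workflow text spells out.

```python
from datetime import datetime, timezone


def build_issue(run_id, server_url, repository, results, now=None):
    """Return (title, body) for the smoke-test summary issue.

    results maps a test name to True (passed) or False (failed).
    ASSUMPTION: overall status is PASS only if there is at least one
    result and every result passed.
    """
    now = now or datetime.now(timezone.utc)
    overall = "PASS" if results and all(results.values()) else "FAIL"
    lines = [f"- {'✅' if ok else '❌'} {name}" for name, ok in results.items()]
    lines += [
        "",
        f"Overall status: {overall}",
        # Same shape as ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
        f"Run URL: {server_url}/{repository}/actions/runs/{run_id}",
        f"Timestamp: {now.isoformat()}",
    ]
    return f"Smoke Test: Copilot - {run_id}", "\n".join(lines)


if __name__ == "__main__":
    title, body = build_issue(
        "12345", "https://github.com", "octo/repo",
        {"Test 1": True, "Test 9": False},
    )
    print(title)  # Smoke Test: Copilot - 12345
    print(body)
```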
Copilot (AI), Feb 5, 2026
Test #9 (Discussion Interaction Testing) and Output step 3 create a dependency that will fail if the repository has no discussions. The test instruction uses jq=".[0]" which returns null for an empty array, and Output step 3 unconditionally requires adding a comment to "the discussion number you extracted in step 9".
This creates two failure points:
- Test #9 will fail when trying to extract a discussion number from null
- Output step 3 will fail if there's no discussion number to use
Consider either:
- Making Output step 3 conditional: "If a discussion was found in step 9..."
- Using a more defensive jq filter that handles empty results gracefully
- Documenting that the repository must have at least one discussion for this workflow to succeed
- Skipping this test if no discussions exist
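The defensive option can be sketched in a few lines. This is a minimal Python illustration, assuming the `github-discussion-query` output is JSON that is either the whole list of discussions or the single object (or `null`) produced by `jq=".[0]"`; `latest_discussion_number` is a hypothetical helper. A `None` result would mean "skip test #9 and Output step 3" rather than fail.

```python
import json
from typing import Optional


def latest_discussion_number(raw: str) -> Optional[int]:
    """Return the first discussion's number, or None if there is none.

    Accepts either the full list of discussions or the result of
    jq ".[0]", which is an object, or null when the list is empty.
    """
    data = json.loads(raw)
    # Emulate ".[0]" when given the unfiltered list.
    if isinstance(data, list):
        data = data[0] if data else None
    if not isinstance(data, dict):
        return None  # null or unexpected shape: no discussion to use
    number = data.get("number")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(number, int) and not isinstance(number, bool):
        return number
    return None


if __name__ == "__main__":
    print(latest_discussion_number('{"number": 123, "title": "Hello"}'))  # 123
    print(latest_discussion_number("null"))  # None
    print(latest_discussion_number("[]"))    # None
```

Inside jq itself, the alternative operator (for example `.[0].number // empty`) is another way to avoid propagating `null`, though how the safe-input tool reports empty output is not shown in this PR.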
`smoke-copilot` had 5 tests while `smoke-claude` had 10. This brings them to parity for comprehensive engine validation.

Changes

Added shared workflows (4 files)
- `mcp-pagination.md` - MCP response size limits and pagination patterns
- `github-queries-safe-input.md` - Safe-input tools for issues/PRs/discussions with jq filtering
- `go-make.md` - Safe-input wrappers for Go and Make commands
- `github-mcp-app.md` - GitHub App configuration template

Expanded test coverage (5 → 10 tests)
- `find_symbol` results (≥3 symbols)

Updated configuration
- `discussions: read`
- `max: 2`, +`group: true`, +`close-older-issues: true`
- `github.toolsets: [repos, pull_requests]`, +`runtimes.go: 1.25`

Note: `max-turns` excluded - Copilot engine doesn't support this feature (Claude-only).

Test Matrix
- `safeinputs-gh`
- `activate_project` + `find_symbol`
- `safeinputs-make build`