Expand smoke-copilot test coverage to match smoke-claude #718

Copilot · 2026-02-05T18:11:38Z

smoke-copilot had 5 tests while smoke-claude had 10. This brings them to parity for comprehensive engine validation.

Changes

Added shared workflows (4 files)

- mcp-pagination.md - MCP response size limits and pagination patterns
- github-queries-safe-input.md - Safe-input tools for issues/PRs/discussions with jq filtering
- go-make.md - Safe-input wrappers for Go and Make commands
- github-mcp-app.md - GitHub App configuration template

Expanded test coverage (5 → 10 tests)

- Added: Safe Inputs GH CLI, Make Build, Tavily Web Search, Discussion Interaction, Agentic Workflows MCP
- Enhanced: Serena MCP now validates find_symbol results (≥3 symbols)

Updated configuration

- Permissions: +discussions: read
- Imports: 7 shared workflows
- Safe-outputs: +max: 2, +group: true, +close-older-issues: true
- Timeout: 5 → 15 minutes
- Tools: +github.toolsets: [repos, pull_requests], +runtimes.go: 1.25
- Output: Issue creation + PR/discussion comments (was PR comment only)

Note: max-turns excluded - Copilot engine doesn't support this feature (Claude-only).

Test Matrix

| Test | Coverage |
| --- | --- |
| GitHub MCP | ✅ Last 2 merged PRs |
| Safe Inputs GH CLI | ✅ Query via `safeinputs-gh` |
| Serena MCP | ✅ `activate_project` + `find_symbol` |
| Make Build | ✅ `safeinputs-make build` |
| Playwright | ✅ github.com navigation |
| Tavily Search | ✅ "GitHub Agentic Workflows" |
| File I/O | ✅ Create + verify test file |
| Bash | ✅ Command execution |
| Discussions | ✅ Query + comment |
| Agentic Workflows | ✅ Status introspection |
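
The workflow's output step asks for a ✅ or ❌ per test plus an overall PASS or FAIL. A minimal sketch of that roll-up, assuming the results are available as a name-to-boolean mapping (the function name `render_summary` and the input shape are illustrative and are not part of the workflow):

```python
from typing import Dict


def render_summary(results: Dict[str, bool]) -> str:
    """Render one status line per test followed by the overall verdict."""
    lines = [f"{'✅' if ok else '❌'} {name}" for name, ok in results.items()]
    # An empty result set is treated as FAIL so a run that executed nothing
    # is never reported as passing.
    overall = "PASS" if results and all(results.values()) else "FAIL"
    lines.append(f"Overall status: {overall}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_summary({"GitHub MCP": True, "Bash": False}))
```

Treating an empty mapping as FAIL is a design choice made for this sketch; the workflow itself does not specify that case.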



Copilot

Pull request overview

This pull request expands the smoke-copilot test suite from 5 to 10 tests to achieve comprehensive parity with smoke-claude coverage (note: smoke-codex has 6 tests). The expansion validates additional MCP servers, safe-input tools, and interaction patterns while maintaining the workflow's core purpose of validating Copilot engine functionality.

Changes:

- Added 4 new shared workflow files providing reusable configurations for MCP pagination patterns, GitHub query tools with jq filtering, Go/Make command wrappers, and GitHub App templates
- Expanded test coverage from 5 to 10 tests, adding validation for safe-input tools (GH CLI, Make), Tavily web search, discussion interactions, and agentic workflows introspection
- Updated workflow configuration with new permissions (discussions: read), increased timeout (5→15 minutes), enhanced safe-outputs settings, and Go 1.25 runtime support

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.


| File | Description |
| --- | --- |
| `.github/workflows/smoke-copilot.md` | Main workflow definition: expanded from 5 to 10 tests, added 7 shared workflow imports, updated permissions/tools/runtimes, changed output to create issues + optional PR/discussion comments (was PR-only), increased timeout to 15 minutes |
| `.github/workflows/smoke-copilot.lock.yml` | Compiled workflow: version bumps (Copilot CLI 0.0.402→0.0.403, awf v0.13.1→v0.13.4, mcpg v0.0.98→v0.0.99), added safe-inputs MCP server with 5 tools, added Tavily/Serena MCP configurations, added Go 1.25 setup, enhanced allowed domains for Go/Tavily |
| `.github/workflows/shared/mcp-pagination.md` | New shared workflow documenting MCP 25k token limit, pagination best practices with perPage parameters, and tool-specific guidance for avoiding oversized responses |
| `.github/workflows/shared/github-queries-safe-input.md` | New shared workflow defining 3 safe-input tools (github-issue-query, github-pr-query, github-discussion-query) with jq filtering support and schema introspection capabilities |
| `.github/workflows/shared/go-make.md` | New shared workflow defining safeinputs-go and safeinputs-make tools with usage guidelines emphasizing consistent execution over direct bash commands |
| `.github/workflows/shared/github-mcp-app.md` | New shared workflow providing commented-out GitHub App configuration template for MCP server authentication (repository-level APP_ID/APP_PRIVATE_KEY) |
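
The pagination guidance summarized for `mcp-pagination.md` (small `perPage` values to stay under the response size limit) can be illustrated with a short sketch. Everything named here is an assumption for illustration: `fetch_page` stands in for whatever MCP tool call returns one page of results, and `paginate`, `fake_fetch`, and `FAKE_PRS` are hypothetical, not part of the shared workflow.

```python
from typing import Callable, Dict, Iterator, List


def paginate(
    fetch_page: Callable[[int, int], List[Dict]],
    per_page: int = 10,
    max_pages: int = 5,
) -> Iterator[Dict]:
    """Yield items one small page at a time.

    Stops at the first short (or empty) page, or after max_pages pages,
    so no single response or total result set grows without bound.
    """
    for page in range(1, max_pages + 1):
        items = fetch_page(page, per_page)
        yield from items
        if len(items) < per_page:
            break


# Hypothetical in-memory data standing in for a real tool response.
FAKE_PRS = [{"number": n} for n in range(1, 24)]


def fake_fetch(page: int, per_page: int) -> List[Dict]:
    start = (page - 1) * per_page
    return FAKE_PRS[start:start + per_page]


if __name__ == "__main__":
    numbers = [pr["number"] for pr in paginate(fake_fetch, per_page=10)]
    print(len(numbers))
```

The `max_pages` cap is a design choice for the sketch: it bounds the total work even when the data source never returns a short page.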


Copilot · 2026-02-05T18:28:45Z

.github/workflows/smoke-copilot.md

 ---

-# Smoke Test: Copilot Engine Validation
+# Smoke Test: Copilot Engine Validation.


The heading has an extra period at the end: "Copilot Engine Validation." This is inconsistent with the smoke-codex workflow, which uses "Codex Engine Validation" (without a period). For consistency, consider removing the trailing period.

Suggested change

-# Smoke Test: Copilot Engine Validation.
+# Smoke Test: Copilot Engine Validation

Copilot · 2026-02-05T18:28:45Z

.github/workflows/smoke-copilot.md

+2. **Safe Inputs GH CLI Testing**: Use the `safeinputs-gh` tool to query 2 pull requests from ${{ github.repository }} (use args: "pr list --repo ${{ github.repository }} --limit 2 --json number,title,author")
+3. **Serena MCP Testing**: 
+   - Use the Serena MCP server tool `activate_project` to initialize the workspace at `${{ github.workspace }}` and verify it succeeds (do NOT use bash to run go commands - use Serena's MCP tools or the safeinputs-go/safeinputs-make tools from the go-make shared workflow)
+   - After initialization, use the `find_symbol` tool to search for symbols (find which tool to call) and verify that at least 3 symbols are found in the results


The instruction "use the find_symbol tool to search for symbols (find which tool to call)" is ambiguous. The parenthetical note "(find which tool to call)" could mean:

1. Discover which Serena MCP tool is available for finding symbols
2. Search for symbols related to the words "tool" or "call"
3. Find any symbols to verify the tool works

For clarity, consider either:

- Removing the parenthetical if it's just explanatory noise
- Rephrasing to be more explicit, e.g., "use the find_symbol tool with an appropriate search query to find symbols" or "discover and use the appropriate Serena tool for finding symbols"

Suggested change

-   - After initialization, use the `find_symbol` tool to search for symbols (find which tool to call) and verify that at least 3 symbols are found in the results
+   - After initialization, use the `find_symbol` tool with an appropriate search query to find symbols in the project and verify that at least 3 symbols are found in the results

Copilot · 2026-02-05T18:28:45Z

.github/workflows/smoke-copilot.md

+9. **Discussion Interaction Testing**: 
+   - Use the `github-discussion-query` safe-input tool with params: `limit=1, jq=".[0]"` to get the latest discussion from ${{ github.repository }}
+   - Extract the discussion number from the result (e.g., if the result is `{"number": 123, "title": "...", ...}`, extract 123)
+   - Use the `add_comment` tool with `discussion_number: <extracted_number>` to add a fun, news-reporter style comment stating that the smoke test agent was here


The test instruction uses an incorrect parameter name, `discussion_number`, when calling the `add_comment` tool. According to the tool schema (visible in the lock.yml file at lines 294-296), the correct parameter name is `item_number`. This inconsistency could cause the agent to fail when executing test #9.

The instruction should be updated to use item_number: <extracted_number> instead of discussion_number: <extracted_number> to match the actual tool signature.

Suggested change

-   - Use the `add_comment` tool with `discussion_number: <extracted_number>` to add a fun, news-reporter style comment stating that the smoke test agent was here
+   - Use the `add_comment` tool with `item_number: <extracted_number>` to add a fun, news-reporter style comment stating that the smoke test agent was here
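
A mismatch like this can be caught mechanically by comparing the argument names in an instruction against the tool's input schema. The sketch below is illustrative only: `ADD_COMMENT_SCHEMA` is a hypothetical fragment (the review confirms only that `item_number` is the expected name; the `body` field and the `required` list are assumptions), and `check_args` is not an existing helper in the repository.

```python
from typing import Any, Dict, List

# Hypothetical schema fragment. The real add_comment schema lives in the
# compiled lock file; only the item_number name is taken from the review.
ADD_COMMENT_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "body": {"type": "string"},
        "item_number": {"type": "number"},
    },
    "required": ["body"],
}


def check_args(schema: Dict[str, Any], args: Dict[str, Any]) -> List[str]:
    """Return a list of problems: unknown names first, then missing required ones."""
    allowed = schema.get("properties", {})
    problems = [
        f"unknown parameter: {name}" for name in sorted(args) if name not in allowed
    ]
    problems += [
        f"missing required parameter: {name}"
        for name in schema.get("required", [])
        if name not in args
    ]
    return problems


if __name__ == "__main__":
    print(check_args(ADD_COMMENT_SCHEMA, {"body": "hi", "discussion_number": 123}))
```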

Copilot · 2026-02-05T18:28:46Z

.github/workflows/smoke-copilot.md

+   - Use the `github-discussion-query` safe-input tool with params: `limit=1, jq=".[0]"` to get the latest discussion from ${{ github.repository }}
+   - Extract the discussion number from the result (e.g., if the result is `{"number": 123, "title": "...", ...}`, extract 123)
+   - Use the `add_comment` tool with `discussion_number: <extracted_number>` to add a fun, news-reporter style comment stating that the smoke test agent was here
+10. **Agentic Workflows MCP Testing**: 
+   - Use the `agentic-workflows` MCP tool with the `status` method to query the status of the "smoke-copilot" workflow in ${{ github.repository }}
+   - Extract key information: total runs, recent success/failure status, last run time
+   - Write a summary of the smoke-copilot workflow status to `/tmp/gh-aw/agent/smoke-copilot-status-${{ github.run_id }}.txt`
+   - Use bash to display the file contents

 ## Output

-Add a **very brief** comment (max 5-10 lines) to the current pull request with:
- PR titles only (no descriptions)
- ✅ or ❌ for each test result
- Overall status: PASS or FAIL
- Mention the pull request author and any assignees
+1. **Create an issue** with a summary of the smoke test run:
+   - Title: "Smoke Test: Copilot - ${{ github.run_id }}"
+   - Body should include:
+     - Test results (✅ or ❌ for each test)
+     - Overall status: PASS or FAIL
+     - Run URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
+     - Timestamp
+
+2. **Only if this workflow was triggered by a pull_request event**: Use the `add_comment` tool to add a **very brief** comment (max 5-10 lines) to the triggering pull request (omit the `item_number` parameter to auto-target the triggering PR) with:
+   - PR titles only (no descriptions)
+   - ✅ or ❌ for each test result
+   - Overall status: PASS or FAIL
+
+3. Use the `add_comment` tool with `item_number` set to the discussion number you extracted in step 9 to add a **fun news-reporter style comment** to that discussion - be playful and use reporter language like "📰 BREAKING NEWS!"


Test #9 (Discussion Interaction Testing) and Output step 3 create a dependency that will fail if the repository has no discussions. The test instruction uses jq=".[0]" which returns null for an empty array, and Output step 3 unconditionally requires adding a comment to "the discussion number you extracted in step 9".

This creates two failure points:

1. Test #9 will fail when trying to extract a discussion number from null
2. Output step 3 will fail if there's no discussion number to use

Consider either:

- Making Output step 3 conditional: "If a discussion was found in step 9..."
- Using a more defensive jq filter that handles empty results gracefully
- Documenting that the repository must have at least one discussion for this workflow to succeed
- Skipping this test if no discussions exist
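
The defensive handling suggested above can be sketched as follows, assuming the query result arrives as a JSON string. The function name `extract_discussion_number` and the input shape are illustrative assumptions, not part of the workflow or the safe-input tool.

```python
import json
from typing import Optional


def extract_discussion_number(raw: str) -> Optional[int]:
    """Return the discussion number, or None when no discussion was found.

    Handles the cases the review calls out: a JSON null (what jq's `.[0]`
    yields for an empty array), an empty array, and unparseable output.
    """
    try:
        result = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if isinstance(result, list):
        result = result[0] if result else None
    if not isinstance(result, dict):
        return None
    number = result.get("number")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(number, int) and not isinstance(number, bool):
        return number
    return None


if __name__ == "__main__":
    print(extract_discussion_number('{"number": 123, "title": "..."}'))
    print(extract_discussion_number("null"))
```

A `None` result would then be the signal to skip both the comment in test #9 and Output step 3. On the jq side, a filter such as `.[0] // empty` produces no output instead of `null` for an empty array; whether the `github-discussion-query` tool handles empty output gracefully is an assumption that would need checking.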

Initial plan

852fbea

Copilot AI assigned Copilot and lpcox Feb 5, 2026

Copilot started work on behalf of lpcox February 5, 2026 18:11 View session

Copilot AI and others added 2 commits February 5, 2026 18:15

Add shared workflows and expand smoke-copilot to match smoke-claude c…

ca6a238

…overage Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>

Remove max-turns from smoke-copilot (not supported by Copilot engine)…

892da6d

… and recompile workflow Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Expand smoke copilot agentic-workflow testing coverage~~ Expand smoke-copilot test coverage to match smoke-claude Feb 5, 2026

Copilot AI requested a review from lpcox February 5, 2026 18:19

Copilot finished work on behalf of lpcox February 5, 2026 18:19

lpcox marked this pull request as ready for review February 5, 2026 18:23

Copilot AI review requested due to automatic review settings February 5, 2026 18:23

lpcox merged commit 2b6fef1 into main Feb 5, 2026

lpcox deleted the copilot/expand-smoke-coverage branch February 5, 2026 18:23

Copilot started reviewing on behalf of lpcox February 5, 2026 18:23 View session

Copilot AI reviewed Feb 5, 2026
