Copilot AI commented Feb 5, 2026

smoke-copilot had 5 tests while smoke-claude had 10. This brings them to parity for comprehensive engine validation.

Changes

Added shared workflows (4 files)

  • mcp-pagination.md - MCP response size limits and pagination patterns
  • github-queries-safe-input.md - Safe-input tools for issues/PRs/discussions with jq filtering
  • go-make.md - Safe-input wrappers for Go and Make commands
  • github-mcp-app.md - GitHub App configuration template

Expanded test coverage (5 → 10 tests)

  • Added: Safe Inputs GH CLI, Make Build, Tavily Web Search, Discussion Interaction, Agentic Workflows MCP
  • Enhanced: Serena MCP now validates find_symbol results (≥3 symbols)

Updated configuration

  • Permissions: +discussions: read
  • Imports: 7 shared workflows
  • Safe-outputs: +max: 2, +group: true, +close-older-issues: true
  • Timeout: 5 → 15 minutes
  • Tools: +github.toolsets: [repos, pull_requests], +runtimes.go: 1.25
  • Output: Issue creation + PR/discussion comments (was PR comment only)

Note: max-turns excluded - Copilot engine doesn't support this feature (Claude-only).

Test Matrix

| Test | Coverage |
| --- | --- |
| GitHub MCP | ✅ Last 2 merged PRs |
| Safe Inputs GH CLI | ✅ Query via safeinputs-gh |
| Serena MCP | activate_project + find_symbol |
| Make Build | safeinputs-make build |
| Playwright | ✅ github.com navigation |
| Tavily Search | ✅ "GitHub Agentic Workflows" |
| File I/O | ✅ Create + verify test file |
| Bash | ✅ Command execution |
| Discussions | ✅ Query + comment |
| Agentic Workflows | ✅ Status introspection |


Copilot AI and others added 2 commits February 5, 2026 18:15
…overage

Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>
… and recompile workflow

Co-authored-by: lpcox <15877973+lpcox@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Expand smoke copilot agentic-workflow testing coverage" to "Expand smoke-copilot test coverage to match smoke-claude" Feb 5, 2026
Copilot AI requested a review from lpcox February 5, 2026 18:19
@lpcox lpcox marked this pull request as ready for review February 5, 2026 18:23
Copilot AI review requested due to automatic review settings February 5, 2026 18:23
@lpcox lpcox merged commit 2b6fef1 into main Feb 5, 2026
@lpcox lpcox deleted the copilot/expand-smoke-coverage branch February 5, 2026 18:23

Copilot AI left a comment

Pull request overview

This pull request expands the smoke-copilot test suite from 5 to 10 tests to achieve comprehensive parity with smoke-claude coverage (note: smoke-codex has 6 tests). The expansion validates additional MCP servers, safe-input tools, and interaction patterns while maintaining the workflow's core purpose of validating Copilot engine functionality.

Changes:

  • Added 4 new shared workflow files providing reusable configurations for MCP pagination patterns, GitHub query tools with jq filtering, Go/Make command wrappers, and GitHub App templates
  • Expanded test coverage from 5 to 10 tests, adding validation for safe-input tools (GH CLI, Make), Tavily web search, discussion interactions, and agentic workflows introspection
  • Updated workflow configuration with new permissions (discussions: read), increased timeout (5→15 minutes), enhanced safe-outputs settings, and Go 1.25 runtime support

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| .github/workflows/smoke-copilot.md | Main workflow definition: expanded from 5 to 10 tests, added 7 shared workflow imports, updated permissions/tools/runtimes, changed output to create issues + optional PR/discussion comments (was PR-only), increased timeout to 15 minutes |
| .github/workflows/smoke-copilot.lock.yml | Compiled workflow: version bumps (Copilot CLI 0.0.402→0.0.403, awf v0.13.1→v0.13.4, mcpg v0.0.98→v0.0.99), added safe-inputs MCP server with 5 tools, added Tavily/Serena MCP configurations, added Go 1.25 setup, enhanced allowed domains for Go/Tavily |
| .github/workflows/shared/mcp-pagination.md | New shared workflow documenting MCP 25k token limit, pagination best practices with perPage parameters, and tool-specific guidance for avoiding oversized responses |
| .github/workflows/shared/github-queries-safe-input.md | New shared workflow defining 3 safe-input tools (github-issue-query, github-pr-query, github-discussion-query) with jq filtering support and schema introspection capabilities |
| .github/workflows/shared/go-make.md | New shared workflow defining safeinputs-go and safeinputs-make tools with usage guidelines emphasizing consistent execution over direct bash commands |
| .github/workflows/shared/github-mcp-app.md | New shared workflow providing commented-out GitHub App configuration template for MCP server authentication (repository-level APP_ID/APP_PRIVATE_KEY) |
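
The pagination guidance summarized for mcp-pagination.md (a 25k token limit, handled with perPage-style parameters) can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration: `fetch_page` is a hypothetical stand-in for any paginated tool call, and the 4-characters-per-token estimate is a rough heuristic, not how any MCP server counts tokens.

```python
from typing import Callable, Iterator, List

# Assumption: ~4 characters per token is only a rough heuristic.
CHARS_PER_TOKEN = 4
# The response limit cited in the file summary for mcp-pagination.md.
TOKEN_LIMIT = 25_000


def estimate_tokens(text: str) -> int:
    """Estimate the token count of a string, rounding up."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def paginate(
    fetch_page: Callable[[int, int], List[str]],
    per_page: int = 10,
) -> Iterator[List[str]]:
    """Yield pages from a hypothetical fetch_page(page, per_page) callable.

    Stops on an empty or short page, and raises if a single page is
    estimated to exceed TOKEN_LIMIT so the caller can lower per_page.
    """
    page = 1
    while True:
        items = fetch_page(page, per_page)
        if not items:
            return
        used = sum(estimate_tokens(item) for item in items)
        if used > TOKEN_LIMIT:
            raise ValueError(
                f"page {page} is ~{used} tokens; lower per_page and retry"
            )
        yield items
        if len(items) < per_page:
            return
        page += 1
```

Raising instead of silently truncating is a design choice in this sketch: it makes an oversized page visible to the caller, who can retry with a smaller `per_page`.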

---

- # Smoke Test: Copilot Engine Validation
+ # Smoke Test: Copilot Engine Validation.

Copilot AI Feb 5, 2026

The heading has an extra period at the end: "Copilot Engine Validation." This is inconsistent with the smoke-codex workflow which uses "Codex Engine Validation" (without a period). For consistency, consider removing the trailing period.

Suggested change
- # Smoke Test: Copilot Engine Validation.
+ # Smoke Test: Copilot Engine Validation

2. **Safe Inputs GH CLI Testing**: Use the `safeinputs-gh` tool to query 2 pull requests from ${{ github.repository }} (use args: "pr list --repo ${{ github.repository }} --limit 2 --json number,title,author")
3. **Serena MCP Testing**:
- Use the Serena MCP server tool `activate_project` to initialize the workspace at `${{ github.workspace }}` and verify it succeeds (do NOT use bash to run go commands - use Serena's MCP tools or the safeinputs-go/safeinputs-make tools from the go-make shared workflow)
- After initialization, use the `find_symbol` tool to search for symbols (find which tool to call) and verify that at least 3 symbols are found in the results

Copilot AI Feb 5, 2026

The instruction "use the find_symbol tool to search for symbols (find which tool to call)" is ambiguous. The parenthetical note "(find which tool to call)" could mean:

  1. Discover which Serena MCP tool is available for finding symbols
  2. Search for symbols related to the words "tool" or "call"
  3. Find any symbols to verify the tool works

For clarity, consider either:

  • Removing the parenthetical if it's just explanatory noise
  • Rephrasing to be more explicit, e.g., "use the find_symbol tool with an appropriate search query to find symbols" or "discover and use the appropriate Serena tool for finding symbols"
Suggested change
- - After initialization, use the `find_symbol` tool to search for symbols (find which tool to call) and verify that at least 3 symbols are found in the results
+ - After initialization, use the `find_symbol` tool with an appropriate search query to find symbols in the project and verify that at least 3 symbols are found in the results

9. **Discussion Interaction Testing**:
- Use the `github-discussion-query` safe-input tool with params: `limit=1, jq=".[0]"` to get the latest discussion from ${{ github.repository }}
- Extract the discussion number from the result (e.g., if the result is `{"number": 123, "title": "...", ...}`, extract 123)
- Use the `add_comment` tool with `discussion_number: <extracted_number>` to add a fun, news-reporter style comment stating that the smoke test agent was here

Copilot AI Feb 5, 2026

The test instruction uses an incorrect parameter name discussion_number when calling the add_comment tool. According to the tool schema (visible in the lock.yml file at lines 294-296), the correct parameter name is item_number. This inconsistency could cause the agent to fail when executing test #9.

The instruction should be updated to use item_number: <extracted_number> instead of discussion_number: <extracted_number> to match the actual tool signature.

Suggested change
- - Use the `add_comment` tool with `discussion_number: <extracted_number>` to add a fun, news-reporter style comment stating that the smoke test agent was here
+ - Use the `add_comment` tool with `item_number: <extracted_number>` to add a fun, news-reporter style comment stating that the smoke test agent was here
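
A mismatch like this one can be caught mechanically by comparing the argument names in an instruction against the tool's declared properties. The Python sketch below is illustrative only: `ADD_COMMENT_SCHEMA` is a hypothetical, trimmed-down schema (only `item_number` is confirmed by the review comment above; `body` is an assumption), and `unknown_arguments` is a helper invented for this example, not part of any workflow tooling.

```python
from typing import Any, Dict, List

# Hypothetical, trimmed-down schema for illustration. Only `item_number`
# is confirmed by the review comment; `body` is an assumed property.
ADD_COMMENT_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "body": {"type": "string"},
        "item_number": {"type": "number"},
    },
}


def unknown_arguments(schema: Dict[str, Any], arguments: Dict[str, Any]) -> List[str]:
    """Return argument names that the schema does not declare, sorted."""
    allowed = set(schema.get("properties", {}))
    return sorted(name for name in arguments if name not in allowed)


# The original instruction's argument name is flagged...
bad_call = {"body": "Smoke test agent was here", "discussion_number": 123}
# ...while the suggested name passes.
good_call = {"body": "Smoke test agent was here", "item_number": 123}

print(unknown_arguments(ADD_COMMENT_SCHEMA, bad_call))
print(unknown_arguments(ADD_COMMENT_SCHEMA, good_call))
```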

Comment on lines +89 to +113
- Use the `github-discussion-query` safe-input tool with params: `limit=1, jq=".[0]"` to get the latest discussion from ${{ github.repository }}
- Extract the discussion number from the result (e.g., if the result is `{"number": 123, "title": "...", ...}`, extract 123)
- Use the `add_comment` tool with `discussion_number: <extracted_number>` to add a fun, news-reporter style comment stating that the smoke test agent was here
10. **Agentic Workflows MCP Testing**:
- Use the `agentic-workflows` MCP tool with the `status` method to query the status of the "smoke-copilot" workflow in ${{ github.repository }}
- Extract key information: total runs, recent success/failure status, last run time
- Write a summary of the smoke-copilot workflow status to `/tmp/gh-aw/agent/smoke-copilot-status-${{ github.run_id }}.txt`
- Use bash to display the file contents

## Output

- Add a **very brief** comment (max 5-10 lines) to the current pull request with:
- - PR titles only (no descriptions)
- - ✅ or ❌ for each test result
- - Overall status: PASS or FAIL
- - Mention the pull request author and any assignees
+ 1. **Create an issue** with a summary of the smoke test run:
+ - Title: "Smoke Test: Copilot - ${{ github.run_id }}"
+ - Body should include:
+ - Test results (✅ or ❌ for each test)
+ - Overall status: PASS or FAIL
+ - Run URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
+ - Timestamp

+ 2. **Only if this workflow was triggered by a pull_request event**: Use the `add_comment` tool to add a **very brief** comment (max 5-10 lines) to the triggering pull request (omit the `item_number` parameter to auto-target the triggering PR) with:
+ - PR titles only (no descriptions)
+ - ✅ or ❌ for each test result
+ - Overall status: PASS or FAIL

+ 3. Use the `add_comment` tool with `item_number` set to the discussion number you extracted in step 9 to add a **fun news-reporter style comment** to that discussion - be playful and use reporter language like "📰 BREAKING NEWS!"
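
For illustration, the issue body described in output step 1 could be assembled as in the Python sketch below. It assumes the script runs inside GitHub Actions, where the default environment variables `GITHUB_SERVER_URL`, `GITHUB_REPOSITORY`, and `GITHUB_RUN_ID` mirror the `${{ github.* }}` expressions in the workflow. The function names and the line layout of the body are hypothetical, not taken from the workflow itself.

```python
import os
from datetime import datetime, timezone
from typing import Mapping, Optional


def run_url(env: Mapping[str, str] = os.environ) -> str:
    """Build the run URL from GitHub Actions default environment variables."""
    return (
        f"{env['GITHUB_SERVER_URL']}/{env['GITHUB_REPOSITORY']}"
        f"/actions/runs/{env['GITHUB_RUN_ID']}"
    )


def overall_status(results: Mapping[str, bool]) -> str:
    """PASS only if there is at least one result and every test passed."""
    return "PASS" if results and all(results.values()) else "FAIL"


def issue_body(
    results: Mapping[str, bool],
    env: Mapping[str, str] = os.environ,
    timestamp: Optional[str] = None,
) -> str:
    """Render per-test marks, overall status, run URL, and a timestamp."""
    stamp = timestamp or datetime.now(timezone.utc).isoformat()
    lines = [f"{'✅' if ok else '❌'} {name}" for name, ok in results.items()]
    lines.append(f"Overall status: {overall_status(results)}")
    lines.append(f"Run URL: {run_url(env)}")
    lines.append(f"Timestamp: {stamp}")
    return "\n".join(lines)
```

Treating an empty result set as FAIL is a deliberate choice in this sketch: a run that recorded no tests should not be reported as passing.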

Copilot AI Feb 5, 2026

Test #9 (Discussion Interaction Testing) and Output step 3 create a dependency that will fail if the repository has no discussions. The test instruction uses jq=".[0]" which returns null for an empty array, and Output step 3 unconditionally requires adding a comment to "the discussion number you extracted in step 9".

This creates two failure points:

  1. Test #9 will fail when trying to extract a discussion number from null
  2. Output step 3 will fail if there's no discussion number to use

Consider either:

  • Making Output step 3 conditional: "If a discussion was found in step 9..."
  • Using a more defensive jq filter that handles empty results gracefully
  • Documenting that the repository must have at least one discussion for this workflow to succeed
  • Skipping this test if no discussions exist
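
As a rough sketch of the defensive handling suggested above, the Python helper below extracts a discussion number only when one is actually present. It assumes the query tool returns JSON text that is either an array of discussions or the single object (or `null`) produced by a filter such as `.[0]`. The function name is hypothetical.

```python
import json
from typing import Optional


def first_discussion_number(raw: str) -> Optional[int]:
    """Return the first discussion's number, or None if there is none.

    Accepts either a JSON array of discussions or the already-filtered
    result of `.[0]`, which is null when the array is empty.
    """
    data = json.loads(raw)
    if isinstance(data, list):
        data = data[0] if data else None
    if not isinstance(data, dict):
        return None
    number = data.get("number")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(number, int) and not isinstance(number, bool):
        return number
    return None


number = first_discussion_number("[]")
if number is None:
    print("No discussions found; skipping the discussion comment.")
else:
    print(f"Commenting on discussion {number}")
```

Returning `None` rather than raising lets both test #9 and Output step 3 branch on the same check and skip the comment when the repository has no discussions.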
