Add daily exploratory testing workflow for CLI tools #14168
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Removed permissions for the 'github/agentic-workflow-workbench' repository in the devcontainer configuration.
Pull request overview
This pull request adds a new daily workflow that performs comprehensive exploratory testing of three critical CLI tools (gh aw audit, gh aw logs, and gh aw compile) via the agentic-workflows MCP server. The workflow runs daily with fuzzy scheduling (scattered to 17:44 UTC) and includes systematic testing phases covering happy paths, edge cases, performance benchmarks, and cross-command integration.
Changes:
- Adds daily-cli-tools-tester.md workflow with 8 phases of exploratory testing covering environment discovery, logs testing, audit testing, compile testing, integration tests, performance benchmarks, usability assessment, and issue reporting
- Generates corresponding lock file (daily-cli-tools-tester.lock.yml) configuring Copilot engine, MCP servers, safe outputs (create-issue max 5, noop), and 60-minute timeout
- Reorders repository entries in devcontainer.json alphabetically
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| .github/workflows/daily-cli-tools-tester.md | New workflow markdown defining comprehensive CLI testing strategy with 8 phases, MCP-only tool usage, and systematic edge case coverage |
| .github/workflows/daily-cli-tools-tester.lock.yml | Generated lock file with Copilot engine configuration, agentic-workflows MCP server setup, safe outputs configuration (create-issue with 7d expiry, noop), and proper job orchestration |
| .devcontainer/devcontainer.json | Cosmetic reordering of github/agentic-workflow-workbench and github/gh-aw repository entries to alphabetical order |
Comments suppressed due to low confidence (1)
.devcontainer/devcontainer.json:38
- This change reorders the repository entries in the devcontainer configuration, swapping the positions of "github/agentic-workflow-workbench" and "github/gh-aw". This appears to be an unrelated cosmetic change that makes them alphabetically ordered. Consider whether this should be included in a PR focused on adding a CLI testing workflow, or if it should be split into a separate commit/PR for configuration cleanup.
"github/gh-aw": {
"permissions": {
"actions": "write",
"contents": "write",
"discussions": "read",
"issues": "read",
"pull-requests": "write",
"workflows": "write"
}
}
}
}
},
"features": {
"ghcr.io/anthropics/devcontainer-features/claude-code:1.0": {},
"ghcr.io/devcontainers/features/copilot-cli:latest": {},
"ghcr.io/devcontainers/features/docker-in-docker:2": {},
"ghcr.io/devcontainers/features/github-cli:1": {},
"ghcr.io/devcontainers/features/node:1": {
"version": "24"
actions: read
tools:
  agentic-workflows:
  bash:
The workflow markdown includes bash code blocks that use commands not in the allowed bash tools list. The frontmatter specifies `bash:`, which defaults to a limited set of commands (`echo`, `ls`, `pwd`, `cat`, `head`, `tail`, `grep`, `wc`, `sort`, `uniq`, `date`, `yq`), but the workflow instructions include:
- `find` command (lines 161, 164, 389, 392, 525)
- `for` loops (lines 164, 392)
- `diff` command (line 419)
- `time` command (lines 497, 500, 503)
- `df` command (line 522)
- `du` command (line 528)
These commands will fail when the agent attempts to use them. Either:
- Update the frontmatter to allow these specific bash commands: `bash: ["find", "for", "diff", "time", "df", "du"]` (which will merge with the defaults), or
- Remove or modify these bash code block examples so they use only the default allowed commands, or
- Use `bash: ["*"]` to allow all bash commands, if that is the intended behavior for this testing workflow.
Suggested change: replace `bash:` with `bash: ["find", "for", "diff", "time", "df", "du"]`
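To see which commands in a workflow's bash examples fall outside the allowlist the reviewer describes, a rough check like the following may help. It is a minimal sketch that assumes the default command set and the "merge with defaults" behavior exactly as stated in the review comment; `effective_allowlist`, `first_words`, and `disallowed` are hypothetical helpers, not part of gh-aw, and the command splitting is deliberately crude.

```python
import re

# Assumption: the default command set for a bare `bash:` entry, copied from the
# review comment. Check the gh-aw documentation for the authoritative list.
DEFAULT_BASH = {"echo", "ls", "pwd", "cat", "head", "tail", "grep", "wc",
                "sort", "uniq", "date", "yq"}


def effective_allowlist(extra):
    """Hypothetical helper: merge extra commands with the defaults, as the
    reviewer describes. Returns None when "*" is present (all commands allowed)."""
    if "*" in extra:
        return None
    return DEFAULT_BASH | set(extra)


def first_words(script):
    """Hypothetical helper: return the leading word of each command segment,
    splitting on newlines, ||, &&, | and ;. Deliberately crude: it ignores
    quoting, $(...) substitutions, and shell keywords such as do/done."""
    words = []
    for segment in re.split(r"\n|\|\||&&|\||;", script):
        tokens = segment.split()
        if tokens:
            words.append(tokens[0])
    return words


def disallowed(script, extra=()):
    """List commands in `script` that fall outside the effective allowlist."""
    allowed = effective_allowlist(list(extra))
    if allowed is None:
        return []
    return [word for word in first_words(script) if word not in allowed]


script = "find . -name '*.lock.yml' | wc -l\ndf -h\ncat README.md"
print(disallowed(script))                                               # ['find', 'df']
print(disallowed(script, ["find", "for", "diff", "time", "df", "du"]))  # []
print(disallowed(script, ["*"]))                                        # []
```

Because the splitter only looks at the first word of each segment, a loop such as `for f in *.md; do wc -l "$f"; done` is reported as `for`, `do`, and `done`, so treat the output as a starting point for review rather than a verdict.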
@@ -0,0 +1,669 @@
---
description: Daily exploratory testing of audit, logs, and compile tools in gh-aw CLI
Consider adding `engine: copilot` to the frontmatter for clarity and consistency with other similar workflows (e.g., `daily-cli-performance.md`). While Copilot is the default engine and will be used automatically, specifying it explicitly makes the configuration more maintainable and self-documenting, especially since the PR description explicitly mentions "Engine: Copilot".
Suggested change: add `engine: copilot` on the line after `description: Daily exploratory testing of audit, logs, and compile tools in gh-aw CLI`
Implements systematic testing of the `audit`, `logs`, and `compile` commands via the agentic-workflows MCP server.
Workflow Configuration
- Safe outputs: `create-issue` (max 5, 7d TTL), `noop`
Testing Coverage
Phase 1-2: Logs Command
Phase 3: Audit Command
Phase 4: Compile Command
Phase 5-7: Integration & Usability
Phase 8: Reporting
- Reports via `noop` when all tests pass, for transparency
Implementation Notes
The workflow uses the MCP server exclusively (not the CLI directly) because authentication is not configured for direct `gh aw` invocation in the workflow context. The testing approach is exploratory but organized into systematic phases covering happy paths, edge cases, and cross-command integration.