Add daily exploratory testing workflow for CLI tools #14168

Merged
pelikhan merged 4 commits into main from copilot/exploratory-testing-audit-logs-tools
Feb 6, 2026

Copilot AI commented Feb 6, 2026

Implements systematic testing of audit, logs, and compile commands via the agentic-workflows MCP server.

Workflow Configuration

  • Schedule: Daily (fuzzy, scattered to 17:44 UTC)
  • Timeout: 60 minutes
  • Engine: Copilot
  • Safe Outputs: create-issue (max 5, 7d TTL), noop
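The "fuzzy, scattered" schedule above means the daily run lands on a stable but non-round time (17:44 UTC corresponds to the cron expression `44 17 * * *`). A minimal sketch of one way such scattering could work is below, assuming a hash of the workflow identifier picks the minute of the day. The function name `scatter_daily_cron` and the hashing scheme are illustrative assumptions; the algorithm gh-aw actually uses is not shown in this PR, so this sketch is not expected to reproduce 17:44.

```python
import hashlib

MINUTES_PER_DAY = 24 * 60


def scatter_daily_cron(workflow_id: str) -> str:
    """Map a workflow identifier to a stable daily cron expression (UTC).

    Hypothetical scheme: hash the identifier and use the result to pick
    one of the 1440 minutes in a day, so different workflows spread out
    while each workflow always gets the same slot.
    """
    digest = hashlib.sha256(workflow_id.encode("utf-8")).digest()
    slot = int.from_bytes(digest[:8], "big") % MINUTES_PER_DAY
    hour, minute = divmod(slot, 60)
    # Cron field order: minute hour day-of-month month day-of-week
    return f"{minute} {hour} * * *"


if __name__ == "__main__":
    print(scatter_daily_cron("daily-cli-tools-tester"))
```

Because the slot depends only on the identifier, recompiling the workflow would not move its run time under this scheme.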

Testing Coverage

Phase 1-2: Logs Command

  • Basic download, filtering (workflow/engine/date/count)
  • Edge cases: non-existent workflows, invalid dates, future/past date handling
  • Performance benchmarking (target: <10s)

Phase 3: Audit Command

  • Successful, failed, and safe-output runs
  • Deep analysis: tool detection, MCP server calls, network activity, performance metrics
  • Edge cases: invalid run IDs, stale runs, in-progress runs
  • Performance target: <30s per audit

Phase 4: Compile Command

  • Bulk and targeted compilation (simple, complex, with imports)
  • Lock file validation and incremental detection
  • Error handling: malformed YAML, invalid tools, missing imports
  • Performance target: <5s per workflow
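The per-command targets stated in Phases 1 through 4 (logs under 10s, audit under 30s, compile under 5s per workflow) could be checked with a small timing helper. The sketch below is an illustration only: `check_budget`, the result dictionary, and the injectable `clock` parameter are assumptions made for this example, and the workflow itself drives the commands through the MCP server rather than through Python.

```python
import time
from typing import Callable, Dict

# Targets quoted in the PR description, in seconds.
BUDGETS_SECONDS: Dict[str, float] = {"logs": 10.0, "audit": 30.0, "compile": 5.0}


def check_budget(
    command: str,
    action: Callable[[], object],
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Run `action`, time it, and compare against the command's budget.

    `clock` is injectable so the check can be exercised without waiting.
    """
    start = clock()
    action()
    elapsed = clock() - start
    budget = BUDGETS_SECONDS[command]
    return {
        "command": command,
        "elapsed": elapsed,
        "budget": budget,
        # The targets are written as strict upper bounds ("<10s").
        "within_budget": elapsed < budget,
    }


if __name__ == "__main__":
    result = check_budget("compile", lambda: time.sleep(0.01))
    print(result)
```

A run that exceeds its budget would be a candidate finding for the reporting phase rather than a hard failure.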

Phase 5-7: Integration & Usability

  • Cross-command workflows (compile→run→audit, logs→audit)
  • Resource usage monitoring
  • Error message quality and documentation gaps

Phase 8: Reporting

  • Categorized findings (critical/major/minor/enhancements)
  • Detailed issue templates with reproduction steps
  • Emits a noop output when all tests pass, so a clean run is still visible
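The reporting rules above (four severity categories, at most 5 issues per run, noop when nothing is found) can be sketched as a small planning function. The `Finding` type, the `plan_outputs` name, and the shape of the returned dictionaries are hypothetical; the real safe-output payloads are produced by the agent and are not shown in this PR.

```python
from dataclasses import dataclass
from typing import List

# Categories from the PR description, most severe first.
SEVERITY_ORDER = ["critical", "major", "minor", "enhancement"]
MAX_ISSUES = 5  # create-issue limit from the workflow configuration


@dataclass
class Finding:
    title: str
    severity: str  # one of SEVERITY_ORDER


def plan_outputs(findings: List[Finding]) -> List[dict]:
    """Decide which safe outputs a run would emit (illustrative shape only)."""
    if not findings:
        # Nothing to report: emit a noop so the clean run is visible.
        return [{"type": "noop", "message": "All exploratory tests passed."}]
    # Stable sort keeps discovery order within each severity level.
    ranked = sorted(findings, key=lambda f: SEVERITY_ORDER.index(f.severity))
    return [
        {"type": "create-issue", "title": f.title, "severity": f.severity}
        for f in ranked[:MAX_ISSUES]
    ]


if __name__ == "__main__":
    sample = [Finding("Slow audit", "minor"), Finding("Compile crash", "critical")]
    for output in plan_outputs(sample):
        print(output)
```

Ranking before truncating means that when more than five problems are found, the least severe ones are the ones dropped.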

Implementation Notes

The workflow uses the MCP server exclusively (not the direct CLI), because authentication is not configured for direct gh aw invocation in the workflow context. The testing approach is exploratory but organized into systematic phases that cover happy paths, edge cases, and cross-command integration.

Original prompt

Create a daily agentic workflow that does a deep exploratory testing of the audit, logs and compile tools in the agentic-workflows tool. Use copilot.
Create issue if a problem is detected. Timeout 1h.

Custom agent used: agentic-workflows
GitHub Agentic Workflows (gh-aw) - Create, debug, and upgrade AI-powered workflows with intelligent prompt routing


Copilot AI and others added 2 commits February 6, 2026 16:03
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add exploratory testing for audit logs and compile tools" to "Add daily exploratory testing workflow for CLI tools" Feb 6, 2026
Copilot AI requested a review from pelikhan February 6, 2026 16:07
pelikhan marked this pull request as ready for review February 6, 2026 16:15
Copilot AI review requested due to automatic review settings February 6, 2026 16:15
Removed permissions for the 'github/agentic-workflow-workbench' repository in the devcontainer configuration.
pelikhan merged commit 88dda85 into main Feb 6, 2026
48 checks passed
pelikhan deleted the copilot/exploratory-testing-audit-logs-tools branch February 6, 2026 16:16

Copilot AI left a comment

Pull request overview

This pull request adds a new daily workflow that performs comprehensive exploratory testing of three critical CLI tools (gh aw audit, gh aw logs, and gh aw compile) via the agentic-workflows MCP server. The workflow runs daily with fuzzy scheduling (scattered to 17:44 UTC) and includes systematic testing phases covering happy paths, edge cases, performance benchmarks, and cross-command integration.

Changes:

  • Adds daily-cli-tools-tester.md workflow with 8 phases of exploratory testing covering environment discovery, logs testing, audit testing, compile testing, integration tests, performance benchmarks, usability assessment, and issue reporting
  • Generates corresponding lock file (daily-cli-tools-tester.lock.yml) configuring Copilot engine, MCP servers, safe outputs (create-issue max 5, noop), and 60-minute timeout
  • Reorders repository entries in devcontainer.json alphabetically

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| .github/workflows/daily-cli-tools-tester.md | New workflow markdown defining comprehensive CLI testing strategy with 8 phases, MCP-only tool usage, and systematic edge case coverage |
| .github/workflows/daily-cli-tools-tester.lock.yml | Generated lock file with Copilot engine configuration, agentic-workflows MCP server setup, safe outputs configuration (create-issue with 7d expiry, noop), and proper job orchestration |
| .devcontainer/devcontainer.json | Cosmetic reordering of github/agentic-workflow-workbench and github/gh-aw repository entries to alphabetical order |

Comments suppressed due to low confidence (1)

.devcontainer/devcontainer.json:38

  • This change reorders the repository entries in the devcontainer configuration, swapping the positions of "github/agentic-workflow-workbench" and "github/gh-aw". This appears to be an unrelated cosmetic change that makes them alphabetically ordered. Consider whether this should be included in a PR focused on adding a CLI testing workflow, or if it should be split into a separate commit/PR for configuration cleanup.
        "github/gh-aw": {
          "permissions": {
            "actions": "write",
            "contents": "write",
            "discussions": "read",
            "issues": "read",
            "pull-requests": "write",
            "workflows": "write"
          }
        }
      }
    }
  },
  "features": {
    "ghcr.io/anthropics/devcontainer-features/claude-code:1.0": {},
    "ghcr.io/devcontainers/features/copilot-cli:latest": {},
    "ghcr.io/devcontainers/features/docker-in-docker:2": {},
    "ghcr.io/devcontainers/features/github-cli:1": {},
    "ghcr.io/devcontainers/features/node:1": {
      "version": "24"

  actions: read
tools:
  agentic-workflows:
  bash:

Copilot AI Feb 6, 2026

The workflow markdown includes bash code blocks using commands that are not in the allowed bash tools list. The frontmatter specifies bash: which defaults to a limited set of commands (echo, ls, pwd, cat, head, tail, grep, wc, sort, uniq, date, yq), but the workflow instructions include:

  • find command (lines 161, 164, 389, 392, 525)
  • for loops (lines 164, 392)
  • diff command (line 419)
  • time command (lines 497, 500, 503)
  • df command (line 522)
  • du command (line 528)

These commands will fail when the agent attempts to use them. Either:

  1. Update the frontmatter to allow these specific bash commands: bash: ["find", "for", "diff", "time", "df", "du"] (which will merge with defaults), or
  2. Remove or modify these bash code block examples to only use the default allowed commands, or
  3. Use bash: ["*"] to allow all bash commands if that's the intended behavior for this testing workflow
Suggested change
- bash:
+ bash: ["find", "for", "diff", "time", "df", "du"]
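The mismatch described in this comment can be checked mechanically. The sketch below takes the default allowlist exactly as listed in the comment and reports which commands used by the workflow fall outside it. The function name `missing_commands` and the treatment of `"*"` as "allow everything" follow the comment's description, not gh-aw's actual matching code, which is not shown in this PR.

```python
# Default allowed commands as listed in the review comment.
DEFAULT_BASH_ALLOWLIST = frozenset(
    {"echo", "ls", "pwd", "cat", "head", "tail", "grep",
     "wc", "sort", "uniq", "date", "yq"}
)


def missing_commands(used, extra_allowed=()):
    """Return the commands in `used` that the allowlist would reject.

    `extra_allowed` models the `bash: [...]` frontmatter entry, which the
    comment says is merged with the defaults. A "*" entry allows everything.
    """
    allowed = DEFAULT_BASH_ALLOWLIST | set(extra_allowed)
    if "*" in allowed:
        return []
    return sorted(set(used) - allowed)


if __name__ == "__main__":
    used = ["find", "for", "diff", "time", "df", "du", "grep"]
    print(missing_commands(used))
    print(missing_commands(used, ["find", "for", "diff", "time", "df", "du"]))
```

Note that `for` is a shell keyword rather than an executable, so whether an allowlist entry for it has any effect depends on how gh-aw matches commands.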

@@ -0,0 +1,669 @@
---
description: Daily exploratory testing of audit, logs, and compile tools in gh-aw CLI

Copilot AI Feb 6, 2026

Consider adding engine: copilot to the frontmatter for clarity and consistency with other similar workflows (e.g., daily-cli-performance.md). While Copilot is the default engine and will be used automatically, explicitly specifying it makes the configuration more maintainable and self-documenting, especially since the PR description explicitly mentions "Engine: Copilot".

Suggested change
- description: Daily exploratory testing of audit, logs, and compile tools in gh-aw CLI
+ description: Daily exploratory testing of audit, logs, and compile tools in gh-aw CLI
+ engine: copilot
