[copilot-cli-research] Copilot CLI Deep Research - 2026-02-21 #17575

2026-02-21T21:19:57Z

github-actions[bot]
bot Feb 21, 2026

Analysis Date: 2026-02-21 | Repository: github/gh-aw | Triggered by: @pelikhan
Scope: 157 total workflows, 75 using Copilot engine (48%) | Run: §22264457167

Executive Summary

This report analyzes GitHub Copilot CLI feature utilization across the 75 Copilot-engine workflows in this repository. It identifies 8 significant missed opportunities. The most impactful are the complete non-adoption of safe-inputs (0% usage despite full support), zero model pinning, and use of the AWF network firewall sandbox in only 19% of workflows.

The repository demonstrates mature patterns in areas like safe-outputs (97%), timeout configuration (99%), and bash tooling (73%). However, there are clear gaps in security hardening (strict mode, rate-limiting, AWF), advanced engine configuration, and newer features like safe-inputs and plugins.

Primary Recommendation: Adopt safe-inputs for workflows that currently use bash: ["*"] or shell commands for API calls — this would improve security, testability, and prompt clarity for ~30 workflows.

Critical Findings

🔴 High Priority Issues

Issue	Impact	Affected Workflows
0% safe-inputs adoption	Missing inline MCP tool capability — forces use of raw bash	All 75 workflows
0% model pinning	Workflows silently shift behavior when default model changes	All 75 workflows
81% without AWF firewall	Network-unrestricted execution for 61 workflows	61/75 workflows
43% missing strict mode	Prompt injection risk for user-triggered workflows	32/75 workflows

🟡 Medium Priority Opportunities

Issue	Impact	Affected Workflows
rate-limit at 3%	User-triggered workflows vulnerable to cost/abuse	~15 workflows
engine.args/env at ~0%	Advanced CLI configuration completely unused	All 75
plugins at 0%	Copilot plugin ecosystem ignored	All 75
web-search in Copilot workflows	Copilot doesn't support web-search; warning silently issued	1 workflow

1️⃣ Current State Analysis

View Copilot CLI Capabilities Inventory

Available Engine Configuration Options

engine:
  id: copilot
  version: latest          # Pin specific CLI version
  model: gpt-5             # Pin AI model (COPILOT_MODEL env var)
  command: /usr/local/bin/copilot  # Custom executable
  args: ["--add-dir", "/workspace"]  # Extra CLI args (pre-prompt)
  agent: technical-doc-writer  # Custom .github/agents/*.agent.md
  env:
    MY_VAR: value          # Custom environment variables

Available CLI Flags (auto-configured by compiler)

--disable-builtin-mcps — always added
--allow-all-paths — added when edit: tool enabled
--allow-all-tools — added when bash: ["*"] wildcard used
--allow-tool (tool) — per-tool permission grants
--add-dir (path) — directory access grants
--log-level all --log-dir — always added for logging
--agent (id) — when engine.agent is set

Available Sandbox Options

AWF (sandbox.agent: awf) — network firewalling with allowlist
Standard — direct execution without network isolation

Available Tools

Tool	Notes
`bash`	Shell commands; `["*"]` grants all
`edit`	File write access
`github`	GitHub MCP server with toolsets
`web-fetch`	HTTP fetch via MCP
`playwright`	Browser automation
`serena`	Code intelligence (Go, TS, etc.)
`cache-memory`	Persistent cross-run storage
`repo-memory`	Git branch-backed storage
`agentic-workflows`	Workflow management tools
`safe-inputs`	Inline custom MCP tools (JS/shell/Python)
`plugins`	Copilot plugin installation

Available Network Domains (in `network.allowed`)

defaults, github, node, python, go, playwright, node-cdns, fonts, containers, *

View Feature Adoption Statistics

Feature	Count	Rate	Notes
timeout-minutes	74/75	99%	Excellent
safe-outputs	73/75	97%	Excellent
bash tool	55/75	73%	High adoption
strict mode	43/75	57%	Room to improve
imports (shared)	42/75	56%	Good reuse
edit tool	40/75	53%	Common
network config	35/75	47%	Half missing
cache-memory	28/75	37%	Good for stateful
tracker-id	25/75	33%	Underused
repo-memory	16/75	21%	Growing
sandbox/AWF	14/75	19%	Low security adoption
engine.agent	11/75	15%	Available agents underused
web-fetch	9/75	12%	Low despite network
runtimes	4/75	5%	Niche
playwright	4/75	5%	Niche
rate-limit	2/75	3%	⚠️ Very low
engine.env	1/75	1%	⚠️ Near zero
model pinning	0/75	0%	❌ Unused
safe-inputs	0/75	0%	❌ Unused
engine.args	0/75	0%	❌ Unused
plugins	0/75	0%	❌ Unused
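
The rates in the table follow directly from the counts. As a quick sanity check, the sketch below recomputes them, assuming each percentage is the count divided by 75 and rounded to the nearest whole number (the rounding rule is an assumption; the counts are copied from the table).

```python
# Feature counts copied from the adoption table (out of 75 Copilot workflows).
TOTAL = 75
counts = {
    "timeout-minutes": 74, "safe-outputs": 73, "bash tool": 55,
    "strict mode": 43, "imports (shared)": 42, "edit tool": 40,
    "network config": 35, "cache-memory": 28, "tracker-id": 25,
    "repo-memory": 16, "sandbox/AWF": 14, "engine.agent": 11,
    "web-fetch": 9, "runtimes": 4, "playwright": 4, "rate-limit": 2,
    "engine.env": 1, "model pinning": 0, "safe-inputs": 0,
    "engine.args": 0, "plugins": 0,
}


def adoption_rate(count, total=TOTAL):
    """Return the adoption rate as a whole-number percentage."""
    return round(100 * count / total)


rates = {feature: adoption_rate(n) for feature, n in counts.items()}
# Workflows that have NOT adopted each feature, e.g. those without AWF.
missing = {feature: TOTAL - n for feature, n in counts.items()}

for feature, n in counts.items():
    print(f"{feature:18} {n:>2}/{TOTAL}  {rates[feature]:>3}%")
```

The `missing` values line up with the Critical Findings table: 61 workflows without AWF and 32 without strict mode.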

Timeout distribution (among 74 with timeout):

5 min: 4 workflows
10 min: 15 workflows
15 min: 17 workflows
20 min: 19 workflows
30 min: 13 workflows
45 min: 4 workflows
60 min: 2 workflows
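
The distribution above can be summarized with a few lines of arithmetic. This is a minimal sketch using only the counts listed; the summary statistics (median, mean, share at or below 20 minutes) are derived here and are not part of the original analysis.

```python
from statistics import median

# Timeout in minutes -> number of workflows, from the distribution above.
timeouts = {5: 4, 10: 15, 15: 17, 20: 19, 30: 13, 45: 4, 60: 2}

total = sum(timeouts.values())
# Expand into one value per workflow so the median can be computed directly.
values = [minutes for minutes, n in sorted(timeouts.items()) for _ in range(n)]
mean_minutes = sum(values) / total
median_minutes = median(values)
# Number of workflows whose timeout is 20 minutes or less.
at_most_20 = sum(n for minutes, n in timeouts.items() if minutes <= 20)

print(f"workflows with a timeout: {total}")
print(f"median: {median_minutes} min, mean: {mean_minutes:.1f} min")
print(f"timeout <= 20 min: {at_most_20}/{total}")
```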

2️⃣ Feature Usage Matrix

Feature Category	Available	Used	Not Used	Usage Rate
Engine Config	version, model, command, args, env, agent	agent (15%), env (1%)	model, args (0%)	~3%
Security	strict, rate-limit, AWF	strict (57%), AWF (19%), rate-limit (3%)	—	26% avg
Tools	bash, edit, github, web-fetch, playwright, serena, safe-inputs, plugins	most except safe-inputs, plugins	safe-inputs, plugins	~25% of available
Storage	cache-memory, repo-memory	cache-memory (37%), repo-memory (21%)	—	29%
Network	network.allowed, sandbox.awf	network (47%), AWF (19%)	—	33%
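
The Security, Storage, and Network usage rates in the matrix appear to be simple averages of the per-feature adoption rates. The sketch below reproduces those three rows under that assumption. The Engine Config and Tools rows are left out because the matrix does not state how they were derived.

```python
# Per-feature adoption rates (percent) taken from the matrix rows above.
category_rates = {
    "Security": {"strict": 57, "AWF": 19, "rate-limit": 3},
    "Storage": {"cache-memory": 37, "repo-memory": 21},
    "Network": {"network": 47, "AWF": 19},
}


def average_rate(rates):
    """Unweighted mean of the feature rates, rounded to a whole percent."""
    return round(sum(rates.values()) / len(rates))


averages = {name: average_rate(rates) for name, rates in category_rates.items()}

for name, avg in averages.items():
    print(f"{name}: {avg}%")
```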

3️⃣ Missed Opportunities

🔴 High Priority: safe-inputs (0% adoption)

Opportunity: Replace bash API calls with safe-inputs tools

What: safe-inputs lets you define typed MCP tools inline in the workflow frontmatter using JavaScript, shell, or Python. These tools are mounted as an MCP server at runtime.

Why It Matters:

More secure than raw bash: ["*"] — no arbitrary command execution
Tools are typed (defined inputs/outputs), improving prompt clarity
Secrets can be passed safely without exposing them in bash commands
Works with strict mode (bash wildcard breaks strict behavior)
Can replace many patterns that currently use safeinputs-* shared imports

Where: ~30 workflows use bash: ["*"] (wildcard) granting unrestricted shell access. Many of these could be replaced with targeted safe-inputs tool definitions.

How to Implement:

safe-inputs:
  fetch-pr-data:
    description: "Fetch PR data from GitHub API"
    parameters:
      pr_number:
        type: number
        description: "PR number to fetch"
    run: |
      gh api repos/${{ github.repository }}/pulls/$pr_number
    env:
      GH_TOKEN: "${{ secrets.GITHUB_TOKEN }}"

Expected Benefits: Reduced attack surface, typed tool definitions, cleaner prompts, compatible with strict mode.

🔴 High Priority: Model Pinning (0% adoption)

Opportunity: Pin models for cost/quality predictability

What: No Copilot workflow specifies a model. All rely on the GH_AW_MODEL_AGENT_COPILOT repo variable (or Copilot CLI default).

Why It Matters:

Workflow behavior silently changes when default model changes
Different workflows have different cost/quality tradeoffs
Simple/cheap tasks (changeset, triage, fact) could use gpt-5.1-codex-mini to reduce costs
Complex tasks (archie, brave, repository-quality-improver) could use premium models

Contrast: 8 non-Copilot workflows already pin models, including:

changeset.md, chroma-issue-indexer.md, ci-doctor.md, daily-fact.md, issue-monster.md → gpt-5.1-codex-mini
poem-bot.md → gpt-5

How to Implement:

engine:
  id: copilot
  model: gpt-5.1-codex-mini   # For simple/cheap tasks
  # model: gpt-5              # Alternative: for complex reasoning tasks

Suggested Tiering:

Cheap (gpt-5.1-codex-mini): triage, fact, changeset, daily reports, simple checks
Standard (default): most agent workflows
Premium (gpt-5): complex analysis, code generation, multi-step reasoning

🔴 High Priority: AWF Network Firewall (19% adoption)

Opportunity: Add network firewalling to sensitive workflows

What: Only 14/75 Copilot workflows use the AWF sandbox (sandbox: agent: awf), leaving 61 workflows with unrestricted outbound network access.

Why It Matters:

Without AWF, a compromised workflow can exfiltrate data to any domain
Workflows with bash: ["*"] + no network sandbox are high risk
GitHub's security model assumes that workflows operating on repo data should have limited network access

Candidates for AWF (workflows accessing sensitive data without firewall):

daily-secrets-analysis.md — analyzes secrets; no AWF
bot-detection.md — security-sensitive; no AWF
security-compliance.md — security workflow; no AWF
daily-malicious-code-scan.md — scans for malicious code; no AWF

How to Implement:

network:
  allowed:
    - defaults
    - github         # Add only what the workflow actually needs
sandbox:
  agent: awf
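
To illustrate what an allowlist buys you, here is a minimal sketch of a domain allowlist check. It is not AWF's implementation: the `is_allowed` helper, the example domains, and the choice to accept subdomains of an allowed domain are all assumptions made for illustration.

```python
def is_allowed(host, allowed_domains):
    """Return True if host equals an allowed domain or is a subdomain of one.

    Hypothetical helper: the matching rule is an assumption, not AWF behavior.
    """
    host = host.lower().rstrip(".")
    return any(
        host == domain or host.endswith("." + domain)
        for domain in allowed_domains
    )


# Hypothetical expansion of an allowlist entry into concrete domains.
allowed = ["github.com"]

checks = {
    "api.github.com": is_allowed("api.github.com", allowed),
    "github.com": is_allowed("github.com", allowed),
    "notgithub.com": is_allowed("notgithub.com", allowed),
    "evil.example.com": is_allowed("evil.example.com", allowed),
}

for host, ok in checks.items():
    print(f"{host}: {'allow' if ok else 'block'}")
```

Matching on a leading dot, rather than a bare suffix, is what keeps a lookalike such as `notgithub.com` from slipping through.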

🟡 Medium Priority: rate-limit (3% adoption)

Opportunity: Protect user-triggered workflows from abuse

What: Only 2/75 Copilot workflows have rate-limit configured. Many user-triggered workflows (reaction, slash_command, issue_comment) have no rate limiting.

Why It Matters:

Workflows triggered by reactions/comments can be spammed
Each LLM call costs money and runner time
rate-limit prevents abuse by limiting invocations per time window

Affected workflows (triggered by user interaction without rate-limit):

Most slash_command workflows: plan.md, craft.md, grumpy-reviewer.md, pr-nitpick-reviewer.md
Most reaction-triggered workflows

How to Implement:

rate-limit:
  max: 5        # Maximum invocations
  window: 60    # Per 60 minutes
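
The effect of `max: 5` with `window: 60` can be pictured as a sliding-window counter. The sketch below is only an illustration of that idea. How gh-aw actually counts invocations (sliding or fixed window, per user or per workflow) is not stated here, and the `SlidingWindowLimiter` class is hypothetical.

```python
from collections import deque


class SlidingWindowLimiter:
    """Admit at most max_calls invocations within any window_minutes span."""

    def __init__(self, max_calls, window_minutes):
        self.max_calls = max_calls
        self.window_seconds = window_minutes * 60
        self._timestamps = deque()

    def allow(self, now):
        """Record and admit an invocation at time `now` (seconds), or reject it."""
        # Drop invocations that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_calls:
            return False
        self._timestamps.append(now)
        return True


limiter = SlidingWindowLimiter(max_calls=5, window_minutes=60)
# Six invocations one minute apart: the sixth exceeds the limit.
results = [limiter.allow(now=minute * 60) for minute in range(6)]
# Sixty-one minutes after the first call, older entries have expired.
later = limiter.allow(now=61 * 60)

print(results, later)
```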

🟡 Medium Priority: engine.args & engine.env (0-1% adoption)

Opportunity: Leverage advanced engine configuration

What: engine.args (0% usage) and engine.env (1% usage) allow custom CLI arguments and environment variable injection.

Use cases not being leveraged:

engine.args — Custom CLI flags:

engine:
  id: copilot
  args: ["--verbose"]        # Enable verbose logging for debugging

engine.env — Custom environment variables:

engine:
  id: copilot
  env:
    MY_TOOL_URL: "(myinternalapi.example.com/redacted)"
    FEATURE_FLAG: "enabled"
    DEBUG_MODE: "true"

Why It Matters: Many workflows that need custom configuration currently embed it in the prompt body (less clean, harder to maintain) or rely on GitHub Actions variables.

🟢 Low Priority: Plugins (0% adoption)

Opportunity: Explore Copilot plugin ecosystem

What: The Copilot engine has supportsPlugins: true and includes plugin installation steps, but no workflow in the repository uses plugins:.

Why It Matters:

Copilot plugins can extend the agent's capabilities beyond built-in tools
Plugin-based functionality can be versioned and reused across workflows
Reduces need for custom MCP servers for common integrations

How to Implement:

plugins:
  - name: my-plugin
    version: "1.0.0"

Note: Investigate available plugins in the Copilot ecosystem before adoption.

🟢 Low Priority: web-search in Copilot workflows

Issue: firewall-escape.md uses unsupported web-search tool

firewall-escape.md has web-search: in its tools config, but the Copilot engine has supportsWebSearch: false. This generates a compiler warning but doesn't fail. The workflow should either switch to engine: claude or engine: codex (both support web-search), or use web-fetch: instead.

4️⃣ Specific Workflow Recommendations

View Workflow-Specific Recommendations

`daily-secrets-analysis.md`

Current: No network sandbox despite analyzing sensitive secrets data
Recommended: Add sandbox: agent: awf + network.allowed: [defaults, github]
Benefit: Prevents data exfiltration from a security-sensitive workflow

`plan.md`, `craft.md`, `grumpy-reviewer.md`, `pr-nitpick-reviewer.md`

Current: slash_command triggered, no rate-limit, no strict mode
Recommended: Add rate-limit: {max: 5, window: 60} and strict: true
Benefit: Prevents prompt injection and cost abuse

`dev.md`

Current: No tools section, no network config, relies on default model
Recommended: Add explicit tools.github toolset, strict: true, consider model pin
Benefit: Explicit configuration is more maintainable than relying on defaults

`firewall-escape.md`

Current: Uses web-search tool (unsupported by Copilot)
Recommended: Replace web-search: with web-fetch: or switch engine to claude
Benefit: Eliminates compiler warning, explicit behavior

High-complexity workflows (`archie.md`, `brave.md`, `smoke-copilot.md`)

Current: No model pinning; use whatever is default
Recommended: Consider pinning to premium model explicitly
Benefit: Predictable behavior regardless of system defaults

5️⃣ Best Practice Guidelines

Based on this research, here are recommended best practices for Copilot CLI workflows:

Always set strict: true for workflows that accept user-generated content (reactions, slash commands, issue comments) to prevent prompt injection attacks.
Add rate-limit to all user-triggered workflows to prevent abuse and cost overruns.
Consider sandbox: agent: awf for workflows that handle sensitive data or don't need broad internet access. Always pair with network.allowed to explicitly declare needed domains.
Use safe-inputs instead of bash: ["*"] when you need to call external APIs or run scripts. This reduces attack surface and creates typed, documentable tools.
Pin models explicitly for workflows where cost/quality predictability matters. Use gpt-5.1-codex-mini for simple tasks and standard/premium models for complex reasoning.
Use tracker-id for workflows that create recurring issues/discussions to enable deduplication and update-in-place behavior.
Use imports for shared functionality. 56% of workflows already do this effectively with shared components in .github/workflows/shared/.

6️⃣ Action Items

Immediate Actions (quick wins):

Add strict: true to the 4 slash_command workflows missing it (plan.md, craft.md, grumpy-reviewer.md, pr-nitpick-reviewer.md)
Fix firewall-escape.md to use web-fetch: instead of web-search:
Add rate-limit to high-traffic user-triggered workflows

Short-term (this month):

Pilot safe-inputs in 2-3 workflows that currently use bash: ["*"] for API calls
Add AWF sandbox to security-sensitive workflows (daily-secrets-analysis.md, bot-detection.md, security-compliance.md)
Evaluate model pinning strategy: define tiers and apply to at least 10 representative workflows

Long-term (this quarter):

Migrate high-risk bash: ["*"] workflows to safe-inputs patterns
Establish model tiering guidelines in AGENTS.md
Investigate and document available Copilot plugins for workflow use
Add AWF sandbox to all workflows handling user-provided data

Research Methodology

Data Collection:

Scanned all 157 workflow files in .github/workflows/*.md
Identified 75 using engine: copilot via grep -l "engine: copilot"
Analyzed frontmatter configuration via pattern matching for each feature
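
The data collection steps above can be sketched as a small frontmatter scan. This is not the script the analysis used. The regular expressions, the `scan` helper, and the sample files are assumptions for illustration. Note that a plain `grep "engine: copilot"` would miss workflows that use the nested `engine:` / `id: copilot` form, so the sketch matches both, assuming `id` is the first key under `engine`.

```python
import re
import tempfile
from pathlib import Path

# Frontmatter is the block between the leading pair of `---` lines.
FRONTMATTER = re.compile(r"\A---\n(.*?)\n---", re.DOTALL)
# Matches `engine: copilot` and the nested `engine:` + `id: copilot` form.
COPILOT = re.compile(
    r"^engine:[ \t]*(?:copilot[ \t]*$|\n[ \t]+id:[ \t]*copilot[ \t]*$)",
    re.MULTILINE,
)
# Hypothetical per-feature patterns; extend with one entry per feature.
FEATURES = {
    "strict": re.compile(r"^strict:[ \t]*true[ \t]*$", re.MULTILINE),
    "rate-limit": re.compile(r"^rate-limit:", re.MULTILINE),
    "safe-inputs": re.compile(r"^safe-inputs:", re.MULTILINE),
}


def scan(directory):
    """Count Copilot workflows and feature usage among *.md files."""
    copilot_total = 0
    feature_counts = dict.fromkeys(FEATURES, 0)
    for path in sorted(Path(directory).glob("*.md")):
        match = FRONTMATTER.match(path.read_text(encoding="utf-8"))
        frontmatter = match.group(1) if match else ""
        if not COPILOT.search(frontmatter):
            continue
        copilot_total += 1
        for name, pattern in FEATURES.items():
            if pattern.search(frontmatter):
                feature_counts[name] += 1
    return copilot_total, feature_counts


# Sample workflow files, invented for this example.
samples = {
    "a.md": "---\nengine: copilot\nstrict: true\n---\n# A\n",
    "b.md": "---\nengine:\n  id: copilot\nrate-limit:\n  max: 5\n  window: 60\n---\n# B\n",
    "c.md": "---\nengine: claude\nstrict: true\n---\n# C\n",
}
with tempfile.TemporaryDirectory() as tmp:
    for name, text in samples.items():
        Path(tmp, name).write_text(text, encoding="utf-8")
    total, counts = scan(tmp)

print(total, counts)
```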

Code Analysis:

Reviewed pkg/workflow/copilot_engine.go for engine capabilities
Reviewed pkg/workflow/copilot_engine_execution.go for CLI arg construction
Reviewed pkg/workflow/copilot_engine_tools.go for tool permission logic
Reviewed pkg/workflow/copilot_mcp.go for MCP server configuration
Cross-referenced pkg/constants/constants.go for constants and defaults
Reviewed docs/src/content/docs/reference/engines.md for documented features

Analysis Date: 2026-02-21
Copilot CLI Version: v0.0.414
Research stored in: /tmp/gh-aw/repo-memory/default/copilot-research-2026-02-21.md

AI generated by Copilot CLI Deep Research Agent

expires on Feb 28, 2026, 9:19 PM UTC

2026-02-21T21:42:55Z

github-actions[bot]
bot Feb 21, 2026
🤖 ARM64 smoke test agent was here! 🦾

Greetings from the land of aarch64! Your friendly neighborhood ARM64 Copilot agent swooped in, flexed its silicon muscles, built the entire gh-aw project from scratch, and left without touching a single x86 instruction. If this workflow were a gym, we'd be lifting with 64-bit wide registers. 💪

[ARM64 smoke test run 22264827050 — multi-arch and loving it]

📰 BREAKING: Report filed by Smoke Copilot ARM64
