diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 69d17e581c..00720c7694 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -1851,3 +1851,54 @@ jobs: if: always() run: | rm -rf test-workspace + + safe-outputs-conformance: + runs-on: ubuntu-latest + permissions: + contents: read + steps: + - name: Checkout code + uses: actions/checkout@93cb6efe18208431cddfb8368fd83d5badbf9bfd # v5 + + - name: Run Safe Outputs Conformance Checker + id: conformance + continue-on-error: true + run: | + echo "## Safe Outputs Conformance Check" >> $GITHUB_STEP_SUMMARY + echo "" >> $GITHUB_STEP_SUMMARY + + # Run the conformance checker and capture output + if ./scripts/check-safe-outputs-conformance.sh > conformance-output.txt 2>&1; then + echo "✅ All conformance checks passed" >> $GITHUB_STEP_SUMMARY + EXIT_CODE=0 + else + EXIT_CODE=$? + if [ $EXIT_CODE -eq 2 ]; then + echo "⚠️ Critical conformance issues found (treated as warning)" >> $GITHUB_STEP_SUMMARY + elif [ $EXIT_CODE -eq 1 ]; then + echo "⚠️ High priority conformance issues found (treated as warning)" >> $GITHUB_STEP_SUMMARY + else + echo "⚠️ Conformance check completed with warnings" >> $GITHUB_STEP_SUMMARY + fi + fi + + echo "" >> $GITHUB_STEP_SUMMARY + echo "### Conformance Check Output" >> $GITHUB_STEP_SUMMARY + echo '```' >> $GITHUB_STEP_SUMMARY + cat conformance-output.txt >> $GITHUB_STEP_SUMMARY + echo '```' >> $GITHUB_STEP_SUMMARY + + # Also output to console for visibility + echo "=== Conformance Check Results ===" + cat conformance-output.txt + + # Always succeed (treat as warning only) + exit 0 + + - name: Upload conformance report + if: always() + uses: actions/upload-artifact@v4 + with: + name: safe-outputs-conformance-report + path: conformance-output.txt + retention-days: 7 diff --git a/docs/src/content/docs/reference/safe-outputs-specification.md b/docs/src/content/docs/reference/safe-outputs-specification.md index 1d2e5569ef..96ca47d144 100644 --- a/docs/src/content/docs/reference/safe-outputs-specification.md +++ b/docs/src/content/docs/reference/safe-outputs-specification.md @@ -7,9 +7,9 @@ sidebar: # Safe Outputs MCP Gateway Specification -**Version**: 1.8.0 +**Version**: 1.9.0 **Status**: Working Draft -**Publication Date**: 2025-02-14 +**Publication Date**: 2026-02-14 **Editor**: GitHub Agentic Workflows Team **This Version**: [safe-outputs-specification](/gh-aw/reference/safe-outputs-specification/) **Latest Published Version**: This document @@ -42,6 +42,34 @@ This specification follows World Wide Web Consortium (W3C) formatting convention --- +## Terminology + +This specification uses the following terms with precise definitions: + +**Agent**: The AI-powered process executing in an untrusted context with read-only GitHub permissions. + +**Safe Output Type**: A category of GitHub operation (e.g., `create_issue`, `add_comment`) with a corresponding MCP tool definition and handler implementation. + +**MCP Gateway**: The HTTP server accepting MCP tool invocation requests and recording operations to NDJSON format. Runs in the same context as the agent. + +**Safe Output Processor**: The privileged execution context that reads NDJSON artifacts, validates operations, and executes GitHub API calls. + +**Handler**: JavaScript implementation processing operations of a specific safe output type. + +**Validation**: Pre-execution verification of operation structure, limits, and authorization. Includes schema validation, limit enforcement, and allowlist checking. + +**Sanitization**: Content transformation pipeline removing potentially malicious patterns while preserving legitimate content. + +**Verification**: Post-compilation checking of configuration integrity through hash validation. + +**Staged Mode**: Preview execution mode where operations are simulated without creating permanent GitHub resources. + +**Temporary ID**: A placeholder identifier (format: `aw_`) used to reference not-yet-created resources. Resolved to actual resource numbers during processing. + +**Provenance**: Metadata identifying the workflow and run that created a GitHub resource. Included in footers or API metadata fields. + +--- + ## 1. Introduction ### 1.1 Motivation and Problem Statement @@ -405,6 +433,78 @@ This specification addresses five primary threat scenarios: *Residual Risk*: Misconfigured allowlists may permit unintended targets. Mitigation: principle of least privilege in configuration, periodic review. +### 3.2.6 Cross-Repository Security Model + +**Repository Reference Format** + +Target repositories MUST be specified in `owner/repo` format. Implementations MUST validate: +- Format matches regex: `^[a-zA-Z0-9_.-]+/[a-zA-Z0-9_.-]+$` +- Owner and repo components are non-empty +- No protocol prefix (https://, git://, etc.) + +**Allowlist Resolution Order** + +When evaluating cross-repository operations, implementations MUST apply these rules in order: + +1. **Extract target-repo**: Parse from operation arguments or configuration +2. **Check type-specific allowlist**: If safe output type defines `allowed-repos`: + - MUST match against this list + - Type-specific allowlist OVERRIDES global allowlist + - If match fails, REJECT with E004 +3. **Check global allowlist**: If no type-specific allowlist and `allowed-github-references` is defined: + - MUST match against this list + - If match fails, REJECT with E004 +4. **Default deny**: If no allowlists are defined: + - MUST reject cross-repository operations + - Same-repository operations are permitted + +**Matching Rules** + +- Matching is EXACT (case-sensitive) +- Wildcards (*, ?) are NOT supported +- Pattern matching is NOT supported +- Each repository MUST be explicitly listed + +**Security Properties** + +**Property SP6: Cross-Repository Containment** + +For all cross-repository operations: +``` +∀ op ∈ operations: + op.target_repo ≠ null ⇒ + (op.target_repo ∈ type_allowlist ∨ + (type_allowlist = null ∧ op.target_repo ∈ global_allowlist)) +``` + +**Property SP7: Deny-by-Default** + +Without explicit allowlist configuration: +``` +allowed_repos = null ∧ allowed_github_references = null ⇒ + ∀ op ∈ operations: op.target_repo = workflow.repository +``` + +**Example Configurations** + +```yaml +# Example 1: Type-specific allowlist (overrides global) +safe-outputs: + allowed-github-references: [owner/repo-a, owner/repo-b] + + create-issue: + allowed-repos: [owner/repo-c] # Only repo-c permitted for issues + + add-comment: + # No type-specific list, uses global: repo-a, repo-b + +# Example 2: Explicit same-repository only +safe-outputs: + create-issue: + # No allowlist = same repository only + max: 5 +``` + ### 3.3 Security Property Guarantees Conforming implementations MUST maintain these security invariants: @@ -433,6 +533,67 @@ Conforming implementations MUST maintain these security invariants: *Verification*: Code review of safe output processor. Unit tests confirming validation rejects before API calls. +#### Validation Pipeline Requirements + +Implementations MUST execute validation steps in this exact sequence for all safe output operations: + +**Stage 1: Schema Validation (REQUIRED)** +- Input: Raw MCP tool arguments +- Check: JSON schema validation against type-specific schema +- On failure: Reject immediately with E001 (INVALID_SCHEMA) error +- Output: Schema-validated operation data + +**Stage 2: Limit Enforcement (REQUIRED)** +- Input: Count of operations of each type in current batch +- Check: Compare count against configured `max` for each type +- On failure: Reject entire batch with E002 (LIMIT_EXCEEDED) error +- Output: Limit-validated operation set + +**Stage 3: Content Sanitization (REQUIRED)** +- Input: All text fields (title, body, description, etc.) +- Transform: Apply sanitization pipeline (see Section 9.2) +- On failure: Reject with E008 (SANITIZATION_FAILED) if unsafe content cannot be sanitized +- Output: Sanitized operation data + +**Stage 4: Domain Filtering (CONDITIONAL)** +- Input: All URLs in markdown links and images +- Check: Validate against `allowed-domains` if configured +- Transform: Redact unauthorized URLs +- Output: Domain-filtered operation data + +**Stage 5: Cross-Repository Validation (CONDITIONAL)** +- Input: `target-repo` parameter if present +- Check: Validate against `allowed-repos` or `allowed-github-references` +- On failure: Reject with E004 (INVALID_TARGET_REPO) +- Output: Authorized target repository + +**Stage 6: Dependency Resolution (CONDITIONAL)** +- Input: Temporary IDs, parent references +- Check: Resolve references to actual GitHub resource numbers +- On failure: Reject with E005 (MISSING_PARENT) +- Output: Fully-resolved operation data + +**Stage 7: GitHub API Invocation (EXECUTION)** +- Input: Validated, sanitized, authorized operation data +- Action: Execute GitHub API calls +- On failure: Return E007 (API_ERROR) with details + +**Requirement VL1: Sequential Execution** + +Stages MUST execute in the order specified above. A failure at any stage (1-6) MUST prevent Stage 7 from executing for that operation. + +**Requirement VL2: Atomic Validation** + +For single-operation types (max=1), validation failure MUST prevent any API calls. For batch operations, validation failure of one operation MUST NOT cause rejection of the entire batch unless it's a limit enforcement failure. + +**Requirement VL3: Error Propagation** + +Validation errors MUST include: +- Error code (E001-E008) +- Human-readable message +- Operation index (for batch operations) +- Field name (for schema validation errors) + **Property SP3: Limit Enforceability Invariant** *Statement*: For all configured max limits, implementations MUST prevent exceeding the limit. Attempts to exceed limits SHALL result in operation rejection. @@ -2644,6 +2805,146 @@ Operations requiring repository features must validate availability: Validation occurs during execution, not tool invocation. +### 9.4 Content Sanitization Pipeline + +**Applicability** + +Content sanitization MUST be applied to all user-provided text fields in safe output operations. Text fields include: +- `title` (issues, PRs, discussions, projects) +- `body` (issues, PRs, discussions, comments) +- `description` (projects, status updates) +- `comment` (review comments) + +**Sanitization Stages** + +Implementations MUST apply these transformations in order: + +**S1: Null Byte Removal** +- Remove all null bytes (`\0`, `\x00`) from strings +- Rationale: Prevents string truncation attacks + +**S2: Markdown Link Validation** +- Pattern: `[text](url)` and `` +- For each URL: + - Extract domain + - If `allowed-domains` is configured: + - Check domain against allowlist + - If not allowed: Replace with `[text]([URL redacted: unauthorized domain])` + - Log redacted URLs to `/tmp/gh-aw/safeoutputs/redacted-domains.log` + +**S3: Markdown Image Validation** +- Pattern: `![alt](url)` +- For each image URL: + - Extract domain + - If `allowed-domains` is configured: + - Check domain against allowlist + - If not allowed: Replace with `![alt]([Image URL redacted: unauthorized domain])` + +**S4: HTML Tag Filtering** (Optional, depends on field type) +- Remove potentially dangerous tags: + - `` + - `` + - ``, `` + - ``, `` +- Remove event handlers: + - `on*` attributes in HTML tags (onclick, onerror, etc.) +- Preserve safe GitHub Flavored Markdown tags: + - `
`, ``, ``, ``, `` + +**S5: Command Injection Prevention** +- Do NOT execute or interpret code blocks +- Do NOT evaluate template expressions +- Preserve code blocks verbatim (no escaping needed in markdown) + +**Excluded Content** + +The following content MUST NOT be sanitized: +- Code blocks (` ``` `) +- Inline code (`` `code` ``) +- System-generated footers +- System-generated metadata + +**Sanitization Reversibility** + +Sanitization transformations are LOSSY and NOT reversible. Original content is not preserved after sanitization. This is intentional to prevent attempts to bypass sanitization. + +**Conformance Requirement CR1: Pre-API Sanitization** + +All content MUST be sanitized BEFORE GitHub API invocation. Unsanitized content MUST NEVER be passed to GitHub APIs. + +*Verification*: Inspect handler code to confirm sanitization occurs before `octokit.*` calls. + +### 9.5 Error Code Catalog + +Implementations MUST use standardized error codes for validation and execution failures. + +**Error Code Table** + +| Code | Name | Description | When to Use | HTTP Status Equivalent | +|------|------|-------------|-------------|------------------------| +| E001 | INVALID_SCHEMA | Operation failed JSON schema validation | Input does not match type-specific schema | 400 Bad Request | +| E002 | LIMIT_EXCEEDED | Operation count exceeds configured max | Batch contains more operations than allowed | 429 Too Many Requests | +| E003 | UNAUTHORIZED_DOMAIN | URL contains non-allowlisted domain | Domain filtering rejected URL | 403 Forbidden | +| E004 | INVALID_TARGET_REPO | target-repo not in allowed-repos | Cross-repository validation failed | 403 Forbidden | +| E005 | MISSING_PARENT | Referenced parent issue/PR not found | Temporary ID or parent reference cannot be resolved | 404 Not Found | +| E006 | INVALID_LABEL | Label does not exist in repository | Label validation failed | 404 Not Found | +| E007 | API_ERROR | GitHub API returned error | GitHub API call failed | 502 Bad Gateway | +| E008 | SANITIZATION_FAILED | Content contains unsanitizable unsafe patterns | Sanitization pipeline detected unremovable threats | 422 Unprocessable Entity | +| E009 | CONFIG_HASH_MISMATCH | Configuration hash verification failed | Workflow YAML was modified after compilation | 403 Forbidden | +| E010 | RATE_LIMIT_EXCEEDED | GitHub API rate limit exceeded | Too many API calls | 429 Too Many Requests | + +**Error Message Format** + +All errors MUST conform to this JSON structure: + +```json +{ + "error": { + "code": "E002", + "name": "LIMIT_EXCEEDED", + "message": "Operation count exceeds configured limit", + "details": { + "type": "create_issue", + "attempted": 5, + "max": 3, + "operation_index": 3 + }, + "timestamp": "2026-02-14T16:39:20.948Z", + "workflow_run": "https://github.com/owner/repo/actions/runs/12345" + } +} +``` + +**Required Fields**: +- `code`: Error code from table above (E001-E010) +- `name`: Error name from table above +- `message`: Human-readable description +- `timestamp`: ISO 8601 timestamp + +**Optional Fields**: +- `details`: Type-specific error context (operation_index, field names, etc.) +- `workflow_run`: URL to workflow run for provenance + +**Error Handling Requirements** + +**Requirement EH1: Early Failure Detection** + +Validation errors (E001-E006) MUST be detected before any GitHub API calls are made. + +**Requirement EH2: Clear Error Messages** + +Error messages MUST: +- Clearly state what went wrong +- Include enough context to debug (field names, values) +- Suggest remediation when possible + +**Requirement EH3: Error Logging** + +All errors MUST be logged to: +- GitHub Actions step output (visible in workflow run) +- Job summary (visible in workflow run summary) +- STDERR (for local development) + --- ## 10. Execution Guarantees @@ -2679,6 +2980,108 @@ Operations execute in: **Error Reporting**: All errors collected; execution summary reports per-type results. +### 10.5 Edge Case Behavior + +This section defines required behavior for unusual or boundary conditions. + +**Empty Operations** + +*Scenario*: NDJSON artifact contains zero operations + +*Behavior*: +- Safe output job MUST succeed (exit code 0) +- Job summary SHOULD display: "✅ No operations to process" +- No GitHub API calls are made +- No errors are raised + +*Rationale*: Empty operations are valid (agent may determine no action is needed). + +**Zero Max Limit** + +*Scenario*: Configuration specifies `max: 0` for a safe output type + +*Behavior*: +- Type is DISABLED (MCP tool is not registered) +- Attempts to invoke disabled type MUST return MCP error: + ```json + {"error": {"code": -32601, "message": "Method not found"}} + ``` +- No configuration is generated for disabled types + +*Rationale*: `max: 0` is an explicit disable signal. + +**API Rate Limiting** + +*Scenario*: GitHub API returns 429 (rate limit exceeded) or 403 with X-RateLimit-Remaining: 0 + +*Behavior*: +- Processor MUST retry with exponential backoff: + - 1st retry: After 60 seconds + - 2nd retry: After 120 seconds + - 3rd retry: After 240 seconds +- After 3 retries, MUST fail with E010 error +- Error details MUST include rate limit reset time from `X-RateLimit-Reset` header + +*Rationale*: Transient rate limits should not fail workflows unnecessarily. + +**Workflow Cancellation** + +*Scenario*: Workflow is manually cancelled during agent execution + +*Behavior*: +- Safe output job MUST NOT execute if artifact upload was interrupted +- Partial NDJSON artifacts MUST NOT be processed +- GitHub Actions automatically handles cleanup +- No additional logic required in handlers + +*Rationale*: GitHub Actions cancellation is handled at platform level. + +**Concurrent Workflow Runs** + +*Scenario*: Multiple workflow runs execute concurrently for the same workflow + +*Behavior*: +- Each run operates independently +- Max limits are per-run (NOT global across runs) +- No coordination or locking between runs +- Operations in separate runs do NOT affect each other's limits + +*Rationale*: Simplicity and avoiding distributed coordination complexity. + +**Malformed NDJSON** + +*Scenario*: NDJSON artifact contains invalid JSON on one or more lines + +*Behavior*: +- Parser MUST skip invalid lines with warning +- Valid lines MUST be processed +- Job summary MUST show: "⚠️ Skipped N malformed entries" +- Invalid lines MUST be logged to STDERR + +*Rationale*: Partial failure should not prevent valid operations from executing. + +**Missing Artifact** + +*Scenario*: Safe output job cannot download artifact (artifact not found) + +*Behavior*: +- Job MUST fail with clear error message +- Error MUST suggest checking agent job completion +- Exit code MUST be non-zero + +*Rationale*: Missing artifact indicates upstream failure that must be addressed. + +**Duplicate Temporary IDs** + +*Scenario*: Multiple operations use the same `temporary_id` + +*Behavior*: +- First operation using the ID succeeds and establishes mapping +- Subsequent operations using the same ID MUST reference the first operation's result +- If this creates ambiguity (e.g., two issues both want to be "aw_parent"), MUST reject with E005 + +*Rationale*: Deterministic behavior prevents confusion. + --- ## Appendix A: Conformance Checklist @@ -2780,8 +3183,226 @@ Detailed threat analysis and mitigation effectiveness assessment for all five pr --- +## Appendix G: Configuration Patterns + +This appendix provides common configuration patterns for safe outputs. + +### Pattern 1: Simple Issue Tracking + +Basic configuration for automated issue creation: + +```yaml +safe-outputs: + create-issue: + max: 1 + labels: [automated] +``` + +**Use case**: Single automated issue per workflow run with consistent labeling. + +### Pattern 2: Multi-Type with Global Footer + +Configuration with multiple output types sharing global settings: + +```yaml +safe-outputs: + footer: true # Applied to all types + + create-issue: + max: 3 + labels: [bug, automated] + + add-comment: + max: 2 + hide-older-comments: true +``` + +**Use case**: Workflow creating multiple issues and comments with attribution footers. + +### Pattern 3: Cross-Repository Operations + +Secure cross-repository issue creation: + +```yaml +safe-outputs: + allowed-github-references: + - owner/repo-a + - owner/repo-b + + create-issue: + max: 5 + target-repo: owner/repo-a +``` + +**Use case**: Creating issues in a central tracking repository from multiple workflow repositories. + +**Security note**: Explicit allowlist prevents unauthorized repository targeting. + +### Pattern 4: Staged Mode Development + +Safe testing in preview mode: + +```yaml +safe-outputs: + staged: true # Enable preview mode globally + + create-issue: + max: 10 # Safe to set high in staged mode + + add-comment: + max: 5 +``` + +**Use case**: Testing workflow behavior without creating real GitHub resources. + +**Workflow**: Test with `staged: true`, verify previews, then deploy with `staged: false`. + +### Pattern 5: Type-Specific Allowlists + +Fine-grained cross-repository control: + +```yaml +safe-outputs: + allowed-github-references: [owner/repo-a, owner/repo-b] + + create-issue: + allowed-repos: [owner/repo-c] # Overrides global + max: 3 + + add-comment: + # No type-specific list, uses global: repo-a, repo-b + max: 2 +``` + +**Use case**: Different safe output types target different repositories. + +**Security note**: Type-specific allowlists override global allowlists. + +### Pattern 6: Domain Filtering for Security + +Restrict URLs in safe output content: + +```yaml +safe-outputs: + allowed-domains: + - github.com + - "*.github.io" + - docs.github.com + + create-issue: + max: 5 +``` + +**Use case**: Prevent agents from including unauthorized URLs in created content. + +**Effect**: URLs to non-allowlisted domains are redacted during sanitization. + +### Pattern 7: Temporary Resource Cleanup + +Auto-close temporary issues: + +```yaml +safe-outputs: + create-issue: + max: 10 + expires: 7 # Auto-close after 7 days + labels: [temporary, automated] +``` + +**Use case**: Issues for transient notifications that should auto-clean. + +**Implementation**: Scheduled workflow checks issue age and closes expired issues. + +### Pattern 8: Review Comment Workflow + +Pull request review automation: + +```yaml +safe-outputs: + create-pr-review-comment: + max: 20 + + submit-pr-review: + max: 1 + + resolve-pr-review-thread: + max: 10 +``` + +**Use case**: Automated code review with multiple comments and thread resolution. + +**Workflow**: Create review comments, submit bundled review, resolve addressed threads. + +### Pattern 9: Project Management + +Automated project creation and updates: + +```yaml +safe-outputs: + create-project: + max: 1 + + update-project: + max: 5 + + create-project-status-update: + max: 3 +``` + +**Use case**: Creating and maintaining project boards automatically. + +### Pattern 10: Grouped Issues with Parent + +Create related issues under a parent: + +```yaml +safe-outputs: + create-issue: + max: 10 + group: true +``` + +**Use case**: Workflow creates parent issue and multiple sub-issues linked via tasklists. + +**Effect**: First issue becomes parent, subsequent issues link to it. + +### Best Practices + +**Start Conservative**: +- Begin with low `max` values +- Enable `staged: true` for testing +- Use explicit `allowed-repos` lists + +**Use Domain Filtering**: +- Always configure `allowed-domains` when agents process external input +- Include only trusted domains + +**Enable Footers**: +- Keep `footer: true` (default) for transparency +- Only disable when absolutely necessary + +**Temporary Resources**: +- Use `expires` for transient issues +- Clean up with `close-older-issues` for superseded content + +**Cross-Repository Security**: +- Use type-specific `allowed-repos` for fine-grained control +- Prefer explicit lists over broad permissions + +--- + ## Appendix F: Document History +**Version 1.9.0** (2026-02-14): +- Added comprehensive validation pipeline ordering (7 stages) +- Added cross-repository security model with explicit allowlist rules +- Added content sanitization pipeline specification (5 stages) +- Added standardized error code catalog (E001-E010) +- Added edge case behavior specifications +- Added terminology section for consistency +- Enhanced security properties (SP6, SP7) +- Improved requirements testability + **Version 1.8.0** (2025-02-14): - Initial W3C-style specification release - Complete security model documentation diff --git a/scripts/README-conformance.md b/scripts/README-conformance.md new file mode 100644 index 0000000000..d134582cf1 --- /dev/null +++ b/scripts/README-conformance.md @@ -0,0 +1,135 @@ +# Safe Outputs Specification Conformance Checker + +This directory contains automated tools for verifying conformance with the Safe Outputs MCP Gateway Specification. + +## Overview + +The conformance checker validates implementations against normative requirements in the [Safe Outputs Specification](/docs/src/content/docs/reference/safe-outputs-specification.md). It runs automated checks across four categories: + +- **SEC**: Security requirements (privilege separation, validation ordering, sanitization) +- **USE**: Usability requirements (error codes, footers, staged mode formatting) +- **REQ**: Requirements compliance (RFC 2119 keywords, testability, completeness) +- **IMP**: Implementation requirements (handler registration, permission computation, schema validation) + +## Usage + +### Run All Checks + +```bash +./scripts/check-safe-outputs-conformance.sh +``` + +### Exit Codes + +- `0`: All checks passed or only low/medium severity warnings +- `1`: High severity failures detected +- `2`: Critical severity failures detected + +### Output Format + +The checker uses color-coded output: + +- 🔴 **[CRITICAL]**: Must be fixed immediately (security violations) +- 🔴 **[HIGH]**: Should be fixed soon (significant issues) +- 🟡 **[MEDIUM]**: Should be addressed (quality issues) +- 🔵 **[LOW]**: Nice to have (minor improvements) +- 🟢 **[PASS]**: Check passed + +## Checks Implemented + +### Security Checks + +| ID | Name | Severity | Description | +|----|------|----------|-------------| +| SEC-001 | Privilege Separation Enforcement | CRITICAL | Verifies agent jobs lack write permissions | +| SEC-002 | Validation Before API Calls | CRITICAL | Ensures validation occurs before GitHub API calls | +| SEC-003 | Max Limit Enforcement | MEDIUM | Verifies handlers enforce max operation limits | +| SEC-004 | Content Sanitization Required | MEDIUM | Checks that content sanitization is applied | +| SEC-005 | Cross-Repository Validation | HIGH | Validates cross-repo operations check allowlists | + +### Usability Checks + +| ID | Name | Severity | Description | +|----|------|----------|-------------| +| USE-001 | Error Code Standardization | LOW | Verifies handlers use standardized error codes | +| USE-002 | Footer Attribution Required | LOW | Checks footers are added when configured | +| USE-003 | Staged Mode Preview Format | LOW | Validates staged mode uses 🎭 emoji format | + +### Requirements Checks + +| ID | Name | Severity | Description | +|----|------|----------|-------------| +| REQ-001 | RFC 2119 Keyword Usage | MEDIUM | Verifies normative sections use MUST/SHOULD/MAY | +| REQ-002 | Safe Output Type Completeness | MEDIUM | Checks each type has all required documentation | +| REQ-003 | Verification Method Specification | LOW | Ensures requirements include verification methods | + +### Implementation Checks + +| ID | Name | Severity | Description | +|----|------|----------|-------------| +| IMP-001 | Handler Registration Completeness | HIGH | Verifies all standard handlers exist | +| IMP-002 | Permission Computation Accuracy | HIGH | Checks permission computation function exists | +| IMP-003 | Schema Validation Consistency | MEDIUM | Validates schema generation is implemented | + +## Adding New Checks + +To add a new conformance check: + +1. Add a check function following the naming pattern `check__` +2. Use the logging functions: `log_critical`, `log_high`, `log_medium`, `log_low`, `log_pass` +3. Increment the appropriate failure counter when a check fails +4. Add a call to your check function in the main script flow +5. Document the check in this README + +Example: + +```bash +check_new_requirement() { + local failed=0 + + # Your check logic here + if [ condition_not_met ]; then + log_high "CHECK-ID: Description of failure" + failed=1 + fi + + if [ $failed -eq 0 ]; then + log_pass "CHECK-ID: Check passed successfully" + fi +} +``` + +## Related Documentation + +- [Safe Outputs Specification](/docs/src/content/docs/reference/safe-outputs-specification.md) - Complete normative specification +- [Specification Review Findings](/docs/spec-review-findings.md) - Detailed security, usability, and requirements review +- [Specification Improvements Plan](/docs/spec-improvements-plan.md) - Roadmap for addressing findings + +## CI Integration + +The conformance checker can be integrated into CI/CD pipelines: + +```yaml +# .github/workflows/conformance.yml +name: Conformance Check +on: [push, pull_request] + +jobs: + conformance: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - name: Run conformance checks + run: ./scripts/check-safe-outputs-conformance.sh +``` + +## Maintenance + +The conformance checker should be updated when: + +- New safe output types are added +- Specification requirements change +- New security properties are added +- Implementation patterns evolve + +Regular maintenance ensures the checker stays aligned with the specification and implementation. diff --git a/scripts/check-safe-outputs-conformance.sh b/scripts/check-safe-outputs-conformance.sh new file mode 100755 index 0000000000..78ee265fa0 --- /dev/null +++ b/scripts/check-safe-outputs-conformance.sh @@ -0,0 +1,402 @@ +#!/bin/bash +# Safe Outputs Specification Conformance Checker +# This script implements automated checks for the Safe Outputs specification +# Based on findings from docs/spec-review-findings.md + +set -euo pipefail + +# Color codes for output +RED='\033[0;31m' +YELLOW='\033[1;33m' +GREEN='\033[0;32m' +BLUE='\033[0;34m' +NC='\033[0m' # No Color + +# Counters +CRITICAL_FAILURES=0 +HIGH_FAILURES=0 +MEDIUM_FAILURES=0 +LOW_FAILURES=0 + +# Logging functions +log_critical() { + echo -e "${RED}[CRITICAL]${NC} $1" + ((CRITICAL_FAILURES++)) +} + +log_high() { + echo -e "${RED}[HIGH]${NC} $1" + ((HIGH_FAILURES++)) +} + +log_medium() { + echo -e "${YELLOW}[MEDIUM]${NC} $1" + ((MEDIUM_FAILURES++)) +} + +log_low() { + echo -e "${BLUE}[LOW]${NC} $1" + ((LOW_FAILURES++)) +} + +log_pass() { + echo -e "${GREEN}[PASS]${NC} $1" +} + +log_info() { + echo -e "${BLUE}[INFO]${NC} $1" +} + +# Change to repo root +cd "$(dirname "$0")/.." + +echo "==================================================" +echo "Safe Outputs Specification Conformance Checker" +echo "==================================================" +echo "" + +# SEC-001: Privilege Separation Enforcement +echo "Running SEC-001: Privilege Separation Enforcement..." +check_privilege_separation() { + local failed=0 + + # Find all compiled workflow files + find .github/workflows -name "*.lock.yml" | while read -r workflow; do + # Check if agent job has write permissions + if grep -A 50 "^jobs:" "$workflow" | grep -A 20 "^\s*agent:" | grep -qE "issues:\s*write|pull-requests:\s*write|contents:\s*write"; then + log_critical "SEC-001: Agent job in $workflow has write permissions" + failed=1 + fi + done + + if [ $failed -eq 0 ]; then + log_pass "SEC-001: All agent jobs properly lack write permissions" + fi +} +check_privilege_separation + +# SEC-002: Validation Before API Calls +echo "Running SEC-002: Validation Before API Calls..." +check_validation_ordering() { + local failed=0 + + for handler in actions/setup/js/*.cjs; do + # Skip test files + [[ "$handler" =~ test ]] && continue + [[ "$handler" =~ parse ]] && continue + [[ "$handler" =~ buffer ]] && continue + + # Check if handler has API calls + if grep -q "octokit\." "$handler"; then + # Check if validation appears before API calls + if ! awk '/octokit\./{api_line=NR} /validate|sanitize|enforceLimit/{if(NR/dev/null; then + log_critical "SEC-002: $handler may have API calls before validation" + failed=1 + fi + fi + done + + if [ $failed -eq 0 ]; then + log_pass "SEC-002: All handlers validate before API calls" + fi +} +check_validation_ordering + +# SEC-003: Max Limit Enforcement +echo "Running SEC-003: Max Limit Enforcement..." +check_max_limits() { + local failed=0 + + for handler in actions/setup/js/*.cjs; do + # Skip test and utility files + [[ "$handler" =~ (test|parse|buffer|factory) ]] && continue + + # Check if handler enforces max limits + if ! grep -q "\.length.*>.*\.max\|enforceMaxLimit\|checkLimit\|max.*exceeded" "$handler"; then + log_medium "SEC-003: $handler may not enforce max limits" + failed=1 + fi + done + + if [ $failed -eq 0 ]; then + log_pass "SEC-003: All handlers enforce max limits" + fi +} +check_max_limits + +# SEC-004: Content Sanitization Required +echo "Running SEC-004: Content Sanitization Required..." +check_sanitization() { + local failed=0 + + for handler in actions/setup/js/*.cjs; do + # Skip test and utility files + [[ "$handler" =~ (test|parse|buffer) ]] && continue + + # Check if handler has body/content fields + if grep -q "\"body\"\|body:" "$handler"; then + # Check for sanitization + if ! grep -q "sanitize\|stripHTML\|escapeMarkdown\|cleanContent" "$handler"; then + log_medium "SEC-004: $handler has body field but no sanitization" + failed=1 + fi + fi + done + + if [ $failed -eq 0 ]; then + log_pass "SEC-004: All handlers properly sanitize content" + fi +} +check_sanitization + +# SEC-005: Cross-Repository Validation +echo "Running SEC-005: Cross-Repository Validation..." +check_cross_repo() { + local failed=0 + + for handler in actions/setup/js/*.cjs; do + # Skip test files + [[ "$handler" =~ test ]] && continue + + # Check if handler supports target-repo + if grep -q "target.*[Rr]epo\|targetRepo" "$handler"; then + # Check for allowlist validation + if ! grep -q "allowed.*[Rr]epos\|validateTargetRepo\|checkAllowedRepo" "$handler"; then + log_high "SEC-005: $handler supports target-repo but lacks allowlist check" + failed=1 + fi + fi + done + + if [ $failed -eq 0 ]; then + log_pass "SEC-005: All cross-repo handlers validate allowlists" + fi +} +check_cross_repo + +# USE-001: Error Code Standardization +echo "Running USE-001: Error Code Standardization..." +check_error_codes() { + local failed=0 + + for handler in actions/setup/js/*.cjs; do + # Skip test files + [[ "$handler" =~ test ]] && continue + + # Check if handler throws errors + if grep -q "throw.*Error\|core\.setFailed" "$handler"; then + # Check for standardized error codes + if ! grep -qE "E[0-9]{3}|ERROR_|ERR_" "$handler"; then + log_low "USE-001: $handler may not use standardized error codes" + failed=1 + fi + fi + done + + if [ $failed -eq 0 ]; then + log_pass "USE-001: All handlers use standardized error codes" + fi +} +check_error_codes + +# USE-002: Footer Attribution Required +echo "Running USE-002: Footer Attribution Required..." +check_footers() { + local failed=0 + + # Check handlers that create issues/PRs/discussions + for handler in actions/setup/js/{create_issue,create_pull_request,create_discussion,add_comment}.cjs; do + [ ! -f "$handler" ] && continue + + # Check if handler adds footers + if ! grep -q "footer\|addFooter\|attribution\|AI generated" "$handler"; then + log_low "USE-002: $handler may not add footer attribution" + failed=1 + fi + done + + if [ $failed -eq 0 ]; then + log_pass "USE-002: All handlers add footer attribution when configured" + fi +} +check_footers + +# USE-003: Staged Mode Preview Format +echo "Running USE-003: Staged Mode Preview Format..." +check_staged_mode() { + local failed=0 + + for handler in actions/setup/js/*.cjs; do + # Skip test files + [[ "$handler" =~ test ]] && continue + + # Check if handler has staged mode + if grep -q "staged.*true\|isStaged\|GH_AW_SAFE_OUTPUTS_STAGED" "$handler"; then + # Check for emoji in preview + if ! grep -q "🎭\|Staged Mode.*Preview" "$handler"; then + log_low "USE-003: $handler has staged mode but missing 🎭 emoji" + failed=1 + fi + fi + done + + if [ $failed -eq 0 ]; then + log_pass "USE-003: All handlers use correct staged mode format" + fi +} +check_staged_mode + +# REQ-001: RFC 2119 Keyword Usage +echo "Running REQ-001: RFC 2119 Keyword Usage..." +check_rfc2119() { + local spec_file="docs/src/content/docs/reference/safe-outputs-specification.md" + local failed=0 + + # Check key sections have RFC 2119 keywords + for section in "Security Architecture" "Configuration Semantics" "Execution Guarantees"; do + if ! grep -A 200 "## .*$section" "$spec_file" 2>/dev/null | grep -q "MUST\|SHALL\|SHOULD\|MAY"; then + log_medium "REQ-001: Section '$section' may lack RFC 2119 keywords" + failed=1 + fi + done + + if [ $failed -eq 0 ]; then + log_pass "REQ-001: Normative sections use RFC 2119 keywords" + fi +} +check_rfc2119 + +# REQ-002: Safe Output Type Completeness +echo "Running REQ-002: Safe Output Type Completeness..." +check_type_completeness() { + local spec_file="docs/src/content/docs/reference/safe-outputs-specification.md" + local failed=0 + + # Extract type names + grep "^#### Type:" "$spec_file" 2>/dev/null | sed 's/^#### Type: //' | head -10 | while read -r type_name; do + sections_found=0 + + # Check for required sections + for section in "MCP Tool Schema" "Operational Semantics" "Configuration Parameters" "Security Requirements" "Required Permissions"; do + if grep -A 200 "^#### Type: $type_name" "$spec_file" 2>/dev/null | grep -q "**$section**"; then + ((sections_found++)) + fi + done + + if [ $sections_found -lt 5 ]; then + log_medium "REQ-002: Type '$type_name' has only $sections_found/5 required sections" + failed=1 + fi + done + + if [ $failed -eq 0 ]; then + log_pass "REQ-002: All safe output types have complete documentation" + fi +} +check_type_completeness + +# REQ-003: Verification Method Specification +echo "Running REQ-003: Verification Method Specification..." +check_verification_methods() { + local spec_file="docs/src/content/docs/reference/safe-outputs-specification.md" + local failed=0 + + # Check key requirements have verification methods + for req in "AR1" "AR2" "AR3" "SP1" "SP2" "SP3"; do + if ! grep -A 30 "**Requirement $req:\|**Property $req:" "$spec_file" 2>/dev/null | grep -q "Verification:\|Formal Definition:"; then + log_low "REQ-003: Requirement $req may lack verification method" + failed=1 + fi + done + + if [ $failed -eq 0 ]; then + log_pass "REQ-003: All requirements have verification methods" + fi +} +check_verification_methods + +# IMP-001: Handler Registration Completeness +echo "Running IMP-001: Handler Registration Completeness..." +check_handler_registration() { + local failed=0 + + # Check if standard handlers exist + for type in create_issue add_comment close_issue update_issue add_labels remove_labels; do + handler_file="actions/setup/js/${type}.cjs" + if [ ! -f "$handler_file" ]; then + log_high "IMP-001: Missing handler file $handler_file" + failed=1 + fi + done + + if [ $failed -eq 0 ]; then + log_pass "IMP-001: All standard handlers are registered" + fi +} +check_handler_registration + +# IMP-002: Permission Computation Accuracy +echo "Running IMP-002: Permission Computation Accuracy..." +check_permission_computation() { + # Check if permission computation file exists and is well-formed + if [ -f "pkg/workflow/safe_outputs_permissions.go" ]; then + # Basic check that it defines computePermissionsForSafeOutputs + if grep -q "computePermissionsForSafeOutputs" "pkg/workflow/safe_outputs_permissions.go"; then + log_pass "IMP-002: Permission computation function exists" + else + log_high "IMP-002: Permission computation function not found" + fi + else + log_high "IMP-002: Permission computation file missing" + fi +} +check_permission_computation + +# IMP-003: Schema Validation Consistency +echo "Running IMP-003: Schema Validation Consistency..." +check_schema_consistency() { + local failed=0 + + # Check if safe outputs config Go file exists + if [ -f "pkg/workflow/safe_outputs_config.go" ]; then + # Check for schema generation functions + if ! grep -q "generateSchema\|buildSchema\|toolSchema" "pkg/workflow/safe_outputs_config.go"; then + log_medium "IMP-003: Schema generation functions may be missing" + failed=1 + fi + else + log_medium "IMP-003: Safe outputs config file missing" + failed=1 + fi + + if [ $failed -eq 0 ]; then + log_pass "IMP-003: Schema generation is implemented" + fi +} +check_schema_consistency + +# Summary +echo "" +echo "==================================================" +echo "Conformance Check Summary" +echo "==================================================" +echo -e "${RED}Critical Failures:${NC} $CRITICAL_FAILURES" +echo -e "${RED}High Failures:${NC} $HIGH_FAILURES" +echo -e "${YELLOW}Medium Failures:${NC} $MEDIUM_FAILURES" +echo -e "${BLUE}Low Failures:${NC} $LOW_FAILURES" +echo "" + +# Exit code based on failures +if [ $CRITICAL_FAILURES -gt 0 ]; then + echo -e "${RED}FAIL:${NC} Critical conformance issues found" + exit 2 +elif [ $HIGH_FAILURES -gt 0 ]; then + echo -e "${RED}FAIL:${NC} High priority conformance issues found" + exit 1 +elif [ $MEDIUM_FAILURES -gt 0 ]; then + echo -e "${YELLOW}WARN:${NC} Medium priority conformance issues found" + exit 0 +else + echo -e "${GREEN}PASS:${NC} All checks passed" + exit 0 +fi diff --git a/scripts/spec-review-findings.md b/scripts/spec-review-findings.md new file mode 100644 index 0000000000..b3d6dc2998 --- /dev/null +++ b/scripts/spec-review-findings.md @@ -0,0 +1,1061 @@ +# Safe Outputs Specification Review Findings + +**Date**: 2026-02-14 +**Specification**: [Safe Outputs MCP Gateway Specification v1.8.0](/docs/src/content/docs/reference/safe-outputs-specification.md) +**Commit**: [a5b6606](https://github.com/github/gh-aw/commit/a5b6606aead2b2f2c3c53a46da1d1fe88f5ee583) +**Reviewer**: Automated Security, Usability, and Requirements Review + +## Executive Summary + +This document presents findings from a comprehensive review of the Safe Outputs MCP Gateway Specification from three perspectives: **security**, **usability**, and **requirements compliance**. The specification establishes a W3C-style normative framework for secure AI-to-GitHub operation translation. + +**Overall Assessment**: The specification is **well-structured** with strong security foundations and clear architectural separation. However, several areas require clarification and enhancement to enable effective automated rule encoding and conformance testing. + +### Key Strengths +- ✅ Strong security architecture with privilege separation +- ✅ Clear RFC 2119 requirement levels +- ✅ Comprehensive threat model with 5 identified threats +- ✅ Detailed safe output type catalog (36 types documented) +- ✅ Complete permission documentation for GitHub Actions Token and GitHub App + +### Areas Requiring Improvement +- ⚠️ Ambiguous enforcement requirements for some security properties +- ⚠️ Missing automated test specifications +- ⚠️ Inconsistent normative language in some sections +- ⚠️ Limited guidance on error handling edge cases +- ⚠️ Insufficient detail on cross-repository validation rules + +--- + +## 1. Security Review Findings + +### 1.1 Critical Security Issues + +#### **Finding S1: Ambiguous Validation Ordering Requirements** + +**Location**: Section 3.3, Property SP2 (Validation Precedence Invariant) + +**Issue**: The specification states "validation logic MUST execute before any GitHub API invocation" but does not specify: +1. What constitutes "validation logic" (schema validation only, or also sanitization, limit checks, etc.)? +2. Whether partial validation results can trigger API calls +3. How to handle validation dependencies (e.g., validating parent issue exists before creating sub-issue) + +**Security Impact**: **MEDIUM** - Unclear validation ordering could lead to race conditions or partial execution states where some validations are bypassed. + +**Recommendation**: +```markdown +Add explicit validation pipeline specification: + +**Validation Pipeline Ordering**: + +Implementations MUST execute validation in this exact order: + +1. **Schema Validation** (MUST reject invalid JSON structure before proceeding) +2. **Limit Enforcement** (MUST reject operations exceeding max limits) +3. **Content Sanitization** (MUST sanitize all text fields) +4. **Domain Filtering** (MUST apply allowed-domains if configured) +5. **Cross-Repository Validation** (MUST validate target-repo against allowed-repos) +6. **Dependency Resolution** (MUST resolve temporary IDs and parent references) +7. **GitHub API Invocation** (Only after steps 1-6 complete successfully) + +Any failure in steps 1-6 MUST prevent step 7 from executing. +``` + +**Automated Rule**: +```yaml +# validation-ordering-check.yml +rules: + - id: validation-before-api + pattern: | + Must verify that all safe output handler implementations: + 1. Perform schema validation before API calls + 2. Enforce limits before API calls + 3. Sanitize content before API calls + check: | + grep -A 50 "function.*Handler" actions/setup/js/*.cjs | \ + grep -B 5 "octokit\." | \ + grep -q "validateSchema\|enforceLimit\|sanitize" + severity: CRITICAL +``` + +--- + +#### **Finding S2: Insufficient Cross-Repository Security Model** + +**Location**: Section 3.2, Threat T5 (Cross-Repository Privilege Escalation) + +**Issue**: The specification describes `allowed-github-references` and per-type `allowed-repos` but does not specify: +1. How these interact when both are configured +2. Whether allowlist matching is exact or supports patterns +3. How to handle organization-wide or wildcard allowlists securely +4. What happens when target-repo is specified but not in allowed-repos + +**Security Impact**: **HIGH** - Ambiguous cross-repository rules could allow unauthorized repository targeting through configuration misunderstanding. + +**Recommendation**: +```markdown +Add Cross-Repository Security Model section: + +**Cross-Repository Validation Rules**: + +When `target-repo` is specified in a safe output operation: + +1. **Extract Repository Reference**: Parse `target-repo` as `owner/repo` format +2. **Validate Format**: MUST match regex `^[a-zA-Z0-9_.-]+/[a-zA-Z0-9_.-]+$` +3. **Check Global Allowlist**: If `allowed-github-references` is non-empty: + - MUST check if `target-repo` matches any entry + - Match is EXACT (no wildcards, no pattern matching) +4. **Check Type-Specific Allowlist**: If safe output type has `allowed-repos`: + - MUST check if `target-repo` matches any entry + - Type-specific allowlist OVERRIDES global allowlist (does not merge) +5. **Deny by Default**: If neither allowlist permits the repo, MUST reject operation + +**Wildcard Rules**: Wildcards (*, ?) are NOT supported in allowlists. Each repository MUST be explicitly listed. + +**Organization-Wide**: To allow all repos in an organization: +- List repositories explicitly, OR +- Use GitHub App permissions with organization-level access +``` + +**Automated Rule**: +```yaml +# cross-repo-validation-check.yml +rules: + - id: cross-repo-allowlist-enforcement + pattern: | + All handlers with target-repo support must validate against allowlist + check: | + for handler in create_issue.cjs add_comment.cjs; do + grep -q "validateTargetRepo\|checkAllowedRepos" "actions/setup/js/$handler" || exit 1 + done + severity: HIGH +``` + +--- + +#### **Finding S3: Incomplete Content Sanitization Specification** + +**Location**: Section 9 (Content Integrity Mechanisms) + +**Issue**: While sanitization is mentioned throughout, the specification does not provide: +1. Exact definition of what "sanitization" entails +2. List of prohibited patterns or characters +3. Handling of markdown-specific exploits (e.g., image injection, HTML in markdown) +4. Whether sanitization is reversible or lossy + +**Security Impact**: **MEDIUM** - Inconsistent sanitization implementations could lead to content injection vulnerabilities. + +**Recommendation**: +```markdown +Add Sanitization Requirements section: + +**Content Sanitization Pipeline**: + +Implementations MUST apply these sanitization transforms to all user-provided text fields: + +1. **Markdown Link Validation**: + - Extract all markdown links: `[text](url)` + - Validate URL against `allowed-domains` if configured + - Redact unauthorized URLs: `[text]([URL redacted: unauthorized domain])` + +2. **HTML Tag Stripping** (for fields that don't support HTML): + - Remove `