diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 69d17e581c..00720c7694 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -1851,3 +1851,54 @@ jobs:
         if: always()
         run: |
           rm -rf test-workspace
+
+  safe-outputs-conformance:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@93cb6efe18208431cddfb8368fd83d5badbf9bfd # v5
+
+      - name: Run Safe Outputs Conformance Checker
+        id: conformance
+        continue-on-error: true
+        run: |
+          echo "## Safe Outputs Conformance Check" >> $GITHUB_STEP_SUMMARY
+          echo "" >> $GITHUB_STEP_SUMMARY
+          
+          # Run the conformance checker and capture output
+          if ./scripts/check-safe-outputs-conformance.sh > conformance-output.txt 2>&1; then
+            echo "✅ All conformance checks passed" >> $GITHUB_STEP_SUMMARY
+            EXIT_CODE=0
+          else
+            EXIT_CODE=$?
+            if [ $EXIT_CODE -eq 2 ]; then
+              echo "⚠️ Critical conformance issues found (treated as warning)" >> $GITHUB_STEP_SUMMARY
+            elif [ $EXIT_CODE -eq 1 ]; then
+              echo "⚠️ High priority conformance issues found (treated as warning)" >> $GITHUB_STEP_SUMMARY
+            else
+              echo "⚠️ Conformance check completed with warnings" >> $GITHUB_STEP_SUMMARY
+            fi
+          fi
+          
+          echo "" >> $GITHUB_STEP_SUMMARY
+          echo "### Conformance Check Output" >> $GITHUB_STEP_SUMMARY
+          echo '```' >> $GITHUB_STEP_SUMMARY
+          cat conformance-output.txt >> $GITHUB_STEP_SUMMARY
+          echo '```' >> $GITHUB_STEP_SUMMARY
+          
+          # Also output to console for visibility
+          echo "=== Conformance Check Results ==="
+          cat conformance-output.txt
+          
+          # Always succeed (treat as warning only)
+          exit 0
+
+      - name: Upload conformance report
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: safe-outputs-conformance-report
+          path: conformance-output.txt
+          retention-days: 7
diff --git a/docs/src/content/docs/reference/safe-outputs-specification.md b/docs/src/content/docs/reference/safe-outputs-specification.md
index 1d2e5569ef..96ca47d144 100644
--- a/docs/src/content/docs/reference/safe-outputs-specification.md
+++ b/docs/src/content/docs/reference/safe-outputs-specification.md
@@ -7,9 +7,9 @@ sidebar:
 
 # Safe Outputs MCP Gateway Specification
 
-**Version**: 1.8.0  
+**Version**: 1.9.0  
 **Status**: Working Draft  
-**Publication Date**: 2025-02-14  
+**Publication Date**: 2026-02-14  
 **Editor**: GitHub Agentic Workflows Team  
 **This Version**: [safe-outputs-specification](/gh-aw/reference/safe-outputs-specification/)  
 **Latest Published Version**: This document
@@ -42,6 +42,34 @@ This specification follows World Wide Web Consortium (W3C) formatting convention
 
 ---
 
+## Terminology
+
+This specification uses the following terms with precise definitions:
+
+**Agent**: The AI-powered process executing in an untrusted context with read-only GitHub permissions.
+
+**Safe Output Type**: A category of GitHub operation (e.g., `create_issue`, `add_comment`) with a corresponding MCP tool definition and handler implementation.
+
+**MCP Gateway**: The HTTP server accepting MCP tool invocation requests and recording operations to NDJSON format. Runs in the same context as the agent.
+
+**Safe Output Processor**: The privileged execution context that reads NDJSON artifacts, validates operations, and executes GitHub API calls.
+
+**Handler**: JavaScript implementation processing operations of a specific safe output type.
+
+**Validation**: Pre-execution verification of operation structure, limits, and authorization. Includes schema validation, limit enforcement, and allowlist checking.
+
+**Sanitization**: Content transformation pipeline removing potentially malicious patterns while preserving legitimate content.
+
+**Verification**: Post-compilation checking of configuration integrity through hash validation.
+
+**Staged Mode**: Preview execution mode where operations are simulated without creating permanent GitHub resources.
+
+**Temporary ID**: A placeholder identifier (format: `aw_<id>`) used to reference not-yet-created resources. Resolved to actual resource numbers during processing.
+
+**Provenance**: Metadata identifying the workflow and run that created a GitHub resource. Included in footers or API metadata fields.
+
+---
+
 ## 1. Introduction
 
 ### 1.1 Motivation and Problem Statement
@@ -405,6 +433,78 @@ This specification addresses five primary threat scenarios:
 
 *Residual Risk*: Misconfigured allowlists may permit unintended targets. Mitigation: principle of least privilege in configuration, periodic review.
 
+### 3.2.6 Cross-Repository Security Model
+
+**Repository Reference Format**
+
+Target repositories MUST be specified in `owner/repo` format. Implementations MUST validate:
+- Format matches regex: `^[a-zA-Z0-9_.-]+/[a-zA-Z0-9_.-]+$`
+- Owner and repo components are non-empty
+- No protocol prefix (https://, git://, etc.)
+
+**Allowlist Resolution Order**
+
+When evaluating cross-repository operations, implementations MUST apply these rules in order:
+
+1. **Extract target-repo**: Parse from operation arguments or configuration
+2. **Check type-specific allowlist**: If safe output type defines `allowed-repos`:
+   - MUST match against this list
+   - Type-specific allowlist OVERRIDES global allowlist
+   - If match fails, REJECT with E004
+3. **Check global allowlist**: If no type-specific allowlist and `allowed-github-references` is defined:
+   - MUST match against this list
+   - If match fails, REJECT with E004
+4. **Default deny**: If no allowlists are defined:
+   - MUST reject cross-repository operations
+   - Same-repository operations are permitted
+
+**Matching Rules**
+
+- Matching is EXACT (case-sensitive)
+- Wildcards (*, ?) are NOT supported
+- Pattern matching is NOT supported
+- Each repository MUST be explicitly listed
+
+**Security Properties**
+
+**Property SP6: Cross-Repository Containment**
+
+For all cross-repository operations:
+```
+∀ op ∈ operations:
+  op.target_repo ≠ null ⇒ 
+    (op.target_repo ∈ type_allowlist ∨ 
+     (type_allowlist = null ∧ op.target_repo ∈ global_allowlist))
+```
+
+**Property SP7: Deny-by-Default**
+
+Without explicit allowlist configuration:
+```
+allowed_repos = null ∧ allowed_github_references = null ⇒
+  ∀ op ∈ operations: op.target_repo = workflow.repository
+```
+
+**Example Configurations**
+
+```yaml
+# Example 1: Type-specific allowlist (overrides global)
+safe-outputs:
+  allowed-github-references: [owner/repo-a, owner/repo-b]
+  
+  create-issue:
+    allowed-repos: [owner/repo-c]  # Only repo-c permitted for issues
+    
+  add-comment:
+    # No type-specific list, uses global: repo-a, repo-b
+
+# Example 2: Explicit same-repository only
+safe-outputs:
+  create-issue:
+    # No allowlist = same repository only
+    max: 5
+```
+
 ### 3.3 Security Property Guarantees
 
 Conforming implementations MUST maintain these security invariants:
@@ -433,6 +533,67 @@ Conforming implementations MUST maintain these security invariants:
 
 *Verification*: Code review of safe output processor. Unit tests confirming validation rejects before API calls.
 
+#### Validation Pipeline Requirements
+
+Implementations MUST execute validation steps in this exact sequence for all safe output operations:
+
+**Stage 1: Schema Validation (REQUIRED)**
+- Input: Raw MCP tool arguments
+- Check: JSON schema validation against type-specific schema
+- On failure: Reject immediately with E001 (INVALID_SCHEMA) error
+- Output: Schema-validated operation data
+
+**Stage 2: Limit Enforcement (REQUIRED)**
+- Input: Count of operations of each type in current batch
+- Check: Compare count against configured `max` for each type
+- On failure: Reject entire batch with E002 (LIMIT_EXCEEDED) error
+- Output: Limit-validated operation set
+
+**Stage 3: Content Sanitization (REQUIRED)**
+- Input: All text fields (title, body, description, etc.)
+- Transform: Apply sanitization pipeline (see Section 9.2)
+- On failure: Reject with E008 (SANITIZATION_FAILED) if unsafe content cannot be sanitized
+- Output: Sanitized operation data
+
+**Stage 4: Domain Filtering (CONDITIONAL)**
+- Input: All URLs in markdown links and images
+- Check: Validate against `allowed-domains` if configured
+- Transform: Redact unauthorized URLs
+- Output: Domain-filtered operation data
+
+**Stage 5: Cross-Repository Validation (CONDITIONAL)**
+- Input: `target-repo` parameter if present
+- Check: Validate against `allowed-repos` or `allowed-github-references`
+- On failure: Reject with E004 (INVALID_TARGET_REPO)
+- Output: Authorized target repository
+
+**Stage 6: Dependency Resolution (CONDITIONAL)**
+- Input: Temporary IDs, parent references
+- Check: Resolve references to actual GitHub resource numbers
+- On failure: Reject with E005 (MISSING_PARENT)
+- Output: Fully-resolved operation data
+
+**Stage 7: GitHub API Invocation (EXECUTION)**
+- Input: Validated, sanitized, authorized operation data
+- Action: Execute GitHub API calls
+- On failure: Return E007 (API_ERROR) with details
+
+**Requirement VL1: Sequential Execution**
+
+Stages MUST execute in the order specified above. A failure at any stage (1-6) MUST prevent Stage 7 from executing for that operation.
+
+**Requirement VL2: Atomic Validation**
+
+For single-operation types (max=1), validation failure MUST prevent any API calls. For batch operations, validation failure of one operation MUST NOT cause rejection of the entire batch unless it's a limit enforcement failure.
+
+**Requirement VL3: Error Propagation**
+
+Validation errors MUST include:
+- Error code (E001-E008)
+- Human-readable message
+- Operation index (for batch operations)
+- Field name (for schema validation errors)
+
 **Property SP3: Limit Enforceability Invariant**
 
 *Statement*: For all configured max limits, implementations MUST prevent exceeding the limit. Attempts to exceed limits SHALL result in operation rejection.
@@ -2644,6 +2805,146 @@ Operations requiring repository features must validate availability:
 
 Validation occurs during execution, not tool invocation.
 
+### 9.4 Content Sanitization Pipeline
+
+**Applicability**
+
+Content sanitization MUST be applied to all user-provided text fields in safe output operations. Text fields include:
+- `title` (issues, PRs, discussions, projects)
+- `body` (issues, PRs, discussions, comments)
+- `description` (projects, status updates)
+- `comment` (review comments)
+
+**Sanitization Stages**
+
+Implementations MUST apply these transformations in order:
+
+**S1: Null Byte Removal**
+- Remove all null bytes (`\0`, `\x00`) from strings
+- Rationale: Prevents string truncation attacks
+
+**S2: Markdown Link Validation**
+- Pattern: `[text](url)` and `<url>`
+- For each URL:
+  - Extract domain
+  - If `allowed-domains` is configured:
+    - Check domain against allowlist
+    - If not allowed: Replace with `[text]([URL redacted: unauthorized domain])`
+  - Log redacted URLs to `/tmp/gh-aw/safeoutputs/redacted-domains.log`
+
+**S3: Markdown Image Validation**
+- Pattern: `![alt](url)`
+- For each image URL:
+  - Extract domain
+  - If `allowed-domains` is configured:
+    - Check domain against allowlist
+    - If not allowed: Replace with `![alt]([Image URL redacted: unauthorized domain])`
+
+**S4: HTML Tag Filtering** (Optional, depends on field type)
+- Remove potentially dangerous tags:
+  - `<script>`, `</script>`
+  - `<iframe>`, `</iframe>`
+  - `<object>`, `</object>`
+  - `<embed>`, `</embed>`
+- Remove event handlers:
+  - `on*` attributes in HTML tags (onclick, onerror, etc.)
+- Preserve safe GitHub Flavored Markdown tags:
+  - `<details>`, `<summary>`, `<sub>`, `<sup>`, `<kbd>`
+
+**S5: Command Injection Prevention**
+- Do NOT execute or interpret code blocks
+- Do NOT evaluate template expressions
+- Preserve code blocks verbatim (no escaping needed in markdown)
+
+**Excluded Content**
+
+The following content MUST NOT be sanitized:
+- Code blocks (` ``` `)
+- Inline code (`` `code` ``)
+- System-generated footers
+- System-generated metadata
+
+**Sanitization Reversibility**
+
+Sanitization transformations are LOSSY and NOT reversible. Original content is not preserved after sanitization. This is intentional to prevent attempts to bypass sanitization.
+
+**Conformance Requirement CR1: Pre-API Sanitization**
+
+All content MUST be sanitized BEFORE GitHub API invocation. Unsanitized content MUST NEVER be passed to GitHub APIs.
+
+*Verification*: Inspect handler code to confirm sanitization occurs before `octokit.*` calls.
+
+### 9.5 Error Code Catalog
+
+Implementations MUST use standardized error codes for validation and execution failures.
+
+**Error Code Table**
+
+| Code | Name | Description | When to Use | HTTP Status Equivalent |
+|------|------|-------------|-------------|------------------------|
+| E001 | INVALID_SCHEMA | Operation failed JSON schema validation | Input does not match type-specific schema | 400 Bad Request |
+| E002 | LIMIT_EXCEEDED | Operation count exceeds configured max | Batch contains more operations than allowed | 429 Too Many Requests |
+| E003 | UNAUTHORIZED_DOMAIN | URL contains non-allowlisted domain | Domain filtering rejected URL | 403 Forbidden |
+| E004 | INVALID_TARGET_REPO | target-repo not in allowed-repos | Cross-repository validation failed | 403 Forbidden |
+| E005 | MISSING_PARENT | Referenced parent issue/PR not found | Temporary ID or parent reference cannot be resolved | 404 Not Found |
+| E006 | INVALID_LABEL | Label does not exist in repository | Label validation failed | 404 Not Found |
+| E007 | API_ERROR | GitHub API returned error | GitHub API call failed | 502 Bad Gateway |
+| E008 | SANITIZATION_FAILED | Content contains unsanitizable unsafe patterns | Sanitization pipeline detected unremovable threats | 422 Unprocessable Entity |
+| E009 | CONFIG_HASH_MISMATCH | Configuration hash verification failed | Workflow YAML was modified after compilation | 403 Forbidden |
+| E010 | RATE_LIMIT_EXCEEDED | GitHub API rate limit exceeded | Too many API calls | 429 Too Many Requests |
+
+**Error Message Format**
+
+All errors MUST conform to this JSON structure:
+
+```json
+{
+  "error": {
+    "code": "E002",
+    "name": "LIMIT_EXCEEDED",
+    "message": "Operation count exceeds configured limit",
+    "details": {
+      "type": "create_issue",
+      "attempted": 5,
+      "max": 3,
+      "operation_index": 3
+    },
+    "timestamp": "2026-02-14T16:39:20.948Z",
+    "workflow_run": "https://github.com/owner/repo/actions/runs/12345"
+  }
+}
+```
+
+**Required Fields**:
+- `code`: Error code from table above (E001-E010)
+- `name`: Error name from table above
+- `message`: Human-readable description
+- `timestamp`: ISO 8601 timestamp
+
+**Optional Fields**:
+- `details`: Type-specific error context (operation_index, field names, etc.)
+- `workflow_run`: URL to workflow run for provenance
+
+**Error Handling Requirements**
+
+**Requirement EH1: Early Failure Detection**
+
+Validation errors (E001-E006) MUST be detected before any GitHub API calls are made.
+
+**Requirement EH2: Clear Error Messages**
+
+Error messages MUST:
+- Clearly state what went wrong
+- Include enough context to debug (field names, values)
+- Suggest remediation when possible
+
+**Requirement EH3: Error Logging**
+
+All errors MUST be logged to:
+- GitHub Actions step output (visible in workflow run)
+- Job summary (visible in workflow run summary)
+- STDERR (for local development)
+
 ---
 
 ## 10. Execution Guarantees
@@ -2679,6 +2980,108 @@ Operations execute in:
 
 **Error Reporting**: All errors collected; execution summary reports per-type results.
 
+### 10.5 Edge Case Behavior
+
+This section defines required behavior for unusual or boundary conditions.
+
+**Empty Operations**
+
+*Scenario*: NDJSON artifact contains zero operations
+
+*Behavior*:
+- Safe output job MUST succeed (exit code 0)
+- Job summary SHOULD display: "✅ No operations to process"
+- No GitHub API calls are made
+- No errors are raised
+
+*Rationale*: Empty operations are valid (agent may determine no action is needed).
+
+**Zero Max Limit**
+
+*Scenario*: Configuration specifies `max: 0` for a safe output type
+
+*Behavior*:
+- Type is DISABLED (MCP tool is not registered)
+- Attempts to invoke disabled type MUST return MCP error:
+  ```json
+  {"error": {"code": -32601, "message": "Method not found"}}
+  ```
+- No configuration is generated for disabled types
+
+*Rationale*: `max: 0` is an explicit disable signal.
+
+**API Rate Limiting**
+
+*Scenario*: GitHub API returns 429 (rate limit exceeded) or 403 with X-RateLimit-Remaining: 0
+
+*Behavior*:
+- Processor MUST retry with exponential backoff:
+  - 1st retry: After 60 seconds
+  - 2nd retry: After 120 seconds  
+  - 3rd retry: After 240 seconds
+- After 3 retries, MUST fail with E010 error
+- Error details MUST include rate limit reset time from `X-RateLimit-Reset` header
+
+*Rationale*: Transient rate limits should not fail workflows unnecessarily.
+
+**Workflow Cancellation**
+
+*Scenario*: Workflow is manually cancelled during agent execution
+
+*Behavior*:
+- Safe output job MUST NOT execute if artifact upload was interrupted
+- Partial NDJSON artifacts MUST NOT be processed
+- GitHub Actions automatically handles cleanup
+- No additional logic required in handlers
+
+*Rationale*: GitHub Actions cancellation is handled at platform level.
+
+**Concurrent Workflow Runs**
+
+*Scenario*: Multiple workflow runs execute concurrently for the same workflow
+
+*Behavior*:
+- Each run operates independently
+- Max limits are per-run (NOT global across runs)
+- No coordination or locking between runs
+- Operations in separate runs do NOT affect each other's limits
+
+*Rationale*: Simplicity and avoiding distributed coordination complexity.
+
+**Malformed NDJSON**
+
+*Scenario*: NDJSON artifact contains invalid JSON on one or more lines
+
+*Behavior*:
+- Parser MUST skip invalid lines with warning
+- Valid lines MUST be processed
+- Job summary MUST show: "⚠️ Skipped N malformed entries"
+- Invalid lines MUST be logged to STDERR
+
+*Rationale*: Partial failure should not prevent valid operations from executing.
+
+**Missing Artifact**
+
+*Scenario*: Safe output job cannot download artifact (artifact not found)
+
+*Behavior*:
+- Job MUST fail with clear error message
+- Error MUST suggest checking agent job completion
+- Exit code MUST be non-zero
+
+*Rationale*: Missing artifact indicates upstream failure that must be addressed.
+
+**Duplicate Temporary IDs**
+
+*Scenario*: Multiple operations use the same `temporary_id`
+
+*Behavior*:
+- First operation using the ID succeeds and establishes mapping
+- Subsequent operations using the same ID MUST reference the first operation's result
+- If this creates ambiguity (e.g., two issues both want to be "aw_parent"), MUST reject with E005
+
+*Rationale*: Deterministic behavior prevents confusion.
+
 ---
 
 ## Appendix A: Conformance Checklist
@@ -2780,8 +3183,226 @@ Detailed threat analysis and mitigation effectiveness assessment for all five pr
 
 ---
 
+## Appendix G: Configuration Patterns
+
+This appendix provides common configuration patterns for safe outputs.
+
+### Pattern 1: Simple Issue Tracking
+
+Basic configuration for automated issue creation:
+
+```yaml
+safe-outputs:
+  create-issue:
+    max: 1
+    labels: [automated]
+```
+
+**Use case**: Single automated issue per workflow run with consistent labeling.
+
+### Pattern 2: Multi-Type with Global Footer
+
+Configuration with multiple output types sharing global settings:
+
+```yaml
+safe-outputs:
+  footer: true  # Applied to all types
+  
+  create-issue:
+    max: 3
+    labels: [bug, automated]
+  
+  add-comment:
+    max: 2
+    hide-older-comments: true
+```
+
+**Use case**: Workflow creating multiple issues and comments with attribution footers.
+
+### Pattern 3: Cross-Repository Operations
+
+Secure cross-repository issue creation:
+
+```yaml
+safe-outputs:
+  allowed-github-references:
+    - owner/repo-a
+    - owner/repo-b
+  
+  create-issue:
+    max: 5
+    target-repo: owner/repo-a
+```
+
+**Use case**: Creating issues in a central tracking repository from multiple workflow repositories.
+
+**Security note**: Explicit allowlist prevents unauthorized repository targeting.
+
+### Pattern 4: Staged Mode Development
+
+Safe testing in preview mode:
+
+```yaml
+safe-outputs:
+  staged: true  # Enable preview mode globally
+  
+  create-issue:
+    max: 10  # Safe to set high in staged mode
+  
+  add-comment:
+    max: 5
+```
+
+**Use case**: Testing workflow behavior without creating real GitHub resources.
+
+**Workflow**: Test with `staged: true`, verify previews, then deploy with `staged: false`.
+
+### Pattern 5: Type-Specific Allowlists
+
+Fine-grained cross-repository control:
+
+```yaml
+safe-outputs:
+  allowed-github-references: [owner/repo-a, owner/repo-b]
+  
+  create-issue:
+    allowed-repos: [owner/repo-c]  # Overrides global
+    max: 3
+    
+  add-comment:
+    # No type-specific list, uses global: repo-a, repo-b
+    max: 2
+```
+
+**Use case**: Different safe output types target different repositories.
+
+**Security note**: Type-specific allowlists override global allowlists.
+
+### Pattern 6: Domain Filtering for Security
+
+Restrict URLs in safe output content:
+
+```yaml
+safe-outputs:
+  allowed-domains:
+    - github.com
+    - "*.github.io"
+    - docs.github.com
+  
+  create-issue:
+    max: 5
+```
+
+**Use case**: Prevent agents from including unauthorized URLs in created content.
+
+**Effect**: URLs to non-allowlisted domains are redacted during sanitization.
+
+### Pattern 7: Temporary Resource Cleanup
+
+Auto-close temporary issues:
+
+```yaml
+safe-outputs:
+  create-issue:
+    max: 10
+    expires: 7  # Auto-close after 7 days
+    labels: [temporary, automated]
+```
+
+**Use case**: Issues for transient notifications that should auto-clean.
+
+**Implementation**: Scheduled workflow checks issue age and closes expired issues.
+
+### Pattern 8: Review Comment Workflow
+
+Pull request review automation:
+
+```yaml
+safe-outputs:
+  create-pr-review-comment:
+    max: 20
+    
+  submit-pr-review:
+    max: 1
+    
+  resolve-pr-review-thread:
+    max: 10
+```
+
+**Use case**: Automated code review with multiple comments and thread resolution.
+
+**Workflow**: Create review comments, submit bundled review, resolve addressed threads.
+
+### Pattern 9: Project Management
+
+Automated project creation and updates:
+
+```yaml
+safe-outputs:
+  create-project:
+    max: 1
+    
+  update-project:
+    max: 5
+    
+  create-project-status-update:
+    max: 3
+```
+
+**Use case**: Creating and maintaining project boards automatically.
+
+### Pattern 10: Grouped Issues with Parent
+
+Create related issues under a parent:
+
+```yaml
+safe-outputs:
+  create-issue:
+    max: 10
+    group: true
+```
+
+**Use case**: Workflow creates parent issue and multiple sub-issues linked via tasklists.
+
+**Effect**: First issue becomes parent, subsequent issues link to it.
+
+### Best Practices
+
+**Start Conservative**:
+- Begin with low `max` values
+- Enable `staged: true` for testing
+- Use explicit `allowed-repos` lists
+
+**Use Domain Filtering**:
+- Always configure `allowed-domains` when agents process external input
+- Include only trusted domains
+
+**Enable Footers**:
+- Keep `footer: true` (default) for transparency
+- Only disable when absolutely necessary
+
+**Temporary Resources**:
+- Use `expires` for transient issues
+- Clean up with `close-older-issues` for superseded content
+
+**Cross-Repository Security**:
+- Use type-specific `allowed-repos` for fine-grained control
+- Prefer explicit lists over broad permissions
+
+---
+
 ## Appendix F: Document History
 
+**Version 1.9.0** (2026-02-14):
+- Added comprehensive validation pipeline ordering (7 stages)
+- Added cross-repository security model with explicit allowlist rules
+- Added content sanitization pipeline specification (5 stages)
+- Added standardized error code catalog (E001-E010)
+- Added edge case behavior specifications
+- Added terminology section for consistency
+- Enhanced security properties (SP6, SP7)
+- Improved requirements testability
+
 **Version 1.8.0** (2025-02-14):
 - Initial W3C-style specification release
 - Complete security model documentation
diff --git a/scripts/README-conformance.md b/scripts/README-conformance.md
new file mode 100644
index 0000000000..d134582cf1
--- /dev/null
+++ b/scripts/README-conformance.md
@@ -0,0 +1,135 @@
+# Safe Outputs Specification Conformance Checker
+
+This directory contains automated tools for verifying conformance with the Safe Outputs MCP Gateway Specification.
+
+## Overview
+
+The conformance checker validates implementations against normative requirements in the [Safe Outputs Specification](/docs/src/content/docs/reference/safe-outputs-specification.md). It runs automated checks across four categories:
+
+- **SEC**: Security requirements (privilege separation, validation ordering, sanitization)
+- **USE**: Usability requirements (error codes, footers, staged mode formatting)
+- **REQ**: Requirements compliance (RFC 2119 keywords, testability, completeness)
+- **IMP**: Implementation requirements (handler registration, permission computation, schema validation)
+
+## Usage
+
+### Run All Checks
+
+```bash
+./scripts/check-safe-outputs-conformance.sh
+```
+
+### Exit Codes
+
+- `0`: All checks passed or only low/medium severity warnings
+- `1`: High severity failures detected
+- `2`: Critical severity failures detected
+
+### Output Format
+
+The checker uses color-coded output:
+
+- 🔴 **[CRITICAL]**: Must be fixed immediately (security violations)
+- 🔴 **[HIGH]**: Should be fixed soon (significant issues)
+- 🟡 **[MEDIUM]**: Should be addressed (quality issues)
+- 🔵 **[LOW]**: Nice to have (minor improvements)
+- 🟢 **[PASS]**: Check passed
+
+## Checks Implemented
+
+### Security Checks
+
+| ID | Name | Severity | Description |
+|----|------|----------|-------------|
+| SEC-001 | Privilege Separation Enforcement | CRITICAL | Verifies agent jobs lack write permissions |
+| SEC-002 | Validation Before API Calls | CRITICAL | Ensures validation occurs before GitHub API calls |
+| SEC-003 | Max Limit Enforcement | MEDIUM | Verifies handlers enforce max operation limits |
+| SEC-004 | Content Sanitization Required | MEDIUM | Checks that content sanitization is applied |
+| SEC-005 | Cross-Repository Validation | HIGH | Validates cross-repo operations check allowlists |
+
+### Usability Checks
+
+| ID | Name | Severity | Description |
+|----|------|----------|-------------|
+| USE-001 | Error Code Standardization | LOW | Verifies handlers use standardized error codes |
+| USE-002 | Footer Attribution Required | LOW | Checks footers are added when configured |
+| USE-003 | Staged Mode Preview Format | LOW | Validates staged mode uses 🎭 emoji format |
+
+### Requirements Checks
+
+| ID | Name | Severity | Description |
+|----|------|----------|-------------|
+| REQ-001 | RFC 2119 Keyword Usage | MEDIUM | Verifies normative sections use MUST/SHOULD/MAY |
+| REQ-002 | Safe Output Type Completeness | MEDIUM | Checks each type has all required documentation |
+| REQ-003 | Verification Method Specification | LOW | Ensures requirements include verification methods |
+
+### Implementation Checks
+
+| ID | Name | Severity | Description |
+|----|------|----------|-------------|
+| IMP-001 | Handler Registration Completeness | HIGH | Verifies all standard handlers exist |
+| IMP-002 | Permission Computation Accuracy | HIGH | Checks permission computation function exists |
+| IMP-003 | Schema Validation Consistency | MEDIUM | Validates schema generation is implemented |
+
+## Adding New Checks
+
+To add a new conformance check:
+
+1. Add a check function following the naming pattern `check_<category>_<name>`
+2. Use the logging functions: `log_critical`, `log_high`, `log_medium`, `log_low`, `log_pass`
+3. Increment the appropriate failure counter when a check fails
+4. Add a call to your check function in the main script flow
+5. Document the check in this README
+
+Example:
+
+```bash
+check_new_requirement() {
+    local failed=0
+    
+    # Your check logic here
+    if [ condition_not_met ]; then
+        log_high "CHECK-ID: Description of failure"
+        failed=1
+    fi
+    
+    if [ $failed -eq 0 ]; then
+        log_pass "CHECK-ID: Check passed successfully"
+    fi
+}
+```
+
+## Related Documentation
+
+- [Safe Outputs Specification](/docs/src/content/docs/reference/safe-outputs-specification.md) - Complete normative specification
+- [Specification Review Findings](/docs/spec-review-findings.md) - Detailed security, usability, and requirements review
+- [Specification Improvements Plan](/docs/spec-improvements-plan.md) - Roadmap for addressing findings
+
+## CI Integration
+
+The conformance checker can be integrated into CI/CD pipelines:
+
+```yaml
+# .github/workflows/conformance.yml
+name: Conformance Check
+on: [push, pull_request]
+
+jobs:
+  conformance:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Run conformance checks
+        run: ./scripts/check-safe-outputs-conformance.sh
+```
+
+## Maintenance
+
+The conformance checker should be updated when:
+
+- New safe output types are added
+- Specification requirements change
+- New security properties are added
+- Implementation patterns evolve
+
+Regular maintenance ensures the checker stays aligned with the specification and implementation.
diff --git a/scripts/check-safe-outputs-conformance.sh b/scripts/check-safe-outputs-conformance.sh
new file mode 100755
index 0000000000..78ee265fa0
--- /dev/null
+++ b/scripts/check-safe-outputs-conformance.sh
@@ -0,0 +1,402 @@
+#!/bin/bash
+# Safe Outputs Specification Conformance Checker
+# This script implements automated checks for the Safe Outputs specification
+# Based on findings from docs/spec-review-findings.md
+
+set -euo pipefail
+
+# Color codes for output
+RED='\033[0;31m'
+YELLOW='\033[1;33m'
+GREEN='\033[0;32m'
+BLUE='\033[0;34m'
+NC='\033[0m' # No Color
+
+# Counters
+CRITICAL_FAILURES=0
+HIGH_FAILURES=0
+MEDIUM_FAILURES=0
+LOW_FAILURES=0
+
+# Logging functions
+log_critical() {
+    echo -e "${RED}[CRITICAL]${NC} $1"
+    ((CRITICAL_FAILURES++))
+}
+
+log_high() {
+    echo -e "${RED}[HIGH]${NC} $1"
+    ((HIGH_FAILURES++))
+}
+
+log_medium() {
+    echo -e "${YELLOW}[MEDIUM]${NC} $1"
+    ((MEDIUM_FAILURES++))
+}
+
+log_low() {
+    echo -e "${BLUE}[LOW]${NC} $1"
+    ((LOW_FAILURES++))
+}
+
+log_pass() {
+    echo -e "${GREEN}[PASS]${NC} $1"
+}
+
+log_info() {
+    echo -e "${BLUE}[INFO]${NC} $1"
+}
+
+# Change to repo root
+cd "$(dirname "$0")/.."
+
+echo "=================================================="
+echo "Safe Outputs Specification Conformance Checker"
+echo "=================================================="
+echo ""
+
+# SEC-001: Privilege Separation Enforcement
+echo "Running SEC-001: Privilege Separation Enforcement..."
+check_privilege_separation() {
+    local failed=0
+    
+    # Find all compiled workflow files
+    find .github/workflows -name "*.lock.yml" | while read -r workflow; do
+        # Check if agent job has write permissions
+        if grep -A 50 "^jobs:" "$workflow" | grep -A 20 "^\s*agent:" | grep -qE "issues:\s*write|pull-requests:\s*write|contents:\s*write"; then
+            log_critical "SEC-001: Agent job in $workflow has write permissions"
+            failed=1
+        fi
+    done
+    
+    if [ $failed -eq 0 ]; then
+        log_pass "SEC-001: All agent jobs properly lack write permissions"
+    fi
+}
+check_privilege_separation
+
+# SEC-002: Validation Before API Calls
+echo "Running SEC-002: Validation Before API Calls..."
+check_validation_ordering() {
+    local failed=0
+    
+    for handler in actions/setup/js/*.cjs; do
+        # Skip test files
+        [[ "$handler" =~ test ]] && continue
+        [[ "$handler" =~ parse ]] && continue
+        [[ "$handler" =~ buffer ]] && continue
+        
+        # Check if handler has API calls
+        if grep -q "octokit\." "$handler"; then
+            # Check if validation appears before API calls
+            if ! awk '/octokit\./{api_line=NR} /validate|sanitize|enforceLimit/{if(NR<api_line || api_line==0) valid=1} END{exit !valid}' "$handler" 2>/dev/null; then
+                log_critical "SEC-002: $handler may have API calls before validation"
+                failed=1
+            fi
+        fi
+    done
+    
+    if [ $failed -eq 0 ]; then
+        log_pass "SEC-002: All handlers validate before API calls"
+    fi
+}
+check_validation_ordering
+
+# SEC-003: Max Limit Enforcement
+echo "Running SEC-003: Max Limit Enforcement..."
+check_max_limits() {
+    local failed=0
+    
+    for handler in actions/setup/js/*.cjs; do
+        # Skip test and utility files
+        [[ "$handler" =~ (test|parse|buffer|factory) ]] && continue
+        
+        # Check if handler enforces max limits
+        if ! grep -q "\.length.*>.*\.max\|enforceMaxLimit\|checkLimit\|max.*exceeded" "$handler"; then
+            log_medium "SEC-003: $handler may not enforce max limits"
+            failed=1
+        fi
+    done
+    
+    if [ $failed -eq 0 ]; then
+        log_pass "SEC-003: All handlers enforce max limits"
+    fi
+}
+check_max_limits
+
+# SEC-004: Content Sanitization Required
+echo "Running SEC-004: Content Sanitization Required..."
+check_sanitization() {
+    local failed=0
+    
+    for handler in actions/setup/js/*.cjs; do
+        # Skip test and utility files
+        [[ "$handler" =~ (test|parse|buffer) ]] && continue
+        
+        # Check if handler has body/content fields
+        if grep -q "\"body\"\|body:" "$handler"; then
+            # Check for sanitization
+            if ! grep -q "sanitize\|stripHTML\|escapeMarkdown\|cleanContent" "$handler"; then
+                log_medium "SEC-004: $handler has body field but no sanitization"
+                failed=1
+            fi
+        fi
+    done
+    
+    if [ $failed -eq 0 ]; then
+        log_pass "SEC-004: All handlers properly sanitize content"
+    fi
+}
+check_sanitization
+
+# SEC-005: Cross-Repository Validation
+echo "Running SEC-005: Cross-Repository Validation..."
+check_cross_repo() {
+    local failed=0
+    
+    for handler in actions/setup/js/*.cjs; do
+        # Skip test files
+        [[ "$handler" =~ test ]] && continue
+        
+        # Check if handler supports target-repo
+        if grep -q "target.*[Rr]epo\|targetRepo" "$handler"; then
+            # Check for allowlist validation
+            if ! grep -q "allowed.*[Rr]epos\|validateTargetRepo\|checkAllowedRepo" "$handler"; then
+                log_high "SEC-005: $handler supports target-repo but lacks allowlist check"
+                failed=1
+            fi
+        fi
+    done
+    
+    if [ $failed -eq 0 ]; then
+        log_pass "SEC-005: All cross-repo handlers validate allowlists"
+    fi
+}
+check_cross_repo
+
+# USE-001: Error Code Standardization
+echo "Running USE-001: Error Code Standardization..."
+check_error_codes() {
+    local failed=0
+    
+    for handler in actions/setup/js/*.cjs; do
+        # Skip test files
+        [[ "$handler" =~ test ]] && continue
+        
+        # Check if handler throws errors
+        if grep -q "throw.*Error\|core\.setFailed" "$handler"; then
+            # Check for standardized error codes
+            if ! grep -qE "E[0-9]{3}|ERROR_|ERR_" "$handler"; then
+                log_low "USE-001: $handler may not use standardized error codes"
+                failed=1
+            fi
+        fi
+    done
+    
+    if [ $failed -eq 0 ]; then
+        log_pass "USE-001: All handlers use standardized error codes"
+    fi
+}
+check_error_codes
+
+# USE-002: Footer Attribution Required
+echo "Running USE-002: Footer Attribution Required..."
+check_footers() {
+    local failed=0
+    
+    # Check handlers that create issues/PRs/discussions
+    for handler in actions/setup/js/{create_issue,create_pull_request,create_discussion,add_comment}.cjs; do
+        [ ! -f "$handler" ] && continue
+        
+        # Check if handler adds footers
+        if ! grep -q "footer\|addFooter\|attribution\|AI generated" "$handler"; then
+            log_low "USE-002: $handler may not add footer attribution"
+            failed=1
+        fi
+    done
+    
+    if [ $failed -eq 0 ]; then
+        log_pass "USE-002: All handlers add footer attribution when configured"
+    fi
+}
+check_footers
+
+# USE-003: Staged Mode Preview Format
+echo "Running USE-003: Staged Mode Preview Format..."
+check_staged_mode() {
+    local failed=0
+    
+    for handler in actions/setup/js/*.cjs; do
+        # Skip test files
+        [[ "$handler" =~ test ]] && continue
+        
+        # Check if handler has staged mode
+        if grep -q "staged.*true\|isStaged\|GH_AW_SAFE_OUTPUTS_STAGED" "$handler"; then
+            # Check for emoji in preview
+            if ! grep -q "🎭\|Staged Mode.*Preview" "$handler"; then
+                log_low "USE-003: $handler has staged mode but missing 🎭 emoji"
+                failed=1
+            fi
+        fi
+    done
+    
+    if [ $failed -eq 0 ]; then
+        log_pass "USE-003: All handlers use correct staged mode format"
+    fi
+}
+check_staged_mode
+
+# REQ-001: RFC 2119 Keyword Usage
+echo "Running REQ-001: RFC 2119 Keyword Usage..."
+check_rfc2119() {
+    local spec_file="docs/src/content/docs/reference/safe-outputs-specification.md"
+    local failed=0
+    
+    # Check key sections have RFC 2119 keywords
+    for section in "Security Architecture" "Configuration Semantics" "Execution Guarantees"; do
+        if ! grep -A 200 "## .*$section" "$spec_file" 2>/dev/null | grep -q "MUST\|SHALL\|SHOULD\|MAY"; then
+            log_medium "REQ-001: Section '$section' may lack RFC 2119 keywords"
+            failed=1
+        fi
+    done
+    
+    if [ $failed -eq 0 ]; then
+        log_pass "REQ-001: Normative sections use RFC 2119 keywords"
+    fi
+}
+check_rfc2119
+
+# REQ-002: Safe Output Type Completeness
+echo "Running REQ-002: Safe Output Type Completeness..."
+check_type_completeness() {
+    local spec_file="docs/src/content/docs/reference/safe-outputs-specification.md"
+    local failed=0
+    
+    # Extract type names
+    grep "^#### Type:" "$spec_file" 2>/dev/null | sed 's/^#### Type: //' | head -10 | while read -r type_name; do
+        sections_found=0
+        
+        # Check for required sections
+        for section in "MCP Tool Schema" "Operational Semantics" "Configuration Parameters" "Security Requirements" "Required Permissions"; do
+            if grep -A 200 "^#### Type: $type_name" "$spec_file" 2>/dev/null | grep -q "**$section**"; then
+                ((sections_found++))
+            fi
+        done
+        
+        if [ $sections_found -lt 5 ]; then
+            log_medium "REQ-002: Type '$type_name' has only $sections_found/5 required sections"
+            failed=1
+        fi
+    done
+    
+    if [ $failed -eq 0 ]; then
+        log_pass "REQ-002: All safe output types have complete documentation"
+    fi
+}
+check_type_completeness
+
+# REQ-003: Verification Method Specification
+echo "Running REQ-003: Verification Method Specification..."
+check_verification_methods() {
+    local spec_file="docs/src/content/docs/reference/safe-outputs-specification.md"
+    local failed=0
+    
+    # Check key requirements have verification methods
+    for req in "AR1" "AR2" "AR3" "SP1" "SP2" "SP3"; do
+        if ! grep -A 30 "**Requirement $req:\|**Property $req:" "$spec_file" 2>/dev/null | grep -q "Verification:\|Formal Definition:"; then
+            log_low "REQ-003: Requirement $req may lack verification method"
+            failed=1
+        fi
+    done
+    
+    if [ $failed -eq 0 ]; then
+        log_pass "REQ-003: All requirements have verification methods"
+    fi
+}
+check_verification_methods
+
+# IMP-001: Handler Registration Completeness
+echo "Running IMP-001: Handler Registration Completeness..."
+check_handler_registration() {
+    local failed=0
+    
+    # Check if standard handlers exist
+    for type in create_issue add_comment close_issue update_issue add_labels remove_labels; do
+        handler_file="actions/setup/js/${type}.cjs"
+        if [ ! -f "$handler_file" ]; then
+            log_high "IMP-001: Missing handler file $handler_file"
+            failed=1
+        fi
+    done
+    
+    if [ $failed -eq 0 ]; then
+        log_pass "IMP-001: All standard handlers are registered"
+    fi
+}
+check_handler_registration
+
+# IMP-002: Permission Computation Accuracy
+echo "Running IMP-002: Permission Computation Accuracy..."
+check_permission_computation() {
+    # Check if permission computation file exists and is well-formed
+    if [ -f "pkg/workflow/safe_outputs_permissions.go" ]; then
+        # Basic check that it defines computePermissionsForSafeOutputs
+        if grep -q "computePermissionsForSafeOutputs" "pkg/workflow/safe_outputs_permissions.go"; then
+            log_pass "IMP-002: Permission computation function exists"
+        else
+            log_high "IMP-002: Permission computation function not found"
+        fi
+    else
+        log_high "IMP-002: Permission computation file missing"
+    fi
+}
+check_permission_computation
+
+# IMP-003: Schema Validation Consistency
+echo "Running IMP-003: Schema Validation Consistency..."
+check_schema_consistency() {
+    local failed=0
+    
+    # Check if safe outputs config Go file exists
+    if [ -f "pkg/workflow/safe_outputs_config.go" ]; then
+        # Check for schema generation functions
+        if ! grep -q "generateSchema\|buildSchema\|toolSchema" "pkg/workflow/safe_outputs_config.go"; then
+            log_medium "IMP-003: Schema generation functions may be missing"
+            failed=1
+        fi
+    else
+        log_medium "IMP-003: Safe outputs config file missing"
+        failed=1
+    fi
+    
+    if [ $failed -eq 0 ]; then
+        log_pass "IMP-003: Schema generation is implemented"
+    fi
+}
+check_schema_consistency
+
+# Summary
+echo ""
+echo "=================================================="
+echo "Conformance Check Summary"
+echo "=================================================="
+echo -e "${RED}Critical Failures:${NC} $CRITICAL_FAILURES"
+echo -e "${RED}High Failures:${NC} $HIGH_FAILURES"
+echo -e "${YELLOW}Medium Failures:${NC} $MEDIUM_FAILURES"
+echo -e "${BLUE}Low Failures:${NC} $LOW_FAILURES"
+echo ""
+
+# Exit code based on failures
+if [ $CRITICAL_FAILURES -gt 0 ]; then
+    echo -e "${RED}FAIL:${NC} Critical conformance issues found"
+    exit 2
+elif [ $HIGH_FAILURES -gt 0 ]; then
+    echo -e "${RED}FAIL:${NC} High priority conformance issues found"
+    exit 1
+elif [ $MEDIUM_FAILURES -gt 0 ]; then
+    echo -e "${YELLOW}WARN:${NC} Medium priority conformance issues found"
+    exit 0
+else
+    echo -e "${GREEN}PASS:${NC} All checks passed"
+    exit 0
+fi
diff --git a/scripts/spec-review-findings.md b/scripts/spec-review-findings.md
new file mode 100644
index 0000000000..b3d6dc2998
--- /dev/null
+++ b/scripts/spec-review-findings.md
@@ -0,0 +1,1061 @@
+# Safe Outputs Specification Review Findings
+
+**Date**: 2026-02-14  
+**Specification**: [Safe Outputs MCP Gateway Specification v1.8.0](/docs/src/content/docs/reference/safe-outputs-specification.md)  
+**Commit**: [a5b6606](https://github.com/github/gh-aw/commit/a5b6606aead2b2f2c3c53a46da1d1fe88f5ee583)  
+**Reviewer**: Automated Security, Usability, and Requirements Review
+
+## Executive Summary
+
+This document presents findings from a comprehensive review of the Safe Outputs MCP Gateway Specification from three perspectives: **security**, **usability**, and **requirements compliance**. The specification establishes a W3C-style normative framework for secure AI-to-GitHub operation translation.
+
+**Overall Assessment**: The specification is **well-structured** with strong security foundations and clear architectural separation. However, several areas require clarification and enhancement to enable effective automated rule encoding and conformance testing.
+
+### Key Strengths
+- ✅ Strong security architecture with privilege separation
+- ✅ Clear RFC 2119 requirement levels
+- ✅ Comprehensive threat model with 5 identified threats
+- ✅ Detailed safe output type catalog (36 types documented)
+- ✅ Complete permission documentation for GitHub Actions Token and GitHub App
+
+### Areas Requiring Improvement
+- ⚠️ Ambiguous enforcement requirements for some security properties
+- ⚠️ Missing automated test specifications
+- ⚠️ Inconsistent normative language in some sections
+- ⚠️ Limited guidance on error handling edge cases
+- ⚠️ Insufficient detail on cross-repository validation rules
+
+---
+
+## 1. Security Review Findings
+
+### 1.1 Critical Security Issues
+
+#### **Finding S1: Ambiguous Validation Ordering Requirements**
+
+**Location**: Section 3.3, Property SP2 (Validation Precedence Invariant)
+
+**Issue**: The specification states "validation logic MUST execute before any GitHub API invocation" but does not specify:
+1. What constitutes "validation logic" (schema validation only, or also sanitization, limit checks, etc.)?
+2. Whether partial validation results can trigger API calls
+3. How to handle validation dependencies (e.g., validating parent issue exists before creating sub-issue)
+
+**Security Impact**: **MEDIUM** - Unclear validation ordering could lead to race conditions or partial execution states where some validations are bypassed.
+
+**Recommendation**:
+```markdown
+Add explicit validation pipeline specification:
+
+**Validation Pipeline Ordering**:
+
+Implementations MUST execute validation in this exact order:
+
+1. **Schema Validation** (MUST reject invalid JSON structure before proceeding)
+2. **Limit Enforcement** (MUST reject operations exceeding max limits)
+3. **Content Sanitization** (MUST sanitize all text fields)
+4. **Domain Filtering** (MUST apply allowed-domains if configured)
+5. **Cross-Repository Validation** (MUST validate target-repo against allowed-repos)
+6. **Dependency Resolution** (MUST resolve temporary IDs and parent references)
+7. **GitHub API Invocation** (Only after steps 1-6 complete successfully)
+
+Any failure in steps 1-6 MUST prevent step 7 from executing.
+```
+
+**Automated Rule**:
+```yaml
+# validation-ordering-check.yml
+rules:
+  - id: validation-before-api
+    pattern: |
+      Must verify that all safe output handler implementations:
+      1. Perform schema validation before API calls
+      2. Enforce limits before API calls
+      3. Sanitize content before API calls
+    check: |
+      grep -A 50 "function.*Handler" actions/setup/js/*.cjs | \
+      grep -B 5 "octokit\." | \
+      grep -q "validateSchema\|enforceLimit\|sanitize"
+    severity: CRITICAL
+```
+
+---
+
+#### **Finding S2: Insufficient Cross-Repository Security Model**
+
+**Location**: Section 3.2, Threat T5 (Cross-Repository Privilege Escalation)
+
+**Issue**: The specification describes `allowed-github-references` and per-type `allowed-repos` but does not specify:
+1. How these interact when both are configured
+2. Whether allowlist matching is exact or supports patterns
+3. How to handle organization-wide or wildcard allowlists securely
+4. What happens when target-repo is specified but not in allowed-repos
+
+**Security Impact**: **HIGH** - Ambiguous cross-repository rules could allow unauthorized repository targeting through configuration misunderstanding.
+
+**Recommendation**:
+```markdown
+Add Cross-Repository Security Model section:
+
+**Cross-Repository Validation Rules**:
+
+When `target-repo` is specified in a safe output operation:
+
+1. **Extract Repository Reference**: Parse `target-repo` as `owner/repo` format
+2. **Validate Format**: MUST match regex `^[a-zA-Z0-9_.-]+/[a-zA-Z0-9_.-]+$`
+3. **Check Global Allowlist**: If `allowed-github-references` is non-empty:
+   - MUST check if `target-repo` matches any entry
+   - Match is EXACT (no wildcards, no pattern matching)
+4. **Check Type-Specific Allowlist**: If safe output type has `allowed-repos`:
+   - MUST check if `target-repo` matches any entry
+   - Type-specific allowlist OVERRIDES global allowlist (does not merge)
+5. **Deny by Default**: If neither allowlist permits the repo, MUST reject operation
+
+**Wildcard Rules**: Wildcards (*, ?) are NOT supported in allowlists. Each repository MUST be explicitly listed.
+
+**Organization-Wide**: To allow all repos in an organization:
+- List repositories explicitly, OR
+- Use GitHub App permissions with organization-level access
+```
+
+**Automated Rule**:
+```yaml
+# cross-repo-validation-check.yml
+rules:
+  - id: cross-repo-allowlist-enforcement
+    pattern: |
+      All handlers with target-repo support must validate against allowlist
+    check: |
+      for handler in create_issue.cjs add_comment.cjs; do
+        grep -q "validateTargetRepo\|checkAllowedRepos" "actions/setup/js/$handler" || exit 1
+      done
+    severity: HIGH
+```
+
+---
+
+#### **Finding S3: Incomplete Content Sanitization Specification**
+
+**Location**: Section 9 (Content Integrity Mechanisms)
+
+**Issue**: While sanitization is mentioned throughout, the specification does not provide:
+1. Exact definition of what "sanitization" entails
+2. List of prohibited patterns or characters
+3. Handling of markdown-specific exploits (e.g., image injection, HTML in markdown)
+4. Whether sanitization is reversible or lossy
+
+**Security Impact**: **MEDIUM** - Inconsistent sanitization implementations could lead to content injection vulnerabilities.
+
+**Recommendation**:
+```markdown
+Add Sanitization Requirements section:
+
+**Content Sanitization Pipeline**:
+
+Implementations MUST apply these sanitization transforms to all user-provided text fields:
+
+1. **Markdown Link Validation**:
+   - Extract all markdown links: `[text](url)`
+   - Validate URL against `allowed-domains` if configured
+   - Redact unauthorized URLs: `[text]([URL redacted: unauthorized domain])`
+
+2. **HTML Tag Stripping** (for fields that don't support HTML):
+   - Remove `<script>`, `<iframe>`, `<object>`, `<embed>` tags
+   - Remove `on*` event handlers in HTML attributes
+   - Preserve safe markdown-supported HTML (e.g., `<details>`, `<summary>`)
+
+3. **Command Injection Prevention**:
+   - Escape shell metacharacters in code blocks
+   - Do NOT execute or interpret code blocks
+
+4. **Image URL Validation**:
+   - Validate image URLs `![alt](url)` against allowed-domains
+   - Redact unauthorized image URLs
+
+5. **Null Byte Handling**:
+   - Remove null bytes (\0) from all strings
+   - Reject inputs containing null bytes
+
+**Non-Sanitized Fields**: The following fields MUST NOT be sanitized (verbatim):
+- Code blocks (```code```)
+- URLs in footers added by the system
+```
+
+**Automated Rule**:
+```yaml
+# sanitization-check.yml
+rules:
+  - id: sanitization-completeness
+    pattern: |
+      All handlers must import and use sanitization functions
+    check: |
+      for handler in actions/setup/js/*.cjs; do
+        [[ "$handler" =~ (test|parse|buffer) ]] && continue
+        grep -q "sanitize\|stripHTML\|validateURL" "$handler" || exit 1
+      done
+    severity: MEDIUM
+```
+
+---
+
+### 1.2 Moderate Security Issues
+
+#### **Finding S4: Unclear Artifact Storage Security Model**
+
+**Location**: Section 3.1, Requirement AR2 (Communication Channel Integrity)
+
+**Issue**: The specification requires artifact storage for communication but does not specify:
+1. Access controls on artifacts (who can download?)
+2. Artifact retention policies (how long do they persist?)
+3. Handling of sensitive data in artifacts
+4. Encryption requirements (at rest? in transit?)
+
+**Security Impact**: **LOW-MEDIUM** - Artifacts might contain sensitive operation details accessible to unauthorized parties.
+
+**Recommendation**:
+```markdown
+Add Artifact Security Requirements:
+
+**Artifact Access Control**:
+- Artifacts MUST be scoped to the workflow run (not accessible across runs)
+- Artifacts MUST require same permissions as workflow run to download
+- Implementations SHOULD delete artifacts after safe output processing completes
+
+**Sensitive Data Handling**:
+- Implementations SHOULD NOT include secrets in artifact data
+- GitHub Actions secret masking SHOULD be verified before artifact upload
+- Implementations MAY encrypt artifact contents (not required but permitted)
+
+**Retention Policy**:
+- Artifacts SHOULD be retained for at least 1 day (for debugging)
+- Artifacts MUST be deleted after 90 days (GitHub Actions limit)
+- Implementations MAY implement shorter retention (e.g., 7 days)
+```
+
+---
+
+#### **Finding S5: Missing Rate Limiting Specification**
+
+**Location**: Section 3.2, Threat T4 (Resource Exhaustion)
+
+**Issue**: While max limits are specified, there's no guidance on:
+1. Rate limiting across multiple workflow runs
+2. Preventing rapid workflow re-triggering
+3. Global resource consumption limits (e.g., total issues per day)
+
+**Security Impact**: **LOW** - Within single runs, max limits are effective. Across runs, potential for abuse remains.
+
+**Recommendation**:
+```markdown
+Add Rate Limiting Guidance:
+
+**Per-Run Limits**: Enforced via `max` configuration (existing)
+
+**Cross-Run Considerations**:
+Implementations MAY implement additional safeguards:
+- Daily operation caps (e.g., max 100 issues per day per workflow)
+- Workflow trigger rate limiting (e.g., max 10 runs per hour)
+- Organization-wide quotas
+
+**NOTE**: Cross-run limits are OPTIONAL and not required for conformance.
+```
+
+---
+
+### 1.3 Security Best Practices
+
+#### **Finding S6: Consider Adding Security Testing Requirements**
+
+**Recommendation**: Add normative security test requirements to conformance section.
+
+```markdown
+Add to Section 2.3:
+
+**Method M5: Adversarial Testing (RECOMMENDED)**
+
+Implementations SHOULD verify security properties under adversarial conditions:
+
+1. **Prompt Injection Scenarios**:
+   - Test with malicious inputs attempting to exceed max limits
+   - Test with inputs containing command injection attempts
+   - Test with inputs attempting cross-repository escalation
+
+2. **Configuration Tampering Attempts**:
+   - Test with modified YAML (should fail hash check)
+   - Test with invalid configuration (should reject)
+
+3. **Boundary Condition Testing**:
+   - Test with exactly `max` operations (should succeed)
+   - Test with `max + 1` operations (should reject)
+   - Test with empty operations (should handle gracefully)
+```
+
+---
+
+## 2. Usability Review Findings
+
+### 2.1 Documentation Clarity Issues
+
+#### **Finding U1: Inconsistent Terminology**
+
+**Issue**: Multiple terms used interchangeably:
+- "Agent" vs "AI Agent" vs "Agent Process"
+- "Safe Output Type" vs "Operation Type" vs "Handler Type"
+- "Validation" vs "Verification" (sometimes overlapping meaning)
+
+**Impact**: **LOW** - Could cause confusion but context usually makes meaning clear.
+
+**Recommendation**: Add terminology section defining canonical terms.
+
+```markdown
+## Terminology
+
+**Agent**: The AI-powered process executing in the untrusted context. Uses read-only GitHub token.
+
+**Safe Output Type**: A category of GitHub operation (e.g., create_issue, add_comment). Each type has a corresponding MCP tool and handler.
+
+**Handler**: JavaScript implementation processing operations of a specific safe output type.
+
+**Validation**: Pre-execution checking of operation structure, limits, and content (schema validation, limit enforcement).
+
+**Verification**: Post-execution checking of configuration integrity (hash checking, signature verification).
+```
+
+---
+
+#### **Finding U2: Missing Error Code Catalog**
+
+**Issue**: Specification mentions errors but doesn't provide:
+1. Complete list of error codes
+2. Error message format requirements
+3. Guidance on when to use each error type
+
+**Impact**: **MEDIUM** - Inconsistent error handling across implementations.
+
+**Recommendation**:
+```markdown
+Add Error Code Catalog section:
+
+## Error Codes
+
+All safe output processors MUST use these error codes:
+
+| Code | Name | Description | HTTP Status |
+|------|------|-------------|-------------|
+| E001 | INVALID_SCHEMA | Operation failed JSON schema validation | 400 |
+| E002 | LIMIT_EXCEEDED | Operation count exceeds configured max | 429 |
+| E003 | UNAUTHORIZED_DOMAIN | URL contains non-allowlisted domain | 403 |
+| E004 | INVALID_TARGET_REPO | target-repo not in allowed-repos | 403 |
+| E005 | MISSING_PARENT | parent issue reference not found | 404 |
+| E006 | INVALID_LABEL | Label does not exist in repository | 404 |
+| E007 | API_ERROR | GitHub API returned error | 502 |
+| E008 | SANITIZATION_FAILED | Content sanitization detected unsafe patterns | 422 |
+
+**Error Message Format**:
+```json
+{
+  "error": {
+    "code": "E002",
+    "message": "Limit exceeded: attempted 5 create_issue operations but max is 3",
+    "details": {
+      "type": "create_issue",
+      "attempted": 5,
+      "max": 3
+    }
+  }
+}
+```
+```
+
+**Automated Rule**:
+```yaml
+# error-code-check.yml
+rules:
+  - id: error-code-usage
+    pattern: |
+      Handlers must use standardized error codes
+    check: |
+      grep -h "throw.*Error\|return.*error" actions/setup/js/*.cjs | \
+      grep -q "E[0-9]{3}" || echo "WARNING: Non-standard error codes found"
+    severity: LOW
+```
+
+---
+
+#### **Finding U3: Limited Configuration Examples**
+
+**Issue**: Section 5 provides syntax but few complete examples showing:
+1. Complex multi-type configurations
+2. Common configuration patterns
+3. Migration from simple to complex configs
+
+**Impact**: **LOW** - Users can figure it out but examples would help.
+
+**Recommendation**: Add configuration patterns appendix.
+
+```markdown
+## Appendix G: Configuration Patterns
+
+### Pattern 1: Simple Issue Tracking
+```yaml
+safe-outputs:
+  create-issue:
+    max: 1
+    labels: [automated]
+```
+
+### Pattern 2: Multi-Type with Global Footer
+```yaml
+safe-outputs:
+  footer: true  # Applied to all types
+  
+  create-issue:
+    max: 3
+    labels: [bug, automated]
+  
+  add-comment:
+    max: 2
+    hide-older-comments: true
+```
+
+### Pattern 3: Cross-Repository Operations
+```yaml
+safe-outputs:
+  allowed-github-references:
+    - owner/repo-a
+    - owner/repo-b
+  
+  create-issue:
+    max: 5
+    target-repo: owner/repo-a
+```
+
+### Pattern 4: Staged Mode Development
+```yaml
+safe-outputs:
+  staged: true  # Enable preview mode globally
+  
+  create-issue:
+    max: 10  # Safe to set high in staged mode
+  
+  add-comment:
+    max: 5
+```
+```
+
+---
+
+### 2.2 Operational Clarity Issues
+
+#### **Finding U4: Unclear Staged Mode Interaction with Other Features**
+
+**Issue**: Section 5.2 (GP2) describes staged mode but doesn't clarify:
+1. Does staged mode respect max limits? (preview all operations or only up to max?)
+2. Are footers added in staged mode previews?
+3. Does sanitization occur in staged mode?
+
+**Impact**: **LOW** - Likely obvious from implementation but could be explicit.
+
+**Recommendation**:
+```markdown
+Add to Section 5.2 (staged mode):
+
+**Staged Mode Feature Interactions**:
+
+When `staged: true`:
+- Max limits: RESPECTED (preview only shows operations up to max limit)
+- Footers: INCLUDED in previews (shows what would be added)
+- Sanitization: APPLIED (previews show sanitized content)
+- Domain filtering: APPLIED (previews show redacted URLs)
+- Cross-repo validation: APPLIED (invalid target-repo rejected even in staged mode)
+
+**Rationale**: Staged mode should accurately preview what would happen, including all validation and transformation steps.
+```
+
+---
+
+#### **Finding U5: Missing Guidance on Temporary ID Patterns**
+
+**Issue**: Temporary IDs are mentioned (e.g., Section 7.1, create_issue) but specification lacks:
+1. Complete list of where temporary IDs can be used
+2. Resolution order when multiple operations reference each other
+3. Error handling when temporary ID references are circular
+
+**Impact**: **LOW-MEDIUM** - Could lead to implementation inconsistencies.
+
+**Recommendation**:
+```markdown
+Add Temporary ID Resolution section:
+
+## Temporary ID Resolution
+
+**Pattern**: `^aw_[A-Za-z0-9]{3,8}$` (e.g., `aw_abc123`)
+
+**Supported Operations**:
+- create_issue (in `body` field, references resolved to issue numbers)
+- add_comment (in `body` field)
+- create_pull_request (in `body` field)
+
+**Resolution Order**:
+1. Operations are processed in NDJSON order
+2. When temporary ID `aw_xxx` is created, map it to actual GitHub resource number
+3. Before processing next operation, replace all `#aw_xxx` references with actual numbers
+4. If reference to undefined temporary ID is encountered, REJECT operation with E005 error
+
+**Circular References**: NOT SUPPORTED. If detected, reject with error.
+
+**Example**:
+```json
+{"type": "create_issue", "title": "Parent", "temporary_id": "aw_p1"}
+{"type": "create_issue", "title": "Child", "body": "Related to #aw_p1", "parent": "aw_p1"}
+```
+
+After processing:
+- First issue created with number #123
+- Second issue body updated to "Related to #123" and linked to parent #123
+```
+
+---
+
+## 3. Requirements Review Findings
+
+### 3.1 RFC 2119 Keyword Usage Issues
+
+#### **Finding R1: Inconsistent MUST Usage**
+
+**Issue**: Some normative requirements use descriptive language instead of RFC 2119 keywords:
+
+Examples:
+- Section 4.2: "Operations flow through the system following this precise sequence" (should be MUST?)
+- Section 5.1: "Global parameters have unreserved names" (MUST?)
+- Section 7.x: Many "When X occurs, Y happens" statements without MUST/SHOULD
+
+**Impact**: **MEDIUM** - Reduces enforceability and automated rule encoding.
+
+**Recommendation**: Audit all sections and convert implicit requirements to explicit MUST/SHOULD statements.
+
+**Automated Rule**:
+```yaml
+# rfc2119-consistency-check.yml
+rules:
+  - id: normative-statement-keywords
+    pattern: |
+      Check that normative statements use RFC 2119 keywords
+    check: |
+      # Find sentences that describe requirements without keywords
+      grep -E "(must|shall|will|is required to|needs to)" \
+        docs/src/content/docs/reference/safe-outputs-specification.md | \
+        grep -v "MUST\|SHALL\|REQUIRED" || true
+    severity: MEDIUM
+    action: |
+      Review findings and convert to RFC 2119 keywords where appropriate
+```
+
+---
+
+#### **Finding R2: Missing SHOULD Justifications**
+
+**Issue**: RFC 2119 states that SHOULD recommendations require rationale, but many SHOULDs lack justification:
+
+Examples:
+- Section 2.3, Method M2: "Security test suite SHOULD include..." (why?)
+- Section 5.2: "Staged mode is RECOMMENDED for..." (why recommended vs required?)
+
+**Impact**: **LOW** - Makes it harder to understand when to deviate from recommendations.
+
+**Recommendation**: Add rationale for all SHOULD statements.
+
+```markdown
+Example:
+**BEFORE**:
+Security test suite SHOULD include prompt injection scenarios.
+
+**AFTER**:
+Security test suite SHOULD include prompt injection scenarios.
+
+*Rationale*: While not strictly required for conformance, prompt injection testing provides critical validation that security mitigations are effective. Implementations without adversarial testing may unknowingly contain vulnerabilities. However, basic conformance can be demonstrated through functional testing alone.
+```
+
+---
+
+### 3.2 Testability Issues
+
+#### **Finding R3: Untestable Requirements**
+
+**Issue**: Some requirements are stated but lack clear test criteria:
+
+Examples:
+- Section 1.3, Principle P1: "Write permissions MUST reside in separate execution contexts" - How to verify "separate"?
+- Section 3.3, Property SP4: "Content MUST undergo sanitization" - How to prove it happened?
+
+**Impact**: **MEDIUM** - Reduces ability to verify conformance.
+
+**Recommendation**: Add verification methods to all requirements.
+
+```markdown
+Example for P1:
+
+**Principle P1: Security Through Architectural Separation**
+
+Write permissions MUST reside in separate execution contexts from AI reasoning.
+
+**Verification Methods**:
+1. **Static Analysis**: Inspect workflow YAML. Agent job MUST NOT declare write permissions.
+2. **Runtime Inspection**: In agent job, `echo $GITHUB_TOKEN` and verify it lacks write scopes using `gh api user --jq .permissions`.
+3. **Process Isolation**: Verify agent and safe output jobs run in different containers/VMs (check `$RUNNER_NAME` differs).
+```
+
+---
+
+#### **Finding R4: Missing Conformance Test Suite**
+
+**Issue**: Section 2.3 mentions conformance testing but notes "A normative conformance test suite is RECOMMENDED for future specification versions but not currently provided."
+
+**Impact**: **MEDIUM** - Without test suite, conformance is subjective.
+
+**Recommendation**: Create conformance test suite as separate deliverable.
+
+**Proposed Structure**:
+```
+tests/conformance/
+├── security/
+│   ├── test_privilege_separation.sh
+│   ├── test_validation_ordering.sh
+│   ├── test_limit_enforcement.sh
+│   └── test_sanitization.sh
+├── functionality/
+│   ├── test_create_issue.sh
+│   ├── test_add_comment.sh
+│   ├── test_staged_mode.sh
+│   └── test_cross_repo.sh
+├── configuration/
+│   ├── test_config_parsing.sh
+│   ├── test_config_validation.sh
+│   └── test_config_inheritance.sh
+└── protocol/
+    ├── test_mcp_http.sh
+    ├── test_ndjson_format.sh
+    └── test_error_responses.sh
+```
+
+Each test should:
+1. State requirement being tested (with spec section reference)
+2. Provide test setup instructions
+3. Define pass/fail criteria
+4. Include expected output examples
+
+---
+
+### 3.3 Completeness Issues
+
+#### **Finding R5: Missing Safe Output Type Requirements**
+
+**Issue**: Section 7 documents 36 safe output types, but not all have complete specifications. Some are missing:
+- **Required Permissions** documentation
+- **Operational Semantics** details
+- **Configuration Parameters** lists
+- **Security Requirements**
+
+**Impact**: **MEDIUM** - Incomplete type definitions reduce implementation consistency.
+
+**Recommendation**: Audit all 36 types and ensure each has:
+1. ✅ MCP Tool Schema
+2. ✅ Operational Semantics (5+ bullets)
+3. ✅ Configuration Parameters (complete list)
+4. ✅ Security Requirements (at least 2)
+5. ✅ Required Permissions (both token and app)
+6. ✅ Example usage
+
+**Automated Rule**:
+```yaml
+# safe-output-type-completeness-check.yml
+rules:
+  - id: type-definition-completeness
+    pattern: |
+      Each safe output type must have complete documentation
+    check: |
+      # Extract all "#### Type: " headers
+      grep "^#### Type:" docs/src/content/docs/reference/safe-outputs-specification.md | \
+      while read -r line; do
+        type_name=$(echo "$line" | sed 's/^#### Type: //')
+        
+        # Check for required sections
+        has_schema=$(grep -A 100 "^#### Type: $type_name" spec.md | grep -q "MCP Tool Schema" && echo "yes" || echo "no")
+        has_semantics=$(grep -A 100 "^#### Type: $type_name" spec.md | grep -q "Operational Semantics" && echo "yes" || echo "no")
+        has_config=$(grep -A 100 "^#### Type: $type_name" spec.md | grep -q "Configuration Parameters" && echo "yes" || echo "no")
+        has_security=$(grep -A 100 "^#### Type: $type_name" spec.md | grep -q "Security Requirements" && echo "yes" || echo "no")
+        has_permissions=$(grep -A 100 "^#### Type: $type_name" spec.md | grep -q "Required Permissions" && echo "yes" || echo "no")
+        
+        if [[ "$has_schema" != "yes" ]] || [[ "$has_semantics" != "yes" ]] || \
+           [[ "$has_config" != "yes" ]] || [[ "$has_security" != "yes" ]] || \
+           [[ "$has_permissions" != "yes" ]]; then
+          echo "INCOMPLETE: $type_name (schema=$has_schema, semantics=$has_semantics, config=$has_config, security=$has_security, permissions=$has_permissions)"
+        fi
+      done
+    severity: MEDIUM
+```
+
+---
+
+#### **Finding R6: Undefined Behavior for Edge Cases**
+
+**Issue**: Specification doesn't address several edge cases:
+1. What happens when max=0? (disable type entirely? or error?)
+2. What happens when empty NDJSON artifact is uploaded?
+3. What happens when GitHub API rate limit is exceeded?
+4. What happens when workflow is cancelled mid-execution?
+
+**Impact**: **LOW** - Edge cases are rare but should be specified.
+
+**Recommendation**: Add edge case handling section.
+
+```markdown
+## Edge Case Behavior
+
+### Empty Operations
+When NDJSON artifact contains zero operations:
+- Safe output job MUST succeed (not an error)
+- Job summary SHOULD display "No operations to process"
+
+### Zero Max Limit
+When `max: 0` is configured for a type:
+- Type is DISABLED (MCP tool not registered)
+- Attempts to invoke disabled type MUST return error
+
+### API Rate Limiting
+When GitHub API rate limit is exceeded:
+- Safe output processor MUST retry with exponential backoff
+- After 3 retries, MUST fail with clear error message
+- MUST log rate limit reset time
+
+### Workflow Cancellation
+When workflow is cancelled during agent execution:
+- Safe output job MUST NOT execute
+- Partial NDJSON artifact MUST NOT be processed
+- No GitHub resources MUST be created
+
+### Concurrent Workflow Runs
+When multiple runs execute concurrently:
+- Each run operates independently
+- Max limits are per-run, not global
+- No coordination between runs (no distributed locking)
+```
+
+---
+
+## 4. Automated Checker Rules
+
+Based on the findings above, here is a comprehensive set of automated rules that can be encoded:
+
+### 4.1 Security Rules
+
+```yaml
+# security-checks.yml
+rules:
+  - id: SEC-001
+    name: Privilege Separation Enforcement
+    description: Verify agent jobs lack write permissions
+    check: |
+      grep -A 50 "jobs:" compiled-workflow.lock.yml | \
+      grep -A 10 "agent:" | \
+      grep -E "issues:.*write|pull-requests:.*write" && exit 1 || exit 0
+    severity: CRITICAL
+    remediation: Remove write permissions from agent job
+
+  - id: SEC-002
+    name: Validation Before API Calls
+    description: Ensure validation happens before GitHub API calls
+    check: |
+      for handler in actions/setup/js/*.cjs; do
+        [[ "$handler" =~ test ]] && continue
+        # Check that validation functions appear before API calls
+        awk '/octokit\./{line=NR} /validate|sanitize/{if(NR<line) valid=1} END{exit !valid}' "$handler" || {
+          echo "FAIL: $handler has API calls before validation"
+          exit 1
+        }
+      done
+    severity: CRITICAL
+    remediation: Reorder code to validate before API calls
+
+  - id: SEC-003
+    name: Max Limit Enforcement
+    description: Verify max limits are checked before processing
+    check: |
+      for handler in actions/setup/js/*.cjs; do
+        [[ "$handler" =~ test ]] && continue
+        grep -q "\.length.*>.*\.max\|enforceMaxLimit\|checkLimit" "$handler" || {
+          echo "WARN: $handler may not enforce max limits"
+        }
+      done
+    severity: HIGH
+    remediation: Add max limit check before processing operations
+
+  - id: SEC-004
+    name: Content Sanitization Required
+    description: Verify content sanitization is applied
+    check: |
+      for handler in actions/setup/js/*.cjs; do
+        [[ "$handler" =~ (test|parse|buffer) ]] && continue
+        grep -q "sanitize\|stripHTML\|escapeMarkdown" "$handler" || {
+          echo "WARN: $handler may not sanitize content"
+        }
+      done
+    severity: MEDIUM
+    remediation: Import and use sanitization functions
+
+  - id: SEC-005
+    name: Cross-Repository Validation
+    description: Verify cross-repo operations check allowlist
+    check: |
+      for handler in actions/setup/js/*.cjs; do
+        grep -q "target.*repo\|targetRepo" "$handler" && {
+          grep -q "allowed.*repos\|validateTargetRepo" "$handler" || {
+            echo "FAIL: $handler has target-repo but no allowlist check"
+            exit 1
+          }
+        }
+      done
+    severity: HIGH
+    remediation: Add allowlist validation for cross-repo operations
+```
+
+### 4.2 Usability Rules
+
+```yaml
+# usability-checks.yml
+rules:
+  - id: USE-001
+    name: Error Code Standardization
+    description: Verify handlers use standardized error codes
+    check: |
+      for handler in actions/setup/js/*.cjs; do
+        [[ "$handler" =~ test ]] && continue
+        # Check for error codes in E### format
+        grep -E "throw.*Error|return.*error" "$handler" | \
+        grep -qE "E[0-9]{3}" || {
+          echo "WARN: $handler may not use standard error codes"
+        }
+      done
+    severity: LOW
+    remediation: Use standard E### error codes
+
+  - id: USE-002
+    name: Footer Attribution Required
+    description: Verify footers are added when footer:true
+    check: |
+      for handler in actions/setup/js/*.cjs; do
+        grep -q "footer.*true\|addFooter\|attribution" "$handler" || {
+          echo "INFO: $handler may not add footers"
+        }
+      done
+    severity: LOW
+    remediation: Ensure footer is appended when configured
+
+  - id: USE-003
+    name: Staged Mode Preview Format
+    description: Verify staged mode uses correct preview format
+    check: |
+      for handler in actions/setup/js/*.cjs; do
+        grep -q "staged.*true\|isStaged" "$handler" && {
+          grep -q "🎭\|Staged Mode" "$handler" || {
+            echo "WARN: $handler has staged mode but no 🎭 emoji"
+          }
+        }
+      done
+    severity: LOW
+    remediation: Use 🎭 emoji in staged mode previews
+```
+
+### 4.3 Requirements Rules
+
+```yaml
+# requirements-checks.yml
+rules:
+  - id: REQ-001
+    name: RFC 2119 Keyword Usage
+    description: Verify normative sections use MUST/SHOULD/MAY
+    check: |
+      # Check sections that should have normative statements
+      for section in "Security Architecture" "Configuration Semantics" "Execution Guarantees"; do
+        grep -A 200 "## .*$section" docs/src/content/docs/reference/safe-outputs-specification.md | \
+        grep -q "MUST\|SHALL\|SHOULD\|MAY" || {
+          echo "WARN: Section '$section' may lack RFC 2119 keywords"
+        }
+      done
+    severity: MEDIUM
+    remediation: Add MUST/SHOULD/MAY keywords to normative statements
+
+  - id: REQ-002
+    name: Safe Output Type Completeness
+    description: Verify each type has all required documentation sections
+    check: |
+      # Extract type names and check for required sections
+      grep "^#### Type:" docs/src/content/docs/reference/safe-outputs-specification.md | \
+      sed 's/^#### Type: //' | while read -r type; do
+        sections_found=0
+        for section in "MCP Tool Schema" "Operational Semantics" "Configuration Parameters" "Security Requirements" "Required Permissions"; do
+          grep -A 150 "^#### Type: $type" docs/src/content/docs/reference/safe-outputs-specification.md | \
+          grep -q "**$section**" && ((sections_found++))
+        done
+        if [ $sections_found -lt 5 ]; then
+          echo "INCOMPLETE: Type '$type' has only $sections_found/5 required sections"
+        fi
+      done
+    severity: MEDIUM
+    remediation: Complete documentation for all safe output types
+
+  - id: REQ-003
+    name: Verification Method Specification
+    description: Verify requirements include verification methods
+    check: |
+      # Check that key requirements have verification guidance
+      for req in "AR1" "AR2" "AR3" "SP1" "SP2" "SP3"; do
+        grep -A 30 "**Requirement $req:\|**Property $req:" docs/src/content/docs/reference/safe-outputs-specification.md | \
+        grep -q "Verification:\|Formal Definition:" || {
+          echo "WARN: Requirement $req may lack verification method"
+        }
+      done
+    severity: LOW
+    remediation: Add verification methods to all key requirements
+```
+
+### 4.4 Implementation Rules
+
+```yaml
+# implementation-checks.yml
+rules:
+  - id: IMP-001
+    name: Handler Registration Completeness
+    description: Verify all configured types have handlers
+    check: |
+      # Extract safe output types from config
+      configured_types=$(jq -r '.tools | keys[]' /opt/gh-aw/safeoutputs/config.json 2>/dev/null || echo "")
+      
+      # Check each has handler file
+      for type in $configured_types; do
+        handler_file="actions/setup/js/$(echo $type | tr '_' '-').cjs"
+        [ ! -f "$handler_file" ] && {
+          echo "MISSING: Handler file $handler_file for type $type"
+          exit 1
+        }
+      done
+    severity: HIGH
+    remediation: Create handler file for configured type
+
+  - id: IMP-002
+    name: Permission Computation Accuracy
+    description: Verify computed permissions match type requirements
+    check: |
+      # Compare permissions in workflow vs required permissions
+      go test -run TestPermissionsComputation ./pkg/workflow/ || {
+        echo "FAIL: Permission computation tests failed"
+        exit 1
+      }
+    severity: HIGH
+    remediation: Fix permission computation logic
+
+  - id: IMP-003
+    name: Schema Validation Consistency
+    description: Verify schemas match specification
+    check: |
+      # For each type, compare generated schema to spec
+      for type in create_issue add_comment; do
+        # Extract schema from spec
+        spec_required=$(grep -A 20 "\"$type\"" docs/src/content/docs/reference/safe-outputs-specification.md | \
+                       grep "\"required\":" | head -1)
+        
+        # Compare to generated config
+        config_required=$(jq -r ".tools.${type}.inputSchema.required" /opt/gh-aw/safeoutputs/config.json 2>/dev/null)
+        
+        # Simplified check (actual implementation would be more thorough)
+        [ -z "$config_required" ] && echo "WARN: Type $type schema may not match spec"
+      done
+    severity: MEDIUM
+    remediation: Update schema generation to match specification
+```
+
+---
+
+## 5. Recommendations Summary
+
+### 5.1 High Priority (Must Address)
+
+1. **[S2] Clarify Cross-Repository Security Model** - Add explicit validation rules and precedence order for allowlists
+2. **[R1] Audit and Fix RFC 2119 Keyword Usage** - Ensure all normative statements use MUST/SHOULD/MAY
+3. **[R3] Add Verification Methods to Requirements** - Make all requirements testable
+4. **[R4] Create Conformance Test Suite** - Provide normative tests for implementers
+
+### 5.2 Medium Priority (Should Address)
+
+1. **[S1] Specify Validation Pipeline Ordering** - Eliminate ambiguity in validation sequence
+2. **[S3] Complete Content Sanitization Specification** - Define exact sanitization rules
+3. **[U2] Add Error Code Catalog** - Standardize error codes and messages
+4. **[R5] Complete Safe Output Type Documentation** - Ensure all 36 types have full specifications
+5. **[R6] Define Edge Case Behavior** - Specify handling of unusual conditions
+
+### 5.3 Low Priority (Nice to Have)
+
+1. **[U1] Standardize Terminology** - Add glossary for consistency
+2. **[U3] Add Configuration Examples** - Provide more real-world examples
+3. **[U4] Clarify Staged Mode Interactions** - Document how staged mode affects other features
+4. **[U5] Add Temporary ID Resolution Guidance** - Complete specification of temporary ID handling
+5. **[S5] Add Rate Limiting Guidance** - Provide optional cross-run limit recommendations
+
+---
+
+## 6. Implementation Roadmap
+
+### Phase 1: Critical Security Clarifications (Week 1-2)
+- [ ] Add validation pipeline ordering specification (S1)
+- [ ] Complete cross-repository security model (S2)
+- [ ] Add content sanitization requirements (S3)
+- [ ] Implement automated security checks (SEC-001 through SEC-005)
+
+### Phase 2: Requirements and Testability (Week 3-4)
+- [ ] Audit and fix RFC 2119 keyword usage (R1)
+- [ ] Add verification methods to all requirements (R3)
+- [ ] Begin conformance test suite development (R4)
+- [ ] Implement requirements checks (REQ-001 through REQ-003)
+
+### Phase 3: Usability Improvements (Week 5-6)
+- [ ] Add error code catalog (U2)
+- [ ] Create configuration examples appendix (U3)
+- [ ] Add terminology section (U1)
+- [ ] Document staged mode interactions (U4)
+- [ ] Implement usability checks (USE-001 through USE-003)
+
+### Phase 4: Completeness (Week 7-8)
+- [ ] Complete all 36 safe output type specifications (R5)
+- [ ] Add edge case behavior definitions (R6)
+- [ ] Add temporary ID resolution section (U5)
+- [ ] Complete conformance test suite (R4)
+- [ ] Implement implementation checks (IMP-001 through IMP-003)
+
+---
+
+## 7. Conclusion
+
+The Safe Outputs MCP Gateway Specification provides a strong foundation for secure AI-to-GitHub operations with clear architectural separation and comprehensive threat mitigation strategies. The specification follows W3C conventions and uses RFC 2119 keywords appropriately in most sections.
+
+**Key achievements**:
+- ✅ Well-defined security architecture
+- ✅ Comprehensive threat model
+- ✅ Detailed safe output type catalog
+- ✅ Clear permission requirements
+
+**Areas for improvement**:
+- Clarify ambiguous security requirements (validation ordering, cross-repo rules)
+- Enhance testability with verification methods
+- Improve usability through standardized errors and examples
+- Complete documentation for all safe output types
+
+**Next steps**:
+1. Prioritize and address high-priority findings
+2. Develop conformance test suite
+3. Implement automated checker rules
+4. Iterate on specification based on implementation feedback
+
+---
+
+**Document Version**: 1.0  
+**Last Updated**: 2026-02-14  
+**Status**: Initial Review Complete