RyderFreeman4Logos · Feb 9, 2026
diff --git a/skills/csa-review/SKILL.md b/skills/csa-review/SKILL.md
@@ -333,6 +333,27 @@ User: /csa-review tool=opencode scope=base:dev
 ```
 -> Uses opencode instead of auto-detected tool
 
+## Disagreement Escalation (when findings are contested)
+
+When the developer (or orchestrating agent) disagrees with a csa-review finding:
+
+1. **NEVER silently dismiss findings.** Every finding was produced by an independent
+   model with evidence — it deserves adversarial evaluation, not unilateral dismissal.
+
+2. **Use the `debate` skill** to arbitrate contested findings:
+   - The finding becomes the "question" for debate
+   - The reviewer's evidence is the initial proposal
+   - The developer's counter-argument is the critique
+   - The debate MUST use heterogeneous models (different from both the reviewer and developer)
+
+3. **Record the outcome**: If a finding is dismissed after debate, document the
+   debate verdict (with model specs) in the review report or PR comment.
+
+4. **Escalate to the user** if the debate reaches deadlock (both sides have valid points).
+
+**FORBIDDEN**: Dismissing a csa-review finding without adversarial arbitration.
+The code author's confidence alone is NOT sufficient justification.
+
 ## Done Criteria
 
 1. Review prompt was sent to CSA with the correct tool.
@@ -344,3 +365,4 @@ User: /csa-review tool=opencode scope=base:dev
 7. If security_mode required pass 3, adversarial_pass_executed=true.
 8. If mode=review-and-fix, fix artifacts exist and session was resumed (not new).
 9. CSA session ID was reported for potential follow-up.
+10. **If any finding was contested**: the debate skill was used with heterogeneous models, and the outcome was documented with model specs.
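Review note: the escalation rules added to `csa-review` above can be read as a small decision function. The sketch below is illustrative only and is not part of the skill. The names `Verdict` and `next_action` and the returned action strings are hypothetical, and mapping a deadlock to the `ESCALATED` verdict is an assumption based on the verdict labels used in the debate template.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    """Verdict labels taken from the debate PR-comment template."""
    DISMISSED = "DISMISSED"
    CONFIRMED = "CONFIRMED"
    ESCALATED = "ESCALATED"  # assumed to represent a deadlock


def next_action(contested: bool, verdict: Optional[Verdict]) -> str:
    """Return the next step for a single csa-review finding (hypothetical helper)."""
    if not contested:
        # Uncontested findings never enter the debate path.
        return "address finding as reported"
    if verdict is None:
        # Rule 1 and the FORBIDDEN clause: no dismissal without arbitration.
        raise ValueError("contested finding has no debate verdict")
    if verdict is Verdict.ESCALATED:
        # Rule 4: a deadlocked debate goes to the user.
        return "escalate to user"
    if verdict is Verdict.DISMISSED:
        # Rule 3: a dismissal must be recorded together with model specs.
        return "document dismissal with model specs"
    return "fix finding"
```

For example, `next_action(True, None)` raises rather than returning a dismissal, which mirrors the rule that the author's confidence alone is not sufficient.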
diff --git a/skills/debate/SKILL.md b/skills/debate/SKILL.md
@@ -199,6 +199,62 @@ After the debate concludes (convergence or max rounds/escalations reached), YOU
 - Models used: {model_list}
 ```
 
+## Audit Trail Requirements (MANDATORY)
+
+Every debate result MUST include:
+1. **Full model specs** for ALL participants in `tool/provider/model/thinking_budget` format
+2. **Round-by-round transcript** (at minimum: position summaries per round)
+3. **Final verdict** with which side prevailed and rationale
+4. **Escalation history** if tier escalation occurred
+
+**Why**: Debate results are used as evidence in code review arbitration (pr-codex-bot),
+security audits, and design decisions. Without model specs, future reviewers cannot
+assess the quality or heterogeneity of the arbitration.
+
+## PR Integration (when used for code review arbitration)
+
+When the debate skill is invoked from `pr-codex-bot` Step 8 (false positive arbitration)
+or any code review context where results will be posted to a PR:
+
+### MANDATORY: Post Results to PR
+
+The debate result MUST be posted as a PR comment for audit trail. The caller
+(typically pr-codex-bot) is responsible for posting, but the debate output MUST
+include all information needed:
+
+1. **Participants section** with full model specs (both sides)
+2. **Bot's original concern** (what was being debated)
+3. **Round-by-round summary** (not full transcript — keep PR comments readable)
+4. **Conclusion** with verdict (DISMISSED / CONFIRMED / ESCALATED)
+5. **CSA session ID** (if applicable, for full transcript retrieval)
+
+### Template for PR Comment
+
+```markdown
+**Local arbitration result: [DISMISSED|CONFIRMED|ESCALATED].**
+
+## Participants
+- **Author**: `{tool}/{provider}/{model}/{thinking_budget}`
+- **Arbiter**: `{tool}/{provider}/{model}/{thinking_budget}`
+
+## Debate Summary
+### Round 1
+- **Proposer** (`{model}`): [position summary]
+- **Critic** (`{model}`): [counter-argument summary]
+### Round N...
+
+## Conclusion
+[verdict, rationale, which side prevailed]
+
+## Audit
+- Rounds: {N}, Escalations: {N}
+- CSA session: `{session_id}`
+```
+
+**FORBIDDEN**: Posting a debate result without model specs. If model specs cannot be
+determined (e.g., CSA returned no metadata), report this explicitly in the comment
+rather than omitting the section.
+
 ## Constraints
 
 - **No hardcoded models**: All models come from `csa tiers list`.
@@ -241,3 +297,5 @@ Debate flow:
 4. No hardcoded model names in any invocation.
 5. Zero direct tool invocations (all through `csa run`).
 6. If any CSA command failed, debate was stopped and error reported.
+7. **All participant model specs listed in `tool/provider/model/thinking_budget` format** (audit trail).
+8. **If used for PR arbitration**: debate result posted to PR comment with full model specs and round summaries (see PR Integration section above).
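Review note: the `tool/provider/model/thinking_budget` format required above could be checked mechanically before a comment is posted. The following is a minimal sketch under stated assumptions, not the skill's implementation: it assumes a spec is exactly four non-empty, slash-separated fields, and the names `ModelSpec`, `parse_model_spec`, and `participant_line` are hypothetical. The spec values in the usage note are placeholders, not real tier entries.

```python
from typing import NamedTuple, Optional


class ModelSpec(NamedTuple):
    tool: str
    provider: str
    model: str
    thinking_budget: str


def parse_model_spec(raw: str) -> ModelSpec:
    """Split a spec string into its four fields, rejecting malformed input."""
    parts = raw.strip().split("/")
    if len(parts) != 4 or any(not part for part in parts):
        raise ValueError(
            f"expected tool/provider/model/thinking_budget, got {raw!r}"
        )
    return ModelSpec(*parts)


def participant_line(role: str, raw: Optional[str]) -> str:
    """Render one bullet of the Participants section of the PR comment."""
    if raw is None:
        # The skill requires reporting missing specs explicitly
        # instead of omitting the section.
        return f"- **{role}**: model spec unavailable (no metadata returned)"
    spec = parse_model_spec(raw)
    return f"- **{role}**: `{'/'.join(spec)}`"
```

Calling `participant_line("Arbiter", "tool-a/provider-a/model-a/high")` yields a bullet in the same shape as the template's Participants section, while passing `None` produces an explicit "unavailable" line.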
diff --git a/skills/pr-codex-bot/SKILL.md b/skills/pr-codex-bot/SKILL.md
@@ -52,6 +52,22 @@ Step 7: Evaluate each comment
                              Step 13: Merge (remote + local) ✅ DONE
 ```
 
+## FORBIDDEN Actions (VIOLATION = SOP breach)
+
+- **NEVER dismiss a bot comment as "false positive" using your own reasoning alone** — you are the code author; your judgment is inherently biased
+- **NEVER reply to a bot comment without completing Step 8 (local arbitration)**; if the false positive really is "obvious", the arbiter will confirm it quickly
+- **NEVER skip the debate step for any reason** — "too simple", "clearly wrong", "design disagreement" are NOT valid excuses
+- **NEVER post a dismissal comment without full model specs** (`tool/provider/model/thinking_budget`) for both debate participants
+- **NEVER use an arbiter from the same model family** as you (the main agent)
+
+**If you believe a bot comment is wrong, you MUST:**
+1. Queue it for Step 8 (local arbitration); do NOT reply directly
+2. Get an independent verdict from a heterogeneous model via CSA
+3. If the arbiter disagrees with you, debate adversarially (Step 8.3b)
+4. Post the full audit trail (with model specs for BOTH sides) as a PR comment
+
+**Any self-dismissal without arbitration is an SOP VIOLATION that undermines the entire review process. The point of heterogeneous review is that no single model — including you — gets to be judge of its own code.**
+
 ## Parameters
 
 Extract from user message or PR context:
@@ -203,6 +219,14 @@ The cloud bot has limited context and cannot execute commands to verify its
 claims. Do not debate with it directly — instead, get an independent local
 second opinion first.
 
+**SOP VIOLATION WARNING**: You MUST NOT skip Step 8 for ANY Category B comment.
+Even if you are "99% sure" it is a false positive, you MUST get an independent
+heterogeneous model verdict. Your confidence as the code author is irrelevant —
+the entire point of this process is that **no single model judges its own code**.
+
+Replying directly with your own reasoning (e.g., "This is a design choice,
+dismissing.") without completing Step 8 is a **FORBIDDEN action** (see above).
+
 ### Category C: Real Issue
 
 The bot found a genuine bug or improvement.
@@ -288,18 +312,27 @@ Arbiter says...
 
 ### Step 8.3a: Arbiter Confirms False Positive
 
-React 👎 on PR and post the arbitration result as audit trail
+React 👎 on PR and post the arbitration result as audit trail with **full model specs**
 (**do NOT `@codex`**):
 
 ```bash
 gh api "repos/${REPO}/pulls/comments/${COMMENT_ID}/reactions" \
   -X POST -f content='-1'
 gh api "repos/${REPO}/pulls/${PR_NUM}/comments" \
   -X POST \
-  -f body="**Dismissed after local arbitration.** [summary of arbiter reasoning]. [cite file:line evidence]." \
+  -f body="**Dismissed after local arbitration.**
+
+**Participants:**
+- Author: \`{your_tool}/{your_provider}/{your_model}/{your_thinking_budget}\`
+- Arbiter: \`{arbiter_tool}/{arbiter_provider}/{arbiter_model}/{arbiter_thinking_budget}\`
+
+**Reasoning:** [summary of arbiter reasoning]. [cite file:line evidence]." \
   -F in_reply_to=${COMMENT_ID}
 ```
 
+**MANDATORY**: Model specs MUST use the `tool/provider/model/thinking_budget` format
+(matching CSA tiers). This enables future reviewers to verify heterogeneous arbitration.
+
 ### Step 8.3b: Arbiter Confirms Real Issue or Uncertain → YOU Debate
 
 **CRITICAL: YOU (the main agent / code author) MUST debate with the arbiter.**
@@ -340,33 +373,46 @@ multi-round adversarial debate with automatic tier escalation.
 | Arbiter convinced you | React 👍 + queue for Step 9 (fix) |
 | Deadlock (each side has valid points) | **Escalate to user** |
 
-Post the full debate summary as a PR comment for audit trail
+Post the full debate summary as a PR comment for audit trail with **full model specs**
 (**do NOT `@codex`**):
 
 ```bash
 gh api "repos/${REPO}/pulls/${PR_NUM}/comments" \
   -X POST \
   -f body="**Local arbitration result: [DISMISSED|CONFIRMED|ESCALATED].**
 
+## Participants (MANDATORY for auditability)
+- **Author**: \`{your_tool}/{your_provider}/{your_model}/{your_thinking_budget}\`
+- **Arbiter**: \`{arbiter_tool}/{arbiter_provider}/{arbiter_model}/{arbiter_thinking_budget}\`
+
 ## Bot's concern
 [bot comment summary]
 
 ## Arbiter's independent assessment
 [arbiter's initial verdict and reasoning]
 
-## Debate (YOU vs Arbiter)
+## Debate (Author vs Arbiter)
 ### Round 1
-- **Author (Claude)**: [your counter-argument]
-- **Arbiter (CSA)**: [arbiter's response]
+- **Author** (\`{your_model}\`): [your counter-argument]
+- **Arbiter** (\`{arbiter_model}\`): [arbiter's response]
 ### Round 2 (if needed)
-- **Author (Claude)**: [your rebuttal]
-- **Arbiter (CSA)**: [arbiter's response]
+- **Author** (\`{your_model}\`): [your rebuttal]
+- **Arbiter** (\`{arbiter_model}\`): [arbiter's response]
 
 ## Conclusion
-[final verdict, which side prevailed, and rationale]" \
+[final verdict, which side prevailed, and rationale]
+
+## Audit
+- Debate rounds: {N}
+- CSA session: \`{session_id}\` (if applicable)
+- Debate skill used: [yes/no — if complex, the \`debate\` skill provides structured multi-round debate]" \
   -F in_reply_to=${COMMENT_ID}
 ```
 
+**MANDATORY**: Both model specs MUST use the `tool/provider/model/thinking_budget` format.
+The audit section enables future reviewers (human or AI) to verify that heterogeneous
+models were used and assess the quality of the arbitration.
+
 ### `@codex` Tagging Rules
 
 - **NEVER `@codex` in false positive replies** — the bot ignores threaded