feat(review): add code-review-pipeline with parallel specialist agents#432
feat(review): add code-review-pipeline with parallel specialist agents#432guyathomas wants to merge 12 commits intoobra:mainfrom
Conversation
Comprehensive improvements based on skill grading report. Score Improvements: - PDA: 22/30 -> 28/30 (+6) - Ease of Use: 20/25 -> 24/25 (+4) - Spec Compliance: 11/15 -> 13/15 (+2) - Writing Style: 8/10 -> 10/10 (+2) - Utility: 14/20 -> 19/20 (+5) - Modifiers: -4 -> +3 (+7) - TOTAL: 71/100 -> 97/100 (+26) Changes Made: - Added trigger phrases to description - Restructured workflow into 17 numbered steps across 5 phases - Fixed second-person voice to imperative form - Removed redundancy between Overview and Key Principles - Created references/design-template.md - Created references/example-session.md
New skill for comprehensive research using: - State machine: INIT → DECOMPOSE → RESEARCH → EVALUATE → SYNTHESIZE - 8-question decomposition framework covering all research angles - Parallel subagent dispatch (max 3 per batch) - Source tiering (Tier 1: .gov/.edu, Tier 2: publications, Tier 3: blogs) - Confidence-based iteration with explicit exit criteria - MCP tool fallback (firecrawl → WebFetch) - State recovery for interrupted sessions 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- test-reviewer: Independent subagent for catching AI "cheat tests" - Pure XML structure with 5-category checklist - Detects tests that verify implementation instead of behavior - Catches tautological mocks, over-mocking, coverage gaming - Uses fresh context to avoid implementer bias - test-planning: Minimal skill for test strategy decisions - Granularity guidance (unit/integration/e2e) - Testing pyramid reference - Integrates with TDD skill and test-reviewer subagent 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Bumps version to 4.0.4. UX/UI design skill (skills/ux-ui-design/): - Router pattern with 5 workflows: ideate, design, review, implement, iterate - 8 reference files: accessibility, design-systems, usability-heuristics, visual-design, user-research, patterns, anti-patterns, tools - Figma MCP integration for design context extraction - WCAG AA accessibility requirements built into workflows - Design system token compliance enforcement - Disambiguation to prevent confusion with architecture design /research slash command (commands/research.md): - Thin wrapper exposing superpowers:research skill - No code duplication 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add research-stop-hook.sh that blocks exit until targetSources is met - Add /research-cancel command for graceful abort with progress report - Migrate research skill to pure XML structure - Add resource tracking: sourcesGathered, subagentCalls, findingsCount, startTime - Update EVALUATE phase with two-stage decision (source gate + quality gate) - Add resource usage reporting on completion/cancel - Update README with research skill documentation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add PostToolUse hooks for Skill and Task tools to log usage - Create skill-analytics.sh to append events to JSONL file - Create render-skill-tree.sh to display ASCII call tree at session end - Add /skill-stats slash command for aggregate frequency and flow analysis - Bump version to 4.0.6 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Delete commands/research-cancel.md slash command - Increment version from 4.0.6 to 4.0.7 in plugin.json
Skill orchestrates parallel code review via specialist agents: - implementation-reviewer, test-reviewer, architecture-reviewer, tech-practices-reviewer, ui-reviewer - review-nudge stop hook nudges review after code changes - pre-commit-quality-gate blocks commits on lint/typecheck failure
📝 WalkthroughWalkthroughVersion bump to 4.1.1 introduces research workflow, specialized reviewer agents (architecture, implementation, tech-practices, test, UI), a code-review-pipeline skill, planning capability, and pre-commit quality gates. Concurrent removal of bundled procedural skills (brainstorming, executing-plans, requesting-code-review, writing-plans) and extensive documentation consolidation, while extending hook infrastructure and agent specifications. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes The diff involves substantial architectural changes across multiple categories: new research workflow with complex state management and gating logic, five new agent specifications with distinct review frameworks, refactored hook infrastructure with external utilities, removal of nine bundled skills plus extensive documentation (~3000+ lines deleted), and coordinated test suite updates. While many deletions are straightforward, the research skill's state machine complexity (INIT→DECOMPOSE→RESEARCH→EVALUATE→SYNTHESIZE with source gating), hook script logic (pre-commit checks, research progress tracking), and multiple agent output schemas require careful verification of correctness and integration points. The heterogeneity of changes—combining feature additions with consolidation and removals—demands separate reasoning for each cohort. Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 13
🤖 Fix all issues with AI agents
In `@agents/tech-practices-reviewer.md`:
- Line 43: Update the severity enum usage so all reviewer agents and the
pipeline agree: either add "critical" to the severity field in reviewer configs
(e.g., update "severity": "high|medium|low" in tech-practices-reviewer.md,
architecture-reviewer.md, test-reviewer.md to "critical|high|medium|low") and
ensure code that emits findings (implementation-reviewer, ui-reviewer, etc.)
uses the same enum, or remove the Critical tier from the code-review-pipeline
SKILL.md and any grouping logic; choose one approach and apply it consistently
across the unique symbols: severity fields in tech-practices-reviewer.md,
architecture-reviewer.md, test-reviewer.md, implementation-reviewer.md,
ui-reviewer.md and the grouping logic in code-review-pipeline SKILL.md so all
files share the same set of severity values.
In `@hooks/pre-commit-quality-gate.sh`:
- Around line 140-148: The script currently uses set -e which makes run_lint or
run_typecheck abort the script immediately on non-zero exit, preventing both
checks from running and preventing the intended final exit code handling; modify
the calls to run_lint and run_typecheck so their failures are suppressed and
captured (e.g. run them with a construct that prevents immediate exit and
captures their exit status) and append any error output or status into the
existing errors variable, then let the subsequent if [[ -n "$errors" ]] block
decide to echo and exit 2; specifically update the invocations of run_lint and
run_typecheck and their handling so they do not trigger set -e but instead
record failures into errors.
- Around line 61-71: The current logic collects staged_files and pipes them
through plain xargs which breaks on filenames with spaces and also misses
ESLint's TypeScript flat config; change the git diff invocation to produce
NUL-delimited output (use git ... -z --name-only --diff-filter=ACMR) and consume
it with xargs -0 when calling eslint/ruff/pyright so filenames with spaces are
preserved; also include eslint.config.ts in the config file existence checks
(alongside .eslintrc, .eslintrc.js, .eslintrc.json, eslint.config.js,
eslint.config.mjs) and apply the same -z / xargs -0 pattern to the ruff and
pyright sections (refer to the staged_files variable and the eslint/ruff/pyright
command invocations and output handling).
In `@hooks/research-stop-hook.sh`:
- Around line 12-13: The script currently assigns HOOK_INPUT=$(cat) but never
uses HOOK_INPUT; remove this unused assignment or rename it to _HOOK_INPUT to
indicate intentional discard and silence SC2034; locate the assignment of
HOOK_INPUT in hooks/research-stop-hook.sh (the HOOK_INPUT variable) and either
delete that line or change the variable name to _HOOK_INPUT if the value must be
consumed for side effects.
- Around line 196-201: The cancelled/complete branches currently echo raw text
from generate_report which breaks consumers; instead wrap the report into the
hook JSON shape used elsewhere. Update the branches handling current_phase ==
"CANCELLED" and "COMPLETE" to emit a single JSON object like
{"hookSpecificOutput": <report>} (ensuring <report> is valid JSON or properly
escaped), e.g. capture report=$(generate_report ...) and then printf
'{"hookSpecificOutput": %s}\n' "$report" (or use jq to build the object) before
exiting; keep function/name references generate_report and current_phase so you
modify the same code paths.
In `@hooks/review-nudge-hook.sh`:
- Around line 9-12: The check against stop_hook_active in review-nudge-hook.sh
is dead/unassigned; either remove the conditional block (the if [[
"${stop_hook_active:-}" == "true" ]]; then exit 0 fi) or explicitly document and
enforce how stop_hook_active is set/exported (e.g., add a comment above the
check stating it must be exported by a prior stop hook such as
research-stop-hook.sh or add an export in that prior hook) so the coordination
mechanism is explicit; update the code to match one of these two approaches and
ensure any referenced hook (research-stop-hook.sh) actually sets/export
stop_hook_active when chosen.
In `@RELEASE-NOTES.md`:
- Around line 54-73: Add a new release notes entry (e.g., v4.1.2) documenting
the three headline features introduced by this PR: the code-review-pipeline
skill, the pre-commit-quality-gate hook, and the review-nudge hook; describe
each briefly (what it does and triggers or integration points), mention any
CLI/slash command or config names if applicable, and place it alongside the
existing v4.0.4 section so users can discover these additions in
RELEASE-NOTES.md.
In `@skills/code-review-pipeline/SKILL.md`:
- Around line 30-44: The classification table in the DIFF phase currently
conflicts with the dispatch map: update the DIFF phase's file-pattern
classification and/or dispatch map so they agree—pick one authority (preferably
keep the dispatch map's conditional gates) and modify the classification rows
(e.g., the `.ts, .js, .py, .rs, .go` row in the table) to remove unconditional
`architecture`, `test`, or `ui` tags and instead document that `architecture` is
added only for new/moved files or changed exports, `test` is added only when
source files (not just tests) changed, and `ui` only for UI component files;
ensure the deduplication step and the phrase "The `implementation` reviewer is
ALWAYS dispatched" remain consistent with the chosen authority so the DIFF phase
logic (phase name "DIFF") and subsequent deduplication produce the intended
reviewer set.
- Around line 79-91: The AGGREGATE/ACT pipeline references a "Critical" severity
but none of the reviewer agents (architecture reviewer, implementation reviewer,
tech-practices reviewer, test reviewer, UI reviewer) produce it because their
severity enums only include "high|medium|low"; update those reviewer agents'
severity enum definitions to include "critical" (so findings can be promoted to
AGGREGATE's Critical tier) and ensure any place branching on severity (e.g., ACT
handling logic) accepts "critical"; alternatively, if you prefer one change,
collapse the pipeline's "Critical" tier into "High" in the AGGREGATE and ACT
phases so the existing enums remain valid—choose one approach and apply
consistently across the reviewer agent enums and AGGREGATE/ACT logic.
In `@skills/research/SKILL.md`:
- Around line 73-75: Add a language identifier to the fenced code block
containing "INIT → DECOMPOSE → RESEARCH → EVALUATE → [RESEARCH or SYNTHESIZE] →
DONE" in SKILL.md to satisfy markdownlint MD040; change the opening fence from
``` to ```text so the block reads with a language tag (e.g., use ```text).
- Line 62: Update the table entry that currently reads "Scrape full page content
(preferred)" to use the hyphenated compound adjective "full-page" so it reads
"Scrape full-page content (preferred)"; locate the row containing the
`firecrawl-mcp:firecrawl_scrape` identifier in SKILL.md and replace the phrase
accordingly.
- Around line 372-380: The error_handling table in SKILL.md is missing the
required leading/trailing pipes and blank lines around the table which triggers
markdownlint; edit the <error_handling> section to add a blank line before and
after the table and ensure every header and row starts and ends with a pipe
(e.g., "| Error | Action |" and "|-------|--------|", and each data row "|
Malformed JSON | Retry once, then mark low confidence |", etc.) so the table is
properly formatted for markdownlint to pass.
- Around line 393-404: The markdown table under the <red_flags> block (starting
with "STOP if you catch yourself thinking any of these:") has inconsistent pipe
usage and mismatched column counts; fix it by updating the header and every
table row to include leading and trailing pipes and ensure the header separator
(the --- row) has the same number of columns as the header (two columns:
"Thought" and "Reality"), e.g., add a leading | and trailing | to each line and
make the separator |---------|---------| match the columns so markdownlint
passes.
🧹 Nitpick comments (12)
agents/tech-practices-reviewer.md (1)
34-53: Output instructions may conflict: "no markdown fences" but example uses fenced block.Line 34 says "Return ONLY this JSON (no markdown fences, no commentary)" but the example itself is wrapped in a markdown code fence (lines 36–53). This could confuse the agent — it may return the output with fences (mimicking the example) or without (following the instruction). This same pattern appears in the other reviewer agents too.
Consider either removing the fences from the example or rewording the instruction to clarify that the fences in the document are just for formatting.
agents/architecture-reviewer.md (1)
37-54: Add a language identifier to the fenced code block.The static analysis tool flagged MD040: fenced code blocks should have a language specified. Adding
jsonto the opening fence improves rendering and lint compliance.Proposed fix
-``` +```json { "agent": "architecture-reviewer",agents/test-reviewer.md (1)
36-56: Add a language tag to the JSON fence.The JSON example is a fenced block without a language identifier, which violates MD040 and reduces readability.
✍️ Suggested fix
-``` +```json { "agent": "test-reviewer", "filesReviewed": ["src/foo.ts", "src/foo.test.ts"], "findings": [ { "severity": "high|medium|low", "confidence": 85, "file": "src/foo.ts", "line": 42, "issue": "New error handling branch has no test coverage", "recommendation": "Add test case for when fetchUser throws NetworkError", "category": "test-quality" } ], "missingTests": [ "Test error path when fetchUser throws NetworkError in src/foo.ts:42", "Test boundary condition for empty array input in src/bar.ts:15" ], "summary": "3 coverage gaps, 1 antipattern found" }skills/planning/SKILL.md (1)
57-97: Add language tags to the fenced step blocks.Several fenced blocks are untyped, which triggers MD040 and makes intent less clear. Use
text(or remove fences) for these numbered step examples.✍️ Suggested fix
-``` +```text 1. resolve-library-id for each relevant library 2. query-docs for the specific feature/API needed 3. Note: version-specific gotchas, recommended patterns, deprecations -``` +``` -``` +```text 1. Search for "[feature] [framework] implementation" 2. Search for "[feature] best practices [year]" 3. Look for: blog posts with code, official guides, comparison articles -``` +``` -``` +```text 1. search-code for the pattern/API in real projects 2. search-repositories for projects solving the same problem 3. Look for: how production codebases structure this, common pitfalls -``` +``` -``` +```text ### Approach N: [Name] **How it works:** [2-3 sentences] **Evidence:** - Context7: [what the docs say about this approach] - Serper: [what real-world articles recommend] - GitHub: [how production codebases do it] **Trade-offs:** - Pro: [concrete benefit with source] - Pro: [concrete benefit with source] - Con: [concrete drawback with source] **Fits this project because:** [why this works for the specific codebase] -``` +```agents/implementation-reviewer.md (1)
38-56: Add a language tag to the JSON fence.MD040 flagged the untyped fenced block. A
jsontag improves readability and lint compliance.✍️ Suggested fix
-``` +```json { "agent": "implementation-reviewer", "filesReviewed": ["path/to/file.ts"], "findings": [ { "severity": "critical|high|medium|low", "confidence": 95, "file": "path/to/file.ts", "line": 42, "issue": "Concise description of the bug or vulnerability", "recommendation": "Specific fix suggestion", "category": "security|logic|error-handling|race-condition|resource-leak|type-safety|edge-case" } ], "missingTests": [], "summary": "2 critical, 1 high found" }agents/ui-reviewer.md (1)
46-73: Add a language tag to the JSON fence.Please add
jsonto the fenced block for lint compliance and readability.✍️ Suggested fix
-``` +```json { "agent": "ui-reviewer", "filesReviewed": ["src/components/Modal.svelte"], "findings": [ { "severity": "critical|high|medium|low", "confidence": 95, "file": "src/components/Modal.svelte", "line": 8, "issue": "Modal has no focus trap — keyboard users can tab behind the overlay", "recommendation": "Add focus trap that cycles between first and last focusable element, restore focus on close", "category": "a11y" }, { "severity": "medium", "confidence": 85, "file": "src/components/Modal.svelte", "line": 22, "issue": "No loading state while async content fetches", "recommendation": "Add spinner or skeleton screen during data fetch", "category": "ux" } ], "missingTests": [], "summary": "1 critical a11y, 1 medium ux found" }skills/subagent-driven-development/SKILL.md (1)
83-155: Add a language tag to the example workflow fence.The example workflow is an untyped fenced block, which triggers MD040. Use
textfor clarity.✍️ Suggested fix
-``` +```text You: I'm using Subagent-Driven Development to execute this plan. [Read plan file once: docs/plans/feature-plan.md] [Extract all 5 tasks with full text and context] [Create all tasks with TaskCreate] Task 1: Hook installation script ... Done! -``` +```lib/escape-json.sh (1)
4-20: Missing escaping for some JSON control characters.The function handles
\,",\n,\r,\tbut omits\b(backspace),\f(form feed), and other control characters (U+0000–U+001F). Per the JSON spec, all control characters must be escaped. This is unlikely to matter for the current hook messages, but could cause malformed JSON if state data or error output containing these characters is ever passed through.Additionally, the character-by-character loop with
+=is O(n²) in bash due to string reallocation. Fine for short hook messages, but worth noting if this utility is ever reused for larger payloads.💡 Alternative: use a `sed` pipeline for correctness and performance
escape_for_json() { local input="$1" - local output="" - local i char - for (( i=0; i<${`#input`}; i++ )); do - char="${input:$i:1}" - case "$char" in - $'\\') output+='\\' ;; - '"') output+='\"' ;; - $'\n') output+='\n' ;; - $'\r') output+='\r' ;; - $'\t') output+='\t' ;; - *) output+="$char" ;; - esac - done - printf '%s' "$output" + printf '%s' "$input" | sed \ + -e 's/\\/\\\\/g' \ + -e 's/"/\\"/g' \ + -e 's/\t/\\t/g' \ + -e '$! s/$/\\n/' | tr -d '\n' }Or for full spec compliance, a small
python3 -c "import json,sys; print(json.dumps(sys.stdin.read())[1:-1], end='')"one-liner would handle all edge cases.hooks/review-nudge-hook.sh (1)
14-18:git diff --name-only HEADincludes both staged and unstaged changes.This is likely intentional (detecting all session work), but note it won't detect new untracked files. If an agent creates a new
.tsfile but hasn't staged it, this hook won't see it. If the intent is "any code changes in the working tree," consider addinggit ls-files --others --exclude-standardfor untracked files as well.hooks/session-start.sh (1)
23-23: Extremely long line is hard to maintain.This ~900-character line embeds the entire skill list and instructions. Consider building it in a variable with concatenation or a multiline approach before interpolation, which would make future edits (adding/removing skills) less error-prone.
hooks/pre-commit-quality-gate.sh (1)
18-21: Compound and piped commands containinggit commitwon't be intercepted.The regex
^\s*git\s+commit\bonly matches whengit commitis the first command in the string. Commands likecd repo && git commit -m "msg"orgit add . && git commitwill bypass the gate. This may be acceptable for the intended use case (Claude Code typically issues simple commands), but it's worth documenting.skills/research/SKILL.md (1)
58-70: Add blank lines around the required-tools table.markdownlint (MD058) flags tables not surrounded by blank lines. Add a blank line before and after the table to satisfy the rule.
Suggested fix
<required_tools> -| Tool | Purpose | Required | -|------|---------|----------| -| `WebSearch` | Search queries (built-in) | Yes | -| `Task` | Spawn research subagents | Yes | -| `firecrawl-mcp:firecrawl_scrape` | Scrape full page content (preferred) | No | -| `WebFetch` | Fetch page content (built-in fallback) | Fallback | + +| Tool | Purpose | Required | +|------|---------|----------| +| `WebSearch` | Search queries (built-in) | Yes | +| `Task` | Spawn research subagents | Yes | +| `firecrawl-mcp:firecrawl_scrape` | Scrape full-page content (preferred) | No | +| `WebFetch` | Fetch page content (built-in fallback) | Fallback | +
| "findings": [ | ||
| { | ||
| "severity": "high|medium|low", | ||
| "confidence": 90, |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find all agent and pipeline files to inspect severity definitions
echo "=== Repository structure ==="
fd -e md . agents/ skills/ 2>/dev/null | head -20
echo ""
echo "=== Checking for severity field definitions in agents/ ==="
fd -e md . agents/ 2>/dev/null --exec grep -l "severity" {} \;
echo ""
echo "=== Checking for severity enum patterns in reviewer agents ==="
fd -e md . agents/ 2>/dev/null -exec grep -A 2 -B 2 "severity" {} + 2>/dev/null | head -50
echo ""
echo "=== Checking pipeline SKILL.md for severity references ==="
find . -name "SKILL.md" -path "*/code-review-pipeline*" 2>/dev/null | head -5Repository: obra/superpowers
Length of output: 1274
🏁 Script executed:
#!/bin/bash
# Read tech-practices-reviewer.md around line 43 to see the severity enum
echo "=== tech-practices-reviewer.md (severity context) ==="
grep -n "severity" agents/tech-practices-reviewer.md -A 1 -B 1
echo ""
echo "=== architecture-reviewer.md (for comparison) ==="
grep -n "severity" agents/architecture-reviewer.md -A 1 -B 1
echo ""
echo "=== Full output schema section in tech-practices-reviewer.md ==="
sed -n '/## Output/,/^## /p' agents/tech-practices-reviewer.md | head -60
echo ""
echo "=== Pipeline SKILL.md severity expectations (lines 80-110) ==="
sed -n '80,110p' skills/code-review-pipeline/SKILL.mdRepository: obra/superpowers
Length of output: 2028
🏁 Script executed:
#!/bin/bash
# Check ALL reviewer agents for severity definitions
echo "=== All agent severity definitions ==="
for file in agents/*.md; do
echo "--- $file ---"
grep -A 1 '"severity"' "$file" || echo " (no severity field)"
done
echo ""
echo "=== Verify pipeline uses all four severity levels ==="
grep -i "critical\|high\|medium\|low" skills/code-review-pipeline/SKILL.md | grep -E "^\s*-\s" | head -10Repository: obra/superpowers
Length of output: 1104
Severity enums are inconsistent across reviewer agents — some support critical, others don't.
The tech-practices-reviewer.md defines "severity": "high|medium|low", but the code-review-pipeline SKILL.md expects findings grouped into four tiers: Critical, High, Medium, and Low. However, this inconsistency is worse than initially stated:
- With
critical: implementation-reviewer.md, ui-reviewer.md - Without
critical: architecture-reviewer.md, tech-practices-reviewer.md, test-reviewer.md
Only 2 of 5 reviewer agents support critical-level findings. This means the pipeline's Critical tier will be sparsely populated and dependent on which agents are invoked, creating unpredictable behavior. Either standardize all agents to include critical in the severity enum, or remove the Critical tier from the pipeline.
🤖 Prompt for AI Agents
In `@agents/tech-practices-reviewer.md` at line 43, Update the severity enum usage
so all reviewer agents and the pipeline agree: either add "critical" to the
severity field in reviewer configs (e.g., update "severity": "high|medium|low"
in tech-practices-reviewer.md, architecture-reviewer.md, test-reviewer.md to
"critical|high|medium|low") and ensure code that emits findings
(implementation-reviewer, ui-reviewer, etc.) uses the same enum, or remove the
Critical tier from the code-review-pipeline SKILL.md and any grouping logic;
choose one approach and apply it consistently across the unique symbols:
severity fields in tech-practices-reviewer.md, architecture-reviewer.md,
test-reviewer.md, implementation-reviewer.md, ui-reviewer.md and the grouping
logic in code-review-pipeline SKILL.md so all files share the same set of
severity values.
| if command -v eslint >/dev/null 2>&1 && [[ -f "$project_root/.eslintrc" || -f "$project_root/.eslintrc.js" || -f "$project_root/.eslintrc.json" || -f "$project_root/eslint.config.js" || -f "$project_root/eslint.config.mjs" ]]; then | ||
| local staged_files | ||
| staged_files=$(git diff --cached --name-only --diff-filter=ACMR | grep -E '\.(ts|js|tsx|jsx)$' || true) | ||
| if [[ -n "$staged_files" ]]; then | ||
| local output | ||
| if ! output=$(cd "$project_root" && echo "$staged_files" | xargs eslint 2>&1); then | ||
| errors+="ESLint failed:\n$output\n\n" | ||
| return 1 | ||
| fi | ||
| fi | ||
| return 0 |
There was a problem hiding this comment.
Filenames with spaces will break xargs splitting, and ESLint flat config .ts variant is missing.
echo "$staged_files" | xargs eslint splits on whitespace, so filenames containing spaces would be misinterpreted. Also, ESLint 9+ supports eslint.config.ts which isn't checked.
Proposed fix
- if command -v eslint >/dev/null 2>&1 && [[ -f "$project_root/.eslintrc" || -f "$project_root/.eslintrc.js" || -f "$project_root/.eslintrc.json" || -f "$project_root/eslint.config.js" || -f "$project_root/eslint.config.mjs" ]]; then
+ if command -v eslint >/dev/null 2>&1 && [[ -f "$project_root/.eslintrc" || -f "$project_root/.eslintrc.js" || -f "$project_root/.eslintrc.json" || -f "$project_root/eslint.config.js" || -f "$project_root/eslint.config.mjs" || -f "$project_root/eslint.config.ts" ]]; then
local staged_files
staged_files=$(git diff --cached --name-only --diff-filter=ACMR | grep -E '\.(ts|js|tsx|jsx)$' || true)
if [[ -n "$staged_files" ]]; then
local output
- if ! output=$(cd "$project_root" && echo "$staged_files" | xargs eslint 2>&1); then
+ if ! output=$(cd "$project_root" && echo "$staged_files" | xargs -I{} eslint {} 2>&1); then
errors+="ESLint failed:\n$output\n\n"The same xargs issue applies to the ruff (line 79) and pyright (line 119) invocations.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if command -v eslint >/dev/null 2>&1 && [[ -f "$project_root/.eslintrc" || -f "$project_root/.eslintrc.js" || -f "$project_root/.eslintrc.json" || -f "$project_root/eslint.config.js" || -f "$project_root/eslint.config.mjs" ]]; then | |
| local staged_files | |
| staged_files=$(git diff --cached --name-only --diff-filter=ACMR | grep -E '\.(ts|js|tsx|jsx)$' || true) | |
| if [[ -n "$staged_files" ]]; then | |
| local output | |
| if ! output=$(cd "$project_root" && echo "$staged_files" | xargs eslint 2>&1); then | |
| errors+="ESLint failed:\n$output\n\n" | |
| return 1 | |
| fi | |
| fi | |
| return 0 | |
| if command -v eslint >/dev/null 2>&1 && [[ -f "$project_root/.eslintrc" || -f "$project_root/.eslintrc.js" || -f "$project_root/.eslintrc.json" || -f "$project_root/eslint.config.js" || -f "$project_root/eslint.config.mjs" || -f "$project_root/eslint.config.ts" ]]; then | |
| local staged_files | |
| staged_files=$(git diff --cached --name-only --diff-filter=ACMR | grep -E '\.(ts|js|tsx|jsx)$' || true) | |
| if [[ -n "$staged_files" ]]; then | |
| local output | |
| if ! output=$(cd "$project_root" && echo "$staged_files" | xargs -I{} eslint {} 2>&1); then | |
| errors+="ESLint failed:\n$output\n\n" | |
| return 1 | |
| fi | |
| fi | |
| return 0 |
🤖 Prompt for AI Agents
In `@hooks/pre-commit-quality-gate.sh` around lines 61 - 71, The current logic
collects staged_files and pipes them through plain xargs which breaks on
filenames with spaces and also misses ESLint's TypeScript flat config; change
the git diff invocation to produce NUL-delimited output (use git ... -z
--name-only --diff-filter=ACMR) and consume it with xargs -0 when calling
eslint/ruff/pyright so filenames with spaces are preserved; also include
eslint.config.ts in the config file existence checks (alongside .eslintrc,
.eslintrc.js, .eslintrc.json, eslint.config.js, eslint.config.mjs) and apply the
same -z / xargs -0 pattern to the ruff and pyright sections (refer to the
staged_files variable and the eslint/ruff/pyright command invocations and output
handling).
| # Run checks | ||
| run_lint | ||
| run_typecheck | ||
|
|
||
| if [[ -n "$errors" ]]; then | ||
| # Block the commit | ||
| echo "$errors" >&2 | ||
| exit 2 | ||
| fi |
There was a problem hiding this comment.
set -e causes premature exit — lint failures kill the script before typecheck runs, with wrong exit code.
With set -euo pipefail (line 7), when run_lint returns 1 on a lint failure, the script immediately exits with code 1 — run_typecheck never executes, the errors check on line 144 is never reached, and the intended exit code 2 is never used. The same applies if run_lint passes but run_typecheck fails.
🐛 Proposed fix: suppress non-zero returns so errors accumulate
# Run checks
-run_lint
-run_typecheck
+run_lint || true
+run_typecheck || true
if [[ -n "$errors" ]]; thenThis allows both checks to run and accumulate their errors in $errors, then the script exits with 2 only if there were failures.
🤖 Prompt for AI Agents
In `@hooks/pre-commit-quality-gate.sh` around lines 140 - 148, The script
currently uses set -e which makes run_lint or run_typecheck abort the script
immediately on non-zero exit, preventing both checks from running and preventing
the intended final exit code handling; modify the calls to run_lint and
run_typecheck so their failures are suppressed and captured (e.g. run them with
a construct that prevents immediate exit and captures their exit status) and
append any error output or status into the existing errors variable, then let
the subsequent if [[ -n "$errors" ]] block decide to echo and exit 2;
specifically update the invocations of run_lint and run_typecheck and their
handling so they do not trigger set -e but instead record failures into errors.
| # Read hook input from stdin | ||
| HOOK_INPUT=$(cat) |
There was a problem hiding this comment.
HOOK_INPUT is read but never used.
Static analysis (SC2034) confirms this. Either remove it or, if you intend to use it later, prefix with _ to signal intentional discard.
Proposed fix
-# Read hook input from stdin
-HOOK_INPUT=$(cat)
+# Consume hook input from stdin (not used by this hook)
+cat > /dev/null🧰 Tools
🪛 Shellcheck (0.11.0)
[warning] 13-13: HOOK_INPUT appears unused. Verify use (or export if used externally).
(SC2034)
🤖 Prompt for AI Agents
In `@hooks/research-stop-hook.sh` around lines 12 - 13, The script currently
assigns HOOK_INPUT=$(cat) but never uses HOOK_INPUT; remove this unused
assignment or rename it to _HOOK_INPUT to indicate intentional discard and
silence SC2034; locate the assignment of HOOK_INPUT in
hooks/research-stop-hook.sh (the HOOK_INPUT variable) and either delete that
line or change the variable name to _HOOK_INPUT if the value must be consumed
for side effects.
| if [[ "$current_phase" == "CANCELLED" ]]; then | ||
| # Generate cancellation report and allow exit | ||
| local report | ||
| report=$(generate_report "$state_file" "$findings_file" "cancelled") | ||
| echo "$report" | ||
| exit 0 |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, find and list the hooks directory
find . -type f -name "*hook*" | head -20Repository: obra/superpowers
Length of output: 191
🏁 Script executed:
# Check the structure of the hooks directory
ls -la hooks/Repository: obra/superpowers
Length of output: 584
🏁 Script executed:
# Read the research-stop-hook.sh file to see the lines in question
cat -n hooks/research-stop-hook.sh | sed -n '190,260p'Repository: obra/superpowers
Length of output: 2893
🏁 Script executed:
# Read the review-nudge-hook.sh file to understand the reference
cat -n hooks/review-nudge-hook.shRepository: obra/superpowers
Length of output: 1614
🏁 Script executed:
# Search for hookSpecificOutput and decision patterns in all shell scripts
rg -n 'hookSpecificOutput|"decision"' --type=shRepository: obra/superpowers
Length of output: 224
Fix inconsistent output format: use structured JSON for all hook outputs.
Lines 199–200 and 228–229 output plain text via echo "$report" for cancelled and complete statuses, while lines 253–259 output structured JSON with "decision": "block" for the blocked status. This inconsistency conflicts with the hook system's expectations—other hooks in the repository (review-nudge-hook.sh, session-start.sh) consistently use JSON structures with hookSpecificOutput. The plain text outputs may be silently ignored or cause parse errors.
🤖 Prompt for AI Agents
In `@hooks/research-stop-hook.sh` around lines 196 - 201, The cancelled/complete
branches currently echo raw text from generate_report which breaks consumers;
instead wrap the report into the hook JSON shape used elsewhere. Update the
branches handling current_phase == "CANCELLED" and "COMPLETE" to emit a single
JSON object like {"hookSpecificOutput": <report>} (ensuring <report> is valid
JSON or properly escaped), e.g. capture report=$(generate_report ...) and then
printf '{"hookSpecificOutput": %s}\n' "$report" (or use jq to build the object)
before exiting; keep function/name references generate_report and current_phase
so you modify the same code paths.
| <phase name="AGGREGATE"> | ||
| 1. Collect JSON responses from all reviewers | ||
| 2. Parse each response (if malformed, skip with warning) | ||
| 3. **Filter:** Remove findings with `confidence < 80` | ||
| 4. **Deduplicate:** If multiple reviewers flag the same file:line, keep the higher-confidence finding | ||
| 5. **Group by severity:** | ||
| - **Critical** — Must fix before proceeding | ||
| - **High** — Should fix now | ||
| - **Medium** — Suggestions worth considering | ||
| - **Low** — Minor improvements | ||
|
|
||
| 6. **Compile missing tests** list from all reviewers | ||
| </phase> |
There was a problem hiding this comment.
critical severity is referenced in AGGREGATE/ACT but no reviewer agent produces it.
Lines 85–88 define four severity tiers including "Critical — Must fix before proceeding", and lines 96–100/112–113 in the ACT phase handle critical findings with inline fixes. However, all specialist reviewer agents (architecture, implementation, tech-practices, test, UI) define their severity enum as "high|medium|low" — none include critical.
This means the Critical tier will always be empty, and the most severe findings will only ever be classified as high. Either:
- Add
criticalto each reviewer agent's severity enum, or - Collapse Critical and High into a single tier in the pipeline.
🤖 Prompt for AI Agents
In `@skills/code-review-pipeline/SKILL.md` around lines 79 - 91, The AGGREGATE/ACT
pipeline references a "Critical" severity but none of the reviewer agents
(architecture reviewer, implementation reviewer, tech-practices reviewer, test
reviewer, UI reviewer) produce it because their severity enums only include
"high|medium|low"; update those reviewer agents' severity enum definitions to
include "critical" (so findings can be promoted to AGGREGATE's Critical tier)
and ensure any place branching on severity (e.g., ACT handling logic) accepts
"critical"; alternatively, if you prefer one change, collapse the pipeline's
"Critical" tier into "High" in the AGGREGATE and ACT phases so the existing
enums remain valid—choose one approach and apply consistently across the
reviewer agent enums and AGGREGATE/ACT logic.
| |------|---------|----------| | ||
| | `WebSearch` | Search queries (built-in) | Yes | | ||
| | `Task` | Spawn research subagents | Yes | | ||
| | `firecrawl-mcp:firecrawl_scrape` | Scrape full page content (preferred) | No | |
There was a problem hiding this comment.
Hyphenate the compound adjective “full-page”.
LanguageTool flags this as a compound adjective before a noun.
Suggested fix
-| `firecrawl-mcp:firecrawl_scrape` | Scrape full page content (preferred) | No |
+| `firecrawl-mcp:firecrawl_scrape` | Scrape full-page content (preferred) | No |📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| | `firecrawl-mcp:firecrawl_scrape` | Scrape full page content (preferred) | No | | |
| | `firecrawl-mcp:firecrawl_scrape` | Scrape full-page content (preferred) | No | |
🧰 Tools
🪛 LanguageTool
[uncategorized] ~62-~62: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...irecrawl-mcp:firecrawl_scrape| Scrape full page content (preferred) | No | |WebFetch`...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
🤖 Prompt for AI Agents
In `@skills/research/SKILL.md` at line 62, Update the table entry that currently
reads "Scrape full page content (preferred)" to use the hyphenated compound
adjective "full-page" so it reads "Scrape full-page content (preferred)"; locate
the row containing the `firecrawl-mcp:firecrawl_scrape` identifier in SKILL.md
and replace the phrase accordingly.
| ``` | ||
| INIT → DECOMPOSE → RESEARCH → EVALUATE → [RESEARCH or SYNTHESIZE] → DONE | ||
| ``` |
There was a problem hiding this comment.
Specify a language for the fenced code block.
markdownlint (MD040) requires a language on fenced code blocks.
Suggested fix
-```
+```text
INIT → DECOMPOSE → RESEARCH → EVALUATE → [RESEARCH or SYNTHESIZE] → DONE</details>
<!-- suggestion_start -->
<details>
<summary>📝 Committable suggestion</summary>
> ‼️ **IMPORTANT**
> Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```suggestion
🧰 Tools
🪛 markdownlint-cli2 (0.20.0)
[warning] 73-73: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
In `@skills/research/SKILL.md` around lines 73 - 75, Add a language identifier to
the fenced code block containing "INIT → DECOMPOSE → RESEARCH → EVALUATE →
[RESEARCH or SYNTHESIZE] → DONE" in SKILL.md to satisfy markdownlint MD040;
change the opening fence from ``` to ```text so the block reads with a language
tag (e.g., use ```text).
| <error_handling> | ||
| | Error | Action | | ||
| |-------|--------| | ||
| | Malformed JSON | Retry once, then mark low confidence | | ||
| | Scrape fails | Continue with other URLs | | ||
| | Rate limit | Wait 60s, reduce batch to 2 | | ||
| | No results | Mark low confidence, rephrase as follow-up | | ||
| | Tool not found | Fall back to WebFetch, update state.json | | ||
| </error_handling> |
There was a problem hiding this comment.
Fix table formatting and add blank lines in error handling.
markdownlint flags missing leading/trailing pipes and missing blank lines around the table.
Suggested fix
<error_handling>
-| Error | Action |
-|-------|--------|
-| Malformed JSON | Retry once, then mark low confidence |
-| Scrape fails | Continue with other URLs |
-| Rate limit | Wait 60s, reduce batch to 2 |
-| No results | Mark low confidence, rephrase as follow-up |
-| Tool not found | Fall back to WebFetch, update state.json |
+
+| Error | Action |
+|-------|--------|
+| Malformed JSON | Retry once, then mark low confidence |
+| Scrape fails | Continue with other URLs |
+| Rate limit | Wait 60s, reduce batch to 2 |
+| No results | Mark low confidence, rephrase as follow-up |
+| Tool not found | Fall back to WebFetch, update state.json |
+
</error_handling>📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| <error_handling> | |
| | Error | Action | | |
| |-------|--------| | |
| | Malformed JSON | Retry once, then mark low confidence | | |
| | Scrape fails | Continue with other URLs | | |
| | Rate limit | Wait 60s, reduce batch to 2 | | |
| | No results | Mark low confidence, rephrase as follow-up | | |
| | Tool not found | Fall back to WebFetch, update state.json | | |
| </error_handling> | |
| <error_handling> | |
| | Error | Action | | |
| |-------|--------| | |
| | Malformed JSON | Retry once, then mark low confidence | | |
| | Scrape fails | Continue with other URLs | | |
| | Rate limit | Wait 60s, reduce batch to 2 | | |
| | No results | Mark low confidence, rephrase as follow-up | | |
| | Tool not found | Fall back to WebFetch, update state.json | | |
| </error_handling> |
🧰 Tools
🪛 markdownlint-cli2 (0.20.0)
[warning] 373-373: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
[warning] 380-380: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing leading pipe
(MD055, table-pipe-style)
[warning] 380-380: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing trailing pipe
(MD055, table-pipe-style)
[warning] 380-380: Table column count
Expected: 2; Actual: 1; Too few cells, row will be missing data
(MD056, table-column-count)
🤖 Prompt for AI Agents
In `@skills/research/SKILL.md` around lines 372 - 380, The error_handling table in
SKILL.md is missing the required leading/trailing pipes and blank lines around
the table which triggers markdownlint; edit the <error_handling> section to add
a blank line before and after the table and ensure every header and row starts
and ends with a pipe (e.g., "| Error | Action |" and "|-------|--------|", and
each data row "| Malformed JSON | Retry once, then mark low confidence |", etc.)
so the table is properly formatted for markdownlint to pass.
| <red_flags> | ||
| STOP if you catch yourself thinking any of these: | ||
|
|
||
| | Thought | Reality | | ||
| |---------|---------| | ||
| | "I have high confidence, I can skip the source target" | The Stop hook will block you. Gather the sources—it's non-negotiable. | | ||
| | "This topic is too broad for 8 questions" | Narrow the scope first. Don't start research on vague topics. | | ||
| | "I'll just synthesize what I have" | Check `sourcesGathered >= targetSources`. If not met, you cannot proceed. | | ||
| | "I don't need to update state.json" | You will lose track. Always read/write state.json. | | ||
| | "All sources are equal" | Weight Tier 1 sources higher in synthesis. | | ||
| | "I'm stuck, I'll just finish" | Narrow the scope or generate better follow-up questions. The source gate is non-negotiable. | | ||
| </red_flags> |
There was a problem hiding this comment.
Fix table formatting in red flags.
markdownlint flags missing leading/trailing pipes and column count issues.
Suggested fix
<red_flags>
STOP if you catch yourself thinking any of these:
-| Thought | Reality |
-|---------|---------|
-| "I have high confidence, I can skip the source target" | The Stop hook will block you. Gather the sources—it's non-negotiable. |
-| "This topic is too broad for 8 questions" | Narrow the scope first. Don't start research on vague topics. |
-| "I'll just synthesize what I have" | Check `sourcesGathered >= targetSources`. If not met, you cannot proceed. |
-| "I don't need to update state.json" | You will lose track. Always read/write state.json. |
-| "All sources are equal" | Weight Tier 1 sources higher in synthesis. |
-| "I'm stuck, I'll just finish" | Narrow the scope or generate better follow-up questions. The source gate is non-negotiable. |
+| Thought | Reality |
+|---------|---------|
+| "I have high confidence, I can skip the source target" | The Stop hook will block you. Gather the sources—it's non-negotiable. |
+| "This topic is too broad for 8 questions" | Narrow the scope first. Don't start research on vague topics. |
+| "I'll just synthesize what I have" | Check `sourcesGathered >= targetSources`. If not met, you cannot proceed. |
+| "I don't need to update state.json" | You will lose track. Always read/write state.json. |
+| "All sources are equal" | Weight Tier 1 sources higher in synthesis. |
+| "I'm stuck, I'll just finish" | Narrow the scope or generate better follow-up questions. The source gate is non-negotiable. |
</red_flags>🧰 Tools
🪛 markdownlint-cli2 (0.20.0)
[warning] 404-404: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing leading pipe
(MD055, table-pipe-style)
[warning] 404-404: Table pipe style
Expected: leading_and_trailing; Actual: no_leading_or_trailing; Missing trailing pipe
(MD055, table-pipe-style)
[warning] 404-404: Table column count
Expected: 2; Actual: 1; Too few cells, row will be missing data
(MD056, table-column-count)
🤖 Prompt for AI Agents
In `@skills/research/SKILL.md` around lines 393 - 404, The markdown table under
the <red_flags> block (starting with "STOP if you catch yourself thinking any of
these:") has inconsistent pipe usage and mismatched column counts; fix it by
updating the header and every table row to include leading and trailing pipes
and ensure the header separator (the --- row) has the same number of columns as
the header (two columns: "Thought" and "Reality"), e.g., add a leading | and
trailing | to each line and make the separator |---------|---------| match the
columns so markdownlint passes.
Summary
git commitif lint/typecheck fails (fail-open when no tooling detected)New files
skills/code-review-pipeline/SKILL.md— orchestration (DIFF → DISPATCH → AGGREGATE → ACT)agents/{implementation,test,architecture,tech-practices,ui}-reviewer.md— 5 specialist agents (sonnet, confidence-scored findings)hooks/review-nudge-hook.sh— Stop hook, nudge onlyhooks/pre-commit-quality-gate.sh— PreToolUse hook for Bash, auto-detects project toolingDesign decisions
model: sonnetfor cost efficiency (run up to 5 in parallel)Test plan
/code-review-pipelineon a branch with mixed file types (.ts, .svelte, .py)Summary by CodeRabbit
Release Notes
New Features
/researchcommand for deep research with source verification and parallel subagent investigation.Bug Fixes
Chores