# Further improvements to code-review skill #124229
**stephentoub** commented on Feb 10, 2026:
- Try to remove bias from PR/issue comments by prompting the model to first form its own opinion based only on the code, and only then factor in and evaluate the PR/issue comments
- Urge it to adopt a more aggressive/skeptical mindset
- Explicitly give it the option of saying a change needs human review, to reduce its inclination to just say LGTM
- Delete the CCR code-review-instructions that seem to be confusing it, and have them refer back to the skill instead
### Pull request overview
Updates the repository’s code-review guidance to reduce narrative anchoring, encourage more skeptical evaluation, and standardize an explicit “Needs Human Review” outcome, while simplifying the legacy code-review instructions file to defer to the skill.
Changes:
- Added an explicit reviewer mindset and a staged process that forms an initial opinion from code context before reading PR narrative.
- Updated the review output format to include “Needs Human Review” and added verdict consistency rules.
- Replaced the long-form `.github/code-review-instructions.md` with a pointer to the skill doc.
### Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `.github/skills/code-review/SKILL.md` | Expands the review process and output format to emphasize independent assessment and add a “Needs Human Review” verdict path. |
| `.github/code-review-instructions.md` | Simplifies instructions to reference the skill as the single source of truth. |
Comments suppressed due to low confidence (1)

`.github/skills/code-review/SKILL.md:117`
- In the Detailed Findings template header ("#### ✅/⚠️/❌ ..."), 💡 is omitted even though the guidelines below list 💡 as a supported severity. Consider updating the template header to include 💡 as well, or remove 💡 from the guidelines, so reviewers follow a single consistent format.
```
#### ✅/⚠️/❌ <Category Name> — <Brief description>
<Explanation with specifics. Reference code, line numbers, interleavings, etc.>

(Repeat for each finding category. Group related findings under a single heading.)
```
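A minimal way to reconcile the two, assuming 💡 is meant to stay a supported severity, would be to add it to the template header itself (the line below is a suggested edit, not the current SKILL.md content):

```
#### ✅/⚠️/💡/❌ <Category Name> — <Brief description>
```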
Inspired by the code-review skill pattern (PR dotnet#124229), restructure the codeflow analysis to form an independent assessment from primary signals before consulting Maestro comments:

- New 'Current State' section (Step 2) synthesizes empty diff, force push events, and activity recency into an immediate verdict: NO-OP / IN PROGRESS / STALE / ACTIVE
- PR Branch Analysis now appears before comments, providing commit categorization as primary data
- Renamed 'Staleness & Conflict Check' to 'Codeflow History' to clearly signal these are past events, not current state. A framing line directs the reader to Current State for present status.
- Recommendations are driven by the current-state assessment, with comment history as supporting context

The principle: comments tell you the history, not the present. Determine the PR's actual state from primary signals (diff, branch, timeline) before consulting Maestro comments for context.
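As a rough sketch, the verdict synthesis described above could look like the following. The variable names are hypothetical inputs, not names from the actual script; only the >3-day staleness window comes from the message above, and the 1-day force-push recency window is an assumption.

```powershell
# Illustrative sketch only: $changedFiles, $lastForcePush, and $lastActivity are
# hypothetical inputs, not names from the actual script.
$now = Get-Date

if ($changedFiles -eq 0) {
    $verdict = 'NO-OP'        # empty diff: likely already resolved
}
elseif ($lastForcePush -and (($now - $lastForcePush).TotalDays -lt 1)) {
    $verdict = 'IN PROGRESS'  # recent force push, awaiting update (assumed 1-day window)
}
elseif (($now - $lastActivity).TotalDays -gt 3) {
    $verdict = 'STALE'        # no activity for >3 days
}
else {
    $verdict = 'ACTIVE'       # PR has content and recent activity
}

Write-Output "Current state: $verdict"
```

Checking the empty diff before the force-push recency matches the "NO-OP regardless of whether a force push caused it" rule described below.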
**…e push & empty diff detection (#124231)**

This improves the vmr-codeflow-status Copilot CLI skill with enhancements discovered while investigating a stale codeflow PR (#124095).

## Design Change: Current State First

Inspired by the code-review skill restructuring in #124229, the analysis now follows a **"current state first, comments as history"** pattern:

1. **Assess current state** from primary signals (PR state, diff size, force pushes, timeline) before reading any Maestro comments
2. **Form an independent verdict** (MERGED / CLOSED / NO-OP / IN PROGRESS / STALE / ACTIVE)
3. **Then** read comments as historical context to explain how we got to the current state
4. **Recommendations** are generated by the agent from a structured JSON summary, not hardcoded script logic

Previously, the script read Maestro conflict/staleness comments first and drew conclusions from them — even when those comments were stale (e.g., after someone had already force-pushed a resolution). Recommendations were a 130-line if/elseif chain that couldn't adapt to edge cases.

## New Capabilities

### 1. Current State Assessment (Step 2)

Synthesizes primary signals into an immediate verdict before any comment parsing:

- `✅ MERGED` — PR has been merged, no action needed
- `✖️ CLOSED` — PR was closed without merging, Maestro should create a replacement
- `📭 NO-OP` — empty diff, likely already resolved
- `🔄 IN PROGRESS` — recent force push, awaiting update
- `⏳ STALE` — no activity for >3 days
- `✅ ACTIVE` — PR has content and recent activity

### 2. Force Push Detection

Queries the PR timeline for `head_ref_force_pushed` events, showing who force-pushed and when. Uses `--slurp` for correct pagination handling.

### 3. Empty Diff Detection

Checks `changedFiles`/`additions`/`deletions` to flag 0-change PRs as NO-OP (regardless of whether a force push caused it).

### 4. Post-Action Staleness Analysis

Cross-references force push timestamps against conflict/staleness warnings. When a force push post-dates these warnings, the script correctly identifies the issues as potentially resolved.

### 5. JSON Summary + Agent-Generated Recommendations

The script's hardcoded recommendations (130+ lines of branching logic) have been replaced with:

- A `[CODEFLOW_SUMMARY]` JSON block emitted at the end of the script with all key facts
- An `isCodeflowPR` boolean for explicit codeflow vs non-codeflow classification
- A "Generating Recommendations" section in SKILL.md that teaches the agent how to reason over the summary
- The agent now synthesizes contextual, nuanced recommendations instead of picking from a canned decision tree

### 6. Codeflow History (renamed section)

The former "Staleness & Conflict Check" is now "Codeflow History" with a framing line: *"Maestro warnings (historical — see Current State for present status)"*. This signals that comments describe past events, not necessarily current state.

## Testing

- Verified against PR #124095 (closed backflow with conflicts + force push) and PR #124231 (non-codeflow PR)
- Multi-model review round 1 (Claude Sonnet 4, GPT-5): consensus findings → added merged/closed state handling, empty-diff-only NO-OP, `--slurp` pagination fix
- Multi-model test round 2 (Claude Sonnet 4, GPT-5): functional test of the JSON summary → fixed `isUpToDate` null handling and negative `daysSinceUpdate`, added the `isCodeflowPR` field, clean exit codes, and non-codeflow guidance in SKILL.md
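Taken together, capabilities 2, 3, and 5 could look roughly like the sketch below. The `gh`/`jq` invocations follow documented CLI behavior; the repo path, parameter names, and every summary field other than `isCodeflowPR` and `daysSinceUpdate` are illustrative assumptions, not the actual script.

```powershell
# Hypothetical sketch of force push detection, empty diff detection, and the
# [CODEFLOW_SUMMARY] emission described above. Field names beyond
# isCodeflowPR/daysSinceUpdate are illustrative, as is the default repo.
param([int]$PrNumber, [string]$Repo = 'dotnet/dotnet')

# --paginate emits one JSON array per page; jq --slurp gathers the pages so
# `add` can merge them into a single array before filtering.
$forcePushes = gh api "repos/$Repo/issues/$PrNumber/timeline" --paginate |
    jq --slurp '[add[] | select(.event == "head_ref_force_pushed") | {actor: .actor.login, at: .created_at}]' |
    ConvertFrom-Json

# Empty diff detection via gh pr view JSON fields.
$pr = gh pr view $PrNumber --repo $Repo --json changedFiles,additions,deletions,updatedAt |
    ConvertFrom-Json

# Clamp to zero, mirroring the negative daysSinceUpdate fix noted under Testing.
$daysSinceUpdate = [math]::Max(0, ((Get-Date) - [datetime]$pr.updatedAt).TotalDays)

$summary = [ordered]@{
    isCodeflowPR    = $true   # illustrative; the real script classifies this
    isEmptyDiff     = ($pr.changedFiles -eq 0 -and $pr.additions -eq 0 -and $pr.deletions -eq 0)
    forcePushCount  = $forcePushes.Count
    daysSinceUpdate = [math]::Round($daysSinceUpdate, 1)
}

Write-Output "[CODEFLOW_SUMMARY]"
Write-Output ($summary | ConvertTo-Json)
```

Emitting the facts as JSON and leaving interpretation to the agent is what replaces the canned if/elseif recommendation tree: the script reports, the agent reasons.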