# Further improvements to code-review skill #124229
**stephentoub** commented on Feb 10, 2026:
- Try to remove bias from PR/issue comments by prompting the model to first form its own opinion based only on the code, and only then factor in and evaluate the PR/issue comments
- Urge it to adopt a more aggressive/skeptical mindset
- Explicitly give it the option of saying a change needs human review, to reduce its inclination to just say LGTM
- Delete the CCR code-review-instructions that seem to be confusing it, and have them refer back to the skill instead
### Pull request overview
Updates the repository’s code-review guidance to reduce narrative anchoring, encourage more skeptical evaluation, and standardize an explicit “Needs Human Review” outcome, while simplifying the legacy code-review instructions file to defer to the skill.
Changes:
- Added an explicit reviewer mindset and a staged process that forms an initial opinion from code context before reading PR narrative.
- Updated the review output format to include “Needs Human Review” and added verdict consistency rules.
- Replaced the long-form `.github/code-review-instructions.md` with a pointer to the skill doc.
### Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `.github/skills/code-review/SKILL.md` | Expands the review process and output format to emphasize independent assessment and add a “Needs Human Review” verdict path. |
| `.github/code-review-instructions.md` | Simplifies instructions to reference the skill as the single source of truth. |
Comments suppressed due to low confidence (1)

`.github/skills/code-review/SKILL.md:117`
- In the Detailed Findings template header ("#### ✅/⚠️/❌ ..."), 💡 is omitted even though the guidelines below list 💡 as a supported severity. Consider updating the template header to include 💡 as well, or remove 💡 from the guidelines, so reviewers follow a single consistent format.
```
#### ✅/⚠️/❌ <Category Name> — <Brief description>
<Explanation with specifics. Reference code, line numbers, interleavings, etc.>

(Repeat for each finding category. Group related findings under a single heading.)
```
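A minimal way to reconcile the two, assuming 💡 is meant to stay a supported severity, would be to add it to the template header itself (the line below is a suggested edit, not the current SKILL.md content):

```
#### ✅/⚠️/💡/❌ <Category Name> — <Brief description>
```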
Inspired by the code-review skill pattern (PR dotnet#124229), restructure the codeflow analysis to form an independent assessment from primary signals before consulting Maestro comments:

- New 'Current State' section (Step 2) synthesizes empty diff, force push events, and activity recency into an immediate verdict: NO-OP / IN PROGRESS / STALE / ACTIVE
- PR Branch Analysis now appears before comments, providing commit categorization as primary data
- Renamed 'Staleness & Conflict Check' to 'Codeflow History' to clearly signal these are past events, not current state. A framing line directs the reader to Current State for present status.
- Recommendations are driven by the current-state assessment, with comment history as supporting context

The principle: comments tell you the history, not the present. Determine the PR's actual state from primary signals (diff, branch, timeline) before consulting Maestro comments for context.
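As a rough sketch, the verdict synthesis described above could look like the following. The variable names are hypothetical inputs, not names from the actual script; only the >3-day staleness window comes from the message above, and the 1-day force-push recency window is an assumption.

```powershell
# Illustrative sketch only: $changedFiles, $lastForcePush, and $lastActivity are
# hypothetical inputs, not names from the actual script.
$now = Get-Date

if ($changedFiles -eq 0) {
    $verdict = 'NO-OP'        # empty diff: likely already resolved
}
elseif ($lastForcePush -and (($now - $lastForcePush).TotalDays -lt 1)) {
    $verdict = 'IN PROGRESS'  # recent force push, awaiting update (assumed 1-day window)
}
elseif (($now - $lastActivity).TotalDays -gt 3) {
    $verdict = 'STALE'        # no activity for >3 days
}
else {
    $verdict = 'ACTIVE'       # PR has content and recent activity
}

Write-Output "Current state: $verdict"
```

Checking the empty diff before the force-push recency matches the "NO-OP regardless of whether a force push caused it" rule described below.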
**…e push & empty diff detection (#124231)**

This improves the vmr-codeflow-status Copilot CLI skill with enhancements discovered while investigating a stale codeflow PR (#124095).

## Design Change: Current State First

Inspired by the code-review skill restructuring in #124229, the analysis now follows a **"current state first, comments as history"** pattern:

1. **Assess current state** from primary signals (PR state, diff size, force pushes, timeline) before reading any Maestro comments
2. **Form an independent verdict** (MERGED / CLOSED / NO-OP / IN PROGRESS / STALE / ACTIVE)
3. **Then** read comments as historical context to explain how we got to the current state
4. **Recommendations** are generated by the agent from a structured JSON summary, not hardcoded script logic

Previously, the script read Maestro conflict/staleness comments first and drew conclusions from them — even when those comments were stale (e.g., after someone had already force-pushed a resolution). Recommendations were a 130-line if/elseif chain that couldn't adapt to edge cases.

## New Capabilities

### 1. Current State Assessment (Step 2)

Synthesizes primary signals into an immediate verdict before any comment parsing:

- `✅ MERGED` — PR has been merged, no action needed
- `✖️ CLOSED` — PR was closed without merging, Maestro should create a replacement
- `📭 NO-OP` — empty diff, likely already resolved
- `🔄 IN PROGRESS` — recent force push, awaiting update
- `⏳ STALE` — no activity for >3 days
- `✅ ACTIVE` — PR has content and recent activity

### 2. Force Push Detection

Queries the PR timeline for `head_ref_force_pushed` events, showing who force-pushed and when. Uses `--slurp` for correct pagination handling.

### 3. Empty Diff Detection

Checks `changedFiles`/`additions`/`deletions` to flag 0-change PRs as NO-OP (regardless of whether a force push caused it).

### 4. Post-Action Staleness Analysis

Cross-references force push timestamps against conflict/staleness warnings. When a force push post-dates these warnings, the script correctly identifies the issues as potentially resolved.

### 5. JSON Summary + Agent-Generated Recommendations

The script's hardcoded recommendations (130+ lines of branching logic) have been replaced with:

- A `[CODEFLOW_SUMMARY]` JSON block emitted at the end of the script with all key facts
- An `isCodeflowPR` boolean for explicit codeflow vs non-codeflow classification
- A "Generating Recommendations" section in SKILL.md that teaches the agent how to reason over the summary
- The agent now synthesizes contextual, nuanced recommendations instead of picking from a canned decision tree

### 6. Codeflow History (renamed section)

The former "Staleness & Conflict Check" is now "Codeflow History" with a framing line: *"Maestro warnings (historical — see Current State for present status)"*. This signals that comments describe past events, not necessarily current state.

## Testing

- Verified against PR #124095 (closed backflow with conflicts + force push) and PR #124231 (non-codeflow PR)
- Multi-model review round 1 (Claude Sonnet 4, GPT-5): consensus findings → added merged/closed state handling, empty-diff-only NO-OP, `--slurp` pagination fix
- Multi-model test round 2 (Claude Sonnet 4, GPT-5): functional test of the JSON summary → fixed `isUpToDate` null handling and negative `daysSinceUpdate`, added the `isCodeflowPR` field, clean exit codes, and non-codeflow guidance in SKILL.md
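Taken together, capabilities 2, 3, and 5 could look roughly like the sketch below. The `gh`/`jq` invocations follow documented CLI behavior; the repo path, parameter names, and every summary field other than `isCodeflowPR` and `daysSinceUpdate` are illustrative assumptions, not the actual script.

```powershell
# Hypothetical sketch of force push detection, empty diff detection, and the
# [CODEFLOW_SUMMARY] emission described above. Field names beyond
# isCodeflowPR/daysSinceUpdate are illustrative, as is the default repo.
param([int]$PrNumber, [string]$Repo = 'dotnet/dotnet')

# --paginate emits one JSON array per page; jq --slurp gathers the pages so
# `add` can merge them into a single array before filtering.
$forcePushes = gh api "repos/$Repo/issues/$PrNumber/timeline" --paginate |
    jq --slurp '[add[] | select(.event == "head_ref_force_pushed") | {actor: .actor.login, at: .created_at}]' |
    ConvertFrom-Json

# Empty diff detection via gh pr view JSON fields.
$pr = gh pr view $PrNumber --repo $Repo --json changedFiles,additions,deletions,updatedAt |
    ConvertFrom-Json

# Clamp to zero, mirroring the negative daysSinceUpdate fix noted under Testing.
$daysSinceUpdate = [math]::Max(0, ((Get-Date) - [datetime]$pr.updatedAt).TotalDays)

$summary = [ordered]@{
    isCodeflowPR    = $true   # illustrative; the real script classifies this
    isEmptyDiff     = ($pr.changedFiles -eq 0 -and $pr.additions -eq 0 -and $pr.deletions -eq 0)
    forcePushCount  = $forcePushes.Count
    daysSinceUpdate = [math]::Round($daysSinceUpdate, 1)
}

Write-Output "[CODEFLOW_SUMMARY]"
Write-Output ($summary | ConvertTo-Json)
```

Emitting the facts as JSON and leaving interpretation to the agent is what replaces the canned if/elseif recommendation tree: the script reports, the agent reasons.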