Conversation

@stephentoub
Member

- Try to remove bias from PR/issue comments by prompting it to first form its own opinions based only on the code and only then factor in and evaluate the PR/issue comments
- Urge it to have a more aggressive / skeptical mindset
- Explicitly give it the option of saying it needs human review to reduce desire to just say LGTM
- Delete the CCR code-review-instructions that seem to be confusing it and just have those refer back to the skill
@github-actions github-actions bot added the needs-area-label label Feb 10, 2026
@stephentoub
Member Author

Before:
#124188 (review)

After:
#124228 (comment)

Contributor

Copilot AI left a comment


Pull request overview

Updates the repository’s code-review guidance to reduce narrative anchoring, encourage more skeptical evaluation, and standardize an explicit “Needs Human Review” outcome, while simplifying the legacy code-review instructions file to defer to the skill.

Changes:

- Added an explicit reviewer mindset and a staged process that forms an initial opinion from code context before reading the PR narrative.
- Updated the review output format to include “Needs Human Review” and added verdict consistency rules.
- Replaced the long-form .github/code-review-instructions.md with a pointer to the skill doc.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| .github/skills/code-review/SKILL.md | Expands the review process and output format to emphasize independent assessment and add a “Needs Human Review” verdict path. |
| .github/code-review-instructions.md | Simplifies instructions to reference the skill as the single source of truth. |

Comments suppressed due to low confidence (1)

.github/skills/code-review/SKILL.md:117

- In the Detailed Findings template header ("#### ✅/⚠️/❌ ..."), 💡 is omitted even though the guidelines below list 💡 as a supported severity. Consider updating the template header to include 💡 as well, or remove 💡 from the guidelines, so reviewers follow a single consistent format.
#### ✅/⚠️/❌ <Category Name> — <Brief description>

<Explanation with specifics. Reference code, line numbers, interleavings, etc.>

(Repeat for each finding category. Group related findings under a single heading.)

@stephentoub stephentoub added the area-skills label and removed the needs-area-label label Feb 10, 2026
@stephentoub stephentoub enabled auto-merge (squash) February 10, 2026 17:07
lewing added a commit to lewing/runtime that referenced this pull request Feb 10, 2026
Inspired by the code-review skill pattern (PR dotnet#124229), restructure
the codeflow analysis to form an independent assessment from primary
signals before consulting Maestro comments:

- New 'Current State' section (Step 2) synthesizes empty diff, force
  push events, and activity recency into an immediate verdict:
  NO-OP / IN PROGRESS / STALE / ACTIVE

- PR Branch Analysis now appears before comments, providing commit
  categorization as primary data

- Renamed 'Staleness & Conflict Check' to 'Codeflow History' to
  clearly signal these are past events, not current state. Framing
  line directs reader to Current State for present status.

- Recommendations driven by current state assessment, with comment
  history as supporting context

The principle: comments tell you the history, not the present.
Determine the PR's actual state from primary signals (diff, branch,
timeline) before consulting Maestro comments for context.
@stephentoub stephentoub merged commit 32b519c into dotnet:main Feb 10, 2026
20 checks passed
lewing added a commit that referenced this pull request Feb 11, 2026
…e push & empty diff detection (#124231)

This improves the vmr-codeflow-status Copilot CLI skill with
enhancements discovered while investigating a stale codeflow PR
([#124095](#124095)).

## Design Change: Current State First

Inspired by the code-review skill restructuring in #124229, the analysis
now follows a **"current state first, comments as history"** pattern:

1. **Assess current state** from primary signals (PR state, diff size,
force pushes, timeline) before reading any Maestro comments
2. **Form an independent verdict** (MERGED / CLOSED / NO-OP / IN PROGRESS / STALE / ACTIVE)
3. **Then** read comments as historical context to explain how we got to
the current state
4. **Recommendations** are generated by the agent from a structured JSON
summary, not hardcoded script logic

Previously, the script read Maestro conflict/staleness comments first
and drew conclusions from them — even when those comments were stale
(e.g., after someone had already force-pushed a resolution).
Recommendations were a 130-line if/elseif chain that couldn't adapt to
edge cases.

## New Capabilities

### 1. Current State Assessment (Step 2)
Synthesizes primary signals into an immediate verdict before any comment
parsing:
- `✅ MERGED` — PR has been merged, no action needed
- `✖️ CLOSED` — PR was closed without merging, Maestro should create a replacement
- `📭 NO-OP` — empty diff, likely already resolved
- `🔄 IN PROGRESS` — recent force push, awaiting update
- `⏳ STALE` — no activity for >3 days
- `✅ ACTIVE` — PR has content and recent activity
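
As a rough illustration of the idea, here is a minimal Python sketch of such a classification, assuming the primary signals have already been gathered into a dict shaped like `gh pr view --json state,changedFiles,additions,deletions,updatedAt` output plus a precomputed force-push flag; the actual skill script's structure and thresholds may differ:

```python
from datetime import datetime, timezone

def assess_current_state(pr: dict, recent_force_push: bool) -> str:
    """Classify a codeflow PR from primary signals only; comments are ignored."""
    if pr["state"] == "MERGED":
        return "MERGED"
    if pr["state"] == "CLOSED":
        return "CLOSED"
    if pr["changedFiles"] == 0 and pr["additions"] == 0 and pr["deletions"] == 0:
        return "NO-OP"          # empty diff, likely already resolved
    if recent_force_push:
        return "IN PROGRESS"    # recent force push, awaiting update
    updated = datetime.fromisoformat(pr["updatedAt"].replace("Z", "+00:00"))
    if (datetime.now(timezone.utc) - updated).days > 3:
        return "STALE"          # no activity for >3 days
    return "ACTIVE"
```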

### 2. Force Push Detection
Queries PR timeline for `head_ref_force_pushed` events, showing who
force-pushed and when. Uses `--slurp` for correct pagination handling.
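
A hedged sketch of what that timeline query could look like via the GitHub CLI (the helper's shape and parameters are illustrative, not the script's actual code); `--slurp` wraps the paginated pages into a single JSON array so the output parses in one go:

```python
import json
import subprocess

def get_force_pushes(repo: str, pr_number: int) -> list[dict]:
    """Return who force-pushed the PR head and when, from timeline events."""
    out = subprocess.run(
        ["gh", "api", "--paginate", "--slurp",
         f"repos/{repo}/issues/{pr_number}/timeline"],
        capture_output=True, text=True, check=True,
    ).stdout
    events = [e for page in json.loads(out) for e in page]  # flatten pages
    return [
        {"actor": (e.get("actor") or {}).get("login"), "at": e.get("created_at")}
        for e in events
        if e.get("event") == "head_ref_force_pushed"
    ]
```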

### 3. Empty Diff Detection
Checks `changedFiles`/`additions`/`deletions` to flag 0-change PRs as
NO-OP (regardless of whether a force push caused it).
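
For example, a minimal check along these lines (the field names are the `gh pr view --json` fields mentioned above; the wrapper itself is just a sketch):

```python
import json
import subprocess

def is_empty_diff(repo: str, pr_number: int) -> bool:
    """True when the PR touches no files and adds/removes no lines."""
    out = subprocess.run(
        ["gh", "pr", "view", str(pr_number), "--repo", repo,
         "--json", "changedFiles,additions,deletions"],
        capture_output=True, text=True, check=True,
    ).stdout
    pr = json.loads(out)
    return pr["changedFiles"] == 0 and pr["additions"] == 0 and pr["deletions"] == 0
```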

### 4. Post-Action Staleness Analysis
Cross-references force push timestamps against conflict/staleness
warnings. When a force push post-dates these warnings, the script
correctly identifies the issues as potentially resolved.
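
Conceptually this is a timestamp comparison; a small sketch, assuming ISO-8601 timestamps have already been extracted from the force-push events and the Maestro warning comments (the function names are hypothetical):

```python
from datetime import datetime

def parse_ts(ts: str) -> datetime:
    """Parse a GitHub-style ISO-8601 timestamp such as 2026-02-10T16:55:00Z."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def warnings_likely_resolved(force_push_times: list[str], warning_times: list[str]) -> bool:
    """Warnings are potentially resolved if the latest force push post-dates them all."""
    if not force_push_times or not warning_times:
        return False
    return max(map(parse_ts, force_push_times)) > max(map(parse_ts, warning_times))
```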

### 5. JSON Summary + Agent-Generated Recommendations
The script's hardcoded recommendations (130+ lines of branching logic)
have been replaced with:
- A `[CODEFLOW_SUMMARY]` JSON block emitted at the end of the script
with all key facts
- `isCodeflowPR` boolean for explicit codeflow vs non-codeflow
classification
- A "Generating Recommendations" section in SKILL.md that teaches the
agent how to reason over the summary
- The agent now synthesizes contextual, nuanced recommendations instead
of picking from a canned decision tree
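
A sketch of what emitting such a summary could look like; only `isCodeflowPR`, `isUpToDate`, and `daysSinceUpdate` are field names taken from this description, and the remaining fields are illustrative placeholders:

```python
import json

def emit_codeflow_summary(summary: dict) -> None:
    """Print the structured block the agent reasons over to form recommendations."""
    print("[CODEFLOW_SUMMARY]")
    print(json.dumps(summary, indent=2))

emit_codeflow_summary({
    "isCodeflowPR": True,            # explicit codeflow vs non-codeflow classification
    "currentState": "IN PROGRESS",   # verdict from the Current State assessment
    "isUpToDate": None,              # null when the API does not report it
    "daysSinceUpdate": 1,
    "forcePushes": [{"actor": "example-user", "at": "2026-02-10T16:55:00Z"}],  # illustrative
})
```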

### 6. Codeflow History (renamed section)
The former "Staleness & Conflict Check" is now "Codeflow History" with a
framing line: *"Maestro warnings (historical — see Current State for
present status)"*. This signals that comments describe past events, not
necessarily current state.

## Testing
- Verified against PR #124095 (closed backflow with conflicts + force
push) and PR #124231 (non-codeflow PR)
- Multi-model review round 1 (Claude Sonnet 4, GPT-5): consensus
findings → added merged/closed state handling, empty-diff-only NO-OP,
`--slurp` pagination fix
- Multi-model test round 2 (Claude Sonnet 4, GPT-5): functional test of
JSON summary → fixed `isUpToDate` null handling, negative
`daysSinceUpdate`, added `isCodeflowPR` field, clean exit codes,
non-codeflow guidance in SKILL.md