Feature request: preserve review history across clean rebases #275

@cderv

Written with the help of Claude, working in the roborev repo, to explore this idea based on my own usage.

When a feature branch is rebased onto main (without modifying any commit content), all existing reviews, comments, and refine history become unreachable by SHA. The new commits have different SHAs, and the review lookups in the prompt builder, refine loop, and show command all go through review_jobs.git_ref, which stores the exact SHA.

This means the accumulated context from the review feedback loop -- previous review attempts, developer comments explaining false positives or design decisions, refine iteration history -- is silently lost after a routine rebase.

I regularly rebase feature branches onto main rather than merging main into the branch, and this makes the review feedback loop less useful for longer-lived branches where rebasing happens more than once.

Workflow that triggers this

# Feature branch has commits reviewed by roborev, with comments added
git checkout feature-branch
git rebase main

# After rebase:
# - All commits have new SHAs
# - `roborev list` still shows old reviews (filtered by branch name via j.branch),
#   but they reference SHAs that no longer exist on the branch
# - `roborev show <new-sha>` finds nothing (queries j.git_ref = ?)
# - `roborev refine` iterates branch commits via git, calls GetReviewBySHA for each
#   new SHA, finds no match -- can't continue iterating on prior reviews
# - Re-reviewing produces fresh reviews with no "Previous Review Attempts" context
#   (writePreviousAttemptsForGitRef queries by exact git_ref)
# - Comments written via `roborev respond` are still in the DB linked by job_id,
#   but no SHA-based lookup path reaches them anymore

The post-commit hook correctly skips during the rebase itself (IsRebaseInProgress check in cmd/roborev/main.go:836), so no redundant reviews are enqueued. But after the rebase, the SHA-based lookups that power the feedback loop all break.

A similar situation happens with git commit --amend when only editing the commit message (code unchanged). The post-commit hook triggers a new review for the amended SHA, but the previous review and any comments on it become orphaned.

"Clean rewrite" as a defined concept

Not all rebases or amends are equal. An interactive rebase can edit, squash, or drop commits, changing the actual code. But a "clean rewrite" -- rebase onto main without conflicts, reorder-only, or a message-only amend -- produces commits with identical diffs, just on a different base or with a different message.

Git provides a way to detect this: git patch-id --stable. It reads a patch on stdin and hashes its diff content, ignoring line numbers and whitespace. Two commits with the same code changes but different SHAs produce the same patch-id.

# Before rebase
git show abc123 | git patch-id --stable
# deadbeef abc123

# After rebase (same diff, new SHA)
git show def456 | git patch-id --stable
# deadbeef def456   -- same patch-id

If the commit was edited or had conflict resolution that changed the diff, the patch-id changes. So patch-id matching is a reliable way to distinguish "same code, new SHA" from "actually modified."

I verified this locally: clean rebase produces identical patch-ids, rebase with conflict resolution produces different ones, and message-only amend produces identical ones.

See: https://git-scm.com/docs/git-patch-id
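
As a minimal sketch of that comparison, the two helpers below wrap the same pipeline in POSIX shell. The function names patch_id_of and same_patch are made up for illustration; they are not part of git or roborev.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not roborev or git commands).

# Print the stable patch-id of one commit. Prints nothing when the commit
# produces no diff (for example an empty commit).
patch_id_of() {
    # `git patch-id` prints "<patch-id> <commit-id>"; keep only the first field.
    git show "$1" | git patch-id --stable | cut -d' ' -f1
}

# Succeed only when both commits yield the same, non-empty patch-id.
same_patch() {
    old_id=$(patch_id_of "$1")
    new_id=$(patch_id_of "$2")
    [ -n "$old_id" ] && [ "$old_id" = "$new_id" ]
}
```

With these sourced in a repository, `same_patch abc123 def456 && echo "clean rewrite"` would print the message only when the two commits carry the same diff.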

What git provides for tracking rewrites

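Git has a post-rewrite hook that runs after git rebase and git commit --amend. Its first argument names the command that triggered it (rebase or amend), and it receives the old-to-new SHA mapping on stdin, one rewritten commit per line: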

old-sha1 new-sha1
old-sha2 new-sha2
...

Combined with patch-id, this gives both the mapping and the ability to validate it:

  • For each old/new pair, compare patch-ids
  • If they match: the review for the old SHA is still valid for the new SHA
  • If they differ: the commit was modified, skip remapping

See: https://git-scm.com/docs/githooks#_post_rewrite
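
The validation steps above could be sketched as a post-rewrite hook like the one below. This is only an illustration under assumptions: the function names are hypothetical, and the script just prints the pairs that are safe to remap, because how roborev would actually update git_ref is not decided here.

```shell
#!/bin/sh
# Sketch of .git/hooks/post-rewrite (hypothetical, not an existing roborev hook).
# Git writes one "<old-sha> <new-sha> [<extra-info>]" line per rewritten
# commit to stdin.

# Print the stable patch-id of one commit (first field of git patch-id output).
patch_id_of() {
    # </dev/null keeps git show from touching the hook's stdin.
    git show "$1" </dev/null | git patch-id --stable | cut -d' ' -f1
}

# Read rewrite pairs on stdin; print only the pairs whose diff is unchanged.
clean_rewrites() {
    while read -r old new extra; do
        [ -n "$new" ] || continue          # skip malformed or empty lines
        old_id=$(patch_id_of "$old")
        new_id=$(patch_id_of "$new")
        if [ -n "$old_id" ] && [ "$old_id" = "$new_id" ]; then
            printf '%s %s\n' "$old" "$new" # same code, new SHA: safe to remap
        fi
    done
}

# Run only when installed as the hook, so the functions can also be sourced.
case "$0" in
    */post-rewrite) clean_rewrites ;;
esac
```

Squashed commits fall out naturally: several old SHAs map to one new SHA, their patch-ids differ from the combined commit's, and no pair is printed.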

Expected behavior

After a clean rewrite (where commit diffs are unchanged), roborev would preserve review continuity:

  • roborev show for the new SHA finds the existing review
  • roborev refine matches remapped reviews and can continue iterating
  • Re-reviewing a remapped commit includes prior review context in the "Previous Review Attempts" section
  • Comments and responses remain accessible (they are linked by job_id, not SHA, so they follow the review automatically once git_ref is updated)

For commits that were modified during rebase (conflict resolution, edit, squash), no remapping occurs -- those commits are treated as new and get fresh reviews. A review is only preserved when it provably still applies to the code.

Open questions

  • Approach: hook-based remapping vs. stored patch-id. One approach is a post-rewrite hook that actively remaps git_ref values using patch-id to validate. An alternative would be to store patch-id alongside each review at creation time, then fall back to patch-id lookup when SHA lookup fails -- no new hook needed. A low-risk first step could be just storing patch-id in review_jobs without changing any lookup behavior, to have the data available and experiment from there.

  • Multiple rebases of the same commit. If a branch is rebased more than once, there could be multiple old reviews with the same patch-id. A tiebreaker would be needed to decide which review to surface.
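
To make the stored patch-id alternative and the tiebreaker question concrete, here is a rough sketch of the lookup order. It assumes a flat file with one review per line in the form "job_id git_ref patch_id", which is a stand-in invented for this example and not roborev's actual schema. The tiebreaker shown (highest job_id wins) is just one possible choice.

```shell
#!/bin/sh
# find_review FILE SHA PATCH_ID   (hypothetical helper and file layout)
# FILE holds one review per line: "<job_id> <git_ref> <patch_id>".
# Prints the job_id of an exact SHA match if there is one; otherwise the
# highest job_id that shares the patch-id; otherwise nothing.
find_review() {
    awk -v sha="$2" -v pid="$3" '
        # Appending "" forces string comparison, so hex values that happen
        # to look numeric (e.g. "1e5") are never compared as numbers.
        ($2 "") == sha { exact = $1 }
        pid != "" && ($3 "") == pid && ($1 + 0) > (best + 0) { best = $1 }
        END {
            if (exact != "") print exact
            else if (best != "") print best
        }
    ' "$1"
}
```

For example, `find_review reviews.txt "$(git rev-parse HEAD)" "$(git show HEAD | git patch-id --stable | cut -d' ' -f1)"` would try the SHA first and only then fall back to the patch-id.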


Would this be something worth exploring for roborev? Happy to help flesh out the approach further if it seems like a good fit.
