Add range-diff, a tbdiff lookalike #1

Closed
wants to merge 21 commits into from
21 commits
f168da3
linear-assignment: a function to solve least-cost assignment problems
dscho Apr 30, 2018
33758f3
Introduce `range-diff` to compare iterations of a topic branch
dscho May 1, 2018
08b8c3f
range-diff: first rudimentary implementation
dscho May 2, 2018
7b90919
range-diff: improve the order of the shown commits
dscho May 2, 2018
8515d2f
range-diff: also show the diff between patches
dscho May 6, 2018
a10ca01
range-diff: right-trim commit messages
dscho May 2, 2018
f81cbef
range-diff: indent the diffs just like tbdiff
dscho May 2, 2018
458090f
range-diff: suppress the diff headers
dscho May 2, 2018
d3be03a
range-diff: adjust the output of the commit pairs
dscho May 2, 2018
94b44df
range-diff: do not show "function names" in hunk headers
dscho May 6, 2018
1477c58
range-diff: add tests
trast May 2, 2018
32492c1
range-diff: use color for the commit pairs
dscho May 2, 2018
969a196
color: add the meta color GIT_COLOR_REVERSE
dscho May 3, 2018
f1c86f6
diff: add an internal option to dual-color diffs of diffs
dscho May 3, 2018
3c7b9f3
range-diff: offer to dual-color the diffs
dscho May 3, 2018
c56c51c
range-diff --dual-color: skip white-space warnings
dscho May 3, 2018
8c5543a
range-diff: populate the man page
dscho May 3, 2018
16e3cf2
completion: support `git range-diff`
dscho May 3, 2018
d9b09ab
range-diff: left-pad patch numbers
dscho May 5, 2018
f6fd395
range-diff: make --dual-color the default mode
dscho Jun 30, 2018
699cd71
range-diff: use dim/bold cues to improve dual color mode
dscho Jul 21, 2018
1 change: 1 addition & 0 deletions .gitignore
@@ -113,6 +113,7 @@
/git-pull
/git-push
/git-quiltimport
/git-range-diff
/git-read-tree
/git-rebase
/git-rebase--am
6 changes: 4 additions & 2 deletions Documentation/config.txt
@@ -1193,8 +1193,10 @@ color.diff.<slot>::
(highlighting whitespace errors), `oldMoved` (deleted lines),
`newMoved` (added lines), `oldMovedDimmed`, `oldMovedAlternative`,
`oldMovedAlternativeDimmed`, `newMovedDimmed`, `newMovedAlternative`
and `newMovedAlternativeDimmed` (See the '<mode>'
setting of '--color-moved' in linkgit:git-diff[1] for details).
`newMovedAlternativeDimmed` (See the '<mode>'
setting of '--color-moved' in linkgit:git-diff[1] for details),
`contextDimmed`, `oldDimmed`, `newDimmed`, `contextBold`,
`oldBold`, and `newBold` (see linkgit:git-range-diff[1] for details).

color.decorate.<slot>::
Use customized color for 'git log --decorate' output. `<slot>` is one
252 changes: 252 additions & 0 deletions Documentation/git-range-diff.txt
@@ -0,0 +1,252 @@
git-range-diff(1)
=================

NAME
----
git-range-diff - Compare two commit ranges (e.g. two versions of a branch)

SYNOPSIS
--------
[verse]
'git range-diff' [--color=[<when>]] [--no-color] [<diff-options>]
[--no-dual-color] [--creation-factor=<factor>]
( <range1> <range2> | <rev1>...<rev2> | <base> <rev1> <rev2> )

DESCRIPTION
-----------

This command shows the differences between two versions of a patch
series, or more generally, two commit ranges (ignoring merge commits).

To that end, it first finds pairs of commits from both commit ranges
that correspond with each other. Two commits are said to correspond when
the diff between their patches (i.e. the author information, the commit
message and the commit diff) is reasonably small compared to the
patches' size. See ``Algorithm`` below for details.

Finally, the list of matching commits is shown in the order of the
second commit range, with unmatched commits being inserted just after
all of their ancestors have been shown.


OPTIONS
-------
--no-dual-color::
When the commit diffs differ, `git range-diff` recreates the
original diffs' coloring, and adds outer -/+ diff markers with
the *background* being red/green to make it easier to see e.g.
when there was a change in what exact lines were added.
+
Additionally, the commit diff lines that are only present in the first commit
range are shown "dimmed" (this can be overridden using the `color.diff.<slot>`
config setting where `<slot>` is one of `contextDimmed`, `oldDimmed` and
`newDimmed`), and the commit diff lines that are only present in the second
commit range are shown in bold (which can be overridden using the config
settings `color.diff.<slot>` with `<slot>` being one of `contextBold`,
`oldBold` or `newBold`).
+
This is known to `range-diff` as "dual coloring". Use `--no-dual-color`
to revert to coloring all lines according to the outer diff markers
(and completely ignore the inner diff when it comes to color).

--creation-factor=<percent>::
Set the creation/deletion cost fudge factor to `<percent>`.
Defaults to 60. Try a larger value if `git range-diff` erroneously
considers a large change a total rewrite (deletion of one commit
and addition of another), and a smaller one in the reverse case.
See the ``Algorithm`` section below for an explanation of why this is
needed.

<range1> <range2>::
Compare the commits specified by the two ranges, where
`<range1>` is considered an older version of `<range2>`.

<rev1>...<rev2>::
Equivalent to passing `<rev2>..<rev1>` and `<rev1>..<rev2>`.

<base> <rev1> <rev2>::
Equivalent to passing `<base>..<rev1>` and `<base>..<rev2>`.
Note that `<base>` does not need to be the exact branch point
of the branches. Example: after rebasing a branch `my-topic`,
`git range-diff my-topic@{u} my-topic@{1} my-topic` would
show the differences introduced by the rebase.
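
The three argument forms above all reduce to a pair of commit ranges. A minimal sketch of that normalization, assuming a hypothetical helper named `parse_ranges` (this is not the code from `builtin/range-diff.c`, and it does not handle an empty side of `...`):

```python
def parse_ranges(args):
    """Normalize the documented argument forms to a (range1, range2) pair.

    `parse_ranges` is a hypothetical name used only for illustration.
    """
    if len(args) == 2:
        # <range1> <range2>: both arguments are already ranges.
        for arg in args:
            if ".." not in arg:
                raise ValueError(f"no .. in range: {arg!r}")
        return args[0], args[1]
    if len(args) == 1 and "..." in args[0]:
        # <rev1>...<rev2> is equivalent to <rev2>..<rev1> and <rev1>..<rev2>.
        rev1, rev2 = args[0].split("...", 1)
        return f"{rev2}..{rev1}", f"{rev1}..{rev2}"
    if len(args) == 3:
        # <base> <rev1> <rev2> is equivalent to <base>..<rev1> and <base>..<rev2>.
        base, rev1, rev2 = args
        return f"{base}..{rev1}", f"{base}..{rev2}"
    raise ValueError("need two ranges, <rev1>...<rev2>, or <base> <rev1> <rev2>")
```

With this sketch, `parse_ranges(["my-topic@{u}", "my-topic@{1}", "my-topic"])` yields the ranges `my-topic@{u}..my-topic@{1}` and `my-topic@{u}..my-topic`, matching the rebase example above.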

`git range-diff` also accepts the regular diff options (see
linkgit:git-diff[1]), most notably the `--color=[<when>]` and
`--no-color` options. These options are used when generating the "diff
between patches", i.e. to compare the author, commit message and diff of
corresponding old/new commits. There is currently no means to tweak the
diff options passed to `git log` when generating those patches.


CONFIGURATION
-------------
This command uses the `color.diff.*` and `pager.range-diff` settings
(the latter is on by default).
See linkgit:git-config[1].


EXAMPLES
--------

When a rebase required merge conflicts to be resolved, compare the changes
introduced by the rebase directly afterwards using:

------------
$ git range-diff @{u} @{1} @
------------


A typical output of `git range-diff` would look like this:

------------
-:  ------- > 1:  0ddba11 Prepare for the inevitable!
1:  c0debee = 2:  cab005e Add a helpful message at the start
2:  f00dbal ! 3:  decafe1 Describe a bug
    @@ -1,3 +1,3 @@
     Author: A U Thor <author@example.com>

    -TODO: Describe a bug
    +Describe a bug
    @@ -324,5 +324,6
      This is expected.

    -+What is unexpected is that it will also crash.
    ++Unexpectedly, it also crashes. This is a bug, and the jury is
    ++still out there how to fix it best. See ticket #314 for details.

      Contact
3:  bedead < -:  ------- TO-UNDO
------------

In this example, there are 3 old and 3 new commits, where the developer
removed the 3rd, added a new one before the first two, and modified the
commit message of the 2nd commit as well as its diff.

When the output goes to a terminal, it is color-coded by default, just
like regular `git diff`'s output. In addition, the first line (adding a
commit) is green, the last line (deleting a commit) is red, the second
line (with a perfect match) is yellow like the commit header of `git
show`'s output, and the third line colors the old commit red, the new
one green and the rest like `git show`'s commit header.

A naive color-coded diff of diffs is actually a bit hard to read,
though, as it colors the entire lines red or green. The line that added
"What is unexpected" in the old commit, for example, is completely red,
even if the intent of the old commit was to add something.

To help with that, `range-diff` uses the `--dual-color` mode by default. In
this mode, the diff of diffs will retain the original diff colors, and
prefix the lines with -/+ markers that have their *background* red or
green, to make it more obvious that they describe how the diff itself
changed.


Algorithm
---------

The general idea is this: we generate a cost matrix between the commits
in both commit ranges, then solve the least-cost assignment.

The cost matrix is populated thusly: for each pair of commits, both
diffs are generated and the "diff of diffs" is generated, with 3 context
lines, then the number of lines in that diff is used as cost.
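
As a rough illustration of the cost described above, the following sketch counts the lines of a "diff of diffs" with 3 context lines. It uses Python's `difflib` rather than Git's xdiff, skips the two `---`/`+++` file header lines that `difflib` emits, and the name `patch_distance` is hypothetical, so the numbers are not guaranteed to match what `git range-diff` computes:

```python
import difflib
from itertools import islice


def patch_distance(old_patch: str, new_patch: str) -> int:
    """Return the number of lines in the diff between two patch texts.

    Identical patches cost 0; every hunk header, context line and
    changed line of the diff of diffs adds 1.
    """
    diff = difflib.unified_diff(
        old_patch.splitlines(),
        new_patch.splitlines(),
        n=3,          # 3 context lines, as in the description above
        lineterm="",
    )
    # The first two lines are difflib's "---"/"+++" file headers; skip them.
    return sum(1 for _ in islice(diff, 2, None))
```

Two identical patches give a cost of 0, which is why an unchanged (for example cherry-picked) commit is matched for "free".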

To avoid false positives (e.g. when a patch has been removed, and an
unrelated patch has been added between two iterations of the same patch
series), the cost matrix is extended to allow for that, by adding
fixed-cost entries for wholesale deletes/adds.

Example: Let commits `1--2` be the first iteration of a patch series and
`A--C` the second iteration. Let's assume that `A` is a cherry-pick of
`2`, and `C` is a cherry-pick of `1` but with a small modification (say,
a fixed typo). Visualize the commits as a bipartite graph:

------------
    1            A

    2            B

                 C
------------

We are looking for a "best" explanation of the new series in terms of
the old one. We can represent an "explanation" as an edge in the graph:


------------
    1            A
                /
    2 --------'  B

                 C
------------

This explanation comes for "free" because there was no change. Similarly
`C` could be explained using `1`, but that comes at some cost c>0
because of the modification:

------------
    1 ----.      A
          |    /
    2 ----+---'  B
          |
          `----- C
          c>0
------------

In mathematical terms, what we are looking for is some sort of a minimum
cost bipartite matching; `1` is matched to `C` at some cost, etc. The
underlying graph is in fact a complete bipartite graph; the cost we
associate with every edge is the size of the diff between the two
commits' patches. To explain also new commits, we introduce dummy nodes
on both sides:

------------
    1 ----.      A
          |    /
    2 ----+---'  B
          |
    o     `----- C
          c>0
    o            o

    o            o
------------

The cost of an edge `o--C` is the size of `C`'s diff, modified by a
fudge factor that should be smaller than 100%. The cost of an edge
`o--o` is free. The fudge factor is necessary because even if `1` and
`C` have nothing in common, they may still share a few empty lines and
such, possibly making the assignment `1--C`, `o--o` slightly cheaper
than `1--o`, `o--C` even if `1` and `C` have nothing in common. With the
fudge factor we require a much larger common part to consider patches as
corresponding.

The overall time needed to compute this algorithm is the time needed to
compute n+m commit diffs and then n*m diffs of patches, plus the time
needed to compute the least-cost assignment between n and m diffs. Git
uses an implementation of the Jonker-Volgenant algorithm to solve the
assignment problem, which has cubic runtime complexity. The matching
found in this case will look like this:

------------
    1 ----.      A
          |    /
    2 ----+---'  B
       .--+-----'
    o -'  `----- C
          c>0
    o ---------- o

    o ---------- o
------------
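
The extended cost matrix and the resulting matching can be sketched as follows. This is an illustration under stated assumptions, not Git's implementation: all function names are hypothetical, the fudge factor is applied as an integer percentage, and the assignment is solved by brute force over permutations (only feasible for a handful of commits) instead of the Jonker-Volgenant algorithm.

```python
from itertools import permutations


def extended_cost_matrix(pair_cost, old_sizes, new_sizes, creation_factor=60):
    """Build the square (n+m) x (n+m) matrix with dummy rows and columns.

    pair_cost[i][j] is the size of the diff between old patch i and new
    patch j; old_sizes/new_sizes are the sizes of the patches themselves.
    """
    n, m = len(old_sizes), len(new_sizes)
    size = n + m
    cost = [[0] * size for _ in range(size)]  # dummy-to-dummy edges stay free
    for i in range(n):
        for j in range(m):
            cost[i][j] = pair_cost[i][j]
        for j in range(m, size):
            # Old commit i matched to a dummy: wholesale deletion.
            cost[i][j] = old_sizes[i] * creation_factor // 100
    for i in range(n, size):
        for j in range(m):
            # Dummy matched to new commit j: wholesale addition.
            cost[i][j] = new_sizes[j] * creation_factor // 100
    return cost


def solve_assignment(cost):
    """Brute-force least-cost assignment (stand-in for Jonker-Volgenant)."""
    size = len(cost)
    return min(
        permutations(range(size)),
        key=lambda perm: sum(cost[i][perm[i]] for i in range(size)),
    )


def correspondences(pair_cost, old_sizes, new_sizes, creation_factor=60):
    """Map each matched old commit index to its new commit index."""
    n, m = len(old_sizes), len(new_sizes)
    matrix = extended_cost_matrix(pair_cost, old_sizes, new_sizes, creation_factor)
    assignment = solve_assignment(matrix)
    return {i: assignment[i] for i in range(n) if assignment[i] < m}


if __name__ == "__main__":
    # Old commits 1, 2 (indices 0, 1); new commits A, B, C (indices 0, 1, 2).
    # A is a cherry-pick of 2 (cost 0); C is 1 with a small fix (cost 2).
    pair_cost = [[50, 45, 2],
                 [0, 55, 50]]
    print(correspondences(pair_cost, [20, 30], [30, 25, 20]))
```

With the made-up sizes in the example, old commit `1` is matched to `C`, old commit `2` to `A`, and `B` is left to a dummy node, which corresponds to the matching drawn above.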


SEE ALSO
--------
linkgit:git-log[1]

GIT
---
Part of the linkgit:git[1] suite
3 changes: 3 additions & 0 deletions Makefile
@@ -870,6 +870,7 @@ LIB_OBJS += gpg-interface.o
LIB_OBJS += graph.o
On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> +		else if (!line.buf[0] || starts_with(line.buf, "index "))
> +			/*
> +			 * A completely blank (not ' \n', which is context)
> +			 * line is not valid in a diff.  We skip it

I noticed this while wondering how somebody could teach range-diff
to honor --notes=amlog while preparing the patches to be compared
[*1*], but this assumption goes against what POSIX.1 says these
days.

    It is implementation-defined whether an empty unaffected line is
    written as an empty line or a line containing a single <space> character.

cf. http://pubs.opengroup.org/onlinepubs/9699919799/utilities/diff.html#tag_20_34_10_07

We need to insert ", as we disable user's diff.suppressBlankEmpty
settings" before ".  We skip it" (and if we get affected by the
setting, we need to fix it; it is not ultra-urgent, though).

[Footnote]

*1* ... which I do not have a good answer to, yet.  As discussed
earlier, the diffopt passed into the show_range_diff() machinery is
primarily meant for the final output (i.e. how the matching patches
from the two iterations are compared) and not about how the patches
to be compared are generated.  Worse, --notes=amlog (and possibly
other useful options) are parsed by "git log" side of the machinery,
not "git diff" side that populates diffopt.

LIB_OBJS += grep.o
LIB_OBJS += hashmap.o
LIB_OBJS += linear-assignment.o
LIB_OBJS += help.o
LIB_OBJS += hex.o
LIB_OBJS += ident.o
@@ -924,6 +925,7 @@ LIB_OBJS += progress.o
LIB_OBJS += prompt.o
LIB_OBJS += protocol.o
LIB_OBJS += quote.o
LIB_OBJS += range-diff.o
LIB_OBJS += reachable.o
LIB_OBJS += read-cache.o
LIB_OBJS += reflog-walk.o
@@ -1062,6 +1064,7 @@ BUILTIN_OBJS += builtin/prune-packed.o
BUILTIN_OBJS += builtin/prune.o
BUILTIN_OBJS += builtin/pull.o
BUILTIN_OBJS += builtin/push.o
BUILTIN_OBJS += builtin/range-diff.o
BUILTIN_OBJS += builtin/read-tree.o
BUILTIN_OBJS += builtin/rebase--helper.o
BUILTIN_OBJS += builtin/receive-pack.o
1 change: 1 addition & 0 deletions builtin.h
@@ -201,6 +201,7 @@ extern int cmd_prune(int argc, const char **argv, const char *prefix);
extern int cmd_prune_packed(int argc, const char **argv, const char *prefix);
extern int cmd_pull(int argc, const char **argv, const char *prefix);
extern int cmd_push(int argc, const char **argv, const char *prefix);
extern int cmd_range_diff(int argc, const char **argv, const char *prefix);
extern int cmd_read_tree(int argc, const char **argv, const char *prefix);
extern int cmd_rebase__helper(int argc, const char **argv, const char *prefix);
extern int cmd_receive_pack(int argc, const char **argv, const char *prefix);