Optimization batch 11: avoid repeatedly detecting same renames #859

newren · 2021-01-31T07:04:07Z

This series avoids repeatedly detecting the same renames in a sequence
of merges such as a rebase or cherry-pick of several commits.

Changes since v2 (thanks to Stolee for the reviews!):

- Patch 2: Wording cleanups (typo fixes and whatnot in the documentation)
- Patch 5: Expand the comment to explain the purpose of cached_irrelevant
- Patch 6: Add a cache_new_pairs() helper, remove extraneous line deletion
- Patch 7: Typo fix

Not included: Additional trace2 metadata beyond that found in v2; I'm not sure what to record that would help (see https://lore.kernel.org/git/CABPp-BFdxn9f0-jUjY6Ake_6kX-jeN1EEzpeJeTg+TV4wfepwg@mail.gmail.com/)

Changes since v1:

- Found and fixed a few bugs affecting merge.directoryRenames=true,
  one of which would have caused excessive rename detection runs (not
  caching things right), and another that would cause conflicts to be
  reported when the merge should be able to succeed.
- Updated timings. The speedups are approximately the same as in v1,
  but are slightly improved by fixing the above bugs. Also, my v1 cover
  letter appears to have had incorrect "percentage of overall time"
  reported. Not sure what happened there, but I have updated numbers
  below.
- Five new patches added to the front of the series (explained in reverse order):
  - Patch 5: Add a bunch of testcases to cover all the special cases
    that could present problems for the remember renames optimization.
  - Patch 4: Extend test-tool fast-rebase slightly for the new testcases.
  - Patch 3: Fix an embarrassing bug in fast-rebase, for use in new
    testcases.
  - Patch 2: Add documentation that thoroughly explains all the
    nooks and crannies and special cases associated with this
    optimization to "prove" that it is safe. May help if future
    optimizations or feature changes call into question any
    assumptions in play (e.g. if break detection were ever turned on
    in the merge machinery).
  - Patch 1: While thoroughly covering all the special cases, I also
    found and documented a minor merge.directoryRenames=true bug
    that affects both merge-recursive and merge-ort, with or without
    this optimization; this bug has been there for years.
- One additional patch inserted near the end of the series:
  - Patch 11: Special handling for rename/rename(1to1) situations, as
    discussed in Patch 2.

=== Basic Optimization idea ===

When there are many renames between the old base and the new base,
traditionally all those renames are re-detected for every commit that
is transplanted. This optimization avoids redoing that work. While
that description is a simple summary of the high level idea, the
reasons why this optimization is safe and correct can be somewhat
intricate; the second patch adds a document that goes to great lengths
to explain every relevant detail.
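
As a rough illustration of the idea above (not the merge-ort implementation), the sketch below replays a series of commits and runs upstream rename detection either once or once per commit. Every name in it (`replay`, `detect_upstream_renames`, `remember_renames`, the sample paths) is hypothetical, and the toy model ignores the special cases that the document in the second patch covers.

```python
from typing import Callable, Dict, List, Optional, Tuple

RenameMap = Dict[str, str]  # path in the old base -> path in the new base
Commit = Dict[str, str]     # path touched by a commit -> new content


def replay(
    commits: List[Commit],
    detect_upstream_renames: Callable[[], RenameMap],
    remember_renames: bool = True,
) -> Tuple[List[Commit], int]:
    """Transplant commits onto the new base.

    Returns the rewritten commits and the number of detection runs.
    """
    cache: Optional[RenameMap] = None  # in-memory only
    runs = 0
    transplanted: List[Commit] = []
    for commit in commits:
        if cache is None or not remember_renames:
            cache = detect_upstream_renames()  # the expensive step
            runs += 1
        # Follow upstream renames so each change lands on the renamed path.
        transplanted.append(
            {cache.get(path, path): text for path, text in commit.items()}
        )
    return transplanted, runs


# Hypothetical data: two upstream renames and a three-commit series.
upstream = {"old/a.c": "new/a.c", "old/b.c": "new/b.c"}
series = [{"old/a.c": "v1"}, {"old/b.c": "v2"}, {"README": "v3"}]

cached_result, cached_runs = replay(series, lambda: dict(upstream))
plain_result, plain_runs = replay(
    series, lambda: dict(upstream), remember_renames=False
)

print(cached_runs, plain_runs)        # 1 3
print(cached_result == plain_result)  # True
```

In this toy model the result is identical either way; only the number of detection runs changes, which is the property the series relies on.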

This represents "Optimization #4" from my Git Merge 2020 talk[1]; the
details are a bit more involved than I realized at the time, but the
high level idea is the same.

=== Comparison to previous series ===

I previously noted that we had three major rename-related optimizations:

- exact rename detection (applies when unmodified on renamed side)
- skip-because-irrelevant (applies when unmodified on unrenamed side)
- basename-guided rename detection (applies when basename unchanged)

This one adds a fourth (remember-renames), with some interesting
properties:

- unlike basename-guided rename detection, there are no behavioral
  changes (there is no heuristic involved)[2].
- like skip-because-irrelevant, this optimization does not apply to
  all git commands using the rename machinery. In fact, this one is
  even more restrictive since it is ONLY useful for rebases and
  cherry-picks (not even merges), and only for second and later
  commits in a linear series.
- unlike the three previous optimizations, there are no requirements
  about the types of changes done to the file; it just caches
  renames on the "upstream" side of history for subsequent commit
  picking.

It's also worth noting that, despite the wording about "remembering" or
"caching" renames, this optimization does NOT write this cache to
disk; it's an in-memory only cache. When the rebase or cherry-pick
completes (or hits a conflict and stops), the cache is discarded.
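
The cache lifetime described above can be pictured with a small sketch. It assumes a hypothetical `RenameCache` object scoped to one rebase or cherry-pick run; none of these names come from the series, and the real data structures in merge-ort differ.

```python
from typing import Dict, List


class MergeConflict(Exception):
    """Raised by the toy sequencer when a pick stops on a conflict."""


class RenameCache:
    """In-memory cache whose lifetime is one rebase/cherry-pick run."""

    def __init__(self) -> None:
        self.renames: Dict[str, str] = {}

    def __enter__(self) -> "RenameCache":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.renames.clear()  # discarded; nothing is written to disk
        return False          # let a conflict propagate to the caller


def run_sequence(steps: List[str], cache: RenameCache) -> None:
    for step in steps:
        if step == "conflict":
            raise MergeConflict(step)
        # Hypothetical rename remembered for later picks.
        cache.renames.setdefault("old/a.c", "new/a.c")


cache = RenameCache()
stopped = False
try:
    with cache:
        run_sequence(["pick", "pick", "conflict"], cache)
except MergeConflict:
    stopped = True

print(stopped, cache.renames)  # True {}
```

Whether the run finishes or stops on a conflict, the cache is empty afterwards, so a resumed operation would start detection from scratch.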

=== Results ===

For the testcases mentioned in commit 557ac03 ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
the changes in just this series improve the performance as follows:

                     Before Series           After Series
no-renames:        5.665 s ±  0.129 s     5.622 s ±  0.059 s
mega-renames:     11.435 s ±  0.158 s    10.127 s ±  0.073 s
just-one-mega:   494.2  ms ±  6.1  ms   500.3  ms ±  3.8  ms

By design, this optimization could not help the just-one-mega
testcase. The gains for the other two testcases may look somewhat
smaller than one would expect given the description (only ~13% for the
mega-renames testcase), but the point was to spend less time detecting
renames...and there just wasn't that much time spent in renames for
these testcases before this series for us to remove. However, if we
undid the basename-guided rename detection and skip-because-irrelevant
optimizations, then this series alone would have improved performance
as follows:

               Before Basename Series   After Just This Series
no-renames:      13.815 s ±  0.062 s      5.697 s ±  0.080 s
mega-renames:  1799.937 s ±  0.493 s    205.709 s ±  0.457 s

This shows that this optimization has the ability to improve things when
the other optimizations do not apply. In fact, when I originally
implemented this optimization, it improved the mega-renames testcase
by a factor of 2 (at the time, I did not have all the optimizations
from ort-perf-batch-7 through ort-perf-batch-10 in their current shape).

As a reminder, before any merge-ort/diffcore-rename performance work,
the performance results we started with were:

no-renames-am:      6.940 s ±  0.485 s
no-renames:        18.912 s ±  0.174 s
mega-renames:    5964.031 s ± 10.459 s
just-one-mega:    149.583 s ±  0.751 s
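
For readers who prefer ratios to raw timings, the following sketch turns the mean times from the tables above into speedup factors (before divided by after). It ignores the reported ± spreads, and the dictionary and function names are only illustrative.

```python
from typing import Dict, Tuple

Timings = Dict[str, Tuple[float, float]]  # testcase -> (before, after) in seconds


def speedup(before_s: float, after_s: float) -> float:
    """How many times faster the 'after' run is than the 'before' run."""
    return before_s / after_s


# Mean times copied from the tables above (milliseconds converted to seconds).
this_series: Timings = {
    "no-renames": (5.665, 5.622),
    "mega-renames": (11.435, 10.127),
    "just-one-mega": (0.4942, 0.5003),
}
without_other_optimizations: Timings = {
    "no-renames": (13.815, 5.697),
    "mega-renames": (1799.937, 205.709),
}
since_start_of_perf_work: Timings = {
    "no-renames": (18.912, 5.622),
    "mega-renames": (5964.031, 10.127),
    "just-one-mega": (149.583, 0.5003),
}

for label, table in [
    ("this series", this_series),
    ("this series, other optimizations undone", without_other_optimizations),
    ("since start of performance work", since_start_of_perf_work),
]:
    for testcase, (before, after) in table.items():
        print(f"{label:42s} {testcase:14s} {speedup(before, after):8.2f}x")
```

With these inputs, mega-renames comes out at roughly 1.13x for this series, about 8.75x with the basename-guided and skip-because-irrelevant optimizations undone, and close to 589x relative to the original starting point.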

=== Further discussion of results ===

If we change our focus from absolute time taken, to the percentage of
overall time spent on rename detection, then we find the following
picture comparing our starting point at the beginning of the
performance work to what we achieve at the end of this series:

         Percentage of time spent on rename detection
   
                  commit 557ac0350d      After this Series
no-renames:             39.4%                   0.2%
mega-renames:           96.6%                   8.7%
just-one-mega:          95.0%                  15.6%

This optimization is only applicable for the first two testcases
(because the third only involves rebasing a single commit). This
table makes it clear that our attempts to accelerate rename detection
have succeeded, and any further work to accelerate merges needs to
start concentrating on other areas.
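
One way to read the percentage table above is through Amdahl's law: if a phase takes a fraction p of the runtime, making that phase free speeds up the whole run by at most 1 / (1 - p). The sketch below applies that bound to the table's percentages; the function name and dictionary layout are illustrative only.

```python
def max_speedup_if_phase_were_free(fraction: float) -> float:
    """Amdahl bound: overall speedup if a phase taking `fraction` of runtime cost nothing."""
    return 1.0 / (1.0 - fraction)


# Fraction of time spent on rename detection, from the table above:
# (at commit 557ac0350d, after this series)
rename_share = {
    "no-renames": (0.394, 0.002),
    "mega-renames": (0.966, 0.087),
    "just-one-mega": (0.950, 0.156),
}

for testcase, (at_start, after_series) in rename_share.items():
    print(
        f"{testcase:14s} "
        f"bound at start: {max_speedup_if_phase_were_free(at_start):6.2f}x  "
        f"bound now: {max_speedup_if_phase_were_free(after_series):5.2f}x"
    )
```

At the starting point, eliminating rename detection could have made mega-renames up to about 29x faster; after this series the same bound is under 1.1x for mega-renames and under 1.2x for just-one-mega, which is why remaining gains have to come from other parts of the merge.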

[1] https://github.com/newren/presentations/blob/pdfs/merge-performance/merge-performance-slides.pdf

[2] Well, almost no changes. There's technically a very narrow way that
this could change the behavior...though in a way that does not
affect correctness of the merge; see section 5 of the new document
in the second patch for the details.

cc: Derrick Stolee dstolee@microsoft.com
cc: Jonathan Tan jonathantanmy@google.com
cc: Taylor Blau me@ttaylorr.com
cc: Elijah Newren newren@gmail.com
cc: Derrick Stolee stolee@gmail.com
cc: Bagas Sanjaya bagasdotme@gmail.com
cc: "Kerry, Richard" richard.kerry@atos.net

gitgitgadget · 2021-05-25T20:24:14Z

This patch series was integrated into seen via git@a8dbceb.

gitgitgadget · 2021-05-25T21:47:36Z

This patch series was integrated into seen via git@ced4672.

gitgitgadget · 2021-05-27T03:41:49Z

This patch series was integrated into seen via git@a1dfc2b.

gitgitgadget · 2021-05-27T05:28:36Z

This patch series was integrated into seen via git@4b7e6e0.

gitgitgadget · 2021-05-27T06:14:58Z

This patch series was integrated into seen via git@ef589b8.

gitgitgadget · 2021-05-28T06:15:19Z

This patch series was integrated into seen via git@3ec3cd2.

gitgitgadget · 2021-05-28T06:15:20Z

This patch series was integrated into next via git@58a8b85.

gitgitgadget · 2021-05-31T04:11:42Z

This patch series was integrated into seen via git@58a8b85.

gitgitgadget · 2021-05-31T04:50:28Z

This patch series was integrated into seen via git@4c68384.

gitgitgadget · 2021-06-01T22:28:23Z

This patch series was integrated into seen via git@a8ff15b.

gitgitgadget · 2021-06-02T02:45:27Z

This patch series was integrated into seen via git@aee3575.

gitgitgadget · 2021-06-02T03:27:25Z

This patch series was integrated into seen via git@58a8b85.

gitgitgadget · 2021-06-02T03:57:20Z

This patch series was integrated into seen via git@fcf8cb7.

gitgitgadget · 2021-06-02T08:13:49Z

This patch series was integrated into seen via git@ef18b82.

gitgitgadget · 2021-06-05T22:59:43Z

This patch series was integrated into seen via git@1b9e055.

gitgitgadget · 2021-06-06T05:44:02Z

This patch series was integrated into seen via git@f102943.

gitgitgadget · 2021-06-06T06:17:13Z

This patch series was integrated into seen via git@0f57124.

gitgitgadget · 2021-06-06T12:28:01Z

This patch series was integrated into seen via git@3dee93b.

gitgitgadget · 2021-06-06T12:50:54Z

There was a status update in the "Cooking" section about the branch en/ort-perf-batch-11 on the Git mailing list:

Optimize out repeated rename detection in a sequence of mergy
operations.

Will cook in 'next'.

gitgitgadget · 2021-06-08T05:12:32Z

There was a status update in the "Cooking" section about the branch en/ort-perf-batch-11 on the Git mailing list:

Optimize out repeated rename detection in a sequence of mergy
operations.

Will cook in 'next'.

gitgitgadget · 2021-06-10T06:27:09Z

This patch series was integrated into seen via git@6315e00.

gitgitgadget · 2021-06-10T06:54:59Z

There was a status update in the "Cooking" section about the branch en/ort-perf-batch-11 on the Git mailing list:

Optimize out repeated rename detection in a sequence of mergy
operations.

Will cook in 'next'.

gitgitgadget · 2021-06-14T05:45:19Z

This patch series was integrated into seen via git@169914e.

gitgitgadget · 2021-06-14T05:45:20Z

This patch series was integrated into next via git@169914e.

gitgitgadget · 2021-06-14T05:45:21Z

This patch series was integrated into master via git@169914e.

gitgitgadget · 2021-06-14T05:45:25Z

Closed via 169914e.

newren force-pushed the ort-perf-batch-11 branch from b44d032 to 43334cd on February 3, 2021 05:32

newren force-pushed the temporary/ort-perf-batch-10 branch from 2a75ef0 to 57f5b94 on February 3, 2021 19:03

newren force-pushed the ort-perf-batch-11 branch 3 times, most recently from 45056df to 47b6ab0 on February 9, 2021 10:33

newren force-pushed the temporary/ort-perf-batch-10 branch from 42aa076 to 78f24d1 on February 9, 2021 10:33

newren force-pushed the ort-perf-batch-11 branch from 47b6ab0 to c35784a on February 10, 2021 14:08

newren force-pushed the temporary/ort-perf-batch-10 branch from 38a5af7 to fe3c2d3 on February 10, 2021 16:26

newren force-pushed the ort-perf-batch-11 branch 2 times, most recently from c344081 to e93a570 on February 10, 2021 16:35

newren force-pushed the temporary/ort-perf-batch-10 branch from 4bfef15 to 297ba4e on February 11, 2021 07:37

newren force-pushed the ort-perf-batch-11 branch 2 times, most recently from b20a00c to 703e6df on February 12, 2021 17:29

newren force-pushed the temporary/ort-perf-batch-10 branch from 59c6d19 to 911a730 on February 12, 2021 21:12

newren force-pushed the ort-perf-batch-11 branch from 703e6df to 472b8d5 on February 12, 2021 21:12

newren force-pushed the temporary/ort-perf-batch-10 branch from 911a730 to 5676cfd on February 14, 2021 03:31

newren force-pushed the ort-perf-batch-11 branch from 472b8d5 to a3bddd6 on February 14, 2021 03:31

newren force-pushed the temporary/ort-perf-batch-10 branch from 5676cfd to d8e921a on February 14, 2021 03:38

newren force-pushed the ort-perf-batch-11 branch from a3bddd6 to d17f7bd on February 14, 2021 03:39

newren force-pushed the temporary/ort-perf-batch-10 branch from d8e921a to 9b0eef8 on February 23, 2021 21:29

newren force-pushed the ort-perf-batch-11 branch from d17f7bd to 67f6f7d on February 23, 2021 21:29

newren force-pushed the temporary/ort-perf-batch-10 branch from 9b0eef8 to c2eca0c on February 25, 2021 01:21

newren force-pushed the ort-perf-batch-11 branch 2 times, most recently from d6f48b8 to 6f645b1 on February 26, 2021 00:02

newren force-pushed the temporary/ort-perf-batch-10 branch from 030edc4 to 4e5a08e on February 26, 2021 20:47

newren force-pushed the ort-perf-batch-11 branch from 6f645b1 to 37bad86 on February 26, 2021 20:47

newren force-pushed the temporary/ort-perf-batch-10 branch from 4e5a08e to 9f16076 on February 27, 2021 06:17

newren force-pushed the ort-perf-batch-11 branch from 37bad86 to 2cf6509 on February 27, 2021 06:17

newren force-pushed the temporary/ort-perf-batch-10 branch from 9f16076 to decde26 on March 8, 2021 22:29

newren force-pushed the ort-perf-batch-11 branch from 2cf6509 to b026ae8 on March 8, 2021 22:29

gitgitgadget bot added the next label May 28, 2021

gitgitgadget bot added the master label Jun 14, 2021

gitgitgadget bot closed this Jun 14, 2021

newren deleted the ort-perf-batch-11 branch June 25, 2021 02:57
