Optimization batch 11: avoid repeatedly detecting same renames #859

Closed
wants to merge 13 commits into from


@newren newren commented Jan 31, 2021

This series avoids repeatedly detecting the same renames in a sequence
of merges such as a rebase or cherry-pick of several commits.

Changes since v2 (thanks to Stolee for the reviews!):

  • Patch 2: Wording cleanups (typo fixes and whatnot in the documentation)
  • Patch 5: Expand the comment to explain the purpose of cached_irrelevant
  • Patch 6: Add a cache_new_pairs() helper, remove extraneous line deletion
  • Patch 7: Typo fix

Not included: Additional trace2 metadata beyond that found in v2; I'm not sure what to record that would help (see https://lore.kernel.org/git/CABPp-BFdxn9f0-jUjY6Ake_6kX-jeN1EEzpeJeTg+TV4wfepwg@mail.gmail.com/)

Changes since v1:

  • Found and fixed a few bugs affecting merge.directoryRenames=true,
    one of which would have caused excessive rename detection runs (not
    caching things right), and another that would cause conflicts to be
    reported when the merge should be able to succeed.

  • Updated timings. The speedups are approximately the same as in v1,
    but are slightly improved by fixing the above bugs. Also, my v1 cover
    letter appears to have had incorrect "percentage of overall time"
    reported. Not sure what happened there, but I have updated numbers
    below.

  • Five new patches added to the front of the series (explained in reverse order):

    • Patch 5: Add a bunch of testcases to cover all the special cases
      that could present problems for the remember renames optimization.

    • Patch 4: Extend test-tool fast-rebase slightly for the new testcases.

    • Patch 3: Fix an embarrassing bug in fast-rebase, for use in new
      testcases.

    • Patch 2: Add documentation that thoroughly explains all the
      nooks and crannies and special cases associated with this
      optimization to "prove" that it is safe. May help if future
      optimizations or feature changes call into question any
      assumptions in play (e.g. if break detection were ever turned on
      in the merge machinery).

    • Patch 1: While thoroughly covering all the special cases, I also
      found and documented a minor merge.directoryRenames=true bug
      that affects both merge-recursive and merge-ort, with or without
      this optimization; this bug has been there for years.

  • One additional patch inserted near the end of the series:

    • Patch 11: Special handling for rename/rename(1to1) situations, as
      discussed in Patch 2.

=== Basic Optimization idea ===

When there are many renames between the old base and the new base,
traditionally all those renames are re-detected for every commit that
is transplanted. This optimization avoids redoing that work. While
that description is a simple summary of the high level idea, the
reasons why this optimization is safe and correct can be somewhat
intricate; the second patch adds a document that goes to great lengths
to explain every relevant detail.
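
As a rough illustration of the high level idea only, here is a toy Python model. Everything in it is an assumption made for illustration: the function names, the dict-based "trees", and the exact-match rename rule are not git's data structures or API, and the toy keeps the two trees fixed across picks, which sidesteps exactly the intricacies the new document covers.

```python
# Toy model only: the function names and the dict-based "trees" are
# illustrative assumptions, not git's data structures or API.

def detect_renames(old_base, new_base, stats):
    """Stand-in for the expensive step: find exact renames between two trees.

    Trees are dicts mapping path -> content id. A path that vanished from
    old_base but whose content shows up at another path in new_base is
    treated as a rename.
    """
    stats["detections"] += 1
    path_by_content = {blob: path for path, blob in new_base.items()}
    return {
        path: path_by_content[blob]
        for path, blob in old_base.items()
        if path not in new_base and blob in path_by_content
    }


def pick_series(commits, old_base, new_base, remember=True):
    """Transplant commits (lists of touched paths) from old_base onto new_base."""
    stats = {"detections": 0}
    cached = None
    transplanted = []
    for touched_paths in commits:
        if remember and cached is not None:
            renames = cached  # reuse the renames found for an earlier commit
        else:
            renames = detect_renames(old_base, new_base, stats)
            cached = renames
        # Follow each touched path to wherever the upstream side renamed it.
        transplanted.append([renames.get(p, p) for p in touched_paths])
    return transplanted, stats["detections"]


old_base = {"old/a.c": "A", "old/b.c": "B", "README": "R"}
new_base = {"new/a.c": "A", "new/b.c": "B", "README": "R"}
commits = [["old/a.c"], ["old/b.c", "README"], ["old/a.c"]]

with_cache, runs_cached = pick_series(commits, old_base, new_base, remember=True)
without_cache, runs_naive = pick_series(commits, old_base, new_base, remember=False)
print("detection runs without/with remembering:", runs_naive, runs_cached)
```

In this toy, both modes produce the same transplanted paths; only the number of detection runs differs (one per commit versus one for the whole series).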

This represents "Optimization #4" from my Git Merge 2020 talk[1]; the
details are a bit more involved than I realized at the time, but the
high level idea is the same.

=== Comparison to previous series ===

I previously noted that we had three major rename-related optimizations:

  • exact rename detection (applies when unmodified on renamed side)
  • skip-because-irrelevant (applies when unmodified on unrenamed side)
  • basename-guided rename detection (applies when basename unchanged)

This one adds a fourth (remember-renames), with some interesting
properties:

  • unlike basename-guided rename detection, there are no behavioral
    changes (there is no heuristic involved)[2].

  • like skip-because-irrelevant, this optimization does not apply to
    all git commands using the rename machinery. In fact, this one is
    even more restrictive since it is ONLY useful for rebases and
    cherry-picks (not even merges), and only for second and later
    commits in a linear series.

  • unlike the three previous optimizations, there are no requirements
    about the types of changes done to the file; it just caches
    renames on the "upstream" side of history for subsequent commit
    picking.

It's also worth noting that, despite wording about "remembering" or
"caching" renames, this optimization does NOT write this cache to
disk; it's an in-memory only cache. When the rebase or cherry-pick
completes (or hits a conflict and stops), the cache is discarded.
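
The cache lifetime described above could be pictured roughly as follows. This is a hypothetical Python sketch, not the merge-ort implementation: `run_sequence`, `Conflict`, and `demo_apply` are names invented here purely to show a cache that lives in process memory and is dropped when the sequence ends or stops.

```python
class Conflict(Exception):
    """Raised by the hypothetical apply step when a pick cannot be completed."""


def run_sequence(commits, apply_commit):
    """Pick commits in order, sharing one in-memory rename cache."""
    rename_cache = {}  # plain process memory; nothing is written to disk
    picked = []
    try:
        for commit in commits:
            apply_commit(commit, rename_cache)  # may read or fill the cache
            picked.append(commit)
    except Conflict:
        pass  # the sequence stops at the first conflict
    finally:
        rename_cache.clear()  # discarded on completion and on conflict alike
    return picked


seen_caches = []


def demo_apply(commit, rename_cache):
    """Hypothetical apply step: records the cache object and fakes a conflict."""
    seen_caches.append(rename_cache)
    if commit == "conflicting":
        raise Conflict(commit)
    rename_cache.setdefault("old/a.c", "new/a.c")


picked = run_sequence(["c1", "c2", "conflicting", "c4"], demo_apply)
print(picked, seen_caches[0])
```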

=== Results ===

For the testcases mentioned in commit 557ac03 ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
the changes in just this series improve the performance as follows:

                     Before Series           After Series
no-renames:        5.665 s ±  0.129 s     5.622 s ±  0.059 s
mega-renames:     11.435 s ±  0.158 s    10.127 s ±  0.073 s
just-one-mega:   494.2  ms ±  6.1  ms   500.3  ms ±  3.8  ms

By design, this optimization could not help the just-one-mega
testcase. The gains for the other two testcases may look somewhat
smaller than one would expect given the description (only ~13% for the
mega-renames testcase), but the point was to spend less time detecting
renames...and there just wasn't that much time spent in renames for
these testcases before this series for us to remove. However, if we
undid the basename-guided rename detection and skip-because-irrelevant
optimizations, then this series alone would have improved performance
as follows:

               Before Basename Series   After Just This Series
no-renames:      13.815 s ±  0.062 s      5.697 s ±  0.080 s
mega-renames:  1799.937 s ±  0.493 s    205.709 s ±  0.457 s

This shows that the optimization has the ability to improve things when
the other optimizations do not apply. In fact, when I originally
implemented this optimization, it improved the mega-renames testcase
by a factor of 2 (at the time, I did not have all the optimizations
from ort-perf-batch-7 thru ort-perf-batch-10 in their current shape).
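
For readers who want the ratios behind the two tables above, the following small Python snippet computes before/after speedup factors from the reported mean timings. The dictionary layout and names are just for illustration, and the error bars are ignored.

```python
# Mean wall-clock times in seconds, copied from the two tables above.
# (before, after) for this series on top of the earlier optimizations:
this_series = {
    "no-renames": (5.665, 5.622),
    "mega-renames": (11.435, 10.127),
    "just-one-mega": (0.4942, 0.5003),
}
# (before, after) with the basename-guided and skip-because-irrelevant
# optimizations undone, as in the second table:
without_other_opts = {
    "no-renames": (13.815, 5.697),
    "mega-renames": (1799.937, 205.709),
}


def speedup(before, after):
    """Return how many times faster 'after' is than 'before'."""
    return before / after


series_factors = {name: speedup(*t) for name, t in this_series.items()}
standalone_factors = {name: speedup(*t) for name, t in without_other_opts.items()}

for name, factor in series_factors.items():
    print(f"{name:>14}: {factor:.2f}x (this series on top of earlier work)")
for name, factor in standalone_factors.items():
    print(f"{name:>14}: {factor:.2f}x (this series alone)")
```

The mega-renames factor in the first table works out to roughly 1.13x, consistent with the "~13%" quoted above, while the second table gives factors of roughly 2.4x and 8.7x.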

As a reminder, before any merge-ort/diffcore-rename performance work,
the performance results we started with were:

no-renames-am:      6.940 s ±  0.485 s
no-renames:        18.912 s ±  0.174 s
mega-renames:    5964.031 s ± 10.459 s
just-one-mega:    149.583 s ±  0.751 s

=== Further discussion of results ===

If we change our focus from absolute time taken, to the percentage of
overall time spent on rename detection, then we find the following
picture comparing our starting point at the beginning of the
performance work to what we achieve at the end of this series:

         Percentage of time spent on rename detection
   
                  commit 557ac0350d      After this Series
no-renames:             39.4%                   0.2%
mega-renames:           96.6%                   8.7%
just-one-mega:          95.0%                  15.6%

This optimization is only applicable for the first two testcases
(because the third only involves rebasing a single commit). This
table makes it clear that our attempts to accelerate rename detection
have succeeded, and any further work to accelerate merges needs to
start concentrating on other areas.
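
To get a feel for what those percentages mean in absolute terms, one can multiply them by the total times reported earlier. This is only a back-of-the-envelope estimate in Python: it assumes the percentages and the timing tables come from comparable runs (the "commit 557ac0350d" column paired with the starting-point timings, and the "After this Series" column paired with the after-series timings), which the text above implies but does not state outright.

```python
# (total seconds, fraction of time spent on rename detection)
# ASSUMPTION: totals and percentages describe comparable runs.
at_start = {
    "no-renames": (18.912, 0.394),
    "mega-renames": (5964.031, 0.966),
    "just-one-mega": (149.583, 0.950),
}
after_series = {
    "no-renames": (5.622, 0.002),
    "mega-renames": (10.127, 0.087),
    "just-one-mega": (0.5003, 0.156),
}


def rename_seconds(total, fraction):
    """Estimated seconds spent in rename detection."""
    return total * fraction


estimates = {
    name: (rename_seconds(*at_start[name]), rename_seconds(*after_series[name]))
    for name in at_start
}

for name, (before, after) in estimates.items():
    print(f"{name:>14}: ~{before:.2f} s -> ~{after:.3f} s in rename detection")
```

Under that assumption, the mega-renames case goes from well over an hour and a half of rename detection to under a second.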

[1] https://github.com/newren/presentations/blob/pdfs/merge-performance/merge-performance-slides.pdf

[2] Well, almost no changes. There's technically a very narrow way that
this could change the behavior...though in a way that does not
affect correctness of the merge; see section 5 of the new document
in the second patch for the details.

cc: Derrick Stolee <dstolee@microsoft.com>
cc: Jonathan Tan <jonathantanmy@google.com>
cc: Taylor Blau <me@ttaylorr.com>
cc: Elijah Newren <newren@gmail.com>
cc: Derrick Stolee <stolee@gmail.com>
cc: Bagas Sanjaya <bagasdotme@gmail.com>
cc: "Kerry, Richard" <richard.kerry@atos.net>

@newren newren force-pushed the ort-perf-batch-11 branch from b44d032 to 43334cd Compare February 3, 2021 05:32
@newren newren force-pushed the temporary/ort-perf-batch-10 branch from 2a75ef0 to 57f5b94 Compare February 3, 2021 19:03
@newren newren force-pushed the ort-perf-batch-11 branch 3 times, most recently from 45056df to 47b6ab0 Compare February 9, 2021 10:33
@newren newren force-pushed the temporary/ort-perf-batch-10 branch from 42aa076 to 78f24d1 Compare February 9, 2021 10:33
@newren newren force-pushed the temporary/ort-perf-batch-10 branch from 38a5af7 to fe3c2d3 Compare February 10, 2021 16:26
@newren newren force-pushed the ort-perf-batch-11 branch 2 times, most recently from c344081 to e93a570 Compare February 10, 2021 16:35
@newren newren force-pushed the temporary/ort-perf-batch-10 branch from 4bfef15 to 297ba4e Compare February 11, 2021 07:37
@newren newren force-pushed the ort-perf-batch-11 branch 2 times, most recently from b20a00c to 703e6df Compare February 12, 2021 17:29
@newren newren force-pushed the temporary/ort-perf-batch-10 branch from 59c6d19 to 911a730 Compare February 12, 2021 21:12
@newren newren force-pushed the temporary/ort-perf-batch-10 branch from 911a730 to 5676cfd Compare February 14, 2021 03:31
@newren newren force-pushed the temporary/ort-perf-batch-10 branch from 5676cfd to d8e921a Compare February 14, 2021 03:38
@newren newren force-pushed the temporary/ort-perf-batch-10 branch from d8e921a to 9b0eef8 Compare February 23, 2021 21:29
@newren newren force-pushed the temporary/ort-perf-batch-10 branch from 9b0eef8 to c2eca0c Compare February 25, 2021 01:21
@newren newren force-pushed the ort-perf-batch-11 branch 2 times, most recently from d6f48b8 to 6f645b1 Compare February 26, 2021 00:02
@newren newren force-pushed the temporary/ort-perf-batch-10 branch from 030edc4 to 4e5a08e Compare February 26, 2021 20:47
@newren newren force-pushed the temporary/ort-perf-batch-10 branch from 4e5a08e to 9f16076 Compare February 27, 2021 06:17
@newren newren force-pushed the temporary/ort-perf-batch-10 branch from 9f16076 to decde26 Compare March 8, 2021 22:29
@newren newren force-pushed the ort-perf-batch-11 branch from 2cf6509 to b026ae8 Compare March 8, 2021 22:29
@gitgitgadget

gitgitgadget bot commented May 25, 2021

This patch series was integrated into seen via git@a8dbceb.

@gitgitgadget

gitgitgadget bot commented May 25, 2021

This patch series was integrated into seen via git@ced4672.

@gitgitgadget

gitgitgadget bot commented May 27, 2021

This patch series was integrated into seen via git@a1dfc2b.

@gitgitgadget

gitgitgadget bot commented May 27, 2021

This patch series was integrated into seen via git@4b7e6e0.

@gitgitgadget

gitgitgadget bot commented May 27, 2021

This patch series was integrated into seen via git@ef589b8.

@gitgitgadget

gitgitgadget bot commented May 28, 2021

This patch series was integrated into seen via git@3ec3cd2.

@gitgitgadget

gitgitgadget bot commented May 28, 2021

This patch series was integrated into next via git@58a8b85.

@gitgitgadget gitgitgadget bot added the next label May 28, 2021
@gitgitgadget

gitgitgadget bot commented May 31, 2021

This patch series was integrated into seen via git@58a8b85.

@gitgitgadget

gitgitgadget bot commented May 31, 2021

This patch series was integrated into seen via git@4c68384.

@gitgitgadget

gitgitgadget bot commented Jun 1, 2021

This patch series was integrated into seen via git@a8ff15b.

@gitgitgadget

gitgitgadget bot commented Jun 2, 2021

This patch series was integrated into seen via git@aee3575.

@gitgitgadget

gitgitgadget bot commented Jun 2, 2021

This patch series was integrated into seen via git@58a8b85.

@gitgitgadget

gitgitgadget bot commented Jun 2, 2021

This patch series was integrated into seen via git@fcf8cb7.

@gitgitgadget

gitgitgadget bot commented Jun 2, 2021

This patch series was integrated into seen via git@ef18b82.

@gitgitgadget

gitgitgadget bot commented Jun 5, 2021

This patch series was integrated into seen via git@1b9e055.

@gitgitgadget

gitgitgadget bot commented Jun 6, 2021

This patch series was integrated into seen via git@f102943.

@gitgitgadget

gitgitgadget bot commented Jun 6, 2021

This patch series was integrated into seen via git@0f57124.

@gitgitgadget

gitgitgadget bot commented Jun 6, 2021

This patch series was integrated into seen via git@3dee93b.

@gitgitgadget

gitgitgadget bot commented Jun 6, 2021

There was a status update in the "Cooking" section about the branch en/ort-perf-batch-11 on the Git mailing list:

Optimize out repeated rename detection in a sequence of mergy
operations.

Will cook in 'next'.

@gitgitgadget

gitgitgadget bot commented Jun 8, 2021

There was a status update in the "Cooking" section about the branch en/ort-perf-batch-11 on the Git mailing list:

Optimize out repeated rename detection in a sequence of mergy
operations.

Will cook in 'next'.

@gitgitgadget

gitgitgadget bot commented Jun 10, 2021

This patch series was integrated into seen via git@6315e00.

@gitgitgadget

gitgitgadget bot commented Jun 10, 2021

There was a status update in the "Cooking" section about the branch en/ort-perf-batch-11 on the Git mailing list:

Optimize out repeated rename detection in a sequence of mergy
operations.

Will cook in 'next'.

@gitgitgadget

gitgitgadget bot commented Jun 14, 2021

This patch series was integrated into seen via git@169914e.

@gitgitgadget

gitgitgadget bot commented Jun 14, 2021

This patch series was integrated into next via git@169914e.

@gitgitgadget

gitgitgadget bot commented Jun 14, 2021

This patch series was integrated into master via git@169914e.

@gitgitgadget gitgitgadget bot added the master label Jun 14, 2021
@gitgitgadget gitgitgadget bot closed this Jun 14, 2021
@gitgitgadget

gitgitgadget bot commented Jun 14, 2021

Closed via 169914e.

@newren newren deleted the ort-perf-batch-11 branch June 25, 2021 02:57