-
Notifications
You must be signed in to change notification settings - Fork 141
Optimization batch 11: avoid repeatedly detecting same renames #859
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
b44d032
to
43334cd
Compare
2a75ef0
to
57f5b94
Compare
45056df
to
47b6ab0
Compare
42aa076
to
78f24d1
Compare
47b6ab0
to
c35784a
Compare
38a5af7
to
fe3c2d3
Compare
c344081
to
e93a570
Compare
4bfef15
to
297ba4e
Compare
b20a00c
to
703e6df
Compare
59c6d19
to
911a730
Compare
703e6df
to
472b8d5
Compare
911a730
to
5676cfd
Compare
472b8d5
to
a3bddd6
Compare
5676cfd
to
d8e921a
Compare
a3bddd6
to
d17f7bd
Compare
d8e921a
to
9b0eef8
Compare
d17f7bd
to
67f6f7d
Compare
9b0eef8
to
c2eca0c
Compare
d6f48b8
to
6f645b1
Compare
030edc4
to
4e5a08e
Compare
6f645b1
to
37bad86
Compare
4e5a08e
to
9f16076
Compare
37bad86
to
2cf6509
Compare
9f16076
to
decde26
Compare
2cf6509
to
b026ae8
Compare
This patch series was integrated into seen via git@a8dbceb. |
This patch series was integrated into seen via git@ced4672. |
This patch series was integrated into seen via git@a1dfc2b. |
This patch series was integrated into seen via git@4b7e6e0. |
This patch series was integrated into seen via git@ef589b8. |
This patch series was integrated into seen via git@3ec3cd2. |
This patch series was integrated into next via git@58a8b85. |
This patch series was integrated into seen via git@58a8b85. |
This patch series was integrated into seen via git@4c68384. |
This patch series was integrated into seen via git@a8ff15b. |
This patch series was integrated into seen via git@aee3575. |
This patch series was integrated into seen via git@58a8b85. |
This patch series was integrated into seen via git@fcf8cb7. |
This patch series was integrated into seen via git@ef18b82. |
This patch series was integrated into seen via git@1b9e055. |
This patch series was integrated into seen via git@f102943. |
This patch series was integrated into seen via git@0f57124. |
This patch series was integrated into seen via git@3dee93b. |
There was a status update in the "Cooking" section about the branch Optimize out repeated rename detection in a sequence of mergy operations. Will cook in 'next'. |
There was a status update in the "Cooking" section about the branch Optimize out repeated rename detection in a sequence of mergy operations. Will cook in 'next'. |
This patch series was integrated into seen via git@6315e00. |
There was a status update in the "Cooking" section about the branch Optimize out repeated rename detection in a sequence of mergy operations. Will cook in 'next'. |
This patch series was integrated into seen via git@169914e. |
This patch series was integrated into next via git@169914e. |
This patch series was integrated into master via git@169914e. |
Closed via 169914e. |
This series avoids repeatedly detecting the same renames in a sequence
of merges such as a rebase or cherry-pick of several commits.
Changes since v2 (thanks to Stolee for the reviews!):
Not included: Additional trace2 metadata beyond that found in v2; I'm not sure what to record that would help (see https://lore.kernel.org/git/CABPp-BFdxn9f0-jUjY6Ake_6kX-jeN1EEzpeJeTg+TV4wfepwg@mail.gmail.com/)
Changes since v1:
Found and fixed a few bugs affecting merge.directoryRenames=true,
one of which would have caused excessive rename detection runs (not
caching things right), and another that would cause conflicts to be
reported when the merge should be able to succeed.
Updated timings. The speedups are approximately the same as in v1,
but are slightly improved by fixing the above bugs. Also, my v1 cover
letter appears to have had incorrect "percentage of overall time"
reported. Not sure what happened there, but I have updated numbers
below.
Five new patches added to the front of the series (explained in reverse order):
Patch 5: Add a bunch of testcases to cover all the special cases
that could present problems for the remember renames optimization.
Patch 4: Extend test-tool fast-rebase slightly for the new testcases.
Patch 3: Fix an embarrassing bug in fast-rebase, for use in new
testcases.
Patch 2: Add documentation that thoroughly explains all the
nooks and crannies and special cases associated with this
optimization to "prove" that it is safe. May help if future
optimizations or feature changes call into question any
assumptions in play (e.g. if break detection were ever turned on
in the merge machinery).
Patch 1: While thoroughly covering all the special cases, I also
found and documented a minor merge.directoryRenames=true bug
that affects both merge-recursive and merge-ort, with or without
this optimization; this bug has been there for years.
One additional patch inserted near the end of the series:
discussed in Patch 2.
=== Basic Optimization idea ===
When there are many renames between the old base and the new base,
traditionally all those renames are re-detected for every commit that
is transplanted. This optimization avoids redoing that work. While
that description is a simple summary of the high level idea, the
reasons why this optimization are safe and correct can be somewhat
intricate; the second patch adds a document that goes to great length
to explain every relevant detail.
This represents "Optimization #4" from my Git Merge 2020 talk[1]; the
details are a bit more involved than I realized at the time, but the
high level idea is the same.
=== Comparison to previous series ===
I previously noted that we had three major rename-related optimizations:
This one adds a fourth (remember-renames), with some interesting
properties:
unlike basename-guided rename detection, there are no behavioral
changes (there is no heuristic involved)[2].
like skip-because-irrelevant, this optimization does not apply to
all git commands using the rename machinery. In fact, this one is
even more restrictive since it is ONLY useful for rebases and
cherry-picks (not even merges), and only for second and later
commits in a linear series.
unlike the three previous optimizations, there are no requirements
about the types of changes done to the file; it just caches
renames on the "upstream" side of history for subsequent commit
picking.
It's also worth noting despite wording about "remembering" or
"caching" renames, that this optimization does NOT write this cache to
disk; it's an in-memory only cache. When the rebase or cherry-pick
completes (or hits a conflict and stops), the cache is discarded.
=== Results ===
For the testcases mentioned in commit 557ac03 ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
the changes in just this series improves the performance as follows:
By design, this optimization could not help the just-one-mega
testcase. The gains for the other two testcases may look somewhat
smaller than one would expect given the description (only ~13% for the
mega-renames testcase), but the point was to spend less time detecting
renames...and there just wasn't that much time spent in renames for
these testcases before this series for us to remove. However, if we
undid the basename-guided rename detection and skip-because-unnecessary
optimizations, then this series alone would have improved performance
as follows:
Showing that this optimization has the ability to improve things when
the other optimizations do not apply. In fact, when I originally
implemented this optimization, it improved the mega-renames testcase
by a factor of 2 (at the time, I did not have all the optimizations
from ort-perf-batch-7 thru ort-perf-batch-10 in their current shape).
As a reminder, before any merge-ort/diffcore-rename performance work,
the performance results we started with were:
=== Further discussion of results ===
If we change our focus from absolute time taken, to the percentage of
overall time spent on rename detection, then we find the following
picture comparing our starting point at the beginning of the
performance work to what we achieve at the end of this series:
This optimization is only applicable for the first two testcases
(because the third only involves rebasing a single commit). This
table makes it clear that our attempts to accelerate rename detection
have succeeded, and any further work to accelerate merges needs to
start concentrating on other areas.
[1] https://github.com/newren/presentations/blob/pdfs/merge-performance/merge-performance-slides.pdf
[2] Well, almost no changes. There's technically a very narrow way that
this could change the behavior...though in a way that does not
affect correctness of the merge; see section 5 of the new document
in the second patch for the details.
cc: Derrick Stolee dstolee@microsoft.com
cc: Jonathan Tan jonathantanmy@google.com
cc: Taylor Blau me@ttaylorr.com
cc: Elijah Newren newren@gmail.com
cc: Derrick Stolee stolee@gmail.com
cc: Bagas Sanjaya bagasdotme@gmail.com
cc: "Kerry, Richard" richard.kerry@atos.net