In-memory diffs and merges slow on large repos

In-memory merges can be slow on a repository like https://github.com/mozilla/gecko-dev: a simple merge touching only a few changed files can take several seconds. Compared against a workaround, this is roughly 700x slower than what's possible (see benchmark at https://github.com/arxanas/git-branchless/commit/4c5740779a88059e2fa547bfac2b275cc886e869):

* 1.9s for naive cherry-pick.
* 2.8ms for workaround cherry-pick.

I believe this is because the in-memory `Index` structure always stores an entry for every file, even when the vast majority of them aren't changed. It would be best if the in-memory index could alternatively be backed by a tree plus a set of changed paths.
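To make the "tree + changed paths" idea concrete, here is a minimal sketch of such an overlay. It is only an illustration: `Tree` is a toy path-to-blob-id map, and `OverlayIndex` and its methods are hypothetical names, not part of any existing library API.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a full tree: path -> blob id. Not a real library type.
type Tree = BTreeMap<String, String>;

/// Hypothetical index that borrows an unchanged base tree and records only edits.
struct OverlayIndex<'a> {
    base: &'a Tree,
    /// `Some(id)` = added or modified, `None` = deleted.
    changes: BTreeMap<String, Option<String>>,
}

impl<'a> OverlayIndex<'a> {
    fn new(base: &'a Tree) -> Self {
        OverlayIndex { base, changes: BTreeMap::new() }
    }

    /// Record an added or modified entry.
    fn set(&mut self, path: &str, id: &str) {
        self.changes.insert(path.to_string(), Some(id.to_string()));
    }

    /// Record a deletion without touching the base tree.
    fn remove(&mut self, path: &str) {
        self.changes.insert(path.to_string(), None);
    }

    /// Look up a path: recorded edits shadow the base tree.
    fn get(&self, path: &str) -> Option<&str> {
        match self.changes.get(path) {
            Some(change) => change.as_deref(),
            None => self.base.get(path).map(String::as_str),
        }
    }

    /// Number of entries the overlay itself stores (independent of base size).
    fn stored_entries(&self) -> usize {
        self.changes.len()
    }
}
```

With this shape, the cost of creating and editing the index scales with the number of changed paths rather than with the size of the repository.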

The workaround is as follows:

* Find the commits to merge and calculate their merge-base, as appropriate.
* Find all paths changed among each of those commits' trees compared to the merge-base.
* Generate synthetic versions of each tree/commit only containing the changed paths.
* Carry out the merge on the synthetic trees/commits.
* Combine the result back with an original tree, overwriting any entries already in the tree. (It doesn't matter which tree, since the non-changed entries are all the same.)
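The steps above can be sketched on a toy model, shown here only to illustrate the shape of the workaround. It is not the linked reference implementation: trees are modeled as flat path-to-blob-id maps, the merge works at whole-file granularity, and every function name (`changed_paths`, `restrict`, `merge_trees`, `merge_changed_only`, `tree`) is hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Toy model of a flattened tree: path -> blob id. Not a real library type.
type Tree = BTreeMap<String, String>;

/// Helper for building small trees in examples.
fn tree(pairs: &[(&str, &str)]) -> Tree {
    pairs.iter().map(|(p, id)| (p.to_string(), id.to_string())).collect()
}

/// Paths whose entry differs between `base` and `other` (added, removed, or modified).
fn changed_paths(base: &Tree, other: &Tree) -> BTreeSet<String> {
    base.keys()
        .chain(other.keys())
        .filter(|p| base.get(*p) != other.get(*p))
        .cloned()
        .collect()
}

/// Synthetic tree: only the entries of `full` that sit at one of `paths`.
fn restrict(full: &Tree, paths: &BTreeSet<String>) -> Tree {
    paths
        .iter()
        .filter_map(|p| full.get(p).map(|id| (p.clone(), id.clone())))
        .collect()
}

/// Naive three-way merge per path. Returns `Err(path)` on the first conflict.
fn merge_trees(base: &Tree, ours: &Tree, theirs: &Tree) -> Result<Tree, String> {
    let mut paths: BTreeSet<&String> = BTreeSet::new();
    paths.extend(base.keys());
    paths.extend(ours.keys());
    paths.extend(theirs.keys());

    let mut merged = Tree::new();
    for path in paths {
        let (b, o, t) = (base.get(path), ours.get(path), theirs.get(path));
        let winner = if o == t || t == b {
            o // both sides agree, or only our side changed
        } else if o == b {
            t // only their side changed
        } else {
            return Err(path.clone()); // both sides changed differently
        };
        if let Some(id) = winner {
            merged.insert(path.clone(), id.clone());
        }
    }
    Ok(merged)
}

/// The workaround: merge only the changed paths, then overlay onto an original tree.
fn merge_changed_only(base: &Tree, ours: &Tree, theirs: &Tree) -> Result<Tree, String> {
    let mut changed = changed_paths(base, ours);
    changed.extend(changed_paths(base, theirs));

    let small = merge_trees(
        &restrict(base, &changed),
        &restrict(ours, &changed),
        &restrict(theirs, &changed),
    )?;

    // Overlay: drop every changed path from an original tree (so deletions
    // take effect), then write the merged entries back.
    let mut result = base.clone();
    for path in &changed {
        result.remove(path);
    }
    result.extend(small);
    Ok(result)
}
```

In this model the expensive step, `merge_trees`, only ever sees the handful of changed paths, and `merge_changed_only` produces the same tree as merging the full trees directly. One detail the sketch makes explicit: when combining the result back, paths deleted by the merge have to be removed from the original tree, not just overwritten.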

Reference implementation for cherry-picking specifically: https://github.com/arxanas/git-branchless/blob/ec0d27427ab7a505d4109e4588e356d6a18da2fe/src/git/repo.rs#L726-L836

In-memory diffs and merges slow on large repos #6036