
In-memory diffs and merges slow on large repos #6036

Open
@arxanas

In-memory merges can be slow on a large repository such as https://github.com/mozilla/gecko-dev: a simple merge that touches only a few files can take several seconds. That is several hundred times slower than what a workaround achieves (see the benchmark at arxanas/git-branchless@4c57407):

  • 1.9s for naive cherry-pick.
  • 2.8ms for workaround cherry-pick.

I believe this is because the in-memory Index structure always stores every file in the repository, even when the vast majority of them are unchanged. It would be best if the in-memory index could alternatively be backed by a base tree plus the set of changed paths.
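To make the "tree + changed paths" idea concrete, here is a minimal sketch of such a structure. It is a toy model, not libgit2's API: `Tree`, `OverlayIndex`, and all of its methods are hypothetical names, and blob ids are plain strings.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a Git tree: path -> blob id (hypothetical, plain strings).
type Tree = BTreeMap<String, String>;

/// Hypothetical index that borrows a base tree and records only the paths
/// that differ from it. A `None` value marks a path deleted relative to the base.
struct OverlayIndex<'a> {
    base: &'a Tree,
    changed: BTreeMap<String, Option<String>>,
}

impl<'a> OverlayIndex<'a> {
    /// Creating the index copies nothing from the base tree.
    fn new(base: &'a Tree) -> Self {
        OverlayIndex { base, changed: BTreeMap::new() }
    }

    /// Add or modify an entry.
    fn set(&mut self, path: &str, blob: &str) {
        self.changed.insert(path.to_string(), Some(blob.to_string()));
    }

    /// Delete an entry.
    fn remove(&mut self, path: &str) {
        self.changed.insert(path.to_string(), None);
    }

    /// Look in the overlay first, then fall back to the base tree.
    fn get(&self, path: &str) -> Option<&str> {
        match self.changed.get(path) {
            Some(entry) => entry.as_deref(),
            None => self.base.get(path).map(String::as_str),
        }
    }

    /// Number of overlay entries; independent of the size of the base tree.
    fn changed_len(&self) -> usize {
        self.changed.len()
    }
}
```

With this shape, the memory and setup cost of an index scales with the number of changed paths rather than with the size of the repository, which is the property the naive approach lacks.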

The workaround is as follows:

  • Find the commits to merge and calculate their merge-base, as appropriate.
  • Find all paths changed among each of those commits' trees compared to the merge-base.
  • Generate synthetic versions of each tree/commit only containing the changed paths.
  • Carry out the merge on the synthetic trees/commits.
  • Combine the result back with an original tree, overwriting any entries already in the tree. (It doesn't matter which tree, since the non-changed entries are all the same.)

Reference implementation for cherry-picking specifically: https://github.com/arxanas/git-branchless/blob/ec0d27427ab7a505d4109e4588e356d6a18da2fe/src/git/repo.rs#L726-L836
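The workaround steps above can also be sketched on a toy model, independent of libgit2. This is an illustration of the logic only, not the linked implementation: trees are flat maps from path to blob id, conflicts are reported per path without content merging, and every name here (`Tree`, `tree`, `changed_paths`, `synthetic_tree`, `merge_trees`, `merge_changed_only`) is hypothetical. The merge-base from the first step is assumed to be given.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Toy stand-in for a Git tree: path -> blob id (hypothetical, plain strings).
type Tree = BTreeMap<String, String>;

/// Convenience constructor for examples.
fn tree(entries: &[(&str, &str)]) -> Tree {
    entries.iter().map(|(p, b)| (p.to_string(), b.to_string())).collect()
}

/// Step 2: paths added, removed, or modified in `other` relative to `base`.
/// (A real tree diff can skip unchanged subtrees; this toy version scans all keys.)
fn changed_paths(base: &Tree, other: &Tree) -> BTreeSet<String> {
    base.keys()
        .chain(other.keys())
        .filter(|path| base.get(*path) != other.get(*path))
        .cloned()
        .collect()
}

/// Step 3: a synthetic tree containing only the given paths.
fn synthetic_tree(full: &Tree, paths: &BTreeSet<String>) -> Tree {
    paths
        .iter()
        .filter_map(|p| full.get(p).map(|blob| (p.clone(), blob.clone())))
        .collect()
}

/// Step 4: path-level three-way merge. Returns the conflicting paths on failure.
fn merge_trees(base: &Tree, ours: &Tree, theirs: &Tree) -> Result<Tree, Vec<String>> {
    let paths: BTreeSet<&String> =
        base.keys().chain(ours.keys()).chain(theirs.keys()).collect();
    let mut merged = Tree::new();
    let mut conflicts = Vec::new();
    for path in paths {
        let (b, o, t) = (base.get(path), ours.get(path), theirs.get(path));
        let resolved = if o == t || t == b {
            o // both sides agree, or only our side changed
        } else if o == b {
            t // only their side changed
        } else {
            conflicts.push(path.clone());
            continue;
        };
        if let Some(blob) = resolved {
            merged.insert(path.clone(), blob.clone());
        }
    }
    if conflicts.is_empty() { Ok(merged) } else { Err(conflicts) }
}

/// Steps 2 to 5 combined: merge only the changed paths, then graft the
/// result back onto one of the original trees.
fn merge_changed_only(base: &Tree, ours: &Tree, theirs: &Tree) -> Result<Tree, Vec<String>> {
    let mut paths = changed_paths(base, ours);
    paths.extend(changed_paths(base, theirs));

    let merged_small = merge_trees(
        &synthetic_tree(base, &paths),
        &synthetic_tree(ours, &paths),
        &synthetic_tree(theirs, &paths),
    )?;

    // Step 5: any original tree works as the starting point, because the
    // unchanged entries are identical in all of them.
    let mut result = ours.clone();
    for path in &paths {
        match merged_small.get(path) {
            Some(blob) => {
                result.insert(path.clone(), blob.clone());
            }
            // Assumption of this sketch: a changed path missing from the
            // merge result was deleted, so it is removed here too.
            None => {
                result.remove(path);
            }
        }
    }
    Ok(result)
}
```

In this model, `merge_changed_only(&base, &ours, &theirs)` returns the same tree as calling `merge_trees` on the full trees, while the merge itself only ever sees the changed paths. The toy version still clones one full map in the last step; with real Git trees, unchanged subtrees would be shared by object id instead of copied.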
