Speed up commit operations with partial index updates #151

Open
epage opened this issue Oct 29, 2021 · 6 comments
Labels
enhancement Improve the expected

Comments

@epage
Collaborator

epage commented Oct 29, 2021

    /// Cherry-pick a commit in memory and return the resulting tree.
    ///
    /// The `libgit2` routines operate on entire `Index`es, which contain one
    /// entry per file in the repository. When operating on a large repository,
    /// this is prohibitively slow, as it takes several seconds just to write
    /// the index to disk. To improve performance, we reduce the size of the
    /// involved indexes by filtering out any unchanged entries from the input
    /// trees, then call into `libgit2`, then add back the unchanged entries to
    /// the output tree.

See https://github.com/arxanas/git-branchless/blob/331b7cf2a37d00a3d54ba5df00f3d702aa7b3948/src/git/repo.rs#L774-L789
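For readers who want the shape of the idea without opening the link, here is a minimal sketch of the filter, merge, add-back approach the doc comment describes. It is not the git-branchless implementation and does not call `libgit2`. Trees are modeled as flat path-to-blob-id maps, and `Tree`, `tree_of`, `changed_paths`, `restrict`, `merge_small`, and `cherry_pick_fast` are all hypothetical names invented for illustration. `merge_small` stands in for the expensive `libgit2` call.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Toy stand-in for a git tree: path -> blob id (assumption: real trees are nested).
type Tree = BTreeMap<String, String>;

/// Hypothetical helper to build a toy tree from literals.
fn tree_of(entries: &[(&str, &str)]) -> Tree {
    entries
        .iter()
        .map(|(path, id)| (path.to_string(), id.to_string()))
        .collect()
}

/// Paths whose entry differs between `a` and `b` (added, removed, or modified).
fn changed_paths(a: &Tree, b: &Tree) -> BTreeSet<String> {
    a.keys()
        .chain(b.keys())
        .filter(|p| a.get(*p) != b.get(*p))
        .cloned()
        .collect()
}

/// Keep only the entries of `tree` whose path is in `paths`.
fn restrict(tree: &Tree, paths: &BTreeSet<String>) -> Tree {
    tree.iter()
        .filter(|(p, _)| paths.contains(*p))
        .map(|(p, id)| (p.clone(), id.clone()))
        .collect()
}

/// Stand-in for the expensive merge: a per-path three-way merge.
/// Returns `Err(path)` for the first path changed differently on both sides.
fn merge_small(base: &Tree, ours: &Tree, theirs: &Tree) -> Result<Tree, String> {
    let all: BTreeSet<&String> = base
        .keys()
        .chain(ours.keys())
        .chain(theirs.keys())
        .collect();
    let mut out = Tree::new();
    for path in all {
        let (b, o, t) = (base.get(path), ours.get(path), theirs.get(path));
        let merged = if o == t {
            o // both sides agree
        } else if o == b {
            t // only theirs changed
        } else if t == b {
            o // only ours changed
        } else {
            return Err(path.clone()); // conflict
        };
        if let Some(id) = merged {
            out.insert(path.clone(), id.clone());
        }
    }
    Ok(out)
}

/// Cherry-pick `commit` (whose parent tree is `parent`) onto `target`,
/// merging only the paths the picked commit actually touched.
fn cherry_pick_fast(target: &Tree, parent: &Tree, commit: &Tree) -> Result<Tree, String> {
    // Only paths the picked commit changed can differ from `target` in the result.
    let touched = changed_paths(parent, commit);
    let merged = merge_small(
        &restrict(parent, &touched),
        &restrict(target, &touched),
        &restrict(commit, &touched),
    )?;
    // Add the untouched entries of the target tree back to the output.
    let mut result: Tree = target
        .iter()
        .filter(|(p, _)| !touched.contains(*p))
        .map(|(p, id)| (p.clone(), id.clone()))
        .collect();
    result.extend(merged);
    Ok(result)
}
```

In this toy model the filtered merge gives the same tree as merging the full maps, because any path the picked commit left alone resolves to the target's entry either way. The saving comes from the merge step only seeing the handful of touched paths.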

@epage epage added the enhancement Improve the expected label Oct 29, 2021
@arxanas
Contributor

arxanas commented Oct 29, 2021

There's also some implementation commentary here: libgit2/libgit2#6036 (cc @bcongdon)

@epage
Collaborator Author

epage commented Oct 29, 2021

I had always thought the speedup was just in operating on the index directly, rather than touching the working tree, which I also do. I didn't know there were additional optimizations. Thanks!

@epage
Collaborator Author

epage commented Oct 30, 2021

I wonder if we can and should create a crate for sharing complex details like this. It could serve the dual purpose of also documenting how to implement higher-level git operations with libgit2, something that is lacking regardless of language. Pretty much the only resource I found was pygit's recipes.

The big risk is people wanting these functions customized in so many ways that we might as well not provide them. If we can find a strict definition of what we accept (no touching the working tree, conflicts are reported as errors, etc.) it'd help.

Maybe we could create an org we can collaborate in and I could move https://github.com/crate-ci/git-config-env over there (I put it there just to not have it in my personal repos, which I try to avoid).

@arxanas
Contributor

arxanas commented Oct 31, 2021

I would be open to splitting the stuff under git-branchless/git into its own crate — I kind of wanted to do that anyways to improve compilation times. It wraps a lot of the libgit2 stuff into its own APIs, as per the commentary here, so it's not clear what functionality we wouldn't want to wrap.

That being said, it seems like it would slow down development to split up that crate from the rest of the repository, since development does tend to cross-cut at times.

There's also some neat, generally-useful stuff like the rebase engine which could be used across projects, but it's harder to see how to split the API boundary, since it depends on details of the DAG like obsolete commits. (And the fact that it uses the Eden DAG would force that component to be GPLed.)

@epage
Collaborator Author

epage commented Nov 1, 2021

I very much see libgit2 wrappers being too coupled to individual projects to be worth splitting out, unless they get very mature.

I was more envisioning a git-recipes crate that provides one-off snippets that people can find useful like

So it's mostly a question of whether the algorithms can be split out from the abstractions.

@epage
Collaborator Author

epage commented Nov 3, 2021

Naming is hard :). I was looking to create an org to put these under but it looks like someone has the username git-rs. If anyone has alternative names for this common ground for working on git-related libraries and tools, I'm open to ideas.
