Merging PRs takes a very long time for large repositories #601
just changing the clone-path to … Further improvements include …
http://stackoverflow.com/questions/10637584/how-to-do-a-non-fast-forward-git-merge-to-a-branch-that-isnt-checked-out might be useful as well. Edit: I have a patch experimenting with that, will submit it later.
@typeless not sure if it works for bare repos though 🙁
For reference: https://gist.github.com/cool-RR/6575042
In my case it is a little slow too; the repo is very big. One solution would be to add a gauge/loading indicator as feedback for the user. Is that possible? If yes, it would be good to add it to the merge pull request action as well, which is also very slow and gives the user no feedback other than the spinner in the browser title. My repo size is 1.8 GB.
If we could refactor so that there is no local clone operation on the server when merging, I think it would significantly decrease merge time. Or at least keep the clone permanently and just hard-reset from origin when needed; in that case all operations on it must be protected by locking.
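A minimal sketch of that second idea, with hypothetical paths and a per-repository mutex (not Gitea's actual code):

```go
// Sketch: keep one long-lived clone per repository and refresh it with
// fetch + hard reset instead of re-cloning. All names and paths are
// illustrative, not Gitea's real code.
package main

import (
	"log"
	"os/exec"
	"sync"
)

var repoLocks sync.Map // clone path -> *sync.Mutex

func run(dir string, args ...string) error {
	cmd := exec.Command("git", args...)
	cmd.Dir = dir
	return cmd.Run()
}

// refreshClone brings a persistent clone up to date with its origin instead
// of re-cloning from scratch. The same per-repo lock would have to be held
// for the whole merge, not just the refresh.
func refreshClone(workDir, branch string) error {
	mu, _ := repoLocks.LoadOrStore(workDir, &sync.Mutex{})
	mu.(*sync.Mutex).Lock()
	defer mu.(*sync.Mutex).Unlock()

	if err := run(workDir, "fetch", "origin"); err != nil {
		return err
	}
	// Throw away any state left over from a previous, possibly failed merge.
	return run(workDir, "reset", "--hard", "origin/"+branch)
}

func main() {
	// Hypothetical persistent clone location and base branch.
	if err := refreshClone("/data/merge-clones/base", "master"); err != nil {
		log.Fatal(err)
	}
}
```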
I once attempted to fix the issue by combining some plumbing commands, but that approach had some critical flaws. It seems that the better solution would be to fix it in Git upstream.
I don't like that at present we're basically forced to create arbitrary paths on the server - it smells dangerous to me, even though we should have protected against climbing out of the root. I wouldn't be surprised if there were genuine issues on Windows regarding this, e.g. reserved filenames, filename length, reserved characters and the like. I could also imagine a filename-encoding problem affecting Linux at some point. These problems similarly affect uploads. I'm just gonna put some details about how git works down here so they don't disappear.
I need to look at how merge actually works - I think we should be able to generate the merged data directly. Using plumbing, it should be possible to avoid a physical checkout entirely and create an index in a bare repository that shares the object dbs of the repositories holding the two HEADs we're merging. E.g. we create an index from HEAD_1, then merge each changed file in HEAD_2 into HEAD_1's version in turn - hashing each new object and updating the index as appropriate. We finally write the index as a tree, commit the tree, and update the refs before pushing. The merging files don't need to be written to a working copy - just to a temporary file. Similarly, the shared repositories can live in a temp directory, and the index is a temporary file too. At no point do we actually try to check out a repo.

Finally got round to looking at the source code for git: verify_path, which is run when adding to the index, disallows committing any .git directories (https://github.com/git/git/blob/7f4e64169352e03476b0ea64e7e2973669e491a2/read-cache.c#L951). Although I haven't checked completely, I suspect a malicious repo that contained a tree with a ".git" directory would be similarly protected by verify_path.
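A minimal sketch of the above in Go, shelling out to plumbing commands. Everything here is illustrative: the repository path, branch name and SHAs are placeholders, conflict resolution via `git merge-file` is omitted, and this is an experiment rather than Gitea's implementation:

```go
// Checkout-free merge via git plumbing: a scratch bare repo borrows the real
// repo's object db through alternates, merges in the index only, then pushes
// the resulting commit back. Paths and SHAs are placeholders.
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
)

// git runs a subcommand in dir with a fixed committer identity so that
// commit-tree works even in a freshly created scratch repository.
func git(dir string, args ...string) (string, error) {
	cmd := exec.Command("git", args...)
	cmd.Dir = dir
	cmd.Env = append(os.Environ(),
		"GIT_AUTHOR_NAME=Gitea", "GIT_AUTHOR_EMAIL=gitea@example.com",
		"GIT_COMMITTER_NAME=Gitea", "GIT_COMMITTER_EMAIL=gitea@example.com")
	out, err := cmd.Output()
	return strings.TrimSpace(string(out)), err
}

func main() {
	baseRepo := "/data/repos/base.git" // hypothetical bare repository
	base, ours, theirs := "<merge-base-sha>", "<base-branch-sha>", "<pr-branch-sha>"

	// Scratch bare repo; its object db borrows from baseRepo via alternates,
	// so no objects are copied and no working tree ever exists.
	tmp, _ := os.MkdirTemp("", "gitea-merge-")
	defer os.RemoveAll(tmp)
	git("", "init", "--bare", tmp)
	os.WriteFile(filepath.Join(tmp, "objects", "info", "alternates"),
		[]byte(filepath.Join(baseRepo, "objects")+"\n"), 0o644)

	// Three-way merge into the index only (-i: skip working-tree checks).
	if _, err := git(tmp, "read-tree", "-m", "-i", base, ours, theirs); err != nil {
		fmt.Println("read-tree failed:", err)
		return
	}

	// write-tree refuses to run while unmerged entries remain, which doubles
	// as conflict detection; real code would resolve via `git merge-file`.
	tree, err := git(tmp, "write-tree")
	if err != nil {
		fmt.Println("conflicts left in index:", err)
		return
	}

	// Record the merge commit and push it straight into the base repository.
	commit, _ := git(tmp, "commit-tree", tree,
		"-p", ours, "-p", theirs, "-m", "Merge pull request")
	git(tmp, "push", baseRepo, commit+":refs/heads/master")
	fmt.Println("merged as", commit)
}
```

Because the scratch repository's alternates file points at the base repository, the final push only has to transfer the handful of newly written tree and commit objects.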
I think your current approach should reduce the load considerably. Certainly the use of shared object databases is extremely important and would be responsible for improvements in load; it would be worth a PR in itself. In fact, if you don't think #4921 is going to be ready in time for 1.7, I'd recommend doing exactly that. (It's worth noting that most repos have many more objects than checked-out files, and although hard links are relatively cheap, they still need to be created and still touch inodes.)

Just thinking on: although you've shared the base repository's object database, when you add the remote, that doesn't get shared too. It is possible to set up a repo to share multiple object databases. Could it be that one of these repos has a much smaller object db, so when you fetch from the "remote" you actually copy a lot of objects unintentionally? This is likely to be an extreme edge case though.

Although libgit2 would likely provide an improvement, it still requires an in-depth understanding of the plumbing, and most of GitHub's improvements come from that deeper knowledge of plumbing. Yes, they saw benefits from de-scripting in the way that libgit2 allows them to do, but we still need to move off the porcelain and gain a deeper understanding of the plumbing before we can take that approach. Your current PR is a definite step towards deeper plumbing knowledge. I'm working on file upload at present and have a working - albeit not very pretty - solution for this. It's also worth noting that the unique thing libgit2 provides is the in-memory index; the merging strategies come from this, but since the default index is a file, it may be that we don't need it.
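For reference, the shared-object-database mechanism discussed above is driven by a plain-text file, `$GIT_DIR/objects/info/alternates`, listing one object directory per line, so a single repository can borrow from several object stores at once. The paths below are purely illustrative:

```
/data/repos/base.git/objects
/data/repos/fork.git/objects
```

With the fork's objects listed as well, fetching from the "remote" would find its objects already reachable instead of copying them.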
We deployed that bit on our server a couple of months ago, and it makes the difference between totally unusable (i.e. the merge times out) and slow but usable. Gitea's performance on Windows with our multi-gigabyte repositories is atrocious out of the box, but we run a forked version that makes it quite bearable. It uses go-git for accessing the repositories and often makes the pages load more than 10x faster (in some cases more than 30x). I was not brave enough to use go-git for any write operations, especially since the merging algorithm is quite complex, but using the shared object database in itself saved ~30 seconds on each merge on 3+ GB repositories.
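For a flavour of the read path go-git offers, here is a minimal sketch against go-git v4's public API; the repository path is made up, and this is not the actual code from the forked branches:

```go
// Open an existing repository with go-git and walk the history of HEAD
// without shelling out to the git binary. The path is illustrative.
package main

import (
	"fmt"
	"log"

	git "gopkg.in/src-d/go-git.v4"
)

func main() {
	repo, err := git.PlainOpen("/data/repos/base.git")
	if err != nil {
		log.Fatal(err)
	}

	head, err := repo.Head()
	if err != nil {
		log.Fatal(err)
	}

	// Iterate commits reachable from HEAD, newest first.
	iter, err := repo.Log(&git.LogOptions{From: head.Hash()})
	if err != nil {
		log.Fatal(err)
	}
	defer iter.Close()

	for i := 0; i < 5; i++ { // print the five most recent commits
		c, err := iter.Next()
		if err != nil {
			break // io.EOF once history is exhausted
		}
		fmt.Println(c.Hash, c.Author.When.Format("2006-01-02"), c.Message)
	}
}
```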
@filipnavara why not share your changes back with us, as I have also started to move parts of the read operations to go-git?
@lafriks I shared the links in other issues/PRs: https://github.com/filipnavara/gitea/commits/commitgraph (also the perf-read branch). I will rebase it when I get back after New Year.
Nice, will take a look again. Did the perf improvements get merged into go-git?
Yep, go-git has merged my performance fixes. The branch with the serialized commit tree is stale at the moment, but it is not that important anyway. We use it on our server, but its performance improvements are relatively small compared to the rest of the changes.
@zeripath Adding the path of the head repository to …
There is a bounty on this issue: https://www.bountysource.com/issues/40674560-merging-prs-takes-very-long-time-for-large-repository
It's closed. It's from 2017. 😄
It can still be claimed ... that's what I wanted to remind everyone of.
Gitea Version: 1.0.0+76-geb9ce39
Description
When merging a PR, Gitea will clone the whole repository, which is unacceptably time-consuming when the repository is large.
See https://github.com/go-gitea/gitea/blob/master/models/pull.go#L285
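For illustration: even keeping a local clone, `git clone --shared --no-checkout` would avoid both the object copy (via alternates) and the working-tree population. A sketch with placeholder paths, not the actual models/pull.go code:

```go
// Sketch: replace the full local clone in the merge path with a shared,
// checkout-free clone. Paths and branch names are placeholders.
package main

import (
	"log"
	"os/exec"
)

func main() {
	// --shared: reference the source object db via objects/info/alternates
	//           instead of copying (or hard-linking) every object.
	// --no-checkout: do not populate a working tree yet; later steps can
	//           check out only what the merge actually needs.
	cmd := exec.Command("git", "clone",
		"--shared", "--no-checkout",
		"--branch", "master", // hypothetical base branch
		"/data/repos/base.git", "/tmp/pr-merge-601")
	if out, err := cmd.CombinedOutput(); err != nil {
		log.Fatalf("clone failed: %v\n%s", err, out)
	}
}
```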