Slow version calculation times on freshly cloned pull request #1850

Closed
hanzworld opened this issue Oct 9, 2019 · 13 comments

@hanzworld

Hi team

We have a large repository (800MB) which experiences incredibly slow GitVersion-ing when run on pull requests. However we can't understand how to fix it.

Command: gitversion.exe /output buildserver /nofetch /UpdateAssemblyInfo true
Symptom:

  • on pull request branches (e.g. refs/pull/9752/merge - which means the git repository is in a detached head state) it takes approximately 30 minutes to execute.
  • on a 'normal' branch (e.g. feature/thing) the problem does not occur and it takes between 1m30s and 2m30s.

I've worked out it's to do with the fact the repository is in a detached HEAD state. I know that GitVersion goes through a process to remedy such a scenario, and we can see this process in the logs: Begin: Normalizing git directory for branch 'refs/pull/9752/merge'. However, according to the logs, it takes < 1 minute to normalise the git directory by creating local branches, and the remaining 20+ minutes follow the "normal" GitVersion process but very, very slowly. If we rerun the same process on the same repo (not freshly cloned) it runs very quickly.

Timings on a sample run (taken straight from logs):

  • End: Normalizing git directory for branch 'refs/pull/9752/merge' (Took: 18,905.97ms)
  • End: Loading version variables from disk cache (Took: 0.74ms)
  • End: Calculating base versions (Took: 1,705,913.13ms)
  • End: Getting version tags from branch 'refs/heads/pull/9752/merge'. (Took: 7,628.48ms)
  • End: Creating dictionary (Took: 5.11ms)
  • End: Storing version variables to cache file C:\buildAgent\work\4d9ba6a2c8666e82.git\gitversion_cache\4DFE59029FF64EA34E655D9C6CBB0D781A41B4D7.yml (Took: 67.15ms)
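
For anyone comparing runs, here is a minimal sketch that pulls the "End: ... (Took: ...ms)" lines out of a log and ranks the steps by duration. It assumes the log format shown above; `parse_timings` and `SAMPLE` are illustrative names, not part of GitVersion.

```python
import re

# Matches log lines of the form "End: <step> (Took: 1,234.56ms)".
TIMING_RE = re.compile(r"End: (?P<step>.+?) \(Took: (?P<ms>[\d,]+(?:\.\d+)?)ms\)")


def parse_timings(log_text):
    """Return (step, milliseconds) pairs, slowest step first."""
    timings = []
    for match in TIMING_RE.finditer(log_text):
        # Strip the thousands separators before converting.
        ms = float(match.group("ms").replace(",", ""))
        timings.append((match.group("step"), ms))
    return sorted(timings, key=lambda pair: pair[1], reverse=True)


# Lines copied from the sample run above (cache-file line omitted).
SAMPLE = """\
End: Normalizing git directory for branch 'refs/pull/9752/merge' (Took: 18,905.97ms)
End: Loading version variables from disk cache (Took: 0.74ms)
End: Calculating base versions (Took: 1,705,913.13ms)
End: Getting version tags from branch 'refs/heads/pull/9752/merge'. (Took: 7,628.48ms)
End: Creating dictionary (Took: 5.11ms)
"""

for step, ms in parse_timings(SAMPLE):
    print(f"{ms / 60000:8.2f} min  {step}")
```

On the sample run, "Calculating base versions" comes out at roughly 28 minutes, which accounts for almost all of the wall-clock time.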

We have tried the following with no difference.

  • Updating GitVersion from 3.0.1 to 5.0.1
  • Manually creating a branch at the specified SHA1 and checking it out prior to running GitVersion, so that the repository is no longer in a detached head state.

Let me know what else you need - the log is 2MB, but I'm happy to sanitize it and make available if of use.

@hanzworld hanzworld changed the title Very slow version calculation times on freshly cloned pull request branch Very slow version calculation times on freshly cloned pull request Oct 9, 2019
@asbjornu
Member

#1838 should help, which will be included in the 5.1.0 release of GitVersion. If you want to test it now, you can download the build artifacts and give them a spin.

@hanzworld
Author

hanzworld commented Oct 14, 2019

@asbjornu We tried it with 5.0.2-beta1.77. You're right that it improved; however, it still seems to take an unreasonably long time: 7 minutes (compared to 1.5 - 2.5 for a "normal" branch).

  • End: Normalizing git directory for branch 'refs/pull/9752/merge' (Took: 17,290.67ms)
  • End: Loading version variables from disk cache (Took: 0.98ms)
  • End: Calculating base versions (Took: 390,321.31ms)
  • End: Getting version tags from branch 'refs/heads/pull/9752/merge'. (Took: 7,788.63ms)
  • End: Creating dictionary (Took: 3.91ms)
  • End: Storing version variables to cache file C:\buildAgent\work\4d9ba6a2c8666e82.git\gitversion_cache\5F7285E0A0685F39B831D87C0582F693A5591F3A.yml (Took: 65.68ms)

We also tried with 5.0.2-beta1.95 with no further improvement.

Given the normalization from a detached head state only takes 17s, do you know what it's doing differently in the "Calculating base versions" step?

@hanzworld hanzworld changed the title Very slow version calculation times on freshly cloned pull request Slow version calculation times on freshly cloned pull request Oct 14, 2019
@asbjornu
Member

@hanzworld, since the author of #1838, @erikbra, has been looking at this code from a performance viewpoint recently, perhaps he has an idea of what the problem you're seeing might be?

@erikbra
Contributor

erikbra commented Oct 14, 2019

I did have a theory about simplifying how base revisions are found (see #1839), but I don't know exactly how it should be done. I just have a hunch that in a large percentage of cases (at least on our own codebase), the correct base revision is found after very few node traversals up the tree, and backtracking through the whole tree trying to find versions is not necessary.

That said, I readily admit I don't know all the rules. I have started some work on caching what we read from LibGit2, to avoid reading from native code so many times, but it has only just begun and I haven't got very far.

My theory is that in most cases, calculating the base version could be done in very few seconds (maybe sub-second), and not hundreds. And, you are right, on some branches (or PRs), calculating the base version is much heavier than finding the base version on e.g. master.
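
To make the hunch concrete, here is a minimal sketch of an early-exit search: a breadth-first walk up the parent links that stops at the first tagged commit instead of traversing the whole history. This is not GitVersion's algorithm, and the nearest tag is not necessarily the correct base version under GitVersion's rules. The graph representation (`parents`, `tags`) and the function name are assumptions made for illustration.

```python
from collections import deque


def nearest_tagged_ancestor(start, parents, tags):
    """Breadth-first walk from `start` towards the root, stopping at the first tag.

    parents: dict mapping commit id -> list of parent commit ids
    tags:    dict mapping commit id -> version tag string
    Returns (commit, tag, commits_visited), or (None, None, commits_visited)
    when no tagged ancestor exists.
    """
    queue = deque([start])
    seen = {start}
    visited = 0
    while queue:
        commit = queue.popleft()
        visited += 1
        if commit in tags:
            return commit, tags[commit], visited
        for parent in parents.get(commit, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return None, None, visited


# Toy history (hypothetical): 1001 linear commits, tagged two commits behind the head.
parents = {f"c{i}": [f"c{i - 1}"] for i in range(1, 1001)}
parents["c0"] = []
tags = {"c998": "v1.4.0", "c10": "v0.1.0"}

commit, tag, visited = nearest_tagged_ancestor("c1000", parents, tags)
print(commit, tag, visited)  # c998 v1.4.0 3
```

In this toy history the walk touches 3 commits out of 1001; a search that insists on visiting every ancestor would touch all of them.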

So, in conclusion, I have some ideas, but they are not verified yet, and the problem is not solved. Hope to be able to look more at it in the not-so-distant future. But, you know, work, life, and all that... ;)

@asbjornu
Member

@erikbra, I really appreciate the time you're spending on this. With regards to caching LibGit2's data, please see #1243 and #1244 for @JakeGinnivan's attempt to do the same thing. I completely agree that building our own immutable in-memory model of the Git tree is something we should do to speed things up, reduce bugs and create a LibGit2-independent abstraction that serves GitVersion's needs.
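
As a rough illustration of what an immutable in-memory model could look like, here is a minimal sketch: the commit graph is read once from the (slow) source and then served from read-only structures. All names here (`Commit`, `GraphSnapshot`, `CountingSource`) are hypothetical; this is not the design from #1243 or #1244, and it does not use LibGit2.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class Commit:
    sha: str
    parents: tuple  # shas of the parent commits
    message: str


class GraphSnapshot:
    """Read-only, in-memory copy of a commit graph."""

    def __init__(self, commits, branches):
        # MappingProxyType gives a read-only view over the private dicts.
        self._commits = MappingProxyType({c.sha: c for c in commits})
        self._branches = MappingProxyType(dict(branches))

    def tip(self, branch):
        return self._commits[self._branches[branch]]

    def ancestors(self, sha):
        """Yield every commit reachable from `sha` (inclusive), each exactly once."""
        stack, seen = [sha], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            yield self._commits[current]
            stack.extend(self._commits[current].parents)


class CountingSource:
    """Hypothetical stand-in for an expensive native read; counts how often it is hit."""

    def __init__(self, commits):
        self._commits = list(commits)
        self.reads = 0

    def read_all(self):
        self.reads += 1
        return list(self._commits)


source = CountingSource([
    Commit("a", (), "init"),
    Commit("b", ("a",), "work"),
    Commit("c", ("b",), "more work"),
])
snapshot = GraphSnapshot(source.read_all(), {"master": "c"})

# Repeated traversals are served from memory; the source is read only once.
for _ in range(3):
    shas = [c.sha for c in snapshot.ancestors("c")]
print(shas, source.reads)  # ['c', 'b', 'a'] 1
```

The point of the frozen dataclass and read-only mappings is that version-calculation code can share one snapshot without worrying about it changing underneath it.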

@erikbra
Contributor

erikbra commented Oct 16, 2019

Thanks for the tips on the different PRs. I was thinking about just caching the IEnumerable from LibGit2 with the repo information, but maybe a full abstraction layer is a better way to go.

@erikbra
Contributor

erikbra commented Oct 16, 2019

@hanzworld (or anyone else) - is your repository public? Or do you know of any other public repository that demonstrates the same issues? I have access to a private one that we have the same issues with, but it would be really great to have a large, public one to run tests on; it would make the discussions easier.

@hanzworld
Author

hanzworld commented Oct 17, 2019 via email

@erikbra
Contributor

erikbra commented Oct 17, 2019

I don't know if it's too big. Another candidate I can think of is the ASP.NET Core codebase, but it might be too big as well. The repository should have a few branches, and ideally use different strategies for versioning (labels, commit messages, etc), so that the different strategies can be tested.

@hanzworld
Author

@erikbra I'm afraid I don't really understand the inner workings of GitVersion. Would you be able to explain to me why this takes so long in a detached head state? Or more specifically, why even when we resolve the detached head state (e.g. create a branch) it still takes longer than a non-detached-head-initial-state scenario?

@JakeGinnivan
Contributor

JakeGinnivan commented Oct 22, 2019

The main issue is that GitVersion needs to understand what the source branch for your branch is.

There is no way to know for sure because git is a directed acyclic graph of commits, and branches just point to a single commit.

Take this example.

* feature2
|  * feature 3
|/
* feature1
|
* master

Given this graph, what is the parent of feature 3? Is it feature1, feature2, or master? We can easily look at that and say, well, of course I'd want master.

What about this one?

* master
|  * feature1
|/
*

In this scenario feature1 was branched off master, but now master has moved on. GitVersion still needs to figure out that it needs to use master for version calculation and not one of the many other branches which were branched off different points in the graph.
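
The ambiguity in the first example can be shown with a minimal sketch that scores each candidate source branch by how far the nearest shared ancestor (the merge base) is from the branch being versioned. This is an illustrative heuristic under assumed data structures, not how GitVersion actually picks the source branch; the commit ids `m`, `f1`, `f2`, `f3` are made up to mirror the first graph.

```python
from collections import deque


def distances_from(tip, parents):
    """Map every ancestor of `tip` (including itself) to its shortest distance from `tip`."""
    dist = {tip: 0}
    queue = deque([tip])
    while queue:
        commit = queue.popleft()
        for parent in parents.get(commit, ()):
            if parent not in dist:
                dist[parent] = dist[commit] + 1
                queue.append(parent)
    return dist


def merge_base_distance(tip, other_tip, parents):
    """Number of steps from `tip` to the nearest ancestor it shares with `other_tip`."""
    mine = distances_from(tip, parents)
    theirs = distances_from(other_tip, parents)
    shared = set(mine) & set(theirs)
    return min(mine[c] for c in shared) if shared else None


# First graph above: master <- feature1 <- feature2, with feature 3 branching off feature1.
parents = {"m": [], "f1": ["m"], "f2": ["f1"], "f3": ["f1"]}
branches = {"master": "m", "feature1": "f1", "feature2": "f2"}

candidates = {
    name: merge_base_distance("f3", tip, parents) for name, tip in branches.items()
}
print(candidates)  # {'master': 2, 'feature1': 1, 'feature2': 1}
```

The graph alone ranks feature1 and feature2 equally and puts master last, even though master is the answer a person would want. Resolving that takes extra knowledge about the branches, and checking many candidate branches against a large history is where the time can go.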

@nikki9990

nikki9990 commented Oct 23, 2019 via email

@stale

stale bot commented Jan 21, 2020

This issue has been automatically marked as stale because it has not had recent activity. After 30 days from now, it will be closed if no further activity occurs. Thank you for your contributions.
