Provide option to enable pushing submodule commits to a branch of the same name as the destination meta commit #726

Open
jhdub23 opened this issue Sep 7, 2019 · 21 comments

Comments

@jhdub23
Contributor

jhdub23 commented Sep 7, 2019

Our typical workflow very much depends on branches across multiple repos. The meta repo state that should be recorded for every commit should not only record the hash of the subrepos, but also the branch of the subrepo. This branch info is pretty important.

Envisioned workflow:

git meta checkout -b release_v1
for all subrepos:
  git checkout -b release_v1
  git push -u origin release_v1
git meta commit -a -m "Created release_v1 branch" # records "current" branch for all open subrepos
git meta push

The next time we need to do a hotfix on the release branch, what I'd like to do:

git clone meta
git meta checkout release_v1
git meta open some_subrepo # Exactly the same as today, but create a local branch with the name of the recorded branch.  Set tracking of the local branch to the remote branch (if it exists).

At this point, if someone else has directly made updates to branch some_subrepo/release_v1, I can just do a "git pull --ff-only" to bring things up to date.

Without this branch info, we have to manually guess or somehow record as part of the commit what the "working branch" was at the time of the "git meta commit". A meta branch could potentially mix and match subrepo branches (e.g. meta:feature_a = subrepo1:master + subrepo2:feature_a + subrepo3:feature_b). On a new clone, we want to know which branch we should continue to work on for each subrepo.

The "git meta" can remain lightweight and not push branch names upstream, and leave this as a manual step for each subrepo.

@novalis
Contributor

novalis commented Sep 7, 2019

The architecture doc explains why we don't do this: because then you have (as you note) the possibility of "shear" between the submodule branches and the meta branches. If some_subrepo has branch release_v1 set to commit X, but the meta repo's branch release_v1 is set to commit Y, who wins? The only possible answer is the meta repo, because that's the only one we can update atomically.

The idea is that you never make submodule commits outside of the context of the meta repo. That's what git meta is for: to make it easy to make submodule commits from within the meta repo.

@jhdub23
Contributor Author

jhdub23 commented Sep 7, 2019

I read through the architecture doc in detail. Maybe I'm missing something, but how is this different from your local repo branch pointing to one commit, and the remote repo pointing to a different commit? If the branches have diverged, then "git push" fails until you resolve the divergence through rebase or merge.

@novalis
Contributor

novalis commented Sep 7, 2019

In this case, the potential shear is between the meta repository's branches and the submodules' branches. You can't push atomically to the meta and submodules, or to multiple submodules (without weird custom server stuff, anyway). So it's possible for them to get out-of-sync. The question is: what are the semantics of this? We solve the problem by ignoring submodule branches, and only considering meta branches. (Inside Two Sigma, we do have a cronjob that populates submodule branches from meta repo branches, just for ease of browsing, but it's kind of a hack).
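One way such a job could find the commits to publish is to read the submodule entries (gitlinks, mode 160000) from the meta branch's tree. The following is only a minimal sketch, not the actual Two Sigma cronjob: the function name `submodule_refspecs` is hypothetical, and it prints a refspec per submodule rather than pushing anything.

```shell
# Reads `git ls-tree -r <meta-branch>` output on stdin and prints one
# "<path> <sha>:refs/heads/<branch>" line per submodule (gitlink) entry.
# Each ls-tree line looks like: "<mode> <type> <sha><TAB><path>".
submodule_refspecs() {
  branch=$1
  awk -v branch="$branch" -F '\t' '
    {
      split($1, meta, " ")        # meta[1]=mode, meta[2]=type, meta[3]=sha
      if (meta[1] == "160000")    # mode 160000 marks a submodule entry
        printf "%s %s:refs/heads/%s\n", $2, meta[3], branch
    }'
}
```

It would be fed from the meta repo, for example `git ls-tree -r release_v1 | submodule_refspecs release_v1`, and each printed refspec could then be pushed to the matching submodule's remote.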

@jhdub23
Contributor Author

jhdub23 commented Sep 7, 2019

Yes, I see the race condition/atomic problem across multiple repos if you actually try to push updates to branch heads. I'm thinking more along the lines of a synthetic meta branch head. On meta commit, record the commit hash plus current branch_name. This hash may or may not match origin/branch_name; we don't really care. We allow divergence with origin, and maybe just print a warning. On "git meta open", do the equivalent of:

git checkout -b branch_name hash
git branch --set-upstream-to origin/branch_name

At this point, the branch may be in a divergent state with respect to origin, but this is true today. The only difference is that you know what the original branch was for your rebase or merge operation, instead of having to guess.
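As a rough illustration of that proposal, the two commands could be wrapped so that tracking is only set when the remote branch exists. This is a sketch under assumptions: `run` and `restore_branch` are hypothetical helper names, not part of git-meta, and the recorded branch name and hash are passed in by hand.

```shell
# Run a command, or just print it when DRY_RUN is set (useful for previewing).
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical helper: recreate the recorded branch at the recorded commit,
# then track origin/<branch> only if that remote-tracking ref exists.
# Note: with DRY_RUN set, the existence check is printed rather than
# evaluated, so the upstream step is always shown.
restore_branch() {
  branch=$1
  hash=$2
  run git checkout -b "$branch" "$hash" || return 1
  if run git show-ref --verify --quiet "refs/remotes/origin/$branch"; then
    run git branch --set-upstream-to="origin/$branch" "$branch"
  else
    echo "warning: origin/$branch not found; $branch has no upstream" >&2
  fi
}
```

Inside an opened submodule this would be called as `restore_branch release_v1 <recorded-hash>`; any divergence from origin is left for the user to resolve, as described above.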

@novalis
Contributor

novalis commented Sep 7, 2019

You don't need the original branch for your merge/rebase, because you can use git meta merge or rebase, which works on meta commits.

But if this is really something that seems exciting, you might be able to do this with hooks. But I think it would be confusing to have that sort of divergence.

@abliss
Contributor

abliss commented Sep 7, 2019 via email

@abliss
Contributor

abliss commented Sep 7, 2019 via email

@jhdub23
Contributor Author

jhdub23 commented Sep 7, 2019

We have a fairly large organization with multiple business units and multiple product groups within each business unit. Each product group has its own set of git repositories. We want to move to a monorepo across our entire company, but the scaling issues associated with one giant git repo are not acceptable. The git-meta architectural doc pretty much sums up our own conclusions, and we are now exploring using git-meta.

As a pilot deployment, we would layer git-meta on top of our existing repos. This was one of the bonuses of git-meta; it could coexist with our current multi-repo workflows. However, deploying git-meta without disrupting current multi-repo workflows means that what we see at the git-meta level should be consistent at the multi-repo level, and vice-versa.

We have a fairly standard release process. We create a permanent release branch (across all repos of interest), and hotfix those release branches as needed. If we hotfix at the meta level, we would need to make sure those hotfixes are reflected at the multi-repo release branch; they can't just live at the meta level. This means checking out the release branch for the subrepo, merging or rebasing the meta commits, then pushing.

Relying on every developer to manually figure out what branch to check out when synchronizing the subrepo itself would be very error prone. Ideally, when we check out a branch of the meta and do a "git meta open", it would automatically change your local branch appropriately. At the subrepo level, git operations would be very natural to the developer and "just work" on the correct subrepo branch:

cd meta
git meta open subrepoA
cd subrepoA
# make changes to subrepoA
git commit
git pull --rebase # Just works.  We are on the correct subrepo branch associated with the meta branch
git push
git meta push

vs

cd meta
git meta open subrepoA
cd subrepoA
git checkout -b <hmmm, what meta branch am I on?  release_v1? develop? master?  Am I already on the branch, or detached?>
...

We don't need git-meta to do any automatic subrepo branch ref updating. The subrepo branch pushes can be left outside the scope of git-meta. The only additional functionality is that enough info is recorded in git meta commits so that "git meta open" automatically does the correct "git checkout -b" and "git branch --set-upstream-to".

@novalis
Contributor

novalis commented Sep 8, 2019

I think we would accept a patch to do this, as long as it was optional.

@abliss
Contributor

abliss commented Sep 8, 2019 via email

@jhdub23
Contributor Author

jhdub23 commented Sep 9, 2019

> What happens if two people try to push a hotfix to the same branch at the same time? The architecture doc describes a possible race here: if each of two pushes succeeds in pushing to a different set of repos, they can become permanently deadlocked. Do you force developers to take out a central lock for the duration of the hotfix push?

Currently, pushes happen manually one repo at a time (using the deprecated "gits" for some groups, manually by other groups), so pushes involving multiple repos can be interleaved between two people. At this point, both have to pull (whatever subset of repos has been pushed), compile, run tests, then continue pushing. It's true that for a very short period of time, repos become out of sync, but this is quickly resolved by both parties. We've learned to live with this in our multirepo system. It's similar to a bad push causing compile or QA failures; when it happens, it's the highest priority to fix immediately.

Manually pushing the subrepos, even with git-meta, would maintain the status quo. However, git-meta pushes would be atomic and record states before we get into manually resolving the "race condition", so this would be an improvement to our current system.

> We discourage our users from doing manual pulls and pushes in the submodules. It quickly causes the meta repo to get into inconsistent states which are hard for the user to understand.

I agree that working completely in git-meta and not at the submodule would be ideal. However, the reality is that we will not be able to instantly change our entire company and internal processes to use git-meta with the flick of a switch. Not everyone is convinced that monorepo is the way to go.

We will need to support both git-meta monorepo and our existing multirepo workflows for the transition period, and have a fallback plan if git-meta proves to be problematic for risk management. I believe that this would be true for any company with well established multi-repo workflows.

As far as using hooks and a global lock, I'm hoping we can avoid having to do that. The meta repo would be the one "source of truth," and if any submodule activity causes divergence from the meta repo, we would resolve that at the meta level and then push the resolution back to the submodule.

novalis assigned shijinglu and unassigned shijinglu on Sep 9, 2019
@abliss
Contributor

abliss commented Sep 10, 2019 via email

@jhdub23
Contributor Author

jhdub23 commented Sep 14, 2019

The problem is when we have some people using git-meta and some not. How does a meta-user push his commits back to the subrepo so that the non-meta-user can see them? It's easy without branching, as there is only one branch (master). However, when there are branches, this branch selection becomes problematic.

I'll play around with branching within meta and construct a usage example where branch info is stored.

@abliss
Contributor

abliss commented Sep 14, 2019 via email

@jhdub23
Contributor Author

jhdub23 commented Sep 14, 2019

Yes, I think that would do the trick. The local tracking of branch history is not needed. Git meta open should also read this branch and set the subrepo to this local branch name.

Any coordination with the subrepo origin/branch_name would be left up to the user (along with all the pitfalls). We can tool around this part.

abliss changed the title from "Store branch name of submodules" to "Provide option to enable pushing submodule commits to a branch of the same name as the destination meta commit" on Sep 14, 2019
@abliss
Contributor

abliss commented Sep 14, 2019

Ok, I updated the title to reflect the new goal. I propose the config be named gitmeta.pushSubmoduleBranches and I think it should probably be just a couple-line change around https://github.com/twosigma/git-meta/blob/master/node/lib/util/push.js#L230 . WDYT @novalis ?
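To make the proposed behavior concrete, here is a hedged shell sketch of what pushing one submodule commit to a same-named branch might look like. It is not the git-meta implementation (that would live in the JavaScript push code linked above); `push_submodule_branch` is a hypothetical helper, and `gitmeta.pushSubmoduleBranches` is only the name proposed in this thread, not an existing option.

```shell
# Hypothetical helper: push a submodule's recorded commit to a branch on its
# origin named after the destination meta branch.
# Set DRY_RUN=1 to print the push command instead of running it.
push_submodule_branch() {
  path=$1          # submodule path inside the meta repo
  sha=$2           # commit recorded for the submodule in the meta commit
  meta_branch=$3   # destination meta branch, reused as the submodule branch
  if [ -n "${DRY_RUN:-}" ]; then
    echo "git -C $path push origin $sha:refs/heads/$meta_branch"
  else
    git -C "$path" push origin "$sha:refs/heads/$meta_branch"
  fi
}
```

A caller would presumably gate this on the proposed setting, for example by checking `git config --bool --get gitmeta.pushSubmoduleBranches` before invoking it for each open submodule.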

@novalis
Contributor

novalis commented Sep 14, 2019

What do you want to do about the consistency problems? Warn the user when things get inconsistent? That seems fine, I guess.
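A warning of that kind could be as simple as comparing two hashes. The sketch below assumes the caller already knows both values; `warn_on_shear` is a hypothetical name and nothing here is existing git-meta behavior.

```shell
# Hypothetical check: compare the commit recorded in the meta repo with the
# commit the same-named submodule branch points to, and warn on a mismatch.
# Returns 0 when they agree and 1 when they differ.
warn_on_shear() {
  name=$1        # submodule name or path
  meta_sha=$2    # commit recorded by the meta commit
  branch_sha=$3  # commit the submodule branch currently points to
  if [ "$meta_sha" != "$branch_sha" ]; then
    echo "warning: $name: branch is at $branch_sha but meta records $meta_sha" >&2
    return 1
  fi
}
```

The branch hash could come from the first field of `git ls-remote origin refs/heads/<branch>` run in the submodule; resolving the mismatch is left to the user.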

@abliss
Contributor

abliss commented Sep 14, 2019 via email

@jhdub23
Contributor Author

jhdub23 commented Sep 14, 2019

For consistency problems, a warning would be sufficient and it's ok to leave it to the user to resolve. Right now, we use the deprecated "git-slave", and if a push fails on a repo, we know we are temporarily in an inconsistent state, but we just resolve it immediately.

I like the --keep-going option.

Thanks for implementing this. We are currently doing a pilot project with git-meta. If successful, we will roll it out to one product group, followed by one Business Unit, followed by the entire company.

@novalis
Contributor

novalis commented Sep 19, 2019

Sorry, just to be clear: we'll happily take a patch on this, but I don't think we're likely to implement it ourselves.

@jhdub23
Contributor Author

jhdub23 commented Sep 20, 2019

I see. Guess I'll have to start looking at the source code. Reaching out to anyone else out there who is already familiar with the code and is willing to make the enhancement...
