This repository has been archived by the owner on Aug 2, 2020. It is now read-only.

Preparing for the merge #440

Closed
3 tasks done
snowleopard opened this issue Oct 19, 2017 · 53 comments

@snowleopard
Owner

snowleopard commented Oct 19, 2017

So, it looks like Hadrian is finally getting merged! 🎉

The plan is to merge Hadrian as is, without yet recommending that GHC developers switch to it, but already requiring all GHC patches to be buildable both by Make and by Hadrian.

Let's use this issue to keep track of what needs to be done.

  • Fix all references to GitHub issues and pull requests in the source from #N to concrete GitHub links to avoid confusion with GHC tickets.
  • Prepare the patch, using git subtree to keep Hadrian repository on GitHub.
  • Update the README.

Anything else?

In my view, the two most important requirements in the long term are:

  1. Preserving the commit/issue/pull-request history. A GHC developer fighting a strange build failure should be able to find a relevant discussion not only now but in 5 years from now. This may be solved via documentation, i.e. gradually moving all discussions from GitHub to docs/comments. That's a lot of hard work (compared to simply keeping Hadrian's repository here on GitHub).

  2. Making it convenient for GHC developers to work on Hadrian. To me, git submodules are not convenient at all, but maybe there are just no other options given the requirement (1). Is git subtree a solution?

@snowleopard snowleopard self-assigned this Oct 19, 2017
@izgzhen
Collaborator

izgzhen commented Oct 20, 2017

If we use git subtree, where will the main development happen? In this repo, or in the GHC tree?

@angerman
Collaborator

I've raised the same question as well. I believe this should be a submodule while development continues on GitHub, and then become a subtree once development shifts over to Phabricator.

@snowleopard
Owner Author

@bgamari Can you comment on why you think git subtree is the way to go?

@snowleopard
Owner Author

snowleopard commented Oct 20, 2017

My understanding of the difference between subtree and submodule is:

  • subtree will keep a copy (a snapshot) of Hadrian's repo in the GHC tree. To update the copy we can subtree pull from Hadrian's repo, probably squashing the commits, and push them to GHC. It is also possible to push changes made to Hadrian in GHC tree to Hadrian's repo.
  • submodule will keep a link to a specific commit in Hadrian's repo. To update GHC's version of Hadrian, one needs to change the link to point to a fresher commit in Hadrian's repo. With this approach the only way to change Hadrian is to do it in Hadrian's repo.

@izgzhen Answering your question, I think the main development will at some point shift from GitHub to the GHC tree, but we will be able to accept patches in the GHC tree at any point and push these changes to GitHub. This will require some extra work from us, but it's possible using the subtree route.

@angerman You say you "believe" we should use git submodule instead of git subtree. Can you be more specific? It looks to me that git subtree is strictly more powerful, but perhaps I'm missing something.

@snowleopard
Owner Author

snowleopard commented Oct 20, 2017

P.S.: I just noticed there is a continued discussion on the ghc-devs mailing list with detailed comments from @hvr and @angerman. I'd prefer to keep the discussion here, as email is not the best medium.

@angerman
Collaborator

Just for reference, the email conversation @snowleopard is referring to is the one starting here: https://mail.haskell.org/pipermail/ghc-devs/2017-October/014890.html

@snowleopard my initial impression of git subtree was likely flawed and it doesn't really embed one repo into another. However, I'll say this again. For consistency I would very much prefer git submodule over git subtree. As every other external reference in the ghc tree is a submodule.

Let me just add, that with the git submodule, you will also be able to work on hadrian in ghc. There is effectively the hadrian repository inside of ghc, and you can push from there as well. (Of course there is the issue with the detached head; which needs some getting used to). I do see however that due to some technicalities this might only work if hadrian was moved into the ghc organization? as the submodule would have to point to GitHub.com/ghc/hadrian, instead of GitHub.com/snowleopard/hadrian? But I'm certain @hvr will be able to explain the necessary requirements better here. Not sure if the subtree would impose those technicalities as well.

@snowleopard
Owner Author

snowleopard commented Oct 20, 2017

However, I'll say this again. For consistency I would very much prefer git submodule over git subtree. As every other external reference in the ghc tree is a submodule.

@angerman Yes, this is a very good point. Indeed, git submodules are familiar to all GHC developers.

Let me just add, that with the git submodule, you will also be able to work on hadrian in ghc. There is effectively the hadrian repository inside of ghc, and you can push from there as well.

Aha, I didn't realise this!

I do see however that due to some technicalities this might only work if hadrian was moved into the ghc organization? as the submodule would have to point to GitHub.com/ghc/hadrian, instead of GitHub.com/snowleopard/hadrian?

I have no problems with moving this repository into the ghc organisation. As far as I understand this should not affect commits/issues/PRs. In fact, being part of the ghc organisation makes a lot of sense. I should have probably started the repo there in the first place. I've added a corresponding todo item above.

@bgamari
Collaborator

bgamari commented Oct 20, 2017

Indeed, git submodules are familiar to all GHC developers.

Yes, but submodules and subtrees are two quite different solutions addressing rather different use-cases; they are not interchangeable. The former is essentially a pointer to a commit in an external repository. As far as the history of the containing repository is concerned the submodule has no "structure". The submodule is just a file with a commit SHA in it. We indeed use submodules for tracking third party libraries since we don't want their history to pollute the ghc tree.
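To make the "pointer to a commit" idea concrete, here is a minimal sketch in a throwaway directory. The repository names (`lib`, `parent`) and the demo identity are made up for illustration. Strictly speaking, the pointer is a tree entry of mode 160000 (a "gitlink") rather than a regular file, and the sketch creates it by hand with `git update-index` instead of `git submodule add` so that it does not need to clone anything.

```shell
#!/bin/sh
# Sketch: how a parent repository records a submodule.
# All names here (temp dir, "lib", "parent", demo identity) are hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# A stand-in for the external repository (think: Hadrian).
git init -q lib
(cd lib && echo hello > README && git add README && git commit -q -m "lib: initial commit")
lib_sha=$(git -C lib rev-parse HEAD)

# The containing repository (think: GHC).
git init -q parent
cd parent

# Record a gitlink entry by hand: mode 160000 plus the commit SHA.
# `git submodule add` creates the same kind of entry (and also writes .gitmodules).
git update-index --add --cacheinfo 160000,"$lib_sha",hadrian
git commit -q -m "parent: point hadrian at a lib commit"

# The tree entry has type "commit"; none of lib's history lives in parent.
entry=$(git ls-tree HEAD hadrian)
echo "$entry"
parent_commits=$(git rev-list --count HEAD)
echo "commits in parent: $parent_commits"
```

The parent ends up with a single commit of its own, and the `hadrian` entry only names a commit SHA from the other repository.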

Creating a subtree, on the other hand, essentially amounts to merging one repository's history into another. This is ultimately the state we want Hadrian to end up in, and therefore I think this is the state in which we should start the merge.

Starting with a submodule would require a fair amount of administrative work (since we have to set up mirrors) and yet all of this work would be for naught in the end; we would eventually have to rip out the submodule and replace it with... a subtree. I have a hard time seeing the sense in this.

If we are going to merge, let's do so in a way that won't need to be reverted later on. With the subtree approach development can continue here and GHC can easily pull over changes on a regular basis (even once an hour, if we want). Updating is merely a git subtree pull hadrian master away.

@snowleopard
Owner Author

snowleopard commented Oct 20, 2017

@bgamari I see your point. How will this look from the point of view of a GHC developer? We'll need to provide some guidance in the README, so people are not lost. What about the following?

  • If you need to build GHC with Hadrian, run ./build -j from the GHC root. (This assumes we create top-level scripts build.sh and build.bat that simply redirect to appropriate scripts in Hadrian subdirectory.) For this very common use-case there is no need to know about subtree at all.
  • To bring Hadrian up-to-date: git subtree pull hadrian master.
  • To make a change to Hadrian: edit the source file, and do git commit and git subtree push ??? -- this bit is unclear to me. What is the right process?

Maybe this will also convince @angerman that there is nothing too difficult about the subtree workflow.

@hvr
Contributor

hvr commented Oct 20, 2017

Fwiw, I have the following in my ~/.gitconfig:

[alias]
	pullall = "!f(){ git pull \"$@\" && git submodule update --init --recursive; }; f"

so I can git pullall or git pullall --rebase and it does the right thing; how exactly would I need to change that to have the same "rebasing" pull for a subtree?

@Ericson2314

@hvr That will continue to work!

The git subtree pull hadrian master people mention is analogous to changing the commit hash of a submodule:

pushd "$foo" \
 && git checkout master \
 && git pull && popd \
 && git add "$foo" \
 && git commit -m "update $foo submodule"

a mouthful which is not part of your alias.

@Ericson2314

Ericson2314 commented Oct 20, 2017

For comparison, the Rust people are going with the subtree approach as we speak with their core language interpreter:

  1. Prep miri repository for rustc merger rust-lang/miri#258 in the interpreter repo
  2. Merge miri into librustc_mir rust-lang/rust#43340 in the main repo

@hvr
Contributor

hvr commented Oct 20, 2017

@Ericson2314 do I understand it right that git subtree pull hadrian master is the equivalent of git submodule update --remote ./hadrian for subtrees?

@angerman
Collaborator

Can we maybe start with the "small" and easy things? E.g. move this to github.com/ghc? @hvr, @bgamari, who can give relevant permissions?

@snowleopard
Owner Author

Happy to start the move, but so far it appears there is no consensus on how to proceed. We can do easy things, but I'd still prefer to have the full picture of what is going to happen before rushing ahead.

@Ericson2314

@hvr Yes. I didn't know about git submodule update --remote.

snowleopard added a commit that referenced this issue Oct 23, 2017
See #428.

Also see #440: build.sh may later be relocated to the top of the GHC tree.
snowleopard added a commit that referenced this issue Oct 23, 2017
…bat)

See #428.

Note that building Hadrian with Cabal currently fails on Windows, hence using Stack.

Also see #440: build.bat may later be relocated to the top of the GHC tree.
@bgamari
Collaborator

bgamari commented Oct 26, 2017

@hvr, yes, that is correct.

@snowleopard, I agree, we should have a complete story before moving ahead.

However, @angerman said via IRC that he is alright with moving ahead with the subtree approach, so I suspect consensus isn't far off.

How will this look from the point of view of a GHC developer?

There will be a hadrian directory in the root of the ghc tree, just as though you had checked out hadrian today. One can update the subtree via git subtree pull -P hadrian https://github.com/snowleopard/hadrian master. That is pretty much all there is to it.

I have pushed an example of what this might look like to the hadrian-merge branch of my fork.

@snowleopard
Owner Author

snowleopard commented Oct 26, 2017

@bgamari Many thanks! Great that the consensus is near.

I've updated the top todo list, please have a look. The last two items are with question marks:

  • Do we need build.sh and build.bat scripts in the GHC root or shall we keep them in hadrian folder for now? I think they may confuse people into thinking that Hadrian is already default.

  • Do we move this repository to https://github.com/ghc organisation? It's probably the best place for it in the long term, but perhaps there are some caveats I'm not aware of? Hopefully, all issues/PRs/etc. will keep working as is, and GitHub will forward all old links there.

Have I missed anything?

That is pretty much all there is to it.

How will GHC devs push changes to Hadrian? Presumably they can push their changes to the GHC repository directly, but I'm not clear on how these changes will reach the Hadrian repository.

@bgamari
Collaborator

bgamari commented Oct 26, 2017

Do we need build.sh and build.bat scripts in the GHC root or shall we keep them in hadrian folder for now? I think they may confuse people into thinking that Hadrian is already default.

I agree. For now let's keep them in hadrian/.

Do we move this repository to https://github.com/ghc organisation? It's probably the best place for it in the long term, but perhaps there are some caveats I'm not aware of? Hopefully, all issues/PRs/etc. will keep working as is, and GitHub will forward all old links there.

I don't think this is necessary. Let's keep it under your account. This will hopefully be a relatively short-term solution. I hope that soon enough we will be able to move development into the ghc repository itself.

How will GHC devs push changes to Hadrian? Presumably they can push their changes to the GHC repository directly, but I'm not clear on how these changes will reach the Hadrian repository.

I believe git subtree push will do the right thing here.

@snowleopard
Owner Author

snowleopard commented Oct 26, 2017

@bgamari I see, thanks! I'll take care of the remaining todo's in the next couple of days.

This bit is still unclear:

I believe git subtree push will do the right thing here.

To make this work I think I need to arrange for GHC commits to be accepted in the Hadrian master branch (or a new branch to be merged via a PR). How do I do that? I guess in the worst case I'll just need to give push permissions to GHC developers manually (most likely there will not be too many contributors to Hadrian initially).

@bgamari
Collaborator

bgamari commented Oct 26, 2017

To make this work I think I need to arrange for GHC commits to be accepted in the Hadrian master branch (or a new branch to be merged via a PR). How do I do that? I guess in the worst case I'll just need to give push permissions to GHC developers manually (most likely there will not be too many contributors to Hadrian initially).

How about we just keep things simple: allow contributors to push to their own forks of Hadrian and open pull requests just as they do now. This is what we would have ended up doing if we were to go the submodule route anyways.

Does this sound reasonable?

@snowleopard
Owner Author

@bgamari Yes, I think it does.

@snowleopard
Owner Author

@bgamari I have one more question, following the recent incident with spamming GHC devs with every single Hadrian commit ;-)

I thought the plan is to squash all Hadrian commits in the GHC subtree to avoid adding so many commits to the GHC history. The complete commit history will be preserved in the GitHub repository anyway, so why duplicate it? However, judging by the above-mentioned incident, it looks like you didn't squash the commits in your test run. So, what shall I do when preparing the patch?

@bgamari
Collaborator

bgamari commented Oct 30, 2017

Ahh, so we had very different understandings of this decision it seems. I would be fine with squashing if you would like. My understanding from our discussion is that you preferred to keep the entire history, which I also understand and would be fine with. Having the entire history available within the GHC tree would be quite convenient and comes at a very small cost. We can disable the ghc-commits@ hook before pushing if we want to go this route.

@snowleopard
Owner Author

snowleopard commented Oct 30, 2017

Yes, I do want to keep the entire history, but not necessarily in both repositories. If you think the cost is very small then we can keep it in GHC too. My only concern is that Hadrian's history is not very pretty: it's not just the history of Hadrian, it's also the history of my learning Haskell -- there's quite a lot of junk :-)

@bgamari
Collaborator

bgamari commented Oct 30, 2017 via email

snowleopard added a commit that referenced this issue Nov 4, 2017
@snowleopard
Owner Author

snowleopard commented Nov 4, 2017

@bgamari I think Hadrian is ready for the merge. The only thing left for me to do is to create a branch in GHC, say wip/hadrian, and add Hadrian as a git subtree, squashing the history. To be precise, here is what I plan to do:

git clone --recursive git://git.haskell.org/ghc.git
cd ghc
git checkout -b wip/hadrian
git subtree add --prefix hadrian https://github.com/snowleopard/hadrian.git master --squash
# Remove hadrian directory from .gitignore -- this appears to be optional, but I'm not sure why
git commit -am "Do not ignore Hadrian"

And then arc diff HEAD~ to send the diff to Phabricator.

Have I missed anything?

Do we need to disable the ghc-commits@ hook or is it OK since I'm squashing the commits?

snowleopard added a commit that referenced this issue Nov 5, 2017
snowleopard added a commit that referenced this issue Nov 5, 2017
@bgamari
Collaborator

bgamari commented Nov 6, 2017

This looks right to me. However, let's not go through Phabricator. Just push the branch to git.haskell.org and I'll have a look at it and merge.

Do we need to disable the ghc-commits@ hook or is it OK since I'm squashing the commits?

This shouldn't be necessary since there will be only one commit.

@snowleopard
Owner Author

snowleopard commented Nov 7, 2017

I've pushed my branch: http://git.haskell.org/ghc.git/shortlog/refs/heads/wip/hadrian

However, there is some turmoil in GHC at the moment, so we might not want to rush the merge.

@bgamari Let's wait until things stabilise before merging to the master branch?

@bgamari
Collaborator

bgamari commented Nov 10, 2017

The merge has happened. See ghc/ghc@9773053.

@bgamari bgamari closed this as completed Nov 10, 2017
@angerman
Collaborator

@bgamari

The merge has happened. See ghc/ghc@9773053.

Can someone please tell me how I point that subtree to a different repository and branch?
With a submodule, I'd simply change the remote and branch. I know how to do that. For
the subtree the best I found was:

git remote add my-remote https://path/to/repo.git
git rm -r hadrian
git commit # need to commit that the existing hadrian is gone
git subtree add --prefix=hadrian my-remote my/feature/name

But that just sounds wrong. I have +2k commits now on my branch.

@angerman angerman reopened this Nov 11, 2017
@angerman
Collaborator

with --squash I only have a single commit left. So, now I wonder how I merge the upstream hadrian into my fork with git subtree

@angerman
Collaborator

angerman commented Nov 11, 2017

So I need help with:

  • (I can live with this, but am not happy about it) blaming, now that everything is squashed I can't properly blame anymore :(
  • (I need this to work) how do I commit, and have those commits end up in the upstream repo/branch?
  • (just annoying) my tooling used to automatically pick up projects based on git repositories. As such searching in hadrian was confined to looking up search strings in hadrian. Now it's always returning matches for all of GHC :-/

I've tried to do:

$ git subtree push --prefix=hadrian angerman-hadrian angerman/feature/reloc
git push using:  angerman-hadrian angerman/feature/reloc
Counting objects: 423212, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (94553/94553), done.
Writing objects:  19% (80411/423212), 15.88 MiB | 258.00 KiB/s

at which point I stopped this process. Just counting the objects took about 5min. And writing 16MiB for a ~50line change seems wrong.

I'm sorry for sounding so negative here right now. And I'm angry with myself for seemingly not "getting it". Please, someone help me with how to develop with hadrian in GHC now.

So why did I not just keep working like before? Because hadrian is now in GHC and as such overlaps with my previous hadrian checkout in GHC. And that's just terrible, because every branch switch... just about anything I do to GHC starts pulling the hadrian subtree back in, and I constantly have to revert the changes.

@angerman
Collaborator

Now that my GHC tree has a completely different hadrian subtree than upstream, and GHC upstream pulls hadrian periodically, does that mean I'll keep running into merge conflicts?

@angerman
Collaborator

Someone please just tell me what commands I need to run to keep working on my hadrian fork in GHC. Please!

@snowleopard
Owner Author

@angerman I don't have all the answers, but my understanding is that we continue Hadrian development in the GitHub repository, where we have the full history.

Here is what I do:

# Checkout GHC
git clone --recursive git://git.haskell.org/ghc.git
cd ghc
# Replace GHC's Hadrian with GitHub one
rm -rf hadrian
git clone https://github.com/snowleopard/hadrian.git
cd hadrian
# Work on Hadrian
git checkout my-branch

This is not ideal, but works for me.

@snowleopard
Owner Author

@angerman I fear my workflow might not work for you, because you also make concurrent changes to GHC itself, while I do not... Sorry, the merge caused issues for you. One other (non ideal) solution is to stick to an older pre-Hadrian GHC commit if that works for you?

@bgamari
Collaborator

bgamari commented Nov 11, 2017

@angerman, I believe the correct way to do this would be to simply work on hadrian within GHC and then git subtree push your changes to your Hadrian fork.

@angerman
Collaborator

@bgamari, that’s what I tried. (See above) but a push that takes multiple minutes and pushes multiple megabytes for a single small commit seems wrong, no?

@angerman
Collaborator

Just for the record. I've given up on the subtree, and added a submodule to my GHC tree. I'm not happy about the situation, but it allows me to keep on working.

@Mistuke
Contributor

Mistuke commented Nov 25, 2017

So here's what I know of subtrees.

As you know, subtrees can share a history with a parent repository. The intention is that if you have an existing repository, you can just do a git subtree split to get the repository into another repository, and then do a git subtree push to push the changes to a new repository. But this pushes the entire history. Why? Because subtrees aren't a special feature of git. It's a re-use of existing facilities (it's actually implemented as a shell script: https://github.com/git/git/blob/master/contrib/subtree/git-subtree.sh).

A git subtree push is actually doing a git subtree split followed by a git push. Split essentially just makes a clone of the history. But to clone the history of a prefix, it has to traverse every commit. This is why, when @angerman did a git subtree push --prefix=hadrian angerman-hadrian angerman/feature/reloc, he had to wait so long.

Why does it do this? Because it doesn't know what history it has already pushed to the remote repository. Since subtrees are re-using normal git structures, there's no meta information stored about them as there would normally be for a branch or tag.

The solution is that on every pull or merge back from the remote to the subtree, you tell git that the merge is also the latest revision on the remote.

You do this by doing a git subtree split --rejoin. What this does is create a commit with metadata for the subtree which looks like this:

commit 35a14a5d235bb682360a4a929752524452ee9ad5 (HEAD -> master)
Merge: 5e356276ef 6f7195d169
Author: Tamar Christina <tamar@zhox.com>
Date:   Sat Nov 25 01:41:02 2017 +0000

    Split 'hadrian/' into commit '6f7195d169ea8950bba64abe3151e990745b050e'

    git-subtree-dir: hadrian
    git-subtree-mainline: 5e356276efffa40e82c89628fbdf1a38ca489216
    git-subtree-split: 6f7195d169ea8950bba64abe3151e990745b050e

After doing this, you'll notice that the next git subtree split (and remember, a git subtree push is really a split followed by a normal push) doesn't have to traverse past this split marker!
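Since the markers are ordinary lines in commit messages, they can be listed with plain git log. The sketch below builds a throwaway repository containing one hand-written marker commit (the SHAs in it are fake placeholders, not real objects) and then greps for it. As far as I can tell from reading git-subtree.sh, the script locates earlier splits with a similar --grep on git-subtree-dir, but treat that as my reading rather than a guarantee.

```shell
#!/bin/sh
# Sketch: locating subtree marker commits with plain `git log --grep`.
# The repository, identity and SHAs below are hypothetical placeholders.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo

echo one > file
git add file
git commit -q -m "ordinary commit"

# Imitate the message format of a rejoin commit (fake SHAs, empty commit).
git commit -q --allow-empty -m "Split 'hadrian/' into commit 'deadbeef'

git-subtree-dir: hadrian
git-subtree-mainline: 1111111
git-subtree-split: deadbeef"

echo two >> file
git commit -q -am "another ordinary commit"

# List only the commits whose message carries a marker for hadrian/.
markers=$(git log --grep='^git-subtree-dir: hadrian' --format='%h %s')
echo "$markers"
marker_count=$(git log --grep='^git-subtree-dir: hadrian' --format='%h' | wc -l | tr -d ' ')
echo "marker commits: $marker_count"
```

In a real GHC checkout, running the same git log --grep command (optionally with -1) should show the most recent squash or rejoin commit for hadrian/.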

If you're working on your own remote, you can do a rejoin after you push changes to create these, thus preserving the same performance as branches and submodules have.

So @angerman, you did nothing wrong; it's just that we never told git what the latest revision was on the remote Hadrian endpoint, so it had no idea what it could safely not send. Someone performing the merges should run a rejoin to solve this pain, or I can submit the rejoin commit I created.

As for how to use the subtree with your branch, you don't need to remove it to add your changes.

git remote add <remote> <url>
git subtree pull --prefix=hadrian <remote> <remote-branch>

to get your latest changes, you can then just create a branch to work on

git subtree split --prefix=hadrian --branch <my-branch>
git push <remote> <my-branch>:<remote-branch>

(you no longer need the subtree subcommand here since you've told git what the branch is tracking)

note that rebasing with subtrees is a bit dangerous, it's best to just do a pull and let it merge. You could do a --squash to get only one big merge commit.

After you've done a push and want to tell git to stop sending the same commits again, you can do a

git subtree split --prefix=hadrian --rejoin

which will merge your branch back into head, thus preventing the next split from doing the history traversal again.

PS. I'm by no means an expert on this, but this is what I understood from the manual and some experiments :)

@angerman
Collaborator

I'll give this a try later on. There are git-subtree-dir and git-subtree-split markers in the hadrian squash commits in the commit history, though. Ahh well, I guess we all learn. Anything that would lessen my pain with switching between branches, I appreciate.

@angerman
Collaborator

Great :(

$ git subtree pull --prefix=hadrian angerman-hadrian angerman/feature/reloc
From github.com:angerman/hadrian
 * branch                  angerman/feature/reloc -> FETCH_HEAD
fatal: refusing to merge unrelated histories

@Mistuke
Contributor

Mistuke commented Nov 25, 2017

There are in the commit history git-subtree-dir and git-subtree-split markers in the hadrian squash commits though.

Yes, but they're missing one crucial piece of information, namely, how the new squashed commits relate to the mainline ones. For example, one of the squash commits from @bgamari says:

commit 360d7404809b3fa54541f7f932a6864294f75a4a
Author: Ben Gamari <ben@smart-cactus.org>
Date:   Wed Nov 22 08:47:55 2017 -0500

    Squashed 'hadrian/' changes from fa3771fe6b..4499b294e4

    4499b294e4 Follow GHC changes
    8fd68186b2 Add ways to build hadrian using nix
    e5c7a29c23 Do not depend on the in-tree filepath library
    9dd7ad2acc Fix dependencies
    497184390e Bring mtl dependency back
    6c5f5c9bd9 Minor clean up of Hadrian dependencies
    9aff81d424 Fix Windows build
    fa95caa8df Unbreak `cabal new-build`

    git-subtree-dir: hadrian
    git-subtree-split: 4499b294e4a53f71f8808d6eb55a7dd0b341cfb8

Notice the crucial difference with the one above?
It's telling git how it was split from the remote repo, but not how it relates to the current one!
We have squashed the commits, so 4499b294e4a53f71f8808d6eb55a7dd0b341cfb8 doesn't exist locally.
Remember that you can arbitrarily change your history in git, so the order of the commits doesn't say much.

The one made with --rejoin locally has

    git-subtree-dir: hadrian
    git-subtree-mainline: 5e356276efffa40e82c89628fbdf1a38ca489216
    git-subtree-split: 6f7195d169ea8950bba64abe3151e990745b050e

See the git-subtree-mainline? That's telling it how the two relate.
I suspect we wouldn't have this problem, if we were merging with full history.

$ git subtree pull --prefix=hadrian angerman-hadrian angerman/feature/reloc
From github.com:angerman/hadrian
 * branch                  angerman/feature/reloc -> FETCH_HEAD
fatal: refusing to merge unrelated histories

This fails because the tree is a subtree: there's no common history, only an artificial one. The error is triggered by a change to git merge, which since Git 2.9.0 rejects unrelated histories by default. https://github.com/git/git/blob/master/Documentation/RelNotes/2.9.0.txt#L58-L68

It says:

 * "git merge" used to allow merging two branches that have no common
   base by default, which led to a brand new history of an existing
   project created and then get pulled by an unsuspecting maintainer,
   which allowed an unnecessary parallel history merged into the
   existing project.  The command has been taught not to allow this by
   default, with an escape hatch "--allow-unrelated-histories" option
   to be used in a rare event that merges histories of two projects
   that started their lives independently.

... And guess what subtrees do. This was subsequently corrected with git/git@0f12c7d, but the fix was only applied to git subtree split --rejoin, where you're explicitly telling it to merge two branches.
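The behaviour described in the release note can be reproduced in a throwaway repository with core git alone. This is a minimal sketch, assuming Git 2.9 or newer; the branch name `other`, the file names and the demo identity are made up, and an orphan branch stands in for an independently started project such as a squashed subtree import.

```shell
#!/bin/sh
# Sketch: git merge refusing, then accepting, unrelated histories.
# Assumes Git >= 2.9; all names below are hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo

echo a > a.txt
git add a.txt
git commit -q -m "project A: initial commit"
first=$(git rev-parse --abbrev-ref HEAD)   # default branch name (master or main)

# An orphan branch shares no ancestor with the first branch.
git checkout -q --orphan other
git rm -q -rf .
echo b > b.txt
git add b.txt
git commit -q -m "project B: initial commit"

git checkout -q "$first"

# A plain merge is refused because there is no common base.
if git merge -q -m "merge B into A" other 2>/dev/null; then
  plain_merge=accepted
else
  plain_merge=refused
fi
echo "plain merge: $plain_merge"

# The escape hatch named in the release note lets the merge through.
git merge -q --allow-unrelated-histories -m "merge B into A" other
words=$(git rev-list --parents -n 1 HEAD | wc -w | tr -d ' ')
parent_count=$((words - 1))
echo "parents of merge commit: $parent_count"
```

The resulting merge commit has two parents and the working tree contains both a.txt and b.txt, which is the kind of artificial join a subtree merge has to create.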

A lot of documentation/blogs/tutorials on subtrees predate this annoying change.

So you can only apply merge commits on top of it or use a split --rejoin to merge them by doing the merges manually.

> git subtree pull --prefix=hadrian angerman-hadrian angerman/feature/reloc --squash
From https://github.com/angerman/hadrian
 * branch                  angerman/feature/reloc -> FETCH_HEAD
Auto-merging hadrian/src/Types/Flavour.hs
Removing hadrian/src/Settings/Packages/RunGhc.hs
...
 hadrian/src/Utilities.hs                         |  40 ++-
 hadrian/src/Way.hs                               | 112 +-------
 99 files changed, 2175 insertions(+), 1625 deletions(-)
...
> 

or

git subtree split --prefix=hadrian angerman-hadrian/angerman/feature/reloc --rejoin
would have worked if we were merging back full histories into GHC from Hadrian.

instead

git fetch angerman-hadrian angerman/feature/reloc
git merge -Xsubtree="hadrian" angerman-hadrian/angerman/feature/reloc --allow-unrelated-histories -s ours
> git subtree split --prefix=hadrian --rejoin
Already up-to-date.
6f7195d169ea8950bba64abe3151e990745b050e

seems to work if you want some history...

Again, I'm sort of reconstructing this as I go :)

@Mistuke
Contributor

Mistuke commented Nov 26, 2017

Right, so I didn't notice but --rejoin will include the history of the branch. It seems that fundamentally, the way we set up subtrees means you can't develop in the GHC tree checkout on Hadrian. This is because we don't include full history.

Subtrees rely on having the complete history in the merged branch. Any --rejoin operation, which you need in order to sync the trees, branches, etc., will force it to include the history.

Without the history it will have to traverse the entire repo each time. Since we're just squashing and merging I don't think there's any point in having this being a subtree. The main benefit of having a subtree is to have the history...

@snowleopard
Owner Author

@Mistuke Thank you for the in-depth investigation.

@bgamari What do you think about this? It looks like squashing the history wasn't a good idea after all...

@bgamari
Collaborator

bgamari commented Nov 27, 2017 via email

@snowleopard
Owner Author

Thanks @bgamari! No objections from me. Please go ahead.

@snowleopard
Owner Author

We've switched to a submodule now: ghc/ghc@4335c07.

@snowleopard snowleopard mentioned this issue Dec 11, 2017