Preparing for the merge #440
If using |
I'm raising the same question again: I believe this should be a submodule while development continues on GitHub, and then become a subtree once development shifts over to Phabricator. |
@bgamari Can you comment on why you think |
My understanding of the difference between
@izgzhen Answering your question, I think the main development will at some point shift from GitHub to the GHC tree, but we will be able to accept patches in the GHC tree at any point and push these changes to GitHub. This will require some extra work from us, but it's possible using @angerman You say you "believe" we should use |
Just for reference, the email conversation @snowleopard is referring to is the one starting here: https://mail.haskell.org/pipermail/ghc-devs/2017-October/014890.html @snowleopard my initial impression of git subtree was likely flawed, and it doesn't really embed one repo into another. However, I'll say this again: for consistency, I would very much prefer git submodule over git subtree, as every other external reference in the ghc tree is a submodule. Let me just add that with git submodule you will also be able to work on hadrian in ghc. There is effectively the hadrian repository inside of ghc, and you can push from there as well. (Of course, there is the issue of the detached HEAD, which takes some getting used to.) I do see, however, that due to some technicalities this might only work if hadrian were moved into the ghc organization, as the submodule would have to point to GitHub.com/ghc/hadrian instead of GitHub.com/snowleopard/hadrian? But I'm certain @hvr will be able to explain the necessary requirements better here. I'm not sure whether the subtree would impose those technicalities as well. |
@angerman Yes, this is a very good point. Indeed, git submodules are familiar to all GHC developers.
Aha, I didn't realise this!
I have no problems with moving this repository into the |
Yes, but submodules and subtrees are two quite different solutions addressing rather different use-cases; they are not interchangeable. The former is essentially a pointer to a commit in an external repository. As far as the history of the containing repository is concerned, the submodule has no "structure": it is just a file with a commit SHA in it. We indeed use submodules for tracking third-party libraries since we don't want their history to pollute the ghc tree. Creating a subtree, on the other hand, essentially amounts to merging one repository's history into another. This is ultimately the state we want Hadrian to end up in, and therefore I think this is the state in which we should start the merge. Starting with a submodule would require a fair amount of administrative work (since we have to set up mirrors), and yet all of this work would be for naught in the end; we would eventually have to rip out the submodule and replace it with... a subtree. I have a hard time seeing the sense in this. If we are going to merge, let's do so in a way that won't need to be reverted later on. With the |
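To make the structural difference concrete, here is a minimal sketch; the helper name `classify_entries` is hypothetical. In the containing repository's tree, a submodule shows up in `git ls-tree` as a single gitlink entry with mode `160000` and type `commit`, holding only the pinned SHA, whereas a subtree is an ordinary directory entry (mode `040000`, type `tree`) whose files and history belong to the repository itself. The function reads lines in `git ls-tree` format (`<mode> <type> <object>`, a tab, then the path) and reports which kind each path is.

```shell
# Hypothetical helper: classify `git ls-tree` entries as a submodule (gitlink)
# or an ordinary directory. Input lines look like:
#   <mode> <type> <object><TAB><path>
classify_entries() {
  awk -F '\t' '{
    split($1, meta, " ")   # meta[1] = mode, meta[2] = type, meta[3] = object id
    if (meta[1] == "160000" && meta[2] == "commit") {
      # gitlink: the parent repository stores only the pinned commit SHA
      printf "%s: submodule at %s\n", $2, meta[3]
    } else if (meta[1] == "040000" && meta[2] == "tree") {
      # ordinary directory: its contents (and, for a subtree, history) live here
      printf "%s: directory tracked in this repository\n", $2
    } else {
      printf "%s: %s entry\n", $2, meta[2]
    }
  }'
}
```

In a GHC checkout this could be run as `git ls-tree HEAD hadrian | classify_entries`; the first case is what a submodule checkout would report, the second what a subtree checkout would report.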
@bgamari I see your point. How will this look from the point of view of a GHC developer? We'll need to provide some guidance in the
Maybe this will also convince @angerman that there is nothing too difficult about the |
Fwiw, I have the following in my
so I can |
@hvr That will continue to work! The
```
pushd "$foo" \
  && git checkout master \
  && git pull && popd \
  && git add "$foo" \
  && git commit -m "update $foo submodule"
```
a mouthful which is not part of your alias. |
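After the `git pull` inside the submodule, the containing repository keeps pointing at the old commit until `git add` stages the new one. `git submodule status` marks exactly this state with a leading `+` (a leading `-` means the submodule is not initialized, `U` means it has merge conflicts). The sketch below, using a hypothetical `stale_submodules` helper, lists the paths in that state, i.e. the ones still waiting for the `git add` / `git commit` part of the sequence above.

```shell
# Hypothetical helper: list submodules whose checked-out commit differs from
# the commit recorded in the parent repository's index.
# `git submodule status` prints one line per submodule:
#   <flag><sha> <path> [(<describe>)]
# where <flag> is " " (in sync), "-" (not initialized), "+" (checkout differs
# from the recorded commit) or "U" (merge conflicts).
# Paths containing spaces are not handled by this sketch.
stale_submodules() {
  awk 'substr($0, 1, 1) == "+" { print $2 }'   # $1 is "+<sha>", $2 is the path
}
```

Typical use would be `git submodule status | stale_submodules`.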
For comparison, the Rust people are going with the subtree approach as we speak with their core language interpreter:
|
@Ericson2314 do I understand it right that |
Happy to start the move, but so far it appears there is no consensus on how to proceed. We can do easy things, but I'd still prefer to have the full picture of what is going to happen before rushing ahead. |
@hvr Yes. I didn't know about |
@hvr, yes, that is correct. @snowleopard, I agree, we should have a complete story before moving ahead. However, @angerman said via IRC that he is alright with moving ahead with the subtree approach, so I suspect consensus isn't far off.
There will be a I have pushed an example of what this might look like to the |
@bgamari Many thanks! Great that the consensus is near. I've updated the top todo list, please have a look. The last two items are with question marks:
Have I missed anything?
How will GHC devs push changes to Hadrian? Presumably they can push their changes to the GHC repository directly, but I'm not clear on how these changes will reach the Hadrian repository. |
I agree. For now let's keep them in
I don't think this is necessary. Let's keep it under your account. This will hopefully be a relatively short-term solution. I hope that soon enough we will be able to move development into the
I believe |
@bgamari I see, thanks! I'll take care of the remaining todo's in the next couple of days. This bit is still unclear:
To make this work I think I need to arrange for GHC commits to be accepted in the Hadrian master branch (or a new branch to be merged via a PR). How do I do that? I guess in the worst case I'll just need to give push permissions to GHC developers manually (most likely there will not be too many contributors to Hadrian initially). |
How about we just keep things simple: allow contributors to push to their own forks of Hadrian and open pull requests just as they do now. This is what we would have ended up doing if we had gone the submodule route anyway. Does this sound reasonable? |
@bgamari Yes, I think it does. |
@bgamari I have one more question, following the recent incident with spamming GHC devs with every single Hadrian commit ;-) I thought the plan is to squash all Hadrian commits in the GHC subtree to avoid adding so many commits to the GHC history. The complete commit history will be preserved in the GitHub repository anyway, so why duplicate it? However, judging by the above-mentioned incident, it looks like you didn't squash the commits in your test run. So, what shall I do when preparing the patch? |
Ahh, so we had very different understandings of this decision it seems. I would be fine with squashing if you would like. My understanding from our discussion is that you preferred to keep the entire history, which I also understand and would be fine with. Having the entire history available within the GHC tree would be quite convenient and comes at a very small cost. We can disable the |
Yes, I do want to keep the entire history, but not necessarily in both repositories. If you think the cost is very small then we can keep it in GHC too. My only concern is that Hadrian's history is not very pretty: it's not just the history of Hadrian, it's also the history of my learning Haskell -- there's quite a lot of junk :-) |
Andrey Mokhov <notifications@github.com> writes:
> Yes, I do want to keep the entire history, but not necessarily in both
> repositories. If you think the cost is very small then we can keep it.
> My only concern is that Hadrian's history is not very pretty: it's not
> just the history of Hadrian, it's also the history of my learning
> Haskell -- there's quite a lot of junk :-)
You should have a look at some of the earlier history of GHC sometime;
after doing so I suspect you won't feel quite so bad about hadrian. ;)
|
@bgamari I think Hadrian is ready for the merge. The only thing left for me to do is to create a branch in GHC, say
And then Have I missed anything? Do we need to disable the |
This looks right to me. However, let's not go through Phabricator. Just push the branch to
This shouldn't be necessary since there will be only one commit. |
I've pushed my branch: http://git.haskell.org/ghc.git/shortlog/refs/heads/wip/hadrian However, there is some turmoil in GHC at the moment, so we might not want to rush the merge:
@bgamari Let's wait until things stabilise before merging to the master branch? |
The merge has happened. See ghc/ghc@9773053. |
Can someone please tell me how to point that subtree to a different repository and branch?
But that just sounds wrong. I have +2k commits now on my branch. |
with |
So I need help with:
I've tried to do:
at which point I stopped this process. Just counting the objects took about 5 minutes, and writing 16 MiB for a ~50-line change seems wrong. I'm sorry for sounding so negative here right now, and I'm angry with myself for seemingly not "getting it". Could someone please help me with how to develop hadrian in GHC now? So why did I not just keep working like before? Because hadrian is now in GHC and, as such, overlaps with my previous hadrian checkout in GHC. And that's just terrible, because every branch switch (just about anything I do to GHC) starts pulling the hadrian subtree back in, and I constantly have to revert the changes. |
Now my GHC tree has a completely different hadrian subtree from upstream, and GHC upstream pulls hadrian periodically. Does that mean I'll keep running into merge conflicts? |
Someone please just tell me what commands I need to run to keep working on my hadrian fork in GHC. Please! |
@angerman I don't have all the answers, but my understanding is that we continue Hadrian development in the GitHub repository, where we have the full history. Here is what I do:
```
# Checkout GHC
git clone --recursive git://git.haskell.org/ghc.git
cd ghc
# Replace GHC's Hadrian with GitHub one
rm -rf hadrian
git clone https://github.com/snowleopard/hadrian.git
cd hadrian
# Work on Hadrian
git checkout my-branch
```
This is not ideal, but works for me. |
@angerman I fear my workflow might not work for you, because you also make concurrent changes to GHC itself, while I do not... Sorry the merge caused issues for you. One other (non-ideal) solution is to stick to an older pre-Hadrian GHC commit, if that works for you? |
@angerman, I believe the correct way to do this would be to simply work on |
@bgamari, that's what I tried (see above), but a push that takes multiple minutes and pushes multiple megabytes for a single small commit seems wrong, no? |
Just for the record. I've given up on the subtree, and added a submodule to my GHC tree. I'm not happy about the situation, but it allows me to keep on working. |
So here's what I know of subtrees. As you know, subtrees can share a history with a parent repository. The intention is that if you have an existing repository, you can just do a Why does it do this? Because it doesn't know what history it has already pushed to the remote repository. Since subtrees re-use normal git structures, there's no metadata stored about them, as there normally would be for branches and tags. The solution is that on every pull or merge back from the remote to the subtree, you tell git that the merge is also the latest revision on the remote. You do this by doing a
After doing this, you notice that the next If you're working on your own remote, you can do a rejoin after you push changes to create these, thus preserving the same performance that branches and submodules have. So, @angerman, you did nothing wrong; it's just that we never told git what the latest revision on the remote Hadrian endpoint was, so it had no idea what it could safely skip sending. (someone performing the merges) should run a As for how to use the subtree with your branch, you don't need to remove it to add your changes.
to get your latest changes, you can then just create a branch to work on
(You no longer need the subtree subcommand here, since you've told git what the branch is tracking.) Note that rebasing with subtrees is a bit dangerous; it's best to just do a pull and let it merge. You could do a After you've done a push and want to tell git to stop sending the same commits again, you can do a
which will merge your branch back into HEAD, thus preventing the next split from doing the history traversal again. PS. I'm by no means an expert on this, but this is what I understood from the manual and some experiments :) |
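How does git know where a previous split ended? `git subtree` records it in commit messages: commits created with `--rejoin` (and squash commits) carry `git-subtree-dir:` and `git-subtree-split:` trailer lines, and later operations search the log for them instead of re-walking the whole history. The following is a simplified sketch of that lookup, not `git subtree`'s actual implementation; the `last_subtree_split` name and the `--end--` record separator are assumptions made for illustration.

```shell
# Simplified, hypothetical sketch of the boundary lookup: print the most
# recent git-subtree-split SHA recorded for a given subtree prefix.
# $1    : subtree prefix, e.g. "hadrian"
# stdin : commit messages, newest first, each followed by a line "--end--"
#         (for instance from: git log --format='%B%n--end--')
last_subtree_split() {
  awk -v dir="$1" '
    /^git-subtree-dir: /   { d = $2; sub(/\/+$/, "", d) }  # tolerate "hadrian/"
    /^git-subtree-split: / { s = $2 }
    /^--end--$/ {
      if (d == dir && s != "") { print s; exit }   # newest match wins
      d = ""; s = ""                                # reset for the next commit
    }'
}
```

For example, `git log --format='%B%n--end--' | last_subtree_split hadrian`. If it prints nothing, no split has been recorded for that prefix.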
I'll give this a try later on. There are git-subtree-dir and git-subtree-split markers in the commit history, in the hadrian squash commits, though. Ah well, I guess we all learn. I'd appreciate anything that would lessen my pain with switching between branches. |
Great :(
|
Yes, but they're missing one crucial piece of information, namely, how the new squashed commits relate to the mainline ones. For example, one of the squash commits from @bgamari says:
Notice the crucial difference with the one above? The one made with
see the
because the tree is a subtree. There's no common history, but an artificial one. This is triggered by a change in It says:
.. And guess what subtrees do. This was subsequently corrected with git/git@0f12c7d but only thought to A lot of documentation/blogs/tutorials on So you can only apply merge commits on top of it or use a
or
instead
seems to work if you want some history... Again, I'm sort of reconstructing this as I go :) |
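The distinction drawn here can be checked mechanically. Going by the trailer lines `git subtree` writes, a `--rejoin` commit records both `git-subtree-mainline:` and `git-subtree-split:`, whereas a squash commit records only `git-subtree-split:`, which is the missing relation to the mainline described above. The hypothetical `subtree_commit_kind` helper below is a minimal sketch that classifies a single commit message read from stdin.

```shell
# Hypothetical helper: classify one commit message (read from stdin) by the
# git subtree trailer lines it carries.
#   mainline+split : records both sides (e.g. a --rejoin commit)
#   split-only     : records only the subtree side (e.g. a squash commit)
#   none           : no subtree trailers at all
subtree_commit_kind() {
  awk '
    /^git-subtree-mainline: / { has_mainline = 1 }
    /^git-subtree-split: /    { has_split = 1 }
    END {
      if (has_split && has_mainline) print "mainline+split"
      else if (has_split)            print "split-only"
      else                           print "none"
    }'
}
```

It could be fed with `git log -1 --format=%B <commit> | subtree_commit_kind`. Note that a plain (non-squashed) `git subtree add` merge also carries both trailers, so `mainline+split` alone does not prove a commit came from `--rejoin`.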
Right, so I didn't notice, but subtrees rely on having the complete history in the merged branch. By doing any Without the history, it will have to traverse the entire repo each time. Since we're just squashing and merging, I don't think there's any point in having this be a subtree. The main benefit of having a subtree is to have the history... |
Andrey Mokhov <notifications@github.com> writes:
> @Mistuke Thank you for the in-depth investigation.
> @bgamari What do you think about this? It looks like squashing the history wasn't a good idea after all...
Yes, it would seem that way. I suspect the easiest way forward is to
fall back to a submodule until we are ready to retire the GitHub
repository for good. I can try to do this today if there are no
objections. My apologies to @angerman for all of the pain that this has
caused.
|
Thanks @bgamari! No objections from me. Please go ahead. |
We've switched to a submodule now: ghc/ghc@4335c07. |
So, it looks like Hadrian is finally getting merged! 🎉
The plan is to merge Hadrian as is, without yet recommending GHC developers to switch to it, but already requiring all GHC patches to be buildable both by Make and by Hadrian.
Let's use this issue to keep track of what needs to be done.
git subtree
to keep Hadrian repository on GitHub.
README.
Anything else?
In my view, the two most important requirements in the long term are:
1. Preserving the commit/issue/pull-request history. A GHC developer fighting a strange build failure should be able to find a relevant discussion not only now but also 5 years from now. This may be solved via documentation, i.e. gradually moving all discussions from GitHub to docs/comments. That's a lot of hard work (compared to simply keeping Hadrian's repository here on GitHub).
2. Making it convenient for GHC developers to work on Hadrian. To me, git submodules are not convenient at all, but maybe there are just no other options given requirement (1). Is git subtree a solution?