Skip to content
This repository has been archived by the owner on Nov 30, 2024. It is now read-only.

RFC: Combine multiple RSpec repositories into a single one. #2509

Closed
1 of 2 tasks
xaviershay opened this issue Jan 24, 2018 · 23 comments
Closed
1 of 2 tasks

RFC: Combine multiple RSpec repositories into a single one. #2509

xaviershay opened this issue Jan 24, 2018 · 23 comments

Comments

@xaviershay
Copy link
Member

xaviershay commented Jan 24, 2018

Summary

Combine rspec-core, rspec-mocks, rspec-expectations, rspec-support, rspec-dev (which would be mostly deprecated with this proposal), the rspec meta-gem, and possible rspec-rails into a single repository.

Motivation

Making cross-cutting changes to RSpec involves a complicated dance of PRs. Selected examples:

  • Upgrading rubocop involved multiple PRs, that all needed to be somewhat kept in sync as a result of changes/feedback, and are still pending a final rspec-dev sync. As well as been a pain at the time, this has caused confusion for a subsequent contribution.
  • Adding to rspec-support to fix a bug in (say) rspec-mocks requires a PR to support, a PR to mocks pointing at that branch to demonstrate that the support change works, merging the support PR, then redoing the mocks PR to point back at master. (e.g. Add specs demonstrating problem with verified doubles and keyword args. rspec-mocks#726)
  • Any of the rspec-dev tasks to "sync" configuration to all repos requires a PR for each repo.
  • complex and partially overlapping CI builds

It's also not always clear to issue reporters which project is appropriate for an issue. We shouldn't expect newcomers to know which sub-project an issue could be related to!

Approach

Issues

Given a motivation is to simplify issue reporting, we should migrate them all to the main rspec project (which current has issues disabled) using something like github-issues-import. We can use tags (mocks, etc...) to track what came from where, but for new issues they would be optional.

Inflight PRs

We should explicitly close or merge every open PR. This will include closing many stale PRs, that will need to be re-opened against the new monorepo. We can mitigate this with a notice period (i.e. "one month from now we're merging repos").

CI

I see two ways to set up CI:

  1. Run all the specs for each project on every CI. Conceptually the simplest, potential downside is runtime. Given a lot of build time is build setup (installing ruby, gems, etc...) I suspect this will work out fine.
  2. Add conditional build logic to identify which projects are likely affected based on file paths changed, and only run those. Speeds things up, but more complicated and not technically as thorough (though probably good enough).

This is where rspec-rails would be an interesting addition, because it has a test matrix with rails versions that isn't applicable to the other projects and would probably force us to option 2. I'm leaning towards keeping it out, at least to begin with. We can always incorporate it later!

For now we should benchmark option 1.

rspec-dev

We need to audit rspec-dev to see what things it does are still appropriate and how to best incorporate them into the monorepo. Given it's primarily used to "manage" multiple repos, my hope is it can be mostly deprecated. This will be investigated in the prototype suggested below.

Drawbacks and mitigations

  • Git history is harder to work with. We should keep around the old git projects for spelunking purposes.
  • Intermingled git logs. I'm not worried about this. Possibly consider it an upside, being able to see what was happening across the entire project.
  • Easy to accidentally add unwanted dependencies. Since the code is "there" this potentially makes it easier to accidentally introduce coupling between the various components. I figure we have enough history and discipline keeping them separate that this shouldn't be an issue in practice.
  • wrap up or manually transfer all hanging pull requests in Expectations/Mocks/Support/Core
  • align git version tags - they overlap, and we need to keep them in sync
  • CI limit of simultaneous jobs constraints our a bit tighter

Next steps

  • Feedback from other members of core team.
  • Create a new project to test this idea (rspec-monorepo), including setting up CI and incorporating rspec-dev.
@xaviershay
Copy link
Member Author

@rspec/rspec feedback please :)

@xaviershay
Copy link
Member Author

Background reading on monorepos:

We don't have enough code (nor should we ever) to start hitting some of the bigger downsides of a monorepo.

And also my own relevant background: I was responsible for the CI teams at Square for many years, dealing with a large Java monorepo and many disparate ruby repositories.

@soulcutter
Copy link
Member

I support this idea.

@myronmarston
Copy link
Member

I very much support this idea. The amount of overhead we deal with due to having separate repos is demoralizing. Elixir (the main language I dealt with the pass couple years) has first-class support for monorepos with many sub-applications via its umbrella feature and I'm a big fan of the approach when you're tooling supports it.

There's also another potential benefit we get out of a monorepo: it can greatly lessen the effort involved in extracting a feature out of the core gems and into an extension gem. For example, if we ever want to extract any_instance into a separate gem so it doesn't ship with rspec-mocks (something I'd advocate we consider for RSpec 4), with our current approach, extraction would be difficult. We'd have to refactor rspec-mocks to promote some new things to first-class extension APIs so the any_instance code could use it. That would be a lot of effort. With a monorepo, we could potentially move the any_instance code into a sub-directory of the monorepo, w/o needing to define new extension APIs. Instead, we can just include the any_instance spec suite in our travis build, so we can keep ourselves from breaking the any_instance logic.

possibly rspec-rails into a single repository.

I'm in favor of not merging rspec-rails. It's pretty different, and not part of core RSpec. The CI build is way different with all the rails versions it supports. And we've talked about potentially de-coupling it's versioning from the versioning of the rest of RSpec.

Given a motivation is to simplify issue reporting, we should migrate them all to the main rspec project (which current has issues disabled) using something like github-issues-import. We can use tags (mocks, etc...) to track what came from where, but for new issues they would be optional.

Great idea. A few additional thoughts:

  • It would be good to cleanup our issues before doing this so we don't import issues that should really be closed.
  • The merging of issues could potentially happen before (or after) the code is merged. The fewer changes in flight at once, the better.
  • When importing an issue into the monorepo, can we include a link back to the original issue?
  • It would be great if we could have the import script close the original issue, with a link to the issue in the monorepo.
  • I think there's value in tagging issues with which gem the issue applies to on an ongoing basis, not just for imported issues.
  • We probably want to prevent the creation of new issues once we've imported, but not turn off the issues feature of the old repos entirely, so folks can still browse/search/etc old issues. Not sure what github supports here.

This will include closing many stale PRs, that will need to be re-opened against the new monorepo.

Honestly, I think we can just close stale PRs, with a note saying the person can re-open a new PR on the monorepo if they want to see it get merged. I don't think we need to go through the effort of manually importing PRs (and automating it sounds hard). Cleaning up open PRs would be really helpful as a preliminary step.

Run all the specs for each project on every CI. Conceptually the simplest, potential downside is runtime. Given a lot of build time is build setup (installing ruby, gems, etc...) I suspect this will work out fine.

I would favor this approach over the more complicated analysis approach you mentioned. I think we can optimize this using a well-designed build matrix. The cukes are generally the slow part of our build. I'd favor doing something like:

  • One matrix entry for all the spec suites (and other quick little checks like yard coverage, etc)
  • One matrix entry for each cuke suite (core, expectations, and mocks)

So that would be 4x matrix entires per supported Ruby version...which sounds bad, but isn't actually any worse than our current situation, where we've got 4 builds (core, expectations, mocks, support) per supported version. And it should run simpler and faster.

We can consider not running the cukes on all supported ruby versions if there's a concern about the number of matrix entries.

This is where rspec-rails would be an interesting addition, because it has a test matrix with rails versions that isn't applicable to the other projects and would probably force us to option 2. I'm leaning towards keeping it out, at least to begin with. We can always incorporate it later!

I don't think I can even wrap my head around how we'd incorporate rspec-rails in a monorepo build. I very much advocate for keeping it separate. It's a special beast.

We need to audit rspec-dev to see what things it does are still appropriate and how to best incorporate them into the monorepo. Given it's primarily used to "manage" multiple repos, my hope is it can be mostly deprecated.

There's a couple different things rspec-dev provides:

  • Canonical copies of some files that we then push out to all the other repos. Once we have a mono repo, we can just have one copy of each file, with no tooling to sync it up needed.
  • A bunch of rake tasks for supporting RSpec development. Some of these are just there to deal with the pain of not having a monorepo (e.g. tasks to change branches in all repos, etc), and these can probably be dropped. Other tasks support doing things like bumping versions on all gems (which we'll still need), cutting releases, generating stats for blog posts, etc, and we'll still want these tasks--they'll just go in the root Rakefile of the monorepo.

I don't think we need to stringently audit rspec-dev and try to bring everything over--instead, we can just bring things over as we have the need for a task.

Create a new project to test this idea (rspec-monorepo), including setting up CI and incorporating rspec-dev.

This is a great idea. Would rspec-monorepo be a prototype, and then we'd re-do the merge on rspec/rspec for real? A couple thoughts/ideas:

  • rspec-monorepo should probably start off as a fork of rspec/rspec so you can test out merging into an existing repo with history.
  • Whoever works on this should keep copious notes on how they did it so we can easily reproduce on the rspec repo. Automating the process with some scripts is probably good, although as a one-time process we shouldn't try to go overboard.

One thing you didn't mention: how are we going to actually merge the source history of the repos? Is there a good tool for that? Obviously, all the SHAs are going to change. If possible, it would be really cool if whatever tool we used to import commits amended the message of each commit with a note like [Imported from rspec/rspec-core@abc1239237e8234bedf] so there's a clear link from each commit to the original. And we could potentially create a link the other way, too: since github lets you leave comments directly on commits, our import tool could leave an automated comment on each commit linking to its new location in the monorepo. Although, given the number of commits, I can imagine that would be super slow....so probably not worth it.

While I'm 100% on board with this idea, I can't say I'm excited (or particularly available) to do much of the work here. Is there someone who is willing to lead this effort? (Looking at you, @xaviershay :)...)

@JonRowe
Copy link
Member

JonRowe commented Jan 26, 2018

I'm on board with the mono repo idea, would certainly simplify keeping everything in sync!

On the CI side I'd recommend we not split out the various builds into matrices, as it will (ironically) slow the build down. Travis seems to limit the number of in progress builds at a time (one of the reasons why rspec-rails builds are so slow), which is fair given its a free service, so having one build per Ruby version will probably work out faster in practise.

@yujinakayama
Copy link
Member

yujinakayama commented Jan 26, 2018

👍 for the monorepo idea.

Some ideas for the import of commits:

  • Keep original author info, so that they will appear in the contribution graph on GitHub
  • Keep original author/commit dates
  • Add commits in chronological order for easier history investigation
    • Commits "Add foo" in rspec-support and "Use foo from rspec-support" in rspec-core should be close in the merged history

@xaviershay
Copy link
Member Author

Some ideas for the import of commits:

Oh interesting, I was just planning to start a new repo and copy the code over, with initial commit being "see the old repos". I can play around with various ways of importing history ... guess it should be pretty straight forward since there's no way they could conflict with one another.

@myronmarston
Copy link
Member

Oh interesting, I was just planning to start a new repo and copy the code over, with initial commit being "see the old repos". I can play around with various ways of importing history ... guess it should be pretty straight forward since there's no way they could conflict with one another.

I think for the prototype repo, copying in all the code is OK so you can work on getting CI setup, etc., but for the final merge I think the extra effort to preserve the source history as much as possible is valuable. I like @yujinakayama's ideas.

@xaviershay
Copy link
Member Author

Sorry I have made basically no progress on this. I'm going to de-commit from it rather than let it go on in limbo.

I made this prototype repo to start playing around with ideas if anyone wants to pick it up: https://github.com/rspec/rspec-monorepo-prototype

@p8
Copy link

p8 commented Oct 2, 2019

We could merge the other repos while keeping all the history.
I had some success with the steps described here (https://thoughts.t37.net/merging-2-different-git-repositories-without-losing-your-history-de7a06bba804):

git clone git@github.com:rspec/rspec-core.git
git clone git@github.com:rspec/rspec-expectations.git
cd rspec-core
git remote add rspec-expectations ../rspec-expectations
git fetch rspec-expectations
git co -b merge-rspec-expectations
git merge -S --allow-unrelated-histories rspec-expectations/master

#2665

@yujinakayama
Copy link
Member

yujinakayama commented Oct 8, 2021

Recently I've been working on my project RepositoryMerger which satisfies the requirements for the repository migration, and finally it's almost complete.

Here's a prototype monorepo:

https://github.com/yujinakayama/rspec-monorepo

The largest benefit of the monorepo generated by RepositoryMerger is that we can handle the past codebase and commits in the same way as we do after the migration.

  • We can checkout any commit like git checkout 3-10-maintenance and just run cd rspec-core; bundle exec rspec as always. This is equivalent to rake git:checkout[3-10-maintenance] and cd repos/rspec-core; bundle exec rspec in rspec-dev.
  • All the commit history is available. Of course we can git bisect to find the commit that introduced a bug.

How it works

Though I think the monorepo generated by RepositoryMerger is basically intuitive and easy to handle, below is a brief explanation of the behavior for review from the core team members.

First we need to choose target branches, which we consider important as history and want to import into the monorepo. In other words, commits unreachable from the target branches in the original repos (e.g. stale pull requests) won't be imported. In our case, the target branches should be main and the maintenance branches (2-2-maintenance ... 3-10-maintenance).

Then we can import commit history for each branch one by one in the following way:

configuration = RepositoryMerger::Configuration.new(
  original_repo_paths: %w[rspec rspec-core rspec-expectations rspec-mocks rspec-support],
  monorepo_path: 'monorepo'
)

repo_merger = RepositoryMerger.new(configuration)

repo_merger.merge_commit_history_of_branches_named('main')
repo_merger.merge_commit_history_of_branches_named('2-2-maintenance')
repo_merger.merge_commit_history_of_branches_named('2-3-maintenance')
# ...
repo_merger.merge_commit_history_of_branches_named('3-10-maintenance')

The imported files are placed under subdirectories for each original repo in the same manner as rspec-dev.

Importing a single set of branches

For simplicity, let's consider a case of importing only a single set of branches from two repos: creating main branch in the monorepo by importing the two main branches from rspec-core and rspec-expectations:

repo_merger.merge_commit_history_of_branches_named('main')

RepositoryMerger imports all the commits from the original branches with the following rules to preserve the original commit graph structure as much as possible:

  • Mainline commits (commits traceable from the target branch head, by following only first parents) are mixed in commit time order.
  • Always import parent commits before children, even if children have earlier commit time than the parents.
  • Non-mainline commits are branched from the same commits and merged into the same ones as the original.

Importing single set of branches

Importing multiple sets of branches

Now let's consider a case of importing two sets of branches from two repos: creating main and 3-0-maintenance branches in the monorepo by importing the two sets of branches from rspec-core and rspec-expectations:

repo_merger.merge_commit_history_of_branches_named('main')
repo_merger.merge_commit_history_of_branches_named('3-0-maintenance')

The first import phase for main does exactly the same processing as the case above. In the second phase for 3-0-maintenance, it reuses already imported commits if possible instead of creating new ones so that main and 3-0-maintenance branches in the monorepo will share some of the commits until they diverge:

Importing multiple sets of branches

This result comes as no surprise.

Importing multiple sets of branches with complex commit graph

However, it may produce a surprising result if original repos have complex commit graph structures. For example, let's consider a case where the main branch in rspec-core has been merged into the 3-8-maintenance through a bug fix branch at a point after they diverged:

Importing multiple sets of branches with complex commit graph

In this case, RepositoryMerger intentionally creates duplicate commits, instead of reusing commits that would cause commit contamination. As a result, the commit graph seems reassembled. At first glance this may look strange, but if you look at each branch this is reasonable to avoid commit contamination between target branches.

Commit message conversion

I've implemented the following commit message conversions for convenience and clarity:

  • Prepend scope of the commit: e.g. Fix a bug in rspec-core -> [core] Fix a bug
  • Convert repo-local GitHub issue references to absolute ones: Closes #123 in rspec-core -> Closes rspec/rspec-core#123
  • Reference to original commit: add footer This commit was imported from https://github.com/rspec/REPO/commit/SHA1. to every commit.

Examples:

Tag name conversion

Tags are suffixed with their original component name: v3.10.0 in rspec-core is imported as v3.10.0-core.

Remaining issue: some old tags point orphan commits

Through trial and error in the development of RepositoryMerger, I've found a strange fact that some old tags in our current repos point orphan commits that have different root commits from the main branches.

These are the tags with orphan commits:

  • rspec-core:
    • v2.11.2
    • v2.11.3
  • rspec-mocks:
    • v2.2.1
    • v2.3.1
    • v2.5.1
    • v2.5.2
    • v2.6.1
    • v2.6.2.rc
    • v2.6.2
    • v2.6.3.beta1
    • v2.6.3
    • v2.6.4
    • v2.7.1

Visualized commit graphs generated with the following command (note that you cannot distinguish the orphan commits with --oneline format):

git log --graph --format='%ci %C(yellow)%h%Creset%Cgreen%C(bold)%d%Creset %s%n' --simplify-by-decoration '--remotes=*-maintenance' origin/main --tags

rspec-core:
orphan tags in rspec-core

rspec-mocks:
orphan tags in rspec-mocks

I'm not sure how we should handle these tags. Technically importing them into monorepo is possible but it should cause confusion since the imported branch will also be an orphan one and contain a lot of duplicate commits.

@myronmarston Do you know how these tags have been created?


@rspec/rspec What do you think?

@myronmarston
Copy link
Member

@yujinakayama: amazing work! There's a ton of nuance and complexity here that I didn't realize.

On a side note, I've gotten a bunch of emails from github during this process due to imported commits that @mention me:

image

I was wondering what was causing this; now I know :). In theory it would be nice to prevent any additional test runs that you do from sending those emails (I imagine I'm not the only one getting them...). One idea would be to change @mention to @-mention in commit messages in any test runs (while keeping a proper @mention in the final import run).

There's one key invariant that seems really important to verify: for each imported branch, the file system contents of a gem subdirectory must match the file system contents of the same branch in the original repository. For example, on the 3-10-maintenance branch of the rspec-core repository, the file system contents should match the contents of the rspec-core directory of the 3-10-maintenance branch. I'm not sure of the git command but I believe there's a way to get the SHA hash of the git tree object that represents the file system contents of a directory--what do you think about adding a check to your script that the tree hash is the same for a gem directory on the imported repo as it was in the source repo?

I've implemented the following commit message conversions for convenience and clarity:

These all sound good, but I think there's one more that would be helpful: when a commit SHA is mentioned in a commit message (e.g. Fixes regression introduce in 2114c9b25ffd1d01434c18c131ab9015ad3dcf18), it would be nice if the SHA was updated to the SHA of the corresponding imported commit in the imported repo. Is that doable? I suspect SHA references would show up in two forms (just the SHA, and a URL to the SHA on github).

@myronmarston Do you know how these tags have been created?

Not off the top of my head. I have no memory of it. Let me dig into it and get back to you.

Also, one more thing to consider: what github repo are we going to use for this? In theory rspec/rspec would be ideal but it's a repository that already has a history, and that may not play well with the import process. Then again maybe there's a way to at least preserve its master/main branch history in a named branch (old-rspec-main or something) while importing into other branches that don't treat any of the existing commits in that repo as parents. Or maybe it's best to do it in a new repository.

Thoughts?

@myronmarston
Copy link
Member

@myronmarston Do you know how these tags have been created?

I spent some time looking into this and it's super weird. I have no specific memories of how that happened. I took a look at rspec-core v2.11.3 (and v2.11.2) and they have a 100% alternate history, all the way back to the first commit, so that they actually have no shared commits with the main branch. I was thinking maybe someone rebased and force pushed but I don't see how that could even cause this state. Maybe it's possible it got corrupted some how? Although with git's hash-based model I don't really see how that could happen either.

In theory, there might be a way to "repair" it. If you check out the v2.11.2 and v2.11.3 tag I can see there's a commit in the history that claims to be the 2.11.1 release, so you could perhaps branch off of the v2.11.1 tag, and cherry-pick each commit after the alternate 2.11.1 release commit up through the v2.11.3 tag on the new branch. If the cherry-picks applied cleanly (a big if...) you'd have a "clean" 2-11 branch that you could use during the import process.

....but with all that said, I don't think it's worth the effort to try to fix these orphaned branches. I can't imagine anyone is going to use the old 2.x maintenance branches for anything ever again. And it's worth remembering that the import process is not destructive: the original rspec-core, rspec-mocks, etc repositories will be left in place as-is, right? These odd orphaned branches will remain available in those repositories.

@yujinakayama
Copy link
Member

yujinakayama commented Oct 10, 2021

@myronmarston Thanks for the feedback!

One idea would be to change @mention to @-mention in commit messages in any test runs (while keeping a proper @mention in the final import run).

Oh, sorry. I didn't imagine that the mentions were included in the commit messages. I'll modify them as you suggested next time I rebuild the monorepo.

There's one key invariant that seems really important to verify: for each imported branch, the file system contents of a gem subdirectory must match the file system contents of the same branch in the original repository.

Yeah, it's tested in this spec by comparing lists of files with digests. I didn't think of your approach with git tree objects, though, I'll add spec examples with that approach for more confidence. I just confirmed that it worked by hand 👍🏻:

$ cd original_repos/rspec-core
$ git cat-file -p '3-10-maintenance' | grep tree
tree 20b8a7de9ca320fec376e559c30e5548655753a2
$ cd ../rspec-expectations
$ git cat-file -p '3-10-maintenance' | grep tree
tree 89fac6ca71bccfef0bea2f5ff0bd650376e17d81
$ cd ../rspec-mocks
$ git cat-file -p '3-10-maintenance' | grep tree
tree f49bb412ec577ec313cfae97e157d8a900d01b8e
$ cd ../rspec-support
$ git cat-file -p '3-10-maintenance' | grep tree
tree a351b573892f9328c78a96de9479d75058060d53
$ cd rspec
$ git cat-file -p '3-10-maintenance' | grep tree
tree 1c5701ea1b875c9b76ac2c05af76abded8f904f4
$ cd ../../monorepo
$ git cat-file -p '3-10-maintenance^{tree}'
040000 tree 20b8a7de9ca320fec376e559c30e5548655753a2	rspec-core
040000 tree 89fac6ca71bccfef0bea2f5ff0bd650376e17d81	rspec-expectations
040000 tree f49bb412ec577ec313cfae97e157d8a900d01b8e	rspec-mocks
040000 tree a351b573892f9328c78a96de9479d75058060d53	rspec-support
040000 tree 1c5701ea1b875c9b76ac2c05af76abded8f904f4	rspec

when a commit SHA is mentioned in a commit message (e.g. Fixes regression introduce in 2114c9b25ffd1d01434c18c131ab9015ad3dcf18), it would be nice if the SHA was updated to the SHA of the corresponding imported commit in the imported repo. Is that doable? I suspect SHA references would show up in two forms (just the SHA, and a URL to the SHA on github).

It's possible, but I noticed a concern as I extracted commit references in the URL format, there're some references that point commit comments on GitHub. If we replace them with URLs pointing the imported commits, we cannot see the comments unless follow the link This commit was imported from ....

I think we have two options for this:

  • A. Convert all the references in any format to point to the new commits, giving up the inconvenience of commit comments.
  • B. Convert only repo-local references (only SHA) to user/repo@SHA format that point old commits (i.e. we don't touch SHA).

Also, one more thing to consider: what github repo are we going to use for this? In theory rspec/rspec would be ideal but it's a repository that already has a history, and that may not play well with the import process. Then again maybe there's a way to at least preserve its master/main branch history in a named branch (old-rspec-main or something) while importing into other branches that don't treat any of the existing commits in that repo as parents. Or maybe it's best to do it in a new repository.

I'd prefer the new repository approach since keeping the merged contents and the old metagem contents together should confuse the future maintainers, like our current situation about the orphaned branch.

Currently I'm thinking the following plan:

  1. Rename the current rspec repository as rspec-metagem or something (without renaming the gem itself). This should be done a few months before the actual migration (maybe now). This is a relatively safe operation since GitHub automatically redirects accesses until we create a new repository named rspec.
  2. Announce the plan for the repository migration.
  3. Create the monorepo as rspec-monorepo as a preparation.
  4. When everything is ready, rename the monorepo as rspec.

In theory, there might be a way to "repair" it. If you check out the v2.11.2 and v2.11.3 tag I can see there's a commit in the history that claims to be the 2.11.1 release, so you could perhaps branch off of the v2.11.1 tag, and cherry-pick each commit after the alternate 2.11.1 release commit up through the v2.11.3 tag on the new branch. If the cherry-picks applied cleanly (a big if...) you'd have a "clean" 2-11 branch that you could use during the import process.

....but with all that said, I don't think it's worth the effort to try to fix these orphaned branches. I can't imagine anyone is going to use the old 2.x maintenance branches for anything ever again. And it's worth remembering that the import process is not destructive: the original rspec-core, rspec-mocks, etc repositories will be left in place as-is, right? These odd orphaned branches will remain available in those repositories.

I was thinking exactly the same thing. Repairing them might be possible (I think another issue should arise in the process, though 🙄), but it's a sort of "history rewrite" which is something I'm reluctant to do if possible.

@yujinakayama
Copy link
Member

yujinakayama commented Oct 12, 2021

By the way, there're some missing maintenance branches where no patch version has been released for the minor version. Since they are needed to merge branches properly (I've been using my forked repos in the deveploment), I'll run the following commands and push the branches:

cd rspec
git branch 2-2-maintenance v2.2.0
git branch 2-3-maintenance v2.3.0
git branch 2-5-maintenance v2.5.0
git branch 2-6-maintenance v2.6.0
git branch 2-7-maintenance v2.7.0
git branch 2-9-maintenance v2.9.0
git branch 2-10-maintenance v2.10.0
git branch 2-11-maintenance v2.11.0
git branch 2-13-maintenance v2.13.0

cd rspec-core
# v2.13.1 is included in the main branch
git branch 2-13-maintenance v2.13.1

cd rspec-expectations
git branch 2-2-maintenance v2.2.0
git branch 2-3-maintenance v2.3.0
git branch 2-5-maintenance v2.5.0
git branch 2-6-maintenance v2.6.0
git branch 2-10-maintenance v2.10.0
git branch 2-13-maintenance v2.13.0

cd rspec-mocks
git branch 2-2-maintenance v2.2.0
git branch 2-3-maintenance v2.3.0
git branch 2-5-maintenance v2.5.0
git branch 2-6-maintenance v2.6.0
git branch 2-7-maintenance v2.7.0
git branch 2-9-maintenance v2.9.0
git branch 2-10-maintenance v2.10.1

@yujinakayama
Copy link
Member

yujinakayama commented Oct 14, 2021

For the issue of the orphaned branches, I found new facts, which are good news:

  • The orphaned branch for v2.11.2 and v2.11.3 in rspec-core is actually a partial duplication of rspec-mocks repo.
  • The orphaned branch for v2.2.1 ... v2.7.1 in rspec-mocks is actually a partial duplication of rspec-core repo.
Orphaned branch in rspec-core
$ cd rspec-core
$ git checkout v2.11.3
$ ls *.gemspec
rspec-mocks.gemspec
$ grep \\.name rspec-mocks.gemspec
  s.name        = "rspec-mocks"
$ cd ../rspec-mocks
$ git show --quiet v2.11.3
tag v2.11.3
Tagger: Myron Marston <myron.marston@gmail.com>
Date:   Wed Sep 19 20:44:40 2012 -0700

Version 2.11.3

commit d0bcce439a6bf9b3245db3553ad5e82233f96634 (HEAD, tag: v2.11.3)
Author: Myron Marston <myron.marston@gmail.com>
Date:   Wed Sep 19 20:42:17 2012 -0700

    2.11.3 release.
$ cd ../rspec-mocks
$ git show --quiet  v2.11.3
tag v2.11.3
Tagger: Myron Marston <myron.marston@gmail.com>
Date:   Wed Sep 19 20:44:40 2012 -0700

Version 2.11.3

commit d0bcce439a6bf9b3245db3553ad5e82233f96634 (tag: v2.11.3, origin/2-11-maintenance, 2-11-maintenance)
Author: Myron Marston <myron.marston@gmail.com>
Date:   Wed Sep 19 20:42:17 2012 -0700

    2.11.3 release.
Orphaned branch in rspec-mocks
$ cd rspec-mocks
$ git checkout v2.7.1
$ ls *.gemspec
rspec-core.gemspec
$ grep name rspec-core.gemspec
$ grep \\.name rspec-core.gemspec
  s.name        = "rspec-core"
$ git show --quiet v2.7.1
tag v2.7.1
Tagger: David Chelimsky <dchelimsky@gmail.com>
Date:   Thu Oct 20 07:02:57 2011 -0500

Version 2.7.1

commit d63dc5d563b90e6344b7139a614dc6eab3a50eb4 (HEAD, tag: v2.7.1)
Author: David Chelimsky <dchelimsky@gmail.com>
Date:   Thu Oct 20 07:01:24 2011 -0500

    bump to 2.7.1
$ cd ../rspec-core
$ git show --quiet v2.7.1
tag v2.7.1
Tagger: David Chelimsky <dchelimsky@gmail.com>
Date:   Thu Oct 20 07:02:57 2011 -0500

Version 2.7.1

commit d63dc5d563b90e6344b7139a614dc6eab3a50eb4 (tag: v2.7.1)
Author: David Chelimsky <dchelimsky@gmail.com>
Date:   Thu Oct 20 07:01:24 2011 -0500

    bump to 2.7.1

We can say that they have exactly the same history and contents because they have the same commit sha.

Also, versions with the problematic tags haven't been released as gems:

Tag versions vs. released gem versions in rspec-core
$ cd rspec-core
$ git tag | perl -pe 's/^v//' | sort > tag_versions.txt
$ curl https://rubygems.org/api/v1/versions/rspec-core.json | jq --raw-output '.[].number' | sort > gem_versions.txt
$ diff -u gem_versions.txt tag_versions.txt
--- gem_versions.txt    2021-10-14 18:53:09.000000000 +0900
+++ tag_versions.txt    2021-10-14 18:53:00.000000000 +0900
@@ -1,3 +1,4 @@
+0.0.0
 2.0.0
 2.0.0.a1
 2.0.0.a10
@@ -22,6 +23,7 @@
 2.0.0.beta.19
 2.0.0.beta.2
 2.0.0.beta.20
+2.0.0.beta.21
 2.0.0.beta.22
 2.0.0.beta.3
 2.0.0.beta.4
@@ -37,6 +39,8 @@
 2.10.1
 2.11.0
 2.11.1
+2.11.2
+2.11.3
 2.12.0
 2.12.1
 2.12.2
@@ -61,9 +65,14 @@
 2.5.1
 2.5.2
 2.6.0
+2.6.0.rc1
 2.6.0.rc2
+2.6.0.rc3
 2.6.0.rc4
+2.6.0.rc5
 2.6.0.rc6
+2.6.1
+2.6.2
 2.6.2.rc
 2.6.3
 2.6.3.beta1
@@ -75,6 +84,7 @@
 2.8.0.rc1
 2.8.0.rc2
 2.9.0
+2.9.0.rc1
 2.9.0.rc2
 2.99.0
 2.99.0.beta1
Tag versions vs. released gem versions in rspec-mocks
$ cd rspec-mocks
$ git tag | perl -pe 's/^v//' | sort > tag_versions.txt
$ curl https://rubygems.org/api/v1/versions/rspec-mocks.json | jq --raw-output '.[].number' | sort > gem_versions.txt
 $ diff -u gem_versions.txt tag_versions.txt
--- gem_versions.txt    2021-10-14 18:54:13.000000000 +0900
+++ tag_versions.txt    2021-10-14 18:54:03.000000000 +0900
@@ -1,3 +1,4 @@
+0.0.0
 2.0.0
 2.0.0.a1
 2.0.0.a10
@@ -22,6 +23,7 @@
 2.0.0.beta.19
 2.0.0.beta.2
 2.0.0.beta.20
+2.0.0.beta.21
 2.0.0.beta.22
 2.0.0.beta.3
 2.0.0.beta.4
@@ -53,19 +55,34 @@
 2.14.5
 2.14.6
 2.2.0
+2.2.1
 2.3.0
+2.3.1
 2.4.0
 2.5.0
+2.5.1
+2.5.2
 2.6.0
+2.6.0.rc1
 2.6.0.rc2
+2.6.0.rc3
 2.6.0.rc4
+2.6.0.rc5
 2.6.0.rc6
+2.6.1
+2.6.2
+2.6.2.rc
+2.6.3
+2.6.3.beta1
+2.6.4
 2.7.0
 2.7.0.rc1
+2.7.1
 2.8.0
 2.8.0.rc1
 2.8.0.rc2
 2.9.0
+2.9.0.rc1
 2.9.0.rc2
 2.99.0
 2.99.0.beta1

Though I still cannot imagine how these branches have accidentally been created, we no longer need to concern about them.

@yujinakayama
Copy link
Member

I tried to transfer https://github.com/yujinakayama/rspec-monorepo-migration to rspec organization but failed since I don't have the permission to create public repos on rspec organization.

@JonRowe Could you give me the permission?

@JonRowe
Copy link
Member

JonRowe commented Oct 26, 2021

@yujinakayama I think you have permission now

@yujinakayama
Copy link
Member

@JonRowe Thanks, it worked.

@JonRowe
Copy link
Member

JonRowe commented Oct 26, 2021

I'm very much in favour of running forward with a monorepo, perhaps as part of the RSpec 4 release process, there will be a lot of massaging PRs and issues running forward and I think we will have to mark the old repos as "archived" with large notices in the read me's.

I'm going to go ahead and start this by renaming the current meta gem repo.

Thank you so much for working on this @yujinakayama

@pirj
Copy link
Member

pirj commented Oct 26, 2021

@yujinakayama Fantastic job, thank you!

there will be a lot of massaging PRs and issues

@JonRowe I have some time to spare in November/December.

@nevinera
Copy link
Contributor

nevinera commented May 8, 2024

Well, this thread is quite old, and I know @JonRowe is actively mono-repoing now, but I wanted to check if you've considered the incremental approach - instead of trying to import everything to one place all at once, you could start by shifting two of the repositories together. It'd reduce the time-frame and friction (and the amount of in-flight juggling), and let you iron out the kinks with a smaller working set. Then you could import the other gems to it one at a time afterward as time allows.

The payoff isn't as immediate, but I'd find it a lot less stressful, were it me :-)

@JonRowe
Copy link
Member

JonRowe commented May 9, 2024

The main blocker at the moment is the build matrix for the monorepo, I've made good progress on unifying the build I just need to find time to finisht that. By using a Github action to run the build its no more complicated for all the gems than it is for 2, so doing it incrementally would increase the amount of work as you'd have to tie into the existing builds

@JonRowe JonRowe closed this as completed Nov 27, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants