Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache submodules between different git checkouts #10279

Closed
wants to merge 1 commit into from

Conversation

jyn514
Copy link
Member

@jyn514 jyn514 commented Jan 11, 2022

What does this PR try to resolve?

Rather than letting git manage each submodule checkout individually, this uses a shared bare submodule directory in target/git/checkouts/submodules and makes a checkout of the shared directory in the relative path.

This caches based on the URL, but base64-encodes the URL to avoid errors about invalid FS paths. I'm hesitant to cache based on the relative path in the git checkout because it could use a different remote between commits of the parent.

Fixes #7987.

How should we test and review this PR?

I tested like so:

$ cargo init
$ echo 'boring = { git = "https://github.com/cloudflare/boring", branch = "master" }' >> Cargo.toml
$ /home/jnelson/rust-lang/cargo/target/debug/cargo update -p boring --precise 46787b7b6909cadf81cf3a8cd9dc351c9efdfdbd
$ /home/jnelson/rust-lang/cargo/target/debug/cargo update -p boring --precise c037a438f8d7b91533524570237afcfeffffe496

and confirmed that the time to do the second update is negligible. I'm not sure how to test this automatically using the cargo_test framework since cargo still prints "Updating git submodule" on the second checkout (because it has to create the symlink).

@rust-highfive
Copy link

r? @alexcrichton

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 11, 2022
@jyn514
Copy link
Member Author

jyn514 commented Jan 11, 2022

cc @ehuss

@jyn514 jyn514 force-pushed the cached-submodules branch 2 times, most recently from f054fc0 to 1e09e34 Compare January 11, 2022 02:21
@ehuss
Copy link
Contributor

ehuss commented Jan 11, 2022

I haven't had a chance to review this, but I did want to mention a few things before you spend too much time on them.

I don't think we can rely on symlinks on windows. Older versions do not support them.

I don't think links are necessary, though. I would expect this to use the db directory to store a bare clone of the submodule, and to then just check out the specific revision inside the parent project (git/checkouts/foo-abcdef/ghdefk/submodule-path).

@jyn514 jyn514 force-pushed the cached-submodules branch from 2f93c42 to 2a2a94a Compare January 13, 2022 00:28
@jyn514
Copy link
Member Author

jyn514 commented Jan 13, 2022

I don't think links are necessary, though. I would expect this to use the db directory to store a bare clone of the submodule, and to then just check out the specific revision inside the parent project (git/checkouts/foo-abcdef/ghdefk/submodule-path).

Ok, done :)

@alexcrichton
Copy link
Member

Reading over this I don't think that this is the best approach to solve this problem. Naively the "fix" I would expect for this issue is what @ehuss described. Checkouts of submodules would perform basically the exact same process as the rest of git checkouts. There's a shared db for each submodule url, and within that db we make resolve the submodule git commit or otherwise fetch contents to try to resolve it. Afterwards we'd then perform a checkout from the bare db repository into the checkout of the outer git checkout. Most of this probably wouldn't actually use libgit2's submodule support other than simply iterating, otherwise we'd be managing submodules in a custom fashion.

@jyn514
Copy link
Member Author

jyn514 commented Jan 13, 2022

@alexcrichton Sorry, I forgot to update the PR description - this now uses @ehuss's suggestion of bare checkouts.

@alexcrichton
Copy link
Member

While that may be the case, I had other suggestions in my previous comment I don't think should be ignored simply because bare checkouts are used somewhere now.

@bors
Copy link
Contributor

bors commented Jan 18, 2022

☔ The latest upstream changes (presumably #10296) made this pull request unmergeable. Please resolve the merge conflicts.

@jyn514
Copy link
Member Author

jyn514 commented Feb 15, 2022

There's a shared db for each submodule url, and within that db we make resolve the submodule git commit or otherwise fetch contents to try to resolve it. Afterwards we'd then perform a checkout from the bare db repository into the checkout of the outer git checkout.

Sorry, I'm not sure I understand what I haven't done from that list.

Is there something I missed?

@jyn514 jyn514 force-pushed the cached-submodules branch 2 times, most recently from c533f07 to e4eaa3b Compare February 15, 2022 03:58
This base64-encodes the URLs to avoid errors like the following:
```
error: failed to get `dep1` as a dependency of package `foo v0.5.0 (D:/a/cargo/cargo/target/tmp/cit/t1035/foo)`

Caused by:
  failed to load source for dependency `dep1`

Caused by:
  Unable to update file:///D:/a/cargo/cargo/target/tmp/cit/t1035/dep1

Caused by:
  failed to update submodule `src`

Caused by:
  failed to make directory 'D:/a/cargo/cargo/target/tmp/cit/t1035/home/.cargo/git/checkouts/submodules/file:': The filename, directory name, or volume label syntax is incorrect.
  ; class=Os (2)
', tests\testsuite\git.rs:2515:10
```

It uses bare checkouts instead of symbolic links to avoid permission errors on Windows.
@alexcrichton
Copy link
Member

Cargo has existing infrastructure for a global database of git repos and a global database of git checkouts. You've bypassed all that infrastructure and invented your own scheme. My comment is that you should be using what Cargo already has instead of inventing something new.

@alexcrichton
Copy link
Member

Thanks for the PR, but I'm going to be stepping down from the Cargo team so I'm going to un-assign myself from this. The Cargo team will help review this when they get a chance.

@alexcrichton alexcrichton removed their assignment Mar 8, 2022
@jyn514
Copy link
Member Author

jyn514 commented May 4, 2022

I am not quite sure how to proceed here; I'll open a new PR if I figure it out.

@jyn514 jyn514 closed this May 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

git submodules are not cached
5 participants