-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cargo should reference git dependencies by commit hash instead of url #7497
Comments
Thanks for the report! Unfortunately though this isn't a particularly actionable issue as-is and as-written is likely to basically be ignored for the rest of time. It's not really the most productive to off-hand propose large-scale changes to the foundations of Cargo (such as how git sources are handled). I agree there are bugs and improvements to be made with Cargo, but if you're curious to see actual change and see bugs fixed, proposing that Cargo completely rethinks how it handles git repositories is unlikely to do that. (this also ignores context for why Cargo does what it does today!) |
@alexcrichton: I'm aware that my suggestion goes beyond a quick fix. I may also have neglected valid reasons to do it like cargo does it today. However: May I ask for your opinion on the approach suggested from a design point of view rather than the likelihood of it being done? |
Hi @alexcrichton , we are suffering from this for a period of time. Our rust-sgx-sdk has been transferred to the apache organization and renamed twice. In a foreseeable future, there'll be at least two more renaming. It's painful to modify hundreds of forked crates for each of the renaming action. We really need it. A temporary workaround is to simulate the HTTP 302 redirection to help git find the correct repo. The result is that we can have a URL like [net]
git-fetch-with-cli = true @brenzi 's comment is very reasonable
It'll be the best if I can get some comments from you on the following questions: Thank you very much! |
I do not personally have time to prioritize and implement this at this time, but if you're willing to put in the work to design and implement this that'd be great! |
Related: #7670 |
It looks like you might be able to do a quick fix by resolving all the different forms of git revs in EDIT So this solves my case (repo?rev=xxxxxx#xxxxxx == repo#xxxxxx), but not the general case. The way to solve the general case is to reference git dependencies by rev hash rather than url. It also requires some kind of concept of underspecified git dep (so that during dep resolution cargo can merge git deps if possible). Quite a lot of work: I'll think about how it might look. |
Would a possible solution be to wait until feature resolution is complete, then compare all git revisions to see if they share hash (the hash is 20 bytes long and 2 ** 160 is a very very large number). This would be The thing is, I don't think there is any other way to know if two repos share code in general. They might have completely different names, so if you can't use the Oids (hashes) then I'm stumped. |
If we didn't want to go down the route of unifying all gits based on revision hashes, we could do it for some obvious cases, for example we could identify commits based on a (location, rev) pair, where location is some sort of canonicalised repo url, and rev is the hash of the revision/commit. This might be the best safest option. If users are pulling in 2 different repos from different places, can they really expect us to dedup them? |
Note:
is already unified with
(https://github.com/rust-lang/cargo/blob/master/src/cargo/util/canonical_url.rs#L50) |
There are two solutions I see
Both of these require downloading/accessing the git repo earlier I think. I don't think you should try to unify 2 different repositories, only revisions within a single repository. I also think the What are others' thoughts? And if you like the idea can you point me in the direction of how to implement it, because I've been looking at the codebase and I can't figure out how the resolution bit works. |
This is PR 3 of N in this series. For the earlier ones, see: * oxidecomputer/crucible#552 * oxidecomputer/propolis#262 Currently, we specify the progenitor dependency as both: ```toml progenitor-client = { git = "https://github.com/oxidecomputer/progenitor", branch = "main" } ``` And ```toml progenitor-client = { git = "https://github.com/oxidecomputer/progenitor" } ``` Cargo (due to rust-lang/cargo#7497) doesn't dedup these dependencies even though they resolve to the same branch and hash. This causes issues with `cargo doc` colliding with itself, as it tries to document both: ``` Documenting progenitor-impl v0.2.1-dev (https://github.com/oxidecomputer/progenitor#4b5cef4b) Documenting progenitor-impl v0.2.1-dev (https://github.com/oxidecomputer/progenitor?branch=main#4b5cef4b) ``` Fix this by always using the branch name. (We choose this option over not specifying the branch so to maintain uniformity in cases where a non-default branch must be used.) This also requires a crucible update, since that didn't specify a branch either. And the crucible update necessitates a propolis update to get the versions aligned. In the future it would be nice to have a lint which ensures that git dependencies always have the branch name in them.
See also #10256 |
cargo thinks the following dependencies are different although they are exactly the same:
assuming that branch
ccc
HEAD as well as tagddd
points torev = "0123456789abcdef"
, being the HEAD ofmaster
. Moreover,fff/bbb
being a fresh fork of repositorybbb
by userfff
I would suggest that all these references are being treated as equal because they are.
The fact that cargo doesn't know that all these dependencies are equal causes a lot of trouble:
The great thing about git is that a commit hash is a unique fingerprint of a repository. It doesn't matter on which branch, tag or even in which organization the reference is located. The commit hash is the same, so is the code.
actually, just writing
might not be practical, but it would be unique across the entirety of all git repositories in existence. The only problem being to retrieve the corresponding code if you don't know the url, so I'm not suggesting this syntax. I just want to make the point that - behind the scenes - cargo could very well use a git commit hash without any url, branch or tag as an identifier for a specific version of a crate.
The text was updated successfully, but these errors were encountered: