-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tracking issue for RFC 1977: public & private dependencies, as it relates to the resolver #6129
Comments
Adding new dependencies seems fine by me! My historical strategy with the resolver has been "don't add any fancy representation tricks, profile Servo, add tricks as necessary" |
I have a branch that stubs out getting this working. It involves all the ugly things I can think of. The state is a mess of big nested data structures. The algorithm is a bunch of nested loops in the hottest part of the resolver. It is not even checking for redundant work. The code is mostly repeated between the code that check if a candidate can be added and where the the candidate is added. Several existing optimizations are completely removed in the case that a pub-dep has ben looked at. The error messages are just "Todo", and most of the commit messages are "oops". But it is working! I modified the passes_validation to check for pub-dep conflicts, then modified the Next try the So my plan for the Impl day at RBR is to attempt to get edit: the cleaned up history is at https://github.com/Eh2406/cargo/tree/pub-dep |
Cc @necaris |
So at RBR I got the history cleaned up. Alex and I reviewed the progress so far and a plan for how to re enable backjumping without losing correctness. That backjumping strategy was implemented today. Unfortunately, the fuzz tests are finding kasese of exponential blowup (50_000 ticks). I suspect it is similar to #5213, in which we backtrack to where On a different note I also ran the other resolver tests, we have several that are hitting the 30 sec timeout on this branch. So we will need to work on per tick performance before this can merge. |
Good news If you run proptest overnight with |
I am (for now) ignoring the fact that the current strategy will fail to resolve when there is a valid solution in rare circumstances. I have cleaned up the case of exponential blow up, so I understand why it is happening. What I don't understand is why It is not happening using normal deps. (it is a lot harder to write, but should be possible.) I am investigating that, hoping that I will either find an optimization I forgot to add to pub/priv deps, or a case that can be reproduced on master. |
I can now confirm that the Unfortunately, a simple solution did not work. I think doing something special for |
I implemented the "obviously correct thing", and the fuzz tests pointed out that it was not backtracking correctly. I assumed I had done something silly, but could not find it. Then I got distracted by #6258. Coming back to it the minimized test was correctly pointing out that the "obviously correct thing" will not work. It is possible to make a pub-dep violation, without changing |
Persistent data structures by im-rs There has been a long standing "TODO: We should switch to persistent hash maps if we can" in the resolver. This PR introduces a dependency on [im-rs](bodil/im-rs#26) to provide persistent hash maps. It then uses `im_rc::OrdSet` to store the priority list of dependencies, instead of `std::BinaryHeap`, as cloning the `Heap` was one of the most expensive things we did. In Big O terms these changes are very big wins, in practice the improvement is small. This is do to a number of factors like, `std::collections` are very fast, N is usually only in the hundreds, we only clone when the borrow checker insists on it, and we use `RC` and other tricks to avoid cloning. I would advocate that we accept this PR, as: - Things are only going to get more complicated in the future. It would be nice to have Big O on our side. - When developing a new feature finding each place to add `RC` should be an afterthought not absolutely required before merge. (cc #6129 witch is stuck on this kind of per tick performance problem as well as other things). - As the code gets more complex, making sure we never break any of the tricks becomes harder. It would be easier to work on this if such mistakes are marginal instead of show stoppers. (cc #5130 a bug report of one of these bing removed and affecting `./x.py build` enough to trigger an investigation). - The resolver is already very complicated with a lot of moving parts to keep in your head at the same time, this will only get worse as we add features. There are a long list of tricks that are *critical* before and `~5%` after, it would be nice to consider if each one is worth the code complexity. (I can list some of the ones I have tried to remove but are still small wins, if that would help the discussion). Overall, I think it is worth doing now, but can see that if is not a slam dunk.
I have a branch where I have implemented both versions simultaneously. Like the "obviously correct thing" it records every PID in the path from the dependency that can see the conflict to both conflicting crates. Like the strategy Alex and I came up with at RBR, it ensures the parents relationship is still active when the Back Jumping occurs. Because it looks at the dependency tree it passes the test the previous strategy failed. Because it will records all the PIDs involved in conflict it can learn like #5213. Unfortunately, because it adds a lot of things to the conflict it is a highly efficient way of creating a test case like #6283. So efficient that, In debug the fuzz test only tell me that it hit the 50k limit, In release the fuzz test only tell me that there are cases that took longer than 90sec to resolve! I have to say, adding new features to the resolver has gotten a lot harder now that we have fuzz tests making sure that they are correct, complete, and efficient. :-P |
Copied from #6653 (comment): So the specific questions are: How does this interact with renamed dependencies?There are lots of juicy corner cases here. On one had being able to depend on 2 different versions of the same library is one of the points of renamed dependencies. Fore example so you can publicly derive This makes my head hurt, is there something simple I am missing? If not, where is the appropriate place to discuss fundamental problems with an accepted RFC? How does this interact with
|
For renaming dependencies I think the main question is that it's the first time you've been able to depend on two versions of the same dependency. The name itself doesn't matter so much (it's only relevant when compiling the local crate), but the fact that you can publicly depend on two versions of the same crate does indeed create a weird situation. My best guess of what to do here is that, like we'd probably always need anyway, an escape hatch would be used to say "please don't consider this for public crate graph resolution" or something like that. For |
part of the infrastructure for public & private dependencies in the resolver This is part of my work on public & private dependencies in the resolver from #6129. As discussed there the proptest fuzzers are happy to find exponential blow up with all the back jumping strategies I have tried. So this PR does not have a back jumping strategie nor does it have the proptest fuzzers generating public dependencies. These will both need to change for the feature to stabilize. In the meantime it gives the correct results on the cases it can handle. With rust-lang/rust#57586 landed there is a lot of work to do on Cargos front end. Adding a UI for this, passing the relevant things to rustc, passing it to crates.io, passing it to cargo-metadata. This is good enough to allow that work to proceed.
So there are 3 states (not the 2 described in the RFC)
The status quo is that everything is |
Yes I feel it would be better to grandfather everything in as unchecked so privacy problems can be errors not warnings by default.
Yes, but I think it's best to think in terms that preserve composition. Upstream creates don't care about downstream creates at all, but the dependency resolution upstream imposes
It depends on what equivalences the Rust code is allowed to witness: i.e. if |
Regarding renaming dependencies: Since the |
At the request of the Cargo team I have reread all of the discussion of the RFC to summarize the state of affairs. Admittedly, unlike @Aaron1011, @Ericson2314, and @alexcrichton, I was not there when the RFC was written. But here goes. The RFC was accepted before But in the meantime the So we need a new syntax that is aware of all three cases, and preferably future compatible with hard cases we may want to add. However it does not need to be the syntax forever as we have editions. The default interpretation of (at least |
syntax proposal, speaking for myself not for the team: Goles: stay consistent but also as close to the RFC as possible. A dep can be in 3 states:
Add a message to the Future extensions:
Alternatives:
|
@Eh2406 I too like explicit publicity lists (you want to do explicit public, not explicit private), but that is way to complicated a feature for Cargo to handle alone. This would take a module system like the ML's, or Haskell's backpack. Let's punt on this for now, and keep in mind it overlaps with rust-lang/rfcs#2492 . |
It's bikesheddy, but I prefer this form of specifying it. |
My 2¢: "visibility" field name is fine, but I think your original "unchecked" name (e.g. visibility="unchecked") is much better than "unset". The latter left me rather confused about how it could be different from not setting (not specifying) this optional field in Cargo.toml deps. After reading this 3 or 4 times, I now think not specifying the field is the same as some logical default for which you reserve the right to change over time ("unchecked" for now, possibly "private", later.) Or am I still wrong about your intent with that? |
You impute to me too much forethought. It often bites when there is no way to set a field to the value that would be used if it was missing. In CSS often The use case I had in mind is something like: [dependencies]
curl = {
version = "0.4.21",
features = ['http2'],
public="unchecked" # We want this to be private but there is some bad interaction with our unrelated dep on http2 so we are intentionally leaving it unset.
# We just had a comment here, but we got 3 PRs suggesting we change it from people that did not read the comment.
} Sorry for the confusion. |
Q: Is the below another possible use case? visibility="unchecked" # This *is* public, but... (workaround for project failure during dep resolution) (And thus I think "visibility" gives you more future room for changes.) Note this also relates to a suggestion in rust-lang/rust#44663 : A rustc warning/lint for if the dep is marked public, but it is in fact not public. |
Yes, definitely. That will also have |
And you could also use "viz" to shorten "visibility". Not just cute, but accurate usage of the abbreviation: https://en.wikipedia.org/wiki/Viz.
prior art: "pub" lang. keyword for public. |
speaking for myself not for the team: syntax proposal 2, thanks to everyone for the feedback. A dep can be in 3 states:
Add a message to the Possible future extensions:
|
How will public/private dependencies interact with crate features? One of my crates may be gaining a feature flag that would conceptually change a private dependency into a public one; is there some way of indicating that? |
@shepmaster I would say you should be able to do that with a optional public and private dependency. publicity overrules privacy due to the principle that no crate should be able to see multiple versions of the same library. |
rust-lang/wg-cargo-std-aware#12 a situation came up here that might require a small tweak to the solver. Say |
@shepmaster from the resolvers perspective this is straightforward and works already. (Without the flag the resolver get just a private dependencies with the flag the resolver get both.) But from a syntactic Cargo.toml perspective, I don't know how to specify that relationship. Add it to the list of things that need to be added to the features system.
From an implementation perspective, there are some structures that explicitly rely on the global property From a conceptual perspective I wonder if this is transitive? For example:
Can we build From a community perspective, we all ready have a vocal portion of our user base concerned with the Overall I think this is a good idea and definitely worth discussing, but a big enough change to deserve a separate RFC. |
In light of #7769 I would suggest the proposed field |
This is a sub part of the discussion of implementation of RFC 1977, specifically how the resolver could implement this protocol. The main Tracking issue is at rust-lang/rust#44663.
This comment rust-lang/rfcs#1977 (comment) described the properties we want to uphold the way that fits best with my brain.
So to use the termanology, I feel like when we are adding a packedge
B
we need to walk up the dependency tree to all potentialC
's that can see the addition to test if there is a conflictingA
.That can be done pretty fast if we have a fast way to walk up dependency tree and a fast way to see all the pub reachable packages for each ancestor. (Neither of which we have at this time.)
But that feels like a lot of state to be cloned for each tick.
Currently we avoid the expensive clones with extensive use of RC.
Is it time to add a im-rs dependency?
Is there a better algorithm?
Is there a small part that would make a good starting PR?
The text was updated successfully, but these errors were encountered: