Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Source::{fuzzy_}query{_vec} can say "try again" #8985

Closed
wants to merge 9 commits into from

Conversation

Eh2406
Copy link
Contributor

@Eh2406 Eh2406 commented Dec 16, 2020

This adds Poll to the return type of Source. This is prep work for @jonhoo's #8890 as suggested by @alexcrichton.

There is definitely a lot of polishing work to do before merge, but it is good enough to start a conversation and see what CI has to say.

Some questions as it stands now:

  • CargoResult<Poll<_>> vs Poll<CargoResult<()>> neither is clearly more ergonomic to work with?
  • Is there some use case that makes query pull its weight, or should we only have the query_vec version?
  • What to do when we get a Poll::Pending? Currently we bizzy weight, but that is not grate. What can we do short of async and await.
  • There are a lot of places that want to "just weight till it is ready", should we find a way to have that code in only one or two places? Should we have a query_weight that is defaulted to call query in a loop?
  • Should we move some of the loops up the stack for better parallelism?
  • Should we move some of the loops down the stack to do less redundant work?
  • Can we break some of the caches up, so that there is more sharing between runs? (Yes, but maybe it can be left for later.)
  • There are 2 places where we want expect for Poll, should we start an extension trait to add a method?

@rust-highfive
Copy link

r? @alexcrichton

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 16, 2020
Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks for this!

FWIW to get the full value out of this change I think we will want to remove Source::update. That should no longer be needed at all since sources will naturally report that they need to be updated if, during a query, they discover that blocking work needs to be done. This would involve refactoring the registry, for example, to report that if any unlocked dependency was queried and an update hasn't been done yet that an update needs to be done. The thinking then is that for the HTTP inded source it would be more fine-grained about whether a package is ready or not.

As to some of your questions:

CargoResult<Poll<_>> vs Poll<CargoResult<()>> neither is clearly more ergonomic to work with?

I like CargoResult<Poll<_>> because the usage of ? is still "obviously correct"

Is there some use case that makes query pull its weight, or should we only have the query_vec version?

This was added for performance reasons at some point in the past to avoid allocating lots of intermediate vectors. I'm not sure if this is still the case but it didn't seem like it was too onerous to handle here?

What to do when we get a Poll::Pending? Currently we bizzy weight, but that is not grate. What can we do short of async and await.

I wrote down a few comments here and there, but the main gist of what I'm thinking is that we go back to the source and say "hey you said pending earlier, please resolve that pending status right now in a blocking fashion".

There are a lot of places that want to "just weight till it is ready", should we find a way to have that code in only one or two places? Should we have a query_weight that is defaulted to call query in a loop?

I was thinking the same thing about possibly having a helper method on the trait which does the wait for you. I think it'd be fine to add that, although we would want to use it sparingly within Cargo.

Should we move some of the loops up the stack for better parallelism?

I only found one location (patches) that I think should get moved up, but other than that the loops seemed reasonable to me (to either add blocking or ignore the pending status).

Should we move some of the loops down the stack to do less redundant work?

I didn't find anything along these lines

Can we break some of the caches up, so that there is more sharing between runs? (Yes, but maybe it can be left for later.)

Yeah I was initially hoping we could preserve the entire resolver and benefit from lots of prepopulated non-pending caches, but that may not end up being the case. I don't think this is a huge worry though given the speed of the resolver and the typical time it takes to fetch something from the network.

There are 2 places where we want expect for Poll, should we start an extension trait to add a method?

Yeah seems fine to me!

})
.collect::<Vec<_>>();
} else {
// TODO: dont hot loop for it to be Ready
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I imagine this bottoms out in something like self.wait_for_sources_to_be_ready(). Each source would already be registered in some internal map of PackageRegistry if we're blocked on it, and then we'd ask each source, in sequence, "do your blocking thing now".

source.query(dep, callback)?;
source.query(dep, callback)?
};
if pend.is_pending() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where I'd imagine that a record of this dependency's source id is recorded in an internal map for us to later iterate over and ask the source to block on things.

src/cargo/core/registry.rs Outdated Show resolved Hide resolved
@@ -32,7 +33,7 @@ pub struct RegistryQueryer<'a> {
/// specify minimum dependency versions to be used.
minimal_versions: bool,
/// a cache of `Candidate`s that fulfil a `Dependency`
registry_cache: HashMap<Dependency, Rc<Vec<Summary>>>,
registry_cache: HashMap<Dependency, Poll<Rc<Vec<Summary>>>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was a bit surprised by this where it's caching not ready signals. I was hoping that we could reuse the resolver across iterations perhaps and benefit from mostly prepopulated caches?

src/cargo/core/resolver/dep_cache.rs Outdated Show resolved Hide resolved
src/cargo/core/resolver/errors.rs Outdated Show resolved Hide resolved
if registry.all_ready() {
break (registry, cx);
} else {
// TODO: dont hot loop for it to be Ready
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I'm imagining a new method on Registry which is something like "block on things" which is called here.

break deps;
}
Poll::Pending => {
// TODO: dont hot loop for it to be Ready
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forget the context in which this function is called, but this could call the hypothetical new Source::block_on_stuff method

@jonhoo
Copy link
Contributor

jonhoo commented Dec 17, 2020

I don't have too much to add to this except that RegistryData::load must also return Poll to enable #8890. I also agree with Alex that we probably want busy waiting to be replaced with a call to a "wait for ready" method on Source.

@alexcrichton
Copy link
Member

@jonhoo yeah I'm imagining that after this PR we'll want to refactor the internal index implementation of the registry like you mention. We'll want to tweak the current implementations as well to match the new interface, and the http index should then fit quite cleanly into the implementation.

@Eh2406
Copy link
Contributor Author

Eh2406 commented Dec 18, 2020

I just added a commit to push the Poll all the way to RegistryData::load. It added more ugly loops. I wanted to get all of the ugly on the table before starting to take your suggestions for how to clean them up.

.summaries(pkg.name(), &req, load)?
.any(|summary| summary.yanked);
Ok(found)
loop {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this just also be modified to return Poll<bool>?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One level is straightforward and worth it. Continuing up the stack Source::is_yanked; I am not seeing where we would want to call a lot of it in parallel nor where it ends in an existing loop. Seams like it is called where there is not an opportunity for parallelism (install code) or where we have already updated that pkg (writing a lockfile). What have I missed?

@jonhoo
Copy link
Contributor

jonhoo commented Dec 18, 2020

I think my general thinking here is that we should just continue to propagate Poll up the stack and deal with it "near the top"

break;
}
Poll::Pending => {
// TODO: dont hot loop for it to be Ready
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will want the ability to return up Poll

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to do the work, I am just not seeing where up the stack we would want to call this in parallel with other work? What have I missed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah that's true, although I think we'll want special handling for this since I don't think we want to add a round-trip time to get this file all the time in Cargo. We'd presumably want to cache this file separately for a fixed length of time and while in that window of time Cargo doesn't re-fetch the file.

@bors
Copy link
Collaborator

bors commented Feb 3, 2021

☔ The latest upstream changes (presumably #9125) made this pull request unmergeable. Please resolve the merge conflicts.

@bors
Copy link
Collaborator

bors commented Feb 10, 2021

☔ The latest upstream changes (presumably #9133) made this pull request unmergeable. Please resolve the merge conflicts.

@samanpa
Copy link

samanpa commented Aug 11, 2021

Any updates on this? I keep hoping that this will be unblocked so #8890 will become a reality.

@Eh2406
Copy link
Contributor Author

Eh2406 commented Aug 12, 2021

That is my hope as well. My plan was to get back to this, but I have been spending my time understanding the design space around the Cargo Auth RFC. When that is done, I will get back to this. (Best laid plans and all, but I will try) The Cargo team may close this PR as stale in the meantime, but it will still be "next" on my todo list.

@samanpa
Copy link

samanpa commented Aug 12, 2021

That is my hope as well. My plan was to get back to this, but I have been spending my time understanding the design space around the Cargo Auth RFC. When that is done, I will get back to this. (Best laid plans and all, but I will try) The Cargo team may close this PR as stale in the meantime, but it will still be "next" on my todo list.

Thank you.

@alexcrichton alexcrichton removed their assignment Mar 8, 2022
bors added a commit that referenced this pull request Mar 9, 2022
Registry functions return Poll to enable parallel fetching of index data

Adds `Poll` as a return type for several registry functions to enable parallel fetching of crate metadata with a future http-based registry.

Work is scheduled by calling the `query` and related functions, then waited on with `block_until_ready`.

This PR is based on the draft PR started by eh2406 here [#8985](#8985).

r? `@Eh2406`
cc `@alexcrichton`
cc `@jonhoo`
@arlosi
Copy link
Contributor

arlosi commented Mar 9, 2022

This can be closed since it landed via #10064!

@Eh2406 Eh2406 closed this Mar 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants