Make resolution with correct lock file faster #5321

This came up when profiling no-op builds recently. Although the majority of the time is spent shelling out to `rustc`, we still spend about 50 ms in resolve, even though we have a lock file and it is valid. Perhaps we can make this happy path much faster? Theory says that we can verify a solution to an NP problem in polynomial time, and practice hints that verifying a lock file should probably be linear?

@Eh2406 perhaps you know some ways to make the no-op case specifically faster?

Comments
I don't know of anything in particular off the top of my head, but I am happy to take a look. I'd think that the resolver is linear with a lock file, but I have been wrong before. If you are doing profiling anyway, maybe check which nightly made it regress?
I don't think this is a regression, or at least not a recent one. I get 50 ms in resolve on current stable for https://github.com/matklad/fall/, which has about 100 packages in its lock file. 50 ms seems way too much for a linear algorithm with an input of size 100. I haven't profiled it too closely, though, so maybe we do some costly IO during resolution as well?
I used `CARGO_PROFILE` to profile a no-op build.
For `CARGO_PROFILE`, you might want to check PR #5319, because the current implementation of profiling is buggy and would print some stuff twice. When I do this, I see:
It might be prudent to insert some more fine-grained profiling calls as well, to see the breakdown of those 54 ms.
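A minimal sketch of what such fine-grained timing could look like (this is not Cargo's actual `profile` module; the function and scope names are made up):

```rust
use std::time::Instant;

/// Prints how long a named scope took when the guard is dropped.
struct TimeScope {
    name: &'static str,
    start: Instant,
}

impl TimeScope {
    fn new(name: &'static str) -> TimeScope {
        TimeScope { name, start: Instant::now() }
    }
}

impl Drop for TimeScope {
    fn drop(&mut self) {
        eprintln!("{}: {:.2?}", self.name, self.start.elapsed());
    }
}

fn resolve_with_lock_file() {
    let _total = TimeScope::new("resolve (total)");
    {
        let _t = TimeScope::new("  parse Cargo.lock");
        // ... read and parse the lock file ...
    }
    {
        let _t = TimeScope::new("  registry queries");
        // ... query the index for each dependency ...
    }
}

fn main() {
    resolve_with_lock_file();
}
```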
and I can reproduce:
So now to dig in.
Unfortunately, I don't know how to share flame graphs from WPA in a targeted way, so you'll have to settle for my summary. First, a local build of master is slower than nightly, at ~370 ms. I don't know why; maybe it's the debug symbols I have turned on to get flame graphs, or the profiling. For a random invocation, WPA claims:
That takes us well into parts of the code I know nothing about, but maybe some of those queries could be cached? The repo won't move, nor will the head commit.
Interesting! So this is indeed IO bound. I wonder, however, whether we really need all of this IO. In theory, it should be possible to verify Cargo.lock against Cargo.toml without actually loading any crates, by blindly trusting that Cargo.lock lists correct dependencies. That assumption is, of course, invalid, because path dependencies can change under our feet. But it looks like we can assume that registry dependencies, which are the absolute majority, do not change? @alexcrichton do you think this could be a valid optimization? It is, of course, much less important than waiting for `rustc`.
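A rough sketch of that fast path, with made-up types rather than Cargo's real data model: trust the dependency edges recorded in Cargo.lock for registry packages, and only re-read manifests for path dependencies, which can change on disk.

```rust
/// Made-up types for illustration; not Cargo's real data model.
enum SourceKind {
    Registry,
    Path(std::path::PathBuf),
}

struct LockedPackage {
    name: String,
    version: String,
    source: SourceKind,
    /// Dependency edges exactly as recorded in Cargo.lock.
    deps: Vec<String>,
}

/// Returns true if the existing lock file can be reused without querying the
/// registry at all: registry packages are trusted blindly, while path
/// dependencies are re-checked against their on-disk manifests.
fn lock_file_is_fresh(
    locked: &[LockedPackage],
    manifest_matches: impl Fn(&std::path::Path, &LockedPackage) -> bool,
) -> bool {
    locked.iter().all(|pkg| match &pkg.source {
        // A published registry version never changes, so the recorded
        // dependency list is assumed to still be accurate.
        SourceKind::Registry => true,
        // A path dependency can change under our feet; re-read its manifest.
        SourceKind::Path(p) => manifest_matches(p, pkg),
    })
}
```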
This is the second time we have noted that something is slow because querying is slow. I added a cache to the conflict path in #5112 so that slow resolutions don't hit this slow path. I suspect we will continue to find things that are slowed down by querying. I am not objecting to a cache-in-the-large approach for the happy path, but my 2c is to find ways to make querying itself faster if possible. (Of course, they are not mutually exclusive.)
@Eh2406's findings definitely make sense to me; that's what I've historically seen as well. Slow registry times are primarily related to tons of crates.io crates, where we need to go back to the git repository to load information and then parse the JSON coming out, both of which are relatively costly.

I'm personally ok making changes here as well, but we need to be quite careful, as determining whether a lockfile matches the previous manifests is certainly non-trivial (and I've gotten it wrong a bunch in the past). I'd prefer to first optimize the actual pieces here if we can.

Historically we actually checked out the index on disk, but we stopped doing that one day to cut down on checkout size, and now it's all stored as a raw git database. While that has clear space savings, I'm not sure it made anything faster (and in fact it may have made things slower!). That's one possible route for optimization, I think. The other possibility is that when we query a registry with a dependency that looks like
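A sketch of that last idea (illustrative shapes only; the index's JSON line format is only approximated here): when a dependency is already pinned by Cargo.lock, narrow the index lookup to the one matching version instead of parsing every published version's JSON line.

```rust
/// Illustrative only; not Cargo's real types.
struct Dependency {
    name: String,
    /// Version pinned by Cargo.lock, if any.
    locked_version: Option<String>,
}

/// Given all JSON lines from the index file for `dep.name` (one line per
/// published version), return only the lines worth parsing in full.
fn candidate_lines<'a>(dep: &Dependency, json_lines: &'a [String]) -> Vec<&'a String> {
    match &dep.locked_version {
        // Locked: the resolver only needs the one exact version, so a cheap
        // substring check can skip full JSON parsing for everything else.
        Some(v) => {
            let needle = format!("\"vers\":\"{}\"", v);
            json_lines
                .iter()
                .filter(|line| line.contains(needle.as_str()))
                .collect()
        }
        // Unlocked: every published version is a candidate.
        None => json_lines.iter().collect(),
    }
}
```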
Now that #5359 is merged, I believe this is the main bottleneck of no-op builds. I was going to close this issue because it doesn't suggest anything too actionable, but I think I have at least an idea!
What if we pull essentially the same trick as we did with rustc caching? I think the main bottleneck with queries is the actual git operations and the JSON parsing. It seems unlikely that we can optimize those much further: for git we use libgit2 and we cache all the relevant handles; for JSON, we could parse only the single line with a specific version, but to do that you need to at least partially parse all of the input anyway. So it might be interesting to cache those things across Cargo invocations. What if we store a per-project cache as a bincode-encoded map from queries to their results?
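A minimal sketch of such a persistent per-project cache (made-up names; assumes the `serde` and `bincode` crates), keyed on the index's git revision so that a registry update invalidates it:

```rust
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
use std::fs;
use std::path::Path;

/// Made-up shape: cached query results, keyed by something like "name req".
#[derive(Serialize, Deserialize, Default)]
struct QueryCache {
    /// Git revision of the registry index this cache was built against.
    index_rev: String,
    summaries: HashMap<String, Vec<String>>,
}

fn load_cache(path: &Path, current_rev: &str) -> QueryCache {
    fs::read(path)
        .ok()
        .and_then(|bytes| bincode::deserialize::<QueryCache>(&bytes).ok())
        // Bust the cache as soon as the index moves to a new revision.
        .filter(|c| c.index_rev == current_rev)
        .unwrap_or_default()
}

fn store_cache(path: &Path, cache: &QueryCache) {
    if let Ok(bytes) = bincode::serialize(cache) {
        let _ = fs::write(path, bytes);
    }
}
```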
@matklad an excellent idea! I've toyed with doing that in the past, but it was never pressing enough to attempt to tackle until now! I think it's possible that we lazily compute a faster-to-access version of the index on the fly whenever Cargo is executing. We could, crate by crate, create a much faster on-disk format for accessing files, maybe even mmap them if we're intrepid! We could then bust the cache as soon as the registry updates (easy to do, we already have a git rev).

One more possible optimization to explore would be to avoid using libgit2 at all for reading JSON files. We initially stopped checking out the index to save disk space, but that may have come at the cost of performance if libgit2 isn't the speediest at loading these files. It may actually be faster to have a checkout of the index which we then read file by file. This wouldn't solve the JSON parse times, but I wanted to mention it as a possible optimization!
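And a sketch of that lazily-built on-disk variant (hypothetical layout and names, not Cargo's implementation): extract each crate's index entry from the bare git database once, then serve later reads from a plain file under a directory named after the index's git revision, so a registry update naturally busts the whole cache.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical layout: one file per crate under a directory named after the
/// index's current git revision.
fn cached_crate_path(cache_dir: &Path, index_rev: &str, krate: &str) -> PathBuf {
    cache_dir.join(index_rev).join(krate)
}

/// Return the raw index entry for `krate`, extracting it from the bare git
/// database only on the first access and reading a plain file afterwards.
fn load_index_entry(
    cache_dir: &Path,
    index_rev: &str,
    krate: &str,
    read_blob_from_git: impl Fn(&str) -> Vec<u8>,
) -> std::io::Result<Vec<u8>> {
    let path = cached_crate_path(cache_dir, index_rev, krate);
    if let Ok(bytes) = fs::read(&path) {
        // Fast path: plain file IO, no libgit2 involved.
        return Ok(bytes);
    }
    // Slow path: pull the blob out of the bare index repository, then cache it.
    let bytes = read_blob_from_git(krate);
    fs::create_dir_all(path.parent().unwrap())?;
    fs::write(&path, &bytes)?;
    Ok(bytes)
}
```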
Er, you actually said most of what I was already gonna say; I should read past the first few sentences... In any case, I agree! I think it's worthwhile to leave this open. I'd love for huge projects to have at most 5 ms of overhead from Cargo per build.
So, I've cooked up a quick and dirty patch to add persistent caching; the code is here: https://github.com/matklad/cargo/tree/cache-more-of-the-things. The result is only marginally faster than the status quo. This means three things:
Here are the flamegraphs:
Hm, are you sure the "with caching" one is optimized well enough and working? 10% of the runtime is dropping the registry index, which I'd expect to be close to 0 for a fresh build, right? The fact that Context::compile takes 25% of the time is also sorta crazy. It does look, though, like there's a lot of TOML parsing going on, which may be slowing things down. Overall it looks like there's just a ton of allocations, which we may be able to work around with more strategic caching/interning (maybe even arenas?). I suspect that TOML may be one of the lowest-hanging fruits to speed up (although not super easy either).
I am sure it is not optimized at all :) Like, it indeed dumps the registry to disk every time, because it doesn't track "dirty" state at all. It looks like it mostly works, though (custom registries are mishandled, but otherwise Cargo with this patch successfully builds stuff). I was just hoping that libgit2 and JSON parsing were orders of magnitude slower than everything else, so that even this simplistic caching would remove lockfile verification from the profile altogether. Sadly, this hasn't happened, so we'll have to hunt down those allocations :-) As another interesting metric, the size of the cache itself, for cargo, is 1.3M, which is kinda huge given that it is bincode-encoded data (though the data is mostly short strings, and I think this would be an especially bad case for bincode?)
This might have regressed after my refactorings of
Definitely! I think carefully implemented caching would do the trick. And we probably want to intern at the storage layer as well, so as not to store 100500 copies of
Overall,
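Not Cargo's code, but a minimal sketch of the kind of storage-level string interning mentioned above, so the in-memory index holds one copy of each repeated string instead of thousands:

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Hands out shared handles so that a string stored thousands of times in the
/// in-memory index (e.g. a ubiquitous dependency name) exists only once.
#[derive(Default)]
struct Interner {
    strings: HashMap<String, Rc<str>>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> Rc<str> {
        if let Some(interned) = self.strings.get(s) {
            return Rc::clone(interned);
        }
        let interned: Rc<str> = Rc::from(s);
        self.strings.insert(s.to_string(), Rc::clone(&interned));
        interned
    }
}

fn main() {
    let mut interner = Interner::default();
    let a = interner.intern("serde");
    let b = interner.intern("serde");
    // Both handles point at the same allocation.
    assert!(Rc::ptr_eq(&a, &b));
}
```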
Yeah, overall I'm also sort of perplexed why Cargo is taking so long. Nothing in Cargo should really be more than O(n^2), and if the dependency graph has 100 crates then on the order of 10,000 allocations isn't really that bad? Weird!
This is a flamegraph for a no-op build of Servo: https://github.com/matklad/cargo/blob/aa181fd28a9418fc5c03dafbe05b4b7a06d88816/servo.svg. The culprit there is
@matklad btw for the "best" flamegraphs I've found that I've needed to:
AKA I think if you get frame pointers everywhere the stack traces look a little nicer, and even more so with the musl target nowadays, which I think has debuginfo as well by default.
Just re-did the benchmarks and master (773e3b6) looks the same. |