High total runtime for `test`, even with very high cache rate #12
At least for the cases I have profiles for, it looks like a very large fraction of the runtime is in: https://github.com/pantsbuild/pants/blob/e45b00d2eb467b599966bca262405a5d74d27bdd/src/python/pants/backend/python/util_rules/local_dists.py#L80-L83 Upon inspection, it looks like those @rules are pulling the entire contents of distributions into memory, and that they're doing so per-consumer (rather than per-dist). And because the work accomplished by those …
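To illustrate the per-consumer vs. per-dist distinction (a generic sketch in plain Python, not Pants's actual @rule API), memoizing the expensive read per-dist means each distribution is loaded at most once, no matter how many consumers request it:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def load_dist_contents(dist_path: str) -> bytes:
    # The expensive step: reads an entire wheel/sdist into memory.
    # Cached per-consumer, this would run once per (consumer, dist) pair;
    # memoized per-dist, it runs at most once per distribution.
    with open(dist_path, "rb") as f:
        return f.read()

def consume_dists(dist_paths: tuple[str, ...]) -> int:
    # Any number of consumers can call this; repeated dists are cache hits.
    return sum(len(load_dist_contents(p)) for p in dist_paths)
```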
pantsbuild/pants#14551 addresses the local dist performance issue.
Cherrypicking to 2.10.x in pantsbuild/pants#14555.
Have also opened pantsbuild/pants#14564, since it looks like in some cases there are no dists present at all.
I did some benchmarking on digest subsetting (see pantsbuild/pants#14569), and noticed the following: …

So it is possible that full subset generation times of a few hundred ms add up across many consumers. But this benchmarking effort incidentally revealed something worse and not directly related to subsetting: superlinear behavior of CreateDigest.

Of course this may be a red herring for this specific issue, since I'm not sure we ever CreateDigest on significant numbers of files outside a test setting. Nonetheless it would be instructive to know why we have quadratic behavior there, as it may mean we have it elsewhere.
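One generic way to confirm that kind of superlinear behavior (a methodology sketch in plain Python, not Pants engine code) is to time the operation at two input sizes and estimate the scaling exponent:

```python
import math
import time

def scaling_exponent(op, n: int) -> float:
    """Estimate k in time ~ n**k by timing op at sizes n and 2n.
    k near 1 means linear; k near 2 means quadratic."""
    def measure(size: int) -> float:
        start = time.perf_counter()
        op(size)
        return time.perf_counter() - start
    return math.log2(measure(2 * n) / measure(n))

def quadratic(n: int) -> None:
    # Deliberately O(n**2): each membership test scans a growing list.
    seen: list[int] = []
    for i in range(n):
        if i not in seen:
            seen.append(i)

print(f"estimated exponent: {scaling_exponent(quadratic, 2_000):.2f}")  # ~2
```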
Thanks a lot!
Yea, I think that …
I've opened pantsbuild/pants#14638 to help address pantsbuild/pants#12662: there appears to be a small amount of overlap between memory usage and performance (less than I suspected when I opened this ticket), so we will likely break out pursuing pantsbuild/pants#12662 into a separate workstream while continuing to work on the performance issue captured here.
The next highest item in the profile was related to the snapshotting of sources: …

After investigating whether it would be useful to implement batching (I don't think so, because inference will still need to capture or isolate per-target source batches, which might mean two different APIs), I've begun pantsbuild/pants#13112, which should significantly reduce IO.
There has been further progress on pantsbuild/pants#13112: see pantsbuild/pants#13112 (comment) for a summary. The full stack of patches should be landable early next week, although some further optimization might be necessary to show the full potential.
Most of the effort for pantsbuild/pants#13112 has now been completed, and preliminary microbenchmarking shows that the particular codepath from #12 (comment) (which represented 33-66% of the total runtime) has been reduced by 56%. So we should see anywhere between a …
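Back-of-envelope on what that implies end-to-end (my arithmetic, not a figure from the thread): if a codepath accounts for a fraction f of total runtime and gets 56% faster, the overall reduction is roughly f × 0.56:

```python
for f in (0.33, 0.66):  # share of total runtime in the hot codepath
    print(f"{f:.0%} share -> ~{f * 0.56:.0%} end-to-end reduction")
# 33% share -> ~18% end-to-end reduction
# 66% share -> ~37% end-to-end reduction
```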
Users were able to test with pantsbuild/pants#13112, and confirmed a 30% speedup, which is great. Source hydration / snapshotting is still the largest part of the profile (the codepath below …).

The next item in the list below source hydration is time spent fetching from the (remote) cache. There is a known performance issue in that area: we currently eagerly fetch all cache outputs before declaring the cache hit successful. That is because we can't know whether we'll need the outputs later, and we don't implement "backtracking" to re-run a process which we thought had hit but which actually missed -- covered by pantsbuild/pants#11331. It's not an issue of bandwidth (the total amount fetched from the cache in these cases is actually only ~2MB): rather, one of latency and bounded concurrency (controlled by …).
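A back-of-envelope model of why latency and bounded concurrency dominate here (all numbers below are assumptions for illustration, not measurements from this issue): requests proceed in waves of at most the concurrency limit, so total fetch time grows with the file count even when the total bytes are tiny.

```python
import math

num_outputs = 3_000   # cache output files to fetch (assumed)
rtt_secs = 0.05       # round-trip latency per request (assumed)
concurrency = 128     # bound on concurrent requests (assumed)

# Latency-bound estimate: ceil(files / concurrency) waves of round trips.
waves = math.ceil(num_outputs / concurrency)
print(f"~{waves * rtt_secs:.2f}s spent purely on round trips")  # ~1.20s
```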
New trace level profiles show that … The hottest path below … is:

```json
[
{
"method": "pants.backend.python.goals.pytest_runner.setup_pytest_for_target",
"count": 398,
"duration_micros": 56748332960
},
{
"method": "pants.engine.internals.graph.transitive_targets",
"count": 398,
"duration_micros": 37749039905
},
{
"method": "pants.engine.internals.graph.transitive_dependency_mapping",
"count": 398,
"duration_micros": 37737757996
},
{
"method": "pants.engine.internals.graph.resolve_targets",
"count": 4585,
"duration_micros": 46571189900
},
{
"method": "pants.engine.internals.graph.resolve_unexpanded_targets",
"count": 5009,
"duration_micros": 46579084763
},
{
"method": "pants.engine.internals.graph.resolve_dependencies",
"count": 4585,
"duration_micros": 46556427119
},
{
"method": "pants.backend.python.dependency_inference.rules.infer_python_dependencies_via_imports",
"count": 3216,
"duration_micros": 44731238539
},
{
"method": "pants.backend.python.dependency_inference.parse_python_imports.parse_python_imports",
"count": 3216,
"duration_micros": 44514117183
},
{
"method": "pants.engine.process.fallible_to_exec_result_or_raise",
"count": 3216,
"duration_micros": 28124661088
},
{
"method": "process",
"count": 3224,
"duration_micros": 23976103909
}
]
```

The dropoff between … Surprisingly, the breakdown of time within `process` was:

```json
[
{
"method": "remote_cache_read_speculation",
"count": 3590,
"duration_micros": 9230203432
},
{
"method": "local_cache_read",
"count": 3626,
"duration_micros": 7594634338
},
{
"method": "local_cache_write",
"count": 3590,
"duration_micros": 7303851431
},
{
"method": "remote_cache_write",
"count": 18,
"duration_micros": 12707169
},
{
"method": "remote_cache_write_setup",
"count": 18,
"duration_micros": 42
}
]
```

Although the local cache read/write are not the hottest part of process execution, they are notable because we execute them first and last, synchronously, before doing remote cache lookups (whereas remote cache lookups occur in parallel with executing the process). So, as a next step, it looks like we should actually try disabling the local cache entirely for this user, since reading/writing to it looks like a bottleneck. If that improves things, we might want to make local cache lookups concurrent with remote cache lookups, and/or skip local writes if we've hit remotely.
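A minimal sketch of the "concurrent lookups" idea (in Python asyncio for illustration only; the real engine code is Rust): start both lookups at once and take the first hit, rather than finishing the synchronous local read before the remote one begins. The stub functions and latencies are assumptions.

```python
import asyncio

async def local_cache_read(key: str) -> bytes | None:
    await asyncio.sleep(0.002)  # stand-in for a local (e.g. LMDB) read
    return None                 # pretend: local miss

async def remote_cache_read(key: str) -> bytes | None:
    await asyncio.sleep(0.050)  # stand-in for a network round trip
    return b"cached-result"     # pretend: remote hit

async def lookup(key: str) -> bytes | None:
    # Race the local and remote lookups instead of running them serially.
    tasks = [asyncio.create_task(local_cache_read(key)),
             asyncio.create_task(remote_cache_read(key))]
    for fut in asyncio.as_completed(tasks):
        result = await fut
        if result is not None:
            for t in tasks:
                t.cancel()  # first hit wins; abandon the slower lookup
            await asyncio.gather(*tasks, return_exceptions=True)
            return result
    return None  # missed everywhere: fall through to running the process

print(asyncio.run(lookup("process-digest")))
```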
We've made significant progress on the performance of this case (from 10-12 minutes to 3-4 minutes), so although it is definitely not as low as we'd like it to be, I'm going to call this one resolved. Once the dust settles on the 2.11.x release, and people are stably running with the new settings (in particular, with PEX-native lockfiles and the fix for pantsbuild/pants#14127 in place), we can revisit. Please continue to open issues for the 2022 Performance Push. Thanks!
For larger numbers of tests, the startup time and overhead of `pants` in some private repositories can be very high. At the same time, memory usage is unexpectedly high (as first reported in pantsbuild/pants#12662). Most likely the memory usage and total runtime are related, so pursuing the metrics-gathering exercise from pantsbuild/pants#12662 would likely be very valuable in also tracking down the performance issue.