Collect CPU utilization statistics of CI builders #48828
Comments
On Windows this can be done by taking advantage of job objects. If the entire build is wrapped in a job object then we can call
I made a script that prints the first four lines of `top` output every 30 seconds:
# launch in travis as 'pathto/script.sh &'
while sleep 30
do
    top -ibn 1 | head -n4 | tr "\n" " " | tee -a /tmp/top.log
    echo "" | tee -a /tmp/top.log
done
Some findings:
Cloning the submodules jemalloc, libcompiler_builtins and liblibc alone takes 30 seconds.
While building bootstrap, compiling the serde_derive, serde_json and bootstrap crates seems to take 30 seconds (total build time: 47 seconds).
stage0:
stage0 codegen artifacts:
During stage1, the rustc_errors and syntax_ext builds are approximately as slow as during stage0; rustc_plugins takes 2 minutes, one CGU.
stage2:
compiletest suite=run-make mode=run-make:
Testing alloc stage1:
Testing syntax stage1:
Notes:
As shown in #48480 (comment), the CPUs assigned to each job may have some performance difference:
The clock rate difference of 2.4 GHz vs 2.5 GHz shouldn't be noticeable, though: even if everything were CPU-bound, it would slow a 3-hour build by at most 7.2 minutes. That is not enough to explain the timeout in #48480.
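The 7.2-minute bound can be reproduced with a short calculation. This is only one reading of the figure: it assumes runtime scales inversely with clock rate and treats the 3 hours as measured on the slower 2.4 GHz CPU. The function name is made up for illustration.

```python
def clock_rate_gap_minutes(build_minutes, slow_ghz, fast_ghz):
    """Minutes lost on the slower CPU, relative to the faster one.

    Assumption: the build is fully CPU-bound, so runtime is inversely
    proportional to clock rate, and build_minutes was measured on the
    slower CPU.
    """
    return build_minutes * (1 - slow_ghz / fast_ghz)


# 3 hours on a 2.4 GHz CPU versus the same work on a 2.5 GHz CPU
gap = clock_rate_gap_minutes(180, 2.4, 2.5)
print(round(gap, 1))  # 7.2
```

Any build that is partly I/O-bound or waits on the network would see a smaller gap than this.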
I was working on https://github.com/alexcrichton/cpu-usage-over-time recently for this. It periodically prints the CPU usage as a percentage for the whole system (so 1 busy core on a 4-core machine is 25%). I only got Linux/OSX working though, and was unable to figure out a good way to do it on Windows. My thinking for how we'd do this is to download a binary near the beginning of the build (or set up some script). We'd then run Initially I was also thinking we'd just
This commit adds a script which we'll execute on Azure Pipelines which is intended to run in the background and passively collect CPU usage statistics for our builders. The intention here is that we can use this information over time to diagnose issues with builders, see where we can optimize our build, fix parallelism issues, etc. This might not end up being too useful in the long run but it's data we've wanted to collect for quite some time now, so here's a stab at it! Comments about how this is intended to work can be found in the python script used here to collect CPU usage statistics. Closes rust-lang#48828
…=pietroalbini ci: Collect CPU usage statistics on Azure
One of the easiest ways to make CI faster is to make things parallel and simply use the hardware we have available to us. Unfortunately though we don't have a lot of data about how parallel our build is. Are there steps we think are parallel but actually aren't? Are we pegged to one core for long durations when there's other work we could be doing?
The general idea here is that we'd spin up a daemon at the very start of the build which would sample CPU utilization every so often. This daemon would then update a file that's either displayed or uploaded at the end of the build.
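A minimal sketch of such a sampler, assuming a Linux builder where `/proc/stat` is available. It is not the script from the linked commit; the function names, log format, and 30-second default are illustrative choices. It reports whole-system utilization, so one busy core on a 4-core machine reads as about 25%.

```python
import time


def parse_cpu_times(stat_text):
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            values = [int(v) for v in fields[1:]]
            # Field order: user nice system idle iowait irq softirq steal ...
            idle = values[3] + (values[4] if len(values) > 4 else 0)
            # guest/guest_nice are already counted in user/nice, so stop at steal.
            total = sum(values[:8])
            return idle, total
    raise ValueError("no aggregate 'cpu' line found")


def utilization(prev, cur):
    """Percent of non-idle CPU time between two (idle, total) samples."""
    idle_delta = cur[0] - prev[0]
    total_delta = cur[1] - prev[1]
    if total_delta <= 0:
        return 0.0
    return 100.0 * (1.0 - idle_delta / total_delta)


def sample_loop(log_path, interval=30.0, samples=None, stat_path="/proc/stat"):
    """Append 'timestamp,percent' lines to log_path every `interval` seconds.

    With samples=None this runs until the process is killed, which is what a
    background job started at the beginning of a build would do.
    """
    with open(stat_path) as f:
        prev = parse_cpu_times(f.read())
    taken = 0
    while samples is None or taken < samples:
        time.sleep(interval)
        with open(stat_path) as f:
            cur = parse_cpu_times(f.read())
        stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
        with open(log_path, "a") as log:
            log.write("%s,%.1f\n" % (stamp, utilization(prev, cur)))
        prev = cur
        taken += 1
```

Calling `sample_loop("/tmp/cpu-usage.csv")` from a script launched in the background at the start of the build would produce a file that can be printed or uploaded at the end; the path is a placeholder.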
Hopefully we could then use these logs to get a better view into how the builders are working during the build, diagnose non-parallel portions of the build, and implement fixes to use all the CPUs we've got.
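One way to mine such logs for non-parallel stretches is to look for runs of consecutive samples at or below a utilization threshold. The sketch below assumes samples are `(seconds, percent)` pairs; the threshold, minimum run length, and function name are hypothetical and would need tuning per builder.

```python
def low_utilization_spans(samples, threshold=30.0, min_samples=3):
    """Return (start, end) times of runs where usage stays at or below threshold.

    samples: iterable of (seconds_since_start, percent) in time order.
    A run is reported only if it covers at least min_samples consecutive samples.
    """
    spans = []
    run = []
    for seconds, percent in samples:
        if percent <= threshold:
            run.append(seconds)
            continue
        if len(run) >= min_samples:
            spans.append((run[0], run[-1]))
        run = []
    # Flush a run that extends to the end of the log.
    if len(run) >= min_samples:
        spans.append((run[0], run[-1]))
    return spans


# Made-up samples from a 4-core machine: around 25% suggests a single busy core.
example = [(0, 95.0), (30, 25.0), (60, 24.0), (90, 26.0),
           (120, 90.0), (150, 20.0), (180, 97.0)]
print(low_utilization_spans(example))  # [(30, 90)]
```

Matching the reported spans against build log timestamps would then show which steps were running while most cores sat idle.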
cc @rust-lang/infra