[ci] parallelize R package installs in CI jobs #4198
/gha run r-solaris
Workflow Solaris CRAN check has been triggered! 🚀
solaris-x86-patched: https://builder.r-hub.io/status/lightgbm_3.2.1.99.tar.gz-e9ee6191d08443fda80dbf2d2daa5138
/gha run r-valgrind
Workflow R valgrind tests has been triggered! 🚀
Status: success ✔️.
I like this change!
Ah yes, you're totally right, thank you! Added in 7c6fd87.
This week, @jnolis taught me that the R function `install.packages()` can support parallel package installations! That function supports an argument `Ncpus`. If set to a value greater than 1, R will install multiple packages at the same time.
See https://stat.ethz.ch/R-manual/R-patched/library/utils/html/install.packages.html for more.
This PR proposes setting that argument to the value of `parallel::detectCores()` in this project's CI jobs and in the documentation of how to run them manually. I think this should reduce the time it takes R CI jobs to run, especially on Linux, where CRAN does not prepare precompiled binaries.
`{parallel}` comes installed in all standard installations of R, so this does not introduce a new dependency for LightGBM's CI jobs.
Windows and Linux runners from GitHub Actions have a single 2-core CPU and macOS runners have a single 3-core CPU: https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources.
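For illustration, here is a minimal sketch of the pattern described above. It is not the exact code in this PR's diff. The helper names `pick_ncpus()` and `install_in_parallel()`, the package list, and the CRAN mirror URL are all assumptions chosen for the example. The fallback to 1 is there because `parallel::detectCores()` is documented as possibly returning `NA`.

```r
# Choose a value for install.packages()'s Ncpus argument.
# parallel::detectCores() may return NA on some platforms, so fall back to 1.
# (pick_ncpus is a hypothetical helper, not part of this PR.)
pick_ncpus <- function() {
  n <- parallel::detectCores()
  if (is.na(n) || n < 1L) 1L else as.integer(n)
}

# Hypothetical wrapper: install several packages, letting R build or unpack
# more than one at a time when Ncpus is greater than 1.
install_in_parallel <- function(pkgs, repos = "https://cran.r-project.org") {
  install.packages(pkgs, repos = repos, Ncpus = pick_ncpus())
}

# Usage (downloads and installs packages, so it is left commented out;
# the package names are only an illustrative list):
# install_in_parallel(c("data.table", "jsonlite", "Matrix", "R6", "testthat"))
```

On a 2-core GitHub Actions runner this would request at most two concurrent installs, which is why the expected benefit is largest where packages are compiled from source.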
Testing whether this actually makes builds faster
The table below compares run times for the R jobs on the 3 most recent builds of `master` and 3 builds of this PR.
`master` build 1: https://github.com/microsoft/LightGBM/actions/runs/759494394
`master` build 2: https://github.com/microsoft/LightGBM/actions/runs/759907488
`master` build 3: https://github.com/microsoft/LightGBM/actions/runs/761721481
[Timing tables for Linux, Mac, and Windows are not preserved here.]
The fastest time for each build is {in brackets}. It's hard to get an accurate estimate of the timing for these things because so many factors affect the runtime of the CI jobs, including the response times and availability of multiple package managers.
But it does look roughly like what I'd expect: this change seems to offer a noticeable speedup on Linux (where R packages have to be installed from source) and only a small speedup (if any) on Windows and macOS (where CRAN provides binaries).