data.table::getDTthreads() does not respect Linux CGroups settings. If CGroups limits the number of CPU cores, data.table will oversubscribe the CPU resources available to the R process.

For example, the 'Free' Posit Cloud plan gives you a single CPU core to play with. They use CGroups v1 to limit the CPU resource. Running the following from within their RStudio server reveals this:
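(A minimal sketch of such a check, assuming the standard CGroups v1 mount point at /sys/fs/cgroup/cpu; the exact snippet and output are illustrative:)

```r
## CGroups v1 CPU quota: -1 means "no limit"; otherwise the number of
## cores allotted is quota / period
quota  <- as.integer(readLines("/sys/fs/cgroup/cpu/cpu.cfs_quota_us"))
period <- as.integer(readLines("/sys/fs/cgroup/cpu/cpu.cfs_period_us"))
quota                    #=> 100000 on the 'Free' plan
cores <- quota / period  #=> 1 CPU core allotted to this process
parallel::detectCores()  #=> 16 vCPUs on the underlying host
```

A user on the 'Premium' plan has 4 CPUs to play with, so they would get quota = 400000 and cores = 4 above. The defaults of data.table do not pick this up:

```r
## Illustrative: getDTthreads() defaults to 50% of the logical CPUs it
## detects, ignoring the CGroups allotment
data.table::getDTthreads()  #=> 8
```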
This means multi-threaded data.table tasks will oversubscribe the CPU resources by 800%: data.table defaults to half the detected logical CPUs, so on a 16-vCPU host it runs 8 threads that all compete for the single allotted core. This results in lots of overhead from context switching (unless there are other low-level mechanisms in data.table that detect this). CPU oversubscription will slow down performance.
The overuse problem becomes worse the more CPU cores the host has. For example, Posit Cloud instances currently run with 16 vCPUs, but if they upgrade to, say, 64 vCPUs, the overuse becomes 3200%. In research HPC environments, it is now common to see 192 CPUs per node, and I'd expect this number to grow over time.
FWIW, parallelly::availableCores() also queries CGroups v1 and CGroups v2, e.g.
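(a sketch of the kind of output one would see on the 'Free' plan; the exact numbers are illustrative:)

```r
parallel::detectCores()       #=> 16 (the host's vCPUs)
parallelly::availableCores()  #=> 1  (honors the CGroups CPU quota)
```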
Similar to #5573 about using data.table on a Slurm cluster.
Currently we assume this kind of configuration should be handled by the user. For example, the user can set the R_DATATABLE_NUM_THREADS environment variable.
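For instance (a sketch; the environment variable is read when the package loads, and setDTthreads() can adjust the count later):

```r
## Cap data.table at one thread before the package loads,
## e.g. in ~/.Renviron or at the top of a script
Sys.setenv(R_DATATABLE_NUM_THREADS = "1")
library(data.table)
getDTthreads()   #=> 1

## or adjust at run time with the exported setter
setDTthreads(1)
```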
In terms of dev/maintenance time, how many kinds of environment settings like this should we support (Slurm, CGroups, ...)? How would we test each of them? Given constraints on dev time, I would argue that it would be better to keep asking users to handle this.
> it would be better to keep asking users to handle this
Given that data.table is such a central infrastructure package, used internally by many packages and pipelines, I wonder how many users even know they are using data.table, let alone know that they need to configure the number of threads it should use.
For the problem reported here, CGroups throttling, I believe there are lots of data.table instances out there running slower than a single-threaded version would, and this without anyone even noticing the problem. Only the savvy user would know that this could be a problem and that it should be fixed.