Calling multithreaded functions sets global number of OMP threads #4705
Comments
Ah, just wrote a comment in the related issue! 🙂 Could you please share your expert opinion in #4607 on the chosen design?
Related: dmlc/xgboost#7537.
@david-cortes if you use the default value (-1), the current OpenMP setting is not changed.
@guolinke : But the issue still persists: if one wants to pass a number of threads that's different from the OMP default, it will alter the current OMP default. Also, scikit-learn's glossary gives negative n_jobs values a defined meaning (for example, -1 means use all processors), and I think users would typically expect scikit-learn-compatible packages to follow that kind of convention.
I see. A quick solution is to have a context manager. For example, when LightGBM starts running, it records the current OMP number of threads; then, when exiting LightGBM, it resets the OMP number of threads to the previously recorded value.
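A minimal C++ sketch of that record-and-restore idea, assuming an RAII guard wrapped around LightGBM's entry points; the class and function names here are hypothetical, not LightGBM's actual code:

```cpp
#include <omp.h>

// Hypothetical RAII guard: saves the current OpenMP max-thread setting on
// construction and restores it on destruction, so a caller-supplied thread
// count never leaks into the rest of the process.
class OmpThreadGuard {
 public:
  explicit OmpThreadGuard(int requested_threads)
      : saved_threads_(omp_get_max_threads()) {
    if (requested_threads > 0) {
      omp_set_num_threads(requested_threads);
    }
  }
  ~OmpThreadGuard() {
    omp_set_num_threads(saved_threads_);  // restore the previous setting
  }

 private:
  int saved_threads_;
};

// Illustrative usage inside an entry point:
// void SomeEntryPoint(int num_threads) {
//   OmpThreadGuard guard(num_threads);
//   // ... parallel work ...
// }  // previous thread count restored here
```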
Just wanted to mention that this problem is quite nasty when working on AutoML systems (I develop AutoGluon), as LightGBM can alter the OpenMP thread count at inference time, dramatically slowing down torch neural network models running on CPU when they are ensembled together. It would be great if this was fixed. We currently use a context manager to avoid the issue, but it slows down inference in small-batch-size scenarios.
Thanks for the comment @Innixma ! That's important context. We've really been struggling with a lack of maintainer attention and availability here, which is why this has been open so long. Are you interested in working on this problem and submitting a proposal in a pull request? That is the best way to ensure it's fixed in the next release of LightGBM.
Hi @jameslamb, thanks for your attention! I'm sorry to hear that LightGBM is struggling to obtain prioritization. It is an amazing package and foundational to many AutoML systems. There is a reason the only model I have more than one of in our ensemble is LightGBM, and we use it 3 times!

Unfortunately, to say that my C++ skills are rusty would be an understatement, and I'd expect the rest of the AutoGluon developers would say the same. We recently implemented a work-around in AutoGluon that fixes the majority of the slowdown (2.5x faster ensemble inference). The work-around mostly boils down to avoiding the use of a context manager entirely and always relying on the default thread count, so LightGBM picks up the current OpenMP thread limit.

This solution is a bit shaky, since it depends on the assumption that LightGBM will default to the current OpenMP thread limit, and that the current OpenMP thread limit is set to a value that all model types are efficient with (not really possible in practice, as some models prefer the logical core count and others the physical core count, but hopefully it is close enough not to be too slow). Also, the optimal OpenMP thread limit differs based on inference batch size, but that is a separate dimension of optimization. Thankfully, it is pretty easy to benchmark, and for most cases I think the work-around generally solves the problem on our end.
Assigning this to myself, as I'm actively working on it right now. |
Calling LightGBM functions which use multithreading will forcibly change the configured number of OpenMP threads for the whole process from which LightGBM was called, which is bad practice (for example, depending on how NumPy was built, it might change the number of threads that matrix multiplications in NumPy use afterwards). This can be checked in lines such as this one:
LightGBM/src/c_api.cpp, line 2144 (commit a2b60e8)
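For context, a minimal sketch of the problematic pattern, assuming the referenced line applies the caller-supplied thread count via omp_set_num_threads(); the Predict function and its parameters are hypothetical:

```cpp
#include <omp.h>

// Problematic pattern: the caller-supplied thread count is applied
// process-wide, so it leaks into any other OpenMP-based code (e.g. NumPy's
// BLAS) that runs after this call returns.
void Predict(int num_threads, int num_rows) {
  omp_set_num_threads(num_threads);  // changes the global OpenMP setting
  #pragma omp parallel for
  for (int i = 0; i < num_rows; ++i) {
    // ... per-row work ...
  }
  // The previous thread count is never restored.
}
```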
The fix is conceptually simple but requires some design changes in the overall code structure: don't set the global number of OpenMP threads; instead, pass the thread count as a parameter to the OpenMP pragmas.
Example:
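A minimal sketch of the proposed approach, with hypothetical function and variable names: the thread count is passed to the parallel region itself through the num_threads() clause, so the process-wide OpenMP setting is left untouched.

```cpp
#include <omp.h>

// Scoped alternative: the thread count applies only to this parallel region,
// and the process-wide OpenMP setting is never modified.
void Predict(int num_threads, int num_rows) {
  #pragma omp parallel for num_threads(num_threads) schedule(static)
  for (int i = 0; i < num_rows; ++i) {
    // ... per-row work ...
  }
}
```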