LightGBM is incompatible with libomp 12 and 13 on macOS #4229
Comments
All our tests are passing with libomp 12: https://github.com/microsoft/LightGBM/runs/2437586276
I'm not sure LightGBM was ever able to
I think you can migrate to the bug-free Intel toolchain or compile the threadless version: https://lightgbm.readthedocs.io/en/latest/Installation-Guide.html#build-threadless-version-not-recommended. |
Well, we have been using a similar approach as stated above (ThreadPool + fit) successfully in production settings for quite some time, and also facing this problem now. As already commented, this issue is quickly solved by downgrading to the older libomp version, without any side effects. Maybe this has been something without any tests, which just now happens to fail? I would also like to point out that this issue could also happen with different scikit-learn wrappers using the joblib/delayed approach. The default here is to use multiprocessing (which works), but threading (in order to save memory etc) does not.
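For readers who have not seen the "ThreadPool + fit" pattern mentioned above, here is a minimal sketch of what it might look like. This is not the original reporter's script, which is not preserved in this thread. The helper names `fit_many` and `fit_lightgbm` are hypothetical, and the dataset size and `n_jobs=1` setting are illustrative choices only.

```python
from concurrent.futures import ThreadPoolExecutor


def fit_many(fit_one, tasks, max_workers):
    """Call fit_one(task) for every task on a thread pool.

    Results come back in the same order as the tasks.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fit_one, tasks))


def fit_lightgbm(seed):
    """Hypothetical worker: fit one small LightGBM model.

    Imports are done lazily so that fit_many itself has no
    third-party dependencies.
    """
    import lightgbm as lgb
    from sklearn.datasets import make_regression

    X, y = make_regression(n_samples=1_000, random_state=seed)
    return lgb.LGBMRegressor(n_jobs=1).fit(X, y)


# Example call (requires lightgbm and scikit-learn to be installed):
# models = fit_many(fit_lightgbm, range(8), max_workers=8)
```

Fitting from several Python threads at once, as in the commented-out call, is the usage this thread reports as failing with the affected libomp versions on macOS, while the multiprocessing-based default of joblib is reported to work.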
Interestingly, it seems that this (somewhat) fixes the problem. Setting n_jobs=1 works for me, but also higher values (up to around n_jobs=5) seem to work. Maybe this is simply a question of spawning too many threads in total? Some results:
For whatever reason it seems the threshold is between 40 (working) and 42 (failing). |
In XGBoost we are also facing issues with the updated libomp. It fails with an internal error: https://github.com/dmlc/xgboost/pull/6912/checks?check_run_id=2459890229 |
Another example of a regression in version 12: facebookresearch/faiss#1849. I can't find any sign that this bug was reported... |
Upstream bug report: https://bugs.llvm.org/show_bug.cgi?id=50579. |
Moving the import statement libomp version Error dump when loading booster model. Putting it out here in case it is useful:
|
Facing the exact reported issue. Subscribed for more updates. |
I have the same issue and did some testing: basically |
Unfortunately, LLVM developers haven't fixed this bug (#4229 (comment)) in |
One suggested workaround in the upstream bug report without downgrading
|
A new major LLVM version, 13, was released 4 days ago: https://github.com/llvm/llvm-project/releases/tag/llvmorg-13.0.0. The latest Homebrew libomp formula now points to that version: https://github.com/Homebrew/homebrew-core/blob/4343aee9c28d28b9ed3208b5933df54c29b916fb/Formula/libomp.rb#L4. Unfortunately, this bug (#4229 (comment)) wasn't fixed in the stable 13 release. |
Bug https://bugs.llvm.org/show_bug.cgi?id=50579 is not yet fixed in version 13, see also microsoft/LightGBM#4229 (comment)
LLVM has switched from Bugzilla to GitHub Issues as its main issue tracker. New replies to the original bug report contain the following:
Everyone who is subscribed to this issue and has easy access to macOS, please check the latest available |
@StrikerRUS I saw llvm/llvm-project#49923 is closed, is this problem solved? |
@guolinke I haven't seen this. According to the conversation in llvm/llvm-project#49923, they closed that issue due to the inability to reproduce the issue. Please, anyone subscribed to this issue, check whether the error occurs with the most recent libomp version |
This worked for me in a local DataSpell notebook on M1 ARM. It looks like if you only need the tabular package, you may be in luck:
|
@nickordoodle Thanks for posting this. Can you please explain how your post is related to the topic "LightGBM is incompatible with OpenMP 12 and 13 on macOS"? |
I found tonight that upgrading to the latest brew install libomp
cd ./python-package
pip install .
cd ..
pytest tests/python_package_tests |
Assigning this to myself... I'll prioritize this for the next release of LightGBM (after v4.2.0). I observed a deadlock in this simple example tonight:
rm -rf ./dist
sh build-python.sh sdist
pip install ./dist/lightgbm-*.tar.gz
import lightgbm as lgb
from sklearn.datasets import make_regression
X, y = make_regression(n_samples=10_000)
dtrain = lgb.Dataset(X, label=y)
dtrain.construct()
With the following:
brew info libomp
Installing with OpenMP turned off, I didn't experience any deadlocks or other issues.
pip install \
--config-settings=cmake.define.USE_OPENMP=OFF \
./dist/lightgbm-*.tar.gz
For more details: #6191 (comment) |
FYI (not sure if this is common knowledge yet): when developing on LightGBM on Apple Silicon, I never turned off OpenMP but used export CXX=g++-13 CC=gcc-13 This fixed any problems I had 😅 |
Thanks @borchero, that's helpful! Looking into this a bit today, I also think that some of these failures might not actually be about incompatibility with particular versions of OpenMP, but rather related to #5106. Fixing the search paths embedded in
Tried the following today on my Intel Mac:
rm -rf ./build
mkdir ./build
cd ./build
cmake ..
make -j2 _lightgbm
cd ..
# check what it's linked to
otool -L lib_lightgbm.so
Notice that even though I was building in an active
sh build-python.sh install --precompile
This segfaults, I think because it's finding the
python ./examples/python-guide/logistic_regression.py
Running with some debugging stuff set... it looks like that's exactly what's happening. 2 versions of OpenMP are being loaded.
DYLD_PRINT_LIBRARIES=1 \
python examples/python-guide/logistic_regression.py 2>&1 \
| grep libomp
Looking a bit more closely, it seems that
otool -L /Users/jlamb/mambaforge/envs/lgb-dev/lib/python3.11/site-packages/sklearn/utils/_openmp_helpers.cpython-311-darwin.so
Patching out
install_name_tool \
-change /usr/local/opt/libomp/lib/libomp.dylib \
@rpath/libomp.dylib \
/Users/jlamb/mambaforge/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/lib/lib_lightgbm.so
otool -L \
/Users/jlamb/mambaforge/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/lib/lib_lightgbm.so
python examples/python-guide/logistic_regression.py
Just stopping here for now to post my notes. I'll continue working on this. |
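The "2 versions of OpenMP are being loaded" check above can also be scripted instead of eyeballing the grep output. The following is only a sketch: the helper name `find_libomp_paths` is hypothetical, the log lines in `sample` are illustrative rather than copied from a real run, and it assumes that each line of `DYLD_PRINT_LIBRARIES` output ends with the library path and that paths contain no spaces.

```python
from pathlib import PurePosixPath
from typing import List


def find_libomp_paths(dyld_output: str) -> List[str]:
    """Return the distinct libomp*.dylib paths mentioned in dyld log output.

    Assumption: the library path is the last whitespace-separated
    token on each line.
    """
    paths: List[str] = []
    for line in dyld_output.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        candidate = tokens[-1]
        name = PurePosixPath(candidate).name
        is_libomp = name.startswith("libomp") and name.endswith(".dylib")
        if is_libomp and candidate not in paths:
            paths.append(candidate)
    return paths


# Illustrative log lines only, not captured from an actual run.
sample = """\
dyld[123]: <AAAA> /usr/lib/libSystem.B.dylib
dyld[123]: <BBBB> /usr/local/opt/libomp/lib/libomp.dylib
dyld[123]: <CCCC> /Users/jlamb/mambaforge/envs/lgb-dev/lib/libomp.dylib
dyld[123]: <BBBB> /usr/local/opt/libomp/lib/libomp.dylib
"""

loaded = find_libomp_paths(sample)
if len(loaded) > 1:
    print("More than one OpenMP runtime was loaded:", loaded)
```

If the list has more than one entry, the process has loaded more than one OpenMP runtime, which is the situation described in the comment above.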
Adding another relevant link: bacpop/pp-sketchlib#42 (comment) |
We did this in #6391. As of I'm going to mark this Thank you all very much for the patience and helpful comments. Please come by and contribute again some time, we'd love the help! |
This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM! |
Description
LightGBM cannot be used to fit multiple models in parallel using threads with the latest libomp.
On 2014 MacBook Pro:
OMP: Error #13: Assertion failure at kmp_runtime.cpp(3689).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see https://bugs.llvm.org/.
[1] 17358 abort python myfile2.py
On 2019 MacBook Pro:
OMP: Error #131: Thread identifier invalid.
Setting nthreads=1 doesn't solve the problem.
Reproducible example
Environment info
LightGBM version or commit hash: 3.1.1 (with python 3.7.3) and 3.2.1 (with python 3.9.4)
Command(s) you used to install LightGBM
Additional Comments
The code does work with libomp version 11. Downgraded using