bin size 257 cannot run on GPU #4082
Here is a more minimal MRE:
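In sketch form (the synthetic data, shapes, and seed below are assumptions, not the original dataset):

```python
import numpy as np
import lightgbm as lgb

# Purely numeric data; no categorical features involved.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 10))
y = rng.normal(size=10_000)

model = lgb.LGBMRegressor(
    device="gpu",        # OpenCL-based GPU build
    max_bin=255,
    min_data_in_bin=1,   # reported trigger of the "bin size 257 cannot run on GPU" error
    n_estimators=10,
)
model.fit(X, y)
```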
FYI, gpu_use_dp=True or False has no effect. That is, I iterated through all parameters, and the key to the failure is (of course) running on GPU but also min_data_in_bin=1. A value of 2 also fails, but 10 does not. So lgb is not respecting the max_bin of 255 even for purely numeric values. If this is a user error, I recommend listening primarily to max_bin; e.g., when doing hyperparameter search, fatal failures are not fun to handle. It would be best if lgb did the reasonable thing.
Hi, any thoughts? It seems like a clear MRE, but it's been 5 days with no response. Thanks.
Again, no categorical handling is enabled, etc. This is on master as of last night.
@guolinke reminder - this is still the dominant failure mode for LightGBM in Driverless AI.
I think the old GPU/CUDA version will be abandoned.
@arnocandel We are updating to a brand-new CUDA version. Please follow #4630 and #4528 for the latest progress.
@shiyu1994 and @guolinke, hi. Looking at those two PRs made me realize that perhaps the current CUDA mode (as opposed to OpenCL) is incomplete; e.g., you mention categorical handling as being added to the CUDA version in the PR. Is that correct? More generally, is the CUDA version incomplete in various ways that are documented, or does it have (or will it have) full parity? If I run the CUDA version with categorical handling it seems to run fine, but maybe it's not doing what I asked even though I pass categorical_feature?
@pseudotensor The current CUDA version is doing the correct thing; it can handle categorical features normally. The only problem is that the current implementation only does histogram construction on the GPU, so GPU utilization can be low. Support for categorical features is not yet added in the first part of the new CUDA version (#4630), but it will be added later.
Here's another minimal repro, in case it helps.
The first one passes, the second one fails; not sure where 257 comes from:
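In sketch form (the synthetic data and round counts are assumptions; only min_data_in_bin differs between the two calls, matching what was reported above):

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 10))
y = rng.normal(size=10_000)

base = {"objective": "regression", "device": "gpu", "max_bin": 255, "verbose": -1}

# Reported to pass: min_data_in_bin=10.
lgb.train({**base, "min_data_in_bin": 10}, lgb.Dataset(X, label=y), num_boost_round=10)

# Reported to fail with "bin size 257 cannot run on GPU": min_data_in_bin=1.
lgb.train({**base, "min_data_in_bin": 1}, lgb.Dataset(X, label=y), num_boost_round=10)
```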
Thanks very much @arnocandel! But are you able to provide a reproducible example starting from raw data in a text-based format, generated from scratch with …? I personally don't ever load pickle files whose origin I don't know, and I expect others wanting to contribute to fixing this issue might share that hesitation. From https://docs.python.org/3/library/pickle.html:
"The pickle module is not secure. Only unpickle data you trust."
@jameslamb - ok
I'm having the same issue over here!
@jameslamb - were you able to check with the above two .csv files for X and y? Here is the full thing, for simplicity:
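In sketch form (the file names, shapes, and synthetic values here are assumptions, not the attached CSVs):

```python
import numpy as np
import pandas as pd
import lightgbm as lgb

# Generate numeric-only data from scratch and round-trip it through CSV,
# so the repro starts from a plain text-based format.
rng = np.random.default_rng(0)
pd.DataFrame(rng.normal(size=(10_000, 10)),
             columns=[f"f{i}" for i in range(10)]).to_csv("X.csv", index=False)
pd.DataFrame({"target": rng.normal(size=10_000)}).to_csv("y.csv", index=False)

X = pd.read_csv("X.csv")
y = pd.read_csv("y.csv")["target"]

model = lgb.LGBMRegressor(device="gpu", max_bin=255, min_data_in_bin=1, n_estimators=10)
model.fit(X, y)  # reported to fail with "bin size 257 cannot run on GPU"
```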
I was not. If you're subscribed to this issue, you'll be notified when someone picks this up or has new information to share.
This is a bug in LightGBM for GPU; when using the CPU, it is OK.
Any update so far on this issue?
I'm having the same issue :(
same issue too :(
Still have this issue.
I have the same issue "LightGBMError: bin size 1973 cannot run on GPU." It runs alright using CPU. |
For everyone who encounters this issue with the GPU (OpenCL) version: try the CUDA version instead.
I have followed these instructions to install the CUDA version instead of the GPU version, but I still have the same issue. For more info, I am running on a Linux server with CUDA 12.1 and an A100. Let me know if more info is needed to fix this issue.
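For context, switching between the two GPU implementations is only a parameter change on the training side; device="cuda" additionally requires a LightGBM build compiled with CUDA support (a sketch with synthetic data as an assumption; whether it avoids this error is exactly what is in question here):

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
train_set = lgb.Dataset(rng.normal(size=(1_000, 5)), label=rng.normal(size=1_000))

# device="gpu" selects the older OpenCL implementation; device="cuda" selects
# the newer CUDA implementation and only works with a build compiled with CUDA support.
params = {"objective": "regression", "device": "cuda", "max_bin": 255, "verbose": -1}
lgb.train(params, train_set, num_boost_round=10)
```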
Same issue with the GPU version on Windows; it works fine on CPU.
I have realized that after compiling lightgbm with the cuda option and then using the command … On a side note, the main issue of cuda memory still persists, and this relates to the fact that a categorical feature has too many unique values (I tested by omitting that feature, and it works fine on gpu, cuda, and cpu). But when including that feature and using the gpu version, I get: …
So it seems there is a limitation in the implementation of categorical features on cuda/gpu that requires a fix.
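As a stopgap (this is an assumed workaround, not an official fix), capping the cardinality of the offending column before training keeps its bin count below the GPU limit:

```python
import pandas as pd

def cap_categories(s: pd.Series, max_categories: int = 200) -> pd.Series:
    """Keep the most frequent values and merge everything else into 'other'."""
    s = s.astype("string")
    top = s.value_counts().nlargest(max_categories).index
    return s.where(s.isin(top), "other").astype("category")

# Hypothetical usage: df["big_cat"] has thousands of unique values.
# df["big_cat"] = cap_categories(df["big_cat"])
```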
I have the same issue "LightGBMError: bin size 512 cannot run on GPU." |
I know there are a couple of other issues that mention this problem, but they've gotten messy with suggestions that it's related to the categorical_feature setting and other things. Here is a clean MRE.
d9a96c9
lgb257.pkl.zip
FYI, model.get_params() shows:
and FYI, here are the kwargs:
Running … fails the same way, but I'm unsure whether, with the sklearn API, it is then using 'auto' for categorical_feature.
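One way to take 'auto' out of the equation is to pass categorical_feature explicitly to fit() (a sketch with synthetic data; the empty list is an assumption that no columns should be treated as categorical):

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 5))
y = rng.normal(size=1_000)

model = lgb.LGBMRegressor(device="gpu", max_bin=255, min_data_in_bin=1)
# Passing an explicit list (here: none) instead of relying on the 'auto'
# default makes it clear that no categorical handling is requested.
model.fit(X, y, categorical_feature=[])
```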