OverflowError when training with 100k+ iterations #2265
Comments
It seems this error is caused by ctypes...
Just curious. Do you think it is possible to bypass the
keep_training_booster=True is the only solution for now.
@guolinke Do you think that this issue is fixable?
There could be a workaround, for example, returning multiple small strings and concatenating them outside ctypes.
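The workaround suggested above might look roughly like the following sketch on the Python side. Nothing here is LightGBM's actual API: the C buffer merely simulates a large serialized model, and the chunked copy illustrates the idea of never pushing one huge length through a single ctypes call.

```python
import ctypes

# A byte string standing in for a serialized model (hypothetical; this is
# not produced by any real LightGBM call, it just fills a C buffer).
model_bytes = b"tree\n" * 1000
c_buffer = ctypes.create_string_buffer(model_bytes, len(model_bytes))

def read_in_chunks(buf, total_len, chunk_size=256):
    """Copy a C buffer into Python in small pieces and join them outside
    ctypes, so no single call has to handle the full length at once."""
    parts = []
    offset = 0
    while offset < total_len:
        n = min(chunk_size, total_len - offset)
        # ctypes.string_at reads exactly n bytes starting at this address
        parts.append(ctypes.string_at(ctypes.addressof(buf) + offset, n))
        offset += n
    return b"".join(parts)

assert read_in_chunks(c_buffer, len(model_bytes)) == model_bytes
```

The same reassembly would work regardless of chunk size, since the pieces are joined in plain Python where string lengths are unbounded.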
Closed in favor of #2302. We decided to keep all feature requests in one place. You are welcome to contribute this feature! Please re-open this issue (or post a comment if you are not the topic starter) if you are actively working on implementing it.
Environment info
Operating System: Windows 7 SP2 (and same issue on macOS 10.13.6 but it crashes python kernel without any message)
CPU/GPU model: CPU
C++/Python/R version: Python 3.6
LightGBM version or commit hash: 2.2.3 (and 2.2.0)
Error message
When training LightGBM with more than 100,000 iterations, the model can finish training (there is still enough memory) but fails when it tries to exit the training process.
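One plausible mechanism, assuming the ctypes hypothesis from the comments above is correct (this thread does not confirm it), is that the serialized model's byte length stops fitting in a signed 32-bit C int once the string grows past roughly 2 GiB:

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit C int can hold

def fits_in_c_int(n_bytes: int) -> bool:
    """True if a buffer length can pass through a C `int` parameter
    without overflow (the suspected failure mode, not confirmed here)."""
    return -2**31 <= n_bytes <= INT32_MAX

print(fits_in_c_int(INT32_MAX))      # True: just under 2 GiB still fits
print(fits_in_c_int(INT32_MAX + 1))  # False: one byte more overflows
```

A model trained for hundreds of thousands of iterations could plausibly cross that boundary when dumped to a single string, which would explain why skipping the model-to-string step avoids the crash.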
However, if I set keep_training_booster=True, it can finish the entire training without problems. So this seems to happen only when LightGBM is trying to turn the model into a string before freeing it.
Reproducible examples
You can try any regression problem with ~50,000 samples and 150 features, trained for ~300,000 iterations with a small learning rate such as 0.001.
where df_train in our case has about 50,000 samples and 150 features; it still fits in our 16 GB of memory during training. It only fails when exiting training with keep_training_booster=False.