Lightgbm CPU learner error - lightgbm.basic.LightGBMError: Check failed: (best_split_info.left_count) > (0) #3489

Closed
diditforlulz273 opened this issue Oct 26, 2020 · 5 comments · Fixed by #3492


diditforlulz273 commented Oct 26, 2020

How you are using LightGBM?

LightGBM component:

Environment info

Operating System:
Ubuntu 20.04
CPU/GPU model:
Threadripper 1920x
C++ compiler version:
gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)
CMake version:
3.16.3
Java version:

Python version:
3.8.5
R version:

Other:

LightGBM version or commit hash:
Version: 3.0.0

Error message and / or logs

@guolinke Have just built it from the latest master branch, still fails. I'll try to separate a minimum reproducible example and create an issue then.

Originally posted by @diditforlulz273 in #2793 (comment)

[LightGBM] [Fatal] Check failed: (best_split_info.left_count) > (0) at /__w/1/s/python-package/compile/src/treelearner/serial_tree_learner.cpp, line 630 .

Traceback (most recent call last):
  File "/home/seva/PycharmProjects/ECOM_demand/lgbm_mre.py", line 41, in <module>
    model = lgb.train(lgb_params, train_dat, valid_sets=test_dat, verbose_eval=20)
  File "/home/seva/PycharmProjects/ECOM_demand/venv/lib/python3.8/site-packages/lightgbm/engine.py", line 252, in train
    booster.update(fobj=fobj)
  File "/home/seva/PycharmProjects/ECOM_demand/venv/lib/python3.8/site-packages/lightgbm/basic.py", line 2370, in update
    _safe_call(_LIB.LGBM_BoosterUpdateOneIter(
  File "/home/seva/PycharmProjects/ECOM_demand/venv/lib/python3.8/site-packages/lightgbm/basic.py", line 55, in _safe_call
    raise LightGBMError(decode_string(_LIB.LGBM_GetLastError()))
lightgbm.basic.LightGBMError: Check failed: (best_split_info.left_count) > (0) at /__w/1/s/python-package/compile/src/treelearner/serial_tree_learner.cpp, line 630 .

Reproducible example(s)

Archive with code and pickled datasets, 4.5 kb total (lol really a MINIMAL reproducible example)

https://drive.google.com/file/d/1y04Z_11Ce-sETRZPfwjBEBY-aSOMBcHK/view?usp=sharing

Steps to reproduce

1. Run it on lgbm==3.0.0
2. ?????
3. NO PROFIT!
4. Run on lgbm==2.3.1
5. ?????
6. PROFIT!

@guolinke
Collaborator

Thanks @diditforlulz273.
I hit an error when loading the data:

>>> train_x = pd.read_pickle('train_x.pkl')
>>> test_x = pd.read_pickle('test_x.pkl')
>>> train_y = np.loadtxt('train_y.txt')
>>> test_y = np.loadtxt('test_y.txt')
>>> train_x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\guoke\Anaconda3\lib\site-packages\pandas\core\frame.py", line 680, in __repr__
    self.to_string(
  File "C:\Users\guoke\Anaconda3\lib\site-packages\pandas\core\frame.py", line 801, in to_string
    formatter = fmt.DataFrameFormatter(
  File "C:\Users\guoke\Anaconda3\lib\site-packages\pandas\io\formats\format.py", line 593, in __init__
    self.max_rows_displayed = min(max_rows or len(self.frame), len(self.frame))
  File "C:\Users\guoke\Anaconda3\lib\site-packages\pandas\core\frame.py", line 1041, in __len__
    return len(self.index)
  File "C:\Users\guoke\Anaconda3\lib\site-packages\pandas\core\generic.py", line 5270, in __getattr__
    return object.__getattribute__(self, name)
  File "pandas\_libs\properties.pyx", line 63, in pandas._libs.properties.AxisProperty.__get__
  File "C:\Users\guoke\Anaconda3\lib\site-packages\pandas\core\generic.py", line 5270, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute '_data'

Could you share the raw data, or the data in NumPy format?

@diditforlulz273
Author

I guess the problem is a mismatch between Pandas versions and Python's underlying pickle format.
I used Pandas 1.1.3 and Python 3.8.5.

Sharing NumPy arrays is nearly impossible: the train dataframe has ~54 columns.
If matching versions doesn't help, I could share the data as .csv, although saving to .csv sometimes turns ints like 7 into floats like 7.000000039 in an unpredictable manner, which can affect reproducibility.
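
A rough sketch of how the .csv concern might be handled, for anyone sharing data this way: record each column's dtype before writing and force it back on read, with `float_precision="round_trip"` so floats are parsed exactly. The frame and its column names (`store_id`, `price`) are made-up stand-ins, not the real dataset, and the helper names are hypothetical.

```python
import io

import pandas as pd


def to_csv_text(df: pd.DataFrame) -> str:
    """Serialize a DataFrame to CSV text without the index column."""
    return df.to_csv(index=False)


def from_csv_text(text: str, dtypes: dict) -> pd.DataFrame:
    """Parse CSV text, forcing every column back to its recorded dtype."""
    return pd.read_csv(
        io.StringIO(text),
        dtype=dtypes,
        float_precision="round_trip",  # exact float parsing instead of the fast parser
    )


# Hypothetical stand-in for the real ~54-column training frame.
frame = pd.DataFrame({"store_id": [7, 12, 7], "price": [1.5, 2.25, 0.1]})

# Record dtypes as plain strings so they can travel alongside the CSV file.
dtypes = {name: str(dtype) for name, dtype in frame.dtypes.items()}

restored = from_csv_text(to_csv_text(frame), dtypes)
```

Unlike a pickle, the resulting text does not depend on the Pandas version that wrote it, so it should load on either side of a version mismatch.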

@guolinke
Collaborator

@diditforlulz273 I found the root cause: you cannot set both min_data_in_leaf and min_child_weight to 0.
A leaf must contain at least one sample; otherwise it cannot be a leaf.
We will fix the parameter checking so that an error is thrown before training starts.
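
Until such a check exists in the library, a guard like the following could be run on the parameter dict before calling `lgb.train`. This is only a sketch of the idea, not the check LightGBM itself performs: `check_leaf_constraints` is a hypothetical helper, and the alias lists and defaults (20 and 1e-3) are taken from the LightGBM parameter documentation as I understand it.

```python
# Aliases as listed in the LightGBM parameter docs (assumed to be complete).
MIN_DATA_ALIASES = (
    "min_data_in_leaf", "min_data_per_leaf", "min_data",
    "min_child_samples", "min_samples_leaf",
)
MIN_HESSIAN_ALIASES = (
    "min_sum_hessian_in_leaf", "min_sum_hessian_per_leaf",
    "min_sum_hessian", "min_hessian", "min_child_weight",
)


def _first_set(params, aliases, default):
    """Return the value of the first alias present in params, else the default."""
    for name in aliases:
        if name in params:
            return params[name]
    return default


def check_leaf_constraints(params):
    """Raise ValueError if both leaf-size constraints are disabled (hypothetical helper)."""
    min_data = _first_set(params, MIN_DATA_ALIASES, 20)         # documented default
    min_hessian = _first_set(params, MIN_HESSIAN_ALIASES, 1e-3)  # documented default
    if min_data <= 0 and min_hessian <= 0:
        raise ValueError(
            "min_data_in_leaf and min_child_weight cannot both be 0: "
            "a leaf needs at least one sample"
        )


# Fine: only one constraint is disabled, the other keeps its default.
check_leaf_constraints({"min_data_in_leaf": 0})

# The combination from this issue is rejected before any training happens.
try:
    check_leaf_constraints({"min_data_in_leaf": 0, "min_child_weight": 0})
    rejected = False
except ValueError:
    rejected = True
```

For tiny test datasets, setting min_data_in_leaf to 1 rather than 0 keeps the constraint as loose as possible while still guaranteeing non-empty leaves.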

@diditforlulz273
Author

@guolinke Well, indeed, this combination looks stupid, my bad. I used this tiny dataset in some of my integration tests and, I guess, set all the possible constraints to 0 to make LightGBM build at least some trees and predict something reproducible. Anyway, it worked fine in version 2.3.1 :)

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023