Confused about 2^max_depth > num_leaves warning #5734

Closed
Cat2Li opened this issue Feb 21, 2023 · 2 comments · Fixed by #6402

Cat2Li commented Feb 21, 2023

I followed the parameter tuning guide at https://lightgbm.readthedocs.io/en/latest/Parameters-Tuning.html, which says

  1. num_leaves. This is the main parameter to control the complexity of the tree model. Theoretically, we can set num_leaves = 2^(max_depth) to obtain the same number of leaves as depth-wise tree. However, this simple conversion is not good in practice. The reason is that a leaf-wise tree is typically much deeper than a depth-wise tree for a fixed number of leaves. Unconstrained depth can induce over-fitting. Thus, when trying to tune the num_leaves, we should let it be smaller than 2^(max_depth). For example, when the max_depth=7 the depth-wise tree can get good accuracy, but setting num_leaves to 127 may cause over-fitting, and setting it to 70 or 80 may get better accuracy than depth-wise.

However, when I set max_leaves = 31 and max_depth = 8 explicitly, I keep getting the warning message:

[LightGBM] [Warning] Accuracy may be bad since you didn't explicitly set num_leaves OR 2^max_depth > num_leaves. (num_leaves=31).

The warning does not seem to make sense, since num_leaves = 31 is good enough for my model and I set max_depth to 7, 8, or 9 to prevent overfitting. For example, if I follow the warning message and set max_depth = 4, then in my opinion max_depth is no longer a useful hyperparameter, since only 16 leaves can be grown.

I wonder whether I have misunderstood something in your manual or whether the warning itself is misleading. I am looking forward to your reply.

Thanks.

@pranavkolapkar167

Can you explain why only 16 leaves can be there if max depth is set to 4?

Collaborator

jameslamb commented Apr 1, 2024

> why only 16 leaves can be there if max depth is set to 4

"depth" begins at 0 (a 0-split tree has depth=0).

LightGBM creates binary splits (the data is partitioned into exactly 2 groups by each split).

LightGBM grows non-symmetric trees.

These constraints explain why, when max_depth > 0, the most leaves a given tree can have is 2 ^ max_depth (in LightGBM, max_depth <= 0 means no depth limit). Consider this visually:

image
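The same bound can be checked with a small simulation. This is only a sketch in plain Python, not LightGBM's tree learner: a tree is represented as a list of leaf depths, and the helper names `grow_full_tree` and `grow_chain_tree` are hypothetical. It assumes, as above, that depth starts at 0 and that every split turns one leaf into exactly two.

```python
def grow_full_tree(max_depth: int) -> list[int]:
    """Split leaves until every leaf sits at max_depth; return the leaf depths."""
    leaves = [0]  # a tree with zero splits is a single leaf at depth 0
    while any(depth < max_depth for depth in leaves):
        depth = min(leaves)  # pick a leaf that can still be split
        leaves.remove(depth)
        leaves.extend([depth + 1, depth + 1])  # a binary split yields two children
    return leaves


def grow_chain_tree(num_leaves: int) -> list[int]:
    """Always split the deepest leaf: the most lopsided tree with num_leaves leaves."""
    leaves = [0]
    while len(leaves) < num_leaves:
        depth = max(leaves)
        leaves.remove(depth)
        leaves.extend([depth + 1, depth + 1])
    return leaves


# A fully grown tree of depth d has 2 ** d leaves.
for d in range(5):
    print(f"max_depth={d}: at most {len(grow_full_tree(d))} leaves")

# Without a depth limit, 16 leaves can reach depth 15 in the most lopsided case.
print("deepest leaf in a 16-leaf chain tree:", max(grow_chain_tree(16)))
```

The second function illustrates the point from the tuning guide: for a fixed number of leaves, a non-symmetric tree can be much deeper than a balanced one, which is why capping depth and capping leaves are different constraints.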

You could try changing the parameters max_depth and num_leaves in the Python API to explore this. For example, the following code, run with lightgbm==4.3.0, shows that with max_depth=1 and num_leaves left at its default, lightgbm produces trees with at most 2 leaves.

import lightgbm as lgb
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=10_000, n_features=1, centers=2)

bst = lgb.train(
    params={
        "objective": "binary",
        "max_depth": 1,
        "num_iterations": 1,
        "verbose": 1
    },
    train_set=lgb.Dataset(X, label=y, params={"min_data_in_bin": 1})
)

# one row per node: 1 split node (0-S0) and 2 leaves (0-L0, 0-L1)
bst_df = bst.trees_to_dataframe()
print(bst_df)
   tree_index  node_depth node_index left_child right_child parent_index  ... decision_type  missing_direction  missing_type    value  weight  count
0           0           1       0-S0       0-L0        0-L1         None  ...            <=               left          None  0.00000     0.0  10000
1           0           2       0-L0       None        None         0-S0  ...          None               None          None  0.20000  1249.5   4998
2           0           2       0-L1       None        None         0-S0  ...          None               None          None -0.19984  1250.5   5002

[3 rows x 15 columns]

> the warning itself is misleading

Sorry for this, yes it is misleading. Please see the discussion in #2898 (comment) ... I'll put up a PR to modify this warning and clarify the documentation.
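To make the arithmetic from the original question concrete, here is a minimal sketch in plain Python. It does not call LightGBM and is not how the library implements the check; the helper name `max_leaves_allowed` is hypothetical. It assumes only what is stated above (a positive max_depth caps a tree at 2 ^ max_depth leaves) plus the documented convention that max_depth <= 0 means no depth limit.

```python
def max_leaves_allowed(num_leaves: int, max_depth: int) -> int:
    """Upper bound on leaves per tree under both constraints (illustrative only)."""
    if max_depth <= 0:
        # assumption from the LightGBM parameter docs: <= 0 means no depth limit
        return num_leaves
    return min(num_leaves, 2 ** max_depth)


for depth in (4, 7, 8, 9):
    cap = max_leaves_allowed(num_leaves=31, max_depth=depth)
    print(f"max_depth={depth}: 2^max_depth={2 ** depth}, effective leaf cap={cap}")
```

With num_leaves=31, setting max_depth to 7, 8, or 9 leaves the leaf cap at 31 while still limiting how deep a lopsided tree can grow, so both parameters remain meaningful. With max_depth=4 the depth limit alone caps the tree at 16 leaves.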
