Always respect forced splits, even when feature_fraction < 1.0 (fixes #4601) #4725

tongwu-sh · 2021-10-27T06:25:26Z

Root cause of this issue: the histogram for the forced-split feature might not be constructed if feature_fraction < 1.0, and training would then fail at SplitInner.

Only col-wise histogram construction is impacted. Row-wise histogram construction goes through all features.
Only forced splits with gain > 0 are impacted. If a forced split yields no gain for its leaf, the forced-split setting is ignored automatically.
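
To make the failure mode above concrete, here is a toy model in Python. It is only a sketch and not LightGBM's C++ code: `build_histograms`, `apply_forced_split`, and the dummy list-based histograms are hypothetical stand-ins, and the "fix" shown (always including the forced feature) is an assumption about one way to avoid the problem, since the actual change to serial_tree_learner.cpp is not shown in this thread.

```python
def build_histograms(sampled_features, forced_feature=None):
    """Return a dummy histogram for every feature that gets one.

    Hypothetical stand-in for col-wise construction: only sampled features
    get a histogram, plus the forced-split feature when it is passed in.
    """
    features = set(sampled_features)
    if forced_feature is not None:
        features.add(forced_feature)
    return {feature: [0.0, 0.0] for feature in features}


def apply_forced_split(histograms, forced_feature):
    """Look up the histogram a forced split needs; fail if it was never built."""
    if forced_feature not in histograms:
        raise KeyError(f"no histogram for forced-split feature {forced_feature}")
    return histograms[forced_feature]


# Suppose feature_fraction < 1.0 sampled features 1 and 2,
# while the forced split is defined on feature 0.
sampled = {1, 2}
forced = 0

try:
    apply_forced_split(build_histograms(sampled), forced)
    failed_without_fix = False
except KeyError:
    failed_without_fix = True

# Including the forced feature regardless of sampling avoids the failure.
histogram = apply_forced_split(build_histograms(sampled, forced_feature=forced), forced)
```

In this model the row-wise case corresponds to passing every feature as `sampled_features`, which is why it would not hit the missing-histogram error.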

jameslamb

Thanks very much for working on this! Please see my initial comments below.

Also, I changed the title of this pull request. In this project, pull request titles become items in the release notes (see, for example, https://github.com/microsoft/LightGBM/releases/tag/v3.3.0), and I don't think "fix issue 4601" would be very informative for a user reading the release notes to understand what has changed.

I'll try to test this as soon as possible using the reproducible example provided in #4601.

src/treelearner/serial_tree_learner.cpp

tests/python_package_test/test_engine.py

shiyu1994

The change on the C++ side LGTM. Just one suggestion for the test case. Thanks!

tests/python_package_test/test_engine.py

jameslamb

Thanks for the changes. Please see a few suggestions I left about the tests. I think it's important to test with force_col_wise=true and force_row_wise=true to be confident that this fixes #4601.
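
As a rough illustration of the two configurations mentioned here, the sketch below builds one parameter set per histogram layout and writes a forced-splits file. The parameter names `feature_fraction`, `forcedsplits_filename`, `force_col_wise`, and `force_row_wise` are LightGBM parameters; the helper `make_params`, the value 0.5 for `feature_fraction`, and the forced split on feature 0 at threshold 0.5 are assumptions chosen to match the assertions discussed elsewhere in this thread, not the PR's actual test code.

```python
import json
import os
import tempfile


def make_params(forced_splits_path, layout):
    """Build training params for one histogram layout ("col" or "row").

    Hypothetical helper; only the parameter names come from LightGBM.
    """
    params = {
        "objective": "regression",
        "feature_fraction": 0.5,  # any value < 1.0 triggers column sampling
        "forcedsplits_filename": forced_splits_path,
        "verbose": -1,
    }
    params["force_col_wise" if layout == "col" else "force_row_wise"] = True
    return params


# Force the root split onto feature 0 at threshold 0.5 (assumed values).
forced_split = {"feature": 0, "threshold": 0.5}
tmp_dir = tempfile.mkdtemp()
forced_splits_path = os.path.join(tmp_dir, "forced_splits.json")
with open(forced_splits_path, "w") as f:
    json.dump(forced_split, f)

# One parameter set per layout, so both code paths can be exercised.
cases = {layout: make_params(forced_splits_path, layout) for layout in ("col", "row")}
```

Each entry of `cases` could then be passed as `params` to `lightgbm.train` with a suitable training Dataset; that step is omitted here to keep the sketch free of third-party dependencies.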

tests/python_package_test/test_engine.py

StrikerRUS

LGTM, thank you very much for working on this!
I left only one suggestion for your consideration to make the test more strict.

StrikerRUS · 2021-11-03T13:17:21Z

tests/python_package_test/test_engine.py

+    assert len(tree_info) > 1
+    for tree in tree_info:
+        tree_structure = tree["tree_structure"]
+        assert tree_structure['split_feature'] == 0


Suggested change

-        assert tree_structure['split_feature'] == 0
+        assert tree_structure['split_feature'] == 0
+        assert tree_structure['threshold'] == pytest.approx(0.5, abs=1e-1)

tongwu-sh · 2021-11-08

The actual threshold is the nearest record, which is about 0.52.
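
The stricter check suggested above can be sketched without pytest as follows. This is an illustration under assumptions: `check_forced_root_split` is a hypothetical helper, `sample_tree_info` is hand-written data shaped like the `tree_info` entries the test reads, and `math.isclose` with `rel_tol=0.0` stands in for `pytest.approx(0.5, abs=1e-1)`.

```python
import math


def check_forced_root_split(tree_info, feature=0, threshold=0.5, abs_tol=1e-1):
    """Assert that every tree's root splits on `feature` near `threshold`."""
    assert len(tree_info) > 1
    for tree in tree_info:
        root = tree["tree_structure"]
        assert root["split_feature"] == feature
        # Absolute tolerance only: |actual - expected| <= abs_tol
        assert math.isclose(root["threshold"], threshold, rel_tol=0.0, abs_tol=abs_tol)
    return True


# Hypothetical trees whose root threshold is 0.52, the value reported in this thread.
sample_tree_info = [
    {"tree_structure": {"split_feature": 0, "threshold": 0.52}},
    {"tree_structure": {"split_feature": 0, "threshold": 0.52}},
]
ok = check_forced_root_split(sample_tree_info)
```

The absolute tolerance of 0.1 is what lets a learned threshold of about 0.52 pass against the requested 0.5, while a root split far from the forced value would still fail.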

shiyu1994 · 2021-11-08T07:10:11Z

> Thanks for the changes. Please see a few suggestions I left about the tests. I think it's important to test with force_col_wise=true and force_row_wise=true to be confident that this fixes #4601.

@jameslamb Hi James, we found that when force_row_wise=true, the test case fails due to another problem, which is not related to using forced splits with feature_fraction < 1.0. It is probably the same problem as in issues #3679 and #4739.

Can we leave that to be fixed in a subsequent PR and merge this one for now, given that all tests pass?

jameslamb · 2021-11-08T18:01:42Z

> Hi James, we found that when force_row_wise=true, the test case fails due to another problem

@shiyu1994 just to be sure I understand, which of these do you mean?

1. the tests added in this PR fail if you run them on master with force_row_wise=True
2. the tests added in this PR fail on this PR branch (but not master) with force_row_wise=True

If it's number 1, then I support merging this PR as-is and fixing the other bug in a later PR.

If it's number 2, then that means that bug would be introduced by this PR, and I don't think it should be merged.

tongwu-sh · 2021-11-09T01:42:46Z

> > Hi James, we found that when force_row_wise=true, the test case fails due to another problem
>
> @shiyu1994 just to be sure I understand, which of these do you mean?
>
> 1. the tests added in this PR fail if you run them on master with force_row_wise=True
> 2. the tests added in this PR fail on this PR branch (but not master) with force_row_wise=True
>
> If it's number 1, then I support merging this PR as-is and fixing the other bug in a later PR.
>
> If it's number 2, then that means that bug would be introduced by this PR, and I don't think it should be merged.

@jameslamb thanks for checking. It is for #1: with force_row_wise=True, the same test also fails on the current master branch.

jameslamb · 2021-11-09T14:50:11Z

> It is for #1

Got it, thanks! Then I think it's ok to merge this and fix that other bug separately. Thanks for explaining it to me 😄

jameslamb

Thanks for working through this!

I never considered, when I encountered #4601, that a source of randomness could be the choice of row-wise vs. column-wise Dataset construction. So I learned something really important through reviewing this and through your explanations ❤️

github-actions · 2023-08-23T14:41:03Z

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

tongwu-sh added 3 commits October 26, 2021 11:16

issue fix microsoft#4601

fbf0d1b

fix issue 4601 it2

5559d75

add tests for issue 4601

1910649

tongwu-sh requested review from btrotta, chivee, guolinke, henry0312, shiyu1994 and StrikerRUS as code owners October 27, 2021 06:25

tongwu-sh added 5 commits October 27, 2021 14:39

fix warning

cdb172f

fix warning

86c6d82

add new line at end

624cd4e

remove last line at end

1da33f1

fix lint warning

ce720b6

jameslamb changed the title from "Fix issue 4601" to "Always respect forced splits, even when feature_fraction < 1.0 (fixes #4601)" Oct 27, 2021

jameslamb self-requested a review October 27, 2021 22:29

jameslamb requested changes Oct 27, 2021

src/treelearner/serial_tree_learner.cpp (outdated)

tests/python_package_test/test_engine.py

tests/python_package_test/test_engine.py

address comments

e3d986f

StrikerRUS added the fix label Oct 28, 2021

shiyu1994 reviewed Oct 28, 2021

tests/python_package_test/test_engine.py (outdated)

address comments

3222f0b

tongwu-sh requested review from shiyu1994 and jameslamb October 29, 2021 08:29

StrikerRUS requested changes Oct 29, 2021

tests/python_package_test/test_engine.py (outdated)

address comments

8188da9

tongwu-sh requested a review from hzy46 as a code owner November 1, 2021 02:33

tongwu-sh requested a review from StrikerRUS November 1, 2021 02:34

jameslamb requested changes Nov 1, 2021

tests/python_package_test/test_engine.py (outdated)

fix address

c677bed

tongwu-sh requested a review from jameslamb November 1, 2021 05:01

tongwu-sh added 2 commits November 1, 2021 17:37

address comments

c9106f8

revert seed

a1c7b1a

StrikerRUS approved these changes Nov 3, 2021

tongwu-sh added 3 commits November 8, 2021 11:12

fix recursive force split issue

3d8e939

fix build error

2e6bc54

fix lint warning

ab716bf

jameslamb approved these changes Nov 9, 2021

shiyu1994 merged commit 33a2f9e into microsoft:master Nov 10, 2021

jameslamb mentioned this pull request Nov 29, 2021

[R-package] enable saving Booster with saveRDS() and loading it with readRDS() (fixes #4296) #4685

Merged

jameslamb mentioned this pull request Dec 28, 2021

Trained tree unexpectedly contains only root #4826

Open

StrikerRUS mentioned this pull request Jan 6, 2022

[DO NOT MERGE] Release 3.3.2 #4930

Closed

13 tasks

jameslamb mentioned this pull request Oct 7, 2022

[DO NOT MERGE] Release v3.3.3 #5525

Closed

40 tasks

github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
