GBM bugfix: matching predictions LightGBM, hummingbird #2574

Merged
merged 11 commits into from
Oct 3, 2022

@jppgks (Contributor) commented Sep 30, 2022

This PR:

  1. Fixes a bug where a category variable with 2 classes gave wrong predictions
  2. Asserts that predictions from LightGBM and the converted Hummingbird model match
  3. Extends test coverage (incl. for the above cases)
  4. Uses the cleaner sklearn API for LightGBM

There are quite a few changes, so the PR is best reviewed commit by commit.
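
The prediction-matching check in item 2 could look roughly like the sketch below. This is only an illustration, not the test code from this PR: `assert_predictions_match` is a hypothetical helper, the tolerance is an assumed value, and the inputs are plain nested lists of class probabilities rather than real LightGBM or Hummingbird outputs.

```python
import math
from typing import Sequence


def assert_predictions_match(
    lgbm_probs: Sequence[Sequence[float]],
    hb_probs: Sequence[Sequence[float]],
    abs_tol: float = 1e-6,  # assumed tolerance, not taken from the PR
) -> None:
    """Raise AssertionError if two probability tables differ beyond abs_tol."""
    assert len(lgbm_probs) == len(hb_probs), "row count differs"
    for row_idx, (lgbm_row, hb_row) in enumerate(zip(lgbm_probs, hb_probs)):
        assert len(lgbm_row) == len(hb_row), f"class count differs in row {row_idx}"
        for cls_idx, (p, q) in enumerate(zip(lgbm_row, hb_row)):
            # Compare each class probability element-wise.
            assert math.isclose(p, q, abs_tol=abs_tol), (
                f"row {row_idx}, class {cls_idx}: {p} != {q}"
            )
```

In a real test, the two arguments would presumably be the probabilities predicted by the LightGBM model and by the converted Hummingbird model on the same inputs.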

github-actions bot commented Sep 30, 2022

Unit Test Results

files: 4 (-1), suites: 4 (-1), duration: 1h 52m 36s (-27m 54s)
tests: 3 423 (+10), passed: 3 344 (+9), skipped: 78 (±0), failed: 1 (+1)
runs: 9 995 (-110), passed: 9 783 (-89), skipped: 211 (-22), failed: 1 (+1)

For more details on these failures, see this check.

Results for commit ac455d0. ± Comparison against base commit 1dc66ca.

♻️ This comment has been updated with latest results.

@jppgks changed the title from "GBM robustness: matching predictions LightGBM, hummingbird" to "GBM bugfix: matching predictions LightGBM, hummingbird" on Oct 3, 2022
ludwig/models/gbm.py (resolved)
tests/integration_tests/test_gbm.py (outdated, resolved)
@@ -719,17 +722,17 @@ def set_steps_to_1_or_quit(self, signum, frame):
     def _construct_lgb_params(self) -> Tuple[dict, dict]:
         output_params = {}
         feature = next(iter(self.model.output_features.values()))
-        if feature.type() == CATEGORY:
+        if feature.type() == BINARY or (hasattr(feature, "num_classes") and feature.num_classes == 2):
Collaborator

Why do we need to check the num_classes stuff here when the feature is binary?

@jppgks (Contributor, Author) commented Oct 3, 2022

A user can specify a category variable with only two classes; this check catches that case and explicitly uses LightGBM with the binary objective.
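
To make the condition concrete, here is a minimal sketch of the objective selection it implies. It is not the actual `_construct_lgb_params` body: `FakeOutputFeature` and `select_objective` are hypothetical names, and the `multiclass` and `regression` fallbacks are assumptions about the other branches, which the diff above does not show. Only the first condition is copied from the changed line.

```python
from dataclasses import dataclass
from typing import Optional

BINARY = "binary"
CATEGORY = "category"
NUMBER = "number"


@dataclass
class FakeOutputFeature:
    """Hypothetical stand-in for a Ludwig output feature (illustration only)."""

    feature_type: str
    num_classes: Optional[int] = None

    def type(self) -> str:
        return self.feature_type


def select_objective(feature) -> str:
    """Return a LightGBM objective name for the given output feature."""
    # Same condition as the changed line: a binary feature, or any feature
    # that reports exactly two classes, uses the binary objective.
    if feature.type() == BINARY or (hasattr(feature, "num_classes") and feature.num_classes == 2):
        return "binary"
    # Assumed fallbacks; the remaining branches are not visible in the diff.
    if feature.type() == CATEGORY:
        return "multiclass"
    return "regression"


two_class = FakeOutputFeature(CATEGORY, num_classes=2)
three_class = FakeOutputFeature(CATEGORY, num_classes=3)
print(select_objective(two_class))    # binary
print(select_objective(three_class))  # multiclass
```

Because the binary check comes first, a two-class category feature never reaches the category branch.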

@justinxzhao merged commit 43fee24 into master on Oct 3, 2022
@justinxzhao deleted the debug-gbm branch on October 3, 2022 23:57
jppgks added a commit that referenced this pull request Oct 4, 2022
* GBM bugfix: matching predictions LightGBM, hummingbird (#2574)

* use old dataset API

Co-authored-by: Joppe Geluykens <joppe@predibase.com>
4 participants