
[python] save all param values into model file #2589

Merged 11 commits into master on Mar 6, 2020

Conversation

StrikerRUS (Collaborator):

Fix for Python part of #2208.

@StrikerRUS StrikerRUS changed the title [python] save all param values into model file [WIP][python] save all param values into model file Nov 24, 2019
Comment on lines 2410 to 2416
params_to_update = copy.deepcopy(self.params)
params_to_update.update(dict(kwargs,
                             predict_raw_score=raw_score,
                             predict_leaf_index=pred_leaf,
                             predict_contrib=pred_contrib,
                             num_iteration_predict=num_iteration))
self.reset_parameter(params_to_update)
StrikerRUS (Collaborator, Author):

@guolinke This seems to be quite a computationally expensive part, and it doesn't work when params contains any "core" parameters, e.g.

[LightGBM] [Fatal] Cannot change metric during training
[LightGBM] [Fatal] Cannot change num_class during training
[LightGBM] [Fatal] Cannot change boosting during training

Maybe we can put the predict parameters into the config directly on the cpp side during prediction itself?

guolinke (Collaborator):

Yeah, using reset_parameter is not a good idea. BTW, do we need to save predict parameters?

StrikerRUS (Collaborator, Author):

Do you mean removing predict params from the model file completely? I think it makes sense, because they are not needed to restore a model and seem to be more "logging" stuff (or at least something different from the params used to train the model). I think we can introduce a naming schema like predict_* for them and filter them out later, so that params starting with predict_ are not written to the model file.
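
As a hedged sketch of this proposed schema (the function name is hypothetical, not actual LightGBM internals), the filtering step could look like this:

```python
# Illustrative sketch of the proposed predict_* naming schema: any
# parameter whose name starts with "predict_" is dropped before the
# model file is written. filter_params_for_model_file is invented here.

def filter_params_for_model_file(params):
    """Return a copy of params without prediction-time parameters."""
    return {name: value for name, value in params.items()
            if not name.startswith("predict_")}

params = {
    "num_leaves": 31,
    "predict_raw_score": False,
    "predict_leaf_index": False,
}
print(filter_params_for_model_file(params))  # {'num_leaves': 31}
```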

guolinke (Collaborator):

Yeah, it makes sense.

StrikerRUS (Collaborator, Author):

OK, then I'll remove my attempts to record prediction-time params from here. Can you please help with the cpp part so they are not stored at all (here or in a separate PR)?

StrikerRUS (Collaborator, Author):

Cool idea!

StrikerRUS (Collaborator, Author):

Hmmm, it seems that we already have a [doc-only] directive:

if "[doc-only]" in y:
continue

BTW, why are, for example, boosting and objective already doc-only?

guolinke (Collaborator):

Because they are not automatically generated; we write their code manually.

StrikerRUS (Collaborator, Author):

@guolinke

Maybe we can add a tag, e.g. [no-save], to these parameters, and skip them in to_string.

I've added the [no-save] tag in the latest commit and applied it to the predict and convert_model tasks' params.
Maybe we can apply it to "output" training values as well? I mean, params like

// alias = model_output, model_out
// desc = filename of output model in training
// desc = **Note**: can be used only in CLI version
std::string output_model = "LightGBM_model.txt";

// alias = is_save_binary, is_save_binary_file
// desc = if ``true``, LightGBM will save the dataset (including validation data) to a binary file. This speed ups the data loading for the next time
// desc = **Note**: can be used only in CLI version; for language-specific packages you can use the correspondent function
bool save_binary = false;

// check = >0
// alias = output_freq
// desc = frequency for metric output
int metric_freq = 1;
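
To make the idea concrete, here is a minimal Python sketch (not the real C++ to_string implementation; the tag table below is invented for illustration, mimicking the comment annotations in config.h) of how a [no-save] tag could suppress parameters during serialization:

```python
# Hypothetical tag table standing in for the annotations parsed out of
# config.h; only the [no-save] marker matters for this sketch.
PARAM_TAGS = {
    "num_leaves": [],
    "output_model": ["[no-save]"],
    "save_binary": ["[no-save]"],
    "metric_freq": ["[no-save]"],
}

def params_to_string(params):
    """Serialize params in model-file style, skipping [no-save] ones."""
    lines = []
    for name, value in params.items():
        if "[no-save]" in PARAM_TAGS.get(name, []):
            continue  # tagged params never reach the model file
        lines.append("[{}: {}]".format(name, value))
    return "\n".join(lines)

print(params_to_string({"num_leaves": 31, "output_model": "m.txt"}))
# [num_leaves: 31]
```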

StrikerRUS (Collaborator, Author):

@guolinke WDYT about the list of candidate params for the tag?

@StrikerRUS StrikerRUS changed the title [WIP][python] save all param values into model file [python] save all param values into model file Nov 25, 2019
@StrikerRUS StrikerRUS marked this pull request as ready for review November 25, 2019 14:00
StrikerRUS (Collaborator, Author):

@guolinke Just out of curiosity, what is the reason for keeping the text and JSON representations different?

StrikerRUS (Collaborator, Author):

@jameslamb @Laurae2 Would you mind fixing the R package right here, or would you prefer to create a separate PR later?

StrikerRUS (Collaborator, Author):

@guolinke One more problem is that params for Dataset seem not to be saved at all:

import numpy as np
import lightgbm as lgb

X = np.random.random((100, 2))
y = np.random.random(100)
lgb_data = lgb.Dataset(X, y, categorical_feature=[0, 1], params={"max_bin": 100})
bst = lgb.train({}, lgb_data, num_boost_round=5)
bst.save_model('model.txt')
...
[max_bin: 255]
...
[categorical_feature: ]
...


guolinke commented Dec 1, 2019

@StrikerRUS yes, it is a problem. Currently, we only copy the params in lgb.train/... to the Dataset, but we don't copy them from the Dataset to the Booster.


guolinke commented Dec 1, 2019

@StrikerRUS

@guolinke Just out of curiosity, what is the reason to keep text and JSON representations different?

Which part do you refer to?

StrikerRUS (Collaborator, Author):

@guolinke

yes, it is a problem. Currently, we only copy the params in lgb.train/... to the Dataset, but we don't copy them from the Dataset to the Booster.

Do you have any thoughts on how to fix it?

Which part do you refer to?

I mean, some fields that are present in the text format are not included in the JSON format, and vice versa. For example, parameters and feature importances.

import json

import numpy as np
import lightgbm as lgb

X = np.random.random((100, 2))
y = np.random.random(100)
lgb_data = lgb.Dataset(X, y)
bst = lgb.train({}, lgb_data, num_boost_round=3)
bst.save_model('save_model.txt')
with open('dump_model.json', 'w') as json_dump:
    json.dump(bst.dump_model(), json_dump, indent=2)
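
One way to surface the mismatch programmatically (the two stand-ins below are abbreviated, invented fragments, not real LightGBM output):

```python
import re

# Abbreviated stand-in for the text model file: parameters appear
# as "[name: value]" lines near the end of the file.
text_model = "\n".join([
    "tree",
    "[num_leaves: 31]",
    "[metric: l2]",
    "feature_importances:",
    "Column_0=5",
])
# Abbreviated stand-in for the dump_model() dict.
json_model = {"name": "tree", "tree_info": [], "feature_names": ["Column_0"]}

# Parameter names present in the text dump but absent from the JSON keys.
text_params = set(re.findall(r"\[(\w+):", text_model))
print(sorted(text_params - set(json_model)))  # ['metric', 'num_leaves']
```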



guolinke commented Dec 2, 2019

maybe in Booster Construct:

_safe_call(_LIB.LGBM_BoosterCreate(
    train_set.construct().handle,
    c_str(params_str),
    ctypes.byref(self.handle)))

we can construct the dataset first, then copy its params to the booster, and then construct the booster.
BTW, maybe this line in Dataset Construct is needed: https://github.com/microsoft/LightGBM/pull/2594/files#diff-732a5a5220860efcac575e9e956bbaeaR855
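
A simplified Python sketch of that ordering (plain toy classes standing in for the real ctypes-based Dataset and Booster; the merge precedence chosen here is an assumption, not something the thread settled):

```python
# Toy stand-ins for lightgbm.Dataset / lightgbm.Booster, illustrating
# the suggested order: construct the dataset first, copy its params
# into the booster's params, then construct the booster.

class ToyDataset:
    def __init__(self, params):
        self.params = params
        self.constructed = False

    def construct(self):
        # The real code creates the underlying C++ dataset handle here.
        self.constructed = True
        return self

class ToyBooster:
    def __init__(self, params, train_set):
        train_set.construct()            # 1. construct the dataset first
        merged = dict(train_set.params)  # 2. copy its resolved params over ...
        merged.update(params)            # ... booster params win on conflict (assumption)
        self.params = merged             # 3. then construct the booster with merged params

ds = ToyDataset({"max_bin": 100, "categorical_feature": [0, 1]})
bst = ToyBooster({"num_leaves": 31}, ds)
print(bst.params)
# {'max_bin': 100, 'categorical_feature': [0, 1], 'num_leaves': 31}
```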

For the field mismatch, I think it was caused by multiple contributors; we can unify them.

StrikerRUS (Collaborator, Author):

@guolinke

we can construct the dataset first, then copy its params to the booster, and then construct the booster.

OK, I see. Let's work on that within #2208 after merging this PR and #2594.

For the field mismatch, I think it is caused by multiple contributors, we can unify them.

OK, I think it'll be a good refactoring to have, and it will help a lot of third-party libraries that work with LightGBM model dumps. I'll create a separate issue for this.


guolinke commented Dec 3, 2019

@StrikerRUS actually, I think the parameter write-back should be in #2594; otherwise, the reset-config checking for the dataset may fail.

StrikerRUS (Collaborator, Author):

Blocked by #2594.

StrikerRUS (Collaborator, Author):

@guolinke
As #2594 has been merged, I think we can get back to this PR. I'm copying my old comments from the thread above to make it easier to follow the discussion.

Maybe we can add a tag, e.g. [no-save], to these parameters, and skip them in to_string.

I've added the [no-save] tag in the latest commit and applied it to the predict and convert_model tasks' params.
Maybe we can apply it to "output" training values as well? I mean, params like

// alias = model_output, model_out
// desc = filename of output model in training
// desc = **Note**: can be used only in CLI version
std::string output_model = "LightGBM_model.txt";

// alias = is_save_binary, is_save_binary_file
// desc = if ``true``, LightGBM will save the dataset (including validation data) to a binary file. This speed ups the data loading for the next time
// desc = **Note**: can be used only in CLI version; for language-specific packages you can use the correspondent function
bool save_binary = false;

// check = >0
// alias = output_freq
// desc = frequency for metric output
int metric_freq = 1;

WDYT about the list of candidate params for the tag?

guolinke (Collaborator):

Thanks @StrikerRUS, I agree with you; some of the parameters don't need to be saved:

output_model
verbosity
metric_freq
save_binary 

StrikerRUS (Collaborator, Author):

@guolinke I added some more params in the latest commit, please check. Maybe we need to ignore more params, e.g. data, valid?

@StrikerRUS StrikerRUS requested a review from guolinke March 4, 2020 21:26

guolinke commented Mar 5, 2020

@StrikerRUS I think data and valid need to be saved, for the CLI users.

guolinke (Collaborator) left a review:

LGTM

@StrikerRUS StrikerRUS merged commit ba15a16 into master Mar 6, 2020
@StrikerRUS StrikerRUS deleted the save_params branch March 6, 2020 12:48
@lock lock bot locked as resolved and limited conversation to collaborators May 5, 2020