
Load back saved parameters with save_model to Booster object #2613

Closed
everdark opened this issue Dec 5, 2019 · 12 comments · Fixed by #5424


everdark commented Dec 5, 2019

Environment info

Operating System:
Windows 10 (Same result on both Windows and WSL)

CPU/GPU model:
Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz

C++/Python/R version:
Python 3.7

LightGBM version or commit hash:
2.3.1 installed by pip

Error message

Reproducible examples

import json
import lightgbm as lgb
import pandas as pd
import numpy as np
from sklearn.metrics import mean_squared_error

try:
    import cPickle as pickle
except BaseException:
    import pickle

print('Loading data...')
# load or create your dataset
df_train = pd.read_csv('../binary_classification/binary.train', header=None, sep='\t')
df_test = pd.read_csv('../binary_classification/binary.test', header=None, sep='\t')
W_train = pd.read_csv('../binary_classification/binary.train.weight', header=None)[0]
W_test = pd.read_csv('../binary_classification/binary.test.weight', header=None)[0]

y_train = df_train[0]
y_test = df_test[0]
X_train = df_train.drop(0, axis=1)
X_test = df_test.drop(0, axis=1)

num_train, num_feature = X_train.shape

# create dataset for lightgbm
# if you want to re-use data, remember to set free_raw_data=False
lgb_train = lgb.Dataset(X_train, y_train,
                        weight=W_train, free_raw_data=False)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train,
                       weight=W_test, free_raw_data=False)

# specify your configurations as a dict
params = {
    'boosting_type': 'gbdt',
    'objective': 'binary',
    'metric': 'binary_logloss',
    'num_leaves': 31,
    'learning_rate': 0.05,
    'feature_fraction': 0.9,
    'bagging_fraction': 0.8,
    'bagging_freq': 5,
    'verbose': 0
}

# generate feature names
feature_name = ['feature_' + str(col) for col in range(num_feature)]

print('Starting training...')
# feature_name and categorical_feature
gbm = lgb.train(params,
                lgb_train,
                num_boost_round=10,
                valid_sets=lgb_train,  # eval training data
                feature_name=feature_name,
                categorical_feature=[21])

print(gbm.params) # Check params.
gbm.save_model('model.txt')

gbm2 = lgb.Booster(model_file='model.txt')
print(gbm2.params) # Nothing.

The code example above is borrowed directly from the official example advanced_example.py.
I've confirmed that the parameters have been written to the model file.
Here is the trailing portion of the file:

parameters:
[boosting: gbdt]
[objective: binary]
[metric: binary_logloss]
[tree_learner: serial]
[device_type: cpu]
[data: ]
[valid: ]
[num_iterations: 100]
[learning_rate: 0.05]
[num_leaves: 31]
[num_threads: 0]
[max_depth: -1]
[min_data_in_leaf: 20]
[min_sum_hessian_in_leaf: 0.001]
[bagging_fraction: 0.8]
[pos_bagging_fraction: 1]
[neg_bagging_fraction: 1]
[bagging_freq: 5]
[bagging_seed: 3]
[feature_fraction: 0.9]
[feature_fraction_bynode: 1]
[feature_fraction_seed: 2]
[early_stopping_round: 0]
[first_metric_only: 0]
[max_delta_step: 0]
[lambda_l1: 0]
[lambda_l2: 0]
[min_gain_to_split: 0]
[drop_rate: 0.1]
[max_drop: 50]
[skip_drop: 0.5]
[xgboost_dart_mode: 0]
[uniform_drop: 0]
[drop_seed: 4]
[top_rate: 0.2]
[other_rate: 0.1]
[min_data_per_group: 100]
[max_cat_threshold: 32]
[cat_l2: 10]
[cat_smooth: 10]
[max_cat_to_onehot: 4]
[top_k: 20]
[monotone_constraints: ]
[feature_contri: ]
[forcedsplits_filename: ]
[forcedbins_filename: ]
[refit_decay_rate: 0.9]
[cegb_tradeoff: 1]
[cegb_penalty_split: 0]
[cegb_penalty_feature_lazy: ]
[cegb_penalty_feature_coupled: ]
[verbosity: 0]
[max_bin: 255]
[max_bin_by_feature: ]
[min_data_in_bin: 3]
[bin_construct_sample_cnt: 200000]
[histogram_pool_size: -1]
[data_random_seed: 1]
[output_model: LightGBM_model.txt]
[snapshot_freq: -1]
[input_model: ]
[output_result: LightGBM_predict_result.txt]
[initscore_filename: ]
[valid_data_initscores: ]
[pre_partition: 0]
[enable_bundle: 1]
[max_conflict_rate: 0]
[is_enable_sparse: 1]
[sparse_threshold: 0.8]
[use_missing: 1]
[zero_as_missing: 0]
[two_round: 0]
[save_binary: 0]
[header: 0]
[label_column: ]
[weight_column: ]
[group_column: ]
[ignore_column: ]
[categorical_feature: ]
[predict_raw_score: 0]
[predict_leaf_index: 0]
[predict_contrib: 0]
[num_iteration_predict: -1]
[pred_early_stop: 0]
[pred_early_stop_freq: 10]
[pred_early_stop_margin: 10]
[convert_model_language: ]
[convert_model: gbdt_prediction.cpp]
[num_class: 1]
[is_unbalance: 0]
[scale_pos_weight: 1]
[sigmoid: 1]
[boost_from_average: 1]
[reg_sqrt: 0]
[alpha: 0.9]
[fair_c: 1]
[poisson_max_delta_step: 0.7]
[tweedie_variance_power: 1.5]
[max_position: 20]
[lambdamart_norm: 1]
[label_gain: ]
[metric_freq: 1]
[is_provide_training_metric: 0]
[eval_at: ]
[multi_error_top_k: 1]
[num_machines: 1]
[local_listen_port: 12400]
[time_out: 120]
[machine_list_filename: ]
[machines: ]
[gpu_platform_id: -1]
[gpu_device_id: -1]
[gpu_use_dp: 0]

end of parameters

pandas_categorical:[]

Is this behavior by design?
I found this because I'm using shap with a saved model, and it failed to compute SHAP values: shap needs to access the objective in params, which is gone when the Booster is a pre-trained, re-loaded one.
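
For reference, a minimal sketch of how the failure surfaces for me (standard shap TreeExplainer usage; exact traceback omitted):

import shap

explainer = shap.TreeExplainer(gbm)   # fine: gbm.params still holds the objective
explainer = shap.TreeExplainer(gbm2)  # shap looks for the objective in gbm2.params, which is empty,
shap_values = explainer.shap_values(X_test)  # so computing SHAP values fails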

As of now my workaround is to also pass params to Booster when loading:

gbm2 = lgb.Booster(model_file='model.txt', params=params)

However, I don't think this is good practice, since there is no way to make sure the passed params are consistent with the saved model.
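
A slightly safer variant of that workaround (a rough helper of my own, not part of LightGBM; it keeps all values as strings) is to read the params back out of the "parameters:" block of the saved model file instead of keeping the original dict around:

def read_params_from_model_file(path):
    """Naively parse the "parameters:" ... "end of parameters" block of a saved model."""
    params = {}
    in_params = False
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line == 'parameters:':
                in_params = True
            elif line == 'end of parameters':
                break
            elif in_params and line.startswith('[') and line.endswith(']'):
                key, _, value = line[1:-1].partition(': ')
                params[key] = value
    return params

gbm2 = lgb.Booster(model_file='model.txt', params=read_params_from_model_file('model.txt'))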

@StrikerRUS
Collaborator

@everdark Thanks for your report! I think it's closely related to #2604 and #2208. Also, it'll require something like LGBM_BoosterGetConfig or adding an [out] out_config argument to the existing functions on the C++ side. @guolinke

StrikerRUS changed the title from "Booster model load didn't load back saved parameters with save_model" to "Load back saved parameters with save_model to Booster object" on Mar 27, 2021
@StrikerRUS
Collaborator

Closed in favor of being in #2302. We decided to keep all feature requests in one place.

You're welcome to contribute this feature! Please re-open this issue (or post a comment if you are not the topic starter) if you are actively working on implementing it.

@zyxue
Contributor

zyxue commented Oct 20, 2021

Is there any update on this issue?

@jameslamb
Collaborator

Is there any update on this issue?

@zyxue , thanks for your interest in LightGBM!

If you're interested in working on this feature and contributing, let us know and we'd be happy to answer questions you have.

Otherwise, you can subscribe to notifications on this issue for updates.

@zyxue
Contributor

zyxue commented Oct 25, 2021

Hey @jameslamb, I'm interested in giving it a try. Do you have guidance on where to start?

@jameslamb
Collaborator

Thanks @zyxue !

I'd start by reading the issues @StrikerRUS mentioned at #2613 (comment), just to get a better understanding of this part of the code base.

Next, I'd add a test to https://github.com/microsoft/LightGBM/blob/da98f24711a2faab17f94e5b2a636e6609c93fa6/tests/python_package_test/test_basic.py using the reproducible example provided by @everdark. That test should fail until your changes are made.
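
Something along these lines could work as a starting point (just a sketch, not the exact test to add; it uses sklearn's make_blobs instead of the binary_classification data, and it asserts the behavior we want once params round-trip through the model file):

import lightgbm as lgb
from sklearn.datasets import make_blobs

def test_booster_params_survive_save_load(tmp_path):
    X, y = make_blobs(n_samples=1000, centers=2, random_state=42)
    params = {'objective': 'binary', 'num_leaves': 7, 'verbose': -1}
    bst = lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=2)
    model_path = str(tmp_path / 'model.txt')
    bst.save_model(model_path)

    bst_loaded = lgb.Booster(model_file=model_path)
    # expected behavior once this feature exists: the loaded Booster exposes the
    # parameters written in the "parameters:" section of model.txt
    assert bst_loaded.params.get('objective') == 'binary'
    assert 'num_leaves' in bst_loaded.params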

Next, try to work through changes on the C++ side based on @StrikerRUS's statement in #2613 (comment):

it'll require something like LGBM_BoosterGetConfig or adding an [out] out_config argument to the existing functions on the C++ side


Here's the relevant Python code that's called to create a Booster from a model .txt file. Note that it calls LGBM_BoosterCreateFromModelfile().

elif model_file is not None:
    # Prediction task
    out_num_iterations = ctypes.c_int(0)
    self.handle = ctypes.c_void_p()
    _safe_call(_LIB.LGBM_BoosterCreateFromModelfile(
        c_str(str(model_file)),
        ctypes.byref(out_num_iterations),
        ctypes.byref(self.handle)))
    out_num_class = ctypes.c_int(0)
    _safe_call(_LIB.LGBM_BoosterGetNumClasses(
        self.handle,
        ctypes.byref(out_num_class)))
    self.__num_class = out_num_class.value
    self.pandas_categorical = _load_pandas_categorical(file_name=model_file)

I believe you'll need to create a proposal for extracting the config_ member from the Booster after it's loaded.

std::unique_ptr<Config> config_;

"Config" is the word we use in LightGBM's C++ code to refer to an object that holds all parameters (see e.g. #4724 (review)).

Here's code called by LGBM_BoosterCreateFromModelfile() which gets parameters from the model text file.

bool is_inparameter = false;
std::stringstream ss;
Common::C_stringstream(ss);
while (p < end) {
  auto line_len = Common::GetLine(p);
  if (line_len > 0) {
    std::string cur_line(p, line_len);
    if (cur_line == std::string("parameters:")) {
      is_inparameter = true;
    } else if (cur_line == std::string("end of parameters")) {
      break;
    } else if (is_inparameter) {
      ss << cur_line << "\n";
      if (Common::StartsWith(cur_line, "[linear_tree: ")) {
        int is_linear = 0;
        Common::Atoi(cur_line.substr(14, 1).c_str(), &is_linear);
        linear_tree_ = static_cast<bool>(is_linear);
      }
    }
  }
  p += line_len;
  p = Common::SkipNewLine(p);
}
if (!ss.str().empty()) {
  loaded_parameter_ = ss.str();
}
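
To connect the two sides: if something like the LGBM_BoosterGetConfig that @StrikerRUS suggested existed on the C side (it doesn't yet, and the name and signature below are only guesses for illustration), the Python constructor shown earlier could populate self.params from the loaded text, roughly like this:

# sketch only: LGBM_BoosterGetConfig does not exist yet; its name and signature are guesses
buffer_len = 1 << 20
out_len = ctypes.c_int64(0)
string_buffer = ctypes.create_string_buffer(buffer_len)
_safe_call(_LIB.LGBM_BoosterGetConfig(
    self.handle,
    ctypes.c_int64(buffer_len),
    ctypes.byref(out_len),
    string_buffer))
# assuming the C side hands back the "[key: value]" lines stored in loaded_parameter_
loaded = string_buffer.value.decode('utf-8')
self.params = dict(
    line.strip('[]').split(': ', 1)
    for line in loaded.splitlines()
    if line.startswith('[')
)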


I'll re-open this issue for now since you're planning to work on it. We have a policy in this repo of keeping feature request issues marked "closed" if no one is working on them, so if for any reason you decide not to work on this feature, please let me know so we can re-close it.

And if you are interested in contributing but feel that this feature is not right for you, now that you know more about it, let me know what you're looking to work on and I'd be happy to suggest another one. Thanks again for your help!

@zyxue
Contributor

zyxue commented Nov 4, 2021

Thank you @jameslamb for the informative guide! I'll try to get to it.

@zyxue
Contributor

zyxue commented Nov 8, 2021

loaded_parameter_ isn't accessible via the Boosting class in the C++ code, right? loaded_parameter_ looks like an attribute specific to GBDT only?

@zyxue
Contributor

zyxue commented Nov 18, 2021

Hey @jameslamb, do you have any feedback on my PR above, please? I wonder if it's the right direction for loading back the saved params.

@jameslamb
Collaborator

Thanks for starting on the work, @zyxue! We will get to reviewing it as soon as possible.

A few other maintainers and I work on LightGBM in our spare time, so we can sometimes be slow to respond (especially on larger features like this one, which require more effort to review). Thanks for your patience.


github-actions bot locked this as resolved and limited the conversation to collaborators on Aug 15, 2023
@jameslamb
Collaborator

This was locked accidentally. I just unlocked it. We'd still welcome contributions related to this feature!
