Setting parameters differently per round #129
Yes, most parameters can be changed during training; just implement your own callback. You can follow the implementation of the learning-rate callback, which achieves this by calling Booster.reset_parameter. I just checked: num_leaves, min_data_in_leaf, feature_fraction and bagging_fraction can all be changed during training.
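A minimal sketch of such a callback, modeled on the learning-rate one (the `schedule` list and the `reset_num_leaves` name are illustrative, not part of the library; the built-in `lgb.reset_parameter` callback wraps the same `Booster.reset_parameter` call):

```python
import lightgbm as lgb

def reset_num_leaves(schedule):
    """Reset num_leaves before each boosting round from a per-round list."""
    def _callback(env):
        # env is the CallbackEnv passed to callbacks; env.model is the Booster
        value = schedule[env.iteration - env.begin_iteration]
        env.model.reset_parameter({"num_leaves": value})
    _callback.before_iteration = True  # run before the round's tree is built
    return _callback

# Equivalent with the built-in callback:
# callbacks=[lgb.reset_parameter(num_leaves=schedule)]
```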
@guolinke Btw, I tried to change metrics during training before and it didn't seem to work. But I think that is expected behavior, right?
@wxchan, I think it should work now; check the code. BTW, the reset logic is currently not efficient for anything except learning rate; maybe we can optimize it for some other common parameters too. We also need to define which parameters can be reset during training, and throw an exception or warning if a wrong parameter is passed.
@guolinke I know it's implemented, but it took a very long time when I used it. Did you try it?
@gugatr0n1c I think it works perfectly now.
@guolinke Tested. Much better now.
Great, thx. I will wait for the docs from wxchan about the callback function; I am a newbie in Python, so I'm not sure how to use it.
@gugatr0n1c I already added it to the docs.
Thx for the example. I just compiled the latest version and got an error calling this (it was working before the update): `num_leaves_list = np.random.randint(500, 1000, iterace).tolist()` 2016-12-18 17:45:41 I found out that I need to change it to "valid_sets", but then I got: Traceback (most recent call last): Not sure what to do now. My data are numpy ndarrays. Thx
`model = lg.train(` this is working now, so feel free to close this issue.
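For reference, a minimal sketch of the shape of the working call, assuming the current Python API; the data, parameters and schedule below are all placeholders:

```python
import numpy as np
import lightgbm as lgb

# Placeholder data standing in for the real numpy ndarrays
X_train, y_train = np.random.rand(500, 10), np.random.rand(500)
X_valid, y_valid = np.random.rand(100, 10), np.random.rand(100)

lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_valid, y_valid, reference=lgb_train)

num_rounds = 100
num_leaves_list = np.random.randint(500, 1000, num_rounds).tolist()

model = lgb.train(
    {"objective": "regression"},
    lgb_train,
    num_boost_round=num_rounds,
    valid_sets=lgb_eval,  # the keyword the error message pointed to
    callbacks=[lgb.reset_parameter(num_leaves=num_leaves_list)],
)
```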
Ok, one more thing. I ran a simple test to check that I am using it right: I called num_leaves_list = [750] * iterace and compared against a model where I simply set num_leaves = 750 without the callback. As I understand it, this should give the same result (not exactly, because of stochasticity during training), but the callback version gives much worse performance. Any idea why?
@gugatr0n1c Did you use "feature_fraction" or "bagging"?
@guolinke Yes, I used both in params. Now it seems to be working nicely and accuracy is much better, thx!
I can confirm now that this brings a boost in accuracy: `num_leaves_list = np.random.randint(500, 1000, iterace).tolist()` Thx. Closing this issue.
Hi,
there is a way in Python to set learning_rate differently for each boosting round, as follows:
```python
gbm = lgb.train(params,
                lgb_train,
                num_boost_round=10,
                init_model=gbm,
                learning_rates=lambda iter: 0.05 * (0.99 ** iter),
                valid_sets=lgb_eval)
```
Is it possible to allow this for other parameters as well?
- num_leaves
- min_data_in_leaf
- feature_fraction
- bagging_fraction
I tested this indirectly in xgboost by building not one model with 10k trees but 1k models, each with 10 trees. Each model was a little different, and there was a boost in accuracy, similar to what DART brings. There is no research paper on this; the idea is simply to build slightly different trees. A sketch of that experiment is below.
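Translated to LightGBM, the many-small-models experiment might look like this rough sketch (the data, parameter values and chunk sizes are all made up; keep_training_booster is only there so the returned booster can keep training):

```python
import numpy as np
import lightgbm as lgb

X, y = np.random.rand(1000, 20), np.random.rand(1000)  # placeholder data
train_set = lgb.Dataset(X, y, free_raw_data=False)

model = None
for chunk in range(100):  # 100 chunks x 10 trees ~ one 1000-tree ensemble
    params = {
        "objective": "regression",
        # perturb the parameter between chunks
        "num_leaves": int(np.random.choice([600, 680, 750, 830, 900])),
    }
    model = lgb.train(
        params,
        train_set,
        num_boost_round=10,
        init_model=model,             # continue from the previous chunk
        keep_training_booster=True,   # allow further training on the result
    )
```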
The flow is then as follows (see the sketch after the list):
1. Find the optimal parameters, e.g. num_leaves = 750.
2. Define values around this point: [600, 680, 750, 830, 900].
3. Generate a list of random choices from this list, with length = num_boost_rounds.
4. Call train with this list, taking the parameter for each boosting round from it.
This way everything is stored in one model, not 1k models.
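A minimal sketch of those four steps using the built-in reset_parameter callback (the data and the tuned optimum are placeholders):

```python
import numpy as np
import lightgbm as lgb

num_rounds = 1000
best_num_leaves = 750                            # step 1: tuned optimum
candidates = [600, 680, 750, 830, 900]           # step 2: values around it
schedule = np.random.choice(candidates, num_rounds).tolist()  # step 3

X, y = np.random.rand(1000, 20), np.random.rand(1000)  # placeholder data
model = lgb.train(                               # step 4
    {"objective": "regression", "num_leaves": best_num_leaves},
    lgb.Dataset(X, y),
    num_boost_round=num_rounds,
    callbacks=[lgb.reset_parameter(num_leaves=schedule)],
)
```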
Randomized trees are one argument for this; a second is the following: just as we decrease learning_rate, we may want to slowly increase min_data_in_leaf to fight overfitting, as in the sketch below.
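Assuming reset_parameter also accepts a function of the round index, as it does for learning rate, an increasing schedule could look like this (the growth rate and cap are made up):

```python
import lightgbm as lgb

# Grow min_data_in_leaf from 20 toward a cap of 200 over training
increasing_min_data = lgb.reset_parameter(
    min_data_in_leaf=lambda round_num: min(20 + round_num // 10, 200)
)
# ...then pass callbacks=[increasing_min_data] to lgb.train(...)
```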
thx for consideration