Enable random seed averaging #40

No description provided.
Hi @Y-oHr-N,

Is it that you want to make your `OGBMClassifier` support random seed averaging natively? I'd thought that since OptGBM follows the sklearn API, it would be compatible by default. Or am I missing something?

Anyway, random seed averaging is something I'll need to test, as the stdev of differently seeded models is sometimes very high, and averaging could produce a more robust model. Do you have datasets in mind on which it improves results?

Thanks
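A minimal sketch (not from the original thread) of one way to see that seed-to-seed spread: train the same `LGBMClassifier` under several seeds on the digits dataset used in the examples below and compare test scores.

```python
import numpy as np
import lightgbm as lgb
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit identical models that differ only in their random seed.
scores = []
for seed in range(10):
    model = lgb.LGBMClassifier(random_state=seed)
    model.fit(X_train, y_train)
    scores.append(model.score(X_test, y_test))

print(f"mean acc = {np.mean(scores):.4f}, std = {np.std(scores):.4f}")
```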
Hi @flamby,

As you said, random seed averaging is possible by default. The first way is to pass an `OGBMClassifier` to `RandomSeedAveragingClassifier` (example 5 below). The second way is to train an `OGBMClassifier` once and reuse its tuned parameters in plain `LGBMClassifier`s wrapped by `RandomSeedAveragingClassifier` (example 6 below).

This is a simple example using OptGBM 0.5.0.

```python
import lightgbm as lgb
from mllib.ensemble import RandomSeedAveragingClassifier
from optgbm.sklearn import OGBMClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# 1. LightGBM
model = lgb.LGBMClassifier(random_state=0)
model.fit(X_train, y_train)
score = model.score(X_test, y_test) # acc = 0.960...
# 2. LightGBM + random seed averaging
model = lgb.LGBMClassifier()
model = RandomSeedAveragingClassifier(model, n_estimators=10, random_state=0)
model.fit(X_train, y_train)
score = model.score(X_test, y_test) # acc = 0.960...
# 3. OptGBM (fold averaging)
model = OGBMClassifier(n_trials=20, random_state=0)
model.fit(X_train, y_train)
score = model.score(X_test, y_test) # acc = 0.977...
# 4. OptGBM (single model)
model = OGBMClassifier(n_trials=20, random_state=0)
model.fit(X_train, y_train)
model.refit(X_train, y_train)
score = model.score(X_test, y_test) # acc = 0.980...
# 5. OptGBM (fold averaging) + random seed averaging (tune `n_estimators` times)
model = OGBMClassifier(n_trials=20)
model = RandomSeedAveragingClassifier(model, n_estimators=10, random_state=0)
model.fit(X_train, y_train)
score = model.score(X_test, y_test) # acc = 0.984...
# 6. OptGBM (fold averaging) + random seed averaging (tune only once)
model = OGBMClassifier(n_trials=20, random_state=0)
model.fit(X_train, y_train)
model = lgb.LGBMClassifier(**model.study_.best_params)
model = RandomSeedAveragingClassifier(model, n_estimators=10, random_state=0)
model.fit(X_train, y_train)
score = model.score(X_test, y_test)  # acc = 0.968...
```

By the way, mllib is not currently being maintained, and most of the code has been ported to pretools. I am trying to implement random seed averaging in pretools or OptGBM.
Hi @Y-oHr-N,

Thanks for the clarification. Everything works, except example 5, for which I got the error below:

```
lib/python3.7/site-packages/lightgbm/sklearn.py in set_params(self, **params)
366 setattr(self, key, value)
367 if hasattr(self, '_' + key):
--> 368 setattr(self, '_' + key, value)
369 self._other_params[key] = value
370 return self
AttributeError: can't set attribute
```

I tend to rely a lot on the decision threshold to improve my classification precision/recall, thanks to `predict_proba`. Any chance of supporting it in `RandomSeedAveragingClassifier`?

Thanks, and keep up the good work!
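A minimal sketch (not from the thread) of the decision-threshold workflow mentioned above, assuming a plain `LGBMClassifier` on a binary dataset:

```python
import lightgbm as lgb
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = lgb.LGBMClassifier(random_state=0)
model.fit(X_train, y_train)

# Trade precision against recall by moving the decision threshold.
proba = model.predict_proba(X_test)[:, 1]

for threshold in (0.3, 0.5, 0.7):
    y_pred = (proba >= threshold).astype(int)
    print(f"threshold={threshold}: "
          f"precision={precision_score(y_test, y_pred):.3f}, "
          f"recall={recall_score(y_test, y_pred):.3f}")
```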
I noticed the bug yesterday, fixed it immediately, and released 0.5.0. If you are really using 0.5.0, please tell me your environment in detail. Example 5 works fine in my environment.
I will consider implementing it, but I cannot guarantee that it will be implemented soon. I would be glad if you could wait patiently or send a PR. Thank you for your feedback.
You're right. It appears it was a Jupyter cache issue. Silly me. Restarting the kernel fixed it.
I've already monkey-patched it, taking inspiration from the way scikit-learn's `VotingClassifier` does it. Here it is; I hope I did not make mistakes.

```python
import warnings

import numpy as np

model = lgb.LGBMClassifier()

def predict_proba(self, X):
    self._check_is_fitted()
    probas = np.asarray([e.predict_proba(X) for e in self.estimators_])
    with warnings.catch_warnings():
        warnings.simplefilter('ignore', category=RuntimeWarning)
        avg = np.average(probas, axis=0)
    return avg
# monkey patching
RandomSeedAveragingClassifier.predict_proba = predict_proba
model = RandomSeedAveragingClassifier(model, n_estimators=10, random_state=0)
model.fit(X_train, y_train)
probas = model.predict_proba(X_test)
```
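A less invasive alternative (a sketch, assuming the same mllib internals used in the patch above, i.e. `estimators_` and `_check_is_fitted`) would be to subclass instead of monkey-patching, which keeps the change scoped to your own class:

```python
import numpy as np

from mllib.ensemble import RandomSeedAveragingClassifier


class RandomSeedAveragingProbaClassifier(RandomSeedAveragingClassifier):
    """RandomSeedAveragingClassifier with a soft-voting predict_proba."""

    def predict_proba(self, X):
        # Average the class probabilities of the differently seeded models.
        self._check_is_fitted()
        probas = np.asarray([e.predict_proba(X) for e in self.estimators_])
        return np.average(probas, axis=0)
```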
Thank you for sharing your code.
Thank you very much @Y-oHr-N
Hi @flamby,

I noticed that example 6 had a mistake: it built the `LGBMClassifier` from `model.study_.best_params` and dropped the tuned number of boosting iterations. The version below uses `model.best_params_` and `model.best_iteration_` instead. The modified code is as follows.

```python
# 6. OptGBM (fold averaging) + random seed averaging (tune only once)
model = OGBMClassifier(n_trials=20, random_state=0)
model.fit(X_train, y_train)
model = lgb.LGBMClassifier(n_estimators=model.best_iteration_, **model.best_params_)
model = RandomSeedAveragingClassifier(model, n_estimators=10, random_state=0)
model.fit(X_train, y_train)
score = model.score(X_test, y_test)  # acc = 0.982...
```
Hi @Y-oHr-N,

Thanks. I finally had time to test it, and it works like a charm.