-
Notifications
You must be signed in to change notification settings - Fork 872
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SequentialFeatureSelector hyperparameter propagation doesn't work properly with GridSearch #813
Comments
I've just realized that a part of this is happening intentionally as explained here: #456 (comment) But there must still be something wrong because in above code the Also, now I'm wondering how I can couple the hyperparameters of feature selection and grid search. |
Thanks for taking the time to put together this informative example and illustration of the issue. I agree with you that this is a bug or confusing behavior at the least. It's probably due to cloning the estimator like you suggested. To get the desired behavior, I believe that setting sfs1 = SFS(...
clone_estimator=False) should do the trick. This should be mentioned in the documentation. Also, it would be great to have a warning message if someone is using the defaults in the grid search context -- not sure how to catch this use case scenario well though. |
Thanks for your response!
So the configured hyperparameter is used for fitting and predicting, but at the end there is still a classifier with the default settings used. Is it possible that it now does feature selection with the hyperparameters set by grid search but in the end it still uses default parameters? I don't understand what the purpose of the last fit/predict step is. In comparison, here is the output with the pipeline containing only the
So the default parameters appear only when SFS is used. Regarding the documentation I would suggest to definitely mention this in the example section, since "GridSearch" and "SequentialFeatureSelection" were the keywords that made me think this was exactly what I was looking for, but it is actually a different way of combining these methods. Maybe even make it two examples, one with a shared classifier, and one with clearly different classifiers for feature selection and grid search, so the user is aware of multiple options existing in this context. |
Thanks for looking into this further!
I was confused about that at first too -- this shouldn't happen. But then looking over your code more closely, this is because of the following line: clf = DebugClassifier(max_depth=10)
sfs1 = SFS(estimator=clf,
k_features=3,
forward=True,
floating=False,
scoring='accuracy',
cv=5)
pipe = Pipeline([('sfs', sfs1),
('clf', clf)]) ### <---------------------------
param_grid = [
{#'sfs__k_features': [1, 4],
'sfs__estimator__max_depth': [1, 5]}
]
gs = GridSearchCV(estimator=pipe,
param_grid=param_grid,
scoring='accuracy',
n_jobs=1,
cv=5,
#iid=True,
refit=False) I think what's happening here is that it is tuning the sequential feature selector internally, and then it is somehow using the original classifier. I think that's because GridSearchCV may have made a clone of it. I am not exactly sure. To avoid this issue, you can make it two different entities. One is used internally by the sequential feature selector, and one is used externally. E.g., clf1 = DebugClassifier(max_depth=10)
clf2 = DebugClassifier(max_depth=7)
sfs1 = SequentialFeatureSelector(estimator=clf1,
k_features=3,
forward=True,
floating=False,
scoring='accuracy',
clone_estimator=False,
cv=5)
pipe = Pipeline([('sfs', sfs1),
('clf2', clf2)])
param_grid = [
{
'sfs__estimator__max_depth': [1, 5],
'clf2__max_depth': [1, 5],
}]
gs = GridSearchCV(estimator=pipe,
param_grid=param_grid,
scoring='accuracy',
n_jobs=1,
cv=2,
#iid=True,
refit=False) Broadly, it is best to think of the classifier in the But yeah, this should probably be made more clear in the documentation. |
Thanks for the tips, I believe the following solution based on your suggestions should do what I want: I encapsulated the classifier in the SFS and disabled cloning, which allowed GridSearch to configure the hyperparameters of the classifier that is also used for feature selection. But GridSearch wants to score/predict with the given estimator, which isn't supported by SFS, so I added a simple wrapper that delegates the predicting while transforming the data with the optimized feature selection. # Same imports as above
class SFSWrapperEstimator(SFS):
def predict(self, X, *args, **kwargs):
return self.estimator.predict(self.transform(X), *args, **kwargs)
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=123)
clf = KNeighborsClassifier()
sfs1 = SFSWrapperEstimator(estimator=clf,
k_features=3,
forward=True,
floating=False,
scoring='accuracy',
cv=False,
clone_estimator=False)
param_grid = [{
'estimator__n_neighbors': [1, 5]
}]
gs = GridSearchCV(estimator=sfs1,
param_grid=param_grid,
scoring='accuracy',
n_jobs=1,
cv=5,
#iid=True,
refit=True,
)
# run gridearch
gs = gs.fit(X_train, y_train)
for i in range(len(gs.cv_results_['params'])):
print(gs.cv_results_['params'][i], 'test acc.:', gs.cv_results_['mean_test_score'][i])
gs.best_estimator_.estimator.n_neighbors Output:
I checked the output of the Edit: Just realized that SFS doesn't refit the estimator after feature selection but leaves it with the last tried combination. |
This is nice. Yeah, I think it should be added to the library (or the documentation at least).
It could be a general Whether I run param_grid = [{
'estimator__n_neighbors': [50, 3]
}]
gs = GridSearchCV(estimator=sfs1,
param_grid=param_grid,
scoring='accuracy',
n_jobs=1,
cv=5,
#iid=True,
refit=True,
)
# run gridearch
gs = gs.fit(X_train, y_train)
for i in range(len(gs.cv_results_['params'])):
print(gs.cv_results_['params'][i], 'test acc.:', gs.cv_results_['mean_test_score'][i])
gs.best_estimator_.estimator.n_neighbors or run the following (order of params flipped) param_grid = [{
'estimator__n_neighbors': [3, 50]
}]
gs = GridSearchCV(estimator=sfs1,
param_grid=param_grid,
scoring='accuracy',
n_jobs=1,
cv=5,
#iid=True,
refit=True,
)
# run gridearch
gs = gs.fit(X_train, y_train)
for i in range(len(gs.cv_results_['params'])):
print(gs.cv_results_['params'][i], 'test acc.:', gs.cv_results_['mean_test_score'][i])
gs.best_estimator_.estimator.n_neighbors it both results in the same best estimator params. When I then check
it prints 5 (the default value passed in before grid search, that's probably |
Interesting. But that doesn't seem to have been the issue here, because I accessed the (cloned) optimized SFS via So I updated my solution by also overriding class SFSWrapperEstimator(SFS):
def fit(self, X, y, **kwargs):
super().fit(X, y, **kwargs)
self.estimator.fit(self.transform(X), y, **kwargs)
def predict(self, X, *args, **kwargs):
return self.estimator.predict(self.transform(X), *args, **kwargs) It's quite ugly and inefficient, a better solution would be to store all the fitted estimators of SFS or at least their parameters to be able to provide the best estimator afterwards in a dedicated attribute and without additional fitting. |
Yesterday when you first mentioned that I checked the SFS code as well and couldn't find anything about the refitting after optimization either. It is a bit hard to find out what's going on exactly because there are so many layers (SFS, pipeline, GridSearchCV). The proposed workaround with the |
Sounds good. Apart from that, using the initial approach still seems to cause a bug (cited below), right?
It doesn't affect my suggested solution but might still be an issue for others. |
I just had some time to revisit this. I think the issue(s) can currently be avoided by not using pipe.set_params(**gs.best_params_).fit(X_train, y_train) instead. I am updated the docs via #875 accordingly. Also, at this point it may make sense not to change the SFS behavior but to focus more on the replacement via #834 |
Describe the bug
When using sklearn's
GridSearchCV
withSequentialFeatureSelector
, the configured hyperparameter values are not properly propagated to the actual classifier that is used for fitting and predicting. I put together a MWE below that is based on example 8 in the docs, the only major change is the custom classifier.In the output listed in the docs you can see that the score doesn't change with the
k
parameter of the KNN, which is very strange.While searching for similar issues I found that this has already been mentioned in multiple other issues, e.g. #456 and #511. Below you can see the unexpected behavior in the suggested approach.
Steps/Code to Reproduce
Expected Results
Actual Results
As you can see, the value
1
for the hyperparametermax_depth
is correctly configured for some classifier, however while fitting and predicting it appears that a different classifier is used, where the default value ofmax_depth=10
is still set.Versions
The text was updated successfully, but these errors were encountered: