Add prefit to VotingClassifier #7382

Open

arvieFrydenlund opened this issue Sep 9, 2016 · 16 comments

@arvieFrydenlund

arvieFrydenlund commented Sep 9, 2016

EDIT: Never mind, I saw that I need to call fit on it and not just pass in already fitted models, which is how I wanted to use it. Apologies.
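
A minimal sketch of that intended usage (X, y and test_X stand in for the data used in the snippet below): the estimators are passed in unfitted and fit is called on the VotingClassifier itself, which is what populates estimators_ with fitted clones.

from sklearn import svm
from sklearn.ensemble import VotingClassifier

# Sketch only: the VotingClassifier fits clones of the (unfitted) estimators itself.
estimators = [('clf1', svm.NuSVC()), ('clf2', svm.NuSVC(nu=0.8))]
eclf = VotingClassifier(estimators=estimators, voting='hard')
eclf.fit(X.astype(float), y.astype(float))  # this call creates eclf.estimators_
print(eclf.predict(test_X))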

Looking at voting_classifier.py:

class VotingClassifier(BaseEstimator, ClassifierMixin, TransformerMixin):
"""Soft Voting/Majority Rule classifier for unfitted estimators.

.. versionadded:: 0.17

Read more in the :ref:`User Guide <voting_classifier>`.

Parameters
----------
estimators : list of (string, estimator) tuples
    Invoking the ``fit`` method on the ``VotingClassifier`` will fit clones
    of those original estimators that will be stored in the class attribute
    `self.estimators_`.

....

def __init__(self, estimators, voting='hard', weights=None):

    self.estimators = estimators
    self.named_estimators = dict(estimators)
    self.voting = voting
    self.weights = weights

def _predict(self, X):
    """Collect results from clf.predict calls. """
    return np.asarray([clf.predict(X) for clf in self.estimators_]).T

So it looks like the initialization of self.estimators should be self.estimators_

My code:

import numpy
import sys

from collections import OrderedDict
from scipy import special, stats
from scipy.stats import f_oneway
from sklearn.utils import safe_mask
from sklearn.feature_selection import SelectKBest, f_classif, f_regression
from sklearn import svm
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.grid_search import GridSearchCV
import sklearn.svm
from sklearn.feature_selection import RFE

from sklearn.cross_validation import StratifiedShuffleSplit
from sklearn.ensemble import VotingClassifier

from scipy.stats.stats import pearsonr

from DataStream import *
from Config import genders

def training2(X, y, test_X=None, test_y=None):

    sss = StratifiedShuffleSplit(y, 3, test_size=0.5, random_state=0)
    estimators = []
    i = 0
    for train_index, test_index in sss:
        i += 1
        print(train_index)
        print(i)

        clf = svm.NuSVC()  # gamma=0.5, nu=0.8
        clf.fit(X.astype(float), y.astype(float))
        # print(clf)
        estimators.append(('clf' + str(i), clf))

    eclf = VotingClassifier(estimators=estimators, voting='hard')
    print(estimators)

    if test_X is not None and test_y is not None:
        print(testing(X=test_X, y=test_y, clf=eclf))

    return clf

Actual Results

Traceback (most recent call last):
  File "/u/arvie/PHD/DCCA_Experiment2/expA_SVM.py", line 165, in <module>
    my_svm = training2(apply_feature_selection(train_X, features), train_y, apply_feature_selection(test_X, features), test_y)
  File "/u/arvie/PHD/DCCA_Experiment2/expA_SVM.py", line 57, in training2
    print(testing(X=test_X, y=test_y, clf=eclf))
  File "/u/arvie/PHD/DCCA_Experiment2/expA_SVM.py", line 116, in testing
    return accuracy_score(y, clf.predict(X), normalize=True)
  File "/u/arvie/.local/lib/python2.7/site-packages/sklearn/ensemble/voting_classifier.py", line 149, in predict
    predictions = self._predict(X)
  File "/u/arvie/.local/lib/python2.7/site-packages/sklearn/ensemble/voting_classifier.py", line 226, in _predict
    return np.asarray([clf.predict(X) for clf in self.estimators_]).T
AttributeError: 'VotingClassifier' object has no attribute 'estimators_'

Versions

import sys; print("Python", sys.version)
('Python', '2.7.6 (default, Jun 22 2015, 17:58:13) \n[GCC 4.8.2]')

import numpy; print("NumPy", numpy.__version__)
('NumPy', '1.11.0')

import scipy; print("SciPy", scipy.__version__)
('SciPy', '0.17.0')

import sklearn; print("Scikit-Learn", sklearn.__version__)
('Scikit-Learn', '0.17.1')

@arvieFrydenlund changed the title from "return np.asarray([clf.predict(X) for clf in self.estimators_]).T AttributeError: 'VotingClassifier' object has no attribute 'estimators_'" to "AttributeError: 'VotingClassifier' object has no attribute 'estimators_'" on Sep 9, 2016
@amueller
Member

amueller commented Sep 9, 2016

Hm, it looks like we currently don't support "prefit" estimators. That's a bit surprising to me.
I also don't find an issue for that, though I'm sure it was discussed. Maybe there's something in #4161?

I think the easy solution would be to add a prefit parameter to VotingClassifier, which would make fit an empty operation and would allow using self.estimators in predict and transform.
We shouldn't be setting self.estimators_ in __init__.

Another option would be to check in predict and transform whether self.estimators_ exist and otherwise use self.estimators. That's a little bit more implicit, though. Also, it might break in pipelines, depending on what you want to do.

Adding a prefit parameter probably breaks cloning, right @jnothman ? How did we do that in CalibratedClassifier?
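
For reference, the CalibratedClassifierCV pattern being alluded to looks roughly like this (sketch; the data names are illustrative). The already-fitted classifier is passed with cv='prefit' and is only calibrated, never refit or cloned.

from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import LinearSVC

base = LinearSVC().fit(X_train, y_train)            # fitted elsewhere
calibrated = CalibratedClassifierCV(base, cv='prefit')
calibrated.fit(X_calib, y_calib)                     # fits only the calibrator, not base
proba = calibrated.predict_proba(X_test)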

@agramfort
Member

agramfort commented Sep 10, 2016 via email

@jnothman
Member

jnothman commented Sep 10, 2016

Did you call fit()?

@jnothman
Member

Again, we need my memoized estimator wrapper so that there's no cost in fitting again....

Adding a prefit parameter probably breaks cloning, right @jnothman ? How did we do that in CalibratedClassifier?

We shouldn't need to clone if we're merely predicting...
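
A hypothetical sketch of what such a memoized wrapper could look like (nothing like this exists in scikit-learn itself): cache the result of fit keyed on the estimator and the training data, so that fitting the same estimator on the same data again is essentially free.

from joblib import Memory
from sklearn.base import BaseEstimator, clone

memory = Memory(location='./estimator_cache', verbose=0)

@memory.cache
def _cached_fit(estimator, X, y):
    # The (unfitted) estimator and the data together form the cache key.
    return clone(estimator).fit(X, y)

class MemoizedEstimator(BaseEstimator):
    """Wrap an estimator so repeated fits on identical data hit the cache."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        self.estimator_ = _cached_fit(self.estimator, X, y)
        return self

    def predict(self, X):
        return self.estimator_.predict(X)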

@jnothman
Member

we just need to decide when it's okay to allow prefit arguments and when not. they don't really fit in very well...

@amueller
Member

@jnothman sorry for being unclear. I meant that our CalibratedClassifierCV with prefit is not cloneable, right? Because we would need the memoized estimator for that.

@jnothman
Member

jnothman commented Sep 12, 2016

No, not cloneable.

@amueller added the API label on Jun 6, 2017
@amueller
Member

amueller commented Jun 6, 2017

also see #8374, #8370

@kwitaszczyk

Are there any plans to support already fitted models in VotingClassifier?

@jnothman
Member

jnothman commented Apr 19, 2018 via email

@mfeurer
Contributor

mfeurer commented May 28, 2020

Hi everyone, as it appears that there is a conclusion in #8370, would it be possible to add the feature of passing already-fitted estimators to the VotingClassifier and VotingRegressor? If yes, @franchuterivera could jump on this feature if there is a concrete way to implement it.

In case anyone already needs this feature in the meantime, one can apply the following hack, which appears to work as long as one restricts oneself to predict_proba().
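
A sketch of that hack, assuming it is the same one @jondo spells out further down (fitted_estimators and X_test stand in for your own objects): assign the already-fitted estimators to estimators_ by hand and stay within predict_proba(), since predict() additionally needs the fitted label encoder (le_) that only fit() creates.

from sklearn.ensemble import VotingClassifier

eclf = VotingClassifier(estimators=None, voting='soft')
eclf.estimators_ = fitted_estimators       # list of already-fitted classifiers
y_pred_proba = eclf.predict_proba(X_test)  # works; eclf.predict(X_test) would fail (no le_)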

@thomasjpfan
Member

As described in #7382 (comment), the quick solution is to have fit set estimators_ to estimators when prefit=True. I do not see many issues with having this feature in Voting*. Is this okay with everyone else?
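
A hypothetical sketch of that quick solution as a thin subclass (prefit is not an existing VotingClassifier parameter; the label encoder is still fitted so that predict() can map votes back to the original labels):

from sklearn.ensemble import VotingClassifier
from sklearn.preprocessing import LabelEncoder

class PrefitVotingClassifier(VotingClassifier):
    """Sketch: fit() reuses already-fitted estimators when prefit=True."""

    def __init__(self, estimators, voting='hard', weights=None, prefit=False):
        super().__init__(estimators=estimators, voting=voting, weights=weights)
        self.prefit = prefit

    def fit(self, X, y, sample_weight=None):
        if not self.prefit:
            return super().fit(X, y, sample_weight=sample_weight)
        # Reuse the fitted estimators instead of cloning and refitting them,
        # but still fit the label encoder so predict() can inverse_transform.
        # Note: hard voting assumes the base estimators predict integer-encoded
        # classes, since the regular fit() trains clones on encoded labels.
        self.le_ = LabelEncoder().fit(y)
        self.classes_ = self.le_.classes_
        self.estimators_ = [est for _, est in self.estimators]
        return self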

@mfeurer
Contributor

mfeurer commented May 28, 2020

Sorry for missing that and thanks for pointing this out. We would then also ignore the LabelEncoder I guess?

@thomasjpfan
Member

I do not think we can ignore the LabelEncoder. The VotingClassifier uses it in predict to inverse_transform the labels.

An option would also be to pass the LabelEncoder into VotingClassifier.__init__. I am +0.3 on this option.
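
Roughly what predict() does with that encoder (a simplified sketch of the logic, not the actual source), which is why any prefit path still has to provide a fitted le_ / classes_:

import numpy as np

def predict(self, X):
    if self.voting == 'soft':
        maj = np.argmax(self.predict_proba(X), axis=1)
    else:  # 'hard': majority vote over encoded class indices
        predictions = self._predict(X)
        maj = np.apply_along_axis(
            lambda x: np.argmax(np.bincount(x, weights=self._weights_not_none)),
            axis=1, arr=predictions)
    return self.le_.inverse_transform(maj)  # map encoded votes back to original labels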

@eddiebergman
Contributor

I'd like to pick this up and do a PR to implement the prefit=True feature so that pre-trained models can be passed into Voting* objects.

@thomasjpfan changed the title from "AttributeError: 'VotingClassifier' object has no attribute 'estimators_'" to "Add prefit to VotingClassifier" on May 27, 2022
@jondo
Contributor

jondo commented May 8, 2023

My workaround is to set the pre-trained estimators after creation:

eclf = VotingClassifier(estimators=None, voting='soft')
eclf.estimators_ = trained_ensemble_estimators
y_pred_proba = eclf.predict_proba(X_test)

Edit: Ah, this is exactly @mfeurer's hack.
