KeyError: 'weight' with sklearn.feature_selection.SelectFromModel #5653

GuidoBartoli · 2020-05-11T16:02:58Z

Hi,
I'm using scikit-learn automatic feature selection together with a trained XGBoost model.
I set up a threshold to interrupt the feature reduction process when accuracy falls below it.
I think everything is fine in the loop, but when I use SelectFromModel.transform() I receive the following error:

Traceback (most recent call last):
  File "boost.py", line 581, in <module>
    s_train_x = selection.transform(train_x)
  File "/home/guido/.virtualenvs/ml/lib/python3.6/site-packages/sklearn/feature_selection/_base.py", line 77, in transform
    mask = self.get_support()
  File "/home/guido/.virtualenvs/ml/lib/python3.6/site-packages/sklearn/feature_selection/_base.py", line 46, in get_support
    mask = self._get_support_mask()
  File "/home/guido/.virtualenvs/ml/lib/python3.6/site-packages/sklearn/feature_selection/_from_model.py", line 178, in _get_support_mask
    scores = _get_feature_importances(estimator, self.norm_order)
  File "/home/guido/.virtualenvs/ml/lib/python3.6/site-packages/sklearn/feature_selection/_from_model.py", line 18, in _get_feature_importances
    coef_ = getattr(estimator, "coef_", None)
  File "/home/guido/.virtualenvs/ml/lib/python3.6/site-packages/xgboost/sklearn.py", line 716, in coef_
    coef = np.array(json.loads(b.get_dump(dump_format='json')[0])['weight'])
KeyError: 'weight'

I'm using the latest xgboost 1.0.2 with scikit-learn 0.22, and the code I wrote is below. It's part of a bigger script, so some variables are defined earlier, but the KeyError should not depend on them.

report = []
prev_t = -1
scores = np.sort(model.feature_importances_)
indices = np.argsort(model.feature_importances_)
misc.msg('Feature selection (threshold = {})...'.format(autosel))
iterator = tqdm(scores)
for i, t in enumerate(iterator):
    if -1 < prev_t == t:
        continue
    prev_t = t
    selection = SelectFromModel(model, threshold=t, prefit=True)
    try:
        s_train_x = selection.transform(train_x)
    except ValueError:
        misc.msg('Incompatible number of features!', 'err')
        sys.exit(1)
    kwargs = {'tree_method': 'hist' if not gpu else 'gpu_hist',
              'grow_policy': 'lossguide' if useloss else 'depthwise'} \
        if not exact else {}
    s_model = xgb.XGBClassifier(objective=model.objective, n_jobs=-1, n_estimators=model.n_estimators,
                                max_depth=model.max_depth, learning_rate=model.learning_rate,
                                subsample=model.subsample, colsample_bytree=model.colsample_bytree,
                                min_child_weight=model.min_child_weight, gamma=model.gamma,
                                reg_alpha=model.reg_alpha, reg_lambda=model.reg_lambda,
                                max_delta_step=model.max_delta_step, random_state=model.random_state,
                                scale_pos_weight=model.scale_pos_weight, **kwargs)
    try:
        s_model.fit(s_train_x, train_y)
    except KeyboardInterrupt:
        misc.msg('Feature selection interrupted', 'warn')
        sys.exit(0)
    s_test_x = selection.transform(test_x)
    s_pred_y = s_model.predict(s_test_x)
    s_accuracy = accuracy_score(test_y, s_pred_y)
    subset = str(list(reversed(indices[i:]))).replace(',', ';')
    report.append([t, s_train_x.shape[1], s_accuracy, subset])
    if s_accuracy < args.autosel:
        iterator.close()
        misc.msg('Accuracy below threshold ({:.6f})'.format(s_accuracy), 'warn')
        misc.msg('Feature subset: {}'.format(conv.values2ranges(indices[i:])))
        break
    gc.collect()

Can anyone reproduce this behaviour?
Many thanks in advance!


trivialfis · 2020-05-12T08:21:32Z

Hi, could you please post a more complete script that I can run?

GuidoBartoli · 2020-05-12T10:09:15Z

Sure, I will post it here this afternoon, so you can take a look at it.

Thanks!

GuidoBartoli · 2020-05-13T07:18:10Z

This is a minimal test.py:

from h5py import File
from joblib import load
from sklearn.feature_selection import SelectFromModel

if __name__ == '__main__':
    h5 = File('dataset.h5', 'r')
    data = h5['data'][:]
    model = load('model.mdl')
    selection = SelectFromModel(model, threshold=0.95, prefit=True).transform(data)

This is the corresponding requirements.txt:

h5py==2.10.0
joblib==0.14.1
numpy==1.18.4
scikit-learn==0.23.0
scipy==1.4.1
six==1.14.0
threadpoolctl==2.0.0
xgboost==1.0.2

Here are the dataset and model, to be unzipped in the same folder as the script. The model is an xgb.XGBClassifier previously trained on the same data with the standard fit() function.

You can reproduce the reported problem with python test.py.

arthurnage · 2020-06-01T17:51:17Z

@GuidoBartoli Hi, I had the same problem with xgboost==1.0.0. Upgrading to the recent 1.1.0 release fixed it.
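If upgrading is not immediately possible, one possible workaround is to skip SelectFromModel and build the column mask directly from the importances. The sketch below assumes that model.feature_importances_ still works on the affected version (the loop in the original report reads it before the failure occurs) and that keeping features whose importance is greater than or equal to the threshold matches what SelectFromModel does with a numeric threshold. The helper name select_by_importance and the toy data are hypothetical and not part of either library.

```python
import numpy as np


def select_by_importance(X, importances, threshold):
    """Keep the columns of X whose importance is >= threshold.

    Hypothetical helper: it only needs an array of importances,
    so it never touches the estimator's coef_ property.
    """
    X = np.asarray(X)
    importances = np.asarray(importances, dtype=float)
    if X.shape[1] != importances.shape[0]:
        raise ValueError(
            "X has {} columns but {} importances were given".format(
                X.shape[1], importances.shape[0]
            )
        )
    mask = importances >= threshold
    return X[:, mask], mask


# Toy data standing in for train_x and model.feature_importances_.
X = np.arange(12).reshape(3, 4)
importances = np.array([0.05, 0.40, 0.15, 0.40])

X_sel, mask = select_by_importance(X, importances, threshold=0.15)
print(mask)         # boolean mask over the original columns
print(X_sel.shape)  # rows are unchanged, only columns are dropped
```

In the original loop this would replace the two selection.transform(...) calls, with the same mask applied to both train_x and test_x.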

hcho3 · 2020-06-17T21:32:25Z

The issue is fixed in #5505 and the example script runs fine on XGBoost 1.1.0.
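For readers wondering why the error escaped scikit-learn at all: the traceback shows a getattr(estimator, "coef_", None) lookup, and in Python a getattr call with a default only swallows AttributeError. Any other exception raised inside a property, such as the KeyError here, propagates to the caller. The sketch below illustrates this with two hypothetical stub classes; it does not reproduce the XGBoost code or describe what #5505 actually changed.

```python
class TreeModelStub:
    """Hypothetical estimator whose coef_ property fails with KeyError."""

    @property
    def coef_(self):
        # Stand-in for a parsed tree dump, which has no 'weight' entry.
        dump = {"nodeid": 0, "leaf": 0.1}
        return dump["weight"]  # raises KeyError, not AttributeError


class AttributeErrorStub:
    """Hypothetical estimator that signals 'no coef_' with AttributeError."""

    @property
    def coef_(self):
        raise AttributeError("coef_ is only defined for linear models")


def probe(estimator):
    """Mimic the getattr-with-default lookup seen in the traceback."""
    try:
        return getattr(estimator, "coef_", None)
    except KeyError as exc:
        # getattr's default did not help: the KeyError reached us.
        return "KeyError escaped: {}".format(exc)


print(probe(TreeModelStub()))       # the KeyError is not swallowed
print(probe(AttributeErrorStub()))  # the default (None) is returned
```

The second stub shows the behaviour a caller like SelectFromModel relies on: when the property raises AttributeError, the lookup quietly falls back to the default and the caller can move on to feature_importances_.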

GuidoBartoli · 2020-06-18T05:07:13Z

> The issue is fixed in #5505 and the example script runs fine on XGBoost 1.1.0.

Perfect, many thanks!

hcho3 closed this as completed Jun 17, 2020
