
IndexError: too many indices for array #1

Open
jiandanjinxin opened this issue Sep 20, 2017 · 0 comments

Dear Taccio Yamamoto,
The project is remarkable. I have run into a problem and would appreciate your help.
My data is creditcard.csv from the Kaggle fraud-detection dataset. The preprocessing follows https://github.com/yazanobeidi/fraud-detection/blob/master/project.ipynb

When I run the following code, I get the error below.

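# Assumed context (not shown here): rus, ros and sm are imbalanced-learn samplers,
# aucpr is a scorer, and X_train/y_train come from the preprocessing notebook.
from sklearn import linear_model, model_selection
from imblearn.pipeline import Pipeline  # assumed: sklearn's own Pipeline rejects samplers

for sampler in [None, rus, ros, sm]:
    if sampler is None:
        pipeline = Pipeline([('lr', linear_model.LogisticRegression(n_jobs=-1))])
    else:
        pipeline = Pipeline([('sampler', sampler), ('lr', linear_model.LogisticRegression(n_jobs=-1))])

    # Grid over the 'C' parameter of the 'lr' step ('<step>__<param>' naming)
    parameters = {'lr__C': [.0001, .001, .01, .1, .5, .75, 1, 2.5, 5, 10]}
    gscv = model_selection.GridSearchCV(cv=3, error_score='raise', n_jobs=-1,
                                        scoring=aucpr,
                                        verbose=True,
                                        estimator=pipeline,
                                        param_grid=parameters)
    gscv.fit(X_train, y_train)
    print(gscv.best_estimator_)
    print(gscv.best_score_)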
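The grid key `lr__C` relies on the pipeline's `<step>__<parameter>` naming: the part before the first double underscore names the step (`lr`) and the rest names that step's parameter (`C`). The cell shown in the traceback uses `lr_C` with a single underscore, which may only be a copy-paste or rendering artifact; written that way, the key would not be routed to the `lr` step. The minimal pure-Python sketch below mimics that routing rule; `route_params` is a hypothetical helper for illustration, not a scikit-learn function.

```python
def route_params(params):
    """Group flat grid keys such as 'lr__C' by pipeline step name.

    Illustrative only: splits each key at the first double underscore,
    mirroring the '<step>__<param>' convention. Keys without '__'
    (for example 'lr_C') do not name a step and are collected separately.
    """
    routed = {}
    unrouted = []
    for key, value in params.items():
        step, sep, param = key.partition("__")
        if not sep:
            # No double underscore: the key cannot be matched to a step
            unrouted.append(key)
            continue
        routed.setdefault(step, {})[param] = value
    return routed, unrouted


routed, unrouted = route_params({"lr__C": 0.1, "lr_C": 0.1})
print(routed)    # {'lr': {'C': 0.1}}
print(unrouted)  # ['lr_C']
```

Splitting only at the first `__` keeps deeper keys such as `a__b__c` intact as `b__c`, so they can be routed again to a nested estimator.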

Fitting 3 folds for each of 10 candidates, totalling 30 fits

IndexError Traceback (most recent call last)
in ()
10 parameters = {'lr_C':[.0001, .001, .01, .1, .5, .75, 1, 2.5, 5, 10]}
11 gscv = GridSearchCV(cv=3, error_score='raise', n_jobs=-1,scoring=aucpr, verbose=True,estimator=pipeline, param_grid=parameters)
---> 12 gscv.fit(XX_train, YY_train)
13 print(gscv.best_estimator_)
14 print(gscv.best_score_)

/usr/local/lib/python2.7/dist-packages/sklearn/model_selection/_search.pyc in fit(self, X, y, groups, **fit_params)
636 error_score=self.error_score)
637 for parameters, (train, test) in product(candidate_params,
--> 638 cv.split(X, y, groups)))
639
640 # if one choose to see train score, "out" will contain train score info

/usr/local/lib/python2.7/dist-packages/sklearn/model_selection/_split.pyc in split(self, X, y, groups)
330 n_samples))
331
--> 332 for train, test in super(_BaseKFold, self).split(X, y, groups):
333 yield train, test
334

/usr/local/lib/python2.7/dist-packages/sklearn/model_selection/_split.pyc in split(self, X, y, groups)
93 X, y, groups = indexable(X, y, groups)
94 indices = np.arange(_num_samples(X))
---> 95 for test_index in self._iter_test_masks(X, y, groups):
96 train_index = indices[np.logical_not(test_index)]
97 test_index = indices[test_index]

/usr/local/lib/python2.7/dist-packages/sklearn/model_selection/_split.pyc in _iter_test_masks(self, X, y, groups)
624
625 def _iter_test_masks(self, X, y=None, groups=None):
--> 626 test_folds = self._make_test_folds(X, y)
627 for i in range(self.n_splits):
628 yield test_folds == i

/usr/local/lib/python2.7/dist-packages/sklearn/model_selection/_split.pyc in _make_test_folds(self, X, y)
611 for test_fold_indices, per_cls_splits in enumerate(zip(*per_cls_cvs)):
612 for cls, (_, test_split) in zip(unique_y, per_cls_splits):
--> 613 cls_test_folds = test_folds[y == cls]
--> 613 cls_test_folds = test_folds[y == cls]
614 # the test split can be too big because we used
615 # KFold(...).split(X[:max(c, n_splits)]) when data is not 100%

IndexError: too many indices for array
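The last frame fails at `test_folds[y == cls]`, where `test_folds` is a 1-D array with one entry per sample. One common way to hit this (an assumption, since the report does not show the shape of `YY_train`) is passing the labels as a 2-D column vector of shape `(n_samples, 1)`: `y == cls` is then a 2-D boolean mask, and indexing a 1-D array with it raises exactly this `IndexError`. The NumPy sketch below reproduces that; `assign_test_folds` is a hypothetical, simplified stand-in for the masking step, not scikit-learn's actual fold-assignment code.

```python
import numpy as np

# Hypothetical labels shaped like a column vector (n_samples, 1), e.g. the
# result of df[['Class']].values; this shape is assumed, not confirmed.
y_column = np.array([[0], [1], [0], [1], [0], [1]])


def assign_test_folds(y, n_splits=3):
    """Simplified stand-in for stratified fold assignment: give each class's
    samples fold ids 0..n_splits-1 in round-robin order."""
    y = np.asarray(y)
    test_folds = np.zeros(y.shape[0], dtype=int)
    for cls in np.unique(y):
        cls_count = int(np.sum(y == cls))
        # Same indexing pattern as the failing frame: test_folds[y == cls]
        test_folds[y == cls] = np.arange(cls_count) % n_splits
    return test_folds


error_message = None
try:
    assign_test_folds(y_column)  # 2-D mask on a 1-D array
except IndexError as exc:
    error_message = str(exc)
    print("IndexError:", exc)

# Flattening the labels to shape (n_samples,) gives a 1-D mask instead.
fixed_folds = assign_test_folds(y_column.ravel())
print(fixed_folds)  # [0 0 1 1 2 2]
```

If `YY_train` does turn out to be a column vector, flattening it before the call, e.g. `gscv.fit(XX_train, np.ravel(YY_train))`, should avoid the error; printing `YY_train.shape` first would confirm or rule out this explanation.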

I look forward to your reply. Thanks!
