In multiclass classification, eval methods get a 1D array instead of a 2D array #5113

Closed
datajanko opened this issue Mar 31, 2022 · 5 comments

Comments

@datajanko

datajanko commented Mar 31, 2022

Description

When I add an evaluation set and a custom evaluation metric in the multiclass classification case, the custom metric does not receive a 2D matrix at each iteration but a flat vector.

According to the documentation, the eval function should receive y_pred with the following shapes:

y_pred numpy 1-D array of shape = [n_samples] or numpy 2-D array of shape = [n_samples, n_classes] (for multi-class task)

But in the multiclass case this does not work; see below.

Reproducible example

import numpy as np
import pandas as pd
from lightgbm import LGBMClassifier

# 7 rows, 4 classes (labels 0-3)
train_df = pd.DataFrame({
    "A": [0, 1, 2, 3, 2, 1, 0],
    "B": [0.1, 0.0, 0.3, 0.4, 0.5, 0.6, 0.7],
    "y": [0, 1, 2, 3, 0, 1, 2],
})

def dummy_metric(y_true, y_pred):
    # Inspect what LightGBM passes to a custom metric
    print(y_pred.shape)
    print(y_pred)
    return 'dummy', 0, False

X = train_df.drop('y', axis=1)
y = train_df['y']
model = LGBMClassifier(n_estimators=1, objective='multiclass')
model.fit(X, y, eval_metric=dummy_metric, eval_set=[(X, y)])

Output

(28,)
[0.30769231 0.30769231 0.30769231 0.30769231 0.30769231 0.30769231
 0.30769231 0.30769231 0.30769231 0.30769231 0.30769231 0.30769231
 0.30769231 0.30769231 0.30769231 0.30769231 0.30769231 0.30769231
 0.30769231 0.30769231 0.30769231 0.07692308 0.07692308 0.07692308
 0.07692308 0.07692308 0.07692308 0.07692308]
[1]	valid_0's multi_logloss: 1.3767	valid_0's dummy: 0
Out[41]:
LGBMClassifier(n_estimators=1, objective='multiclass')

So we see an array of shape (28,), i.e. 4 (classes) * 7 (rows) values, rather than a 2D array.
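To make the flat layout concrete, here is a minimal NumPy sketch (not part of the original report) that rebuilds the 28 printed values and reshapes them. It assumes the class-major layout that the LightGBM 3.3.2 documentation describes for multiclass custom metrics, where the value for row i and class j sits at y_pred[j * num_data + i]. Under that assumption, a column-major (order='F') reshape recovers one sample per row.

```python
import numpy as np

# The 28 printed values: 21 copies of 4/13 (0.30769231)
# followed by 7 copies of 1/13 (0.07692308).
flat = np.concatenate([np.full(21, 4 / 13), np.full(7, 1 / 13)])

n_samples, n_classes = 7, 4

# Class-major layout: the first 7 values are class 0 for every row,
# the next 7 are class 1, and so on. Column-major (Fortran) order
# therefore puts one sample per row and one class per column.
proba = flat.reshape(n_samples, n_classes, order="F")
print(proba.shape)        # (7, 4)
print(proba.sum(axis=1))  # each row sums to 1 (up to rounding)

# Row-major (C) order mixes classes and samples, so rows do not sum to 1.
wrong = flat.reshape(n_samples, n_classes)
print(wrong.sum(axis=1))
```

Each row of `proba` comes out as [4/13, 4/13, 4/13, 1/13], which is a valid probability distribution over the four classes.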

Environment info

LightGBM version or commit hash:
3.3.2

Command(s) you used to install LightGBM

pip install lightgbm

Additional Comments

I wanted to implement something like a (continuous) ranked probability score function. This requires computing the cumulative sum across axis=1, which fails on the flat array. The errors seem to originate around the __inner_eval function.

We can see the code that does the reshaping here:

if self.__num_class > 1:
    num_data = result.size // self.__num_class
    result = result.reshape(num_data, self.__num_class, order='F')

but for whatever reason, we don't seem to enter that branch 🤔
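For context, here is a minimal sketch of the kind of metric described in the additional comments: a ranked probability score for ordered classes, which needs `np.cumsum(..., axis=1)` on an (n_samples, n_classes) array. The function name `ranked_probability_score` is hypothetical, and dividing by (n_classes - 1) is one common normalization convention, assumed here rather than taken from the report. Because it unpacks a 2D shape, this function also fails on the flat (28,) array that 3.3.2 passes.

```python
import numpy as np


def ranked_probability_score(y_true, proba):
    """Mean ranked probability score for ordered classes labeled 0..K-1.

    proba must be a 2D array of shape (n_samples, n_classes).
    Per sample: squared distance between the predicted and observed
    cumulative distributions, divided by (n_classes - 1).
    """
    y_true = np.asarray(y_true, dtype=int)
    proba = np.asarray(proba, dtype=float)
    n_samples, n_classes = proba.shape  # fails on a flat 1D array
    # One-hot encode the observed class of each sample
    observed = np.zeros_like(proba)
    observed[np.arange(n_samples), y_true] = 1.0
    # Cumulative sums across the class axis (axis=1)
    cdf_pred = np.cumsum(proba, axis=1)
    cdf_obs = np.cumsum(observed, axis=1)
    per_sample = np.sum((cdf_pred - cdf_obs) ** 2, axis=1) / (n_classes - 1)
    return per_sample.mean()


# Example: two samples, three ordered classes
y_true = np.array([0, 2])
proba = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])
print(ranked_probability_score(y_true, proba))
```

A perfect prediction scores 0, and putting all probability mass on the class farthest from the true one scores 1.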

@jmoralez
Collaborator

Hi @datajanko, thank you for the very detailed description. The change to using 2D collections in multiclass classification was merged after the 3.3.2 release, so it isn't available in that version. If you want to use it, you can install from GitHub or from the nightly builds. If you're unable to do that, you have to work with 1D arrays as described in the note here. Please let us know if you have further questions.

@datajanko
Author

datajanko commented Mar 31, 2022

I'm sorry, I thought I was using the latest version and didn't expect this to change so drastically. I also didn't find an open issue that mentions this. I'll just use the workaround I hinted at above in my metric, i.e. reshaping using Fortran order. I guess we can close this issue, right? Thanks for the quick response!
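The workaround can be sketched as a custom metric that accepts either layout: it reshapes a flat array with order='F' (the same reshape as in the LightGBM snippet quoted in the report) and passes 2D input through unchanged. The helper `as_2d_proba` and the example metric `mean_true_class_proba` are hypothetical names for illustration. The sketch assumes a flat y_pred is grouped by class first, as in 3.3.2.

```python
import numpy as np


def as_2d_proba(y_true, y_pred):
    """Return predictions as an (n_samples, n_classes) array.

    Assumes a flat y_pred is grouped by class first, i.e. the value for
    row i and class j is y_pred[j * n_samples + i] (LightGBM 3.3.2 layout).
    """
    y_pred = np.asarray(y_pred)
    if y_pred.ndim == 2:
        # Later LightGBM versions already pass a 2D array
        return y_pred
    n_samples = len(y_true)
    # Column-major (Fortran) order maps flat index j * n_samples + i to [i, j]
    return y_pred.reshape(n_samples, -1, order="F")


def mean_true_class_proba(y_true, y_pred):
    """Hypothetical example metric: mean probability assigned to the true class."""
    proba = as_2d_proba(y_true, y_pred)
    labels = np.asarray(y_true, dtype=int)  # labels may arrive as floats
    score = proba[np.arange(len(labels)), labels].mean()
    # (eval_name, eval_result, is_higher_better), as in dummy_metric above
    return "mean_true_class_proba", float(score), True
```

It can then be passed as `eval_metric=mean_true_class_proba` to `LGBMClassifier.fit`, in the same way as `dummy_metric` in the reproducible example. Checking `ndim` instead of the installed version keeps the metric working after an upgrade.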

@jmoralez
Collaborator

No worries. Thanks for the detailed report.

@jameslamb
Collaborator

+1, AWESOME report with a reproducible example; we really, really appreciate the time you put into writing that up.

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023