Unequal shape between Y and Y pred in scoring function. #1598

Open
LeSasse opened this issue Oct 13, 2022 · 2 comments

@LeSasse

LeSasse commented Oct 13, 2022

Describe the bug

Please describe the bug you're experiencing as precisely as possible.
I suspect that Y and Y pred are handed to scoring functions with unequal shapes. This can
lead some custom scorers to fail if users are unaware.

To Reproduce

Steps to reproduce the behavior:

I have X (shape: (100, 10)) and y (shape: (100,)) obtained using scikit-learn's make_regression.

I fit the following model:

AutoSklearnRegressor(
    time_left_for_this_task=40 * 60,
    per_run_time_limit=20 * 60,
    metric=corr_func_auto_scorer,
    scoring_functions=[
        corr_func_auto_scorer,
        mean_absolute_error,
        mean_squared_error,
    ],
)

My scoring function is defined as follows
(I know it's not such a sensible metric, but it's a long story):

from scipy.stats import pearsonr

def corr_func(y, y_pred):
    return pearsonr(y, y_pred)[0]

I registered this using auto-sklearn's make_scorer.
However, this somehow was not working, so I did
some additional debugging with the following scoring function:

def my_score_func(y, y_true):
    print(y.shape)
    print(y_true.shape)
    return 0.5

Now, the output showed me that one had shape (33, 1) and the other (33,).
The problem is that scipy.stats.pearsonr behaves differently in this case than when the input shapes are the same.
This was a bit difficult to debug, although arguably, now that I know, it's easy to deal with in my scoring function.

But not knowing this did cost some time, so I thought it would be useful for you to know that this behaviour was a bit unexpected and led
to some problems for this scoring function (which is, I admit, a rather weird edge case).
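
To illustrate why a (33, 1) versus (33,) mismatch can go unnoticed, here is a minimal NumPy-only sketch. It does not call auto-sklearn or scipy; the arrays are made up, and it assumes the 1-D array plays the role of the targets and the 2-D one the predictions. It only shows that elementwise arithmetic broadcasts the two shapes against each other instead of raising an error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for what the scorer receives.
y_true = rng.normal(size=33)             # shape (33,)
y_pred = y_true.reshape(-1, 1) + 0.1     # shape (33, 1)

# Broadcasting pairs every target with every prediction,
# so the result is a 33 x 33 grid rather than 33 residuals.
residual_bad = y_true - y_pred
print(residual_bad.shape)

# Flattening the 2-D column first restores the intended pairing.
residual_ok = y_true - y_pred.ravel()
print(residual_ok.shape)
```

Any metric built from such elementwise operations may therefore return a wrong value, or an array instead of a scalar, without ever raising.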

Here I also add a completely runnable example showcasing what I mean:

from autosklearn.metrics import make_scorer
from autosklearn.regression import AutoSklearnRegressor
from sklearn.datasets import make_regression


def my_score_func(y, y_true):
    print(y.shape)
    print(y_true.shape)
    return 0.0


score_func = make_scorer(
    name="my_score_func",
    score_func=my_score_func,
)

automl = AutoSklearnRegressor(
    time_left_for_this_task=5 * 60,
    per_run_time_limit=1 * 60,
    metric=score_func,
)

X, y = make_regression(n_features=10)

automl.fit(X, y)

Expected behavior

A clear and concise description of what you expected to happen.

I expected y and y_true to be handed to scoring functions with the same shape. I am not sure whether this is also the case for scikit-learn.

Environment and installation:

Please give details about your installation:

  • OS

Ubuntu 20.04

  • Is your installation in a virtual environment or conda environment?

python3 venv

  • Python version

3.8.10

  • Auto-sklearn version

0.15.0

@verakye

verakye commented Oct 13, 2022

I assume what you observe here is not a difference in shape but the difference between a NumPy array (np.ndarray, which can have any dimensionality) and a NumPy matrix (np.matrix, which is always 2D) (https://towardsdatascience.com/6-key-differences-between-np-ndarray-and-np-matrix-objects-e3f5234ae327). The reason why you are getting an output array instead of one correlation value might be that your scoring function seems to give you back a matrix. However, the question of course remains why you are getting different types to begin with.

@LeSasse
Author

LeSasse commented Oct 13, 2022

If I change the score function to:

def my_score_func(y, y_true):
    print(type(y))
    print(type(y_true))
    print(y.ndim)
    print(y_true.ndim)
    return 0.0

it prints out:

<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
1
2
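
Since both arguments are plain ndarrays that differ only in ndim, one possible workaround on the user side is to flatten both inputs before scoring. The sketch below is an assumption-laden illustration, not auto-sklearn's behaviour or API: flatten_targets and safe_corr_func are hypothetical names, the 2-D argument is assumed to be a single column, and np.corrcoef stands in for scipy.stats.pearsonr to keep the example NumPy-only.

```python
import functools

import numpy as np


def flatten_targets(score_func):
    """Hypothetical helper: hand both inputs to score_func as 1-D arrays."""

    @functools.wraps(score_func)
    def wrapped(y_true, y_pred):
        y_true = np.asarray(y_true).ravel()
        y_pred = np.asarray(y_pred).ravel()
        # Refuse inputs that still disagree after flattening,
        # e.g. multi-output predictions.
        if y_true.shape != y_pred.shape:
            raise ValueError(
                f"shape mismatch after flattening: {y_true.shape} vs {y_pred.shape}"
            )
        return score_func(y_true, y_pred)

    return wrapped


def corr_func(y_true, y_pred):
    # Pearson correlation via NumPy (swapped in for scipy.stats.pearsonr).
    return float(np.corrcoef(y_true, y_pred)[0, 1])


safe_corr_func = flatten_targets(corr_func)

# Usage with a 1-D target and a single-column 2-D prediction.
y = np.arange(10.0)
score = safe_corr_func(y, (2 * y + 1).reshape(-1, 1))
print(score)
```

The wrapped function could then be passed as score_func to make_scorer in the same way as in the runnable example above.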
