Unequal shape between Y and Y pred in scoring function. #1598

LeSasse · 2022-10-13T13:12:34Z

Describe the bug

I suspect that y and y_pred are passed to scoring functions with different shapes. This can cause some custom scorers to fail if users are unaware of it.
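A minimal NumPy sketch of why such a mismatch can break a custom scorer without raising an error. This is not auto-sklearn code, and the arrays are made-up toy data. Subtracting an (n, 1) array from an (n,) array broadcasts to (n, n) instead of pairing elements one to one:

```python
import numpy as np

# Toy data (hypothetical): 1-D targets and column-shaped predictions,
# mimicking the (33,) vs (33, 1) mismatch described in this issue.
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([[1.0], [2.0], [4.0]])

# Broadcasting pairs every target with every prediction: shape (3, 3), not (3,).
diff = y_true - y_pred
print(diff.shape)  # (3, 3)

# A naive mean absolute error therefore silently averages over all 9 pairs.
naive_mae = np.mean(np.abs(y_true - y_pred))

# Flattening the predictions first gives the intended element-wise result.
mae = np.mean(np.abs(y_true - y_pred.ravel()))
print(naive_mae, mae)
```

The naive version returns a plausible-looking number, which is what makes this kind of bug hard to spot.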

To Reproduce

Steps to reproduce the behavior:

I have X (shape (100, 10)) and y (shape (100,)), generated with scikit-learn's make_regression.

I fit the following model:

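from autosklearn.metrics import mean_absolute_error, mean_squared_error
from autosklearn.regression import AutoSklearnRegressor

automl = AutoSklearnRegressor(
    time_left_for_this_task=40 * 60,  # 40 minutes in total
    per_run_time_limit=20 * 60,  # at most 20 minutes per model
    metric=corr_func_auto_scorer,  # custom scorer, see below
    scoring_functions=[
        corr_func_auto_scorer,
        mean_absolute_error,
        mean_squared_error,
    ],
)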

My scoring function is defined as follows (I know it's not a very sensible metric, but that's a long story):

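from scipy.stats import pearsonr


def corr_func(y, y_pred):
    # pearsonr returns (r, p-value); keep only the correlation coefficient
    return pearsonr(y, y_pred)[0]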

I registered this with auto-sklearn's make_scorer. However, it did not work, so I did some additional debugging with the following scoring function:

def my_score_func(y, y_true):
    print(y.shape)
    print(y_true.shape)
    return 0.5

The output showed that one array had shape (33, 1) and the other had shape (33,).
The problem is that scipy.stats.pearsonr behaves differently in this case than when both inputs have the same shape.
This was somewhat difficult to debug, although now that I know about it, it's easy to handle in my scoring function.
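A minimal sketch of the workaround hinted at above: flatten both inputs before computing the correlation. To keep the example dependency-light, it computes Pearson's r with `np.corrcoef` instead of `scipy.stats.pearsonr`; for 1-D inputs both give the same coefficient. The name `corr_func_flat` is hypothetical.

```python
import numpy as np


def corr_func_flat(y, y_pred):
    """Pearson correlation that tolerates (n,) vs (n, 1) inputs (hypothetical helper)."""
    # ravel() turns both (n,) and (n, 1) arrays into plain 1-D vectors.
    y = np.ravel(np.asarray(y, dtype=float))
    y_pred = np.ravel(np.asarray(y_pred, dtype=float))
    if y.shape != y_pred.shape:
        raise ValueError(f"length mismatch: {y.shape} vs {y_pred.shape}")
    # corrcoef returns the 2x2 correlation matrix; entry [0, 1] is r(y, y_pred).
    return float(np.corrcoef(y, y_pred)[0, 1])


y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([[1.1], [1.9], [3.2], [3.9]])  # column-shaped, as in this issue
print(corr_func_flat(y_true, y_pred))
```

If you prefer to stay with SciPy, applying the same idea as `pearsonr(np.ravel(y), np.ravel(y_pred))[0]` also yields a single scalar.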

Not knowing this did cost me some time, so I thought it would be useful to let you know that this behaviour was unexpected and led to problems with this scoring function (which is, I admit, a rather weird edge case).

Here is a complete, runnable example showing what I mean:

from autosklearn.metrics import make_scorer
from autosklearn.regression import AutoSklearnRegressor
from sklearn.datasets import make_regression


def my_score_func(y, y_true):
    # Print the shapes of the two arrays auto-sklearn passes to the scorer
    print(y.shape)
    print(y_true.shape)
    return 0.0


score_func = make_scorer(
    name="my_score_func",
    score_func=my_score_func,
)

automl = AutoSklearnRegressor(
    time_left_for_this_task=5 * 60,
    per_run_time_limit=1 * 60,
    metric=score_func,
)

X, y = make_regression(n_features=10)  # 100 samples by default

automl.fit(X, y)

Expected behavior

I expected y and y_true to be passed to scoring functions with the same shape. I'm not sure whether scikit-learn behaves the same way.
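Until the shapes are consistent, one defensive option is to wrap a custom score function so that single-column 2-D inputs are flattened before it runs. A minimal sketch, assuming the score function receives the two arrays positionally as its first two arguments; `with_flat_inputs` and `shape_report` are hypothetical helpers, not part of auto-sklearn. The wrapped function could then be passed as `score_func` to `make_scorer` as in the runnable example above.

```python
import functools

import numpy as np


def with_flat_inputs(score_func):
    """Flatten (n, 1) inputs to (n,) before calling score_func (hypothetical helper)."""

    @functools.wraps(score_func)
    def wrapper(y_true, y_pred, **kwargs):
        y_true = np.asarray(y_true)
        y_pred = np.asarray(y_pred)
        # Only flatten single-column arrays; leave genuine multi-output arrays alone.
        if y_true.ndim == 2 and y_true.shape[1] == 1:
            y_true = y_true.ravel()
        if y_pred.ndim == 2 and y_pred.shape[1] == 1:
            y_pred = y_pred.ravel()
        return score_func(y_true, y_pred, **kwargs)

    return wrapper


@with_flat_inputs
def shape_report(y_true, y_pred):
    # Returns the shapes the wrapped function actually sees.
    return y_true.shape, y_pred.shape


print(shape_report(np.zeros(33), np.zeros((33, 1))))  # ((33,), (33,))
```

Flattening only single-column arrays, rather than calling `ravel()` unconditionally, avoids silently collapsing real multi-output targets into one long vector.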

Environment and installation:


OS

Ubuntu 20.04

Is your installation in a virtual environment or conda environment?

python3 venv

Python version

3.8.10

Auto-sklearn version

0.15.0

verakye · 2022-10-13T14:12:48Z

I assume what you observe here is not a difference in shape but the difference between a NumPy array (np.ndarray, which can have any number of dimensions) and a NumPy matrix (np.matrix, which is always 2-D) (https://towardsdatascience.com/6-key-differences-between-np-ndarray-and-np-matrix-objects-e3f5234ae327). The reason you get an output array instead of a single correlation value might be that your scoring function returns a matrix. However, the question remains why you are getting different types in the first place.
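One way to test this hypothesis: `np.matrix` is a subclass of `np.ndarray` that is always 2-D, so `type()` tells the two apart while `isinstance(..., np.ndarray)` does not. A minimal sketch with toy data; `describe` is a hypothetical helper. NumPy discourages `np.matrix` in new code and may emit a `PendingDeprecationWarning` when one is created, so the sketch silences that warning.

```python
import warnings

import numpy as np


def describe(a):
    """Return (type name, ndim, shape) for an array (hypothetical helper)."""
    return type(a).__name__, a.ndim, a.shape


with warnings.catch_warnings():
    # Creating an np.matrix may trigger a PendingDeprecationWarning; ignore it here.
    warnings.simplefilter("ignore", PendingDeprecationWarning)
    m = np.matrix([1.0, 2.0, 3.0])  # always 2-D: shape (1, 3)

v = np.array([1.0, 2.0, 3.0])  # plain 1-D ndarray: shape (3,)
col = v.reshape(-1, 1)  # plain 2-D ndarray: shape (3, 1)

print(describe(m))    # ('matrix', 2, (1, 3))
print(describe(v))    # ('ndarray', 1, (3,))
print(describe(col))  # ('ndarray', 2, (3, 1))
print(isinstance(m, np.ndarray))  # True: matrix subclasses ndarray
```

Under this check, an input reported as `numpy.ndarray` with `ndim == 2` would be a plain 2-D array rather than a matrix.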

LeSasse · 2022-10-13T14:39:00Z

If I change the score function to:

def my_score_func(y, y_true):
    print(type(y))
    print(type(y_true))
    print(y.ndim)
    print(y_true.ndim)
    return 0.0

it prints out:

<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
1
2
