I assume what you observe here is not a difference in shape but the difference between a NumPy array (which can have any dimensionality) and a NumPy matrix (which is always 2-D) (https://towardsdatascience.com/6-key-differences-between-np-ndarray-and-np-matrix-objects-e3f5234ae327). The reason you are getting an output array instead of a single correlation value might be that your scoring function gives you back a matrix. However, the question of course remains why you are getting different types to begin with.
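For context, the array/matrix distinction referenced above can be seen directly (a minimal illustration; the variable names are mine):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])   # plain ndarray: 1-D here, any rank in general
m = np.matrix([1.0, 2.0, 3.0])  # np.matrix: always exactly 2-D (and discouraged
                                # by the NumPy docs in favour of ndarray)

print(a.shape, m.shape)  # (3,) vs (1, 3)
```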
Describe the bug
Please describe the bug you're experiencing as precisely as possible.
I suspect that y_true and y_pred are handed to scoring functions with unequal shapes. This can
cause some custom scorers to fail if users are unaware.
To Reproduce
Steps to reproduce the behavior:
I have X (shape (100, 10)) and y (shape (100,)) obtained using scikit-learn's make_regression
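For reference, data with these shapes would come from a call like the following (only the shapes are given in the report; n_features and random_state are my guesses):

```python
from sklearn.datasets import make_regression

# 100 samples, 10 features -> X has shape (100, 10), y has shape (100,)
X, y = make_regression(n_samples=100, n_features=10, random_state=0)
print(X.shape, y.shape)
```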
I fit the following model:
my scoring function is defined as:
(I know it's not such a sensible metric, but long story.)
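The scorer itself did not survive the page extraction; a plausible reconstruction, assuming a Pearson-correlation metric built on scipy.stats.pearsonr (the name pearson_score and its exact form are my assumptions):

```python
import numpy as np
from scipy import stats

def pearson_score(y_true, y_pred):
    # Assumed form of the custom metric: Pearson correlation between
    # ground truth and predictions (higher is better).
    r, _ = stats.pearsonr(np.asarray(y_true), np.asarray(y_pred))
    return r

# Registration with auto-sklearn would then look roughly like
# (not executed here, since it needs an auto-sklearn installation):
# from autosklearn.metrics import make_scorer
# pearson = make_scorer("pearson", pearson_score)
```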
I registered this using auto-sklearn's make_scorer.
However, somehow this was not working, so I did
some additional debugging with the following scoring function:
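The debugging scorer was likewise stripped from the page; a minimal stand-in that just reports the shapes auto-sklearn hands over might look like this (the name debug_score is hypothetical):

```python
import numpy as np

def debug_score(y_true, y_pred):
    # Report the shapes the framework actually passes to the metric,
    # then return a dummy value so the run can continue.
    print("y_true shape:", np.shape(y_true), "y_pred shape:", np.shape(y_pred))
    return 0.0
```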
Now, the output showed me that one had shape (33, 1) and the other (33,).
The problem is that scipy.stats.pearsonr behaves differently in this case from when the input shapes are the same.
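To illustrate why the shapes matter: elementwise operations between a (33, 1) and a (33,) array broadcast to a (33, 33) array, so correlation-style computations can silently operate on the wrong data. A minimal sketch, with the flattening workaround:

```python
import numpy as np
from scipy import stats

y_true = np.arange(33, dtype=float)  # shape (33,)
y_pred = y_true.reshape(-1, 1)       # shape (33, 1)

# Elementwise arithmetic broadcasts the pair to a (33, 33) array,
# which is why shape-naive statistics can go wrong without any error.
assert (y_true * y_pred).shape == (33, 33)

# Flattening both inputs restores the intended 1-D computation.
r, _ = stats.pearsonr(np.ravel(y_true), np.ravel(y_pred))
print(r)  # identical data once flattened, so r is 1.0
```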
This was a bit difficult to debug, although arguably, now that I know, it's easy to deal with in my scoring function.
But not knowing this did cost some time, so I thought it would be useful for you to know that this behaviour was a bit unexpected and led
to problems for this scoring function (which is, I admit, a rather weird edge case).
Here I also add a completely runnable example showcasing what I mean:
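The original runnable example was lost in extraction; the following is a stand-in that reproduces the symptom with plain scikit-learn rather than auto-sklearn, where an explicit reshape simulates the column-vector predictions the report describes:

```python
import numpy as np
from scipy import stats
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=100, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=0)

model = LinearRegression().fit(X_tr, y_tr)

# Simulate the reported situation: predictions arrive as a (33, 1)
# column vector while the ground truth stays 1-D with shape (33,).
y_pred = model.predict(X_te).reshape(-1, 1)
print(y_te.shape, y_pred.shape)

# Flattening both arrays before the correlation sidesteps the mismatch.
r, _ = stats.pearsonr(np.ravel(y_te), np.ravel(y_pred))
print(r)
```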
Expected behavior
A clear and concise description of what you expected to happen.
I expected y_true and y_pred to be handed to scoring functions with the same shape. Not sure whether this is also the case for scikit-learn.
Environment and installation:
Please give details about your installation:
OS: Ubuntu 20.04
Installation: python3 venv
Python version: 3.8.10
auto-sklearn version: 0.15.0