ENH: Add default losses to KerasClassifier and KerasRegressor #208
Conversation
Thank you for the PR. I left some more general questions.
tests/test_simple_usage.py
* binary classification
* one hot classification
* single class classification
Does this mean we do not support one output with multiple classes? Or am I getting confused by the usage of outputs vs. output units?

The most common way to set up single-target multi-class problems in Keras is with `output=Dense(n_classes, activation="softmax")` and one of `categorical_crossentropy` or `sparse_categorical_crossentropy`:
```python
def clf():
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Input(shape=(FEATURES,)))
    model.add(tf.keras.layers.Dense(N_CLASSES, activation="softmax"))
    model.compile(loss="categorical_crossentropy")  # or "sparse_categorical_crossentropy"
    return model
```
Would this use case no longer be supported?
I think I see now. This should work!
I've made the use cases where this works clearer in 9358d6e. It works for all major use cases.
This PR simply changes the default loss; it doesn't change compatibility in any way.
scikeras/wrappers.py
```python
def _fit_keras_model(self, *args, **kwargs):
    try:
        super()._fit_keras_model(*args, **kwargs)
    except ValueError as e:
        if (
            self.loss == "categorical_crossentropy"
            and hasattr(self, "model_")
            and 1 in {o.shape[1] for o in getattr(self.model_, "outputs", [])}
        ):
            raise ValueError(
                "The model is configured to have one output, but the "
                f"loss='{self.loss}' is expecting multiple outputs "
                "(which is often used with one-hot encoded targets). "
                "More detail on Keras losses: https://keras.io/api/losses/"
            ) from e
        else:
            raise e
```
This seems like it should live in `_check_model_compatibility`, or they should be merged in some way.
This error message only provides marginal utility: it protects against cases when the model has one output but there are multiple classes.

It cannot go in `_check_model_compatibility`; I wait for an error to be raised before issuing this warning (otherwise a model with a single output always raises an error).
Got it. Is there a specific error message we can check for, like `if "some Keras error" in str(e)`?
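For illustration, a minimal sketch of what that could look like (the substring is an assumption; the exact Keras wording would need to be verified against the version in use):

```python
def _fit_keras_model(self, *args, **kwargs):
    try:
        super()._fit_keras_model(*args, **kwargs)
    except ValueError as e:
        # Hypothetical substring; Keras raises something like
        # "Shapes (None, N) and (None, 1) are incompatible" for this mismatch.
        if "are incompatible" in str(e):
            raise ValueError("<more specific SciKeras message>") from e
        raise
```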
> `getattr(self.model_, "outputs", [])`

Is this necessary? `model_` should always have an `outputs` attribute, except in the case described in #207, but that should be a separate check/error.
f"loss='{self.loss}' is expecting multiple outputs "
Can you clarify what you mean by a loss expecting a number of outputs? My understanding is that Keras "broadcasts" losses to outputs, so if you give it a scalar loss (ie.. loss="bce"
) with 2 outputs (i.e. len(model_.outputs) == 2
), it will implicitly compile the model with loss=[original_loss] * len(outputs)
. But you can actually map losses to outputs manually, by passing loss=["bce", "mse"]
or loss={"out1": "bce", "out2": "mse"}
. From the tests, it seems like by "loss is expecting multiple outputs" you mean that there is a single output unit but multiple classes, which I feel like could be confused with the above concept of configuring a loss for multiple outputs.
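For concreteness, a minimal runnable sketch of the broadcasting vs. explicit mapping described above (the two-output model is hypothetical):

```python
import tensorflow as tf

# A small two-output functional model, with named outputs so losses
# can be mapped to them by name.
inp = tf.keras.Input(shape=(4,))
hidden = tf.keras.layers.Dense(8, activation="relu")(inp)
out1 = tf.keras.layers.Dense(1, activation="sigmoid", name="out1")(hidden)
out2 = tf.keras.layers.Dense(1, name="out2")(hidden)
model = tf.keras.Model(inputs=inp, outputs=[out1, out2])

# A scalar loss is broadcast to every output...
model.compile(loss="binary_crossentropy")

# ...or losses can be mapped to outputs explicitly, by position or by name.
model.compile(loss=["binary_crossentropy", "mse"])
model.compile(loss={"out1": "binary_crossentropy", "out2": "mse"})
```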
I'm also curious about the iteration through outputs (`1 in {o.shape[1] for o in self.model_.outputs}`). SciKeras does not support >1 output out of the box (users need to override `target_encoder`), so it seems a bit strange to try to account for that when using the default loss. I feel that using the default loss should only be supported for the simple single-output cases that `target_encoder` supports.
As a side note: I think giving users better errors and validating their inputs like you are doing here can be a very valuable part of SciKeras, but currently it is done in an ad-hoc manner via `_check_model_compatibility`, etc. If we add more of these types of checks, it would be nice to have an organized interface for them. I opened #209 to try to brainstorm ideas for this.
Should we do anything for models with a single output and a single linear output unit and 2 classes? This is a subset of your error catching / test, but Keras won't raise a ValueError for it. Instead, it will train and not learn anything:

```python
import numpy as np
import tensorflow as tf
from scikeras.wrappers import KerasClassifier

N_CLASSES = 2
FEATURES = 1
n_eg = 10000
y = np.random.choice(N_CLASSES, size=n_eg).astype(int)
X = y.reshape(-1, 1).astype("float32")

def clf():
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Input(shape=(FEATURES,)))
    model.add(tf.keras.layers.Dense(4))
    model.add(tf.keras.layers.Dense(1))
    return model

model = KerasClassifier(clf, loss="categorical_crossentropy", epochs=50)
model.fit(X, y).score(X, y)  # ~ 0.5
```
Looks like you're making some progress, thank you for keeping at it. If you're bumping up against annoying or hard-to-debug tests, let me know; I'm happy to jump in and try to iron it out.
I've got most of the issues resolved. The tests on fully compliant Scikit-learn estimators are failing now; I'd appreciate some help.
> A warning is issued for a binary classification target but `loss="categorical_crossentropy"`; the user might have compiled their own model.

👍

> I removed the GitHub Actions dependency; why should the tests depend on linting? It's really annoying.

This is a somewhat arbitrary choice. I'm open to removing it, but I'd like that done in a separate PR. If you use the git pre-commit hook, you should never be annoyed by linting, because you won't even be able to make a commit that fails linting (unless you manually override it).

> The tests on fully compliant Scikit-learn estimators are failing now

I'll take a look.
```python
est = KerasRegressor(model=shallow_net, model__single_output=True)
assert est.loss == "mse"
est.partial_fit(X, y)
assert est.model_.loss.__name__ == "mean_squared_error"
```
Is the assertion that the "long" name for the loss was used in the model necessary here? I don't see the same assertion for classifiers.
This assert statement is present to make sure that `BaseWrapper.loss` is mirrored in `BaseWrapper.model_.loss`. I'll add a test for `KerasClassifier` too.
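A sketch of what that classifier test might look like, mirroring the regressor test above (`shallow_net`, `X`, and `y` as in that test; the exact model kwargs are assumptions):

```python
est = KerasClassifier(model=shallow_net)
assert est.loss == "categorical_crossentropy"  # the default added by this PR
est.partial_fit(X, y)
assert est.model_.loss.__name__ == "categorical_crossentropy"
```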
Maybe this is a good use case for `scikeras.utils.loss_name`?
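For instance (assuming `loss_name` normalizes string aliases and loss functions to the same canonical name):

```python
from scikeras.utils import loss_name

# Compare normalized names instead of reaching into __name__ directly.
assert loss_name(est.model_.loss) == loss_name(est.loss)
```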
Tests are passing now; I manually specified the loss in the failing tests. On the whole, this PR makes SciKeras more usable, though it is technically backwards incompatible for a very narrow use case.
I believe this PR is now breaking basic wrapper backwards compatibility:

```python
import numpy as np
from scikeras.wrappers import KerasClassifier
from tensorflow import keras

num_classes = 3
n_features = 4
n_samples = 100
X = np.random.random(size=(n_samples, n_features))
y = np.random.randint(low=0, high=num_classes, size=(n_samples,))

def get_model():
    model = keras.Sequential(
        [
            keras.Input((n_features,)),
            keras.layers.Dense(n_features),
            keras.layers.Dense(num_classes, activation="softmax"),
        ]
    )
    model.compile(loss="sparse_categorical_crossentropy")
    return model

# Works via Keras
get_model().fit(X, y)

# Works via the Keras wrappers
keras.wrappers.scikit_learn.KerasClassifier(get_model).fit(X, y)

# Does not work with SciKeras stsievert:clf-default-loss but does work on SciKeras master
KerasClassifier(get_model).fit(X, y)  # InvalidArgumentError: logits and labels must have the same first dimension
```

This can be "fixed" by passing the loss to the wrapper explicitly. As to why this happens: when `loss="categorical_crossentropy"` (the new default), SciKeras one-hot encodes `y`, and the one-hot-encoded targets then conflict with the user-compiled `sparse_categorical_crossentropy`.
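For example, a sketch of that workaround against the snippet above:

```python
# Passing the model's own loss to the wrapper keeps SciKeras' target
# encoding consistent with the user-compiled model.
KerasClassifier(get_model, loss="sparse_categorical_crossentropy").fit(X, y)
```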
> ...in user compiled model with N_CLASSES outputs

This error only happens when both of the following occur: the user compiles their own model, and the wrapper's default `loss="categorical_crossentropy"` causes `y` to be one-hot encoded.

I'm going to raise an exception if the Keras model specifies a loss that conflicts with the wrapper's `loss` parameter.
Can't this also be solved by just defaulting to `loss="sparse_categorical_crossentropy"`? This also makes me wonder if the design decision to put the description of the data and the encoding/decoding into the same place was flawed, or if perhaps inspecting the compiled model would be better than relying on the user's input.
Yes.
I'll try to cook up an implementation of that in the next couple of days. Regardless, thank you for all the great work you've done on this PR so far; I think it's done a lot to illuminate our options and challenges with this feature.
I'd like to see a new function inserted after model creation and before model compiling to choose some compile arguments based on the model architecture. It'd be useful for this function to choose the loss function from the number of output neurons. That'd help a lot with choosing a default loss.
We have …
I'm thinking of this pseudo-code:

```python
assert len(self.model_.outputs) == 1
out = self.model_.outputs[0]
if self.loss == "auto":
    if out.shape[1] == 1 and y_n_unique <= 2:
        loss = "binary_crossentropy"
    elif out.shape[1] > 1 and y_encoded.ndim == 1:
        loss = "sparse_categorical_crossentropy"
    elif out.shape[1] > 1 and y_encoded.ndim > 1:
        loss = "categorical_crossentropy"
    self._compile_model(loss=loss)
```

This code isn't exact – I'm not sure if all the conditions for the loss functions are right. This requires that SciKeras compile the model (i.e., the user's model-building function returns an uncompiled model).
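As a standalone, runnable sketch of that inference (the function name and signature are hypothetical, not part of this PR):

```python
import numpy as np

def infer_default_loss(n_output_units: int, y: np.ndarray) -> str:
    # Pick a default loss from the output width and the target's shape,
    # mirroring the conditions in the pseudo-code above.
    y = np.asarray(y)
    if n_output_units == 1 and len(np.unique(y)) <= 2:
        return "binary_crossentropy"
    if n_output_units > 1 and y.ndim == 1:
        return "sparse_categorical_crossentropy"
    if n_output_units > 1 and y.ndim > 1:
        return "categorical_crossentropy"
    raise ValueError("Unable to infer a default loss for this configuration")

# Integer labels with a 3-unit softmax output:
assert infer_default_loss(3, np.array([0, 1, 2])) == "sparse_categorical_crossentropy"
```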
This seems pretty similar to what I am proposing in #210; please check that PR if you can.
See thread in #208 (review). Since we use GHA as a FOSS project, we really don't pay for usage. This will increase usage because tests will run even if linting fails, but it will help PR feedback turnaround time because failed linting won't preclude getting test data back.
Resolved via #217
What does this PR implement?

It adds `loss="categorical_crossentropy"` to KerasClassifier by default. It adds some protection in case the user passes multiple classes in `y` but the model is only configured for one class. This implementation covers the following:

* models with one output unit per class (e.g., `Dense(10)` for MNIST).
* models with a single output unit (`Dense(1)`).

I test both these cases.
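For reference, a sketch of the two architectures the description refers to (layer sizes and input shape are illustrative):

```python
import tensorflow as tf

def multiclass_model():
    # One output unit per class, e.g. Dense(10) for MNIST.
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(8,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

def binary_model():
    # A single output unit for binary classification (Dense(1)).
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(8,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
```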
Reference issues/PR
This PR closes #206