Keras: near-future directions #754
Comments
@fchollet How would you like to deal with the |
@elanmart that would be way too implicit. No, it's actually really simple: after we introduce an |
Don't forget this also means re-initializing the optimizer (for Adam, RMSprop and other optimizers that keep moving averages).
How would you train models with different backends though? You pick and choose one at a time right? I would also suggest a better way of dealing with desired output shapes. Right now, Keras implicitly assumes the desired is either a matrix (samples x dim) or a tensor3 (samples x time x dim). I had an image as desired and it took me days (partially my bad though) to realize what was wrong, my (samples x rows x cols) was being reshaped as (samples-rows x cols) without warning. This is really dangerous since the method just compiled and ran like nothing was wrong. |
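To make the failure mode described above concrete, here is a minimal NumPy sketch. It is not Keras code: `per_sample_mse` and `check_same_shape` are hypothetical helpers written for illustration. It shows that merging the sample axis with the row axis still runs, but yields one loss value per image row instead of one per sample, and that an explicit shape check would fail loudly instead.

```python
import numpy as np

def per_sample_mse(y_true, y_pred):
    """Mean squared error over the last axis, one value per leading index."""
    return np.mean((y_true - y_pred) ** 2, axis=-1)

def check_same_shape(y_true, y_pred):
    """Hypothetical guard: refuse to continue when the shapes disagree."""
    if y_true.shape != y_pred.shape:
        raise ValueError(
            "target shape {} does not match output shape {}".format(
                y_true.shape, y_pred.shape))

samples, rows, cols = 4, 3, 5
y_true = np.zeros((samples, rows, cols))
y_pred = np.ones((samples, rows, cols))

# Silent collapse: the first two axes are merged, so every image row is
# treated as if it were an independent sample.
collapsed = per_sample_mse(y_true.reshape(-1, cols), y_pred.reshape(-1, cols))

# Keeping the sample axis and flattening only the data axes gives one loss
# value per sample.
kept = per_sample_mse(y_true.reshape(samples, -1), y_pred.reshape(samples, -1))

print(collapsed.shape)  # (12,)
print(kept.shape)       # (4,)
```

Both calls succeed, which is exactly why the collapsed version is easy to miss without a check such as `check_same_shape`.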
@EderSantana right, I also think a simplified and more intuitive way of applying loss functions, masks and weights would be welcome. |
It would be one of the most important and valuable features in the future. I also think we should be able to implement any neural network easily using only the Keras backend API, without complicated Theano code (for instance, implementing GRU with the Keras API only, as in #620). |
Yes, the recurrent Container would be great to have. It could really speed up experimenting with different RNN architectures. |
First, I'll apologize for not completing the Caffe PR yet. I'll try to do it over the weekend. Maybe some of the things done in the Caffe PR could be used for shape inference and for inferring input and output nodes? The very first (and bad) implementation of non-sequential models that I wrote did something like that: fall back to the most recently added layer if the previous node is not explicitly mentioned. |
I have started doing this in my fork because the current way of using convolutional layers is too difficult. I can do the same with all layers and make a PR. |
@matsuyamax in your solution, can the user still define the shapes themselves and remain compatible with the current API? |
@EderSantana I don't think compatibility can be achieved if we make input dimensions optional, because they are not keyword arguments. Constructor of convolutional layers looks like this:

    def __init__(self, input_dim, nb_filter, filter_length, init='uniform', activation='linear', ...)

Currently we use it like this:

    layer = Convolution1D(input_dim, nb_filter, filter_length, init='uniform', activation='linear', ...)

We want to make

    layer = Convolution1D(nb_filter, filter_length, init='uniform', activation='linear', ...)

That's not possible with compatibility. |
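The compatibility concern can be illustrated with plain functions. The sketch below uses two hypothetical stand-ins, `conv1d_old` and `conv1d_new`, with shortened argument lists; they are not the actual Keras constructors. It shows that existing code which passes `input_dim` positionally keeps running under the reordered signature, but binds its arguments to the wrong parameters.

```python
def conv1d_old(input_dim, nb_filter, filter_length, init='uniform'):
    """Stand-in for the current constructor: input_dim comes first."""
    return {'input_dim': input_dim, 'nb_filter': nb_filter,
            'filter_length': filter_length, 'init': init}

def conv1d_new(nb_filter, filter_length, init='uniform', input_dim=None):
    """Stand-in for the proposed constructor: input_dim is optional."""
    return {'input_dim': input_dim, 'nb_filter': nb_filter,
            'filter_length': filter_length, 'init': init}

# Existing user code passes three positional arguments.
old_call = conv1d_old(32, 64, 3)

# The same call against the new signature raises no error, but 32 is now
# read as nb_filter, 64 as filter_length, and 3 as init.
same_call_new_signature = conv1d_new(32, 64, 3)

print(old_call['nb_filter'])                 # 64
print(same_call_new_signature['nb_filter'])  # 32
print(same_call_new_signature['init'])       # 3
```

Passing `input_dim` by keyword would avoid the misbinding, but only for callers who already write it that way.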
@matsuyamax that wouldn't be the smoothest solution, I think. Sometimes we just need a quick layer and want to skip the lazy initialization that inferred shapes would impose. What if input_dim is optional and the inferred value has the last word? For example, let's take the Dense layer:

    class Dense(Layer):
        def __init__(self, output_dim, input_dim=None, *args, **kwargs):
            self.input_dim = input_dim
            self.output_dim = output_dim
            # ...
            self.initialize()

        def initialize(self):
            if self.input_dim is None:
                raise ValueError("Using lazy initialization, either define `input_dim` or set a `previous_layer`")
            else:
                self.W = self.init((self.input_dim, self.output_dim))
                self.b = shared_zeros((self.output_dim))

        def set_previous(self, layer):
            self.previous = layer
            if self.input_dim is None:
                self.input_dim = layer.output_dim
            elif self.input_dim != layer.output_dim:
                warn("Overwriting `input_dim` of layer {}. The user defined parameter differs from what we inferred".format(self.name))
                # The inferred value has the last word.
                self.input_dim = layer.output_dim

This way, I would not be forced to define two layers (input and output) in the cases where I need only one of them. What do you think? |
I agree, but your solution breaks compatibility too. You see what I mean? We could do your solution, but then we would have to implement input shape management in every layer:

    layer = Dense(output_dim)
    layer.set_input_shape(input_shape)

or

    layer = Dense(output_dim)
    layer.input_shape = input_shape

Then we would have a single method common to all layers. |
The major problem with shape inference is that the layer objects are created first and only then added to the model. So, at initialization, the object has no idea whether it will receive an input dimension or whether it should raise an error. The way I did this was basically to split the current initialization method into two parts and call the second part (the one where parameters are initialized) after the layer has been added to the model. Backwards compatibility is possible, the code just gets ugly. |
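The two-phase initialization described in the comment above could look roughly like the following sketch. `LazyDense`, `TinySequential` and `build` are hypothetical names chosen for illustration, and NumPy arrays stand in for Theano shared variables; this is not the code from the fork mentioned in the thread.

```python
import numpy as np

class LazyDense(object):
    """Hypothetical layer: configuration in __init__, parameters in build()."""

    def __init__(self, output_dim, input_dim=None):
        self.output_dim = output_dim
        self.input_dim = input_dim
        self.W = None
        self.b = None
        # Old-style usage: the input size is known up front, so build now.
        if input_dim is not None:
            self.build(input_dim)

    def build(self, input_dim):
        """Second phase: allocate parameters once the input size is known."""
        self.input_dim = input_dim
        self.W = np.zeros((input_dim, self.output_dim))
        self.b = np.zeros(self.output_dim)


class TinySequential(object):
    """Hypothetical container that triggers the second phase in add()."""

    def __init__(self):
        self.layers = []

    def add(self, layer):
        if layer.W is None:
            if not self.layers:
                raise ValueError("the first layer needs an explicit input_dim")
            # Infer the input size from the previously added layer.
            layer.build(self.layers[-1].output_dim)
        self.layers.append(layer)


model = TinySequential()
model.add(LazyDense(16, input_dim=8))  # explicit, as in the current API
model.add(LazyDense(4))                # inferred from the previous layer
print(model.layers[1].W.shape)  # (16, 4)
```

In this sketch the keyword form `input_dim=8` keeps working, while the error for a missing size is deferred until the layer is actually added to a model.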
Loss functions could be reduced to layers. |
TL;DR Maybe we should expose less OOP and make things even easier to use.
The more layers we introduce, the longer it will take to write a model and start training it. One of the things I like the most about Keras is that you can actually memorize all the steps to write a model, compile it and run the experiment. If we introduce too much boilerplate, like a layer for input, a layer for the cost, lazy-only initialization, etc., we are going to be just another Blocks, and we will always have to go back and copy-paste from the documentation to do anything. I'm not saying Blocks is bad, I used to contribute to it as well. I'm just saying that we don't need another Blocks. What do you think?

I honestly believe that this simple API is a huge PLUS for Keras. That is why it is more popular than Blocks or Lasagne, which were started at almost the same time. I'm not saying we should just appeal to the masses, I'm saying that simplicity is the ultimate sophistication; I believe Keras achieved it and we shouldn't go back. Maybe the solution lies in the direction of less OOP, not more. I also think that progress means making things even easier to use, like an optional lazy initialization or RNNs that are easier to develop.

For the case of losses, the reshaping and masking should go inside the objective. This would give the user more control when necessary, without hiding too much. For example, the mse cost would just be

    def mse(y_true, y_pred):
        return T.sqr(y_true.flatten(ndim=2) - y_pred.flatten(ndim=2)).mean(axis=-1)

In other words the reshaping goes inside the objective, and we never collapse data dimensions with the first

    def mse(y_true, y_pred, mask=None):
        if mask is None:
            return T.sqr(y_true.flatten(ndim=2) - y_pred.flatten(ndim=2)).mean(axis=-1)
        else:
            cost = T.sqr(y_true.flatten(ndim=2) - y_pred.flatten(ndim=2)) * mask
            return cost.sum(axis=-1) / mask.sum(axis=-1)

where |
I think |
I agree. Being able to build networks by heart after using Keras a few times is proof that it does a good job at reducing cognitive load, and we want to keep that. Deep learning should be like playing with Duplo blocks and coloring with crayons. We can back off from the idea of having I guess we could have an input argument
I don't think it's necessary to have object losses, because losses don't have a state, unlike optimizers and layers. Everything you could do with a loss layer, you should be able to do with a loss function. Also, if it's a loss layer then it's not a layer. Layers only access the previous layer's input.
Sure, but what of masking and weighting?
OOP is a big part of what made Keras successful and it is the most appropriate paradigm for deep learning, where modules are stateful (e.g. layers). |
We just have to enforce an
I showed an example of how to do masking above.

    from functools import wraps

    def objective_decorator(func):
        @wraps(func)
        def wrapper(y_pred, y_true, mask, sample_weight):
            # Keep the sample axis, flatten everything else.
            y_pred = y_pred.flatten(ndim=2)
            y_true = y_true.flatten(ndim=2)
            cost = func(y_pred, y_true)
            cost = cost[sample_weight]
            if mask is None:
                return cost.mean(axis=-1)
            else:
                mask = mask[sample_weight]
                # The small constant avoids division by zero.
                return cost.sum(axis=-1) / (mask.sum(axis=-1) + 1e-7)
        return wrapper

With this decorator, we wouldn't even have to rewrite much in the objectives we already have. Usage would just be

    @objective_decorator
    def mse(y_pred, y_true):
        return T.sqr(y_pred - y_true)
I agree, we just don't have to force the user to always define classes to be able to use Keras. |
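For readers who want to try the decorator idea without Theano, here is a NumPy variant. It is a sketch and not the proposal verbatim: the name `objective` is hypothetical, the sample_weight handling is left out, the argument order follows the earlier `mse(y_true, y_pred)` examples, and the cost is multiplied by the mask before summing (as in the masked `mse` shown earlier in the thread), which is an assumption about the intended behavior.

```python
from functools import wraps
import numpy as np

def objective(func):
    """Wrap an elementwise loss so reshaping and masking live in one place."""
    @wraps(func)
    def wrapper(y_true, y_pred, mask=None):
        # Keep the sample axis and flatten only the data axes.
        y_true = y_true.reshape(y_true.shape[0], -1)
        y_pred = y_pred.reshape(y_pred.shape[0], -1)
        cost = func(y_true, y_pred)
        if mask is None:
            return cost.mean(axis=-1)
        mask = mask.reshape(mask.shape[0], -1)
        # Average over unmasked entries only; epsilon avoids division by zero.
        return (cost * mask).sum(axis=-1) / (mask.sum(axis=-1) + 1e-7)
    return wrapper

@objective
def mse(y_true, y_pred):
    return (y_true - y_pred) ** 2

y_true = np.zeros((2, 2, 3))
y_pred = np.ones((2, 2, 3))
mask = np.array([[[1, 1, 1], [0, 0, 0]],
                 [[1, 0, 0], [0, 0, 0]]], dtype=float)

print(mse(y_true, y_pred).shape)        # (2,)
print(mse(y_true, y_pred, mask).shape)  # (2,)
```

The objective itself stays a one-line function, and `wraps` preserves its name, so existing objectives would only need the extra decorator line.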
decorators are not super intuitive tho... Thoughts? |
Adding output shape inference to all layers has given me the occasion to dive deeper into the repo. Here are some thoughts about the code in general. I think the code has potential, but several issues need addressing.
|
It would be a rather elegant solution. I'm all for it.
I would be fine with such a change.
Again, this is fine.
Feel free to add it. |
A few comments:
What about adding support for Convolution3D and ZeroPadding3D as well? I think I have this working in my fork. In the same fork, I have added support to freeze layers. This now works for Graph as well and the
I lost quite some time on that one too. Reshaping should definitely be inside the objective function.
Another nice addition would be an |
We have UpSampling1D and 2D layers, I think this is what you are looking for.
Possibly. You can submit a PR and I will review it.
Certainly. We already have a PR with Convolution3D undergoing review, you might want to review it to make sure you're on the same page. |
I would love to see that. I would like to use Keras for "online" predictions while training is done "offline". Right now I use the save / load model procedure as described in the FAQ, but it takes rather long (approx. 1 min) to load everything before a prediction on new data can be made. Maybe there is already a better way to do this? |
@holderm : you can already do so. Let's say you have a compiled model in production and you would like to update it (to sync it with an identical model that you've just finished training somewhere else). You can just dump the weights of your newly trained model to HDF5, and load them in your production model without recompiling, just via |
Yes, that's true, but assume training and prediction take place in different sessions. First one needs to compile the (previously trained) model, and then predictions can be made. Once the model is compiled in the new session, update of weights ( |
I was more thinking of a straight "reverse" of the convolution in a single layer like the one in Caffe or MatConvNet. This is more efficient than upsampling + convolution. |
As long as you have a compiled model with the same architecture living in each session, you will be fine. But if you don't, the model re-initialization feature we will add is not going to help you anyway. |
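The weight-syncing workflow discussed above can be sketched with stand-ins. In the example below, `TinyModel`, `dump_weights` and `load_weights` are hypothetical helpers, a plain object holding NumPy arrays replaces a compiled Keras model, and `np.savez` / `np.load` on an in-memory buffer replace the HDF5 file. The point is only that a long-lived model object with the same architecture can receive new weights without being rebuilt.

```python
import io
import numpy as np

class TinyModel(object):
    """Hypothetical stand-in for a compiled model with a fixed architecture."""

    def __init__(self, input_dim, output_dim):
        self.weights = [np.zeros((input_dim, output_dim)),
                        np.zeros(output_dim)]

    def predict(self, x):
        W, b = self.weights
        return x.dot(W) + b

def dump_weights(model, fileobj):
    """Write the weight arrays, in order, to a file-like object."""
    np.savez(fileobj, *model.weights)

def load_weights(model, fileobj):
    """Overwrite the weights in place; the model object itself is reused."""
    with np.load(fileobj) as data:
        new_weights = [data["arr_{}".format(i)]
                       for i in range(len(model.weights))]
    for old, new in zip(model.weights, new_weights):
        if old.shape != new.shape:
            raise ValueError(
                "architecture mismatch: {} vs {}".format(old.shape, new.shape))
        old[...] = new

trained = TinyModel(3, 2)
trained.weights[0][...] = 1.0  # pretend training happened elsewhere
production = TinyModel(3, 2)   # built once, stays alive between updates

buffer = io.BytesIO()
dump_weights(trained, buffer)
buffer.seek(0)
load_weights(production, buffer)

print(production.predict(np.ones((1, 3))))
```

As the reply above notes, this only helps when a model with the same architecture is already alive in the serving session; the shape check in `load_weights` is one way to catch a mismatch early.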
Here is what I think we should go for the
|
Also untrainable outputs would be nice. Such as returning the output of intermediate layers in the predict() function but without having to actually define a 0 loss function for them and slow down the training procedure. |
I am not so sure anymore about this. This will require more design work. We'll postpone it for now. |
If I recall, there were some issues about polymorphism, right? With having |
I think having The thing I am less sure of is the removal of |
@fchollet any new thoughts on the Graph API? Maybe Also, are there any plans to implement a base class for multi-io layers? |
Here are a few things I think would be valuable to add to Keras in the near future.
These are mere suggestions. You are welcome to discuss them, contest them, add your own ideas... I would like the development of Keras to be increasingly driven by the community. I don't want to be a bottleneck of development.
I won't be writing code myself, but I will happily give feedback and advice to any contributor who wants to tackle some of these features. I will also keep reviewing and merging PRs.
Here's a list of features. Their focus is on simplicity and user experience.
Input layers as part of the above, to define expected input data dimensions.
add_input and add_output methods, which are cumbersome. A single add method should suffice.