Caffe models support #442
Conversation
@pranv I haven't really looked at it yet, just added your code and cleaned it up. Check out the changes I made; they should help you improve your coding style. I'll do a full review in the next few days. |
@fchollet |
Started code-reviewing. I find the code quite difficult to understand, and I believe it could be considerably simplified :( |
As far as I know, the layers are treated in the standard way; the complexity is in constructing the graph. Again, as far as I know, this is pretty much standard. There aren't other libraries that do this, hence it was hard. And I'm pretty sure that all the steps are required - I couldn't figure out a way to simplify them. |
I haven't understood everything yet. But this part seems quite complicated and difficult to follow:

network = make_network(layers, phase)  # obtain the nodes that make up the graph
if len(network) == 0:
    raise Exception('failed to construct network from the prototext')
network = acyclic(network)  # convert it to be truly acyclic
network = merge_layer_blob(network)  # eliminate 'blobs', just have layers
reverse_network = reverse(network)  # reverse, to obtain the network inputs
network_inputs = get_inputs(reverse_network)  # inputs of the network - 'in-order' is zero
network_outputs = get_outputs(network)  # outputs of the network - 'out-order' is zero
network = remove_label_paths(network, network_inputs, network_outputs)  # paths from inputs to loss layers removed
reverse_network = reverse(network)  # stores the 'input' for each layer

Would there be any way to convert the Caffe prototext to a structure such as this in one pass? I'll look into it. Otherwise, I'm worried a bit about redundancy. Technically we already have code to build a model from a 'config' (the config format is a bit different, but could be unified): https://github.com/fchollet/keras/blob/master/keras/utils/layer_utils.py#L18 |
The caffe definition of models is a bit weird. Technically, and according to the documentation, caffe accepts any Directed Acyclic Graph (DAG), but in reality several functions don't handle arbitrary DAGs. At the same time, there are no explicitly specified input and output layers. Since the model is a DAG, the nodes with in-degree 0 are the input nodes and the nodes with out-degree 0 are the output nodes; these have to be discovered. Other methods that use "one pass" cannot handle things like Siamese networks - they place a constraint that the model is sequential. I create the graph as a dictionary: the node which maps to nothing is the output node, and after reversing the network, the nodes which map to nothing are the input nodes. One more important aspect of caffe is that the same name can be used for both a layer and the blob it produces, which the converter has to account for. Hope that clarifies some things. |
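For reference, the in/out-degree detection described above can be illustrated with a small standalone sketch. This is not the PR's actual implementation - the node names and helper bodies are made up; only the general idea of storing the graph as a dict and reversing it is taken from the comment above:

def reverse(network):
    # Build the reversed adjacency dict: child -> list of parents.
    reversed_net = {node: [] for node in network}
    for node, children in network.items():
        for child in children:
            reversed_net.setdefault(child, []).append(node)
    return reversed_net

def get_outputs(network):
    # Nodes with out-degree zero have no children.
    return [node for node, children in network.items() if not children]

# Toy graph: conv1 -> relu1 -> fc1
network = {'conv1': ['relu1'], 'relu1': ['fc1'], 'fc1': []}

print(get_outputs(network))           # ['fc1']   (out-degree 0 -> output node)
print(get_outputs(reverse(network)))  # ['conv1'] (in-degree 0 -> input node)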
Another addition: All names are blobs. |
I noticed the 'data' input isn't present when loading the caffemodel. Is that by design as per your comment above? I'm not able to get my results to match with caffe for the VGG 16 layer network (prototxt and caffemodel). It seems to be an issue with the input layer being 'conv1_1' instead of 'data'. For reference, here is my pycaffe code (where net is loaded from the prototxt and caffemodel):

self.net = caffe.Net(cnn_model_def, cnn_model_params)
out = self.net.forward(**{self.net.inputs[0]: image_batch})
features = out[self.net.outputs[0]].squeeze(axis=(2, 3))

And here is my not-so-equivalent keras code:

model = convert.CaffeToKeras(
    prototext='cnn_params/VGG_ILSVRC_16_layers_deploy_features.prototxt',
    caffemodel='cnn_params/VGG_ILSVRC_16_layers.caffemodel',
    phase='test')
graph = model('network')
graph.compile('rmsprop', {graph.outputs.keys()[0]: 'mse'})
features = graph.predict({graph.inputs.keys()[0]: batch_images}, batch_size=1, verbose=1) |
@asampat3090 @pranv Any thoughts on how this could be addressed? |
I think I had addressed this issue before. The input name isn't 'data'; I've added 'model.inputs' and 'model.outputs' to help in this regard.

model = convert.CaffeToKeras(
    prototext='cnn_params/VGG_ILSVRC_16_layers_deploy_features.prototxt',
    caffemodel='cnn_params/VGG_ILSVRC_16_layers.caffemodel')  # phase irrelevant in deploy mode
graph = model('network')
graph.compile('rmsprop', {model.outputs[0]: 'mse'})
features = graph.predict({model.inputs[0]: batch_images}, batch_size=1, verbose=1)

And does pycaffe run the code in caffe or in keras? This PR is about converting caffe models to keras, and running the network in keras. And although this is a couple of lines more, which code seems simpler to you? |
The big question is: does this implementation produce a Keras model that is capable of reproducing the Caffe results, or not? If not, what should be done to address it? |
Yea @pranv I understand the syntax but the concern is that it doesn't return the same result. As far as I can tell I have used the correct Keras code. Have you tried running this against caffe for known models? Doesn't seem to match up. @fchollet I'll have to delve deeper into the code to figure out next steps. |
@asampat3090 what exactly is the difference in result? |
They just don't match up at all. What would be the best way to show you the difference? Can I send you the pickle files? For context, I am taking the output features from the 'fc7' layer w/ a 1x4096 feature vector. The features from keras have no zero values while the caffe one does. There doesn't seem to be any simple linear relation. |
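For anyone trying to reproduce the comparison, a minimal way to quantify this kind of mismatch - a sketch, where the two arguments are the fc7 feature vectors extracted from caffe and keras respectively - could be:

import numpy as np

def compare_features(caffe_feats, keras_feats, atol=1e-4):
    # Flatten both vectors and report simple statistics of the mismatch.
    caffe_feats = np.asarray(caffe_feats).ravel()
    keras_feats = np.asarray(keras_feats).ravel()
    print('zeros (caffe):', int(np.sum(caffe_feats == 0)))
    print('zeros (keras):', int(np.sum(keras_feats == 0)))
    print('max abs diff :', float(np.max(np.abs(caffe_feats - keras_feats))))
    print('allclose     :', bool(np.allclose(caffe_feats, keras_feats, atol=atol)))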
@asampat3090 okay. Do go through the code and let me know of my mistake. |
The only error I can think of is that the weights had to be transposed; anything else would lead to dimensionality errors. Or, in VGG's case, there could be an issue with one of the layer parameters. |
weights_p = np.zeros((nb_filter, stack_size, nb_col, nb_row))
weights_b = np.array(blobs[1].data)

chunk_data_size = len(blobs[0].data) // group
Just for reference, this is how chainer does this:
def _setup_convolution(self, layer):
    blobs = layer.blobs
    param = layer.convolution_param
    ksize = _get_ksize(param)
    stride = _get_stride(param)
    pad = _get_pad(param)
    num = _get_num(blobs[0])
    channels = _get_channels(blobs[0])
    n_in = channels * param.group
    n_out = num
    func = functions.Convolution2D(n_in, n_out, ksize, stride, pad,
                                   nobias=not param.bias_term)
    func.W.fill(0)
    part_size = len(blobs[0].data) // param.group
    for i in six.moves.range(param.group):
        in_slice = slice(i * n_in // param.group,
                         (i+1) * n_in // param.group)
        out_slice = slice(i * n_out // param.group,
                          (i+1) * n_out // param.group)
        w = func.W[out_slice, in_slice]
        data = numpy.array(blobs[0].data[i*part_size:(i+1)*part_size])
        w[:] = data.reshape(w.shape)
    if param.bias_term:
        func.b[:] = blobs[1].data
    setattr(self.fs, layer.name, func)
    self.forwards[layer.name] = func
    self._add_layer(layer)
And this is how sklearn-theano does this:
def _blob_to_ndarray(blob):
    """Converts a caffe protobuf blob into an ndarray"""
    dimnames = ["num", "channels", "height", "width"]
    data = np.array(blob.data)
    shape = tuple([getattr(blob, dimname) for dimname in dimnames])
    return data.reshape(shape)
@fchollet For a measure of the complexity involved, I think the two snippets above (from chainer and sklearn-theano) might give an estimate. |
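For context on the group handling being reviewed above, here is a hedged sketch of how a grouped Caffe convolution blob could be expanded into a single dense Keras kernel. The variable names mirror the diff (weights_p, nb_filter, stack_size, group); stack_size is assumed to be the full number of input channels, and the off-group blocks are left at zero because grouped convolutions never mix channels across groups. This is an illustration, not the PR's actual code:

import numpy as np

def conv_blob_to_dense_kernel(blob_data, nb_filter, stack_size, nb_col, nb_row, group=1):
    # Caffe stores grouped conv weights as (nb_filter, stack_size // group, kernel dims);
    # place each group's block on the 'diagonal' of a dense kernel and leave the rest
    # zero, which is equivalent to the grouped convolution.
    weights_p = np.zeros((nb_filter, stack_size, nb_col, nb_row), dtype=np.float32)
    chunk = np.asarray(blob_data).reshape(nb_filter, stack_size // group, nb_col, nb_row)
    filters_per_group = nb_filter // group
    channels_per_group = stack_size // group
    for g in range(group):
        f0, f1 = g * filters_per_group, (g + 1) * filters_per_group
        c0, c1 = g * channels_per_group, (g + 1) * channels_per_group
        weights_p[f0:f1, c0:c1] = chunk[f0:f1]
    return weights_p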
It seems like the weights for the fully connected layers were transposed incorrectly, but otherwise the weights and biases for the convolution and fully connected layers all matched up with caffe. However there is still something wrong. It probably isn't the group parameter since I checked the weights against caffe's after the model was imported, but it could be an issue with the pooling or relu layers (made changes here). Any other ideas? It does seem like it could be a relu issue given that the keras result has no zero values and the caffe one does... but not sure how to check it.

Caffe features sample (correct):

array([-0.        , -0.        ,  2.7828536 , ..., -0.        ,
       -0.        ,  1.70453358], dtype=float32)

Keras features sample:

array([[-24.98186874,  21.50024796,  -1.37967587, ..., -28.88113213,
         -3.03378105,   5.12496424]]) |
@asampat3090 Thank you for that. |
@asampat3090 also, do post the code changes you made (if any) |
@asampat3090 Maybe you could add a test to perform the check you did automatically? |
That would be great, if it's doable. Such a check should absolutely be part of testing that Caffe import does work. |
What model method are you using to check your results? Dropout will only be applied in 'training' mode; it is not applied when calling predict. Also, it would be great to have your code so we can reproduce these results and investigate. |
@pranv you can submit a PR to the |
I have removed ~230 lines of code: 8a31762. I haven't tested for correctness yet, so although the conversion yields valid Keras models, it is not clear whether the weights do reproduce the original Caffe network... By the way, I believe it would be simpler to just convert Caffe models to a Keras config dictionary, then instantiate the model from this config, instead of converting the Caffe model to an instantiated Keras model. I will look into it in the near future. Any concerns? Also, I will still be looking for a simpler way to do this step: https://github.com/fchollet/keras/blob/8a31762d0703d704befbbfe4850079bc6223f027/keras/caffe/converters.py#L29-L38 |
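If the config-dictionary route is taken, the converter's job would reduce to emitting a plain dict and handing it to the existing config-to-model utility in keras/utils/layer_utils.py mentioned earlier in the thread. A rough illustration of the idea follows; the schema is made up for this sketch and may not match the actual config format:

# Hypothetical shape of the converter output if it emitted a config dict;
# the keys and layer names here are illustrative only.
config = {
    'name': 'Sequential',
    'layers': [
        {'name': 'Convolution2D', 'nb_filter': 64, 'stack_size': 3,
         'nb_row': 3, 'nb_col': 3, 'activation': 'relu'},
        {'name': 'Flatten'},
        {'name': 'Dense', 'input_dim': 64 * 224 * 224, 'output_dim': 1000,
         'activation': 'softmax'},
    ],
}
# The converter would then pass this dict to the existing config-to-model code
# instead of instantiating Keras layers itself.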
Based on a quick skim: overall, it was a good idea to reduce the two almost-identical functions to one. |
Also, I had a conversation with @szagoruyko, the author of the torch equivalent and a member of the torch team (FAIR). He thought the code was fine. |
Checking out your changes, what was your justification for removing the transposition for the weights of the fully connected layers? That doesn't seem right. |
@fchollet sorry for the delayed response. I'm using the VGG 16 layer caffemodel and have a custom prototext file (https://gist.github.com/asampat3090/c63a6f5082bab64e8a74). As for the weights, I compared the weights in caffe to those in keras and transposed those weights because they didn't match. Unfortunately it still didn't result in the correct answer given my prototxt. |
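For reference on the transposition question: Caffe's InnerProduct layer stores its weight blob as (num_output, input_dim), whereas a Keras Dense layer keeps its weights as (input_dim, output_dim), so the flat blob generally needs a reshape followed by a transpose. A small sketch (the names are placeholders, not the converter's actual helpers):

import numpy as np

def inner_product_to_dense(blob_data, num_output, input_dim):
    # Caffe InnerProduct weights: row-major (num_output, input_dim).
    W = np.asarray(blob_data).reshape(num_output, input_dim)
    # Keras Dense expects (input_dim, output_dim), hence the transpose.
    return W.T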
@asampat3090, please try with the latest code again |
From caffe issue somewhere. |
Sorry for posting on the older post. @fchollet referring to our previous convo (others please see #368): I used the caffemodel file for the VGG 16 layer as given in the Model Zoo (http://www.robots.ox.ac.uk/~vgg/software/very_deep/caffe/VGG_ILSVRC_16_layers.caffemodel). I'm not sure why the caffe model would be in training mode - I did specify the phase to be test (or so I thought). @pranv oh interesting... I wonder how I can prevent that - or at the very least know what that scaling is. It seems like that may be out of my control if I am using the pycaffe wrapper. Any ideas? |
@asampat3090, can you divide the entire image by 255 and try? |
Unfortunately that didn't work. I tried to divide the caffe input by 255, which made the caffe output change and have nearly as many non-zero values as the keras output, but the numbers were still off - also, I'm not confident that would be the correct result for caffe anyways. Alternatively I tried to multiply the keras input by 255, but that didn't change much. It probably does have to do with how the image is interpreted though since I've verified the weights matched up with the caffe blobs for each layer. Or I suppose the layers could not be connected up correctly but that seems unlikely. @pranv do you think you could try running my code and see what you get? |
By the way, wouldn't it be easier (less prone to issues in the test code itself) to look at much simpler networks first (eg. single FC layer, or single Conv layer)? With a single-layer network any potential error (in the conversion or in the test code) should be immediately obvious.
You can look at the Keras network via |
@asampat3090, I suggested that you divide keras input by 255 |
@asampat3090 it could be done this way... but, more cleanly: couldn't we build a few single-layer (or 2-layer) sequential Caffe models, save their weights to a caffemodel file, and test the converted Keras models against them? It's worth noting that unit tests need to be fast, so the existing tests are essentially useless since they take too long to actually be run every time we push to master. This approach would solve that. |
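A sketch of what such a test could look like, assuming small fixture files (prototxt, caffemodel, and reference input/output arrays) generated once with Caffe and committed to the repo. The paths, import path, and tolerance are placeholders; the converter calls follow the usage shown earlier in this thread:

import numpy as np
from keras.caffe import convert  # converter module under review in this PR (import path assumed)

def test_single_conv_layer_matches_caffe():
    # Tiny fixtures generated once with Caffe; the file names here are placeholders.
    x = np.load('tests/fixtures/conv_input.npy')
    expected = np.load('tests/fixtures/conv_output_caffe.npy')

    model = convert.CaffeToKeras(prototext='tests/fixtures/single_conv.prototxt',
                                 caffemodel='tests/fixtures/single_conv.caffemodel')
    graph = model('network')
    graph.compile('sgd', {model.outputs[0]: 'mse'})

    # Graph.predict of the time returned a dict keyed by output name.
    out = graph.predict({model.inputs[0]: x})[model.outputs[0]]
    assert np.allclose(out, expected, atol=1e-5)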
Hi, |
@dasguptar Thanks :) There is a small bug in convolution layers. And unit tests have to be added. I'm sure it'll be done within this week. |
I noticed a small difference between Keras and Caffe's implementations of the Dropout layer. |
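For anyone comparing the two, the difference can be summarized with a small sketch. The Caffe side is its documented inverted-dropout behaviour (scale kept units by 1/(1-p) during training, pass-through at test time); the Keras side reflects the implementation of the time as I understand it (mask without scaling during training, multiply by 1-p at test time), so treat that part as an assumption:

import numpy as np

def caffe_dropout(x, p, train, rng=np.random):
    # Caffe: inverted dropout - scale kept units during training, no-op at test time.
    if not train:
        return x
    mask = rng.binomial(1, 1.0 - p, size=x.shape)
    return x * mask / (1.0 - p)

def keras_dropout_circa_2015(x, p, train, rng=np.random):
    # Assumed Keras behaviour of the time: mask during training, scale at test time.
    if train:
        mask = rng.binomial(1, 1.0 - p, size=x.shape)
        return x * mask
    return x * (1.0 - p)

x = np.ones((1, 4))
print(caffe_dropout(x, 0.5, train=False))            # [[1. 1. 1. 1.]]
print(keras_dropout_circa_2015(x, 0.5, train=False)) # [[0.5 0.5 0.5 0.5]]
# At test time the two differ by a factor of (1 - p) after every dropout layer.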
What is the status on this? |
Exciting work! And what is the status on this? |
@grisaitis Thank you |