Caffe models support #442

Closed
wants to merge 10 commits into from
Conversation

fchollet
Member

No description provided.

@fchollet
Member Author

@pranv I haven't really looked at it yet, just added your code and cleaned it up. Check out the changes I made, they should help you improve your coding style.

I'll do a full review in the next few days.

@pranv
Contributor

pranv commented Jul 25, 2015

@fchollet
Great!
Learnt a few things..

@fchollet
Member Author

Started code-reviewing. I find the code quite difficult to understand, and I believe it could be considerably simplified :(

@pranv
Contributor

pranv commented Jul 26, 2015

As far as I know, the layers are treated in the standard way; the complexity lies in extracting the graph. Again, as far as I know, this is pretty much standard. There aren't other libraries that do this, which is why it was hard, and I'm pretty sure all the steps are required; I couldn't figure out a way to simplify them.
Is there a specific area, or is the entire thing unnecessarily complex?

@fchollet
Member Author

I haven't understood everything yet. But this part seems quite complicated and difficult to follow:

network = make_network(layers, phase)  # obtain the nodes that make up the graph
if len(network) == 0:
    raise Exception('failed to construct network from the prototext')
network = acyclic(network)  # Convert it to be truly acyclic
network = merge_layer_blob(network)  # eliminate 'blobs', just have layers
reverse_network = reverse(network)  # reverse, to obtain the network_input
network_inputs = get_inputs(reverse_network)  # inputs of the network - 'in-order' is zero
network_outputs = get_outputs(network)  # outputs of the network - 'out-order' is zero
network = remove_label_paths(network, network_inputs, network_outputs)  # path from input to loss layers removed
reverse_network = reverse(network)  # stores the 'input' for each layer

Would there be any way to convert the Caffe prototext to a structure such as this in one pass?

I'll look into it.

Otherwise, I'm a bit worried about redundancy. Technically we already have code to build a model from a 'config' (the config format is a bit different, but it could be unified): https://github.com/fchollet/keras/blob/master/keras/utils/layer_utils.py#L18

@pranv
Contributor

pranv commented Jul 26, 2015

The caffe definition of models is a bit weird. Technically, and according to the documentation, caffe accepts any Directed Acyclic Graph (DAG). But in reality, several layers like ReLU, Dropout, etc. can start and end at the same blob, because the computation can happen in-place, i.e. at the same memory location (blob). This creates cycles, so naively using the bottom parameter of each layer could lead to disaster.

At the same time, there are no specified input and output layers. Since it is a DAG, the nodes with 0 in-degree are input nodes. Similarly, the nodes with 0 out-degree are output nodes. This has to be found out. Other methods that use "one pass" cannot handle things like Siamese Networks - they place a constraint that the model is sequential.

I create the graph as a dictionary. The nodes which map to nothing are the output nodes. I then reverse the network and find the nodes which map to nothing there to get the input nodes.

Now, one more important aspect of caffe is that the same data layer is usually used to produce both the actual data and the labels. This is an inherent non-sequentiality, and keras doesn't do things this way. So that path, between the data layer and the final loss layers, has to be removed.

The reverse_network stores all the previous nodes of a given node. Hence it is a node -> [previous nodes] map.

Hope that clarifies some things.
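For illustration, here is a minimal standalone sketch of the blob-graph handling described above; the helper names and the toy network are made up for the example and are not the PR's code:

def build_graph(layers):
    """layers: list of (layer_name, bottom_blobs, top_blobs) parsed from the prototxt.
    Nodes are blobs; each layer contributes edges bottom -> top."""
    graph = {}
    for _, bottoms, tops in layers:
        for b in bottoms:
            graph.setdefault(b, set())
            for t in tops:
                graph.setdefault(t, set())
                if t != b:            # in-place layers (ReLU, Dropout) have top == bottom;
                    graph[b].add(t)   # skipping the self-edge keeps the graph acyclic
    return graph

def reverse(graph):
    """node -> [previous nodes] map, as described above."""
    rev = {n: set() for n in graph}
    for n, succs in graph.items():
        for s in succs:
            rev[s].add(n)
    return rev

def inputs_and_outputs(graph):
    rev = reverse(graph)
    inputs = [n for n, preds in rev.items() if not preds]     # in-degree 0
    outputs = [n for n, succs in graph.items() if not succs]  # out-degree 0
    return inputs, outputs

# Toy network: data -> conv1 -> relu1 (in-place) -> fc1 -> loss, plus a label blob.
layers = [
    ('conv1', ['data'], ['conv1']),
    ('relu1', ['conv1'], ['conv1']),   # in-place
    ('fc1', ['conv1'], ['fc1']),
    ('loss', ['fc1', 'label'], ['loss']),
]
graph = build_graph(layers)
print(inputs_and_outputs(graph))   # (['data', 'label'], ['loss']), up to ordering

The label-path removal described above would then prune the 'label' and 'loss' nodes from this structure before the Keras graph is built.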

@pranv
Contributor

pranv commented Jul 28, 2015

Another note: all the names used in the graph are blob names.

@asampat3090

I noticed the 'data' input isn't present when loading the caffemodel. Is that by design as per your comment above? I'm not able to get my results to match with caffe for the VGG 16 layer network (prototxt and caffemodel). It seems to be an issue with the input layer being 'conv1_1' instead of 'data'.

For reference - here is my pycaffe code (where net is loaded from the prototxt and caffemodel):

self.net = caffe.Net(cnn_model_def, cnn_model_params)        # net built from the prototxt + caffemodel
out = self.net.forward(**{self.net.inputs[0]: image_batch})  # forward pass, feeding the image batch to the input blob
features = out[self.net.outputs[0]].squeeze(axis=(2, 3))     # drop the trailing 1x1 spatial dims of the output

And here is my not-so-equivalent keras code:

model = convert.CaffeToKeras(
    prototext='cnn_params/VGG_ILSVRC_16_layers_deploy_features.prototxt',
    caffemodel='cnn_params/VGG_ILSVRC_16_layers.caffemodel',
    phase='test')
graph = model('network') 
graph.compile('rmsprop', {graph.outputs.keys()[0]: 'mse'})
features = graph.predict({graph.inputs.keys()[0]: batch_images}, batch_size=1, verbose=1)

@fchollet
Member Author

@asampat3090 @pranv Any thoughts on how this could be addressed?

@pranv
Contributor

pranv commented Jul 31, 2015

I think I addressed this issue before. The input name isn't 'data' because your prototxt is in deploy mode: in deploy mode the first layer is directly a processing layer, not a DATA layer, and that layer's name has to be given as the input.

I've added 'model.inputs' and 'model.outputs' to help in this regard.

model = convert.CaffeToKeras(
    prototext='cnn_params/VGG_ILSVRC_16_layers_deploy_features.prototxt',
    caffemodel='cnn_params/VGG_ILSVRC_16_layers.caffemodel')  # phase is irrelevant in deploy mode
graph = model('network') 
graph.compile('rmsprop', {model.outputs[0]: 'mse'})
features = graph.predict({model.inputs[0]: batch_images}, batch_size=1, verbose=1)

And doesn't your pycaffe snippet run the network in caffe rather than keras? This PR is about converting caffe models to keras and running the network in keras.

And although this is a couple of lines longer, which code seems simpler to you?
Keep in mind that we can't expect all keras users to know caffe thoroughly.

@fchollet
Member Author

The big question is:

I'm not able to get my results to match with caffe for the VGG 16 layer network (prototxt and caffemodel).

So, does this implementation produce a Keras model that is capable of reproducing the Caffe results, or not? If not, what should be done to address it?

@asampat3090

Yeah @pranv, I understand the syntax, but the concern is that it doesn't return the same result. As far as I can tell I have used the correct Keras code. Have you tried running this against caffe for known models? It doesn't seem to match up. @fchollet I'll have to delve deeper into the code to figure out next steps.

@pranv
Contributor

pranv commented Jul 31, 2015

@asampat3090 what exactly is the difference in result?

@asampat3090

They just don't match up at all. What would be the best way to show you the difference? Can I send you the pickle files? For context, I am taking the output features from the 'fc7' layer, a 1x4096 feature vector. The features from keras have no zero values while the caffe ones do. There doesn't seem to be any simple linear relation.

@pranv
Contributor

pranv commented Aug 1, 2015

@asampat3090 okay. Do go through the code and let me know of my mistake.

@pranv
Contributor

pranv commented Aug 1, 2015

The only error I can think of is that the weights had to be transposed. Anything else would lead to dimensionality errors.

Or, in VGG's case, there could be an issue with the group parameter. But again, if this wasn't right, it would lead to dimensionality errors as well. A transpose might be necessary there too.

weights_p = np.zeros((nb_filter, stack_size, nb_col, nb_row))  # empty conv weight array, to be filled from the blob
weights_b = np.array(blobs[1].data)  # bias vector, taken directly from the second blob

chunk_data_size = len(blobs[0].data) // group  # size of each group's slice of the weight blob
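To make the layout question concrete, here is a hedged sketch of the two conversions being discussed; the function names are mine, and the note about kernel flipping is a possibility to check rather than a confirmed bug:

import numpy as np

def caffe_dense_to_keras(blob_data, num_output, num_input):
    """Caffe's InnerProduct blob is laid out as (num_output, num_input); the Keras Dense
    layer of the time expected (input_dim, output_dim), hence the transpose."""
    W = np.asarray(blob_data, dtype=np.float32).reshape(num_output, num_input)
    return W.T

def caffe_conv_to_keras(blob_data, nb_filter, stack_size, nb_row, nb_col, flip=False):
    """Caffe's Convolution blob is (num_output, channels / group, kH, kW), which already
    matches the (nb_filter, stack_size, nb_row, nb_col) order used here. If the Theano
    backend performs a true convolution (kernel flip) while Caffe does cross-correlation,
    the spatial dimensions would also need to be reversed; hence the optional flip."""
    W = np.asarray(blob_data, dtype=np.float32).reshape(nb_filter, stack_size, nb_row, nb_col)
    return W[:, :, ::-1, ::-1] if flip else W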
Contributor

Just for reference, this is how chainer does this:

def _setup_convolution(self, layer):
    blobs = layer.blobs
    param = layer.convolution_param
    ksize = _get_ksize(param)
    stride = _get_stride(param)
    pad = _get_pad(param)
    num = _get_num(blobs[0])
    channels = _get_channels(blobs[0])

    n_in = channels * param.group
    n_out = num
    func = functions.Convolution2D(n_in, n_out, ksize, stride, pad,
                                   nobias=not param.bias_term)
    func.W.fill(0)

    part_size = len(blobs[0].data) // param.group
    for i in six.moves.range(param.group):
        in_slice = slice(i * n_in // param.group,
                         (i+1) * n_in // param.group)
        out_slice = slice(i * n_out // param.group,
                          (i+1) * n_out // param.group)
        w = func.W[out_slice, in_slice]

        data = numpy.array(blobs[0].data[i*part_size:(i+1)*part_size])
        w[:] = data.reshape(w.shape)

    if param.bias_term:
        func.b[:] = blobs[1].data

    setattr(self.fs, layer.name, func)
    self.forwards[layer.name] = func
    self._add_layer(layer)

And this is how sklearn-theano does this:

def _blob_to_ndarray(blob):
    """Converts a caffe protobuf blob into an ndarray"""
    dimnames = ["num", "channels", "height", "width"]
    data = np.array(blob.data)
    shape = tuple([getattr(blob, dimname) for dimname in dimnames])
    return data.reshape(shape)

@pranv
Contributor

pranv commented Aug 1, 2015

@fchollet For a measure of complexity, I think the above two repos might give an estimate.

@asampat3090

It seems like the weights for the fully connected layers were transposed incorrectly, but otherwise the weights and biases for the convolution and fully connected layers all matched up with caffe.

However there is still something wrong. It probably isn't the group parameter since I checked the weights against caffe's after the model was imported, but it could be an issue with pooling or relu layers (made changes here). Any other ideas? It does seem like it could be a relu issue given that the keras result has no zero values and the caffe one does...but not sure how to check it.

Caffe features sample (correct):

array([-0.        , -0.        ,  2.7828536 , ..., -0.        ,
       -0.        ,  1.70453358], dtype=float32)

Keras features sample:

array([[-24.98186874,  21.50024796,  -1.37967587, ..., -28.88113213,
         -3.03378105,   5.12496424]])

@pranv
Contributor

pranv commented Aug 2, 2015

@asampat3090 Thank you for that.

It does seem like it could be a relu issue given that the keras result has no zero values and the caffe one does...but not sure how to check it.

network.get_config() could offer some insight.

@pranv
Contributor

pranv commented Aug 2, 2015

@asampat3090 also, do post the code changes you made (if any)

@phreeza
Contributor

phreeza commented Aug 3, 2015

@asampat3090 Maybe you could add a test to perform the check you did automatically?

@fchollet
Member Author

fchollet commented Aug 3, 2015

Maybe you could add a test to perform the check you did automatically?

That would be great, if it's doable. Such a check should absolutely be part of testing that Caffe import does work.

@fchollet
Member Author

fchollet commented Aug 5, 2015

@asampat3090 :

It does seem like it could be a relu issue given that the keras result has no zero values and the caffe one does...but not sure how to check it.

What model method are you using to check your results? Dropout will only be applied in 'training' mode, i.e. using model.train, etc.

Also, it would be great to have your code so we can reproduce these results and investigate.

@fchollet
Member Author

fchollet commented Aug 6, 2015

@pranv you can submit a PR to the caffe branch in Keras. This will be helpful since I also have local changes to caffe that I will commit later.

@fchollet
Member Author

I have removed ~230 lines of code (8a31762). Any concerns?

I haven't tested for correctness yet, so although the conversion yields valid Keras models, it is not clear whether the weights do reproduce the original Caffe network...

By the way, I believe it would be simpler to just convert Caffe models to a Keras config dictionary, then instantiate the model from this config, instead of converting the Caffe model to an instantiated Keras model. I will look into it in the near future. Any concerns?

Also, I will still be looking for a simpler way to do this step: https://github.com/fchollet/keras/blob/8a31762d0703d704befbbfe4850079bc6223f027/keras/caffe/converters.py#L29-L38
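To sketch what I mean (the keys below are placeholders rather than the exact config format the layer_utils builder expects; the point is just the shape of the idea, i.e. convert the prototxt to plain dicts first and only instantiate a Keras model at the end):

# Hypothetical intermediate representation produced from the prototxt:
config = [
    {'name': 'Convolution2D', 'nb_filter': 64, 'stack_size': 3,
     'nb_row': 3, 'nb_col': 3, 'border_mode': 'same'},
    {'name': 'Activation', 'activation': 'relu'},
    {'name': 'MaxPooling2D', 'poolsize': (2, 2)},
    {'name': 'Flatten'},
    {'name': 'Dense', 'input_dim': 64 * 112 * 112, 'output_dim': 1000,
     'activation': 'softmax'},
]
# The existing config-based builder could then turn this into layers, and the
# converted weights could be set afterwards via each layer's set_weights().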

@pranv
Contributor

pranv commented Aug 10, 2015

Based on a quick skim:

  • I think you made an if...else decision between the prototxt and the caffemodel. As you can see from the discussion above, people will want to change prototxts to obtain modified models.
  • I assumed convert_weights would be helpful for embedding something like VGG net for feature extraction.
  • What happened in line 37?

Overall, it was a good idea to reduce the two near-identical functions to one.

@pranv
Contributor

pranv commented Aug 10, 2015

Also, I had a conversation with @szagoruyko, the author of the torch equivalent and a member of the torch team (FAIR). He thought the code was fine.

@fchollet
Member Author

@asampat3090

It seems like the weights for the fully connected layers were transposed incorrectly, but otherwise the
weights and biases for the convolution and fully connected layers all matched up with caffe.

Checking out your changes, what was your justification for removing the transposition for the weights of the fully connected layers? That doesn't seem right.

@asampat3090

@fchollet sorry for the delayed response. I'm using the VGG 16 layer caffemodel and have a custom prototxt file (https://gist.github.com/asampat3090/c63a6f5082bab64e8a74). As for the weights, I compared the weights in caffe to those in keras and transposed them because they didn't match. Unfortunately it still didn't result in the correct answer given my prototxt.

@pranv
Contributor

pranv commented Aug 11, 2015

@asampat3090, please try with the latest code again

@fchollet fchollet mentioned this pull request Aug 18, 2015
@pranv
Contributor

pranv commented Aug 18, 2015

On the C++ side the image is represented as uint8 in 0-255 while on the Python side scikit-image represents the image data as float in 0-1. For this reason the Python wrapper scales first so that it can re-map the data to 0-255 to match the precomputed mean.

From a caffe issue somewhere.
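The practical consequence for comparing outputs is that whatever preprocessing pycaffe's transformer applies has to be replicated on the Keras input. A rough sketch of the usual VGG-style pipeline (the mean values are the commonly quoted BGR means for the VGG ILSVRC models; double-check them against your own transformer settings):

import numpy as np

def preprocess(img):
    """img: H x W x 3 float array in [0, 1], RGB order (scikit-image style)."""
    x = img * 255.0                               # back to the 0-255 range the mean was computed on
    x = x[:, :, ::-1]                             # RGB -> BGR, the channel order caffe expects
    x = x - np.array([103.939, 116.779, 123.68])  # subtract the per-channel BGR mean
    return x.transpose(2, 0, 1)[None, ...]        # -> (1, channels, height, width)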

@asampat3090

Sorry for posting on the older thread. @fchollet, referring to our previous conversation (others please see #368): I used the caffemodel file for the VGG 16 layer network as given in the Model Zoo (http://www.robots.ox.ac.uk/~vgg/software/very_deep/caffe/VGG_ILSVRC_16_layers.caffemodel).

I'm not sure why the caffe model would be in training mode - I did specify the phase to be test (or so I thought) via: net.set_phase_test()

@pranv oh interesting...I wonder how I can prevent that - or at the very least know what that scaling is. It seems like that may be out of my control if I am using the pycaffe wrapper then. Any ideas?

@pranv
Contributor

pranv commented Aug 18, 2015

@asampat3090, can you divide the entire image by 255 and try?

@asampat3090

Unfortunately that didn't work. I tried to divide the caffe input by 255, which made the caffe output change and have nearly as many non-zero values as the keras output, but the numbers were still off - also, I'm not confident that would be the correct result for caffe anyways. Alternatively I tried to multiply the keras input by 255, but that didn't change much.

It probably does have to do with how the image is interpreted, though, since I've verified that the weights match up with the caffe blobs for each layer. Or I suppose the layers might not be connected up correctly, but that seems unlikely.

@pranv do you think you could try running my code and see what you get?

@fchollet
Member Author

By the way, wouldn't it be easier (less prone to issues in the test code itself) to look at much simpler networks first (e.g. a single FC layer, or a single Conv layer)? With a single-layer network any potential error (in the conversion or in the test code) should be immediately obvious.

Or I suppose the layers could not be connected up correctly but that seems unlikely.

You can look at the Keras network via graph.get_config(True).

@pranv
Contributor

pranv commented Aug 19, 2015

@asampat3090, I suggested that you divide the keras input by 255.

@asampat3090

@pranv I actually tried that as well, but that made it worse. @fchollet yea that's actually a good point. I'm guessing you mean using the same caffemodel file and slowly building up the prototxt one layer at a time?

@fchollet
Member Author

@asampat3090 it could be done this way... but, more cleanly: couldn't we build a few single-layer (or 2-layers) sequential Caffe models, save their weights to a .caffemodel file, and write unit tests for the Keras-Caffe importer based on these files?

It's worth noting that unit tests need to be fast, so the existing tests are essentially useless since they take too long to actually be run every time we push to master. This approach would solve that.
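A sketch of what such a test could look like, assuming tiny fixture files checked in next to it (a single-layer prototxt/caffemodel pair plus an input and a reference output precomputed with pycaffe and saved as .npy; the fixture paths and names are hypothetical). The CaffeToKeras interface is the one used earlier in this thread:

import numpy as np
from keras.caffe import convert  # module layout as in this PR

def test_single_conv_layer_conversion():
    model = convert.CaffeToKeras(prototext='fixtures/single_conv.prototxt',
                                 caffemodel='fixtures/single_conv.caffemodel')
    graph = model('network')
    graph.compile('sgd', {model.outputs[0]: 'mse'})

    x = np.load('fixtures/single_conv_input.npy')          # the same input that was fed to caffe
    expected = np.load('fixtures/single_conv_output.npy')  # caffe's forward pass, precomputed
    out = graph.predict({model.inputs[0]: x}, batch_size=1, verbose=0)  # dict keyed by output name
    assert np.allclose(out[model.outputs[0]], expected, atol=1e-4)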

@dasguptar

Hi,
Could I ask what the current status of the Caffe branch is? The last activity here was almost 3 weeks ago.
Also, wow, this is really neat work!

@pranv
Contributor

pranv commented Sep 12, 2015

@dasguptar Thanks :)

There is a small bug in convolution layers. And unit tests have to be added. I'm sure it'll be done within this week.

@mmmikael
Contributor

I noticed a small difference between Keras and Caffe's implementations of the Dropout layer.
I made a "Caffe mode" here https://github.com/mmmikael/keras/blob/caffe_m/keras/layers/core.py#L252-L276

@phreeza
Contributor

phreeza commented Oct 19, 2015

What is the status on this?

@fchollet fchollet closed this Oct 30, 2015
@fchollet fchollet deleted the caffe branch December 6, 2015 04:07
@lireagan

Exciting work! What is the status on this?

@grisaitis

@lireagan @phreeza you might want to see #921

@lireagan

@grisaitis Thank you
