Caffe models support #442

Closed
wants to merge 10 commits into from
Conversation

fchollet
Member

No description provided.

@fchollet
Member Author

@pranv I haven't really looked at it yet, just added your code and cleaned it up. Check out the changes I made, they should help you improve your coding style.

I'll do a full review in the next few days.

@pranv
Contributor

pranv commented Jul 25, 2015

@fchollet
Great!
Learnt a few things..

@fchollet
Member Author

Started code-reviewing. I find the code quite difficult to understand, and I believe it could be considerably simplified :(

@pranv
Contributor

pranv commented Jul 26, 2015

As far as I know, the layers are treated in the standard way; the complexity lies in extracting the graph. Again, as far as I know, this is pretty much standard. There aren't other libraries that do this, which is why it was hard, and I'm pretty sure all the steps are required; I couldn't figure out a way to simplify them.
Is there a specific area, or is the entire thing unnecessarily complex?

@fchollet
Member Author

I haven't understood everything yet. But this part seems quite complicated and difficult to follow:

network = make_network(layers, phase)  # obtain the nodes that make up the graph
if len(network) == 0:
    raise Exception('failed to construct network from the prototext')
network = acyclic(network)  # Convert it to be truly acyclic
network = merge_layer_blob(network)  # eliminate 'blobs', just have layers
reverse_network = reverse(network)  # reverse, to obtain the network_input
network_inputs = get_inputs(reverse_network)  # inputs of the network - 'in-order' is zero
network_outputs = get_outputs(network)  # outputs of the network - 'out-order' is zero
network = remove_label_paths(network, network_inputs, network_outputs)  # path from input to loss layers removed
reverse_network = reverse(network)  # stores the 'input' for each layer

Would there be any way to convert the Caffe prototext to a structure such as this in one pass?

I'll look into it.

Otherwise, I'm a bit worried about redundancy. Technically we already have code to build a model from a 'config' (the config format is a bit different, but it could be unified): https://github.com/fchollet/keras/blob/master/keras/utils/layer_utils.py#L18

@pranv
Contributor

pranv commented Jul 26, 2015

The caffe definition of models is a bit weird. Technically, and according to the documentation, caffe accepts any Directed Acyclic Graph (DAG). But in reality, several layers like ReLU, Dropout, etc. can start and end at the same blob, because the computation can happen in-place, i.e. at the same memory location (blob). This creates cycles, so naively using the bottom parameter of each layer could lead to disaster.

At the same time, there are no specified input and output layers. Since it is a DAG, the nodes with 0 in-degree are input nodes. Similarly, the nodes with 0 out-degree are output nodes. This has to be found out. Other methods that use "one pass" cannot handle things like Siamese Networks - they place a constraint that the model is sequential.

I create the graph as a dictionary. The nodes which map to nothing are the output nodes. I then reverse the network and find the nodes which map to nothing there to get the input nodes.

Now, one more important aspect of caffe is that the same data layer is usually used to produce both the actual data and the labels. This is an inherent non-sequentiality, and keras doesn't do things this way. So that path, between the data layer and the final loss layers, has to be removed.

The reverse_network stores all the previous nodes of a given node. Hence it is a node -> [previous nodes] map.

Hope that clarifies some things.
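For illustration, here is a minimal standalone sketch of the blob-graph handling described above; the helper names and the toy network are made up for the example and are not the PR's code:

def build_graph(layers):
    """layers: list of (layer_name, bottom_blobs, top_blobs) parsed from the prototxt.
    Nodes are blobs; each layer contributes edges bottom -> top."""
    graph = {}
    for _, bottoms, tops in layers:
        for b in bottoms:
            graph.setdefault(b, set())
            for t in tops:
                graph.setdefault(t, set())
                if t != b:            # in-place layers (ReLU, Dropout) have top == bottom;
                    graph[b].add(t)   # skipping the self-edge keeps the graph acyclic
    return graph

def reverse(graph):
    """node -> [previous nodes] map, as described above."""
    rev = {n: set() for n in graph}
    for n, succs in graph.items():
        for s in succs:
            rev[s].add(n)
    return rev

def inputs_and_outputs(graph):
    rev = reverse(graph)
    inputs = [n for n, preds in rev.items() if not preds]     # in-degree 0
    outputs = [n for n, succs in graph.items() if not succs]  # out-degree 0
    return inputs, outputs

# Toy network: data -> conv1 -> relu1 (in-place) -> fc1 -> loss, plus a label blob.
layers = [
    ('conv1', ['data'], ['conv1']),
    ('relu1', ['conv1'], ['conv1']),   # in-place
    ('fc1', ['conv1'], ['fc1']),
    ('loss', ['fc1', 'label'], ['loss']),
]
graph = build_graph(layers)
print(inputs_and_outputs(graph))   # (['data', 'label'], ['loss']), up to ordering

The label-path removal described above would then prune the 'label' and 'loss' nodes from this structure before the Keras graph is built.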

@pranv
Contributor

pranv commented Jul 28, 2015

Another note: all the names used in the graph are blob names.

@asampat3090

I noticed the 'data' input isn't present when loading the caffemodel. Is that by design as per your comment above? I'm not able to get my results to match with caffe for the VGG 16 layer network (prototxt and caffemodel). It seems to be an issue with the input layer being 'conv1_1' instead of 'data'.

For reference - here is my pycaffe code (where net is loaded from the prototxt and caffemodel):

self.net = caffe.Net(cnn_model_def, cnn_model_params)        # net built from the prototxt + caffemodel
out = self.net.forward(**{self.net.inputs[0]: image_batch})  # forward pass, feeding the image batch to the input blob
features = out[self.net.outputs[0]].squeeze(axis=(2, 3))     # drop the trailing 1x1 spatial dims of the output

And here is my not-so-equivalent keras code:

model = convert.CaffeToKeras(
    prototext='cnn_params/VGG_ILSVRC_16_layers_deploy_features.prototxt',
    caffemodel='cnn_params/VGG_ILSVRC_16_layers.caffemodel',
    phase='test')
graph = model('network') 
graph.compile('rmsprop', {graph.outputs.keys()[0]: 'mse'})
features = graph.predict({graph.inputs.keys()[0]: batch_images}, batch_size=1, verbose=1)

@fchollet
Member Author

@asampat3090 @pranv Any thoughts on how this could be addressed?

@pranv
Contributor

pranv commented Jul 31, 2015

I think I addressed this issue before. The input name isn't 'data' because your prototxt is in deploy mode: in deploy mode the first layer is directly a processing layer, not a DATA layer, and that layer's name has to be given as the input.

I've added 'model.inputs' and 'model.outputs' to help in this regard.

model = convert.CaffeToKeras(
    prototext='cnn_params/VGG_ILSVRC_16_layers_deploy_features.prototxt',
    caffemodel='cnn_params/VGG_ILSVRC_16_layers.caffemodel')  # phase is irrelevant in deploy mode
graph = model('network') 
graph.compile('rmsprop', {model.outputs[0]: 'mse'})
features = graph.predict({model.inputs[0]: batch_images}, batch_size=1, verbose=1)

And doesn't your pycaffe snippet run the network in caffe rather than keras? This PR is about converting caffe models to keras and running the network in keras.

And although this is a couple of lines longer, which code seems simpler to you?
Keep in mind that we can't expect all keras users to know caffe thoroughly.

@fchollet
Member Author

The big question is:

I'm not able to get my results to match with caffe for the VGG 16 layer network (prototxt and caffemodel).

So, does this implementation produce a Keras model that is capable of reproducing the Caffe results, or not? If not, what should be done to address it?

@asampat3090

Yeah @pranv, I understand the syntax, but the concern is that it doesn't return the same result. As far as I can tell I have used the correct Keras code. Have you tried running this against caffe for known models? It doesn't seem to match up. @fchollet I'll have to delve deeper into the code to figure out next steps.

@pranv
Contributor

pranv commented Jul 31, 2015

@asampat3090 what exactly is the difference in result?

@asampat3090

They just don't match up at all. What would be the best way to show you the difference? Can I send you the pickle files? For context, I am taking the output features from the 'fc7' layer, a 1x4096 feature vector. The features from keras have no zero values while the caffe ones do. There doesn't seem to be any simple linear relation.

@pranv
Contributor

pranv commented Aug 1, 2015

@asampat3090 okay. Do go through the code and let me know of my mistake.

@pranv
Contributor

pranv commented Aug 1, 2015

The only error I can think of is that the weights had to be transposed. Anything else would lead to dimensionality errors.

Or, in VGG's case, there could be an issue with the group parameter. But again, if this wasn't right, it would lead to dimensionality errors as well. A transpose might be necessary there too.

weights_p = np.zeros((nb_filter, stack_size, nb_col, nb_row))  # empty conv weight array, to be filled from the blob
weights_b = np.array(blobs[1].data)  # bias vector, taken directly from the second blob

chunk_data_size = len(blobs[0].data) // group  # size of each group's slice of the weight blob
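To make the layout question concrete, here is a hedged sketch of the two conversions being discussed; the function names are mine, and the note about kernel flipping is a possibility to check rather than a confirmed bug:

import numpy as np

def caffe_dense_to_keras(blob_data, num_output, num_input):
    """Caffe's InnerProduct blob is laid out as (num_output, num_input); the Keras Dense
    layer of the time expected (input_dim, output_dim), hence the transpose."""
    W = np.asarray(blob_data, dtype=np.float32).reshape(num_output, num_input)
    return W.T

def caffe_conv_to_keras(blob_data, nb_filter, stack_size, nb_row, nb_col, flip=False):
    """Caffe's Convolution blob is (num_output, channels / group, kH, kW), which already
    matches the (nb_filter, stack_size, nb_row, nb_col) order used here. If the Theano
    backend performs a true convolution (kernel flip) while Caffe does cross-correlation,
    the spatial dimensions would also need to be reversed; hence the optional flip."""
    W = np.asarray(blob_data, dtype=np.float32).reshape(nb_filter, stack_size, nb_row, nb_col)
    return W[:, :, ::-1, ::-1] if flip else W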
Contributor

Just for reference, this is how chainer does this:

def _setup_convolution(self, layer):
    blobs = layer.blobs
    param = layer.convolution_param
    ksize = _get_ksize(param)
    stride = _get_stride(param)
    pad = _get_pad(param)
    num = _get_num(blobs[0])
    channels = _get_channels(blobs[0])

    n_in = channels * param.group
    n_out = num
    func = functions.Convolution2D(n_in, n_out, ksize, stride, pad,
                                   nobias=not param.bias_term)
    func.W.fill(0)

    part_size = len(blobs[0].data) // param.group
    for i in six.moves.range(param.group):
        in_slice = slice(i * n_in // param.group,
                         (i+1) * n_in // param.group)
        out_slice = slice(i * n_out // param.group,
                          (i+1) * n_out // param.group)
        w = func.W[out_slice, in_slice]

        data = numpy.array(blobs[0].data[i*part_size:(i+1)*part_size])
        w[:] = data.reshape(w.shape)

    if param.bias_term:
        func.b[:] = blobs[1].data

    setattr(self.fs, layer.name, func)
    self.forwards[layer.name] = func
    self._add_layer(layer)

And this is how sklearn-theano does this:

def _blob_to_ndarray(blob):
    """Converts a caffe protobuf blob into an ndarray"""
    dimnames = ["num", "channels", "height", "width"]
    data = np.array(blob.data)
    shape = tuple([getattr(blob, dimname) for dimname in dimnames])
    return data.reshape(shape)

@pranv
Contributor

pranv commented Aug 1, 2015

@fchollet For a measure of complexity, I think the above two repos might give an estimate.

@asampat3090

It seems like the weights for the fully connected layers were transposed incorrectly, but otherwise the weights and biases for the convolution and fully connected layers all matched up with caffe.

However there is still something wrong. It probably isn't the group parameter since I checked the weights against caffe's after the model was imported, but it could be an issue with pooling or relu layers (made changes here). Any other ideas? It does seem like it could be a relu issue given that the keras result has no zero values and the caffe one does...but not sure how to check it.

Caffe features sample (correct):

array([-0.        , -0.        ,  2.7828536 , ..., -0.        ,
       -0.        ,  1.70453358], dtype=float32)

Keras features sample:

array([[-24.98186874,  21.50024796,  -1.37967587, ..., -28.88113213,
         -3.03378105,   5.12496424]])

@pranv
Contributor

pranv commented Aug 2, 2015

@asampat3090 Thank you for that.

It does seem like it could be a relu issue given that the keras result has no zero values and the caffe one does...but not sure how to check it.

network.get_config() could offer some insight.

@pranv
Contributor

pranv commented Aug 2, 2015

@asampat3090 also, do post the code changes you made (if any)

@phreeza
Contributor

phreeza commented Aug 3, 2015

@asampat3090 Maybe you could add a test to perform the check you did automatically?

@fchollet
Member Author

fchollet commented Aug 3, 2015

Maybe you could add a test to perform the check you did automatically?

That would be great, if it's doable. Such a check should absolutely be part of testing that Caffe import does work.

@fchollet
Member Author

fchollet commented Aug 5, 2015

@asampat3090 :

It does seem like it could be a relu issue given that the keras result has no zero values and the caffe one does...but not sure how to check it.

What model method are you using to check your results? Dropout will only be applied in 'training' mode, i.e. using model.train, etc.

Also, it would be great to have your code so we can reproduce these results and investigate.

@fchollet
Member Author

fchollet commented Aug 6, 2015

@pranv you can submit a PR to the caffe branch in Keras. This will be helpful since I also have local changes to caffe that I will commit later.

@fchollet
Member Author

I have removed ~230 lines of code (8a31762). Any concerns?

I haven't tested for correctness yet, so although the conversion yields valid Keras models, it is not clear whether the weights do reproduce the original Caffe network...

By the way, I believe it would be simpler to just convert Caffe models to a Keras config dictionary, then instantiate the model from this config, instead of converting the Caffe model to an instantiated Keras model. I will look into it in the near future. Any concerns?

Also, I will still be looking for a simpler way to do this step: https://github.com/fchollet/keras/blob/8a31762d0703d704befbbfe4850079bc6223f027/keras/caffe/converters.py#L29-L38
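To sketch what I mean (the keys below are placeholders rather than the exact config format the layer_utils builder expects; the point is just the shape of the idea, i.e. convert the prototxt to plain dicts first and only instantiate a Keras model at the end):

# Hypothetical intermediate representation produced from the prototxt:
config = [
    {'name': 'Convolution2D', 'nb_filter': 64, 'stack_size': 3,
     'nb_row': 3, 'nb_col': 3, 'border_mode': 'same'},
    {'name': 'Activation', 'activation': 'relu'},
    {'name': 'MaxPooling2D', 'poolsize': (2, 2)},
    {'name': 'Flatten'},
    {'name': 'Dense', 'input_dim': 64 * 112 * 112, 'output_dim': 1000,
     'activation': 'softmax'},
]
# The existing config-based builder could then turn this into layers, and the
# converted weights could be set afterwards via each layer's set_weights().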

@pranv
Contributor

pranv commented Aug 10, 2015

Based on a quick skim:

  • I think you made an if...else decision between the prototxt and the caffemodel. As you can see from the discussion above, people will want to change prototxts to obtain modified models.
  • I assumed convert_weights would be helpful for embedding something like VGG net for feature extraction.
  • What happened in line 37?

Overall, it was a good idea to reduce the two near-identical functions to one.

@pranv
Contributor

pranv commented Aug 10, 2015

Also, I had a conversation with @szagoruyko, the author of the torch equivalent and a member of the torch team (FAIR). He thought the code was fine.

@fchollet
Member Author

@asampat3090

It seems like the weights for the fully connected layers were transposed incorrectly, but otherwise the
weights and biases for the convolution and fully connected layers all matched up with caffe.

Checking out your changes, what was your justification for removing the transposition for the weights of the fully connected layers? That doesn't seem right.

@asampat3090

@fchollet sorry for the delayed response. I'm using the VGG 16 layer caffemodel and have a custom prototxt file (https://gist.github.com/asampat3090/c63a6f5082bab64e8a74). As for the weights, I compared the weights in caffe to those in keras and transposed them because they didn't match. Unfortunately it still didn't result in the correct answer given my prototxt.

@pranv
Contributor

pranv commented Aug 11, 2015

@asampat3090, please try with the latest code again

@fchollet fchollet mentioned this pull request Aug 18, 2015
@pranv
Contributor

pranv commented Aug 18, 2015

On the C++ side the image is represented as uint8 in 0-255 while on the Python side scikit-image represents the image data as float in 0-1. For this reason the Python wrapper scales first so that it can re-map the data to 0-255 to match the precomputed mean.

From a caffe issue somewhere.
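The practical consequence for comparing outputs is that whatever preprocessing pycaffe's transformer applies has to be replicated on the Keras input. A rough sketch of the usual VGG-style pipeline (the mean values are the commonly quoted BGR means for the VGG ILSVRC models; double-check them against your own transformer settings):

import numpy as np

def preprocess(img):
    """img: H x W x 3 float array in [0, 1], RGB order (scikit-image style)."""
    x = img * 255.0                               # back to the 0-255 range the mean was computed on
    x = x[:, :, ::-1]                             # RGB -> BGR, the channel order caffe expects
    x = x - np.array([103.939, 116.779, 123.68])  # subtract the per-channel BGR mean
    return x.transpose(2, 0, 1)[None, ...]        # -> (1, channels, height, width)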

@asampat3090

Sorry for posting on the older thread. @fchollet, referring to our previous conversation (others please see #368): I used the caffemodel file for the VGG 16 layer network as given in the Model Zoo (http://www.robots.ox.ac.uk/~vgg/software/very_deep/caffe/VGG_ILSVRC_16_layers.caffemodel).

I'm not sure why the caffe model would be in training mode - I did specify the phase to be test (or so I thought) via: net.set_phase_test()

@pranv oh interesting...I wonder how I can prevent that - or at the very least know what that scaling is. It seems like that may be out of my control if I am using the pycaffe wrapper then. Any ideas?

@pranv
Contributor

pranv commented Aug 18, 2015

@asampat3090, can you divide the entire image by 255 and try?

@asampat3090

Unfortunately that didn't work. I tried to divide the caffe input by 255, which made the caffe output change and have nearly as many non-zero values as the keras output, but the numbers were still off - also, I'm not confident that would be the correct result for caffe anyways. Alternatively I tried to multiply the keras input by 255, but that didn't change much.

It probably does have to do with how the image is interpreted, though, since I've verified that the weights match up with the caffe blobs for each layer. Or I suppose the layers might not be connected up correctly, but that seems unlikely.

@pranv do you think you could try running my code and see what you get?

@fchollet
Member Author

By the way, wouldn't it be easier (less prone to issues in the test code itself) to look at much simpler networks first (e.g. a single FC layer, or a single Conv layer)? With a single-layer network any potential error (in the conversion or in the test code) should be immediately obvious.

Or I suppose the layers could not be connected up correctly but that seems unlikely.

You can look at the Keras network via graph.get_config(True).

@pranv
Contributor

pranv commented Aug 19, 2015

@asampat3090, I suggested that you divide the keras input by 255.

@asampat3090

@pranv I actually tried that as well, but that made it worse. @fchollet yea that's actually a good point. I'm guessing you mean using the same caffemodel file and slowly building up the prototxt one layer at a time?

@fchollet
Member Author

@asampat3090 it could be done this way... but, more cleanly: couldn't we build a few single-layer (or 2-layers) sequential Caffe models, save their weights to a .caffemodel file, and write unit tests for the Keras-Caffe importer based on these files?

It's worth noting that unit tests need to be fast, so the existing tests are essentially useless since they take too long to actually be run every time we push to master. This approach would solve that.
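A sketch of what such a test could look like, assuming tiny fixture files checked in next to it (a single-layer prototxt/caffemodel pair plus an input and a reference output precomputed with pycaffe and saved as .npy; the fixture paths and names are hypothetical). The CaffeToKeras interface is the one used earlier in this thread:

import numpy as np
from keras.caffe import convert  # module layout as in this PR

def test_single_conv_layer_conversion():
    model = convert.CaffeToKeras(prototext='fixtures/single_conv.prototxt',
                                 caffemodel='fixtures/single_conv.caffemodel')
    graph = model('network')
    graph.compile('sgd', {model.outputs[0]: 'mse'})

    x = np.load('fixtures/single_conv_input.npy')          # the same input that was fed to caffe
    expected = np.load('fixtures/single_conv_output.npy')  # caffe's forward pass, precomputed
    out = graph.predict({model.inputs[0]: x}, batch_size=1, verbose=0)  # dict keyed by output name
    assert np.allclose(out[model.outputs[0]], expected, atol=1e-4)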

@dasguptar

Hi,
Could I ask what the current status of the Caffe branch is? The last activity here was almost 3 weeks ago.
Also, wow, this is really neat work!

@pranv
Contributor

pranv commented Sep 12, 2015

@dasguptar Thanks :)

There is a small bug in convolution layers. And unit tests have to be added. I'm sure it'll be done within this week.

@mmmikael
Contributor

I noticed a small difference between Keras and Caffe's implementations of the Dropout layer.
I made a "Caffe mode" here https://github.com/mmmikael/keras/blob/caffe_m/keras/layers/core.py#L252-L276

@phreeza
Contributor

phreeza commented Oct 19, 2015

What is the status on this?

@fchollet fchollet closed this Oct 30, 2015
@fchollet fchollet deleted the caffe branch December 6, 2015 04:07
@lireagan

Exciting work! What is the status on this?

@grisaitis

@lireagan @phreeza you might want to see #921

@lireagan

@grisaitis Thank you
