
Caffe support by pranv #368

Closed
wants to merge 35 commits into from

Conversation

@fchollet (Member) commented Jul 8, 2015

Creating a PR for easier reviewing

@pranv (Contributor) commented Jul 10, 2015

I've been thinking of adding tests, but since the model files are too huge to include in Keras, I think we have only two options:

  • Fetch them on demand
  • Eliminate them, and just test the model-definition conversion code with a complicated model.

Any suggestions?

@phreeza (Contributor) commented Jul 10, 2015

+1 for tests, and +1 for fetch on demand, just as it is done with the datasets.

@fchollet (Member, Author) commented:

If you've uploaded your model files somewhere (e.g. S3), then it's just one line of code:

from keras.datasets.data_utils import get_file
local_path = get_file('local_name.ext', origin="https://s3.amazonaws.com/some_path.ext")
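
A conversion test could follow the same fetch-on-demand pattern. Below is a minimal sketch of the cache-then-fetch logic behind such a helper; the helper name, paths, and file names are illustrative, not the actual Keras internals:

```python
import os
try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from six.moves.urllib.request import urlretrieve  # Python 2 fallback

def get_file_cached(fname, origin, cache_dir=None):
    """Download `origin` into a local cache once; reuse the cached copy afterwards."""
    if cache_dir is None:
        cache_dir = os.path.expanduser(os.path.join('~', '.keras', 'datasets'))
    if not os.path.exists(cache_dir):
        os.makedirs(cache_dir)
    fpath = os.path.join(cache_dir, fname)
    if not os.path.exists(fpath):  # fetch on demand, only on first use
        urlretrieve(origin, fpath)
    return fpath
```

A test would call this with the model's URL and run the converter on the cached file, so the download cost is only paid once.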

@pranv (Contributor) commented Jul 10, 2015

The model files will be available on the page of the researcher who trained the model, or in the Caffe Model Zoo.
Will add it ASAP.

@pranv (Contributor) commented Jul 10, 2015

Has anyone tried it out on a few models yet?
Any results?

input_layer_names.append(layers[input_layer].name)

if layer_nb in ends:
name = 'output_' + name # outputs nodes are marked with 'output_' prefix from which output is derived later in 'add_output'
@fchollet (Member, Author) commented on the diff:
To avoid very long lines, I would recommend putting comments before the line (possibly over several lines).
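
Concretely, the snippet above could be rewritten with the comment moved before the line it describes; the surrounding values here are made up for illustration:

```python
ends = {12}                # indices of the network's terminal layers (illustrative)
layer_nb, name = 12, 'fc7'

# Output nodes are marked with an 'output_' prefix, from which the
# actual output is derived later in add_output.
if layer_nb in ends:
    name = 'output_' + name
```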

@fchollet (Member, Author) commented:

The protobuf issue is fixed in a clean way by adding the following to setup.py:

import os
from six.moves.urllib.request import urlretrieve

# First, compile Caffe protobuf Python file
datadir = os.path.expanduser(os.path.join('~', '.keras', 'data'))
if not os.path.exists(datadir):
    os.makedirs(datadir)

caffe_source = os.path.join(datadir, 'caffe.proto')
caffe_destination = os.path.join(os.path.dirname(os.path.realpath(__file__)), 'keras', 'caffe')
urlretrieve('https://raw.githubusercontent.com/BVLC/caffe/master/src/caffe/proto/caffe.proto', caffe_source)
os.system('protoc --proto_path="' + datadir + '" --python_out="' + caffe_destination + '" "' + caffe_source + '"')

This should be Windows compatible as well.

The only potential issue is that this requires Protobuf (and the protoc compiler) to be installed before running setup.py; otherwise the Caffe import won't work (Keras can still be installed, though).
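
To soften that requirement, the protoc step could be made fail-soft: probe for the compiler first and skip Caffe support when it is absent. A sketch under that assumption (the function name and messages are mine, not the PR's):

```python
import os
import shutil
import subprocess

def compile_caffe_proto(proto_path, out_dir, protoc='protoc'):
    """Run protoc on caffe.proto if available; otherwise skip Caffe
    support without aborting the install."""
    exe = shutil.which(protoc)
    if exe is None:
        print('protoc not found: skipping Caffe support (Keras itself still installs).')
        return False
    subprocess.check_call([exe,
                           '--proto_path=' + os.path.dirname(proto_path),
                           '--python_out=' + out_dir,
                           proto_path])
    return True
```

setup.py would call this and continue either way, matching the "Keras can still be installed" behavior.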

@fchollet (Member, Author) commented:

We can now remove the pre-compiled protobuf file in keras/caffe as well, since we can generate it at install time. Please update the PR.

@pranv (Contributor) commented Jul 15, 2015

Thanks for the feedback!

This idea will help remove caffe_pb2.py, which currently sits in the repo unnecessarily, as discussed in chainer/chainer#166.

I will update the PR based on your suggestions ASAP.

@pranv (Contributor) commented Jul 15, 2015

Only potential issue is that this requires Protobuf to be installed before running setup.py. Or else Caffe import won't work (Keras can still be installed though).

Can we make Google Protocol Buffers an optional dependency, like h5py was before?

@pranv (Contributor) commented Jul 15, 2015

move test_caffe_conversion.py to tests/auto. You can put the auxiliary file on S3 and fetch it with get_file (if you want I can put it on S3 for you).

I don't have any S3 storage, so please do it.

@pranv (Contributor) commented Jul 15, 2015

@fchollet have you tried out a few models?

Any results, feedback, bugs in that regard?

@fchollet (Member, Author) commented:

Can we make Google Protocol Buffers an optional dependency, like h5py was before?

Using the code above, it is already de facto an optional dependency, because you can install and use Keras without it. You just won't be able to load Caffe models.
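
The import-time guard that makes this work could look roughly like the following; this is a sketch of the pattern, not the PR's actual code, and the error message is invented:

```python
try:
    import caffe_pb2  # generated by protoc at install time; absent without protobuf
except ImportError:
    caffe_pb2 = None

def caffe_to_keras(prototxt, caffemodel):
    """Converter entry point; fails with an actionable message when the
    protobuf bindings were never generated."""
    if caffe_pb2 is None:
        raise ImportError('Caffe support requires protobuf: install protoc '
                          'and reinstall Keras to generate caffe_pb2.')
    # ... actual conversion would go here ...
```

Everything else in Keras imports normally, so the dependency stays optional in practice.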

I don't have any S3 storage, so please do it.

Sure. In that case just remove every model file, I'll set up S3 storage & fetching.

Any results, feedback, bugs in that regard?

Not yet.

@asampat3090 commented:
I believe in the current form, the CaffeToKeras class creates the network graph based on the caffemodel file instead of the prototxt. That is, if the user wants to plug in a subset of weights from the caffemodel into a new model (as defined in the prototxt) they currently cannot. Since I assume many will try to do transfer learning using weights from models in the Model Zoo plugged into new models, this could be a big issue.

@pranv (Contributor) commented Jul 16, 2015

I think I know the problem. When a caffemodel is provided, my code constructs the Keras model entirely from it, disregarding the prototxt, so changing the prototxt will not change your model. This is a bad idea. My initial approach was to create the model from the prototxt and then copy the weights over; I reverted to the current behavior because I hadn't written the convert_weights function at the time, and this way I could test the conversion simply (the older PR did everything from the prototxt and then copied the weights).

I think this will be fixed when I complete it, along with the other changes mentioned here, by tomorrow.

Thanks for pointing it out!
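
The prototxt-first flow described above amounts to matching weights to the architecture by layer name; here is a toy sketch of that matching step (the function, layer names, and weight values are all made up):

```python
def copy_weights(prototxt_layers, caffemodel_weights):
    """Match weights to a model built from the prototxt by layer name,
    so editing the prototxt (e.g. truncating at fc7) controls the result."""
    copied, skipped = {}, []
    for name in prototxt_layers:
        if name in caffemodel_weights:
            copied[name] = caffemodel_weights[name]
        else:
            skipped.append(name)  # e.g. layers renamed for fine-tuning
    return copied, skipped

arch = ['conv1_1', 'fc7', 'fc8_new']                    # from the (edited) prototxt
weights = {'conv1_1': 'W1', 'fc7': 'W7', 'fc8': 'W8'}   # from the .caffemodel
copied, skipped = copy_weights(arch, weights)
# copied: conv1_1 and fc7; skipped: fc8_new (left at its random initialization)
```

Skipped layers are exactly the ones a transfer-learning user would retrain from scratch.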

@asampat3090 commented:
@pranv Just wanted to see if you were able to make any progress. If you are swamped with work and already know what needs to change let me know - I can try to work on some changes as well.

@fchollet (Member, Author) commented:

What's the status on this PR? We'd like to merge it asap. If you don't have time for it, do you want me to take over?

@pranv (Contributor) commented Jul 23, 2015

Hey,
Sorry for the delay. I'll have it ready by this time tomorrow.

layer_output_dim = layer_input_dim

else:
raise RuntimeError("one or many layers used int this model is not currently supported")
A contributor commented on the diff:

Typo fix and clarification:

raise RuntimeError("One or more layers used in this model are not currently supported")

@pranv (Contributor) commented Jul 24, 2015

@fchollet I think I've made the changes.
The file fetching still has to be set up.

@fchollet (Member, Author) commented:

Cool, thank you. I'll take it from here.

@fchollet fchollet closed this Jul 25, 2015
@llcao (Contributor) commented Aug 17, 2015

I could not find the Caffe converter in the official Keras repository. Where shall I look?

@fchollet (Member, Author) commented:

In the caffe branch: https://github.com/fchollet/keras/tree/caffe

It is still being tested and debugged.


@asampat3090 commented:
I've attached my testing code if someone would like to try it out as well. I've loaded an example image the same way in both frameworks and just used Caffe as the reference; please see the code below. You can try any image ('exampleimg.jpg'); I have used the 16-layer caffemodel file, and my 16-layer prototxt is shown below the code. The Caffe output claims it's from the fc7 layer, but given that I'm getting a lot of zeros, I'm pretty sure the ReLU is being applied. Either way, the results from the two aren't matching up. Please let me know if I made any egregious errors below.

import sys
import numpy as np
from scipy.misc import imread, imresize
import pdb

import caffe
from keras.caffe import convert

# model files used
cnn_model_def = 'cnn_params/VGG_ILSVRC_16_layers_deploy_features.prototxt'
cnn_model_params = 'cnn_params/VGG_ILSVRC_16_layers.caffemodel'

C = 3
H = 224
W = 224

def format_img_for_input(image, H, W):
    """
    Helper function to convert image read from imread to caffe input

    Input:
    image - numpy array describing the image
    H - height in px
    W - width in px
    """
    if len(image.shape) == 2:
        image = np.tile(image[:, :, np.newaxis], (1, 1, 3))
    # RGB -> BGR
    image = image[:, :, (2, 1, 0)]
    # mean subtraction (get mean from model file?..hardcoded for now)
    image = image - np.array([103.939, 116.779, 123.68])
    # resize
    image = imresize(image, (H, W))
    # get channel in correct dimension
    image = np.transpose(image, (2, 0, 1))
    return image


# setup caffe cnn
print "Setting up caffe CNN..."
net = caffe.Net(cnn_model_def, cnn_model_params)
net.set_mode_gpu()

net.set_phase_test()
caffe_batch = np.zeros((10, C, H, W))

# setup keras
print "Setting up keras CNN..."
model = convert.caffe_to_keras(
    prototext=cnn_model_def,
    caffemodel=cnn_model_params,
    phase='test')
graph = model
keras_batch = np.zeros((1, C, H, W))

# Load image and format for input
print "Loading example image..."
im = imread('exampleimg.jpg')
formatted_im = format_img_for_input(im, H, W)

keras_batch[0, :, :, :] = formatted_im
for i in range(10):
    caffe_batch[i] = formatted_im

# extract features using caffe
print "Extracting features from caffe Net..."
out = net.forward(**{net.inputs[0]: caffe_batch})
caffe_features = out[net.outputs[0]].squeeze(axis=(2, 3))
caffe_features = caffe_features[0]

# extract features using keras
print "Extracting features from keras Graph..."
graph.compile('rmsprop', {graph.outputs.keys()[0]: 'mse'})
keras_features = graph.predict({'conv1_1':keras_batch}, batch_size=1, verbose=1)

# compare values - print True if the features (approximately) match;
# exact float equality (==) is too strict across frameworks
print "Compare values..."
print np.allclose(caffe_features, keras_features)
pdb.set_trace()

And my prototxt here:

name: "VGG_ILSVRC_16_layers"
input: "data"
input_dim: 10
input_dim: 3
input_dim: 224
input_dim: 224
layers {
  bottom: "data"
  top: "conv1_1"
  name: "conv1_1"
  type: CONVOLUTION
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv1_1"
  top: "conv1_1"
  name: "relu1_1"
  type: RELU
}
layers {
  bottom: "conv1_1"
  top: "conv1_2"
  name: "conv1_2"
  type: CONVOLUTION
  convolution_param {
    num_output: 64
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv1_2"
  top: "conv1_2"
  name: "relu1_2"
  type: RELU
}
layers {
  bottom: "conv1_2"
  top: "pool1"
  name: "pool1"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layers {
  bottom: "pool1"
  top: "conv2_1"
  name: "conv2_1"
  type: CONVOLUTION
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv2_1"
  top: "conv2_1"
  name: "relu2_1"
  type: RELU
}
layers {
  bottom: "conv2_1"
  top: "conv2_2"
  name: "conv2_2"
  type: CONVOLUTION
  convolution_param {
    num_output: 128
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv2_2"
  top: "conv2_2"
  name: "relu2_2"
  type: RELU
}
layers {
  bottom: "conv2_2"
  top: "pool2"
  name: "pool2"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layers {
  bottom: "pool2"
  top: "conv3_1"
  name: "conv3_1"
  type: CONVOLUTION
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv3_1"
  top: "conv3_1"
  name: "relu3_1"
  type: RELU
}
layers {
  bottom: "conv3_1"
  top: "conv3_2"
  name: "conv3_2"
  type: CONVOLUTION
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv3_2"
  top: "conv3_2"
  name: "relu3_2"
  type: RELU
}
layers {
  bottom: "conv3_2"
  top: "conv3_3"
  name: "conv3_3"
  type: CONVOLUTION
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv3_3"
  top: "conv3_3"
  name: "relu3_3"
  type: RELU
}
layers {
  bottom: "conv3_3"
  top: "pool3"
  name: "pool3"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layers {
  bottom: "pool3"
  top: "conv4_1"
  name: "conv4_1"
  type: CONVOLUTION
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv4_1"
  top: "conv4_1"
  name: "relu4_1"
  type: RELU
}
layers {
  bottom: "conv4_1"
  top: "conv4_2"
  name: "conv4_2"
  type: CONVOLUTION
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv4_2"
  top: "conv4_2"
  name: "relu4_2"
  type: RELU
}
layers {
  bottom: "conv4_2"
  top: "conv4_3"
  name: "conv4_3"
  type: CONVOLUTION
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv4_3"
  top: "conv4_3"
  name: "relu4_3"
  type: RELU
}
layers {
  bottom: "conv4_3"
  top: "pool4"
  name: "pool4"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layers {
  bottom: "pool4"
  top: "conv5_1"
  name: "conv5_1"
  type: CONVOLUTION
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv5_1"
  top: "conv5_1"
  name: "relu5_1"
  type: RELU
}
layers {
  bottom: "conv5_1"
  top: "conv5_2"
  name: "conv5_2"
  type: CONVOLUTION
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv5_2"
  top: "conv5_2"
  name: "relu5_2"
  type: RELU
}
layers {
  bottom: "conv5_2"
  top: "conv5_3"
  name: "conv5_3"
  type: CONVOLUTION
  convolution_param {
    num_output: 512
    pad: 1
    kernel_size: 3
  }
}
layers {
  bottom: "conv5_3"
  top: "conv5_3"
  name: "relu5_3"
  type: RELU
}
layers {
  bottom: "conv5_3"
  top: "pool5"
  name: "pool5"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layers {
  bottom: "pool5"
  top: "fc6"
  name: "fc6"
  type: INNER_PRODUCT
  inner_product_param {
    num_output: 4096
  }
}
layers {
  bottom: "fc6"
  top: "fc6"
  name: "relu6"
  type: RELU
}
layers {
  bottom: "fc6"
  top: "fc6"
  name: "drop6"
  type: DROPOUT
  dropout_param {
    dropout_ratio: 0.5
  }
}
layers {
  bottom: "fc6"
  top: "fc7"
  name: "fc7"
  type: INNER_PRODUCT
  inner_product_param {
    num_output: 4096
  }
}
layers {
  bottom: "fc7"
  top: "fc7"
  name: "relu7"
  type: RELU
}

@fchollet (Member, Author) commented:

Please post any further comments in the current PR thread: #442

@asampat3090 : is the caffemodel hosted somewhere? I'd like to take a look.

One initial reason why you would see different results (independently of any potential bug in the PR) is that the networks are in different phases: the Keras net is in test mode and the Caffe net is in train mode (which is why Dropout is being applied). This changes intermediate representations substantially, but should not significantly affect the last-layer probabilities (assuming the network has been trained to convergence).
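
A toy numpy illustration of why the two phases diverge at intermediate layers like fc7 (the array values are illustrative; classic Caffe-era dropout rescales at test time rather than using inverted dropout at train time):

```python
import numpy as np

rng = np.random.RandomState(0)
fc7 = np.ones((1, 4096))  # stand-in for fc7 activations after ReLU

# Train phase (the Caffe net here): dropout zeroes each unit with
# probability 0.5, as in the drop6/drop7 layers, so the intermediate
# features contain many exact zeros.
train_features = fc7 * (rng.uniform(size=fc7.shape) > 0.5)

# Test phase (the Keras net here): nothing is dropped; classic
# (non-inverted) dropout rescales the activations instead.
test_features = fc7 * 0.5

print((train_features == 0).mean())  # roughly half the units are zeroed
print((test_features == 0).any())    # False: no zeros at test time
```

This is consistent with the "lot of zeros" observation from the train-phase Caffe features.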

@asampat3090 asampat3090 mentioned this pull request Aug 18, 2015
fchollet pushed a commit that referenced this pull request Sep 22, 2023
… by : @divyashreepathihalli (#368)

* add mlp classifier example

* remove tfaddons dependency, remove GELU and AdamW and replace with keras core optimizer isnstead

* review updates applied