
[Caffe Frontend] introduce caffe frontend for tvm #6206

Merged 6 commits on Aug 27, 2020

Conversation


@fernchen fernchen commented Aug 4, 2020

I opened a PR for this before (see #6000), but by mistake I merged the master branch into it. To correct that mistake, I am proposing this new PR.

Please forgive my mistake and help review this PR, thank you very much!
@tqchen @FrozenGene @siju-samuel

Background & Motivation

Caffe is a deep learning framework made with expression, speed, and modularity in mind. Thanks to its simplicity, good scalability, and speed, it is favored by many users. According to riselab's statistics on papers collected from arxiv.org, Caffe ranks among the top four deep learning frameworks, which shows to some extent that Caffe's user base is still large (please refer to their blog). In addition, according to our company's market research, demand for Caffe in production environments is still strong, and many Caffe-based models need to be deployed. Existing deployment frameworks such as MNN, NCNN, and MACE directly support deploying Caffe models.

TVM only supports Caffe2 at present, and the differences between Caffe and Caffe2 are quite large. Currently there are two ways to deploy a Caffe model in TVM: one is to convert the Caffe model to a TensorFlow or PyTorch model; the other is to convert the Caffe model to ONNX and then to Relay IR. The two methods are essentially the same: both convert to Relay IR indirectly through a third-party representation. The problem is that some ops fail during the model conversion, and the converted model may even produce different results.

Based on the above situation, we decided to open-source our Caffe frontend code, hoping to enrich the usage scenarios of TVM.

Implementation Approach

The Caffe frontend imports a model in five steps:

  1. Read Model: read the model graph and related parameters through Caffe's protobuf API;
  2. Rebuild Graph: traverse the graph, replace the top of each in-place layer with the layer's name, and update all related layers at the same time;
  3. Model Conversion: read the parameters of each layer and convert them into the corresponding TVM op and parameters;
  4. Layer Fusion: fuse BatchNorm and Scale layers;
  5. Convert to Relay IR: the result mainly includes the module, the params, and the real names of the output layers.
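The Rebuild Graph step can be sketched as follows. Caffe allows "in-place" layers whose top name equals their bottom name (e.g. a ReLU writing back into its input blob); to get a proper DAG, each in-place top is renamed to the layer's own name and downstream references are updated. The dict-based layers and the `rebuild_graph` helper below are a simplified illustration, not the actual frontend code:

```python
def rebuild_graph(layers):
    """Rename in-place tops to the layer name and fix later references."""
    renamed = {}  # old blob name -> latest producing layer name
    for layer in layers:
        orig_bottoms = list(layer["bottom"])
        # rewrite bottoms that point at a blob renamed by an earlier layer
        layer["bottom"] = [renamed.get(b, b) for b in orig_bottoms]
        for i, top in enumerate(layer["top"]):
            if top in orig_bottoms:             # in-place layer
                layer["top"][i] = layer["name"]  # use the layer name instead
                renamed[top] = layer["name"]
    return layers

layers = [
    {"name": "conv1", "bottom": ["data"], "top": ["conv1"]},
    {"name": "relu1", "bottom": ["conv1"], "top": ["conv1"]},  # in-place
    {"name": "fc1",   "bottom": ["conv1"], "top": ["fc1"]},
]
rebuild_graph(layers)
```

After rebuilding, relu1's top becomes "relu1" and fc1's bottom is updated to match, so every blob has a unique producer.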

Finally, we can import the Caffe model as follows:

from google.protobuf import text_format
from caffe.proto import caffe_pb2 as pb
from tvm import relay

init_net = pb.NetParameter()
predict_net = pb.NetParameter()

# load model
with open(proto_file, 'r') as f:
    text_format.Merge(f.read(), predict_net)
# load blob
with open(blob_file, 'rb') as f:
    init_net.ParseFromString(f.read())

shape_dict = {'data': [1,3,224,224]}
dtype_dict = {'data': 'float32'}

mod, params = relay.frontend.from_caffe(init_net, predict_net, shape_dict, dtype_dict)
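The Layer Fusion step (fusing BatchNorm and Scale) folds a Scale layer's per-channel gamma and beta into the preceding BatchNorm's normalization, since Caffe's BatchNorm only normalizes and the following Scale applies the learned affine. A minimal numerical sketch with NumPy (the `fuse_bn_scale` helper is hypothetical, not the frontend's actual code):

```python
import numpy as np

def fuse_bn_scale(mean, var, scale_factor, gamma, beta, eps=1e-5):
    """Fold BatchNorm statistics and Scale parameters into one affine op."""
    # Caffe stores running statistics scaled by a separate factor blob.
    mean = mean / scale_factor
    var = var / scale_factor
    std = np.sqrt(var + eps)
    # y = gamma * (x - mean) / std + beta  ==  a * x + b
    a = gamma / std
    b = beta - gamma * mean / std
    return a, b

mean = np.array([0.5, -1.0])
var = np.array([4.0, 1.0])
gamma = np.array([2.0, 3.0])
beta = np.array([0.1, 0.2])
a, b = fuse_bn_scale(mean, var, 1.0, gamma, beta)

x = np.array([1.0, 2.0])
fused = a * x + b
reference = gamma * (x - mean) / np.sqrt(var + 1e-5) + beta
assert np.allclose(fused, reference)
```

The fused form needs only one multiply-add per channel at inference time, which is why the two layers are merged before conversion.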

Work Done

Everything we have done is listed as follows:

1. List of supported Ops

  • BatchNorm
  • Concat
  • Convolution
  • Crop
  • Deconvolution
  • Dropout
  • Eltwise
  • Flatten
  • InnerProduct
  • Input
  • LRN
  • Normalize
  • Permute
  • Pooling
  • PReLU
  • PriorBox
  • proposal
  • Python
  • ReLU
  • Reshape
  • Resize
  • ROIPooling
  • Scale
  • Sigmoid
  • Slice
  • Softmax
  • TanH
  • Upsample

2. List of supported complete models

  • Alexnet
  • Resnet50
  • Mobilenetv1
  • Mobilenetv2
  • Inceptionv1
  • Inceptionv3
  • Inceptionv4
  • Vgg16
  • Squeezenetv1
  • SSDMobilenetv1
  • SSDMobilenetv2
  • YOLOv3
  • ENet

3. Caffe frontend test cases

4. Caffe frontend tutorial

TODO

  • Support more ops and more complete models.

Following the implementation scheme above, the frontend framework we built makes it easy to add any new op. You only need to: first, add a method to the OperatorConverter class that extracts the layer's parameters and implements the conversion logic to TVM ops; second, register that method in convert_map.
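As a rough illustration of this registration pattern (the class and method bodies below are assumptions for illustration; in the real frontend the methods emit Relay expressions, while here plain strings stand in so the sketch is self-contained):

```python
class OperatorConverter:
    def __init__(self):
        # op type as it appears in the prototxt -> conversion method
        self.convert_map = {
            "ReLU": self.convert_relu,
            "Softmax": self.convert_softmax,
        }

    def convert_relu(self, layer):
        # extract layer parameters here, then build the corresponding TVM op
        return "relay.nn.relu({})".format(layer["bottom"][0])

    def convert_softmax(self, layer):
        return "relay.nn.softmax({})".format(layer["bottom"][0])

    def convert_op(self, layer):
        op_type = layer["type"]
        if op_type not in self.convert_map:
            raise NotImplementedError(
                "Operator {} is not supported.".format(op_type))
        return self.convert_map[op_type](layer)

conv = OperatorConverter()
expr = conv.convert_op({"type": "ReLU", "bottom": ["conv1"]})
```

With this structure, supporting a new layer type is a local change: one new method plus one new convert_map entry.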

@fernchen force-pushed the CaffeFrontEnd branch 2 times, most recently from 645cee8 to aab7056 on August 5, 2020
@FrozenGene FrozenGene left a comment


some initial review comments.

Review comments on python/tvm/relay/frontend/caffe.py and tutorials/frontend/from_caffe.py (outdated, resolved).
@fernchen

fernchen commented Aug 6, 2020

Hi @tqchen @FrozenGene:
I adopted the suggestion to just add the Caffe env in ci_cpu, see #6023 (comment).
But now this PR hits errors in ci_gpu. I have read the error log and found that the docs build executes the script tutorials/frontend/from_caffe.py in tvmai/ci-gpu:v0.64, which obviously has no Caffe env. Is there any way to avoid this problem, since we only have the Caffe env in ci_cpu?

@FrozenGene

> So is there any way to avoid this problem, since we only have the Caffe env in ci_cpu?

I think we have to install caffe to gpu docker too.

@fernchen

fernchen commented Aug 6, 2020

> I think we have to install caffe to gpu docker too.

Ah, what bad news! Shall we add Caffe to the GPU docker image? I can propose a new PR.

@fernchen

fernchen commented Aug 6, 2020

> Shall we add Caffe to the GPU docker image? I can propose a new PR.

Need more advice from you @tqchen

@FrozenGene

@fernchen please follow up.

@fernchen

> @fernchen please follow up.

Thanks!
I have sent @tqchen an email about how to deal with this problem and am waiting for his response. If there is no better resolution, I will propose a new PR adding the Caffe environment to ci_gpu.

@tqchen

tqchen commented Aug 17, 2020

@fernchen let us skip the tutorials for now and only add the frontend cpu tests

@fernchen

> @fernchen let us skip the tutorials for now and only add the frontend cpu tests

Thanks for your advice! I have modified the code; please review the rest of it. @tqchen

@fernchen

Hi @tqchen @FrozenGene:
Please let me know if you have more suggestions about this PR.

@fernchen

Hi @tqchen,
I'm sorry to bother you, but I really need more suggestions to push this PR forward.
Thanks very much!

@tqchen

tqchen commented Aug 25, 2020

@fernchen please try to send another dummy commit to retrigger the CI. @FrozenGene please follow up. I think we can merge once all the tests are resolved.

@fernchen

> @fernchen please try to send another dummy commit to retrigger the CI. @FrozenGene please follow up. I think we can merge once all the tests are resolved.

@tqchen Thanks for your suggestions! @FrozenGene I have sent a new dummy commit for this PR and now all tests pass. Please let me know if you have more suggestions about this PR, thanks.

target_host = 'llvm'

ctx = tvm.cpu(0)
with tvm.transform.PassContext(opt_level=2):


Why can it not be opt level 3?



> Why can it not be opt level 3?

Sorry, this is a historical issue. In a previous version, building the LRN layer (Alexnet) with opt_level 3 threw errors. That problem has been solved, so I will reset the opt_level to 3. Thanks.

m.set_input('data' + str(idx), tvm.nd.array(d.astype(dtype)))
else:
m.set_input('data', tvm.nd.array(data.astype(dtype)))
m.set_input(**params)


We don't need to set params now with the new module-based runtime interface.



> We don't need to set params now with the new module-based runtime interface.

Thanks! I will delete this line.

weight_filler=dict(type="xavier"),
bias_filler=dict(type="xavier"))

print("Testing layer Convolution pass!")


Remove all related prints, as we have asserts.

data = np.random.rand(1, 3, 10, 10).astype(np.float32)
_test_batchnorm(data)
_test_batchnorm(data, moving_average_fraction=0.88, eps=1e-4)
print("Testing layer BatchNorm pass!")


ditto

@FrozenGene

from tvm.relay.frontend import caffe_pb2 as pb

Where is from tvm.relay.frontend import caffe_pb2 as pb? I don't see it in this PR. Another thing: we must at least provide instructions on how to build one's own caffe.proto and the related stuff.

@fernchen

> Where is from tvm.relay.frontend import caffe_pb2 as pb? I don't see it in this PR.

Based on the discussion in #6023 (comment), we will support BVLC Caffe in TVM first, so there is no need to compile one's own caffe.proto in this PR. We use the caffe.proto as follows:

from caffe.proto import caffe_pb2 as pb

@FrozenGene FrozenGene merged commit 44d97ad into apache:master Aug 27, 2020
@FrozenGene

Thanks @fernchen. It is merged now.

@FrozenGene added the status: accepted label and removed the status: need review and status: need update labels on Aug 27, 2020
kevinthesun pushed a commit to kevinthesun/tvm that referenced this pull request Sep 17, 2020
* [Caffe Frontend] introduce caffe frontend for tvm.

* [Caffe Frontend] fix bugs for generating caption in tutorial.

* [Caffe Frontend] delete statement for python2 and modify the function name.

* [Caffe Frontend] change the directory which will hold the tmp files when testing the caffe frontend.

* [Caffe Frontend] delete tutorial about caffe frontend.

* [Caffe Frontend] delete some print statements

Co-authored-by: fernchen <zifeng.cf@alibaba-inc.com>
kevinthesun pushed a commit to kevinthesun/tvm that referenced this pull request Sep 18, 2020 (same commit series as above).
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Sep 18, 2020 (same commit series as above).