The image recognition and detection model on Fluid. #7253

Closed
qingqing01 opened this issue Jan 5, 2018 · 0 comments

qingqing01 commented Jan 5, 2018

As part of our cooperation with the Visual Technology Department on Fluid, we need to implement three models covering image classification, object detection, and optical character recognition (OCR). They are:

  • SE-ResNeXt 152 on ImageNet 2012 dataset.
  • MobileNet-SSD on MSCOCO dataset.
  • OCR Model
    • CNN + RNN(GRU) + CTC model
    • CNN + RNN(GRU) + Attention model.

SE-ResNeXt 152

The top-1 error on the ImageNet 2012 dataset must be less than 18.2%.
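
To make the acceptance criterion concrete, here is a minimal sketch of how a top-1 error could be computed and compared against the 18.2% threshold. The function name `top1_error`, the constant `TARGET_TOP1_ERROR`, and the toy scores are illustrative assumptions, not part of Fluid; in a real evaluation each row would hold the class scores for one validation image.

```python
def top1_error(scores, labels):
    """Fraction of samples whose highest-scoring class differs from the label.

    scores: list of per-class score lists, one list per sample.
    labels: list of integer class indices, one per sample.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("need at least one sample")
    wrong = 0
    for row, label in zip(scores, labels):
        # Index of the largest score is the top-1 prediction.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted != label:
            wrong += 1
    return wrong / len(labels)


# Threshold stated in this issue (18.2%).
TARGET_TOP1_ERROR = 0.182

# Toy data (hypothetical): 4 samples, 3 classes.
scores = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
]
labels = [1, 0, 2, 2]

error = top1_error(scores, labels)
print(error, error < TARGET_TOP1_ERROR)  # 0.25 False
```
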
TODOs:

  • 1.) Add data augmentation operations

  • 2.) Write model configuration for SE-ResNeXt.
    In addition to the residual block, the SE-ResNeXt architecture contains a squeeze-and-excitation (SE) block and aggregated transformations.

    • 2.1) SE-Block:
      • Global average pooling + FC (or 1x1 conv) + ReLU + FC(or 1x1 conv) + Sigmoid
      • Scale Op (elementwise_mul operator in Fluid.)
      • About the global average pooling:
        From the author's point of view, our global pooling operator may be inefficient. We need to either optimize it or simply try reduce_mean first.
    • 2.2) Aggregating Transformations
      • This is a grouped convolution.
  • 3.) Experiment
    The single-crop top-1 validation error must be less than 18.2% on ImageNet 2012. If multi-GPU support is not finished before the work above is complete, the result can be verified on the CIFAR dataset first.

  • 4.) Submit demo and report.
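
The SE-Block in item 2.1 can be sketched as follows. This is a plain-Python illustration on nested lists, not Fluid code: the helper names (`global_avg_pool`, `fc`, `se_block`) and the toy weights are assumptions made for the example. It only shows the data flow of global average pooling, FC, ReLU, FC, sigmoid, and the final per-channel scaling that elementwise_mul would perform.

```python
import math


def global_avg_pool(x):
    """Squeeze: x is [C][H][W]; returns one mean value per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]


def fc(v, weight, bias):
    """Fully connected layer; weight is [out][in], bias is [out]."""
    return [sum(w * a for w, a in zip(row, v)) + b for row, b in zip(weight, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def se_block(x, w1, b1, w2, b2):
    """Squeeze-and-excitation on a single [C][H][W] feature map."""
    squeezed = global_avg_pool(x)          # [C]
    hidden = relu(fc(squeezed, w1, b1))    # [C / reduction]
    gates = sigmoid(fc(hidden, w2, b2))    # [C], each value in (0, 1)
    # Scale: multiply every pixel of a channel by that channel's gate.
    return [[[g * p for p in row] for row in ch] for ch, g in zip(x, gates)]


# Toy input (hypothetical): 2 channels of size 2x2, reduced to 1 hidden unit.
x = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
w1, b1 = [[1.0, 1.0]], [0.0]
w2, b2 = [[0.0], [0.0]], [0.0, 0.0]  # zero weights give gates of exactly 0.5

y = se_block(x, w1, b1, w2, b2)
print(y[0])  # [[0.5, 1.0], [1.5, 2.0]]
```

Replacing `global_avg_pool` with a mean over the spatial axes is the same computation that the reduce_mean suggestion above refers to.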

The following two parts will continue to be edited to list more detailed subtasks.

MobileNet-SSD

  • 1.) MobileNet
    • 1.1) depthwise-conv operator.
    • 1.2) ARM based depthwise-conv.
  • 2.) SSD architecture
    • Even though the layers have already been implemented in the old Paddle, I think we should survey object detection in other frameworks such as TensorFlow and then split the work into subtasks. I'm doing this now. In addition to training, our goal is to deploy this model.
    • Done in The SSD for object detection on Fluid. #7402
  • 3.) Data augmentation
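
As an illustration of item 1.1, a depthwise convolution can be viewed as a grouped convolution in which the number of groups equals the number of input channels; the same primitive with fewer groups corresponds to the grouped convolution mentioned for SE-ResNeXt. The sketch below is a reference implementation in plain Python under simplifying assumptions (stride 1, no padding, no bias, single image). The function names are hypothetical and it is not how an optimized or ARM-based kernel would be written.

```python
def grouped_conv2d(x, weight, groups):
    """Grouped 2D cross-correlation.

    x:      [C_in][H][W]
    weight: [C_out][C_in // groups][kH][kW]
    Assumes stride 1, no padding, no bias.
    """
    c_in, c_out = len(x), len(weight)
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    in_per_group = c_in // groups
    out_per_group = c_out // groups
    if len(weight[0]) != in_per_group:
        raise ValueError("weight must have C_in // groups input channels")
    k_h, k_w = len(weight[0][0]), len(weight[0][0][0])
    out_h = len(x[0]) - k_h + 1
    out_w = len(x[0][0]) - k_w + 1

    out = []
    for oc in range(c_out):
        group = oc // out_per_group  # which slice of input channels to read
        plane = [[0.0] * out_w for _ in range(out_h)]
        for ic_local in range(in_per_group):
            src = x[group * in_per_group + ic_local]
            kernel = weight[oc][ic_local]
            for i in range(out_h):
                for j in range(out_w):
                    acc = 0.0
                    for di in range(k_h):
                        for dj in range(k_w):
                            acc += src[i + di][j + dj] * kernel[di][dj]
                    plane[i][j] += acc
        out.append(plane)
    return out


def depthwise_conv2d(x, weight):
    """Depthwise convolution: one group per input channel."""
    return grouped_conv2d(x, weight, groups=len(x))


# Toy input (hypothetical): 2 channels of size 3x3, one 2x2 kernel per channel.
x = [
    [[1.0] * 3 for _ in range(3)],
    [[2.0] * 3 for _ in range(3)],
]
ones_kernel = [[1.0, 1.0], [1.0, 1.0]]
dw_weight = [[ones_kernel], [ones_kernel]]  # shape [2][1][2][2]

print(depthwise_conv2d(x, dw_weight))
# [[[4.0, 4.0], [4.0, 4.0]], [[8.0, 8.0], [8.0, 8.0]]]
```

With `groups=1` the same function reduces to an ordinary convolution that mixes all input channels.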

OCR Model

  • 1.) CNN + RNN(GRU) + CTC
  • 2.) CNN + RNN(GRU) + Spatial Attention
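
For the CTC variant, the decoding step at inference time can be sketched with greedy (best-path) decoding: take the most likely label per frame, merge consecutive repeats, then drop blanks. This is a minimal illustration only. The function name `ctc_greedy_decode`, the choice of index 0 for the blank label, and the toy scores are assumptions, and the issue does not say which decoding strategy the final model should use.

```python
def ctc_greedy_decode(frame_scores, blank=0):
    """Best-path CTC decoding.

    frame_scores: list of per-class score lists, one list per time step.
    blank: index of the CTC blank label (assumed to be 0 here).
    """
    # Most likely label at every time step.
    best_path = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]

    decoded = []
    previous = None
    for label in best_path:
        # Keep a label only if it starts a new run and is not the blank.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# Toy output of the CNN + GRU stack (hypothetical): 7 frames, 3 classes.
frame_scores = [
    [0.1, 0.8, 0.1],    # 1
    [0.2, 0.7, 0.1],    # 1 (repeat, merged)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.8, 0.1],    # 1 (new run after a blank, kept)
    [0.1, 0.2, 0.7],    # 2
    [0.1, 0.1, 0.8],    # 2 (repeat, merged)
    [0.6, 0.2, 0.2],    # blank
]

print(ctc_greedy_decode(frame_scores))  # [1, 1, 2]
```

The blank between the two runs of label 1 is what allows the decoder to emit the same character twice in a row.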