
fp16 support in the Object Detection API [Feature request] #3706

Open
eilifsolberg opened this issue Mar 22, 2018 · 19 comments

@eilifsolberg

eilifsolberg commented Mar 22, 2018

Feature request: fp16/mixed precision support for training

  • Is fp16/mixed precision support on the roadmap for training networks using the Object Detection API?
  • If not, do you see any issues that need to be resolved? It seems like you would either need two sets of pretrained models, or some automatic conversion between them.

System information

  • What is the top-level directory of the model you are using: N/A
  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): N/A
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): N/A
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 1.5.0
  • Bazel version (if compiling from source): N/A
  • CUDA/cuDNN version: CUDA 9.1/cuDNN v7.1.3
  • GPU model and memory: Nvidia Titan V, 12GB
  • Exact command to reproduce: N/A
@tensorflowbutler tensorflowbutler added the stat:awaiting response (Waiting on input from the contributor) label Apr 25, 2018
@tensorflowbutler
Member

Thank you for your post. We noticed you have not filled out the following fields in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.
What is the top-level directory of the model you are using
Have I written custom code
OS Platform and Distribution
TensorFlow installed from
TensorFlow version
Bazel version
CUDA/cuDNN version
GPU model and memory
Exact command to reproduce

@austinmw

austinmw commented Jun 25, 2018

@tensorflowbutler this is a feature request.
@eilifsolberg hey, I agree with this; can you edit the title and add [Feature Request]?

@gzchenjiajun

What is the status of this? I know the official 2.0 code has FP16, but what about the older code? Does anyone know how to write it? I tried but failed... @austinmw @eilifsolberg @tensorflowbutler

@eilifsolberg
Author

eilifsolberg commented Oct 22, 2019

I believe two solutions have emerged for this (for TensorFlow >= 1.14):

  • If you use Nvidia NGC containers, you should be able to just set the environment variable TF_ENABLE_AUTO_MIXED_PRECISION to '1', e.g. by
    os.environ['TF_ENABLE_AUTO_MIXED_PRECISION'] = '1'
  • Otherwise, wrap your optimizer with the mixed-precision graph rewrite:
    optimizer = tf.train.experimental.enable_mixed_precision_graph_rewrite(optimizer)

See https://developer.nvidia.com/automatic-mixed-precision for more.

@eilifsolberg eilifsolberg changed the title fp16 support in the Object Detection API Feature request: fp16 support in the Object Detection API Oct 22, 2019
@eilifsolberg eilifsolberg reopened this Oct 22, 2019
@eilifsolberg eilifsolberg changed the title Feature request: fp16 support in the Object Detection API fp16 support in the Object Detection API [Feature request] Oct 22, 2019
@gzchenjiajun

I modified this file yesterday and also set the environment variable os.environ['TF_ENABLE_AUTO_MIXED_PRECISION'] = '1' as required.

 /research/object_detection/builders/optimizer_builder.py
  if optimizer_type == 'momentum_optimizer':
    config = optimizer_config.momentum_optimizer
    learning_rate = _create_learning_rate(config.learning_rate)
    summary_vars.append(learning_rate)
    optimizer = tf.train.MomentumOptimizer(
        learning_rate,
        momentum=config.momentum_optimizer_value)
    optimizer = tf.train.experimental.enable_mixed_precision_graph_rewrite(optimizer)

But I found that it doesn't work, and I don't know why. The pretrained config is ssd_resnet101_v1_fpn_shared_box_predictor_oid_512x512_sync.config.

I think it might be caused by one of two problems:
The first is that the environment variable is set in the wrong place.
The second is that this code is wrong (I suspect the underlying code never actually reaches this step).

@eilifsolberg

@gzchenjiajun

I am training on GPU. I have upgraded tensorflow-gpu to 1.14, but tensorflow is still 1.13. Could this be related?

@eilifsolberg

@eilifsolberg
Author

Do you use NVIDIA NGC containers >= 19.03? The environment variable only works in that case, and it needs to be set before model_builder.py is called. This could be done by setting the environment variable in the shell before the script is run.

Otherwise you need TensorFlow 1.14 or higher and have to edit model_builder.py.
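
For the NGC-container path, a minimal sketch of setting the variable inside the training script rather than in the shell; the important part is that it happens before any TensorFlow import (the script placement here is just an illustration, not a documented requirement):

  # Sketch: enable NVIDIA automatic mixed precision in an NGC container (>= 19.03).
  # The variable must be set before TensorFlow is imported, e.g. at the very top of
  # the training script, or exported in the shell before running it.
  import os
  os.environ['TF_ENABLE_AUTO_MIXED_PRECISION'] = '1'

  import tensorflow as tf  # imported only after the variable is set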

@eilifsolberg
Author

You should not both set the environment variable and wrap the optimizer; do only one of them.

@gzchenjiajun

Do you mean that the environment variable os.environ['TF_ENABLE_AUTO_MIXED_PRECISION'] = '1' and tf.train.experimental.enable_mixed_precision_graph_rewrite should not be used at the same time, only one of them?
My GPU is a Tesla T4, so I think it should meet the requirements for NVIDIA NGC containers >= 19.03.

This problem has troubled me for a long time and I really hope to solve it. Thank you very much.
@eilifsolberg

@gzchenjiajun

I ran the FP16 demo code and it works.
But in tensorflow/models object_detection it still has no effect, even after a long time.
@eilifsolberg

@gzchenjiajun

research/object_detection/builders/optimizer_builder.py:57
I added the enable_mixed_precision_graph_rewrite code on line 57 of optimizer_builder.py, but it still has no effect. Where should I write this code?

@eilifsolberg

@gzchenjiajun

Another point: my current training run reports the warning below, which appeared after I upgraded from tensorflow-gpu 1.13 to 1.14.
I don't know whether it has any effect.
W1024 13:45:00.194824 139707432818496 ag_logging.py:145] Entity <bound method Conv.call of <tensorflow.python.layers.convolutional.Conv2D object at 0x7f0fcdcaff98>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method Conv.call of <tensorflow.python.layers.convolutional.Conv2D object at 0x7f0fcdcaff98>>: AssertionError: Bad argument number for Name: 3, expecting 4
WARNING:tensorflow:Entity <bound method BatchNormalization.call of <tensorflow.python.layers.normalization.BatchNormalization object at 0x7f0fcdbbff60>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause: converting <bound method BatchNormalization.call of <tensorflow.python.layers.normalization.BatchNormalization object at 0x7f0fcdbbff60>>: AssertionError: Bad argument number for Name: 3, expecting 4

@eilifsolberg

@eilifsolberg
Author

eilifsolberg commented Oct 24, 2019

I guess you should do it on line 76 of https://github.com/tensorflow/models/blob/master/research/object_detection/builders/optimizer_builder.py

as it will then be done independently of which optimizer you use. If you do this, don't change or set the environment variable.
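
For illustration, a minimal sketch of that edit; the surrounding lines are paraphrased from optimizer_builder.py rather than copied verbatim, and the variable name `optimizer` at that point in build() is assumed:

  # In research/object_detection/builders/optimizer_builder.py, inside build(), after
  # the if/elif chain has constructed `optimizer` for the configured optimizer type:
  if optimizer is None:
    raise ValueError('Optimizer %s not supported.' % optimizer_type)

  # Wrap the optimizer once with the automatic mixed-precision graph rewrite
  # (TensorFlow >= 1.14). Do not also set TF_ENABLE_AUTO_MIXED_PRECISION if you do this.
  optimizer = tf.train.experimental.enable_mixed_precision_graph_rewrite(optimizer)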

Not sure about the last question; it seems like something you might want to report as an issue on the TensorFlow GitHub page (not tensorflow/models, but tensorflow/tensorflow).

@gzchenjiajun

I have added the relevant code at line 76 and followed your other guidelines as well.
But it has no effect. Is there any solution?
Thanks.
@eilifsolberg

@tensorflowbutler tensorflowbutler removed the stat:awaiting response (Waiting on input from the contributor) label Oct 26, 2019
@eilifsolberg
Author

Sorry, not sure what the problem might be. I think you should ask a question on Stack Overflow.

@gzchenjiajun

OK, I have set this aside for the time being.
I upgraded to TensorFlow 2.0 and the related code, but mixed precision still has no effect. I also asked on Stack Overflow:
Half precision float - fp16 support in the Object Detection API (tensorflow) - Stack Overflow
https://stackoverflow.com/questions/58585259/fp16-support-in-the-object-detection-apitensorflow

For now I am using TensorRT for speed-up.

@eilifsolberg

@gzchenjiajun

I would like to ask how you use mixed precision to speed up your project. Thanks.
@eilifsolberg

@apatsekin

bump.. no updates?

@ravikyram ravikyram added the models:research (models that come under research directory), type:support, and type:feature labels and removed the type:support label Jul 10, 2020
@saberkun
Member

In the Object Detection TF2 API, this feature can be enabled with the Keras mixed precision API: https://www.tensorflow.org/guide/mixed_precision
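
For reference, a minimal sketch of that approach, using the non-experimental API name available from TF 2.4 onward (earlier 2.x versions expose the equivalent under tf.keras.mixed_precision.experimental):

  import tensorflow as tf

  # Enable mixed precision globally before the detection model is built.
  # Layers then compute in float16 while their variables stay in float32.
  tf.keras.mixed_precision.set_global_policy('mixed_float16')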
