Is it possible to enable apex O1 for inferences on a non-apex FP32 trained model? #750
Comments
@Damiox

```python
import torch
from apex import amp

# Initialization
opt_level = 'O2'  # note: O2 is "almost FP16" mixed precision; 'O0' is pure FP32
model, optimizer = amp.initialize(model, optimizer, opt_level=opt_level)

# Restore
model = ...
optimizer = ...
checkpoint = torch.load('checkpoint.pth')

model, optimizer = amp.initialize(model, optimizer, opt_level=opt_level)
model.load_state_dict(checkpoint['model'])
optimizer.load_state_dict(checkpoint['optimizer'])
amp.load_state_dict(checkpoint['amp'])
```
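For context, a checkpoint containing the `'amp'` entry loaded above would have been saved during amp training, roughly following the pattern from the apex README (a sketch; the `'checkpoint.pth'` filename simply matches the snippet above):

```python
# Sketch: save the model, optimizer, and amp state together so that
# amp.load_state_dict(checkpoint['amp']) works at restore time.
checkpoint = {
    'model': model.state_dict(),
    'optimizer': optimizer.state_dict(),
    'amp': amp.state_dict(),
}
torch.save(checkpoint, 'checkpoint.pth')
```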
Hey @Lornatang, the snippet above is my current code. The optimizer I'm using is this one: https://huggingface.co/transformers/main_classes/optimizer_schedules.html My questions are below.
@Lornatang Actually, my problem is that I'm using a model that was not trained with mixed precision; it's FP32. I'm running inference faster by using apex with the O1 level at inference time for this model. I don't see much of a discrepancy in the outputs, but I'm not sure whether what I'm doing is right. I can't find anything in the documentation saying whether that's OK. Do you know where I can confirm that?
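For reference, the pattern being described is roughly the following: amp is initialized at inference time only, with no optimizer (a minimal sketch; the tiny `nn.Linear` model and random input are stand-ins for the real FP32-trained model and data):

```python
import torch
import torch.nn as nn
from apex import amp

# Stand-in for a model trained entirely in FP32 (no apex involved in training).
model = nn.Linear(128, 64).cuda()
model.eval()

# With no optimizer passed, amp.initialize returns only the patched model;
# under O1, whitelisted ops (e.g. linear/conv) are cast to FP16 on the fly.
model = amp.initialize(model, opt_level='O1')

with torch.no_grad():
    out = model(torch.randn(8, 128, device='cuda'))
```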
@Damiox
@Lornatang Thanks for helping me out with this. Could you please elaborate on why this should work? What's the theory behind it? Thanks
@Damiox
1. The essence of mixed precision training is to use FP16 for storage and for the multiplications, which speeds up computation, while using FP32 for the accumulations to avoid rounding error; this strategy effectively mitigates the rounding-error problem.
2. Mixed precision training also requires loss scaling, otherwise it may fail to converge, because the activation gradients are too small and underflow in FP16. The idea of loss scaling is to multiply the loss by a scale factor before backpropagation, so the small gradients are shifted into FP16's representable range, and then to unscale the gradients before the optimizer step.
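A minimal sketch of that loss-scaling pattern with apex amp (the model, optimizer, and data here are stand-ins just to show the API):

```python
import torch
import torch.nn as nn
from apex import amp

# Stand-in model, optimizer, and data to illustrate the loss-scaling pattern.
model = nn.Linear(128, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
model, optimizer = amp.initialize(model, optimizer, opt_level='O1')

x = torch.randn(32, 128, device='cuda')
target = torch.randint(0, 10, (32,), device='cuda')

optimizer.zero_grad()
loss = nn.functional.cross_entropy(model(x), target)

# amp scales the loss before backward (so small FP16 gradients don't underflow)
# and unscales the gradients before optimizer.step().
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```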
Just to be 100% sure we're on the same page here.
@Lornatang I'm sorry to ping you again, but I just wanted to make sure you got my point. Is it wrong to initialize apex for inference on an existing FP32 model that I haven't re-trained with apex? Everywhere in the documentation it seems to be assumed that the model is trained with apex, but I'm not re-training my model with apex; I'm only using apex when running predictions. I just wanted to clarify that and get some feedback from you. Thanks!
@Damiox
@Damiox I'm trying the same thing (trained in FP32, inference with apex). Unfortunately, I see inference getting slower. How much of a speedup have you observed using apex only at inference time?
A considerable speedup, indeed. Take a look at your GPU: not all GPUs benefit from FP16 operations. The Tesla T4 does take advantage of apex; it's approximately 2x faster.
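One way to sanity-check this (not from the thread, just a suggestion): Tensor Cores are present on GPUs with compute capability 7.0 or higher (e.g. V100, T4), which PyTorch can report:

```python
import torch

# GPUs with compute capability >= 7.0 (Volta/Turing and newer, e.g. V100, T4)
# have Tensor Cores and can actually benefit from FP16 math.
major, minor = torch.cuda.get_device_capability(0)
print(torch.cuda.get_device_name(0), f'compute capability {major}.{minor}')
print('Tensor Cores available:', (major, minor) >= (7, 0))
```

To confirm at runtime that Tensor Core kernels are actually being launched, a profiler such as nvprof or Nsight can be used to look for the HMMA-based cuBLAS/cuDNN kernels (their names typically contain `884` or `1688`).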
I have run inference for an FP32 model on a Tesla T4, but I haven't gotten any speedup. How can I confirm that Tensor Cores are being used? Can you help me?
Thanks
Can I use apex at inference time on a pure FP32 model that was not trained with apex?
Does apex strictly require, for inference, a model that was originally trained with apex enabled? It's not clear to me yet.
Could I get some explanation about that? I can't find the answer in the docs.