Is it possible to enable apex O1 for inferences on a non-apex FP32 trained model? #750
Comments
@Damiox

```python
import torch
from apex import amp

# Initialization
opt_level = 'O2'  # note: O2 is "almost FP16" mixed precision; 'O0' is pure FP32
model, optimizer = amp.initialize(model, optimizer, opt_level=opt_level)

# Restore
model = ...
optimizer = ...
checkpoint = torch.load('checkpoint.pth')

model, optimizer = amp.initialize(model, optimizer, opt_level=opt_level)
model.load_state_dict(checkpoint['model'])
optimizer.load_state_dict(checkpoint['optimizer'])
amp.load_state_dict(checkpoint['amp'])
```
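For context, a checkpoint containing the `'amp'` entry loaded above would have been saved during amp training, roughly following the pattern from the apex README (a sketch; the `'checkpoint.pth'` filename simply matches the snippet above):

```python
# Sketch: save the model, optimizer, and amp state together so that
# amp.load_state_dict(checkpoint['amp']) works at restore time.
checkpoint = {
    'model': model.state_dict(),
    'optimizer': optimizer.state_dict(),
    'amp': amp.state_dict(),
}
torch.save(checkpoint, 'checkpoint.pth')
```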
Hey @Lornatang, the snippet above is my current code. The optimizer I'm using is this one: https://huggingface.co/transformers/main_classes/optimizer_schedules.html My questions are below.
@Lornatang Actually, my problem is that I'm using a model that was not trained with mixed precision; it's FP32. I'm running inference faster by using apex with the O1 level at inference time for this model. I don't see much of a discrepancy in the outputs, but I'm not sure whether what I'm doing is right. I can't find anything in the documentation saying whether that's OK. Do you know where I can confirm that?
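For reference, the pattern being described is roughly the following: amp is initialized at inference time only, with no optimizer (a minimal sketch; the tiny `nn.Linear` model and random input are stand-ins for the real FP32-trained model and data):

```python
import torch
import torch.nn as nn
from apex import amp

# Stand-in for a model trained entirely in FP32 (no apex involved in training).
model = nn.Linear(128, 64).cuda()
model.eval()

# With no optimizer passed, amp.initialize returns only the patched model;
# under O1, whitelisted ops (e.g. linear/conv) are cast to FP16 on the fly.
model = amp.initialize(model, opt_level='O1')

with torch.no_grad():
    out = model(torch.randn(8, 128, device='cuda'))
```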
@Damiox
@Lornatang Thanks for helping me out with this. Could you please elaborate on why this should work? What's the theory behind it? Thanks
@Damiox
1. The essence of mixed precision training is to use FP16 for storage and for the multiplications, which speeds up computation, while using FP32 for the accumulations to avoid rounding error; this strategy effectively mitigates the rounding-error problem.
2. Mixed precision training also requires loss scaling, otherwise it may fail to converge, because the activation gradients are too small and underflow in FP16. The idea of loss scaling is to multiply the loss by a scale factor before backpropagation, so the small gradients are shifted into FP16's representable range, and then to unscale the gradients before the optimizer step.
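A minimal sketch of that loss-scaling pattern with apex amp (the model, optimizer, and data here are stand-ins just to show the API):

```python
import torch
import torch.nn as nn
from apex import amp

# Stand-in model, optimizer, and data to illustrate the loss-scaling pattern.
model = nn.Linear(128, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
model, optimizer = amp.initialize(model, optimizer, opt_level='O1')

x = torch.randn(32, 128, device='cuda')
target = torch.randint(0, 10, (32,), device='cuda')

optimizer.zero_grad()
loss = nn.functional.cross_entropy(model(x), target)

# amp scales the loss before backward (so small FP16 gradients don't underflow)
# and unscales the gradients before optimizer.step().
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```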
Just to be 100% sure we're on the same page here.
@Lornatang I'm sorry to ping you again, but I just wanted to make sure you got my point. Is it wrong to initialize apex for inference on an existing FP32 model that I haven't re-trained with apex? Everywhere in the documentation it seems to be assumed that the model is trained with apex, but I'm not re-training my model with apex; I'm only using apex when running predictions. I just wanted to clarify that and get some feedback from you. Thanks!
@Damiox
@Damiox I'm trying the same thing (trained in FP32, inference with apex). Unfortunately, I see inference getting slower. How much of a speedup have you observed using apex only at inference time?
A considerable speedup, indeed. Take a look at your GPU: not all GPUs benefit from FP16 operations. The Tesla T4 does take advantage of apex; it's approximately 2x faster.
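One way to sanity-check this (not from the thread, just a suggestion): Tensor Cores are present on GPUs with compute capability 7.0 or higher (e.g. V100, T4), which PyTorch can report:

```python
import torch

# GPUs with compute capability >= 7.0 (Volta/Turing and newer, e.g. V100, T4)
# have Tensor Cores and can actually benefit from FP16 math.
major, minor = torch.cuda.get_device_capability(0)
print(torch.cuda.get_device_name(0), f'compute capability {major}.{minor}')
print('Tensor Cores available:', (major, minor) >= (7, 0))
```

To confirm at runtime that Tensor Core kernels are actually being launched, a profiler such as nvprof or Nsight can be used to look for the HMMA-based cuBLAS/cuDNN kernels (their names typically contain `884` or `1688`).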
I have run inference for an FP32 model on a Tesla T4, but I haven't gotten any speedup. How can I confirm that Tensor Cores are being used? Can you help me?
Thanks
Can I use apex at inference time on a pure FP32 model that was not trained with apex?
Does apex strictly require, for inference, a model that was originally trained with apex enabled? It's not clear to me yet.
Could I get some explanation about that? I can't find the answer in the docs.