The environment is as below:

torch: 1.11.0+cu113
apex: 0.1
cuda: 11.3.109
mmdet: 2.24.1
mmcv-full: 1.5.1
model: configs/yolox/yolox_l_8x8_300e_coco.py

I tried to compare the training speed of torch.cuda.amp (autocast + GradScaler) and apex.amp, and found that the native torch.cuda.amp is faster. The numbers below are the total training time for two epochs:

torch.cuda.amp: 2130 sec
apex.amp: 2309 sec

I'd like to understand whether this is expected and why it happens.
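For reference, here is a minimal sketch of the two training-step variants being compared; the model, optimizer, criterion, and data are placeholders, not the actual mmdet YOLOX training code:

```python
import torch
import torch.nn as nn

device = torch.device("cuda")
model = nn.Linear(64, 10).to(device)        # placeholder, not the YOLOX model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
images = torch.randn(8, 64, device=device)  # dummy batch
targets = torch.randint(0, 10, (8,), device=device)

# Variant 1: native torch.cuda.amp (autocast + GradScaler)
scaler = torch.cuda.amp.GradScaler()
optimizer.zero_grad()
with torch.cuda.amp.autocast():      # forward runs in mixed precision
    loss = criterion(model(images), targets)
scaler.scale(loss).backward()        # scale loss to avoid fp16 grad underflow
scaler.step(optimizer)               # unscales grads; skips step on inf/nan
scaler.update()                      # adapt the loss scale for the next step

# Variant 2: apex.amp at opt_level O1 (patches ops to cast their inputs)
from apex import amp
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
optimizer.zero_grad()
loss = criterion(model(images), targets)
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```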
After breaking down the per-step time of torch.cuda.amp and apex.amp, it seems that apex performs more data copies.
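For context, one way to get such a per-step breakdown is to time a step function with CUDA events, which account for asynchronous kernel execution; this is a sketch, and `step_fn` stands for one of the training-step variants above:

```python
import torch

def avg_step_time_ms(step_fn, warmup=10, iters=50):
    """Average GPU time of one training step, measured with CUDA events."""
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    for _ in range(warmup):          # let cuDNN autotuning etc. settle
        step_fn()
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        step_fn()
    end.record()
    torch.cuda.synchronize()         # wait for all recorded work to finish
    return start.elapsed_time(end) / iters
```

To see the extra cast/copy kernels directly, torch.profiler (or an external tool such as Nsight Systems) can be used instead of aggregate timing.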
apex.amp is deprecated, and you should use the native implementation via torch.cuda.amp as described here. Closing.