I read from here that since cuDNN 7.3 we don't need to worry about 'input channels, output channels, and batch size' in order for Tensor Cores to speed up FP16 computation. But I also read that this only applies to packed NCHW data. May I ask what packed NCHW data is? I understand that the channels and batch size don't need to be a multiple of 8. What about the sizes of H and W? For reference, the snippet below is my understanding of "packed" (a minimal sketch with made-up shapes; please correct me if this is wrong):
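```python
import torch

# A freshly allocated 4-D tensor is "packed" NCHW as I understand it:
# elements sit contiguously in memory, W varying fastest, then H, C, N.
x = torch.randn(32, 64, 56, 56)          # N, C, H, W
print(x.is_contiguous())                  # True -> packed NCHW

# Slicing or permuting can break the packed layout without copying data.
y = x[:, :, ::2, :]                       # strided view, no longer packed
print(y.is_contiguous())                  # False
print(y.contiguous().is_contiguous())     # True -> repacked copy
```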
I'm asking this because I tried Lightning with Apex AMP and with PyTorch 1.6 native AMP, and neither sped up my training. For context, my native-AMP setup looks roughly like the sketch below (a minimal example with a placeholder model, not my actual training code):
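```python
import torch
from torch.cuda.amp import autocast, GradScaler

# Placeholder model and data, just to show the AMP plumbing.
model = torch.nn.Conv2d(64, 64, 3, padding=1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = GradScaler()

data = torch.randn(32, 64, 56, 56, device="cuda")    # packed NCHW input
target = torch.randn(32, 64, 56, 56, device="cuda")

for _ in range(10):
    optimizer.zero_grad()
    with autocast():                      # ops run in FP16 where safe
        output = model(data)
        loss = torch.nn.functional.mse_loss(output, target)
    scaler.scale(loss).backward()         # scale loss to avoid FP16 underflow
    scaler.step(optimizer)
    scaler.update()
```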
Thanks!