I read from here that since cuDNN 7.3 we don't need to worry about 'input channels, output channels, and batch size' in order for Tensor Cores to speed up FP16 computation. But I also read that this only applies to packed NCHW data. May I ask what packed NCHW data is? I understand that the channels and batch size don't need to be a multiple of 8. What about the sizes of H and W? For reference, the snippet below is my understanding of "packed" (a minimal sketch with made-up shapes; please correct me if this is wrong):
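```python
import torch

# A freshly allocated 4-D tensor is "packed" NCHW as I understand it:
# elements sit contiguously in memory, W varying fastest, then H, C, N.
x = torch.randn(32, 64, 56, 56)          # N, C, H, W
print(x.is_contiguous())                  # True -> packed NCHW

# Slicing or permuting can break the packed layout without copying data.
y = x[:, :, ::2, :]                       # strided view, no longer packed
print(y.is_contiguous())                  # False
print(y.contiguous().is_contiguous())     # True -> repacked copy
```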
I'm asking this because I tried Lightning with Apex AMP and with PyTorch 1.6 native AMP, and neither sped up my training. For context, my native-AMP setup looks roughly like the sketch below (a minimal example with a placeholder model, not my actual training code):
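```python
import torch
from torch.cuda.amp import autocast, GradScaler

# Placeholder model and data, just to show the AMP plumbing.
model = torch.nn.Conv2d(64, 64, 3, padding=1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = GradScaler()

data = torch.randn(32, 64, 56, 56, device="cuda")    # packed NCHW input
target = torch.randn(32, 64, 56, 56, device="cuda")

for _ in range(10):
    optimizer.zero_grad()
    with autocast():                      # ops run in FP16 where safe
        output = model(data)
        loss = torch.nn.functional.mse_loss(output, target)
    scaler.scale(loss).backward()         # scale loss to avoid FP16 underflow
    scaler.step(optimizer)
    scaler.update()
```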
Thanks!