Describe the bug
Follow-up of Project-MONAI/tutorials#1336: depending on the cuDNN version and GPU mode, LocalNormalizedCrossCorrelationLoss may not be numerically stable when it runs with low-precision operations.
The current workaround is to disable TF32:
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False
It would be great to improve the stability in general.
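
For anyone who wants to limit the workaround to the loss computation rather than the whole process, here is a minimal sketch of a context manager that turns the two TF32 flags off and restores their previous values afterwards. The name `tf32_disabled` is hypothetical and not part of MONAI or PyTorch; only the two `allow_tf32` flags come from the workaround above.

```python
import contextlib

import torch


@contextlib.contextmanager
def tf32_disabled():
    """Temporarily disable TF32 for cuBLAS matmuls and cuDNN ops.

    Hypothetical helper (not a MONAI or PyTorch API); it only toggles
    the two flags used in the workaround and restores them on exit.
    """
    prev_matmul = torch.backends.cuda.matmul.allow_tf32
    prev_cudnn = torch.backends.cudnn.allow_tf32
    torch.backends.cuda.matmul.allow_tf32 = False
    torch.backends.cudnn.allow_tf32 = False
    try:
        yield
    finally:
        # Restore whatever the caller had configured before.
        torch.backends.cuda.matmul.allow_tf32 = prev_matmul
        torch.backends.cudnn.allow_tf32 = prev_cudnn
```

It could be used as `with tf32_disabled(): loss = loss_fn(pred, target)`, where `loss_fn` is a LocalNormalizedCrossCorrelationLoss instance. This only narrows the scope of the workaround; it does not fix the underlying instability.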