Hi, I've noticed that while training the ASR FastConformer model on my data, after a few epochs a batch can occasionally cause a NaN error. I tried setting skip_nan_grad = True, which makes the train_loss curve look normal (no spikes to NaN), but the error log shows that after the message "detected inf or nan values in gradients! Setting gradients to zero.", all predictions in the following validation rounds are "??" and the validation WER increases to 1. I set accumulate_grad_batches to 4 for this training run, so I wonder whether that might be the culprit. Can skip_nan_grad = True still work with gradient accumulation? I'd really appreciate any comments on this issue!
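For context, here is a minimal, self-contained PyTorch sketch (illustrative only, not NeMo's actual implementation) of the mechanics the question hinges on: with gradient accumulation, the gradients of several micro-batches share the same `p.grad` buffers, so a single non-finite micro-batch poisons the accumulated gradients and a skip-NaN check can only recover by zeroing the entire accumulation window.

```python
# Hypothetical sketch of a skip-NaN-grad check combined with gradient
# accumulation. All names here are illustrative, not NeMo's real code.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(8, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
accumulate_grad_batches = 4

def grads_are_finite(module: nn.Module) -> bool:
    """Return False if any parameter gradient contains inf or NaN."""
    for p in module.parameters():
        if p.grad is not None and not torch.isfinite(p.grad).all():
            return False
    return True

for step in range(accumulate_grad_batches):
    x = torch.randn(4, 8)
    y = torch.randint(0, 2, (4,))
    loss = nn.functional.cross_entropy(model(x), y)
    # Simulate one bad micro-batch late in the accumulation window.
    if step == 3:
        loss = loss * float("inf")
    loss.backward()  # gradients from all micro-batches accumulate in p.grad

    if not grads_are_finite(model):
        # Once inf/NaN has been added into p.grad, the accumulated
        # gradients of the three healthy micro-batches are poisoned too
        # (inf + finite = inf), so the only safe recovery is to zero the
        # whole buffer -- the entire accumulation window is lost and the
        # optimizer step below becomes a no-op.
        print("detected non-finite gradients, zeroing")
        optimizer.zero_grad()

optimizer.step()   # steps with whatever is left in p.grad
optimizer.zero_grad()
```

Under this assumption, skipping NaN gradients is not incompatible with accumulate_grad_batches = 4; it just discards the full window whenever any micro-batch in it goes non-finite, which by itself should not drive validation WER to 1.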