Description
I was trying to fine-tune the model on a French corpus when I noticed that the loss kept turning into NaN, which corrupted the model's parameters.
After some investigation I found a culprit: the code that normalises the audio features (specifically allosaurus.pm.utils.feature_cmvn()) has several problems that can turn the features into NaN, which then causes NaNs to appear further down the line during training.
First, on line 10, the computation of spk_std is numerically unstable and can produce a negative variance (for instance, -0.156 where numpy.var() finds 4.50e-07); computing its square root then returns NaN.
This can be fixed by replacing this line with spk_std = np.std(feature, axis=0) (line 9 can then be removed as well).
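To illustrate the kind of instability described above, here is a minimal sketch. It assumes the original line computes the variance as the mean of squares minus the square of the mean, which is what the symptom suggests but is not copied from the library. The data and the helper naive_var are made up for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical feature column: large offset, tiny spread, stored as float32.
feature = (100.0 + 1e-3 * rng.standard_normal((1000, 1))).astype(np.float32)


def naive_var(x):
    # Assumed form of the unstable computation: E[x^2] - E[x]^2.
    # Both terms are large and nearly equal, so their difference is
    # dominated by rounding error and can even come out negative.
    mean = np.sum(x, axis=0) / x.shape[0]
    mean_sq = np.sum(x * x, axis=0) / x.shape[0]
    return mean_sq - mean * mean


# Reference value computed in float64 on the same samples.
reference = np.var(feature.astype(np.float64), axis=0)
naive = naive_var(feature)
# np.var subtracts the mean before squaring, so it cannot go negative.
stable = np.var(feature, axis=0)

print("reference:", reference)
print("naive:    ", naive)
print("stable:   ", stable)
```

The exact value printed for the naive version depends on the platform and NumPy build, so no expected output is given here. The point is only that np.var (and therefore np.std) stays non-negative and close to the reference.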
Second, on line 12, there is a division by the standard deviation, but there is no guarantee that it is non-zero. As a result, features are turned into NaN when their variance is zero.
This can be fixed by adding the line spk_std += (spk_std == 0.) before the division, which replaces the zeros with ones.
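Putting both suggestions together, a patched normalisation could look roughly like the sketch below. The name feature_cmvn_safe is hypothetical and the body is a reconstruction from the description in this issue, not the library's actual code. It assumes feature is a 2-D array of shape (frames, dims).

```python
import numpy as np


def feature_cmvn_safe(feature):
    """Mean/variance-normalise each feature dimension over frames (axis 0).

    Hypothetical replacement for feature_cmvn(), written from the
    description in this issue rather than from the library source.
    """
    spk_mean = np.mean(feature, axis=0)
    # np.std subtracts the mean before squaring, so it never takes the
    # square root of a negative number.
    spk_std = np.std(feature, axis=0)
    # Replace zero standard deviations with one so that constant
    # dimensions are mapped to 0 instead of NaN.
    spk_std = np.where(spk_std == 0.0, 1.0, spk_std)
    return (feature - spk_mean) / spk_std


# Small made-up example: the second column is constant.
example = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]], dtype=np.float32)
normalised = feature_cmvn_safe(example)
print(normalised)
```

Using np.where instead of the in-place += has the same effect here; it just avoids mutating spk_std through a boolean addition.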
Here is a file for which these problems occur, taken from the Mozilla CommonVoice dataset.
FNH4QW-sample-0.wav.zip