Feature normalization can cause NaN to appear #64

@rcontrai

Description

I was trying to fine-tune the model on a French corpus when I realized the loss kept turning into NaN, which ruined the model's parameters.

After some investigation I found the culprit: the code that normalizes the audio features (specifically allosaurus.pm.utils.feature_cmvn()) has several problems that can cause the features to become NaN, which in turn makes NaNs appear further down the line during training.

First, on line 10, the computation of spk_std is numerically unstable and can produce a negative variance (for instance, -0.156 where numpy.var() finds 4.50e-07); taking its square root then returns NaN.
This can be fixed by replacing that line with spk_std = np.std(feature, axis=0) (line 9 can then be removed).
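For context, this symptom typically comes from the one-pass variance formula mean(x**2) - mean(x)**2, which subtracts two nearly equal large numbers. Here is a minimal sketch reproducing the cancellation; the signal shape and float32 dtype are assumptions chosen to mimic low-variance audio features, not the actual contents of feature_cmvn():

```python
import numpy as np

# A high-DC, low-variance float32 signal: the squares sit near 1e8,
# where float32 spacing is ~8, so mean(x**2) - mean(x)**2 subtracts
# two nearly equal large numbers and can round to a negative value.
rng = np.random.default_rng(0)
x = (1e4 + 1e-3 * rng.standard_normal(8000)).astype(np.float32)

one_pass_var = np.mean(x ** 2) - np.mean(x) ** 2  # may come out negative
two_pass_std = np.std(x)                          # always finite and >= 0

with np.errstate(invalid="ignore"):
    unstable_std = np.sqrt(one_pass_var)  # NaN whenever one_pass_var < 0
```

np.std subtracts the mean before squaring, so it never produces a negative variance, which is why the proposed one-line replacement removes the NaN source.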

Second, on line 12, the features are divided by the standard deviation, but there is no guarantee that it is non-zero. As a result, features whose variance is zero are turned into NaN.
This can be fixed by adding the line spk_std += (spk_std == 0.) before the division, which replaces the zeros with ones.
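The guard works because the boolean mask (spk_std == 0.) casts to 1.0 exactly where the std is zero, so constant dimensions are divided by 1 instead of 0. A self-contained sketch (the surrounding mean/std computation is my assumption about what feature_cmvn() does, not its actual code):

```python
import numpy as np

feature = np.array([[3.0, 4.0, 1.0],
                    [3.0, 8.0, 2.0]])   # first column is constant

spk_mean = np.mean(feature, axis=0)     # [3. , 6. , 1.5]
spk_std = np.std(feature, axis=0)       # [0. , 2. , 0.5] -- note the 0
# The proposed guard: the boolean mask casts to 1.0 where std is zero,
# leaving the non-zero entries untouched.
spk_std += (spk_std == 0.)              # [1. , 2. , 0.5]

normalized = (feature - spk_mean) / spk_std
# normalized == [[0., -1., -1.], [0., 1., 1.]] -- no NaN in any column
```

A constant dimension thus normalizes to all zeros, which is the sensible outcome, instead of propagating 0/0 = NaN into training.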

Here is a file for which these problems occur, taken from the Mozilla CommonVoice dataset.
FNH4QW-sample-0.wav.zip
