Feature normalization can cause NaN to appear

I was trying to fine-tune the model on a French corpus when I noticed that the loss kept turning into NaN, which ruined the model's parameters.

After some investigation I found a culprit: the code that normalises the audio features (specifically `allosaurus.pm.utils.feature_cmvn()`) has two problems that can turn the features into NaN, which then propagates through the rest of training.

First, on line 10, the computation of `spk_std` is numerically unstable and can produce a negative variance (for instance, -0.156 where `numpy.var()` gives 4.50e-07); taking its square root then returns NaN.
This can be fixed by replacing this line with `spk_std = np.std(feature, axis=0)` (line 9 can then be removed).
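
To illustrate the first problem, here is a minimal sketch. It assumes the unstable line computes the variance as "mean of squares minus squared mean", which is the usual way to end up with a negative variance; `naive_std` and `stable_std` are hypothetical names for this example, not functions from allosaurus, and the data is synthetic. With float32 features that have a large mean and a tiny spread, the subtraction cancels almost all significant digits, so the result can come out slightly negative and its square root is NaN, whereas `np.std` subtracts the mean before squaring.

```python
import numpy as np


def naive_std(feature):
    # Assumed form of the unstable computation: sqrt(E[x^2] - E[x]^2).
    frame_cnt = feature.shape[0]
    mean = np.sum(feature, axis=0) / frame_cnt
    mean_sq = np.sum(feature * feature, axis=0) / frame_cnt
    # Suppress the RuntimeWarning so the NaNs can be counted below.
    with np.errstate(invalid="ignore"):
        return np.sqrt(mean_sq - mean * mean)


def stable_std(feature):
    # np.std centres the data before squaring, so the variance cannot be negative.
    return np.std(feature, axis=0)


# Synthetic float32 features: 200 frames, 40 dimensions, large mean, tiny variance.
rng = np.random.default_rng(0)
feature = (1000.0 + 1e-3 * rng.standard_normal((200, 40))).astype(np.float32)

naive = naive_std(feature)
stable = stable_std(feature)

print("NaN dimensions (naive): ", int(np.isnan(naive).sum()))
print("NaN dimensions (np.std):", int(np.isnan(stable).sum()))
```

How many dimensions come out as NaN with the naive formula depends on the data and the dtype, but the `np.std` version should never produce one from finite input.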

Second, on line 12, the features are divided by the standard deviation, but nothing guarantees that it is non-zero. As a result, features whose variance is zero are turned into NaN.
This can be fixed by adding the line `spk_std += (spk_std == 0.)` before the division, which replaces the zeros with ones.
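
Putting both fixes together, the normalisation could look roughly like the sketch below. `cmvn_guarded` is a hypothetical standalone function written for this example, not the actual body of `feature_cmvn()`, and the input array is made up: its second dimension is constant, so its standard deviation is exactly zero.

```python
import numpy as np


def cmvn_guarded(feature):
    # Per-dimension mean and standard deviation over the frame axis.
    mean = np.mean(feature, axis=0)
    std = np.std(feature, axis=0)
    # Replace zero standard deviations with ones so the division is safe.
    std += (std == 0.0)
    return (feature - mean) / std


# Three frames, two dimensions; the second dimension never varies.
feature = np.array([[1.0, 0.0],
                    [3.0, 0.0],
                    [5.0, 0.0]], dtype=np.float32)

normalized = cmvn_guarded(feature)
print(np.isnan(normalized).any())
```

A constant dimension is mapped to zeros instead of NaN, and the other dimensions are normalised as before.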

Here is a file for which these problems occur, taken from the Mozilla CommonVoice dataset.
[FNH4QW-sample-0.wav.zip](https://github.com/xinjli/allosaurus/files/8917429/FNH4QW-sample-0.wav.zip)


