Improve alignment accuracy by normalizing audio features #625

Open
wants to merge 2 commits into
base: main

Conversation

IbrahimAmin1

Audio data should be pre-processed with the Wav2Vec2Processor (Wav2Vec2FeatureExtractor). I have noticed a considerable improvement in alignment (measured as mean absolute error) when the audio is normalized to zero mean and unit variance by the processor before the forward pass.
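For illustration, the normalization in question can be sketched with NumPy alone. This is a minimal sketch and not the code in this PR: the function name `zero_mean_unit_var` is hypothetical, the synthetic clip stands in for real audio, and the small epsilon is an assumption meant to mirror the guard the Hugging Face feature extractor uses when `do_normalize` is true.

```python
import numpy as np


def zero_mean_unit_var(waveform: np.ndarray, eps: float = 1e-7) -> np.ndarray:
    """Scale a mono waveform to zero mean and (approximately) unit variance."""
    waveform = waveform.astype(np.float32)
    # eps keeps the division stable for near-silent clips (assumed value).
    return (waveform - waveform.mean()) / np.sqrt(waveform.var() + eps)


# Synthetic one-second clip at 16 kHz with a DC offset and low gain.
rng = np.random.default_rng(0)
clip = 0.05 * rng.standard_normal(16000) + 0.3
normalized = zero_mean_unit_var(clip)
```

After this step the clip's offset and gain no longer depend on how it was recorded, which is presumably closer to what the fine-tuned model saw during training.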

Beyond that, each Hugging Face Wav2Vec2 feature extractor configuration should contain the same settings that were used when fine-tuning the model (e.g. normalization, attention_mask usage).

A typical Hugging Face Wav2Vec2 feature extractor config file looks like this:

{
  "do_normalize": true,
  "feature_size": 1,
  "padding_side": "right",
  "padding_value": 0.0,
  "return_attention_mask": true,
  "sampling_rate": 16000
}
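As a rough sketch of how these fields might be consumed, the snippet below parses the config above with the standard library and derives two decisions from it: whether to normalize, and whether to pass an attention mask. It also rejects audio whose sampling rate differs from the one the model expects. The helper `plan_preprocessing` is hypothetical and is not part of transformers or of this PR.

```python
import json

# The config shown above, inlined so the sketch is self-contained.
CONFIG_TEXT = """
{
  "do_normalize": true,
  "feature_size": 1,
  "padding_side": "right",
  "padding_value": 0.0,
  "return_attention_mask": true,
  "sampling_rate": 16000
}
"""


def plan_preprocessing(config: dict, audio_sampling_rate: int) -> dict:
    """Hypothetical helper: decide how to prepare audio for a given model config."""
    expected = config["sampling_rate"]
    if audio_sampling_rate != expected:
        raise ValueError(
            f"audio is {audio_sampling_rate} Hz but the model expects {expected} Hz"
        )
    return {
        "normalize": config.get("do_normalize", False),
        "pass_attention_mask": config.get("return_attention_mask", False),
    }


config = json.loads(CONFIG_TEXT)
plan = plan_preprocessing(config, audio_sampling_rate=16000)
```

In practice the feature extractor loaded with `from_pretrained` reads these settings itself; the point of the sketch is only that the values come from the model's own config rather than being hard-coded.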

To maintain backwards compatibility, I have let the user decide whether pre-processing is applied. I chose to enable pre-processing by default, since it improves alignment considerably.

@IbrahimAmin1 IbrahimAmin1 changed the title from "Improve alignment accuracy by normalizing audio features using Wav2Ve…" to "Improve alignment accuracy by normalizing audio features" on Dec 13, 2023
Fix a typo in the preprocess argument
@Barabazs
Collaborator

Barabazs commented Jan 1, 2025

Hi @IbrahimAmin1, thank you for the contribution.

Can you provide some examples comparing the results with and without the preprocessing?
Regarding the default behavior, I would suggest defaulting to False to keep the results consistent with previous versions. Maybe add an optional CLI flag to activate it?
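One way such an opt-in could look, sketched with argparse. The flag name `--preprocess` and the helper `build_parser` are assumptions for illustration only; the actual CLI spelling is not shown in this conversation.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Alignment options (illustrative sketch)")
    # store_true leaves the option False unless the flag is passed,
    # so existing invocations keep their previous behavior.
    parser.add_argument(
        "--preprocess",
        action="store_true",
        help="normalize audio with the model's feature extractor before alignment",
    )
    return parser


default_args = build_parser().parse_args([])
opted_in = build_parser().parse_args(["--preprocess"])
```

With this shape, results stay consistent with previous versions unless the user explicitly passes the flag.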

@Barabazs Barabazs added the "question" label (Further information is requested) on Jan 1, 2025