Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: when preprocessing, everyvoice forces equal length time and f… #421

Merged
merged 2 commits into from
May 13, 2024

Conversation

roedoejet
Copy link
Member

…eature reps

audio must be divisible by the declared hop size

the number of frames in the spectrogram must exactly equal the number of samples in the audio when multiplied by the hop size

PR Goal?

We had very tiny mismatches between length of the audio and Mel spectrograms when the number of samples in the audio was not evenly divisible by the hop size. This PR truncates the audio when preprocessing to be evenly divisible by the hop size.

This is implemented for both the input and (potentially) upsampled audio in the case of vocoder that applies super-resolution to input spectrograms.

This means that we effectively discard up to hop_size - 1 samples in the audio. For the default (22.05 kHz audio with 256 hop size) this means that we discard up to 255/22050 = 11.5ms of audio, which I think is OK :)

Fixes?

Hopefully makes it easier to some downstream models that depend on exact alignment between

Feedback sought?

Sanity checking

Priority?

low/medium

Tests added?

Added some unittests

How to test?

Just having a look at the code is sufficient. If you want, you could try preprocessing some data and ensure that the resulting audio is divisible by the hop size (256 by default). Similarly you should be able to read in the preprocessed spectral features and multiply the 2nd dimension by the hop size to obtain the exact number of samples in the audio

Confidence?

high

Version change?

If we weren't in pre-alpha yes.

Copy link
Contributor

github-actions bot commented May 6, 2024

CLI load time: 0:00.29
Pull Request HEAD: 854dae21d889f7c52902e425854011a896522ced
Imports that take more than 0.1 s:
import time: self [us] | cumulative | imported package

Copy link

codecov bot commented May 6, 2024

Codecov Report

Attention: Patch coverage is 78.94737% with 4 lines in your changes are missing coverage. Please review.

Project coverage is 73.49%. Comparing base (dd09b8a) to head (854dae2).
Report is 2 commits behind head on main.

Files Patch % Lines
everyvoice/preprocessor/preprocessor.py 78.94% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #421      +/-   ##
==========================================
+ Coverage   73.45%   73.49%   +0.03%     
==========================================
  Files          43       43              
  Lines        2837     2852      +15     
  Branches      467      468       +1     
==========================================
+ Hits         2084     2096      +12     
- Misses        668      671       +3     
  Partials       85       85              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@roedoejet roedoejet changed the title feat\!: when preprocessing, everyvoice forces equal length time and f… feat!: when preprocessing, everyvoice forces equal length time and f… May 6, 2024
…ature reps

audio must be divisible by the declared hop size

the number of frames in the spectrogram must exactly equal the number of samples in the audio when multiplied by the hop size
Copy link
Collaborator

@SamuelLarkin SamuelLarkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we put the calculation of the number of frame in a function to make the code DRY and to help ourselves in future if we have to change how we calculate the end of the wav file.

@@ -741,6 +759,14 @@ def process_text(
return (character_tokens, phone_tokens, pfs)

def process_spec(self, item):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should annotate the return type

@roedoejet roedoejet merged commit 110512c into main May 13, 2024
4 checks passed
@roedoejet roedoejet deleted the dev.ap/fix-audio-and-spec branch May 13, 2024 16:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants