feat!: when preprocessing, everyvoice forces equal length time and f… #421

roedoejet · 2024-05-06T21:21:36Z

…eature reps

audio must be divisible by the declared hop size

the number of frames in the spectrogram, multiplied by the hop size, must exactly equal the number of samples in the audio

PR Goal?

We had very tiny mismatches between the lengths of the audio and the Mel spectrograms when the number of samples in the audio was not evenly divisible by the hop size. This PR truncates the audio during preprocessing so that its length is evenly divisible by the hop size.

This is implemented for both the input audio and the (potentially) upsampled audio, in the case of a vocoder that applies super-resolution to input spectrograms.

This means that we effectively discard up to hop_size - 1 samples of audio. For the default (22.05 kHz audio with a hop size of 256), we discard up to 255/22050 ≈ 11.6 ms of audio, which I think is OK :)
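The truncation and the worst-case loss described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `truncate_to_hop_multiple` and `max_discarded_ms` are hypothetical names, and the audio is modeled as a plain Python sequence of samples rather than a tensor.

```python
from typing import Sequence


def truncate_to_hop_multiple(samples: Sequence[float], hop_size: int) -> Sequence[float]:
    """Drop trailing samples so that len(result) is a multiple of hop_size."""
    if hop_size <= 0:
        raise ValueError("hop_size must be positive")
    n_frames = len(samples) // hop_size  # whole frames that fit in the audio
    return samples[: n_frames * hop_size]


def max_discarded_ms(hop_size: int, sampling_rate: int) -> float:
    """Worst-case duration (in ms) removed by the truncation."""
    return (hop_size - 1) / sampling_rate * 1000


# Hypothetical clip: 86 full frames (22016 samples) plus 39 leftover samples.
audio = [0.0] * 22055
trimmed = truncate_to_hop_multiple(audio, hop_size=256)
```

Here `len(trimmed)` is 22016, so the 39 trailing samples are dropped and the clip covers exactly 86 frames.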

Fixes?

Hopefully makes things easier for some downstream models that depend on exact alignment between the audio and the spectral features.

Feedback sought?

Sanity checking

Priority?

low/medium

Tests added?

Added some unit tests.

How to test?

Just having a look at the code is sufficient. If you want, you could try preprocessing some data and check that the length of the resulting audio is divisible by the hop size (256 by default). Similarly, you should be able to read in the preprocessed spectral features and multiply the 2nd dimension by the hop size to obtain the exact number of samples in the audio.
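A check along those lines might look like the sketch below. It assumes the spectral features have shape (n_mels, n_frames), so that the frame count is the 2nd dimension as described above. `is_aligned` is a hypothetical helper, and loading the preprocessed files is left out because the on-disk format is not described here.

```python
from typing import Tuple


def is_aligned(num_samples: int, spec_shape: Tuple[int, int], hop_size: int = 256) -> bool:
    """Return True if the audio and spectrogram lengths match exactly.

    spec_shape is assumed to be (n_mels, n_frames).
    """
    n_frames = spec_shape[1]
    return num_samples % hop_size == 0 and n_frames * hop_size == num_samples


# 86 frames * 256 samples per frame = 22016 samples
print(is_aligned(22016, (80, 86)))  # True
print(is_aligned(22055, (80, 86)))  # False: 22055 is not a multiple of 256
```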

Confidence?

high

Version change?

If we weren't in pre-alpha, yes.

github-actions · 2024-05-06T21:26:23Z

CLI load time: 0:00.29
Pull Request HEAD: 854dae21d889f7c52902e425854011a896522ced
Imports that take more than 0.1 s:
import time: self [us] | cumulative | imported package

codecov · 2024-05-06T21:26:31Z

Codecov Report

Attention: Patch coverage is 78.94737%, with 4 lines in your changes missing coverage. Please review.

Project coverage is 73.49%. Comparing base (dd09b8a) to head (854dae2).
Report is 2 commits behind head on main.

Files	Patch %	Lines
everyvoice/preprocessor/preprocessor.py	78.94%	4 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #421      +/-   ##
==========================================
+ Coverage   73.45%   73.49%   +0.03%     
==========================================
  Files          43       43              
  Lines        2837     2852      +15     
  Branches      467      468       +1     
==========================================
+ Hits         2084     2096      +12     
- Misses        668      671       +3     
  Partials       85       85



SamuelLarkin

Should we put the calculation of the number of frames in a function, to keep the code DRY and to help ourselves in the future if we have to change how we calculate the end of the wav file?
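One way to read this suggestion is a single helper that both the input and the upsampled audio paths call, so that the rule for where the audio ends lives in one place. The sketch below is hypothetical: `get_frame_count` and `trim_point` are names made up here, not functions from the EveryVoice codebase, and the hop size of 512 for the upsampled audio is only an example value.

```python
def get_frame_count(num_samples: int, hop_size: int) -> int:
    """Number of whole spectrogram frames covered by num_samples."""
    return num_samples // hop_size


def trim_point(num_samples: int, hop_size: int) -> int:
    """Index at which to cut the audio so it spans a whole number of frames."""
    return get_frame_count(num_samples, hop_size) * hop_size


# Both call sites share the same rule (hypothetical values).
input_end = trim_point(22055, hop_size=256)
upsampled_end = trim_point(44110, hop_size=512)
```

If the definition of the end of the wav file changes later, only `get_frame_count` would need to be updated.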

SamuelLarkin · 2024-05-09T18:32:39Z

everyvoice/preprocessor/preprocessor.py

@@ -741,6 +759,14 @@ def process_text(
            return (character_tokens, phone_tokens, pfs)

    def process_spec(self, item):


We should annotate the return type

roedoejet force-pushed the dev.ap/fix-audio-and-spec branch from f756ce9 to 145b98e Compare May 6, 2024 21:23

roedoejet requested review from MENGZHEGENG and SamuelLarkin May 6, 2024 21:30

roedoejet changed the title ~~feat\!: when preprocessing, everyvoice forces equal length time and f…~~ feat!: when preprocessing, everyvoice forces equal length time and f… May 6, 2024

feat!: when preprocessing, everyvoice forces equal length time and fe…

a96307d

…ature reps audio must be divisible by the declared hop size the number of frames in the spectrogram must exactly equal the number of samples in the audio when multiplied by the hop size

roedoejet force-pushed the dev.ap/fix-audio-and-spec branch from 145b98e to a96307d Compare May 6, 2024 21:30

SamuelLarkin approved these changes May 9, 2024


fix: add function signature for review

854dae2

roedoejet merged commit 110512c into main May 13, 2024
4 checks passed

roedoejet deleted the dev.ap/fix-audio-and-spec branch May 13, 2024 16:52
