
consolidate spectrogram dimensions #572

Open · wants to merge 1 commit into main
Conversation

roedoejet
Member

Previously, there was an issue where:

  • During preprocessing we saved spectrogram tensors as [K, T]
  • During text-to-spec synthesis we saved spectrogram tensors as [T, K]
  • During spec-to-wav synthesis we expected spectrogram tensors as [T, K]

#513 noticed this problem when synthesizing from certain freq-oriented tensors. Our models work with time-oriented tensors, but it's more standard to have frequency/Mel-band-oriented tensors when saving spectrograms (this is the default in torchaudio, librosa, etc.). Since the output files should be as interoperable as possible, I've consolidated our read/write operations to use the [K, T] orientation throughout (i.e., changing text-to-spec synthesis to output [K, T] tensors and expecting [K, T] tensors during spec-to-wav synthesis).
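For illustration, this is the orientation torchaudio itself produces; a minimal sketch, not the EveryVoice preprocessing code (the waveform and transform settings below are placeholders):

```python
import torch
import torchaudio

# Placeholder waveform and settings, not EveryVoice's actual config values.
sr = 22050
wav = torch.rand(1, sr) * 2 - 1                      # [channels, samples]

mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=sr, n_fft=1024, hop_length=256, n_mels=80
)(wav)                                               # [channels, n_mels (K), frames (T)]

spec = mel.squeeze(0)                                # [K, T]: the orientation now used everywhere
assert spec.size(0) == 80                            # Mel bands first, time second
```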

I also moved an annoying log message that said "Loading Vocoder from None". And I switched the wav-file writing from scipy to torchaudio, since I started getting bit-depth errors with spec-to-wav synthesis.
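For reference, a minimal sketch of the two approaches (the audio, output path, sample rate, and encoding settings are placeholders, not the values used in EveryVoice):

```python
import torch
import torchaudio

# Placeholder audio, path, and sample rate.
sr = 22050
wav = torch.rand(1, sr) * 2 - 1                      # float32 in [-1, 1], [channels, samples]

# scipy infers the WAV encoding from the numpy dtype, so dtype mix-ups
# show up as bit-depth problems:
#   from scipy.io import wavfile
#   wavfile.write("out.wav", sr, wav.squeeze(0).numpy())

# torchaudio takes the float tensor directly and writes 16-bit PCM here:
torchaudio.save("out.wav", wav, sr, encoding="PCM_S", bits_per_sample=16)
```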

PR Goal?

Ideally this should just work going forward. You should be able to:

  1. synthesize text to wav
  2. synthesize text to spec, then synthesize spec to wav from the generated spec
  3. synthesize spec to wav from a preprocessed spec file
  4. synthesize from a spec file generated prior to this PR using the --time-oriented flag (see the conversion sketch after this list)
  5. synthesize audio and spectrograms during training
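
Regarding item 4: if you have old spec files lying around and don't want to use the flag, a one-off conversion looks roughly like this (a sketch; the filenames and N_MELS value are placeholders):

```python
import torch

N_MELS = 80                                   # placeholder: use your config's n_mels

spec = torch.load("old_spec.pt")              # saved as [T, K] before this PR
if spec.size(-1) == N_MELS and spec.size(0) != N_MELS:
    spec = spec.transpose(0, 1).contiguous()  # now [K, T]
torch.save(spec, "new_spec.pt")
```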

Fixes?

#513

Feedback sought?

Sanity. I've tested expectations 1-5 above, but please try one or more of them to corroborate, and leave a comment noting which ones you tested.

Priority?

medium-high (synthesizing from non-time-oriented spectrograms causes an error right now)

Tests added?

How to test?

Try doing some of the things described in the PR Goal section.

Confidence?

medium

Version change?

This is a breaking change but we'll just include it in alpha.

Related PRs?

EveryVoiceTTS/FastSpeech2_lightning#94
EveryVoiceTTS/HiFiGAN_iSTFT_lightning#39


semanticdiff-com bot commented Oct 30, 2024

Review changes with SemanticDiff

Changed Files
File Status
  everyvoice/model/feature_prediction/FastSpeech2_lightning  0% smaller
  everyvoice/model/vocoder/HiFiGAN_iSTFT_lightning  0% smaller

Contributor

github-actions bot commented Oct 30, 2024

CLI load time: 0:00.28
Pull Request HEAD: d16bc5b0b3e1d45d4e2585d76427d08cb6b8b059
Imports that take more than 0.1 s:
import time: self [us] | cumulative | imported package


codecov bot commented Oct 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 75.24%. Comparing base (88746a8) to head (d16bc5b).
Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #572      +/-   ##
==========================================
- Coverage   76.12%   75.24%   -0.89%     
==========================================
  Files          46       46              
  Lines        3393     3490      +97     
  Branches      461      481      +20     
==========================================
+ Hits         2583     2626      +43     
- Misses        707      760      +53     
- Partials      103      104       +1     


@marctessier
Collaborator

@roedoejet I think we might have an issue with this.

For my first test, I trained a vocoder (no issues there). Then I tried to use that vocoder for training the FP model, but it keeps crashing with the message below (see the attachment for the full error log; vocoder_path: ../logs_and_checkpoints/VocoderExperiment/base/checkpoints/voc.ckpt).
LJ-FP.e3162427.txt

I will try the other things you listed to see how they behave.

│   1524 │   │   if not (self._backward_hooks or self._backward_pre_hooks or s │
│   1525 │   │   │   │   or _global_backward_pre_hooks or _global_backward_hoo │
│   1526 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks │
│ ❱ 1527 │   │   │   return forward_call(*args, **kwargs)                      │
│   1528 │   │                                                                 │
│   1529 │   │   try:                                                          │
│   1530 │   │   │   result = None                                             │
│                                                                              │
│ /gpfs/fs5/nrc/nrc-fs1/ict/others/u/tes001/miniforge3/envs/EveryVoice_dev.ap_ │
│ 513/lib/python3.10/site-packages/torch/nn/modules/conv.py:310 in forward     │
│                                                                              │
│    307 │   │   │   │   │   │   self.padding, self.dilation, self.groups)     │
│    308 │                                                                     │
│    309 │   def forward(self, input: Tensor) -> Tensor:                       │
│ ❱  310 │   │   return self._conv_forward(input, self.weight, self.bias)      │
│    311                                                                       │
│    312                                                                       │
│    313 class Conv2d(_ConvNd):                                                │
│                                                                              │
│ /gpfs/fs5/nrc/nrc-fs1/ict/others/u/tes001/miniforge3/envs/EveryVoice_dev.ap_ │
│ 513/lib/python3.10/site-packages/torch/nn/modules/conv.py:306 in             │
│ _conv_forward                                                                │
│                                                                              │
│    303 │   │   │   return F.conv1d(F.pad(input, self._reversed_padding_repea │
│    304 │   │   │   │   │   │   │   weight, bias, self.stride,                │
│    305 │   │   │   │   │   │   │   _single(0), self.dilation, self.groups)   │
│ ❱  306 │   │   return F.conv1d(input, weight, bias, self.stride,             │
│    307 │   │   │   │   │   │   self.padding, self.dilation, self.groups)     │
│    308 │                                                                     │
│    309 │   def forward(self, input: Tensor) -> Tensor:                       │
╰──────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Given groups=1, weight of size [512, 80, 7], expected input[1, 
696, 80] to have 80 channels, but got 696 channels instead

Loading EveryVoice modules: 100%|██████████| 4/4 [00:13<00:00,  3.35s/it]   
srun: error: ib14gpu-002: task 0: Exited with exit code 1

@marctessier
Collaborator

FYI, I also get the same issue when using hifigan_universal_v1_everyvoice.ckpt.

I managed to get FP training to work by removing the vocoder reference (vocoder_path:) from config/everyvoice-text-to-spec.yaml.

@roedoejet
Member Author

roedoejet commented Oct 31, 2024

@marctessier - did you maybe not re-run preprocessing? The previously calculated mel spectrograms have to be re-processed (everyvoice preprocess config/everyvoice-text-to-spec.yaml -s spec -O).

EDIT: never mind! I see what you mean. Training worked for me, but when I added a vocoder checkpoint it failed during the validation step - nice catch! I fixed this and it should now be ready to go.
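
For anyone else hitting this: the traceback is the usual transposed-input shape mismatch. torch.nn.Conv1d expects [batch, channels, time], so feeding a [batch, time, n_mels] spectrogram produces exactly the error in the log above. A minimal repro, independent of EveryVoice, using the shapes from that log:

```python
import torch

# Shapes taken from the log above: K = 80 mel bands, T = 696 frames.
conv = torch.nn.Conv1d(in_channels=80, out_channels=512, kernel_size=7)

good = torch.randn(1, 80, 696)   # [batch, K, T] -- what the model expects
bad = good.transpose(1, 2)       # [batch, T, K] -- the transposed layout

print(conv(good).shape)          # torch.Size([1, 512, 690])

try:
    conv(bad)
except RuntimeError as e:
    print(e)                     # "... expected input[1, 696, 80] to have 80 channels, but got 696 channels instead"
```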

@joanise
Member

joanise commented Nov 1, 2024

Since this is a breaking change, and it's possible some users will have preprocessed files saved, I'd like to see some heuristic tests that give a friendly error message if the input looks transposed, with instructions telling the user what to rerun.
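
Something along these lines, as a sketch (the function name and message wording are hypothetical, not a proposed final implementation):

```python
import torch


def check_spec_orientation(spec: torch.Tensor, n_mels: int) -> torch.Tensor:
    """Hypothetical guard: accept [K, T]; give a friendly hint if the file looks like the old [T, K] layout."""
    if spec.size(0) == n_mels:
        return spec
    if spec.size(-1) == n_mels:
        raise ValueError(
            "This spectrogram looks time-oriented ([T, K]); it was probably saved "
            "before the orientation change. Re-run `everyvoice preprocess ... -s spec -O` "
            "or pass --time-oriented when synthesizing."
        )
    raise ValueError(
        f"Expected a [{n_mels}, T] spectrogram but got shape {tuple(spec.shape)}."
    )
```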
