Noise at the end of produced wave file #93

iarspider · 2021-06-18T11:20:58Z


🐛 Bug

Noise at the end of the generated WAV files.

To Reproduce

Steps to reproduce the behavior:

1. Install torch and numpy.
2. Run the example from the Habrahabr article:

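import os
import torch

device = torch.device('cpu')
torch.set_num_threads(4)
local_file = 'model.pt'

# Download the v2_kseniya Russian TTS model once
if not os.path.isfile(local_file):
    torch.hub.download_url_to_file('https://models.silero.ai/models/tts/ru/v2_kseniya.pt',
                                   local_file)

model = torch.package.PackageImporter(local_file).load_pickle("tts_models", "model")
model.to(device)

# Three Russian phrases synthesized as one batch; "+" marks the stressed vowel.
# In English: "In the depths of the tundra, otters in gaiters stash cedar nut kernels into buckets.",
#             "Cats are a liquid!", "Mom washed Mila with soap."
example_batch = ['В недрах тундры выдры в г+етрах т+ырят в вёдра ядра кедров.',
                 'Котики - это жидкость!',
                 'М+ама М+илу м+ыла с м+ылом.']
sample_rate = 16000

# Writes one WAV file per phrase and returns the paths
audio_paths = model.save_wav(texts=example_batch,
                             sample_rate=sample_rate)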

The second and third audio files are 12 seconds long (instead of about 1 second) and are "padded" with noise.
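To confirm the symptom, each file's duration can be checked with Python's standard `wave` module. A minimal sketch; the helper names `wav_duration_seconds` and `report_durations` are hypothetical, and it assumes `audio_paths` from the example is a list of paths to the written WAV files, as its name suggests.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds (frame count / frame rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def report_durations(paths):
    """Print and return the duration of every file, to spot abnormally long outputs."""
    durations = [wav_duration_seconds(p) for p in paths]
    for path, seconds in zip(paths, durations):
        print(f"{path}: {seconds:.2f} s")
    return durations
```

Calling `report_durations(audio_paths)` after the example should make the problem obvious: the second and third entries come out at about 12 s.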

Expected behavior

No noise padding at the end of the files.

Environment


Collecting environment information...
PyTorch version: 1.9.0+cpu
Is debug build: False
CUDA used to build PyTorch: Could not collect
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Pro
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.8 (64-bit runtime)
Python platform: Windows-10-10.0.18362-SP0
Is CUDA available: False
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3080
Nvidia driver version: 466.77
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.20.3
[pip3] torch==1.9.0
[conda] Could not collect

Additional context

Thanks a lot for creating this!

Answered by snakers4

snakers4 (Maintainer) · 2021-06-18T11:27:44Z

This is a known Tacotron bug; it is caused by batching.

Using a batch size of 1 fixes this (i.e., feed just one phrase at a time).
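The batch-size-1 workaround amounts to synthesizing each phrase in its own call. A minimal sketch, assuming a hypothetical `synthesize(texts)` callable that takes a list of phrases and returns one result (a path or a waveform) per phrase; in the reproduction script, `lambda texts: model.save_wav(texts=texts, sample_rate=sample_rate)` might fill that role, but check that successive calls do not overwrite each other's output files.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def synthesize_one_by_one(synthesize: Callable[[List[str]], Sequence[T]],
                          texts: Sequence[str]) -> List[T]:
    """Call `synthesize` once per text, each time with a single-phrase batch.

    `synthesize` is a hypothetical callable: list of phrases in, one result per phrase out.
    """
    results: List[T] = []
    for text in texts:
        batch_result = synthesize([text])  # batch of size 1: the phrase is never batched with others
        if len(batch_result) != 1:
            raise ValueError(f"expected 1 result per call, got {len(batch_result)}")
        results.append(batch_result[0])
    return results
```

This trades throughput for correctness, which matters less on CPU, where batching gains are typically smaller.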

There are basically two ways to fix this more properly:

1. Downstream, with a VAD (please use silero-vad or WebRTC VAD).
2. By looking at the Tacotron alignment map (we have not implemented this yet).
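The alignment-map option is not implemented in the model, so the following is only a rough illustration of the idea, not Silero's method. It assumes one has the attention alignment as a matrix with one row per decoder frame and one column per input token, and cuts the output a few frames after the first frame whose attention peak reaches the last input token. The names `end_frame_from_alignment` and `trim_samples`, the `hop_length` of 256 samples per frame, and the `tail_frames` margin are all hypothetical.

```python
from typing import Sequence


def end_frame_from_alignment(alignment: Sequence[Sequence[float]],
                             tail_frames: int = 5) -> int:
    """Return how many decoder frames to keep, given an attention alignment.

    alignment[t][j] is the (hypothetical) attention weight of decoder frame t on input
    token j. The first frame whose strongest attention falls on the last input token is
    taken as the end of speech; `tail_frames` extra frames are kept so the last sound
    is not clipped.
    """
    if not alignment:
        return 0
    last_token = len(alignment[0]) - 1
    for t, row in enumerate(alignment):
        peak = max(range(len(row)), key=row.__getitem__)  # index of the largest weight
        if peak == last_token:
            return min(t + 1 + tail_frames, len(alignment))
    return len(alignment)  # attention never reached the end: keep everything


def trim_samples(samples: Sequence[float], alignment: Sequence[Sequence[float]],
                 hop_length: int = 256, tail_frames: int = 5) -> Sequence[float]:
    """Cut the waveform at the end frame found in the alignment (frames * hop_length)."""
    keep = end_frame_from_alignment(alignment, tail_frames) * hop_length
    return samples[:keep]
```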


snakers4 (Maintainer) · 2021-06-18T11:28:17Z

In the future, though, we will stop using Tacotron, so for now simply using batch_size = 1 or applying a VAD is a good temporary solution (especially when running on CPU).
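The VAD route can be sketched as keeping audio only up to the last frame a VAD flags as speech. A minimal sketch under stated assumptions: `trim_trailing_non_speech` and `energy_is_speech` are hypothetical names, and the default energy threshold is only a crude stand-in that works if the trailing noise is quieter than the speech. For real use, pass an `is_speech` callback backed by silero-vad or WebRTC VAD; with the `webrtcvad` package, for instance, that would wrap `webrtcvad.Vad(mode).is_speech(frame_bytes, sample_rate)`, which expects 10, 20 or 30 ms frames of 16-bit mono PCM (convertible with `array.array('h', frame).tobytes()`).

```python
from typing import Callable, List, Sequence

Frame = Sequence[int]  # 16-bit PCM samples of one frame


def energy_is_speech(frame: Frame, threshold: float = 500.0) -> bool:
    """Crude placeholder VAD: mean absolute amplitude above a fixed threshold."""
    return sum(abs(s) for s in frame) / max(len(frame), 1) > threshold


def trim_trailing_non_speech(samples: Sequence[int], sample_rate: int = 16000,
                             frame_ms: int = 30, pad_ms: int = 100,
                             is_speech: Callable[[Frame], bool] = energy_is_speech) -> List[int]:
    """Drop everything after the last full frame that `is_speech` accepts.

    `pad_ms` of audio is kept after that frame so the final sound is not clipped.
    Returns an empty list if no frame is classified as speech.
    """
    frame_len = sample_rate * frame_ms // 1000
    last_speech_end = 0
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        if is_speech(samples[start:start + frame_len]):
            last_speech_end = start + frame_len
    if last_speech_end == 0:
        return []
    keep = min(last_speech_end + sample_rate * pad_ms // 1000, len(samples))
    return list(samples[:keep])
```

Keeping only the *trailing* cut (rather than dropping every non-speech frame) preserves natural pauses inside the phrase.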
