Training error #74

Closed
Eternity231 opened this issue Aug 29, 2022 · 6 comments

@Eternity231
I just ran train.py and got this error:
INFO:baker_base:{'train': {'log_interval': 200, 'eval_interval': 10000, 'seed': 1234, 'epochs': 20000, 'learning_rate': 0.0002, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 16, 'fp16_run': True, 'lr_decay': 0.999875, 'segment_size': 8192, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0}, 'data': {'training_files': 'filelists/baker_train.txt', 'validation_files': 'filelists/baker_valid.txt', 'max_wav_value': 32768.0, 'sampling_rate': 16000, 'filter_length': 1024, 'hop_length': 256, 'win_length': 1024, 'n_mel_channels': 80, 'mel_fmin': 0.0, 'mel_fmax': None, 'add_blank': True, 'n_speakers': 0}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0.1, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [8, 8, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 4, 4], 'n_layers_q': 3, 'use_spectral_norm': False}, 'model_dir': './logs\baker_base'}
WARNING:baker_base:E:\vits\ is not a git repository, therefore hash value comparison will be ignored.
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
Traceback (most recent call last):
File "train.py", line 294, in
main()
File "train.py", line 50, in main
mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
File "C:\Python38\lib\site-packages\torch\multiprocessing\spawn.py", line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "C:\Python38\lib\site-packages\torch\multiprocessing\spawn.py", line 198, in start_processes
while not context.join():
File "C:\Python38\lib\site-packages\torch\multiprocessing\spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "C:\Python38\lib\site-packages\torch\multiprocessing\spawn.py", line 69, in _wrap
fn(i, *args)
File "E:\vits\train.py", line 119, in run
train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
File "E:\vits\train.py", line 139, in train_and_evaluate
for batch_idx, (x, x_lengths, spec, spec_lengths, y, y_lengths) in enumerate(train_loader):
File "C:\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 530, in next
data = self._next_data()
File "C:\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 1224, in _next_data
return self._process_data(data)
File "C:\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 1250, in _process_data
data.reraise()
File "C:\Python38\lib\site-packages\torch_utils.py", line 457, in reraise
raise exception
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "C:\Python38\lib\site-packages\torch\utils\data_utils\worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "C:\Python38\lib\site-packages\torch\utils\data_utils\fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "C:\Python38\lib\site-packages\torch\utils\data_utils\fetch.py", line 49, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "E:\vits\data_utils.py", line 90, in getitem
return self.get_audio_text_pair(self.audiopaths_and_text[index])
File "E:\vits\data_utils.py", line 61, in get_audio_text_pair
spec, wav = self.get_audio(audiopath)
File "E:\vits\data_utils.py", line 67, in get_audio
raise ValueError("{} {} SR doesn't match target {} SR".format(
IndexError: Replacement index 2 out of range for positional args tuple
Can anyone help me?

@lexkoro

lexkoro commented Sep 4, 2022

You have different sample rates in your audio files.

There is also a bug in the code at https://github.com/jaywalnut310/vits/blob/main/data_utils.py#L68

Remove the first `{}` in `raise ValueError("{} {} SR doesn't match target {} SR".format(sampling_rate, self.sampling_rate))` — the format string has three placeholders but only two arguments, which is why you get the IndexError instead of the intended error message.
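For clarity, here is the intent of that check as a standalone sketch (`check_sampling_rate` is a made-up helper for illustration only; in the repo the check lives inside `get_audio` in data_utils.py):

```python
# Sketch of the corrected sample-rate check from data_utils.py (get_audio).
# The original format string has three {} placeholders but only two
# arguments, so str.format() itself raises
# "IndexError: Replacement index 2 out of range for positional args tuple"
# before the intended ValueError message can be built.
def check_sampling_rate(sampling_rate, target_sampling_rate):
    if sampling_rate != target_sampling_rate:
        raise ValueError("{} SR doesn't match target {} SR".format(
            sampling_rate, target_sampling_rate))

check_sampling_rate(48000, 16000)  # -> ValueError: 48000 SR doesn't match target 16000 SR
```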

@Eternity231
Author

I removed it and got this:
INFO:baker_base:{'train': {'log_interval': 200, 'eval_interval': 10000, 'seed': 1234, 'epochs': 20000, 'learning_rate': 0.0002, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 16, 'fp16_run': True, 'lr_decay': 0.999875, 'segment_size': 8192, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0}, 'data': {'training_files': 'filelists/baker_train.txt', 'validation_files': 'filelists/baker_valid.txt', 'max_wav_value': 32768.0, 'sampling_rate': 16000, 'filter_length': 1024, 'hop_length': 256, 'win_length': 1024, 'n_mel_channels': 80, 'mel_fmin': 0.0, 'mel_fmax': None, 'add_blank': True, 'n_speakers': 0}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0.1, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [8, 8, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 4, 4], 'n_layers_q': 3, 'use_spectral_norm': False}, 'model_dir': './logs\baker_base'}
WARNING:baker_base:E:\vits\ is not a git repository, therefore hash value comparison will be ignored.
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
Traceback (most recent call last):
File "train.py", line 294, in
main()
File "train.py", line 50, in main
mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
File "C:\Python38\lib\site-packages\torch\multiprocessing\spawn.py", line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "C:\Python38\lib\site-packages\torch\multiprocessing\spawn.py", line 198, in start_processes
while not context.join():
File "C:\Python38\lib\site-packages\torch\multiprocessing\spawn.py", line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "C:\Python38\lib\site-packages\torch\multiprocessing\spawn.py", line 69, in _wrap
fn(i, *args)
File "E:\vits\train.py", line 119, in run
train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
File "E:\vits\train.py", line 139, in train_and_evaluate
for batch_idx, (x, x_lengths, spec, spec_lengths, y, y_lengths) in enumerate(train_loader):
File "C:\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 530, in next
data = self._next_data()
File "C:\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 1224, in _next_data
return self._process_data(data)
File "C:\Python38\lib\site-packages\torch\utils\data\dataloader.py", line 1250, in _process_data
data.reraise()
File "C:\Python38\lib\site-packages\torch_utils.py", line 457, in reraise
raise exception
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "C:\Python38\lib\site-packages\torch\utils\data_utils\worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "C:\Python38\lib\site-packages\torch\utils\data_utils\fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "C:\Python38\lib\site-packages\torch\utils\data_utils\fetch.py", line 49, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "E:\vits\data_utils.py", line 90, in getitem
return self.get_audio_text_pair(self.audiopaths_and_text[index])
File "E:\vits\data_utils.py", line 61, in get_audio_text_pair
spec, wav = self.get_audio(audiopath)
File "E:\vits\data_utils.py", line 67, in get_audio
raise ValueError(" {} SR doesn't match target {} SR".format(
ValueError: 48000 SR doesn't match target 16000 SR

@lexkoro

lexkoro commented Sep 4, 2022

ValueError: 48000 SR doesn't match target 16000 SR

You have a sample-rate mismatch: your audio files are 48000 Hz, but the config (data.sampling_rate) expects 16000 Hz. Resample the files to 16 kHz, or change the config to match your data.
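If you go the resampling route, a minimal sketch (assuming librosa and soundfile are installed; the wavs/ and wavs_16k/ directories are placeholders for your own dataset layout):

```python
# Resample every wav in a directory to the 16 kHz expected by the config.
# Assumes: pip install librosa soundfile
import os
import librosa
import soundfile as sf

SRC_DIR = "wavs"        # placeholder: original 48 kHz files
DST_DIR = "wavs_16k"    # placeholder: resampled output
TARGET_SR = 16000       # must match data.sampling_rate in the config

os.makedirs(DST_DIR, exist_ok=True)
for name in os.listdir(SRC_DIR):
    if not name.lower().endswith(".wav"):
        continue
    # librosa resamples on load when sr is given explicitly
    audio, _ = librosa.load(os.path.join(SRC_DIR, name), sr=TARGET_SR)
    sf.write(os.path.join(DST_DIR, name), audio, TARGET_SR)
```

Remember to point the filelists at the resampled wavs afterwards.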

@UltimateAmitieKaiNiC

CUDA error: device-side assert triggered

@Eternity231
Copy link
Author

Eternity231 commented Sep 5, 2022

-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "C:\Python38\lib\site-packages\torch\multiprocessing\spawn.py", line 69, in wrap
fn(i, *args)
File "E:\vits\train.py", line 119, in run
train_and_evaluate(rank, epoch, hps, [net_g, net_d], [optim_g, optim_d], [scheduler_g, scheduler_d], scaler, [train_loader, eval_loader], logger, [writer, writer_eval])
File "E:\vits\train.py", line 192, in train_and_evaluate
scaler.scale(loss_gen_all).backward()
File "C:\Python38\lib\site-packages\torch_tensor.py", line 363, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "C:\Python38\lib\site-packages\torch\autograd_init
.py", line 173, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: view_as_complex is only supported for float and double tensors, but got a tensor of scalar type: Half
I use torch 1.11.0. Is the torch version causing this error?

@lexkoro

lexkoro commented Sep 5, 2022

Yes, try this fix: #34

Or downgrade PyTorch.
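If you'd rather patch it locally instead of downgrading, the usual culprit for this error is the STFT in mel_processing.py receiving half-precision audio under autocast when fp16_run is true. Below is a rough sketch of two workarounds; it is not necessarily what #34 does, and the simplified spectrogram function is an illustration, not the repo's exact code:

```python
# Workaround 1: disable mixed precision in the training config, e.g.
#   "train": { ..., "fp16_run": false, ... }
# This avoids the half-precision STFT path entirely.

# Workaround 2 (sketch): force the spectrogram computation to float32.
# The signature loosely mirrors spectrogram_torch in mel_processing.py
# (sampling_rate is kept only to mirror that signature; it is unused here).
import torch

def spectrogram_torch_fp32_safe(y, n_fft, sampling_rate, hop_size, win_size, center=False):
    # Cast half-precision audio back to float32 so the complex STFT
    # (and its backward pass via view_as_complex) gets a supported dtype.
    y = y.float()
    window = torch.hann_window(win_size, device=y.device, dtype=y.dtype)
    spec = torch.stft(y, n_fft, hop_length=hop_size, win_length=win_size,
                      window=window, center=center, return_complex=True)
    return spec.abs()  # magnitude spectrogram
```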
