
Training ASR. #1

Open
crypticsymmetry opened this issue Mar 21, 2023 · 1 comment
I am trying to train this repo on the LibriTTS dataset, starting with the ASR training.
Question 1: is the data format still "path|transcription|speaker#"? I also see that your config now uses CSVs; do I have to convert to CSV as well?
Question 2: does this training log look correct? The text it prints doesn't make any sense.

I changed the code to use a single string for train_data and val_data instead of a list.
config.yml:

batch_size: 64
pretrained_model: ""
train_data: "Data/ASR_Train_data_test_kaggle.txt"
val_data: "Data/ASR_Val_data_kaggle.txt"
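Since the loader originally expected a list of paths, a small normalizer lets either a single string or a list work without further code changes (a sketch; `as_path_list` is a hypothetical helper, not part of the repo):

```python
def as_path_list(value):
    """Accept either a single path string or a list of paths,
    always returning a list so downstream code is unchanged."""
    if isinstance(value, str):
        return [value]
    return list(value)

# as_path_list("Data/ASR_Train_data_test_kaggle.txt") and
# as_path_list(["a.txt", "b.txt"]) both yield a plain list.
```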

meldataset.py:

# ...
class MelDataset(torch.utils.data.Dataset):
    def __init__(self, data_list, dict_path=DEFAULT_DICT_PATH, sr=22050):
        # each line: "path|transcription|speaker"
        _data_list = [l.rstrip("\n").split("|") for l in data_list]
        self.min_seq_len = int(0.6 * 22050)   # 0.6 s at 22.05 kHz
        self.max_seq_len = int(10.0 * 22050)  # 10 s at 22.05 kHz
        self.text_cleaner = TextCleaner(dict_path)
        self.sr = sr

        self.data_list = self._filter(_data_list)

    def _filter(self, data):
        # keep clips between the min and max length (file size // 2
        # approximates the sample count for 16-bit mono PCM) whose
        # transcription is longer than 5 characters
        data_list = [
            (item[0], item[1], item[2])
            for item in data
            if (
                self.max_seq_len
                > (Path(item[0]).stat().st_size // 2)
                > self.min_seq_len
                and len(item[1]) > 5
            )
        ]
        print("data_list length: ", len(data))
        print("filtered data_list length: ", len(data_list))
        return data_list

    def __len__(self):
        return len(self.data_list)
    # ...
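The filter above estimates clip length as `st_size // 2`, which assumes 16-bit mono PCM and also counts the WAV header bytes. Reading the frame count from the header is a more reliable sketch (standard-library `wave`; a drop-in alternative, not the repo's code):

```python
import wave

def num_samples(wav_path):
    """Return the true sample (frame) count from the WAV header,
    instead of approximating it as file_size // 2, which assumes
    16-bit mono PCM and ignores the ~44-byte header."""
    with wave.open(wav_path, "rb") as w:
        return w.getnframes()
```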

utils.py:

# ...
def get_data_path_list(train_path=None, val_path=None):
    train_list = []
    val_list = []
    if train_path:
        with open(train_path, "r") as f:
            train_list.extend(f.readlines())

    if val_path:
        with open(val_path, "r") as f:
            val_list.extend(f.readlines())

    return train_list, val_list
# ...

Example dataset format ("train_data_test.txt"):

LibriTTS/train-clean-100/1088/129236/1088_129236_000019_000008.wav|The lover sees no resemblance except to summer evenings and diamond mornings, to rainbows and the song of birds.|6
LibriTTS/train-clean-100/1088/129236/1088_129236_000020_000003.wav|It is destroyed for the imagination by any attempt to refer it to organization.|6
LibriTTS/train-clean-100/1098/133695/1098_133695_000012_000001.wav|He thought a great deal about her; she was constantly present to his mind. At a time when his thoughts had been a good deal of a burden to him her sudden arrival, which promised nothing and was an open handed gift of fate, had refreshed and quickened them, given them wings and something to fly for.|9
...
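Before training, a quick sanity check of the metadata file can catch malformed lines early (a sketch; `check_metadata` is a hypothetical helper assuming the path|transcription|speaker layout shown above):

```python
from pathlib import Path

def check_metadata(path):
    """Return the 1-based numbers of metadata lines that do not split
    into path|transcription|speaker, have an empty transcription,
    or point to a missing audio file."""
    bad = []
    for i, line in enumerate(Path(path).read_text().splitlines(), start=1):
        parts = line.split("|")
        if len(parts) != 3 or not parts[1].strip():
            bad.append(i)
        elif not Path(parts[0]).exists():
            bad.append(i)
    return bad
```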

Training log:

data_list length:  29493
filtered data_list length:  23397
speaker_samples_weight tensor([0.0027, 0.0027, 0.0027,  ..., 0.0019, 0.0019, 0.0180])
/opt/conda/envs/py39/lib/python3.9/site-packages/torch/utils/data/dataloader.py:554: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
data_list length:  2276
filtered data_list length:  1680
/opt/conda/envs/py39/lib/python3.9/site-packages/torch/utils/data/dataloader.py:554: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
  0%|                                                   | 0/365 [00:00<?, ?it/s]/opt/conda/envs/py39/lib/python3.9/site-packages/torch/utils/data/dataloader.py:554: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
A
"
W
R
"
W
O
[... several thousand single-character lines omitted; the log prints one unknown character per line ...]
[Loss: 15.0932, LR: 0.00050]:   0%|           | 1/365 [00:22<2:13:41, 22.04s/it]
[... more single-character lines omitted ...]
[Loss: 13.3964, LR: 0.00050]:   1%|           | 2/365 [00:23<1:00:05,  9.93s/it]
[... more single-character lines omitted ...]
[Loss: 9.3134, LR: 0.00050]:   1%|              | 3/365 [00:24<36:07,  5.99s/it]
[... more single-character lines omitted ...]
[Loss: 8.1618, LR: 0.00050]:   1%|▏             | 4/365 [00:26<26:49,  4.46s/it]
...
lexkoro (Owner) commented Mar 22, 2023

Hi,

  1. It doesn't matter what file type it is.

  2. You would have to add the missing characters/phonemes used in your dataset to this file: word_index_dict_new.txt,
     or to your own file, and reference it in the code:

    DEFAULT_DICT_PATH = osp.join(osp.dirname(__file__), "word_index_dict_new.txt")
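To see up front which symbols your dataset needs but the dictionary lacks, a quick scan helps (a sketch; parsing `word_index_dict_new.txt` into a set of known symbols is left to you, since its exact format may vary):

```python
def missing_symbols(transcriptions, known_symbols):
    """Collect characters that appear in the transcriptions but are
    absent from the symbol dictionary. known_symbols is whatever set
    of symbols you parse out of word_index_dict_new.txt."""
    seen = set()
    for text in transcriptions:
        seen.update(text)
    return sorted(seen - set(known_symbols))
```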

The log is printing all the characters it doesn't find in the word_index_dict_new.txt file. You can check for this here:

class TextCleaner:
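The cleaner presumably looks something like this minimal sketch (hypothetical; the repo's actual implementation may differ), which would explain the stream of single characters flooding the log:

```python
class TextCleanerSketch:
    """Hypothetical minimal TextCleaner: maps each character to an
    index via the dictionary, and prints any character it cannot
    find, i.e. the single letters seen in the training log."""
    def __init__(self, word_index_dictionary):
        self.word_index_dictionary = word_index_dictionary

    def __call__(self, text):
        indexes = []
        for char in text:
            try:
                indexes.append(self.word_index_dictionary[char])
            except KeyError:
                print(char)  # unknown symbol -> one line in the log
        return indexes
```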

Also, I removed the phonemization step, since I precompute the phonemes directly into the metadata file:

/SqNarrator/wavs/a0jm2r00.171.wav|SqNarrator_EN|5|1|jɛp, teɪsts d͡ʒʌst laɪ̯k jʌd ɛkspɛkt.|yep, tastes just like you'd expect.
/SqNarrator/wavs/a0ji2r00.0z1.wav|SqNarrator_EN|5|1|ju siː sʌm dɹɪpɪŋ, uzɪŋ stʌf.|you see some dripping, oozing stuff.

So you might want to add it back.
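Adding it back could amount to extending each metadata line with a phoneme field during preprocessing. A minimal sketch with a pluggable `phonemize` callable (for real phonemes you could pass in something like the `phonemizer` package's espeak backend; the field layout here is illustrative, not the repo's definitive format):

```python
def add_phonemes(line, phonemize):
    """Extend a 'path|...|text' metadata line by inserting a phoneme
    field before the text field, using any phonemize(text) callable."""
    fields = line.rstrip("\n").split("|")
    text = fields[-1]
    return "|".join(fields[:-1] + [phonemize(text), text])
```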
