"Follow the 2 steps listed below in order -
Create the (intermediate) manifest file using code_switching_manifest_creation.py. Its usage is as follows:
python code_switching_manifest_creation.py --manifest_language1 <absolute path of Language 1's manifest file> --manifest_language2 <absolute path of Language 2's manifest file> --manifest_save_path --id_language1 <language code for language 1 (e.g. en)> --id_language2 <language code for language 2 (e.g. es)> --max_sample_duration_sec --min_sample_duration_sec --dataset_size_required_hrs
Estimated runtime for dataset_size_required_hrs=10,000 is ~2 mins
Create the synthetic audio data and the corresponding manifest file using code_switching_audio_data_creation.py. Its usage is as follows:
python code_switching_audio_data_creation.py --manifest_path <absolute path to intermediate CS manifest generated in step 1> --audio_save_folder_path --manifest_save_path --audio_normalized_amplitude --cs_data_sampling_rate --sample_beginning_pause_msec --sample_joining_pause_msec <pause to be added between segments while joining, in milli seconds> --sample_end_pause_msec --is_lid_manifest <boolean to create manifest in the multi-sample lid format for the text field, true by default> --workers
Example of the multi-sample LID format: [{"str": "esta muestra ", "lang": "es"}, {"str": "was generated synthetically", "lang": "en"}]
Estimated runtime for generating a 10,000 hour corpus is ~40 hrs with a single worker"
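For what it's worth, the multi-sample LID text field quoted above is plain JSON, so it can be read back with the standard json module. A minimal sketch using the example entry from the quoted docs (the text_field string is just that example, corrected to valid JSON):

```python
import json

# A "text" field in the multi-sample LID format, as produced by
# code_switching_audio_data_creation.py when --is_lid_manifest is true.
text_field = (
    '[{"str": "esta muestra ", "lang": "es"},'
    ' {"str": "was generated synthetically", "lang": "en"}]'
)

segments = json.loads(text_field)

# Reconstruct the full code-switched transcript...
full_text = "".join(seg["str"] for seg in segments)

# ...and the per-segment language tags.
langs = [seg["lang"] for seg in segments]

print(full_text)  # esta muestra was generated synthetically
print(langs)      # ['es', 'en']
```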
After following these two steps, how do I configure train_ds and validation_ds in the config below to use the created synthetic code-switched dataset and manifest?
name: "FastConformer-CTC-BPE"

model:
  sample_rate: 16000
  log_prediction: true # enables logging sample predictions in the output during training
  ctc_reduction: 'mean_volume'
  skip_nan_grad: false

  train_ds:
    manifest_filepath: ???
    sample_rate: ${model.sample_rate}
    batch_size: 16 # you may increase batch_size if your memory allows
    shuffle: true
    num_workers: 8
    pin_memory: true
    max_duration: 16.7
    min_duration: 0.1
    # tarred datasets
    is_tarred: false
    tarred_audio_filepaths: null
    shuffle_n: 2048
    # bucketing params
    bucketing_strategy: "fully_randomized"
    bucketing_batch_size: null
    is_code_switched: true
    code_switched:
      min_duration: 12
      max_duration: 20
      min_monolingual: 0.3
      probs: [0.5, 0.5]
      force_monochannel: true
      sampling_scales: 0.75
      seed: 123

  validation_ds:
    manifest_filepath: ???
    sample_rate: ${model.sample_rate}
    batch_size: 16 # you may increase batch_size if your memory allows
    shuffle: false
    use_start_end_token: false
    num_workers: 8
    pin_memory: true
    is_code_switched: true
    code_switched:
      min_duration: 12
      max_duration: 20
      min_monolingual: 0.3
      probs: [0.5, 0.5]
      force_monochannel: true
      sampling_scales: 0.75
      seed: 123

  test_ds:
    manifest_filepath: null
    sample_rate: ${model.sample_rate}
    batch_size: 16
    shuffle: false
    use_start_end_token: false
    num_workers: 8
    pin_memory: true

  tokenizer:
    type: agg
    langs:
      en:
        dir: ???
        type: bpe
      ar:
        dir: ???
        type: bpe
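For context, my current guess is that the two manifest_filepath fields simply point at the manifests written by code_switching_audio_data_creation.py via --manifest_save_path, roughly like the fragment below (the paths are placeholders of my own, not from the docs), but I'm unsure whether the code_switched sub-config values also need to change for a pre-generated synthetic dataset:

```yaml
# Guess only; paths are hypothetical placeholders
model:
  train_ds:
    manifest_filepath: /data/cs_synth/train_manifest.json
  validation_ds:
    manifest_filepath: /data/cs_synth/dev_manifest.json
```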