The questions below relate to training ASR models with NeMo, specifically the Conformer and Fast Conformer architectures.
I am investigating how the way training data is represented and loaded affects training, so that I can give our internal teams guidelines on how best to organize and handle their data.
Some people create manifest files in which each utterance references a WAV file together with an offset and a duration indicating where in the audio file the utterance is located. A single audio file can contain dozens of utterances, so the manifest has dozens of lines, one per utterance, each pointing to the same audio file with a different offset and duration (see the example manifest sketched after the questions below). To understand how such utterances are treated, I have two questions:
1. Will the complete audio file be read and copied to GPU RAM, or only the segment specified by the offset and duration?
2. Will the system open such a multi-utterance audio file just once and read the individual utterances from it, or will the file be opened and closed every time a single utterance is read?
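For concreteness, here is a sketch of the kind of manifest I mean. The keys are the usual NeMo JSON-lines manifest fields; the file name, timestamps, and transcripts are made up:

```json
{"audio_filepath": "recordings/meeting_01.wav", "offset": 0.0, "duration": 4.2, "text": "good morning everyone"}
{"audio_filepath": "recordings/meeting_01.wav", "offset": 4.2, "duration": 3.1, "text": "let us get started"}
{"audio_filepath": "recordings/meeting_01.wav", "offset": 7.3, "duration": 5.8, "text": "first item on the agenda"}
```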
Note that I am aware of the tarred-audio-file mechanism. Converting the data organization above to tarred audio files directly would not be efficient, however, because one would end up with dozens of copies of the same large audio file. One should either use tarred files with one file per utterance (with offset 0), or keep larger WAV files and use proper offset values in the manifest for each utterance. I have had good experiences with the former in terms of efficiency. For illustration: in one of my experiments, dynamic data bucketing with Lhotse combined with tarred audio files gave a 10x training speed-up compared to my original, traditional setup of manifest files and untarred audio (one file per utterance) without bucketing. So how data is loaded and organized matters a lot.
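For reference, the rough shape of the "one file per utterance" conversion looks like the sketch below. This is not NeMo or Lhotse code and not the exact NeMo tarred-dataset layout, just illustrative Python: it reads only the offset/duration slice of each large WAV (via soundfile), writes that slice as a small per-utterance WAV into a tar shard, and emits a new manifest entry with offset 0. The function name and file naming scheme are made up.

```python
import io
import json
import tarfile

import soundfile as sf  # reads only the requested frame range, not the whole file


def segment_manifest_to_tar(manifest_in: str, tar_out: str, manifest_out: str) -> None:
    """Cut offset/duration segments out of large WAVs into a per-utterance tar shard."""
    with tarfile.open(tar_out, "w") as tar, \
            open(manifest_in) as fin, open(manifest_out, "w") as fout:
        for i, line in enumerate(fin):
            entry = json.loads(line)
            # Read just the [offset, offset + duration) slice of the source WAV.
            with sf.SoundFile(entry["audio_filepath"]) as f:
                start = int(entry["offset"] * f.samplerate)
                frames = int(entry["duration"] * f.samplerate)
                f.seek(start)
                audio = f.read(frames)
                sr = f.samplerate
            # Store the slice as its own small WAV inside the tar shard.
            buf = io.BytesIO()
            sf.write(buf, audio, sr, format="WAV")
            buf.seek(0)
            name = f"utt_{i:06d}.wav"
            info = tarfile.TarInfo(name=name)
            info.size = buf.getbuffer().nbytes
            tar.addfile(info, buf)
            # New manifest line: same text and duration, but offset 0 in its own file.
            entry.update(audio_filepath=name, offset=0.0)
            fout.write(json.dumps(entry) + "\n")
```

In practice one would split the output over multiple shards rather than a single tar, but the idea is the same: each utterance becomes its own small audio object that can be read sequentially from the archive.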
I look forward to learning about your thoughts and experience on this topic. At the moment, I am particularly interested in better understanding how and what data gets loaded into GPU RAM.