Size of initial minibatch #146
They are randomly shuffled. There are also multiple cuts packed into a single batch example, which might make the duration excessive. Do you think some curriculum-learning-like option would be useful (e.g., sorting by length for the first few epochs)?
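The curriculum idea could be sketched like this — sort cuts shortest-first for the first few epochs, then shuffle. This is an illustrative helper, not an existing Lhotse API; `order_cuts` and its parameters are hypothetical, and cut durations are assumed to be available as plain floats:

```python
import random

def order_cuts(cut_durations, epoch, curriculum_epochs=2, seed=0):
    """Return cut indices sorted by duration for the first few epochs,
    then randomly shuffled afterwards (curriculum-style ordering)."""
    indices = list(range(len(cut_durations)))
    if epoch < curriculum_epochs:
        # Early epochs: shortest cuts first, which tends to help convergence.
        indices.sort(key=lambda i: cut_durations[i])
    else:
        # Later epochs: plain random shuffling, seeded per epoch.
        random.Random(seed + epoch).shuffle(indices)
    return indices

durations = [12.3, 4.1, 8.7, 2.0]
print(order_cuts(durations, epoch=0))  # → [3, 1, 2, 0] (shortest first)
print(order_cuts(durations, epoch=5))  # a shuffled permutation
```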
I assume the length of the example is the length of the longest individual cut?
By default it's twice that (it helps the heuristic pack cuts better), but it can be adjusted; see: https://github.com/lhotse-speech/lhotse/blob/master/lhotse/dataset/speech_recognition.py#L136
I think we should make it that by default, not twice that, because for many model types the time taken could be much more sensitive to the sequence length than to the total number of frames in the sequence. For attention models, the time can even be quadratic in the number of frames.
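The quadratic point can be illustrated by counting self-attention score entries: one packed example of length T costs T² entries, while the same audio split into two examples of length T/2 costs only 2·(T/2)² = T²/2. This is a back-of-the-envelope sketch; real cost also depends on heads, layers, and padding:

```python
def attention_entries(lengths):
    """Number of self-attention score entries for a batch of sequences,
    assuming full (unmasked) self-attention within each sequence."""
    return sum(t * t for t in lengths)

packed = attention_entries([1600])     # one 16 s example at 100 frames/s
split = attention_entries([800, 800])  # the same audio as two 8 s examples
print(packed, split, packed / split)   # → 2560000 1280000 2.0
```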
OK, will do. I wonder, though, whether it makes sense to concatenate the cuts for attention models at all (unless they are subsequent utterances from the same recording/conversation).
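One way packed cuts could coexist with attention models would be a block-diagonal attention mask built from the per-cut frame counts, so frames cannot attend across cut boundaries. This is a sketch under the assumption that the model accepts a boolean frame-by-frame mask; it is not an existing Lhotse feature:

```python
def block_diagonal_mask(cut_lengths):
    """Boolean T x T mask that is True only where the query and key
    frames belong to the same cut, preventing cross-cut attention."""
    total = sum(cut_lengths)
    # Label each frame with the index of the cut it came from.
    owner = []
    for cut_idx, n in enumerate(cut_lengths):
        owner.extend([cut_idx] * n)
    return [[owner[q] == owner[k] for k in range(total)] for q in range(total)]

# Two cuts of 2 and 3 frames: the mask has a 2x2 and a 3x3 block.
mask = block_diagonal_mask([2, 3])
for row in mask:
    print([int(x) for x in row])
```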
Piotr, in our snowfall example with mini_librispeech, the first minibatch is 16 seconds long, which seems on the long side. Is that typical of the data, or are they arranged from longest to shortest? ... because if we want them in a nonrandom order, we probably want shortest to longest, which would be better for convergence.