Adding LibriSpeech word alignments in supervisions #379

pzelasko · 2021-08-20T02:24:25Z

No description provided.

pzelasko · 2021-08-20T02:25:10Z

Note: silences are represented as an empty string

pzelasko · 2021-08-20T02:40:05Z

Adding the alignments in the K2 dataset seems fairly easy; the "supervisions" dict has three more keys, "word", "word_start", and "word_end"; it's a list of lists of str/float.
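A minimal sketch of what that layout could look like, in plain Python. The three keys and the empty-string silence convention come from the comments above; the words, the times, and the helper `iter_words` are invented for illustration and are not part of the PR.

```python
from typing import Dict, Iterator, Tuple

# Hypothetical batch of two utterances; every value below is made up.
# Each key holds one inner list per utterance (a list of lists).
supervisions: Dict[str, list] = {
    "word": [["", "hello", "world", ""], ["good", "", "morning"]],
    "word_start": [[0.0, 0.32, 0.71, 1.2], [0.0, 0.45, 0.6]],
    "word_end": [[0.32, 0.71, 1.2, 1.5], [0.45, 0.6, 1.1]],
}


def iter_words(sup: Dict[str, list], utt_idx: int) -> Iterator[Tuple[str, float, float]]:
    """Yield (word, start, end) for one utterance, skipping silences ("")."""
    rows = zip(sup["word"][utt_idx], sup["word_start"][utt_idx], sup["word_end"][utt_idx])
    for word, start, end in rows:
        if word:  # an empty string marks a silence
            yield word, start, end
```

Keeping the three lists the same shape means a word and its boundaries can be recovered by position alone.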

@danpovey before I merge let me know if this format works for you guys (don't mind my transparent terminal with Spotify in the background).


csukuangfj · 2021-08-20T02:51:09Z

> it's a list of lists of str/float.

Would it be easier for later use if it returned frames for word_start and word_end, i.e., used int32_t?

pzelasko · 2021-08-20T03:18:31Z

Up to you guys. I don’t have a good idea of how you want to use it atm. Do you prefer that to be in frames?

csukuangfj · 2021-08-20T03:29:26Z

> Up to you guys. I don’t have a good idea of how you want to use it atm. Do you prefer that to be in frames?

From #378 (comment):

> Let's suppose that we convert this to frames after subsampling, and let the corresponding filler be -1. (Can do this by setting, say, fsa.times = [tensor of times], and fsa.times_filler = -1, supposing fsa.times contained int32)

It says we need start/end frames.

But from #378 (comment):

> It would be easiest to use, I think, if the words had a 'begin_frame' and 'end_frame' (or just a single frame index) and these were prepared with the same shape as the words themselves-- not sure if it becomes a list of list of int at some point?
>
> I assume that they'd be floating point times in seconds at the point we get them from lhotse, as we need to set the frame rate.

It suggests using times in seconds.

I am not sure which one is better.

danpovey · 2021-08-20T04:05:47Z

Seconds is OK, it's best if the calling code converts that to frames because the calling code knows the frame rate.
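For concreteness, a sketch of what such a calling-side conversion might look like. The function name `seconds_to_frames`, the 10 ms default frame shift, and the subsampling factor are assumptions chosen for illustration; nothing here is taken from lhotse or k2.

```python
from typing import List


def seconds_to_frames(
    times: List[List[float]],
    frame_shift: float = 0.01,      # assumed 10 ms frame shift
    subsampling_factor: int = 1,    # assumed model subsampling, e.g. 4
) -> List[List[int]]:
    """Convert per-utterance times in seconds to integer frame indices.

    Each time is first rounded to the nearest feature frame, then mapped to
    the subsampled frame index by integer division.
    """
    return [
        [int(round(t / frame_shift)) // subsampling_factor for t in utt]
        for utt in times
    ]
```

Usage: `seconds_to_frames(supervisions["word_start"], frame_shift=0.01, subsampling_factor=4)` would return integer start frames with the same nested shape as the input.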
I think this should be OK.

danpovey · 2021-08-20T04:07:20Z

... BTW, part of the reason I want this to be in integers when attached as an attribute is that k2 basically assumes that floating-point attributes are "score-like", so for instance they will be added together when integer attributes would be converted to ragged, such as when removing epsilons; and the default value can only be 0, never -1. Later we can change this behavior if it becomes a problem.
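As a plain-Python illustration of the -1 filler idea quoted earlier in the thread: integer frame indices of different lengths can be padded to a rectangular shape with -1, a value that can never be a valid frame. The helper `pad_with_filler` is hypothetical and does not use k2.

```python
from typing import List


def pad_with_filler(rows: List[List[int]], filler: int = -1) -> List[List[int]]:
    """Right-pad every row to the length of the longest row with `filler`."""
    max_len = max((len(r) for r in rows), default=0)
    return [list(r) + [filler] * (max_len - len(r)) for r in rows]
```

With integer frames, -1 is unambiguous padding; with float seconds there is no equally safe sentinel, which is consistent with the preference for integers stated above.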

pzelasko · 2021-08-20T11:47:39Z

I think the calling code doesn’t know the frame shift anymore (unless you are using precomputed features and use the dataset with return_cuts=True so you can query the cuts, but then it will fail with on-the-fly features). Also, we are already returning the start frame and num frames for each supervision from the dataset, so returning seconds here would be inconsistent. I’d suggest using frames here after all, unless you’re sure about seconds.

danpovey · 2021-08-20T12:34:31Z

ok.. frames is Ok.



pzelasko · 2021-08-20T14:51:25Z

Let's see if this is better; if it's OK I'm going to merge (I can't thoroughly test it right now, but it seems fine on isolated examples -- I plan to clean it up and add some tests later).

danpovey · 2021-08-20T15:46:41Z

Thanks!!


csukuangfj · 2021-08-20T15:50:58Z

+2

Adding LibriSpeech word alignments in supervisions

ab9de8b

pzelasko added this to the v0.8 milestone Aug 20, 2021

Extend K2SpeechRecognitionDataset with alignment information when ava…

a65de1e

…ilable

Change returning float seconds to returning int num_frames for alignm…

107a13d

…ent "timestamps"

pzelasko merged commit 9b12055 into master Aug 20, 2021

pzelasko mentioned this pull request Aug 25, 2021

Segmented Librispeech data-prep? #256

Closed

csukuangfj mentioned this pull request Sep 7, 2021

Phone based LF-MMI training k2-fsa/icefall#19

Open
