Adding LibriSpeech word alignments in supervisions #379
Note: silences are represented as an empty string |
Adding the alignments to the K2 dataset seems fairly easy: the "supervisions" dict gets three more keys, "word", "word_start", and "word_end", each a list of lists of str/float. @danpovey, before I merge, let me know if this format works for you guys (don't mind my transparent terminal with Spotify in the background). |
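For concreteness, here is a minimal sketch of the layout described above. All values are invented for illustration (they are not taken from LibriSpeech), only the three keys named in the comment are shown, and the helper `non_silence_words` is hypothetical rather than part of any library.

```python
# Hypothetical batch with two supervisions; every value here is made up.
# Silences are represented as an empty string, as noted earlier in the thread.
supervisions = {
    "word": [["", "hello", "world", ""], ["good", "", "morning"]],
    "word_start": [[0.0, 0.32, 0.71, 1.10], [0.0, 0.45, 0.60]],
    "word_end": [[0.32, 0.71, 1.10, 1.25], [0.45, 0.60, 1.05]],
}

# The three lists are parallel: one inner list per supervision,
# one entry per aligned word (or silence).
for words, starts, ends in zip(
    supervisions["word"], supervisions["word_start"], supervisions["word_end"]
):
    assert len(words) == len(starts) == len(ends)


def non_silence_words(sup, idx):
    """Return (word, start, end) triples for supervision `idx`, skipping silences."""
    triples = zip(sup["word"][idx], sup["word_start"][idx], sup["word_end"][idx])
    return [(w, s, e) for w, s, e in triples if w != ""]


print(non_silence_words(supervisions, 0))
```

Keeping the silences in the lists means consecutive entries tile the utterance without gaps, and callers that only want real words can filter on the empty string.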
Would it be easier for later use if it returned frames for word_start and word_end, i.e., used int32_t? |
Up to you guys. I don’t have a good idea of how you want to use it atm. Do you prefer that to be in frames? |
From
It says we need start/end frames. But from #378 (comment)
It suggests using times in seconds. I am not sure which one is better. |
Seconds is OK, it's best if the calling code converts that to frames because the calling code knows the frame rate. |
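A sketch of what that conversion might look like on the calling side. The function names, the default 10 ms `frame_shift`, and the choice to round to the nearest frame are all assumptions made for illustration; none of them come from this PR.

```python
from typing import List


def seconds_to_frames(seconds: float, frame_shift: float = 0.01) -> int:
    """Convert a time in seconds to a frame index.

    `frame_shift` is the hop between consecutive feature frames, in seconds.
    Rounding to the nearest frame is an assumption; flooring is also plausible.
    """
    return int(round(seconds / frame_shift))


def alignment_to_frames(
    times: List[List[float]], frame_shift: float = 0.01
) -> List[List[int]]:
    """Apply the conversion to a list of lists (one inner list per supervision)."""
    return [[seconds_to_frames(t, frame_shift) for t in utt] for utt in times]


# Hypothetical word start times in seconds for two supervisions.
word_start = [[0.0, 0.32, 0.71], [0.0, 0.45]]
print(alignment_to_frames(word_start))
```

The result is a list of lists of plain Python ints, which maps directly onto an integer tensor or attribute if one is needed downstream.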
... BTW, part of the reason I want this to be in integers when attached as an attribute is that k2 basically assumes that floating-point attributes are "score-like", so, for instance, they will be added together in situations where integer attributes would be converted to ragged, such as when removing epsilons; also, the default value can only be 0, never -1. Later we can change this behavior if it becomes a problem. |
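To illustrate why that matters for time stamps, here is a toy in plain Python. It is not k2 code and does not call any k2 API; `combine_on_merge` is a hypothetical stand-in that only mimics the behavior described in the comment above (floats summed like scores, integers gathered into a ragged row).

```python
def combine_on_merge(values):
    """Toy stand-in for combining one attribute across arcs that get merged.

    Integer values are kept side by side (like a row of a ragged tensor);
    float values are treated as scores and added together.
    """
    if all(isinstance(v, int) for v in values):
        return list(values)
    return sum(values)


# Word start times in seconds, stored as floats: the sum is not a meaningful time.
print(combine_on_merge([0.5, 1.25]))

# The same start times stored as integer frames (10 ms shift assumed): both survive.
print(combine_on_merge([50, 125]))
```

With float seconds the two start times collapse into a single number that corresponds to neither word, whereas integer frames keep each value recoverable.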
I think the calling code doesn’t know the frame shift anymore (unless you are using precomputed features and use the dataset with return_cuts=True so you can query the cuts, but then it will fail with on-the-fly features). Also, we are already returning the start frame and number of frames for each supervision from the dataset, so returning seconds here would be inconsistent. I’d suggest using frames here after all, unless you’re sure about seconds. |
ok.. frames is Ok.
…On Friday, August 20, 2021, Piotr Żelasko ***@***.***> wrote:
I think the calling code doesn’t know the frame shift anymore (unless you
are using precomputed features and use dataset with return_cuts=True so you
can query the cuts, but then it will fail with on the fly features). Also
we are already returning start frame and num frames for each supervision
from the dataset, so this is inconsistent. I’d suggest using frames here
after all, unless you’re sure about seconds.
|
Let's see if this is better; if it's OK, I'm going to merge (I can't thoroughly test it right now, but it seems fine on isolated examples; I plan to clean it up and add some tests later). |
Thanks!!
…On Fri, Aug 20, 2021 at 10:51 PM Piotr Żelasko ***@***.***> wrote:
Let's see if this is better, if it's OK I'm going to merge (can't
thoroughly test it right now but seems fine on isolated examples -- I plan
to clean it up and add some tests later)
|
+2 |
No description provided.