populate recording_id for Cut when using Cut.compute_and_store_featur… #147

Merged
merged 1 commit into from
Nov 23, 2020

freewym
Contributor

@freewym freewym commented Nov 23, 2020

…es()

Sometimes we might want to know the corresponding recording of a specific cut for debugging or evaluation, e.g. reference transcripts are usually provided in the format <recording_id> <reference>.

Currently we use Cut.compute_and_store_features() to extract features, but this API does not fetch Cut.recording.id, so Cut.recording_id returns None. Cut.id, on the other hand, is not None, but it is a random string, which makes it difficult to find the recording a cut came from.

This PR is trying to copy Cut.recording.id to Cut.features.recording_id when calling Cut.compute_and_store_features().
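
A rough illustration of the idea, using simplified stand-in classes rather than lhotse's real ones. The `Recording`, `Features`, and `Cut` definitions and the argument-free `compute_and_store_features()` below are assumptions made for this sketch; only the attribute names `recording.id` and `features.recording_id` come from the description above.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Recording:
    id: str


@dataclass(frozen=True)
class Features:
    # Hypothetical stand-in: a real feature manifest carries more fields.
    recording_id: Optional[str] = None


@dataclass(frozen=True)
class Cut:
    id: str
    recording: Optional[Recording] = None
    features: Optional[Features] = None

    def compute_and_store_features(self) -> "Cut":
        # Feature extraction itself is omitted (assumption of this sketch).
        # The point is only that the new Features object receives the id
        # of the recording the cut was made from, when there is one.
        rec_id = self.recording.id if self.recording is not None else None
        return replace(self, features=Features(recording_id=rec_id))


cut = Cut(id="a1b2c3", recording=Recording(id="rec-001"))
cut_with_feats = cut.compute_and_store_features()
print(cut_with_feats.features.recording_id)
```

With the id copied over, a cut that has features can be traced back to its recording even though `Cut.id` itself is random.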

@pzelasko LMK what you think.

@freewym
Contributor Author

freewym commented Nov 23, 2020

I realized that a cut may be from multiple recordings... In that case maybe we can leave the recording id as None?

@danpovey
Collaborator

danpovey commented Nov 23, 2020 via email

Why did you want to know that information? The Supervision would always come from a specific recording, I think.

On Mon, Nov 23, 2020 at 8:50 AM Yiming Wang @.***> wrote: I realized that a cut may be from multiple recordings... In that case maybe we can leave the recording id as None?

@freewym
Contributor Author

freewym commented Nov 23, 2020

OK, you are right, SupervisionSegment always has a non-empty recording_id. I was also thinking of a scenario where the supervisions are not available (e.g. in evaluation), and we may want to save the decoded results in a file to evaluate them later. Maybe an easier fix is to switch the if/else condition in the following line:

return self.features.recording_id if self.has_features else self.recording.id

as long as Cut.recording is not None.
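
One possible reading of this suggestion, sketched with simplified stand-in classes (not lhotse's actual definitions, and not necessarily the code that was merged). The `recording_id` property prefers `recording.id` whenever a recording is attached and only then falls back to `features.recording_id`; returning `None` when neither is available is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recording:
    id: str


@dataclass
class Features:
    # Hypothetical stand-in; recording_id may be missing on the features.
    recording_id: Optional[str] = None


@dataclass
class Cut:
    id: str
    recording: Optional[Recording] = None
    features: Optional[Features] = None

    @property
    def has_features(self) -> bool:
        return self.features is not None

    @property
    def recording_id(self) -> Optional[str]:
        # Switched order: consult the recording first, then the features.
        if self.recording is not None:
            return self.recording.id
        if self.has_features:
            return self.features.recording_id
        return None


# Features were computed without a recording_id, but the recording is attached.
cut = Cut(id="a1b2c3", recording=Recording(id="rec-001"), features=Features())
print(cut.recording_id)

# No recording attached: fall back to whatever the features carry.
feats_only = Cut(id="d4e5f6", features=Features(recording_id="rec-002"))
print(feats_only.recording_id)
```

With the original order, the first cut would report `None` because its features exist but carry no `recording_id`; with the order switched, the attached recording supplies the id.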

@pzelasko
Collaborator

I am fine with that last change you suggested.

@freewym
Contributor Author

freewym commented Nov 23, 2020

I am fine with that last change you suggested.

Done

@pzelasko pzelasko merged commit 2a8bba9 into lhotse-speech:master Nov 23, 2020
@freewym freewym deleted the reco_id branch November 23, 2020 14:47
@pzelasko pzelasko added this to the v0.3 milestone Nov 24, 2020

3 participants