Speed up DSAlign #31

Open
galv opened this issue Jun 23, 2021 · 0 comments

@galv
Collaborator

galv commented Jun 23, 2021

Right now, we time out when an audio file fails to align with its transcript within 200 seconds: https://github.com/mlcommons/peoples-speech/pull/27/files#diff-b790cd27585332e1eeca7dab897f1ccd7bcd483181132bd9914f2dd07062534fR401

As a result, 10% of our files time out during alignment.
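For reference, a per-file limit like this can be enforced with the Python standard library alone. This is a sketch that assumes alignment runs as an external command. It is not the code from the linked PR, and `run_with_timeout` and the command passed to it are hypothetical.

```python
import subprocess
import sys
from typing import Optional, Sequence

ALIGN_TIMEOUT_SECONDS = 200  # per-file limit described in this issue


def run_with_timeout(cmd: Sequence[str], timeout: float = ALIGN_TIMEOUT_SECONDS) -> Optional[int]:
    """Run `cmd` and return its exit code, or None if it ran past `timeout` seconds.

    Hypothetical helper: `cmd` stands in for whatever command launches alignment.
    """
    try:
        completed = subprocess.run(list(cmd), timeout=timeout, capture_output=True)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the child at this point.
        return None
    return completed.returncode


if __name__ == "__main__":
    # Stand-in for a slow alignment job: a Python child that sleeps for 5 s.
    slow_job = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(slow_job, timeout=1))  # None
```

If the command is a shell wrapper that spawns its own children, only the wrapper is killed. Starting it in its own session (`start_new_session=True`) and killing that process group is one way around this.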

One observation is that DSAlign seems to slow to a crawl when the ground-truth transcript does not match what was actually said in the audio (e.g., when the transcript is a translation).
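One cheap way to act on this observation, sketched under assumptions, is to flag transcripts that barely overlap with a rough hypothesis of the audio (for example, from a quick ASR pass over part of the file) before starting the full alignment. `word_overlap_ratio` and `looks_alignable` are hypothetical helpers. The 0.3 threshold is an arbitrary placeholder that would need tuning on real data. The sketch uses the standard library's `difflib.SequenceMatcher`, not anything inside DSAlign.

```python
from difflib import SequenceMatcher


def word_overlap_ratio(hypothesis: str, transcript: str) -> float:
    """Similarity in [0, 1] between two texts, compared word by word.

    SequenceMatcher.ratio() returns 2*M/T, where M is the number of matched
    words and T is the total number of words in both sequences.
    """
    hyp_words = hypothesis.lower().split()
    ref_words = transcript.lower().split()
    # autojunk=False so frequent words like "the" are not discarded as junk.
    return SequenceMatcher(None, hyp_words, ref_words, autojunk=False).ratio()


def looks_alignable(hypothesis: str, transcript: str, min_ratio: float = 0.3) -> bool:
    """Hypothetical pre-check: True if the transcript plausibly matches the audio."""
    return word_overlap_ratio(hypothesis, transcript) >= min_ratio
```

A translated transcript shares almost no words with a hypothesis of the spoken audio, so its ratio sits near 0. Such a file could be skipped or routed separately instead of consuming the full 200-second budget.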

One option is to reimplement part of DSAlign in Cython. But we should first dive deep into what's going on and see whether there's something better we can do.
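Before porting anything to Cython, profiling one slow file would show where the time actually goes. Here is a minimal sketch using Python's standard `cProfile` and `pstats` modules. `profile_call` is a hypothetical helper. The DSAlign entry point you would pass to it is left to the caller, since it isn't specified here.

```python
import cProfile
import io
import pstats
from typing import Any, Callable


def profile_call(fn: Callable[..., Any], *args: Any, top: int = 15, **kwargs: Any) -> str:
    """Run fn(*args, **kwargs) under cProfile and return a text report.

    The report lists the `top` entries sorted by cumulative time, which
    points at the call paths worth optimizing (or rewriting in Cython).
    """
    profiler = cProfile.Profile()
    # runcall enables the profiler, calls fn, and disables it again.
    profiler.runcall(fn, *args, **kwargs)
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(top)
    return stream.getvalue()
```

Comparing the report for a file that aligns quickly with one that hits the timeout (e.g., one with a translated transcript) should show which functions dominate in the slow case.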
