Add ASR CTC inference tutorial #2106

Closed

@carolineechen carolineechen commented Dec 28, 2021

Demonstrate usage of the CTC beam search decoder with lexicon constraint and KenLM support on a LibriSpeech sample, using a pretrained wav2vec2 model.

rendered: https://485200-90321822-gh.circle-artifacts.com/0/docs/tutorials/asr_inference_with_ctc_decoder_tutorial.html

follow-ups:

  • incorporate nbest
  • demonstrate customizability of different beam search parameters

# ~~~~~~~~~~~~~~~~
#

def download_file(url, file):

This might be coming from the other tutorials, but I think we can simply use `torch.hub.download_url_to_file`.
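
A minimal sketch of that suggestion, assuming the helper only needs to fetch a URL to a local path. `torch.hub.download_url_to_file(url, dst)` is the existing PyTorch call; the skip-if-present check and the `downloader` parameter are assumptions added here so the helper can be tried without network access, and they are not taken from the PR.

```python
import os


def download_file(url, file, downloader=None):
    """Download `url` to `file` unless the file already exists."""
    if downloader is None:
        # Imported lazily so the helper can be exercised without PyTorch installed.
        from torch.hub import download_url_to_file as downloader
    if not os.path.exists(file):
        # Create the parent directory first; "." covers bare file names.
        os.makedirs(os.path.dirname(file) or ".", exist_ok=True)
        downloader(url, file)
    return file
```

With the default `downloader`, the body reduces to a single call to `torch.hub.download_url_to_file`, which is the simplification the comment asks for.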

#
# The tokens are the possible symbols that the acoustic model can predict,
# including the blank and silent symbols.
#

Similar to the lexicon below, can you show part of the contents of the token file?
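
One way to do that, sketched under the assumption that the token file is plain text with one token per line. The helper name `preview_lines` and the file name `tokens.txt` are hypothetical and do not come from the PR.

```python
from itertools import islice


def preview_lines(path, n=5):
    """Return the first `n` lines of a text file, without trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in islice(f, n)]


# Hypothetical usage; "tokens.txt" stands in for the tutorial's token file.
# print(preview_lines("tokens.txt"))
```

The same helper would work for showing the first few lexicon entries.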

- speech_url = "https://pytorch.s3.amazonaws.com/torchaudio/tutorial-assets/ctc-decoding/8461-258277-0000.flac"
- speech_file = "_assets/speech.flac"
+ speech_url = "https://pytorch.s3.amazonaws.com/torchaudio/tutorial-assets/ctc-decoding/8461-258277-0000.wav"
+ speech_file = "/tmp/speech.wav"

Can you use `torch.hub.get_dir`? That way the configurable default directory is selected.

Hardcoding `/tmp` would leave artifacts behind when running locally.
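
A sketch of what that could look like. `torch.hub.get_dir()` is the existing PyTorch call, and it follows `torch.hub.set_dir` and the `TORCH_HOME` environment variable; the helper name `cached_asset_path`, the `root` parameter, and the `tutorial-assets` subdirectory are assumptions made for illustration.

```python
import os


def cached_asset_path(filename, root=None):
    """Build a path for a tutorial asset under the torch.hub cache directory."""
    if root is None:
        import torch  # imported lazily; only needed for the default location

        root = torch.hub.get_dir()
    # "tutorial-assets" is a hypothetical subdirectory name, not from the PR.
    directory = os.path.join(root, "tutorial-assets")
    os.makedirs(directory, exist_ok=True)
    return os.path.join(directory, filename)


# Hypothetical replacement for the hardcoded path:
# speech_file = cached_asset_path("speech.wav")
```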

@carolineechen carolineechen marked this pull request as ready for review December 28, 2021 16:34

@mthrok mthrok left a comment

Looks good. As a follow-up idea, we can add more detail on how the decoding result changes with the decoding parameters.

We should also show the n-best results, which is something the greedy decoder cannot provide.
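
To illustrate why a beam search can return several ranked hypotheses, here is a minimal pure-Python CTC prefix beam search. It is a sketch only: it uses no lexicon and no language model, so it is not the decoder added in this PR, and the names `ctc_nbest`, `beam_size`, and `nbest` are chosen here for illustration. It assumes `log_probs` is a list of frames, each a list of per-token log-probabilities, with the blank token at index 0.

```python
import math
from collections import defaultdict

NEG_INF = -math.inf


def log_add(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_nbest(log_probs, beam_size=10, nbest=3, blank=0):
    """Return up to `nbest` (token_ids, log_prob) pairs, best first."""
    # Each prefix tracks two scores: paths ending in blank, paths ending in a token.
    beams = {(): (0.0, NEG_INF)}
    for frame in log_probs:
        nxt = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            for s, lp in enumerate(frame):
                if s == blank:
                    # A blank keeps the prefix unchanged.
                    b, nb = nxt[prefix]
                    nxt[prefix] = (log_add(b, p_b + lp, p_nb + lp), nb)
                    continue
                ext = prefix + (s,)
                b, nb = nxt[ext]
                if prefix and prefix[-1] == s:
                    # A repeated token extends the prefix only after a blank...
                    nxt[ext] = (b, log_add(nb, p_b + lp))
                    # ...otherwise it merges into the same prefix.
                    b, nb = nxt[prefix]
                    nxt[prefix] = (b, log_add(nb, p_nb + lp))
                else:
                    nxt[ext] = (b, log_add(nb, p_b + lp, p_nb + lp))
        # Keep only the `beam_size` most probable prefixes for the next frame.
        ranked = sorted(nxt.items(), key=lambda kv: log_add(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_size])
    ranked = sorted(beams.items(), key=lambda kv: log_add(*kv[1]), reverse=True)
    return [(prefix, log_add(*scores)) for prefix, scores in ranked[:nbest]]
```

Because the search keeps a ranked set of prefixes at every frame, the final beam already contains the alternatives; a greedy decoder commits to one token per frame and has nothing comparable to return.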

# We see that the transcript with the lexicon-constrained beam search
# decoder consists of real words, while the greedy decoder can predict
# incorrectly spelled words like “hundrad”.
#

Please run `pre-commit run` once again.
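
For context on the quoted tutorial comment, greedy CTC decoding can be sketched as follows: take the most likely token in each frame, merge consecutive repeats, and drop blanks. Nothing in that procedure consults a lexicon, which is why the output can contain non-words. The function name and the plain nested-list input format are assumptions made for this illustration, not code from the PR.

```python
def greedy_ctc_decode(emission, tokens, blank=0):
    """Greedy CTC decoding: best token per frame, merge repeats, drop blanks."""
    # Index of the highest-scoring token in each frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in emission]
    # Merge runs of the same index into a single occurrence.
    collapsed = [idx for i, idx in enumerate(best) if i == 0 or idx != best[i - 1]]
    # Remove blanks and map the remaining indices to their symbols.
    return "".join(tokens[idx] for idx in collapsed if idx != blank)
```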

@facebook-github-bot

@carolineechen has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

xiaohui-zhang pushed a commit to xiaohui-zhang/audio that referenced this pull request May 4, 2022
Summary:
demonstrate usage of the CTC beam search decoder w/ lexicon constraint and KenLM support, on a LibriSpeech sample and using a pretrained wav2vec2 model

rendered: https://485200-90321822-gh.circle-artifacts.com/0/docs/tutorials/asr_inference_with_ctc_decoder_tutorial.html

follow-ups:
- incorporate `nbest`
- demonstrate customizability of different beam search parameters

Pull Request resolved: pytorch#2106

Reviewed By: mthrok

Differential Revision: D33340946

Pulled By: carolineechen

fbshipit-source-id: 0ab838375d96a035d54ed5b5bd9ab4dc8d19adb7
mthrok pushed a commit to mthrok/audio that referenced this pull request Dec 13, 2022
Add tags to the maskedtensor tutorials so they appear when filtering by tag