Add ASR CTC inference tutorial #2106
Conversation
# ~~~~~~~~~~~~~~~~
#
def download_file(url, file):
This might be coming from the other tutorials, but I think we can simply use `torch.hub.download_url_to_file`.
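A minimal sketch of the reviewer's suggestion, replacing a hand-rolled download helper with `torch.hub.download_url_to_file`; the helper name `fetch` and the cache-dir layout are illustrative assumptions, not part of the tutorial:

```python
import os
import torch

def fetch(url: str, filename: str) -> str:
    """Download ``url`` into the torch.hub cache dir (if not cached) and return the local path."""
    # Hypothetical helper: torch.hub.get_dir() honors TORCH_HOME / torch.hub.set_dir
    target = os.path.join(torch.hub.get_dir(), filename)
    if not os.path.exists(target):
        torch.hub.download_url_to_file(url, target)
    return target
```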
#
# The tokens are the possible symbols that the acoustic model can predict,
# including the blank and silent symbols.
#
Similar to the lexicon below, can you show part of the contents of the token file?
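To illustrate what the reviewer is asking for, a token file for a character-level CTC model typically lists one symbol per line: the blank, a word-boundary marker, then letters. The exact contents below are an assumption for the sketch, not the tutorial's actual file:

```python
from io import StringIO
from itertools import islice

# Hypothetical head of a tokens.txt: "-" as blank, "|" as word boundary,
# followed by letters (assumed contents for illustration).
tokens_txt = StringIO("-\n|\ne\nt\na\no\nn\n")
head = [line.strip() for line in islice(tokens_txt, 5)]
print(head)  # ['-', '|', 'e', 't', 'a']
```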
speech_url = "https://pytorch.s3.amazonaws.com/torchaudio/tutorial-assets/ctc-decoding/8461-258277-0000.flac" | ||
speech_file = "_assets/speech.flac" | ||
speech_url = "https://pytorch.s3.amazonaws.com/torchaudio/tutorial-assets/ctc-decoding/8461-258277-0000.wav" | ||
speech_file = "/tmp/speech.wav" |
Can you use `torch.hub.get_dir`? That way the configurable default cache directory will be used. Hardcoding `/tmp` would leave artifacts behind when running locally.
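A one-line sketch of that suggestion, building the download target from `torch.hub.get_dir()` instead of a hardcoded `/tmp` path; the filename is an illustrative assumption:

```python
import os
import torch

# torch.hub.get_dir() respects TORCH_HOME and torch.hub.set_dir(), so the
# downloaded asset lands in the user's configured cache, not /tmp.
speech_file = os.path.join(torch.hub.get_dir(), "speech.wav")
print(speech_file)
```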
256f9e7 to ee9c8c9
Looks good. As a follow-up idea, we can add more detail on how the decoding result changes based on the decoding parameters.
Also, we should show the n-best results, which is something the greedy decoder cannot provide.
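For context on why greedy decoding cannot provide an n-best list: it collapses the single most probable frame sequence (merging repeats, dropping blanks) into exactly one hypothesis, whereas beam search retains multiple candidates. A minimal pure-Python sketch of the greedy collapse step, with assumed token ids and blank id:

```python
def greedy_ctc_collapse(ids, blank=0):
    """Collapse a per-frame argmax sequence: merge repeats, drop blanks."""
    out, prev = [], None
    for i in ids:
        if i != prev and i != blank:
            out.append(i)
        prev = i
    return out

# Frames: blank, 1, 1, blank, 2, 2, 2, blank, blank, 3
print(greedy_ctc_collapse([0, 1, 1, 0, 2, 2, 2, 0, 0, 3]))  # [1, 2, 3]
```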
# We see that the transcript with the lexicon-constrained beam search
# decoder consists of real words, while the greedy decoder can predict
# incorrectly spelled words like “hundrad”.
#
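A toy illustration of why the lexicon constraint rules out misspellings like “hundrad”: the beam search only emits words present in the lexicon, so any out-of-lexicon spelling is never a valid hypothesis. The word list here is an assumption for the example:

```python
# Hypothetical lexicon fragment (assumed words for illustration).
lexicon = {"i", "had", "that", "curiosity", "hundred"}

def all_in_lexicon(transcript: str, lexicon: set) -> bool:
    """True iff every word of the transcript appears in the lexicon."""
    return all(word in lexicon for word in transcript.split())

print(all_in_lexicon("i had that hundred", lexicon))  # True
print(all_in_lexicon("i had that hundrad", lexicon))  # False
```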
Please run `pre-commit run` once again.
@carolineechen has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Summary: demonstrate usage of the CTC beam search decoder w/ lexicon constraint and KenLM support, on a LibriSpeech sample and using a pretrained wav2vec2 model

rendered: https://485200-90321822-gh.circle-artifacts.com/0/docs/tutorials/asr_inference_with_ctc_decoder_tutorial.html

follow-ups:
- incorporate `nbest`
- demonstrate customizability of different beam search parameters

Pull Request resolved: pytorch#2106
Reviewed By: mthrok
Differential Revision: D33340946
Pulled By: carolineechen
fbshipit-source-id: 0ab838375d96a035d54ed5b5bd9ab4dc8d19adb7