Refactor pipeline_demo.py to support variant EMFORMER_RNNT bundles #2203

nateanl · 2022-02-04T12:45:01Z

We refactored the demo script so that it can apply RNNT decoding using either torchaudio.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH or torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_TEDLIUM3, in both streaming and non-streaming modes. (Of the two hypotheses shown, the first is the streaming prediction and the second is the non-streaming one.)

We convert each token id sequence to word pieces and then manually join the word pieces. This allows us to preserve leading whitespace on output strings and therefore account for word breaks and continuations across token processor invocations, which is particularly useful when performing streaming ASR.
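A minimal sketch of the joining step described above, assuming SentencePiece-style word pieces in which the marker "▁" (U+2581) denotes the start of a word. The helper name `join_pieces` and the sample pieces are hypothetical and are not taken from pipeline_demo.py. The sketch only illustrates why leaving the leading whitespace unstripped lets per-segment strings be concatenated directly.

```python
# Hypothetical helper; assumes SentencePiece-style pieces where "▁" marks a word start.
WORD_START = "\u2581"


def join_pieces(pieces):
    """Join word pieces into text, keeping any leading whitespace."""
    return "".join(pieces).replace(WORD_START, " ")


# Two consecutive streaming segments (made-up pieces for illustration).
segment_1 = ["▁stream", "ing", "▁spe"]
segment_2 = ["ech", "▁recog", "nition"]

# Because leading whitespace is preserved, plain concatenation of the
# per-segment strings continues "spe" + "ech" and breaks before "recog".
transcript = join_pieces(segment_1) + join_pieces(segment_2)
print(repr(transcript))
```

If each segment were stripped before printing, "spe" and "ech" could no longer be distinguished from two separate words, which is the problem the manual join avoids.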

demo.mov

facebook-github-bot · 2022-02-04T16:50:02Z

@nateanl has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

mthrok

The script looks okay, but code-wise what is the difference from the equivalent one from librispeech_emformer_rnnt?

nateanl · 2022-02-09T19:27:24Z

> code-wise what is the difference from the equivalent one from librispeech_emformer_rnnt?

It's actually the same; only the pipeline-related part is different.

hwangjeff

the readme also references pipeline_demo.py — can you update it to be consistent with the changes here?

examples/asr/emformer_rnnt/pipeline_demo.py

hwangjeff

in the video, why do the word pieces for each streaming transcription show up all at once?

examples/asr/emformer_rnnt/README.md

nateanl · 2022-02-11T15:31:31Z

> why do the word pieces for each streaming transcription show up all at once?

I guess it's because the script was run on an AWS cluster, so there may be some delay when printing to the screen.

hwangjeff · 2022-02-11T15:48:19Z

> I guess it's because it's run on AWS cluster, there may be some delay when printing to the screen.

in that case, for the screen capture, can you run the script locally? that way, we can clearly show users what we mean by streaming asr and how responsive it is using the bundles (example: #2192)

nateanl · 2022-02-11T16:17:27Z

Sure. The demo video has been updated @hwangjeff

facebook-github-bot · 2022-02-11T16:18:18Z

@nateanl has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

hwangjeff

looks good — thanks!

mthrok · 2022-02-11T22:38:46Z

examples/asr/emformer_rnnt/pipeline_demo.py

+logger = logging.getLogger()
+
+
+def get_dataset(model_type, dataset_path):


Instead of repeating the key validation here and there, put these options in a dictionary, then pass the keys to the choices argument of argparse, so that the closed set of available options is defined once and only once.
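One way to read this suggestion, as a rough sketch rather than the code that was merged: keep the options in a single dictionary and let argparse validate against its keys. The dictionary name, the option keys, and the string values below are placeholders chosen for illustration; the real script would presumably map to bundle objects and dataset loaders instead.

```python
from argparse import ArgumentParser

# Hypothetical registry: the keys and values are placeholders, not the
# script's actual option names or loaders.
MODEL_TYPES = {
    "librispeech": "EMFORMER_RNNT_BASE_LIBRISPEECH",
    "tedlium3": "EMFORMER_RNNT_BASE_TEDLIUM3",
}


def build_parser():
    parser = ArgumentParser()
    # The valid options come straight from the dictionary keys, so argparse
    # rejects anything else and no separate validation is needed.
    parser.add_argument("--model-type", choices=list(MODEL_TYPES), required=True)
    return parser


args = build_parser().parse_args(["--model-type", "tedlium3"])
bundle_name = MODEL_TYPES[args.model_type]
print(bundle_name)
```

Adding a new option then means adding one dictionary entry; the parser and every lookup pick it up automatically.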

mthrok · 2022-02-11T22:40:21Z

examples/asr/emformer_rnnt/pipeline_demo.py

+
+
+def parse_args():
+    parser = ArgumentParser()


Please add a module-level docstring that describes the gist of this script, then pass it to the help description so that it's easy to see what this script does.

mthrok · 2022-02-11T22:41:00Z

examples/asr/emformer_rnnt/pipeline_demo.py

@@ -0,0 +1,94 @@
+import logging


Please add a shebang line.

…ytorch#2203)

Summary:
We refactored the demo script that can apply RNNT decoding using both `torchaudio.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH` and `torchaudio.prototype.pipelines.EMFORMER_RNNT_BASE_TEDLIUM3` in both streaming and non-streaming mode. (The first hypothesis prediction is streaming and the second one is non-streaming).

We convert each token id sequence to word pieces and then manually join the word pieces. This allows us to preserve leading whitespaces on output strings and therefore account for word breaks and continuations across token processor invocations, which is particularly useful when performing streaming ASR.

https://user-images.githubusercontent.com/8653221/153627956-f0806f18-3c1c-44df-ac07-ec2def58a0cf.mov

Pull Request resolved: pytorch#2203

Reviewed By: carolineechen

Differential Revision: D34006388

Pulled By: nateanl

fbshipit-source-id: 3d31173ee10cdab8a2f5802570e22b50fcce5632

nateanl added example module: pipelines labels Feb 4, 2022

nateanl requested review from mthrok, hwangjeff and carolineechen February 4, 2022 12:45

pytorch-bot bot added the ciflow/default label Feb 4, 2022

facebook-github-bot added the CLA Signed label Feb 4, 2022

mthrok reviewed Feb 7, 2022


Add demo script for EMFORMER_RNNT_BASE_TEDLIUM3 pipeline

939947c

nateanl force-pushed the rnnt_tedlium_pipeline branch from ec63c19 to 6393298 February 10, 2022 21:08

refactor

e53b7c5

nateanl force-pushed the rnnt_tedlium_pipeline branch from 6393298 to e53b7c5 February 10, 2022 21:10

hwangjeff reviewed Feb 11, 2022


examples/asr/emformer_rnnt/pipeline_demo.py (outdated)

examples/asr/emformer_rnnt/pipeline_demo.py

refactor bundle and dataset method

a4f6a06

nateanl requested review from hwangjeff and mthrok February 11, 2022 06:54

update README

41fae45

nateanl changed the title from "Add demo script for EMFORMER_RNNT_BASE_TEDLIUM3 pipeline" to "Refactor pipeline_demo.py to support variant EMFORMER_RNNT bundles" Feb 11, 2022

hwangjeff reviewed Feb 11, 2022


examples/asr/emformer_rnnt/README.md (outdated)

address comment

faedc86

nateanl requested a review from hwangjeff February 11, 2022 15:33

hwangjeff approved these changes Feb 11, 2022


facebook-github-bot closed this in 16d02a9 Feb 11, 2022

mthrok reviewed Feb 11, 2022

