[ASR] Conformer global tokens in local attention #6253
Conversation
Signed-off-by: sam1373 <samuelkriman@gmail.com>
Overall it looks good; a few minor changes are commented below. @VahidooX for final review.
# This version uses Longformer-style attention in order to handle longer audio
name: "FastConformer-CTC-BPE"
Add long-context
# Feed forward module's params
ff_expansion_factor: 4
Add a note regarding the sections that are the primary difference compared to the original model config.
# This version uses Longformer-style attention in order to handle longer audio
name: "FastConformer-Transducer-BPE"
Name update
# Feed forward module's params
ff_expansion_factor: 4

# Multi-headed Attention Module's params
Same as above
if self.global_tokens > 0:
    # compute sum of global and local attn
    x = self._compute_attn_output_with_global_indices(
        value_vectors=v.transpose(1, 2),
Add shape info as a comment here.
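For illustration, the requested annotation might look like the following; the shape is an assumption inferred from the surrounding multi-head attention code (where q, k, and v carry shape (batch, head, time, d_k)), not something confirmed in this thread:

    value_vectors=v.transpose(1, 2),  # (batch, time, head, d_k)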
        max_num_global_attn_indices=max_num_global_attn_indices,
        is_index_global_attn_nonzero=is_index_global_attn_nonzero,
        is_local_index_global_attn_nonzero=is_local_index_global_attn_nonzero,
        w=w,
Use a better keyword variable name.
    return self.linear_out(x.reshape(n_batch, -1, self.h * self.d_k)[:, :T])

@staticmethod
def _get_global_attn_indices(is_index_global_attn: torch.Tensor) -> Tuple:
How about making this protected instead of private? Put a _ at the end of the method name.
    is_local_index_no_global_attn_nonzero,
)

def _compute_attn_probs_global_key(
Same as above
    return attn_output_only_global + attn_output_without_global

def _compute_global_attn_output_from_hidden(
Same as above
LGTM! Left some minor comments.
@@ -0,0 +1,203 @@
# It contains the default values for training a Fast Conformer-CTC ASR model, large size (~120M) with CTC loss and sub-word encoding.
Is it a good idea to have both "-" and "_" in the file name at the same time?
examples/asr/conf/fastconformer/fast-conformer-long_ctc_bpe.yaml
[pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
Signed-off-by: sam1373 <samuelkriman@gmail.com>
LGTM! Just left some minor comments. Let's wait for the second round of review from @titu1994.
Signed-off-by: sam1373 <samuelkriman@gmail.com>
LGTM!
* global tokens
* test, configs, docs
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* update
* style
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* names, comments
* move comment
* import
* docs
* docs
* disable note

---------

Signed-off-by: sam1373 <samuelkriman@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
What does this PR do?
Adds Longformer-style attention for the Conformer encoder (limited-context attention plus a few tokens with full-context attention). Unlike with limited-context attention alone, the model needs to be fine-tuned with this approach to get good results when global tokens are used. After fine-tuning, results improve, particularly on long audio (even when it is much longer than the training data). With the default Fast Conformer configs and this attention, inference on audio longer than 1 hour is supported.
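To make the attention pattern concrete, here is a minimal, self-contained sketch (not the NeMo implementation, which avoids materializing the full attention matrix): every position attends within a local window, while a handful of global tokens, assumed here to be the first n_global positions, attend to and are attended by all positions.

    import torch

    def local_global_attention(q, k, v, window: int = 128, n_global: int = 1):
        # q, k, v: (batch, time, dim); single head for brevity.
        batch, time, dim = q.shape
        scores = (q @ k.transpose(-2, -1)) / dim ** 0.5  # (batch, time, time)

        # Local band: position i attends to positions j with |i - j| <= window.
        idx = torch.arange(time)
        mask = (idx[None, :] - idx[:, None]).abs() <= window  # (time, time) bool

        # Global tokens attend everywhere and are attended to by everyone.
        mask[:n_global, :] = True
        mask[:, :n_global] = True

        scores = scores.masked_fill(~mask, float("-inf"))
        return torch.softmax(scores, dim=-1) @ v

    # Example: batch of 2, 400 frames, 64-dim features.
    out = local_global_attention(*(torch.randn(2, 400, 64) for _ in range(3)))

Note that this dense-mask version is only for illustration; the memory savings of the Longformer-style implementation come from never forming the full (time, time) score matrix.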
Collection: ASR
Changelog
Usage
Add the following parameters to the Conformer encoder in config:
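The exact parameter block is not reproduced on this page; the following is a sketch based on the long-context Fast Conformer configs this PR adds, and the parameter names and values should be checked against the shipped YAML files:

    encoder:
      # Limited-context (Longformer-style) self-attention
      self_attention_model: rel_pos_local_attn
      att_context_size: [128, 128]   # left and right local attention window
      global_tokens: 1               # number of tokens with full-context attention
      global_tokens_spacing: 1       # spacing between consecutive global tokens
      global_attn_separate: false    # separate q/k/v projections for global attention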
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.
Additional Information