[ASR] Conformer global tokens in local attention #6253
Conversation
Signed-off-by: sam1373 <samuelkriman@gmail.com>
Overall it looks good; a few minor changes are commented below. @VahidooX for final review.
# This version uses Longformer-style attention in order to handle longer audio
name: "FastConformer-CTC-BPE"
Add long-context
# Feed forward module's params
ff_expansion_factor: 4
Add a note regarding the sections that are the primary difference compared to the original model config.
# This version uses Longformer-style attention in order to handle longer audio
name: "FastConformer-Transducer-BPE"
Name update
# Feed forward module's params
ff_expansion_factor: 4

# Multi-headed Attention Module's params
Same as above
if self.global_tokens > 0:
    # compute sum of global and local attn
    x = self._compute_attn_output_with_global_indices(
        value_vectors=v.transpose(1, 2),
Add shape info as a comment here.
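For illustration, the requested annotation might look like the following; the shape is an assumption inferred from the surrounding multi-head attention code (where q, k, and v carry shape (batch, head, time, d_k)), not something confirmed in this thread:

    value_vectors=v.transpose(1, 2),  # (batch, time, head, d_k)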
        max_num_global_attn_indices=max_num_global_attn_indices,
        is_index_global_attn_nonzero=is_index_global_attn_nonzero,
        is_local_index_global_attn_nonzero=is_local_index_global_attn_nonzero,
        w=w,
Use a better keyword variable name.
    return self.linear_out(x.reshape(n_batch, -1, self.h * self.d_k)[:, :T])

@staticmethod
def _get_global_attn_indices(is_index_global_attn: torch.Tensor) -> Tuple:
How about making this protected instead of private? Put a _ at the end of the method name.
    is_local_index_no_global_attn_nonzero,
)

def _compute_attn_probs_global_key(
Same as above
    return attn_output_only_global + attn_output_without_global

def _compute_global_attn_output_from_hidden(
Same as above
LGTM! Left some minor comments.
@@ -0,0 +1,203 @@
# It contains the default values for training a Fast Conformer-CTC ASR model, large size (~120M) with CTC loss and sub-word encoding.
Is it a good idea to have both "-" and "_" in the file name at the same time?
examples/asr/conf/fastconformer/fast-conformer-long_ctc_bpe.yaml
[pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
Signed-off-by: sam1373 <samuelkriman@gmail.com>
LGTM! Just left some minor comments. Let's wait for the second round of review from @titu1994.
Signed-off-by: sam1373 <samuelkriman@gmail.com>
LGTM!
* global tokens
* test, configs, docs
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* update
* style
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* names, comments
* move comment
* import
* docs
* docs
* disable note

---------

Signed-off-by: sam1373 <samuelkriman@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
What does this PR do?
Adds Longformer-style attention for the Conformer encoder (limited-context attention plus a few tokens with full-context attention). Unlike with limited-context attention alone, the model needs to be fine-tuned with this approach to get good results when global tokens are used. After fine-tuning, results improve, particularly on long audio (even when it is much longer than the training data). With the default Fast Conformer configs and this attention, inference on audio longer than 1 hour is supported.
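To make the attention pattern concrete, here is a minimal, self-contained sketch (not the NeMo implementation, which avoids materializing the full attention matrix): every position attends within a local window, while a handful of global tokens, assumed here to be the first n_global positions, attend to and are attended by all positions.

    import torch

    def local_global_attention(q, k, v, window: int = 128, n_global: int = 1):
        # q, k, v: (batch, time, dim); single head for brevity.
        batch, time, dim = q.shape
        scores = (q @ k.transpose(-2, -1)) / dim ** 0.5  # (batch, time, time)

        # Local band: position i attends to positions j with |i - j| <= window.
        idx = torch.arange(time)
        mask = (idx[None, :] - idx[:, None]).abs() <= window  # (time, time) bool

        # Global tokens attend everywhere and are attended to by everyone.
        mask[:n_global, :] = True
        mask[:, :n_global] = True

        scores = scores.masked_fill(~mask, float("-inf"))
        return torch.softmax(scores, dim=-1) @ v

    # Example: batch of 2, 400 frames, 64-dim features.
    out = local_global_attention(*(torch.randn(2, 400, 64) for _ in range(3)))

Note that this dense-mask version is only for illustration; the memory savings of the Longformer-style implementation come from never forming the full (time, time) score matrix.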
Collection: ASR
Changelog
Usage
Add the following parameters to the Conformer encoder in config:
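The exact parameter block is not reproduced on this page; the following is a sketch based on the long-context Fast Conformer configs this PR adds, and the parameter names and values should be checked against the shipped YAML files:

    encoder:
      # Limited-context (Longformer-style) self-attention
      self_attention_model: rel_pos_local_attn
      att_context_size: [128, 128]   # left and right local attention window
      global_tokens: 1               # number of tokens with full-context attention
      global_tokens_spacing: 1       # spacing between consecutive global tokens
      global_attn_separate: false    # separate q/k/v projections for global attention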
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.
Additional Information