Replies: 2 comments 1 reply
-
It's used in chunk-aware transformers. cc @VahidooX
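For intuition, here is a minimal sketch of the kind of masking a chunk-aware transformer applies during training, where each frame attends only within its own chunk plus a limited amount of left context. This is a generic illustration, not NeMo's actual implementation; `chunk_attention_mask` and its parameters are hypothetical names.

```python
import torch

def chunk_attention_mask(seq_len: int, chunk_size: int, left_chunks: int) -> torch.Tensor:
    """Boolean (seq_len, seq_len) mask: frame i may attend to frame j only if
    j lies in i's chunk or in one of the `left_chunks` preceding chunks.
    True means "attend"."""
    chunk_idx = torch.arange(seq_len) // chunk_size
    # diff[i, j] = chunk of i minus chunk of j; allow 0 .. left_chunks back.
    diff = chunk_idx.unsqueeze(1) - chunk_idx.unsqueeze(0)
    return (diff >= 0) & (diff <= left_chunks)

# Each 2-frame chunk sees itself plus one chunk of left context.
print(chunk_attention_mask(6, 2, 1).int())
```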
-
Caching is used during inference when the cache-aware streaming Conformer is in use; during training it is skipped.
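At inference time, the rough idea is that key/value frames from earlier chunks are kept in a cache and prepended before attention, so each new chunk sees its left context without recomputing it. Below is a minimal PyTorch sketch of that pattern; `attend_with_cache`, `cache_size`, and the fixed-size sliding cache are illustrative assumptions, not NeMo's actual `update_cache` logic.

```python
import torch
import torch.nn.functional as F

def attend_with_cache(query, key, value, cache, cache_size):
    """One streaming step: prepend cached frames to key/value so the
    current chunk attends over left context without recomputing it.

    query/key/value: (batch, time, d_model) projections for the current chunk.
    cache:           (batch, cache_size, d_model) frames kept from prior chunks.
    Returns the attention output and the updated cache.
    """
    # Extend key/value with the cached left context.
    key = torch.cat([cache, key], dim=1)
    value = torch.cat([cache, value], dim=1)

    # Plain scaled dot-product attention over [cache | current chunk].
    scores = query @ key.transpose(-2, -1) / key.shape[-1] ** 0.5
    out = F.softmax(scores, dim=-1) @ value

    # Slide the cache forward: keep only the most recent `cache_size` frames.
    new_cache = key[:, -cache_size:, :]
    return out, new_cache

# Usage: process an utterance chunk by chunk, carrying the cache along.
batch, d_model, cache_size, chunk = 1, 8, 4, 2
cache = torch.zeros(batch, cache_size, d_model)
for step in range(3):
    x = torch.randn(batch, chunk, d_model)   # stand-in for projected q/k/v
    out, cache = attend_with_cache(x, x, x, cache, cache_size)
    print(step, out.shape, cache.shape)
```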
-
I am currently working with the MultiHeadAttention class and found the update_cache function. As far as I understand, it does nothing at the moment and is a template for the future, am I right? If so, can you explain what this function will do?
NeMo/nemo/collections/asr/parts/submodules/multi_head_attention.py
Lines 154 to 165 in 9f94649