Implement SummaryMixing in FastConformer Architecture for Enhanced Efficiency #8454

nabil6391 · 2024-02-19T01:28:48Z

Hello Nvidia NeMo Team,

I would like to propose integrating SummaryMixing into the FastConformer architecture in the NeMo toolkit. SummaryMixing is a linear-complexity alternative to multi-head self-attention (MHSA) in speech recognition and understanding encoders. Instead of computing attention between every pair of frames, it combines each frame with a single global context vector that summarizes the whole utterance.
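The mechanism can be illustrated with a minimal NumPy sketch of one SummaryMixing layer. It assumes the local, summary, and combiner functions are each a single dense layer (ReLU on the first two, a plain linear combiner) with random weights. The class name `SummaryMixingSketch` and its parameters are hypothetical. This is not the SamsungLabs implementation and not a NeMo API.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


class SummaryMixingSketch:
    """Hypothetical SummaryMixing layer: per-frame local branch + one global mean summary."""

    def __init__(self, d_model, d_hidden, seed=0):
        rng = np.random.default_rng(seed)
        # Assumption: each branch is a single dense layer with random weights.
        self.w_local = rng.normal(0.0, 1.0 / np.sqrt(d_model), (d_model, d_hidden))
        self.w_summary = rng.normal(0.0, 1.0 / np.sqrt(d_model), (d_model, d_hidden))
        self.w_combine = rng.normal(0.0, 1.0 / np.sqrt(2 * d_hidden), (2 * d_hidden, d_model))

    def __call__(self, x, mask=None):
        # x: (T, d_model) frames of one utterance; mask: (T,) bool, True for real frames.
        if mask is None:
            mask = np.ones(x.shape[0], dtype=bool)
        local = relu(x @ self.w_local)        # (T, d_hidden), computed frame by frame
        summary = relu(x @ self.w_summary)    # (T, d_hidden), computed frame by frame
        # Global context vector: mean over valid frames only, O(T) instead of O(T^2).
        s_bar = summary[mask].mean(axis=0)    # (d_hidden,)
        s_bar = np.broadcast_to(s_bar, local.shape)
        # Linear combiner applied to [local_t ; s_bar] for every frame t.
        return np.concatenate([local, s_bar], axis=-1) @ self.w_combine


rng = np.random.default_rng(1)
x = rng.normal(size=(50, 8))
layer = SummaryMixingSketch(d_model=8, d_hidden=16)
y = layer(x)  # shape (50, 8)
```

Because the only cross-frame operation is a masked mean, the cost grows linearly with the number of frames, whereas MHSA builds a T × T attention map. Frames excluded by `mask` (for example padding) do not change the outputs of the real frames.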

Key benefits reported in the paper:

Reduction in training time by up to 28%.
More than 50% reduction in VRAM consumption.
Accelerated inference and decoding times for offline speech recognition and understanding.

I believe adopting SummaryMixing could substantially improve FastConformer's efficiency. What are your thoughts on this?

https://arxiv.org/pdf/2307.07421.pdf
https://github.com/SamsungLabs/SummaryMixing
