[Feat] Add WhisperFlashAttention2 #2018
Merged
+769
−38
Test Report
Hardware Environment:
Ascend (snt9b | 32G)
Software Environment (Mandatory):
-- MindSpore version: 2.6.0 (conda env - mindnlp)
-- Python version: 3.10.0 (conda env - mindnlp)
-- Transformers version: 4.51.0 (conda env - torch)
-- PyTorch version: 2.1.0 (conda env - torch)
-- CANN Toolkit version: 8.1.RC1.alpha002
-- OS platform and distribution: Ubuntu 22.04.4 LTS
-- GCC/Compiler version (if compiled from source): 11.04
-- Docker image: swr.cn-central-221.ovaijisuan.com/mindformers/deepseek_v3_mindspore2.5.0-infer:20250217
Recognition time comparison
Note: each figure is the average of three runs.

Example 1 - nihao.mp3 (1 s):
With FlashAttention2 enabled, short-audio recognition improves by about (13.0729 - 10.4567) / 13.0729 ≈ 20%, though a gap remains relative to the PTA implementation.

Example 2 - tianlong0925.mp3 (91 s):
For long audio, FlashAttention2 brings roughly (72.8655 - 64.7135) / 72.8655 ≈ 11.2% acceleration. Notably, PTA degrades severely in Flash mode (performance drops by about 6x), while the MindNLP implementation remains stable.

Test Code:
MindSpore
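The exact test script is not reproduced here, so below is a minimal sketch of an equivalent MindSpore timing harness. The checkpoint ID, audio file names, and the librosa preprocessing are assumptions; mindnlp mirrors the transformers API, and passing attn_implementation="flash_attention_2" is what selects the WhisperFlashAttention2 path added by this PR.

```python
# Minimal sketch of the MindSpore-side measurement (assumed checkpoint/files).
import time

import librosa
from mindnlp.transformers import WhisperForConditionalGeneration, WhisperProcessor

MODEL_ID = "openai/whisper-large-v3"  # assumed checkpoint, not stated in the PR

processor = WhisperProcessor.from_pretrained(MODEL_ID)
# attn_implementation="flash_attention_2" routes attention through the
# WhisperFlashAttention2 implementation introduced in this PR.
model = WhisperForConditionalGeneration.from_pretrained(
    MODEL_ID, attn_implementation="flash_attention_2"
)

def transcribe(path: str) -> tuple[str, float]:
    """Run one transcription and return (text, wall-clock seconds)."""
    audio, _ = librosa.load(path, sr=16000)
    inputs = processor(audio, sampling_rate=16000, return_tensors="ms")
    start = time.perf_counter()
    ids = model.generate(inputs.input_features)
    elapsed = time.perf_counter() - start
    text = processor.batch_decode(ids, skip_special_tokens=True)[0]
    return text, elapsed

# Average of three runs, matching the methodology above.
for clip in ("nihao.mp3", "tianlong0925.mp3"):
    times = [transcribe(clip)[1] for _ in range(3)]
    print(clip, sum(times) / len(times))
```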
PyTorch + Ascend
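A corresponding sketch of the PyTorch/Ascend (PTA) reference run, under the same assumptions; torch_npu registers the "npu" device used here.

```python
# Minimal sketch of the PTA reference measurement (assumed checkpoint/files).
import time

import librosa
import torch
import torch_npu  # noqa: F401  # registers the "npu" device with PyTorch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

MODEL_ID = "openai/whisper-large-v3"  # assumed checkpoint, not stated in the PR
DEVICE = "npu:0"

processor = WhisperProcessor.from_pretrained(MODEL_ID)
# For the PTA "Flash mode" comparison, attn_implementation="flash_attention_2"
# would presumably be passed here as well.
model = WhisperForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16
).to(DEVICE)

def transcribe(path: str) -> float:
    """Return the wall-clock seconds for one transcription."""
    audio, _ = librosa.load(path, sr=16000)
    features = processor(
        audio, sampling_rate=16000, return_tensors="pt"
    ).input_features.to(DEVICE, dtype=torch.float16)
    start = time.perf_counter()
    model.generate(features)
    return time.perf_counter() - start

for clip in ("nihao.mp3", "tianlong0925.mp3"):
    times = [transcribe(clip) for _ in range(3)]
    print(clip, sum(times) / len(times))
```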
Related Issues
Fixes #2014