Commit ec24561
authored
[Example] Add efficient attention sink backward implementations and tests (#877)
* [Example] Add a new example to support attention sink for MHA
- Introduced a new example script for multi-head attention (MHA) with sliding window attention and sink tokens.
- Added a reference attention function to validate the implementation against PyTorch.
- Included argument parsing for command-line execution of the example.
* [Example] Replace MHA sink forward example with updated implementation
- Removed the old example script for multi-head attention (MHA) with sliding window attention and sink tokens.
- Introduced a new example script that modifies the attention mechanism to enhance performance and maintainability.
- Updated argument parsing and reference functions to align with the new implementation.
* Enhance MHA sink example with sliding window support
- Added a `window_size` parameter to the `flashattn` function to enable sliding window attention.
- Implemented assertions to ensure `window_size` is compatible with `block_N`.
- Updated the main function to include a `tune` option for performance tuning.
- Introduced a new test file to validate both full attention and sliding window scenarios.
- Adjusted FLOPS calculation to account for the sliding window configuration.
* lint
* [Fix] Add checkinf process to fix the bug of swa
* Migrate to BSHD layout to align with triton baselines
* lint
* fix typo
* Refactor MHA sink example to use seq_q and seq_kv parameters to accommodate the new sequence length parameters.
* Add GQA sink example for optimized attention mechanism & lint fix
* fix several typos and bugs
* lint
* fix speed issues of swa
* Add flash attention example with backward pass for BHSD layout and corresponding test cases
* Add backward pass implementation for flash attention with sinks and corresponding test case
* fix lint and typo
* Optimze the calculation of `dsinks`
* Add support for swa backward and update examples
* fix previous typos
* Add example for GQA sink backward pass and update tests for both MHA and GQA sinks
* fix lint
* fix previous typos
* typo1 parent 95c373f commit ec24561
File tree
6 files changed
+1407
-5
lines changed- examples
- attention_sink
- flash_attention
6 files changed
+1407
-5
lines changed
0 commit comments