Releases · ROCm/flash-attention
v2.7.0-cktile
- Reduce LDS usage when num_splits <= 8
- Use smaller tile sizes to speed up small-seqlen cases
- Fine-tune block mapping
- Use larger vector size for writing workspace
- Speed up the combine kernel
- Fix block table read out-of-bounds issue
- Fix wrong key/value range in each split (see the sketch after this list)
- Do not access the dropout seed & offset device pointers in the host API
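For context, below is a minimal sketch of the split-KV decode path that the block-table and split-range fixes touch, written against the upstream flash_attn Python interface (flash_attn_with_kvcache with a paged KV cache and an explicit num_splits). Parameter names follow the upstream interface; whether this exact wrapper and these parameters are exposed by this CK-tile build is an assumption.

```python
import torch
from flash_attn import flash_attn_with_kvcache

# Minimal split-KV decode sketch. Shapes follow the upstream flash_attn
# interface; availability of these parameters in this CK-tile build is an
# assumption, not something stated by these release notes.
batch, nheads, headdim = 2, 8, 128
page_size, num_pages = 256, 16

q = torch.randn(batch, 1, nheads, headdim, dtype=torch.bfloat16, device="cuda")
# Paged KV cache: (num_pages, page_size, nheads, headdim), addressed via a block table.
k_cache = torch.randn(num_pages, page_size, nheads, headdim,
                      dtype=torch.bfloat16, device="cuda")
v_cache = torch.randn_like(k_cache)
# One row of page indices per sequence; entries must stay inside the cache.
block_table = torch.arange(num_pages, dtype=torch.int32, device="cuda").reshape(batch, -1)
# Current length of each cached sequence (tokens already in the cache).
cache_seqlens = torch.tensor([1000, 1500], dtype=torch.int32, device="cuda")

out = flash_attn_with_kvcache(
    q, k_cache, v_cache,
    cache_seqlens=cache_seqlens,
    block_table=block_table,
    causal=True,
    num_splits=4,  # split the KV range across work-groups; 0 lets the library choose
)
```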
v2.6.3-cktile
We have sent these changes upstream in a PR.
- Update the ROCm backend (CK) and adjust how CK is called to match changes in the CK API.
- Improve backward performance by updating CK
- Implement mha_fwd_kvcache().
- Change compile flags to support ROCm 6.2
- Change bf16 rounding to RTN (round to nearest)
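The backward-performance and bf16-rounding items above apply to the ordinary dense attention path; a minimal bf16 forward/backward call through the upstream flash_attn_func wrapper is sketched below. Whether this wrapper is exposed identically by this build is an assumption.

```python
import torch
from flash_attn import flash_attn_func

# Minimal bf16 forward + backward; tensors are (batch, seqlen, nheads, headdim).
# The bf16 rounding change affects how the fp32 accumulator is rounded when the
# output is written back in bf16. Exact wrapper availability is an assumption.
q = torch.randn(2, 512, 8, 64, dtype=torch.bfloat16, device="cuda", requires_grad=True)
k = torch.randn_like(q, requires_grad=True)
v = torch.randn_like(q, requires_grad=True)

out = flash_attn_func(q, k, v, dropout_p=0.0, causal=True)
out.sum().backward()  # exercises the CK backward kernels mentioned above
```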
v2.6.2-cktile
This release is the first version supporting the composable kernel (CK) tile backend.
vllm-v2.5.9post1-90a.942-240719
This release is created solely for convenient installation by vLLM. The attached wheel is created from the ck_tile branch as of 07/19/2024 (commit 23a2b1c2f21), for the gfx90a;gfx942 architectures, and is designed for use with torch==2.5.0.dev20240710 (this requirement is not strict) and ROCm 6.1.
To install the matching version of torch:
```bash
python3 -m pip install --no-cache-dir --pre \
    torch==2.5.0.dev20240710 torchvision==0.20.0.dev20240710 \
    --index-url https://download.pytorch.org/whl/nightly/rocm6.1
```
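After installing torch and the wheel attached to this release, a quick sanity check that the package imports and sees a GPU might look like the following; the flash_attn package name and version attribute are the standard upstream ones, and their presence in this wheel is an assumption.

```python
# Post-install sanity check; assumes the wheel installs the standard
# `flash_attn` package and that a gfx90a/gfx942 GPU is visible.
import torch
import flash_attn

print(torch.__version__, torch.version.hip)  # expect a ROCm 6.1 nightly build
print(flash_attn.__version__)                # expect the version attached to this release
print(torch.cuda.is_available())             # ROCm devices report through the cuda API
```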