Releases · ROCm/flash-attention
v2.7.0-cktile
- Reduce LDS usage when num_splits <= 8
- Use smaller tile sizes to speed up small-seqlen cases
- Fine-tune block mapping
- Use larger vector size for writing workspace
- Speed up the combine kernel
- Fix block table read out-of-bounds issue
- Fix wrong key/value range in each split (see the sketch after this list)
- Do not access the dropout seed & offset device pointers in the host API
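For context, below is a minimal sketch of the split-KV decode path that the block-table and split-range fixes touch, written against the upstream flash_attn Python interface (flash_attn_with_kvcache with a paged KV cache and an explicit num_splits). Parameter names follow the upstream interface; whether this exact wrapper and these parameters are exposed by this CK-tile build is an assumption.

```python
import torch
from flash_attn import flash_attn_with_kvcache

# Minimal split-KV decode sketch. Shapes follow the upstream flash_attn
# interface; availability of these parameters in this CK-tile build is an
# assumption, not something stated by these release notes.
batch, nheads, headdim = 2, 8, 128
page_size, num_pages = 256, 16

q = torch.randn(batch, 1, nheads, headdim, dtype=torch.bfloat16, device="cuda")
# Paged KV cache: (num_pages, page_size, nheads, headdim), addressed via a block table.
k_cache = torch.randn(num_pages, page_size, nheads, headdim,
                      dtype=torch.bfloat16, device="cuda")
v_cache = torch.randn_like(k_cache)
# One row of page indices per sequence; entries must stay inside the cache.
block_table = torch.arange(num_pages, dtype=torch.int32, device="cuda").reshape(batch, -1)
# Current length of each cached sequence (tokens already in the cache).
cache_seqlens = torch.tensor([1000, 1500], dtype=torch.int32, device="cuda")

out = flash_attn_with_kvcache(
    q, k_cache, v_cache,
    cache_seqlens=cache_seqlens,
    block_table=block_table,
    causal=True,
    num_splits=4,  # split the KV range across work-groups; 0 lets the library choose
)
```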
v2.6.3-cktile
We have sent these changes upstream in a PR.
- Update the ROCm backend (CK) and adjust how CK is called to match changes in the CK API.
- Improve backward performance by updating CK
- Implement mha_fwd_kvcache().
- Change compile flags to support ROCm 6.2
- Change bf16 rounding to RTN (round to nearest)
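The backward-performance and bf16-rounding items above apply to the ordinary dense attention path; a minimal bf16 forward/backward call through the upstream flash_attn_func wrapper is sketched below. Whether this wrapper is exposed identically by this build is an assumption.

```python
import torch
from flash_attn import flash_attn_func

# Minimal bf16 forward + backward; tensors are (batch, seqlen, nheads, headdim).
# The bf16 rounding change affects how the fp32 accumulator is rounded when the
# output is written back in bf16. Exact wrapper availability is an assumption.
q = torch.randn(2, 512, 8, 64, dtype=torch.bfloat16, device="cuda", requires_grad=True)
k = torch.randn_like(q, requires_grad=True)
v = torch.randn_like(q, requires_grad=True)

out = flash_attn_func(q, k, v, dropout_p=0.0, causal=True)
out.sum().backward()  # exercises the CK backward kernels mentioned above
```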
v2.6.2-cktile
This release is the first version supporting the composable kernel (CK) tile backend.
vllm-v2.5.9post1-90a.942-240719
This release is created solely for convenient installation by vLLM. The attached wheel is created from the ck_tile branch as of 07/19/2024 (commit 23a2b1c2f21), for the gfx90a;gfx942 architectures, and is designed for use with torch==2.5.0.dev20240710 (this requirement is not strict) and ROCm 6.1.
To install the matching version of torch:
```bash
python3 -m pip install --no-cache-dir --pre \
    torch==2.5.0.dev20240710 torchvision==0.20.0.dev20240710 \
    --index-url https://download.pytorch.org/whl/nightly/rocm6.1
```
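After installing torch and the wheel attached to this release, a quick sanity check that the package imports and sees a GPU might look like the following; the flash_attn package name and version attribute are the standard upstream ones, and their presence in this wheel is an assumption.

```python
# Post-install sanity check; assumes the wheel installs the standard
# `flash_attn` package and that a gfx90a/gfx942 GPU is visible.
import torch
import flash_attn

print(torch.__version__, torch.version.hip)  # expect a ROCm 6.1 nightly build
print(flash_attn.__version__)                # expect the version attached to this release
print(torch.cuda.is_available())             # ROCm devices report through the cuda API
```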