We provide an implementation of FlashMLA based on FlashAttention-2 (version 2.6.3), built with the MACA toolkit for C500 chips.
This implementation currently supports:
- Datatypes fp16 and bf16.
- Multi-Token Parallelism = 1.
- Paged KV cache with block size equal to 2^n (n >= 0); see the layout sketch after this list.
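As a rough illustration of the paged layout, the sketch below allocates a block-pool KV cache and a block table. The sizes (block size 64, a single latent KV head, head dimension 576) mirror the upstream FlashMLA test script and are assumptions for illustration, not a layout required by this README.

```python
import torch

# Illustrative sizes only (assumptions, not requirements of the kernel).
batch_size, max_seqlen = 2, 4096
block_size = 64                      # any power of two (2^n, n >= 0) is supported
h_kv, d = 1, 576                     # one latent KV head in MLA decoding
blocks_per_seq = max_seqlen // block_size

# The paged KV cache is a flat pool of fixed-size blocks shared by all sequences.
kvcache = torch.zeros(batch_size * blocks_per_seq, block_size, h_kv, d,
                      dtype=torch.bfloat16, device="cuda")

# block_table[i, j] stores the pool index of the j-th block of sequence i.
block_table = torch.arange(batch_size * blocks_per_seq, dtype=torch.int32,
                           device="cuda").view(batch_size, blocks_per_seq)
```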
Requirements:
- MXMACA GPUs (C500).
- MXMACA 2.27 and above.
- MACA development toolkit.
- mcTlass source code.
- PyTorch 2.0 and above (mcPyTorch 2.1 and mcTriton 2.1, or newer, from the MACA toolkit wheel packages).
To install:
- Make sure that MACA PyTorch 2.1 (mcPyTorch 2.1) is installed.
- Download the mcTlass source code from: https://sw-download.metax-tech.com/
```bash
export MACA_PATH=/your/maca/path
export CUDA_PATH=$MACA_PATH/tools/cu-bridge
export MACA_CLANG_PATH=$MACA_PATH/mxgpu_llvm/bin
export LD_LIBRARY_PATH=$MACA_PATH/lib:$MACA_PATH/mxgpu_llvm/lib:$MACA_PATH/ompi/lib:$LD_LIBRARY_PATH
python setup.py install
```
To test:

```bash
python tests/test_flash_mla.py
```
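If running the full test script is not convenient, a lighter import check can confirm that the extension built and loads. This is a sketch, not part of the repository; it only uses the two public functions shown in the usage example below and assumes a visible MetaX device.

```python
# Minimal post-install sanity check (assumes a C500 device is visible).
import torch
import flash_mla

print(torch.cuda.is_available())                   # expect True on a C500 machine
print(callable(flash_mla.get_mla_metadata))        # expect True
print(callable(flash_mla.flash_mla_with_kvcache))  # expect True
```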
Usage:

```python
from flash_mla import get_mla_metadata, flash_mla_with_kvcache

tile_scheduler_metadata, num_splits = get_mla_metadata(cache_seqlens, s_q * h_q // h_kv, h_kv)

for i in range(num_layers):
    ...
    o_i, lse_i = flash_mla_with_kvcache(
        q_i, kvcache_i, block_table, cache_seqlens, dv,
        tile_scheduler_metadata, num_splits, causal=True,
    )
    ...
```
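For a more concrete picture, here is a self-contained sketch that fills in example tensors for a single decoding step. The sizes (batch of 2, one query token, 128 query heads sharing one latent KV head, head dim 576 with dv = 512, block size 64) and the expected output shapes follow the upstream FlashMLA test and are assumptions for illustration, not constraints stated in this README.

```python
import torch
from flash_mla import get_mla_metadata, flash_mla_with_kvcache

# Example sizes (assumptions): one query token per step, 128 query heads
# sharing a single latent KV head, dv = 512 of the 576 head dims.
b, s_q, h_q, h_kv = 2, 1, 128, 1
d, dv = 576, 512
block_size = 64

cache_seqlens = torch.tensor([1024, 2048], dtype=torch.int32, device="cuda")
max_blocks = (int(cache_seqlens.max()) + block_size - 1) // block_size

q = torch.randn(b, s_q, h_q, d, dtype=torch.bfloat16, device="cuda")
kvcache = torch.randn(b * max_blocks, block_size, h_kv, d,
                      dtype=torch.bfloat16, device="cuda")
block_table = torch.arange(b * max_blocks, dtype=torch.int32,
                           device="cuda").view(b, max_blocks)

# Scheduling metadata is computed once per decoding step and reused for every layer.
tile_scheduler_metadata, num_splits = get_mla_metadata(
    cache_seqlens, s_q * h_q // h_kv, h_kv)

o, lse = flash_mla_with_kvcache(
    q, kvcache, block_table, cache_seqlens, dv,
    tile_scheduler_metadata, num_splits, causal=True,
)
print(o.shape, lse.shape)  # expected: (b, s_q, h_q, dv) and (b, h_q, s_q)
```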
FlashMLA is inspired by the FlashAttention 2 & 3 and CUTLASS projects.
Citation:

```bibtex
@misc{flashmla2025,
  title        = {FlashMLA: Efficient MLA decoding kernel},
  author       = {Jiashi Li},
  year         = {2025},
  publisher    = {GitHub},
  howpublished = {\url{https://github.com/deepseek-ai/FlashMLA}},
}
```