

@k50112113

This PR contains the following:

  1. added fused elementwise multiply+addition in aiter/ops/triton/fused_elementwise.py
  2. added fused RoPE+concat in aiter/ops/triton/fused_concat.py
  3. added fused flatten+MXFP4 quantization and fused RMSNorm+addition+MXFP4 quantization in aiter/ops/triton/fused_quant.py
  4. fixed the config lookup in aiter/ops/triton/gemm_a16w16.py to use "K" instead of "2*K"
  5. added output options and batch size checks in aiter/ops/triton/batched_gemm_afp4wfp4_pre_quant.py
  6. added an atomic-add version of the bf16 GEMM in aiter/ops/triton/gemm_a16w16_atomic.py
  7. updated the Triton MXFP4 GEMM and bf16 GEMM configs
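For reviewers unfamiliar with MXFP4, here is a minimal pure-Python reference for the flatten+MXFP4 quantization in item 3. It is a sketch, not the Triton kernel: it assumes the OCP Microscaling (MX) layout (FP4 E2M1 elements in 32-element blocks sharing one power-of-two scale), assumes the flatten merges the last two dimensions of an (M, H, D) input, and uses simple nearest rounding with ties going to the smaller magnitude rather than whatever rounding mode the kernel uses. The helper names are hypothetical.

```python
import math

# Assumed MXFP4 layout (OCP MX spec): FP4 E2M1 elements, blocks of 32
# elements sharing one power-of-two (E8M0) scale.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # non-negative FP4 values
E2M1_EMAX = 2  # exponent of the largest E2M1 value (6.0 = 1.5 * 2**2)
BLOCK = 32


def quantize_mxfp4_block(block):
    """Return (scale_exp, values); the dequantized block is values * 2**scale_exp."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    scale_exp = math.floor(math.log2(amax)) - E2M1_EMAX
    scale = 2.0 ** scale_exp
    values = []
    for v in block:
        mag = min(abs(v) / scale, E2M1_GRID[-1])  # clamp to the largest FP4 value
        q = min(E2M1_GRID, key=lambda g: abs(g - mag))  # nearest; ties -> smaller
        values.append(math.copysign(q, v))
    return scale_exp, values


def flatten_quant_mxfp4(x):
    """x has shape (M, H, D) as nested lists. Each (H, D) slice is flattened to
    one row of H*D values, which is then quantized in BLOCK-sized groups."""
    scales, values = [], []
    for mat in x:
        row = [v for vec in mat for v in vec]  # (H, D) -> (H*D,)
        row_scales, row_values = [], []
        for i in range(0, len(row), BLOCK):
            e, q = quantize_mxfp4_block(row[i:i + BLOCK])
            row_scales.append(e)
            row_values.extend(q)
        scales.append(row_scales)
        values.append(row_values)
    return scales, values
```

For example, `quantize_mxfp4_block([12.0, 1.0])` returns `(1, [6.0, 0.5])`: the block maximum 12 gives a shared scale of 2**1, and 12 / 2 = 6 is exactly the largest FP4 value. Fusing the flatten into the kernel avoids materializing the reshaped tensor before quantizing it.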
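Item 6 adds an atomic-add variant of the bf16 GEMM. A common reason for such a variant is split-K: the K dimension is divided among several programs, and each one adds its partial tile into the shared output. The sketch below assumes that is the scheme used (the PR text does not say so) and emulates it sequentially in pure Python; `gemm_split_k` is a hypothetical name.

```python
def gemm_split_k(a, b, split_k):
    """C = A @ B computed as split_k partial products summed into one output.

    On a GPU each K-slice would run in its own program and add its partial
    tile into C with an atomic add (e.g. Triton's tl.atomic_add); here the
    slices run one after another and '+=' stands in for that atomic update.
    """
    m, k = len(a), len(a[0])
    n = len(b[0])
    assert len(b) == k, "inner dimensions must match"
    c = [[0.0] * n for _ in range(m)]  # output must start zeroed for accumulation
    chunk = -(-k // split_k)  # ceil(k / split_k) elements of K per slice
    for s in range(split_k):
        k_lo, k_hi = s * chunk, min((s + 1) * chunk, k)
        for i in range(m):
            for j in range(n):
                partial = sum(a[i][kk] * b[kk][j] for kk in range(k_lo, k_hi))
                c[i][j] += partial  # the "atomic add" of this slice's partial result
    return c
```

Two consequences of this design hold regardless of the kernel details: the output buffer has to be zeroed before the kernel runs, and because floating-point addition is not associative, the order in which the atomic adds land can change the low-order bits of the result from run to run.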
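Item 2 fuses RoPE with a concatenation. Assuming the DeepSeek-style layout in which each head is split into a non-rotary part and a rotary part (suggested by the branch name and the "update rope qk cat" commit, but not spelled out in the PR), the unfused reference is `cat([x_nope, rope(x_pe)])`. The sketch below applies GPT-NeoX-style rotate-half RoPE to a single head vector; the kernel may use a different rotation layout (e.g. interleaved pairs), and the function names are hypothetical.

```python
import math


def rope_neox(x, pos, base=10000.0):
    """Rotate-half (GPT-NeoX style) RoPE on one head vector of even length."""
    d = len(x)
    half = d // 2
    out = [0.0] * d
    for i in range(half):
        theta = pos * base ** (-2.0 * i / d)  # per-pair rotation angle
        c, s = math.cos(theta), math.sin(theta)
        x1, x2 = x[i], x[i + half]
        out[i] = x1 * c - x2 * s
        out[i + half] = x2 * c + x1 * s
    return out


def rope_then_concat(x_nope, x_pe, pos):
    """Unfused reference: cat([x_nope, rope(x_pe)]) for one head at position pos."""
    return list(x_nope) + rope_neox(x_pe, pos)
```

At `pos=0` every angle is zero, so the output is just the concatenation. A fused kernel writes the rotated half straight into its slot of the concatenated buffer instead of producing a temporary and then copying it.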
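The RMSNorm+addition variant in item 3 quantizes the output of a residual add followed by RMSNorm. Below is an unfused reference for the part before quantization, assuming the add happens first (h = x + residual) and that the updated residual is returned alongside the normalized output, as is common for fused add+RMSNorm kernels. The argument order, the `eps` default, and the return values are assumptions, not the aiter signature; `add_rmsnorm` is a hypothetical name.

```python
import math


def add_rmsnorm(x, residual, weight, eps=1e-6):
    """Unfused reference: h = x + residual; y = h / rms(h) * weight.

    Returns (y, h) so the caller can reuse h as the residual of the next layer.
    """
    h = [a + b for a, b in zip(x, residual)]  # residual add
    rms = math.sqrt(sum(v * v for v in h) / len(h) + eps)  # root mean square of h
    y = [v / rms * w for v, w in zip(h, weight)]  # normalize, then scale
    return y, h
```

In the fused kernel, `y` would go straight into the block quantizer instead of being written out in bf16 first, which is where the memory-traffic saving comes from.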

@k50112113 k50112113 requested review from azaidy and rahulbatra85 July 2, 2025 19:07

@rahulbatra85 rahulbatra85 left a comment


Please see comments. Thanks!

@k50112113 k50112113 requested a review from rahulbatra85 July 16, 2025 16:12
@rahulbatra85 rahulbatra85 merged commit 0a6777c into main Jul 16, 2025
13 checks passed
@rahulbatra85 rahulbatra85 deleted the shaoclee/ds_fused_custom_ops branch July 16, 2025 16:15
cagrikymk added a commit that referenced this pull request Jul 30, 2025
* add fused concat

* add fused elementwise and pytest

* clean up

* add fused quant

* clean up

* add fused mxfp4 quant and pytest

* add gemm_a16w16_atomic and related tests

* extend tests and add config files for bf16 GEMM with atomic add

* fused quant. code cleanup

* formatting changes

* add rms norm quant tests and changes

* update/add DS configs for GEMMs

* add prequant config

* fix bf16 gemm

* fix fp4 gemm atomic add

* black reformatting

* tune fp4 prequant gemm atomic

* fix batched fp4 prequant and add new DS configs

* optimize shapes for deepseek

* black reformatting

* update bf16 atomic GEMM

* add documentation for fused_concat

* update rope qk cat

* rename fused elementwise

* add doc for fused quant

* rename fused_quant into fused_mxfp4_quant

* fix typo

* fix a minor bug

* black formatting

* update comments and drop unused arg.

* fix pre-quant GEMM tests based on func. sign. changes

* fix pre-quant GEMM tests based on func. sign. changes

* doc on fused_mul_add

* corner case fix

* corner case fix

* test

* Fix pytest errors with LRU cache inplace mod - big test case error remains for batched gemm

* Black linting change

* Fix linting error

* Move inplace config operations to _get_config

---------

Co-authored-by: Mehmet Cagri Kaymak <mehmet.kaymak@amd.com>
Co-authored-by: Lukasz Burzawa <lukasz.burzawa@amd.com>
Co-authored-by: William Zhou <William.Zhou@amd.com>
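The last few commits deal with pytest failures caused by configs from an LRU cache being modified in place. The general pitfall, assuming that is what happened here: `functools.lru_cache` returns the same object on every cache hit, so mutating a cached dict silently changes what later callers see. The sketch below reproduces the effect with hypothetical names and a made-up config; it does not reproduce aiter's code.

```python
import copy
import functools


@functools.lru_cache(maxsize=None)
def load_configs():
    # Hypothetical stand-in for reading tuned kernel configs from disk.
    return {"BLOCK_SIZE_K": 128, "NUM_KSPLIT": 1}


def get_config_inplace(k):
    cfg = load_configs()  # the cached object itself, not a copy
    cfg["BLOCK_SIZE_K"] = min(cfg["BLOCK_SIZE_K"], k)  # BUG: edits the cache
    return cfg


def get_config_copy(k):
    cfg = copy.deepcopy(load_configs())  # private copy; the cache stays pristine
    cfg["BLOCK_SIZE_K"] = min(cfg["BLOCK_SIZE_K"], k)
    return cfg
```

With the in-place version, a call with `k=64` permanently shrinks the cached `BLOCK_SIZE_K` to 64, so a later call with `k=8192` also gets 64; with the copying version the second call gets 128. Test suites that run many shapes in one process tend to expose this, because the cache persists across test cases.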
