Conversation

@LopezCastroRoberto
Collaborator

🚀 What is new in QuTLASS v0.2:

  • FlashInfer backend support for B200 GPUs
  • Quantization-Aware Training (QAT) via MXFP types:
    • Quartet clipping-mask computation integrated into the quantization routines
    • Prototype backward kernels for MXFP4 (sm_120) and MXFP8 (sm_100)
    • Integrated CUTLASS MXFP8 backward GEMM kernels (TN and NN layouts)
  • Updated Transformers integration for QAT (#41897)
  • Nanochat-QAT integration (#1)
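
The actual kernels live in the QuTLASS repository, but the QAT idea behind the Quartet clipping mask can be sketched: MXFP4 shares one power-of-two scale per 32-element block, elements are rounded to the FP4 (E2M1) value grid, and a mask records which elements saturated the FP4 range so a backward pass can zero their gradients. Below is a minimal NumPy sketch under those assumptions; the block size, E2M1 grid, and scale rule follow the OCP MX format description, and `mxfp4_quantize_block` is a hypothetical helper, not QuTLASS's API:

```python
import numpy as np

# Representable magnitudes of the FP4 (E2M1) element type used by MXFP4.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def mxfp4_quantize_block(x, block=32):
    """Fake-quantize a 1-D array with MXFP4-style shared power-of-two scales.

    Returns the dequantized values and a clipping mask (True where the input
    saturated the FP4 range), which a Quartet-style backward pass can use to
    zero gradients for clipped elements.
    """
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)
    mask = np.empty(x.shape, dtype=bool)
    for start in range(0, x.size, block):
        blk = x[start:start + block]
        amax = np.abs(blk).max()
        # Shared power-of-two (E8M0-style) scale, aiming amax at the FP4 max (6.0).
        scale = 2.0 ** np.floor(np.log2(amax / 6.0)) if amax > 0 else 1.0
        scaled = blk / scale
        # Elements beyond the FP4 range are clipped; remember them for backward.
        mask[start:start + block] = np.abs(scaled) > 6.0
        clipped = np.clip(scaled, -6.0, 6.0)
        # Round each magnitude to the nearest FP4 grid point, then rescale.
        idx = np.abs(np.abs(clipped)[:, None] - FP4_GRID).argmin(axis=1)
        out[start:start + block] = np.sign(clipped) * FP4_GRID[idx] * scale
    return out, mask
```

With a straight-through estimator, the forward pass uses `out` while the backward pass multiplies incoming gradients by `~mask`, which is the role the clipping-mask computation plays inside the quantization routines.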

Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es>
@LopezCastroRoberto LopezCastroRoberto changed the title Qutlass v0.2 QuTLASS v0.2 Oct 28, 2025
@LopezCastroRoberto LopezCastroRoberto merged commit 0997aa2 into main Oct 28, 2025
