Stars
Code and data for the Chain-of-Draft (CoD) paper
[NeurIPS'23] Speculative Decoding with Big Little Decoder
A scalable and robust tree-based speculative decoding algorithm (a minimal draft-and-verify sketch of speculative decoding appears after this list)
Official code for GliDe with a CaPE
HArmonizedSS / HASS
Forked from SafeAILab/EAGLE. Official Implementation of "Learning Harmonized Representations for Speculative Sampling" (HASS)
Tile primitives for speedy kernels
An all-in-one repository of awesome LLM pruning papers, integrating useful resources and insights.
Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models
Unofficial implementations of block/layer-wise pruning methods for LLMs.
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Fully open reproduction of DeepSeek-R1
[CVPR 2022] AlignQ: Alignment Quantization with ADMM-based Correlation Preservation
Finetune Llama 3.3, DeepSeek-R1 & Reasoning LLMs 2x faster with 70% less memory! 🦥
[ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection (a generic low-rank compression sketch appears after this list)
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
Quantized attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without degrading end-to-end metrics across various models.
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
Puzzles for learning Triton; play with minimal environment configuration!
An acceleration library that supports arbitrary bit-width combinatorial quantization operations
[NeurIPS'24] An efficient and accurate memory-saving method for W4A4 large multi-modal models.
Official PyTorch implementation of FlatQuant: Flatness Matters for LLM Quantization
GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
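
Several of the repositories above (Big Little Decoder, GliDe, HASS, Lookahead Decoding, Spec-Bench) build on the draft-and-verify idea behind speculative decoding. The following is a minimal pure-Python sketch of greedy speculative decoding only: a cheap draft model proposes k tokens and the target model verifies them. The names speculative_decode, target, draft, and k are illustrative assumptions, not any repository's actual API.

# Minimal sketch of greedy speculative decoding. The "models" below are
# toy stand-ins (prefix -> next greedy token), not a real LLM interface.
from typing import Callable, List

Token = int
Model = Callable[[List[Token]], Token]

def speculative_decode(target: Model, draft: Model,
                       prompt: List[Token], k: int, max_new: int) -> List[Token]:
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new:
        # 1) Draft k tokens autoregressively with the cheap model.
        spec: List[Token] = []
        for _ in range(k):
            spec.append(draft(seq + spec))
        # 2) Verify: keep draft tokens while the target agrees; on the first
        #    disagreement, take the target's token instead. In a real system
        #    this verification is a single batched forward pass.
        for i in range(k):
            t = target(seq + spec[:i])
            seq.append(t)
            if t != spec[i]:
                break  # the round ends at the target's first correction
    return seq[:len(prompt) + max_new]

# Toy demo: the draft agrees with the target except at every 5th position,
# so most rounds accept several tokens at once.
vocab_cycle = [3, 1, 4, 1, 5, 9, 2, 6]
target: Model = lambda ctx: vocab_cycle[len(ctx) % len(vocab_cycle)]
draft: Model = lambda ctx: target(ctx) if len(ctx) % 5 else 0
print(speculative_decode(target, draft, prompt=[7], k=4, max_new=8))

The key invariant of this loop is that its output matches what greedy decoding with the target model alone would produce; speculation only changes how many target calls are spent per accepted token.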
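For the KV-cache compression entries (Palu, GEAR), here is a generic truncated-SVD sketch of the low-rank idea: store a rank-r factorization of the cached keys instead of the full matrix. This illustrates low-rank projection only and is not Palu's or GEAR's actual algorithm; the shapes, the synthetic data, and the name compress_keys are assumptions.

# Generic low-rank compression of a key cache via truncated SVD.
import numpy as np

def compress_keys(K: np.ndarray, r: int):
    """Factor a (seq_len, head_dim) key cache into (seq_len, r) @ (r, head_dim)."""
    U, S, Vt = np.linalg.svd(K, full_matrices=False)
    A = U[:, :r] * S[:r]  # (seq_len, r): per-token low-rank codes to cache
    B = Vt[:r]            # (r, head_dim): shared reconstruction matrix
    return A, B

rng = np.random.default_rng(0)
# Synthetic cache with a decaying spectrum, so low rank captures most energy.
K = (rng.standard_normal((128, 64)) * (0.9 ** np.arange(64))) @ rng.standard_normal((64, 64))
A, B = compress_keys(K, r=16)
rel_err = np.linalg.norm(K - A @ B) / np.linalg.norm(K)
print(f"stored {A.size + B.size} floats vs {K.size}; relative error {rel_err:.3f}")

The trade-off is the usual one: memory drops from seq_len * head_dim to roughly r * (seq_len + head_dim) floats per head, at the cost of a reconstruction error governed by the discarded singular values.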