Pinned
-
vllm (Public, forked from vllm-project/vllm)
A high-throughput and memory-efficient inference and serving engine for LLMs
Python
-
openvinotoolkit/openvino (Public)
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
-
NVIDIA/cutlass (Public)
CUDA Templates and Python DSLs for High-Performance Linear Algebra
-
mit-han-lab/llm-awq (Public)
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
-
torch-custom-op (Public)
A project for demonstrating custom op registration using modern PyTorch APIs
Cuda

