Repositories list

    • aiter

      Public
      AI Tensor Engine for ROCm
      Cuda
      MIT License
      Updated Feb 14, 2025
    • A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Apache License 2.0
      Updated Feb 14, 2025
    • JamAIBase

      Public
      The collaborative spreadsheet for AI. Chain cells into powerful pipelines, experiment with prompts and models, and evaluate LLM responses in real-time. Work together seamlessly to build and iterate on AI applications.
      Python
      Apache License 2.0
      Updated Feb 11, 2025
    • vllm

      Public
      vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Apache License 2.0
      Updated Feb 10, 2025
    • The driver for LMCache core to run in vLLM
      Python
      Apache License 2.0
      Updated Jan 24, 2025
    • LMCache

      Public
      ROCm support of Ultra-Fast and Cheaper Long-Context LLM Inference
      Python
      Apache License 2.0
      Updated Jan 24, 2025
    • Python
      Updated Jan 23, 2025
    • Python
      Apache License 2.0
      Updated Jan 22, 2025
    • kvpress

      Public
      LLM KV cache compression made easy
      Python
      Apache License 2.0
      Updated Jan 21, 2025
    • litellm

      Public
      Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq] (see the usage sketch after this list)
      Python
      Other
      Updated Jan 13, 2025
    • Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
      C++
      Other
      Updated Dec 20, 2024
    • Mooncake

      Public
      Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
      C++
      Apache License 2.0
      Updated Dec 16, 2024
    • ROCm Implementation of torchac_cuda from LMCache
      Cuda
      Updated Dec 16, 2024
    • etalon

      Public
      LLM Serving Performance Evaluation Harness
      Python
      Apache License 2.0
      Updated Dec 16, 2024
    • Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.
      Python
      MIT License
      Updated Dec 7, 2024
    • Efficient Triton Kernels for LLM Training
      Python
      BSD 2-Clause "Simplified" License
      Updated Dec 6, 2024
    • Efficient LLM Inference over Long Sequences
      Python
      Apache License 2.0
      Updated Nov 29, 2024
    • A calculator to estimate memory footprint, capacity, and latency on NVIDIA, AMD, and Intel hardware.
      Python
      Updated Nov 24, 2024
    • ROCm Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
      Cuda
      Apache License 2.0
      Updated Nov 21, 2024
    • Go ahead and axolotl questions
      Python
      Apache License 2.0
      Updated Nov 16, 2024
    • TypeScript documentation of JamAISDK
      HTML
      Updated Nov 14, 2024
    • skypilot

      Public
      SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
      Python
      Apache License 2.0
      Updated Nov 7, 2024
    • CI/CD pipelines that build Docker images with Flash Attention pre-compiled, to speed up development and deployment of other frameworks.
      Shell
      Apache License 2.0
      Updated Oct 26, 2024
    • ROCm fork of Fast and memory-efficient exact attention (this branch aims to produce a Flash Attention PyPI package that can be readily installed and used).
      Python
      BSD 3-Clause "New" or "Revised" License
      Updated Oct 26, 2024
    • A Python client for the Unstructured hosted API
      Python
      MIT License
      Updated Oct 14, 2024
    • EmbeddedLLM: API server for embedded device deployment. Currently supports CUDA/OpenVINO/IpexLLM/DirectML/CPU.
      Python
      Updated Oct 6, 2024
    • Go
      Updated Sep 26, 2024
    • PowerToys

      Public
      Windows system utilities to maximize productivity
      C#
      MIT License
      Updated Aug 9, 2024
    • Arena-Hard-Auto: An automatic LLM benchmark.
      Jupyter Notebook
      Apache License 2.0
      Updated Jul 15, 2024
    • Python
      Apache License 2.0
      Updated Jul 11, 2024
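
As a quick illustration of the litellm entry above (the Python SDK / LLM gateway that exposes 100+ providers through the OpenAI chat-completions format), here is a minimal usage sketch. It assumes the litellm package is installed and an OPENAI_API_KEY is set in the environment; the model name is illustrative only and is not taken from this list.

    # Minimal sketch: call an LLM through litellm's OpenAI-format interface.
    # Assumes `pip install litellm` and OPENAI_API_KEY set in the environment;
    # the model name below is illustrative, not prescribed by this repository list.
    from litellm import completion

    response = completion(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Summarize what a KV cache is in one sentence."}],
    )

    # litellm normalizes every provider's response to the OpenAI chat-completions shape.
    print(response.choices[0].message.content)

Swapping the model string (for example, to an Anthropic or Bedrock model identifier) routes the same call through a different provider, which is the point of the OpenAI-format abstraction.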