    Repositories list

    • Ditto is an open-source framework that enables direct conversion of HuggingFace PreTrainedModels into TensorRT-LLM engines.
      Python
      Apache License 2.0
      Updated Mar 4, 2025
    • owlite (Public)
      OwLite is a low-code model compression toolkit for AI models.
      Python
      GNU Affero General Public License v3.0
      Updated Feb 20, 2025
    • vllm-fork (Public)
      A high-throughput and memory-efficient inference and serving engine for LLMs. A minimal usage sketch follows the list below.
      Python
      Apache License 2.0
      Updated Feb 20, 2025
    • gradio (Public)
      Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work! A minimal interface sketch follows the list below.
      Python
      Apache License 2.0
      Updated Jan 13, 2025
    • TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines. A hedged sketch of this Python API appears after the list below.
      C++
      Apache License 2.0
      Updated Dec 12, 2024
    • Python
      Apache License 2.0
      Updated Nov 22, 2024
    • Intel Neural Compressor
      Python
      Apache License 2.0
      Updated Oct 22, 2024
    • The OwLite Examples repository offers illustrative example code to help users seamlessly compress PyTorch deep learning models and convert them into TensorRT engines.
      Python
      Updated Sep 27, 2024
    • Isolated DinD (Docker in Docker) container for developing and deploying Docker containers using NVIDIA GPUs and the NVIDIA container toolkit. Useful for deploying the Docker engine with NVIDIA in Kubernetes.
      Dockerfile
      Mozilla Public License 2.0
      Updated Aug 27, 2024
    • C++
      Apache License 2.0
      Updated Jul 23, 2024
    • .github (Public)
      Updated Jul 22, 2024
    • QUICK (Public)
      QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference
      Python
      MIT License
      Updated Mar 6, 2024
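
vllm-fork — a minimal offline-generation sketch, assuming vLLM's standard Python API (LLM and SamplingParams); the model id is illustrative only:

    from vllm import LLM, SamplingParams

    # Load a small model and generate a short completion for one prompt.
    llm = LLM(model="facebook/opt-125m")  # illustrative model id
    params = SamplingParams(temperature=0.8, max_tokens=32)
    outputs = llm.generate(["Hello, my name is"], params)
    print(outputs[0].outputs[0].text)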
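gradio — a minimal sketch of a Gradio app using the standard gr.Interface entry point; the greet function is a stand-in for any model call:

    import gradio as gr

    def greet(name):
        # Stand-in for real model inference: echo a greeting.
        return f"Hello, {name}!"

    # Wire the function to a simple text-in / text-out web UI and serve it.
    demo = gr.Interface(fn=greet, inputs="text", outputs="text")
    demo.launch()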
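TensorRT-LLM — a hedged sketch of the high-level Python API mentioned in the description above, assuming the tensorrt_llm.LLM entry point and an illustrative HuggingFace model id; engine building is handled when the model is loaded:

    from tensorrt_llm import LLM, SamplingParams

    # Define the model (a TensorRT engine is built/loaded under the hood)
    # and run inference on a single prompt.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # illustrative model id
    outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
    print(outputs[0].outputs[0].text)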