@SqueezeBits

SqueezeBits Inc.

We are squeezing bits.

Popular repositories

  1. QUICK (Public)

    QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference

    Python · 116 stars · 5 forks

  2. owlite (Public)

    OwLite is a low-code toolkit for compressing AI models.

    Python · 42 stars · 4 forks

  3. Torch-TRTLLM (Public)

    Ditto is an open-source framework that enables direct conversion of HuggingFace PreTrainedModels into TensorRT-LLM engines.

    Python · 17 stars

  4. owlite-examples (Public)

    The OwLite Examples repository offers example code to help users compress PyTorch deep learning models and convert them into TensorRT engines.

    Python · 10 stars · 1 fork

  5. .github (Public)

  6. mlperf_inference_results_v4.0 (Public)

    C++ · 1 fork

Repositories

Showing 10 of 12 repositories
  • Torch-TRTLLM (Public)

    Ditto is an open-source framework that enables direct conversion of HuggingFace PreTrainedModels into TensorRT-LLM engines.

    Python · 17 stars · Apache-2.0 · 0 forks · 0 open issues · 3 open PRs · Updated Mar 4, 2025
  • owlite (Public)

    OwLite is a low-code toolkit for compressing AI models.

    Python · 42 stars · AGPL-3.0 · 4 forks · 0 open issues · 0 open PRs · Updated Feb 20, 2025
  • vllm-fork (Public, forked from HabanaAI/vllm-fork)

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python · 0 stars · Apache-2.0 · 6,108 forks · 0 open issues · 0 open PRs · Updated Feb 20, 2025
  • gradio (Public, forked from gradio-app/gradio)

    Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

    Python · 0 stars · Apache-2.0 · 2,825 forks · 0 open issues · 0 open PRs · Updated Jan 13, 2025
  • TensorRT-LLM (Public, forked from NVIDIA/TensorRT-LLM)

    TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

    C++ · 0 stars · Apache-2.0 · 1,139 forks · 0 open issues · 1 open PR · Updated Dec 12, 2024
  • vllm-hpu-extension (Public)

    Python · 0 stars · Apache-2.0 · 22 forks · 0 open issues · 0 open PRs · Updated Nov 22, 2024
  • neural-compressor (Public)

    Intel Neural Compressor

    Python · 0 stars · Apache-2.0 · 0 forks · 0 open issues · 0 open PRs · Updated Oct 22, 2024
  • owlite-examples (Public)

    The OwLite Examples repository offers example code to help users compress PyTorch deep learning models and convert them into TensorRT engines.

    Python · 10 stars · 1 fork · 0 open issues · 1 open PR · Updated Sep 27, 2024
  • nvidia-dind (Public, forked from ehfd/nvidia-dind)

    Isolated DinD (Docker in Docker) container for developing and deploying Docker containers using NVIDIA GPUs and the NVIDIA Container Toolkit. Useful for deploying the Docker engine with NVIDIA support in Kubernetes.

    Dockerfile · 0 stars · MPL-2.0 · 17 forks · 0 open issues · 0 open PRs · Updated Aug 27, 2024
  • mlperf_inference_results_v4.0 (Public)

    C++ · 0 stars · Apache-2.0 · 1 fork · 0 open issues · 1 open PR · Updated Jul 23, 2024
