@SqueezeBits

SqueezeBits Inc.

We are squeezing bits.

Popular repositories

  1. QUICK Public

    QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference

    Python 114 5

  2. owlite Public

    OwLite is a low-code compression toolkit for AI models.

    Python 39 3

  3. owlite-examples Public

    OwLite Examples repository offers illustrative example codes to help users seamlessly compress PyTorch deep learning models and transform them into TensorRT engines.

    Python 9 1

  4. .github Public

  5. mlperf_inference_results_v4.0 Public

    C++ 1

  6. vllm-fork Public

    Forked from HabanaAI/vllm-fork

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python

Repositories

Showing 10 of 11 repositories
  • vllm-fork Public Forked from HabanaAI/vllm-fork

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python 0 Apache-2.0 5,426 0 0 Updated Jan 15, 2025
  • gradio Public Forked from gradio-app/gradio

    Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

    Python 0 Apache-2.0 2,748 0 0 Updated Jan 13, 2025
  • TensorRT-LLM Public Forked from NVIDIA/TensorRT-LLM

    TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

    C++ 0 Apache-2.0 1,096 0 1 Updated Dec 12, 2024
  • vllm-hpu-extension
    Python 0 Apache-2.0 16 0 0 Updated Nov 22, 2024
  • neural-compressor Public

    Intel Neural Compressor

    Python 0 Apache-2.0 0 0 0 Updated Oct 22, 2024
  • owlite-examples Public

    OwLite Examples repository offers illustrative example codes to help users seamlessly compress PyTorch deep learning models and transform them into TensorRT engines.

    Python 9 1 0 1 Updated Sep 27, 2024
  • owlite Public

    OwLite is a low-code compression toolkit for AI models.

    Python 39 AGPL-3.0 3 0 0 Updated Sep 27, 2024
  • nvidia-dind Public Forked from ehfd/nvidia-dind

    Isolated DinD (Docker in Docker) container for developing and deploying Docker containers using NVIDIA GPUs and the NVIDIA container toolkit. Useful for deploying the Docker engine with NVIDIA in Kubernetes.

    Dockerfile 0 MPL-2.0 15 0 0 Updated Aug 27, 2024
  • mlperf_inference_results_v4.0 Public
    C++ 0 Apache-2.0 1 0 1 Updated Jul 23, 2024
  • .github Public
    0 0 0 0 Updated Jul 22, 2024
