@SqueezeBits

SqueezeBits Inc.

We are squeezing bits.

Popular repositories

  1. QUICK (Public)

    QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference

    Python · 116 stars · 5 forks

  2. owlite (Public)

    OwLite is a low-code toolkit for compressing AI models.

    Python · 42 stars · 4 forks

  3. Torch-TRTLLM (Public)

    Ditto is an open-source framework that enables direct conversion of HuggingFace PreTrainedModels into TensorRT-LLM engines.

    Python · 17 stars

  4. owlite-examples (Public)

    The OwLite Examples repository offers example code to help users compress PyTorch deep learning models and convert them into TensorRT engines.

    Python · 10 stars · 1 fork

  5. .github (Public)

  6. mlperf_inference_results_v4.0 (Public)

    C++ · 1 fork

Repositories

Showing 10 of 12 repositories
  • Torch-TRTLLM (Public)

    Ditto is an open-source framework that enables direct conversion of HuggingFace PreTrainedModels into TensorRT-LLM engines.

    Python · 17 stars · Apache-2.0 · 0 forks · 0 open issues · 3 open PRs · Updated Mar 4, 2025
  • owlite (Public)

    OwLite is a low-code toolkit for compressing AI models.

    Python · 42 stars · AGPL-3.0 · 4 forks · 0 open issues · 0 open PRs · Updated Feb 20, 2025
  • vllm-fork (Public, forked from HabanaAI/vllm-fork)

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python · 0 stars · Apache-2.0 · 6,108 forks · 0 open issues · 0 open PRs · Updated Feb 20, 2025
  • gradio (Public, forked from gradio-app/gradio)

    Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

    Python · 0 stars · Apache-2.0 · 2,825 forks · 0 open issues · 0 open PRs · Updated Jan 13, 2025
  • TensorRT-LLM (Public, forked from NVIDIA/TensorRT-LLM)

    TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

    C++ · 0 stars · Apache-2.0 · 1,139 forks · 0 open issues · 1 open PR · Updated Dec 12, 2024
  • vllm-hpu-extension (Public)

    Python · 0 stars · Apache-2.0 · 22 forks · 0 open issues · 0 open PRs · Updated Nov 22, 2024
  • neural-compressor (Public)

    Intel Neural Compressor

    Python · 0 stars · Apache-2.0 · 0 forks · 0 open issues · 0 open PRs · Updated Oct 22, 2024
  • owlite-examples (Public)

    The OwLite Examples repository offers example code to help users compress PyTorch deep learning models and convert them into TensorRT engines.

    Python · 10 stars · 1 fork · 0 open issues · 1 open PR · Updated Sep 27, 2024
  • nvidia-dind (Public, forked from ehfd/nvidia-dind)

    Isolated DinD (Docker in Docker) container for developing and deploying Docker containers using NVIDIA GPUs and the NVIDIA Container Toolkit. Useful for deploying the Docker engine with NVIDIA support in Kubernetes.

    Dockerfile · 0 stars · MPL-2.0 · 17 forks · 0 open issues · 0 open PRs · Updated Aug 27, 2024
  • mlperf_inference_results_v4.0 (Public)

    C++ · 0 stars · Apache-2.0 · 1 fork · 0 open issues · 1 open PR · Updated Jul 23, 2024
