Commit

added image
robertgshaw2-neuralmagic committed Jul 10, 2024
1 parent ac118ca commit d4f5cae
Showing 2 changed files with 7 additions and 3 deletions.
10 changes: 7 additions & 3 deletions README.md
@@ -1,14 +1,18 @@
# LLM Compressor

`llm-compressor` is an easy-to-use library for optimizing models for deployment with `vllm`, including:

* Comprehensive set of quantization algorithms including weight-only and activation quantization
* Seamless integration with Hugging Face models and repositories
* `safetensors`-based file format compatible with `vllm`

<p align="center">
<img alt="LLM Compressor Flow" src="docs/images/architecture.png" width="50%" />
</p>


### Supported Formats
* Mixed Precision: W4A16, W8A16
* Integer Quantization: W8A8 (int8)
* Floating Point Quantization: W8A8 (fp8)
* Activation Quantization: W8A8 (int8 and fp8)
* 2:4 Semi-structured Sparsity
* Unstructured Sparsity
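
The `WxAy` names above follow the common convention where `x` is the weight bit width and `y` is the activation bit width, so W4A16 means 4-bit weights with 16-bit activations. The snippet below is a minimal sketch of that convention only. `Scheme`, `parse_scheme`, and `weight_size_ratio` are hypothetical helper names for illustration and are not part of the `llm-compressor` API.

```python
import re
from typing import NamedTuple


class Scheme(NamedTuple):
    """Bit widths encoded in a name such as "W4A16" (hypothetical helper type)."""
    weight_bits: int
    activation_bits: int


_SCHEME_PATTERN = re.compile(r"^W(\d+)A(\d+)$")


def parse_scheme(name: str) -> Scheme:
    """Split a WxAy name into its weight and activation bit widths."""
    match = _SCHEME_PATTERN.match(name.strip().upper())
    if match is None:
        raise ValueError(f"not a WxAy scheme name: {name!r}")
    return Scheme(int(match.group(1)), int(match.group(2)))


def weight_size_ratio(scheme: Scheme, baseline_bits: int = 16) -> float:
    """Rough weight storage relative to a 16-bit baseline.

    Assumption: this ignores scales, zero points, and any layers left
    unquantized, so real checkpoints will be somewhat larger.
    """
    return scheme.weight_bits / baseline_bits


if __name__ == "__main__":
    for name in ("W4A16", "W8A16", "W8A8"):
        scheme = parse_scheme(name)
        print(name, scheme, weight_size_ratio(scheme))
```

Note that the name alone does not say whether W8A8 is int8 or fp8, which is why the list above spells out the data type in parentheses.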

Binary file added docs/images/architecture.png

0 comments on commit d4f5cae
