
Commit

docs: fix small mistakes in bnb paragraph
Titus-von-Koeller committed Sep 11, 2024
1 parent 2be4169 commit e607b7c
Showing 1 changed file with 1 addition and 1 deletion.
docs/source/en/model_doc/mixtral.md (2 changes: 1 addition & 1 deletion)
@@ -141,7 +141,7 @@ The Flash Attention-2 model uses also a more memory efficient cache slicing mech

As the Mixtral model has 45 billion parameters, that would require about 90GB of GPU RAM in half precision (float16), since each parameter is stored in 2 bytes. However, one can shrink down the size of the model using [quantization](../quantization.md). If the model is quantized to 4 bits (or half a byte per parameter), a single A100 with 40GB of RAM is enough to fit the entire model, as in that case only about 27 GB of RAM is required.

-Quantizing a model is as simple as passing a `quantization_config` to the model. Below, we'll leverage the BitsAndyBytes quantization (but refer to [this page](../quantization.md) for other quantization methods):
+Quantizing a model is as simple as passing a `quantization_config` to the model. Below, we'll leverage the bitsandbytes quantization library (but refer to [this page](../quantization.md) for alternative quantization methods):

```python
>>> import torch
# … (remaining lines of the snippet are unchanged and collapsed in this view)
```
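
For context, the code block touched by this hunk is collapsed above. A minimal sketch of the 4-bit bitsandbytes loading pattern the edited paragraph refers to, assuming the `BitsAndBytesConfig` API in transformers and using `mistralai/Mixtral-8x7B-v0.1` purely as an illustrative checkpoint:

```python
>>> import torch
>>> from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

>>> # 4-bit NF4 quantization: roughly half a byte per parameter instead of
>>> # 2 bytes in float16, which is what brings Mixtral's footprint down to
>>> # the ~27 GB mentioned in the paragraph above.
>>> quantization_config = BitsAndBytesConfig(
...     load_in_4bit=True,
...     bnb_4bit_quant_type="nf4",
...     bnb_4bit_compute_dtype=torch.float16,
... )

>>> model_id = "mistralai/Mixtral-8x7B-v0.1"  # illustrative checkpoint
>>> tokenizer = AutoTokenizer.from_pretrained(model_id)
>>> model = AutoModelForCausalLM.from_pretrained(
...     model_id,
...     quantization_config=quantization_config,
...     device_map="auto",  # dispatch weights across available devices
... )

>>> inputs = tokenizer("The Mixtral model is", return_tensors="pt").to(model.device)
>>> outputs = model.generate(**inputs, max_new_tokens=20)
>>> print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```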
