docs/source/tutorials/e2e_flow.rst (7 additions, 7 deletions)
@@ -142,8 +142,8 @@ There are 3 types of folders:
 │ ├── adapter_model.pt
 │ ├── adapter_model.safetensors
 │ ├── config.json
-│ ├── ft-model-00001-of-00002.safetensors
-│ ├── ft-model-00002-of-00002.safetensors
+│ ├── model-00001-of-00002.safetensors
+│ ├── model-00002-of-00002.safetensors
 │ ├── generation_config.json
 │ ├── LICENSE.txt
 │ ├── model.safetensors.index.json
@@ -168,7 +168,7 @@ There are 3 types of folders:
 Let's understand the files:
 
 - ``adapter_model.safetensors`` and ``adapter_model.pt`` are your LoRA trained adapter weights. We save a duplicate .pt version to facilitate resuming from a checkpoint.
-- ``ft-model-{}-of-{}.safetensors`` are your trained full model weights (not adapters). When LoRA finetuning, these are only present if we set ``save_adapter_weights_only=False``. In that case, we merge the base model with the trained adapters, making inference easier.
+- ``model-{}-of-{}.safetensors`` are your trained full model weights (not adapters). When LoRA finetuning, these are only present if we set ``save_adapter_weights_only=False``. In that case, we merge the base model with the trained adapters, making inference easier.
 - ``adapter_config.json`` is used by Hugging Face PEFT when loading an adapter (more on that later);
 - ``model.safetensors.index.json`` is used by Hugging Face ``from_pretrained()`` when loading the model weights (more on that later)
 - All other files were originally in the ``checkpoint_dir``. They are automatically copied during training. Files over 100MiB that end in .safetensors, .pth, .pt, or .bin are ignored, keeping the output lightweight.
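To make the two loading paths described in this hunk concrete, here is a minimal sketch (not part of the diff) of how the resulting directory can be consumed: the merged weights via Hugging Face ``from_pretrained()``, or the base model plus the LoRA adapter via PEFT. The directory paths are placeholder assumptions.

.. code-block:: python

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    output_dir = "output/epoch_0"        # assumed fine-tuning output directory
    base_model_dir = "checkpoints/base"  # assumed original checkpoint_dir

    # Option 1: load the merged full weights
    # (model-*.safetensors + model.safetensors.index.json)
    model = AutoModelForCausalLM.from_pretrained(output_dir)
    tokenizer = AutoTokenizer.from_pretrained(output_dir)

    # Option 2: load the base model and attach the LoRA adapter
    # (adapter_model.safetensors + adapter_config.json)
    base = AutoModelForCausalLM.from_pretrained(base_model_dir)
    peft_model = PeftModel.from_pretrained(base, output_dir)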
@@ -223,8 +223,8 @@ Notice that we are using the merged weights, and not the LoRA adapters.
-After fine-tuning, we can convert the model to get an actual quantized model.
-If we print the converted model, we’ll see that the QAT linears have been
-swapped with `Int8DynActInt4WeightLinear <https://github.com/pytorch/ao/blob/428084356ace4ea94c22a3a9b3d74cff8ee41db3/torchao/quantization/prototype/qat.py#L38>`_, which are the quantized versions
-of the linear layers. This quantized model can then be saved to checkpoint and
-used for inference or generation.
+After fine-tuning, we can convert the model to get an actual quantized model:
 
 .. code-block:: python
 
+    from torchao.quantization.qat import (
+        FromIntXQuantizationAwareTrainingConfig,
+    )
+    from torchao.quantization import (
+        Int8DynamicActivationInt4WeightConfig,
+    )
+
     # Fine-tune as before
     train_loop(prepared_model)
 
-    # Convert fake quantize to actual quantize operations
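For context on where these new imports lead, the convert step that follows in the tutorial looks roughly like the sketch below. This is a hedged illustration based on torchao's config-driven ``quantize_`` API, not text taken from this diff, and it assumes ``prepared_model`` is the QAT-prepared, fine-tuned model from the surrounding code.

.. code-block:: python

    from torchao.quantization import quantize_, Int8DynamicActivationInt4WeightConfig
    from torchao.quantization.qat import FromIntXQuantizationAwareTrainingConfig

    # Assumed usage: first swap the fake-quantized QAT modules back to
    # regular linears, then apply real int8 dynamic-activation / int4
    # weight quantization to the fine-tuned model.
    quantize_(prepared_model, FromIntXQuantizationAwareTrainingConfig())
    quantize_(prepared_model, Int8DynamicActivationInt4WeightConfig())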