Blog post on bitsandbytes integration on Hugging Face (#463)
* first commit

* add new thumbnails

* add more content

* add new gif

* Update _blog.yml

* rename files

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Apply suggestions from code review

* change content a bit

- add more details and adapt from stas suggestions

* re-write text: part 1

* few modifs

- add credits
- add image
- modify the content a bit

* modify a bit

* add more content

* add image

* paraphrase a bit

* add more content

* add more content

* some improvements

* add thumbnail

* add more text + fix table

* fix table

* fix tables

* add stas as author

* add a last sentence

* edit some more

* few modifs

* modify thumbnail

* add thumbnail

* add removed comment

* add photos

* add more info

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Add files via upload

* add steven to the credits!

* edits

* edits

* edits

* edits

* add script

* change to std err

* refactor the tables a bit

* add Tim's comments

* remove separators

* explain why it is slow

* Update hf-bitsandbytes-integration.md

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Add links to paper

* delete dummy file

* add correct link to paper

* add more explanation on speed

* update figure

* replace authors by we

* add freezed image

* remove old table

* Update hf-bitsandbytes-integration.md

Some slight edits.

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Tim Dettmers <TimDettmers@users.noreply.github.com>
5 people authored Aug 17, 2022
1 parent 822183c commit d339f36
Showing 23 changed files with 642 additions and 1 deletion.
11 changes: 10 additions & 1 deletion _blog.yml
@@ -1120,7 +1120,6 @@
  - guide



- local: skops
  title: Introducing Skops
  author: merve
@@ -1132,3 +1131,13 @@
  - announcement
  - guide


- local: hf-bitsandbytes-integration
  title: "A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes"
  author: ybelkada
  thumbnail: /blog/assets/96_hf_bitsandbytes_integration/thumbnail_blue.png
  date: August 17, 2022
  tags:
  - nlp
  - llm
  - quantization
Binary file added assets/96_hf_bitsandbytes_integration/BF16.png
Binary file added assets/96_hf_bitsandbytes_integration/FP16.png
Binary file added assets/96_hf_bitsandbytes_integration/FP32.png
Binary file added assets/96_hf_bitsandbytes_integration/LLM.png
Binary file added assets/96_hf_bitsandbytes_integration/LLM3.png
Binary file added assets/96_hf_bitsandbytes_integration/Matmul.png
Binary file added assets/96_hf_bitsandbytes_integration/TF32.png
Binary file added assets/96_hf_bitsandbytes_integration/byte.png
41 changes: 41 additions & 0 deletions assets/96_hf_bitsandbytes_integration/example.py
@@ -0,0 +1,41 @@
import torch
import torch.nn as nn

from bitsandbytes.nn import Linear8bitLt

# Utility function

def get_model_memory_footprint(model):
    r"""
    Partially copied and inspired from: https://discuss.pytorch.org/t/gpu-memory-that-model-uses/56822/2
    """
    return sum([param.nelement() * param.element_size() for param in model.parameters()])

# Main script

fp16_model = nn.Sequential(
    nn.Linear(64, 64),
    nn.Linear(64, 64)
).to(torch.float16)

# Train and save your model!

torch.save(fp16_model.state_dict(), "model.pt")

# Define your int8 model!
# has_fp16_weights=False stores the weights in int8 instead of keeping an fp16 copy,
# which is what gives the memory savings at inference time.

int8_model = nn.Sequential(
    Linear8bitLt(64, 64, has_fp16_weights=False),
    Linear8bitLt(64, 64, has_fp16_weights=False)
)

int8_model.load_state_dict(torch.load("model.pt"))
int8_model = int8_model.to(0)  # Quantization happens here, when the weights are moved to the GPU

input_ = torch.randn(8, 64, dtype=torch.float16)
hidden_states = int8_model(input_.to(0))

mem_int8 = get_model_memory_footprint(int8_model)
mem_fp16 = get_model_memory_footprint(fp16_model)

# fp16 weights take 2 bytes per parameter vs 1 byte for int8, so the ratio should be close to 2
print(f"Relative difference: {mem_fp16/mem_int8}")
129 changes: 129 additions & 0 deletions assets/96_hf_bitsandbytes_integration/mantissa.svg
Binary file added assets/96_hf_bitsandbytes_integration/tim.jpeg
Binary file added assets/96_hf_bitsandbytes_integration/younes.png
462 changes: 462 additions & 0 deletions hf-bitsandbytes-integration.md

