Add the Bamba Model (#34982)

* initial commit for PR Co-authored-by: Gabe Goodhart <gabe.l.hart@gmail.com> * rename dynamic cache Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * add more unit tests Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * add integration test Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * add integration test Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * Add modular bamba file * Remove trainer changes from unrelated PR * Modify modular and cofig to get model running * Fix some CI errors and beam search * Fix a plethora of bugs from CI/docs/etc * Add bamba to models with special caches * Updat to newer mamba PR for mamba sublayer * fix test_left_padding_compatibility Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * fix style Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * fix remaining tests Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * missed this test Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * ran make style Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * move slow tag to integration obj Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * make style Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * address comments Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * fix modular Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * left out one part of modular Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * change model Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * Make Rotary modular as well * Update bamba.md Added overview, update Model inference card and added config * Update bamba.md * Update bamba.md * Update bamba.md Minor fixes * Add docs for config and model back Signed-off-by: Antoni Viros i Martin <aviros@ibm.com> * Add warning when using fast kernels * replaced generate example Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> * Address comments from PR Signed-off-by: Antoni Viros i Martin <aviros@ibm.com> * Propagate attention fixes Signed-off-by: Antoni Viros i Martin <aviros@ibm.com> * Fix attention interfaces to the new API Signed-off-by: Antoni Viros i Martin <aviros@ibm.com> * Fix API for decoder layer Signed-off-by: Antoni Viros i Martin <aviros@ibm.com> * Remove extra weights Signed-off-by: Antoni Viros i Martin <aviros@ibm.com> --------- Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com> Signed-off-by: Antoni Viros i Martin <aviros@ibm.com> Co-authored-by: Gabe Goodhart <gabe.l.hart@gmail.com> Co-authored-by: Antoni Viros i Martin <aviros@ibm.com> Co-authored-by: divya-kumari32 <72085811+divya-kumari32@users.noreply.github.com> Co-authored-by: Antoni Viros <ani300@gmail.com>
huggingface · Dec 18, 2024 · 9613933 · 9613933
1 parent 9a94dfe
commit 9613933
Show file tree

Hide file tree

Showing 19 changed files with 4,138 additions and 0 deletions.
diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml
@@ -322,6 +322,8 @@
       sections:
       - local: model_doc/albert
         title: ALBERT
+      - local: model_doc/bamba
+        title: Bamba
       - local: model_doc/bart
         title: BART
       - local: model_doc/barthez

diff --git a/docs/source/en/index.md b/docs/source/en/index.md
@@ -66,6 +66,7 @@ Flax), PyTorch, and/or TensorFlow.
 |                     [AriaText](model_doc/aria_text)                      |       ✅        |         ❌         |      ❌      |
 | [Audio Spectrogram Transformer](model_doc/audio-spectrogram-transformer) |       ✅        |         ❌         |      ❌      |
 |                    [Autoformer](model_doc/autoformer)                    |       ✅        |         ❌         |      ❌      |
+|                         [Bamba](model_doc/bamba)                         |       ✅        |         ❌         |      ❌      |
 |                          [Bark](model_doc/bark)                          |       ✅        |         ❌         |      ❌      |
 |                          [BART](model_doc/bart)                          |       ✅        |         ✅         |      ✅      |
 |                       [BARThez](model_doc/barthez)                       |       ✅        |         ✅         |      ✅      |

diff --git a/docs/source/en/model_doc/bamba.md b/docs/source/en/model_doc/bamba.md
@@ -0,0 +1,64 @@
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+
+⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
+rendered properly in your Markdown viewer.
+
+-->
+
+# Bamba
+
+
+## Overview
+
+Bamba-9B is a decoder-only language model based on the [Mamba-2](https://github.com/state-spaces/mamba) architecture and is designed to handle a wide range of text generation tasks. It is trained from scratch using a two-stage training approach. In the first stage, the model is trained on 2 trillion tokens from the Dolma v1.7 dataset. In the second stage, it undergoes additional training on 200 billion tokens, leveraging a carefully curated blend of high-quality data to further refine its performance and enhance output quality.
+
+Checkout all Bamba-9B model checkpoints [here](https://github.com/foundation-model-stack/bamba).
+
+## BambaConfig
+
+| Model            | Params       | # Layers | Hidden Dim. | Attention Heads | GQA | KV Heads | Context Length |  Tied Embeddings |
+|-------------------|--------------|----------|-------------|-----------------|-----|----------|----------------|------------------|
+| Bamba  | 9B (9.78B)   | 32       | 4096        | 32              | Yes | 8        | 4096           | True |
+
+[[autodoc]] BambaConfig
+
+<!---
+## Usage Tips
+
+Tips: 
+
+- The architecture is based on Mamba-2 models.
+
+## BambaModel
+
+[[autodoc]] BambaModel
+    - forward
+-->
+
+## BambaForCausalLM
+
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model = AutoModelForCausalLM.from_pretrained("ibm-fms/Bamba-9B")
+tokenizer = AutoTokenizer.from_pretrained("ibm-fms/Bamba-9B")
+
+message = ["Mamba is a snake with following properties  "]
+inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
+response = model.generate(**inputs, max_new_tokens=64)
+print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
+```
+
+[[autodoc]] BambaForCausalLM
+    - forward
+
+This HF implementation is contributed by [ani300](https://github.com/ani300) and [fabianlim](https://github.com/fabianlim). 
diff --git a/docs/source/en/perf_infer_gpu_one.md b/docs/source/en/perf_infer_gpu_one.md
@@ -39,6 +39,7 @@ FlashAttention-2 is experimental and may change considerably in future versions.
 FlashAttention-2 is currently supported for the following architectures:
 * [Aria](https://huggingface.co/docs/transformers/model_doc/aria#transformers.AriaForConditionalGeneration)
 * [Bark](https://huggingface.co/docs/transformers/model_doc/bark#transformers.BarkModel)
+* [Bamba](https://huggingface.co/docs/transformers/model_doc/bamba#transformers.BambaModel)
 * [Bart](https://huggingface.co/docs/transformers/model_doc/bart#transformers.BartModel)
 * [Chameleon](https://huggingface.co/docs/transformers/model_doc/chameleon#transformers.Chameleon)
 * [CLIP](https://huggingface.co/docs/transformers/model_doc/clip#transformers.CLIPModel)
@@ -220,6 +221,7 @@ For now, Transformers supports SDPA inference and training for the following arc
 * [Albert](https://huggingface.co/docs/transformers/model_doc/albert#transformers.AlbertModel)
 * [Aria](https://huggingface.co/docs/transformers/model_doc/aria#transformers.AriaForConditionalGeneration)
 * [Audio Spectrogram Transformer](https://huggingface.co/docs/transformers/model_doc/audio-spectrogram-transformer#transformers.ASTModel)
+* [Bamba](https://huggingface.co/docs/transformers/model_doc/bamba#transformers.BambaModel)
 * [Bart](https://huggingface.co/docs/transformers/model_doc/bart#transformers.BartModel)
 * [Beit](https://huggingface.co/docs/transformers/model_doc/beit#transformers.BeitModel)
 * [Bert](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertModel)

diff --git a/src/transformers/__init__.py b/src/transformers/__init__.py
@@ -193,6 +193,7 @@
         "AutoTokenizer",
     ],
     "models.autoformer": ["AutoformerConfig"],
+    "models.bamba": ["BambaConfig"],
     "models.bark": [
         "BarkCoarseConfig",
         "BarkConfig",
@@ -1540,6 +1541,13 @@
             "AutoformerPreTrainedModel",
         ]
     )
+    _import_structure["models.bamba"].extend(
+        [
+            "BambaForCausalLM",
+            "BambaModel",
+            "BambaPreTrainedModel",
+        ]
+    )
     _import_structure["models.bark"].extend(
         [
             "BarkCausalModel",
@@ -5104,6 +5112,7 @@
     from .models.autoformer import (
         AutoformerConfig,
     )
+    from .models.bamba import BambaConfig
     from .models.bark import (
         BarkCoarseConfig,
         BarkConfig,
@@ -6493,6 +6502,7 @@
             AutoformerModel,
             AutoformerPreTrainedModel,
         )
+        from .models.bamba import BambaForCausalLM, BambaModel, BambaPreTrainedModel
         from .models.bark import (
             BarkCausalModel,
             BarkCoarseModel,

diff --git a/src/transformers/generation/utils.py b/src/transformers/generation/utils.py
@@ -1693,6 +1693,7 @@ def _supports_default_dynamic_cache(self) -> bool:
             self._supports_cache_class
             and "jamba" not in self.__class__.__name__.lower()
             and "zamba" not in self.__class__.__name__.lower()
+            and "bamba" not in self.__class__.__name__.lower()
         )
 
     def _prepare_cache_for_generation(

diff --git a/src/transformers/models/__init__.py b/src/transformers/models/__init__.py
@@ -20,6 +20,7 @@
     audio_spectrogram_transformer,
     auto,
     autoformer,
+    bamba,
     bark,
     bart,
     barthez,

diff --git a/src/transformers/models/auto/configuration_auto.py b/src/transformers/models/auto/configuration_auto.py
@@ -39,6 +39,7 @@
         ("aria_text", "AriaTextConfig"),
         ("audio-spectrogram-transformer", "ASTConfig"),
         ("autoformer", "AutoformerConfig"),
+        ("bamba", "BambaConfig"),
         ("bark", "BarkConfig"),
         ("bart", "BartConfig"),
         ("beit", "BeitConfig"),
@@ -337,6 +338,7 @@
         ("aria_text", "AriaText"),
         ("audio-spectrogram-transformer", "Audio Spectrogram Transformer"),
         ("autoformer", "Autoformer"),
+        ("bamba", "Bamba"),
         ("bark", "Bark"),
         ("bart", "BART"),
         ("barthez", "BARThez"),

diff --git a/src/transformers/models/auto/modeling_auto.py b/src/transformers/models/auto/modeling_auto.py
@@ -39,6 +39,7 @@
         ("aria_text", "AriaTextModel"),
         ("audio-spectrogram-transformer", "ASTModel"),
         ("autoformer", "AutoformerModel"),
+        ("bamba", "BambaModel"),
         ("bark", "BarkModel"),
         ("bart", "BartModel"),
         ("beit", "BeitModel"),
@@ -471,6 +472,7 @@
     [
         # Model for Causal LM mapping
         ("aria_text", "AriaTextForCausalLM"),
+        ("bamba", "BambaForCausalLM"),
         ("bart", "BartForCausalLM"),
         ("bert", "BertLMHeadModel"),
         ("bert-generation", "BertGenerationDecoder"),

diff --git a/src/transformers/models/bamba/__init__.py b/src/transformers/models/bamba/__init__.py
@@ -0,0 +1,28 @@
+# Copyright 2024 IBM and the HuggingFace Inc. team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from typing import TYPE_CHECKING
+
+from ...utils import _LazyModule
+from ...utils.import_utils import define_import_structure
+
+
+if TYPE_CHECKING:
+    from .configuration_bamba import *
+    from .modeling_bamba import *
+    from .processing_bamba import *
+else:
+    import sys
+
+    _file = globals()["__file__"]
+    sys.modules[__name__] = _LazyModule(__name__, _file, define_import_structure(_file), module_spec=__spec__)
-Original file line number
+Diff line change
@@ Expand Up / @@ -20,6 +20,7 @@ @@
         audio_spectrogram_transformer,
         auto,
         autoformer,
+        bamba,
         bark,
         bart,
         barthez,
@@ Expand Down @@