
Commit a0fd52a

Merge branch 'main' into shared-experts
2 parents 04cebe6 + 96625d8

File tree

12 files changed: +42 −19 lines changed


docker/transformers-pytorch-amd-gpu/Dockerfile

Lines changed: 2 additions & 4 deletions
````diff
@@ -1,4 +1,4 @@
-FROM rocm/dev-ubuntu-22.04:6.3
+FROM rocm/dev-ubuntu-22.04:6.2.4
 LABEL maintainer="Hugging Face"
 
 ARG DEBIAN_FRONTEND=noninteractive
@@ -8,11 +8,9 @@ RUN apt update && \
     apt clean && \
    rm -rf /var/lib/apt/lists/*
 
-RUN export PATH="${PATH:+${PATH}:}~/opt/rocm/bin"
-
 RUN python3 -m pip install --no-cache-dir --upgrade pip numpy
 
-RUN python3 -m pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.3/
+RUN python3 -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2
 
 RUN python3 -m pip install --no-cache-dir --upgrade importlib-metadata setuptools ninja git+https://github.com/facebookresearch/detectron2.git pytesseract "itsdangerous<2.1.0"
 
````

docker/transformers-pytorch-deepspeed-amd-gpu/Dockerfile

Lines changed: 3 additions & 3 deletions
````diff
@@ -1,11 +1,11 @@
-FROM rocm/dev-ubuntu-22.04:6.3
+FROM rocm/dev-ubuntu-22.04:6.2.4
 LABEL maintainer="Hugging Face"
 
 ARG DEBIAN_FRONTEND=noninteractive
 ARG PYTORCH='2.5.1'
 ARG TORCH_VISION='0.20.0'
 ARG TORCH_AUDIO='2.5.0'
-ARG ROCM='6.3'
+ARG ROCM='6.2'
 
 RUN apt update && \
     apt install -y --no-install-recommends \
@@ -45,4 +45,4 @@ RUN cd transformers && python3 setup.py develop
 RUN python3 -c "from deepspeed.launcher.runner import main"
 
 # Remove nvml as it is not compatible with ROCm
-RUN python3 -m pip uninstall py3nvml pynvml -y
+RUN python3 -m pip uninstall py3nvml pynvml nvidia-ml-py apex -y
````

docs/source/en/_toctree.yml

Lines changed: 2 additions & 0 deletions
````diff
@@ -626,6 +626,8 @@
       title: YOSO
     - local: model_doc/zamba
       title: Zamba
+    - local: model_doc/zamba2
+      title: Zamba2
     title: Text models
   - isExpanded: false
     sections:
````

docs/source/en/installation.md

Lines changed: 21 additions & 1 deletion
````diff
@@ -32,12 +32,32 @@ Install 🤗 Transformers for whichever deep learning library you're working wit
 
 You should install 🤗 Transformers in a [virtual environment](https://docs.python.org/3/library/venv.html). If you're unfamiliar with Python virtual environments, take a look at this [guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/). A virtual environment makes it easier to manage different projects, and avoid compatibility issues between dependencies.
 
-Now you're ready to install 🤗 Transformers with the following command:
+Create a virtual environment with [uv](https://docs.astral.sh/uv/) (refer to [Installation](https://docs.astral.sh/uv/getting-started/installation/) for installation instructions), a fast Rust-based Python package and project manager.
+
+```bash
+uv venv my-env
+source my-env/bin/activate
+```
+
+Now you're ready to install 🤗 Transformers with pip or uv.
+
+<hfoptions id="install">
+<hfoption id="uv">
+
+```bash
+uv pip install transformers
+```
+
+</hfoption>
+<hfoption id="pip">
 
 ```bash
 pip install transformers
 ```
 
+</hfoption>
+</hfoptions>
+
 For GPU acceleration, install the appropriate CUDA drivers for [PyTorch](https://pytorch.org/get-started/locally) and [TensorFlow](https://www.tensorflow.org/install/pip).
 
 Run the command below to check if your system detects an NVIDIA GPU.
````
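The hunk ends before the GPU-detection command itself. Independent of the diff, a quick post-install sanity check (a minimal sketch assuming nothing beyond the installed package):

```python
import transformers

# Confirm the package imports and report which version was installed.
print(transformers.__version__)
```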

docs/source/en/model_doc/zamba2.md

Lines changed: 2 additions & 0 deletions
````diff
@@ -34,6 +34,8 @@ Zamba2-1.2B, Zamba2-2.7B and Zamba2-7B are hybrid models combining state-space m
 Zamba2 requires you use `transformers` version 4.48.0 or higher:
 ```bash
 pip install transformers>=4.48.0
+```
+
 ## Inference
 
 ```python
````

src/transformers/audio_utils.py

Lines changed: 2 additions & 2 deletions
````diff
@@ -146,7 +146,7 @@ def chroma_filter_bank(
     sampling_rate: int,
     tuning: float = 0.0,
     power: Optional[float] = 2.0,
-    weighting_parameters: Optional[Tuple[float]] = (5.0, 2),
+    weighting_parameters: Optional[Tuple[float, float]] = (5.0, 2.0),
     start_at_c_chroma: Optional[bool] = True,
 ):
     """
@@ -165,7 +165,7 @@
             Tuning deviation from A440 in fractions of a chroma bin.
         power (`float`, *optional*, defaults to 2.0):
             If 12.0, normalizes each column with their L2 norm. If 1.0, normalizes each column with their L1 norm.
-        weighting_parameters (`Tuple[float]`, *optional*, defaults to `(5., 2.)`):
+        weighting_parameters (`Tuple[float, float]`, *optional*, defaults to `(5., 2.)`):
             If specified, apply a Gaussian weighting parameterized by the first element of the tuple being the center and
             the second element being the Gaussian half-width.
         start_at_c_chroma (`float`, *optional*, defaults to `True`):
````
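For context, a minimal sketch of a call exercising the corrected annotation; the leading `num_frequency_bins` and `num_chroma` parameters are assumed from the full signature, which lies outside this hunk:

```python
from transformers.audio_utils import chroma_filter_bank

# weighting_parameters is a (center, half-width) pair for the Gaussian
# weighting, hence the corrected Tuple[float, float] annotation.
filters = chroma_filter_bank(
    num_frequency_bins=257,   # assumed parameter, not shown in the hunk
    num_chroma=12,            # assumed parameter, not shown in the hunk
    sampling_rate=16000,
    weighting_parameters=(5.0, 2.0),
)
print(filters.shape)  # 2-D array of chroma filter weights
```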

src/transformers/models/auto/auto_factory.py

Lines changed: 1 addition & 1 deletion
````diff
@@ -580,7 +580,7 @@ def register(cls, config_class, model_class, exist_ok=False):
             model_class ([`PreTrainedModel`]):
                 The model to register.
         """
-        if hasattr(model_class, "config_class") and str(model_class.config_class) != str(config_class):
+        if hasattr(model_class, "config_class") and model_class.config_class.__name__ != config_class.__name__:
             raise ValueError(
                 "The model class you are passing has a `config_class` attribute that is not consistent with the "
                 f"config class you passed (model has {model_class.config_class} and you passed {config_class}. Fix "
````

src/transformers/models/dbrx/modeling_dbrx.py

Lines changed: 2 additions & 1 deletion
````diff
@@ -675,6 +675,8 @@ def forward(
         v1_chunked = [v1.squeeze(dim=0) for v1 in v1_chunked]
         w2_chunked = [w2.squeeze(dim=0) for w2 in w2_chunked]
         for expert_idx in range(0, self.moe_num_experts):
+            # (This causes torch.compile to fail with `torch._dynamo.exc.Unsupported: dynamic shape operator: aten.nonzero.default`)
+            # (setting torch._dynamo.config.capture_dynamic_output_shape_ops = True may help, but this is untested)
             topk_idx, token_idx = torch.where(expert_mask[expert_idx])
             if token_idx.shape[0] == 0:
                 continue
@@ -831,7 +833,6 @@ class DbrxPreTrainedModel(PreTrainedModel):
     _supports_sdpa = True
     _supports_cache_class = True
     _supports_quantized_cache = True
-    _supports_static_cache = True
 
     def _init_weights(self, module: nn.Module):
         std = self.config.initializer_range
````
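The new comment points at a possible workaround; a minimal, untested sketch of that flag with a made-up mask, not DBRX's actual routing code:

```python
import torch
import torch._dynamo

# Let dynamo capture ops whose output shape depends on tensor data,
# e.g. torch.where / aten.nonzero -- the op the comment above flags.
torch._dynamo.config.capture_dynamic_output_shape_ops = True

@torch.compile
def route_tokens(expert_mask: torch.Tensor):
    # Same data-dependent pattern as the MoE expert loop: the number of
    # matching tokens is only known at runtime.
    topk_idx, token_idx = torch.where(expert_mask)
    return topk_idx, token_idx

print(route_tokens(torch.tensor([[True, False], [False, True]])))
```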

src/transformers/models/granitemoe/modeling_granitemoe.py

Lines changed: 2 additions & 3 deletions
````diff
@@ -330,6 +330,8 @@ def forward(self, hidden_states):
         ) # [num_tokens, num_experts]
         gates = zeros.scatter(1, top_k_indices, 1) # [num_tokens, num_experts]
         expert_size = gates.long().sum(0) # [num_experts,]
+        # (This causes torch.compile to fail with `torch._dynamo.exc.Unsupported: Backend compiler failed with a fake tensor exception at`)
+        # (and `DataDependentOutputException`)
         expert_size = expert_size.tolist()
 
         # sort and group input tokens according to expert assignment
@@ -875,7 +877,6 @@ class GraniteMoePreTrainedModel(PreTrainedModel):
     _supports_sdpa = True
     _supports_cache_class = True
     _supports_quantized_cache = True
-    _supports_static_cache = True
 
     def _init_weights(self, module):
         std = self.config.initializer_range
@@ -1189,8 +1190,6 @@ def _update_causal_mask(
 
         if attention_mask is not None and attention_mask.dim() == 4:
             # in this case we assume that the mask comes already in inverted form and requires no inversion or slicing
-            if attention_mask.max() != 0:
-                raise ValueError("Custom 4D attention mask should be passed in inverted form with max==0`")
             causal_mask = attention_mask
         else:
             causal_mask = torch.full(
````
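A minimal sketch of the failure mode the new comment describes: calling `.tolist()` on a traced tensor is data-dependent, so strict compilation aborts (a made-up toy function, not GraniteMoe's router):

```python
import torch

@torch.compile(fullgraph=True)
def expert_sizes(gates: torch.Tensor):
    # .tolist() forces concrete values out of a traced tensor, which
    # dynamo cannot express in a single graph.
    return gates.long().sum(0).tolist()

try:
    expert_sizes(torch.eye(4))
except Exception as err:  # exact exception type varies across torch versions
    print(type(err).__name__)
```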

src/transformers/models/idefics/modeling_idefics.py

Lines changed: 2 additions & 3 deletions
````diff
@@ -868,7 +868,7 @@ def forward(
             )
             hidden_states = nn.functional.dropout(hidden_states, p=self.config, training=self.training)
             # Fill in zeros for cross_attention hidden_states of tokens attending to no images
-            hidden_states[cross_attention_gate == 0] = hidden_states[cross_attention_gate == 0].fill_(0)
+            hidden_states = hidden_states.masked_fill((cross_attention_gate == 0)[:, :, None], 0.0)
             hidden_states = residual + self.act_cross_attn(self.alpha_cross_attn) * hidden_states
 
             # Fully Connected
@@ -917,7 +917,6 @@ class IdeficsPreTrainedModel(PreTrainedModel):
     _no_split_modules = ["IdeficsDecoderLayer", "IdeficsGatedCrossAttentionLayer"]
     _supports_sdpa = True
     _supports_cache_class = True
-    _supports_static_cache = True
 
     def _init_weights(self, module):
         # important: this ported version of Idefics isn't meant for training from scratch - only
@@ -1155,7 +1154,7 @@ def forward(
         elif position_ids is None:
             position_ids = cache_position.unsqueeze(0)
 
-        if (pixel_values, image_encoder_embeddings, perceiver_embeddings).count(None) != 2:
+        if sum([x is None for x in [pixel_values, image_encoder_embeddings, perceiver_embeddings]]) != 2:
             raise ValueError(
                 "Exactly 1 of pixel_values, image_encoder_embeddings or perceiver_embeddings has to be not-None."
             )
````
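A minimal sketch (made-up shapes) checking that the out-of-place `masked_fill` above matches the old in-place boolean-index assignment:

```python
import torch

hidden_states = torch.randn(2, 4, 8)  # [batch, seq_len, hidden_dim]
cross_attention_gate = torch.tensor([[1.0, 0.0, 1.0, 0.0],
                                     [0.0, 1.0, 1.0, 1.0]])  # [batch, seq_len]

# Old form: in-place fill through boolean indexing.
old = hidden_states.clone()
old[cross_attention_gate == 0] = old[cross_attention_gate == 0].fill_(0)

# New form: out-of-place masked_fill with a broadcast [batch, seq_len, 1] mask.
new = hidden_states.masked_fill((cross_attention_gate == 0)[:, :, None], 0.0)

print(torch.equal(old, new))  # True: same values, without in-place mutation
```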
