
Conversation

@cyang49
Contributor

@cyang49 cyang49 commented Apr 15, 2025

What does this PR do?

We found a bug in the Bamba torch_forward implementation where the Mamba2 grouped SSD heads are expanded incorrectly for the computation.

The original code uses torch.repeat, which produces a tiled pattern. For example, with ngroups=4 and num_heads=16, torch.repeat gives

[W, X, Y, Z] --> [W, X, Y, Z, W, X, Y, Z, W, X, Y, Z, W, X, Y, Z]

instead of the desired

[W, X, Y, Z] --> [W, W, W, W, X, X, X, X, Y, Y, Y, Y, Z, Z, Z, Z]

This causes models with ngroups > 1 and ngroups != num_heads to fail evaluations. We fix it by replacing torch.repeat with torch.repeat_interleave.
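
For illustration, a minimal standalone sketch of the two expansion patterns (not the model code itself; the integer values simply stand in for the per-group SSD states W, X, Y, Z):

import torch

# Toy repro of the expansion difference, assuming ngroups=4 and num_heads=16
ngroups, num_heads = 4, 16
groups = torch.arange(ngroups)  # stands in for the per-group states [W, X, Y, Z]

tiled = groups.repeat(num_heads // ngroups)                   # torch.repeat: tiled pattern
interleaved = groups.repeat_interleave(num_heads // ngroups)  # torch.repeat_interleave: desired pattern

print(tiled.tolist())        # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]
print(interleaved.tolist())  # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]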

The bug went undetected for a while, perhaps because most people use the cuda_forward path, or because Bamba-9B uses ngroups=1.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@fabianlim @ani300 @ArthurZucker

@github-actions github-actions bot marked this pull request as draft April 15, 2025 14:56
@github-actions
Contributor

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

@cyang49 cyang49 marked this pull request as ready for review April 15, 2025 14:57
@github-actions github-actions bot requested a review from ArthurZucker April 15, 2025 14:57
@vasqu
Contributor

vasqu commented Apr 15, 2025

That's a good catch! Can confirm the issue and solution (based on some local tests). For reference, the original mamba repo also points to this in https://github.com/state-spaces/mamba/blob/2e16fc3062cdcd4ebef27a9aa4442676e1c7edf4/mamba_ssm/ops/triton/ssd_chunk_scan.py#L1813-L1814 (when looking at the repeat pattern).

This also affects other mamba2-based models. Could you:

  1. Fix the related models (mamba2, zamba2) as well
  2. Add a test

@vasqu
Contributor

vasqu commented Apr 15, 2025

mamba2 😄 cc @molbap

@cyang49
Contributor Author

cyang49 commented Apr 15, 2025

That's a good catch! Can confirm the issue and solution (based on some local tests). For reference, the original mamba repo also points to this in https://github.com/state-spaces/mamba/blob/2e16fc3062cdcd4ebef27a9aa4442676e1c7edf4/mamba_ssm/ops/triton/ssd_chunk_scan.py#L1813-L1814 (when looking at the repeat pattern).

This also affects other mamba2-based models. Could you:

  1. Fix the related models (mamba2, zamba2) as well

Sure, I can patch them as well.

  2. Add a test

I don't have much experience writing tests for transformers. Would this mean adding a unit test with ngroups > 1 in the model configuration for each of the affected models?

@vasqu
Contributor

vasqu commented Apr 15, 2025

I think something along

def test_mamba2_slow_vs_fast_forward(self):
    config_and_inputs = self.model_tester.prepare_config_and_inputs()
    self.model_tester.create_and_check_mamba2_slow_vs_fast_forward(*config_and_inputs)

in mamba2 (by adjusting groups/heads for that test) would be sufficient, as both bamba and zamba2 basically copied from mamba2. The models should be refactored tbh to allow modular to copy (but that's not in scope for this PR).
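
For illustration, a grouped variant along these lines might look like the following sketch (the _grouped test name and halving n_groups are just one possible way to wire up the adjustment):

def test_mamba2_slow_vs_fast_forward_grouped(self):
    config_and_inputs = self.model_tester.prepare_config_and_inputs()
    # Use fewer groups than heads so the grouped-head expansion path in torch_forward is exercised
    config_and_inputs[0].n_groups //= 2
    self.model_tester.create_and_check_mamba2_slow_vs_fast_forward(*config_and_inputs)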

@molbap
Contributor

molbap commented Apr 15, 2025

Bamba-9B uses ngroups=1.

Yes, very likely why that was missed haha. Nice catch indeed, following here and yes we should modularize it all, will open an issue!

@cyang49 cyang49 marked this pull request as draft April 15, 2025 19:10
@cyang49 cyang49 changed the title Fix Mamba2 Grouped SSD Support in Bamba torch_forward Path Fix Mamba2 Grouped SSD Support in the torch_forward Path Apr 15, 2025
@cyang49
Contributor Author

cyang49 commented Apr 15, 2025

I think something along

def test_mamba2_slow_vs_fast_forward(self):
    config_and_inputs = self.model_tester.prepare_config_and_inputs()
    self.model_tester.create_and_check_mamba2_slow_vs_fast_forward(*config_and_inputs)

in mamba2 (by adjusting groups/heads for that test) would be sufficient, as both bamba and zamba2 basically copied from mamba2. The models should be refactored tbh to allow modular to copy (but that's not in scope for this PR).

@vasqu I added a test and used half the default n_groups. I tested this locally on mamba2 and confirmed that the test fails before the patch and passes after it.

@cyang49 cyang49 marked this pull request as ready for review April 15, 2025 20:39
Contributor

@vasqu vasqu left a comment

Just a small nit. Otherwise LGTM!

I guess slow runs just to be sure nothing is majorly broken? @molbap


def test_mamba2_slow_vs_fast_forward_grouped(self):
    config_and_inputs = self.model_tester.prepare_config_and_inputs()
    config_and_inputs[0].n_groups //= 2
Contributor

Just a small nit: Could you add a comment / link to this PR so we know in the future why this test was added.

Contributor Author

comment added

Contributor

it's nice but I think he meant adding literally

# See https://github.com/huggingface/transformers/pull/37533/

we do that a lot across the library to keep the history :)

Contributor Author

ah.. ok let me do that

Contributor Author

Done!

@molbap
Contributor

molbap commented Apr 16, 2025

run-slow: mamba2

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/mamba2']
quantizations: [] ...

@vasqu
Contributor

vasqu commented Apr 16, 2025

Hmm, something went wrong with picking up the commit? Not familiar with the new workflow.

@vasqu
Contributor

vasqu commented Apr 16, 2025

@molbap can we try slow runs again?

@molbap
Contributor

molbap commented Apr 16, 2025

Hey, I'm not sure about the new workflow either haha :D I mostly ran it locally to be sure, and it seems not to break, but it would be preferable to check on our runners. Retrying, cc @ydshieh: is this the correct current way to launch it? I thought it was (no need for labels, just a ready PR + a message from a maintainer)?

run-slow mamba2

@molbap
Contributor

molbap commented Apr 16, 2025

run-slow mamba2

@ydshieh
Collaborator

ydshieh commented Apr 16, 2025

Yes, but I think it should not be mixed with other comment text. Simply run-slow: mamba2, with or without the :.

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/mamba2']
quantizations: [] ...

@molbap
Contributor

molbap commented Apr 16, 2025

all tests passing on the slow CI, congrats 🙌

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@vasqu
Contributor

vasqu commented Apr 16, 2025

cc @Cyrilvallez for core maintainer review

Member

@Cyrilvallez Cyrilvallez left a comment

Merging! Thanks a lot for the fix and for the clean PR @cyang49! 🤗

@Cyrilvallez Cyrilvallez merged commit 4005730 into huggingface:main Apr 16, 2025
12 checks passed
@cyang49 cyang49 deleted the pr_mamba2_groups branch April 16, 2025 20:22
cyr0930 pushed a commit to cyr0930/transformers that referenced this pull request Apr 18, 2025
…#37533)

* Fix mamba2 grouped support in bamba torch path

* patch zamba2 and mamba2

* Add a unit test for grouped SSD

* add comment for the new unit test

* add output_size arg value to repeat_interleave calls

* Add comment
zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request May 14, 2025
…#37533)

* Fix mamba2 grouped support in bamba torch path

* patch zamba2 and mamba2

* Add a unit test for grouped SSD

* add comment for the new unit test

* add output_size arg value to repeat_interleave calls

* Add comment
@vasqu vasqu mentioned this pull request May 22, 2025
