[bug] Multiple test failures when testing all models

# 🐞 Describe the Bug

When running tests for all models in #289, I get the following failures:

```
FAILED tests/test_checkpoint.py::test_convert_distributed_to_huggingface[llamba]@dependency_group_2 - AssertionError: Un-handled entries after conversion: {'weights': ['layers.1.self_attn.query.weight', 'layers.1.self_attn.key_value.weigh...
FAILED tests/test_checkpoint.py::test_convert_fast_llm_to_huggingface[llamba]@dependency_group_2 - AssertionError: Un-handled entries after conversion: {'weights': ['layers.1.self_attn.query.weight', 'layers.1.self_attn.key_value.weigh...
FAILED tests/test_gpt_generate_and_forward.py::test_small_generate[mistral-False-True-10-10-10]@dependency_group_17 - AssertionError: assert False
FAILED tests/test_gpt_generate_and_forward.py::test_export_for_generate[llamba]@dependency_group_19 - AssertionError: Un-handled entries after conversion: {'weights': ['layers.1.self_attn.query.weight', 'layers.1.self_attn.key_value.weigh...
FAILED tests/test_gpt_generate_and_forward.py::test_small_generate[llama_mtp-False-True-10-10-10]@dependency_group_16 - AssertionError: assert False
FAILED tests/test_gpt_generate_and_forward.py::test_small_generate[llama_mtp-True-True-10-10-10]@dependency_group_16 - AssertionError: assert False
FAILED tests/test_gpt_generate_and_forward.py::test_small_generate_from_model[llama_mtp]@dependency_group_16 - AssertionError: assert False
FAILED tests/test_gpt_generate_and_forward.py::test_small_forward_return_hidden_states[llama_mtp]@dependency_group_16 - assert (9 - 1) == 2
FAILED tests/test_gpt_generate_and_forward.py::test_small_generate[mixtral-True-True-10-10-10]@dependency_group_18 - AssertionError: assert False
FAILED tests/test_gpt_generate_and_forward.py::test_small_generate_from_model[mixtral]@dependency_group_18 - AssertionError: assert False
FAILED tests/test_mb_seq_first.py::test_model_dp2_sp2_df4[llamba]@dependency_group_44 - ValueError: Comparison failed (66 errors)
FAILED tests/test_seq_first.py::test_model_sp2_ce4[llamba]@dependency_group_23 - ValueError: Comparison failed (1 errors)
```

From this we get the following issues:
* Conversion looks broken for llamba
* Comparison is flaky for `Global gradient: layers.0.word_embeddings_weight` (threshold issue? Found in `test_model_pp2s1_bf4[mixtral]` and `test_model_bf4[llamba]`)
* Generation tests are flaky (#274)
* `test_small_forward_return_hidden_states[llama_mtp]`: testing issue? (layer count mismatch)
* `test_model_dp2_sp2_df4[llamba]`: distributed mismatch, 66 errors (is distributed SSM broken?)

I disabled the distributed and conversion tests for llamba, as well as the generation tests, in #289. We'll want to fix them and re-enable them.
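
To help with triage when bringing these tests back, here is a minimal sketch that groups the `FAILED` lines of a pytest short summary by model. It is not part of the test suite: the helper name `group_failures_by_model` is hypothetical, and it assumes the model name is the first dash-separated field of the parametrization id, as in the log above.

```python
import re
from collections import defaultdict

# Matches pytest short-summary lines such as
# "FAILED tests/test_x.py::test_name[params]@dependency_group_2 - message".
FAILED_LINE = re.compile(
    r"^FAILED\s+(?P<file>[^:]+)::(?P<test>\w+)\[(?P<params>[^\]]*)\]"
)


def group_failures_by_model(summary: str) -> dict[str, list[str]]:
    """Group failed test names by model.

    Assumption: the model is the first dash-separated field of the
    parametrization id (e.g. "mistral" in "mistral-False-True-10-10-10").
    """
    groups = defaultdict(list)
    for line in summary.splitlines():
        match = FAILED_LINE.match(line.strip())
        if match is None:
            continue  # Not a parametrized FAILED line; ignore it.
        model = match["params"].split("-")[0]
        groups[model].append(match["test"])
    return dict(groups)


if __name__ == "__main__":
    example = "\n".join(
        [
            "FAILED tests/test_checkpoint.py::test_convert_fast_llm_to_huggingface[llamba]@dependency_group_2 - AssertionError",
            "FAILED tests/test_gpt_generate_and_forward.py::test_small_generate[mistral-False-True-10-10-10]@dependency_group_17 - AssertionError: assert False",
        ]
    )
    for model, tests in group_failures_by_model(example).items():
        print(model, tests)
```

Grouping this way makes it easier to see which failures are specific to llamba (conversion, distributed) and which are shared across models (generation).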

# 🔄 Steps to Reproduce

Run the test suite with the changes from #289 (`pytest tests/ -v -n 10`)

# 🎯 Expected Behavior

Tests pass

