Conversation

@kashif
Contributor

@kashif kashif commented Oct 25, 2025

What does this PR do?

Adds two context parallel tests for the CI

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@kashif kashif requested a review from SunMarc October 25, 2025 14:04
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@stas00
Contributor

stas00 commented Oct 25, 2025

Looks fantastic, Kashif! Thank you for adding the missing tests. Now I don't need to worry about breaking your CP integration while integrating ALST/UlyssesSP.

@stas00
Contributor

stas00 commented Oct 26, 2025

Oh, one request: since we will now have more than one context parallel backend, you might want to rename the file and class to include "torch". Via #41832 and huggingface/accelerate#3817 we will now have `pc.cp_backend = {torch|deepspeed}`, with torch being the default.
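For illustration, a hedged sketch of how such a backend switch might look in a config file, mirroring the `pc.cp_backend = {torch|deepspeed}` note above (the exact key name and YAML shape here are assumptions based on this comment, not the merged API):

```yaml
parallelism_config:
  parallelism_config_cp_size: 2
  # assumed key name; torch is the default, deepspeed would select ALST/UlyssesSP
  parallelism_config_cp_backend: torch
```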

training_args = parser.parse_args_into_dataclasses()[0]

# Use SmolLM (small Llama-based model that works with CP)
model_name = "HuggingFaceTB/SmolLM-135M"
Contributor

It's interesting that "hf-internal-testing/tiny-random-LlamaForCausalLM" fails here:

stderr: [rank0]:   File "/code/users/stas/github/transformers-alst-integration/src/transformers/models/llama/modeling_llama.py", line 292, in forward
stderr: [rank0]:     attn_output = self.o_proj(attn_output)
stderr: [rank0]:                   ^^^^^^^^^^^^^^^^^^^^^^^^
stderr: [rank0]:   File "/home/yak/miniconda3/envs/dev/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
stderr: [rank0]:     return self._call_impl(*args, **kwargs)
stderr: [rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
stderr: [rank0]:   File "/home/yak/miniconda3/envs/dev/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
stderr: [rank0]:     return forward_call(*args, **kwargs)
stderr: [rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
stderr: [rank0]:   File "/home/yak/miniconda3/envs/dev/lib/python3.12/site-packages/torch/nn/modules/linear.py", line 125, in forward
stderr: [rank0]:     return F.linear(input, self.weight, self.bias)
stderr: [rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
stderr: [rank0]: RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x32 and 16x16)
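The RuntimeError above is the generic matmul shape rule surfacing inside `o_proj`: `F.linear(x, W, b)` computes `x @ W.T`, so the last dimension of the input must equal the layer's `in_features`. Here the attention output arrives as (64, 32) while the tiny model's `o_proj` expects `in_features=16` (why the activation ends up at 32 under CP is not diagnosed in this thread). A minimal stdlib-only sketch of the shape check, with shapes taken from the traceback:

```python
def linear_out_shape(x_shape, weight_shape):
    """Output shape of F.linear(x, W): x @ W.T, where W is (out_features, in_features)."""
    out_features, in_features = weight_shape
    if x_shape[-1] != in_features:
        # Mirrors the wording of torch's error for a 2D matmul mismatch.
        raise RuntimeError(
            f"mat1 and mat2 shapes cannot be multiplied "
            f"({x_shape[-2]}x{x_shape[-1]} and {in_features}x{out_features})"
        )
    return (*x_shape[:-1], out_features)

# A compatible input passes through unchanged in width:
print(linear_out_shape((64, 16), (16, 16)))  # (64, 16)

# The case from the traceback: a (64, 32) activation into in_features=16 fails.
try:
    linear_out_shape((64, 32), (16, 16))
except RuntimeError as e:
    print(e)  # mat1 and mat2 shapes cannot be multiplied (64x32 and 16x16)
```

SmolLM-135M avoids this because its hidden size is large and divisible in a way the CP sharding expects, which is presumably why the swap below fixed it.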

Contributor Author

Yes... I got the same error, so I switched to "HuggingFaceTB/SmolLM-135M".

@ydshieh
Collaborator

ydshieh commented Oct 27, 2025

Thanks a lot. I will check on our multi-gpu CI runners too.

@stas00
Contributor

stas00 commented Oct 27, 2025

Would it be possible to exercise the rest of the torch/fsdp PC config here:

parallelism_config:
  parallelism_config_dp_replicate_size: 1
  parallelism_config_dp_shard_size: 1
  parallelism_config_tp_size: 1
  parallelism_config_cp_size: 2
  # additionally:
  parallelism_config_cp_comm_strategy: alltoall

That is, to add all the config values for the Torch PC plugin, like `cp_comm_strategy`? It also helps with tests-as-documentation/examples.

@stas00
Contributor

stas00 commented Oct 27, 2025

Kashif, I'd suggest dropping the basic test, since it gets repeated almost identically in the 2nd test - will save $$ and time.

@stas00
Contributor

stas00 commented Oct 28, 2025

@kashif, I took your test as a foundation for mine and worked on it further, while also making it self-contained, without needing to add more files to the already bloated repo.

https://github.com/huggingface/transformers/pull/41832/files#diff-bfd2f0924ca5096a9cf7bff5929081cf6552b70df8bb40632d3b11273c9554af

Please feel free to re-use it, with the note that I made some tweaks to testing_utils.py that it relies on. Your version of my lib is quite outdated; I have made quite a few changes over recent years. If you want to sync back, the latest version is here: https://github.com/stas00/ml-engineering/blob/master/testing/testing_utils.py

@kashif
Contributor Author

kashif commented Nov 1, 2025

@stas00 I've made the script self-contained and incorporated your suggestions.

@kashif
Contributor Author

kashif commented Nov 1, 2025

@ydshieh we should ideally run this only on the multi-gpu setup; is that possible?

@stas00
Contributor

stas00 commented Nov 1, 2025

Looks great, Kashif.

@kashif kashif requested a review from ydshieh November 4, 2025 08:51
Member

@SunMarc SunMarc left a comment


Thanks for adding this! Just a minor nit.

@kashif
Contributor Author

kashif commented Nov 5, 2025

@SunMarc moved to fsdp folder

@SunMarc SunMarc merged commit 0c4a202 into main Nov 5, 2025
16 checks passed
@SunMarc SunMarc deleted the cp-ci-tests branch November 5, 2025 10:40
@ydshieh
Collaborator

ydshieh commented Nov 5, 2025

> @ydshieh we should ideally run this only on the multi-gpu setup, is that possible?

Yes, we have multi-gpu runners (A10). I will check how it goes with that.

@ydshieh
Collaborator

ydshieh commented Nov 5, 2025

@kashif The test runs and passes on our multi-gpu A10 runner, great!

@kashif
Contributor Author

kashif commented Nov 5, 2025

Amazing, thank you @ydshieh!

Abdennacer-Badaoui pushed a commit to Abdennacer-Badaoui/transformers that referenced this pull request Nov 10, 2025
* intial

* simplify tests

* add test_cp_equivalence

* removed fsdp_transformer_layer_cls_to_wrap

* use DataCollatorForLanguageModeling

* remove use_cache=False.

* changes from review

* make script self contained

* moved to fsdp folder

* fix class name
* fix class name
