
fix num_kv_heads sharding in uneven autoTP for Falcon-40b#4712

Merged
mrwyattii merged 4 commits into deepspeedai:master from Yejing-Lai:lyj/fix_falcon40b
Jan 5, 2024
Conversation

@Yejing-Lai
Contributor

Falcon-40b fails under uneven autoTP sharding. We need to add 'num_kv_heads' to the kv_head_names list.
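The fix matters because Falcon-40b's KV heads, like its query heads, must be split across ranks that do not evenly divide the head count. The sketch below is illustrative only (not DeepSpeed's actual sharding code, and `uneven_shard` is a hypothetical helper name); it shows the even-as-possible split that uneven autoTP implies, using Falcon-40b's real head counts (128 query heads, 8 KV heads):

```python
def uneven_shard(num_heads: int, world_size: int) -> list[int]:
    """Distribute num_heads across world_size ranks as evenly as possible.

    The first (num_heads % world_size) ranks each receive one extra head.
    """
    base, rem = divmod(num_heads, world_size)
    return [base + (1 if rank < rem else 0) for rank in range(world_size)]

# Falcon-40b on 3 ranks: both query heads and KV heads are split unevenly.
# If 'num_kv_heads' is not registered among the names the sharder inspects,
# the KV heads are not partitioned consistently with the query heads.
print(uneven_shard(128, 3))  # [43, 43, 42]
print(uneven_shard(8, 3))    # [3, 3, 2]
```

This is why simply listing 'num_kv_heads' is sufficient: the existing uneven-sharding logic already handles the arithmetic once it knows which attribute holds the KV head count.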

@delock
Collaborator

delock commented Dec 4, 2023

Hi @RezaYazdaniAminabadi, this PR solves Falcon-40b autoTP with uneven sharding, e.g. across 3 ranks. Can this PR be reviewed? Thanks!

@delock
Collaborator

delock commented Jan 3, 2024

@tjruwase, from the failure log this looks like an environment issue. Has it already been resolved?
Thanks!

FAILED tests/deepspeed/test_deepspeed.py::TrainerIntegrationDeepSpeed::test_early_get_last_lr_zero2_fp16 - deepspeed.ops.op_builder.builder.CUDAMismatchException: >- DeepSpeed Op Builder: Installed CUDA version 11.6 does not match the version torch was compiled with 12.1, unable to compile cuda/cpp extensions without a matching cuda version.

@tjruwase
Contributor

tjruwase commented Jan 3, 2024

@delock, apologies for the delay on this. The team is gradually returning from the holidays. This will be resolved asap.

@mrwyattii mrwyattii added this pull request to the merge queue Jan 5, 2024
Merged via the queue into deepspeedai:master with commit 1787673 Jan 5, 2024
mauryaavinash95 pushed a commit to mauryaavinash95/DeepSpeed that referenced this pull request Feb 17, 2024
…i#4712)

Falcon-40b will fail on uneven autotp. Need to add 'num_kv_heads' in the
kv_head_names list.

Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
