Currently multi-gpu generate does not work with hf.generate for hf checkpoints. This PR fixes that. #1332

ShashankMosaicML · 2024-07-02T17:48:59Z

Multi-GPU generation using hf.generate with device_map='auto' does pipeline parallelism and places different modules on different GPUs. As a result, some operations receive input tensors that live on different GPUs, which raises an error. This PR moves those tensors onto the same device as the operation's other inputs. This should not slow down training, because during training these tensors should already share a device, making the moves no-ops.
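The idea of moving a tensor to match its peers can be sketched roughly as follows. This is an illustration only, not the code from this PR's diff: the helper name `match_device` is hypothetical, and the example runs on CPU so it needs no GPUs. It relies on the documented behavior of `torch.Tensor.to`, which returns the tensor itself when it is already on the requested device.

```python
import torch


def match_device(x: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
    """Return x on the same device as ref (hypothetical helper, for illustration)."""
    # Tensor.to returns self when x is already on ref.device with the same dtype,
    # so no copy is made in the single-device case.
    return x.to(ref.device)


# Both tensors live on the CPU here, which mimics the training case
# where the operands are already on the same device.
a = torch.ones(2, 3)
b = torch.zeros(2, 3)

moved = match_device(a, b)
same_object = moved is a  # no copy was made
total = moved + b         # safe: both operands share a device
```

With device_map='auto', `ref` could sit on a different GPU than `x`, and the same call would then copy `x` over before the operation runs.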

llmfoundry/models/layers/blocks.py

llmfoundry/models/layers/dmoe.py

snarayan21: would like a bit of clarity, thanks for submitting

llmfoundry/models/layers/dmoe.py

llmfoundry/models/layers/attention.py

dakinggg: approving to unblock

ShashankMosaicML added 2 commits July 2, 2024 10:47

making generate work

b64eca3

..

8f1ebed

vchiley reviewed Jul 2, 2024

llmfoundry/models/layers/blocks.py (outdated, resolved)

vchiley reviewed Jul 2, 2024

llmfoundry/models/layers/dmoe.py (outdated, resolved)

ShashankMosaicML added 2 commits July 2, 2024 12:59

addressing comments

6acce78

reverting hf rotary emb changes, they will remain on a branch

637e8bc

ShashankMosaicML marked this pull request as ready for review July 2, 2024 20:20

ShashankMosaicML requested a review from a team as a code owner July 2, 2024 20:20

snarayan21 reviewed Jul 2, 2024

llmfoundry/models/layers/dmoe.py (resolved)

llmfoundry/models/layers/attention.py (resolved)

adding comments

33cd71e

vchiley approved these changes Jul 2, 2024

dakinggg approved these changes Jul 2, 2024

ShashankMosaicML merged commit 199c3b9 into mosaicml:main Jul 2, 2024
9 checks passed

ShashankMosaicML deleted the changes_for_hf_generate branch July 2, 2024 22:55

ShashankMosaicML mentioned this pull request Jul 2, 2024

Adding a child class of hf's rotary embedding to make hf generate work on multiple gpus. #1334

Merged
