Fix RuntimeError for moe on XPU: tensors found at least two devices #5519

Merged
tjruwase merged 4 commits into deepspeedai:master from shiyang-weng:wengshiy/fix_ut_moe
May 21, 2024
@shiyang-weng
Contributor

The following error occurs on XPU when running the unit test "DeepSpeed/tests/unit/moe/test_moe.py":
DeepSpeed/deepspeed/moe/sharded_moe.py line 223, in top1gating
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, xpu:0 and cpu!

Fix it by converting the tensors to the same device.
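
For readers unfamiliar with this class of error, here is a minimal sketch of the general pattern. It is not the actual patch from this PR: the helper name `clamp_capacity` and its parameters are hypothetical, chosen only for illustration. It assumes the mismatch comes from a tensor created with `torch.tensor(...)`, which is allocated on the CPU by default, being combined with a tensor that already lives on the accelerator.

```python
import torch


def clamp_capacity(new_capacity: torch.Tensor, num_tokens: int) -> torch.Tensor:
    """Cap `new_capacity` at `num_tokens`, keeping both operands on one device.

    Hypothetical helper for illustration; not DeepSpeed code.
    """
    # torch.tensor(...) allocates on the CPU unless told otherwise, so the
    # limit is created explicitly on the device that holds `new_capacity`.
    limit = torch.tensor(num_tokens, dtype=new_capacity.dtype, device=new_capacity.device)
    return torch.minimum(new_capacity, limit)


# Use an accelerator if one is present; fall back to CPU so the sketch still runs.
# On an Intel GPU build of PyTorch the device string would be "xpu".
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

capacity = clamp_capacity(torch.tensor(12, device=device), num_tokens=8)
print(capacity.device.type, capacity.item())
```

An equivalent approach is to call `.to(new_capacity.device)` on a tensor that already exists. Either way, the point is to derive the device from the tensor already in hand rather than hard-coding one.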

@shiyang-weng shiyang-weng requested a review from awan-10 as a code owner May 10, 2024 05:28
@shiyang-weng
Contributor Author

Hi @loadams, could you help review this PR? It makes the syntax more reasonable and fixes the issue on XPU.

@tjruwase tjruwase added this pull request to the merge queue May 21, 2024
Merged via the queue into deepspeedai:master with commit 695d79e May 21, 2024
sfc-gh-reyazda pushed a commit to Snowflake-Labs/DeepSpeed that referenced this pull request Jun 10, 2024
…eepspeedai#5519)

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>