Make Ascend NPU available by hipudding · Pull Request #3831 · deepspeedai/DeepSpeed

hipudding · 2023-06-28T07:02:03Z

NPU accelerator support was introduced in #3595.
This PR provides two enhancements:

1. Add a new accelerator_name, 'npu', which can be specified via an environment variable or auto-detected.
2. Optimize the auto-detection code in get_accelerator to avoid too many layers of exception throwing.

hipudding · 2023-06-28T07:04:38Z

@microsoft-github-policy-service agree

accelerator/real_accelerator.py

When an overridden accelerator is detected, an error message lists all supported accelerators; build this message from an accelerator list instead of hard-coding the accelerator names. Also fix a code-format issue (yapf).
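Here is a minimal sketch of a list-driven error message. It assumes a module-level list named `DS_ACCELERATOR_LIST`, a name taken from the commit "Use DS_ACCELERATOR_LIST for overriding accelerators"; the list's contents here are assumed. It also assumes an override variable named `DS_ACCELERATOR`. `validate_accelerator_name` is a hypothetical helper.

```python
from typing import List

# Hypothetical list of supported accelerator names (contents assumed).
DS_ACCELERATOR_LIST: List[str] = ["cuda", "cpu", "xpu", "npu"]


def validate_accelerator_name(name: str) -> str:
    # The message is built from the list, so registering a new accelerator
    # only requires appending to DS_ACCELERATOR_LIST.
    if name not in DS_ACCELERATOR_LIST:
        raise ValueError(
            f"DS_ACCELERATOR must be one of {DS_ACCELERATOR_LIST}, got {name!r}")
    return name
```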

hipudding · 2023-06-30T06:11:09Z

@tjruwase Good day. Could you please review this PR again and trigger workflows? Thanks.

tjruwase · 2023-06-30T11:20:13Z

@hipudding, thanks for this PR. We are reviewing currently. Apologies for the delay.

hipudding · 2023-07-03T00:51:33Z

:-) Thanks.

hipudding · 2023-07-07T01:28:33Z

Rebased my commits onto master HEAD to avoid too many merges back from master.
Changed the commit titles to remove the PR number, because it will be added automatically when the PR is merged.
Added two more commits.

BTW, is this PR still under review? Are there any review suggestions? Please let me know if there is anything else that needs to be done.

HCCL is the distributed backend of Ascend NPU; it is already implemented in the NPU plugin for PyTorch (https://gitee.com/ascend/pytorch). Add HCCL as a not-implemented backend to avoid the "not supported" warning.
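This sketch shows the distinction the commit relies on. A backend that is recognized but not natively implemented is handled quietly, while an unknown name triggers a "not supported" warning. The backend constants, `KNOWN_NOT_IMPLEMENTED`, `resolve_backend`, and the assumption that such backends defer to `torch.distributed` are all illustrative, not DeepSpeed's actual comm code.

```python
import logging

logger = logging.getLogger("comm_backend_sketch")

# Illustrative backend names; only "hccl" comes from the commit message.
NCCL_BACKEND = "nccl"
GLOO_BACKEND = "gloo"
HCCL_BACKEND = "hccl"

# Backends that are recognized but have no native implementation in this
# sketch; collectives for them are assumed to go through torch.distributed,
# where the Ascend plugin provides HCCL.
KNOWN_NOT_IMPLEMENTED = {NCCL_BACKEND, GLOO_BACKEND, HCCL_BACKEND}


def resolve_backend(name: str) -> str:
    if name in KNOWN_NOT_IMPLEMENTED:
        # Known backend: log at INFO only, no "not supported" warning.
        logger.info("%s backend is not natively implemented; deferring to torch.distributed", name)
        return "torch.distributed"
    logger.warning("%s backend is not supported", name)
    return "unsupported"
```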

Ascend NPU does not implement any ops yet, and leaving the npu folder empty raises a NoneType[op_name] error when an unsupported op is called. Add NPUNotImplementedBuilder as the default builder.
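Here is a sketch of a default builder. All names are hypothetical: `OpBuilder`, `NPU_BUILDERS`, `get_op_builder`, and the builder name `deepspeed_not_implemented`. Looking up an op in an empty registry falls back to `NotImplementedBuilder`, whose `load()` raises a descriptive error instead of failing on `None`.

```python
from typing import Dict, Optional, Type


class OpBuilder:
    """Minimal stand-in for an op-builder base class (hypothetical)."""

    NAME = "base"

    def __init__(self, name: Optional[str] = None):
        self.name = self.NAME if name is None else name

    def load(self):
        raise NotImplementedError


class NotImplementedBuilder(OpBuilder):
    # Default builder for any op the device does not provide yet.
    NAME = "deepspeed_not_implemented"

    def load(self):
        raise ValueError("This op has not been implemented on this device.")


# Builders shipped for the device: empty, because no NPU ops exist yet.
NPU_BUILDERS: Dict[str, Type[OpBuilder]] = {}


def get_op_builder(class_name: str) -> Type[OpBuilder]:
    # Falling back to NotImplementedBuilder avoids handing callers None,
    # which would only fail later with an opaque TypeError.
    return NPU_BUILDERS.get(class_name, NotImplementedBuilder)
```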

hipudding · 2023-07-07T06:08:39Z

It seems the failing test case is not related to these commits. Please re-trigger the failed workflow. Thanks.

accelerator/cpu_accelerator.py

accelerator/cuda_accelerator.py

1. The CPU and other backends implement their ops in subdirectories under op_builder, so cuda_accelerator should skip these subdirectories.
2. Each backend will have its own NotImplementedBuilder; add a device prefix to the class name to distinguish them.
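Here is a sketch of item 1 using the standard-library `pkgutil.iter_modules`. It reports each entry's `ispkg` flag, so backend sub-packages such as `cpu/` or `npu/` can be skipped. The skip-list contents and `list_builder_modules` are assumptions for illustration, not the code from this PR.

```python
import pkgutil
from typing import List

# Module names in the op_builder directory that are infrastructure rather
# than builders (assumed for illustration).
_SKIP_NAMES = {"all_ops", "builder"}


def list_builder_modules(op_builder_dir: str) -> List[str]:
    names = []
    for info in pkgutil.iter_modules([op_builder_dir]):
        # Sub-packages such as cpu/ or npu/ hold other backends' builders,
        # so a CUDA-side scan skips them.
        if info.ispkg or info.name in _SKIP_NAMES:
            continue
        names.append(info.name)
    return sorted(names)
```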

op_builder/npu/no_impl.py

hipudding · 2023-07-20T01:40:26Z

Good day. This PR is approved and ready to merge, but one workflow failed due to OOM (not related to these changes). Could you re-trigger this workflow and merge this PR? Thanks.

hipudding requested review from RezaYazdaniAminabadi, cmikeh2 and jeffra as code owners June 28, 2023 07:02

hipudding force-pushed the npu_support branch from 229e86e to a512f56 June 28, 2023 07:02

tjruwase reviewed Jun 28, 2023

accelerator/real_accelerator.py (outdated, resolved)

hipudding requested a review from tjruwase June 29, 2023 01:23

hipudding force-pushed the npu_support branch from 6b31125 to 7fd5929 June 29, 2023 01:26

hipudding force-pushed the npu_support branch from 7fd5929 to 60902c6 June 30, 2023 06:09

hipudding force-pushed the npu_support branch from 5821b4f to 058f269 July 7, 2023 01:25

hipudding added 3 commits July 7, 2023 01:29

Use DS_ACCELERATOR_LIST for overriding accelerators

058f269

Add HCCL backend

b611bcd

hipudding requested a review from awan-10 as a code owner July 7, 2023 02:53

hipudding force-pushed the npu_support branch from 70b91b6 to 006a59e July 7, 2023 03:04

hipudding added 2 commits July 7, 2023 03:09

Add NPUNotImplementedBuilder

006a59e

Merge branch 'master' into npu_support

894a984

Merge branch 'master' into npu_support

47e72db

hipudding force-pushed the npu_support branch from 6271c73 to 2e9b6a9 July 10, 2023 02:34

hipudding force-pushed the npu_support branch from 2e9b6a9 to 6cc9068 July 10, 2023 03:34

tjruwase reviewed Jul 11, 2023

accelerator/cpu_accelerator.py (outdated, resolved)

tjruwase reviewed Jul 11, 2023

accelerator/cuda_accelerator.py (outdated, resolved)

hipudding force-pushed the npu_support branch 3 times, most recently from 7876e3f to 127ecce July 13, 2023 06:21

hipudding requested review from delock and tjruwase July 13, 2023 06:25

hipudding added 2 commits July 13, 2023 06:26

Optimize builder search logic

127ecce

Merge branch 'master' into npu_support

e9b4a93

delock reviewed Jul 14, 2023

op_builder/npu/no_impl.py (outdated, resolved)

hipudding requested a review from delock July 19, 2023 00:59

Change the unimplemented builder name to the same for each backend

d115969

hipudding force-pushed the npu_support branch 4 times, most recently from 6b48a6a to eb8ab12 July 19, 2023 01:27

hipudding added 2 commits July 19, 2023 01:32

Merge branch 'master' into npu_support

eb8ab12

Merge branch 'master' into npu_support

8800f7e

tjruwase approved these changes Jul 19, 2023

Merge branch 'master' into npu_support

dfeaaed

hipudding added 3 commits July 20, 2023 21:12

Merge branch 'master' into npu_support

9c093f6

Merge branch 'master' into npu_support

55a5992

Merge branch 'master' into npu_support

8e072a8

tjruwase added this pull request to the merge queue Jul 22, 2023

Merged via the queue into deepspeedai:master with commit 23a11a3 Jul 22, 2023

hipudding mentioned this pull request Oct 26, 2023

[Feature package] Full feature support with Ascend NPU #4567

Closed

tjruwase merged 15 commits into deepspeedai:master from hipudding:npu_support
