Make Ascend NPU available #3831

Merged
tjruwase merged 15 commits into deepspeedai:master from hipudding:npu_support
Jul 22, 2023

@hipudding
Contributor

NPU accelerator support was introduced in (#3595).
This commit provides two enhancements:

  1. Add a new accelerator_name, 'npu', which can be specified by an environment variable or auto-detected.
  2. Optimize the auto-detection code in get_accelerator to avoid too many nested layers of exception handling.
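
For readers unfamiliar with the pattern, the two enhancements can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `detect_accelerator_name`, the `PROBE_PACKAGES` table, the environment variable name `DS_ACCELERATOR`, and the `cuda` fallback are all assumptions made for the example. The point is the flat probe loop, which replaces a chain of nested try/except blocks.

```python
import importlib.util
import os

# Hypothetical table: accelerator name -> package whose presence signals it.
# torch_npu is the Ascend plugin for PyTorch; the table itself is illustrative.
PROBE_PACKAGES = {
    "npu": "torch_npu",
}


def detect_accelerator_name(env=None, has_package=None):
    """Pick an accelerator name from an override or by probing packages."""
    env = os.environ if env is None else env
    if has_package is None:
        # find_spec returns None for a missing top-level package,
        # so no exception handling is needed to probe for it.
        has_package = lambda name: importlib.util.find_spec(name) is not None

    # 1. An explicit override wins (variable name assumed for this sketch;
    #    validation of the value is omitted here).
    override = env.get("DS_ACCELERATOR")
    if override is not None:
        return override

    # 2. Otherwise probe each candidate in a flat loop.
    for name, package in PROBE_PACKAGES.items():
        if has_package(package):
            return name

    # 3. Fallback chosen for the sketch only.
    return "cuda"
```

Passing `env` and `has_package` explicitly keeps the function testable on machines without any accelerator plugin installed.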

hipudding added a commit to hipudding/DeepSpeed that referenced this pull request Jun 28, 2023
NPU accelerator support was introduced in (deepspeedai#3595).
This commit provides two enhancements:
  1. Add a new accelerator_name, 'npu', which can be specified by an
environment variable or auto-detected.
  2. Optimize the auto-detection code in get_accelerator to avoid too many
nested layers of exception handling.
@hipudding
Contributor Author

@microsoft-github-policy-service agree

hipudding added a commit to hipudding/DeepSpeed that referenced this pull request Jun 29, 2023
The error message shown while checking an accelerator override lists all
supported accelerators. Build that message from an accelerator list
instead of hard-coding the accelerator names.

Also fix a code format issue (yapf).
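
A minimal sketch of the idea in this commit message, assuming a hypothetical `SUPPORTED_ACCELERATORS` list and a hypothetical `validate_accelerator_override` helper (neither name is taken from the PR): the error text is derived from the list, so adding a new accelerator updates the message automatically.

```python
# Hypothetical list for illustration; the actual list in the PR may differ.
SUPPORTED_ACCELERATORS = ["cuda", "cpu", "xpu", "npu"]


def validate_accelerator_override(name):
    """Return the name if it is supported, otherwise raise with the full list."""
    if name not in SUPPORTED_ACCELERATORS:
        # The message is built from the list rather than hard-coded names.
        raise ValueError(
            f"Accelerator override '{name}' is not supported; "
            f"expected one of {SUPPORTED_ACCELERATORS}."
        )
    return name
```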
hipudding requested a review from tjruwase June 29, 2023 01:23
hipudding added two more commits to hipudding/DeepSpeed that referenced this pull request Jun 29, 2023, and another two Jun 30, 2023; their messages repeat the two commit messages above.
@hipudding
Contributor Author

@tjruwase Good day. Could you please review this PR again and trigger workflows? Thanks.

@tjruwase
Contributor

@hipudding, thanks for this PR. We are reviewing currently. Apologies for the delay.

@hipudding
Contributor Author

:-) Thanks.

@hipudding
Contributor Author

hipudding commented Jul 7, 2023

  1. Rebased my commits onto master HEAD to avoid too many merges back from master.
  2. Changed the commit title to remove the PR number, because it is added automatically when the PR is merged.
  3. Added two more commits.

BTW, is this PR still under review? Any review suggestions? Please let me know if there is anything else that needs to be done.

hipudding added 3 commits July 7, 2023 01:29
NPU accelerator support was introduced in (deepspeedai#3595).
This commit provides two enhancements:
  1. Add a new accelerator_name, 'npu', which can be specified by an
environment variable or auto-detected.
  2. Optimize the auto-detection code in get_accelerator to avoid too many
nested layers of exception handling.

The error message shown while checking an accelerator override lists all
supported accelerators. Build that message from an accelerator list
instead of hard-coding the accelerator names. Also fix a code format
issue (yapf).

HCCL is the distributed backend of Ascend NPU; it is already implemented in
the NPU plugin for PyTorch (https://gitee.com/ascend/pytorch). Add HCCL
as a not-implemented backend to avoid the "not supported" warning.
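
The HCCL change can be pictured with a small sketch. Everything here is hypothetical (the set names, `classify_backend`, and the returned labels are not from the PR): a backend that is known but provided by an external plugin is recognized quietly, while a truly unknown name still triggers a warning.

```python
import warnings

# Hypothetical backend sets for illustration only.
IMPLEMENTED_BACKENDS = {"nccl", "mpi"}
# Known backends that an external plugin provides, so no warning is needed.
EXTERNAL_BACKENDS = {"hccl"}


def classify_backend(name):
    """Return how a communication backend name should be treated."""
    if name in IMPLEMENTED_BACKENDS:
        return "implemented"
    if name in EXTERNAL_BACKENDS:
        # Recognized, but left to the plugin: stay quiet.
        return "not implemented"
    warnings.warn(f"Backend '{name}' is not supported")
    return "unsupported"
```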
hipudding added 2 commits July 7, 2023 03:09
Ascend NPU does not implement any op yet. Leaving the npu folder empty will
throw NoneType[op_name] when an unsupported op is called, so add
NPUNotImplementedBuilder as the default builder.
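
As a rough illustration of the default-builder idea, assuming a hypothetical registry `NPU_BUILDERS` and a hypothetical `create_op_builder` lookup (the class body below is also a sketch, not the PR's implementation): an unknown op resolves to a placeholder builder that fails with a clear message instead of a lookup on `None`.

```python
class NPUNotImplementedBuilder:
    """Placeholder builder returned for ops the NPU backend does not provide."""

    def __init__(self, op_name):
        self.op_name = op_name

    def load(self):
        # Fail with a descriptive error rather than an opaque NoneType error.
        raise NotImplementedError(
            f"Op '{self.op_name}' is not implemented for the npu backend."
        )


# Hypothetical registry; empty because no NPU ops exist yet in this sketch.
NPU_BUILDERS = {}


def create_op_builder(op_name):
    """Return the registered builder, or the not-implemented default."""
    builder_cls = NPU_BUILDERS.get(op_name)
    if builder_cls is None:
        return NPUNotImplementedBuilder(op_name)
    return builder_cls()
```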
@hipudding
Contributor Author

It seems the failing test case is not related to these commits. Please re-trigger the failed workflow. Thanks.

hipudding force-pushed the npu_support branch 3 times, most recently from 7876e3f to 127ecce Compare July 13, 2023 06:21
hipudding requested review from delock and tjruwase July 13, 2023 06:25
1. The cpu and other backends implement their ops in subdirectories under
op_builder, so cuda_accelerator should skip these subdirectories.
2. Each backend will have its own NotImplementedBuilder, so add a device
prefix to the class name to distinguish them.
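
The first point can be sketched with the standard library. This is an assumption-laden illustration, not the PR's code: `list_builder_modules` and the file names in `demo` are hypothetical. `pkgutil.iter_modules` reports whether each entry is a package, which is enough to skip per-backend subdirectories.

```python
import pkgutil
import tempfile
from pathlib import Path


def list_builder_modules(op_builder_dir):
    """List plain modules in a directory, skipping sub-packages."""
    names = []
    for info in pkgutil.iter_modules([str(op_builder_dir)]):
        if info.ispkg:
            # Subdirectories such as cpu/ or npu/ belong to other backends.
            continue
        names.append(info.name)
    return sorted(names)


def demo():
    """Build a throwaway directory layout and list its top-level modules."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "fused_adam.py").write_text("")
        (root / "cpu_adam.py").write_text("")
        (root / "npu").mkdir()
        (root / "npu" / "__init__.py").write_text("")
        return list_builder_modules(root)
```

Calling `demo()` lists only the two top-level modules; the `npu` sub-package is skipped.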
hipudding requested a review from delock July 19, 2023 00:59
hipudding force-pushed the npu_support branch 4 times, most recently from 6b48a6a to eb8ab12 Compare July 19, 2023 01:27
@hipudding
Contributor Author

hipudding commented Jul 20, 2023

Good day. This PR is approved and ready to merge, but one workflow failed due to OOM (not related to these changes). Could you retrigger this workflow and merge this PR? Thanks.

tjruwase added this pull request to the merge queue Jul 22, 2023
Merged via the queue into deepspeedai:master with commit 23a11a3 Jul 22, 2023

3 participants