
[Feature package] Full feature support with Ascend NPU #4567

@hipudding

Background

Ascend is a full-stack AI computing infrastructure for industry applications and services based on Huawei Ascend processors and software. For more information about Ascend, see Ascend Community.

CANN (Compute Architecture of Neural Networks), developed by Huawei, is a heterogeneous computing architecture for AI.

PyTorch has officially announced support for Ascend NPU through the PrivateUse1 dispatch key; see the PrivateUse1 tutorial in the PyTorch documentation.

Previous work

NPU accelerator support has already been merged (see #3595, #3831), which makes it possible to use NPU as a backend accelerator for basic training and inference tasks. However, more features need to be implemented to achieve full support.

Sub tasks

Here is a list of features that need to be implemented or tested.

| Status | Feature | Assigned to |
|---|---|---|
| Done | Ascend Accelerator | @CurryRice233 |
| Done | Ascend Accelerator | @hipudding |
| Done | Unit tests | @RUAN-ZX |
| Done | FP16 | @minchao-sun, @wuhhu |
| Done | BF16 | @minchao-sun, @wuhhu |
| Done | Gradient Accumulation | @minchao-sun, @wuhhu |
| Done | Data Parallelism | @minchao-sun, @wuhhu |
| Done | Pipeline Parallelism | @RUAN-ZX |
| Done | Zero1 | @misstek |
| Done | Zero2 | @misstek |
| Done | Zero3 | @misstek |
| Done | Activation Checkpointing | @CurryRice233 |
| Done | Fused Adam | @CurryRice233 |
| Done | Mixture of Experts (MoE) | @wangshuai09 |
| Done | RLHF | @wangshuai09, @CurryRice233 |
| Done | ZeRO Offload | @hipudding, @CurryRice233 |
| Done | ZeRO Infinity | @misstek |
| Done | 1-bit Adam | @RUAN-ZX |
| Done | 1-bit LAMB | @RUAN-ZX |
| Done | 0/1 Adam | @minchao-sun |
| Done | Curriculum Learning | @minchao-sun |
| Done | Layer Dropping | @minchao-sun |
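Several of the features above (BF16/FP16, gradient accumulation, ZeRO stages, ZeRO Offload) are switched on through the regular DeepSpeed JSON config rather than through NPU-specific code. The sketch below shows one such config as a Python dict, assuming the NPU backend honors the same config keys as other accelerators. The world size, micro-batch size, and accumulation steps are hypothetical example values, and `check_batch_consistency` is a hypothetical helper that mirrors DeepSpeed's rule that `train_batch_size` must equal micro-batch size × accumulation steps × number of devices.

```python
import json

# Hypothetical example values (not from the issue): 8 NPUs,
# micro-batch of 4 per device, 8 gradient accumulation steps.
WORLD_SIZE = 8
MICRO_BATCH = 4
GRAD_ACCUM_STEPS = 8

ds_config = {
    # DeepSpeed requires train_batch_size == micro_batch * grad_accum * world_size.
    "train_batch_size": MICRO_BATCH * GRAD_ACCUM_STEPS * WORLD_SIZE,
    # The key name says "gpu" but applies per device for any accelerator.
    "train_micro_batch_size_per_gpu": MICRO_BATCH,
    "gradient_accumulation_steps": GRAD_ACCUM_STEPS,
    # BF16 mixed precision; use "fp16": {"enabled": True} for FP16 instead.
    "bf16": {"enabled": True},
    "zero_optimization": {
        # ZeRO stage 3 partitions parameters, gradients, and optimizer states.
        "stage": 3,
        # ZeRO-Offload: keep optimizer states in pinned host (CPU) memory.
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
}


def check_batch_consistency(cfg, world_size):
    """Hypothetical helper: True if the three batch-size fields agree."""
    expected = (
        cfg["train_micro_batch_size_per_gpu"]
        * cfg["gradient_accumulation_steps"]
        * world_size
    )
    return cfg["train_batch_size"] == expected


if __name__ == "__main__":
    assert check_batch_consistency(ds_config, WORLD_SIZE)
    # Serialize to the JSON form passed to DeepSpeed (e.g. via --deepspeed_config).
    print(json.dumps(ds_config, indent=2))
```

Switching between ZeRO stages 1, 2, and 3 only changes the `"stage"` value; the batch-size check is worth keeping because a mismatch between the three fields is rejected at engine initialization.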
