
[NPU] apply npu_identity to conv bn and copy2cpu, test=develop #48039

Merged
merged 3 commits into from
Nov 28, 2022

Conversation

qili93
Contributor

@qili93 qili93 commented Nov 16, 2022

PR types

Others

PR changes

Others

Describe

  1. When a Tensor is in an NPU private storage format, the Identity OP must be used to transform it back to the origin format before copying it to CPU.
  2. The Conv and BatchNorm ops need to prepare the NPU storage format in advance to improve performance on Ascend910.
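As background for the storage-format transform these changes rely on, here is a plain-NumPy sketch (not Paddle code) of how the Ascend private NC1HWC0 layout relates to NCHW. The block size C0 = 16 is assumed, consistent with the [1, 1, 1, 1, 16] bias example discussed in this review; the function names are illustrative only.

```python
import numpy as np

C0 = 16  # assumed Ascend NC1HWC0 channel block size

def nchw_to_nc1hwc0(x):
    """Illustrative repack of an NCHW array into NC1HWC0: channels are
    zero-padded to a multiple of C0 and split into [N, C1, H, W, C0]."""
    n, c, h, w = x.shape
    c1 = -(-c // C0)  # ceil(c / C0)
    padded = np.zeros((n, c1 * C0, h, w), dtype=x.dtype)
    padded[:, :c] = x
    # [N, C1*C0, H, W] -> [N, C1, C0, H, W] -> [N, C1, H, W, C0]
    return padded.reshape(n, c1, C0, h, w).transpose(0, 1, 3, 4, 2)

def nc1hwc0_to_nchw(y, c):
    """Inverse repack; drops the channel padding. This is conceptually
    what an identity transform with format=-1 does before a CPU copy."""
    n, c1, h, w, c0 = y.shape
    x = y.transpose(0, 1, 4, 2, 3).reshape(n, c1 * c0, h, w)
    return x[:, :c]
```

A round trip through the two functions recovers the original NCHW array, which is why the transform must run before data is handed back to the user.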

@paddle-bot

paddle-bot bot commented Nov 16, 2022

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI result first. See the Paddle CI Manual for details.

@qili93 qili93 marked this pull request as draft November 16, 2022 08:08
@qili93 qili93 marked this pull request as ready for review November 23, 2022 01:53
@qili93 qili93 force-pushed the apply_npu_identity branch 2 times, most recently from 2329fb9 to ad49fa3 on November 24, 2022 10:04
temp_tensor = npu_identity_ad_func(self->tensor, -1);
dense_tensor =
std::dynamic_pointer_cast<phi::DenseTensor>(temp_tensor.impl());
}
Contributor Author

This ensures that in eager mode, when tensor_method_numpy is called to copy a tensor to CPU, npu_identity_ad_func is invoked so that the output is in the normal NCHW format rather than the NPU-specific ACL_FORMAT_NC1HWC0 format.
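The control flow of this copy-to-CPU path can be mocked in a few lines of Python. All names here (FakeTensor, npu_identity, to_cpu_numpy) are hypothetical stand-ins for illustration, not real Paddle APIs:

```python
# Hypothetical stand-ins sketching the eager-mode numpy() path.
NCHW, ACL_FORMAT_NC1HWC0 = 0, 3

class FakeTensor:
    def __init__(self, data, fmt=NCHW):
        self.data, self.fmt = data, fmt

def npu_identity(t, fmt=-1):
    # fmt=-1 means "restore the origin (NCHW) layout"
    return FakeTensor(t.data, NCHW) if fmt == -1 else FakeTensor(t.data, fmt)

def to_cpu_numpy(t):
    # Mirror of the change to tensor_method_numpy: convert any NPU
    # private format back to NCHW before exposing the buffer.
    if t.fmt != NCHW:
        t = npu_identity(t, -1)
    return t.data
```

The point of the guard is that tensors already in NCHW skip the extra identity op, so the normal CPU/GPU path is unaffected.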

std::dynamic_pointer_cast<phi::DenseTensor>(tensor_out.impl());
tensor_buf_ptr = dense_tensor->data();
}

Contributor Author

This makes TensorToPyArray call npu_identity_ad_func when a tensor's value is fetched through the tensor.numpy() interface, converting it to the normal output format.

<< out->dims() << ", please avoid using this kernel!";
*out = phi::EmptyLike<T, Context>(dev_ctx, *out);
VLOG(4) << "npu_identity op is only for NPU, please avoid using this kernel!";
out->ShareDataWith(x);
Contributor Author

The normal CPU/GPU kernel is changed to share data, ensuring that the input and output values are identical.
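The ShareDataWith semantics on CPU/GPU (the output aliases the input buffer instead of copying it) can be illustrated with a NumPy view; this is a sketch of the behavior, not the Paddle kernel itself:

```python
import numpy as np

def identity_share(x):
    """Sketch of an identity kernel that shares storage: the returned
    array is a view over the same buffer, so no copy is made and the
    output always matches the input."""
    out = x.view()  # no allocation; same underlying memory
    return out
```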

new_ivar = self._grad_ivar()
if 'npu' in get_all_custom_device_type():
new_ivar = paddle.incubate._npu_identity(x=new_ivar, format=-1)
new_ivar = new_ivar._copy_to(core.CPUPlace(), True)
Contributor Author

This modifies the op test files: a = inputs_grad_dict[inputs_to_check_name].gradient() fetches the backward output, which needs a _npu_identity call to obtain data in the normal format before it is copied to CPU.


np.testing.assert_allclose(out.shape, self.shape, rtol=1e-08)
np.testing.assert_allclose(out.numpy(), self.x, rtol=1e-08)
Contributor Author

Update the unit test to check that the output of npu_identity matches its input.

@@ -52,7 +52,7 @@ def _npu_identity(x, format=-1):
return _C_ops.npu_identity(x, format)

if _in_legacy_dygraph():
return _legacy_C_ops.npu_identity(x, format)
return _legacy_C_ops.npu_identity(x, 'format', format)
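The fix reflects the legacy dygraph calling convention, where operator attributes are passed as flattened name/value pairs ('format', format) rather than positionally. A rough mock of that convention (legacy_op is a hypothetical stand-in, not a Paddle API):

```python
def legacy_op(x, *attrs):
    """Mock of a _legacy_C_ops-style entry point: attributes arrive as
    alternating name/value pairs, e.g. ('format', -1), and are
    collected into a dict by the op."""
    attr_dict = dict(zip(attrs[::2], attrs[1::2]))
    return x, attr_dict

# New-style _C_ops.npu_identity(x, format) passes the value positionally;
# the legacy call must spell out the attribute name:
_, attrs = legacy_op([1.0], 'format', -1)
```

Passing the bare value without the attribute name, as the pre-fix code did, would leave the attribute pairing malformed.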
Contributor Author

Fix the operator call under _in_legacy_dygraph.

bias_storage = _C_ops.npu_identity(
bias, 3
) # ACL_FORMAT_NC1HWC0 = 3
bias_storage._share_underline_tensor_to(bias)
Contributor Author

The output of the NPU CONV operator is in ACL_FORMAT_NC1HWC0 format; on NPU, the bias must also be converted to the same format before computing Add.

Contributor

Hardware-related code must NOT exist in the API. If it exists temporarily, it needs to be marked with a TODO and cleaned up later.

bias, 3
) # ACL_FORMAT_NC1HWC0 = 3
bias_storage._share_underline_tensor_to(bias)
return _C_ops.add(pre_bias, bias)
Contributor Author

Same as above: convert BIAS to the same format for the computation. The original bias.shape = [6] must first be reshaped to the NCHW format [1, 6, 1, 1]; then npu_identity rearranges the underlying data into the ACL_FORMAT_NC1HWC0 format [1, 1, 1, 1, 16], which is then fed to the NPU Add operator.
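The bias repack described here can be sketched in NumPy, assuming the Ascend channel block size C0 = 16 (which matches the [1, 1, 1, 1, 16] result above); the function name is illustrative, not Paddle's API:

```python
import numpy as np

C0 = 16  # assumed Ascend NC1HWC0 channel block size

def bias_to_nc1hwc0(bias):
    """Sketch of the bias repack: a 1-D bias of C elements is viewed as
    NCHW [1, C, 1, 1], zero-padded to a multiple of C0 channels, and
    laid out as NC1HWC0 [N, C1, H, W, C0]."""
    c = bias.shape[0]
    c1 = -(-c // C0)  # ceil(c / C0)
    padded = np.zeros(c1 * C0, dtype=bias.dtype)
    padded[:c] = bias
    return padded.reshape(1, c1, 1, 1, C0)
```

With C = 6 this yields the [1, 1, 1, 1, 16] shape quoted above, with the last 10 slots zero-padded so the NPU Add operator sees matching layouts.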

Contributor

Hardware-related code must NOT exist in the API. If it exists temporarily, it needs to be marked with a TODO and cleaned up later.

Contributor Author

fixed

bias_trans._share_underline_tensor_to(self.bias)
mean_trans._share_underline_tensor_to(self._mean)
var_trans._share_underline_tensor_to(self._variance)

Contributor Author

All input parameters of the NPU BN operator must be converted to the ACL_FORMAT_NC1HWC0 format in advance.

Contributor

Hardware-related code must NOT exist in the API. If it exists temporarily, it needs to be marked with a TODO and cleaned up later.

Contributor Author

fixed

Contributor

@jeff41404 jeff41404 left a comment

LGTM

@qili93 qili93 merged commit 32143f4 into PaddlePaddle:develop Nov 28, 2022
@qili93 qili93 deleted the apply_npu_identity branch November 28, 2022 05:26