[NPU] apply npu_identity to conv bn and copy2cpu, test=develop #48039
Your PR was submitted successfully. Thank you for contributing to this open-source project!
2ed79f7 to c9fd2b2
2329fb9 to ad49fa3
temp_tensor = npu_identity_ad_func(self->tensor, -1);
dense_tensor =
    std::dynamic_pointer_cast<phi::DenseTensor>(temp_tensor.impl());
}
In eager mode, when tensor_method_numpy copies the tensor to the CPU, this calls npu_identity_ad_func so that the output tensor is in the normal NCHW format rather than the NPU-specific ACL_FORMAT_NC1HWC0 format.
std::dynamic_pointer_cast<phi::DenseTensor>(tensor_out.impl());
tensor_buf_ptr = dense_tensor->data();
}
When a tensor's value is fetched through the tensor.numpy() interface, TensorToPyArray calls npu_identity_ad_func here to convert it to the normal output format.
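To illustrate why a raw device-to-host copy is not enough, here is a minimal pure-Python sketch of reading an NC1HWC0 buffer back into NCHW order. It assumes C0 = 16 and that unused channel slots are padding; `nc1hwc0_to_nchw` is a hypothetical helper written for this note, not the code that npu_identity_ad_func actually runs.

```python
def nc1hwc0_to_nchw(buf, nchw_shape, c0=16):
    """Read a flat NC1HWC0 buffer back into flat NCHW order.

    Hypothetical helper for illustration only; c0=16 is an assumption.
    """
    n, c, h, w = nchw_shape
    c1 = (c + c0 - 1) // c0  # number of channel blocks, rounded up
    out = [0.0] * (n * c * h * w)
    for ni in range(n):
        for ci in range(c):
            for hi in range(h):
                for wi in range(w):
                    # Source index in the 5-D layout [N, C1, H, W, C0].
                    src = (((ni * c1 + ci // c0) * h + hi) * w + wi) * c0 + ci % c0
                    # Destination index in the 4-D layout [N, C, H, W].
                    dst = ((ni * c + ci) * h + hi) * w + wi
                    out[dst] = buf[src]
    return out


# A 1x2x1x2 tensor whose NCHW values are [1, 2, 3, 4].
# In NC1HWC0 the channel index varies fastest, so the buffer is interleaved.
device_buf = [0.0] * 32
device_buf[0], device_buf[1] = 1.0, 3.0    # w=0: channels 0 and 1
device_buf[16], device_buf[17] = 2.0, 4.0  # w=1: channels 0 and 1
host_values = nc1hwc0_to_nchw(device_buf, (1, 2, 1, 2))
```

Copying the first four elements of `device_buf` directly would give interleaved and padded values instead of the logical tensor, which is the situation the conversion before the CPU copy is meant to avoid.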
<< out->dims() << ", please avoid using this kernel!";
*out = phi::EmptyLike<T, Context>(dev_ctx, *out);
VLOG(4) << "npu_identity op is only for NPU, please avoid using this kernel!";
out->ShareDataWith(x);
The regular CPU/GPU kernel now shares data with its input, so the output values are guaranteed to match the input.
new_ivar = self._grad_ivar()
if 'npu' in get_all_custom_device_type():
    new_ivar = paddle.incubate._npu_identity(x=new_ivar, format=-1)
new_ivar = new_ivar._copy_to(core.CPUPlace(), True)
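This change is for the op test unit-test file. There, a = inputs_grad_dict[inputs_to_check_name].gradient() fetches the backward output, so _npu_identity must be called here to get the data in the normal format before it is copied to the CPU.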
np.testing.assert_allclose(out.shape, self.shape, rtol=1e-08)
np.testing.assert_allclose(out.numpy(), self.x, rtol=1e-08)
Updated the unit test to check that the output of npu_identity matches its input.
@@ -52,7 +52,7 @@ def _npu_identity(x, format=-1):
         return _C_ops.npu_identity(x, format)

     if _in_legacy_dygraph():
-        return _legacy_C_ops.npu_identity(x, format)
+        return _legacy_C_ops.npu_identity(x, 'format', format)
Fixes the operator call in the _in_legacy_dygraph branch.
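The corrected line passes the attribute as a name followed by its value. The sketch below uses a hypothetical stand-in, `call_legacy_op`, to show why a bare positional value would be rejected under that kind of calling convention. It is an assumption about the general shape of the legacy interface, not Paddle's actual dispatcher.

```python
def call_legacy_op(x, *attrs):
    """Hypothetical stand-in for a legacy-style op entry point.

    Attributes are expected as alternating name/value arguments,
    e.g. call_legacy_op(x, 'format', -1).
    """
    if len(attrs) % 2 != 0:
        raise ValueError("attributes must be passed as name/value pairs")
    parsed = {}
    for name, value in zip(attrs[0::2], attrs[1::2]):
        if not isinstance(name, str):
            raise TypeError("attribute name must be a string")
        parsed[name] = value
    return x, parsed


# Name/value form, mirroring the fixed call in the diff.
ok_result = call_legacy_op("x", "format", -1)

# Bare positional form, mirroring the call that was replaced.
try:
    call_legacy_op("x", -1)
    bare_call_failed = False
except ValueError:
    bare_call_failed = True
```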
bias_storage = _C_ops.npu_identity(
    bias, 3
)  # ACL_FORMAT_NC1HWC0 = 3
bias_storage._share_underline_tensor_to(bias)
The NPU conv operator produces its output in ACL_FORMAT_NC1HWC0 format, so on NPU the bias must be converted to the same format before the Add is computed.
Hardware-related code must NOT exist in the API. If it exists temporarily, it needs to be marked with a TODO and cleaned up later.
    bias, 3
)  # ACL_FORMAT_NC1HWC0 = 3
bias_storage._share_underline_tensor_to(bias)
return _C_ops.add(pre_bias, bias)
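Same as above: the bias is converted to the same format for the computation. The original bias.shape = [6] must first be reshaped to the NCHW-format shape [1,1,6,1]; npu_identity then rearranges the data into the ACL_FORMAT_NC1HWC0 shape [1, 1, 1, 1, 16], and the result is fed to the NPU Add operator.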
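As a rough illustration of the rearrangement described in this comment, the pure-Python sketch below converts a flat NCHW buffer into the 5-D NC1HWC0 layout. It assumes C0 = 16 (matching the trailing 16 in the shape quoted in the comment), zero padding for unused channel slots, and that the 6 bias values sit on the channel axis. `nchw_to_nc1hwc0` is a hypothetical helper, not a Paddle or CANN API.

```python
def nchw_to_nc1hwc0(data, shape, c0=16):
    """Rearrange a flat NCHW buffer into a flat NC1HWC0 buffer.

    Hypothetical helper for illustration; c0=16 and zero padding
    are assumptions.
    """
    n, c, h, w = shape
    c1 = (c + c0 - 1) // c0  # channel blocks, rounded up
    out = [0.0] * (n * c1 * h * w * c0)
    for ni in range(n):
        for ci in range(c):
            for hi in range(h):
                for wi in range(w):
                    src = ((ni * c + ci) * h + hi) * w + wi
                    dst = (((ni * c1 + ci // c0) * h + hi) * w + wi) * c0 + ci % c0
                    out[dst] = data[src]
    return out, (n, c1, h, w, c0)


# Six bias values, treated here as 6 channels with H = W = 1.
bias = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
bias_5d, bias_5d_shape = nchw_to_nc1hwc0(bias, (1, 6, 1, 1))
```

With these assumptions the 6 values occupy the first 6 of the 16 C0 slots and the remaining slots are padding, which is how a 6-element bias ends up with 16 elements in the last dimension.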
Hardware-related code must NOT exist in the API. If it exists temporarily, it needs to be marked with a TODO and cleaned up later.
fixed
bias_trans._share_underline_tensor_to(self.bias)
mean_trans._share_underline_tensor_to(self._mean)
var_trans._share_underline_tensor_to(self._variance)
All input parameters of the NPU BN operator must be converted to ACL_FORMAT_NC1HWC0 format beforehand.
Hardware-related code must NOT exist in the API. If it exists temporarily, it needs to be marked with a TODO and cleaned up later.
fixed
LGTM
PR types
Others
PR changes
Others
Describe