diff --git a/rfcs/APIs/20230928_api_design_for_diagonal_scatter.md b/rfcs/APIs/20230928_api_design_for_diagonal_scatter.md new file mode 100644 index 000000000..a31298493 --- /dev/null +++ b/rfcs/APIs/20230928_api_design_for_diagonal_scatter.md @@ -0,0 +1,255 @@ +# 标题 paddle.diagonal_scatter 设计文档 + +|API名称 | paddle.diagonal_scatter | +|---|---| +|提交作者 | 吴俊([bapijun] (https://github.com/bapijun)) | +|提交时间 | 2023-09-29 | +|版本号 | 此设计文档的版本号,如V1.0 | +|依赖飞桨版本 | 如无特殊情况,都应基于develop版本开发 | +|文件名 | 20230923_api_design_for_diagonal_scatter.md | + + +# 一、概述 +## 1、相关背景 +丰富Paddle的Tensor相关API,支持更多样的tensor操作 + +## 2、功能目标 + +对于一个Tensor,对于tensor a 和 b,将 b 中的内容按照索引的位置嵌入 a 中。如索引偏移量为0,则嵌入对角线位置。如索引偏移量 >0,则嵌入对角线上方,如偏移量 <0,则嵌入对角线下方。例如a = paddle.zeros([2,2]),b= paddle.ones([2]),输出为\[[1.0,0.0],[0.0,1.0]] +调用路径: +paddle.diagonal_scatter 作为独立的函数调用 +Tensor.diagonal_scatter,作为 Tensor 的方法使用 + +## 3、意义 + +为 Paddle 新增 `paddle.diagonal_scatter` API,丰富Paddle的Tensor相关API,支持更多样的tensor操作 + +# 二、飞桨现状 + +目前飞桨框架并不存在对应的api,可以通过其他的代码实现 + + +# 三、业内方案调研 + +### 1. Pytorch + +在 Pytorch 中使用的 API 格式如下: + +`torch.diagonal_scatter(input, src, offset=0, dim1=0, dim2=1)` + +- `input` 为 输入tensor,至少是2维。 +- `src` 为 tensor类型,用于填充input。 +- `offset` int类型,可选,决定是哪一个对角线,默认为0。 +- `dim1` int类型,可选,第一个维度来考虑对角线,默认为0。 +- `dim1` int类型,可选,第而个维度来考虑对角线,默认为1。 +其实现的代码如下 + +```cpp +// pytorch/aten/src/ATen/native/TensorShape.cpp +at::Tensor diagonal_scatter(const at::Tensor& self, const at::Tensor& src, int64_t offset, int64_t dim1, int64_t dim2) { + // See Note [*_scatter ops preserve strides] + auto output = clone_preserve_strides(self); + auto slice = output.diagonal(offset, dim1, dim2); + TORCH_CHECK(slice.sizes() == src.sizes(), "expected src to have a size equal to the slice of self. src size = ", src.sizes(), ", slice size = ", slice.sizes()); + slice.copy_(src); + return output; +} +``` +```cpp +// pytorch/aten/src/ATen/native/TensorShape.cpp +Tensor diagonal(const Tensor& self, int64_t offset, int64_t dim1_, int64_t dim2_) { + int64_t nDims = self.dim(); + int64_t dim1 = maybe_wrap_dim(dim1_, nDims); + int64_t dim2 = maybe_wrap_dim(dim2_, nDims); + TORCH_CHECK(dim1 != dim2, "diagonal dimensions cannot be identical ", dim1_, ", ", dim2_); + auto outnames = namedinference::compute_diagonal_outnames(self, dim1, dim2); + NoNamesGuard no_names_guard; + // NOLINTNEXTLINE(cppcoreguidelines-init-variables) + int64_t diag_size; + int64_t storage_offset = self.storage_offset(); + // compute storage offset and size for the diagonal + // for positive values of offset (above the main diagonal) + // "leftmost columns" (along dim2) are dropped + // for negative values of offset (below the main diagonal) + // "topmost rows" (along dim1) are dropped. + // Note that we invert +/- in the second to absorb the negative + // sign in the offset. + if (offset >= 0) { + diag_size = std::max(std::min(self.size(dim1), self.size(dim2)-offset), 0); + } else { + diag_size = std::max(std::min(self.size(dim1)+offset, self.size(dim2)), 0); + } + // NumPy allows you to specify offsets "off the end"; let's just be careful not to + // set a ridiculous storage_offset in that case (technically it shouldn't matter + // because there are no elements in the tensor, but let's be kosher). + if (diag_size == 0) { + // skip + } else if (offset >= 0) { + storage_offset += offset * self.stride(dim2); + } else { + storage_offset -= offset * self.stride(dim1); + } + // construct new size and stride: we drop dim1 and dim2 (maximum first for not changing the index of the minimum) + // the new ("joint") dimension is appended to the end of the shape / stride to match numpy semantics + DimVector sizes(self.sizes().begin(), self.sizes().end()); + DimVector strides(self.strides().begin(), self.strides().end()); + sizes.erase(sizes.begin() + std::max(dim1, dim2)); + strides.erase(strides.begin() + std::max(dim1, dim2)); + sizes.erase(sizes.begin() + std::min(dim1, dim2)); + strides.erase(strides.begin() + std::min(dim1, dim2)); + sizes.push_back(diag_size); + strides.push_back(self.stride(dim1)+self.stride(dim2)); + // return view with new parameters + auto result = self.as_strided(sizes, strides, storage_offset); + no_names_guard.reset(); + namedinference::propagate_names_if_nonempty(result, outnames); + return result; +} +``` + +```cpp +// Clones a tensor by cloning the underlying storage that it came from, +// which allows us to replicate the exact strides/storage_offset in the cloned tensor. +// Note [*_scatter ops preserve strides] +// In order for functionalization to preserve stride correctness, the *_scatter +// operators that it calls must preserve the striding behavior of their inputs. +// Specifically, the output of *_scatter(base, mutated_view, ...) +// should have identical size/stride/storage_offset to "base". +at::Tensor clone_preserve_strides(const at::Tensor& self) { + TORCH_INTERNAL_ASSERT(self.has_storage()); + // In cases where the input tensor has internal memory overlap, we cannot actually + // preserve the strides/storage_offset of the input tensor, because + // *_scatter ops will try to copy_() into the cloned tensor. + // However, this should **never** show up in functionalized user code; + // most aten ops that try to mutate a tensor with internal memory overlap would error anyway. + // + // The one place that this does come up is in autograd - if there's a select_scatter + // in the forward, then autograd will generate one for the backward. + // If the input to the select_scatter is grad_output, then this could be an expanded tensor + // with internal overlap. + if (at::has_internal_overlap(self) == at::MemOverlap::Yes) { + return self.clone(); + } + auto dtype_size = self.dtype().itemsize(); + auto nbytes = self.storage().sym_nbytes(); + TORCH_INTERNAL_ASSERT(nbytes % dtype_size == 0); + auto numel = nbytes / dtype_size; + auto self_full_size = self.as_strided_symint({std::move(numel)}, {1}, 0); + auto clone = self_full_size.clone(); + auto out = clone.as_strided_symint(self.sym_sizes(), self.sym_strides(), self.sym_storage_offset()); + return out; +} +``` + +### 2. TensorFlow + +没有找到对应的api + +### 3. MindSpore + +在 MindSpore 中使用的 API 格式如下: + +`mindspore.ops.diagonal_scatter` + +函数定义与pytorch类似 + +其实现的代码如下 +``` python +def _check_diagonal_scatter_shape(diag_shape, src_shape): + if diag_shape != src_shape: + raise ValueError(f"For diagonal_scatter, the shape of src should equal to the shape of input diagonal," + f"but got src.shape {src_shape} and diagonal shape {diag_shape}.") + + +def diagonal_scatter(input, src, offset=0, dim1=0, dim2=1): + + _check_is_tensor("input", input, "diagonal_scatter") + _check_is_tensor("src", src, "diagonal_scatter") + _check_is_int(offset, "offset", "diagonal_scatter") + _check_is_int(dim1, "dim1", "diagonal_scatter") + _check_is_int(dim2, "dim2", "diagonal_scatter") + input_diag = input.diagonal(offset, dim1, dim2) + _check_diagonal_scatter_shape(input_diag.shape, src.shape) + embed = ones_like(src) + embed = ops.diag_embed(embed, offset, dim1, dim2) + embed = input * embed + src = ops.diag_embed(src, offset, dim1, dim2) + return input + src - embed +``` + + +# 四、对比分析 + +这里倾向于pytorch和MindSpore的方式,其二者的实现思路类型,都是采用diagonal切片后,针对代码进行填入后才做处理,这里采用和MindSpore类似的方式去实现。 + +# 五、设计思路与实现方案 + +## 命名与参数设计 + +paddle.diagonal_scatter + + ```python +paddle.diagonal_scatter(x, y, offset=0, dim1=0, dim2=1, name=None) + ``` +参数定义: + +- `x(Tensor)`:输入张量,张量的维度至少为2维 +- `y(Tensor)`:嵌入张量,将会被嵌入到输入张量中 +- `offset(int, optional)`:偏移的对角线 + - 偏移量为0,则嵌入对角线位置 + - 偏移量大于0,则嵌入对角线上方 + - 偏移量小于0,则嵌入对角线下方 +- `dim1(int, optional)`:对角线的第一个维度,默认值为0 +- `dim2(int, optional)`:对角线的第二个维度,默认值为1 +- `name (str,optional)`:具体用法请参见 [Name](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_guides/low_level/program.html#api-guide-name),一般无需设置,默认值为 None + + +Tensor.diagonal_scatter + +```python +Tensor.diagonal_scatter(x, offset=0, dim1=0, dim2=1, name=None) + +``` +参数定义: + +- `x(Tensor)`:嵌入张量,将会被嵌入到输入张量中 +- `offset(int, optional)`:偏移的对角线 + - 偏移量为0,则嵌入对角线位置 + - 偏移量大于0,则嵌入对角线上方 + - 偏移量小于0,则嵌入对角线下方 +- `dim1(int, optional)`:对角线的第一个维度,默认值为0 +- `dim2(int, optional)`:对角线的第二个维度,默认值为1 +- `name (str,optional)`:具体用法请参见 [Name](https://www.paddlepaddle.org.cn/documentation/docs/zh/api_guides/low_level/program.html#api-guide-name),一般无需设置,默认值为 None + +## 底层OP设计 + +依赖已有的API(fill_diagonal_tensor或diagonal)实现,无需实现新的底层OP + +## API实现方案 + +参考MindSpore的方式去是实现对应的代码 + +# 六、测试和验收的考量 +参考:[新增API 测试及验收规范](https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/dev_guides/api_contributing_guides/api_accpetance_criteria_cn.html) + +可考虑一下场景: + +测试考虑以下case: + +- 校验diagonal_scatter答案的正确性,对比torch.diagonal_scatter进行校验 + +- 检查参数的正确性,比如是否为支持的数据类型,是否在offset/dim1/dim2设置有误时进行报错 + +- 检查input的维度是否符合大于等于2个维度 + +- 检查input的shape和src的维度是否是否能完成覆盖 + +# 七、可行性分析和排期规划 +方案实施难度可控,工期上可以满足在当前版本周期内开发完成 + +# 八、影响面 +需要进一步讨论的问题,开放性问题,有争议问题;对其他模块是否有影响 + +# 名词解释 + +# 附件及参考资料