From 49dffed6da62fd64f7d4f685b573fac8b6c87a0c Mon Sep 17 00:00:00 2001
From: PommesPeter <434596665@qq.com>
Date: Sat, 9 Jul 2022 17:37:58 +0800
Subject: [PATCH 1/7] added bucketize rfc docs

---
 .../APIs/20220709_api_design_for_bucketize.md | 158 ++++++++++++++++++
 1 file changed, 158 insertions(+)
 create mode 100644 rfcs/APIs/20220709_api_design_for_bucketize.md

diff --git a/rfcs/APIs/20220709_api_design_for_bucketize.md b/rfcs/APIs/20220709_api_design_for_bucketize.md
new file mode 100644
index 000000000..0afcb6c11
--- /dev/null
+++ b/rfcs/APIs/20220709_api_design_for_bucketize.md
@@ -0,0 +1,158 @@
+# paddle.bucketize Design Document
+
+| API name | paddle.bucketize |
+| ------------ | ---------------------------------------- |
+| Author | PommesPeter |
+| Submission date | 2022-07-09 |
+| Version | V1.0 |
+| Required Paddle version | develop |
+| File name | 20220709_api_design_for_bucketize.md |
+
+# I. Overview
+
+## 1. Background
+
+To enrich the PaddlePaddle API set and support scientific computing APIs, Paddle needs to add the API `paddle.bucketize`.
+
+## 2. Goals
+
+Add the API `paddle.bucketize`, which computes the bucket (interval) index of each element of `x` with respect to the sequence `sorted_sequence`.
+
+## 3. Significance
+
+Adds a bucketization function to Paddle, enriching the scientific computing APIs in `paddle`.
+
+# II. Current State in Paddle
+
+- Paddle currently has no `bucketize` API, but it does have `searchsorted`. Compared with other frameworks, Paddle lacks an API dedicated to a one-dimensional `sorted_sequence`, and using `searchsorted` directly spends time on dimension checks.
+- The implementation and tests of this API mainly follow the existing `paddle.searchsorted`.
+
+# III. Survey of Industry Solutions
+
+## PyTorch
+
+PyTorch provides the `torch.bucketize` API, with signature `torch.bucketize(input, boundaries, *, out_int32=False, right=False, out=None) → Tensor`.
+
+The PyTorch documentation describes it as:
+
+> Returns the indices of the buckets to which each value in the `input` belongs, where the boundaries of the buckets are set by `boundaries`. Return a new tensor with the same size as `input`. If `right` is False (default), then the left boundary is closed. More formally, the returned index satisfies the following rules:
+>
+> | `right` | *returned index satisfies* |
+> | ------- | --------------------------------------------------------- |
+> | False | `boundaries[i-1] < input[m][n]...[l][x] <= boundaries[i]` |
+> | True | `boundaries[i-1] <= input[m][n]...[l][x] < boundaries[i]` |
+
+In terms of implementation, PyTorch composes it from C++ APIs; see the [source code](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/Bucketization.cpp)
+
+Parameters:
+
+- input (Tensor or Scalar): N-D Tensor.
+
+- boundaries (Tensor): 1-D Tensor; must contain a monotonically increasing sequence.
+
+- out_int32 (bool, optional): selects the output data type. If True, the output is torch.int32; if False, it is torch.int64. Defaults to False.
+
+- right (bool, optional): if False, return the first suitable position found; if True, return the last such index. If no suitable index is found, return 0 for non-numerical values (for example NaN, Inf) or the size of `boundaries` (one past the last index).
+
+  In other words, if False, the lower-bound index in `boundaries` is returned for each value in the input; if True, the upper-bound index is returned instead. Defaults to False.
+
+- out (Tensor, optional): the output Tensor; if provided, it must be the same size as the input Tensor.
+
+## TensorFlow
+
+TensorFlow Transform (`tf.Transform`) provides the `tft.bucketize` API, with signature `tft.bucketize(x: common_types.ConsistentTensorType, num_buckets: int, epsilon: Optional[float] = None, weights: Optional[tf.Tensor] = None, elementwise: bool = False, name: Optional[str] = None) -> common_types.ConsistentTensorType`
+
+In terms of implementation, TensorFlow composes it from Python APIs; see the [source code](https://github.com/tensorflow/transform/blob/d0c3349403120a2cf1177c111b674c07e9b38398/tensorflow_transform/mappers.py#L1690-L1770)
+
+Parameters:
+
+| Args | |
+| :------------ | ------------------------------------------------------------ |
+| `x` | A numeric input `Tensor` or `CompositeTensor` whose values should be mapped to buckets. For a `CompositeTensor`, only non-missing values are included in the quantiles computation, and the result of `bucketize` is a `CompositeTensor` with its non-missing values mapped to buckets. If elementwise=True, `x` must be dense. |
+| `num_buckets` | Values in the input `x` are divided into approximately equal-sized buckets, where the number of buckets is `num_buckets`. |
+| `epsilon` | (Optional) Error tolerance, typically a small fraction close to zero. If the caller does not specify a value, a suitable one is computed based on experimental results. For `num_buckets` less than 100, the value 0.01 is chosen to handle a dataset of up to about 1 trillion input values. If `num_buckets` is larger, epsilon is set to (1 / `num_buckets`) to enforce a stricter error tolerance, because more buckets mean a smaller range for each bucket, so the boundaries should be less fuzzy. See analyzers.quantiles() for details. |
+| `weights` | (Optional) Weights tensor for the quantiles. The tensor must have the same shape as x. |
+| `elementwise` | (Optional) If true, bucketize each element of the tensor independently. |
+| `name` | (Optional) A name for this operation. |
+
+# IV. Comparative Analysis
+
+## Similarities
+
+- Both compute, for each element of the input `x`, the index of the bucket it falls into according to a set of bucket boundaries.
+
+## Differences
+
+- PyTorch is implemented on top of its C++ API, and Python calls the corresponding C++ interface.
+- PyTorch has a simpler parameter list with fewer options.
+- TensorFlow implements the functionality directly with Python APIs.
+- TensorFlow exposes parameters such as `num_buckets`, `epsilon` and `weights`, so it is more configurable.
+
+
+# V. Design and Implementation
+
+## Naming and parameter design
+
+Add the API
+
+```python
+paddle.bucketize(
+    x: Tensor,
+    sorted_sequence: Tensor,
+    out_int32: bool=False,
+    right: bool=False,
+    name: str=None
+)
+```
+
+## Underlying OP design
+
+The API is composed from existing APIs; no separate OP is designed.
+
+## API implementation
+
+The API is implemented in `python/paddle/tensor/search.py`
+
+First, `bucketize` targets a one-dimensional `sorted_sequence`, so the number of dimensions of the input must be checked. The check is done with an assertion, which raises `AssertionError` when the input is not one-dimensional.
+
+Next, Paddle already implements the `searchsorted` logic in the `searchsorted` function in `python/paddle/tensor/search.py`, so `bucketize` only needs to call that function.
+
+# VI. Testing and Acceptance Considerations
+
+The test cases to consider are:
+
+- Consistency of numerical results, using numpy as the reference
+- Correctness of the output when `right` is True and when it is False
+- Correctness of the output dtype when `out_int32` is True and when it is False;
+- Correctness of the output when `right` is not passed;
+- Correctness of the output when `out_int32` is not passed;
+
+# VII. Feasibility Analysis and Schedule
+
+The proposal is mainly a composition of existing Paddle APIs, and the `paddle.searchsorted` it depends on is already available in the Paddle repo at [python/paddle/tensor/search.py](https://github.com/PaddlePaddle/Paddle/blob/release/2.3/python/paddle/tensor/search.py#L910). The work can be completed within the current release cycle.
+
+# VIII. Impact
+
+This is a newly added API composed from existing ones, so it is not expected to affect other modules.
+
+# Glossary
+
+None
+
+# Appendix and References
+
+## PyTorch
+
+[torch.bucketize](https://pytorch.org/docs/stable/generated/torch.bucketize.html)
+
+[torch.searchsorted](https://pytorch.org/docs/stable/generated/torch.searchsorted.html?highlight=searchsorted#torch.searchsorted)
+
+## TensorFlow
+
+[tf.transform.bucketize](https://www.tensorflow.org/tfx/transform/api_docs/python/tft/bucketize)
+
+[tf.searchsorted](https://www.tensorflow.org/api_docs/python/tf/searchsorted)
+
+## Paddle
+
+[paddle.searchsorted](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/searchsorted_cn.html)
\ No newline at end of file
From
b604beb9fd5066226e1eaa6fdfd9ff790176bf12 Mon Sep 17 00:00:00 2001
From: PommesPeter <434596665@qq.com>
Date: Sun, 10 Jul 2022 16:52:54 +0800
Subject: [PATCH 2/7] updated bucketize rfc docs

---
 .../APIs/20220709_api_design_for_bucketize.md | 312 ++++++++++++++++++
 1 file changed, 312 insertions(+)

diff --git a/rfcs/APIs/20220709_api_design_for_bucketize.md b/rfcs/APIs/20220709_api_design_for_bucketize.md
index 0afcb6c11..f84ecca1f 100644
--- a/rfcs/APIs/20220709_api_design_for_bucketize.md
+++ b/rfcs/APIs/20220709_api_design_for_bucketize.md
@@ -44,6 +44,236 @@ PyTorch provides the `torch.bucketize` API, with signature `torch.bucketize(input

 In terms of implementation, PyTorch composes it from C++ APIs; see the [source code](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/Bucketization.cpp)

+Implementation code:
+```cpp
+#include <ATen/ATen.h>
+#include <ATen/Dispatch.h>
+#include <ATen/Parallel.h>
+#include <ATen/native/BucketizationUtils.h>
+#include <ATen/native/Resize.h>
+#include <c10/util/irange.h>
+
+/* Implement a numpy like searchsorted and a TF like bucketize function running on cpu
+ *
+ * - torch.searchsorted(sorted_sequence, values, right=False, side='left', out_int32=False, sorter=None)
+ *   sorted_sequence - N*D or 1D (apply to all values) tensor containing sorted sequences in last dimension
+ *   values          - N*D tensor or a Scalar (when sorted_sequence is 1D) containing the search values
+ *   right           - corresponding to lower bound if False and upper bound if True
+ *   side            - (preferred to right) corresponding to lower bound if 'left' and upper bound if 'right'
+ *   out_int32       - the output tensor is int64_t type if False and int(32bit normally) type if True.
+ *   sorter          - if provided, sorted_sequence may not be sorted and the sorted order is given by this tensor
+ *
+ * - torch.bucketize(values, boundaries, right=False, out_int32=False)
+ *   values     - N*D tensor or a Scalar containing the search value
+ *   boundaries - 1D tensor containing a sorted sequences
+ *   right      - corresponding to lower bound if False and upper bound if True
+ *   out_int32  - the output tensor is int64_t type if False and int(32bit normally) type if True.
+ *
+ * - Restrictions are defined in searchsorted_pre_check()
+ */
+
+namespace at {
+namespace native {
+
+namespace {
+
+// minimal size for searchsorted_cpu_contiguous to run parallel (multithread)
+constexpr int64_t SEARCHSORTED_GRAIN_SIZE = 200;
+
+// customized lower_bound func to ensure the low bound of 'nan', 'inf' etc. be the end of boundary
+// and we can properly handle a sorter argument
+// std::lower_bound can not be used here since its customized comparator need strict weak ordering
+// and the customized comparators require both arguments to have the same type, which wouldn't
+// happen when comparing val of input_t to an indexer value from sorter of int64
+template<typename input_t>
+int64_t cus_lower_bound(int64_t start, int64_t end, const input_t val, const input_t* bd, const int64_t* sort) {
+  // sorter gives relative ordering for ND tensors, so we need to save and add the non-updated start as an offset
+  // i.e. the second row of a 3x3 tensors starts at element 3 but sorter's second row only contains 0, 1, or 2
+  const int64_t orig_start = start;
+  while (start < end) {
+    const int64_t mid = start + ((end - start) >> 1);
+    const input_t mid_val = sort ? bd[sort[mid] + orig_start] : bd[mid];
+    if (!(mid_val >= val)) {
+      start = mid + 1;
+    }
+    else {
+      end = mid;
+    }
+  }
+  return start;
+}
+
+// customized upper_bound func to ensure we can properly handle a sorter argument
+// std::upper_bound can not be used here since its customized comparator requires both arguments to have the
+// same type, which wouldn't happen when comparing val of input_t to an indexer value from sorter of int64
+template<typename input_t>
+int64_t cus_upper_bound(int64_t start, int64_t end, const input_t val, const input_t* bd, const int64_t* sort) {
+  // sorter gives relative ordering for ND tensors, so we need to save and add the non-updated start as an offset
+  // i.e. the second row of a 3x3 tensors starts at element 3 but sorter's second row only contains 0, 1, or 2
+  const int64_t orig_start = start;
+  while (start < end) {
+    const int64_t mid = start + ((end - start) >> 1);
+    const input_t mid_val = sort ? bd[sort[mid] + orig_start] : bd[mid];
+    if (!(mid_val > val)) {
+      start = mid + 1;
+    }
+    else {
+      end = mid;
+    }
+  }
+  return start;
+}
+
+template<typename input_t, typename output_t>
+void searchsorted_cpu_contiguous(Tensor& result, const Tensor& input, const Tensor& boundaries, const bool& right, const Tensor& sorter) {
+  int64_t numel_in = input.numel();
+  bool is_scalar_input = input.dim() == 0 && numel_in == 1;
+  // inner most dim size of input and boundaries
+  int64_t idim_in = is_scalar_input ? 1 : input.sizes().back();
+  int64_t idim_bd = boundaries.sizes().back();
+
+  const input_t *data_in = input.data_ptr<input_t>();
+  const input_t *data_bd = boundaries.data_ptr<input_t>();
+  const int64_t *data_st = sorter.defined() ? sorter.data_ptr<int64_t>() : nullptr;
+  output_t *data_out = result.data_ptr<output_t>();
+
+  bool is_1d_boundaries = boundaries.dim() == 1;
+  at::parallel_for(0, numel_in, SEARCHSORTED_GRAIN_SIZE, [&](int64_t start, int64_t end) {
+    for (const auto i : c10::irange(start, end)) {
+      // If boundaries tensor is 1d, we always search the entire boundary tensor
+      int64_t start_bd = is_1d_boundaries ? 0 : i / idim_in * idim_bd;
+      int64_t end_bd = start_bd + idim_bd;
+
+      int64_t pos = !right ?
+        cus_lower_bound(start_bd, end_bd, data_in[i], data_bd, data_st) - start_bd :
+        cus_upper_bound(start_bd, end_bd, data_in[i], data_bd, data_st) - start_bd;
+
+      // type conversion might happen here
+      data_out[i] = pos;
+    }
+  });
+}
+
+void dispatch(Tensor& result, const Tensor& input, const Tensor& boundaries, bool out_int32, bool right, const Tensor& sorter) {
+  if (!out_int32) {
+    AT_DISPATCH_ALL_TYPES_AND2(
+        ScalarType::Half,
+        ScalarType::BFloat16,
+        input.scalar_type(),
+        "searchsorted_out_cpu",
+        [&] {
+          searchsorted_cpu_contiguous<scalar_t, int64_t>(
+              result, input, boundaries, right, sorter);
+        });
+  }
+  else {
+    AT_DISPATCH_ALL_TYPES_AND2(
+        ScalarType::Half,
+        ScalarType::BFloat16,
+        input.scalar_type(),
+        "searchsorted_out_cpu",
+        [&] {
+          searchsorted_cpu_contiguous<scalar_t, int>(
+              result, input, boundaries, right, sorter);
+        });
+  }
+}
+
+}
+
+Tensor& searchsorted_out_cpu(
+    const Tensor& sorted_sequence,
+    const Tensor& self,
+    bool out_int32,
+    bool right,
+    const c10::optional<c10::string_view> side_opt,
+    const c10::optional<Tensor>& sorter_opt,
+    Tensor& result) {
+  // See [Note: hacky wrapper removal for optional tensor]
+  c10::MaybeOwned<Tensor> sorter_maybe_owned = at::borrow_from_optional_tensor(sorter_opt);
+  const Tensor& sorter = *sorter_maybe_owned;
+  searchsorted_pre_check(sorted_sequence, self, result, out_int32, right, side_opt, sorter);
+  resize_output(result, self.sizes());
+
+  // we have two inputs to set right, pre_check checks that they aren't set to opposites
+  bool is_right = side_opt ? *side_opt == "right" : right;
+
+  if (self.numel() == 0) {
+    return result;
+  }
+
+  // for non-contiguous result tensors, we write the output to a contiguous copy so we can later copy back, maintaining the original result tensor
+  Tensor out = result;
+  if (!result.is_contiguous()) {
+    out = result.contiguous();
+  }
+  if (sorted_sequence.is_contiguous() && self.is_contiguous() && sorted_sequence.dtype() == self.dtype() && sorter.is_contiguous()) {
+    dispatch(out, self, sorted_sequence, out_int32, is_right, sorter);
+  }
+  else {
+    Tensor trimmed_input;
+    Tensor trimmed_boundaries;
+    Tensor trimmed_sorter;
+    searchsorted_maybe_trim_input_tensors(trimmed_input, trimmed_boundaries, trimmed_sorter, self, sorted_sequence, sorter);
+    const Tensor& final_input = trimmed_input.defined() ? trimmed_input : self;
+    const Tensor& final_boundaries = trimmed_boundaries.defined() ? trimmed_boundaries : sorted_sequence;
+    const Tensor& final_sorter = trimmed_sorter.defined() ? trimmed_sorter : sorter;
+    dispatch(out, final_input, final_boundaries, out_int32, is_right, final_sorter);
+  }
+
+  // if result is non-contiguous, we wrote the answer to a copied version, so we copy back to the original result tensor
+  if (!result.is_contiguous()) {
+    result.copy_(out);
+  }
+  return result;
+}
+
+Tensor searchsorted_cpu(
+    const Tensor& sorted_sequence,
+    const Tensor& self,
+    bool out_int32,
+    bool right,
+    const c10::optional<c10::string_view> side_opt,
+    const c10::optional<Tensor>& sorter_opt) {
+  ScalarType scalar_type = out_int32 ? ScalarType::Int : ScalarType::Long;
+  c10::TensorOptions options = TensorOptions().device(self.options().device()).dtype(scalar_type);
+  Tensor result = at::empty({0}, options, MemoryFormat::Contiguous);
+  at::native::searchsorted_out_cpu(sorted_sequence, self, out_int32, right, side_opt, sorter_opt, result);
+  return result;
+}
+
+Tensor searchsorted_cpu(
+    const Tensor& sorted_sequence,
+    const Scalar& self,
+    bool out_int32,
+    bool right,
+    const c10::optional<c10::string_view> side_opt,
+    const c10::optional<Tensor>& sorter_opt) {
+  const Tensor& scalar_tensor = searchsorted_scalar_tensor(self, sorted_sequence.device());
+  return searchsorted_cpu(sorted_sequence, scalar_tensor, out_int32, right, side_opt, sorter_opt);
+}
+
+Tensor& bucketize_out_cpu(const Tensor& self, const Tensor& boundaries, bool out_int32, bool right, Tensor& result) {
+  TORCH_CHECK(boundaries.dim() == 1, "boundaries tensor must be 1 dimension, but got dim(", boundaries.dim(), ")");
+  at::native::searchsorted_out_cpu(boundaries, self, out_int32, right, nullopt, nullopt, result);
+  return result;
+}
+
+Tensor bucketize_cpu(const Tensor& self, const Tensor& boundaries, bool out_int32, bool right) {
+  ScalarType scalar_type = out_int32 ? ScalarType::Int : ScalarType::Long;
+  c10::TensorOptions options = TensorOptions().device(self.options().device()).dtype(scalar_type);
+  Tensor result = at::empty({0}, options, MemoryFormat::Contiguous);
+  at::native::bucketize_out_cpu(self, boundaries, out_int32, right, result);
+  return result;
+}
+
+Tensor bucketize_cpu(const Scalar& self, const Tensor& boundaries, bool out_int32, bool right) {
+  return bucketize_cpu(searchsorted_scalar_tensor(self, boundaries.device()), boundaries, out_int32, right);
+}
+
+}} // namespace at::native
+```
+
 Parameters:

 - input (Tensor or Scalar): N-D Tensor.
@@ -64,6 +294,88 @@ TensorFlow Transform (`tf.Transform`) provides the `tft.bucketize` API, with sig

 In terms of implementation, TensorFlow composes it from Python APIs; see the [source code](https://github.com/tensorflow/transform/blob/d0c3349403120a2cf1177c111b674c07e9b38398/tensorflow_transform/mappers.py#L1690-L1770)

+Code implementation:
+```python
+@common.log_api_use(common.MAPPER_COLLECTION)
+def bucketize(x: common_types.ConsistentTensorType,
+              num_buckets: int,
+              epsilon: Optional[float] = None,
+              weights: Optional[tf.Tensor] = None,
+              elementwise: bool = False,
+              name: Optional[str] = None) -> common_types.ConsistentTensorType:
+  """Returns a bucketized column, with a bucket index assigned to each input.
+  Args:
+    x: A numeric input `Tensor` or `CompositeTensor` whose values should be
+      mapped to buckets.  For a `CompositeTensor` only non-missing values will
+      be included in the quantiles computation, and the result of `bucketize`
+      will be a `CompositeTensor` with non-missing values mapped to buckets. If
+      elementwise=True then `x` must be dense.
+    num_buckets: Values in the input `x` are divided into approximately
+      equal-sized buckets, where the number of buckets is `num_buckets`.
+    epsilon: (Optional) Error tolerance, typically a small fraction close to
+      zero. If a value is not specified by the caller, a suitable value is
+      computed based on experimental results.  For `num_buckets` less than 100,
+      the value of 0.01 is chosen to handle a dataset of up to ~1 trillion input
+      data values.  If `num_buckets` is larger, then epsilon is set to
+      (1/`num_buckets`) to enforce a stricter error tolerance, because more
+      buckets will result in smaller range for each bucket, and so we want the
+      boundaries to be less fuzzy. See analyzers.quantiles() for details.
+    weights: (Optional) Weights tensor for the quantiles. Tensor must have the
+      same shape as x.
+    elementwise: (Optional) If true, bucketize each element of the tensor
+      independently.
+    name: (Optional) A name for this operation.
+  Returns:
+    A `Tensor` of the same shape as `x`, with each element in the
+    returned tensor representing the bucketized value. Bucketized value is
+    in the range [0, actual_num_buckets). Sometimes the actual number of buckets
+    can be different than num_buckets hint, for example in case the number of
+    distinct values is smaller than num_buckets, or in cases where the
+    input values are not uniformly distributed.
+    NaN values are mapped to the last bucket. Values with NaN weights are
+    ignored in bucket boundaries calculation.
+  Raises:
+    TypeError: If num_buckets is not an int.
+    ValueError: If value of num_buckets is not > 1.
+    ValueError: If elementwise=True and x is a `CompositeTensor`.
+  """
+  with tf.compat.v1.name_scope(name, 'bucketize'):
+    if not isinstance(num_buckets, int):
+      raise TypeError('num_buckets must be an int, got %s' % type(num_buckets))
+
+    if num_buckets < 1:
+      raise ValueError('Invalid num_buckets %d' % num_buckets)
+
+    if isinstance(x, (tf.SparseTensor, tf.RaggedTensor)) and elementwise:
+      raise ValueError(
+          'bucketize requires `x` to be dense if `elementwise=True`')
+
+    if epsilon is None:
+      # See explanation in args documentation for epsilon.
+      epsilon = min(1.0 / num_buckets, 0.01)
+
+    x_values = tf_utils.get_values(x)
+    bucket_boundaries = analyzers.quantiles(
+        x_values,
+        num_buckets,
+        epsilon,
+        weights,
+        reduce_instance_dims=not elementwise)
+
+    if not elementwise:
+      return apply_buckets(x, bucket_boundaries)
+
+    num_features = tf.math.reduce_prod(x.get_shape()[1:])
+    bucket_boundaries = tf.reshape(bucket_boundaries, [num_features, -1])
+    x_reshaped = tf.reshape(x, [-1, num_features])
+    bucketized = []
+    for idx, boundaries in enumerate(tf.unstack(bucket_boundaries, axis=0)):
+      bucketized.append(apply_buckets(x_reshaped[:, idx],
+                                      tf.expand_dims(boundaries, axis=0)))
+    return tf.reshape(tf.stack(bucketized, axis=1),
+                      [-1] + x.get_shape().as_list()[1:])
+```
+
 Parameters:

 | Args | |
From 8413595cb21b7404c73cd333d17a555b31df8397 Mon Sep 17 00:00:00 2001
From: PommesPeter <434596665@qq.com>
Date: Mon, 11 Jul 2022 18:06:21 +0800
Subject: [PATCH 3/7] update: modified part 3 and part 6

---
 .../APIs/20220709_api_design_for_bucketize.md | 200 +-----------------
 1 file changed, 7 insertions(+), 193 deletions(-)

diff --git a/rfcs/APIs/20220709_api_design_for_bucketize.md b/rfcs/APIs/20220709_api_design_for_bucketize.md
index f84ecca1f..85e5c202c 100644
--- a/rfcs/APIs/20220709_api_design_for_bucketize.md
+++ b/rfcs/APIs/20220709_api_design_for_bucketize.md
@@ -46,140 +46,16 @@ PyTorch provides the `torch.bucketize` API, with signature `torch.bucketize(input

 Implementation code:
 ```cpp
-#include <ATen/ATen.h>
-#include <ATen/Dispatch.h>
-#include <ATen/Parallel.h>
-#include <ATen/native/BucketizationUtils.h>
-#include <ATen/native/Resize.h>
-#include <c10/util/irange.h>
-
-/* Implement a numpy like searchsorted and a TF like bucketize function running on cpu
- *
- * - torch.searchsorted(sorted_sequence, values, right=False, side='left', out_int32=False, sorter=None)
- *   sorted_sequence - N*D or 1D (apply to all values) tensor containing sorted sequences in last dimension
- *   values          - N*D tensor or a Scalar (when sorted_sequence is 1D) containing the search values
- *   right           - corresponding to lower bound if False and upper bound if True
- *   side            - (preferred to right) corresponding to lower bound if 'left' and upper bound if 'right'
- *   out_int32       - the output tensor is int64_t type if False and int(32bit normally) type if True.
- *   sorter          - if provided, sorted_sequence may not be sorted and the sorted order is given by this tensor
- *
- * - torch.bucketize(values, boundaries, right=False, out_int32=False)
- *   values     - N*D tensor or a Scalar containing the search value
- *   boundaries - 1D tensor containing a sorted sequences
- *   right      - corresponding to lower bound if False and upper bound if True
- *   out_int32  - the output tensor is int64_t type if False and int(32bit normally) type if True.
- *
- * - Restrictions are defined in searchsorted_pre_check()
- */
-
 namespace at {
 namespace native {

 namespace {

-// minimal size for searchsorted_cpu_contiguous to run parallel (multithread)
-constexpr int64_t SEARCHSORTED_GRAIN_SIZE = 200;
-
-// customized lower_bound func to ensure the low bound of 'nan', 'inf' etc. be the end of boundary
-// and we can properly handle a sorter argument
-// std::lower_bound can not be used here since its customized comparator need strict weak ordering
-// and the customized comparators require both arguments to have the same type, which wouldn't
-// happen when comparing val of input_t to an indexer value from sorter of int64
-template<typename input_t>
-int64_t cus_lower_bound(int64_t start, int64_t end, const input_t val, const input_t* bd, const int64_t* sort) {
-  // sorter gives relative ordering for ND tensors, so we need to save and add the non-updated start as an offset
-  // i.e. the second row of a 3x3 tensors starts at element 3 but sorter's second row only contains 0, 1, or 2
-  const int64_t orig_start = start;
-  while (start < end) {
-    const int64_t mid = start + ((end - start) >> 1);
-    const input_t mid_val = sort ? bd[sort[mid] + orig_start] : bd[mid];
-    if (!(mid_val >= val)) {
-      start = mid + 1;
-    }
-    else {
-      end = mid;
-    }
-  }
-  return start;
-}
+// ...

-// customized upper_bound func to ensure we can properly handle a sorter argument
-// std::upper_bound can not be used here since its customized comparator requires both arguments to have the
-// same type, which wouldn't happen when comparing val of input_t to an indexer value from sorter of int64
-template<typename input_t>
-int64_t cus_upper_bound(int64_t start, int64_t end, const input_t val, const input_t* bd, const int64_t* sort) {
-  // sorter gives relative ordering for ND tensors, so we need to save and add the non-updated start as an offset
-  // i.e. the second row of a 3x3 tensors starts at element 3 but sorter's second row only contains 0, 1, or 2
-  const int64_t orig_start = start;
-  while (start < end) {
-    const int64_t mid = start + ((end - start) >> 1);
-    const input_t mid_val = sort ? bd[sort[mid] + orig_start] : bd[mid];
-    if (!(mid_val > val)) {
-      start = mid + 1;
-    }
-    else {
-      end = mid;
-    }
-  }
-  return start;
 }

-template<typename input_t, typename output_t>
-void searchsorted_cpu_contiguous(Tensor& result, const Tensor& input, const Tensor& boundaries, const bool& right, const Tensor& sorter) {
-  int64_t numel_in = input.numel();
-  bool is_scalar_input = input.dim() == 0 && numel_in == 1;
-  // inner most dim size of input and boundaries
-  int64_t idim_in = is_scalar_input ? 1 : input.sizes().back();
-  int64_t idim_bd = boundaries.sizes().back();
-
-  const input_t *data_in = input.data_ptr<input_t>();
-  const input_t *data_bd = boundaries.data_ptr<input_t>();
-  const int64_t *data_st = sorter.defined() ? sorter.data_ptr<int64_t>() : nullptr;
-  output_t *data_out = result.data_ptr<output_t>();
-
-  bool is_1d_boundaries = boundaries.dim() == 1;
-  at::parallel_for(0, numel_in, SEARCHSORTED_GRAIN_SIZE, [&](int64_t start, int64_t end) {
-    for (const auto i : c10::irange(start, end)) {
-      // If boundaries tensor is 1d, we always search the entire boundary tensor
-      int64_t start_bd = is_1d_boundaries ? 0 : i / idim_in * idim_bd;
-      int64_t end_bd = start_bd + idim_bd;
-
-      int64_t pos = !right ?
-        cus_lower_bound(start_bd, end_bd, data_in[i], data_bd, data_st) - start_bd :
-        cus_upper_bound(start_bd, end_bd, data_in[i], data_bd, data_st) - start_bd;
-
-      // type conversion might happen here
-      data_out[i] = pos;
-    }
-  });
-}
-
-void dispatch(Tensor& result, const Tensor& input, const Tensor& boundaries, bool out_int32, bool right, const Tensor& sorter) {
-  if (!out_int32) {
-    AT_DISPATCH_ALL_TYPES_AND2(
-        ScalarType::Half,
-        ScalarType::BFloat16,
-        input.scalar_type(),
-        "searchsorted_out_cpu",
-        [&] {
-          searchsorted_cpu_contiguous<scalar_t, int64_t>(
-              result, input, boundaries, right, sorter);
-        });
-  }
-  else {
-    AT_DISPATCH_ALL_TYPES_AND2(
-        ScalarType::Half,
-        ScalarType::BFloat16,
-        input.scalar_type(),
-        "searchsorted_out_cpu",
-        [&] {
-          searchsorted_cpu_contiguous<scalar_t, int>(
-              result, input, boundaries, right, sorter);
-        });
-  }
-}
-
-}
+// ...

 Tensor& searchsorted_out_cpu(
     const Tensor& sorted_sequence,
@@ -189,20 +65,18 @@ Tensor& searchsorted_out_cpu(
     const c10::optional<c10::string_view> side_opt,
     const c10::optional<Tensor>& sorter_opt,
     Tensor& result) {
-  // See [Note: hacky wrapper removal for optional tensor]
+
   c10::MaybeOwned<Tensor> sorter_maybe_owned = at::borrow_from_optional_tensor(sorter_opt);
   const Tensor& sorter = *sorter_maybe_owned;
   searchsorted_pre_check(sorted_sequence, self, result, out_int32, right, side_opt, sorter);
   resize_output(result, self.sizes());

-  // we have two inputs to set right, pre_check checks that they aren't set to opposites
   bool is_right = side_opt ? *side_opt == "right" : right;

   if (self.numel() == 0) {
     return result;
   }

-  // for non-contiguous result tensors, we write the output to a contiguous copy so we can later copy back, maintaining the original result tensor
   Tensor out = result;
   if (!result.is_contiguous()) {
     out = result.contiguous();
@@ -221,38 +95,12 @@ Tensor& searchsorted_out_cpu(
     dispatch(out, final_input, final_boundaries, out_int32, is_right, final_sorter);
   }

-  // if result is non-contiguous, we wrote the answer to a copied version, so we copy back to the original result tensor
   if (!result.is_contiguous()) {
     result.copy_(out);
   }
   return result;
 }

-Tensor searchsorted_cpu(
-    const Tensor& sorted_sequence,
-    const Tensor& self,
-    bool out_int32,
-    bool right,
-    const c10::optional<c10::string_view> side_opt,
-    const c10::optional<Tensor>& sorter_opt) {
-  ScalarType scalar_type = out_int32 ? ScalarType::Int : ScalarType::Long;
-  c10::TensorOptions options = TensorOptions().device(self.options().device()).dtype(scalar_type);
-  Tensor result = at::empty({0}, options, MemoryFormat::Contiguous);
-  at::native::searchsorted_out_cpu(sorted_sequence, self, out_int32, right, side_opt, sorter_opt, result);
-  return result;
-}
-
-Tensor searchsorted_cpu(
-    const Tensor& sorted_sequence,
-    const Scalar& self,
-    bool out_int32,
-    bool right,
-    const c10::optional<c10::string_view> side_opt,
-    const c10::optional<Tensor>& sorter_opt) {
-  const Tensor& scalar_tensor = searchsorted_scalar_tensor(self, sorted_sequence.device());
-  return searchsorted_cpu(sorted_sequence, scalar_tensor, out_int32, right, side_opt, sorter_opt);
-}
-
 Tensor& bucketize_out_cpu(const Tensor& self, const Tensor& boundaries, bool out_int32, bool right, Tensor& result) {
   TORCH_CHECK(boundaries.dim() == 1, "boundaries tensor must be 1 dimension, but got dim(", boundaries.dim(), ")");
   at::native::searchsorted_out_cpu(boundaries, self, out_int32, right, nullopt, nullopt, result);
@@ -303,42 +151,6 @@ def bucketize(x: common_types.ConsistentTensorType,
               weights: Optional[tf.Tensor] = None,
               elementwise: bool = False,
               name: Optional[str] = None) -> common_types.ConsistentTensorType:
-  """Returns a bucketized column, with a bucket index assigned to each input.
-  Args:
-    x: A numeric input `Tensor` or `CompositeTensor` whose values should be
-      mapped to buckets.  For a `CompositeTensor` only non-missing values will
-      be included in the quantiles computation, and the result of `bucketize`
-      will be a `CompositeTensor` with non-missing values mapped to buckets. If
-      elementwise=True then `x` must be dense.
-    num_buckets: Values in the input `x` are divided into approximately
-      equal-sized buckets, where the number of buckets is `num_buckets`.
-    epsilon: (Optional) Error tolerance, typically a small fraction close to
-      zero. If a value is not specified by the caller, a suitable value is
-      computed based on experimental results.  For `num_buckets` less than 100,
-      the value of 0.01 is chosen to handle a dataset of up to ~1 trillion input
-      data values.  If `num_buckets` is larger, then epsilon is set to
-      (1/`num_buckets`) to enforce a stricter error tolerance, because more
-      buckets will result in smaller range for each bucket, and so we want the
-      boundaries to be less fuzzy. See analyzers.quantiles() for details.
-    weights: (Optional) Weights tensor for the quantiles. Tensor must have the
-      same shape as x.
-    elementwise: (Optional) If true, bucketize each element of the tensor
-      independently.
-    name: (Optional) A name for this operation.
-  Returns:
-    A `Tensor` of the same shape as `x`, with each element in the
-    returned tensor representing the bucketized value. Bucketized value is
-    in the range [0, actual_num_buckets). Sometimes the actual number of buckets
-    can be different than num_buckets hint, for example in case the number of
-    distinct values is smaller than num_buckets, or in cases where the
-    input values are not uniformly distributed.
-    NaN values are mapped to the last bucket.
Values with NaN weights are
-    ignored in bucket boundaries calculation.
-  Raises:
-    TypeError: If num_buckets is not an int.
-    ValueError: If value of num_buckets is not > 1.
-    ValueError: If elementwise=True and x is a `CompositeTensor`.
-  """
   with tf.compat.v1.name_scope(name, 'bucketize'):
     if not isinstance(num_buckets, int):
       raise TypeError('num_buckets must be an int, got %s' % type(num_buckets))
@@ -433,9 +245,11 @@ paddle.bucketize(

 The test cases to consider are:

-- Consistency of numerical results, using numpy as the reference
+- Consistency of the output numerical results, using numpy as the reference
 - Correctness of the output when `right` is True and when it is False
-- Correctness of the output dtype when `out_int32` is True and when it is False;
+- Correctness of the output dtype when `out_int32` is True and when it is False
+- Type check of `x`: an exception is raised if it is not a Tensor
+- Dimension check of `sorted_sequence`: this API only handles a one-dimensional `sorted_sequence`, so the input must be constrained accordingly
 - Correctness of the output when `right` is not passed;
 - Correctness of the output when `out_int32` is not passed;

From 0e8bdfda8dc95ba274a2ead45a674b7c0ea3d7de Mon Sep 17 00:00:00 2001
From: PommesPeter <434596665@qq.com>
Date: Tue, 21 Feb 2023 15:11:02 +0800
Subject: [PATCH 4/7] [Doc] Added rfc design docs

---
 rfcs/APIs/20230221_api_design_for_polor.md | 226 +++++++++++++++++++++
 1 file changed, 226 insertions(+)
 create mode 100644 rfcs/APIs/20230221_api_design_for_polor.md

diff --git a/rfcs/APIs/20230221_api_design_for_polor.md b/rfcs/APIs/20230221_api_design_for_polor.md
new file mode 100644
index 000000000..417a8cc28
--- /dev/null
+++ b/rfcs/APIs/20230221_api_design_for_polor.md
@@ -0,0 +1,226 @@
+# paddle.polar Design Document
+
+| API name | paddle.polar |
+| ------------ | ---------------------------------------- |
+| Author | PommesPeter |
+| Submission date | 2023-02-21 |
+| Version | V1.0 |
+| Required Paddle version | develop |
+| File name | 20220709_api_design_for_polar.md |
+
+# I. Overview
+
+## 1. Background
+
+To enrich the PaddlePaddle API set and support scientific computing APIs, Paddle needs to add the API `paddle.polar`.
+
+## 2. Goals
+
+Add the API `paddle.polar`, which constructs a complex tensor element-wise from the given magnitudes and phase angles, making computations in polar coordinates more convenient.
+
+## 3. Significance
+
+Adds a function for polar-coordinate and complex-number computation to Paddle, enriching the scientific computing APIs in `paddle`.
+
+# II. Current State in Paddle
+
+- Paddle currently has no `polar` API, but it does have `paddle.complex`. Compared with other frameworks, Paddle lacks an API dedicated to polar-coordinate computation, so a complex tensor cannot be built directly from polar coordinates, and code that uses `paddle.complex` for this purpose is not as clear or readable.
+- The implementation and tests of this API mainly follow the existing `paddle.complex`.
+
+# III. Survey of Industry Solutions
+
+## PyTorch
+
+PyTorch provides the `torch.polar` API, with signature `torch.polar(abs, angle, *, out=None) → Tensor`.
+
+The PyTorch documentation describes it as:
+
+> Constructs a complex tensor whose elements are Cartesian coordinates corresponding to the polar coordinates with absolute value `abs` and angle `angle`.
+
+In terms of implementation, PyTorch composes it from C++ APIs; see the [source code](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/TensorFactories.cpp#L190-L251)
+
+Implementation code:
+
+```cpp
+// ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ complex / polar ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+void complex_check_floating(const Tensor& a, const Tensor& b) {
+  TORCH_CHECK((a.scalar_type() == kFloat || a.scalar_type() == kDouble || a.scalar_type() == kHalf) &&
+              (b.scalar_type() == kFloat || b.scalar_type() == kDouble || b.scalar_type() == kHalf),
+              "Expected both inputs to be Half, Float or Double tensors but got ",
+              a.scalar_type(), " and ", b.scalar_type());
+}
+
+void complex_check_dtype(
+    const Tensor& result,
+    const Tensor& a,
+    const Tensor& b) {
+  complex_check_floating(a, b);
+  TORCH_CHECK(a.scalar_type() == b.scalar_type(),
+              "Expected object of scalar type ", a.scalar_type(),
+              " but got scalar type ", b.scalar_type(), " for second argument");
+  TORCH_CHECK(result.scalar_type() == toComplexType(a.scalar_type()),
+              "Expected object of scalar type ", toComplexType(a.scalar_type()),
+              " but got scalar type ", result.scalar_type(),
+              " for argument 'out'");
+}
+
+Tensor& complex_out(const Tensor& real, const Tensor& imag, Tensor& result) {
+  complex_check_dtype(result, real, imag);
+  auto iter = TensorIteratorConfig()
+      .add_output(result)
+      .add_input(real)
+      .add_input(imag)
+      .check_all_same_dtype(false)
+      .build();
+  complex_stub(iter.device_type(), iter);
+  return result;
+}
+
+Tensor complex(const Tensor& real, const Tensor& imag) {
+  complex_check_floating(real, imag);
+  c10::TensorOptions options = real.options();
+
options = options.dtype(toComplexType(real.scalar_type())); + Tensor result = at::empty(0, options); + return at::complex_out(result, real, imag); +} + +Tensor& polar_out(const Tensor& abs, const Tensor& angle, Tensor& result) { + complex_check_dtype(result, abs, angle); + auto iter = TensorIteratorConfig() + .add_output(result) + .add_input(abs) + .add_input(angle) + .check_all_same_dtype(false) + .build(); + polar_stub(iter.device_type(), iter); + return result; +} + +Tensor polar(const Tensor& abs, const Tensor& angle) { + complex_check_floating(abs, angle); + c10::TensorOptions options = abs.options(); + options = options.dtype(toComplexType(abs.scalar_type())); + Tensor result = at::empty(0, options); + return at::polar_out(result, abs, angle); +} +``` + +参数表: + +- abs:复数张量的绝对值。必须为 float 或 double。 +- angle:复数张量的角度。数据类型必须与abs相同。 +- out:如果输入为 torch.float32,则必须为 torch.complex64。如果输入为 torch.float64,则必须为 torch.complex128。 + +## SciPy + +实现方法上,Scipy 是通过 Python API 的方式组合实现的,[代码位置](https://github.com/scipy/scipy/blob/v1.10.1/scipy/linalg/_decomp_polar.py#L8-L111) + +代码实现: +```python +def polar(a, side="right"): + if side not in ['right', 'left']: + raise ValueError("`side` must be either 'right' or 'left'") + a = np.asarray(a) + if a.ndim != 2: + raise ValueError("`a` must be a 2-D array.") + + w, s, vh = svd(a, full_matrices=False) + u = w.dot(vh) + if side == 'right': + # a = up + p = (vh.T.conj() * s).dot(vh) + else: + # a = pu + p = (w * s).dot(w.T.conj()) + return u, p +``` + +参数表: + +- Parameters: + - a: (m, n) array_like + The array to be factored. + - side: {‘left’, ‘right’}, optional + Determines whether a right or left polar decomposition is computed. If side is “right”, then a = up. If side is “left”, then a = pu. The default is “right”. + +- Returns: + - u: (m, n) ndarray + If a is square, then u is unitary. If m > n, then the columns of a are orthonormal, and if m < n, then the rows of u are orthonormal.
+ - p: ndarray + p is Hermitian positive semidefinite. If a is nonsingular, p is positive definite. The shape of p is (n, n) or (m, m), depending on whether side is “right” or “left”, respectively. + +# 四、对比分析 + +## 共同点 + +- 都能通过输入模和相位角,`elementwise` 构造复数 tensor。方便计算极坐标系下的运算。 + +## 不同点 + +- PyTorch 是在 C++ API 基础上实现,使用 Python 调用 C++ 对应的接口。 +- Scipy 则是通过 Python API 直接实现其对应的功能。 +- Tensorflow 有 `a`、`side` 等参数的设置,可调整的程度更高。 + +# 五、设计思路与实现方案 + +## 命名与参数设计 + +添加 API + +```python +paddle.polar( + abs: Tensor, + angle: Tensor, + name: str=None +) +``` + +## 底层OP设计 + +使用已有的 API 组合实现,不再单独设计 OP。 + +需要注意:如果输入是 paddle.float32,则输出必须是 paddle.complex64。如果输入是 paddle.float64,则输出必须是 paddle.complex128。 + +## API实现方案 + +该 API 实现于 `python/paddle/tensor/creation.py` + +通过调研发现,计算该极坐标可以使用复数计算,Paddle 本身已实现 `paddle.complex`,可利用已有 API 实现。代入公式: + +$$ +\text{out} = \text{abs}\cdot\cos(\text{angle}) + \text{abs}\cdot\sin(\text{angle})\cdot j +$$ + +即可得到对应模和相位角的极坐标以及所对应的笛卡尔坐标。 + +随后,Paddle 中已有 `complex` API 的具体实现逻辑,位于 `python/paddle/tensor/creation.py` 下的 `complex` 函数中,因此只需要调用其函数构造复数即可。 + +# 六、测试和验收的考量 + +测试需要考虑的 case 如下: + +- 输出数值结果的一致性和数据类型是否正确,使用 pytorch 或 scipy 作为参考标准 +- 参数 `abs` 的数据类型准确性判断 +- 参数 `angle` 的数据类型准确性判断、 + +# 七、可行性分析和排期规划 + +方案主要依赖现有 Paddle API 组合而成,且依赖的 `paddle.complex` 已经在 Paddle repo 的 [python/paddle/tensor/search.py](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/tensor/creation.py#L2160-L2209)。工期上可以满足在当前版本周期内开发完成。 + +# 八、影响面 + +新增 API,对其他模块无有影响 + +# 名词解释 + +无 + +# 附件及参考资料 + +[torch.polar](https://pytorch.org/docs/stable/generated/torch.polar.html) + +[scipy.linalg.polar](https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.polar.html) + +[paddle.complex](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/tensor/creation.py#L2160-L2209) \ No newline at end of file From c04d6576a7029e54b5a4da1193e002576b9a7cd6 Mon Sep 17 00:00:00 2001 From: PommesPeter <434596665@qq.com> Date: Tue, 21 Feb 2023 15:55:25 +0800 Subject: [PATCH 5/7] [Doc]
Deleted unused rfc design docs --- .../APIs/20220709_api_design_for_bucketize.md | 284 ------------------ 1 file changed, 284 deletions(-) delete mode 100644 rfcs/APIs/20220709_api_design_for_bucketize.md diff --git a/rfcs/APIs/20220709_api_design_for_bucketize.md b/rfcs/APIs/20220709_api_design_for_bucketize.md deleted file mode 100644 index 85e5c202c..000000000 --- a/rfcs/APIs/20220709_api_design_for_bucketize.md +++ /dev/null @@ -1,284 +0,0 @@ -# paddle.bucketize 设计文档 - -| API 名称 | paddle.bucketize | -| ------------ | ---------------------------------------- | -| 提交作者 | PommesPeter | -| 提交时间 | 2022-07-09 | -| 版本号 | V1.0 | -| 依赖飞桨版本 | develop | -| 文件名 | 20220709_api_design_for_bucketize.md | - -# 一、概述 - -## 1、相关背景 - -为了提升飞桨 API 丰富度,支持科学计算相关 API,Paddle 需要扩充 API `paddle.bucketize`。 - -## 2、功能目标 - -增加 API `paddle.bucketize`,用于根据 `sorted_sequence` 序列计算出 `x` 中每个元素的区间索引。 - -## 3、意义 - -为 Paddle 增加神经网络相关的距离计算函数,丰富 `paddle` 中科学计算相关的 API。 - -# 二、飞桨现状 - -- 目前 Paddle 缺少 `bucketize` API,但是存在 `searchsorted` API,参考其他框架可以发现,没有专门针对一维 `sorted_sequence` 进行计算的 api,直接使用 `searchsorted` API 导致花费时间在判断维度上。 -- 该 API 的实现及测试主要参考目前 Paddle 中含有的 `paddle.searchsorted`。 - -# 三、业内方案调研 - -## PyTorch - -PyTorch 中有 `torch.bucketize` 的API,详细参数为 `torch.bucketize(input, boundaries, *, out_int32=False, right=False, out=None) → Tensor`。 - -在 PyTorch 中的介绍为: - -> Returns the indices of the buckets to which each value in the `input` belongs, where the boundaries of the buckets are set by `boundaries`. Return a new tensor with the same size as `input`. If `right` is False (default), then the left boundary is closed. 
More formally, the returned index satisfies the following rules: -> -> | `right` | *returned index satisfies* | -> | ------- | --------------------------------------------------------- | -> | False | `boundaries[i-1] < input[m][n]...[l][x] <= boundaries[i]` | -> | True | `boundaries[i-1] <= input[m][n]...[l][x] < boundaries[i]` | - -在实现方法上,PyTorch 是通过 C++ API 组合实现的,[代码位置](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/Bucketization.cpp) - -实现代码: -```cpp -namespace at { -namespace native { - -namespace { - -// ... - -} - -// ... - -Tensor& searchsorted_out_cpu( - const Tensor& sorted_sequence, - const Tensor& self, - bool out_int32, - bool right, - const c10::optional side_opt, - const c10::optional& sorter_opt, - Tensor& result) { - - c10::MaybeOwned sorter_maybe_owned = at::borrow_from_optional_tensor(sorter_opt); - const Tensor& sorter = *sorter_maybe_owned; - searchsorted_pre_check(sorted_sequence, self, result, out_int32, right, side_opt, sorter); - resize_output(result, self.sizes()); - - bool is_right = side_opt ? *side_opt == "right" : right; - - if (self.numel() == 0) { - return result; - } - - Tensor out = result; - if (!result.is_contiguous()) { - out = result.contiguous(); - } - if (sorted_sequence.is_contiguous() && self.is_contiguous() && sorted_sequence.dtype() == self.dtype() && sorter.is_contiguous()) { - dispatch(out, self, sorted_sequence, out_int32, is_right, sorter); - } - else { - Tensor trimmed_input; - Tensor trimmed_boundaries; - Tensor trimmed_sorter; - searchsorted_maybe_trim_input_tensors(trimmed_input, trimmed_boundaries, trimmed_sorter, self, sorted_sequence, sorter); - const Tensor& final_input = trimmed_input.defined() ? trimmed_input : self; - const Tensor& final_boundaries = trimmed_boundaries.defined() ? trimmed_boundaries : sorted_sequence; - const Tensor& final_sorter = trimmed_sorter.defined() ? 
trimmed_sorter : sorter; - dispatch(out, final_input, final_boundaries, out_int32, is_right, final_sorter); - } - - if (!result.is_contiguous()) { - result.copy_(out); - } - return result; -} - -Tensor& bucketize_out_cpu(const Tensor& self, const Tensor& boundaries, bool out_int32, bool right, Tensor& result) { - TORCH_CHECK(boundaries.dim() == 1, "boundaries tensor must be 1 dimension, but got dim(", boundaries.dim(), ")"); - at::native::searchsorted_out_cpu(boundaries, self, out_int32, right, nullopt, nullopt, result); - return result; -} - -Tensor bucketize_cpu(const Tensor& self, const Tensor& boundaries, bool out_int32, bool right) { - ScalarType scalar_type = out_int32 ? ScalarType::Int : ScalarType::Long; - c10::TensorOptions options = TensorOptions().device(self.options().device()).dtype(scalar_type); - Tensor result = at::empty({0}, options, MemoryFormat::Contiguous); - at::native::bucketize_out_cpu(self, boundaries, out_int32, right, result); - return result; -} - -Tensor bucketize_cpu(const Scalar& self, const Tensor& boundaries, bool out_int32, bool right) { - return bucketize_cpu(searchsorted_scalar_tensor(self, boundaries.device()), boundaries, out_int32, right); -} - -}} // namespace at::native -``` - -参数表: - -- input(Tensor or Scalar):N-D Tensor, - -- boundaries(Tensor):,1-D Tensor,必须包含一个单调递增的序列。 - -- out_int32(bool,optional):指明输出数据类型。如果是True,则输出torch.int32;如果是False,则输出torch.int64。默认是False。 - -- right(bool,optional):如果为 False,返回找到的第一个合适的位置; 如果为 True,返回最后一个这样的索引; 如果没有找到合适的索引,则返回0作为非数值值(例如,Nan,Inf)或边界的大小(通过最后一个索引)。 - - 换句话说,如果为 False,则从边界获取输入中每个值的下界索引; 如果为 True,则获取上界索引。 默认值为 False。 - -- out(Tensor,optional):输出的Tensor必须和输出的Tensor大小相同。 - -## Tensorflow - -Tensorflow 中有 `tf.transform.bucketize` API,具体参数为 `tft.bucketize( x: common_types.ConsistentTensorType, num_buckets: int, epsilon: Optional[float] = None, weights: Optional[tf.Tensor] = None, elementwise: bool = False, name: Optional[str] = None) -> common_types.ConsistentTensorType` - 
-在实现方法上,Tensorflow 是通过 Python API 的方式组合实现的,[代码位置](https://github.com/tensorflow/transform/blob/d0c3349403120a2cf1177c111b674c07e9b38398/tensorflow_transform/mappers.py#L1690-L1770) - -代码实现: -```python -@common.log_api_use(common.MAPPER_COLLECTION) -def bucketize(x: common_types.ConsistentTensorType, - num_buckets: int, - epsilon: Optional[float] = None, - weights: Optional[tf.Tensor] = None, - elementwise: bool = False, - name: Optional[str] = None) -> common_types.ConsistentTensorType: - with tf.compat.v1.name_scope(name, 'bucketize'): - if not isinstance(num_buckets, int): - raise TypeError('num_buckets must be an int, got %s' % type(num_buckets)) - - if num_buckets < 1: - raise ValueError('Invalid num_buckets %d' % num_buckets) - - if isinstance(x, (tf.SparseTensor, tf.RaggedTensor)) and elementwise: - raise ValueError( - 'bucketize requires `x` to be dense if `elementwise=True`') - - if epsilon is None: - # See explanation in args documentation for epsilon. - epsilon = min(1.0 / num_buckets, 0.01) - - x_values = tf_utils.get_values(x) - bucket_boundaries = analyzers.quantiles( - x_values, - num_buckets, - epsilon, - weights, - reduce_instance_dims=not elementwise) - - if not elementwise: - return apply_buckets(x, bucket_boundaries) - - num_features = tf.math.reduce_prod(x.get_shape()[1:]) - bucket_boundaries = tf.reshape(bucket_boundaries, [num_features, -1]) - x_reshaped = tf.reshape(x, [-1, num_features]) - bucketized = [] - for idx, boundaries in enumerate(tf.unstack(bucket_boundaries, axis=0)): - bucketized.append(apply_buckets(x_reshaped[:, idx], - tf.expand_dims(boundaries, axis=0))) - return tf.reshape(tf.stack(bucketized, axis=1), - [-1] + x.get_shape().as_list()[1:]) -``` - -参数表: - -| Args | | -| :------------ | ------------------------------------------------------------ | -| `x` | 一个数字输入的 `Tensor`或`CompositeTensor`,其值应被映射到桶中。对于一个`CompositeTensor`,只有非缺失的值才会被包括在定量计算中,`bucketize`的结果将是一个`CompositeTensor`,其非缺失的值被映射到桶中。如果 elementwise=True,那么`x`必须是密集的。 | -| 
`num_buckets` | 输入的`x`中的值被分成大小大致相等的桶,桶的数量是`num_buckets`。 | -| `epsilon` | (可选)误差容限,通常是一个接近于零的小部分。如果调用者没有指定一个值,将根据实验结果计算出一个合适的值。对于小于 100 的`num_buckets`,选择 0.01 的值来处理高达约 1 万亿的输入数据值的数据集。如果`num_buckets`更大,那么 epsilon 被设置为 (1 / `num_buckets`) 以执行更严格的误差容忍度,因为更多的桶将导致每个桶的范围更小,所以我们希望边界不那么模糊。详情见analyzers.quantiles()。 | -| `weights` | (可选)用于定量的权重张量。张量必须与 x 具有相同的形状。 | -| `elementwise` | (可选)如果为真,对 tensor 的每个元素进行独立的桶化。 | -| `name` | (可选) 该操作的名称。 | - -# 四、对比分析 - -## 共同点 - -- 都能实现根据 `sorted_sequence` 计算出输入 `x` 中每个元素所对应的区间索引 - -## 不同点 - -- PyTorch 是在 C++ API 基础上实现,使用 Python 调用 C++ 对应的接口。 -- PyTorch 输入参数比较简单,可选的操作比较少。 -- Tensorflow 则是通过 Python API 直接实现其对应的功能。 -- Tensorflow 有 `num_buckets`、`epsilon`、`weights` 等参数的设置,可调整的程度更高。 - - -# 五、设计思路与实现方案 - -## 命名与参数设计 - -添加 API - -```python -paddle.bucketize( - x: Tensor, - sorted_sequence: Tensor, - out_int32: bool=False, - right: bool=False, - name: str=None -) -``` - -## 底层 OP 设计 - -使用已有的 API 组合实现,不再单独设计 OP。 - -## API 实现方案 - -该 API 实现于 `python/paddle/tensor/search.py` - -首先,`bucketize` 主要针对一维情况下的 `sorted_sequence`,所以需要对输入的维度大小进行判断,通过断言进行判断,当输入维度不为 1 时触发 `AssertError`。 - -随后,Paddle 中已有 `searchsorted` API 的具体实现逻辑,位于 `python/paddle/tensor/search.py` 下的 `searchsorted` 函数中,因此只需要调用其函数即可。 - -# 六、测试和验收的考量 - -测试需要考虑的 case 如下: - -- 输出数值结果的一致性,使用 numpy 作为参考标准 -- 参数 `right` 为 True 和 False 时输出的正确性 -- 参数 `out_int32` 为 True 和 False 时 dtype 输出的正确性 -- 参数 `x` 类型的正确性,若类型不为 Tensor 则抛出异常 -- 参数 `sorted_sequence` 的维度正确性,该 API 只针对 `sorted_sequence` 是一维的情况,所以对于输入需要约束 -- 未输入 `right` 时的输出正确性; -- 未输入 `out_int32` 时的输出正确性; - -# 七、可行性分析和排期规划 - -方案主要依赖现有 Paddle API 组合而成,且依赖的 `paddle.searchsorted` 已经在 Paddle repo 的 [python/paddle/tensor/search.py](https://github.com/PaddlePaddle/Paddle/blob/release/2.3/python/paddle/tensor/search.py#L910)。工期上可以满足在当前版本周期内开发完成。 - -# 八、影响面 - -新增 API,对其他模块是否有影响 - -# 名词解释 - -无 - -# 附件及参考资料 - -## PyTorch - -[torch.bucketize](https://pytorch.org/docs/stable/generated/torch.bucketize.html) - 
-[torch.searchsorted](https://pytorch.org/docs/stable/generated/torch.searchsorted.html?highlight=searchsorted#torch.searchsorted) - -## tensorflow - -[tf.transform.bucketize](https://www.tensorflow.org/tfx/transform/api_docs/python/tft/bucketize) - -[tf.searchsorted](https://www.tensorflow.org/api_docs/python/tf/searchsorted) - -## Paddle - -[paddle.searchsorted](https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/searchsorted_cn.html) \ No newline at end of file From 5c62c3aeb5ce68d9592039ebeaee815128cb7da2 Mon Sep 17 00:00:00 2001 From: PommesPeter <434596665@qq.com> Date: Tue, 21 Feb 2023 16:00:36 +0800 Subject: [PATCH 6/7] [Doc] Deleted a errors --- rfcs/APIs/20230221_api_design_for_polor.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rfcs/APIs/20230221_api_design_for_polor.md b/rfcs/APIs/20230221_api_design_for_polor.md index 417a8cc28..5ed707892 100644 --- a/rfcs/APIs/20230221_api_design_for_polor.md +++ b/rfcs/APIs/20230221_api_design_for_polor.md @@ -211,7 +211,7 @@ $$ # 八、影响面 -新增 API,对其他模块无有影响 +新增 API,对其他模块无影响 # 名词解释 From 1bf37cdc7fc39028ff7426b10e0cb85e3217bd90 Mon Sep 17 00:00:00 2001 From: PommesPeter <434596665@qq.com> Date: Tue, 21 Feb 2023 16:56:34 +0800 Subject: [PATCH 7/7] [Doc] Updated API rfc doc --- rfcs/APIs/20230221_api_design_for_polor.md | 44 +--------------------- 1 file changed, 2 insertions(+), 42 deletions(-) diff --git a/rfcs/APIs/20230221_api_design_for_polor.md b/rfcs/APIs/20230221_api_design_for_polor.md index 5ed707892..c3a5eaf0d 100644 --- a/rfcs/APIs/20230221_api_design_for_polor.md +++ b/rfcs/APIs/20230221_api_design_for_polor.md @@ -113,44 +113,6 @@ Tensor polar(const Tensor& abs, const Tensor& angle) { - angle:复数张量的角度。数据类型必须与abs相同。 - out:如果输入为 torch.float32,则必须为 torch.complex64。如果输入为 torch.float64,则必须为 torch.complex128。 -## SciPy - -实现方法上,Scipy 是通过 Python API 的方式组合实现的,[代码位置](https://github.com/scipy/scipy/blob/v1.10.1/scipy/linalg/_decomp_polar.py#L8-L111) - -代码实现: -```python -def polar(a, 
side="right"): - if side not in ['right', 'left']: - raise ValueError("`side` must be either 'right' or 'left'") - a = np.asarray(a) - if a.ndim != 2: - raise ValueError("`a` must be a 2-D array.") - - w, s, vh = svd(a, full_matrices=False) - u = w.dot(vh) - if side == 'right': - # a = up - p = (vh.T.conj() * s).dot(vh) - else: - # a = pu - p = (w * s).dot(w.T.conj()) - return u, p -``` - -参数表: - -- Parameters: - - a: (m, n) array_like - The array to be factored. - - side: {‘left’, ‘right’}, optional - Determines whether a right or left polar decomposition is computed. If side is “right”, then a = up. If side is “left”, then a = pu. The default is “right”. - -- Returns: - - u: (m, n) ndarray - If a is square, then u is unitary. If m > n, then the columns of a are orthonormal, and if m < n, then the rows of u are orthonormal. - - p: ndarray - p is Hermitian positive semidefinite. If a is nonsingular, p is positive definite. The shape of p is (n, n) or (m, m), depending on whether side is “right” or “left”, respectively. - # 四、对比分析 ## 共同点 @@ -160,8 +122,6 @@ def polar(a, side="right"): ## 不同点 - PyTorch 是在 C++ API 基础上实现,使用 Python 调用 C++ 对应的接口。 -- Scipy 则是通过 Python API 直接实现其对应的功能。 -- Tensorflow 有 `a`、`side` 等参数的设置,可调整的程度更高。 # 五、设计思路与实现方案 @@ -201,13 +161,13 @@ $$ 测试需要考虑的 case 如下: -- 输出数值结果的一致性和数据类型是否正确,使用 pytorch 或 scipy 作为参考标准 +- 输出数值结果的一致性和数据类型是否正确,使用 pytorch 作为参考标准 - 参数 `abs` 的数据类型准确性判断 - 参数 `angle` 的数据类型准确性判断、 # 七、可行性分析和排期规划 -方案主要依赖现有 Paddle API 组合而成,且依赖的 `paddle.complex` 已经在 Paddle repo 的 [python/paddle/tensor/search.py](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/tensor/creation.py#L2160-L2209)。工期上可以满足在当前版本周期内开发完成。 +方案主要依赖现有 Paddle API 组合而成,且依赖的 `paddle.complex` 已经在 Paddle repo 的 [python/paddle/tensor/creation.py](https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/tensor/creation.py#L2160-L2209)。工期上可以满足在当前版本周期内开发完成。 # 八、影响面