[Hackathon 5th No.32] Add tensor_split / hsplit / dsplit APIs to Paddle #682

megemini · 2023-10-05T11:37:37Z

PR types

Others

PR changes

Docs

Description

[Hackathon 5th No.32] Add tensor_split / hsplit / dsplit APIs to Paddle

Please review!

paddle-bot · 2023-10-05T11:37:41Z

Your PR has been submitted. Thanks for your contribution!
Please check its format and content. For this, you can refer to Template and Demo.

zoooo0820 · 2023-10-10T03:49:51Z

rfcs/APIs/20231003_api_design_for_tensor_split.md

+            )
+    ```
+
+    **Question**: The static graph path does not work yet; `while starts < total_n` never seems to exit. Any advice is appreciated!


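In static graph mode, control flow that depends on a tensor's value cannot use Python's native if/while/for directly; it has to go through the dedicated control-flow APIs such as cond / while_loop. Unless it is really necessary, the computation here can be done on non-tensor types instead.

I tried while_loop here, but it raised an error saying that the returned values and the inputs differ in length. I am not sure whether I am calling it incorrectly or whether it is a bug in the API; I will try again.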
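
One way to follow the suggestion of working on non-tensor types is to compute the slice boundaries with plain Python integers before any graph ops are created, so that an ordinary `while` loop is evaluated eagerly. The sketch below assumes the size of the split dimension is available as a Python int when the graph is built; `chunk_bounds` and `step` are hypothetical names used for illustration, not code from the RFC.

```python
def chunk_bounds(total_n, step):
    """Return (start, end) pairs covering range(total_n) in chunks of `step`.

    Everything here is a plain Python int, so the loop condition is evaluated
    immediately instead of being traced as a tensor comparison.
    """
    if step <= 0:
        raise ValueError("step must be a positive integer")
    bounds = []
    starts = 0
    while starts < total_n:
        ends = min(starts + step, total_n)  # the last chunk may be shorter
        bounds.append((starts, ends))
        starts = ends
    return bounds


print(chunk_bounds(7, 3))
```

Each pair could then be passed to a slicing op, so the graph only ever sees constant boundaries.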

zoooo0820 · 2023-10-10T03:58:46Z

rfcs/APIs/20231003_api_design_for_tensor_split.md

+> ```
+> 
+
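+^*^ Note: the signature of `Paddle`'s `split` function is `split(x, num_or_sections, axis=0, name=None)`, which differs from the ones described above, but this does not affect the analysis that follows.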


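Since this section covers the current state of Paddle, it would be best to spell out how paddle.split differs from the APIs above. If its behavior is similar or identical to what is described above, you can simply say so.

Please also describe the existing API paddle.vsplit and whether its parameters align with split or with tensor_split, since vsplit / dsplit / hsplit look like one group of APIs.

OK, I will update this as soon as possible.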

zoooo0820 · 2023-10-10T04:07:48Z

rfcs/APIs/20231003_api_design_for_tensor_split.md

+
+    - Parameters
+    > x (Tensor) – The input Tensor. Supported data types: float32, float64, int32, int64.
+    > num_or_sections (Tensor|int|list|tuple) 


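Regarding x: in principle, tensor manipulation APIs should support all data types. Some types may be missing because of limitations in the APIs this one depends on, and the official documentation currently tends to list fewer types than are actually supported. Please verify which data types really work.

num_or_sections: its semantics need to be explained. This should be the core difference from paddle.split, and the parameter name could also be aligned with those semantics.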

zoooo0820 · 2023-10-10T04:08:14Z

rfcs/APIs/20231003_api_design_for_tensor_split.md

+
+    - Parameters
+    > x (Tensor) – The input Tensor. Supported data types: float32, float64, int32, int64.
+    > num_or_sections (Tensor|int|list|tuple) 


The same comments as for tensor_split apply to these two APIs.

zoooo0820 · 2023-10-10T04:09:12Z

rfcs/APIs/20231003_api_design_for_tensor_split.md

+    - Returns
+    > output (List of Tensors)
+
+In addition, none of these interfaces needs a `name` parameter, because the output may consist of multiple Tensors.


name can be added here; see the existing vsplit / split.

vsplit/split do have a name parameter, but the code never actually uses it:

https://github.com/PaddlePaddle/Paddle/blob/10a9e4eed0331249f59aa0bae7ebc7fcc3971729/python/paddle/tensor/manipulation.py#L1910-L2135

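As you can see, vsplit calls split, and split never actually uses name.

That is why I raised this question. More importantly, if there is a name, how should it be assigned? Since a list is returned, does name have to be a list as well? 🤔

@megemini Keeping the default behavior is fine here. Please check the documentation to see what the name parameter does: it mainly lets you specify a name that replaces the auto-generated OP name prefix. You can call split with the name parameter in static graph mode and compare the output.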

zoooo0820 · 2023-10-10T04:11:23Z

rfcs/APIs/20231003_api_design_for_tensor_split.md

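+On the other hand, `hsplit`, `dsplit` and `vsplit` are a group of functionally similar interfaces. `Paddle` implements `vsplit` through the `split` interface, so they can be implemented through `split` as well, in the same way as in `TensorFlow` and `Numpy`.
+
+# 5. Design and Implementation Plan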
+


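Here I suggest explicitly stressing the following two points:

How the proposed tensor_split differs from the existing split

Whether the parameters of dsplit / hsplit align with split or with tensor_split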

megemini · 2023-10-24T10:36:54Z

@zoooo0820

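Sorry it took me so long to update. This update mainly includes:

- Updated the differences between split and tensor_split. split, vsplit, dsplit and hsplit are treated as one group, which differs from tensor_split mainly as follows:
  - Splitting by int: split requires an even division, while tensor_split allows an uneven one.
  - Splitting by list|tuple: split cannot go out of bounds, while tensor_split can. split may contain one -1; tensor_split may also contain -1, but there it is not used for inference.

  Accordingly, tensor_split uses the parameter name indices_or_sections to distinguish it from split's num_or_sections.
- Removed the earlier case where indices_or_sections of tensor_split is a Tensor, because:
  - split itself does not support it, so this aligns with split.
  - If Tensor were supported, the static graph would involve arithmetic between Tensors and Python numbers, and I have not found a good way to implement that. I have tried for, while, while_loop, converting to an integer with tensor.item(), LayerHelper, and doing everything with tensors; none of them runs in the static graph. How should this kind of mixed Tensor/number computation be done in a static graph?
- Updated the supported data types. Verification shows that split works with float16, bfloat16, float32, float64, int32, int64 and uint8, but not with int16, complex64 or complex128. More detailed testing will follow.
- Added the name parameter.

Please review. Thank you very much!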
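
The difference for an integer argument can be illustrated with a small pure-Python model of the resulting section sizes. It is only a sketch: the uneven rule follows what the `torch.tensor_split` documentation describes (the first `n % sections` pieces get one extra element), and both helper names are hypothetical rather than Paddle APIs.

```python
def split_sizes(n, num):
    """Model of split-style behavior: the dimension must divide evenly."""
    if n % num != 0:
        raise ValueError(f"size {n} is not divisible by {num}")
    return [n // num] * num


def tensor_split_sizes(n, sections):
    """Model of tensor_split-style behavior: uneven division is allowed."""
    base, extra = divmod(n, sections)
    # The first `extra` pieces are one element longer than the rest.
    return [base + 1] * extra + [base] * (sections - extra)


print(split_sizes(6, 3))         # even division works for both styles
print(tensor_split_sizes(7, 3))  # uneven division only works here
```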

zoooo0820 · 2023-11-06T06:28:15Z

rfcs/APIs/20231003_api_design_for_tensor_split.md

+
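+In addition, regarding the difference between `split` and `tensor_split`, we quote [Pytorch文档学习 TORCH.TENSOR_SPLIT](https://blog.csdn.net/Jamesgender/article/details/130559738) here:
+
+> This method looks a lot like the split method. Both split the input into several views according to indices_or_sections. The differences are: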


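There is an important difference between split and tensor_split here: the former's parameter num_or_sections denotes a count or the section lengths, whereas the latter's indices_or_sections denotes a count or the index positions at which to split. This semantic difference has a large impact on how the API is used, so the design needs to be explicit about it.

Right, this was not explained clearly before; it is updated now. Thank you very much!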
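
To make the two meanings concrete, the sketch below splits a plain Python list both ways. It models the semantics only and is not Paddle code; `split_by_lengths`, `split_by_indices` and `lengths_to_indices` are hypothetical helpers.

```python
from itertools import accumulate


def split_by_lengths(seq, lengths):
    """num_or_sections-style list: each entry is the LENGTH of one piece."""
    if sum(lengths) != len(seq):
        raise ValueError("section lengths must add up to the sequence length")
    pieces, start = [], 0
    for n in lengths:
        pieces.append(seq[start:start + n])
        start += n
    return pieces


def split_by_indices(seq, indices):
    """indices_or_sections-style list: each entry is a cut POSITION."""
    cuts = [0, *indices, len(seq)]
    return [seq[a:b] for a, b in zip(cuts, cuts[1:])]


def lengths_to_indices(lengths):
    """Convert section lengths into the equivalent cut positions."""
    return list(accumulate(lengths))[:-1]


data = list(range(7))
print(split_by_lengths(data, [2, 3, 2]))
print(split_by_indices(data, [2, 5]))
```

Three lengths `[2, 3, 2]` and two cut positions `[2, 5]` describe the same partition here, which is why passing the same list to both styles of API gives different results.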

zoooo0820 · 2023-11-06T06:30:42Z

rfcs/APIs/20231003_api_design_for_tensor_split.md

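+When the split argument is a `list|tuple`:
+  - `split`, together with the corresponding `vsplit`, `dsplit` and `hsplit`, forms one group of APIs whose input `cannot go out of bounds`, i.e. the length of the list or tuple cannot exceed the size of the dimension of the input Tensor being split, and the argument may contain one `-1`.
+  - `tensor_split` `can go out of bounds`; as a result, the split argument cannot contain `-1`.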
+


Please also add the semantic difference between the parameters `num_or_sections` and `indices_or_sections` here.
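
The two list behaviors in the quoted passage can be modelled on a plain Python list. This is an illustrative sketch with hypothetical helper names (`infer_lengths`, `split_at`); it assumes index-based splitting behaves like Python slicing, as `torch.tensor_split` is documented to do, so an index past the end simply yields an empty piece.

```python
def infer_lengths(n, lengths):
    """split-style list: a single -1 is replaced by the remaining length."""
    if lengths.count(-1) > 1:
        raise ValueError("at most one -1 is allowed")
    known = sum(v for v in lengths if v != -1)
    resolved = [n - known if v == -1 else v for v in lengths]
    if any(v < 0 for v in resolved) or sum(resolved) != n:
        raise ValueError("section lengths do not fit the dimension size")
    return resolved


def split_at(seq, indices):
    """tensor_split-style list: cut positions with slicing semantics."""
    pieces, prev = [], 0
    for idx in indices:
        pieces.append(seq[prev:idx])
        prev = idx
    pieces.append(seq[prev:])
    return pieces


print(infer_lengths(7, [2, -1, 1]))      # the -1 entry is inferred
print(split_at([0, 1, 2, 3], [2, 10]))   # 10 is past the end of the list
```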

zoooo0820 · 2023-11-06T12:42:06Z

rfcs/APIs/20231003_api_design_for_tensor_split.md

+    - Returns
+    > output (List of Tensors)
+
+In addition, none of these interfaces needs a `name` parameter, because the output may consist of multiple Tensors.


name has been added now, so this sentence needs to be removed.

I missed it in the earlier cleanup ...

megemini · 2023-11-08T06:24:46Z

I re-checked everything and pushed an update. Please review 😂

zoooo0820

LGTM

[Add] add Hackathon 5th No.32 pfc

4de72d0

paddle-bot bot added the contributor label Oct 5, 2023

Ligoml mentioned this pull request Oct 7, 2023

[PaddlePaddle Hackathon 5th] Open Source Contribution Individual Challenge PaddlePaddle/Paddle#57262

Open

luotao1 assigned luotao1 and zoooo0820 Oct 9, 2023

zoooo0820 reviewed Oct 10, 2023

megemini mentioned this pull request Oct 12, 2023

[Hackathon 5th No.33] Add atleast_1d / atleast_2d / atleast_3d APIs to Paddle #679

Merged

megemini added 2 commits October 24, 2023 18:16

[Change] update tensor_split different from split

783d465

[Change] add name param

827e4db

megemini requested a review from zoooo0820 October 26, 2023 05:15

zoooo0820 reviewed Nov 6, 2023

[Update] num_or_sections vs indices_or_sections

e9eda08

megemini requested a review from zoooo0820 November 6, 2023 11:08

zoooo0820 reviewed Nov 6, 2023

megemini added 2 commits November 7, 2023 12:03

[Fix] fix name param describe

4aa532d

[Update] num_or_sections vs indices_or_sections

47f543f

megemini requested a review from zoooo0820 November 8, 2023 06:22

zoooo0820 approved these changes Nov 8, 2023

zoooo0820 merged commit 976cfd4 into PaddlePaddle:master Nov 8, 2023

megemini mentioned this pull request Nov 10, 2023

[Hackathon 5th No.32] Add tensor_split / hsplit / dsplit APIs to Paddle -part PaddlePaddle/Paddle#58917

Merged
