Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[geometric]Move graph-related incubate api to geometric #44970

Merged
merged 20 commits into from
Aug 29, 2022

Conversation

DesmonDay
Copy link
Contributor

@DesmonDay DesmonDay commented Aug 8, 2022

PR types

New features

PR changes

APIs

Describe

Move graph-related API to paddle.geometric.

from paddle import _C_ops


def graph_reindex(x,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里计划加一个 heter_graph_reindex,把同构图和异构图的 reindex 区分开来。heter_graph_reindex 就输入 List of Tensor,然后直接在 API 层直接concat 成 Tensor。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已增加

out, tmp = _C_ops.segment_pool(data, segment_ids, 'pooltype', "SUM")
return out

check_variable_and_dtype(data, "X",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

segment 目前是否支持float16了?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

原子操作支持了就可以了,我增加一下。

python/paddle/geometric/math.py Show resolved Hide resolved
python/paddle/geometric/sampling/__init__.py Outdated Show resolved Hide resolved
reindex_src, reindex_dst, out_nodes = \
_C_ops.graph_reindex(x, neighbors, count, value_buffer, index_buffer,
"flag_buffer_hashtable", flag_buffer_hashtable)
return reindex_src, reindex_dst, out_nodes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里有个因为,目前不在区分新、老动态图的目的是什么了?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

因为 graph_reindex 这些操作可能没有反向,框架侧同学没有增加支持新动态图。

be int32, and should be filled with -1.
index_buffer (Tensor|None): Index buffer for hashtable. The data type should
be int32, and should be filled with -1.
flag_buffer_hashtable (bool): Whether to use buffer for hashtable to speed up.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里的变量名字可以再想想,既然都是bool类型了,名字加上了一个flag比较奇怪,has_buffer_hastable ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已修改成has_buffer_hastable

python/paddle/geometric/sampling/graph_reindex.py Outdated Show resolved Hide resolved
check_variable_and_dtype(count, "Count", ("int32"), "graph_reindex")

if flag_buffer_hashtable:
check_variable_and_dtype(value_buffer, "HashTable_Value", ("int32"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里只是支持int32为啥了? 看代码逻辑,这里应该是需要支持int64

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这块我目前在 C++层面是只设置支持了 int32。因为担心如果一个图的节点数量过多,则申请的 hashtable 空间会很大,此时其实也不适合用这种方式加速了。不过后面可以找时间把这块的支持放上去。这次先暂时略过。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: 增加 int64支持。

from paddle import _C_ops


def sample_neighbors(row,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里的API名字,是否要考虑到后续的API的接入了?这个名字看起来一种GraphSage的采样的方式,这里需要讨论一下

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

参考 dgl,把这个文件名改成了 neighbors.py

sample_size (int): The number of neighbors we need to sample. Default value is
-1, which means returning all the neighbors of the input nodes.
return_eids (bool): Whether to return eid information of sample edges. Default is False.
flag_perm_buffer (bool): Using the permutation for fisher-yates sampling in GPU. Default
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

同上

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已修改

@ZHUI ZHUI self-requested a review August 17, 2022 11:22
ZHUI
ZHUI previously approved these changes Aug 17, 2022
python/paddle/geometric/sampling/neighbors.py Show resolved Hide resolved
has_perm_buffer=False,
name=None):
"""
Graph Sample Neighbors API.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里整体上是新增了API,文档先预览一下,粘贴到PR描述中。

from paddle.fluid.layer_helper import LayerHelper
from paddle.fluid.framework import _non_static_mode
from paddle.fluid.data_feeder import check_variable_and_dtype
from paddle.fluid import core
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from paddle.fluid import core 用到了吗?

geometric 目录整体,可以用 pylint 扫一遍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 27 to 32
input_nodes,
eids=None,
perm_buffer=None,
sample_size=-1,
return_eids=False,
has_perm_buffer=False,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里的 参数顺序,是如何考虑,perm_buffer是一个较高使用频率的参数吗》

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

不是,但是不知道参数顺序有么有要求 Tensor 放前面,attribute 放后面?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

已调整参数顺序

'graph_reindex',
'heter_graph_reindex',
'khop_sampler',
'sample_neighbors',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

感觉这里的命名,不是很对齐

@@ -381,15 +379,14 @@ def send_uv(x, y, src_index, dst_index, message_op="add", name=None):
src_index (Tensor): An 1-D tensor, and the available data type is int32, int64.
dst_index (Tensor): An 1-D tensor, and should have the same shape as `src_index`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

send_ue_recv 的除法有潜在 除 0 错误。

部分api 注释中 code 部分,定义数据的重复代码比较多。

    if message_op == 'sub':
        message_op = 'add'
        y = -y
    if message_op == "div":
        message_op = 'mul'
        y = 1. / y

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

api 加到 __all__

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

应该不加, api调用不会走到 send_recv.xxx 这一级别的。所以泽阳让我限制__all__等于空。

Comment on lines 146 to 152
def heter_graph_reindex(x,
neighbors,
count,
value_buffer=None,
index_buffer=None,
has_buffer_hashtable=False,
name=None):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

def graph_reindex(x,
                  neighbors,
                  count,
                  value_buffer=None,
                  index_buffer=None,
                  has_buffer_hashtable=False,
                  name=None):
def heter_graph_reindex(x,
                        neighbors,
                        count,
                        value_buffer=None,
                        index_buffer=None,
                        has_buffer_hashtable=False,
                        name=None):

两个API统一成一个是否简洁

def reindex_graph(x,
                  neighbors,
                  count,
                  heter_graph=False,
                  value_buffer=None,
                  index_buffer=None,
                  has_buffer_hashtable=False,
                  name=None):

Comment on lines 60 to 63
perm_buffer (Tensor): Permutation buffer for fisher-yates sampling. If `has_perm_buffer`
is True, then `perm_buffer` should not be None. The data type should
be the same with `row`. Default is None.
has_perm_buffer (bool): Using the permutation for fisher-yates sampling in GPU. Default
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里可以用None 来判断是否使用,perm_buffer 减少一个参数?

ZHUI
ZHUI previously approved these changes Aug 25, 2022
Copy link
Collaborator

@ZHUI ZHUI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@ZHUI ZHUI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ZeyuChen ZeyuChen merged commit 8f657f7 into PaddlePaddle:develop Aug 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants