Add random walk based `AddRandomMetaPaths` as a faster alternative #5397

EdisonLeeeee · 2022-09-09T10:06:31Z

Description

This PR implements AddRandomMetaPaths as a faster alternative for generating metapaths. As mentioned in this issue, metapaths are sampled via multiple one-step random walks (walk length per node is 1). The implementation of the one-step random walk was inspired by the one in Metapath2Vec.
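For readers unfamiliar with the idea, here is a minimal plain-Python sketch of sampling a metapath by chaining one-step random walks. It is not the code in this PR (which works on tensors); the helper names `one_step_walk` and `sample_metapath_edges` and the toy adjacency dicts are made up for illustration.

```python
import random


def one_step_walk(adj, nodes, rng):
    """For each current node, pick one random neighbor (None on a dead end)."""
    return [
        rng.choice(adj[v]) if v is not None and adj.get(v) else None
        for v in nodes
    ]


def sample_metapath_edges(adjs, start_nodes, seed=0):
    """Chain one-step walks over the edge types of a metapath.

    adjs: one dict per edge type along the metapath, mapping a source
          node id to the list of its destination node ids.
    Returns (start, end) pairs for walks that reached the last node type.
    """
    rng = random.Random(seed)
    current = list(start_nodes)
    for adj in adjs:
        current = one_step_walk(adj, current, rng)
    return [(s, e) for s, e in zip(start_nodes, current) if e is not None]


# Toy author -> paper -> author graph (hypothetical data).
author_to_paper = {0: [0, 1], 1: [1], 2: []}
paper_to_author = {0: [0], 1: [0, 1]}
edges = sample_metapath_edges([author_to_paper, paper_to_author], [0, 1, 2])
print(edges)  # author 2 has no papers, so no edge starts from it
```

Each start node contributes at most one metapath edge per walk, which is why the cost stays linear in the number of walks instead of requiring a sparse matrix product.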

Feature

One can specify walks_per_node to set the number of random walks per starting node for each metapath. walks_per_node must be an integer or a list of integers. For the latter, the length of walks_per_node must match the length of metapaths.

metapaths = [
    [("author", "paper"), ("paper", "author")],
    [("author", "paper"), ("paper", "venue"), ("venue", "paper"), ("paper", "author")],
]
>>> AddRandomMetaPaths(metapaths, walks_per_node=1)

>>> AddRandomMetaPaths(metapaths, walks_per_node=[10, 20])
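The int-or-list rule above could be checked roughly like this. This is only a sketch: `normalize_walks_per_node` is a hypothetical helper name, not a function from this PR or from PyG.

```python
def normalize_walks_per_node(walks_per_node, num_metapaths):
    """Return one walk count per metapath (hypothetical helper)."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(walks_per_node, bool):
        raise TypeError("walks_per_node must be an int or a list of ints")
    if isinstance(walks_per_node, int):
        # A single integer applies to every metapath.
        return [walks_per_node] * num_metapaths
    walks = list(walks_per_node)
    if not all(isinstance(w, int) and not isinstance(w, bool) for w in walks):
        raise TypeError("walks_per_node must be an int or a list of ints")
    if len(walks) != num_metapaths:
        raise ValueError(
            f"expected {num_metapaths} entries in walks_per_node, "
            f"got {len(walks)}")
    return walks


print(normalize_walks_per_node(1, 2))
print(normalize_walks_per_node([10, 20], 2))
```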

Benchmark test

Time

from torch_geometric.datasets import AMiner
from torch_geometric.transforms import AddMetaPaths, AddRandomMetaPaths

data = AMiner(root="~/data/pygdata/AMiner")[0]
metapaths = [
    [("author", "paper"), ("paper", "author")],
    [("author", "paper"), ("paper", "venue"), ("venue", "paper"), ("paper", "author")],
]

AddMetaPaths (on CPU and GPU)

data = AddMetaPaths(metapaths)(data)
# did not finish within hours

AddRandomMetaPaths (on CPU)

%%time
data = AddRandomMetaPaths(metapaths)(data)
# 417 ms

Performance

Take the example here for comparison.

AddMetaPaths

import torch_geometric.transforms as T

metapaths = [[('movie', 'actor'), ('actor', 'movie')],
             [('movie', 'director'), ('director', 'movie')]]
transform = T.AddMetaPaths(metapaths=metapaths, drop_orig_edges=True,
                           drop_unconnected_nodes=True)
# ACC (at most) 0.59

AddRandomMetaPaths

metapaths = [[('movie', 'actor'), ('actor', 'movie')],
             [('movie', 'director'), ('director', 'movie')]]
transform = T.AddRandomMetaPaths(metapaths=metapaths, drop_orig_edges=True,
                                 drop_unconnected_nodes=True,
                                 walks_per_node=[20, 15])
# ACC (at most) 0.58

It seems AddRandomMetaPaths requires specifying a larger walks_per_node to obtain better performance.
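A rough way to see why more walks help: assume, purely for illustration, that each walk from a node lands uniformly and independently on one of its n metapath neighbors. Real walks are not uniform over end points, so this is a back-of-the-envelope sketch rather than an analysis of the benchmark above; `expected_coverage` is a hypothetical helper.

```python
def expected_coverage(num_neighbors, walks_per_node):
    """Expected fraction of distinct neighbors reached.

    Assumes every walk ends uniformly and independently at one of
    `num_neighbors` metapath neighbors (an illustrative assumption only).
    """
    miss_probability = (1.0 - 1.0 / num_neighbors) ** walks_per_node
    return 1.0 - miss_probability


for k in (1, 15, 20):
    print(k, round(expected_coverage(20, k), 3))
```

Under this assumption a single walk recovers only a small share of the neighbors that the exact metapath adjacency would contain, and the share grows with walks_per_node.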

PS. Sorry for such a straightforward name, but currently I don't have a better idea to name this class.

codecov · 2022-09-09T10:10:46Z

Codecov Report

Merging #5397 (d72c0cb) into master (96b8b9d) will increase coverage by 0.04%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #5397      +/-   ##
==========================================
+ Coverage   83.38%   83.42%   +0.04%     
==========================================
  Files         350      350              
  Lines       19021    19074      +53     
==========================================
+ Hits        15860    15913      +53     
  Misses       3161     3161

Impacted Files	Coverage Δ
torch_geometric/transforms/__init__.py	`100.00% <100.00%> (ø)`
torch_geometric/transforms/add_metapaths.py	`100.00% <100.00%> (ø)`

kswhitecross

@EdisonLeeeee This is a great improvement to see! I ran a test, and even on a GPU, this implementation is faster than AddMetaPaths with max_sample=1, so great job!

About the interface, my thought would be to extend AddMetaPaths to potentially use this method, by adding a method kwarg. We could have method=spspmm and method=random_walk to determine which method to use. You could also change the interface from walks_per_node and sample_ratio back to max_sample, which would ensure the same performance between both methods (as they produce metapaths from the same distribution).

How does that sound?

EdisonLeeeee · 2022-09-13T01:30:07Z

@kpstesla Sounds great! I think this is the best way to be compatible with the existing implementation while adding new features to AddMetaPaths.

However, I'm still a bit confused about the interface. As you can see, the spspmm-based implementation also supports weighted sampling (set weighted=True), while the random-walk-based one does not. Besides, I don't yet see how to change the interface from walks_per_node and sample_ratio back to max_sample, since they rest on different intuitions.

Let me think about how to optimize the interface without breaking the compatibility. Really appreciate your suggestion and help!

kswhitecross · 2022-09-13T18:45:01Z

@EdisonLeeeee that sounds good! I had a couple thoughts that may be helpful:

One way to implement max_sample would be a two-step process. We could sample the adjacency matrix to restrict the out-degree of each node to at most max_sample. Then, we could do something similar to what you've done here and compute the columns by array indexing, which would be much faster than spspmm in most cases.
As for weighted sampling, it may be possible to compute the weight of each path by indexing the value row of the sparse tensor, similarly to how you index the column row of the sparse tensor. This might give you duplicate edges, though. To fix this, you could use torch_geometric.utils.coalesce, which merges duplicate edges with an add reduction (i.e., it merges duplicates and adds their values).
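The two suggestions above can be sketched in plain Python as follows. This is an illustration only: it uses edge lists instead of sparse tensors, and `cap_out_degree`, `compose` and `coalesce_add` are hypothetical names (the last one stands in for what torch_geometric.utils.coalesce would do with an add reduction).

```python
import random
from collections import Counter, defaultdict


def cap_out_degree(edges, max_sample, seed=0):
    """Keep at most `max_sample` randomly chosen outgoing edges per source."""
    rng = random.Random(seed)
    by_src = defaultdict(list)
    for src, dst in edges:
        by_src[src].append(dst)
    capped = []
    for src, dsts in by_src.items():
        if len(dsts) > max_sample:
            dsts = rng.sample(dsts, max_sample)
        capped.extend((src, dst) for dst in dsts)
    return capped


def compose(edges_ab, edges_bc):
    """Join A->B and B->C edges into A->C paths by lookup, keeping duplicates."""
    by_src = defaultdict(list)
    for b, c in edges_bc:
        by_src[b].append(c)
    return [(a, c) for a, b in edges_ab for c in by_src.get(b, [])]


def coalesce_add(edges):
    """Merge duplicate edges; the weight is the number of merged paths."""
    counts = Counter(edges)
    return sorted((src, dst, w) for (src, dst), w in counts.items())


# Toy movie -> actor -> movie example (hypothetical data).
movie_to_actor = [(0, 0), (0, 1), (1, 1)]
actor_to_movie = [(0, 0), (1, 0), (1, 1)]
paths = compose(movie_to_actor, actor_to_movie)
print(coalesce_add(paths))
print(cap_out_degree(movie_to_actor, max_sample=1))
```

Capping the out-degree first bounds the number of composed paths per node, and the final merge turns repeated (source, destination) pairs into a single weighted edge.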

EdisonLeeeee · 2022-09-14T02:54:23Z

Thank you @kpstesla, these are great thoughts. I think I can start working on it.

wsad1

Thanks for adding this. I think we should keep this as a separate class and call it AddRandomMetaPaths. Will take a look again soon.

EdisonLeeeee · 2022-09-15T13:13:47Z

Still need some time to adapt it when weighted=True.

wsad1

Thanks for addressing all the comments till now. This looks good to me mostly, left some last comments 😄 .

EdisonLeeeee · 2022-09-20T09:09:11Z

Thanks again for your comments. Will address these ASAP.

EdisonLeeeee · 2022-09-20T10:57:53Z

Jobs done @wsad1

rusty1s · 2022-09-20T15:10:49Z

Thanks to all for getting this merged:)

…yg-team#5397) * add AddMetaPath2 as a faster alternative * remove redundant code * Update torch_geometric/transforms/add_metapaths.py Co-authored-by: Jinu Sunil <jinu.sunil@gmail.com> * Update torch_geometric/transforms/add_metapaths.py Co-authored-by: Jinu Sunil <jinu.sunil@gmail.com> * rename * add __repr__ * add test * coalesce edges * Update torch_geometric/transforms/add_metapaths.py Co-authored-by: Jinu Sunil <jinu.sunil@gmail.com> * Update torch_geometric/transforms/add_metapaths.py Co-authored-by: Jinu Sunil <jinu.sunil@gmail.com> * Update torch_geometric/transforms/add_metapaths.py Co-authored-by: Jinu Sunil <jinu.sunil@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update * Rename some variables + update docs. * Refactor to share code. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update CHANGELOG.md * Update add_metapaths.py * Update add_metapaths.py * Update add_metapaths.py * Update CHANGELOG.md Co-authored-by: Jinu Sunil <jinu.sunil@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

add AddMetaPath2 as a faster alternative

2716266

remove redundant code

c1cec1c

EdisonLeeeee changed the title ~~add AddMetaPath2 as a faster alternative~~ Add random walk based AddMetaPath2 as a faster alternative Sep 10, 2022

rusty1s assigned EdisonLeeeee Sep 11, 2022

rusty1s requested a review from RexYing September 11, 2022 10:21

rusty1s added feature 1 - Priority P1 transform labels Sep 11, 2022

kswhitecross self-requested a review September 12, 2022 16:28

kswhitecross reviewed Sep 12, 2022

wsad1 reviewed Sep 14, 2022

EdisonLeeeee and others added 5 commits September 14, 2022 23:01

Update torch_geometric/transforms/add_metapaths.py

544c416

Co-authored-by: Jinu Sunil <jinu.sunil@gmail.com>

Update torch_geometric/transforms/add_metapaths.py

f4f138d

Co-authored-by: Jinu Sunil <jinu.sunil@gmail.com>

rename

8bd110a

add __repr__

8b2ea69

add test

d5d6874

Merge branch 'master' into add_metapath

4d6e4db

EdisonLeeeee changed the title ~~Add random walk based AddMetaPath2 as a faster alternative~~ Add random walk based AddRandomMetaPaths as a faster alternative Sep 15, 2022

EdisonLeeeee added 2 commits September 17, 2022 22:59

coalesce edges

97de0f5

Merge branch 'master' into add_metapath

8751ac6

wsad1 reviewed Sep 19, 2022

Merge branch 'master' into add_metapath

a3e0990

wsad1 approved these changes Sep 20, 2022

EdisonLeeeee and others added 2 commits September 20, 2022 17:07

Update torch_geometric/transforms/add_metapaths.py

05c6c4f

Co-authored-by: Jinu Sunil <jinu.sunil@gmail.com>

Update torch_geometric/transforms/add_metapaths.py

72a56e2

Co-authored-by: Jinu Sunil <jinu.sunil@gmail.com>

Update torch_geometric/transforms/add_metapaths.py

1cc5e63

Co-authored-by: Jinu Sunil <jinu.sunil@gmail.com>

pre-commit-ci bot and others added 2 commits September 20, 2022 09:11

[pre-commit.ci] auto fixes from pre-commit.com hooks

123de91

for more information, see https://pre-commit.ci

update

d013a67

EdisonLeeeee and others added 6 commits September 20, 2022 18:57

Merge branch 'master' into add_metapath

d613123

Rename some variables + update docs.

0a1e5ec

Refactor to share code.

e69ab6b

[pre-commit.ci] auto fixes from pre-commit.com hooks

e579655

for more information, see https://pre-commit.ci

Update CHANGELOG.md

564e16c

Update add_metapaths.py

48c0ca4

wsad1 enabled auto-merge (squash) September 20, 2022 14:12

wsad1 disabled auto-merge September 20, 2022 14:12

wsad1 added 3 commits September 20, 2022 19:50

Update add_metapaths.py

53b0b06

Update add_metapaths.py

86328d4

Update CHANGELOG.md

d72c0cb

wsad1 merged commit fe87db0 into pyg-team:master Sep 20, 2022

EdisonLeeeee deleted the add_metapath branch September 20, 2022 15:07
