Add random walk based AddRandomMetaPaths as a faster alternative #5397

Merged: 25 commits, Sep 20, 2022

EdisonLeeeee
Contributor

@EdisonLeeeee EdisonLeeeee commented Sep 9, 2022

Description

This PR implements AddRandomMetaPaths as a faster alternative for generating metapaths. As mentioned in this issue, metapaths are sampled via multiple one-step random walks (walk length per node is 1). The implementation of the one-step random walk was inspired by the one in MetaPath2Vec.
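A minimal sketch of the sampling idea, written in plain Python rather than the tensor code in this PR: each hop picks one random neighbour per current node, and a walk that reaches a node without an outgoing edge of the required type is dropped. The dict-of-lists adjacency format and the names `one_step_walk` and `sample_metapath_edges` are assumptions made for illustration only.

```python
import random


def one_step_walk(adj, nodes, rng):
    """Advance each walk by one hop; use -1 where no neighbour exists."""
    out = []
    for v in nodes:
        nbrs = adj.get(v, []) if v != -1 else []
        out.append(rng.choice(nbrs) if nbrs else -1)
    return out


def sample_metapath_edges(adjs, num_start, walks_per_node=1, seed=0):
    """Sample (start, end) pairs by chaining one-step walks over `adjs`.

    `adjs` holds one adjacency dict per edge type in the metapath.
    """
    rng = random.Random(seed)
    starts = [v for v in range(num_start) for _ in range(walks_per_node)]
    current = list(starts)
    for adj in adjs:
        current = one_step_walk(adj, current, rng)
    # Keep completed walks only and merge duplicate (start, end) pairs.
    return sorted({(s, e) for s, e in zip(starts, current) if e != -1})


# Toy graph: authors 0..2 and papers 0..1.
author_to_paper = {0: [0], 1: [0, 1], 2: [1]}
paper_to_author = {0: [0, 1], 1: [1, 2]}
edges = sample_metapath_edges([author_to_paper, paper_to_author], num_start=3)
```

The cost grows with the number of walks and hops rather than with the size of a sparse matrix product, which is where the speed-up is expected to come from.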

Feature

One can specify walks_per_node to set the number of random walks per starting node for each metapath. walks_per_node must be an integer or a list of integers. For the latter, the length of walks_per_node must match the length of metapaths.

metapaths = [
    [("author", "paper"), ("paper", "author")],
    [("author", "paper"), ("paper", "venue"), ("venue", "paper"), ("paper", "author")],
]
>>> AddRandomMetaPaths(metapaths, walks_per_node=1)

>>> AddRandomMetaPaths(metapaths, walks_per_node=[10, 20])
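The int-or-list rule can be sketched as a small normalisation step. The helper name `expand_walks_per_node` is hypothetical and is not part of the PyG API; it only illustrates the length check described above.

```python
from typing import List, Sequence, Union


def expand_walks_per_node(walks_per_node: Union[int, Sequence[int]],
                          num_metapaths: int) -> List[int]:
    """Return one walk count per metapath (illustrative helper)."""
    if isinstance(walks_per_node, int):
        # A single integer applies to every metapath.
        return [walks_per_node] * num_metapaths
    counts = list(walks_per_node)
    if len(counts) != num_metapaths:
        raise ValueError(
            f"expected {num_metapaths} entries in walks_per_node, "
            f"got {len(counts)}")
    return counts
```

With the two metapaths above, `expand_walks_per_node(1, 2)` and `expand_walks_per_node([10, 20], 2)` each give one count per metapath, while a list of any other length raises a `ValueError`.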

Benchmark test

Time

from torch_geometric.datasets import AMiner
from torch_geometric.transforms import AddMetaPaths, AddRandomMetaPaths

data = AMiner(root="~/data/pygdata/AMiner")[0]
metapaths = [
    [("author", "paper"), ("paper", "author")],
    [("author", "paper"), ("paper", "venue"), ("venue", "paper"), ("paper", "author")],
]
  • AddMetaPaths (on CPU and GPU)
data = AddMetaPaths(metapaths)(data)
# did not finish within hours
  • AddRandomMetaPaths (on CPU)
%%time
data = AddRandomMetaPaths(metapaths)(data)
# 417 ms

Performance

Take the example here for comparison.

  • AddMetaPaths
metapaths = [[('movie', 'actor'), ('actor', 'movie')],
             [('movie', 'director'), ('director', 'movie')]]
transform = T.AddMetaPaths(metapaths=metapaths, drop_orig_edges=True,
                           drop_unconnected_nodes=True)
# ACC (at most) 0.59
  • AddRandomMetaPaths
metapaths = [[('movie', 'actor'), ('actor', 'movie')],
             [('movie', 'director'), ('director', 'movie')]]
transform = T.AddRandomMetaPaths(metapaths=metapaths, drop_orig_edges=True,
                                 drop_unconnected_nodes=True,
                                 walks_per_node=[20, 15])
# ACC (at most) 0.58

It seems AddRandomMetaPaths needs a larger walks_per_node to reach better performance.
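One plausible reason is coverage: with few walks per node, only a small share of the exact metapath edges gets sampled. The sketch below measures that share on a made-up author-paper toy graph in plain Python (not the IMDB example, and not the PR's implementation); `exact_pairs`, `sampled_pairs` and the graph sizes are assumptions for illustration.

```python
import random


def exact_pairs(a2p, p2a):
    """All (author, author) pairs joined by an author-paper-author path."""
    return {(a, b) for a, papers in a2p.items()
            for p in papers for b in p2a[p]}


def sampled_pairs(a2p, p2a, walks_per_node, rng):
    """Pairs found by `walks_per_node` two-hop random walks per author."""
    pairs = set()
    for a, papers in a2p.items():
        for _ in range(walks_per_node):
            p = rng.choice(papers)
            pairs.add((a, rng.choice(p2a[p])))
    return pairs


rng = random.Random(0)
num_authors, num_papers = 50, 20
# Each author writes 3 distinct, randomly chosen papers.
a2p = {a: rng.sample(range(num_papers), 3) for a in range(num_authors)}
p2a = {p: [a for a in a2p if p in a2p[a]] for p in range(num_papers)}

full = exact_pairs(a2p, p2a)
# Share of exact metapath edges recovered for several walk counts.
coverage = {k: len(sampled_pairs(a2p, p2a, k, rng)) / len(full)
            for k in (1, 5, 20, 500)}
```

Printing `coverage` shows how the recovered share changes with the walk count, which can help when picking walks_per_node for a given graph.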

PS. Sorry for such a straightforward name, but currently I don't have a better idea to name this class.


codecov bot commented Sep 9, 2022

Codecov Report

Merging #5397 (d72c0cb) into master (96b8b9d) will increase coverage by 0.04%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #5397      +/-   ##
==========================================
+ Coverage   83.38%   83.42%   +0.04%     
==========================================
  Files         350      350              
  Lines       19021    19074      +53     
==========================================
+ Hits        15860    15913      +53     
  Misses       3161     3161              
Impacted Files                                 Coverage Δ
torch_geometric/transforms/__init__.py         100.00% <100.00%> (ø)
torch_geometric/transforms/add_metapaths.py    100.00% <100.00%> (ø)


@EdisonLeeeee EdisonLeeeee changed the title from "add AddMetaPath2 as a faster alternative" to "Add random walk based AddMetaPath2 as a faster alternative" Sep 10, 2022
Contributor

@kswhitecross kswhitecross left a comment

@EdisonLeeeee This is a great improvement to see! I ran a test, and even on a GPU, this implementation is faster than AddMetaPaths with max_sample=1, so great job!

About the interface, my thought would be to extend AddMetaPaths to potentially use this method, by adding a method kwarg. We could have method=spspmm and method=random_walk to determine which method to use. You could also change the interface from walks_per_node and sample_ratio back to max_sample, which would ensure the same performance between both methods (as they produce metapaths from the same distribution).

How does that sound?
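A hypothetical sketch of what such a `method` switch could look like. The function names and the string return values are placeholders invented for illustration; they do not reflect the actual AddMetaPaths code.

```python
from typing import Callable, Dict, List, Tuple

EdgeType = Tuple[str, str]


def _via_spspmm(metapath: List[EdgeType]) -> str:
    # Placeholder for the sparse-matrix-product path.
    return f"spspmm over {len(metapath)} hops"


def _via_random_walk(metapath: List[EdgeType]) -> str:
    # Placeholder for the random-walk path.
    return f"random_walk over {len(metapath)} hops"


_METHODS: Dict[str, Callable[[List[EdgeType]], str]] = {
    "spspmm": _via_spspmm,
    "random_walk": _via_random_walk,
}


def add_metapath(metapath: List[EdgeType], method: str = "spspmm") -> str:
    """Dispatch on `method`; unknown names fail loudly."""
    if method not in _METHODS:
        raise ValueError(
            f"unknown method {method!r}; choose from {sorted(_METHODS)}")
    return _METHODS[method](metapath)
```

Keeping "spspmm" as the default would leave existing callers unaffected.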

@EdisonLeeeee
Contributor Author

EdisonLeeeee commented Sep 13, 2022

@kpstesla Sounds great! I think this is the best way to be compatible with the existing implementation while adding new features to AddMetaPaths.

However, I'm still a bit unsure about the interface. As you can see, the spspmm-based implementation also supports weighted sampling (weighted=True), while the random-walk-based one does not. Besides, I don't yet see how to change the interface from walks_per_node and sample_ratio back to max_sample, since they rest on different intuitions.

Let me think about how to optimize the interface without breaking the compatibility. Really appreciate your suggestion and help!

@kswhitecross
Contributor

kswhitecross commented Sep 13, 2022

@EdisonLeeeee that sounds good! I had a couple thoughts that may be helpful:

  • One way to implement max_sample would be a two-step process. We could sample the adjacency matrix to restrict the out-degree of each node to at most max_sample. Then, we could do something similar to what you've done here and compute the columns by array indexing, which would be much faster than spspmm in most cases.
  • As for weighted sampling, it may be possible to compute the weight of each path by indexing the value row of the sparse tensor, similarly to how you index the column row of the sparse tensor. This might give you duplicate edges though. To fix this, you could use torch_geometric.utils.coalesce, which sum-reduces duplicate edges (i.e., merges duplicates and adds their values).
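Both ideas can be sketched in plain Python on (source, target, weight) triples. This is an illustration under stated assumptions, not the tensor-based approach itself: the helper names are made up, and multiplying weights along a path (as a matrix product would) is an assumption; only the summing of duplicates mirrors what coalescing with an add reduction does.

```python
import random
from collections import defaultdict


def cap_out_degree(edges, max_sample, seed=0):
    """Keep at most `max_sample` randomly chosen out-edges per source."""
    rng = random.Random(seed)
    by_src = defaultdict(list)
    for src, dst, w in edges:
        by_src[src].append((dst, w))
    capped = []
    for src, nbrs in by_src.items():
        kept = nbrs if len(nbrs) <= max_sample else rng.sample(nbrs, max_sample)
        capped.extend((src, dst, w) for dst, w in kept)
    return capped


def compose_and_coalesce(first, second):
    """Chain two hops; multiply weights along a path, sum over duplicates."""
    second_by_src = defaultdict(list)
    for mid, dst, w in second:
        second_by_src[mid].append((dst, w))
    total = defaultdict(float)
    for src, mid, w1 in first:
        for dst, w2 in second_by_src[mid]:
            total[(src, dst)] += w1 * w2  # duplicate edges merge by addition
    return dict(total)


movie_actor = [(0, 0, 1.0), (0, 1, 1.0), (1, 1, 1.0)]
actor_movie = [(0, 0, 1.0), (1, 0, 1.0), (1, 1, 1.0)]
weights = compose_and_coalesce(movie_actor, actor_movie)
# weights[(0, 0)] is 2.0: movie 0 reaches itself through two different actors.
```

Capping the first hop with `cap_out_degree(movie_actor, 1)` before composing bounds the number of paths that have to be enumerated per source node.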

@EdisonLeeeee
Contributor Author

Thank you @kpstesla, these are great thoughts. I think I can start working on it.

Member

@wsad1 wsad1 left a comment

Thanks for adding this. I think we should keep this as a separate class and call it AddRandomMetaPaths. Will take a look again soon.

EdisonLeeeee and others added 5 commits September 14, 2022 23:01
Co-authored-by: Jinu Sunil <jinu.sunil@gmail.com>
Co-authored-by: Jinu Sunil <jinu.sunil@gmail.com>
@EdisonLeeeee
Contributor Author

Still need some time to adapt it when weighted=True.

@EdisonLeeeee EdisonLeeeee changed the title from "Add random walk based AddMetaPath2 as a faster alternative" to "Add random walk based AddRandomMetaPaths as a faster alternative" Sep 15, 2022
Member

@wsad1 wsad1 left a comment

Thanks for addressing all the comments till now. This looks good to me mostly, left some last comments 😄 .

EdisonLeeeee and others added 2 commits September 20, 2022 17:07
Co-authored-by: Jinu Sunil <jinu.sunil@gmail.com>
Co-authored-by: Jinu Sunil <jinu.sunil@gmail.com>
Co-authored-by: Jinu Sunil <jinu.sunil@gmail.com>
@EdisonLeeeee
Contributor Author

Thanks again for your comments. Will address these ASAP.

@EdisonLeeeee
Contributor Author

Jobs done @wsad1

@wsad1 wsad1 enabled auto-merge (squash) September 20, 2022 14:12
@wsad1 wsad1 disabled auto-merge September 20, 2022 14:12
@wsad1 wsad1 merged commit fe87db0 into pyg-team:master Sep 20, 2022
@EdisonLeeeee EdisonLeeeee deleted the add_metapath branch September 20, 2022 15:07
@rusty1s
Member

rusty1s commented Sep 20, 2022

Thanks to all for getting this merged:)

JakubPietrakIntel pushed a commit to JakubPietrakIntel/pytorch_geometric that referenced this pull request Nov 25, 2022
…yg-team#5397)

* add AddMetaPath2 as a faster alternative

* remove redundant code

* Update torch_geometric/transforms/add_metapaths.py

Co-authored-by: Jinu Sunil <jinu.sunil@gmail.com>

* Update torch_geometric/transforms/add_metapaths.py

Co-authored-by: Jinu Sunil <jinu.sunil@gmail.com>

* rename

* add __repr__

* add test

* coalesce edges

* Update torch_geometric/transforms/add_metapaths.py

Co-authored-by: Jinu Sunil <jinu.sunil@gmail.com>

* Update torch_geometric/transforms/add_metapaths.py

Co-authored-by: Jinu Sunil <jinu.sunil@gmail.com>

* Update torch_geometric/transforms/add_metapaths.py

Co-authored-by: Jinu Sunil <jinu.sunil@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update

* Rename some variables + update docs.

* Refactor to share code.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update CHANGELOG.md

* Update add_metapaths.py

* Update add_metapaths.py

* Update add_metapaths.py

* Update CHANGELOG.md

Co-authored-by: Jinu Sunil <jinu.sunil@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
4 participants