-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add random walk based AddRandomMetaPaths
as a faster alternative
#5397
Conversation
Codecov Report
@@ Coverage Diff @@
## master #5397 +/- ##
==========================================
+ Coverage 83.38% 83.42% +0.04%
==========================================
Files 350 350
Lines 19021 19074 +53
==========================================
+ Hits 15860 15913 +53
Misses 3161 3161
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
AddMetaPath2
as a faster alternativeAddMetaPath2
as a faster alternative
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@EdisonLeeeee This is a great improvement to see! I ran a test, and even on a gpu, this implementation is faster than AddMetaPaths
with max_sample=1
, so great job!
About the interface, my thought would be to extend AddMetaPaths
to potentially use this method, by adding a method
kwarg. We could have method=spspmm
and method=random_walk
to determine which method to use. You could also change the interface from walks_per_node
and sample_ratio
back to max_sample
, which would ensure the same performance between both methods (as they produce metapaths from the same distribution).
How does that sound?
@kpstesla Sounds great! I think this is the best way to be compatible with the existing implementation while adding new features to However, I'm still a bit confused about the interface. As you can see, the Let me think about how to optimize the interface without breaking the compatibility. Really appreciate your suggestion and help! |
@EdisonLeeeee that sounds good! I had a couple thoughts that may be helpful:
|
Thank you @kpstesla, these are great thoughts. I think I can start working on it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this. I think we should keep this as a separate class and call it AddRandomMetapaths
. Will take a look again soon.
Co-authored-by: Jinu Sunil <jinu.sunil@gmail.com>
Co-authored-by: Jinu Sunil <jinu.sunil@gmail.com>
Still need some time to adapt it when |
AddMetaPath2
as a faster alternativeAddRandomMetaPaths
as a faster alternative
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing all the comments till now. This looks good to me mostly, left some last comments 😄 .
Co-authored-by: Jinu Sunil <jinu.sunil@gmail.com>
Co-authored-by: Jinu Sunil <jinu.sunil@gmail.com>
Co-authored-by: Jinu Sunil <jinu.sunil@gmail.com>
Thanks again for your comments. Will address these ASAP. |
for more information, see https://pre-commit.ci
Jobs done @wsad1 |
Thanks to all for getting this merged:) |
…yg-team#5397) * add AddMetaPath2 as a faster alternative * remove redundant code * Update torch_geometric/transforms/add_metapaths.py Co-authored-by: Jinu Sunil <jinu.sunil@gmail.com> * Update torch_geometric/transforms/add_metapaths.py Co-authored-by: Jinu Sunil <jinu.sunil@gmail.com> * rename * add __repr__ * add test * coalesce edges * Update torch_geometric/transforms/add_metapaths.py Co-authored-by: Jinu Sunil <jinu.sunil@gmail.com> * Update torch_geometric/transforms/add_metapaths.py Co-authored-by: Jinu Sunil <jinu.sunil@gmail.com> * Update torch_geometric/transforms/add_metapaths.py Co-authored-by: Jinu Sunil <jinu.sunil@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update * Rename some variables + update docs. * Refactor to share code. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update CHANGELOG.md * Update add_metapaths.py * Update add_metapaths.py * Update add_metapaths.py * Update CHANGELOG.md Co-authored-by: Jinu Sunil <jinu.sunil@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Description
This PR implements
AddRandomMetaPaths
as a faster alternative to generative metapaths. As mentioned in this issue, metapaths are sampled via multiple one-step random walks (walk length per node is 1). The implementation of a one-step random walk was inspired by that in Metapath2VecFeature
One can specify
walks_per_node
for the number of random walks per starting node in different metapaths.walks_per_node
must be an integer or a list of integers. For the later, the length ofwalks_per_node
must match the length ofmetapaths
.Benchmark test
Time
Performance
Take the example here for comparison.
It seems
AddRandomMetaPaths
requires to speficy a largerwalks_per_node
to obtain a better performance.PS. Sorry for such a straightforward name, but currently I don't have a better idea to name this class.