How to disable shaDow_GNN sampling? #7

Open
intelsam opened this issue Dec 21, 2022 · 5 comments

Hi, I need to compare the efficiency of shaDow_GNN (a decoupled GNN) with a normal GNN (a coupled GNN).
Is there a way to run this repo without the ParallelSampler?
What I mean is how do I run the baseline (coupled GNN where neighborhood expands with layer)?

I could not find any flag or config which disables the ParallelSampler. If you could kindly point me in the right direction, that would be really helpful.

ZimpleX (Contributor) commented Dec 21, 2022

Hi, for training GNNs without sampling, there are two ways:

  1. Full batch: we perform message passing along all edges and among all nodes in the original graph.
  2. Minibatch: we randomly select a small set of target nodes (e.g., 64), and propagate from the full k-hop neighbors of the 64 nodes.

To train with method 2, you can use the khop sampler and set the budget to -1 to skip sampling. For example, replace [20] with [-1] here. At the end of epoch 1, by default, we also print out the subgraph profile, and you should see that the number of k-hop nodes becomes very large with the -1 budget.
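For concreteness, a rough sketch of what that sampler entry in the yaml config could look like is given below; the exact key names (depth in particular) are written from memory and may differ from the actual config files, so please double-check against the yaml shipped in the repo:

- method: khop
  phase: train
  depth: 2        # number of hops of the sampled subgraph (assumed key name)
  budget: [-1]    # -1 disables sampling: keep the full k-hop neighborhood

With budget: [-1], the subgraph profile printed after epoch 1 should confirm that the k-hop subgraphs become very large.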

The current codebase does not directly support training with method 1 (although supporting it shouldn't be too hard; I may need to clean up some minibatch logic in the next release). In the paper, when we do full-batch training of the coupled GNNs, I simply used separate code that implements an equivalent architecture in PyG.

intelsam (Author) commented Jan 2, 2023

First of all, I wish you a very happy new year.
Now coming to the topic: thanks a lot for the detailed answer. This (minibatch + k-hop) is exactly what I was looking for.
One quick question: is the budget (e.g., the number of neighbors sampled per node, 20 in the above example) per layer, or for all layers combined? (Apologies if this is already answered in the paper.)

ZimpleX (Contributor) commented Jan 2, 2023

Happy new year!

In the current code, the budget is the same for all layers. So budget=20 and depth=2 means a budget of 20 for each of the two layers.

In case you want to modify this to specify the budget for each layer separately, you can change the signature at the frontend as well as the backend to take a list / vector (see the sketch below).
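Purely as a hypothetical illustration of that change (the current config format does not accept this), a per-layer budget could then be written as a list in the yaml:

- method: khop
  phase: train
  depth: 2
  budget: [20, 10]   # hypothetical: 20 neighbors at hop 1, 10 at hop 2

Both the frontend parsing and the backend sampler would need to be updated to accept a vector here instead of a single scalar.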

intelsam (Author) commented

Thanks a lot for the details.
I need to get timings (ppr, avg. train and test time) for the following scenario:

  • Train: minibatch (the current setup)
  • Infer (validation, test): full batch, a.k.a. full-graph (pass the whole adjacency matrix and embedding matrix)

I want to train in a minibatch setting to learn the weights, and then use those weights to make the forward pass directly for inference.
However, I see that the code also uses minibatching during inference:

one_epoch(0, mode, model, minibatch, logger, status='final')

Can you please point me to how to achieve this easily with the current codebase, if possible?
Also, could you please point to the "separate code" for full-batch training that you mentioned in your previous comment?

ZimpleX (Contributor) commented Jan 31, 2023

Sorry for the late reply.

Currently, the minibatch sampler can support full-batch data (i.e., no sampling). For this you just need to modify the sampler section of the yaml config. For example:

Change from

- method: ppr
  phase: train
  ...

to

- method: full
  phase: train

However, the training pipeline will throw some errors, since the logger assumes a subgraph data structure, which is different from the data structure of the full graph.

I do see some inflexibility (e.g., the one you mentioned) and redundant data structures in the current codebase, so I am actually trying to restructure it and optimize performance for the next release. It is still a work in progress, and I hope to publish it in March. If you need this feature before then, you will probably need to manually modify some code (e.g., start by fixing the logger issue).
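For your train-on-minibatch / infer-on-full-graph scenario, the sampler section would then presumably look something like the sketch below once the logger issue is fixed; the valid and test phase labels are illustrative, so double-check the phase names your config actually uses:

- method: ppr
  phase: train
  ...              # keep the existing PPR sampling parameters
- method: full
  phase: valid
- method: full
  phase: test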
