How to disable shaDow_GNN sampling? #7

Open
intelsam opened this issue Dec 21, 2022 · 5 comments

Hi, I need to compare the efficiency of shaDow_GNN (a decoupled GNN) with a normal GNN (a coupled GNN).
Is there a way to run this repo without the ParallelSampler?
What I mean is how do I run the baseline (coupled GNN where neighborhood expands with layer)?

I could not find any flag or config which disables the ParallelSampler. If you could kindly point me in the right direction, that would be really helpful.

ZimpleX (Contributor) commented Dec 21, 2022

Hi, for training GNNs without sampling, there are two ways:

  1. Full batch: we perform message passing along all edges and among all nodes in the original graph.
  2. Minibatch: we randomly select a small set of target nodes (e.g., 64), and propagate from the full k-hop neighbors of the 64 nodes.

To train with method 2, you can use the khop sampler and set the budget to -1 to skip sampling. For example, replace [20] with [-1] here. At the end of epoch 1, by default, we also print out the subgraph profile, and you should see that the number of k-hop nodes becomes very large with the -1 budget.
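For concreteness, a rough sketch of what that sampler entry in the yaml config could look like is given below; the exact key names (depth in particular) are written from memory and may differ from the actual config files, so please double-check against the yaml shipped in the repo:

- method: khop
  phase: train
  depth: 2        # number of hops of the sampled subgraph (assumed key name)
  budget: [-1]    # -1 disables sampling: keep the full k-hop neighborhood

With budget: [-1], the subgraph profile printed after epoch 1 should confirm that the k-hop subgraphs become very large.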

The current codebase does not directly support training with method 1 (although supporting it shouldn't be too hard; I may need to clean up some minibatch logic in the next release). In the paper, when we do full-batch training of the coupled GNNs, I simply used separate code that implements an equivalent architecture in PyG.

intelsam (Author) commented Jan 2, 2023

First of all, I wish you a very happy new year.
Now coming to the topic: thanks a lot for the detailed answer. This (minibatch + k-hop) is exactly what I was looking for.
One quick question: is the budget (e.g., the number of neighbors sampled per node, 20 in the above example) per layer, or for all layers combined? (Apologies if this is already answered in the paper.)

ZimpleX (Contributor) commented Jan 2, 2023

Happy new year!

In the current code, the budget is the same for all layers. So budget=20 and depth=2 means a budget of 20 for each of the two layers.

In case you want to modify this to specify the budget for each layer separately, you can change the signature at the frontend as well as the backend to take a list / vector (see the sketch below).
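Purely as a hypothetical illustration of that change (the current config format does not accept this), a per-layer budget could then be written as a list in the yaml:

- method: khop
  phase: train
  depth: 2
  budget: [20, 10]   # hypothetical: 20 neighbors at hop 1, 10 at hop 2

Both the frontend parsing and the backend sampler would need to be updated to accept a vector here instead of a single scalar.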

intelsam (Author) commented

Thanks a lot for the details.
I need to get timings (ppr, avg. train and test time) for the following scenario:

  • Train: minibatch (the current setup)
  • Infer (validation, test): full batch, a.k.a. full-graph (pass the whole adjacency matrix and embedding matrix)

I want to train in a minibatch setting to learn the weights, and then use those weights to make the forward pass directly for inference.
However, I see that the code also uses minibatching during inference:

one_epoch(0, mode, model, minibatch, logger, status='final')

Can you please point me to how to achieve this easily with the current codebase, if possible?
Also, could you please point to the "separate code" for full-batch training that you mentioned in your previous comment?

ZimpleX (Contributor) commented Jan 31, 2023

Sorry for the late reply.

Currently, the minibatch sampler can support full-batch data (i.e., no sampling). For this you just need to modify the sampler section of the yaml config. For example:

Change from

- method: ppr
  phase: train
  ...

to

- method: full
  phase: train

However, the training pipeline will throw some errors, since the logger assumes a subgraph data structure, which is different from the data structure of the full graph.

I do see some inflexibility (e.g., the one you mentioned) and redundant data structures in the current codebase, so I am actually trying to restructure it and optimize performance for the next release. It is still a work in progress, and I hope to publish it in March. If you need this feature before then, you will probably need to manually modify some code (e.g., start by fixing the logger issue).
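For your train-on-minibatch / infer-on-full-graph scenario, the sampler section would then presumably look something like the sketch below once the logger issue is fixed; the valid and test phase labels are illustrative, so double-check the phase names your config actually uses:

- method: ppr
  phase: train
  ...              # keep the existing PPR sampling parameters
- method: full
  phase: valid
- method: full
  phase: test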
