
Graph heads ci tests #208

Merged
merged 8 commits into ORNL:main from graph_heads_ci_tests on Apr 1, 2024

Conversation

allaffa
Collaborator

@allaffa allaffa commented Jan 18, 2024

The capability to use a full stack of convolutional layers when only nodal predictions are needed was never tested in the code.
This PR:

  • updates pre-existing code to maintain support for type="conv" as a choice for nodal heads of HydraGNN
  • adds if-then-else conditions to make sure that get_conv is called with the proper number of arguments. All models except SCFStack take two input arguments; SCFStack also takes last_layer, a Boolean variable.
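The dispatch described above can be sketched as follows. This is a minimal illustration of the pattern, not the actual HydraGNN code: the class bodies and return values here are hypothetical stand-ins, and only the names get_conv, SCFStack, and last_layer come from the PR description.

```python
# Hypothetical sketch: call get_conv with the proper number of arguments
# depending on the model type, as described in the PR.

class GenericStack:
    # Most models: get_conv takes two input arguments.
    def get_conv(self, input_dim, output_dim):
        return ("conv", input_dim, output_dim)

class SCFStack(GenericStack):
    # SCFStack additionally takes last_layer, a Boolean variable.
    def get_conv(self, input_dim, output_dim, last_layer=False):
        return ("conv", input_dim, output_dim, last_layer)

def build_conv(stack, input_dim, output_dim, is_last):
    # If-then-else condition to pass the extra argument only when needed.
    if isinstance(stack, SCFStack):
        return stack.get_conv(input_dim, output_dim, last_layer=is_last)
    return stack.get_conv(input_dim, output_dim)
```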

@allaffa allaffa added the bug Something isn't working label Jan 18, 2024
@allaffa allaffa self-assigned this Jan 18, 2024
@allaffa
Collaborator Author

allaffa commented Jan 18, 2024

@pzhanggit
The fixes provided by this PR allow us to have a working baseline graph autoencoder in HydraGNN.

@allaffa allaffa force-pushed the graph_heads_ci_tests branch from 4947f79 to 310089b on January 19, 2024 03:15
@allaffa
Collaborator Author

allaffa commented Jan 19, 2024

I changed line 225 of the file hydragnn/utils/distributed.py as follows:

model = torch.nn.parallel.DistributedDataParallel(model, find_unused_parameters=True)

Setting find_unused_parameters=True does not make the code crash, and should allow us to track the parameters.

I also added the following lines to the train() function inside train_validate_test.py

        with record_function("forward"):
            data = data.to(get_device())
            pred = model(data)
            # Print unused parameters
            unused_params = [name for name, param in model.module.named_parameters() if not param.requires_grad]
            if unused_params:
                print("Unused Parameters:")
                for name in unused_params:
                    print(name)
            else:
                print("No unused parameters.")
            loss, tasks_loss = model.module.loss(pred, data.y, head_index)

However, no unused parameters are tracked.

@pzhanggit
Collaborator

> I changed line 225 of the file hydragnn/utils/distributed.py as follows:
>
> model = torch.nn.parallel.DistributedDataParallel(model, find_unused_parameters=True)
>
> Setting find_unused_parameters=True does not make the code crash, and should allow us to track the parameters.
>
> I also added the following lines to the train() function inside train_validate_test.py
>
>         with record_function("forward"):
>             data = data.to(get_device())
>             pred = model(data)
>             # Print unused parameters
>             unused_params = [name for name, param in model.module.named_parameters() if not param.requires_grad]
>             if unused_params:
>                 print("Unused Parameters:")
>                 for name in unused_params:
>                     print(name)
>             else:
>                 print("No unused parameters.")
>             loss, tasks_loss = model.module.loss(pred, data.y, head_index)
>
> However, no unused parameters are tracked.

https://discuss.pytorch.org/t/how-to-find-the-unused-parameters-in-network/63948/5
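The linked thread suggests checking param.grad after backward() rather than param.requires_grad: a parameter that never took part in the forward pass typically still has requires_grad=True, which is why the snippet above finds nothing. A minimal self-contained sketch of that distinction (the "used"/"unused" module names are illustrative, not from HydraGNN):

```python
import torch

# Two submodules, only one of which participates in the forward pass.
model = torch.nn.ModuleDict({
    "used": torch.nn.Linear(3, 1),
    "unused": torch.nn.Linear(3, 1),  # never called in the forward pass
})

x = torch.randn(2, 3)
loss = model["used"](x).sum()
loss.backward()

# requires_grad is True for all parameters, so this check finds nothing.
by_requires_grad = [n for n, p in model.named_parameters() if not p.requires_grad]

# After backward(), parameters that did not participate have .grad == None.
by_missing_grad = [n for n, p in model.named_parameters() if p.grad is None]
```

Here by_requires_grad comes back empty while by_missing_grad reports the parameters of the uncalled submodule, matching the behavior described in the forum thread.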

@allaffa
Collaborator Author

allaffa commented Jan 20, 2024

> I changed line 225 of the file hydragnn/utils/distributed.py as follows:
> model = torch.nn.parallel.DistributedDataParallel(model, find_unused_parameters=True)
> Setting find_unused_parameters=True does not make the code crash, and should allow us to track the parameters.
> I also added the following lines to the train() function inside train_validate_test.py
>
>         with record_function("forward"):
>             data = data.to(get_device())
>             pred = model(data)
>             # Print unused parameters
>             unused_params = [name for name, param in model.module.named_parameters() if not param.requires_grad]
>             if unused_params:
>                 print("Unused Parameters:")
>                 for name in unused_params:
>                     print(name)
>             else:
>                 print("No unused parameters.")
>             loss, tasks_loss = model.module.loss(pred, data.y, head_index)
>
> However, no unused parameters are tracked.
>
> https://discuss.pytorch.org/t/how-to-find-the-unused-parameters-in-network/63948/5

@pzhanggit thanks for the link.

If you do not use DDP, then you need to call model.parameters() as in the link you provided. If we use DDP, then replacing model with model.module (as we already do in other parts of the code) and calling model.module.parameters() should do the same.

Am I missing something?
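The equivalence claimed above can be checked directly: DistributedDataParallel stores the wrapped network as .module, so iterating model.module.parameters() visits exactly the same parameters as iterating the unwrapped model. A minimal single-process sketch (the gloo backend and the toy Linear model are assumptions for illustration, not HydraGNN's actual setup):

```python
import os
import torch
import torch.distributed as dist

# Single-process process group so DDP can be constructed on CPU.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = torch.nn.Linear(4, 2)
ddp_model = torch.nn.parallel.DistributedDataParallel(model)

# DDP keeps the original network as .module, so parameter iteration over
# ddp_model.module matches iteration over the unwrapped model.
same_object = ddp_model.module is model
names = [n for n, _ in ddp_model.module.named_parameters()]

dist.destroy_process_group()
```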

hydragnn/models/Base.py: review comment (outdated, resolved)
@allaffa allaffa force-pushed the graph_heads_ci_tests branch from e6ce4aa to e580a88 on January 23, 2024 00:41
@allaffa
Collaborator Author

allaffa commented Jan 23, 2024

@JustinBakerMath

This PR fixes some bugs introduced by successive code developments and re-establishes the capability to create a graph autoencoder in HydraGNN that relies solely on message passing layers for node-to-node mapping predictions.
This is not meant to replace the shared MLP; it just aims at providing the user with an additional (optional) architecture for nodal predictions. This way, the user can either choose to build (1) a stack of message passing layers with a shared MLP on top, or (2) use message passing layers all the way to the end.

The performance of the graph autoencoder using only message passing layers is pretty disappointing on the unit tests. However, this is in line with previous runs observed by Pei a long time ago on the FePt dataset.
Since we want to further develop generative graph diffusion models on top of HydraGNN, I would like to see how graph autoencoders behave in that context.

Would you mind helping me make sure that my changes do not mess up the SchNet layer?
I remember that for some equivariant models you turn off the batch norm. I would like your help to make sure that I did not add bugs when the graph autoencoder uses an equivariant message passing backend.

Thanks,

@JustinBakerMath
Collaborator

> @JustinBakerMath
>
> This PR fixes some bugs introduced by successive code developments and re-establishes the capability to create a graph autoencoder in HydraGNN that relies solely on message passing layers for node-to-node mapping predictions. This is not meant to replace the shared MLP; it just aims at providing the user with an additional (optional) architecture for nodal predictions. This way, the user can either choose to build (1) a stack of message passing layers with a shared MLP on top, or (2) use message passing layers all the way to the end.
>
> The performance of the graph autoencoder using only message passing layers is pretty disappointing on the unit tests. However, this is in line with previous runs observed by Pei a long time ago on the FePt dataset. Since we want to further develop generative graph diffusion models on top of HydraGNN, I would like to see how graph autoencoders behave in that context.
>
> Would you mind helping me make sure that my changes do not mess up the SchNet layer? I remember that for some equivariant models you turn off the batch norm. I would like your help to make sure that I did not add bugs when the graph autoencoder uses an equivariant message passing backend.
>
> Thanks,

Thank you for your patience.

These updates have not introduced any errors into the implementations of EGCL or SchNet.

This PR has added batch normalization to the convolutional "head". The performance of the batch normalization can be assessed by overriding the _init_node_conv() function in the children of Base, in the same fashion as the existing overrides of the _init_conv() function.

This isn't necessary for this PR. However, going forward, I would be happy to assist in assessing the batch normalization performance.
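The override pattern described above can be sketched as follows. This is a hypothetical stand-in, not the actual HydraGNN code: the class bodies and the EquivariantStack name are illustrative; only Base, _init_conv(), and _init_node_conv() come from the discussion.

```python
# Hypothetical sketch: a child of Base overrides _init_node_conv() to
# change the conv head (e.g., drop batch norm for equivariant models),
# mirroring how _init_conv() is already overridden in the children.

class Base:
    def __init__(self):
        self._init_conv()
        self._init_node_conv()

    def _init_conv(self):
        # Default message passing stack (stand-in for real layers).
        self.conv_layers = ["conv", "batch_norm"]

    def _init_node_conv(self):
        # Default convolutional head: conv followed by batch normalization.
        self.node_conv_layers = ["conv", "batch_norm"]

class EquivariantStack(Base):
    def _init_node_conv(self):
        # Equivariant models may skip batch norm in the conv head.
        self.node_conv_layers = ["conv"]
```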

@allaffa allaffa requested a review from pzhanggit January 26, 2024 19:49
@pzhanggit
Collaborator

The tests failed because the changes require torch-geometric>=2.4.0, which has dropped support for Python 3.7, as in pyg-team/pytorch_geometric#7939. We will come back to this PR later.

@allaffa
Collaborator Author

allaffa commented Mar 4, 2024

@pzhanggit
Do you think that the version upgrades from PR #210 (now merged into the master) would allow the tests of this PR to pass after rebasing?

@pzhanggit
Collaborator

> @pzhanggit Do you think that the version upgrades from PR #210 (now merged into the master) would allow the tests of this PR to pass after rebasing?

Yes, I will rebase now.

@pzhanggit pzhanggit force-pushed the graph_heads_ci_tests branch from a215810 to d2a551d on March 4, 2024 16:53
@jychoi-hpc
Member

PyG released a new version (https://github.com/pyg-team/pytorch_geometric/releases/tag/2.5.2), which includes the fix for the problem. Can we check whether pyg 2.5.2 works?

torch_geometric==2.5.2

@pzhanggit pzhanggit force-pushed the graph_heads_ci_tests branch from d2a551d to afe91b1 on April 1, 2024 15:53
@pzhanggit pzhanggit force-pushed the graph_heads_ci_tests branch from 6177412 to d8cd181 on April 1, 2024 18:00
@allaffa allaffa merged commit c520b80 into ORNL:main Apr 1, 2024
2 checks passed
@allaffa allaffa deleted the graph_heads_ci_tests branch April 1, 2024 18:33
RylieWeaver pushed a commit to RylieWeaver/HydraGNN that referenced this pull request Oct 17, 2024
* upgrade pyg 2.5.2

* JSON file for convolutional heads added and test_graph updated

* thresholds increased for EGNN and SchNet

* Update test_graphs.py

Threshold increased for SchNet in unit test for convolutional heads

* update DimeNet weight initialization by Justin

* hyperparameter adjust for conv_head tests

* format

* relax error tolerance in conv_head for GIN

---------

Co-authored-by: Choi <choij@ornl.gov>
Co-authored-by: Zhang, Pei <zhangp1@ornl.gov>