
add GLEM model, TAGDataset and example of GLEM #9662

Open · wants to merge 20 commits into base: master

Conversation

Contributor

@ECMGit ECMGit commented Sep 15, 2024

reopened #9591

Feature summary:

  • Add GLEM as a GNN & LLM co-training model to PyG
  • Adapt GLEM's LM to AutoModelForSequenceClassification from transformers
  • LoRA support
  • LM/LLM support
  • ogbn-products/ogbn-arxiv testing finished
  • TAGDataset can be used as a wrapper class for any node classification dataset in PyG, pairing it with an LM tokenizer and the associated raw text
  • External predictions supported as pseudo-labels
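As a rough illustration of the TAGDataset wrapper idea described above — pairing each node of an existing node-classification dataset with its raw text and token ids — here is a toy sketch. The class and tokenizer names here are hypothetical stand-ins, not the PR's actual API; a real setup would use a transformers tokenizer and a PyG dataset.

```python
# Hypothetical sketch of the wrapper idea: attach per-node raw text and
# token ids to node labels. `ToyTokenizer` stands in for a real LM tokenizer.

class ToyTokenizer:
    def __init__(self):
        self.vocab = {}

    def __call__(self, text):
        # Map each word to a stable integer id, growing the vocab on demand.
        return [self.vocab.setdefault(w, len(self.vocab))
                for w in text.lower().split()]

class TextWrappedDataset:
    """Wraps per-node labels with raw text and precomputed token ids."""
    def __init__(self, labels, raw_texts, tokenizer):
        assert len(labels) == len(raw_texts)
        self.labels = labels
        self.raw_texts = raw_texts
        self.token_ids = [tokenizer(t) for t in raw_texts]

    def __getitem__(self, idx):
        return self.token_ids[idx], self.labels[idx]

    def __len__(self):
        return len(self.labels)

labels = [0, 1]
texts = ["graph neural networks", "language models on graphs"]
dataset = TextWrappedDataset(labels, texts, ToyTokenizer())
print(len(dataset), dataset[1])
```

The point of the wrapper design is that the underlying label/split logic stays untouched; only the text view is added on top.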


codecov bot commented Sep 15, 2024

Codecov Report

Attention: Patch coverage is 11.93182% with 155 lines in your changes missing coverage. Please review.

Project coverage is 86.92%. Comparing base (ba3b906) to head (a22742c).
Report is 4 commits behind head on master.

Files with missing lines Patch % Lines
torch_geometric/nn/models/glem.py 11.42% 155 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9662      +/-   ##
==========================================
- Coverage   87.54%   86.92%   -0.62%     
==========================================
  Files         482      483       +1     
  Lines       31414    31585     +171     
==========================================
- Hits        27501    27455      -46     
- Misses       3913     4130     +217     

☔ View full report in Codecov by Sentry.

@puririshi98 puririshi98 self-requested a review September 16, 2024 15:27
Contributor

@puririshi98 puririshi98 left a comment


LGTM, just get CI green.

@puririshi98 puririshi98 marked this pull request as ready for review September 24, 2024 19:28
@puririshi98
Contributor

@rusty1s @akihironitta ready for your reviews

Member

@akihironitta akihironitta left a comment


Could we have type annotations throughout the PR? Also, I'd suggest splitting this PR into smaller ones.

Review threads:
  • examples/llm/glem.py (outdated, resolved)
  • examples/llm/README.md (outdated, resolved)
  • examples/llm/glem.py (outdated, resolved)
  • examples/llm/glem.py (outdated, resolved)
  • examples/llm/glem.py (resolved)
Comment on lines 368 to 377
if em_phase == 'gnn':
    gnn_test_acc = max(gnn_test_acc, final_test_acc)
    model.gnn = model.gnn.to('cpu', non_blocking=True)
    em_phase = 'lm'
else:
    lm_test_acc = max(lm_test_acc, final_test_acc)
    model.lm = model.lm.to('cpu', non_blocking=True)
    em_phase = 'gnn'
torch.cuda.empty_cache()
print(f'Best GNN acc: {gnn_test_acc}, LM acc: {lm_test_acc}')
Member


This is the same comment as #9467 (comment), but we shouldn't pick the best metric evaluated on the test set at the end of every EM step.

Contributor Author


Hi Akihiro,

Thanks for reviewing the code.

I think this case is different. I agree that we should not pick the best test metric after every epoch, but the metric is still required after every EM step. Since the E-step trains the LM and the M-step trains the GNN, each step needs a certain number of epochs, and we need to run full inference after every E- and M-step to find out which model performs better.
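The alternation described here can be sketched in plain Python. This is a toy stand-in, not the PR's actual training loop: the accuracy values are made up, and a single number stands in for a full-inference evaluation after each phase.

```python
# Toy sketch of the GLEM-style EM alternation: the E-step trains the LM,
# the M-step trains the GNN, and after each phase a full evaluation tracks
# which model currently performs better. `phase_accs` are made-up numbers
# standing in for real per-phase evaluation results.

def run_em(phase_accs, num_em_steps=2):
    best = {'gnn': 0.0, 'lm': 0.0}
    em_phase = 'gnn'
    for step in range(num_em_steps * 2):  # each EM step has two phases
        final_acc = phase_accs[step]      # stand-in for full inference
        best[em_phase] = max(best[em_phase], final_acc)
        # Swap phases: LM after GNN, GNN after LM.
        em_phase = 'lm' if em_phase == 'gnn' else 'gnn'
    return best

best = run_em([0.70, 0.72, 0.75, 0.71])
print(best)
```

The per-phase bookkeeping mirrors the quoted snippet above; whether the tracked metric should be test or validation accuracy is exactly the point under discussion.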

Contributor


@ECMGit I think @akihironitta's point is that to find out which model performs better, you should only use validation accuracy, not test accuracy, since using the test accuracy could be viewed as a form of loosely fitting the model to the test set.
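The suggested fix can be sketched as follows: select the best checkpoint by validation accuracy and report only that checkpoint's test accuracy, rather than taking the maximum over test accuracies. The numbers and the helper name below are hypothetical.

```python
# Toy sketch of validation-based model selection: pick the phase with the
# best validation accuracy and report its test accuracy, instead of
# reporting max(test_acc), which would leak test-set information.

def select_by_val(results):
    """results: list of (val_acc, test_acc) pairs, one per EM phase."""
    best_val, reported_test = max(results, key=lambda r: r[0])
    return best_val, reported_test

results = [(0.68, 0.70), (0.71, 0.69), (0.70, 0.75)]
best_val, test_at_best_val = select_by_val(results)
print(best_val, test_at_best_val)
```

Note that the reported test accuracy (0.69) is lower than max(test) = 0.75; that gap is precisely the optimistic bias the reviewer's suggestion avoids.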

Contributor Author


done

Member

@akihironitta akihironitta left a comment


I haven't had a look outside the example script yet, but this addition is exciting! 🚀

@puririshi98
Contributor

LGTM. @akihironitta @rusty1s, let us know if anything else is needed.

3 participants