LRGB: Long Range Graph Benchmark datasets #5935

Merged (36 commits) on Nov 17, 2022


vijaydwivedi75
Contributor

This PR adds the LRGB datasets from the paper Long Range Graph Benchmark. The original dataset source is in this repo.

The Long Range Graph Benchmark (LRGB) is a collection of five graph learning datasets that arguably require long-range reasoning to achieve strong performance on their tasks. The datasets can be used to prototype new models that capture long-range dependencies in graphs.

Dataset | Domain | Task
-- | -- | --
PascalVOC-SP | Computer Vision | Node Classification
COCO-SP | Computer Vision | Node Classification
PCQM-Contact | Quantum Chemistry | Link Prediction
Peptides-func | Chemistry | Graph Classification
Peptides-struct | Chemistry | Graph Regression

The `torch_geometric.datasets.LRGBDataset` class can be used to access any of the five datasets in the benchmark.
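
For readers who want to try the class, here is a minimal sketch of how it might be called. The dataset names come from the table above; the `root`, `name`, and `split` keyword arguments are assumed to follow the usual PyG dataset conventions and should be checked against the merged class. The `load_lrgb` helper and the `data/LRGB` path are hypothetical and not part of this PR.

```python
# Dataset names, domains, and tasks as listed in the PR description.
LRGB_DATASETS = {
    "PascalVOC-SP": ("Computer Vision", "Node Classification"),
    "COCO-SP": ("Computer Vision", "Node Classification"),
    "PCQM-Contact": ("Quantum Chemistry", "Link Prediction"),
    "Peptides-func": ("Chemistry", "Graph Classification"),
    "Peptides-struct": ("Chemistry", "Graph Regression"),
}


def load_lrgb(name, root="data/LRGB", split="train"):
    """Hypothetical helper: load one split of an LRGB dataset.

    Assumes LRGBDataset accepts `root`, `name`, and `split` keyword
    arguments; the data is downloaded on first use.
    """
    if name not in LRGB_DATASETS:
        raise ValueError(
            f"Unknown LRGB dataset {name!r}; expected one of {sorted(LRGB_DATASETS)}"
        )
    # Imported lazily so the name check works without PyG installed.
    from torch_geometric.datasets import LRGBDataset

    return LRGBDataset(root=root, name=name, split=split)


# Example usage (triggers a download, so it is left commented out):
# dataset = load_lrgb("Peptides-func")
# print(dataset, dataset[0])
```

Validating the name before constructing the dataset keeps typos from triggering a download attempt.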

@codecov

codecov bot commented Nov 9, 2022

Codecov Report

Merging #5935 (3a9e039) into master (15dbdaf) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #5935   +/-   ##
=======================================
  Coverage   84.54%   84.54%           
=======================================
  Files         361      361           
  Lines       19877    19877           
=======================================
  Hits        16806    16806           
  Misses       3071     3071           


@EdisonLeeeee
Contributor

Thanks for adding these! Left some initial comments.

torch_geometric/datasets/lrgb.py: 1 review thread (resolved)
torch_geometric/datasets/lrgb.py: 1 outdated review thread (resolved)
@vijaydwivedi75
Contributor Author

Thanks @EdisonLeeeee for your comments!
I have included the following revisions in an updated commit.

  • Merged the processing steps for the superpixels and Peptides datasets.
  • Removed the 'Task' column from the stats.
  • Simplified the contiguous label mapping by using LabelEncoder.
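
For context on the last bullet, the idea of a contiguous label mapping can be sketched as below. This is not the code from the PR: it is a plain-Python stand-in that produces the same kind of mapping as scikit-learn's `LabelEncoder.fit_transform` (sorted unique labels mapped to `0..K-1`). The function name `encode_contiguous` and the sample labels are made up for illustration.

```python
def encode_contiguous(labels):
    """Map arbitrary sortable labels to contiguous ints 0..K-1.

    Classes are indexed in sorted order, which mirrors how
    sklearn.preprocessing.LabelEncoder assigns indices.
    """
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    return [index[label] for label in labels], classes


# Hypothetical non-contiguous class ids, e.g. from a segmentation dataset.
raw = [5, 90, 5, 17, 90]
encoded, classes = encode_contiguous(raw)
print(encoded)  # [0, 2, 0, 1, 2]
print(classes)  # [5, 17, 90]
```

Keeping `classes` around allows the original ids to be recovered with `classes[i]`.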

torch_geometric/datasets/lrgb.py: 6 outdated review threads (resolved)
@vijaydwivedi75
Contributor Author

Hi @EdisonLeeeee, I have updated the code based on the feedback. Thank you!
It is ready from my side.

@EdisonLeeeee
Contributor

LGTM. Thanks for the update!

CHANGELOG.md: 1 outdated review thread (resolved)
@EdisonLeeeee
Contributor

A gentle ping @rusty1s :)

vijaydwivedi75 and others added 2 commits November 12, 2022 17:18
Co-authored-by: Jintang Li <cnljt@outlook.com>
@EdisonLeeeee
Contributor

I'm also wondering if we could add the dataset information (as you summarized in this PR) to the docstring, so that users can choose a dataset according to their task. FYI, an example from an existing docstring:

+----------------+-----------------+------------------------------+-----------+
| Molecule       | Level of Theory | Name                         | #Examples |
+================+=================+==============================+===========+
| Benzene        | DFT             | :obj:`benzene`               | 49,863    |
+----------------+-----------------+------------------------------+-----------+
| Benzene        | DFT FHI-aims    | :obj:`benzene FHI-aims`      | 627,983   |

@vijaydwivedi75
Contributor Author

Done, added the dataset info to the docstring!
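
As an illustration of the kind of table discussed here, the sketch below renders the Dataset/Domain/Task rows from the PR description as a reStructuredText grid table. It is not the docstring that was merged, and the columns there may differ; `rst_grid_table` is a hypothetical helper written for this example.

```python
def rst_grid_table(header, rows):
    """Render a header and rows of strings as an RST grid table."""
    table = [header] + rows
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in table) for i in range(len(header))]

    def border(fill):
        return "+" + "+".join(fill * (w + 2) for w in widths) + "+"

    def line(cells):
        padded = (f" {cell.ljust(w)} " for cell, w in zip(cells, widths))
        return "|" + "|".join(padded) + "|"

    out = [border("-"), line(header), border("=")]
    for row in rows:
        out.append(line(row))
        out.append(border("-"))
    return "\n".join(out)


# Rows copied from the table in the PR description.
rows = [
    ["PascalVOC-SP", "Computer Vision", "Node Classification"],
    ["COCO-SP", "Computer Vision", "Node Classification"],
    ["PCQM-Contact", "Quantum Chemistry", "Link Prediction"],
    ["Peptides-func", "Chemistry", "Graph Classification"],
    ["Peptides-struct", "Chemistry", "Graph Regression"],
]
table = rst_grid_table(["Dataset", "Domain", "Task"], rows)
print(table)
```

Generating the table from data keeps the column borders aligned, which grid tables require in order to render.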

Member

@rusty1s left a comment

Thank you for adding <3

torch_geometric/datasets/lrgb.py: 2 outdated review threads (resolved)
@rusty1s merged commit 964a447 into pyg-team:master on Nov 17, 2022
JakubPietrakIntel pushed a commit to JakubPietrakIntel/pytorch_geometric that referenced this pull request Nov 25, 2022
This PR adds the LRGB datasets from the paper [Long Range Graph
Benchmark](https://openreview.net/pdf?id=in7XC5RcjEn). The original
dataset source is [in this repo](http://github.com/vijaydwivedi75/lrgb).

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Jintang Li <cnljt@outlook.com>
Co-authored-by: Jinu Sunil <jinu.sunil@gmail.com>
Co-authored-by: Matthias Fey <matthias.fey@tu-dortmund.de>
4 participants