feat: add chebi 20 to datasets #108

Merged on Mar 28, 2023 (12 commits)

@jackapbutler (Collaborator) commented Mar 14, 2023

I've added a transform.py script and the related meta.yaml for the ChEBI-20 dataset, which pairs SMILES strings with their corresponding natural-language descriptions. The dataset was first used in this paper and was also used here by the same authors. It also features in Kevin's awesome-chemistry-datasets repository.

I reopened this PR because I accidentally removed the commits from the old one 🙃.

@jackapbutler jackapbutler changed the title Add chebi 20 to datasets feat: add chebi 20 to datasets Mar 14, 2023
@kjappelbaum (Collaborator)

Thanks, looks quite nice and clean to me! The only open point is how the train/val/test split is handled. I have no strong preference other than that it should be consistent.

In https://github.com/OpenBioML/chemnlp/pull/98/files# we have been introducing a special column indicating the split. We could also try your approach and revert the other PR. Do you have any preference or opinion?
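
For readers unfamiliar with the split-column convention mentioned here, the following is a minimal sketch of the idea, not the code from either PR. The column names (`SMILES`, `description`, `split`), the split labels, and the example rows are all placeholders chosen for illustration; the actual transform.py and the files in the linked PR may differ.

```python
import csv
import io

# Hypothetical rows and column names, for illustration only. The real
# ChEBI-20 files and the chemnlp transform.py may be structured differently.
SPLITS = {
    "train": [{"SMILES": "CCO", "description": "Ethanol is a primary alcohol."}],
    "valid": [{"SMILES": "O", "description": "Water is an oxygen hydride."}],
    "test": [{"SMILES": "C", "description": "Methane is the simplest alkane."}],
}


def merge_with_split_column(splits):
    """Concatenate per-split rows into one list, tagging each row with its split."""
    merged = []
    for split_name, rows in splits.items():
        for row in rows:
            # Copy the row so the input is not mutated, then add the split label.
            merged.append({**row, "split": split_name})
    return merged


def to_csv(rows):
    """Serialize the rows to CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["SMILES", "description", "split"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = merge_with_split_column(SPLITS)
csv_text = to_csv(rows)
```

With a single file and a `split` column, a consumer can recover any subset by filtering on that column. Keeping separate files per split is the alternative; either works as long as every dataset in the repository follows the same convention.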

@jackapbutler (Collaborator, Author)

I've replied in #104, but I think they're two different things.

@kjappelbaum (Collaborator) commented Mar 28, 2023

Looks good to me, thanks! 💯

@jackapbutler jackapbutler merged commit 0c3408b into OpenBioML:main Mar 28, 2023
@jackapbutler jackapbutler deleted the add-chebi-20 branch March 28, 2023 10:27
@phalem phalem mentioned this pull request Jun 27, 2023
Successfully merging this pull request may close these issues.

New Task: Add chebi-20 dataset