feat: add chebi 20 to datasets #108
Thanks, this looks nice and clean to me! The only open point is how to handle the train/val/test splits. I have no strong preference, other than that it should be consistent. In https://github.com/OpenBioML/chemnlp/pull/98/files# we introduced a special column indicating the split. We could also adopt your approach and revert the other PR. Do you have a preference or opinion?
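For reference, here is a minimal sketch of the "split column" idea mentioned above: the per-split records are merged into one table and each row is tagged with the split it came from. The records, the column names (`SMILES`, `description`, `split`) and the helper names are hypothetical and only for illustration. They are not taken from either PR, and the actual `transform.py` may work differently.

```python
import csv
import io

# Hypothetical per-split records; the real ChEBI-20 files and columns may differ.
SPLITS = {
    "train": [{"SMILES": "CCO", "description": "Ethanol is a primary alcohol."}],
    "valid": [{"SMILES": "O", "description": "Water is an oxygen hydride."}],
    "test": [{"SMILES": "C", "description": "Methane is the simplest alkane."}],
}


def add_split_column(splits):
    """Merge per-split records into one list, tagging each row with its split."""
    rows = []
    for split_name, records in splits.items():
        for record in records:
            # Copy the record so the input data is left untouched.
            rows.append({**record, "split": split_name})
    return rows


def to_csv(rows):
    """Serialize the merged rows to CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["SMILES", "description", "split"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


merged = add_split_column(SPLITS)
csv_text = to_csv(merged)
print(csv_text)
```

With a single file and a `split` column, consumers can filter rows by split; with separate files per split, the file name carries that information instead. Either works as long as all datasets follow the same convention.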
I've replied in #104, but I think they're two different things.
381661a to 0ebf7de
for more information, see https://pre-commit.ci
Looks good to me, thanks! 💯
I've added a transform.py script and the related meta.yaml for the ChEBI-20 dataset. The dataset pairs SMILES strings with their corresponding natural-language descriptions. It was first used in this paper and was also used here by the same authors. It also features in Kevin's awesome-chemistry-datasets repository.
I reopened this PR because I accidentally removed the commits from the old one 🙃.