New Task: Add chebi-20 dataset #63

Closed
jackapbutler opened this issue Mar 3, 2023 · 1 comment · Fixed by #108


jackapbutler commented Mar 3, 2023

Overview

I will add the chebi-20 dataset from this paper. Each row maps a molecule's "CID" and "SMILES" string to a natural language description of that molecule.
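As a rough sketch of reading these rows, the snippet below assumes the data is distributed as tab-separated text with a header of `CID`, `SMILES` and `description`. That layout, the sample row and the `read_rows` helper are assumptions for illustration, not something confirmed in this issue.

```python
import csv
import io
from typing import Dict, List

# Hypothetical sample in the ASSUMED tab-separated layout (CID, SMILES, description).
SAMPLE_TSV = (
    "CID\tSMILES\tdescription\n"
    "702\tCCO\tExample description text.\n"
)


def read_rows(text: str) -> List[Dict[str, str]]:
    """Parse tab-separated rows into dicts keyed by the header columns."""
    # QUOTE_NONE keeps quote characters inside free-text descriptions literal
    # instead of treating them as CSV field delimiters.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(row) for row in reader]


rows = read_rows(SAMPLE_TSV)
print(rows[0]["CID"], rows[0]["SMILES"])
```

Disabling quote handling is a deliberate choice here: natural language descriptions can contain stray quotation marks, which the default CSV dialect would otherwise interpret as field quoting.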

A basic template could be: "The molecule <CID> with SMILES <SMILES> can be described as follows: ____". This dataset is also already listed in the awesome-chemistry-datasets repository.
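A minimal sketch of filling that template, assuming the blank is meant to be completed by the molecule's description. The `make_example` helper and the prompt/target split are hypothetical choices for illustration, not the project's actual transform code.

```python
from typing import Tuple

# Prompt part of the template; the description fills the "____" blank.
PROMPT_TEMPLATE = "The molecule {cid} with SMILES {smiles} can be described as follows:"


def make_example(cid: str, smiles: str, description: str) -> Tuple[str, str]:
    """Return (prompt, target) for one row; the target is the description."""
    prompt = PROMPT_TEMPLATE.format(cid=cid, smiles=smiles)
    return prompt, description


prompt, target = make_example("702", "CCO", "Example description.")
full_text = f"{prompt} {target}"
print(full_text)
```

Keeping the prompt and target separate makes it easy to either train on the concatenated text or score only the description part.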

@jackapbutler (Collaborator, Author)

To test the Hugging Face datasets functionality, I can try uploading this dataset to the OpenBioML Hugging Face organisation and then use that dataset path within the transform.py script.
