feat: add chebi 20 to datasets #108

jackapbutler · 2023-03-14T15:39:34Z

closes New Task: Add chebi-20 dataset #63

I've added a transform.py script and the related meta.yaml for the Chebi-20 dataset, which pairs SMILES strings with their corresponding natural language descriptions. The dataset was first used in this paper but also used here by the same authors. It also features in Kevin's awesome-chemistry-datasets repository.

I reopened this PR because I accidentally removed the commits from the old one 🙃.

kjappelbaum · 2023-03-15T22:58:28Z

Thanks, looks quite nice and clean to me! The only thing is the train/val/test thing. I have no strong preference other than that it should be consistent.

In https://github.com/OpenBioML/chemnlp/pull/98/files# we have been introducing a special column indicating the split. We might also try your approach and revert the other PR, do you have any preference or opinion?
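To make the split-column idea concrete, here is a minimal sketch, not the code from either PR. It assumes a single table where each row carries a `split` value; the column names (`SMILES`, `description`, `split`), the split labels, and the example rows are all hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical rows; the real dataset and its column names may differ.
ROWS = [
    {"SMILES": "CCO", "description": "Ethanol is a primary alcohol.", "split": "train"},
    {"SMILES": "O", "description": "Water is an oxygen hydride.", "split": "valid"},
    {"SMILES": "C", "description": "Methane is an alkane.", "split": "test"},
    {"SMILES": "CC(=O)O", "description": "Acetic acid is a carboxylic acid.", "split": "train"},
]

FIELDNAMES = ["SMILES", "description", "split"]


def write_single_csv(rows):
    """Serialize all rows into one CSV string that includes the split column."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def rows_for_split(csv_text, split):
    """Read the CSV back and keep only the rows assigned to the given split."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["split"] == split]


csv_text = write_single_csv(ROWS)
train_rows = rows_for_split(csv_text, "train")
```

With this layout a consumer filters one file by the `split` column instead of loading a separate file per split, which is the consistency trade-off raised above.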

jackapbutler · 2023-03-16T09:20:32Z

> Thanks, looks quite nice and clean to me! The only thing is the train/val/test thing. I have no strong preference other than that it should be consistent.
>
> In https://github.com/OpenBioML/chemnlp/pull/98/files# we have been introducing a special column indicating the split. We might also try your approach and revert the other PR, do you have any preference or opinion?

I've replied in #104, but I think they're two different things.

kjappelbaum · 2023-03-28T09:42:34Z

Looks good to me, thanks! 💯

jackapbutler changed the title ~~Add chebi 20 to datasets~~ feat: add chebi 20 to datasets Mar 14, 2023

jackapbutler requested a review from kjappelbaum March 15, 2023 10:46

kjappelbaum added the Awaiting author contribution label Mar 15, 2023

jackapbutler removed the Awaiting author contribution label Mar 20, 2023

jackapbutler requested review from kjappelbaum and removed request for kjappelbaum March 21, 2023 21:30

jackapbutler mentioned this pull request Mar 23, 2023

feat: implement benchmark field 2 #128

Merged

jackapbutler and others added 4 commits March 23, 2023 13:23

add transform script

67815e3

add yaml configs

84df446

[pre-commit.ci] auto fixes from pre-commit.com hooks

219db80

for more information, see https://pre-commit.ci

refactor: remove multiple yaml files

0ebf7de

jackapbutler force-pushed the add-chebi-20 branch from 381661a to 0ebf7de Compare March 23, 2023 13:23

jackapbutler and others added 8 commits March 23, 2023 13:25

fix: ensure transform script runs without errors

241c0b9

[pre-commit.ci] auto fixes from pre-commit.com hooks

fd2e1e7

for more information, see https://pre-commit.ci

update data template with extra fields

f5f91f7

add compund id as field

cdd30d8

Merge branch 'add-chebi-20' of github.com:jackapbutler/chemnlp into a…

ad02dd6

…dd-chebi-20

shorten description string

04b050d

add string field for target types

7786817

turn off linting for long bibtext entry

a6651d0

kjappelbaum approved these changes Mar 28, 2023

jackapbutler merged commit 0c3408b into OpenBioML:main Mar 28, 2023

jackapbutler deleted the add-chebi-20 branch March 28, 2023 10:27

phalem mentioned this pull request Jun 27, 2023

Dataset TODO list #75

Open
