Improve the ChEMBL drugs dataset #214

stwhitfield · 2023-10-17T13:02:39Z

Purpose: Give datamol a dataset of all approved drugs in ChEMBL that includes metadata columns such as ChEMBL ID and date of approval.

Changelogs

Added chembl_approved_drugs to datamol/data/
Modified dm.data.chembl_drugs() to leverage it
Adapted docstring to explain how it was generated
Modified unit tests
Added notebooks folder with code that generated chembl_approved_drugs.parquet

Checklist:

Was this PR discussed in an issue? It is recommended to first discuss a new feature in a GitHub issue before opening a PR.
Add tests to cover the fixed bug(s) or the new introduced feature(s) (if appropriate).
Update the API documentation if a new function is added, or an existing one is deleted.
Write concise and explanatory changelogs below.
If possible, assign one of the following labels to the PR: feature, fix or test (or ask a maintainer to do it for you).

Update chembl_drugs to use chembl approved drugs.

fix Unnamed:0 column header

Update test_chembl_drugs
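
The `Unnamed: 0` header fixed in this PR is the usual symptom of a pandas CSV round trip that writes the row index as a nameless first column. A minimal sketch of the symptom and one common fix, assuming pandas is the library involved; the rows and column names below are placeholders, not taken from the actual dataset:

```python
import io

import pandas as pd

# Placeholder rows for illustration only; not real ChEMBL records.
df = pd.DataFrame(
    {
        "chembl_id": ["ID_1", "ID_2"],
        "smiles": ["CCO", "CC(=O)O"],
    }
)

# By default, to_csv writes the row index as a first column with an empty
# header, which read_csv then names "Unnamed: 0".
leaky = pd.read_csv(io.StringIO(df.to_csv()))

# Writing with index=False keeps only the real columns.
clean = pd.read_csv(io.StringIO(df.to_csv(index=False)))

print(list(leaky.columns))
print(list(clean.columns))
```

When the file cannot be regenerated, reading it with `index_col=0` is an alternative that treats the nameless column as the index again.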

codecov · 2023-10-17T13:05:23Z

Codecov Report

Merging #214 (994ef96) into main (e3c4a38) will not change coverage.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main     #214   +/-   ##
=======================================
  Coverage   91.91%   91.91%           
=======================================
  Files          46       46           
  Lines        3835     3835           
=======================================
  Hits         3525     3525           
  Misses        310      310
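
The figures in the diff above can be cross-checked with a few lines of arithmetic. This is only a sketch: the assumption that the displayed percentage is truncated (rather than rounded) to two decimals is inferred from the numbers, not stated in the report.

```python
import math

# Values copied from the coverage diff above.
lines, hits, misses = 3835, 3525, 310

# Every tracked line is either hit or missed.
assert hits + misses == lines

coverage = 100 * hits / lines

# Assumption: the report truncates to two decimals; plain rounding
# would give a different last digit.
truncated = math.floor(coverage * 100) / 100
rounded = round(coverage, 2)

print(f"{coverage:.4f}", truncated, rounded)
```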

| Flag | Coverage Δ |
|---|---|
| unittests | `91.91% <100.00%> (ø)` |

| Files | Coverage Δ |
|---|---|
| datamol/data/__init__.py | `78.07% <100.00%> (ø)` |

hadim

Thanks, Shawn. As discussed live, a few points below:

Edit the original message in this ticket to include fewer details about the PR.
Edit the docstring as discussed (don't mention medchem, markdown link to the notebook, date of the run, etc.).
Use Parquet instead of CSV.
Apply black to the code.

hadim · 2023-10-22T22:46:05Z

Thanks @stwhitfield for your first contribution to datamol!

stwhitfield added 9 commits October 16, 2023 14:07

add notebook folder and Get_ChEMBL_Approved_Drugs

07e3b81

Delete notebooks directory

9b7547a

add chembl approved drugs

27b04d7

Delete datamol/data/chembl_approved_drugs.parquet

cb0cc82

add chembl approved drugs csv

703b9d4

add notebooks folder and get_chembl_approved_drugs

c67d017

Update __init__.py

4c34111

Update chembl_drugs to use chembl approved drugs.

fix chembl_approved_drugs.csv

1d1cb9e

fix Unnamed:0 column header

Update test_data.py

46f0707

Update test_chembl_drugs

stwhitfield added the fix label Oct 17, 2023

stwhitfield requested a review from hadim as a code owner October 17, 2023 13:02

stwhitfield linked an issue Oct 17, 2023 that may be closed by this pull request

Improve chembl drugs dataset #213

Closed

hadim reviewed Oct 18, 2023

stwhitfield and others added 6 commits October 18, 2023 14:30

Delete datamol/data/chembl_approved_drugs.csv

a4da330

want parquet not csv format

update chembl_approved_drugs with parquet

3f774a2

applied black on new code

93ee78f

change reference filepath in chembl_drugs docstring

e301632

fix chembl drugs open file

0eaf00d

fix relative filepath in chembl_drugs docstring

994ef96

hadim changed the title ~~213 improve chembl drugs dataset~~ Improve the ChEMBL drugs dataset Oct 22, 2023

hadim self-requested a review October 22, 2023 22:44

hadim approved these changes Oct 22, 2023

hadim merged commit 3939c12 into main Oct 22, 2023
16 checks passed

hadim deleted the 213-improve-chembl-drugs-dataset branch October 22, 2023 22:46
