Improve the ChEMBL drugs dataset #214

Merged: 15 commits, Oct 22, 2023

@stwhitfield (Contributor) commented Oct 17, 2023

Purpose: Give datamol a dataset of all approved drugs in ChEMBL, with metadata columns such as ChEMBL ID, date of approval, etc.

Changelogs

Added chembl_approved_drugs.parquet to datamol/data/
Modified dm.data.chembl_drugs() to leverage it
Adapted docstring to explain how it was generated
Modified unit tests
Added notebooks folder with code that generated chembl_approved_drugs.parquet


Checklist:

  • Was this PR discussed in an issue? It is recommended to first discuss a new feature in a GitHub issue before opening a PR.
  • Add tests to cover the fixed bug(s) or the newly introduced feature(s) (if appropriate).
  • Update the API documentation if a new function is added, or an existing one is deleted.
  • Write concise and explanatory changelogs below.
  • If possible, assign one of the following labels to the PR: feature, fix or test (or ask a maintainer to do it for you).

@stwhitfield linked an issue Oct 17, 2023 that may be closed by this pull request

codecov bot commented Oct 17, 2023

Codecov Report

Merging #214 (994ef96) into main (e3c4a38) will not change coverage.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main     #214   +/-   ##
=======================================
  Coverage   91.91%   91.91%           
=======================================
  Files          46       46           
  Lines        3835     3835           
=======================================
  Hits         3525     3525           
  Misses        310      310           

Flag        Coverage Δ
unittests   91.91% <100.00%> (ø)

Files                      Coverage Δ
datamol/data/__init__.py   78.07% <100.00%> (ø)
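
As a quick sanity check on the figures in the Coverage Diff, the sketch below recomputes the percentage from the reported hits and lines. It assumes coverage is simply hits divided by lines. Codecov's exact rounding rule is not stated in this report, though the reported 91.91% matches truncation to two decimals rather than rounding.

```python
import math

# Figures copied from the Coverage Diff in this report.
lines = 3835
hits = 3525

# Misses are the lines that were not hit.
misses = lines - hits

# Coverage as a percentage, assuming coverage = hits / lines.
ratio = hits / lines * 100

# Two ways to present the figure with two decimals.
truncated = math.floor(ratio * 100) / 100  # drop digits beyond two decimals
rounded = round(ratio, 2)                  # conventional rounding

print(misses, truncated, rounded)
```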

@hadim (Contributor) left a comment

Thanks Shawn. As discussed live, a few points below:

  • Edit the original message in this ticket to include fewer details about the PR.
  • Edit the docstring as discussed (don't mention medchem, markdown link to the notebook, date of the run, etc.).
  • Use Parquet instead of CSV.
  • Run black on the code.

@hadim changed the title from "213 improve chembl drugs dataset" to "Improve the ChEMBL drugs dataset" Oct 22, 2023
@hadim self-requested a review October 22, 2023 22:44

@hadim (Contributor) commented Oct 22, 2023

Thanks @stwhitfield for your first contribution to datamol!

@hadim merged commit 3939c12 into main Oct 22, 2023
16 checks passed
@hadim deleted the 213-improve-chembl-drugs-dataset branch October 22, 2023 22:46