Add additional datasets #61

Merged
merged 15 commits into from
Jan 25, 2022


cthoyt
Contributor

@cthoyt cthoyt commented Jan 24, 2022

Summary

This PR adds the OncoPolyPharmacology dataset, adds a new dataset loader that does all processing locally, and updates the pipeline to support continuous labels (e.g., for drug synergy) as opposed to binary labels.
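A loader that "does all processing locally" can be illustrated with a minimal caching sketch. Everything here is an assumption rather than the chemicalx code from this PR: the class names `LocalDatasetLoader` and `MockLoader`, the method `get_labeled_triples`, the `labeled_triples.csv` file name, and the `drug_1`/`drug_2`/`context`/`label` columns are all hypothetical. The idea shown is that an expensive `preprocess()` step runs once, its output is stored on disk, and later calls read the stored file instead.

```python
import csv
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, List

# Hypothetical column layout for (drug, drug, context, label) rows.
FIELDNAMES = ["drug_1", "drug_2", "context", "label"]


class LocalDatasetLoader(ABC):
    """Hypothetical base class: preprocess a dataset once, cache it as CSV, reuse the cache."""

    def __init__(self, directory: Path) -> None:
        self.directory = Path(directory)

    @property
    def labeled_triples_path(self) -> Path:
        # Hypothetical cache file name.
        return self.directory / "labeled_triples.csv"

    @abstractmethod
    def preprocess(self) -> List[Dict[str, str]]:
        """Build rows from raw data (download, parse, clean); subclasses implement this."""

    def get_labeled_triples(self) -> List[Dict[str, str]]:
        # Only run the (possibly expensive) preprocessing when no cached file exists yet.
        if not self.labeled_triples_path.is_file():
            rows = self.preprocess()
            self.directory.mkdir(parents=True, exist_ok=True)
            with self.labeled_triples_path.open("w", newline="") as file:
                writer = csv.DictWriter(file, fieldnames=FIELDNAMES)
                writer.writeheader()
                writer.writerows(rows)
        # Always serve rows from the on-disk cache so every call returns the same thing.
        with self.labeled_triples_path.open(newline="") as file:
            return list(csv.DictReader(file))


class MockLoader(LocalDatasetLoader):
    """Mock dataset with made-up identifiers, standing in for real raw data in tests."""

    def __init__(self, directory: Path) -> None:
        super().__init__(directory)
        self.preprocess_calls = 0  # lets tests check that caching works

    def preprocess(self) -> List[Dict[str, str]]:
        self.preprocess_calls += 1
        return [
            {"drug_1": "D1", "drug_2": "D2", "context": "C1", "label": "12.5"},
            {"drug_1": "D1", "drug_2": "D3", "context": "C2", "label": "-3.0"},
        ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        loader = MockLoader(Path(tmp) / "mock")
        loader.get_labeled_triples()
        rows = loader.get_labeled_triples()  # second call reads the cached CSV
        print(loader.preprocess_calls, len(rows))  # 1 2
```

Because preprocessing is an overridable method, a mock subclass such as `MockLoader` can exercise the caching logic in unit tests without downloading anything, in the spirit of the mock data this PR adds for the local loader.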

  • Code passes all tests
  • Unit tests provided for these changes
  • Documentation and docstrings added for these changes
  • Add other datasets from Create data loader #3 (comment)

Changes

  • Add new harness for local preprocessing and storage of datasets
  • Add mock data for local dataset loader and associated tests
  • Add OncoPolyPharmacology
  • Enable selection of evaluation metrics. ROC-AUC is not appropriate for drug synergy tasks, where the labels are continuous, so MSE and MAE are given as examples.
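Selecting evaluation metrics by name could look roughly like the following sketch. It assumes a hypothetical registry (`METRICS`, keyed by `"roc_auc"`, `"mse"`, `"mae"`) and a hypothetical `evaluate` helper; it is not the pipeline API from this PR. ROC-AUC only makes sense for binary labels, whereas MSE and MAE apply to continuous targets such as synergy scores. The metrics are written in plain Python so the sketch has no dependencies; scikit-learn's `roc_auc_score`, `mean_squared_error`, and `mean_absolute_error` could be used instead.

```python
from typing import Callable, Dict, Iterable, Sequence

Metric = Callable[[Sequence[float], Sequence[float]], float]


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[float], y_score: Sequence[float]) -> float:
    # Probability that a random positive is scored above a random negative; ties count 1/2.
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical registry mapping metric names to functions.
METRICS: Dict[str, Metric] = {
    "roc_auc": roc_auc,
    "mse": mean_squared_error,
    "mae": mean_absolute_error,
}


def evaluate(
    y_true: Sequence[float], y_pred: Sequence[float], metrics: Iterable[str] = ("roc_auc",)
) -> Dict[str, float]:
    """Compute only the requested metrics, e.g. metrics=("mse", "mae") for synergy scores."""
    metrics = tuple(metrics)
    unknown = set(metrics) - set(METRICS)
    if unknown:
        raise ValueError(f"unknown metrics: {sorted(unknown)}")
    return {name: METRICS[name](y_true, y_pred) for name in metrics}
```

For a binary task the default `evaluate(labels, scores)` reports ROC-AUC, while a synergy task would pass `metrics=("mse", "mae")` so that no rank-based binary metric is computed on continuous labels.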

@codecov-commenter

codecov-commenter commented Jan 24, 2022

Codecov Report

Merging #61 (7acb34c) into main (30d3327) will decrease coverage by 6.23%.
The diff coverage is 52.33%.


@@            Coverage Diff             @@
##             main      #61      +/-   ##
==========================================
- Coverage   98.07%   91.83%   -6.24%     
==========================================
  Files          28       29       +1     
  Lines         675      772      +97     
==========================================
+ Hits          662      709      +47     
- Misses         13       63      +50     

Impacted Files                         Coverage Δ
chemicalx/data/drugfeatureset.py       100.00% <ø> (ø)
chemicalx/data/datasetloader.py        77.34% <44.44%> (-17.90%) ⬇️
chemicalx/data/utils.py                45.23% <45.23%> (ø)
chemicalx/data/contextfeatureset.py    90.00% <75.00%> (-10.00%) ⬇️
chemicalx/pipeline.py                  88.57% <84.61%> (-0.14%) ⬇️
chemicalx/data/__init__.py             100.00% <100.00%> (ø)
tests/unit/test_models.py              100.00% <100.00%> (ø)
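The headline numbers can be checked by hand, since coverage is hits divided by tracked lines (662/675 on main, 709/772 on this PR). The small mismatch between the 6.23% in the summary and the -6.24% in the diff is consistent with Codecov truncating percentages to two decimals: the exact drop truncates to 6.23, while subtracting the already-truncated 98.07% and 91.83% gives 6.24. That truncation rule is an inference from these figures, not documented Codecov behavior; ordinary rounding would display 91.84% for the PR branch. The helper names `coverage` and `truncate_2dp` are hypothetical.

```python
import math
from fractions import Fraction


def coverage(hits: int, lines: int) -> Fraction:
    """Exact coverage as a percentage, kept as a Fraction to avoid float error."""
    return Fraction(100 * hits, lines)


def truncate_2dp(value: Fraction) -> Fraction:
    # Assumption inferred from the report: drop digits past two decimals instead of rounding.
    return Fraction(math.floor(value * 100), 100)


if __name__ == "__main__":
    base = coverage(662, 675)  # main (30d3327): 662 hits / 675 lines
    head = coverage(709, 772)  # PR #61 (7acb34c): 709 hits / 772 lines
    print(float(truncate_2dp(base)), float(truncate_2dp(head)))  # 98.07 91.83
    print(float(truncate_2dp(base - head)))  # 6.23 (summary sentence)
    print(float(truncate_2dp(base) - truncate_2dp(head)))  # 6.24 (diff table)
```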


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 30d3327...7acb34c.

@cthoyt cthoyt marked this pull request as ready for review January 25, 2022 13:14
@cthoyt
Contributor Author

cthoyt commented Jan 25, 2022

@benedekrozemberczki I could write a mock for the OncoPolyPharmacology dataset, but maybe it makes more sense to apply the loader class to a mock dataset instead.

Edit: done in d8309f4

Contributor

@benedekrozemberczki benedekrozemberczki left a comment


Looks good @cthoyt!

@benedekrozemberczki benedekrozemberczki merged commit 02cf0fd into AstraZeneca:main Jan 25, 2022
@cthoyt cthoyt deleted the add-datasets branch January 25, 2022 18:22
@cthoyt cthoyt added the dataset label Feb 1, 2022
3 participants