Add additional datasets #61

cthoyt · 2022-01-24T12:09:53Z

Summary

This PR adds the OncoPolyPharmacology dataset, adds a new data loader that does all processing locally, and updates the pipeline to handle continuous labels (e.g., for drug synergy) as opposed to only binary labels.
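
As a rough illustration of what "processing locally" can look like, here is a minimal sketch of a loader that preprocesses once, stores the result on disk, and reads the stored copy afterwards. The names `LocalDatasetLoader`, `MockLoader`, `triples_name`, `preprocess`, and the CSV column names are all hypothetical and are not taken from the chemicalx code in this PR.

```python
import csv
import tempfile
from pathlib import Path
from typing import ClassVar, List, Tuple

Triple = Tuple[str, str, str, float]


class LocalDatasetLoader:
    """Hypothetical base class: preprocess once, then reuse the stored file."""

    # File name kept as a class variable so subclasses can override it.
    triples_name: ClassVar[str] = "labeled_triples.csv"

    def __init__(self, directory: Path) -> None:
        self.directory = directory
        self.directory.mkdir(parents=True, exist_ok=True)

    @property
    def triples_path(self) -> Path:
        return self.directory / self.triples_name

    def preprocess(self) -> List[Triple]:
        """Build (drug_1, drug_2, context, label) rows from raw data."""
        raise NotImplementedError

    def get_labeled_triples(self) -> List[Triple]:
        # Run the preprocessing step only if no stored copy exists yet.
        if not self.triples_path.exists():
            with self.triples_path.open("w", newline="") as file:
                writer = csv.writer(file)
                writer.writerow(["drug_1", "drug_2", "context", "label"])
                writer.writerows(self.preprocess())
        with self.triples_path.open(newline="") as file:
            return [
                (row["drug_1"], row["drug_2"], row["context"], float(row["label"]))
                for row in csv.DictReader(file)
            ]


class MockLoader(LocalDatasetLoader):
    """Tiny made-up dataset with continuous labels, for tests only."""

    triples_name = "mock_triples.csv"

    def preprocess(self) -> List[Triple]:
        return [
            ("drug_a", "drug_b", "cell_x", 12.5),
            ("drug_a", "drug_c", "cell_y", -3.0),
        ]
```

Subclassing with a small in-memory dataset, as `MockLoader` does, lets the loader logic be tested without downloading or processing any real data.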

Code passes all tests
Unit tests provided for these changes
Documentation and docstrings added for these changes
Add other datasets from Create data loader #3 (comment)

Changes

Add new harness for local preprocessing and storage of datasets
Add mock data for local dataset loader and associated tests
Add OncoPolyPharmacology dataset
Enable selection of evaluation metrics. ROC-AUC is not appropriate for drug synergy tasks, where the labels are continuous, so MSE and MAE are provided as examples.
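
To illustrate the metric selection point, here is a minimal sketch of choosing regression metrics by name for continuous labels. The `METRICS` registry and the `evaluate` function are hypothetical names for this example only and do not reflect the actual pipeline signature in chemicalx.

```python
from typing import Callable, Dict, Sequence

Metric = Callable[[Sequence[float], Sequence[float]], float]


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of the squared differences between labels and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of the absolute differences between labels and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical registry mapping metric names to functions.
METRICS: Dict[str, Metric] = {
    "mse": mean_squared_error,
    "mae": mean_absolute_error,
}


def evaluate(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    metrics: Sequence[str] = ("mse", "mae"),
) -> Dict[str, float]:
    """Compute each requested metric on continuous labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return {name: METRICS[name](y_true, y_pred) for name in metrics}
```

A classification metric such as ROC-AUC expects binary ground-truth labels, which is why a synergy task with real-valued labels needs regression metrics like these instead.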

codecov-commenter · 2022-01-24T12:16:49Z

Codecov Report

Merging #61 (7acb34c) into main (30d3327) will decrease coverage by 6.23%.
The diff coverage is 52.33%.

@@            Coverage Diff             @@
##             main      #61      +/-   ##
==========================================
- Coverage   98.07%   91.83%   -6.24%     
==========================================
  Files          28       29       +1     
  Lines         675      772      +97     
==========================================
+ Hits          662      709      +47     
- Misses         13       63      +50

Impacted Files	Coverage Δ
chemicalx/data/drugfeatureset.py	`100.00% <ø> (ø)`
chemicalx/data/datasetloader.py	`77.34% <44.44%> (-17.90%)`	⬇️
chemicalx/data/utils.py	`45.23% <45.23%> (ø)`
chemicalx/data/contextfeatureset.py	`90.00% <75.00%> (-10.00%)`	⬇️
chemicalx/pipeline.py	`88.57% <84.61%> (-0.14%)`	⬇️
chemicalx/data/__init__.py	`100.00% <100.00%> (ø)`
tests/unit/test_models.py	`100.00% <100.00%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 30d3327...7acb34c.

cthoyt · 2022-01-25T13:42:46Z

@benedekrozemberczki I could do a mock for the OncoPolyPharmacology dataset, but maybe it makes more sense to apply the loader class to a mock dataset instead.

Edit: done in d8309f4

benedekrozemberczki

Looks good @cthoyt!

cthoyt added 2 commits January 24, 2022 13:09

Add oncopolypharmacology dataset

830d845

Round down cell features

9556ae6

cthoyt added 10 commits January 24, 2022 13:19

Move some code into main package

276b897

Update drugbank_ddi_cleaner.py

d3e29fa

Make oncopolypharmacology dataset more reproducible

012328b

Cleanup flake8/mypy

b73d8f1

Fix views, add metric selection, add synergy example

e172918

Cleanup and add pystow

d954fdf

Delete .DS_Store

99fc49c

Update setup.py

0d1582d

Update datasetloader.py

7f27540

Update test_models.py

7acb34c

cthoyt marked this pull request as ready for review January 25, 2022 13:14

cthoyt requested a review from benedekrozemberczki January 25, 2022 13:40

cthoyt added 3 commits January 25, 2022 14:53

Add mock data

d8309f4

Update format and mocks

96a9db1

Use classvars to enable hacking the names

550f396

benedekrozemberczki approved these changes Jan 25, 2022

benedekrozemberczki merged commit 02cf0fd into AstraZeneca:main Jan 25, 2022

cthoyt deleted the add-datasets branch January 25, 2022 18:22

cthoyt added the dataset label Feb 1, 2022
