Create data loader #3

Closed
benedekrozemberczki opened this issue Dec 15, 2021 · 2 comments

@benedekrozemberczki

  • Load drug side features.
  • Load triples.
  • Document.
  • Test generation.
@cthoyt

cthoyt commented Dec 16, 2021

Here are 5 papers with datasets worth looking into, suggested by @debplana:

Mathews Griner LA, Guha R, Shinn P, Young RM, Keller JM, Liu D, Goldlust IS, Yasgar A, McKnight C, Boxer MB, Duveau DY, Jiang JK, Michael S, Mierzwa T, Huang W, Walsh MJ, Mott BT, Patel P, Leister W, Maloney DJ, et al. 2014. High-throughput combinatorial screening identifies drugs that cooperate with ibrutinib to kill activated B-cell-like diffuse large B-cell lymphoma cells. PNAS 111:2349–2354. DOI: https://doi.org/10.1073/pnas.1311846111, PMID: 24469833

This dataset only has a handful of drug synergy pairs. It could be manually curated for evaluation, but there is not enough for training.

O’Neil J, Benita Y, Feldman I, Chenard M, Roberts B, Liu Y, Li J, Kral A, Lejnine S, Loboda A, Arthur W. An unbiased oncology compound screen to identify novel combination strategies. Molecular Cancer Therapeutics. 2016;15(6):1155–1162. https://doi.org/10.1158/1535-7163.MCT-15-0843

This is the OncoPolyPharmacology dataset in TDC.

Borisy AA, Elliott PJ, Hurst NW, Lee MS, Lehar J, Price ER, Serbedzija G, Zimmermann GR, Foley MA, Stockwell BR, Keith CT. 2003. Systematic discovery of multicomponent therapeutics. PNAS 100:7977–7982. DOI: https://doi.org/10.1073/pnas.1337088100, PMID: 12799470

Could not find the supplementary information.

DREAM Challenge

Bansal M, Yang J, Karan C, Menden MP, Costello JC, Tang H, Xiao G, Li Y, Allen J, Zhong R, Chen B, Kim M, Wang T, Heiser LM, Realubit R, Mattioli M, Alvarez MJ, Shen Y, Gallahan D, Singer D, et al. 2014. A community computational challenge to predict the activity of pairs of compounds. Nature Biotechnology 32:1213–1222. DOI: https://doi.org/10.1038/nbt.3052, PMID: 25419740

It appears the website linked by this paper, http://www.the-dream-project.org/challenges/nci-dream-drug-sensitivity-prediction-challenge, is down.

AstraZeneca-Sanger Drug Combination DREAM Consortium, Menden MP, Wang D, Mason MJ, Szalai B, Bulusu KC, Guan Y, Yu T, Kang J, Jeon M, Wolfinger R, Nguyen T, Zaslavskiy M, Jang IS, Ghazoui Z, Ahsen ME, Vogel R, Neto EC, Norman T, Tang EKY, Garnett MJ, et al. 2019. Community assessment to advance computational prediction of Cancer drug combinations in a pharmacogenomic screen. Nature Communications 10:2674. DOI: https://doi.org/10.1038/s41467-019-09799-2, PMID: 3120923

Therapeutic Data Commons


After chatting with Deb, it's clear we should be really careful to only compare within cell lines. This also opens us up to an interesting evaluation where you train on data from one cell line and test on another. The second important thing we need to be careful about is concentration. Last, we also need to provide some meaningful baselines, because it's a good bet the ML people are way off base compared to what's actually useful in the field.
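To make the cell-line point concrete, here is a minimal sketch of a leave-one-cell-line-out evaluation with a trivial mean-score baseline. Everything in it is an assumption for illustration: the record layout `(drug_a, drug_b, cell_line, synergy_score)`, the function names, and the toy values are hypothetical and do not come from any of the datasets above. Concentration is ignored here and would need its own handling.

```python
from statistics import mean

# Hypothetical records: (drug_a, drug_b, cell_line, synergy_score).
# The values are made up purely for illustration.
records = [
    ("d1", "d2", "A549", 0.8),
    ("d1", "d3", "A549", 0.2),
    ("d2", "d3", "MCF7", 0.5),
    ("d1", "d2", "MCF7", 0.1),
    ("d1", "d3", "HT29", 0.4),
]


def split_by_cell_line(rows, held_out):
    """Return (train, test); test holds every row of the held-out cell line."""
    train = [r for r in rows if r[2] != held_out]
    test = [r for r in rows if r[2] == held_out]
    return train, test


def mean_baseline(train):
    """Trivial baseline: always predict the mean training synergy score."""
    return mean(r[3] for r in train)


def mean_absolute_error(test, prediction):
    """Mean absolute error of a single constant prediction on the test rows."""
    return mean(abs(r[3] - prediction) for r in test)


def leave_one_cell_line_out(rows):
    """Score the mean baseline once per held-out cell line."""
    results = {}
    for cell_line in sorted({r[2] for r in rows}):
        train, test = split_by_cell_line(rows, cell_line)
        results[cell_line] = mean_absolute_error(test, mean_baseline(train))
    return results


if __name__ == "__main__":
    for cell_line, error in leave_one_cell_line_out(records).items():
        print(f"{cell_line}: MAE of mean baseline = {error:.3f}")
```

Any model we ship would have to beat this kind of baseline on each held-out cell line before the comparison means much.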

@benedekrozemberczki

Added basic loaders for two datasets. I will close this for now, but these comments are extremely good. The person who worked on the AZ Sanger dataset (Krishna Bulusu) works closely with us.
