DRAFT: Preprocess the IMPROVE way for our Uno model #1

Draft: wants to merge 168 commits into base: develop

@rajeeja rajeeja commented Oct 19, 2023

Map the data files that Uno downloads to their equivalents in the IMPROVE data layout:
UNO: https://web.cels.anl.gov/projects/IMPROVE_FTP/candle/public/benchmarks/Pilot1/combo/

IMPROVE: https://ftp.mcs.anl.gov/pub/candle/public/improve/IMP_data/


```
Uno git:(develop) ✗ grep DATA_URL uno_data.py
DATA_URL = "https://web.cels.anl.gov/projects/IMPROVE_FTP/candle/public/benchmarks/Pilot1/combo/"
    path = get_file(DATA_URL + "rescaled_combined_single_drug_growth")
    path = get_file(DATA_URL + "ComboDrugGrowth_Nov2017.csv")
    cellmap_path = get_file(DATA_URL + "NCI60_CELLNAME_to_Combo.txt")
    path = get_file(DATA_URL + "combined_single_response_agg")
    path = get_file(DATA_URL + "extended_combined_mordred_descriptors")
    path = get_file(DATA_URL + "drug_info")
    path = get_file(DATA_URL + "cl_metadata")
    path = get_file(DATA_URL + "NCI60_CELLNAME_to_Combo.txt")
    path = get_file(DATA_URL + "cl_mapping")
    path = get_file(DATA_URL + "NCI_IOA_AOA_drugs")
    path = get_file(DATA_URL + "{}_dragon7_descriptors.tsv".format(drug_set))
    path = get_file(DATA_URL + "{}_dragon7_{}.tsv".format(drug_set, fp))
    path = get_file(DATA_URL + 'ChemStr
```
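To turn the grep listing above into an inventory of files that need an IMPROVE counterpart, something like the following could help. It is only a sketch: `referenced_files` is a hypothetical helper, not part of Uno or IMPROVE, and it assumes each file name appears as a complete quoted literal right after `DATA_URL +`. Truncated lines and the `DATA_URL = ...` assignment itself are skipped, and templated names such as `{}_dragon7_descriptors.tsv` are returned unexpanded.

```python
import re

# Matches a complete quoted literal appended to DATA_URL,
# e.g. get_file(DATA_URL + "drug_info"). A literal with no closing
# quote on the same line (a truncated line) does not match.
_LITERAL = re.compile(r"""DATA_URL\s*\+\s*(["'])(.+?)\1""")


def referenced_files(grep_output):
    """Return the unique names appended to DATA_URL, in first-seen order."""
    seen = {}
    for line in grep_output.splitlines():
        match = _LITERAL.search(line)
        if match:
            seen.setdefault(match.group(2), None)
    return list(seen)


# A few lines copied from the grep output in this PR description.
sample_lines = [
    'DATA_URL = "https://web.cels.anl.gov/projects/IMPROVE_FTP/candle/public/benchmarks/Pilot1/combo/"',
    '    path = get_file(DATA_URL + "drug_info")',
    '    cellmap_path = get_file(DATA_URL + "NCI60_CELLNAME_to_Combo.txt")',
    '    path = get_file(DATA_URL + "NCI60_CELLNAME_to_Combo.txt")',
    '    path = get_file(DATA_URL + "{}_dragon7_descriptors.tsv".format(drug_set))',
    "    path = get_file(DATA_URL + 'ChemStr",
]

if __name__ == "__main__":
    for name in referenced_files("\n".join(sample_lines)):
        print(name)
```

Each name in the resulting list can then be checked by hand against the IMPROVE directory; the mapping itself is not something this sketch can infer.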

Replace the uno_data classes CombinedDataGenerator, CombinedDataLoader, and DataFeeder with their IMPROVE equivalents.

Do this work in the Uno_IMPROVE folder.
The goal is to support cross-study analysis, as the other IMPROVE models do.
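For context, cross-study analysis is sketched here on the assumption that it means training on one study and evaluating on every study, including the source itself as a within-study baseline. The study names and the `train_fn` / `eval_fn` callables are placeholders, not IMPROVE APIs.

```python
# Placeholder study names; substitute the studies actually present
# in the IMPROVE data.
STUDIES = ["study_A", "study_B", "study_C"]


def run_cross_study(studies, train_fn, eval_fn):
    """Train once per source study and score the model on every target study.

    Returns a dict keyed by (source, target). Entries where
    source == target are the within-study baseline.
    """
    results = {}
    for source in studies:
        model = train_fn(source)
        for target in studies:
            results[(source, target)] = eval_fn(model, target)
    return results


if __name__ == "__main__":
    # Stub callables, used only to show the shape of the result.
    scores = run_cross_study(
        STUDIES,
        train_fn=lambda source: source,
        eval_fn=lambda model, target: 1.0 if model == target else 0.0,
    )
    for pair, score in scores.items():
        print(pair, score)
```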

@rajeeja rajeeja (Author) commented Nov 14, 2023

Another approach that could be added, though it may be a little difficult: use uno_preprocess.py to create an .h5 file, then use the config parameter --use-exported to load that HDF5 file for training and inference. That would be much less intrusive to the original model.
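A rough sketch of how that switch might look is below. It assumes `--use-exported` takes the path of the exported file, which this thread does not confirm, and `load_exported` and `build_from_raw` are hypothetical stand-ins that return labels instead of touching any data. The actual HDF5 read is deliberately left out.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Uno train/infer (sketch)")
    # Assumption: the flag carries the path to the file written by
    # uno_preprocess.py. When it is omitted, the original pipeline runs.
    parser.add_argument(
        "--use-exported",
        metavar="H5_PATH",
        default=None,
        help="preprocessed .h5 file to load instead of building features from raw data",
    )
    return parser


def load_exported(path):
    # Hypothetical stand-in: a real version would open the HDF5 file here.
    return {"source": "exported", "path": path}


def build_from_raw():
    # Hypothetical stand-in for the existing uno_data loading path.
    return {"source": "raw", "path": None}


def load_data(args):
    """Pick the exported file when given, otherwise fall back to raw data."""
    if args.use_exported:
        return load_exported(args.use_exported)
    return build_from_raw()


if __name__ == "__main__":
    print(load_data(build_parser().parse_args()))
```

Keeping the branch in one `load_data` function means train and infer share the same entry point, so the original model code only needs to call it once.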

@RylieWeaver RylieWeaver left a comment


Did you make changes to uno_data.py in the original folder? I think we want to leave that folder alone.

@rajeeja rajeeja marked this pull request as draft December 5, 2023 04:16
@rajeeja rajeeja requested a review from RylieWeaver December 5, 2023 04:16

3 participants