DRAFT: Preprocess the IMPROVE way for our Uno model #1

Draft · wants to merge 168 commits into base: develop

Changes from all commits (168 commits)
31912fa
o Change to new link
rajeeja Oct 19, 2023
b3a2cc8
o Add a new folder for this work, this preserves the old working code…
rajeeja Oct 19, 2023
7b604b3
test branch push permissions
Oct 20, 2023
51065af
cleanup
Oct 20, 2023
1b2b397
o Boiler plate for improve
rajeeja Nov 8, 2023
b81f65b
o Add more boiler plate to preprocess
rajeeja Nov 8, 2023
6b1ef97
o Add more readme
rajeeja Nov 8, 2023
255aa88
o Uno train structure
rajeeja Nov 8, 2023
27d8149
Draft uno_preprocess2.py
RylieWeaver Nov 12, 2023
6753b56
Delete Pilot1/uno_preprocess2.py
RylieWeaver Nov 13, 2023
973c28d
Create uno_preprocess_draft.py
RylieWeaver Nov 13, 2023
1130561
load_aggregated_single_response IMPROVE standardize
Nov 21, 2023
7ba0cc2
fix accidental line delete
Nov 21, 2023
cfb417c
o incremental changes and comments
rajeeja Nov 28, 2023
1b249b7
Create To-Do.md
RylieWeaver Nov 29, 2023
07c3b4b
o more preprocess
rajeeja Nov 29, 2023
fa7eb17
o Add an IMPROVE compliant model defs file
rajeeja Nov 29, 2023
166b849
IMPROVE data preprocessing and uno improve model
Nov 29, 2023
3ea0523
change to parquet for speed, include CANDLE_DATA_DIR, and download li…
Dec 3, 2023
abdd547
don't need scalings because using IMPROVE functions now
Dec 3, 2023
4e51e73
o Try to use single drug and not July2020 data
rajeeja Dec 4, 2023
711c210
o preprocess getting final touches
rajeeja Dec 5, 2023
ead83e5
Adapt uno_improve and uno_preprocess
Dec 5, 2023
4fbc438
scale drug data and make network smaller for testing
Dec 5, 2023
4513f2f
soft code train using params file and use preprocess.sh
Dec 10, 2023
c2e9858
IMPROVE compliance, sh files, and start singularity file
Dec 12, 2023
539cae9
Delete Pilot1/Uno_IMPROVE/uno_baseline_keras2.py
RylieWeaver Dec 13, 2023
d45e197
Delete Pilot1/Uno_IMPROVE/uno.py
RylieWeaver Dec 13, 2023
523e638
Delete Pilot1/Uno_IMPROVE/uno_data.py
RylieWeaver Dec 13, 2023
620c12d
Delete Pilot1/Uno_IMPROVE/uno_preprocess_draft.py
RylieWeaver Dec 13, 2023
9f25dfc
Delete Pilot1/Uno_IMPROVE/uno_train.py
RylieWeaver Dec 13, 2023
6ead6b8
Rename uno_preprocess_improve.sh to preprocess.sh
RylieWeaver Dec 13, 2023
92b1dc1
Delete Pilot1/Uno_IMPROVE/uno_preprocess_new.py
RylieWeaver Dec 13, 2023
ff4598f
Rename uno_infer.py to uno_infer_improve.py
RylieWeaver Dec 13, 2023
360ca25
preprocess and train sh files standardized for cuda devices and etc...
Dec 19, 2023
9b968bf
IMPROVE_DATA_DIR instead of CANDLE_DATA_DIR
Dec 19, 2023
4d43e06
Update uno_preprocess_improve.py
wilke Dec 20, 2023
62a7e67
Merge pull request #4 from JDACS4C-IMPROVE/wilke-patch-1
wilke Dec 20, 2023
cc7cea5
Updated preprocess, check for config file, removed CUDA
wilke Dec 20, 2023
99f3877
Merge branch 'preprocess_improve' of github.com:JDACS4C-IMPROVE/Bench…
wilke Dec 20, 2023
bc86ccd
Check for config file, removed CUDA
wilke Dec 20, 2023
38d7ca7
Check for config file, removed CUDA
wilke Dec 20, 2023
89ff908
Debugging and simplifywq
wilke Dec 20, 2023
d8bc2f0
Added missing option
wilke Dec 20, 2023
53a2a93
Reverse any changes to original Uno
Dec 20, 2023
8883fc1
Revert "Reverse any changes to original Uno"
Dec 20, 2023
c241f53
Had to revert VS Code formatting... making the original uno have no c…
Dec 20, 2023
f571c4f
Create conda_env.sh
RylieWeaver Jan 10, 2024
3e4dba0
Update conda_env.sh
RylieWeaver Jan 11, 2024
3f4b6f5
Update conda_env.sh
RylieWeaver Jan 11, 2024
1a86ba2
Update conda_env.sh
RylieWeaver Jan 11, 2024
3156dfa
CANDLE_DATA_DIR is deprecated. Use IMPROVE_DATA_DIR
Jan 12, 2024
b11ef74
Rough preprocess call in train.sh and preprocess python file formatting
Jan 12, 2024
8791ec3
Fix pathing looking for processed data inside IMPROVE_DATA_DIR
Jan 12, 2024
0305423
tweak parameter file
Jan 12, 2024
f236788
Path fix for preprocess call
Jan 12, 2024
af95a33
add test inferral
Jan 16, 2024
0899ef5
unnecessary IMPROVE_DATA_DIR check
Jan 19, 2024
2cbbd27
adding hyperparameter functionality with number of layers
Jan 19, 2024
0069aef
warmup type hyperparameter functionality
Jan 19, 2024
efd656f
better naming and updating txt file for new hyperparameters
Jan 19, 2024
193ea56
txt file for num layers hyperparameter functionality
Jan 19, 2024
9b7bd3f
parquet reading and smaller dataset debug setting
Jan 20, 2024
3d83c7e
Better debug setting
Jan 20, 2024
a3172c6
parquet reading in infer
Jan 20, 2024
f6a1ccb
Revise debug setting to work (can't load only n rows)
Jan 20, 2024
f63ca20
Update README.md
RylieWeaver Jan 22, 2024
6dd8676
update train_params definitions
Jan 22, 2024
f5087a6
fix test split naming
Jan 22, 2024
26e6553
update train_params definitions and polish preprocessing / add debug …
Jan 23, 2024
06da9c9
polish preprocessing / add debug option
Jan 23, 2024
e9cc679
refine compose_data_arrays and make prints cleaner
Jan 23, 2024
75da962
make prints cleaner
Jan 23, 2024
61c892f
update default model and fix train debug param name
Jan 22, 2024
730df21
times
Jan 23, 2024
9346094
times
Jan 23, 2024
7c5e5d9
subset data functionality
Jan 23, 2024
ca81cf4
save and load one data file
Jan 24, 2024
4b46c6a
slight hyperparameter change
Jan 24, 2024
b121312
slight hyperparameter change
Jan 24, 2024
679bcdb
update defaults and remove old prints
Jan 24, 2024
cbb703d
fix test set read
Jan 24, 2024
675ebbb
merge differenlt and create common samples function
Jan 25, 2024
25e8641
improve subsetting
Jan 25, 2024
ed89959
preprocess debug false by default
Jan 25, 2024
6f4e8b9
gene symbol to solve multilevel dataframe issue
Jan 25, 2024
8342cbf
debug accidentally set default true
Jan 25, 2024
e17b6be
train data batching
Jan 25, 2024
e953ffa
update infer and polish others to use auc param
Jan 25, 2024
41d2fe7
flatten to accomodate store predictions
Jan 25, 2024
8d3e0e1
better ntime keeping
Jan 26, 2024
41d00f6
import timekeeping instead of multiple functions
Jan 26, 2024
8a56735
deal with memory issues. data loaders, r2 callback adjustment, and gp…
Jan 26, 2024
13835b9
update params for batching
Jan 26, 2024
af11a05
more for dealing with large datasets
Jan 26, 2024
119d2c5
optimizer for HPO add
Jan 27, 2024
14f8933
Update infer to work with large datasets and put common functions in …
Jan 28, 2024
2c1eee9
GPU check
Jan 28, 2024
c209479
Update README.md
RylieWeaver Jan 28, 2024
64821d0
update def file and instructions
Jan 28, 2024
bad448a
take preprocess out of train.sh
Jan 28, 2024
bc9f726
default params and preprocess in train.sh
Jan 28, 2024
acb4668
protobuf version fix
Jan 28, 2024
877926a
preprocess.sh
Jan 28, 2024
ef467cc
clone Uno_IMPROVE not original
Jan 28, 2024
59d504c
update def file
Jan 28, 2024
850fd6f
cross study add
Jan 28, 2024
432104b
singularity testing
Jan 28, 2024
854dcb9
optimizer and missing interaction activation defaults
Jan 28, 2024
335744a
move src location
Jan 28, 2024
0f230e3
not in use for now
Jan 28, 2024
9b304d2
restore defaults
Jan 28, 2024
dd00259
can't have singularity in here when trying to build with cloning this…
Jan 28, 2024
fe4557d
make robust preprocess in train.sh
Jan 28, 2024
cdea77b
Update train.sh
RylieWeaver Jan 28, 2024
b259cc2
directory to execute preprocess in container
Jan 28, 2024
b658935
fix variable
Jan 28, 2024
a998a79
fix index error with data generation
Jan 29, 2024
743638f
better early stopping default and no warmup option
Jan 29, 2024
9fe64be
data generator factory for r2 callback. additional definitions fix fo…
Jan 30, 2024
482b858
fix definitions list error
Jan 30, 2024
c4425c1
verbosity to tell indices
Jan 30, 2024
87a392e
soft debugging and subsetting. training batch size and validating bat…
Jan 31, 2024
f4c50a4
use generator batch size
Jan 31, 2024
2f52db7
load preprocess_param into train
Jan 31, 2024
8fb6905
make train debug and subset CL passable
Jan 31, 2024
27904de
missing comma
Jan 31, 2024
b1f773e
update sh scripts to say IMPROVE and preprocess list allowed datasets
Feb 1, 2024
eb54186
stick with CANDLE_MODEL because models are candleized, datasets are I…
Feb 1, 2024
89bbbf7
slight cchanges for using generator
Feb 1, 2024
70d0eba
update README
Feb 1, 2024
3db34ed
punctuation
Feb 1, 2024
718a701
unintentional added line
Feb 1, 2024
43d236e
make preprocess have train params
Feb 2, 2024
7db1c1c
move definitions for hpo compatibility
Feb 2, 2024
e6e3cda
move parameters
Feb 2, 2024
909c434
Update README.md data downloading
RylieWeaver Feb 6, 2024
686ad38
Data Shuffling and regression hyperparameter
Feb 7, 2024
92c19c5
Merge branch 'preprocess_improve' of github.com:JDACS4C-IMPROVE/Bench…
Feb 7, 2024
868889a
simple comment fix
Feb 7, 2024
f61fd35
better early stopping parameter
Feb 7, 2024
63974d7
activation function note
Feb 8, 2024
14f3dd9
regression activation
Feb 9, 2024
66a6fc2
comment uneccessary gpu memory growth
Feb 12, 2024
9c3999f
Take out lr_log_10_range hyperparameter and improve compliance with t…
Feb 18, 2024
2adf861
standardize train.sh
Feb 18, 2024
4311ab4
debug power_yj scaling
Feb 21, 2024
3848544
exploring different hp-spaces functionality
Feb 21, 2024
11867ef
update to new params
Feb 22, 2024
8166a78
fix typo in params type
Feb 22, 2024
751a5cd
Update README.md
RylieWeaver Feb 22, 2024
d407873
Update README.md
RylieWeaver Feb 22, 2024
79eceb9
better pathing defaults
Feb 22, 2024
07ef2de
Merge branch 'preprocess_improve' of github.com:JDACS4C-IMPROVE/Bench…
Feb 22, 2024
30467b4
cleaning up
Feb 23, 2024
cff4afb
clean
Feb 23, 2024
7810fcc
comment test prediction
Feb 25, 2024
5f635ed
clean up gpu detection
Feb 25, 2024
9ef0ebe
take test prediction out of train
Feb 25, 2024
8799c15
update parameters
Feb 28, 2024
80a3429
different scheme for larger datasets (merging during batch yielding)
Mar 19, 2024
21d88db
cleanup given addition of new model
Mar 27, 2024
3b52f32
combine utils
Mar 30, 2024
b3b33c5
put more functions in utils
Mar 31, 2024
1c430ea
have batch merging as default
Mar 31, 2024
2fd7635
better naming
Mar 31, 2024
c101791
update infer
Mar 31, 2024
a371f1f
revise back to original
Mar 31, 2024
27 changes: 27 additions & 0 deletions Pilot1/Uno_IMPROVE/README.md
@@ -0,0 +1,27 @@
# Files in the repo
- Modified for IMPROVE (single-drug configuration):
  - uno_default_model.txt
  - uno_preprocess_improve.py
  - uno_train_improve.py
  - uno_infer_improve.py

# Conda Run (Miniconda version: 23.11.0)
- conda create --name Uno_IMPROVE python=3.7.16
- conda activate Uno_IMPROVE
- conda config --add channels conda-forge
- conda install tensorflow-gpu=2.10.0
- pip install git+https://github.com/ECP-CANDLE/candle_lib@develop
- pip install protobuf==3.20.0
- git clone https://github.com/JDACS4C-IMPROVE/IMPROVE.git
- export PYTHONPATH=<IMPROVE_LIBRARY_PATH>/:$PYTHONPATH
- pip install pyarrow==12.0.1 scikit-learn==1.0.2 joblib==1.3.2
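
A quick environment sanity check before running anything (illustrative; the expected version should match the installs above):

```
python -c "import tensorflow as tf; print(tf.__version__)"  # expect 2.10.0
python -c "import improve; import candle"                   # both should import cleanly
```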


To run the modified version of the code, run the following commands:
```
export IMPROVE_DATA_DIR=<DESIRED_DATA_DIR>
wget --cut-dirs=9 -P ~/$IMPROVE_DATA_DIR -np -nH -m https://web.cels.anl.gov/projects/IMPROVE_FTP/candle/public/improve/benchmarks/single_drug_drp/benchmark-data-pilot1/csa_data/
export PYTHONPATH=<IMPROVE_LIBRARY>/:$PYTHONPATH
python uno_preprocess_improve.py
python uno_train_improve.py
```
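
Inference is not shown above, but the repo also includes `uno_infer_improve.py` (see the rename commits in this PR). Assuming it follows the same calling convention as the preprocess and train scripts, a run would look like:

```
python uno_infer_improve.py
```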
3 changes: 3 additions & 0 deletions Pilot1/Uno_IMPROVE/To-Do.md
@@ -0,0 +1,3 @@
1: Change the datasets to the IMPROVE library
2: Use the splits provided by the IMPROVE library
3: Use the IMPROVE functions for data_preprocess
266 changes: 266 additions & 0 deletions Pilot1/Uno_IMPROVE/csa_wf_v3.py
@@ -0,0 +1,266 @@
""" Python implementation of cross-study analysis workflow """

import os
import subprocess
import warnings
from time import time
from pathlib import Path

import pandas as pd

# IMPROVE imports
from improve import framework as frm

# LightGBM imports
# TODO: change this for your model
import uno_preprocess_improve
import uno_train_improve
import uno_infer_improve

# from ap_utils.classlogger import Logger
# from ap_utils.utils import get_print_func, Timer


class Timer:
""" Measure time. """
def __init__(self):
self.start = time()

def timer_end(self):
self.end = time()
return self.end - self.start

def display_timer(self, print_fn=print):
time_diff = self.timer_end()
if (time_diff) // 3600 > 0:
print_fn("Runtime: {:.1f} hrs".format( (time_diff)/3600) )
else:
print_fn("Runtime: {:.1f} mins".format( (time_diff)/60) )


fdir = Path(__file__).resolve().parent

maindir = Path(f"./cross_study_HPO1")
MAIN_ML_DATA_DIR = Path(f"./{maindir}/ml_data")
MAIN_MODEL_DIR = Path(f"./{maindir}/models")
MAIN_INFER_OUTDIR = Path(f"./{maindir}/infer")

# Check that environment variable "IMPROVE_DATA_DIR" has been specified
if os.getenv("IMPROVE_DATA_DIR") is None:
    raise Exception("ERROR ! Required system variable not specified. "
                    "You must define IMPROVE_DATA_DIR ... Exiting.\n")
os.environ["CANDLE_DATA_DIR"] = os.environ["IMPROVE_DATA_DIR"]

params = frm.initialize_parameters(
    fdir,
    default_model="csa_workflow_params.txt",
)
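# Note: with the accompanying csa_workflow_params.txt, `params` provides (at least)
# model_name, raw_data_dir, x_data_dir, y_data_dir, and splits_dir, which are
# consumed below to build the raw-data paths.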

main_datadir = Path(os.environ["IMPROVE_DATA_DIR"])
raw_datadir = main_datadir / params["raw_data_dir"]
x_datadir = raw_datadir / params["x_data_dir"]
y_datadir = raw_datadir / params["y_data_dir"]
splits_dir = raw_datadir / params["splits_dir"]

# lg = Logger(main_datadir/"csa.log")
print_fn = print
# print_fn = get_print_func(lg.logger)
print_fn(f"File path: {fdir}")

### Source and target data sources
## Set 1 - full analysis
source_datasets = ["gCSI", "CTRPv2", "GDSCv1", "GDSCv2", "CCLE"]
target_datasets = ["gCSI", "CTRPv2", "GDSCv1", "GDSCv2", "CCLE"]
## Set 2 - smaller datasets
# source_datasets = ["CCLE", "gCSI"]
# target_datasets = ["CCLE", "gCSI"]
# source_datasets = ["CCLE", "gCSI", "GDSCv1", "GDSCv2"]
# target_datasets = ["CCLE", "gCSI", "GDSCv1", "GDSCv2"]
# source_datasets = ["CCLE", "GDSCv1"]
# target_datasets = ["CCLE", "gCSI", "GDSCv1", "GDSCv2"]
## Set 3 - full analysis for a single source
# source_datasets = ["CCLE"]
# source_datasets = ["CTRPv2"]
# target_datasets = ["CCLE", "CTRPv2", "gCSI", "GDSCv1", "GDSCv2"]
# target_datasets = ["CCLE", "gCSI", "GDSCv1", "GDSCv2"]
# target_datasets = ["CCLE", "gCSI", "GDSCv2"]
## Set 4 - same source and target
# source_datasets = ["CCLE"]
# target_datasets = ["CCLE"]
## Set 5 - single source and target
# source_datasets = ["GDSCv1"]
# target_datasets = ["CCLE"]

only_cross_study = False
# only_cross_study = True

y_col_name = "auc"
# y_col_name = "auc1"

## Splits
# split_nums = [] # all splits
split_nums = [0]
# split_nums = [4, 7]
# split_nums = [1, 4, 7]
# split_nums = [1, 3, 5, 7, 9]

def build_split_fname(source: str, split: int, phase: str):
    """Build a split file name. If the file does not exist, the caller skips it."""
    return f"{source}_split_{split}_{phase}.txt"
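
# e.g. build_split_fname("CCLE", 0, "train") -> "CCLE_split_0_train.txt"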

# ===============================================================
### Generate CSA results (within- and cross-study)
# ===============================================================

timer = Timer()
# Iterate over source datasets
# Note! The "source_data_name" iterations are independent of each other
print_fn(f"\nsource_datasets: {source_datasets}")
print_fn(f"target_datasets: {target_datasets}")
print_fn(f"split_nums: {split_nums}")
# import pdb; pdb.set_trace()
for source_data_name in source_datasets:

    # Get the split file paths
    # This parsing assumes splits file names are: SOURCE_split_NUM_[train/val/test].txt
    if len(split_nums) == 0:
        # Get all splits
        split_files = list(splits_dir.glob(f"{source_data_name}_split_*.txt"))
        split_nums = [str(s).split("split_")[1].split("_")[0] for s in split_files]
        split_nums = sorted(set(split_nums))
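        # e.g. str(s) = ".../CCLE_split_3_train.txt" -> "3"; sorted(set(...))
        # dedupes split ids across the train/val/test phase files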
        # num_splits = 1
    else:
        # Use the specified splits
        split_files = []
        for s in split_nums:
            split_files.extend(list(splits_dir.glob(f"{source_data_name}_split_{s}_*.txt")))

    files_joined = [str(s) for s in split_files]

    # --------------------
    # Preprocess and Train
    # --------------------
    for split in split_nums:
        print_fn(f"Split id {split} out of {len(split_nums)} splits.")
        # Check that train, val, and test are available. Otherwise, continue to the next split.
        # split = 11
        # files_joined = [str(s) for s in split_files]
        # TODO: check this! (this `continue` skips the phase, not the whole split)
        for phase in ["train", "val", "test"]:
            fname = build_split_fname(source_data_name, split, phase)
            # print(f"{phase}: {fname}")
            if fname not in "\t".join(files_joined):
                warnings.warn(f"\nThe {phase} split file {fname} is missing (continue to next split)")
                continue

        for target_data_name in target_datasets:
            if only_cross_study and (source_data_name == target_data_name):
                continue  # only cross-study
            print_fn(f"\nSource data: {source_data_name}")
            print_fn(f"Target data: {target_data_name}")

            ml_data_outdir = MAIN_ML_DATA_DIR / f"{source_data_name}-{target_data_name}" / f"split_{split}"

            if source_data_name == target_data_name:
                # If source and target are the same, then infer on the test split
                test_split_file = f"{source_data_name}_split_{split}_test.txt"
            else:
                # If source and target are different, then infer on the entire target dataset
                test_split_file = f"{target_data_name}_all.txt"

            # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            # p1 (none): Preprocess train data
            # train_split_files = list((ig.splits_dir).glob(f"{source_data_name}_split_0_train*.txt"))  # TODO: placeholder for lc analysis
            timer_preprocess = Timer()
            # ml_data_path = graphdrp_preprocess_improve.main([
            #     "--train_split_file", f"{source_data_name}_split_{split}_train.txt",
            #     "--val_split_file", f"{source_data_name}_split_{split}_val.txt",
            #     "--test_split_file", str(test_split_file_name),
            #     "--ml_data_outdir", str(ml_data_outdir),
            #     "--y_col_name", y_col_name
            # ])
            print_fn("\nPreprocessing")
            train_split_file = f"{source_data_name}_split_{split}_train.txt"
            val_split_file = f"{source_data_name}_split_{split}_val.txt"
            print_fn(f"train_split_file: {train_split_file}")
            print_fn(f"val_split_file: {val_split_file}")
            print_fn(f"test_split_file: {test_split_file}")
            print_fn(f"ml_data_outdir: {ml_data_outdir}")
            preprocess_run = ["python",
                              "uno_preprocess_improve.py",
                              "--train_split_file", str(train_split_file),
                              "--val_split_file", str(val_split_file),
                              "--test_split_file", str(test_split_file),
                              "--ml_data_outdir", str(ml_data_outdir),
                              "--y_col_name", str(y_col_name)
                              ]
            result = subprocess.run(preprocess_run, capture_output=True,
                                    text=True, check=True)
            # print(result.stdout)
            # print(result.stderr)
            timer_preprocess.display_timer(print_fn)

            # p2 (p1): Train model
            # Train a single model for a given [source, split] pair
            # Train using train samples and early stop using val samples
            model_outdir = MAIN_MODEL_DIR / f"{source_data_name}" / f"split_{split}"
            if model_outdir.exists() is False:
                train_ml_data_dir = ml_data_outdir
                val_ml_data_dir = ml_data_outdir
                timer_train = Timer()
                # graphdrp_train_improve.main([
                #     "--train_ml_data_dir", str(train_ml_data_dir),
                #     "--val_ml_data_dir", str(val_ml_data_dir),
                #     "--model_outdir", str(model_outdir),
                #     "--epochs", str(epochs),  # available in config_file
                #     # "--ckpt_directory", str(MODEL_OUTDIR),  # TODO: we'll use candle known param ckpt_directory instead of model_outdir
                #     # "--cuda_name", "cuda:5"
                # ])
                print_fn("\nTrain")
                print_fn(f"train_ml_data_dir: {train_ml_data_dir}")
                print_fn(f"val_ml_data_dir: {val_ml_data_dir}")
                print_fn(f"model_outdir: {model_outdir}")
                # import pdb; pdb.set_trace()
                train_run = ["python",
                             "uno_train_improve.py",
                             "--train_ml_data_dir", str(train_ml_data_dir),
                             "--val_ml_data_dir", str(val_ml_data_dir),
                             "--model_outdir", str(model_outdir),
                             "--y_col_name", y_col_name
                             ]
                result = subprocess.run(train_run, capture_output=True,
                                        text=True, check=True)
                # print(result.stdout)
                # print(result.stderr)
                timer_train.display_timer(print_fn)

            # p3 (p1, p2): Inference
            test_ml_data_dir = ml_data_outdir
            model_dir = model_outdir
            infer_outdir = MAIN_INFER_OUTDIR / f"{source_data_name}-{target_data_name}" / f"split_{split}"
            timer_infer = Timer()
            # graphdrp_infer_improve.main([
            #     "--test_ml_data_dir", str(test_ml_data_dir),
            #     "--model_dir", str(model_dir),
            #     "--infer_outdir", str(infer_outdir),
            #     # "--cuda_name", "cuda:5"
            # ])
            print_fn("\nInfer")
            print_fn(f"test_ml_data_dir: {test_ml_data_dir}")
            print_fn(f"infer_outdir: {infer_outdir}")
            infer_run = ["python",
                         "uno_infer_improve.py",
                         "--test_ml_data_dir", str(test_ml_data_dir),
                         "--model_dir", str(model_dir),
                         "--infer_outdir", str(infer_outdir),
                         "--y_col_name", y_col_name
                         ]
            result = subprocess.run(infer_run, capture_output=True,
                                    text=True, check=True)
            timer_infer.display_timer(print_fn)

            # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

timer.display_timer(print_fn)
print_fn("Finished a full cross-study run.")
8 changes: 8 additions & 0 deletions Pilot1/Uno_IMPROVE/csa_workflow_params.txt
@@ -0,0 +1,8 @@
[Global_Params]
model_name = "DRP_model"

[CSA_Workflow]
raw_data_dir = "raw_data"
x_data_dir = "x_data"
y_data_dir = "y_data"
splits_dir = "splits"