-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathREADME
43 lines (33 loc) · 2.74 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
Set-Up:
- your own Snakefile with the following first line:
include: "/path/to/DETECT/Snakefile" # this imports the rules from this module
- config file
- outcome code lists (it's helpful to make sure that none of the outcomes have zero individuals)
- input date matrices indicated by config file
Input (Date of Onset) Matrix Format:
- .csv
- column names = IID, code1, code2, etc...
- row names = person1, person2, etc...
- cell values - decimal value with unit YEARS relative to some date (we used the linux epoch by default)
Running:
- in a python 3.8 environment with pandas, numpy, scipy, matplotlib, and snakemake>=7.8.3
- for dry-run > snakemake -n {OUTPUT} # {OUTPUT} is the target file you're asking for
- example: lsf queueing/cluster system profile > snakemake --profile lsf {OUTPUT} # {OUTPUT} is the target file you're asking for
- A shortcut to asking for multiple output files would be to add a custom all rule to your Snakefile - example:
rule all:
input:
'Summary/positive_phecode_tests.csv',
'Summary/positive_icd_tests.csv,
expand('Trajectories/trajectories_{index}_{outcome}_icd_{dataset}.csv', outcome=open(config['icd_code_list']).read().splitlines(), dataset=[k for k, v in config['icd_date_matrices'].items()], index=config['icd_index_code_list'])),
'Trajectory_Times/E11_I21._icd_4dig.all_signpost_times.csv.gz' # particular interest in diabetes -> MI
- to use your custom all rule > snakemake all --profile lsf
Suggested Target outputs:
- All phecode tests with RR > 1 = 'Summary/positive_phecode_tests.csv'
- All ICD tests with RR > 1 = 'Summary/positive_icd_tests.csv'
- Individual-Level Trajectories for one Phecode test = 'Trajectories/trajectories_{index}_{outcome}_phecode_{dataset}.csv'
- Individual-Level Trajectories for one ICD Code test = 'Trajectories/trajectories_{index}_{outcome}_icd_{dataset}.csv'
Aggregated Target outputs:
- All Individual-Level Trajectories for Phecodes = expand('Trajectories/trajectories_{index}_{outcome}_phecode_{dataset}.csv', outcome=[c.replace('*', '') for c in open(config['phecode_list']).read().splitlines()], dataset=[k for k, v in config['phecode_date_matrices'].items()], index=config['phecode_index_code_list'])
- All Individual-Level Trajectories for ICD codes = expand('Trajectories/trajectories_{index}_{outcome}_icd_{dataset}.csv', outcome=[c.replace('*', '') for c in open(config['icd_code_list']).read().splitlines()], dataset=[k for k, v in config['icd_date_matrices'].items()], index=config['icd_index_code_list'])
Plotting/Detailed outputs:
- Trajectory timing information for selected index/outcome combinations = 'Trajectory_Times/{index}_{outcome}_icd_4dig.all_signpost_times.csv.gz'