Can we standardize the output directory a bit more to include the sample ID? And/or add a sample-ID parameter to the input CSV? #103
Comments
Thanks for bringing this up @Alfredo-Enrique. We had temporarily held back on this part of the standardization; we can have it finalized and added in the very next release. |
@Alfredo-Enrique @tyamaguchi-ucla @yashpatel6 I just checked call-sSV's test results for release v5.0.0 and I see the sample level dir structure already exists.
Is it perhaps the way it's written from the meta-pipeline? P.S. This comment below addressed issue #65, which is different from the current issue.
|
Hmmm, looking at the code, I'm pretty sure right now it's just taking the file name. For example, here is the input CSV of that run!
Then if you look at the config, there's nowhere to specify a sample ID:
|
Found it, here you go: right now we're just taking the file name. This is the code generating the input channel for the downstream modules; on line 107 you can see we just get the file name. Lines 102 to 111 in 0c8def0
This is the 5-value tuple being fed as input to the modules (A), and the first value of the tuple is what's being used in (B).
A: pipeline-call-sSV/module/delly.nf, lines 26 to 27 in 0c8def0
B: pipeline-call-sSV/module/delly.nf, lines 39 to 43 in 0c8def0
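To make the current behavior concrete, here is a rough Python sketch of what deriving the ID from the file name amounts to. It is not the pipeline's actual Nextflow code: the helper name `id_from_bam_path` and the `/data/` path are hypothetical, and the other four values of the tuple are left out.

```python
from pathlib import PurePosixPath


def id_from_bam_path(bam_path: str) -> str:
    """Return the BAM file name without its final extension (hypothetical helper)."""
    return PurePosixPath(bam_path).stem


# Hypothetical input path; only the file name part influences the result.
bam = "/data/BWA-MEM2-2.2.1_GATK-4.2.4.1_TCGA-STSA_H-MV-3B-A9HS-01A-11D-A38Z-09.bam"
print(id_from_bam_path(bam))
```

Whatever the BAM happens to be called therefore ends up as the per-sample output directory name, regardless of the sample ID used elsewhere.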
|
Sample ID parsing from the BAM was intentional, and the output dir structure is determined in pipeline-call-sSV/config/methods.config, lines 23 to 33 in 0c8def0
Anyway, I see the concern now. Actual sample ID should be used instead of ID parsed from BAM file name. |
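One way this could look, as a minimal Python sketch rather than the pipeline's Nextflow implementation: read an optional sample ID column from the input CSV and fall back to the BAM file name only when it is missing. The column names `sample_id` and `input_bam`, the helper `resolve_sample_ids`, the `/data/` paths, and the pairing of ID and BAM in the example are all assumptions for illustration, not the pipeline's actual schema or metadata.

```python
import csv
import io
from pathlib import PurePosixPath


def resolve_sample_ids(csv_text: str) -> list[tuple[str, str]]:
    """Return (sample_id, bam_path) pairs from CSV text.

    Uses the hypothetical `sample_id` column when it is present and
    non-empty; otherwise falls back to the BAM file name without extension.
    """
    pairs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        bam_path = row["input_bam"]
        sample_id = (row.get("sample_id") or "").strip()
        if not sample_id:
            # Fallback: same file-name-based ID as the current behavior.
            sample_id = PurePosixPath(bam_path).stem
        pairs.append((sample_id, bam_path))
    return pairs


# Illustrative input only; the ID-to-BAM pairing is made up.
example_csv = """sample_id,input_bam
TCGASTSA000001-T001-P01-P,/data/BWA-MEM2-2.2.1_GATK-4.2.4.1_TCGA-STSA_H-MV-3B-A9HS-01A-11D-A38Z-09.bam
,/data/other_sample.bam
"""
for sample_id, bam_path in resolve_sample_ids(example_csv):
    print(sample_id, bam_path)
```

Keeping the file-name fallback would let existing input CSVs keep working, while runs launched from the meta-pipeline could pass the internal sample ID explicitly.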
@Alfredo-Enrique can you share an example path to meta pipeline output? |
Yes, happy to @Faizal-Eeman! Let me know if you have any questions or if I can help in any way! |
While running a whole bunch of samples through the metapipeline, I noticed that call-sSV is the only pipeline that does not have a fully standardized output folder structure. The main output folder is named after whatever the input BAM file name was, e.g. (BWA-MEM2-2.2.1_GATK-4.2.4.1_TCGA-STSA_H-MV-3B-A9HS-01A-11D-A38Z-09), as opposed to the sample ID we use for the other pipelines.
This makes it nonstandard for future analysis scripts, as we will have to do additional metadata parsing to match the specified input file name instead of using our internal sample IDs.
Best,
EXAMPLE
call-sSV-5.0.0/
├── BWA-MEM2-2.2.1_GATK-4.2.4.1_TCGA-STSA_H-MV-3B-A9HS-01A-11D-A38Z-09
├── BWA-MEM2-2.2.1_GATK-4.2.4.1_TCGA-STSA_H-MV-DX-A23Y-01A-11D-A27P-09
├── BWA-MEM2-2.2.1_GATK-4.2.4.1_TCGA-STSA_H-MV-DX-A240-01A-32D-A27P-09
...
call-gSNP-10.0.0-rc.1/
├── TCGASTSA000001-T001-P01-P
├── TCGASTSA000002-T001-P01-P
├── TCGASTSA000003-T001-P01-P
...
call-mtSNV-3.0.0/
├── TCGASTSA000001-T001-P01-P
├── TCGASTSA000002-T001-P01-P
├── TCGASTSA000003-T001-P01-P
I believe #65 is related.