Cell line specific situation - N and T samples with different patient IDs #114

Open
RoniHaas opened this issue Jun 20, 2023 · 6 comments
Labels
enhancement New feature or request

Comments

@RoniHaas

Describe the issue
I received an error whose meaning isn't clear to me. It appeared after an 8.15 hr run, and the run was marked as completed.

executor > local (7)
[01/d51cc8] process > create_CSV_metapipeline_DNA... [100%] 2 of 2 ✔
[80/5d6b21] process > create_config_metapipeline_DNA [100%] 1 of 1 ✔
[c1/3a6964] process > call_metapipeline_DNA (2) [100%] 2 of 2 ✔
[69/ebd9b6] process > check_process_status (2) [100%] 2 of 2 ✔
Process in /hot/project/disease/ProstateTumor/PRAD-000096-RadioResDU145Molecular/22Rv1/WXS/input/CHPRRR2M_1/c1/3a69648915774d2f6e6d01586063a3 failed with non-zero exit code.

It would be great if you could help me understand the source of this error, @yashpatel6.

  • Pipeline release version:
    /hot/user/yashpatel/metapipeline-DNA/yashpatel-global-jobs-limits/main.nf
  • Node type (F2s (lowmem) / F72s (midmem) / M64s (execute))
  • Submission method (interactive/submission script)
  • Actual submission script (python submission script, "nextflow run ...", etc.)
  • Sbatch or qsub command and logs if applicable
  • Config files
    /hot/project/disease/ProstateTumor/PRAD-000096-RadioResDU145Molecular/22Rv1/WXS/input/CHPRRR2M_1/metapipe_input_CHPRRR2M_1.config
  • Any logs produced by the pipeline
    /hot/project/disease/ProstateTumor/PRAD-000096-RadioResDU145Molecular/22Rv1/WXS/input/CHPRRR2M_1.error
    /hot/project/disease/ProstateTumor/PRAD-000096-RadioResDU145Molecular/22Rv1/WXS/input/CHPRRR2M_1.log
@yashpatel6
Collaborator

Hi Roni, it looks like the issue is with the YAML (/hot/project/disease/ProstateTumor/PRAD-000096-RadioResDU145Molecular/22Rv1/WXS/input/CHPRRR2M_1/metapipe_input_CHPRRR2M_1.yam): there seem to be two patients in the input, with the normal coming from one patient and the tumors coming from another. It seems the normal sample is accidentally from, or labeled as, a different patient in the input, which causes call-gSNP to fail since it expects a normal sample per patient.
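
For readers hitting the same failure, a quick pre-flight check along the following lines may help catch it before a long run. This is only a minimal sketch: the record fields (`patient`, `sample`, `state`), the placeholder IDs, and the helper `patients_missing_state` are all hypothetical and do not reflect the metapipeline's actual YAML schema or code. It groups samples by patient ID and reports any patient that lacks a normal (or tumor) sample.

```python
from collections import defaultdict

# Hypothetical sample records. The field names ("patient", "sample", "state")
# and the IDs are assumptions for illustration, not the metapipeline's schema.
samples = [
    {"patient": "PATIENT_A", "sample": "S1", "state": "normal"},
    {"patient": "PATIENT_B", "sample": "S2", "state": "tumor"},
    {"patient": "PATIENT_B", "sample": "S3", "state": "tumor"},
]


def patients_missing_state(records, required_state="normal"):
    """Return the sorted patient IDs that have no sample in required_state."""
    states_by_patient = defaultdict(set)
    for record in records:
        states_by_patient[record["patient"]].add(record["state"])
    return sorted(
        patient
        for patient, states in states_by_patient.items()
        if required_state not in states
    )


missing_normal = patients_missing_state(samples, "normal")
missing_tumor = patients_missing_state(samples, "tumor")
print("Patients without a normal sample:", missing_normal)
print("Patients without a tumor sample:", missing_tumor)
```

With the placeholder input above, the tumor-only patient is flagged as missing a normal, which mirrors the situation described in this comment.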

@RoniHaas
Author

I see! Thank you, @yashpatel6. The problem is that my normal is not really "Normal". I am comparing two types of cell lines, and both are derived from tumors. For data registration, I thought it right to define this sample as a different patient; otherwise, I think it might cause confusion. I treat one of the cell line types as "Normal" because I want to identify mutations relative to that cell line type. Is there a way to overcome this?

@tyamaguchi-ucla
Contributor

@RoniHaas is this discussion helpful for your case? #109

@RoniHaas
Author

Thank you for sharing. It still seems that, in any event, I would have to change the patient ID for the run to make it work. Is that correct? I can easily change the patient ID for the run, but I thought that consistency between data registration and the output file names was important. On the other hand, changing the patient IDs in data registration to solve this issue seems less logical.
Any thoughts about that, @tyamaguchi-ucla @yashpatel6?

@yashpatel6
Collaborator

That is correct: the patient ID would have to be changed so that the metapipeline properly associates the samples. Without relying on the patient ID, grouping samples would get much more challenging from the metapipeline's perspective (there would basically have to be an additional identifier indicating the grouping/relation between samples somehow). While it may end up being slightly inconsistent between dataset registration and the metapipeline run, the best solution at the moment is to change the patient ID and track the change.
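
One possible way to "change the patient ID and track it" is to leave the registered records untouched, derive run-only records with a shared patient ID, and save a small mapping table next to the run. The sketch below assumes hypothetical record fields, placeholder IDs, and a made-up helper `remap_patient_ids`; none of it is part of the metapipeline itself.

```python
import csv
import io

# Hypothetical registered records; field names and IDs are assumptions
# for illustration only.
registered = [
    {"patient": "PATIENT_A", "sample": "S1", "state": "normal"},
    {"patient": "PATIENT_B", "sample": "S2", "state": "tumor"},
]


def remap_patient_ids(records, run_patient_id):
    """Return (run_records, mapping_rows) with all samples under run_patient_id.

    The input records are not modified, so the registered IDs stay available.
    """
    run_records = []
    mapping_rows = []
    for record in records:
        # Copy the record and override only the patient ID used for the run.
        run_records.append({**record, "patient": run_patient_id})
        mapping_rows.append(
            {
                "sample": record["sample"],
                "registered_patient": record["patient"],
                "run_patient": run_patient_id,
            }
        )
    return run_records, mapping_rows


run_records, mapping_rows = remap_patient_ids(registered, "PATIENT_B")

# Serialize the tracking table as CSV text; in practice this would be
# written to a file stored alongside the run outputs.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["sample", "registered_patient", "run_patient"]
)
writer.writeheader()
writer.writerows(mapping_rows)
tracking_csv = buffer.getvalue()
print(tracking_csv)
```

Keeping the mapping as a separate table means output file names can be traced back to the registered patient IDs without altering the registration itself.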

@RoniHaas changed the title from "Help with an error - metapipeline-DNA" to "Cell line specific situation - N and T samples with different patient IDs" on Jun 22, 2023
@RoniHaas added the enhancement (New feature or request) label on Jun 22, 2023
@RoniHaas
Author

Thanks for explaining. Yup, the run is urgent, so I will change the patient IDs. In my opinion, it might be worth considering these situations (I guess cell-line-related ones) for the next release, if that makes sense. I have changed the issue title.

3 participants