qmode collect step for large cohorts #180
Comments
Dear @kate-stankiewicz, sorry for the late reply. I sometimes fall behind on answering issues because of other, higher-priority items. I will refer to the individual steps you listed above by their numbers, 1 to 6.

In step 1, you generate a single splicing graph per input BAM file. In the same step, you also quantify the single graphs in that sample, as the …

In step 2, you merge the single-sample graphs into a joint graph. That is, all events detected in all samples will be merged per gene. Also here, you would like to add the …

In step 3, you now quantify the merged graph with each single BAM file. That is, even events detected in only one sample will be quantified in all samples. This is sometimes useful, as the criteria for adding a new event to the splicing graph are quite stringent: a sample might not have enough reads to add a new event, but enough to quantify it if it was detected in another sample. If you run this step in parallel over samples, you can speed it up, depending on how many samples you run in parallel.

In step 4, you now collect the quantifications from step 3. This should generate …

I hope this answers your question. Please let me know if you have additional ones.

Best,
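The ordering described above, with per-sample steps that can run in parallel and joint steps that must wait for all samples, can be sketched as follows. This is a minimal, hypothetical Python sketch that only assembles command lines and does not run anything. Only `--no-quantify-graph`, `--no-extract-ase` and `--qmode collect` come from this thread; `--merge-strat`, `--quantify-graph`, `--qmode single`, the exact flag combinations, the file names and the helper `build_cohort_commands` are assumptions to check against `spladder build --help` and the cohort documentation.

```python
"""Hypothetical sketch: steps 1-4 of the cohort workflow as `spladder build` argv lists.

Only --no-quantify-graph, --no-extract-ase and `--qmode collect` are taken from
this thread; --merge-strat, --quantify-graph, `--qmode single` and all file
names are assumptions and may differ in your SplAdder version.
"""
from typing import List, NamedTuple


class Step(NamedTuple):
    name: str
    per_sample: bool            # True: one independent job per BAM file
    commands: List[List[str]]   # argv lists; nothing is executed here


def build_cohort_commands(bams: List[str], annotation: str, outdir: str,
                          bam_list_file: str) -> List[Step]:
    base = ["spladder", "build", "-o", outdir, "-a", annotation]
    return [
        # Step 1: one splicing graph per BAM, without quantification or event extraction.
        Step("single_graphs", True, [
            base + ["-b", bam, "--merge-strat", "single",
                    "--no-extract-ase", "--no-quantify-graph"]
            for bam in bams]),
        # Step 2: merge the single-sample graphs into one joint graph per gene.
        Step("merge_graphs", False, [
            base + ["-b", bam_list_file, "--merge-strat", "merge_graphs",
                    "--no-extract-ase", "--no-quantify-graph"]]),
        # Step 3: quantify the merged graph with each BAM, so events detected
        # in only one sample still get counts in every sample.
        Step("quantify_single", True, [
            base + ["-b", bam, "--merge-strat", "merge_graphs",
                    "--no-extract-ase", "--quantify-graph", "--qmode", "single"]
            for bam in bams]),
        # Step 4: collect the per-sample quantifications from step 3.
        Step("collect", False, [
            base + ["-b", bam_list_file, "--merge-strat", "merge_graphs",
                    "--no-extract-ase", "--quantify-graph", "--qmode", "collect"]]),
    ]


if __name__ == "__main__":
    plan = build_cohort_commands(["s1.bam", "s2.bam"], "annotation.gtf",
                                 "spladder_out", "alignments.txt")
    for step in plan:
        mode = "parallel over samples" if step.per_sample else "single joint job"
        print(f"{step.name} ({mode})")
        for argv in step.commands:
            print("   ", " ".join(argv))
```

Steps marked `per_sample` could be submitted as independent jobs; the joint steps have to wait until all of them have finished. As confirmed further down in this thread, quantifying already in step 1 does not change the results; it only takes longer.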
Hi Andre, Thank you for this informative and in-depth explanation; it is really helpful! So if I am understanding correctly, the results should not change between running these steps the way I did and running them as you outlined here, correct? The only thing I missed out on is the better parallelization gained by delaying the graph quantification until a later step? Also, to possibly save you from having to address a similar issue from another user: the --no-quantify-graph option is not listed in the steps for use on large cohorts (https://spladder.readthedocs.io/en/latest/spladder_cohort.html). Only the --no-extract-ase option is listed in the earlier steps. Just wanted to let you know, since it seems important to include and makes sense as you explained it above.
Hi @kate-stankiewicz, Thanks for the hint about the documentation. I will fix this right away. Regarding your statement about completeness: yes, this is true. It only took longer than necessary, but the outputs are the same. Best, Andre
Hi @akahles, Thanks for the clarification! I will close this issue now.
Description
I have followed the steps for running SplAdder on large cohorts as outlined here https://spladder.readthedocs.io/en/latest/spladder_cohort.html
However, I am not sure what the second phase of the quantification step is doing. On the documentation page it says "As a second step to this phase, we need to collect the individual quantifications and aggregate them in a joint database:". However, when I run this step I get no output and no log information either (even when run in -v mode). Additionally, it only runs for about 30 seconds (all other steps in the workflow run for at least 30 minutes, if not days). From what I can tell, no new files are created, nor are any existing files modified. The subsequent steps (event calling and testing) seem to run despite this. Is the step doing anything that might not be visible to me? Or is it possible to skip it?
Many thanks!
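The documentation quoted above describes this phase as collecting the individual quantifications and aggregating them in a joint database. Conceptually, that aggregation can be pictured as stacking per-sample counts over the same merged graph into one matrix with one column per sample. The sketch below is purely illustrative: the in-memory dictionaries, the edge IDs and the function `collect_counts` are hypothetical and do not describe SplAdder's actual on-disk format or implementation.

```python
"""Illustrative only: merge per-sample quantifications of one shared (merged)
splicing graph into a joint matrix. Data layout and names are hypothetical."""
from typing import Dict, List, Tuple


def collect_counts(per_sample: Dict[str, Dict[str, int]]
                   ) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (feature_ids, sample_ids, matrix) with matrix[feature][sample].

    Every sample was quantified against the same merged graph, so all samples
    are expected to report the same feature IDs; a mismatch means a per-sample
    quantification is missing or was computed on a different graph.
    """
    if not per_sample:
        raise ValueError("no per-sample quantifications to collect")
    sample_ids = sorted(per_sample)
    feature_ids = sorted(per_sample[sample_ids[0]])
    for sample in sample_ids:
        if sorted(per_sample[sample]) != feature_ids:
            raise ValueError(f"{sample} was not quantified on the same graph")
    matrix = [[per_sample[s][f] for s in sample_ids] for f in feature_ids]
    return feature_ids, sample_ids, matrix


if __name__ == "__main__":
    # Hypothetical counts: "edge_3" was detected only in sample2, but after
    # quantifying the merged graph it has a count in every sample.
    quantified = {
        "sample1": {"edge_1": 12, "edge_2": 0, "edge_3": 4},
        "sample2": {"edge_1": 9, "edge_2": 3, "edge_3": 15},
    }
    features, samples, counts = collect_counts(quantified)
    print(samples)  # ['sample1', 'sample2']
    for feature, row in zip(features, counts):
        print(feature, row)
```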
What I Did