problem running Clinker on STAR-Fusion output #11
Clinker works fine in our hands on the test data provided with the source code in the GitHub repository. However, we have difficulty producing meaningful output when STAR-Fusion results are used as input for Clinker. Clinker runs to completion without error messages, but no data is found in the "alignment" subfolder after the run.

More specifically, we run the following command:

bpipe -p out=SKBR3 -p caller=data/SKBR3_star-fusion.fusion_predictions.tsv -p del="t" -p col="6,8" -p print="true" -p competitive=true -p header=true -p align_mem=32000000000 -p genome_mem=32000000000 -p fusions="TATDN1:GSDMB" $CLINKERDIR/workflow/clinker.pipe data/SKBR3.Left.fq.gz data/SKBR3.Right.fq.gz

where the first two lines in the "caller" file, data/SKBR3_star-fusion.fusion_predictions.tsv, are:

#FusionName JunctionReadCount SpanningFragCount SpliceType LeftGene LeftBreakpoint RightGene RightBreakpoint JunctionReads SpanningFrags LargeAnchorSupport FFPM LeftBreakDinuc LeftBreakEntropy RightBreakDinuc RightBreakEntropy annots

and the first few lines in the FASTQ file SKBR3.Left.fq are (SKBR3.Right.fq is similar):

@HWI-EAS418:8:1:3:1091/1

In particular, I am wondering whether the bpipe option -p col="6,8" is correct for the "caller" file I have been using. (The format of this file, produced by STAR-Fusion, is very different from the format of your test example.)

Please let me know if you need more info.

Regards,
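To check which fields col="6,8" picks out of the STAR-Fusion header, here is a minimal Python sketch. It assumes that col takes 1-based column numbers (consistent with col=1,2,3,4 selecting all four columns of the test file) and that the STAR-Fusion header is tab-separated; the helper selected_columns is hypothetical and not part of Clinker.

```python
# Hypothetical helper, not part of Clinker: report which header fields a
# 1-based "col" value (e.g. "6,8") would select from a tab-separated header.

def selected_columns(header_line: str, col: str) -> list[str]:
    # Drop the leading '#' that precedes the first field name
    fields = header_line.lstrip("#").rstrip("\n").split("\t")
    # Convert the 1-based column numbers to 0-based list indices
    indices = [int(c) - 1 for c in col.split(",")]
    return [fields[i] for i in indices]

# Header of the STAR-Fusion file quoted above, assumed to be tab-separated
HEADER = "\t".join([
    "#FusionName", "JunctionReadCount", "SpanningFragCount", "SpliceType",
    "LeftGene", "LeftBreakpoint", "RightGene", "RightBreakpoint",
    "JunctionReads", "SpanningFrags", "LargeAnchorSupport", "FFPM",
    "LeftBreakDinuc", "LeftBreakEntropy", "RightBreakDinuc",
    "RightBreakEntropy", "annots",
])

if __name__ == "__main__":
    print(selected_columns(HEADER, "6,8"))  # ['LeftBreakpoint', 'RightBreakpoint']
```

Under these assumptions, col="6,8" lands on the LeftBreakpoint and RightBreakpoint columns, which is what the breakpoint-based input needs.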
Hi Gennady, thanks for taking the time to try Clinker; let's get it working for you.

Your column parameter looks like it should work (and you would generally receive an error if it didn't). Which stage are you getting up to? By the sounds of it, you're completing the first stage, but the STAR alignment hasn't been performed yet? Do you have a fst_reference.fasta that has appeared in your

Thanks!
Hi Breon,

Thanks for getting back to me promptly. Yes, the file SKBR3/reference/fst_reference.fasta was produced by my run.

Regards,
No problem at all! Breaking this problem down:
The next stage is the STAR alignment. I just noticed your FASTQ input naming convention; could you please try renaming and gzipping the files so they reflect Within the If you could make this change, then rerun the pipeline after either deleting the contents of the previous Clinker run (the SKBR3a folder) or simply changing your

Let me know how you go!
Hi Breon,

Thanks for your additional input. Following your suggestion, I renamed the input FASTQ files to SKBR3.Left.fastq.gz and SKBR3.Right.fastq.gz, respectively. I also re-ran STAR-Fusion and then Clinker to produce fresh output. The fst_reference.fasta file, as well as a number of files in the "genome" subfolder, were produced:

SKBR3/reference:
SKBR3/genome:

However, the SKBR3/alignment folder is still empty.

One possibility I have been considering is whether the Clinker option -p col=6,8 is correct and sufficient for parsing the "caller" file produced by STAR-Fusion. In the test example provided in the Clinker GitHub repository, the "caller" CSV file, bcr_abl1.csv, begins with

"chrom1","base1","chrom2","base2"

and the option -p col=1,2,3,4 specifies the chromosome id and the coordinate within the chromosome as separate columns. However, the format of the "caller" TSV file produced by STAR-Fusion is different:

#FusionName JunctionReadCount SpanningFragCount SpliceType LeftGene LeftBreakpoint RightGene RightBreakpoint JunctionReads SpanningFrags LargeAnchorSupport FFPM LeftBreakDinuc LeftBreakEntropy RightBreakDinuc RightBreakEntropy annots

In each of the columns specified by the option -p col=6,8, the chromosome id is separated from the coordinate by a colon (:).

Regards,
Hi Gennady,

Apologies, I should have made it clear. Would you mind renaming the files to the below and trying again:

The _R1 and _R2 are important in this case :).

In terms of the STAR-Fusion input, I think that stage has completed successfully, and col="6,8" is correct. Clinker has been developed to recognise that when two columns have been specified, the input format must be something like

If you look at the first two lines of the

When you do rerun, could you please post your command line output of the

Thanks! Let me know how you go :).

EDIT: Maybe just send me the whole bpipe output, not just the generate_fst stage.
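The two-column versus four-column distinction can be sketched roughly as follows. This is a hypothetical illustration, not Clinker's actual parser: parse_breakpoints is an invented name, and the "chrom:position" layout (with an optional trailing ":strand") for two-column input is an assumption based on the STAR-Fusion breakpoint columns discussed in this thread.

```python
# Hypothetical sketch, not Clinker's parser: turn the selected columns of one
# caller-file row into (chrom1, pos1, chrom2, pos2).
#   - 4 columns: chromosome and position are already separate fields.
#   - 2 columns: each field is assumed to look like "chrom:pos", optionally
#     followed by ":strand"; anything after the position is ignored.

def parse_breakpoints(row: list[str], col: str) -> tuple[str, int, str, int]:
    # col uses 1-based column numbers, e.g. "6,8" or "1,2,3,4"
    values = [row[int(c) - 1] for c in col.split(",")]
    if len(values) == 4:
        chrom1, pos1, chrom2, pos2 = values
    elif len(values) == 2:
        # Keep only the chromosome and position parts of each breakpoint
        chrom1, pos1 = values[0].split(":")[:2]
        chrom2, pos2 = values[1].split(":")[:2]
    else:
        raise ValueError(f"expected 2 or 4 columns, got {len(values)}")
    return chrom1, int(pos1), chrom2, int(pos2)
```

For example, a made-up STAR-Fusion row whose LeftBreakpoint is chr8:1000:- and RightBreakpoint is chr17:2000:- would give ("chr8", 1000, "chr17", 2000) with col="6,8".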
Thank you Breon! Your suggestion works: the alignment folder is no longer empty. However, I noticed that STAR-Fusion and Clinker actually have contradictory requirements for naming the input FASTQ files:
So one has to rename the files between the STAR-Fusion and Clinker runs! I believe it would be helpful to relax Clinker's FASTQ file naming requirements in the future. Why not just use the order of the files passed to bpipe to distinguish between the "left" and "right" FASTQs?

Regards,
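Until the naming requirements are relaxed, one way to avoid renaming between runs might be to leave the STAR-Fusion inputs in place and give Clinker symlinks with the names it expects. This is a minimal sketch under stated assumptions: the target pattern <sample>_R1.fastq.gz / <sample>_R2.fastq.gz is a guess (the exact names requested earlier in the thread are not shown, so adjust it), it assumes the pipeline follows symlinks, and link_for_clinker is a hypothetical helper. In the spirit of the suggestion above, it assigns R1/R2 by argument order rather than by filename.

```python
from pathlib import Path

def link_for_clinker(left: str, right: str, sample: str, outdir: str) -> tuple[Path, Path]:
    """Create <sample>_R1/_R2 symlinks pointing at the original FASTQs.

    The naming pattern is an assumption; match it to what Clinker expects.
    """
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    links = []
    # Argument order decides the mate: first file -> R1, second file -> R2
    for src, mate in ((left, "R1"), (right, "R2")):
        link = out / f"{sample}_{mate}.fastq.gz"
        # Replace a stale link or file from an earlier run
        if link.is_symlink() or link.exists():
            link.unlink()
        link.symlink_to(Path(src).resolve())
        links.append(link)
    return links[0], links[1]
```

The original SKBR3.Left/Right files stay untouched for STAR-Fusion, and the two links could be passed to clinker.pipe in place of the original paths.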
Hi Gennady,

Great to hear that you got it working! If you run into any other problems, please don't hesitate to ask. Happy to help.

That's a really interesting discovery! Based on this thread and another, I will certainly be updating how FASTQs get read in. I will let you know when it's complete, but it will certainly be in the next version.

Cheers,