EXITING because of INPUT ERROR: the file format of the genomeFastaFile #23

JAYRJPT · 2021-07-02T18:24:28Z

Hello,
I am using Clinker to visualize one of the fusions reported by the FusionCatcher tool. I have made a CSV file with the coordinates of the DUX4:IGH@ fusion, modelled on the bcr_abl1.csv file in the test folder.
Here is my command:
bpipe -p out=/home/deepak/output -p caller=$CLINKERDIR/test/caller/dux4_igh.csv -p col=1,2,3,4 -p genome=38 -p print=true -p competitive=true -p header=true -p align_mem=31025992405 -p genome_mem=31025992405 -p threads=30 -p fusions=DUX4:IGH@ $CLINKERDIR/workflow/clinker.pipe $CLINKERDIR/test/fastq/*.fastq.gz

But I am getting an error at the STAR genome generation stage (star_genome_gen):

====================================================================================================
|                              Starting Pipeline at 2021-07-02 23:25                               |
====================================================================================================

======================================== Stage generate_fst ========================================


==============================================================


	Fusion Super Transcript Generator

	A fusion visualiser.


==============================================================



==============================================================

Create fusion superTranscriptome:

WARNING: a gene (line 0 of fusion input) does not exist in annotation/hg19_ucscGenes.txt based upon breakpoint.
         Closest mapped gene name is 'RABL2B' (139512811 bp downstream)

--------------------------------------------------------------
Gene Symbols Mapped: 0 Not Mapped: 1 Total: 1
--------------------------------------------------------------

Note: Some superTranscripts were not generated. This could be because of:
	A: The breakpoint was not within a gene (this program only deals with these).
	B: The superTranscript reference file did not contain an entry for that gene symbol.
	C: You have identified the wrong columns, or they contain the wrong information, with the -pos argument.

==============================================================

Creating output directory at: /home/deepak/output
Creating fused superTranscriptome and annotation files


...Success!

Use the plot_fst bpipe workflow or IGV to visualise your results.

==============================================================


====================================== Stage star_genome_gen =======================================
Jul 02 23:25:31 ..... started STAR run
Jul 02 23:25:31 ... starting to generate Genome files

EXITING because of INPUT ERROR: the file format of the genomeFastaFile: /home/deepak/output/reference/fst_reference.fasta is not fasta: the first character is '
' (10), not '>'.
 Solution: check formatting of the fasta file. Make sure the file is uncompressed (unzipped).

Jul 02 23:25:31 ...... FATAL ERROR, exiting
ERROR: stage star_genome_gen failed: Command in stage star_genome_gen failed with exit status = 104 : 

STAR --runMode genomeGenerate --runThreadN 30 --genomeDir /home/deepak/output/genome --genomeFastaFiles /home/deepak/output/reference/fst_reference.fasta --limitGenomeGenerateRAM 31025992405 --genomeSAindexNbases 5 


========================================= Pipeline Failed ==========================================

Command in stage star_genome_gen failed with exit status = 104 : 

STAR --runMode genomeGenerate --runThreadN 30 --genomeDir /home/deepak/output/genome --genomeFastaFiles /home/deepak/output/reference/fst_reference.fasta --limitGenomeGenerateRAM 31025992405 --genomeSAindexNbases 5

Use 'bpipe errors' to see output from failed commands.

Here is the output of bpipe errors:

deepak@ngs:~/ClINKERDIR$ bpipe errors

============================== Found 1 failed commands from run 26797 ==============================

=================================== Command star_genome_gen (68) ===================================


Command    : STAR --runMode genomeGenerate --runThreadN 30 --genomeDir /home/deepak/output/genome --genomeFastaFiles /home/deepak/output/reference/fst_reference.fasta --limitGenomeGenerateRAM 31025992405 --genomeSAindexNbases 5
Started    : Fri Jul 02 23:25:31 IST 2021
Stopped    : Fri Jul 02 23:25:31 IST 2021
Exit Code  : 104
Config: 
                   Name           |  Value 
          ---------------------------------
          max_per_command_threads | 16     
          executor                | local  
          stats_update_interval   | 120000 
          outputScanConcurrency   | 5      
          maxFileNameLength       | 2048   
          name                    | stargen
          procs                   | 1      

Output    : 

	Jul 02 23:25:31 ..... started STAR run
	Jul 02 23:25:31 ... starting to generate Genome files
	
	EXITING because of INPUT ERROR: the file format of the genomeFastaFile: /home/deepak/output/reference/fst_reference.fasta is not fasta: the first character is '
	' (10), not '>'.
	 Solution: check formatting of the fasta file. Make sure the file is uncompressed (unzipped).
	
	Jul 02 23:25:31 ...... FATAL ERROR, exiting

Any suggestions on how to resolve this error?

Thanks and Regards,

Jay


breons · 2021-07-02T22:09:11Z

Hi Jay, thanks for trying Clinker!

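That error originates in the first stage (generate_fst): no superTranscripts could be located in the reference files for the coordinates you supplied (note "Gene Symbols Mapped: 0" in your log), so the fst_reference.fasta handed to STAR appears to contain no sequences.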
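One way to confirm this before STAR runs is to look at the first byte of the generated reference. The sketch below is not part of Clinker or STAR; `fasta_looks_valid` is a hypothetical helper that only mirrors the first-character check STAR reports in the error above, and the path is the one from the posted log.

```python
from pathlib import Path


def fasta_looks_valid(path):
    """Return True if the file exists, is non-empty, and starts with '>'.

    Hypothetical helper, not part of Clinker or STAR. It only mirrors the
    first-character check that STAR reports in its INPUT ERROR message.
    """
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as handle:
        # An empty file yields b"", a blank first line yields b"\n".
        return handle.read(1) == b">"


if __name__ == "__main__":
    # Path taken from the log above; adjust it to your own output directory.
    reference = "/home/deepak/output/reference/fst_reference.fasta"
    print(reference, "looks valid:", fasta_looks_valid(reference))
```

If this reports False right after generate_fst, the problem lies in the fusion input or the reference lookup rather than in STAR itself.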

I noticed hg19 has an IGH@ gene, but hg38 does not (at least in Clinker's reference). Did the fusion caller use hg19? If so, simply delete the current output and change your -p genome=38 to -p genome=19.

If you're sure it's hg38, then I'll have to look into why that is missing.

Cheers,
Breon.

JAYRJPT · 2021-07-03T05:50:56Z

Hi Breon,
I used FusionCatcher, which used hg38 as the reference genome. The coordinates I provided are in hg38 as well.

Thanks,
Jay

breons · 2021-07-09T02:56:59Z

Hi Jay,

Sorry for the delay. I will need to rebuild the references to account for IGH@ in hg38 - it seems Clinker currently doesn't have a superTranscript for that. The bad news is that it might take me some time to put together, as I am currently finishing some other projects.

However, I'm a bit confused as to why RABL2B (chr22) is coming up as the closest gene when DUX4 and IGH@ are on other chromosomes in the hg38 reference. Would you mind sharing the CSV with the coordinates in it? Otherwise, just double-check that the positions are accurate.
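As a rough aid for that double check, here is a minimal sketch that scans a fusion CSV for malformed chromosome names or positions. It rests on assumptions not confirmed in this thread: that the four columns selected with col=1,2,3,4 are chromosome 1, breakpoint 1, chromosome 2, breakpoint 2, and that chromosome names look like "chr4" or "4". `check_breakpoints` and `CHROM_PATTERN` are hypothetical names, not part of Clinker, and the sample rows are made up for illustration.

```python
import csv
import io
import re

# Assumption: primary chromosomes only, with or without a "chr" prefix.
# Alternate or unplaced hg38 contigs would be flagged by this pattern.
CHROM_PATTERN = re.compile(r"^(chr)?([0-9]{1,2}|X|Y|M|MT)$")


def check_breakpoints(handle, cols=(1, 2, 3, 4), header=True):
    """Return a list of (line_number, message) problems found in the CSV.

    cols are 1-based column numbers, assumed to mean
    (chromosome 1, breakpoint 1, chromosome 2, breakpoint 2).
    """
    problems = []
    reader = csv.reader(handle)
    if header:
        next(reader, None)  # skip the header row
    first_line = 2 if header else 1
    for line_number, row in enumerate(reader, start=first_line):
        if not row:
            continue  # ignore blank lines
        if len(row) < max(cols):
            problems.append((line_number, "too few columns"))
            continue
        chrom1, pos1, chrom2, pos2 = (row[c - 1].strip() for c in cols)
        for chrom, pos in ((chrom1, pos1), (chrom2, pos2)):
            if not CHROM_PATTERN.match(chrom):
                problems.append(
                    (line_number, f"unexpected chromosome name: {chrom!r}"))
            if not re.fullmatch(r"[0-9]+", pos) or int(pos) < 1:
                problems.append(
                    (line_number, f"position is not a positive integer: {pos!r}"))
    return problems


if __name__ == "__main__":
    # Made-up rows for illustration only; these are not real breakpoints.
    sample = io.StringIO(
        "chrA,posA,chrB,posB\n"
        "chr4,1000,chr14,2000\n"
        "chrQ,abc,chr14,5\n"
    )
    for line_number, message in check_breakpoints(sample):
        print(f"line {line_number}: {message}")
```

This only catches formatting problems. It cannot tell whether a well-formed position actually falls inside DUX4 or IGH@, so the coordinates still need to be compared against the fusion caller's output.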

Thanks!
Breon.
