Not all iGenomes compatible with Salmon/extracting transcripts? #272

olgabot · 2019-08-26T14:12:47Z

Hello,
I'm using the dev branch with Salmon to align some chimp data using the CHIMP2.1.4 genome from iGenomes.

Here is the nextflow run command:

# run from ~/code/kmer-hashing/kh-workflows/rnaseq/brawand2011/chimpanzee (branch olgabot/brawand2011)
$ make paired_end
nextflow run nf-core/rnaseq \
        -r dev \
        -latest \
        -resume \
        --reads '/home/olga/ibm_lg/sra-explorer/brawand2011/paired_end/*'ptr'*{1,2}.fastq.gz' \
        -profile docker,czbiohub_local \
        --saveTrimmed \
        --outdir "/home/olga/ibm_lg/kmer-hashing/brawand2011/nfcore-rnaseq/" \
        -work-dir "/home/olga/ibm_lg/kmer-hashing/nextflow-intermediates/" \
        --custom_config_base "/home/olga/code/nf-core/configs" \
        --pseudo_aligner salmon \
        --genome CHIMP2.1.4 \
        --email olga.botvinnik@czbiohub.org

Here is an excerpt of the error message:

Error executing process > 'transcriptsToFasta (genome.fa)'

Caused by:
  Process `transcriptsToFasta (genome.fa)` terminated with an error exit status (1)

Command executed:

  gffread -w transcripts.fa -g genome.fa genes.gtf

Command exit status:
  1

Command output:
  (empty)

Command error:
  WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
  FASTA index file genome.fa.fai created.
  Warning: couldn't find fasta record for 'AACZ03149418.1'!
  Error: no genomic sequence available (check -g option!).

Work dir:
  /home/olga/ibm_lg/kmer-hashing/nextflow-intermediates/36/1916b1bf4672b99b68e879e8141dea

Here is the full log: chimpanzee_nf-core_rnaseq_dev_salmon_nextflow.log

Potentially we could filter the fasta and gtf for only the overlapping sequences using bioawk:

# run from the work dir above, in the sourmash-sbt2knn environment
$ bioawk -c fastx '{print $name}' genome.fa > seqnames_genome.txt
$ bioawk -c gff '{print $seqname}' genes.gtf | sort | uniq > seqnames_gtf.txt

And then do some kind of unix-fu to keep only the entries in the gtf whose $seqname exists in the file seqnames_genome.txt.
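One possible form of that filtering step, as a minimal sketch using plain awk. The function name `filter_gtf_by_seqnames` and the output file name `genes_filtered.gtf` are hypothetical, and this is not necessarily how the pipeline ended up handling it. It assumes a tab-separated GTF with the sequence name in column 1, and a names file with one sequence name per line, such as seqnames_genome.txt above.

```shell
# Hypothetical helper: keep only GTF records whose sequence name (column 1)
# appears in a list of names taken from the FASTA file.
#   $1 = file with one sequence name per line (e.g. seqnames_genome.txt)
#   $2 = input GTF
#   $3 = output GTF
filter_gtf_by_seqnames() {
    awk -F '\t' '
        FILENAME == ARGV[1] { keep[$1] = 1; next }  # first file: remember FASTA sequence names
        /^#/                { print; next }         # pass GTF comment/header lines through
        $1 in keep                                  # print records located on a known sequence
    ' "$1" "$2" > "$3"
}
```

It could then be called as `filter_gtf_by_seqnames seqnames_genome.txt genes.gtf genes_filtered.gtf`, and gffread pointed at the filtered file instead of genes.gtf. Records on contigs missing from genome.fa, such as AACZ03149418.1 in the error above, would be dropped, so any transcripts annotated on them would not appear in transcripts.fa.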

apeltzer · 2019-09-23T08:52:50Z

I believe this is already addressed in #274, so I will close this issue. If not, please let us know 👍

olgabot mentioned this issue Aug 30, 2019

Add step to filter gtf for only chromosomes that exist in the fasta file #274

Merged

apeltzer closed this as completed Sep 23, 2019
