Not all iGenomes compatible with Salmon/extracting transcripts? #272

Closed
olgabot opened this issue Aug 26, 2019 · 1 comment

@olgabot
Contributor

olgabot commented Aug 26, 2019

Hello,
I'm using the dev branch with Salmon to align some chimp data using the CHIMP2.1.4 genome from iGenomes.

Here is the nextflow run command:

~/code/kmer-hashing/kh-workflows/rnaseq/brawand2011/chimpanzee $ make paired_end
nextflow run nf-core/rnaseq \
        -r dev \
        -latest \
        -resume \
        --reads '/home/olga/ibm_lg/sra-explorer/brawand2011/paired_end/*'ptr'*{1,2}.fastq.gz' \
        -profile docker,czbiohub_local \
        --saveTrimmed \
        --outdir "/home/olga/ibm_lg/kmer-hashing/brawand2011/nfcore-rnaseq/" \
        -work-dir "/home/olga/ibm_lg/kmer-hashing/nextflow-intermediates/" \
        --custom_config_base "/home/olga/code/nf-core/configs" \
        --pseudo_aligner salmon \
        --genome CHIMP2.1.4 \
        --email olga.botvinnik@czbiohub.org

Here is an excerpt of the error message:

Error executing process > 'transcriptsToFasta (genome.fa)'

Caused by:
  Process `transcriptsToFasta (genome.fa)` terminated with an error exit status (1)

Command executed:

  gffread -w transcripts.fa -g genome.fa genes.gtf

Command exit status:
  1

Command output:
  (empty)

Command error:
  WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
  FASTA index file genome.fa.fai created.
  Warning: couldn't find fasta record for 'AACZ03149418.1'!
  Error: no genomic sequence available (check -g option!).

Work dir:
  /home/olga/ibm_lg/kmer-hashing/nextflow-intermediates/36/1916b1bf4672b99b68e879e8141dea

Here is the full log: chimpanzee_nf-core_rnaseq_dev_salmon_nextflow.log

Potentially we could filter the fasta and gtf for only the overlapping sequences using bioawk:

$ bioawk -c fastx '{print $name}' genome.fa > seqnames_genome.txt
$ bioawk -c gff '{print $seqname}' genes.gtf | sort | uniq > seqnames_gtf.txt

And then do some kind of unix-fu to keep only the entries in the gtf whose $seqname exists in the file seqnames_genome.txt.
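
One way that filtering step might look, as a rough sketch using plain awk rather than bioawk. The function name `filter_gtf_by_seqnames` and the output file name `genes.filtered.gtf` are made up for illustration; the sketch assumes the GTF is tab-separated with the sequence name in the first column, and that the names file has one sequence name per line (as `seqnames_genome.txt` would).

```shell
# Hypothetical helper: keep only GTF records whose first column (seqname)
# appears in a list of sequence names, one name per line.
filter_gtf_by_seqnames() {
    names="$1"   # e.g. seqnames_genome.txt
    gtf="$2"     # e.g. genes.gtf
    awk -F '\t' '
        NR == FNR { keep[$1] = 1; next }   # first file: remember each name
        /^#/      { print; next }          # pass GTF comment lines through
        $1 in keep                         # print records on known sequences
    ' "$names" "$gtf"
}

# Usage (input names from the commands above; the output name is an assumption):
# filter_gtf_by_seqnames seqnames_genome.txt genes.gtf > genes.filtered.gtf
```

This only drops annotation lines for sequences missing from the fasta (such as 'AACZ03149418.1' in the error above); whether gffread then runs cleanly on the filtered gtf has not been checked here.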

@apeltzer
Member

I believe this is already addressed in #274, so I will close this. If not, please let us know 👍
