Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to use --transcript_length_matrix with nf-core/rnaseq --aligner star_rsem output? #279

Closed
olgabot opened this issue Jun 17, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@olgabot
Copy link

olgabot commented Jun 17, 2024

Description of the bug

Hello,
Hope you are doing well! I'm wondering how to take advantage of the transcript length feature (#203) when using gene counts created by RSEM. I prefer RSEM as an aligner as I find it to be more specific -- we had some RNA-seq data with plasmids only in certain conditions, and those plasmids got >0 counts in the conditions WITHOUT the plasmids when using salmon :( and didn't have the same issues with RSEM -- but I can't find a *.gene_lengths.tsv file created by RSEM.

From nf-core/rnaseq documentation

STAR via RSEM

  • Output files
    • star_rsem/
      • rsem.merged.gene_counts.tsv: Matrix of gene-level raw counts across all samples.
      • rsem.merged.gene_tpm.tsv: Matrix of gene-level TPM values across all samples.
      • rsem.merged.transcript_counts.tsv: Matrix of isoform-level raw counts across all samples.
      • rsem.merged.transcript_tpm.tsv: Matrix of isoform-level TPM values across all samples.
      • .genes.results: RSEM gene-level quantification results for each sample.
      • .isoforms.results: RSEM isoform-level quantification results for each sample.
      • .STAR.genome.bam: If -save_align_intermeds is specified the original BAM file containing read alignments to the reference genome will be placed in this directory.
      • .transcript.bam: If -save_align_intermeds is specified the original BAM file containing read alignments to the transcriptome will be placed in this directory.
    • star_rsem/<SAMPLE>.stat/
      • .cnt.model.theta: RSEM counts and statistics for each sample.
      • star_rsem/log/
      • .log: STAR alignment report containing the mapping results summary.

RSEM is a software package for estimating gene and isoform expression levels from RNA-seq data. It has been widely touted as one of the most accurate quantification tools for RNA-seq analysis. RSEM wraps other popular tools to map the reads to the genome (i.e. STAR, Bowtie2, HISAT2; STAR is used in this pipeline) which are then subsequently filtered relative to a transcriptome before quantifying at the gene- and isoform-level. Other advantages of using RSEM are that it performs both the alignment and quantification in a single package and its ability to effectively use ambiguously-mapping reads.

You can choose to align and quantify your data with RSEM by providing the --aligner star_rsem parameter.

How do you advise creating a --transcript_length_matrix file from the RSEM data? Or should we use the TPMs in this case?

Thank you!

Command used and terminal output

No response

Relevant files

No response

System information

No response

@olgabot olgabot added the bug Something isn't working label Jun 17, 2024
@olgabot
Copy link
Author

olgabot commented Jun 18, 2024

Whelp, I solved my own problem by actually reading the tximport documentation for RSEM. In case it helps someone else, here's the code snippet to get the lengths from the RSEM output from nf-core/rnaseq:

library(tximport)
files = Sys.glob('star_rsem/*.genes.results')
names = c()
for (filename in files) { names = c(names, strsplit(strsplit(filename, '/')[[1]][2], '.genes')[[1]][1]) }
names(files) = names
txi.rsem <- tximport(files, type = "rsem", txIn = FALSE, txOut = FALSE)
head(txi.rsem$length)
write.table(txi.rsem$length, 'rsem_gene_lengths.tsv', sep='\t')

@WackerO
Copy link
Collaborator

WackerO commented Jul 26, 2024

Hey Olga, glad you solved the issue! Closing this then 🙂

@WackerO WackerO closed this as completed Jul 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants