How to get isoform counts for single cell RNA-seq data? #48

biopzhang · 2022-11-02T19:47:48Z

Great tool that integrates lots of functions!

I was wondering if there is a way to get the isoform counts. I was trying to get the isoform counts following your Nature paper (specifically https://github.com/pachterlab/BYVSTZP_2020).

You mentioned that for the 10xv3 data, "gene-count matrices were made by using the -genecounts flag and TCC matrices were made by omitting it". It works great for the gene-count part with the following command:

$ kb count --h5ad -i index.idx -g t2g.txt -x 10xv3 -o XXX -m 64G --workflow standard --filter bustools -t 32

I got the cells x genes matrix both in the mtx and h5ad format.

My question is: how do I get a cells x transcripts matrix? Simply adding "--tcc" to the above command does not seem to work. I get a cells x tcc mtx, but not a cells x transcripts mtx. Moreover, I don't know how to apply or omit the "--genecounts" flag.

Thank you so much!
P.

Yenaled · 2022-11-03T13:46:32Z

Currently, kb count only does transcript quantification for bulk/smart-seq data (where each sample or cell is in a separate FASTQ file).

For 10X type data, kb count stops at the cells x tcc mtx. However, you can run "kallisto quant-tcc" on the cells x tcc mtx to try to get transcript quantification.
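As a rough sketch of that second step, the snippet below only assembles a "kallisto quant-tcc" command line and prints it. The flags used (-o, -i, -e, -t, with the matrix as the positional argument) are my recollection of the quant-tcc usage text and should be checked against the help output of your kallisto version. The function name quant_tcc_command and all paths are hypothetical placeholders.

```python
import shlex


def quant_tcc_command(mtx_path, index_path, ecmap_path, out_dir, threads=8):
    """Build (but do not run) a kallisto quant-tcc argument list.

    Assumed flags, to be verified with `kallisto quant-tcc` help output:
      -o output directory, -i kallisto index, -e equivalence-class map,
      -t number of threads; the TCC matrix is the final positional argument.
    """
    return [
        "kallisto", "quant-tcc",
        "-o", out_dir,
        "-i", index_path,
        "-e", ecmap_path,
        "-t", str(threads),
        mtx_path,
    ]


# Hypothetical paths: adjust to wherever kb count wrote its output.
cmd = quant_tcc_command(
    mtx_path="XXX/counts_unfiltered/cells_x_tcc.mtx",
    index_path="index.idx",
    ecmap_path="XXX/matrix.ec",
    out_dir="XXX/quant_tcc",
)
print(shlex.join(cmd))
```

Printing the command first makes it easy to inspect before handing it to subprocess.run or pasting it into a shell.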

biopzhang · 2022-11-03T14:15:11Z

Thank you for your quick reply, Yenaled!

I was testing this on the forebrain glutamatergic neuronal lineage data in the KBtools tutorial. The kb count tcc matrix (394,494 x 6,238,208) is huge for the kallisto quant-tcc step. It runs forever even on an HPC cluster node (64 cores, ~ TB memory; 12 hours now, still running). I think I should probably keep only the cells retained in other studies, such as the RNA velocity study (only about 1,800 cells are kept). Could you please comment on this?

Yenaled · 2022-11-04T08:34:38Z

Oh, with such a large matrix, it's computationally intractable. You will definitely need to filter cells.
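One possible way to do the filtering, sketched with the standard library only: stream the Matrix Market file and keep just the rows whose barcode is in a chosen set, renumbering rows as it goes. This assumes the matrix is in coordinate format with cells as rows and that a barcodes file lists one barcode per row in the same order. The function names and the choice of which barcodes to keep are illustrative, not part of kb-python.

```python
def read_header(handle):
    """Consume banner/comment lines; return (comments, n_rows, n_cols, nnz)."""
    comments = []
    for line in handle:
        if line.startswith("%"):
            comments.append(line)
        elif line.strip():
            n_rows, n_cols, nnz = map(int, line.split())
            return comments, n_rows, n_cols, nnz
    raise ValueError("no size line found in Matrix Market file")


def filter_tcc_matrix(mtx_in, barcodes_in, keep_barcodes, mtx_out, barcodes_out):
    """Write a copy of the matrix containing only the cells in keep_barcodes.

    Returns the number of cells retained.
    """
    keep = set(keep_barcodes)

    # Map old 1-based row index -> new 1-based row index for retained cells.
    row_map = {}
    with open(barcodes_in) as fin, open(barcodes_out, "w") as fout:
        for old_row, line in enumerate(fin, start=1):
            barcode = line.strip()
            if barcode in keep:
                row_map[old_row] = len(row_map) + 1
                fout.write(barcode + "\n")

    # Pass 1: count surviving entries, since the size line needs the new nnz.
    with open(mtx_in) as fin:
        comments, _, n_cols, _ = read_header(fin)
        kept_nnz = sum(
            1 for line in fin
            if line.strip() and int(line.split()[0]) in row_map
        )

    # Pass 2: write the header, then the surviving entries with new row numbers.
    with open(mtx_in) as fin, open(mtx_out, "w") as fout:
        read_header(fin)
        fout.writelines(comments)
        fout.write(f"{len(row_map)} {n_cols} {kept_nnz}\n")
        for line in fin:
            if not line.strip():
                continue
            row, col, value = line.split()
            new_row = row_map.get(int(row))
            if new_row is not None:
                fout.write(f"{new_row} {col} {value}\n")

    return len(row_map)
```

The set passed as keep_barcodes could come from any cell-calling step, for example the barcodes of the gene-count matrix after filtering. Reading the file twice avoids holding the full matrix in memory, which matters at this size.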

The EM algorithm (which gives you transcript counts) in quant-tcc only takes a few seconds to run, but if you multiply a few seconds by hundreds of thousands of cells, well, you do the math of how long it'll take to run.
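To make that arithmetic concrete, here is a back-of-the-envelope sketch. The 3 seconds per cell is an assumed stand-in for "a few seconds", and the estimate is serial, so it ignores any speed-up from multiple threads. The cell counts are the ones mentioned in this thread.

```python
def estimate_em_hours(n_cells, seconds_per_cell):
    """Serial wall-clock estimate, in hours, for running the EM once per cell."""
    return n_cells * seconds_per_cell / 3600.0


# seconds_per_cell = 3.0 is an assumption, not a measured value.
for label, n_cells in [("all barcodes", 394_494), ("filtered cells", 1_800)]:
    hours = estimate_em_hours(n_cells, 3.0)
    print(f"{label}: {n_cells} cells -> {hours:.1f} h ({hours / 24:.1f} days)")
```

Under that assumption the unfiltered matrix needs on the order of two weeks, while about 1,800 cells finish in roughly an hour and a half.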
