kallisto bustools with reference transcriptome #45

MartaBenegas · 2022-09-26T11:05:41Z

Dear team,

I'm a little confused about the index-building step. The manual says that it builds a transcriptome index but requires a genome FASTA and a GFF as input. I would like to create the count table using a reference transcriptome instead. Is this possible with kallisto + bustools?

Thank you,
Marta.

Yenaled · 2022-09-26T12:10:54Z

kb ref makes a reference transcriptome from a genome fasta and gtf.

If you already have a transcriptome, there's no need to use kb ref. Simply use kallisto index -i index.idx reference_transcriptome.fasta to create your index (index.idx).

MartaBenegas · 2022-09-26T13:24:26Z

Thanks for the explanation!

MartaBenegas · 2022-09-28T14:27:17Z

Dear Delaney, sorry for reopening the issue.

To use kb count, I also need the transcript-to-gene mapping file. What kind of file is it? Is it a tab-separated file with the transcript in one column and the gene name in another?

Moreover, is there a way to perform the counting without this file? I would like to use a de novo assembled transcriptome, so I don't have this information.

Thanks!

Yenaled · 2022-09-29T07:13:54Z

It's just a tab-separated file with the transcript name in the first column and the gene name in the second column.

You need this file to perform the counting, but if you want, you can pretend that each transcript is its own gene (i.e., put the transcript name in both columns).
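For a de novo transcriptome, the "each transcript is its own gene" file can be generated straight from the FASTA. The sketch below is a minimal illustration, not part of kb or kallisto: the function names are hypothetical, and it assumes each transcript ID is the first whitespace-delimited token of its header line, so check that the IDs match the transcript names in your kallisto index.

```python
import io
from typing import Iterable, List, TextIO


def transcript_ids(fasta: TextIO) -> List[str]:
    """Collect transcript IDs from a FASTA stream.

    Assumption: the ID is the first whitespace-delimited token of each
    '>' header line (e.g. '>tr1 len=500' gives 'tr1').
    """
    ids: List[str] = []
    for line in fasta:
        if line.startswith(">"):
            ids.append(line[1:].split()[0])
    return ids


def write_identity_t2g(ids: Iterable[str], out: TextIO) -> None:
    """Write a two-column, tab-separated t2g where every transcript is its own 'gene'."""
    for tid in ids:
        out.write(f"{tid}\t{tid}\n")


# Small in-memory demo; with real data, pass open file handles instead.
demo_fasta = io.StringIO(">tr1 len=500\nACGTACGT\n>tr2\nGGCCGGCC\n")
t2g_buf = io.StringIO()
write_identity_t2g(transcript_ids(demo_fasta), t2g_buf)
print(t2g_buf.getvalue(), end="")
```

With real files, open reference_transcriptome.fasta for reading and a new output file (for example t2g.txt, an illustrative name) for writing, and pass the two handles to the same functions.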

The main issue is that kb count discards all multimappers (i.e., if a UMI maps to more than one gene, that UMI is not counted). Thus, multimapping might be a big problem if you pretend each transcript belongs to a different gene.
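To make the consequence concrete, here is a simplified illustration of the discard rule described above. It is not kb count's implementation (which works on equivalence classes stored in BUS files); each UMI is represented here, hypothetically, as the list of transcripts it is compatible with, and all names are made up for the example.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple


def gene_for_umi(transcripts: Iterable[str], t2g: Dict[str, str]) -> Optional[str]:
    """Return the single gene a UMI is compatible with, or None for a gene multimapper."""
    genes = {t2g[t] for t in transcripts}
    return genes.pop() if len(genes) == 1 else None


def count_genes(umis: List[List[str]], t2g: Dict[str, str]) -> Tuple[Counter, int]:
    """Count UMIs per gene, discarding UMIs compatible with more than one gene."""
    counts: Counter = Counter()
    discarded = 0
    for transcripts in umis:
        gene = gene_for_umi(transcripts, t2g)
        if gene is None:
            discarded += 1
        else:
            counts[gene] += 1
    return counts, discarded


# Hypothetical example: two isoforms of one gene plus an unrelated transcript.
real_t2g = {"tr1": "geneA", "tr2": "geneA", "tr3": "geneB"}
identity_t2g = {"tr1": "tr1", "tr2": "tr2", "tr3": "tr3"}
umis = [["tr1", "tr2"], ["tr3"], ["tr1"]]

for name, mapping in [("gene-level t2g", real_t2g), ("identity t2g", identity_t2g)]:
    counts, discarded = count_genes(umis, mapping)
    print(name, dict(counts), "discarded:", discarded)
```

With the gene-level mapping, the UMI compatible with both isoforms is counted for geneA; with the identity mapping, the same UMI spans two "genes" and is dropped.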

There are ways around this (e.g., if you use the --tcc option in kb count, an EM algorithm will try to probabilistically figure out what to do with the multimappers). It basically boils down to this: if you have a UMI associated with transcripts A, B, and C but no gene-level information, how do you want to count that UMI?
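One common answer to that question is to split the UMI fractionally across A, B, and C in proportion to how abundant each transcript appears to be, estimated with expectation-maximization. The sketch below is a generic textbook EM of that kind, assuming UMIs have already been grouped into sets of compatible transcripts (equivalence classes) with a count per set. It ignores transcript lengths and is not a reproduction of what kb count --tcc runs internally; the function name is hypothetical.

```python
from typing import Dict, FrozenSet


def em_transcript_counts(
    classes: Dict[FrozenSet[str], int], n_iter: int = 1000, tol: float = 1e-10
) -> Dict[str, float]:
    """Split UMI counts over transcripts with a basic EM.

    classes maps each set of compatible transcripts to the number of UMIs
    observed for exactly that set. Transcript lengths are ignored.
    """
    classes = {ec: n for ec, n in classes.items() if n > 0}
    total = sum(classes.values())
    transcripts = sorted({t for ec in classes for t in ec})
    if total == 0:
        return {}
    theta = {t: 1.0 / len(transcripts) for t in transcripts}  # uniform start
    for _ in range(n_iter):
        # E-step: share each class's UMIs in proportion to current abundances.
        expected = dict.fromkeys(transcripts, 0.0)
        for ec, n in classes.items():
            denom = sum(theta[t] for t in ec)
            for t in ec:
                expected[t] += n * theta[t] / denom
        # M-step: new abundances are the normalized expected counts.
        new_theta = {t: expected[t] / total for t in transcripts}
        delta = max(abs(new_theta[t] - theta[t]) for t in transcripts)
        theta = new_theta
        if delta < tol:
            break
    return {t: theta[t] * total for t in transcripts}


# Hypothetical counts: 8 UMIs only fit A, 2 only fit B, 10 fit either A or B.
classes = {frozenset({"A"}): 8, frozenset({"B"}): 2, frozenset({"A", "B"}): 10}
counts = em_transcript_counts(classes)
print({t: round(c, 3) for t, c in counts.items()})  # {'A': 16.0, 'B': 4.0}
```

The 10 ambiguous UMIs end up split 8:2, matching the ratio of the unambiguous evidence, instead of being thrown away.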

MartaBenegas · 2022-10-19T09:24:34Z

Hi Delaney, thank you very much for your explanation!
Now I see that multimappers are really an issue; I hadn't taken this into account, so thank you for pointing it out!

Is there a way to keep multimappers instead of discarding them, and assign the count to the transcript with the most reliable alignment or something similar?

To explain my context a little, I'm working with a non-model organism and I've obtained my own curated reference transcriptome. Now I would like to use it for single-cell analysis, so I was looking for a counting algorithm that works with a reference transcriptome. For the time being, I think I'll use your workaround to see how it behaves, and maybe cluster the sequences in my transcriptome before counting. I know it's not the perfect procedure, but I'll let you know how it goes :)
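If the clustering step goes ahead, one way it could plug into the t2g workaround (an assumption, not something proposed in the replies) is to use each cluster ID as the "gene" name. UMIs compatible with several transcripts from the same cluster would then be counted at the cluster level instead of being discarded. The sketch assumes the clustering output has already been parsed into a dict from cluster ID to member transcript IDs; parsing depends on the clustering tool and is not shown, and all names are hypothetical.

```python
from typing import Dict, Iterable, List


def t2g_from_clusters(
    clusters: Dict[str, List[str]], all_transcripts: Iterable[str]
) -> Dict[str, str]:
    """Map each transcript to its cluster ID; unclustered transcripts map to themselves."""
    assigned: Dict[str, str] = {}
    for cluster_id, members in clusters.items():
        for t in members:
            if t in assigned:
                raise ValueError(f"{t} appears in clusters {assigned[t]} and {cluster_id}")
            assigned[t] = cluster_id
    return {t: assigned.get(t, t) for t in all_transcripts}


def t2g_lines(t2g: Dict[str, str]) -> str:
    """Format the mapping as two-column, tab-separated t2g text."""
    return "".join(f"{t}\t{g}\n" for t, g in t2g.items())


# Hypothetical clustering result: tr1 and tr2 were grouped, tr3 was left alone.
clusters = {"cluster1": ["tr1", "tr2"]}
print(t2g_lines(t2g_from_clusters(clusters, ["tr1", "tr2", "tr3"])), end="")
```

Raising on a transcript assigned to two clusters keeps the mapping one-to-one, which the two-column t2g format implies.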

MartaBenegas closed this as completed Sep 26, 2022

MartaBenegas reopened this Sep 28, 2022
