Questions regarding tsebra #879

bauerlev · 2024-10-24T16:02:30Z

Hello, I have a few questions regarding tsebra. I know it has its own GitHub repository, but it doesn't seem to be active, and since your group develops both tools I'm hoping this is an appropriate place to ask.

What does each of these parameters in the config file actually mean? I can't find an answer anywhere in the documentation.

Allowed difference for each feature

Values have to be in [0,2]

e_1 0.0
e_2 0.5
e_3 0.096
e_4 0.02
e_5 0.18
e_6 0.18

Is there documentation on how the script get_longest_isoform works? We noticed multiple transcripts for a given locus after running braker3, so I tried this script. It helped, but there are still cases with more than one transcript per locus.

Thanks for your help! We've been having significantly better success with braker over maker and I'm very grateful.

bijendrabio · 2024-12-02T22:01:27Z

Hello @bauerlev,
I hope you find this helpful:

"Hi, I will upload a TSEBRA version with the keep-all option by the end of this week.

Your command line looks correct and it should work.

You might be correct that the configuration of the long-read version of TSEBRA isn't fitted for all species as the amount of long-read data available during development was very limited. If you want to adjust the configuration, I would suggest that you try different values for intron_support, e_1, e_4, e_5, e_6.
The support values in the config file specify the minimum fraction that has to be supported by extrinsic evidence. If a transcript has lower evidence support in start/stop-codon and intron, it will be filtered out. For the current long-read configuration, this means that all transcripts must have either all introns or their stop supported. I would suggest decreasing intron_support if you want to change anything here. This can be especially helpful if you think that the sensitivity at the gene level is not high enough.
The e parameters are thresholds that are used to allow some difference between the different scores of two transcripts at the same locus. In short, the thresholds correspond to scores as follows: e_1: relative fraction of supported introns, e_2: relative fraction of supported stop-codons, e_3: relative fraction of supported start-codons, e_4: absolute intron support, e_5: absolute stop-codon support, e_6: absolute start-codon support. If you want to go more in-depth, you can take a look at our paper. I would try to increase e_1, e_4, e_5, e_6, especially if you want to keep more alternative isoforms per gene."

Gaius-Augustus/TSEBRA#13
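
To make the quoted explanation a bit more concrete, here is a minimal Python sketch of how the e values could be read from a config and how a tolerance of this kind might act when two scores are compared. This is not TSEBRA's actual code: the function names `parse_config`, `check_e_range` and `within_tolerance` are made up for illustration, the `#` comment handling is an assumption about the file format, and treating two scores as equivalent when they differ by at most the threshold is my reading of "allow some difference", not something confirmed by the source.

```python
def parse_config(text):
    """Parse 'key value' lines into a dict of floats.

    Assumption: blank lines and lines starting with '#' are ignored.
    """
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, value = line.split()
        params[key] = float(value)
    return params


def check_e_range(params):
    """Raise ValueError if any e_* threshold falls outside [0, 2]."""
    for key, value in params.items():
        if key.startswith("e_") and not 0.0 <= value <= 2.0:
            raise ValueError(f"{key}={value} is outside [0, 2]")


def within_tolerance(score_a, score_b, epsilon):
    """Hypothetical rule: two scores count as equivalent if they
    differ by no more than epsilon."""
    return abs(score_a - score_b) <= epsilon


# The e values from the question above.
config_text = """
e_1 0.0
e_2 0.5
e_3 0.096
e_4 0.02
e_5 0.18
e_6 0.18
"""

params = parse_config(config_text)
check_e_range(params)

# Illustrative scores (not real data): fraction of supported introns
# for two transcripts at the same locus.
frac_a, frac_b = 1.0, 0.8

# With e_1 = 0.0 any difference counts, so the scores are not equivalent.
print(within_tolerance(frac_a, frac_b, params["e_1"]))  # False

# A larger threshold tolerates the same difference.
print(within_tolerance(frac_a, frac_b, 0.25))  # True
```

Under this reading, raising a threshold makes it more likely that two transcripts are treated as equally well supported, which fits the advice above to increase e_1, e_4, e_5 and e_6 if you want to keep more isoforms per gene.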
