Questions regarding tsebra #879

Open
bauerlev opened this issue Oct 24, 2024 · 1 comment

Comments

@bauerlev

Hello, I have a few questions regarding TSEBRA. I know it has its own GitHub repository, but that doesn't seem to be active, and since your group also develops it, I'm hoping this is an appropriate place to ask.

  1. What does each of these parameters in the config file actually mean? I can't find an answer anywhere in the documentation.

# Allowed difference for each feature
# Values have to be in [0,2]
e_1 0.0
e_2 0.5
e_3 0.096
e_4 0.02
e_5 0.18
e_6 0.18

  2. Is there documentation on how the get_longest_isoform script works? After running BRAKER3 we noticed multiple transcripts at a given locus, so I tried this script. It helped, but there are still instances where more than one transcript remains at a locus.
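To make the format of the config excerpt above concrete, here is a minimal Python sketch that reads `name value` lines and checks that every `e_*` value lies in the documented [0,2] range. It assumes the descriptive lines are `#` comments in the actual file and that every other line holds exactly one parameter name and one number. The helpers `parse_tsebra_config` and `check_thresholds` are hypothetical and are not part of TSEBRA.

```python
from typing import Dict


def parse_tsebra_config(text: str) -> Dict[str, float]:
    """Parse 'name value' lines, skipping blank lines and '#' comments.

    Hypothetical helper: assumes one name and one numeric value per line.
    """
    params: Dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, value = line.split()  # raises ValueError if not exactly 2 tokens
        params[name] = float(value)
    return params


def check_thresholds(params: Dict[str, float]) -> None:
    """Raise ValueError if any e_* threshold falls outside [0, 2]."""
    for name, value in params.items():
        if name.startswith("e_") and not 0.0 <= value <= 2.0:
            raise ValueError(f"{name}={value} is outside [0, 2]")


EXAMPLE = """\
# Allowed difference for each feature
# Values have to be in [0,2]
e_1 0.0
e_2 0.5
e_3 0.096
e_4 0.02
e_5 0.18
e_6 0.18
"""

if __name__ == "__main__":
    cfg = parse_tsebra_config(EXAMPLE)
    check_thresholds(cfg)  # passes: all values are within [0, 2]
    print(cfg["e_2"])  # 0.5
```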

Thanks for your help! We've been having significantly better success with BRAKER than with MAKER, and I'm very grateful.

@bijendrabio

Hello @bauerlev,
I hope you find this helpful:

"Hi, I will upload a TSEBRA version with the keep-all option by the end of this week.

Your command line looks correct and it should work.

You might be correct that the configuration of the long-read version of TSEBRA isn't well suited to all species, as the amount of long-read data available during development was very limited. If you want to adjust the configuration, I would suggest trying different values for intron_support, e_1, e_4, e_5, and e_6.
The support values in the config file specify the minimum fraction that has to be supported by extrinsic evidence. If a transcript's support falls below these values for both start/stop codons and introns, it will be filtered out. For the current long-read configuration, this means that every transcript must have either all of its introns or its stop codon supported. I would suggest decreasing intron_support if you want to change anything here. This can be especially helpful if you think that the sensitivity at the gene level is not high enough.
The e parameters are thresholds that allow some difference between the corresponding scores of two transcripts at the same locus. In short, the thresholds correspond to scores as follows: e_1: relative fraction of supported introns; e_2: relative fraction of supported stop codons; e_3: relative fraction of supported start codons; e_4: absolute intron support; e_5: absolute stop-codon support; e_6: absolute start-codon support. If you want to go more in depth, you can take a look at our paper. I would try increasing e_1, e_4, e_5, and e_6, especially if you want to keep more alternative isoforms per gene."

Source: Gaius-Augustus/TSEBRA#13
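The two mechanisms in the quoted answer, the support filter and the e thresholds, can be illustrated with a toy Python model. This is a simplified sketch under stated assumptions and not TSEBRA's actual decision rule, which is described in the paper. It assumes that transcripts are already grouped by locus, that the six scores (in e_1..e_6 order) are precomputed and comparable on one scale, that the support filter follows the long-read description (all introns or the stop codon supported), and that a transcript is dropped only if another surviving transcript is at least as good on every score and better by more than e_i on at least one score i. All names (`Transcript`, `passes_support_filter`, `dominates`, `select_isoforms`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Transcript:
    tx_id: str
    n_introns: int
    n_supported_introns: int
    stop_supported: bool
    # Hypothetical precomputed scores in e_1..e_6 order: rel. intron,
    # rel. stop, rel. start, abs. intron, abs. stop, abs. start support.
    scores: Sequence[float]


def passes_support_filter(tx: Transcript, intron_support: float) -> bool:
    """Keep tx if enough introns are supported OR its stop codon is supported.

    Assumption: intronless transcripts get an intron fraction of 0.0.
    """
    frac = tx.n_supported_introns / tx.n_introns if tx.n_introns else 0.0
    return frac >= intron_support or tx.stop_supported


def dominates(a: Transcript, b: Transcript, eps: Sequence[float]) -> bool:
    """True if a is at least as good as b on every score and better
    by more than e_i on at least one score i (toy rule)."""
    at_least_as_good = all(x >= y for x, y in zip(a.scores, b.scores))
    clearly_better = any(x - y > e for x, y, e in zip(a.scores, b.scores, eps))
    return at_least_as_good and clearly_better


def select_isoforms(locus: List[Transcript], intron_support: float,
                    eps: Sequence[float]) -> List[str]:
    """Filter one locus, then drop transcripts dominated by another survivor."""
    kept = [t for t in locus if passes_support_filter(t, intron_support)]
    return [t.tx_id for t in kept
            if not any(dominates(o, t, eps) for o in kept if o is not t)]


if __name__ == "__main__":
    locus = [
        Transcript("t1", 4, 4, True, [1.0, 1.0, 1.0, 0.8, 0.6, 0.5]),
        Transcript("t2", 4, 4, True, [1.0, 1.0, 1.0, 0.7, 0.6, 0.5]),
        Transcript("t3", 3, 1, False, [0.3, 0.0, 0.0, 0.1, 0.0, 0.0]),
    ]
    eps = [0.0, 0.5, 0.096, 0.02, 0.18, 0.18]  # e_1..e_6 from the issue
    print(select_isoforms(locus, 1.0, eps))  # ['t1']
    looser = list(eps)
    looser[3] = 0.2  # raise e_4
    print(select_isoforms(locus, 1.0, looser))  # ['t1', 't2']
```

With these made-up scores, t3 fails the support filter, and t1 beats t2 on the absolute intron score (0.8 vs 0.7, a gap larger than e_4 = 0.02), so only t1 is kept. Raising e_4 to 0.2 keeps both t1 and t2. This mirrors the advice to increase e_4 to retain more alternative isoforms.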
