Some suggestions required after checking braker3 gff file #902

Open
Abieskawa opened this issue Dec 24, 2024 · 0 comments


Hello,
I am annotating a genome with limited RNA-seq evidence (only one library) plus protein evidence (Vertebrata + UniProtKB_7898 + ncbi_1489894).
I ran BRAKER3 as suggested in @LarsGab's post.
I now have 21,239 genes with a BUSCO proteome score of 98.5%.

I converted the GTF file with gtf2gff.pl from the Augustus scripts, then checked the resulting GFF with gff3_QC v2.1.0 and my own custom code.

What I got from gff3_QC:

| Error_code | Number_of_problematic_models | Error_level | Error_tag |
|---|---|---|---|
| Esf0014 | 1 | Error | ##gff-version" missing from the first line |
| Esf0012 | 2 | Info | Found Ns in a feature using the external FASTA |
| Ema0008 | 221 | Warning | Warning for distinct isoforms that do not share any regions |
| Emr0002 | 76 | Warning | Incorrectly split gene parent? |
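
For context on Esf0014: the GFF3 specification expects `##gff-version 3` as the first line of the file. Below is a minimal sketch of a helper that adds the line when it is missing. The name `ensure_gff_version_header` is hypothetical and is not part of gff3_QC or the Augustus scripts, and the sample feature line is made up.

```python
def ensure_gff_version_header(lines, version="3"):
    """Return the lines with a '##gff-version' pragma as the first line.

    Hypothetical helper for illustration; the input is left unchanged
    if the pragma is already present on the first line.
    """
    lines = list(lines)
    if lines and lines[0].startswith("##gff-version"):
        return lines
    return [f"##gff-version {version}\n"] + lines


# Made-up single-feature input without a header line.
sample = ["chr1\tAUGUSTUS\tgene\t1\t90\t.\t+\t.\tID=g1\n"]
fixed = ensure_gff_version_header(sample)
```

To apply it to a real file, read the file with `readlines()`, pass the list through the helper, and write the result back out.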

What I got from my custom code:

=== Additional Validation Checks ===

Not necessary error. Count these scenarios.
Stop codon missing: 6
mRNA g4787.t1 has no stop codon defined.
mRNA g9511.t2 has no stop codon defined.
mRNA g11392.t1 has no stop codon defined.
mRNA g19024.t2 has no stop codon defined.
mRNA g20446.t1 has no stop codon defined.
mRNA g21239.t1 has no stop codon defined.
Start codon missing: 8
mRNA g891.t1 has no start codon defined.
mRNA g2706.t1 has no start codon defined.
mRNA g12910.t1 has no start codon defined.
mRNA g13255.t1 has no start codon defined.
mRNA g16568.t1 has no start codon defined.
mRNA g20454.t1 has no start codon defined.
mRNA g21234.t1 has no start codon defined.
mRNA g21239.t1 has no start codon defined.
Invalid coordinates: 0
Duplicate errors: 0
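
The custom code itself is not shown here. As a rough sketch of how a start/stop codon check of this kind could work, the function below assumes a GFF3 file in which each `start_codon` and `stop_codon` feature points to its transcript through the `Parent` attribute. The function names and the example records are hypothetical.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Parse a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def find_missing_codons(gff3_lines):
    """Return (mRNAs without a start_codon, mRNAs without a stop_codon)."""
    mrna_ids = []
    seen = defaultdict(set)  # feature type -> parent mRNA IDs that have it
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, pragmas and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        ftype, attrs = cols[2], parse_attributes(cols[8])
        if ftype == "mRNA" and "ID" in attrs:
            mrna_ids.append(attrs["ID"])
        elif ftype in ("start_codon", "stop_codon"):
            for parent in attrs.get("Parent", "").split(","):
                if parent:
                    seen[ftype].add(parent)
    no_start = [m for m in mrna_ids if m not in seen["start_codon"]]
    no_stop = [m for m in mrna_ids if m not in seen["stop_codon"]]
    return no_start, no_stop


# Made-up records: g1.t1 has both codons, g2.t1 lacks a stop codon.
example = [
    "##gff-version 3",
    "chr1\tAUGUSTUS\tmRNA\t1\t90\t.\t+\t.\tID=g1.t1;Parent=g1",
    "chr1\tAUGUSTUS\tstart_codon\t1\t3\t.\t+\t0\tParent=g1.t1",
    "chr1\tAUGUSTUS\tstop_codon\t88\t90\t.\t+\t0\tParent=g1.t1",
    "chr1\tAUGUSTUS\tmRNA\t200\t290\t.\t+\t.\tID=g2.t1;Parent=g2",
    "chr1\tAUGUSTUS\tstart_codon\t200\t202\t.\t+\t0\tParent=g2.t1",
]
no_start, no_stop = find_missing_codons(example)
```

On these example records, `no_start` is empty and `no_stop` contains only `g2.t1`.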

Here are some visualisations of the warnings:

GFFErrors.pptx

I want to ask:

  1. Which GFF version does gtf2gff.pl produce? I only know that I used the --gff3 option; is there any more specific version information?
  2. Is it normal for some predicted mRNAs to lack a start or stop codon?
  3. Any suggestions for dealing with the "distinct isoforms" and "Incorrectly split gene parent?" warnings?
  4. I guess BRAKER3 allows some Ns in protein-coding gene regions? Would you suggest skipping those cases?

Many thanks~
