gene interval is shorter than CDS interval #896

leon945945 · 2024-12-13T02:36:30Z

Hi, I used BRAKER3 to annotate my genome and updated the UTR regions using GUSHR, but after the update some gene intervals no longer cover their CDS intervals (the gene ends before the CDS does).

BRAKER3 results:

h1tg000001l AUGUSTUS gene 165856 168762 . - . ID=g16;
h1tg000001l AUGUSTUS mRNA 165856 168762 1 - . ID=g16.t1;Parent=g16;
h1tg000001l AUGUSTUS stop_codon 165856 165858 . - 0 ID=g16.t1.stop1;Parent=g16.t1;
h1tg000001l AUGUSTUS CDS 165856 168762 1 - 0 ID=g16.t1.CDS1;Parent=g16.t1;
h1tg000001l AUGUSTUS exon 165856 168762 . - . ID=g16.t1.exon1;Parent=g16.t1;
h1tg000001l AUGUSTUS start_codon 168760 168762 . - 0 ID=g16.t1.start1;Parent=g16.t1;

GUSHR results:

h1tg000001l GUSHR gene 76535 165638 . - . ID=gene:g16;biotype=protein_coding;
h1tg000001l GUSHR mRNA 76535 165638 . - . ID=transcript:g16.t1;Parent=gene:g16;biotype=protein_coding;
h1tg000001l AnnotationFinalizer three_prime_utr 76521 76535 . - . Parent=transcript:g16.t1;
h1tg000001l AnnotationFinalizer three_prime_utr 165638 165855 . - . Parent=transcript:g16.t1;
h1tg000001l AUGUSTUS stop_codon 165856 165858 . - 0 Parent=transcript:g16.t1;
h1tg000001l AUGUSTUS CDS 165856 168762 1 - 0 Parent=transcript:g16.t1;
h1tg000001l AUGUSTUS start_codon 168760 168762 . - 0 Parent=transcript:g16.t1;
h1tg000001l AnnotationFinalizer five_prime_utr 168763 169190 . - . Parent=transcript:g16.t1;

I want to know how to fix the gff file. Thanks.
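To see which records are affected, each parent interval can be compared with the intervals of its direct children. The following is only a minimal sketch: `find_short_parents` is a made-up helper name, and it assumes a tab-separated GFF3 file with `ID` and `Parent` attributes like the GUSHR output above.

```python
def find_short_parents(lines):
    """Return IDs of features whose interval does not cover all direct children."""
    intervals = {}  # feature ID -> (start, end) as annotated
    children = []   # (parent ID, start, end) for every feature with a Parent
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[0].startswith("#"):
            continue  # skip directives, comments and malformed lines
        start, end = int(fields[3]), int(fields[4])
        attrs = dict(kv.split("=", 1) for kv in fields[8].split(";") if "=" in kv)
        if "ID" in attrs:
            intervals[attrs["ID"]] = (start, end)
        # GFF3 allows several comma-separated parents
        for parent_id in attrs.get("Parent", "").split(","):
            if parent_id:
                children.append((parent_id, start, end))

    bad = set()
    for parent_id, start, end in children:
        if parent_id in intervals:
            p_start, p_end = intervals[parent_id]
            if start < p_start or end > p_end:
                bad.add(parent_id)
    return sorted(bad)
```

Only direct children are checked, so for the g16 example the transcript is reported; the gene shares the same interval and is affected in the same way.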


KatharinaHoff · 2024-12-13T04:32:42Z

It is a bug. We don’t have a quick solution to fix this.

leon945945 · 2024-12-13T04:48:54Z

Bad news :(

leon945945 · 2024-12-26T12:50:58Z

Hi, I wrote a simple Python script to fix the GFF:

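import pandas as pd

cols = ["seqname", "source", "feature", "start", "end", "score", "strand", "frame", "attribute"]
# comment="#" skips directive lines such as "##gff-version 3"
gff = pd.read_csv("GUSHR.gff3", sep="\t", header=None, names=cols, comment="#")

# Walk the file bottom-up. This assumes each block is laid out as gene, mRNA,
# then the child features sorted by ascending start, with one mRNA per gene.
rows = []
end = None        # end of the last child feature of the current transcript
pre = None        # row visited just before this one (the next row in file order)
in_block = False  # True once the end of the current transcript has been recorded
for i in range(len(gff) - 1, -1, -1):
    row = gff.iloc[i].copy()
    if row["feature"] in ("gene", "mRNA"):
        # For an mRNA, pre is its first child; for a gene, pre is the corrected mRNA
        row["start"] = pre["start"]
        row["end"] = end
        in_block = False
    elif not in_block:
        end = row["end"]
        in_block = True
    pre = row
    rows.append(row)

new_gff = pd.DataFrame(rows[::-1])  # restore the original file order
new_gff.to_csv("GUSHR.fixed.gff", sep="\t", header=False, index=False)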
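The script above relies on the row order within each gene block and takes the gene interval from a single mRNA. A variant that instead follows the `ID` and `Parent` attributes could look like the sketch below. It is not part of GUSHR or BRAKER; the helper names `attr` and `recompute_parent_intervals` are invented for illustration, and it assumes tab-separated GFF3 lines in which only `gene` and `mRNA` features need recomputing.

```python
import re

def attr(record, key):
    """Return the value of a GFF3 attribute such as ID or Parent, or None."""
    match = re.search(r"(?:^|;)" + re.escape(key) + r"=([^;]+)", record[8])
    return match.group(1) if match else None

def recompute_parent_intervals(lines):
    """Set each mRNA to the span of its children and each gene to the span of its mRNAs."""
    records = [line.rstrip("\n").split("\t") for line in lines]
    # Directives, comments and blank lines are passed through unchanged
    features = [r for r in records if len(r) == 9 and not r[0].startswith("#")]
    span = {}  # feature ID -> (min start, max end) seen among its children

    def widen(parents, start, end):
        # GFF3 allows several comma-separated parents
        for pid in (parents or "").split(","):
            if pid:
                lo, hi = span.get(pid, (start, end))
                span[pid] = (min(lo, start), max(hi, end))

    # Pass 1: CDS, UTR and codon features define the span of their transcript
    for r in features:
        if r[2] not in ("gene", "mRNA"):
            widen(attr(r, "Parent"), int(r[3]), int(r[4]))
    # Pass 2: each mRNA takes that span and contributes it to its gene
    for r in features:
        if r[2] == "mRNA":
            if attr(r, "ID") in span:
                r[3], r[4] = map(str, span[attr(r, "ID")])
            widen(attr(r, "Parent"), int(r[3]), int(r[4]))
    # Pass 3: each gene takes the span of all of its mRNAs
    for r in features:
        if r[2] == "gene" and attr(r, "ID") in span:
            r[3], r[4] = map(str, span[attr(r, "ID")])
    return ["\t".join(r) for r in records]
```

To apply it, read the lines of `GUSHR.gff3` with `open(...).readlines()`, pass them to `recompute_parent_intervals`, and write the returned lines to a new file joined by newlines. Because the intervals are collected per ID, genes with several transcripts get the union of all of them, and the rows do not need to be sorted.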
