KeyError: 'replace' #127

Open
meiyang12 opened this issue Nov 3, 2022 · 6 comments
@meiyang12
meiyang12 commented Nov 3, 2022

I ran the gff3_merge program and got this error:

Traceback (most recent call last):
  File "/home/meiyang/anaconda3/bin/gff3_merge", line 33, in <module>
    sys.exit(load_entry_point('gff3tool==2.1.0', 'console_scripts', 'gff3_merge')())
  File "/home/meiyang/anaconda3/lib/python3.9/site-packages/gff3tool-2.1.0-py3.9.egg/gff3tool/bin/gff3_merge.py", line 229, in script_main
    main(args.gff_file1, args.gff_file2, args.fasta, report_fh, args.output_gff, args.all, args.auto_assignment, args.user_defined_file1, args.user_defined_file2, logger=logger_stderr)
  File "/home/meiyang/anaconda3/lib/python3.9/site-packages/gff3tool-2.1.0-py3.9.egg/gff3tool/bin/gff3_merge.py", line 70, in main
    gff3_merge.revision.main(gff_file=gff_file1, revision_file=autoFILE, output_gff=autoReviseGff, report_file=autoReviseReport, user_defined1=user_defined1, auto=auto, logger=logger)
  File "/home/meiyang/anaconda3/lib/python3.9/site-packages/gff3tool-2.1.0-py3.9.egg/gff3tool/lib/gff3_merge/revision.py", line 227, in main
    tag = ','.join(child['attributes']['replace']).replace(' ','')
KeyError: 'replace'

My GFF3 files look like this:
one:

##gff-version 3
##sequence-region BMSK_chr10_RagTag 64593 17730062
BMSK_chr10_RagTag       Liftoff gene    64593   65277   .       +       .       ID=gene5668;name=BMSK0005247;coverage=1.0;sequence_ID=1.0;valid_ORFs=1;extra_copy_number=0;copy_num_ID=BMSK0005247_0
BMSK_chr10_RagTag       Liftoff mRNA    64593   65277   .       +       .       ID=mRNA5668;Parent=gene5668;name=BMSK0005247.1;matches_ref_protein=True;valid_ORF=True;extra_copy_number=0
BMSK_chr10_RagTag       Liftoff exon    64593   64628   .       +       .       Parent=mRNA5668;extra_copy_number=0
BMSK_chr10_RagTag       Liftoff five_prime_UTR  64593   64628   .       +       .       Parent=mRNA5668;extra_copy_number=0
BMSK_chr10_RagTag       Liftoff five_prime_UTR  64752   64818   .       +       .       Parent=mRNA5668;extra_copy_number=0
BMSK_chr10_RagTag       Liftoff exon    64752   65277   .       +       .       Parent=mRNA5668;extra_copy_number=0
BMSK_chr10_RagTag       Liftoff CDS     64819   65028   .       +       0       Parent=mRNA5668;extra_copy_number=0
BMSK_chr10_RagTag       Liftoff three_prime_UTR 65029   65277   .       +       .       Parent=mRNA5668;extra_copy_number=0

two:

##gff-version 3
##sequence-region BMSK_chr10_RagTag 60014 17730543
BMSK_chr10_RagTag       Liftoff gene    60014   62435   .       -       .       ID=gene6047;Name=KWMTBOMO05391;coverage=0.997;sequence_ID=0.996;valid_ORFs=0;extra_copy_number=0;copy_num_ID=KWMTBOMO05391_0
BMSK_chr10_RagTag       Liftoff mRNA    60014   62435   .       -       .       ID=mRNA6047;Name=KWMTBOMO05391;Parent=gene6047;matches_ref_protein=False;valid_ORF=False;missing_stop_codon=True;extra_copy_number=0
BMSK_chr10_RagTag       Liftoff transcription_end_site  60014   60014   .       -       .       Parent=mRNA6047;extra_copy_number=0
BMSK_chr10_RagTag       Liftoff exon    60014   60869   .       -       .       Parent=mRNA6047;extra_copy_number=0
BMSK_chr10_RagTag       Liftoff stop_codon      60169   60171   .       -       .       Parent=mRNA6047;extra_copy_number=0
BMSK_chr10_RagTag       Liftoff CDS     60169   60869   .       -       1       Parent=mRNA6047;extra_copy_number=0
BMSK_chr10_RagTag       Liftoff terminal        60169   60869   .       -       .       Parent=mRNA6047;extra_copy_number=0
BMSK_chr10_RagTag       Liftoff CDS     62184   62227   .       -       0       Parent=mRNA6047;extra_copy_number=0
BMSK_chr10_RagTag       Liftoff initial 62184   62227   .       -       .       Parent=mRNA6047;extra_copy_number=0
BMSK_chr10_RagTag       Liftoff exon    62184   62338   .       -       .       Parent=mRNA6047;extra_copy_number=0
BMSK_chr10_RagTag       Liftoff start_codon     62225   62227   .       -       .       Parent=mRNA6047;extra_copy_number=0
BMSK_chr10_RagTag       Liftoff exon    62427   62435   .       -       .       Parent=mRNA6047;extra_copy_number=0
BMSK_chr10_RagTag       Liftoff transcription_start_site        62435   62435   .       -       .       Parent=mRNA6047;extra_copy_number=0

Can anyone help?

@mpoelchau
Contributor

Sorry about the problems, @meiyang12! I can't tell from the gff3 snippets what might be wrong. If your gff3 and genome fasta files aren't too big and you're willing to share them, I can run the program and try to help debug. Let me know and we can figure out a way to transfer them.

If that doesn't work, you might try running the ID generator script - it might be that gff3_merge requires that all features have IDs:

python lib/gff3_ID_generator.py -g in.gff3 -og out.gff -uuid -r report.txt
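
Before generating IDs, it may help to check whether any features actually lack one. The following is a minimal sketch, not part of gff3toolkit: the function names are made up for illustration, and it assumes the real files are tab-delimited with nine columns (the snippets above lost their tabs when pasted).

```python
from collections import Counter

def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict of tag -> value."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            tag, value = field.split("=", 1)
            attrs[tag.strip()] = value.strip()
    return attrs

def count_features_without_id(lines):
    """Count feature types (column 3) whose attributes have no exact 'ID' tag."""
    missing = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        if "ID" not in parse_attributes(cols[8]):
            missing[cols[2]] += 1
    return missing

# Illustrative rows modelled on the Liftoff output shown in this issue.
sample = [
    "##gff-version 3\n",
    "chr\tLiftoff\tgene\t64593\t65277\t.\t+\t.\tID=gene5668;name=BMSK0005247;sequence_ID=1.0\n",
    "chr\tLiftoff\texon\t64593\t64628\t.\t+\t.\tParent=mRNA5668;extra_copy_number=0\n",
    "chr\tLiftoff\tCDS\t64819\t65028\t.\t+\t0\tParent=mRNA5668;extra_copy_number=0\n",
]
print(count_features_without_id(sample))
```

On a real file, pass an open file handle instead of `sample`. Tags such as `sequence_ID` are not counted as IDs here because the lookup is on the exact tag name.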

@meiyang12
Author

Hi, thanks for your suggestions, and sorry for the late response. I tried the methods you suggested, but they produced errors. So I have uploaded the three files to Google Drive (https://drive.google.com/drive/folders/1YlQfbJuNmCWl3FHG3OavLusGKMuJDENt?usp=share_link).

@mpoelchau
Contributor

Thanks for sharing the files. It looks like there's a bug that misinterprets the attributes sequence_ID and copy_num_ID as the ID attribute. Removing those attributes allowed the merge program to complete. We don't have a fix yet, but if you need to run the program now, I'd advise removing or renaming those attributes to exclude the ID portion of the tag.
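
The renaming workaround could be scripted roughly as below. This is a sketch under assumptions, not a gff3toolkit feature: the replacement tag names `seq_match_fraction` and `copy_num_label` are arbitrary placeholders chosen only because they contain no "ID", and the input is assumed to be tab-delimited GFF3.

```python
# Hypothetical replacement names; any tag without "ID" in it should do.
RENAMES = {
    "sequence_ID": "seq_match_fraction",
    "copy_num_ID": "copy_num_label",
}

def rename_attribute_tags(line, renames=RENAMES):
    """Rename selected attribute tags in column 9 of one GFF3 line."""
    if line.startswith("#") or not line.strip():
        return line  # leave directives, comments and blank lines alone
    newline = "\n" if line.endswith("\n") else ""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return line  # not a feature line
    fields = []
    for field in cols[8].split(";"):
        tag, sep, value = field.partition("=")
        # Only an exact tag match is renamed; values are never touched.
        fields.append(renames.get(tag, tag) + sep + value)
    cols[8] = ";".join(fields)
    return "\t".join(cols) + newline

example = "chr\tLiftoff\tgene\t1\t10\t.\t+\t.\tID=gene1;sequence_ID=1.0;copy_num_ID=X_0\n"
print(rename_attribute_tags(example), end="")
```

To convert a whole file, apply `rename_attribute_tags` to each line of the input and write the results to a new file, keeping the original untouched.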

mpoelchau added the bug label Dec 14, 2022

@Ocean-Lyu

Hi, thanks for gff3toolkit!

I also ran gff3_merge and encountered the same error. Renaming sequence_ID and copy_num_ID did solve the problem. However, I hit this error again when I tried to auto-assign replace tags for types beyond mRNA. My user-defined file is as follows:

mRNA exon
lnc_RNA exon
transcript exon
snoRNA exon
rRNA exon
tRNA exon
snRNA exon
V_gene_segment exon
C_gene_segment exon

The error emerged again after I added rRNA, tRNA, snRNA, V_gene_segment and C_gene_segment. Could you possibly offer some suggestions?

Some example GFF entries are as follows:
image

@MonicaPoelchau-USDA

Sorry you're having trouble with the gff3_merge program! Is there a way you could share the 2 gff files and your command with me? I can try to debug with that information.

@Ocean-Lyu

I circumvented the problem by renaming 'V_gene_segment' and 'C_gene_segment' in the third column to 'mRNA' using:

sed -E -i 's/V_gene_segment/mRNA/' my.gff

and renamed them back after running gff3_merge using:

sed -E -i '/^\S+\t\S+\tmRNA.*V_segment/ s/mRNA/V_gene_segment/' my.gff

which I think worked fine, and no information seems to be missing.
I do not know whether this approach is correct. Can I carry on with the current GFF?
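
The same rename-and-restore idea can be sketched in Python so that only column 3 is edited and the original type is recorded explicitly rather than inferred from other text on the line. This is an illustration under assumptions, not a gff3toolkit feature: `orig_type` is a made-up attribute tag, and whether gff3_merge carries an unknown attribute through to its output unchanged has not been verified here.

```python
TEMP_TAG = "orig_type"  # hypothetical attribute used to remember the real type
SWAP_TYPES = {"V_gene_segment", "C_gene_segment"}

def _feature_columns(line):
    """Return the nine columns of a feature line, or None for anything else."""
    if line.startswith("#"):
        return None
    cols = line.rstrip("\n").split("\t")
    return cols if len(cols) == 9 else None

def mask_types(line, placeholder="mRNA"):
    """Before merging: swap selected column-3 types for a placeholder."""
    cols = _feature_columns(line)
    if cols is None or cols[2] not in SWAP_TYPES:
        return line
    note = TEMP_TAG + "=" + cols[2]
    # An empty attribute column is written as "." in GFF3.
    cols[8] = note if cols[8] in ("", ".") else cols[8].rstrip(";") + ";" + note
    cols[2] = placeholder
    return "\t".join(cols) + "\n"

def unmask_types(line):
    """After merging: restore the recorded type and drop the temporary tag."""
    cols = _feature_columns(line)
    if cols is None or TEMP_TAG + "=" not in cols[8]:
        return line
    kept = []
    for field in cols[8].split(";"):
        tag, _, value = field.partition("=")
        if tag == TEMP_TAG:
            cols[2] = value
        else:
            kept.append(field)
    cols[8] = ";".join(kept) if kept else "."
    return "\t".join(cols) + "\n"

# Illustrative row; the attribute values are invented for the example.
row = "chr\tsrc\tV_gene_segment\t1\t100\t.\t+\t.\tID=id1;Parent=gene1\n"
masked = mask_types(row)
print(masked, end="")
print(unmask_types(masked), end="")
```

Child features such as exons are left as they are, since only the parent-level type appears in the user-defined file.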
