too many LTR/unknown and most of LTR/unknown are classified as LTR/Copia #51
Comments
Hello @zhangrengang, I think this is a very good point, and I agree that the classification of Copia and Gypsy in LTR_retriever is not the best scheme. I have been using Copia- and Gypsy-specific HMMs from rice to assign new LTR elements to these superfamilies. A better way would be to use GyDB to assign superfamilies, as you suggested. Another way I have been thinking of, but have not yet had the time to implement, is to classify by the order of these conserved domains, which is the fundamental difference between Gypsy and Copia. If you can implement a better scheme, you are welcome to contribute! For benchmarking accuracy, I use the curated rice TE library. Best,
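The domain-order idea mentioned above could look roughly like the sketch below. It is not code from LTR_retriever. It assumes domain hits have already been found (for example with HMM searches) and reduced to hypothetical `(name, start)` pairs in coding-strand coordinates, and it relies only on the textbook difference that integrase (INT) precedes reverse transcriptase (RT) in Copia and follows it in Gypsy.

```python
from typing import List, Tuple

# Hypothetical input: (domain label, start position on the element),
# with positions already oriented along the coding strand.
DomainHit = Tuple[str, int]


def classify_by_domain_order(hits: List[DomainHit]) -> str:
    """Guess the superfamily from the relative order of INT and RT."""
    # Sort domain labels by their position along the element.
    ordered = [name for name, _ in sorted(hits, key=lambda h: h[1])]
    if "INT" not in ordered or "RT" not in ordered:
        # Without both domains the order is not informative.
        return "unknown"
    # Copia: ...-INT-RT-...   Gypsy: ...-RT-...-INT
    if ordered.index("INT") < ordered.index("RT"):
        return "Copia"
    return "Gypsy"


copia_like = [("GAG", 100), ("INT", 2000), ("RT", 3000)]
gypsy_like = [("GAG", 100), ("RT", 2000), ("INT", 4000)]
print(classify_by_domain_order(copia_like))
print(classify_by_domain_order(gypsy_like))
```

Elements on the reverse strand would need their coordinates flipped first, and elements missing either domain stay unknown under this rule.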
Hello Dr. Ou, here is a simple implementation. You may test it and/or integrate it.
Hello @zhangrengang, thank you so much for developing this code in such a short time. I will test it soon and let you know. Best,
Thousands of LTRs in a plant genome are classified as unknown by LTR_retriever. However, most of them are classified as Copia on the basis of GyDB, as follows:
I think there is an issue in annotate_TE.pl: Copia has the same weight (0.3) as Gypsy, but Copia has only 8 PFAMs, ~1/3 of the 28 PFAMs of Gypsy.
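To make the imbalance concrete, here is a small sketch. It does not reproduce the logic of annotate_TE.pl, whose use of the 0.3 weight is not shown in this thread. It only assumes, hypothetically, that support for a superfamily is counted as the number of its profiles that hit an element, and shows how dividing by the profile counts quoted above (8 and 28) changes the comparison.

```python
# Profile counts quoted in this issue; everything else is hypothetical.
PROFILE_COUNTS = {"Copia": 8, "Gypsy": 28}


def normalized_support(matched: dict) -> dict:
    """Fraction of each superfamily's profiles that matched an element."""
    return {sf: matched.get(sf, 0) / n for sf, n in PROFILE_COUNTS.items()}


# Hypothetical element: 2 Copia profiles and 3 Gypsy profiles matched.
raw = {"Copia": 2, "Gypsy": 3}
support = normalized_support(raw)

# Raw counts favor Gypsy (3 > 2); normalized fractions favor Copia
# (2/8 versus 3/28).
print(max(raw, key=raw.get))
print(max(support, key=support.get))
print(PROFILE_COUNTS["Copia"] / PROFILE_COUNTS["Gypsy"])  # ratio of profile counts
```

Under this assumption, a fixed cutoff applied to raw counts would be harder for Copia to reach than for Gypsy, which is consistent with many Copia elements ending up as unknown.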