ID Conversion #175

Open
rmhubley opened this issue Jul 16, 2024 · 2 comments

Comments

@rmhubley

rmhubley commented Jul 16, 2024

Feature Request: While I understand the necessity for ID conversion, the way it's implemented breaks downstream processing of the results. There are two ways to solve this that I can think of: provide a translation from the old identifiers to the new ones in an output file, or convert the identifiers back before writing the final files.

@oushujun
Owner

Hi Robert,

Sorry for the delayed response. Which ID conversion were you referring to? Can you please provide a little bit more information on the original format, the new format, and the expected format?

Shujun

@oushujun
Owner

Hi Robert,

I am revisiting this issue. The ID conversion code in LTR_retriever was implemented to help users who don't know how to format their sequence IDs before whole-genome annotation. It's always recommended to format sequence IDs before analysis. The current ID length allocation is limited to 13 characters because the downstream rmblast restricts ID length to 50 characters (if I remember correctly). The remaining length is reserved for coordinate information in intermediate files. Please refer to the discussions in #16 and #93 for more details. Your proposed translation system may work, but it would still require users to tidy up sequence IDs before any analysis.

Thanks!
Shujun
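
For readers following the thread, the translation idea discussed above could be sketched roughly as follows. This is not LTR_retriever's actual code: the function names `shorten_ids` and `restore_ids`, the `seq` prefix, and the token-matching rule are all assumptions made for illustration, and the 13-character limit is simply taken from the comment above.

```python
import re

MAX_ID_LEN = 13  # limit quoted in the thread; an assumption, not read from LTR_retriever


def shorten_ids(fasta_lines, max_len=MAX_ID_LEN, prefix="seq"):
    """Rename FASTA records to short IDs; return (new_lines, old_to_new)."""
    old_to_new = {}
    new_lines = []
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split()
            old_id = fields[0] if fields else ""  # ID is the first token of the header
            if old_id in old_to_new:
                raise ValueError(f"duplicate sequence ID: {old_id!r}")
            new_id = f"{prefix}{len(old_to_new) + 1}"
            if len(new_id) > max_len:
                raise ValueError(f"generated ID {new_id!r} exceeds {max_len} characters")
            old_to_new[old_id] = new_id
            new_lines.append(">" + new_id)
        else:
            new_lines.append(line)
    return new_lines, old_to_new


def restore_ids(text, old_to_new):
    """Swap short IDs back to the originals, matching whole tokens only."""
    if not old_to_new:
        return text
    new_to_old = {new: old for old, new in old_to_new.items()}
    # Longest IDs first, and no letter/digit/underscore on either side,
    # so that "seq1" is never rewritten inside "seq10".
    alternatives = "|".join(
        re.escape(new_id) for new_id in sorted(new_to_old, key=len, reverse=True)
    )
    pattern = re.compile(r"(?<![A-Za-z0-9_])(" + alternatives + r")(?![A-Za-z0-9_])")
    return pattern.sub(lambda m: new_to_old[m.group(1)], text)


if __name__ == "__main__":
    records = [">chr1_very_long_scaffold_name some description", "ACGT", ">chr2", "GGCC"]
    renamed, mapping = shorten_ids(records)
    # A translation table like this could be written next to the results.
    for old_id, new_id in mapping.items():
        print(f"{old_id}\t{new_id}")
    print(restore_ids("seq1:10..20\tseq2:5..9", mapping))
```

Keeping the old-to-new mapping as a tab-separated table covers the first option raised in the issue, while `restore_ids` corresponds to the second. Either way, the sketch assumes the original IDs are unique in their first whitespace-delimited token, which is the kind of tidying the reply above still asks users to do.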
