Reference chromosome(s) with no reads mapped to them are not in dictionary keys #1

Open
dehui333 opened this issue Oct 26, 2023 · 2 comments
dehui333 commented Oct 26, 2023

Hi,

Thanks for making this tool, it is useful for my work. I have a question about something:

When I ran hifieval with the -h option, I got the error message "Chromosomes don't match for Homopolymer Evaluation.". After reading the code and doing some testing, I think this happens because reference sequences that no read maps to (and that therefore do not appear in the PAF) are missing from correction_dict.keys() but present in hp_dict.keys(). Is this intended?

if correction_dict.keys() != hp_dict.keys():
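
To illustrate the mismatch described above, here is a minimal sketch, not hifieval's actual code. The helper `split_chromosomes` and the toy dictionary contents are hypothetical; only the names `correction_dict` and `hp_dict` come from the issue. It assumes both objects are plain dicts keyed by reference sequence name, and shows how a strict key-equality check trips as soon as one reference sequence has no alignments.

```python
def split_chromosomes(correction_dict, hp_dict):
    """Split reference names into those with alignments and those without.

    Hypothetical helper: assumes both arguments are dicts keyed by
    reference sequence name, as the issue text suggests.
    """
    # dict.keys() views are set-like, so & and - work directly on them.
    shared = sorted(correction_dict.keys() & hp_dict.keys())
    unmapped = sorted(hp_dict.keys() - correction_dict.keys())
    return shared, unmapped


# Toy stand-ins; the real value types in hifieval are not known here.
hp_dict = {"chr1": [(10, 4)], "chr2": [(7, 3)]}   # built from the reference
correction_dict = {"chr1": {"errors": 1}}          # built from the PAF

strict_check_fails = correction_dict.keys() != hp_dict.keys()
shared, unmapped = split_chromosomes(correction_dict, hp_dict)
print(strict_check_fails)   # True: "chr2" has no mapped reads
print(shared, unmapped)     # ['chr1'] ['chr2']
```

One possible design choice would be to evaluate only the shared names and report the unmapped ones as a warning, rather than aborting. Whether that matches the intended behaviour is for the maintainer to decide.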

dehui333 changed the title from "Reference chromosome(s) with no reads mapped to them are not in dictionary index" to "Reference chromosome(s) with no reads mapped to them are not in dictionary keys" on Oct 26, 2023
Owner

magspho commented Nov 16, 2023

Hello @dehui333, sorry for the late reply and thank you for raising this issue! Did you use hifiasm for error correction? (That is the only tool I tested when I ran a homopolymer evaluation.) I will look into this issue as soon as I can and get back to you.

Author

dehui333 commented Nov 20, 2023

Hi, no worries. I did not use hifiasm for error correction. In fact, the issue can be reproduced by generating a random sequence to use as the "reference", then generating perfect reads from it to use as the "corrected reads" and modifying these reads somewhat (even if just 1bp) to use as the "reads before correction". If I add an extra sequence to the "reference" which does not appear in the pafs as a target sequence, the above-mentioned issue occurs.

However, if I remove that extra sequence, a different error occurs:
ValueError: max() arg is an empty sequence
The traceback points to:

max_len = max(hp_len_range)
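
The second error can be reproduced in isolation: Python's built-in `max()` raises `ValueError` when given an empty iterable. The sketch below is not a patch for hifieval. The function name `longest_homopolymer` is hypothetical, and it assumes `hp_len_range` is an iterable of integer lengths that can be empty when no homopolymer is found. It uses the `default` argument of `max()` as one possible guard.

```python
def longest_homopolymer(hp_len_range):
    """Return the largest length, or 0 when there are none.

    Hypothetical helper: assumes hp_len_range is an iterable of ints
    that may be empty.
    """
    # max() accepts a keyword-only `default`, returned for an empty iterable.
    return max(hp_len_range, default=0)


# Reproduce the failure mode from the traceback.
try:
    max([])
    raised = False
except ValueError:
    raised = True

print(raised)                          # True
print(longest_homopolymer([]))         # 0
print(longest_homopolymer([2, 5, 3]))  # 5
```

Returning 0 is only one option. Skipping the homopolymer evaluation with a clear message when the range is empty may be more informative for the user.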

magspho self-assigned this on Mar 14, 2024