Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use generator instead of list in create_file_intersections() to save memory #126

Merged
merged 1 commit into from
Jun 10, 2022

Conversation

pwwang
Copy link
Contributor

@pwwang pwwang commented Jun 10, 2022

I have 70+ vcfs. While running consistency, it complains about memory usage with the list used in create_file_intersections(). Using a generator solves it.

Use generator instead of list in create_file_intersections() to save memory
@ACEnglish ACEnglish merged commit abf677f into ACEnglish:develop Jun 10, 2022
@ACEnglish
Copy link
Owner

Thank you for the change. It all looks good.

The consistency tool could definitely use some further improvements. I've been meaning to refactor it for some time. If you find further opportunities to fix it, please feel free to create more pull requests.

@pwwang
Copy link
Contributor Author

pwwang commented Jun 10, 2022

Will give it a try. Current implementation still requires tons of memory with large # vcfs.

@ACEnglish
Copy link
Owner

So, thinking about this, I feel like consistency tool could easily be sped up if we change one assumption. Currently, it has no requirements for input VCFs being sorted. If we require sorted/indexed VCFs, it would be crazy simple to reuse truvari.bench.file_zipper to simply 'pop' entries from multiple VCFs in-order and do the comparisons.

Given that your analysis of 70+ VCFs is likely the most use consistency has gotten, I'm curious about your opinion of requiring sorted/indexed vcfs. If you would be more than okay with that, I think I could prototype a much faster/lighter consistency over the weekend.

@pwwang
Copy link
Contributor Author

pwwang commented Jun 10, 2022

I don't think this requires sorting/index of VCF files. See my implementation at #127

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants