Address Pandas Dataframe Merge and Other Performance Issues #199
Comments
Hi @gwaygenomics, just wanted to provide a quick update on this issue based on some findings yesterday. I ran some tests using some of the work from #198 to see just how much we could reduce the memory consumption. In addition to the dataset being modified to include
I'm still working on the merges themselves, as they still seem to be a large bottleneck for memory consumption. While exploring performance with another resource profiling tool, Scalene, I found that it flags the Pandas merges as a possible source of memory leakage (based on assumptions built into that library), but only when large datasets are used. I've stored some early results from this tool here. Please note that Scalene and Memray measure performance differently, so they do not show exactly the same results.
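For anyone who wants a quick sanity check without installing Memray or Scalene, here is a minimal sketch that measures the peak memory of a single merge using Python's built-in tracemalloc module. This is not the profiling setup used for the results above; the helper name peak_memory_of_merge, the column names, and the synthetic data are all hypothetical and only serve as an illustration. tracemalloc also counts differently from both tools, so the numbers are only comparable with each other.

```python
import tracemalloc

import numpy as np
import pandas as pd


def peak_memory_of_merge(left, right, on):
    """Return (merged frame, peak bytes traced while the merge ran).

    Hypothetical helper for illustration only.
    """
    tracemalloc.start()
    try:
        merged = left.merge(right, on=on, how="inner")
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return merged, peak


# Synthetic stand-in data; the real datasets are much larger.
rng = np.random.default_rng(0)
n_rows = 10_000
left = pd.DataFrame({"key": np.arange(n_rows), "a": rng.random(n_rows)})
right = pd.DataFrame({"key": np.arange(n_rows), "b": rng.random(n_rows)})

merged, peak_bytes = peak_memory_of_merge(left, right, on="key")
print(f"rows={len(merged)} peak={peak_bytes / 1e6:.2f} MB")
```

Running this at a few different values of n_rows gives a rough picture of how the merge's peak allocation grows with input size.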
Pandas DataFrame merge performance (and likely that of other operations) may hinder or completely stall progress during various runtime operations. This issue is dedicated to addressing Pandas merge and other performance challenges, including solutions that may not involve Pandas or that migrate away from it.
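As one possible Pandas-only direction, a merge can be performed in row chunks so that only one intermediate result is built at a time. The sketch below is an assumption for discussion, not something already implemented in this project: the function name merge_in_chunks, the chunk size, and the toy frames are hypothetical. It is only equivalent to a single merge for inner (and left) joins, since each chunk of the left frame is matched against the whole right frame.

```python
import numpy as np
import pandas as pd


def merge_in_chunks(left, right, on, chunk_size):
    """Yield inner merges of `right` with successive row slices of `left`.

    Hypothetical helper for illustration only.
    """
    for start in range(0, len(left), chunk_size):
        left_chunk = left.iloc[start:start + chunk_size]
        yield left_chunk.merge(right, on=on, how="inner")


# Toy frames standing in for the real data.
left = pd.DataFrame({"key": np.arange(10), "a": np.arange(10) * 2})
right = pd.DataFrame({"key": np.arange(10), "b": np.arange(10) * 3})

chunks = list(merge_in_chunks(left, right, on="key", chunk_size=4))

# Concatenating here is only to show equivalence with a single merge;
# to actually bound memory, each chunk would be written out as it is produced.
combined = pd.concat(chunks, ignore_index=True)
```

Consuming the generator lazily (for example, appending each chunk to a file on disk) is what keeps peak memory down; collecting everything back into one frame, as done above for demonstration, does not.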
Issues which may be related or tied to this:
.merge_single_cells() method #195