Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hysplit concentration read speedup #177

Open
wants to merge 6 commits into
base: develop
Choose a base branch
from

Conversation

TAdeJong
Copy link

@TAdeJong TAdeJong commented Jul 2, 2024

For my use case of reading relatively large hysplit concentration grids, I found readfile to be much slower than the hysplit simulation itself.
Looking at the code, 93% of the time was spend on iterative calls to xr.merge.
I had to debug a bit, because it seems that, at least for my output of hysplit v5.2.2, the species and levels were flipped in order.
Building a list of lists and calling xr.merge and xr.concat yielded a significant speedup of roughly 15x, very worthwhile for me.

Tests are passing, but tests are not actually testing this part of the code.

BTW: I think further speed up would be possible by lifting the conversion to a pandas dataframe out of the innermost function.

Tobias de Jong added 2 commits July 2, 2024 13:28
@TAdeJong
Copy link
Author

TAdeJong commented Jul 2, 2024

OK, this is more complicated than I thought; This version breaks does not work for some output I have.
WIP.

@TAdeJong
Copy link
Author

TAdeJong commented Jul 2, 2024

This now handles empty columns, however, there might be more cases to consider that I do not know of.

@zmoon
Copy link
Member

zmoon commented Jul 8, 2024

Thanks @TAdeJong, this sounds beneficial. @amcz do you have any initial thoughts?

@TAdeJong
Copy link
Author

I ran into another edge case and fixed that. By peeling out some of the logic out of the inner loop I got another factor of ~2 speedup.
Looking at a line profile, a lot of time is still spend in xarray merging and pandas indexing. I suspect another order of magnitude could be won by pre-allocating xr.DataArray's and indexing the underlying arrays directly while reading records, but that would require a more major rewrite.

@amcz
Copy link
Collaborator

amcz commented Jul 29, 2024

Thanks! the reader could use some improvements and I appreciate this work on it. I will have time to review it and pull into hysplit development branch in sometime in beginning or mid August.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants