Hysplit concentration read speedup #177

TAdeJong · 2024-07-02T11:48:49Z

For my use case of reading relatively large hysplit concentration grids, I found readfile to be much slower than the hysplit simulation itself.
Looking at the code, 93% of the time was spent on iterative calls to xr.merge.
I had to debug a bit, because it seems that, at least for my output of hysplit v5.2.2, the species and levels were flipped in order.
Building a list of lists and calling xr.merge and xr.concat yielded a significant speedup of roughly 15x, very worthwhile for me.
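
To illustrate the difference described above, here is a minimal sketch on synthetic data. It is not the code from this PR: the `make_record` helper, the species names, and the grid sizes are all hypothetical, and the real reader builds its records from the HYSPLIT binary file instead.

```python
import numpy as np
import xarray as xr


def make_record(time, level, species, ny=3, nx=4):
    """Build one hypothetical (y, x) concentration slab as a Dataset."""
    data = np.full((1, 1, ny, nx), float(time * 100 + level))
    da = xr.DataArray(
        data,
        dims=("time", "z", "y", "x"),
        coords={"time": [time], "z": [level], "y": np.arange(ny), "x": np.arange(nx)},
        name=species,
    )
    return da.to_dataset()


times, levels, species = [0, 1, 2], [10, 50], ["SP01", "SP02"]

# Slow pattern: merge every record into one growing Dataset.
# Each call re-aligns and copies everything accumulated so far.
slow = xr.Dataset()
for t in times:
    for z in levels:
        for sp in species:
            slow = xr.merge(
                [slow, make_record(t, z, sp)], join="outer", compat="no_conflicts"
            )

# Faster pattern: collect records in nested lists and combine each group once.
per_time = []
for t in times:
    per_level = []
    for z in levels:
        # Merge the species for one (time, level); the coordinates are identical.
        per_level.append(
            xr.merge(
                [make_record(t, z, sp) for sp in species],
                join="outer",
                compat="no_conflicts",
            )
        )
    per_time.append(xr.concat(per_level, dim="z"))
fast = xr.concat(per_time, dim="time")

# Both routes describe the same data.
assert slow.equals(fast)
```

The two `xr.concat` levels could likely also be expressed as a single `xr.combine_nested` call with `concat_dim=["time", "z"]` on a list of lists of the merged datasets.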

The tests pass, but they do not actually cover this part of the code.

BTW: I think a further speedup would be possible by lifting the conversion to a pandas dataframe out of the innermost function.

Tested against output of hysplit.v5.2.2

TAdeJong · 2024-07-02T12:48:53Z

OK, this is more complicated than I thought: this version does not work for some of the output I have.
WIP.

TAdeJong · 2024-07-02T13:57:24Z

This now handles empty columns; however, there might be more cases to consider that I do not know of.

zmoon · 2024-07-08T17:56:49Z

Thanks @TAdeJong, this sounds beneficial. @amcz do you have any initial thoughts?

TAdeJong · 2024-07-17T12:35:41Z

I ran into another edge case and fixed that. By moving some of the logic out of the inner loop I got another factor of ~2 speedup.
Looking at a line profile, a lot of time is still spent in xarray merging and pandas indexing. I suspect another order of magnitude could be won by pre-allocating xr.DataArray objects and indexing the underlying arrays directly while reading records, but that would require a more major rewrite.
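
A minimal sketch of that pre-allocation idea, assuming the grid dimensions are known before any records are read. The record layout, the index tuples, and the zero fill for cells without data are hypothetical choices for illustration, not what the reader currently does.

```python
import numpy as np
import xarray as xr

# Hypothetical grid definition; a real reader would take this from the file header.
times = [0, 1]
levels = [10, 50]
ny, nx = 3, 4

# Hypothetical sparse records:
# (time index, level index, y index, x index, concentration)
records = [
    (0, 0, 1, 2, 5.0),
    (0, 1, 0, 0, 1.5),
    (1, 0, 2, 3, 7.25),
]

# Allocate the full array once; cells without a record stay at zero (an assumption).
buf = np.zeros((len(times), len(levels), ny, nx), dtype=np.float64)

# Write each record straight into the buffer: no merging, no pandas indexing.
for it, iz, iy, ix, conc in records:
    buf[it, iz, iy, ix] = conc

# Wrap the filled buffer in a labelled DataArray once, at the end.
conc_da = xr.DataArray(
    buf,
    dims=("time", "z", "y", "x"),
    coords={"time": times, "z": levels, "y": np.arange(ny), "x": np.arange(nx)},
    name="SP01",
)
```

With this layout, the per-record cost is a single array assignment, and xarray is only involved once per variable.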

amcz · 2024-07-29T21:34:03Z

Thanks! The reader could use some improvements, and I appreciate this work on it. I will have time to review it and pull it into the hysplit development branch sometime in early or mid August.

Tobias de Jong added 2 commits July 2, 2024 13:28

Use combine_nested. Not tested.

7fd89e8

Update to work with multiple tracers. Speed is 15x.

f21545e

Tested against output of hysplit.v5.2.2

Properly handle empty columns

8975252

Tobias de Jong added 2 commits July 17, 2024 12:50

Handle the case of no particles of a tracer in some timesteps

f10dbe7

Organise and further speedup

7fd9f08

Handle tracers fully outside the grid

25d725c
