-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance of volume_statistics #1498
Comments
If this is really just this simple change, I think we can include it.
|
Not quite, I have one test failing. Anyway I don't want to stress over this. But it's something that should be considered looking into for 2.6 as I don't think one year of monthly data should get stuck as it does right now. |
no tests fail in CI, Saskia, make sure you have all up to date. Yes, I aim at daskifying as much as I have time for in the very near future, judging how things stand now in terms of time, it might just be one function 😆 |
maybe you and me should form the A-Dask-Team 😁 |
For future references, this seems to run without breaking the tests:
The indices should be written in a better way and scientific results should be double checked, but at least it's a start. |
Just a quick comment that while it's only in a couple public recipes in esmvaltool, it is used a lot in private recipes. However, if it can be made lazy and it doesn't break the weighting using fx files (which I don't think is covered in the unit tests), then go for it! |
@ledm cheers! By private recipes you mean AR6 stuff? 🍺 |
I use this all the time. It's in so many recipes in ERSMValTool_private, ESMValTool_AR6, and on some other unmerged branches of ESMValTool as well. |
I worked a bit on a new implementation, testing it with
In terms of performance, the current implementation takes:
While the new one takes:
So even though the memory consumption is slightly larger, the reduction in the execution time is quite an improvement. |
Impressive speedup, @sloosvel! Is there a PR to have a peek? |
It's in branch |
If I remember correctly, the reason we didn't do this before was that dask average didn't accept weights before. I'm guessing that it does now, but it may be worth appending the test to ensure that the weights are treated correctly. |
The new approach uses iris to do the averages, as it will be easier to add new operators (since it seems it's something that is pending to be done). It just gets rid of the loops, that really slow down the code when the number of levels or the number of timesteps increases. But if you have an example for the test it can be added just to be sure. |
Describe the bug
@ESMValGroup/esmvaltool-coreteam do I have time to quickly fix volume statistics for 2.5? The current implementation performs quite poorly and I have been running the function in one year of data with 75 levels for half an hour and it's not done yet. I think that any approach such as da.average(axis=[all axis except time], weights) would be better than the double loop that is implemented right now.
Please attach
run
directory in the output directorymain_log_debug.txt
file, this can also be found in therun
directory in the output directoryThe text was updated successfully, but these errors were encountered: