map file generation is slow and fails for big problems #5
Comments
I am curious to know what kind of error it was (MemoryError, etc.), or was it just too slow?
Pretty sure it was a memory error, but I don't recall the specific message. I had to use several nodes to get over the memory hurdle with MPI.
Per the xESMF documentation: https://xesmf.readthedocs.io/en/latest/limitations.html
I just found out about it.
We are currently using xESMF, but don't have to. ESMPy does support MPI, though it's not clear how to integrate it with dask.
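For reference, a minimal sketch of what a standalone ESMPy weight-generation script could look like, so that the heavy part can be launched under MPI (e.g. mpirun -np 36 python gen_weights.py). This assumes ESMPy's file-based Grid constructor and the filename option of ESMF.Regrid; the SCRIP grid files and the output name are placeholders:

```python
# Hedged sketch: generate a weight file with ESMPy so the computation can be
# distributed by running this script under mpirun. File names are placeholders.
import ESMF

ESMF.Manager(debug=False)  # initializes ESMF (and MPI, when available)

# Grids described by SCRIP-format files (placeholders)
src_grid = ESMF.Grid(filename="src_grid_SCRIP.nc", filetype=ESMF.FileFormat.SCRIP)
dst_grid = ESMF.Grid(filename="dst_grid_SCRIP.nc", filetype=ESMF.FileFormat.SCRIP)

src_field = ESMF.Field(src_grid, name="src")
dst_field = ESMF.Field(dst_grid, name="dst")

# Constructing the Regrid object computes the sparse weights and, with
# filename set, writes them to disk; under MPI each rank handles a subset.
regrid = ESMF.Regrid(
    src_field,
    dst_field,
    filename="weights.nc",
    regrid_method=ESMF.RegridMethod.BILINEAR,
    unmapped_action=ESMF.UnmappedAction.IGNORE,
)
```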
Introducing MPI and ESMPy's complicated interface :), and integrating these with Xarray and Dask, would definitely be a conundrum. I am curious: what is the highest priority for esmlab-regrid? Is it usability? Performance? Do we want users to be able to perform regridding with one line of code? If usability is not the highest priority, it would be worth looking into MPI and ESMPy functionality.
It looks like the Dask folks are looking into this kind of workflow: "Running Dask and MPI programs together" (an experiment).
@matt-long, correct me if I'm wrong: this kind of parallelism is only needed when generating the weights. Once you have the weights, you don't need the ESMPy/MPI machinery anymore. Applying the weights, which is just a matrix multiplication, could be done without this heavy machinery, using Scipy/Dask/Xarray, right?
I think our focus should remain on an end-to-end workflow and usability in the near term, but keep performance through parallelism on the radar. We could consider prototyping an MPI implementation as a standalone script, analogous to that shown here. @andersy005, you are correct. The weight files are sparse matrices and are handled well by scipy.sparse.
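To make the "weights are just sparse matrices" point concrete, here is a small sketch of applying an existing ESMF-style weight file with scipy.sparse; the file path is a placeholder, and the variable/dimension names (n_a, n_b, col, row, S) assume the usual ESMF weight-file convention:

```python
# Hedged sketch: apply precomputed regridding weights as a sparse matvec.
import numpy as np
import scipy.sparse as sps
import xarray as xr

ds_w = xr.open_dataset("weights.nc")   # placeholder path to an ESMF weight file
n_src = ds_w.sizes["n_a"]              # source grid size
n_dst = ds_w.sizes["n_b"]              # destination grid size

# col/row are 1-based source/destination indices in the ESMF convention
weights = sps.coo_matrix(
    (ds_w["S"].values, (ds_w["row"].values - 1, ds_w["col"].values - 1)),
    shape=(n_dst, n_src),
).tocsr()

src_flat = np.ones(n_src)              # stand-in for a flattened source field
dst_flat = weights @ src_flat          # regridded values; reshape to the destination grid
```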
@matt-long, was the work you were doing to generate grid files connected to this issue?
Yes |
Since you are not using xesmf and ESMF/ESMPy, and the code deals with raw NumPy, I was thinking of exploring some optimization with numba and dask. Do you see any value in this, or am I missing something before I go down a rabbit hole :)?
By "connected" I mean that that code was used in the same project. It does not compute the weight files, but rather only the grid file. It's fast enough as is, I'd say. Not a high priority for optimization. |
Good point. Does this mean that the failing component is esmlab-regrid/esmlab_regrid/core.py, lines 84 to 88 (in b8b7182)?
Yes. |
Thank you for the clarification! Speaking of high priority, is there anything on your plate I can help with? :)
Not sure if this is related to JiaweiZhuang/xESMF#29. Parallel weight generation is very hard (if possible at all) to rewrite in a non-MPI way. But after the weights are generated, applying them to data using dask is much easier. My plan is to clearly separate the "weight generation" and "weight application" phases.
Such separation will be much clearer after resolving JiaweiZhuang/xESMF#11. My plan is to have a "mini-xesmf" installation that doesn't depend on ESMPy: it will just construct a complete regridder from existing weight files, generated by an ESMPy program running elsewhere (potentially a huge MPI run, potentially with an xesmf wrapper for better usability).
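Along those lines, a rough sketch of what such a "mini" regridder could look like: it only reads an existing ESMF-style weight file and applies it lazily through xarray/dask, with no ESMPy dependency. The class name, dimension names, shapes, and file paths below are placeholders, and the weight-file layout (n_a, n_b, col, row, S) is assumed to follow the ESMF convention:

```python
# Hedged sketch of the "weight application" phase only.
import numpy as np
import scipy.sparse as sps
import xarray as xr


class WeightFileRegridder:
    """Apply precomputed ESMF weights to xarray data; no ESMPy required."""

    def __init__(self, weights_path, dst_shape):
        ds_w = xr.open_dataset(weights_path)
        self.dst_shape = dst_shape  # e.g. (ny_dst, nx_dst)
        self.n_src = ds_w.sizes["n_a"]
        n_dst = ds_w.sizes["n_b"]
        # col/row are 1-based in the ESMF weight-file convention
        self.matrix = sps.coo_matrix(
            (ds_w["S"].values, (ds_w["row"].values - 1, ds_w["col"].values - 1)),
            shape=(n_dst, self.n_src),
        ).tocsr()

    def _apply(self, field):
        # horizontal dims arrive last; collapse any leading (e.g. time) dims
        lead = field.shape[:-2]
        flat = field.reshape(-1, self.n_src)
        out = self.matrix.dot(flat.T).T
        return out.reshape(*lead, *self.dst_shape)

    def __call__(self, da, src_dims=("nlat", "nlon"), dst_dims=("y", "x")):
        ny, nx = self.dst_shape
        return xr.apply_ufunc(
            self._apply,
            da,
            input_core_dims=[list(src_dims)],
            output_core_dims=[list(dst_dims)],
            dask="parallelized",  # horizontal dims must each be a single dask chunk
            output_dtypes=[da.dtype],
            dask_gufunc_kwargs={"output_sizes": {dst_dims[0]: ny, dst_dims[1]: nx}},
        )


# Hypothetical usage, lazy over a dask-chunked time dimension:
# regridder = WeightFileRegridder("weights.nc", dst_shape=(2400, 3600))
# out = regridder(ds["SST"].chunk({"time": 12}))
```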
I recently wanted to generate weights to map ETOPO1 (1-minute data) to 0.1° POP. The esmlab.regrid function failed. I resorted to running ESMF_RegridWeightGen in MPI on 12 Cheyenne nodes.
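For the record, the workaround described above could be scripted along these lines; the rank count, grid descriptions, file names, and method are placeholders, not the actual command used:

```python
# Hedged sketch: shell out to ESMF_RegridWeightGen under MPI instead of
# calling ESMPy in-process. The -s/-d/-w/-m flags are the tool's standard
# options; everything else (paths, rank count, method) is a placeholder.
import subprocess

cmd = [
    "mpirun", "-np", "432",              # e.g. 12 nodes x 36 ranks (placeholder)
    "ESMF_RegridWeightGen",
    "-s", "etopo1_SCRIP.nc",             # source grid description (placeholder)
    "-d", "pop_tx0.1_SCRIP.nc",          # destination grid description (placeholder)
    "-w", "map_etopo1_to_pop_tx0.1.nc",  # output weight file (placeholder)
    "-m", "bilinear",                    # regrid method (placeholder)
]
subprocess.run(cmd, check=True)
```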