Invoking xesmf with mpirun #79
Comments
So you don't need any inter-process communication using mpi4py? In that case I would suggest not using
Parallel weight construction is not supported yet, but the weights can be applied in parallel via Dask. See the long discussion at #3. Does your use case actually need MPI-style parallelization? If the data can be chunked along the vertical or time dimension, Dask should be sufficient. Is there any reason you have to chunk in the horizontal?
@JiaweiZhuang thank you for the very quick reply and useful suggestions. Yes, that is correct: the only reason to use MPI-style parallelization is to launch across multiple nodes. The SLURM suggestion is a good one, and this is what I'm doing on a cluster that has it installed. However, I also need to get it running on a PBS cluster, which uses MPI for task launching. I have tried disconnecting the MPI communicator (comm.Disconnect()) after start-up, but this seems to crash ESMF with a segfault.
OK, I think you've answered this. The best approach is probably to use job arrays. An alternative might be to use ESMF compiled without MPI support.
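For anyone landing here later, the job-array idea can be sketched roughly as follows. This is only an illustration, not something from xESMF itself: each array task is an ordinary serial Python process (no mpirun, no mpi4py), reads its index from the scheduler's environment, and picks its own share of the work. The helper names `array_task_index` and `items_for_task` are made up for this example, and it assumes the array indices start at 0. The environment variables are the ones set by Slurm (`SLURM_ARRAY_TASK_ID`), PBS Pro (`PBS_ARRAY_INDEX`) and Torque (`PBS_ARRAYID`); check which one your cluster uses.

```python
import os

# Job-array index variables set by common schedulers:
# Slurm, PBS Pro, and Torque respectively.
ARRAY_INDEX_VARS = ("SLURM_ARRAY_TASK_ID", "PBS_ARRAY_INDEX", "PBS_ARRAYID")


def array_task_index(environ=None, default=0):
    """Return this task's job-array index, or `default` outside a job array.

    Hypothetical helper for illustration; `environ` can be passed in
    explicitly to make the function easy to test.
    """
    environ = os.environ if environ is None else environ
    for name in ARRAY_INDEX_VARS:
        value = environ.get(name)
        if value:
            return int(value)
    return default


def items_for_task(items, task_index, n_tasks):
    """Round-robin split: task k gets items k, k + n_tasks, k + 2*n_tasks, ...

    Assumes task indices run from 0 to n_tasks - 1.
    """
    if not 0 <= task_index < n_tasks:
        raise ValueError("task_index must be in [0, n_tasks)")
    return list(items)[task_index::n_tasks]
```

In the actual script you would build the full list of inputs (files, time slices, and so on), call `items_for_task(inputs, array_task_index(), n_tasks)`, and run the usual serial xESMF regridding on the returned subset. Since nothing is launched through MPI, each process should see itself as a single rank.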
I am using mpirun to run my Python program across multiple nodes in a cluster. Each instance of the program uses MPI to determine its own rank and the number of processes, but nothing else. Each instance also uses xESMF to do some regridding.
The problem is that the underlying ESMF library then tries to decompose the regridding task across the ranks. xESMF does not handle this and fails with an error.
Since xESMF does not support parallel regridding (yet), is there a way to ensure that the underlying library does not try to do this?
Any thoughts or work-around ideas would be much appreciated.