Invoking xesmf with mpirun #79

Closed

nichannah opened this issue Dec 14, 2019 · 3 comments

nichannah commented Dec 14, 2019

I am using mpirun to run my Python program across multiple nodes in a cluster. Each instance of the program uses MPI to determine its own rank and the number of processes, but nothing else. Each program also uses xESMF to do some regridding.
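For reference, the launch pattern looks roughly like this (a minimal sketch assuming mpi4py; the task list and file names are placeholders, not my actual code):

```python
from mpi4py import MPI

# Each mpirun-launched instance only queries its own rank and the total
# number of processes; there is no further inter-process communication.
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Hypothetical round-robin split of independent regridding tasks.
all_tasks = [f"input_{i:03d}.nc" for i in range(100)]  # placeholder file names
my_tasks = all_tasks[rank::size]
print(f"rank {rank} of {size} handles {len(my_tasks)} files")
```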

The problem is that the underlying ESMF library then tries to decompose the regridding task across the ranks. xESMF does not handle this and fails with an error.

Since xESMF does not support parallel regridding (yet) - is there a way to ensure that the underlying library does not try to do this?

Any thoughts or work-around ideas would be much appreciated.

JiaweiZhuang (Owner) commented Dec 14, 2019

Each instance of the program uses MPI to determine its own rank and the number of processes but nothing else.

So you don't need any inter-process communication using mpi4py? In that case I would suggest not using mpirun to launch your Python script, but using a scheduler feature like Slurm Job Array Support and getting your job ID via os.environ['SLURM_ARRAY_TASK_ID'].
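A minimal sketch of that pattern (the file naming and array range are just for illustration):

```python
import os

# SLURM sets SLURM_ARRAY_TASK_ID for each element of a job array,
# e.g. one submitted with: sbatch --array=0-99 run_regrid.sh
task_id = int(os.environ["SLURM_ARRAY_TASK_ID"])

# Hypothetical mapping from the array index to an independent piece of work.
input_file = f"input_{task_id:03d}.nc"
print(f"array task {task_id} will regrid {input_file}")
```

Each array task then runs as an ordinary serial Python process, so ESMF never sees multiple MPI ranks.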

Since xESMF does not support parallel regridding (yet)

Parallel weight construction is not supported yet, but the weights can be applied in parallel via Dask. See a long discussion at #3.

Does your use case actually need MPI-style parallelization? If the data can be chunked in vertical/time dimension, Dask should be sufficient. Any reason for having to chunk in the horizontal?
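For completeness, a rough sketch of the Dask route (file names, variable name, chunk size, and method are assumptions for illustration):

```python
import xarray as xr
import xesmf as xe

# Chunk along time so the serially built weights are applied chunk-by-chunk.
ds_in = xr.open_dataset("input.nc", chunks={"time": 10})
ds_out = xr.open_dataset("target_grid.nc")  # defines the destination lat/lon grid

regridder = xe.Regridder(ds_in, ds_out, "bilinear")  # weight construction is serial
out = regridder(ds_in["air"])  # lazy on Dask-backed input; applied per chunk
out = out.compute()
```

No mpirun is needed here; the Dask scheduler (threaded or distributed) provides the parallelism.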

nichannah (Author) commented Dec 14, 2019

@JiaweiZhuang thank you for the very quick reply and useful suggestions.

Yes that is correct, the only reason to use MPI-style parallelization is to launch across multiple nodes.

The SLURM suggestion is a good one, and this is what I'm doing on a cluster that has it installed. However, I also need to get this running on a PBS cluster, which uses MPI for task launching.

I have tried disconnecting the MPI communicator (comm.Disconnect()) after start-up but this seems to crash ESMF with a seg fault.

nichannah (Author) commented

OK, I think you've answered this. The best approach is probably to use job arrays. An alternative might be to use ESMF compiled without MPI support.

aulemahal pushed a commit to Ouranosinc/xESMF that referenced this issue May 18, 2021:
…ble-vars (Allow non-regriddable vars in datasets)