
very slow parallel ArrayMesh? #14

Open
pmcdonal opened this issue May 25, 2022 · 6 comments
Labels
bug Something isn't working

Comments

@pmcdonal
Member

Computing IC power for Abacus cubes, the time for mpi.size > 1 is dominated by the line:
mesh_init = pypower.ArrayMesh(mesh_init, L, mpiroot=0)
which jumps from ~7 s for one process to ~90 s for 2 or 32 processes on cori (and worse, from ~4 s to ~150 s on my laptop, so it is not cori-specific).

Of course I can just plow through it, but might as well cut down on friction where possible...

It seems to come from pmesh/pm.py, around line 445,

mesh_init being:

if not mpi.rank():
    with asdf.open(ic_file, lazy_load=False) as af:
        mesh_init = af['data']['density']
else:
    mesh_init = None

(Is there a way to read asdf in parallel?)
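
For reference, a minimal timing sketch of the pattern above. This is only a sketch: the file path, box size, and mpi4py communicator are placeholders, and pypower.ArrayMesh is called with the same signature as in the snippet.

import time

import asdf
import pypower
from mpi4py import MPI

comm = MPI.COMM_WORLD
ic_file = 'ic_dens_N576.asdf'  # placeholder path
L = 2000.                      # placeholder box size

# Root reads the full density mesh; the other ranks hold None.
if comm.rank == 0:
    with asdf.open(ic_file, lazy_load=False) as af:
        mesh_init = af['data']['density']
else:
    mesh_init = None

# Time the root-to-all redistribution performed by ArrayMesh.
comm.Barrier()
t0 = time.time()
mesh_init = pypower.ArrayMesh(mesh_init, L, mpiroot=0)
comm.Barrier()
if comm.rank == 0:
    print('ArrayMesh took {:.1f} s'.format(time.time() - t0))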

@pmcdonal added the bug label on May 25, 2022
@pmcdonal
Member Author

Somehow I left out what that pm.py line is:
mpsort.permute(flatiter, argindex=ind.flat, comm=self.pm.comm, out=self.flat)

@adematti
Member

adematti commented May 25, 2022

Just a quick comment before going to sleep:
I'm not sure I'll be very useful here, as these are Yu Feng's routines, but I can try to help.
I got the unravel() trick from nbodykit, https://github.com/bccp/nbodykit/blob/4aec168f176939be43f5f751c90363b39ec6cf3a/nbodykit/source/mesh/array.py#L62, which enforces that all ranks except root have a zero-size array. I'd guess we could avoid that, as long as the flattened mesh is split (in natural order) across all ranks.

What is the mesh shape?

About asdf, I usually just read the rows of interest on each process, e.g. https://github.com/cosmodesi/mpytools/blob/6f2766ea00b5f316f70e221672cf8d41ac6166f4/mpytools/io.py#L969. Since only slices (start, stop, step) are supported in asdf slicing (if I remember correctly), this should do the right thing, i.e. only read the relevant rows on each process. I'm not sure this is faster than non-parallel I/O, though (I haven't tried much).
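
A minimal sketch of that per-rank read, assuming the density mesh is stored as a single array under af['data']['density'] and that asdf's lazy arrays accept plain start:stop slices as described above; the path is a placeholder.

import asdf
from mpi4py import MPI

comm = MPI.COMM_WORLD
ic_file = 'ic_dens_N576.asdf'  # placeholder path

with asdf.open(ic_file) as af:  # rely on asdf's lazy loading of array blocks
    density = af['data']['density']
    n0 = density.shape[0]
    # Split the first (slowest-varying, C-order) axis across ranks, so the
    # flattened mesh ends up distributed in increasing C index.
    start = n0 * comm.rank // comm.size
    stop = n0 * (comm.rank + 1) // comm.size
    # If asdf honours the slice lazily, only these rows are read from disk.
    local_slab = density[start:stop]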

@pmcdonal
Member Author

Yes, the pmesh/pm.py line I mentioned is coming from this unravel() line. It is several times faster to just run the whole thing in 1 process (while standard CatalogFFTPower is much faster with MPI, i.e., my MPI is working). In the grand scheme of things this particular case is unimportant, so I will try this asdf read just out of curiosity and then forget about it for now. (Actually, though, it looks like ArrayMesh assumes the data is not distributed, i.e., mpiroot=None doesn't look valid? Which I guess makes sense, a mesh being different to distribute than a list of objects. I hadn't really thought about it, having gotten used to the catalogs.)

@adematti
Member

(actually though, it looks like ArrayMesh assumes the data is not distributed, i.e., mpiroot=None doesn't look valid?)
=> Yes, that's the part I got from nbodykit; it may be relaxed to accept distributed arrays, as long as they are distributed with increasing C index. I may try to allow for the distributed version at some point (if you do not try first!).

@adematti
Member

Commit acba368 should allow passing a distributed array to ArrayMesh, e.g.
mesh = ArrayMesh(distributed_array, boxsize=boxsize, nmesh=shape, mpiroot=None)
(nmesh must be provided in this case).
This may still not help with the slowness issue; there may be room for improvement in the specific case of the full mesh held by a single rank, but I would need more details for testing purposes: the mesh shape, or better, the path to ic_file.
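
A possible usage sketch combining the per-rank read from the earlier comment with this distributed interface; whether ArrayMesh expects the 3D slab or its flattened version is an assumption to check against the commit, and nmesh and boxsize are example values.

import pypower

nmesh = (576, 576, 576)  # full mesh shape, e.g. density.shape from the read above
boxsize = 2000.          # placeholder box size

# local_slab: this rank's block of consecutive first-axis rows (increasing C index).
mesh = pypower.ArrayMesh(local_slab, boxsize=boxsize, nmesh=nmesh, mpiroot=None)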

@pmcdonal
Member Author

The file is
/global/cfs/cdirs/desi/public/cosmosim/AbacusSummit/ic/AbacusSummit_base_c000_ph000/ic_dens_N576.asdf

This parallel read does work (producing the same results). mpi.size > 1 is faster than before, but still overall not as fast as mpi.size = 1. I'm happy to leave this until it comes up somewhere as a real obstacle.
