Is there any way to parallelize the computation of multiple convolutions? #683
Comments
Thank you for your nice reply, I will try what you said.
Do you know which part of your program is taking the most time? If your application is already bound by IO, then the speed will depend on how much bandwidth the file system can provide. Adding some prints of the elapsed time can help us find out.

Less critical, but also worth checking, is making sure the CPU cores are not oversubscribed, i.e. that each task is not launching too many MKL or OpenMP threads. A small illustrative sketch follows.
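Not part of the original reply, but as a rough sketch of both suggestions (the path, dataset name, and the stages being timed are placeholders): pin the per-task thread count and print how long each stage takes.

```python
# Illustrative sketch (placeholder path/dataset): limit MKL/OpenMP to one thread
# per MPI task to avoid oversubscription, and time the main stages.
import os
os.environ.setdefault("OMP_NUM_THREADS", "1")   # set before numpy/nbodykit are imported
os.environ.setdefault("MKL_NUM_THREADS", "1")

import time
from nbodykit.lab import BigFileMesh

t0 = time.time()
mesh = BigFileMesh("density_mesh_bigfile", dataset="Field")  # placeholder path/dataset
dens_m = mesh.paint(mode="real")                             # read + paint the density field
if mesh.comm.rank == 0:
    print("read + paint: %.1f s" % (time.time() - t0))

t0 = time.time()
# ... per-scale convolutions and binned statistics go here ...
if mesh.comm.rank == 0:
    print("filter loop: %.1f s" % (time.time() - t0))
```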
On Tue, Sep 19, 2023, WangYun1995 wrote:

Hi Yu,
The method you suggested does work. However, the parallelization is not very efficient. For example, I use a mesh with Nmesh = 768**3 as a test. When using only one core, the time taken is 1407 seconds. When using 64 cores, the time taken is 722 seconds. When using 96 cores, the time taken is 611 seconds. Obviously, turning on the parallelization only saves about half the time, rather than scaling like 1407/64 or 1407/96 seconds. Why is this the case?
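For reference, a small worked calculation of the quoted timings (not part of the original exchange):

```python
# Speedup and parallel efficiency implied by the timings reported above.
timings = {1: 1407.0, 64: 722.0, 96: 611.0}   # cores -> seconds, as reported
for cores, t in timings.items():
    speedup = timings[1] / t
    efficiency = speedup / cores
    print("%3d cores: speedup %.2fx, efficiency %.1f%%" % (cores, speedup, 100 * efficiency))
# 64 cores: speedup 1.95x, efficiency 3.0%; 96 cores: speedup 2.30x, efficiency 2.4%
```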
The operations inside the for loop consume the most time.
Perhaps it is because the computation and disk IO that produce dens_m are repeated in every iteration? I recall there is a way to create a Mesh object directly from dens_m (FieldMesh? ArrayMesh? I cannot recall the exact name right away). Perhaps replace the real density Mesh object used in the loop with that, so the field only has to be read and painted once; see the sketch below.
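A minimal sketch of this idea, assuming the density mesh lives in a bigfile (the path, dataset name, and filter are placeholders, not the code from this issue):

```python
# Read and paint the bigfile mesh once, then wrap the resulting RealField in a
# FieldMesh so the loop over scales reuses it instead of repeating the disk IO.
from nbodykit.lab import BigFileMesh, FieldMesh

mesh = BigFileMesh("density_mesh_bigfile", dataset="Field")  # placeholder path/dataset
dens_field = mesh.paint(mode="real")   # expensive read + paint, done exactly once
dens_mesh = FieldMesh(dens_field)      # cheap in-memory wrapper with the same Mesh API

# Inside the loop, apply the Fourier-space filter to dens_mesh rather than
# re-creating the density mesh from disk on every iteration, e.g.
#     conv = dens_mesh.apply(my_filter, mode="complex", kind="wavenumber").paint(mode="real")
```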
This really improves the efficiency. I will acknowledge you in my new paper.
Glad to hear it worked. Thank you!
Hi,
First, I have a 3D density mesh (Nmesh = 1536^3) obtained with nbodykit and stored as a bigfile on my disk. I then want to convolve the mesh with a band-pass filter at multiple scales, which is equivalent to a multiplication in the Fourier domain. Finally, at each scale, I compute some binned statistic (env_WPS) of the convolved mesh; my code snippet is shown below. Since that code uses only one core, it is inefficient for a large Nmesh. How can I parallelize the code in the context of nbodykit?
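As a hypothetical illustration only (the path, filter, scales, and statistic below are placeholders, not the author's snippet), a loop of this kind becomes MPI-parallel in nbodykit simply by launching the script under MPI; the only extra step is reducing the per-rank statistic across ranks.

```python
# Hypothetical sketch: nbodykit distributes the mesh and its FFTs over MPI
# ranks, so this loop runs in parallel when launched with, e.g.,
#     mpirun -n 64 python compute_env_wps.py
import numpy as np
from nbodykit.lab import BigFileMesh

mesh = BigFileMesh("density_mesh_bigfile", dataset="Field")  # placeholder path/dataset
comm = mesh.comm                                             # MPI communicator used by nbodykit

def bandpass(scale):
    """Placeholder band-pass filter, applied as a multiplication in Fourier space."""
    def filt(k, v):
        kk = sum(ki ** 2 for ki in k) ** 0.5
        return v * (kk * scale) * np.exp(-0.5 * (kk * scale) ** 2)
    return filt

for scale in [10.0, 20.0, 40.0]:                             # placeholder scales
    # (As noted in the replies above, wrapping a painted field in FieldMesh
    # avoids re-reading the bigfile for every scale.)
    conv = mesh.apply(bandpass(scale), mode="complex", kind="wavenumber").paint(mode="real")
    # Each rank holds only its local slab of the convolved field, so any binned
    # statistic (env_WPS in the real code) needs an MPI reduction at the end.
    local_sum, local_size = float(np.sum(conv.value ** 2)), conv.value.size
    total, count = comm.allreduce(local_sum), comm.allreduce(local_size)
    if comm.rank == 0:
        print("scale %g: mean square of filtered field = %g" % (scale, total / count))
```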