autoparallelized wfl is slower than GNU parallel. #266

Open
jungsdao opened this issue Sep 26, 2023 · 8 comments

@jungsdao
Contributor

Hello,
I have been trying the wfl package with minima hopping, and I have noticed that it is much slower than the same process run with GNU parallel. Below I compare one geometry relaxation within minima hopping between GNU parallel and wfl.

  1. GNU parallel
                Step[ FC]     Time          Energy          fmax
*Force-consistent energies used in optimization.
BFGSLineSearch:    0[  0] 14:25:38     -534.014633*       3.0278
BFGSLineSearch:    1[  2] 14:25:44     -534.189030*       1.3638
BFGSLineSearch:    2[  4] 14:25:48     -534.351071*       2.4404
BFGSLineSearch:    3[  5] 14:25:49     -534.665330*       2.9015
BFGSLineSearch:    4[  6] 14:25:54     -534.994242*       2.9427
BFGSLineSearch:    5[  8] 14:25:57     -535.194093*       3.3632
BFGSLineSearch:    6[ 10] 14:26:00     -535.408273*       2.3933
BFGSLineSearch:    7[ 12] 14:26:03     -535.550177*       1.3325
BFGSLineSearch:    8[ 14] 14:26:06     -535.684763*       3.0824
BFGSLineSearch:    9[ 16] 14:26:09     -536.134389*       1.6315
BFGSLineSearch:   10[ 17] 14:26:10     -536.266573*       1.9401
BFGSLineSearch:   11[ 18] 14:26:12     -536.369399*       1.6626
BFGSLineSearch:   12[ 19] 14:26:13     -536.445034*       0.9739
BFGSLineSearch:   13[ 21] 14:26:16     -536.561701*       0.8027
BFGSLineSearch:   14[ 23] 14:26:19     -536.650002*       0.5800
BFGSLineSearch:   15[ 24] 14:26:21     -536.681904*       0.3205
BFGSLineSearch:   16[ 25] 14:26:22     -536.701256*       0.3439
BFGSLineSearch:   17[ 26] 14:26:23     -536.713216*       0.1637
BFGSLineSearch:   18[ 27] 14:26:25     -536.717918*       0.2025
BFGSLineSearch:   19[ 29] 14:26:28     -536.719005*       0.1261
BFGSLineSearch:   20[ 30] 14:26:29     -536.720355*       0.0749
BFGSLineSearch:   21[ 31] 14:26:30     -536.721657*       0.1069
BFGSLineSearch:   22[ 34] 14:26:34     -536.724530*       0.1934
BFGSLineSearch:   23[ 35] 14:26:35     -536.726039*       0.2071
BFGSLineSearch:   24[ 37] 14:26:38     -536.727076*       0.1802
BFGSLineSearch:   25[ 38] 14:26:39     -536.728364*       0.1102
BFGSLineSearch:   26[ 39] 14:26:40     -536.729102*       0.0925
BFGSLineSearch:   27[ 40] 14:26:42     -536.729533*       0.0620
BFGSLineSearch:   28[ 41] 14:26:43     -536.729646*       0.0437

The whole relaxation finished within 1-2 minutes when parallelized with GNU parallel.

  2. wfl
                Step[ FC]     Time          Energy          fmax
*Force-consistent energies used in optimization.
BFGSLineSearch:    0[  0] 14:45:06     -535.723086*       0.9643
BFGSLineSearch:    1[  1] 14:46:21     -535.822649*       2.3116
BFGSLineSearch:    2[  3] 14:50:03     -535.964835*       1.0688
BFGSLineSearch:    3[  5] 14:54:51     -535.992950*       0.5638
BFGSLineSearch:    4[  7] 14:56:34     -536.008899*       0.5146
BFGSLineSearch:    5[  9] 14:58:17     -536.019360*       0.5074
BFGSLineSearch:    6[ 11] 15:00:02     -536.027474*       0.6783
BFGSLineSearch:    7[ 12] 15:00:55     -536.050990*       0.5662
BFGSLineSearch:    8[ 14] 15:02:40     -536.066940*       0.9052
BFGSLineSearch:    9[ 16] 15:04:24     -536.200172*       1.0468

Whereas with the wfl package, a single relaxation step takes 1-2 minutes.

I'm using a MACE potential for the relaxation. Could anyone comment on possible reasons why it's so much slower with wfl parallelization? Many thanks in advance.

@bernstei
Contributor

How are you parallelizing with "gnu parallel"? In general wfl parallelizes over multiple input configurations, but I have no idea how the minima hopping is parallelized - I'd guess over multiple initial configs.
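
For reference, an autoparallelized wfl operation normally iterates over the items of a ConfigSet, farming one configuration at a time out to worker subprocesses, roughly like this (a sketch only; the file names are placeholders, and the actual wrapper call is omitted since its signature depends on the wfl version):

from wfl.configset import ConfigSet, OutputSpec

# one entry per initial configuration; an autoparallelized wfl operation
# distributes these items over python worker subprocesses
inputs = ConfigSet("initial_configs.xyz")
outputs = OutputSpec("relaxed.xyz")

# the number of workers is typically controlled by the
# WFL_NUM_PYTHON_SUBPROCESSES environment variable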

@jungsdao
Contributor Author

With GNU parallel it's also parallelized over initial configurations, if that's what you were asking.
As far as I know, the parallelization of minima hopping is also over multiple input configurations.

@bernstei
Contributor

bernstei commented Sep 26, 2023

Without more information on exactly what you're calling and what the "gnu parallel" parallelization is doing (I didn't see any mention of it in the ASE minima hopping web docs), there's no way to tell. This looks like the output of a single-config local minimization (that's what BFGSLineSearch usually does). We have no idea what system you're using, how long a single force evaluation is expected to take, etc., so there is no way to know what's reasonable.

@jungsdao
Contributor Author

jungsdao commented Sep 26, 2023

GNU parallel is not related to ASE minima hopping; I just used it to parallelize minima hopping. It's just multiple executions of a separate Python script, all sharing one common minima.traj file, which is the history of found minima.
The Python script (parallel_minhop.py) looks like the following:

from ase.io import read
from ase.optimize.minimahopping import MinimaHopping
from mace.calculators import MACECalculator

# path to the trained MACE model (placeholder name)
final_mlip_file = "final_mlip.model"

atoms = read("structure.traj")
atoms.calc = MACECalculator(model_path=final_mlip_file, device="cpu")

# all parallel runs share ../minima.traj, the common history of found minima
opt = MinimaHopping(atoms, Ediff0=0.75, T0=2000, fmax=0.05,
                    minima_traj="../minima.traj", timestep=0.5,
                    use_abort_check=False)
opt(totalsteps=80, maxtemp=2 * 2000)

In a file called cmd.lst, the commands that I want to parallelize are given:

cd ./00; srun --mem=4GB --exclusive -N 1 -n 1 python ../parallel_minhop.py > stdout.log 
cd ./01; srun --mem=4GB --exclusive -N 1 -n 1 python ../parallel_minhop.py > stdout.log 
cd ./02; srun --mem=4GB --exclusive -N 1 -n 1 python ../parallel_minhop.py > stdout.log 
cd ./03; srun --mem=4GB --exclusive -N 1 -n 1 python ../parallel_minhop.py > stdout.log 
cd ./04; srun --mem=4GB --exclusive -N 1 -n 1 python ../parallel_minhop.py > stdout.log 
...
cd ./56; srun --mem=4GB --exclusive -N 1 -n 1 python ../parallel_minhop.py > stdout.log 

GNU parallel is then executed with the following command:
parallel -X --delay 0.2 --joblog task.log --progress --resume -j 72 < cmd.lst
That is how GNU parallel is run.

What I posted above is the log file of the local minimization of a single configuration (qn0000.log in minima hopping).
I would expect GNU parallel and wfl to be similar in their geometry relaxation speed, since I'm relaxing the same structure with the same MACE potential.

@bernstei
Contributor

bernstei commented Sep 26, 2023

Which of those times is a reasonable energy evaluation time for your system? How exactly are you running the wfl job? Are you somehow forcing all the parallel wfl processes to share one core?
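
One quick way to check the shared-core possibility, as a minimal sketch (Linux-only, standard library only), is to drop this into the script each worker runs:

import os

# If every worker prints 1, all processes have been pinned to a single
# core and will time-share it no matter how many workers are started.
print("cores available to this process:", len(os.sched_getaffinity(0)))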

@jungsdao
Contributor Author

jungsdao commented Sep 26, 2023

The GNU parallel relaxation time is the more reasonable evaluation time for this system; it shouldn't be that slow.
Maybe it's better to attach a tar.gz of the files that I used to parallelize minima hopping with wfl.

wfl_paralllel_minhop.tar.gz

@bernstei
Contributor

I don't see anything obviously wrong. Can you ssh into the node while it's running? If you run top, you should see N (in principle 72, but I only see 57 initial configs) python processes, each using 100% CPU, not more. Is that what you see?
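
A non-interactive way to grab the same information as top, as a sketch assuming the psutil package happens to be installed on the node:

import psutil

# list python processes on the node with their CPU usage; each wfl
# worker should sit near 100%, not far above or below
for p in psutil.process_iter(["name"]):
    if p.info["name"] and "python" in p.info["name"]:
        print(p.pid, p.info["name"], p.cpu_percent(interval=0.5))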

@bernstei
Contributor

Other things that might be helpful: run on the node, but don't autoparallelize. Add print statements with timing info (print(time.time())) to the wfl minima hopping wrapper to see whether something unexpected is taking a lot of time.
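
For example, timing a single force call in isolation gives a per-call baseline to compare both runs against (a minimal sketch; the model path is a placeholder, and the MACECalculator keyword mirrors the script above):

import time

from ase.io import read
from mace.calculators import MACECalculator

atoms = read("structure.traj")
atoms.calc = MACECalculator(model_path="final_mlip.model", device="cpu")  # placeholder path

# one force evaluation, timed outside any parallelization
t0 = time.time()
atoms.get_forces()
print(f"single force call took {time.time() - t0:.2f} s")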
