
Intel MKL ERROR: Parameter 6 was incorrect on entry to DLASWP #285

Open
moabe84 opened this issue Apr 15, 2024 · 25 comments
Labels: bug (Something isn't working), parallelization (issue related to parallelization)

Comments

@moabe84 commented Apr 15, 2024

Hi,
I've been trying to run QCG calculations for a small organic molecule using CREST 3.0, but I keep encountering the trial MTD convergence error (after the "quantum cluster growth: GROW" part), regardless of the timestep or SHAKE option I select. Interestingly, with CREST 2.12 everything is fine, and it works even with the default settings. Could it be a bug of some sort?

Thanks,
Mostafa

@moabe84 (Author) commented Apr 20, 2024

Hi,
I've encountered yet another issue that might be related to my earlier post. It's a simple conformational sampling calculation, and I keep receiving this error message:

*MTD 12 terminated with early ... 1 min, 10.969 sec
*MTD 3 terminated with early ... 1 min, 16.308 sec
*MTD 1 terminated with early ... 1 min, 16.387 sec
*MTD 7 terminated with early ... 1 min, 43.418 sec
*MTD 4 terminated with early ... 1 min, 48.743 sec
Intel MKL ERROR: Parameter 6 was incorrect on entry to DLASWP.
Intel MKL ERROR: Parameter 6 was incorrect on entry to DLASWP.

Again, it works perfectly with CREST 2.12.
I'm using the precompiled binaries. Thanks.

Mostafa

@pprcht (Contributor) commented Apr 20, 2024

Is this a new build from PR #288 by any chance? What's the build information displayed at the end of a --dry run?

2.12 and 3.0 work fundamentally differently; errors in one are most likely unrelated to errors in the other.

@moabe84 (Author) commented Apr 20, 2024

The reported issues are both related to the CREST 3.0 version. I just wanted to emphasize that these issues were not encountered in the previous 2.12 version.

Here's the build information displayed at the end of a --dry run using CREST 3.0:

Dry run was requested.
Running CREST with the chosen cmd arguments will result in the following settings:

Input file : ../../opt_xtb/mol_7/xtbopt.xyz

Job type :

  1. Conformational search via the iMTD-GC algo

Job settings
sort Z-matrix : F

CRE settings
energy window (-ewin) : 6.0000
RMSD threshold (-rthr) : 0.1250
energy threshold (-ethr) : 0.0500
rot. const. threshold (-bthr) : 0.01
T (for boltz. weight) (-temp) : 298.15

General MD/MTD settings
simulation length [ps] (-len) :
time step [fs] (-tstep) : 5.0
shake mode (-shake) : 2
MTD temperature [K] (-mdtemp) : 300.00
trj dump step [fs] (-mddump) : 100
MTD Vbias dump [ps] (-vbdump) : 1.0

Calculation settings
(final) opt level (-opt) : 3
Implicit solvation (-alpb) : water

Technical settings
working directory : /anfhome/mabedi/calc_data/CDFT_groups_A_B_C/group_A/conf_sampling_crest/test
CPUs (threads) (-T) : 120

CREST binary info
CREST version : 3.0
timestamp : Sat Apr 6 18:06:37 UTC 2024
commit : d321183
compiled by : 'runner@fv-az778-216'
Fortran compiler : intel 2021.9.0
C compiler : intel 2021.9.0
build system : meson 1.2.0
-DWITH_TOMLF : True
-DWITH_GFN0 : True
-DWITH_GFNFF : True
-DWITH_TBLITE : True
-DWITH_XHCFF : False
-DWITH_LWONIOM : True

normal dry run termination.

@pprcht (Contributor) commented Apr 20, 2024

The reported issues are both related to the CREST 3.0 version. I just wanted to emphasize that these issues were not encountered in the previous 2.12 version.

Yes, I got that. What I'm saying is that 3.0 is a complete code revision; even if 2.12 works, that does not tell us much about the error.

As for the output you attached, 120 threads seems a bit extreme for OpenMP parallelization; it might break down at that count, and the overhead from thread handling could be an issue.
I don't know how large your molecules are, but could you try running a test with just a few cores, like 10 or so?
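For example, with the input file and settings from your dry run above, such a test could look like this (the command is only illustrative; adjust it to your setup):

  crest xtbopt.xyz --alpb water -T 8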

@moabe84 (Author) commented Apr 20, 2024

Following your suggestion, I tried the calculations with 8 CPUs and encountered the same problem. However, when I repeated the calculations in the "gas phase", everything seemed to work fine. Attached is the input structure in case you'd like to try it.
xtbopt.xyz.txt

@pprcht (Contributor) commented Apr 21, 2024

I tried running this structure and don't have any issues with it in the gas phase (Ctrl+C'd after the MDs):

output:

 
       ╔════════════════════════════════════════════╗
       ║            ___ ___ ___ ___ _____           ║
       ║           / __| _ \ __/ __|_   _|          ║
       ║          | (__|   / _|\__ \ | |            ║
       ║           \___|_|_\___|___/ |_|            ║
       ║                                            ║
       ║  Conformer-Rotamer Ensemble Sampling Tool  ║
       ║          based on the xTB methods          ║
       ║                                            ║
       ╚════════════════════════════════════════════╝
       Version 3.0.1, Sun, 21 April 12:23:46, 04/21/2024
       commit (86d275a) compiled by 'philipp@xps15'
 
   Cite work conducted with this code as

   • P.Pracht, F.Bohle, S.Grimme, PCCP, 2020, 22, 7169-7192.
   • S.Grimme, JCTC, 2019, 15, 2847-2862.
   • P.Pracht, S.Grimme, C.Bannwarth, F.Bohle, S.Ehlert,
     G.Feldmann, J.Gorges, M.Müller, T.Neudecker, C.Plett,
     S.Spicher, P.Steinbach, P.Wesołowski, F.Zeller,
     J. Chem. Phys., 2024, 160, 114110.

   for works involving QCG cite

   • S.Spicher, C.Plett, P.Pracht, A.Hansen, S.Grimme,
     JCTC, 2022, 18 (5), 3174-3189.
   • C.Plett, S. Grimme,
     Angew. Chem. Int. Ed. 2023, 62, e202214477.

   for works involving MECP screening cite

   • P.Pracht, C.Bannwarth, JCTC, 2022, 18 (10), 6370-6385.
 
   Original code
     P.Pracht, S.Grimme, Universität Bonn, MCTC
   with help from (alphabetical order):
     C.Bannwarth, F.Bohle, S.Ehlert, G.Feldmann, J.Gorges,
     S.Grimme, C.Plett, P.Pracht, S.Spicher, P.Steinbach,
     P.Wesolowski, F.Zeller
 
   Online documentation is available at
   https://crest-lab.github.io/crest-docs/
 
   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
   GNU Lesser General Public License (LGPL) for more details.

 Command line input:
 $ crest input.xyz -T 8 -mdlen x0.5

  -T 8 (CPUs/Threads selected)

> Setting up backup calculator ... done.
 ----------------
 Calculation info
 ----------------
> User-defined calculation level:
 : xTB calculation via tblite lib
 : GFN2-xTB level
 :   Molecular charge    : 0
 :   Fermi temperature   : 300.00000
 :   Accuracy            : 1.00000
 :   max SCC cycles      : 500
 :   Read dipoles?       : yes
 :   Weight              : 1.00000
 
 
 -----------------------------
 Initial Geometry Optimization
 -----------------------------
 Geometry successfully optimized.
 
          ┍━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┑
          │              CREST iMTD-GC SAMPLING             │
          ┕━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┙
 
 Input structure:
  72
 
 O          2.0732334881        0.5249590014        0.0206435230
 C          2.2796932230        1.5426846433       -0.6016254382
 O          3.3122397372        1.7193497288       -1.4607474493
 C          4.2241142231        0.6368850891       -1.5701623753
 C          5.2140028071        0.6331346943       -0.3929918894
 O          6.3037182097       -0.2278953221       -0.5899164364
 C          6.0093529618       -1.6030146709       -0.4345252606
 C          7.3129073130       -2.4109804696       -0.4859536588
 O          8.0043600385       -2.4640693629        0.7332483067
 C          8.6910205019       -1.2787622505        1.0632106214
 N          1.5369678466        2.6629676934       -0.5242672072
 C          0.2878329637        2.7057415971        0.2201023181
 C         -0.7833091352        2.0075133935       -0.6418946377
 N         -0.8450962472        0.6820004306       -0.4303501492
 C         -1.7909586972       -0.1629009365       -1.1141491856
 C         -1.2086800436       -1.5748533808       -1.3294664117
 C         -1.0889012821       -2.2964434436       -0.0151649867
 C         -2.1534101752       -3.0511640699        0.4638592263
 C         -2.0788601405       -3.6714073849        1.6977074981
 C         -0.9365915341       -3.5439787540        2.4703203357
 C          0.1309414380       -2.8009574934        1.9982834784
 C          0.0574908925       -2.1805051638        0.7621427842
 C         -3.0853554659       -0.2402523350       -0.3329428855
 C         -4.2533440269       -0.9015317154       -1.0738767771
 N         -5.3485654302       -0.9457572214       -0.2998114983
 C         -6.5918193304       -1.5020176794       -0.7048465209
 C         -7.4178539658       -2.2544019380        0.2955818762
 C         -7.8666090376       -0.9067037911       -0.1890385564
 O         -4.1699233937       -1.3278948157       -2.2100357396
 O         -3.2251305489        0.1899886941        0.7875516900
 O         -1.4380133611        2.5921511751       -1.4792481390
 C         -0.0933317388        4.1504294641        0.5278170324
 C         -0.9997853174        4.2873398455        1.7562563320
 C         -2.3410387080        3.5887355881        1.5515737676
 C         -1.2216760460        5.7687050563        2.0578269544
 H          3.6661447013       -0.3046229734       -1.6055827738
 H          4.7608318927        0.7914378817       -2.5081126007
 H          4.6736211091        0.3793355893        0.5303487304
 H          5.6410202357        1.6355786128       -0.2940387590
 H          5.5086835099       -1.7781114329        0.5280689273
 H          5.3488023576       -1.9480048451       -1.2415678207
 H          7.0720898454       -3.4494246873       -0.7265038915
 H          7.9498140731       -1.9907863859       -1.2769113837
 H          8.0101393608       -0.4300716527        1.1676232328
 H          9.1899899685       -1.4698774713        2.0115457105
 H          9.4383313056       -1.0355204028        0.2980459826
 H          1.7070599005        3.3894196529       -1.2024146588
 H          0.4481288160        2.1371474915        1.1420716450
 H         -0.2596375318        0.2500912002        0.2702916459
 H         -2.0254419371        0.2958880714       -2.0835766948
 H         -0.2293982781       -1.4656103917       -1.7979720166
 H         -1.8707046810       -2.1153474830       -2.0058228973
 H         -3.0433351292       -3.1603479425       -0.1404063358
 H         -2.9129673197       -4.2565855767        2.0567383694
 H         -0.8788977752       -4.0260312289        3.4349890520
 H          1.0262823606       -2.7013380800        2.5939175154
 H          0.9019439177       -1.6091513591        0.4027527264
 H         -5.2696672280       -0.5419544911        0.6261590048
 H         -6.5395811872       -1.8530699304       -1.7273370415
 H         -7.9505462844       -3.1307662428       -0.0377672490
 H         -7.0414097794       -2.3294229125        1.3044670142
 H         -8.7098573194       -0.8497667958       -0.8587076769
 H         -7.7938679003       -0.0732790917        0.4929159245
 H         -0.6037364195        4.5613902477       -0.3465412274
 H          0.8213722501        4.7225191003        0.6988698953
 H         -0.4920526204        3.8311848147        2.6144797313
 H         -2.8196018768        3.9501042684        0.6445026496
 H         -2.2235562570        2.5115475726        1.4658562501
 H         -3.0001384828        3.7889741896        2.3936380901
 H         -1.8343337286        5.8880259686        2.9480566077
 H         -1.7296142307        6.2499114582        1.2252682817
 H         -0.2743390068        6.2769153200        2.2241613876
 
 ------------------------------------------------
 Generating MTD length from a flexibility measure
 ------------------------------------------------
 Calculating GFN0-xTB WBOs   ... done.
 Calculating NCI flexibility ... done.
     covalent flexibility measure :   0.578
 non-covalent flexibility measure :   0.714
 flexibility measure :   0.560
 t(MTD) based on flexibility :   108.2
 MTD length is scaled by     : 0.500
 t(MTD) / ps    :    54.0
 Σ(t(MTD)) / ps :   756.0 (14 MTDs)
 
 -----------------------------------
 Starting trial MTD to test settings
 -----------------------------------
 Trial MTD 1 runtime (1.0 ps) ...        0 min, 13.763 sec
 Estimated runtime for one MTD (54.0 ps) on a single thread: 12 min 23 sec
 Estimated runtime for a batch of 14 MTDs on 8 threads: 24 min 46 sec
 
******************************************************************************************
**                         N E W   I T E R A T I O N  C Y C L E                         **
******************************************************************************************
 
 ------------------------------
 Meta-Dynamics Iteration 1
 ------------------------------
 list of applied metadynamics Vbias parameters:
$metadyn    0.21600   1.300
$metadyn    0.10800   1.300
$metadyn    0.05400   1.300
$metadyn    0.21600   0.780
$metadyn    0.10800   0.780
$metadyn    0.05400   0.780
$metadyn    0.21600   0.468
$metadyn    0.10800   0.468
$metadyn    0.05400   0.468
$metadyn    0.21600   0.281
$metadyn    0.10800   0.281
$metadyn    0.05400   0.281
$metadyn    0.07200   0.100
$metadyn    0.36000   0.800
 
  ::::::::::::: starting MTD    1 :::::::::::::
  |   MD simulation time   :    54.0 ps       |
  |   target T             :   300.0 K        |
  |   timestep dt          :     5.0 fs       |
  |   dump interval(trj)   :   100.0 fs       |
  |   SHAKE algorithm      : true (all bonds) |
  |   dump interval(Vbias) :    1.00 ps       |
  |   Vbias prefactor (k)  :  0.2160 Eh       |
  |   Vbias exponent (α)   :  1.3000 bohr⁻²   |
  ::::::::::::: starting MTD    2 :::::::::::::
  |   MD simulation time   :    54.0 ps       |
  |   target T             :   300.0 K        |
  |   timestep dt          :     5.0 fs       |
  |   dump interval(trj)   :   100.0 fs       |
  |   SHAKE algorithm      : true (all bonds) |
  |   dump interval(Vbias) :    1.00 ps       |
  |   Vbias prefactor (k)  :  0.1080 Eh       |
  |   Vbias exponent (α)   :  1.3000 bohr⁻²   |
  ::::::::::::: starting MTD    3 :::::::::::::
  |   MD simulation time   :    54.0 ps       |
  |   target T             :   300.0 K        |
  |   timestep dt          :     5.0 fs       |
  |   dump interval(trj)   :   100.0 fs       |
  |   SHAKE algorithm      : true (all bonds) |
  |   dump interval(Vbias) :    1.00 ps       |
  |   Vbias prefactor (k)  :  0.0540 Eh       |
  |   Vbias exponent (α)   :  1.3000 bohr⁻²   |
  ::::::::::::: starting MTD    4 :::::::::::::
  |   MD simulation time   :    54.0 ps       |
  |   target T             :   300.0 K        |
  |   timestep dt          :     5.0 fs       |
  |   dump interval(trj)   :   100.0 fs       |
  |   SHAKE algorithm      : true (all bonds) |
  |   dump interval(Vbias) :    1.00 ps       |
  |   Vbias prefactor (k)  :  0.2160 Eh       |
  |   Vbias exponent (α)   :  0.7800 bohr⁻²   |
  ::::::::::::: starting MTD    5 :::::::::::::
  |   MD simulation time   :    54.0 ps       |
  |   target T             :   300.0 K        |
  |   timestep dt          :     5.0 fs       |
  |   dump interval(trj)   :   100.0 fs       |
  |   SHAKE algorithm      : true (all bonds) |
  |   dump interval(Vbias) :    1.00 ps       |
  |   Vbias prefactor (k)  :  0.1080 Eh       |
  |   Vbias exponent (α)   :  0.7800 bohr⁻²   |
  ::::::::::::: starting MTD    6 :::::::::::::
  |   MD simulation time   :    54.0 ps       |
  |   target T             :   300.0 K        |
  |   timestep dt          :     5.0 fs       |
  |   dump interval(trj)   :   100.0 fs       |
  |   SHAKE algorithm      : true (all bonds) |
  |   dump interval(Vbias) :    1.00 ps       |
  |   Vbias prefactor (k)  :  0.0540 Eh       |
  |   Vbias exponent (α)   :  0.7800 bohr⁻²   |
  ::::::::::::: starting MTD    7 :::::::::::::
  |   MD simulation time   :    54.0 ps       |
  |   target T             :   300.0 K        |
  |   timestep dt          :     5.0 fs       |
  |   dump interval(trj)   :   100.0 fs       |
  |   SHAKE algorithm      : true (all bonds) |
  |   dump interval(Vbias) :    1.00 ps       |
  |   Vbias prefactor (k)  :  0.2160 Eh       |
  |   Vbias exponent (α)   :  0.4680 bohr⁻²   |
  ::::::::::::: starting MTD   14 :::::::::::::
  |   MD simulation time   :    54.0 ps       |
  |   target T             :   300.0 K        |
  |   timestep dt          :     5.0 fs       |
  |   dump interval(trj)   :   100.0 fs       |
  |   SHAKE algorithm      : true (all bonds) |
  |   dump interval(Vbias) :    1.00 ps       |
  |   Vbias prefactor (k)  :  0.3600 Eh       |
  |   Vbias exponent (α)   :  0.8000 bohr⁻²   |
*MTD   2 completed successfully ...       20 min, 44.317 sec
  ::::::::::::: starting MTD    8 :::::::::::::
  |   MD simulation time   :    54.0 ps       |
  |   target T             :   300.0 K        |
  |   timestep dt          :     5.0 fs       |
  |   dump interval(trj)   :   100.0 fs       |
  |   SHAKE algorithm      : true (all bonds) |
  |   dump interval(Vbias) :    1.00 ps       |
  |   Vbias prefactor (k)  :  0.1080 Eh       |
  |   Vbias exponent (α)   :  0.4680 bohr⁻²   |
*MTD  14 completed successfully ...       20 min, 46.922 sec
  ::::::::::::: starting MTD   13 :::::::::::::
  |   MD simulation time   :    54.0 ps       |
  |   target T             :   300.0 K        |
  |   timestep dt          :     5.0 fs       |
  |   dump interval(trj)   :   100.0 fs       |
  |   SHAKE algorithm      : true (all bonds) |
  |   dump interval(Vbias) :    1.00 ps       |
  |   Vbias prefactor (k)  :  0.0720 Eh       |
  |   Vbias exponent (α)   :  0.1000 bohr⁻²   |
*MTD   1 completed successfully ...       20 min, 47.901 sec
  ::::::::::::: starting MTD    9 :::::::::::::
  |   MD simulation time   :    54.0 ps       |
  |   target T             :   300.0 K        |
  |   timestep dt          :     5.0 fs       |
  |   dump interval(trj)   :   100.0 fs       |
  |   SHAKE algorithm      : true (all bonds) |
  |   dump interval(Vbias) :    1.00 ps       |
  |   Vbias prefactor (k)  :  0.0540 Eh       |
  |   Vbias exponent (α)   :  0.4680 bohr⁻²   |
*MTD   4 completed successfully ...       20 min, 49.155 sec
  ::::::::::::: starting MTD   10 :::::::::::::
  |   MD simulation time   :    54.0 ps       |
  |   target T             :   300.0 K        |
  |   timestep dt          :     5.0 fs       |
  |   dump interval(trj)   :   100.0 fs       |
  |   SHAKE algorithm      : true (all bonds) |
  |   dump interval(Vbias) :    1.00 ps       |
  |   Vbias prefactor (k)  :  0.2160 Eh       |
  |   Vbias exponent (α)   :  0.2808 bohr⁻²   |
*MTD   6 completed successfully ...       20 min, 50.346 sec
  ::::::::::::: starting MTD   11 :::::::::::::
  |   MD simulation time   :    54.0 ps       |
  |   target T             :   300.0 K        |
  |   timestep dt          :     5.0 fs       |
  |   dump interval(trj)   :   100.0 fs       |
  |   SHAKE algorithm      : true (all bonds) |
  |   dump interval(Vbias) :    1.00 ps       |
  |   Vbias prefactor (k)  :  0.1080 Eh       |
  |   Vbias exponent (α)   :  0.2808 bohr⁻²   |
*MTD   5 completed successfully ...       20 min, 52.033 sec
  ::::::::::::: starting MTD   12 :::::::::::::
  |   MD simulation time   :    54.0 ps       |
  |   target T             :   300.0 K        |
  |   timestep dt          :     5.0 fs       |
  |   dump interval(trj)   :   100.0 fs       |
  |   SHAKE algorithm      : true (all bonds) |
  |   dump interval(Vbias) :    1.00 ps       |
  |   Vbias prefactor (k)  :  0.0540 Eh       |
  |   Vbias exponent (α)   :  0.2808 bohr⁻²   |
*MTD   7 completed successfully ...       20 min, 53.401 sec
*MTD   3 completed successfully ...       21 min, 13.603 sec
*MTD  13 completed successfully ...       18 min, 36.420 sec
*MTD   8 completed successfully ...       18 min, 39.749 sec
*MTD   9 completed successfully ...       18 min, 39.656 sec
*MTD  11 completed successfully ...       18 min, 45.447 sec
*MTD  12 completed successfully ...       18 min, 47.907 sec
*MTD  10 completed successfully ...       18 min, 54.249 sec
 
 ======================================
 |  Multilevel Ensemble Optimization  |
 ======================================
 Optimizing all 7546 structures from file "crest_dynamics.trj" ...
 ----------------------
 crude pre-optimization
 ----------------------
 Optimization engine: ANCOPT
 Hessian update type: BFGS
 E/G convergence criteria:  0.500E-03 Eh, 0.100E-01 Eh/a0
 |>0.0% |>5.0% |>10.0% |>15.0% |>20.0% |>25.0%

However, upon trying the same thing with ALPB implicit solvation, I encounter the same Intel MKL error and early terminations in the MD.
The error also seems to vanish when using only a single thread (i.e., serial execution of the program), which is all a bit odd...
It seems to be a parallelization/memory issue for large molecules.
I'll keep looking into it.

@moabe84 (Author) commented Apr 21, 2024

Thank you Philipp. I'm looking forward to hearing back from you.

@pprcht (Contributor) commented Apr 22, 2024

I have some new insight. It seems to be a memory issue for large molecules, related either to the MKL libraries or to the Intel compilers (or both). I can reproduce it for other large molecules with ease, and even with different levels of theory (GFN2, GFNFF).

GNU (gfortran) builds don't seem to suffer from it and large molecules can be calculated, although the builds in turn don't seem able to handle nested parallelism (as mentioned in #284). But then again, this is only relevant for the MD part.

I'll try and see if it is possible to circumvent the MKL issue somehow via the code or the build.

pprcht added the labels bug (Something isn't working) and parallelization (issue related to parallelization) on Apr 22, 2024
@moabe84 (Author) commented Apr 26, 2024

Hi Philipp,
I've tried the continuous release version, and it seems the issue has been resolved. Should I start using this version now, or is it better to wait until it's fully finalized?

Thanks,
Mostafa

@OkKakao commented Apr 28, 2024

I have some new insight. It seems to be a memory issue for large molecules, related either to the MKL libraries or to the Intel compilers (or both). I can reproduce it for other large molecules with ease, and even with different levels of theory (GFN2, GFNFF).

I get the same error message as @moabe84, but my situation is somewhat different.

My system works well with CREST 2.12, but if I use multiple threads via the -T option, one of the MTDs never starts.

I get stuck in an infinite loop, and nothing is written to the trajectory file; it's just empty.

@pprcht (Contributor) commented Apr 29, 2024

Hi, I spent a while looking into this issue, unfortunately without much actual success. A few comments:

  • I can confirm that there is an issue with the multi-layered parallelization via OpenMP, but it seems to be specific (at least in my attempts) to the CMake/ifort build and mainly affects large systems.
  • By multi-layered OpenMP processes I mean routines that run many jobs in parallel, like the aforementioned MTD part, or the geometry optimizations following it. With the affected builds, the shared-memory handling seems to be broken.
  • On the other hand, runtypes using only one calculator at a time (like standalone singlepoints, optimizations, MDs, ...) do not suffer from the same issue. The same is true when running the multi-layered processes in serial (-T 1).

At the moment I seem to be unable to find a workaround for the affected source code parts, although I will continue looking into it. After all, running multi-layered processes like the MDs and optimizations abuses the OpenMP functionality a bit; these would be much better suited for MPI parallelization, which I will address at some point in the future. For now, the advice I can give is (see the example after this list):

  • The CMake/GNU and meson/ifort (= continuous release) builds seem to be doing fine, which is consistent with the last message from @moabe84.
  • Make sure to provide OpenMP with enough memory for large systems (via the OMP_STACKSIZE environment variable) and be cautious about requesting too many threads with -T. Too many threads can cause significant thread-handling overhead on the OpenMP side, on top of the memory requirements.
  • When using versions >=3.0.1, try setting omp_nested=false in the toml input (this may be unrelated to this issue).
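As a concrete illustration of that advice (the values here are placeholders to adapt, not tested recommendations; the toml key is taken verbatim from the bullet above):

  export OMP_STACKSIZE=4G   # give each OpenMP thread a larger stack for big systems
  crest input.xyz -T 8      # moderate thread count instead of e.g. 120

and, for versions >=3.0.1, in the toml input:

  omp_nested = false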

I will upload the 3.0.1 hotfix this week, which addresses some other recent problems. I've included a warning about the CMake/ifort build in that version's README.md, referencing this issue.

@moabe84 (Author) commented Apr 30, 2024

Thank you so much, Philipp, for the updates. I've started using the continuous release builds and so far everything's working fine.

@moabe84 (Author) commented Apr 30, 2024

I just wanted to provide an update on this issue. I ran into the same problem when running calculations for a different, new structure. The issue seems to occur only for some systems.

I do have a question: will using version 3 yield better results in the QCG calculations compared to version 2.12? It seems I'll have to use version 2.12 for the time being, and I just want to make sure that it's fine.

@pprcht (Contributor) commented Apr 30, 2024

It is system-size dependent as far as I can tell, yes.

The QCG implementation is still the same in both versions, so it won't matter.

pprcht changed the title from "Issue with QCG calculations in CREST 3.0" to "Intel MKL ERROR: Parameter 6 was incorrect on entry to DLASWP" on Aug 6, 2024
pprcht pinned this issue on Aug 6, 2024
@pprcht (Contributor) commented Aug 6, 2024

An update on the parallelization issue, in particular the Intel MKL ERROR: Parameter 6 was incorrect on entry to DLASWP.

I'm continuing to investigate, although I have not found a conclusive answer. tblite had an update that addressed an MKL memory leak, and I have updated the submodule accordingly (#328). This update seems to have helped in some cases; unfortunately, I still got the MKL error in others.

However, I may have found another possible explanation within the parallel processing of calculations: reinitializing the calculator/wavefunction seems to help avoid the error. The MKL error occurs somewhere within the BLAS and LAPACK implementation, as DLASWP is not called directly in CREST or its subprojects; any part that processes larger matrices, like the wavefunction, could be the origin. My thinking is that if tblite calculations are not reinitialized between different molecules/conformers, there might be a mismatch in some algorithm/dimension within the linear-algebra part that causes the error, at least when the two molecules/conformers are quite different. Within the optimization loops, the motivation for not reinitializing was to save some compute time, assuming the wavefunctions of different conformers are similar enough to provide a good SCF starting guess. But maybe that can actually cause problems.
I'll continue to look into this and, if this actually turns out to be the cause, think about a more general avoidance strategy.
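To make the hypothesis above concrete, here is a minimal Fortran-flavored sketch; the names (calc, reset, singlepoint) are illustrative placeholders and do not correspond to CREST's or tblite's actual API:

  ! Reusing the converged wavefunction of the previous conformer as the
  ! SCF guess saves cycles, but may carry stale state between structures.
  ! The safer variant discussed above reinitializes before each one:
  do i = 1, nconformers
     call calc%reset()                               ! fresh calculator/wavefunction
     call calc%singlepoint(mol(i), energy, gradient) ! then compute E and dE/dR
  end do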

@pprcht (Contributor) commented Aug 19, 2024

After some back-and-forth, the issue seems to result from nested parallelism and a mismatch between OpenMP and MKL after all. I was not able to get rid of it entirely, but some changes in #331 made the code much more robust against this error (I had a reliably reproducible example to test on). Unfortunately, the OMP nested settings do not fully affect the MKL implementation and vice versa; it's a really complicated problem.
From what I can tell, a GNU/openblas build via CMake does not have this problem, but it is a bit slower w.r.t. the linear algebra (which doesn't matter too much in the default runtypes). I'm working on an updated GH workflow and a conda update to make this more easily available. I will draft a proper 3.0.2 release with this soon.

@coltonbh commented Aug 21, 2024

Hi @pprcht. When we get this error (here's mine for reference):

*MTD  13 completed successfully ...        3 min, 38.531 sec
*MTD   4 completed successfully ...        3 min, 58.353 sec
*MTD  11 completed successfully ...        4 min, 10.991 sec
*MTD   3 completed successfully ...        4 min, 11.186 sec
*MTD  12 completed successfully ...        4 min, 19.571 sec
*MTD   6 terminated with early ...        4 min, 23.957 sec
*MTD   9 completed successfully ...        4 min, 25.710 sec
*MTD   7 completed successfully ...        4 min, 28.931 sec
*MTD  10 completed successfully ...        4 min, 32.039 sec
*MTD   8 completed successfully ...        4 min, 32.993 sec
*MTD   2 completed successfully ...        4 min, 37.151 sec
*MTD   5 completed successfully ...        4 min, 38.266 sec

Intel MKL ERROR: Parameter 6 was incorrect on entry to DLASWP.
Intel MKL ERROR: Parameter 6 was incorrect on entry to DLASWP.
Intel MKL ERROR: Parameter 6 was incorrect on entry to DLASWP.
Intel MKL ERROR: Parameter 6 was incorrect on entry to DLASWP.
Intel MKL ERROR: Parameter 6 was incorrect on entry to DLASWP.
Intel MKL ERROR: Parameter 6 was incorrect on entry to DLASWP.
Intel MKL ERROR: Parameter 6 was incorrect on entry to DLASWP.
Intel MKL ERROR: Parameter 6 was incorrect on entry to DLASWP.
*MTD  14 completed successfully ...        4 min, 42.447 sec

What does this mean about the calculation? Are the results invalid, or are they just fine? My calculation throws this error but then does not exit; it seems to just hang for a very long time, so maybe the results will never get computed anyway? A single thread continues to run at 100%, but the calculation does not advance. How should we interpret this error, and is there a way to prevent it (setting MKL_NUM_THREADS=1, or some other fix)?

My googling/ChatGPT-ing around on this issue suggests it is a problem with the pivot indices for an LU decomposition inside the DLASWP function. My guess would be that a matrix is getting modified halfway through an LU decomposition, so the pivot indices are no longer valid...?
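For reference, the standard LAPACK interface of the routine is

  SUBROUTINE DLASWP( N, A, LDA, K1, K2, IPIV, INCX )

so "parameter 6" is IPIV, the array of pivot indices, which is at least consistent with this guess.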

To Reproduce:

XYZ file attached as .txt.

input.toml:

threads = 16
input = "structure.xyz"
runtype = "imtd-gc"

[calculation]
level = [
    { method = "gfn2", charge = -1, uhf = 0 },
]

structure.txt

Additional Details

This happens for me when doing a conformer search on a superstructure of 3 molecules (all part of a reaction complex). Perhaps this has something to do with having multiple molecules in the structure? During a conformer search I can see how this would make any initial wavefunction very inappropriate as an initial guess, because the atom centers may have moved dramatically from the starting frame. Just throwing that out there as a possible cause, based on your comments above.

@pprcht (Contributor) commented Aug 21, 2024

As explained above, this is most likely an issue with nested parallelism in OpenMP and MKL, and so far I have not found a definitive way of preventing it, other than not using an MKL-based build at all or running the program in serial.

Approaching the problem via DLASWP is not really an option, since neither CREST nor its subprojects call it directly; it is buried in some of LAPACK's linear-algebra calls. I've already implemented all the things previously suggested, such as resetting the wavefunction, turning nested parallelism on and off, etc. Frustratingly, nothing has fixed it entirely yet. If the calculation continues after hanging for a long time, my feeling is that most of the results are probably fine. If it affected more than a few energy+gradient calls, the error would be thrown much more frequently. The broken structures (if there are any) would be kicked out in the sorting.

@pprcht (Contributor) commented Aug 21, 2024

FYI all, the GitHub workflow is now updated so it also creates a GNU/openblas static binary which should not have the MKL issue.

Download (GNU)

@coltonbh commented Aug 22, 2024

I ran the exact same input as above and did not get the error this time, which makes me think it's a race condition of some sort; that would explain why it happens sporadically. Just adding that context.

Although I did not get the error printed at the same point in the conformer search, the run did ultimately end up frozen at some point (threads running for 24+ hours without making any visible progress in the stdout printouts). So perhaps the issue still occurred, just without the printout; if so, it happened at a very different point in the algorithm, which might still suggest a race condition.

@coltonbh

FYI all, the GitHub workflow is now updated so it also creates a GNU/openblas static binary which should not have the MKL issue.

Download (GNU)

Will give this a try! Thank you!

@moabe84 (Author) commented Aug 22, 2024

Hi,
I just tried the GNU/OpenBLAS static binary for the QCG calculations and encountered the following error. Both the solute and solvent input structures are in the run directory:

ERROR STOP error while reading input coordinates

Error termination. Backtrace:
#0 0xcee5f0 in ???
#1 0xcee899 in ???
#2 0xcef9a7 in ???
#3 0x845da4 in __strucrd_MOD_rdxmol
at /home/runner/work/crest/crest/src/strucreader.f90:1028
#4 0x846c43 in __strucrd_MOD_rdcoord
at /home/runner/work/crest/crest/src/strucreader.f90:936
#5 0x6a94d3 in qcg_grow
at /home/runner/work/crest/crest/src/qcg/solvtool.f90:661
#6 0x6b3e68 in crest_solvtool
at /home/runner/work/crest/crest/src/qcg/solvtool.f90:87
#7 0x40e52b in crest
at /home/runner/work/crest/crest/src/crest_main.f90:261
#8 0x408f7e in main
at /home/runner/work/crest/crest/src/crest_main.f90:26

@pprcht (Contributor) commented Aug 22, 2024

@moabe84 I would prefer to keep this separate from the MKL issue. STOP and ERROR STOP statements in the GNU builds attach a backtrace, so this is really the "error while reading input coordinates" stop. Grepping through the code reveals that it triggers because there is a mismatch between the number of atoms read from "best.xyz" and the expected number. Is the file best.xyz present in your run, and does its content make sense? CREST doesn't seem to write this file, so it must be written by either xtb or xtbiff; xtb in this case, according to the code.
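As a minimal sketch of the kind of consistency check that raises such a stop (illustrative only, assuming a standard .xyz reader; nat_read, nat_expected, unit, and io are placeholder names, not the actual strucreader.f90 variables):

  integer :: io, nat_read
  ! The first line of an .xyz file holds the atom count; stop if it
  ! cannot be parsed or disagrees with the expected number of atoms.
  read (unit, *, iostat=io) nat_read
  if (io /= 0 .or. nat_read /= nat_expected) then
     error stop 'error while reading input coordinates'
  end if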

@moabe84
Copy link
Author

moabe84 commented Aug 22, 2024

Here's the command line:
crest xtbopt.xyz --qcg water.xyz --nsolv 100 --T 24 --gsolv --nclus 10 --mreset 4 --alpb water --nofix
There is no "best.xyz" file. Both xtbopt.xyz and water.xyz are in the run directory. The xtbopt.xyz file, which was generated by the xtb optimization, looks correct. I should mention that this command runs successfully with version 2.12.

@pprcht (Contributor) commented Aug 22, 2024

Again, the file best.xyz should be written by xtb, which means it is likely a feature of aISS. aISS, in turn, was not interfaced to the 2.12 code, so if the command goes through with the old version, it is because the algorithm differs. Your best bet is either to contact Christoph Plett directly and ask under what conditions best.xyz is written in aISS, or to try switching to the xtbiff version.
