-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ORCA converges in 8 steps, geomeTRIC gets stuck #186
Comments
I've also run ORCA optimization with QN, it converged in 7 steps: GDIIS took 8 steps like RFO. So I suspect that the culprit may be incorrect handling of Hessian in geomeTRIC. In particular, I see that geomeTRIC by default gets rid of negative eigenvalues by regenerating guess Hessian, while ORCA continues despite having negative eigenvalue in Hessian. |
Thanks for sharing this. I agree geomeTRIC is not behaving properly in this calculation when you have provided a high-quality Hessian in the beginning. I would expect it to take several good steps if you have an exact initial Hessian, but it seems that Step 2 is already low quality. I am curious what happens if you restrict the step using I am seeing one other difference. In the beginning of the optimization, it appears several Hessian eigenvalues are negative:
However, this is not the behavior seen in ORCA as there is only one negative eigenvalue. There may be an issue with conversion of the Hessian into internal coordinates. If that is true, it's a more serious problem that needs fixing. I think this will require some time for me to diagnose, thanks a lot for sharing the data. |
Will check it now. BTW, at least in ORCA there is a clear separation between maximum step and trust radius. The former is a maximum for changes of individual internal coordinates, while trust radius limits length of step vector. Both limits can be set separately, though I didn't have much success with tight limits on MaxStep (I guess because it distorts step direction too much) |
With |
An issue I have observed in the past that could be related to what you're seeing is that sometimes the BFGS Hessian update results in the appearance of negative eigenvalues in the Hessian matrix. I think BFGS is supposed to preserve positive definiteness so there may be something going on there. If you could also let me know if geomeTRIC performs differently using a different coordinate system (try |
FWIW I see this happening in ORCA too, both with model and pre-computed hessians |
Interestingly, with
|
Results: Cartesian converged after step 100, DLC — after step 20. |
I've re-run DLC without |
I've found this in manual (https://geometric.readthedocs.io/en/latest/how-it-works.html#bfgs-hessian-update):
So, it looks like you were already aware of this kind of issue. |
Thanks a lot for this data. I think it's a clear example of a need for improving the optimization performance. I previously did not run into such an issue because I don't usually have an exact Hessian at the start of an energy minimization. It might take me a few weeks to resolve the problem, since it would require setting up the environment on my side, (possibly) making a test case that runs in a short time, and implementing the improvements themselves; the end of the term is getting pretty busy over here with graduate admissions and final exams, but this is important. |
Okay, no problem. Thanks! |
I have a few ideas for improving the performance or diagnosing problems for your job (or more generally, energy minimizations that have an exact Hessian at the start). You might be able to try some of these ideas before I can get to them.
Let me know if you're interested in trying these, otherwise, I should have more time by the end of this week to get to them.
|
Trying
To be fair my Hessian is not "exact" in strict sense. This molecule was optimized at B3LYP/6-31G** level, and starting geometry and Hessian are taken from there. In my understanding Hessian would be "exact" if it was computed at the same level of theory on starting geometry, in that case it would also likely have big negative eigenvalues with corresponding modes pointing towards minimum. This is not the case with my Hessian which is positively defined initially. |
I've tried xtb (GFN2) using ORCA (with the same optimizer settings as before) and geomeTRIC (with dlc coordinates) as optimizers with the same starting geometry and Hessian. As a result, geomeTRIC has converged faster: 31 step vs 43 steps. |
Does |
Correct, at the moment geomeTRIC can only request gradients from external codes. In the next version I plan on adding the ability to request energies and analytic Hessians (if available). |
Here are results of fdcheck/dlc run: |
Trying tsbfgs with this patch now: diff --git a/geometric/step.py b/geometric/step.py
index d8c755d..0548648 100644
--- a/geometric/step.py
+++ b/geometric/step.py
@@ -357,7 +357,7 @@ def update_hessian(IC, H0, xyz_seq, gradx_seq, params, trust_limit=False, max_up
if ts_bfgs: Hup = get_hessian_update_tsbfgs(Dy, Dg, H)
else: Hup = get_hessian_update_msp(Dy, Dg, H, verbose=params.verbose)
else:
- Hup = get_hessian_update_bfgs(Dy, Dg, H)
+ Hup = get_hessian_update_tsbfgs(Dy, Dg, H)
# Compute some diagnostics.
if params.verbose:
Eig = sorted_eigh(H, asc=True)[0] Update: needed to add following patch to avoid errors when code attempts to access diff --git a/geometric/step.py b/geometric/step.py
index d8c755d..8cf3424 100644
--- a/geometric/step.py
+++ b/geometric/step.py
@@ -278,8 +278,8 @@ def get_hessian_update_tsbfgs(Dy, Dg, H): # pragma: no cover
# yk = col(self.G - self.Gprev)
yk = Dg
dk = Dy
- jk = yk - np.dot(self.H, dk)
- B = force_positive_definite(self.H)
+ jk = yk - np.dot(H, dk)
+ B = force_positive_definite(H)
# Scalar 1: dk^T |Bk| dk
s1 = multi_dot([dk.T, B, dk])
# Scalar 2: (yk^T dk)^2 + (dk^T |Bk| dk)^2 |
Results for run with above patches: It didn't converge after 81 steps (unlike |
I've activated MSP updates with patch diff --git a/geometric/step.py b/geometric/step.py
index d8c755d..b8dad9d 100644
--- a/geometric/step.py
+++ b/geometric/step.py
@@ -357,7 +357,7 @@ def update_hessian(IC, H0, xyz_seq, gradx_seq, params, trust_limit=False, max_up
if ts_bfgs: Hup = get_hessian_update_tsbfgs(Dy, Dg, H)
else: Hup = get_hessian_update_msp(Dy, Dg, H, verbose=params.verbose)
else:
- Hup = get_hessian_update_bfgs(Dy, Dg, H)
+ Hup = get_hessian_update_msp(Dy, Dg, H, verbose=params.verbose)
# Compute some diagnostics.
if params.verbose:
Eig = sorted_eigh(H, asc=True)[0] Result: converged in 10 steps which is much better than anything above! It's still worse than ORCA (7-8 steps, uses BFGS), also it features 3 missteps and ends with two steps with positive energy changes. |
Interestingly, MSP at step 8 was really close by energy value to ORCA's final result ( |
Thanks a lot. I think the cause of the bad performance could be one of the following:
|
If you are interested I can record Hessian updates made by ORCA after each step so you could compare their implementation of BFGS with yours. Switching to MSP is indeed a bad idea for generic case. I've actually seen it slowing down convergence in a different case when I forgot that my geomeTRIC installation was patched.
I think my fdcheck.tar.gz I've attached above contained the complete log (as |
Sorry, I must have missed that you posted the log above. Thanks! At the moment I don't think we need to check the Hessian update vs. Orca. The step size and gradient in Orca will probably be different from the one in geomeTRIC, due to differences in how the IC system is constructed. If you want to do this, you would need to pass Orca's Hessian, step, and gradient change into geomeTRIC's Hessian update function, and check that it gives the same update. However I don't think that's the most likely cause of the problem. From reading the log, I can see the second derivs of internal coordinates w/r.t. Cartesian coordinates has good agreement with finite difference. However, the second derivs of the energy in internal coordinates has poor agreement with finite difference (see selected lines):
Errors of this size are big enough to cause negative eigenvalues in the IC Hessian to appear when there shouldn't be any. There might be a bug in the construction of the IC Hessian but I think we need to try two more things to be sure:
Thanks, and sorry for the trouble.
|
I've made another run with DLC, regular BFGS update, and |
My current hypothesis is that initial ICs are suboptimal and quickly become unsuitable for stable calculations, and that instability is causing missteps and re-appearance of negative Hessian eigenvalues. However, it still doesn't explain overall slow convergence in comparison to ORCA. |
Running fdcheck with the following patch now: diff --git a/geometric/ic_tools.py b/geometric/ic_tools.py
index e1bcf8d..00fbe55 100644
--- a/geometric/ic_tools.py
+++ b/geometric/ic_tools.py
@@ -126,6 +126,10 @@ def check_internal_hess(coords, molecule, IC, engine, dirname, verbose=0):
logger.info("Hessian Eigenvalues (Internal):\n")
for i in range(len(Eigq)):
logger.info("% 10.5f %s" % (Eigq[i], "\n" if i%9 == 8 else ""))
+ Eigqf = sorted(np.linalg.eigh(Hq_f)[0])
+ logger.info("Hessian Eigenvalues (Hq_f):\n")
+ for i in range(len(Eigqf)):
+ logger.info("% 10.5f %s" % (Eigqf[i], "\n" if i%9 == 8 else ""))
return Hq, Hq_f
def write_displacements(coords, M, IC, dirname, ic_select="all", displace_range=(-0.3, 0.3, 7), verbose=0):
diff --git a/geometric/optimize.py b/geometric/optimize.py
index 73d2733..c90dc17 100644
--- a/geometric/optimize.py
+++ b/geometric/optimize.py
@@ -959,10 +959,10 @@ def run_optimizer(**kwargs):
fdcheck = kwargs.get('fdcheck', False) # Check internal coordinate gradients using finite difference..
if fdcheck:
- IC.Prims.checkFiniteDifferenceGrad(coords)
- IC.Prims.checkFiniteDifferenceHess(coords)
- check_internal_grad(coords, M, IC.Prims, engine, dirname, verbose)
- check_internal_hess(coords, M, IC.Prims, engine, dirname, verbose)
+ IC.checkFiniteDifferenceGrad(coords)
+ IC.checkFiniteDifferenceHess(coords)
+ check_internal_grad(coords, M, IC, engine, dirname, verbose)
+ check_internal_hess(coords, M, IC, engine, dirname, verbose)
return
# Print out information about the coordinate system |
Results of fdcheck with patch above: |
Thanks a lot. From looking at your latest test, I can see a potential major issue with the IC Hessian. Here are lowest four eigenvalues of the IC Hessian generated by converting the Cartesian Hessian. In normal operation, geomeTRIC calculates the IC Hessian by converting a Cartesian Hessian that is either generated by finite-difference on the Cartesian gradient or supplied by the user. (Note that fdcheck always calculates the Cartesian Hessian so if you provided a Hessian file it is not used).
Here are the lowest four eigenvalues of the IC Hessian generated using finite-difference on the IC gradient.
This indicates there may be something wrong with the conversion of the Cartesian Hessian into the IC Hessian, i.e. this function: https://github.com/leeping/geomeTRIC/blob/master/geometric/internal.py#L1986 That function is supposed to equation 6 in the attached paper for converting the Cartesian Hessian to the IC Hessian. That equation requires calculating To resolve the issue, it would be nice to try running the fdcheck with the above patch using an optimized geometry with relatively tight criteria (say gmax ~ 1e-5). If there is still significant disagreement in the Hessian, then we know the problem isn't solely due to the second term in parentheses in Eq. 6, i.e. it could be the matrix multiplications "outside" the parentheses. I will also install ORCA to attempt to reproduce the issue on my side. |
I can think of a different approach — make a unit test for each part of calculation to ensure that all operations produce correct results. |
Some tests are located in https://github.com/leeping/geomeTRIC/blob/master/geometric/tests/test_hessian.py but evidently they did not catch this problem. One problem that complicates the standard unit test approach is that there are always numerical precision errors in comparing the finite difference vs. analytic gradient. For example, the fdcheck shows plenty of errors on the order of 1e-5 for the IC gradient, but I don't think that is a problem; on the other hand, I believe the appearance of negative eigenvalues in the IC Hessian is a real problem. I think to find the bug, we need to first determine if there is a problem in the |
FWIW that's what |
Okay, I see what you mean. I think the test of the IC Hessian should pass if the "absolute differences" in the final column (below) are less than 1e-4, maybe 1e-3 (I'm not sure on the absolute threshold to use). The differences are certainly a lot bigger right now.
|
Results of fdcheck at minimum point: fdcheck_minimum.tar.gz |
Thanks a lot. There is still a significant error. I will spend some time on reproducing the issue here and attempt a fix. |
I've gotten your example to run on my server. I'd like to know whether it's possible to increase the accuracy of the gradient ORCA further, in order to eliminate other possible issues. The output below indicates a gradient error on the order of 1e-4, and I'd like to make it one order of magnitude lower. Could you suggest which parameters I should adjust? (I'm not an expert ORCA user.)
|
It's possible to disable RIJCOSX by replacing Are you using ORCA 5.0.4? Manual says about version 5 that
BTW, it would be great if geomeTRIC could work correctly with lower quality gradients. There are other QC methods which can introduce gradient noise, e.g. solvation models, and there are methods which don't have analytical gradients at all. I think it could make sense to add debugging option to geomeTRIC that would add pseudorandom noise vector of chosen norm to gradient, and use it to ensure that it still can converge reasonably. |
Thanks for the tip! I'm using ORCA 5.0.1 here, the version is older than 5.0.4 because I installed it a while ago. I agree it would be great if geomeTRIC could converge reliably with low quality gradients but I am not aware of any software or code designed to do that. The underlying assumption of any gradient-based optimization code (that I know about) is that the gradient is "correct". There might be examples of optimization methods that use noisy gradients in the literature, if you or anyone could suggest some, I can look into it. It would be a major feature though. |
I guess it should be fine. IIRC the only major thing fixed recently was bug in D4, which is not used here.
However it seems like ORCA can cope well with RIJCOSX gradients (at least for small molecules like this one). Maybe it's because of better Hessian update, or maybe because of some fine tuning of RFO parameters. |
Those are good points. If there is a bug in the Hessian conversion or the Hessian update, it will cause convergence issues whether the gradient is exact or not, so the current priority is to find whether there's a bug and fix it. Tuning parameters could also improve performance, but one needs to tune them in a way that improves performance for the currently-bad cases like this one without worsening performance for other cases where it's working well. We can revisit that later. In the meantime, I got this result when comparing Hessian eigenvalues. Somehow, there isn't a significant difference in the smallest eigenvalues on my end. It could be because I was running on a different branch (interpolate). I will switch back to the master branch and try again. Some selected values are reproduced below.
|
Here are two minimizations, the former performed with built-in optimizer of ORCA 5.0.4 (redundant internal coordinates, RFO) and the latter by geomeTRIC (unmodified master +
0001-Okay-patch.patch.txt). This is the same system as in #180, but high-quality Hessian (B3LYP/6-31G**) was used from start to the end to aid convergence.
defgrid3 nofinalgridx
was used in both cases to improve quality of gradients as compared to default settings.In case of geomeTRIC I've used
--reset=False
option, otherwise high-quality Hessian would be dropped right in the beginning because of negative eigenvalues created by Hessian update. I've also used--subfrctor 2
to avoid whole system rotations with intent to keep Hessian more relevant for the current geometry.ORCA's inputs and outputs: PMDA_chair_dimer2_vdzp_orca.tar.gz
geomeTRIC's
command.sh
, inputs and outputs:PMDA_chair_dimer2_geometric.tar.gz
The text was updated successfully, but these errors were encountered: