Dantzig LCP solver nan values #892

pchorak · 2017-07-14T18:34:41Z

Hello,
I have come across an issue while using DART with Gazebo in which the Dantzig LCP solver returns nans, causing the simulation to fail. I am using the Gazebo default branch and DART master branch. Here is a Gazebo world file (sdf) to reproduce the issue. In the example, two boxes anchored to the world with parallel revolute joints collide. matrices.txt contains the debug printout from DantzigLCPSolver..cpp. If I replace the boxes with spheres, do not align the revolute axes, or use the projected Gauss-Seidel solver then the issue does not occur.

The text was updated successfully, but these errors were encountered:

pchorak · 2017-07-14T19:22:02Z

Here is an example using DART without Gazebo: collideBoxes.zip

pchorak · 2017-07-15T00:30:13Z

I see that pull request #881 relates to contact constraints and nan values as well. I checked the normals of the of the contacts at this point. They all appear to be of approximately length 1. There are a number of contact constraints (10) and they point in many different directions.

pchorak · 2017-07-18T16:50:17Z

Digging further into DantzigLCPSolver.cpp, it appears that the ODE solver dSolveLCP is being used to handle a system of linear equalities Ax=b with bounds as opposed to an LCP. The issue occurs when the system does not have a solution. The matrix A is singular (30x30 with rank 11) but almost has a solution because of linearly dependent equations. Using the pseudo-inverse (ignoring bounds for now) gives a solution with error of 0.001 in a few elements, similar to the PGS solution.

While the contact normals look ok, some of the contact locations are causing the inconsistent equations. The GIF below shows the contacts visualized in Gazebo and the simulation failure.

If I apply the diff below to ignore the invalid contact with the lower z position, the simulation continues for more steps. However, the other contacts begin to drift and the simulation fails eventually as shown below.

diff --git a/dart/constraint/ConstraintSolver.cpp b/dart/constraint/ConstraintSolver.cpp
index 0407973..e0877b0 100644
--- a/dart/constraint/ConstraintSolver.cpp
+++ b/dart/constraint/ConstraintSolver.cpp
@@ -426,7 +426,8 @@ DART_SUPPRESS_DEPRECATED_END
   {
     contactConstraint->update();
 
-    if (contactConstraint->isActive())
+    if (contactConstraint->isActive() &&
+        (contactConstraint->mContacts[0]->point(2) < 0.9))
       mActiveConstraints.push_back(contactConstraint);
   }

jslee02 · 2017-07-18T17:02:50Z

Thank you for investigating this issue. I'm currently on vacation so might not be able to take a look at this for the next few weeks. However, any update would be much appreciated.

mxgrey · 2017-07-18T17:05:26Z

I can help look into this. Enjoy your vacation, JS!

pchorak · 2017-07-18T17:06:16Z

I see two ways to keep the simulation from failing.

Use a more robust solver such as PGS that returns an approximate solution when given inconsistent equations. LP solvers may also be an option since the LCP capabilities of dSolveLCP appear unused.
Fix the contact point generation.

Option 2 may be preferable since invalid contacts points are not good. However, option 1 could be more general if inconsistent equations crop up in other contact patterns.

rosebudflyaway · 2017-08-26T00:24:14Z

@scpeters FYI
I spent some more time investigating this issue. Originally I have had the same thoughts with @pchorak on the two potential root causes of this issue: either collision detection return invalid contacts, which results in an invalid problem; or the solver is not robust enough to solve this problem

From the gif that Peter posted, it is already clear that there is contact within the body itself, which would fail the simulation if and only if the solver returns a contact force close to 0 at this invalid contact. For PGS, I suppose most of the case, initial iteration, or initial guess is 0, which is actually helpful in this case.

Now after investigation, we should change the title of this issue, because this is a collision detection issue, not solver. I tried with the following combinations:
(1) ODE collision detection + quick_step (a modified PGS, different from the PGS in DART? )
(2) ODE collision detection + world step (Dantzig)
(3) DART collision detection (FCL) + PGS
(4) DART collision detection (FCL) + Dantzig

The solvers in case (2) and (4) are exactly the same since DART uses the implementation from ODE as well. Here are the simulation results:
(1) 2 contacts at the 2 vertexes of the two boxes, simulation runs smoothly for minutes, no crash
(2) 2 contacts at the 2 vertexes of the two boxes, simulation runs smoothly for minutes, no crash
(3) multiple contacts, 1 of them is inside the box, the simulation survives even this invalid contact exists (I suppose the solver returns force of 0 at this invalid contact, can be easily verified)
(4) multiple contacts, 1 of them is always inside the box, as shown in the gif, and simulation crashes.

Therefore, (2) and (4) are sufficient proof that the PCL returned invalid contacts, which results in the NaN in the Dantzig solution. I think it would not worth the effort to fix the solver at this time. But rather improve the FCL.

rosebudflyaway · 2017-08-26T00:26:07Z

@scpeters @pchorak This further motivates the ignition libraries and flexible switches between collision detection, solver, etc., so that we can identify which part caused the issue quickly 👍

scpeters · 2017-08-28T17:49:58Z

It does sound like the problem is coming from FCL, but it also seems that the Dantzig solver is less robust to bad inputs. Is there a way to detect when invalid contact points are given? Maybe we could compute the signed distance from a contact point to each collision shape? It could be a useful utility for debugging collision detection issues.

FYI @JenniferBuehler this issue may be interesting to you

JenniferBuehler · 2017-09-05T08:55:03Z

This is an interesting issue. I will only have more time to help look into this in October, but from what you are writing above I'd also say that it would make sense to fix the contact points as Peter suggests and as Ying has shown in the test (good proof btw). However making the solver more robust (at the very least with assertions to catch critical inputs) would be helpful as well. IMO generated NaN values should always be caught somewhere (ideally as close to the cause as possible), before they cause follow-up issues.

If this is not already solved by October when I will get back to work, I can help look into this as well.

mentisn4b · 2017-11-08T11:17:24Z

Hi, I'm also running into the problem of nonsensical contact points using the Bullet collision detector. Situation is: two cylinders colliding with each other at 90 degrees (in the shape of an X); should return one contact point roughly in the middle of the X. Most of the time this goes OK, but eventually Bullet collision detector returns one or more points that are outside the intersection of the AABBs of the colliding bodies. So clearly nonsensical. My codebase is fairly big so it's hard to make a minimal working example, but I can try if this would help.

First idea is to compute the AABB intersection between the two colliding bodies, and cull all contact points that lie outside it. But clearly it's just a workaround and not really a solution.

Phenomenon occurs with Bullet collision margin of 0 as well as 1E-3, 1E-6.

Any ideas how this might happen and how we can address it properly?

JenniferBuehler · 2017-11-08T13:17:04Z

I've had that off-shape contact point with bullet in the collision_benchmark tests. It would be next on my TODO list to start debugging the tests which fail with the collision_benchmark. If I find anything I will comment on it here.

mentisn4b · 2017-11-09T14:49:39Z

Hi, some more observations. I printed all collision shapes and their transformations after getting the collision result back from Bullet, and they all work out, so the problem is not there. (Anyway, the code that interfaces DART to the Bullet collision detector is pretty small and simple.)

The Bullet collision detector is very sensitive though; there are several epsilons defined in Bullet's GJK code. Earlier I had decreased some of these margins; recompiling with a fresh clone of the Bullet repository leads to the error sometimes not occurring, sometimes later in the simulation.

Some more insight from Erwin Coumans (also note: GJK penetration depth is only valid for small penetrations, i.e. less than the margins):

So, it might not be a bug per se, just a shortcoming of the GJK/EPA algorithms? (Any experts on those subjects?)

jslee02 · 2019-03-22T22:23:40Z

It seems the root causes are:
(1) FCL returns contactacts that leads to infeasible constraint problem
(2) Dantiz is sensitive to infeasible constraint problem

(1) may be the same with other collision detectors as well. What FCL makes the behavior worse is that it returns more contact points.

There are two possible solutions on top of my head now.

(a) Let the Dantzig solver falls back to using PGS solver when there is no solution, which can be detected by observing the s term in the implementation.
(b) Introduce contact manifold, which is the same concept of Bullet, to manage the contact points between two colliding objects. This might bring two positive effects to us: (i) limiting the maximum number of contacts so less computational load to constraint solver and (ii) maintaining the best contact points by discarding bad contact points.

In addition to that, (b) will allow us to use the convexity based collision algorithms for primitive shapes that is more efficient than using meshes. The main reason we don't use the FCL's primitive shapes (that uses the convexity based collision algorithms) is that they only return a single contact at a time. Contact manifold will aggregate contact pointes over time so that they can be used in dynamic simulation.

jslee02 · 2019-03-23T22:30:10Z

#1265 is a partial solution to this issue.

I'll leave this issue open because I think the Dantzig solver shouldn't return true when the solution includes NAN values. Also, we need to investigate why FCL returns contact points in the middle of the boxes.

@pchorak

### Problem Dantzig LCP solver sometimes returns inaccurate solution or even NAN values when the LCP doesn't have a solution. More importantly, `BoxedLCPConstraintSolver` uses the solution without any sanity checks, which easily leads to unstable simulation. This is reported in #892. ### Solution Let `BoxedLCPConstraintSolver` to use more robust LCP solver such as PGS when it fails to solve with the primary LCP solver. In addition to that, as a final resort, it discards NAN value solution when the secondary LCP solver failed as well. ### Testing Added the regression test provided by @pchorak. Thanks! *** **Before creating a pull request** - [x] Document new methods and classes - [x] Format new code files using `clang-format` **Before merging a pull request** - [x] Set version target by selecting a milestone on the right side - [x] Summarize this change in `CHANGELOG.md` - [x] Add unit test(s) for this change

stale bot added the stale label Feb 13, 2018

jslee02 added the status: on hold label Feb 13, 2018

stale bot removed the stale label Feb 13, 2018

jslee02 removed the status: on hold label Apr 8, 2018

dartsim deleted a comment from stale bot Apr 27, 2018

jslee02 added a commit that referenced this issue Mar 23, 2019

Add regression test for #892

13ad417

jslee02 mentioned this issue Mar 23, 2019

Allow BoxedLcpConstraintSolver to have secondary LCP solver #1265

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dantzig LCP solver nan values #892

Dantzig LCP solver nan values #892

pchorak commented Jul 14, 2017 •

edited

Loading

pchorak commented Jul 14, 2017

pchorak commented Jul 15, 2017

pchorak commented Jul 18, 2017

jslee02 commented Jul 18, 2017

mxgrey commented Jul 18, 2017

pchorak commented Jul 18, 2017

rosebudflyaway commented Aug 26, 2017

rosebudflyaway commented Aug 26, 2017

scpeters commented Aug 28, 2017

JenniferBuehler commented Sep 5, 2017

mentisn4b commented Nov 8, 2017

JenniferBuehler commented Nov 8, 2017

mentisn4b commented Nov 9, 2017 •

edited

Loading

jslee02 commented Mar 22, 2019

jslee02 commented Mar 23, 2019 •

edited

Loading

Dantzig LCP solver nan values #892

Dantzig LCP solver nan values #892

Comments

pchorak commented Jul 14, 2017 • edited Loading

pchorak commented Jul 14, 2017

pchorak commented Jul 15, 2017

pchorak commented Jul 18, 2017

jslee02 commented Jul 18, 2017

mxgrey commented Jul 18, 2017

pchorak commented Jul 18, 2017

rosebudflyaway commented Aug 26, 2017

rosebudflyaway commented Aug 26, 2017

scpeters commented Aug 28, 2017

JenniferBuehler commented Sep 5, 2017

mentisn4b commented Nov 8, 2017

JenniferBuehler commented Nov 8, 2017

mentisn4b commented Nov 9, 2017 • edited Loading

jslee02 commented Mar 22, 2019

jslee02 commented Mar 23, 2019 • edited Loading

pchorak commented Jul 14, 2017 •

edited

Loading

mentisn4b commented Nov 9, 2017 •

edited

Loading

jslee02 commented Mar 23, 2019 •

edited

Loading