Add relative threshold while testing inversing #8973

Merged 1 commit into master on Aug 20, 2022
Conversation

@Yipeng1994 (Contributor) commented Aug 20, 2022

Matrix inversion is an operation that requires extremely delicate care.
Special treatment, such as QR factorization, is needed when the matrix is ill-conditioned.
One simple example:

>>> a = np.mat([[0.4, 0.8001], [0.3, 0.6]])
>>> a.I
matrix([[-19999.99999999,  26669.99999998],
        [  9999.99999999, -13333.33333332]])
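
For reference, the condition number of that matrix can be checked directly; it tells you roughly how much a relative perturbation of the input is amplified in the inverse. A minimal NumPy sketch (not part of the test code):

```python
import numpy as np

a = np.array([[0.4, 0.8001], [0.3, 0.6]])
print(np.linalg.cond(a))   # on the order of 1e4: the matrix is nearly singular

# A perturbation of 1e-6 in a single entry changes the inverse by roughly 1e2.
b = a.copy()
b[0, 1] += 1e-6
print(np.max(np.abs(np.linalg.inv(a) - np.linalg.inv(b))))
```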

I understand that QR factorization or singular value decomposition, as well as any other robust matrix inversion technique, might be asking too much, but at the very least we need to loosen the threshold used when testing matrix inversion.

The current test /oneflow/python/oneflow/test/modules/test_inv.py fails with a probability of roughly 1/6 (1 failure out of 6 runs).

Using @autotest(n=1000) raises the failure rate to 100% (6 failures out of 6 runs).
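
For context on why the gradients in the failed examples below reach values of 1e2 to 1e4: the standard backward formula for matrix inverse is grad_A = -inv(A)^T @ grad_out @ inv(A)^T, so the gradient magnitude, and with it the gap between two implementations, grows roughly with ||inv(A)||^2. A hedged sketch (the sampling below is illustrative; the test harness draws its own random inputs):

```python
import numpy as np

def inv_backward(a, grad_out):
    # d(inv(A)) = -inv(A) @ dA @ inv(A)  =>  dL/dA = -inv(A).T @ dL/dY @ inv(A).T
    inv_a = np.linalg.inv(a)
    return -inv_a.T @ grad_out @ inv_a.T

rng = np.random.default_rng(0)
a = rng.uniform(-1.0, 1.0, size=(3, 3))
grad_out = rng.uniform(-1.0, 1.0, size=(3, 3))

# The closer a is to singular, the larger the gradient entries become
# (roughly proportional to cond(a)**2).
print(np.linalg.cond(a))
print(np.max(np.abs(inv_backward(a, grad_out))))
```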

Some failed examples:

Tensor([3, 3]).to(cuda)
linalg_inv(Tensor([3, 3]))
Tensor([3, 3]).backward(Tensor([3, 3]))
-----------------------------------------------------------
This program has 1 input tensor: 
Shape[3, 3]
tensor([[-0.3783,  0.6652,  0.7347],
        [ 0.5621, -0.5812, -0.8161],
        [-0.9072, -0.2409,  0.3697]], requires_grad=True)
-----------------------------------------------------------
---------Grad Shape--------
(3, 3)
(3, 3)
Grads are not equal. PyTorch grad: 
[[ 1.6151541e+02 -2.0440527e+02  2.8227338e+02]
 [ 1.2237598e+02 -1.5728748e+02  2.2142242e+02]
 [-3.6738663e+00  3.3087006e+00  4.9804688e-02]]
, OneFlow grad: 
[[ 1.6151517e+02 -2.0440511e+02  2.8227298e+02]
 [ 1.2237585e+02 -1.5728735e+02  2.2142215e+02]
 [-3.6738594e+00  3.3086803e+00  4.9820259e-02]]
F..
======================================================================
FAIL: test_inv_3by3_with_random_data (__main__.TestLinalgInv)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/liyipeng/oneflow/python/oneflow/test_utils/automated_test_util/torch_flow_dual_object.py", line 1275, in new_f
    test_case.assertTrue(
AssertionError: False is not true : PyTorch object:
tensor([[-0.3783,  0.6652,  0.7347],
        [ 0.5621, -0.5812, -0.8161],
        [-0.9072, -0.2409,  0.3697]], requires_grad=True)

OneFlow object:
tensor([[-0.3783,  0.6652,  0.7347],
        [ 0.5621, -0.5812, -0.8161],
        [-0.9072, -0.2409,  0.3697]], dtype=oneflow.float32,
       grad_fn=<accumulate_grad>)

----------------------------------------------------------------------
Ran 3 tests in 29.627s
Tensor([1, 5, 4, 4]).to(cpu)
linalg_inv(Tensor([1, 5, 4, 4]))
Tensor([1, 5, 4, 4]).backward(Tensor([1, 5, 4, 4]))
-----------------------------------------------------------
This program has 0 input tensor: 
---------Grad Shape--------
(1, 5, 4, 4)
(1, 5, 4, 4)
Grads are not equal. PyTorch grad: 
[[[[-4.65521698e+01 -2.71171844e+02 -3.41911652e+02 -8.51887741e+01]
   [-1.25220251e+00  1.59806156e+01  2.16551609e+01  6.06580114e+00]
   [-1.60665874e+01 -1.01248917e+02 -1.27097038e+02 -3.24005127e+01]
   [-8.36951904e+01 -5.38389282e+02 -6.83052979e+02 -1.71473633e+02]]

  [[ 4.32749420e-01  9.42310765e-02  8.96222740e-02  1.32365465e+00]
   [ 5.29575348e-02  1.76484537e+00 -1.03522265e+00 -3.05497265e+00]
   [-2.15416431e-01 -2.36049557e+00  1.56413865e+00  2.84060001e+00]
   [ 2.49766493e+00  4.92702341e+00 -3.95438075e-01  6.32508755e-01]]

  [[-5.60792685e-02 -7.17838645e-01 -4.33617210e+00  1.33587563e+00]
   [ 9.83020723e-01 -1.06119394e-01  2.80733967e+00 -1.52282453e+00]
   [ 1.13506043e+00  8.13995481e-01  2.56936073e+00 -1.95242929e+00]
   [ 6.07015848e-01  7.24248886e-02 -3.91491938e+00 -3.72836709e-01]]

  [[-4.70855141e+00 -1.09308243e-01 -1.48735809e+00 -5.98532867e+00]
   [-2.29591968e+03  4.16507227e+03  7.48318848e+03  1.44038457e+04]
   [ 6.94184766e+03 -1.25190430e+04 -2.24846953e+04 -4.32550234e+04]
   [ 4.86320215e+03 -8.75112988e+03 -1.57159219e+04 -3.02282266e+04]]

  [[ 2.55540580e-01 -1.14453837e-01  1.47593722e-01 -3.50318521e-01]
   [ 3.02729082e+00 -2.52486277e+00 -1.41736627e+00  4.90831017e-01]
   [ 1.85220146e+00 -1.43423986e+00 -1.39060760e+00  4.84482646e-01]
   [-3.32398713e-01  1.31413257e+00  3.93355519e-01 -4.45561081e-01]]]]
, OneFlow grad: 
[[[[-4.65521851e+01 -2.71171875e+02 -3.41911713e+02 -8.51888123e+01]
   [-1.25219285e+00  1.59806614e+01  2.16552334e+01  6.06581879e+00]
   [-1.60665932e+01 -1.01248947e+02 -1.27097084e+02 -3.24005318e+01]
   [-8.36952362e+01 -5.38389404e+02 -6.83053284e+02 -1.71473755e+02]]

  [[ 4.32748944e-01  9.42309499e-02  8.96222144e-02  1.32365441e+00]
   [ 5.29606044e-02  1.76484573e+00 -1.03522217e+00 -3.05497003e+00]
   [-2.15418443e-01 -2.36049557e+00  1.56413817e+00  2.84059882e+00]
   [ 2.49766588e+00  4.92702293e+00 -3.95437658e-01  6.32509589e-01]]

  [[-5.60792312e-02 -7.17838764e-01 -4.33617020e+00  1.33587515e+00]
   [ 9.83020246e-01 -1.06118940e-01  2.80733824e+00 -1.52282417e+00]
   [ 1.13506019e+00  8.13995361e-01  2.56936026e+00 -1.95242858e+00]
   [ 6.07016087e-01  7.24245161e-02 -3.91491747e+00 -3.72836977e-01]]

  [[-4.71032810e+00 -1.06056117e-01 -1.48148429e+00 -5.97390556e+00]
   [-2.29598462e+03  4.16518604e+03  7.48339160e+03  1.44042461e+04]
   [ 6.94204980e+03 -1.25193984e+04 -2.24853301e+04 -4.32562734e+04]
   [ 4.86334424e+03 -8.75137988e+03 -1.57163682e+04 -3.02291074e+04]]

  [[ 2.55540639e-01 -1.14454016e-01  1.47593722e-01 -3.50318611e-01]
   [ 3.02729082e+00 -2.52486300e+00 -1.41736627e+00  4.90831107e-01]
   [ 1.85220098e+00 -1.43423963e+00 -1.39060760e+00  4.84482616e-01]
   [-3.32398593e-01  1.31413257e+00  3.93355787e-01 -4.45560992e-01]]]]
F
======================================================================
FAIL: test_inv_random_square_with_random_data (__main__.TestLinalgInv)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/liyipeng/oneflow/python/oneflow/test_utils/automated_test_util/torch_flow_dual_object.py", line 1275, in new_f
    test_case.assertTrue(
AssertionError: False is not true : PyTorch object:
tensor([[[[-0.2352,  0.9168, -0.7041,  0.2248],
          [ 0.1014, -0.7882,  0.5368,  0.4170],
          [-0.2696, -0.5297,  0.6678, -0.7738],
          [ 0.1932, -0.3529,  0.2615,  0.0697]],

         [[-0.3393,  0.5726,  0.8393, -0.5488],
          [-0.5826,  0.9323, -0.8257, -0.5832],
          [ 0.2835,  0.8376, -0.8545, -0.8949],
          [ 0.8430, -0.0065, -0.1107, -0.8425]],

         [[ 0.7822, -0.1643,  0.6387,  0.9623],
          [ 0.2658,  0.7110,  0.3318,  0.7943],
          [-0.7222, -0.9264,  0.5820, -0.1784],
          [-0.2293,  0.5952, -0.0253, -0.2249]],

         [[ 0.0216,  0.9170,  0.9748, -0.7578],
          [-0.2142, -0.4402,  0.8749, -0.3553],
          [ 0.0645, -0.4060,  0.7393, -0.2541],
          [-0.1840,  0.3850, -0.6396,  0.1955]],

         [[-0.8927, -0.9675, -0.3301, -0.2418],
          [-0.4759, -0.3266, -0.6548,  0.1417],
          [ 0.6373, -0.0540,  0.1246, -0.8636],
          [ 0.8255,  0.1997,  0.7166,  0.9452]]]], requires_grad=True)

OneFlow object:
tensor([[[[-0.2352,  0.9168, -0.7041,  0.2248],
          [ 0.1014, -0.7882,  0.5368,  0.4170],
          [-0.2696, -0.5297,  0.6678, -0.7738],
          [ 0.1932, -0.3529,  0.2615,  0.0697]],

         [[-0.3393,  0.5726,  0.8393, -0.5488],
          [-0.5826,  0.9323, -0.8257, -0.5832],
          [ 0.2835,  0.8376, -0.8545, -0.8949],
          [ 0.8430, -0.0065, -0.1107, -0.8425]],

         [[ 0.7822, -0.1643,  0.6387,  0.9623],
          [ 0.2658,  0.7110,  0.3318,  0.7943],
          [-0.7222, -0.9264,  0.5820, -0.1784],
          [-0.2293,  0.5952, -0.0253, -0.2249]],

         [[ 0.0216,  0.9170,  0.9748, -0.7578],
          [-0.2142, -0.4402,  0.8749, -0.3553],
          [ 0.0645, -0.4060,  0.7393, -0.2541],
          [-0.1840,  0.3850, -0.6396,  0.1955]],

         [[-0.8927, -0.9675, -0.3301, -0.2418],
          [-0.4759, -0.3266, -0.6548,  0.1417],
          [ 0.6373, -0.0540,  0.1246, -0.8636],
          [ 0.8255,  0.1997,  0.7166,  0.9452]]]], dtype=oneflow.float32,
       grad_fn=<accumulate_grad>)

----------------------------------------------------------------------
Ran 3 tests in 31.385s

The second example shows that a relative threshold of 1e-3 is not enough, so we use 1e-2 as the relative threshold.
An absolute threshold is useless when the matrix is ill-conditioned, so we remove the absolute threshold.
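
In other words, the intended check is relative-only, something along these lines (the actual comparison call and parameter names inside the autotest utility may differ):

```python
import numpy as np

def grads_close(torch_grad, flow_grad, rtol=1e-2):
    # Relative threshold only: an absolute tolerance is meaningless when the
    # gradient entries range from ~1e-2 to ~1e4, as in the logs above.
    return np.allclose(torch_grad, flow_grad, rtol=rtol, atol=0.0)
```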

@simonJJJ (Contributor) commented:

good catch!

@jackalcooper jackalcooper merged commit 0d91cf6 into master Aug 20, 2022
@jackalcooper jackalcooper deleted the fix-inv_ci-bug branch August 20, 2022 05:27