Add relative threshold while testing inversing #8973

Merged 1 commit into master on Aug 20, 2022
Conversation

@Yipeng1994 (Contributor) commented Aug 20, 2022

Matrix inversion is an operation that requires extremely delicate care.
Special treatment, such as QR factorization, is needed when the matrix is ill-conditioned.
One simple example:

>>> a = np.mat([[0.4, 0.8001], [0.3, 0.6]])
>>> a.I
matrix([[-19999.99999999,  26669.99999998],
        [  9999.99999999, -13333.33333332]])
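
For reference, the condition number of that matrix can be checked directly; it tells you roughly how much a relative perturbation of the input is amplified in the inverse. A minimal NumPy sketch (not part of the test code):

```python
import numpy as np

a = np.array([[0.4, 0.8001], [0.3, 0.6]])
print(np.linalg.cond(a))   # on the order of 1e4: the matrix is nearly singular

# A perturbation of 1e-6 in a single entry changes the inverse by roughly 1e2.
b = a.copy()
b[0, 1] += 1e-6
print(np.max(np.abs(np.linalg.inv(a) - np.linalg.inv(b))))
```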

I understand that QR factorization or singular value decomposition, as well as any other robust matrix inversion technique, might be asking too much, but at the very least we need to loosen the threshold used when testing matrix inversion.

The current test /oneflow/python/oneflow/test/modules/test_inv.py fails with a probability of roughly 1/6 (1 failure out of 6 runs).

Using @autotest(n=1000) raises the failure rate to 100% (6 failures out of 6 runs).
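
For context on why the gradients in the failed examples below reach values of 1e2 to 1e4: the standard backward formula for matrix inverse is grad_A = -inv(A)^T @ grad_out @ inv(A)^T, so the gradient magnitude, and with it the gap between two implementations, grows roughly with ||inv(A)||^2. A hedged sketch (the sampling below is illustrative; the test harness draws its own random inputs):

```python
import numpy as np

def inv_backward(a, grad_out):
    # d(inv(A)) = -inv(A) @ dA @ inv(A)  =>  dL/dA = -inv(A).T @ dL/dY @ inv(A).T
    inv_a = np.linalg.inv(a)
    return -inv_a.T @ grad_out @ inv_a.T

rng = np.random.default_rng(0)
a = rng.uniform(-1.0, 1.0, size=(3, 3))
grad_out = rng.uniform(-1.0, 1.0, size=(3, 3))

# The closer a is to singular, the larger the gradient entries become
# (roughly proportional to cond(a)**2).
print(np.linalg.cond(a))
print(np.max(np.abs(inv_backward(a, grad_out))))
```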

Some failed examples:

Tensor([3, 3]).to(cuda)
linalg_inv(Tensor([3, 3]))
Tensor([3, 3]).backward(Tensor([3, 3]))
-----------------------------------------------------------
This program has 1 input tensor: 
Shape[3, 3]
tensor([[-0.3783,  0.6652,  0.7347],
        [ 0.5621, -0.5812, -0.8161],
        [-0.9072, -0.2409,  0.3697]], requires_grad=True)
-----------------------------------------------------------
---------Grad Shape--------
(3, 3)
(3, 3)
Grads are not equal. PyTorch grad: 
[[ 1.6151541e+02 -2.0440527e+02  2.8227338e+02]
 [ 1.2237598e+02 -1.5728748e+02  2.2142242e+02]
 [-3.6738663e+00  3.3087006e+00  4.9804688e-02]]
, OneFlow grad: 
[[ 1.6151517e+02 -2.0440511e+02  2.8227298e+02]
 [ 1.2237585e+02 -1.5728735e+02  2.2142215e+02]
 [-3.6738594e+00  3.3086803e+00  4.9820259e-02]]
F..
======================================================================
FAIL: test_inv_3by3_with_random_data (__main__.TestLinalgInv)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/liyipeng/oneflow/python/oneflow/test_utils/automated_test_util/torch_flow_dual_object.py", line 1275, in new_f
    test_case.assertTrue(
AssertionError: False is not true : PyTorch object:
tensor([[-0.3783,  0.6652,  0.7347],
        [ 0.5621, -0.5812, -0.8161],
        [-0.9072, -0.2409,  0.3697]], requires_grad=True)

OneFlow object:
tensor([[-0.3783,  0.6652,  0.7347],
        [ 0.5621, -0.5812, -0.8161],
        [-0.9072, -0.2409,  0.3697]], dtype=oneflow.float32,
       grad_fn=<accumulate_grad>)

----------------------------------------------------------------------
Ran 3 tests in 29.627s
Tensor([1, 5, 4, 4]).to(cpu)
linalg_inv(Tensor([1, 5, 4, 4]))
Tensor([1, 5, 4, 4]).backward(Tensor([1, 5, 4, 4]))
-----------------------------------------------------------
This program has 0 input tensor: 
---------Grad Shape--------
(1, 5, 4, 4)
(1, 5, 4, 4)
Grads are not equal. PyTorch grad: 
[[[[-4.65521698e+01 -2.71171844e+02 -3.41911652e+02 -8.51887741e+01]
   [-1.25220251e+00  1.59806156e+01  2.16551609e+01  6.06580114e+00]
   [-1.60665874e+01 -1.01248917e+02 -1.27097038e+02 -3.24005127e+01]
   [-8.36951904e+01 -5.38389282e+02 -6.83052979e+02 -1.71473633e+02]]

  [[ 4.32749420e-01  9.42310765e-02  8.96222740e-02  1.32365465e+00]
   [ 5.29575348e-02  1.76484537e+00 -1.03522265e+00 -3.05497265e+00]
   [-2.15416431e-01 -2.36049557e+00  1.56413865e+00  2.84060001e+00]
   [ 2.49766493e+00  4.92702341e+00 -3.95438075e-01  6.32508755e-01]]

  [[-5.60792685e-02 -7.17838645e-01 -4.33617210e+00  1.33587563e+00]
   [ 9.83020723e-01 -1.06119394e-01  2.80733967e+00 -1.52282453e+00]
   [ 1.13506043e+00  8.13995481e-01  2.56936073e+00 -1.95242929e+00]
   [ 6.07015848e-01  7.24248886e-02 -3.91491938e+00 -3.72836709e-01]]

  [[-4.70855141e+00 -1.09308243e-01 -1.48735809e+00 -5.98532867e+00]
   [-2.29591968e+03  4.16507227e+03  7.48318848e+03  1.44038457e+04]
   [ 6.94184766e+03 -1.25190430e+04 -2.24846953e+04 -4.32550234e+04]
   [ 4.86320215e+03 -8.75112988e+03 -1.57159219e+04 -3.02282266e+04]]

  [[ 2.55540580e-01 -1.14453837e-01  1.47593722e-01 -3.50318521e-01]
   [ 3.02729082e+00 -2.52486277e+00 -1.41736627e+00  4.90831017e-01]
   [ 1.85220146e+00 -1.43423986e+00 -1.39060760e+00  4.84482646e-01]
   [-3.32398713e-01  1.31413257e+00  3.93355519e-01 -4.45561081e-01]]]]
, OneFlow grad: 
[[[[-4.65521851e+01 -2.71171875e+02 -3.41911713e+02 -8.51888123e+01]
   [-1.25219285e+00  1.59806614e+01  2.16552334e+01  6.06581879e+00]
   [-1.60665932e+01 -1.01248947e+02 -1.27097084e+02 -3.24005318e+01]
   [-8.36952362e+01 -5.38389404e+02 -6.83053284e+02 -1.71473755e+02]]

  [[ 4.32748944e-01  9.42309499e-02  8.96222144e-02  1.32365441e+00]
   [ 5.29606044e-02  1.76484573e+00 -1.03522217e+00 -3.05497003e+00]
   [-2.15418443e-01 -2.36049557e+00  1.56413817e+00  2.84059882e+00]
   [ 2.49766588e+00  4.92702293e+00 -3.95437658e-01  6.32509589e-01]]

  [[-5.60792312e-02 -7.17838764e-01 -4.33617020e+00  1.33587515e+00]
   [ 9.83020246e-01 -1.06118940e-01  2.80733824e+00 -1.52282417e+00]
   [ 1.13506019e+00  8.13995361e-01  2.56936026e+00 -1.95242858e+00]
   [ 6.07016087e-01  7.24245161e-02 -3.91491747e+00 -3.72836977e-01]]

  [[-4.71032810e+00 -1.06056117e-01 -1.48148429e+00 -5.97390556e+00]
   [-2.29598462e+03  4.16518604e+03  7.48339160e+03  1.44042461e+04]
   [ 6.94204980e+03 -1.25193984e+04 -2.24853301e+04 -4.32562734e+04]
   [ 4.86334424e+03 -8.75137988e+03 -1.57163682e+04 -3.02291074e+04]]

  [[ 2.55540639e-01 -1.14454016e-01  1.47593722e-01 -3.50318611e-01]
   [ 3.02729082e+00 -2.52486300e+00 -1.41736627e+00  4.90831107e-01]
   [ 1.85220098e+00 -1.43423963e+00 -1.39060760e+00  4.84482616e-01]
   [-3.32398593e-01  1.31413257e+00  3.93355787e-01 -4.45560992e-01]]]]
F
======================================================================
FAIL: test_inv_random_square_with_random_data (__main__.TestLinalgInv)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/liyipeng/oneflow/python/oneflow/test_utils/automated_test_util/torch_flow_dual_object.py", line 1275, in new_f
    test_case.assertTrue(
AssertionError: False is not true : PyTorch object:
tensor([[[[-0.2352,  0.9168, -0.7041,  0.2248],
          [ 0.1014, -0.7882,  0.5368,  0.4170],
          [-0.2696, -0.5297,  0.6678, -0.7738],
          [ 0.1932, -0.3529,  0.2615,  0.0697]],

         [[-0.3393,  0.5726,  0.8393, -0.5488],
          [-0.5826,  0.9323, -0.8257, -0.5832],
          [ 0.2835,  0.8376, -0.8545, -0.8949],
          [ 0.8430, -0.0065, -0.1107, -0.8425]],

         [[ 0.7822, -0.1643,  0.6387,  0.9623],
          [ 0.2658,  0.7110,  0.3318,  0.7943],
          [-0.7222, -0.9264,  0.5820, -0.1784],
          [-0.2293,  0.5952, -0.0253, -0.2249]],

         [[ 0.0216,  0.9170,  0.9748, -0.7578],
          [-0.2142, -0.4402,  0.8749, -0.3553],
          [ 0.0645, -0.4060,  0.7393, -0.2541],
          [-0.1840,  0.3850, -0.6396,  0.1955]],

         [[-0.8927, -0.9675, -0.3301, -0.2418],
          [-0.4759, -0.3266, -0.6548,  0.1417],
          [ 0.6373, -0.0540,  0.1246, -0.8636],
          [ 0.8255,  0.1997,  0.7166,  0.9452]]]], requires_grad=True)

OneFlow object:
tensor([[[[-0.2352,  0.9168, -0.7041,  0.2248],
          [ 0.1014, -0.7882,  0.5368,  0.4170],
          [-0.2696, -0.5297,  0.6678, -0.7738],
          [ 0.1932, -0.3529,  0.2615,  0.0697]],

         [[-0.3393,  0.5726,  0.8393, -0.5488],
          [-0.5826,  0.9323, -0.8257, -0.5832],
          [ 0.2835,  0.8376, -0.8545, -0.8949],
          [ 0.8430, -0.0065, -0.1107, -0.8425]],

         [[ 0.7822, -0.1643,  0.6387,  0.9623],
          [ 0.2658,  0.7110,  0.3318,  0.7943],
          [-0.7222, -0.9264,  0.5820, -0.1784],
          [-0.2293,  0.5952, -0.0253, -0.2249]],

         [[ 0.0216,  0.9170,  0.9748, -0.7578],
          [-0.2142, -0.4402,  0.8749, -0.3553],
          [ 0.0645, -0.4060,  0.7393, -0.2541],
          [-0.1840,  0.3850, -0.6396,  0.1955]],

         [[-0.8927, -0.9675, -0.3301, -0.2418],
          [-0.4759, -0.3266, -0.6548,  0.1417],
          [ 0.6373, -0.0540,  0.1246, -0.8636],
          [ 0.8255,  0.1997,  0.7166,  0.9452]]]], dtype=oneflow.float32,
       grad_fn=<accumulate_grad>)

----------------------------------------------------------------------
Ran 3 tests in 31.385s

The second example shows that a relative threshold of 1e-3 is not enough, so we use 1e-2 as the relative threshold.
An absolute threshold is useless when the matrix is ill-conditioned, so we remove the absolute threshold.
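
In other words, the intended check is relative-only, something along these lines (the actual comparison call and parameter names inside the autotest utility may differ):

```python
import numpy as np

def grads_close(torch_grad, flow_grad, rtol=1e-2):
    # Relative threshold only: an absolute tolerance is meaningless when the
    # gradient entries range from ~1e-2 to ~1e4, as in the logs above.
    return np.allclose(torch_grad, flow_grad, rtol=rtol, atol=0.0)
```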

@simonJJJ (Contributor) commented:

good catch!

@jackalcooper jackalcooper merged commit 0d91cf6 into master Aug 20, 2022
@jackalcooper jackalcooper deleted the fix-inv_ci-bug branch August 20, 2022 05:27