[Core Aten ops] Logs for fixing core aten ops coverage issues #5934

Closed
qihqi opened this issue Nov 29, 2023 · 3 comments

qihqi commented Nov 29, 2023

Let's use this issue as a space for sharing notes and steps for adding lowerings for missing core aten ops.


qihqi commented Nov 29, 2023

Issue being worked on: #5902

1. Uncomment and rerun the test:

LD_LIBRARY_PATH=/mnt/hanq/miniconda3/envs/torch310/lib/:/usr/lib/x86_64-linux-gnu/ PJRT_DEVICE=CPU XLA_STABLEHLO_COMPILE=1 XLA_HLO_DEBUG=1 XLA_IR_DEBUG=1 pytest test/test_core_aten_ops.py -k test_aten_tan_1

output:

=========================== short test summary info ============================
[torch_xla_diff:0.001] SUBFAIL test/test_core_aten_ops.py::AtenOpTest::test_aten_tan_1 - AssertionError: False is not true
[stablehlo_diff: 0.001] SUBFAIL test/test_core_aten_ops.py::AtenOpTest::test_aten_tan_1 - AssertionError: False is not true
================= 2 failed, 1 passed, 514 deselected in 5.51s ==================
I0000 00:00:1700690393.569658 2513762 tfrt_cpu_pjrt_client.cc:352] TfrtCpuClient destroyed.
(torch310) hanq@hanq-compile-2:/mnt/hanq/git/qihqi/pytorch/xla$

This means the op runs, but the accuracy does not meet the default tolerance.
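The same mismatch can also be reproduced in a few lines outside pytest, which makes iteration quicker. This is only a sketch: it assumes PJRT_DEVICE=CPU is exported and uses a made-up 10x10 float16 input to mirror the test above.

import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
x = torch.randn(10, 10, dtype=torch.float16)

res = torch.ops.aten.tan(x)                 # eager CPU reference
res_xla = torch.ops.aten.tan(x.to(device))  # same op lowered through torch_xla

# Largest absolute deviation between the two results
print(torch.max(torch.abs(res - res_xla.cpu())))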

2. Set a breakpoint at the comparison:

(torch310) hanq@hanq-compile-2:/mnt/hanq/git/qihqi/pytorch/xla$ git diff
diff --git a/test/test_core_aten_ops.py b/test/test_core_aten_ops.py
index 46a18494d..ff055ee38 100644
--- a/test/test_core_aten_ops.py
+++ b/test/test_core_aten_ops.py
@@ -36,6 +36,7 @@ def run_export_and_compare(testcase, func, args, kwargs, atol=1e-3):
                                      lambda x: x.to(device=device), kwargs)
       res_xla = func(*args2, **kwargs2)
       with testcase.subTest('torch_xla_diff:' + str(atol)):
+        import pdb; pdb.set_trace()
         diff_output(testcase, res, res_xla, atol)

3. Rerun and print out the difference:

(Pdb) p res - res_xla.cpu()
tensor([[ 0.0000e+00,  0.0000e+00, -4.8828e-04,  0.0000e+00,  0.0000e+00,
          0.0000e+00,  0.0000e+00,  6.1035e-05,  0.0000e+00,  0.0000e+00],
        [-4.8828e-04,  0.0000e+00,  0.0000e+00,  9.7656e-04,  0.0000e+00,
          1.2207e-04,  0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00],
        [ 0.0000e+00, -1.5259e-05,  0.0000e+00,  0.0000e+00,  0.0000e+00,
          4.8828e-04,  0.0000e+00,  0.0000e+00,  0.0000e+00,  1.2207e-04],
        [ 0.0000e+00,  2.4414e-04,  0.0000e+00, -1.9531e-03,  0.0000e+00,
          0.0000e+00,  0.0000e+00,  0.0000e+00, -3.0518e-05,  0.0000e+00],
        [ 0.0000e+00, -4.8828e-04, -2.4414e-04,  0.0000e+00,  0.0000e+00,
          0.0000e+00,  0.0000e+00, -6.1035e-05,  0.0000e+00,  0.0000e+00],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00, -1.9531e-03,
          0.0000e+00,  0.0000e+00,  1.9531e-03,  0.0000e+00,  0.0000e+00],
        [ 0.0000e+00,  0.0000e+00, -1.9531e-03,  0.0000e+00,  0.0000e+00,
          2.4414e-04,  9.7656e-04,  1.2207e-04,  0.0000e+00,  0.0000e+00],
        [ 4.8828e-04,  0.0000e+00,  0.0000e+00, -7.8125e-03,  1.2207e-04,
         -9.7656e-04,  0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00],
        [ 0.0000e+00,  1.5625e-02,  0.0000e+00,  0.0000e+00, -4.8828e-04,
         -1.2207e-04,  0.0000e+00,  0.0000e+00, -4.8828e-04, -3.9062e-03],
        [ 0.0000e+00, -1.2207e-04,  0.0000e+00,  0.0000e+00,  0.0000e+00,
          0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00]],
       dtype=torch.float16)

The result looks close enough. This means we are probably being too strict in the test; setting a larger tolerance will likely make it pass.

(Pdb) p torch.max(torch.abs(res - res_xla.cpu()))
tensor(0.0156, dtype=torch.float16)

Printing out the maximum absolute difference shows that an atol of roughly 0.01, together with a slightly larger rtol, will probably work:

(Pdb) torch.allclose(res, res_xla.cpu(), atol=0.01, rtol=0.001)
True
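Given that, the fix is simply to relax the tolerance for this test case. A minimal sketch of what the updated test could look like, assuming the test body follows the usual pattern in test_core_aten_ops.py and that run_export_and_compare keeps the atol keyword shown in the diff above (if an rtol knob is wanted, the harness would need to grow one; the value here comes from the measurement above):

def test_aten_tan_1(self):
  args = (torch.randn((10, 10)).to(torch.float16),)
  kwargs = dict()
  # float16 tan accumulates up to ~1.6e-2 absolute error on this input,
  # so the default atol=1e-3 is too strict for this op.
  run_export_and_compare(self, torch.ops.aten.tan, args, kwargs, atol=0.01)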

4. Now it's time to send a PR:
#5915

wonjoolee95 commented

Working on issue: #5934


Doing a quick check of the differences (res - res_xla), we can see that the results are pretty much equal:

WONJOO: at diff_output, output1-output2_cpu=tensor([[        nan,         nan,         nan,         nan,  0.0000e+00,
          0.0000e+00,         nan,         nan,  0.0000e+00,         nan],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,
                 nan,         nan,         nan,  0.0000e+00,  0.0000e+00],
        [ 0.0000e+00,         nan,         nan,  0.0000e+00,  0.0000e+00,
                 nan,         nan,  0.0000e+00,  0.0000e+00,  0.0000e+00],
        [ 0.0000e+00,         nan,         nan,  0.0000e+00,         nan,
          0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00,  0.0000e+00],
        [        nan, -1.4901e-08,         nan,  0.0000e+00,  0.0000e+00,
          0.0000e+00,         nan,         nan,         nan,  0.0000e+00],
        [-3.7253e-09,  0.0000e+00,         nan,         nan,  0.0000e+00,
                 nan,         nan,  0.0000e+00,  2.9802e-08,  0.0000e+00],
        [        nan,         nan,  0.0000e+00,         nan,         nan,
          0.0000e+00,  0.0000e+00,  0.0000e+00,         nan,  0.0000e+00],
        [        nan,  1.1921e-07,         nan,         nan,         nan,
                 nan,         nan,  0.0000e+00,         nan,  0.0000e+00],
        [        nan,  0.0000e+00,         nan,  0.0000e+00,         nan,
                 nan,         nan,  0.0000e+00,  0.0000e+00,  0.0000e+00],
        [ 0.0000e+00,  0.0000e+00,  0.0000e+00,         nan,  0.0000e+00,
                 nan,  0.0000e+00,         nan,         nan,  0.0000e+00]])

But we see a bunch of NaNs, which is expected for the aten_log op, as ln(x) is undefined for x <= 0. Looking at torch.allclose's documentation (https://pytorch.org/docs/stable/generated/torch.allclose.html), there is a flag called equal_nan that defaults to False. If set to True, this flag considers two NaNs as equal, which is what we want at least for this aten_log op.

Note that equal_nan should stay False by default; only for specific ops such as aten_log do we want to set it to True.
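For reference, this is how equal_nan changes the comparison (a tiny illustration with made-up values, not the test's actual tensors):

import torch

a = torch.tensor([float('nan'), 1.0])
b = torch.tensor([float('nan'), 1.0])

print(torch.allclose(a, b))                  # False: NaN never equals NaN by default
print(torch.allclose(a, b, equal_nan=True))  # True: NaNs in matching positions count as equal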

wonjoolee95 commented

Closing as all issues under this label have been resolved.
