
[Bug] Inconsistency caused by 127i8*127i8*numi64 after using compute_inline #12378

Closed
cxx122 opened this issue Aug 11, 2022 · 3 comments

cxx122 commented Aug 11, 2022

TENSOR_0 = te.compute([2, 2], lambda pcg, wcv: te.max_value("int8"), name="TENSOR_0")
TENSOR_1 = te.placeholder([2, 2], dtype="int64", name="TENSOR_1")
TENSOR_2 = te.compute([2, 2], lambda zcv, tcu: TENSOR_0[zcv, tcu] * TENSOR_0[zcv, tcu] * TENSOR_1[zcv, tcu], name="TENSOR_2")

The TIR program before compute_inline:

@main = primfn(TENSOR_0_1: handle, TENSOR_1_1: handle, TENSOR_2_1: handle) -> ()
  attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
  buffers = {TENSOR_0: Buffer(TENSOR_0_2: Pointer(int8), int8, [4], []),
             TENSOR_1: Buffer(TENSOR_1_2: Pointer(int64), int64, [4], []),
             TENSOR_2: Buffer(TENSOR_2_2: Pointer(int64), int64, [4], [])}
  buffer_map = {TENSOR_0_1: TENSOR_0, TENSOR_1_1: TENSOR_1, TENSOR_2_1: TENSOR_2}
  preflattened_buffer_map = {TENSOR_0_1: TENSOR_0_3: Buffer(TENSOR_0_2, int8, [2, 2], []), TENSOR_1_1: TENSOR_1_3: Buffer(TENSOR_1_2, int64, [2, 2], []), TENSOR_2_1: TENSOR_2_3: Buffer(TENSOR_2_2, int64, [2, 2], [])} {
  for (pcg: int32, 0, 2) {
    for (wcv: int32, 0, 2) {
      TENSOR_0[((pcg*2) + wcv)] = 127i8
    }
  }
  for (zcv: int32, 0, 2) {
    for (tcu: int32, 0, 2) {
      let cse_var_1: int32 = ((zcv*2) + tcu)
      TENSOR_2[cse_var_1] = (cast(int64, (TENSOR_0[cse_var_1]*TENSOR_0[cse_var_1]))*TENSOR_1[cse_var_1])
    }
  }
}

The TIR program after compute_inline:

@main = primfn(TENSOR_0_1: handle, TENSOR_1_1: handle, TENSOR_2_1: handle) -> ()
  attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
  buffers = {TENSOR_0: Buffer(TENSOR_0_2: Pointer(int8), int8, [4], []),
             TENSOR_1: Buffer(TENSOR_1_2: Pointer(int64), int64, [4], []),
             TENSOR_2: Buffer(TENSOR_2_2: Pointer(int64), int64, [4], [])}
  buffer_map = {TENSOR_0_1: TENSOR_0, TENSOR_1_1: TENSOR_1, TENSOR_2_1: TENSOR_2}
  preflattened_buffer_map = {TENSOR_0_1: TENSOR_0_3: Buffer(TENSOR_0_2, int8, [2, 2], []), TENSOR_1_1: TENSOR_1_3: Buffer(TENSOR_1_2, int64, [2, 2], []), TENSOR_2_1: TENSOR_2_3: Buffer(TENSOR_2_2, int64, [2, 2], [])} {
  for (zcv: int32, 0, 2) {
    for (tcu: int32, 0, 2) {
      let cse_var_1: int32 = ((zcv*2) + tcu)
      TENSOR_2[cse_var_1] = (TENSOR_1[cse_var_1]*16129i64)
    }
  }
}
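
The difference between the two lowerings is the width in which 127 * 127 is evaluated. Before inlining, the product is computed in int8 and overflows before the cast to int64; on the run reported below it wraps to 1 (16129 mod 256), so TENSOR_2 ends up equal to TENSOR_1. After inlining, the constant is folded in int64 as 16129, so TENSOR_2 becomes TENSOR_1 * 16129. Below is a minimal NumPy sketch of the two behaviors (not TVM's lowering, just the same arithmetic; the TENSOR_1 values are taken from the failing run shown under "Actual behavior"):

import numpy as np

a = np.full((2, 2), 127, dtype=np.int8)         # TENSOR_0 after the first loop
b = np.array([[0, 1], [3, 4]], dtype=np.int64)  # TENSOR_1 values from the failing run

# Pre-inline lowering: the int8 product wraps to 1 before the cast to int64.
pre = (a * a).astype(np.int64) * b              # -> [[0 1] [3 4]]

# Post-inline lowering: 127 * 127 is folded as the int64 constant 16129.
post = np.int64(127) * np.int64(127) * b        # -> [[0 16129] [48387 64516]]

print(pre)
print(post)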

Actual behavior

AssertionError: 
Not equal to tolerance rtol=1e-05, atol=1e-07

Mismatched elements: 3 / 4 (75%)
Max absolute difference: 64512
Max relative difference: 0.999938
 x: array([[0, 1],
       [3, 4]])
 y: array([[    0, 16129],
       [48387, 64516]])

Environment

Operating System: Ubuntu 18.04
TVM version: tag 0.9.0 [d361585]

Steps to reproduce

import numpy as np
import tvm
from tvm import te
import tvm.testing

TENSOR_0 = te.compute([2, 2], lambda pcg, wcv: te.max_value("int8"), name="TENSOR_0")
TENSOR_1 = te.placeholder([2, 2], dtype="int64", name="TENSOR_1")
TENSOR_2 = te.compute([2, 2], lambda zcv, tcu: TENSOR_0[zcv, tcu] * TENSOR_0[zcv, tcu] * TENSOR_1[zcv, tcu], name="TENSOR_2")
s = te.create_schedule(TENSOR_2.op)
tensor_list = [TENSOR_0, TENSOR_1, TENSOR_2]

dev = tvm.cpu(0)
pre_list = []
after_list = []
for tensor in tensor_list:
    # Use concrete integer extents; fall back to 1 for symbolic dimensions.
    shape = [x.value if 'value' in dir(x) and isinstance(x.value, int) else 1 for x in tensor.shape]
    params = (5 * np.random.uniform(size=shape)).astype(tensor.dtype)
    pre_list.append(tvm.nd.array(params.copy(), dev))
    after_list.append(tvm.nd.array(params.copy(), dev))

# Build and run before applying the schedule primitive.
pre_mod = tvm.lower(s, tensor_list, simple_mode=True)
with tvm.transform.PassContext(opt_level=4):
    f = tvm.build(pre_mod)
f(*pre_list)

# Schedule: inline TENSOR_0 into TENSOR_2.
s[TENSOR_0].compute_inline()

# Build and run again after compute_inline.
now_mod = tvm.lower(s, tensor_list, simple_mode=True)
with tvm.transform.PassContext(opt_level=4):
    f = tvm.build(now_mod)
f(*after_list)

# Both runs compute the same expression, so their TENSOR_2 outputs should match.
tvm.testing.assert_allclose(pre_list[2].numpy(), after_list[2].numpy(), rtol=1e-5)

ganler commented Aug 17, 2022

Not sure whether this is a real bug or undefined behavior similar to #12377. In the C standard (with which I would guess TVM aims to be consistent), signed integer overflow is undefined behavior. Therefore, if any signed overflow happens, the results are not guaranteed to be "consistent".


cxx122 commented Aug 18, 2022

I just tried it, and it may be the same issue as #12377. Some optimizations may not be safe to apply when an overflow occurs.


ganler commented Aug 18, 2022

Thanks for confirming the UB. This level of guarantee (i.e., consistency in the presence of UB) is usually not provided by TVM, nor by traditional compilers that have been engineered for decades, because of its cost. Maybe you can tune the fuzzer a bit to filter out false alarms caused by invalid numerics as well as integer overflows. :-)
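
As an illustration of that kind of filter, here is a minimal sketch; the helper may_overflow below is hypothetical and not part of TVM or any existing fuzzer:

import numpy as np

def may_overflow(values, dtype):
    # Flag a test case whose intermediate values fall outside the
    # representable range of the intermediate dtype.
    info = np.iinfo(dtype)
    values = np.asarray(values)
    return bool(np.any(values < info.min) or np.any(values > info.max))

# 127 * 127 = 16129 does not fit in int8, so this case would be skipped.
print(may_overflow([127 * 127], np.int8))  # True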
