[BUG] Incorrect buffer offset for vectorized computation #8308

Closed
wrongtest-intellif opened this issue Jun 23, 2021 · 3 comments
@wrongtest-intellif
Contributor

Hi there, we are working on vectorized computation in a tagged storage scope, and we ran into a problem with vectorize. An example is below:

# before vectorize
for (i: int32, 0, 1024) {
    C[i] = ((float32*)A[i] + (float32*)B[i])
}
 
# after vectorize (still correct)
C[ramp(0, 1, 1024)] = ((float32x1024*)A[ramp(0, 1, 1024)] + (float32x1024*)B[ramp(0, 1, 1024)])
 
# after storage rewrite (problematic)
Merged[ramp(0, 1, 1024)] = ((float32x1024*)Merged[ramp(0, 1, 1024)] + (float32x1024*)Merged[ramp(1, 1, 1024)])

# after storage rewrite (correct version if we do not vectorize)
for (i: int32, 0, 1024) {
    Merged[i] = ((float32*)Merged[i] + (float32*)Merged[(i + 1024)])
}
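For reference, a TIR ramp(base, stride, lanes) node denotes the vector of element indices [base, base+stride, ..., base+(lanes-1)*stride], so the problematic rewrite above touches elements 1..1024 of Merged instead of the intended 1024..2047. A minimal sketch of the semantics:

```python
def ramp(base, stride, lanes):
    # Expand a TIR Ramp node into the explicit list of element
    # indices it addresses.
    return [base + i * stride for i in range(lanes)]

# Buggy rewrite starts at element 1 of Merged ...
print(ramp(1, 1, 4))     # [1, 2, 3, 4]
# ... while B actually occupies elements 1024..2047.
print(ramp(1024, 1, 4))  # [1024, 1025, 1026, 1027]
```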

Here we use a tagged storage scope, so A, B, and C can be merged into a single buffer Merged, with A and C sharing the same region:

  • A -> Merged[0: 1024]
  • B -> Merged[1024: 2048]
  • C -> Merged[0: 1024]
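With this layout, remapping a scalar access is just adding the element offset of the sub-buffer inside Merged. A sketch of the expected arithmetic (the offset table and helper name are hypothetical, only to illustrate the mapping above):

```python
# Hypothetical element offsets of each original buffer inside Merged,
# matching the mapping listed above.
OFFSETS = {"A": 0, "B": 1024, "C": 0}

def remap_scalar(buf, i):
    # A scalar access buf[i] becomes Merged[offset + i].
    return OFFSETS[buf] + i

# B[i] becomes Merged[i + 1024], matching the correct scalar rewrite shown above.
print(remap_scalar("B", 0))  # 1024
```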

It seems that after buffer merging and buffer index remapping, the ramp node is incorrectly rewritten to
ramp(origin + 1, 1, 1024) instead of ramp(origin + 1024, 1, 1024).

The related implementation seems to be here:
https://github.com/apache/tvm/blob/main/src/tir/transforms/storage_rewrite.cc#L507-L512
where the offset is divided by a factor equal to the number of datatype lanes (1024 in our case).
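A simplified, hypothetical sketch of why that division goes wrong: a ramp base indexes individual elements, so dividing B's element offset of 1024 by the 1024 lanes collapses it to 1, which produces the ramp(origin + 1, ...) seen above.

```python
def remap_ramp_base(elem_offset, base, lanes):
    # Suspected buggy behaviour: the element offset is scaled down by
    # the number of vector lanes before being added to the ramp base.
    buggy = base + elem_offset // lanes
    # Expected behaviour: the ramp base addresses individual elements,
    # so the element offset must be added unscaled.
    correct = base + elem_offset
    return buggy, correct

# B's offset is 1024 elements; the vectorized access is a 1024-lane ramp at 0.
print(remap_ramp_base(1024, 0, 1024))  # (1, 1024)
```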

The code to reproduce the problem is as below:

import numpy as np

import tvm
from tvm import te
from tvm import testing


# Register memory info for the tagged storage scope so buffers in
# "global_mycategory" are planned by the storage rewrite pass.
@tvm.register_func("tvm.info.mem.global_mycategory")
def my_mem_info():
    return tvm.ir.make_node(
        "MemoryInfo",
        unit_bits=8,
        max_simd_bits=512,
        max_num_bits=99999,
        head_address=None,
    )

A = te.placeholder([1024], name="A")
B = te.placeholder([1024], name="B")
C = te.compute([1024], lambda i: A[i] + B[i], name="C")
s = te.create_schedule(C.op)

# Stage the inputs and output in the tagged scope so that storage
# rewrite merges them into a single buffer.
AA = s.cache_read(A, "global_mycategory", readers=[C])
BB = s.cache_read(B, "global_mycategory", readers=[C])
CC = s.cache_write(C, "global_mycategory")

# Vectorizing the cached write triggers the incorrect ramp offset.
s[CC].vectorize(s[CC].op.axis[0])

print(tvm.lower(s, [A, B, C]))
f = tvm.build(s, [A, B, C], "llvm")
arr_A = tvm.nd.array(np.linspace(0, 1024, 1024).astype("float32"))
arr_B = tvm.nd.array(np.linspace(1024, 2048, 1024).astype("float32"))
arr_C = tvm.nd.array(np.zeros([1024], dtype="float32"))
f(arr_A, arr_B, arr_C)
print(arr_A.asnumpy())
print(arr_B.asnumpy())
print(arr_C.asnumpy())

# Fails on the buggy build because C reads B from the wrong offset.
tvm.testing.assert_allclose(
    arr_C.asnumpy(), arr_A.asnumpy() + arr_B.asnumpy(), rtol=1e-5)
@tqchen
Member

tqchen commented Jun 24, 2021

Thanks @wrongtest for reporting the issue. Can you help suggest a fix and send a PR?

@tqchen
Member

tqchen commented Jun 24, 2021

also cc @vinx13 @masahi who can help manage the related PRs

@wrongtest-intellif
Contributor Author

Hi, there is a fix at #8338. @tqchen @vinx13 @masahi

@vinx13 vinx13 closed this as completed Jul 19, 2021