
[PASS] InstrumentBoundCheckers pass #2079

Merged · 1 commit merged into apache:master on Nov 30, 2018

Conversation

@denis0x0D (Contributor) commented Nov 8, 2018

The patch is related to issue:
https://discuss.tvm.ai/t/array-bounds-checking/944

Let's take a simple example with an out-of-bounds access:

n = tvm.var("n")

A = tvm.placeholder((n, n, n), name='A')
B = tvm.placeholder((n, n, n), name='B')
C = tvm.compute(A.shape, lambda i, j, k: A[i + 300][j][k] + B[i][j][k], name='C')
s = tvm.create_schedule(C.op)

We will get IR like this:

 produce C {
    for (i, 0, n) {
      for (j, 0, n) {
        for (k, 0, n) {
          C[((((i*n) + j)*n) + k)] = (A[(((((i + 300)*n) + j)*n) + k)] + B[((((i*n) + j)*n) + k)])
        }
      }
    }
  }

This is an out-of-bounds access; the code could cause a segfault or read "junk" memory, since TVM has a native runtime.

Could we, for example in debug mode, instrument every memory access with a simple check and an assert? Then we can handle the failure at runtime and make sure our model does not access any invalid memory.

produce C {
   for (i, 0, n) {
     for (j, 0, n) {
       for (k, 0, n) {
         if (((((((((i + 300)*n) + j)*n) + k) < ((n*n)*n)) && (((((i*n) + j)*n) + k) < ((n*n)*n))) && (((((i*n) + j)*n) + k) < ((n*n)*n)))) {
           C[((((i*n) + j)*n) + k)] = (A[(((((i + 300)*n) + j)*n) + k)] + B[((((i*n) + j)*n) + k)])
         } else {
           assert(((((((((i + 300)*n) + j)*n) + k) < ((n*n)*n)) && (((((i*n) + j)*n) + k) < ((n*n)*n))) && (((((i*n) + j)*n) + k) < ((n*n)*n))), "OUT OF BOUNDS")
           1
         }
       }
     }
   }
 }

So I wrote a pass that instruments a bounds check before each potentially invalid memory access.

The pass can be enabled with the option instrument_bound_checkers=True (a usage sketch is given below).
The idea is simple: first we associate every buffer_var with the actual Buffer shape (as far as I understand, in TVM terms a buffer_var is like a pointer to the actual Buffer), and then a dedicated pass inserts a check before each memory access.
A new buffer can be created - we check it.
Some buffers can grow in size via Allocate - we check that too.
An optimization pass may insert the intrinsic tvm.if_then_else - we check that as well.
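For illustration, usage could look roughly like this (a sketch only, assuming the option is exposed through tvm.build_config and that the inserted assert surfaces as a tvm.TVMError at runtime):

import tvm
import numpy as np

n = 21
A = tvm.placeholder((n,), name='A')
B = tvm.compute((n,), lambda i: A[i + 300], name='B')  # deliberately out of bounds
s = tvm.create_schedule(B.op)

# Ask the lowering pipeline to insert the bound checks.
with tvm.build_config(instrument_bound_checkers=True):
    f = tvm.build(s, [A, B], "llvm")

ctx = tvm.cpu(0)
a = tvm.nd.array(np.random.uniform(size=n).astype(A.dtype), ctx)
b = tvm.nd.array(np.zeros(n, dtype=B.dtype), ctx)
try:
    f(a, b)  # should trip the inserted assert at runtime
except tvm.TVMError as e:
    print("caught out-of-bounds access:", e)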

I think it could solve the problem mentioned in the issue.

Could someone please review it and give me feedback?
Thanks.

@denis0x0D denis0x0D force-pushed the sandbox/bound_checkers branch 5 times, most recently from 0240240 to ff7d970 Compare November 9, 2018 14:37
@tqchen (Member) commented Nov 9, 2018

Thanks for the contribution, please request reviews

@denis0x0D (Contributor, Author)

Hi @tqchen, can you please review the patch?
Thanks.

@tqchen (Member) commented Nov 9, 2018

I would love to and will do it when I have time :) but because I cannot personally review every PR, please also request reviews from other reviewers :)

@denis0x0D (Contributor, Author)

@tqchen got it :)
Hi @tmoreau89 @ZihengJiang @yzhliu, can you please review the patch?
Thanks.

@ajtulloch (Contributor) commented Nov 10, 2018

One thing I'm curious about - have you considered instead (or in addition) adding an option for generating code for LLVM backends with address sanitizer? It seems pretty simple to add (https://github.com/halide/Halide/blob/1d471591077939fda9d9cbb77f92c6322cf28206/src/CodeGen_LLVM.cpp#L1094-L1104), and would catch these OOB accesses, along with a lot more kinds of bugs (https://clang.llvm.org/docs/AddressSanitizer.html).

I noticed you've contributed to LLVM's sanitizers in the past, so I wonder if you agree this is useful here?

@denis0x0D (Contributor, Author)

@ajtulloch thanks for pointing this out.
At first I was thinking about just adding ASan's pass to the backend, but IMHO we should be really careful with it, because it may not be a generic solution:

  1. ASan, alongside the instrumentation pass, requires a runtime part which allocates shadow memory and manages it (poison/unpoison and so on). The placement of the shadow memory depends on your OS and architecture, and we can hit really hard bugs with it.
     I personally faced some issues:
     a. ASan's runtime was not initialized before the instrumented binary/library, which caused a segfault.
     b. GDB maps a binary built with PIE onto ASan's shadow on 32-bit ARM Tizen OS.
     c. A broken "fast unwinder" on 32-bit Thumb: https://gcc.gnu.org/ml/gcc-patches/2018-09/msg00312.html
     And so on.
  2. ASan officially supports 32-bit and 64-bit ARM only on Android, so other OSes require a patched version of ASan's runtime.
  3. GPU support? It depends on how memory is allocated; see for example https://devtalk.nvidia.com/default/topic/1037466/cuda-programming-and-performance/cuda-runtime-library-and-addresssanitizer-incompatibilty/

So, IMHO, every architecture TVM supports (and every new one) would require a custom ASan runtime; if we only need x86/x64, though, it should be fine.
Thanks.

@tmoreau89 (Contributor)
Overall this is very good work @denis0x0D! Does it support bound checking on vectorized loops?

@denis0x0D (Contributor, Author)

@tqchen @tmoreau89 thanks for the review, I've updated the patch according to your suggestions.

@tmoreau89 (Contributor)

Great, thank you for addressing my comments @denis0x0D!

@tqchen (Member) commented inline on the new BoundCheckerManager singleton (full comment below):

namespace tvm {
namespace ir {
struct BoundCheckerManager {
  std::unordered_map<const Variable *, Array<Expr>> mem_to_buffer;
@tqchen (Member) commented Nov 11, 2018

The use of a global singleton to pass information between passes seems dangerous, because:

  • The const Variable* can get de-allocated and re-allocated to a different variable.
  • It makes the pass less self-contained; it needs to depend on context information.

I would recommend two possible ways to alleviate this (elaborated in the follow-up comment below).

@denis0x0D (Contributor, Author)

@tqchen @tmoreau89 thanks for the review.
Looks like I need to learn the TVM IR more precisely; I didn't know that a const Variable* could be reused internally.
The problem I tried to solve with the global hashtable is mapping the actual size of a buffer to its buffer_var, because the actual size can be increased by other passes, which could otherwise cause a false positive. I might also have missed an existing mechanism that maps a buffer_var to the actual buffer - it might be AttrStmt.

On the other hand, IMHO, it seems like normal practice to pass some context-dependent info between passes in other compilers. For example, GCC's UBSAN check for object size first walks the control flow graph and, for every basic block, inserts an internal call where needed, carrying info such as a pointer to the object and the builtin __builtin_object_size, which helps to get the actual size - https://github.com/gcc-mirror/gcc/blob/master/gcc/ubsan.c#L2213
- and then expands it and inserts the check:
https://github.com/gcc-mirror/gcc/blob/master/gcc/ubsan.c#L925
This tool also has an out-of-bounds check that works pretty much the same way, except there the bound cannot be changed between passes: https://github.com/gcc-mirror/gcc/blob/master/gcc/ubsan.c#L687

@denis0x0D (Contributor, Author)

@tqchen @tmoreau89 @yzhliu I've updated the patch. It is now based on AttrStmt; could you please review it?
Thanks.

@@ -96,11 +96,15 @@ class StorageFlattener : public IRMutator {
return this->Mutate(op->body);
} else if (op->attr_key == attr::opengl_stage_scope) {
@tqchen (Member):

can we pass create_bound_attributes as an argument of StorageFlatten?

@denis0x0D (Contributor, Author):

OK, fixed, but this also required changing the public API of the StorageFlatten pass and some tests; a sketch of the resulting Python-side usage is below.
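Something along these lines (a hypothetical test-style sketch; the trailing boolean and its position follow the description in this thread and the existing StorageFlatten tests, not a confirmed API):

import tvm

n = tvm.var("n")
A = tvm.placeholder((n,), name='A')
B = tvm.compute((n,), lambda i: A[i] + 1, name='B')
s = tvm.create_schedule(B.op)
s = s.normalize()

bounds = tvm.schedule.InferBound(s)
stmt = tvm.schedule.ScheduleOps(s, bounds)

Ab = tvm.decl_buffer(A.shape, A.dtype, name='A')
Bb = tvm.decl_buffer(B.shape, B.dtype, name='B')

# The new trailing bool (create_bound_attributes) asks StorageFlatten to attach
# the bound attributes; 64 is the cache line size used by lower().
stmt = tvm.ir_pass.StorageFlatten(stmt, {A: Ab, B: Bb}, 64, True)
print(stmt)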

  // The size of cacheline
  int cache_line_size_;
  // The current stage is an OpenGL shader.
  bool is_opengl_{false};
  // Whether to mark load/store with their bounds.
  bool create_bound_attributes{false};
@tqchen (Member):

Google C style: private member names end with an underscore _

@denis0x0D (Contributor, Author):

Thanks, fixed.

@tqchen (Member) commented Nov 14, 2018

@xqdan @sgrechanik-h @ZihengJiang can you also help review this PR?

@tqchen (Member) commented Nov 14, 2018

To follow up on the discussion, @denis0x0D:

  • Variable* getting reused:
    • If we transform the IR and the original Variable* gets de-allocated (because the IR was transformed), and the transformed IR then allocates another Variable*, there is a tiny chance that the de-allocated space gets allocated again, so the same const Variable* now refers to a different variable.
  • Passing context info between passes: I agree that compiler passes should pass information to each other. The only question is how this context info is passed; we could use:
    • a global singleton context state,
    • a context object, or
    • embedding the info in the IR.

A global context object will probably make it hard for the programmer to reason about things, because correctness then depends on the exact order of the passes. Imagine running another pass in parallel, with both passes writing to the same global singleton context: there would be problems.

When possible, using an explicit context info object or embedding the information into the IR makes the reasoning easier, since the pass writer only has to consider how to implement the exact signature of the function.
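For illustration, embedding the bound information in the IR could look roughly like this (a minimal sketch only, assuming the 2018-era tvm.make node constructors; buffer_bound is the attribute key that shows up later in this thread, and the variable names here are made up):

import tvm

buf = tvm.var("A", dtype="handle")        # the buffer_var the bound refers to
extent = tvm.const(21, "int64")           # flattened size of the buffer
body = tvm.make.Evaluate(tvm.const(0, "int32"))

# Attach the bound to the IR itself instead of keeping it in a global side table;
# a later pass can read it back when it visits this AttrStmt node.
annotated = tvm.make.AttrStmt(buf, "buffer_bound", extent, body)
print(annotated)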

@xqdan (Contributor) commented Nov 26, 2018

Thanks!
For the loop partition pass, please make sure we can insert a correct assert for the tail loop body; that is the thing I want to emphasize. I've added a comment for that.

For the inject copy intrin pass, the reason for the failure is the inserted if statement; we need to support assert/if in that pass. We can fix this issue in another PR.

@xqdan (Contributor) commented Nov 26, 2018

As far as I understood, even for accelerators such as VTA the "storage flatten" pass should come first, but I might be wrong, please correct me in this case.

Tensorize is in the ScheduleOps pass, which runs before storage flatten. Inject copy is another codegen pass which runs after flatten, so it looks not hard to make it work with this PR; but for tensorize we need more discussion.

tvm.testing.assert_allclose(c.asnumpy(), a.asnumpy() + 1)

def test_in_bounds_loop_partition_basic_llvm():
n = tvm.var('n')
@xqdan (Contributor) commented Nov 27, 2018:

can you add another case for a const loop? Also enable cfg.partition_const_loop so we can verify this PR for the tail part; we may need to check the IR generated for the tail part after the InstrumentBoundCheckers pass.
https://github.com/dmlc/tvm/blob/master/python/tvm/build_module.py#L353

@denis0x0D (Contributor, Author):

@xqdan Added new test cases on const loops, with checks of the generated TVM IR - roughly along the lines of the sketch below.
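(A sketch of such a test, assuming instrument_bound_checkers and partition_const_loop are both build_config options; not the exact test code from the PR.)

import tvm
import numpy as np

def test_in_bounds_const_loop_partition_llvm():
    n = 21
    A = tvm.placeholder((n,), name='A')
    B = tvm.placeholder((n,), name='B')
    C = tvm.compute((n,), lambda i: A[i] + B[i], name='C')
    s = tvm.create_schedule(C.op)
    xo, xi = s[C].split(C.op.axis[0], factor=4)  # 21 % 4 != 0 -> a tail loop is produced

    with tvm.build_config(instrument_bound_checkers=True,
                          partition_const_loop=True):
        f = tvm.build(s, [A, B, C], "llvm")

    ctx = tvm.cpu(0)
    a = tvm.nd.array(np.random.uniform(size=n).astype(A.dtype), ctx)
    b = tvm.nd.array(np.random.uniform(size=n).astype(B.dtype), ctx)
    c = tvm.nd.array(np.zeros(n, dtype=C.dtype), ctx)
    f(a, b, c)
    np.testing.assert_allclose(c.asnumpy(), a.asnumpy() + b.asnumpy(), rtol=1e-5)

test_in_bounds_const_loop_partition_llvm()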

@denis0x0D (Contributor, Author) commented Nov 27, 2018

@xqdan, regarding "please make sure we can insert a correct assert for the tail loop body; that is the thing I want to emphasize":

I've added more test cases. The condition for the branch should always be correct, because it relies on the index from the load/store.

So the algorithm is straightforward: first, in the StorageFlatten pass, we mark every load/store instruction with its bounds via AttrStmt; next we walk recursively over the resulting IR and collect all created AttrStmts; and after that we run the instrumentation pass, which inserts a branch with a condition for every load and store. The condition consists of two parts (see the sketch after this list):

  • The lower-bound index of the load/store should always be greater than or equal to 0, because of normalization.
  • The upper-bound index of the load/store should always be less than the actual size of the buffer associated with that load/store; the size of the buffer can be increased by an Allocate node, and we check that too.
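Roughly, the per-access condition can be thought of as being built like this (a simplified sketch of the idea in the 2018-era Python API, not the actual code of the pass):

import tvm

def make_bound_condition(index, buffer_extent):
    # Build the check for one flattened access: 0 <= index < extent.
    idx = index.astype("int64")
    ext = buffer_extent.astype("int64")
    return tvm.all(idx >= 0, idx < ext)

# Example: the tail-loop access compute[i.inner + 20] against a 21-element buffer.
i_inner = tvm.var("i.inner")
cond = make_bound_condition(i_inner + 20, tvm.const(21, "int32"))
print(cond)
# The pass AND-s together one such condition per load/store in the statement and
# wraps the access in an if/else whose else branch asserts "OUT OF THE BOUNDS".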

Example of the condition for splitting a const loop:

produce compute {
  for (i.outer, 0, 5) {
    for (i.inner, 0, 4) {
      // attr [compute] buffer_bound = 21
      // attr [B] buffer_bound = 21
      // attr [A] buffer_bound = 21
      if (((((int64(((i.outer*4) + i.inner)) >= (int64)0) && (int64(((i.outer*4) + i.inner)) < int64(21))) && ((int64(((i.outer*4) + i.inner)) >= (int64)0) && (int64(((i.outer*4) + i.inner)) < int64(21)))) && ((int64(((i.outer*4) + i.inner)) >= (int64)0) && (int64(((i.outer*4) + i.inner)) < int64(21))))) {
        compute[((i.outer*4) + i.inner)] = (A[((i.outer*4) + i.inner)] + B[((i.outer*4) + i.inner)])
      } else {
        assert(((((int64(((i.outer*4) + i.inner)) >= (int64)0) && (int64(((i.outer*4) + i.inner)) < int64(21))) && ((int64(((i.outer*4) + i.inner)) >= (int64)0) && (int64(((i.outer*4) + i.inner)) < int64(21)))) && ((int64(((i.outer*4) + i.inner)) >= (int64)0) && (int64(((i.outer*4) + i.inner)) < int64(21)))), "OUT OF THE BOUNDS")
        1
      }
    }
  }
  for (i.inner, 0, 1) {
    // attr [compute] buffer_bound = 21
    // attr [B] buffer_bound = 21
    // attr [A] buffer_bound = 21
    if (((((int64((i.inner + 20)) >= (int64)0) && (int64((i.inner + 20)) < int64(21))) && ((int64((i.inner + 20)) >= (int64)0) && (int64((i.inner + 20)) < int64(21)))) && ((int64((i.inner + 20)) >= (int64)0) && (int64((i.inner + 20)) < int64(21))))) {
      compute[(i.inner + 20)] = (A[(i.inner + 20)] + B[(i.inner + 20)])
    } else {
      assert(((((int64((i.inner + 20)) >= (int64)0) && (int64((i.inner + 20)) < int64(21))) && ((int64((i.inner + 20)) >= (int64)0) && (int64((i.inner + 20)) < int64(21)))) && ((int64((i.inner + 20)) >= (int64)0) && (int64((i.inner + 20)) < int64(21)))), "OUT OF THE BOUNDS")
      1
    }
  }
}

@xqdan thanks for the explanation about pass ordering (tensorize runs in ScheduleOps before storage flatten; inject copy runs after flatten).

Regarding the inject copy intrin pass and supporting assert/if there: @tmoreau89 @xqdan it looks like support for the VTA backend is more complicated than I thought.
I'm not really sure how the check should be implemented in the case where the default TVM IR is mapped to VTA intrinsics. Should it be something like a

VTACheckMemAccess

inserted before VTALoadBuffer*, which checks for the out-of-bounds access? I'm not sure about this.

// attr [A_buf] storage_scope = "local.acc_buffer"
// attr [iter_var(vta, , vta)] coproc_scope = 2
produce A_buf {
  VTALoadBuffer2D(tvm_thread_context(VTATLSCommandHandle()), A, 0, 64, 1, 64, 0, 0, 0, 0, 0, 3)
}
produce B_buf {
  VTALoadBuffer2D(tvm_thread_context(VTATLSCommandHandle()), B, 0, 64, 1, 64, 0, 0, 0, 0, 64, 3)
}
// attr [iter_var(vta, , vta)] coproc_uop_scope = "VTAPushALUOp"
produce C_buf {
  VTAUopLoopBegin(64, 1, 1, 0)
  VTAUopPush(1, 0, 64, 0, 0, 2, 0, 0)
  VTAUopLoopEnd()
}
vta.coproc_dep_push(2, 3)
// attr [iter_var(vta, , vta)] coproc_scope = 3
vta.coproc_dep_pop(2, 3)
produce C {
  VTAStoreBuffer2D(tvm_thread_context(VTATLSCommandHandle()), 64, 4, C, 0, 64, 1, 64)
}
vta.coproc_sync()

Initialize VTACommandHandle...
Successful vector add test!
Close VTACommandhandle...

@tqchen @tmoreau89 @ZihengJiang @yzhliu @xqdan @sgrechanik-h In any case, should we create another PR to support the bound checkers for VTA?
Thanks.

@xqdan (Contributor) commented Nov 27, 2018

LGTM. Thanks for your work!

@tqchen (Member) left a review comment

Some quick comments: it is possible for us to have zero-dimensional tensors; we need to confirm that we handle this case correctly.

Stmt StorageFlatten(Stmt stmt,
Map<Tensor, Buffer> extern_buffer,
int cache_line_size);
Stmt StorageFlatten(Stmt stmt, Map<Tensor, Buffer> extern_buffer,
@tqchen (Member):

nit: one line per argument

@denis0x0D (Contributor, Author):

Thanks, fixed. BTW, should TVM have its own .clang-format file?

@tqchen (Member):

There is a proposal in #1732; we generally use Google C style, but we don't yet have our own .clang-format.

@@ -429,6 +452,30 @@ class StorageFlattener : public IRMutator {
}
}
};

bool ShapeIsValid(const Array<Expr> &shape) {
if (!shape.size())
@tqchen (Member):

it is possible for us to have zero-dimensional tensors (scalars), in which case shape.size() == 0

@denis0x0D (Contributor, Author) commented Nov 28, 2018:

Yes, I believe the right approach is to skip this type of tensor and not instrument it; please correct me if I'm wrong.
I've added some test cases. The compute op can be more complicated: at first I thought that if the store accesses a buffer with shape == 0 we should skip annotations and instrumentation, but there can be cases where the store accesses a zero-shape buffer while a load accesses a non-zero-shape one.
I've added test cases for this situation.
c6083d6#diff-fb278cb819f24c9c0369504de1bc2e01R351
c6083d6#diff-f904c24c2b3bba4f50a565961755b153R434

@tqchen (Member):

I see. Can you add a comment that zero-dimensional tensors do not need a boundary check, so it gives context to future readers?

@denis0x0D (Contributor, Author):

Of course, added.

@denis0x0D (Contributor, Author) commented Nov 28, 2018

@xqdan thanks for the review.

@tqchen thanks for the review; regarding "it is possible for us to have zero-dimensional tensors, we do need to confirm if we handle this case correctly":
Yes, I believe the right approach is to skip this type of tensor and not instrument it; please correct me if I'm wrong.
I've added some test cases. The compute op can be more complicated: at first I thought that if the store accesses a buffer with shape == 0 we should skip annotations and instrumentation, but there can be an example where the store accesses a zero-shape buffer while a load accesses a non-zero-shape one:

 A = tvm.placeholder((n, ), name='A')
 scale = tvm.placeholder((), name='scale')
 k = tvm.reduce_axis((0, n), name="k")
 C = tvm.compute((), lambda : tvm.sum(A[k + k + k] * scale, axis=k), name="C")
 D = tvm.compute((), lambda : C + 1)
 s = tvm.create_schedule(D.op)

I've added test cases for this situation.

@denis0x0D denis0x0D force-pushed the sandbox/bound_checkers branch 6 times, most recently from c6083d6 to 683c6aa Compare November 28, 2018 18:49
TVM_REGISTER_API("ir_pass.StorageFlatten")
.set_body([](TVMArgs args, TVMRetValue *ret) {
  *ret = StorageFlatten(args[0], args[1], args[2],
                        args.size() == 3 ? false : args[3]);
Contributor review comment:

The following would be better (to avoid duplicating the default value in two places):

if (args.size() <= 3) {
  *ret = StorageFlatten(args[0], args[1], args[2]);
} else {
  *ret = StorageFlatten(args[0], args[1], args[2], args[3]);
}

@denis0x0D (Contributor, Author):

Thanks for the review, updated.

}

bool IndexIsValid(const Expr &index) const {
if (!index.defined())
@tqchen (Member):

if the if spans multiple lines, enclose the body with {}, or put the return on the same line

@denis0x0D (Contributor, Author):

Thanks for the review, updated.

@tqchen (Member) left a review comment

Thanks @denis0x0D, I think it is good modulo a nit comment.

@tqchen (Member) commented Nov 29, 2018

cc @tmoreau89

@tmoreau89 merged commit 2f1d709 into apache:master on Nov 30, 2018. Commit message:

The pass which instruments checkers before
memory accesses (load/store).
This allows handling invalid memory accesses.

The patch is related to issue:
https://discuss.tvm.ai/t/array-bounds-checking/944
FrozenGene pushed a commit to FrozenGene/tvm that referenced this pull request Dec 27, 2018
@ZihengJiang ZihengJiang mentioned this pull request Feb 1, 2019
wweic pushed a commit to neo-ai/tvm that referenced this pull request Feb 20, 2019
wweic pushed a commit to neo-ai/tvm that referenced this pull request Feb 20, 2019