[TIR] Add TIR While node #7425

masahi · 2021-02-09T05:38:02Z

This is an implementation of the TIR While node as discussed in the RFC https://discuss.tvm.apache.org/t/rfc-add-while-loop-node-to-tir/9028. It supersedes my earlier attempt in #7385.

The PR consists of

IR node definition + boilerplate
Minimal changes to TIR transform passes (so far only modifies storage_rewrite.cc, everything else uses the default visitor)
LLVM and C source codegen
Various test cases
Update CUDA NMS to use while loop

Hybrid script support and similar follow-ups are left for future work.

Now we can write binary search succinctly as follows:

lo[0] = 0
hi[0] = n
v = B[i]

with ib.while_loop(lo[0] < hi[0]):
    mid = lo[0] + (hi[0] - lo[0] >> 1)
    with ib.if_scope(A[mid] < v):
        lo[0] = mid + 1
    with ib.else_scope():
        hi[0] = mid

C[i] = lo[0]
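
For readers unfamiliar with the ir_builder syntax, the loop above can be read as an ordinary lower-bound search. The following is a plain-Python sketch of the same control flow, not TVM code; the function name `lower_bound` and the sample arrays are made up for illustration.

```python
from bisect import bisect_left


def lower_bound(A, v):
    """Plain-Python mirror of the while-loop binary search shown above.

    Returns the first index whose element is not less than v (A must be sorted).
    """
    lo, hi = 0, len(A)
    while lo < hi:
        # Same midpoint expression as the TIR snippet: lo + ((hi - lo) >> 1)
        mid = lo + ((hi - lo) >> 1)
        if A[mid] < v:
            lo = mid + 1
        else:
            hi = mid
    return lo


A = [1, 3, 3, 5, 8]   # sorted input (illustrative data)
B = [0, 3, 4, 9]      # values to look up
C = [lower_bound(A, v) for v in B]
print(C)  # [0, 1, 3, 5]
```

The result agrees with `bisect.bisect_left` from the standard library, which is a convenient reference when checking the lowered kernel's output.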

As another nice use of while loop, I added a test that draws a useless mandelbrot set 🙂

@tqchen @junrushao1994 @vinx13 @mbrookhart @zhiics @kevinthesun @anijain2305 @trevor-m

giuseros

LGTM, very nice addition!

tqchen · 2021-02-09T14:36:53Z

Thanks @masahi. Before we merge it in, it would be really awesome to go through the current list of passes and check whether special handling of While is needed (so we won't bring in new bugs because of the mix). I would at least check the passes that need special IfThenElse handling.

For example, I can see the need to update the following pass:

Vectorize (we will need to abort if the condition is vectorized)

tqchen · 2021-02-09T14:49:28Z

also cc @zxybazh please help to review this PR

junrushao

Thanks for the PR! It looks good to me :-) Surprisingly it doesn't need to change any passes besides storage_rewrite :-)

junrushao · 2021-02-09T18:33:04Z

CC @spectrometerHBH: we might want to have it supported in TensorIR too, either like a syntactic sugar to opaque binding or other ways

tqchen · 2021-02-09T18:38:28Z

include/tvm/tir/stmt_functor.h

@@ -109,6 +110,7 @@ class StmtFunctor<R(const Stmt& n, Args... args)> {
    IR_STMT_FUNCTOR_DISPATCH(AttrStmtNode);
    IR_STMT_FUNCTOR_DISPATCH(IfThenElseNode);
    IR_STMT_FUNCTOR_DISPATCH(ForNode);
+    IR_STMT_FUNCTOR_DISPATCH(WhileNode);


need checks through the current passes, per my comment

zxybazh

Thanks @masahi! Looks good to me.

masahi · 2021-02-10T13:24:52Z

@tqchen @junrushao1994 @vinx13

I went through the passes and here is my summary:

VectorizeLoop: We need to disallow a while loop inside a vectorized loop. Without this check, no error occurs during lowering but the lowered code is incorrect. I added a test case, test_vectorize_while_fail(), to make sure we error out in such cases.
StorageAccessVisitor: I don't understand what it does, but I added a special visitor for While following the existing visitor for IfThenElse. Please check 1e629b6.
CoProcSync and LiftAttrScope: They both have a special visitor for IfThenElse, but I don't understand them. They are only used by VTA, so for now I just error out if we find a WhileNode there. See a71066d and 00c17d9.
InjectVirtualThread: I think we need some special handling for this, but I don't know what it should be. For now I just added a placeholder that calls the base class visitor. See 896b02f and let me know what we should do here.
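
To make the VectorizeLoop rule concrete, here is a toy sketch of the kind of check described: walk a statement tree and raise as soon as a while loop is found anywhere under a vectorized loop. The `Stmt`, `For`, and `While` classes and the function name are hypothetical stand-ins written for this illustration; they are not TVM's node types, and this is not the code in the PR.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Stmt:
    # Child statements; a toy stand-in for a TIR statement body.
    body: List["Stmt"] = field(default_factory=list)


@dataclass
class For(Stmt):
    kind: str = "serial"  # "serial" or "vectorized" in this toy model


@dataclass
class While(Stmt):
    pass


def check_no_while_in_vectorized(stmt, in_vectorized=False):
    """Raise ValueError if a While appears anywhere under a vectorized For."""
    if isinstance(stmt, While) and in_vectorized:
        raise ValueError("while loop inside a vectorized loop is not supported")
    inside = in_vectorized or (isinstance(stmt, For) and stmt.kind == "vectorized")
    for child in stmt.body:
        check_no_while_in_vectorized(child, inside)


# A while loop under a serial loop passes the check.
ok = For(kind="serial", body=[While()])
check_no_while_in_vectorized(ok)

# A while loop nested (even indirectly) under a vectorized loop is rejected.
bad = For(kind="vectorized", body=[For(kind="serial", body=[While()])])
try:
    check_no_while_in_vectorized(bad)
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```

Failing loudly is the point of the design: as noted above, without such a check lowering succeeds silently and produces incorrect code.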

Do we need to change MergeNest? I haven't touched it for now

tvm/src/tir/transforms/ir_utils.cc

Lines 35 to 59 in 7340c02

    
    Stmt MergeNest(const std::vector<Stmt>& nest, Stmt body) {
      // use reverse iteration
      for (auto ri = nest.rbegin(); ri != nest.rend(); ++ri) {
        Stmt s = *ri;
        if (const auto* for_ = s.as<ForNode>()) {
          auto n = make_object<ForNode>(*for_);
          ICHECK(is_no_op(n->body));
          n->body = body;
          body = Stmt(n);
        } else if (const auto* let = s.as<LetStmtNode>()) {
          auto n = make_object<LetStmtNode>(*let);
          ICHECK(is_no_op(n->body));
          n->body = body;
          body = Stmt(n);
        } else if (const auto* attr = s.as<AttrStmtNode>()) {
          auto n = make_object<AttrStmtNode>(*attr);
          ICHECK(is_no_op(n->body));
          n->body = body;
          body = Stmt(n);
        } else if (const auto* ite = s.as<IfThenElseNode>()) {
          auto n = make_object<IfThenElseNode>(*ite);
          ICHECK(is_no_op(n->then_case));
          ICHECK(!n->else_case.defined());
          n->then_case = body;
          body = Stmt(n);

Probably we don't need to change hoist_if_then_else.cc and loop_partition.cc. We can do something in remove_no_op.cc, but I think it is not important.

masahi · 2021-02-12T05:05:49Z

@tqchen Can you have a look?

src/tir/transforms/inject_virtual_thread.cc

tqchen · 2021-02-12T15:56:37Z

I left a comment for inject virtual thread, @junrushao1994 @ZihengJiang @vinx13 would be great if you can also help check the StorageAccessVisitor

vinx13 · 2021-02-12T19:23:02Z

I've checked StorageAccessVisitor and it looks good to me. InplaceOpVerifier and StoragePlanRewriter also need handling.

masahi · 2021-02-12T20:45:48Z

@vinx13 OK. For InplaceOpVerifier, I think I need to update

tvm/src/tir/transforms/storage_rewrite.cc

Lines 241 to 251 in 7340c02

    
    if (stmt->IsInstance<AttrStmtNode>()) {
      VisitStmt_(static_cast<const AttrStmtNode*>(stmt));
    } else if (stmt->IsInstance<ForNode>()) {
      VisitStmt_(static_cast<const ForNode*>(stmt));
    } else if (stmt->IsInstance<IfThenElseNode>()) {
      VisitStmt_(static_cast<const IfThenElseNode*>(stmt));
    } else if (stmt->IsInstance<StoreNode>()) {
      VisitStmt_(static_cast<const StoreNode*>(stmt));
    } else {
      return false;
    }

But I don't see how we should update StoragePlanRewriter. Maybe here?

tvm/src/tir/transforms/storage_rewrite.cc

Lines 757 to 773 in 7340c02

    
    // enter/exit new scope
    if (s.stmt->IsInstance<AttrStmtNode>()) {
      const auto* op = static_cast<const AttrStmtNode*>(s.stmt);
      if (op->attr_key == attr::thread_extent || op->attr_key == attr::virtual_thread ||
          attr::IsPragmaKey(op->attr_key)) {
        PlanNewScope(op);
      } else {
        ICHECK(op->attr_key == attr::extern_scope);
      }
    } else if (s.stmt->IsInstance<ForNode>()) {
      const auto* op = static_cast<const ForNode*>(s.stmt);
      if (op->kind == ForKind::kParallel) {
        if (thread_scope_ == nullptr || thread_scope_ == op) {
          PlanNewScope(op);
        }
      }
    }

tqchen · 2021-02-12T20:53:27Z

Thanks @masahi, it would also be great for you to spend a bit more time looking into these passes :) It certainly takes more time, but we will also have more experts in TIR passes :)

Please also consider adding a test case for each pass that needs While handling.

vinx13 · 2021-02-12T21:07:01Z

@masahi For StoragePlanRewriter, we need to do something similar to ForNode

tvm/src/tir/transforms/storage_rewrite.cc

Lines 440 to 452 in 7340c02

    
    Stmt VisitStmt_(const ForNode* op) final {
      ICHECK(op->kind != ForKind::kVectorized) << "VectorizeLoop before LiftStorageAlloc";
      // remake all the allocation at the attach scope.
      if (attach_map_.count(op)) {
        auto& svec = attach_map_[op];
        Stmt stmt = StmtExprMutator::VisitStmt_(op);
        op = stmt.as<ForNode>();
        return For(op->loop_var, op->min, op->extent, op->kind, MakeAttach(svec, op->body),
                   op->thread_binding, op->annotations);
      } else {
        return StmtExprMutator::VisitStmt_(op);
      }
    }

masahi · 2021-02-12T21:18:40Z

ok, to me it's not obvious what it is doing, time for another deep dive...

masahi · 2021-02-16T12:13:28Z

@tqchen @vinx13 @junrushao1994 Does the behavior of While node wrt StorageRewrite below look reasonable?

In the following IR, the "A" and "B" buffers, which are allocated inside For loops, are coalesced into one buffer, but the "C" buffer, which is allocated inside the While loop, is not:

def test_parallel_alloc():
    ib = tvm.tir.ir_builder.create()
    n = te.var("n")
    with ib.for_range(0, n, name="i", kind="parallel") as i:
        with ib.for_range(0, 10, name="j") as j:
            A = ib.allocate("float32", n, name="A", scope="global")
            A[j] = A[j] + 2

        with ib.for_range(0, 10, name="j") as j:
            B = ib.allocate("float32", n, name="B", scope="global")
            B[j] = B[j] + 2

        i = ib.allocate("int32", (1,), name="i", scope="local")
        i[0] = 1
        with ib.while_loop(i[0] < 10):
            C = ib.allocate("float32", n, name="C", scope="local")
            C[i[0]] = C[i[0]] + 2
            i[0] += 1

parallel (i, 0, n) {
  // attr [A] storage_scope = "global"
  allocate A[float32 * n]
  // attr [i] storage_scope = "local"
  allocate i[int32 * 1]
  // attr [C] storage_scope = "local"
  allocate C[float32 * n]
  for (j, 0, 10) {
    A[j] = (A[j] + 2f)
  }
  for (j, 0, 10) {
    A[j] = (A[j] + 2f)
  }
  i[0] = 1
  while((i[0] < 10)){
    C[i[0]] = (C[i[0]] + 2f)
    i[0] = (i[0] + 1)
  }
}

In the following IR, all buffers, including the one allocated inside While loop, are coalesced:

def test_alloc_seq():
    scope_tb = "local.L0A"
    max_bits = 1024 * 1024 * 1024

    register_mem(scope_tb, max_bits)

    ib = tvm.tir.ir_builder.create()
    n = te.var("n")
    with ib.for_range(0, n, name="i") as i:
        with ib.for_range(0, 10, name="j") as j:
            A = ib.allocate("float32", 200, name="A", scope=scope_tb)
            A[j] = 1.2
        with ib.for_range(0, 10, name="j") as j:
            B = ib.allocate("float32", 200, name="B", scope=scope_tb)
            B[j] = 1.3

        i = ib.allocate("int32", (1,), name="i", scope="local")
        i[0] = 1
        with ib.while_loop(i[0] < 10):
            C = ib.allocate("float32", 200, name="C", scope=scope_tb)
            C[i[0]] = 1.4
            i[0] += 1

    body = ib.get()

// attr [A] storage_scope = "local.L0A"
allocate A[float32 * 200]
// attr [i] storage_scope = "local"
allocate i[int32 * 1]
for (i, 0, n) {
  for (j, 0, 10) {
    A[j] = 1.2f
  }
  for (j, 0, 10) {
    A[j] = 1.3f
  }
  i[0] = 1
  while((i[0] < 10)){
    A[i[0]] = 1.4f
    i[0] = (i[0] + 1)
  }
}

tqchen · 2021-02-22T15:31:21Z

@vinx13 can you please take another look at the PR and manage?

tqchen

Thanks @masahi! The change has addressed my previous comments. Please add test cases for the transforms that require special While handling, so that these passes are covered.

ZihengJiang

Nice work! Thanks @masahi

masahi · 2021-03-02T07:37:52Z

@tqchen @junrushao1994 @vinx13 @ZihengJiang @zxybazh

I came to the conclusion that the While node doesn't need special handling in storage_rewrite.

The first observation is that even if I remove all ForNode handling from StoragePlanRewriter, all tests in test_tir_transform_storage_rewrite.py except test_parallel_alloc() pass.

If we look at the visitor for ForNode,

tvm/src/tir/transforms/storage_rewrite.cc

Lines 440 to 452 in 7340c02

    
    Stmt VisitStmt_(const ForNode* op) final {
      ICHECK(op->kind != ForKind::kVectorized) << "VectorizeLoop before LiftStorageAlloc";
      // remake all the allocation at the attach scope.
      if (attach_map_.count(op)) {
        auto& svec = attach_map_[op];
        Stmt stmt = StmtExprMutator::VisitStmt_(op);
        op = stmt.as<ForNode>();
        return For(op->loop_var, op->min, op->extent, op->kind, MakeAttach(svec, op->body),
                   op->thread_binding, op->annotations);
      } else {
        return StmtExprMutator::VisitStmt_(op);
      }
    }

it only does something special when attach_map_ has an entry for this node. Here comes the second observation: the only case where attach_map_ can have an entry for a ForNode is when that ForNode is a parallel for loop, due to these lines:

tvm/src/tir/transforms/storage_rewrite.cc

Lines 766 to 772 in 7340c02

    
    } else if (s.stmt->IsInstance<ForNode>()) {
      const auto* op = static_cast<const ForNode*>(s.stmt);
      if (op->kind == ForKind::kParallel) {
        if (thread_scope_ == nullptr || thread_scope_ == op) {
          PlanNewScope(op);
        }
      }

Together, these two handlers for ForNode lift allocations out of an inner loop and attach the merged allocation under the parallel loop scope (via the MakeAttach function at

tvm/src/tir/transforms/storage_rewrite.cc

Line 447 in 7340c02

    
           return For(op->loop_var, op->min, op->extent, op->kind, MakeAttach(svec, op->body),

). This is what's tested in test_parallel_alloc(). For other kinds of For loop, a merged allocation is placed at the global scope, see

tvm/src/tir/transforms/storage_rewrite.cc

Lines 457 to 461 in 7340c02

    
    struct StorageEntry {
      // The scope that this alloc attaches after
      // For shared/local memory it is beginning of the thread extent.
      // for global memory it is nullptr, means beginning of everything.
      const Object* attach_scope_{nullptr};

.

Since the While node doesn't involve threading, I think we can always lift an allocation made inside a While loop into the global scope. That means WhileNode should be handled in the same way as a non-parallel ForNode, i.e. we don't need special handling logic for WhileNode. Two simple test cases involving a While loop are added in

tvm/tests/python/unittest/test_tir_transform_storage_rewrite.py

Line 301 in c3af5ae

def test_while_alloc():

to test that the allocation is attached at the right scope after storage_rewrite.
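
The argument above can be summarized with a small toy model: only a parallel loop opens a new attach scope, so an allocation inside a serial For or a While ends up attached to the enclosing parallel loop if there is one, and to the global scope (`None`) otherwise. This is a deliberately simplified sketch, not the logic in storage_rewrite.cc: the `Node` class and `attach_scopes` function are hypothetical, and the model ignores thread_extent/virtual_thread attributes and storage scopes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    kind: str  # "for", "parallel_for", "while", or "alloc" (toy categories)
    name: str = ""
    body: List["Node"] = field(default_factory=list)


def attach_scopes(node, scope=None, out=None) -> Dict[str, Optional[str]]:
    """Map each allocation name to the name of the loop it attaches under.

    None means the allocation is lifted all the way to the global scope.
    """
    out = {} if out is None else out
    if node.kind == "alloc":
        out[node.name] = scope.name if scope is not None else None
    # Only the outermost parallel loop opens a scope in this model, loosely
    # mirroring the `thread_scope_ == nullptr` condition quoted above.
    if node.kind == "parallel_for" and scope is None:
        scope = node
    for child in node.body:
        attach_scopes(child, scope, out)
    return out


# Allocations under a parallel loop stay attached to it, While or not.
parallel_tree = Node("parallel_for", "i", [
    Node("for", "j", [Node("alloc", "A")]),
    Node("while", "w", [Node("alloc", "C")]),
])
print(attach_scopes(parallel_tree))  # {'A': 'i', 'C': 'i'}

# Without a parallel loop, the allocation inside the While goes to global scope.
serial_tree = Node("for", "i", [Node("while", "w", [Node("alloc", "C")])])
print(attach_scopes(serial_tree))  # {'C': None}
```

In this model the "while" kind never needs its own branch, which is the same observation that led to removing the WhileNode visitor from storage rewrite.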

I think I nailed it, thoughts?

vinx13 · 2021-03-02T07:59:40Z

@masahi You are right, thanks for looking into this

junrushao · 2021-03-02T08:04:10Z

That makes sense to me. Thanks for diving deep into this issue!

masahi · 2021-03-02T19:34:02Z

cc @tqchen please take a look

tqchen · 2021-03-02T20:02:13Z

@masahi you are right that MakeAttach is only needed for a parallel for loop, where we can no longer lift the memory to the outside (otherwise the memory won't be thread local)

tqchen · 2021-03-02T20:02:43Z

@junrushao1994 @vinx13 please help to manage the PR

tqchen

One minor comment

tests/python/unittest/test_tir_ir_builder.py

masahi · 2021-03-03T01:12:05Z

@junrushao1994 @vinx13 @tqchen ready to merge...!!

vinx13 · 2021-03-03T01:32:06Z

Thanks everyone @masahi @tqchen @junrushao1994 @giuseros @zxybazh @ZihengJiang

junrushao · 2021-03-03T02:10:35Z

Really awesome work!!!

masahi · 2021-03-03T02:29:46Z

Thank you very much for the reviews!!

* add while node
* update visitors
* binary search lowering works
* llvm codegen working
* cuda codegen working
* nms updated to use while loop
* add missing upper bound check too
* add mandelbrot test
* add gpu mandel

commit ee2363b
Author: Masahiro Masuda <masahi129@gmail.com>
Date: Fri Jan 29 11:44:02 2021 +0900

enable extern lib offload for nvptx

* rename test
* run black
* add doc
* add collatz test
* add while + vectorize test
* simplify bin search
* Add special case visit method to storage_access.cc
* disallow while loop inside vectorized loop
* disallow trivial condition since we do not have break
* error out in CoprocSync for now
* error out LiftAttrScope for now
* add placeholder to inject_vpthread
* refactor to use MakeAttach
* handle WhileNode in InplaceOpVerifier
* error out in InjectVirtualThread
* try handle WhileNode in StoragePlanRewriter
* remove WhileNode visitor from storage rewrite
* add while loop storage rewrite test
* update tests
* move test_vectorize_while_fail to test_tir_transform_vectorize.py

masahi marked this pull request as ready for review February 9, 2021 13:06

masahi mentioned this pull request Feb 9, 2021

[TIR] Add additional termination condition to For node to enable While loop like feature #7385

Closed

masahi assigned vinx13 and junrushao Feb 9, 2021

giuseros approved these changes Feb 9, 2021

junrushao approved these changes Feb 9, 2021

tqchen requested changes Feb 9, 2021

zxybazh reviewed Feb 9, 2021

masahi force-pushed the tir-while branch from 3a8eef6 to 896b02f February 10, 2021 13:07

tqchen reviewed Feb 12, 2021

src/tir/transforms/inject_virtual_thread.cc (outdated)

masahi force-pushed the tir-while branch from 896b02f to b044e1c February 13, 2021 02:32

masahi marked this pull request as draft February 15, 2021 22:19

masahi marked this pull request as ready for review February 15, 2021 22:57

tqchen added status: need review status: need test case need test cases to cover the change labels Feb 22, 2021

tqchen reviewed Feb 22, 2021

ZihengJiang approved these changes Feb 23, 2021

remove WhileNode visitor from storage rewrite

f442ecc

masahi force-pushed the tir-while branch from 3a0c3bb to 6cb0dca March 2, 2021 06:51

add while loop storage rewrite test

3012876

masahi force-pushed the tir-while branch from 6cb0dca to 3012876 March 2, 2021 07:15

update tests

c3af5ae

tqchen approved these changes Mar 2, 2021

tqchen requested changes Mar 2, 2021

tests/python/unittest/test_tir_ir_builder.py (outdated)

move test_vectorize_while_fail to test_tir_transform_vectorize.py

35b8e28

tqchen approved these changes Mar 2, 2021

junrushao approved these changes Mar 2, 2021

vinx13 merged commit cf36aa6 into apache:main Mar 3, 2021

vinx13 added status: accepted and removed status: need review status: need test case need test cases to cover the change status: need update need update based on feedbacks labels Mar 3, 2021

masahi mentioned this pull request Mar 3, 2021

[SPIR-V] Add SPIR-V lowering for While node #7574

Merged

junrushao mentioned this pull request Aug 11, 2021

[TIR][USMP] Added buffer info extraction pass #8468

Merged

junrushao mentioned this pull request Nov 1, 2021

Apache TVM v0.8 Release Note Candidate #9416

Closed
