Decoupling AOT from graph memory planner #8096

giuseros · 2021-05-20T15:57:01Z

In this PR we are decoupling AOT from the Graph Memory Planner (GMP). Since AOT has the runner expressed in TIR, we can get rid of the GMP in Relay and use the Storage Rewrite Pass to do memory planning on the runner function. This also sorts out the issue mentioned in #8062.

giuseros · 2021-05-20T15:57:37Z

cc: @manupa-arm @MatthewARM @mehrdadh @areusch

mehrdadh · 2021-05-20T18:42:49Z

thanks @giuseros for working on this! I tested it and it solves the issue that I had. So I am closing the issue, assuming this will merge sometime soon.

manupak

Thanks for the quick fix @giuseros !

My comments are mostly about documentation and clarification questions.

Also, I think you wanted to mention @mbaret and not Matthew Bentham (@MatthewARM) :)

src/relay/backend/aot_executor_codegen.cc

u99127 · 2021-05-21T11:31:09Z

> thanks @giuseros for working on this! I tested it and it solves the issue that I had. So I am closing the issue, assuming this will merge sometime soon.

I think we should close issues when things merge not before that actually happens.

giuseros · 2021-05-21T15:10:08Z

@manupa-arm I applied your comments. Sorry, this was a draft that I quickly turned into a PR, hence the inelegance of the code. Please have another look and let me know.

manupak

Broadly looking good!

Just one more question.

src/relay/backend/aot_executor_codegen.cc

manupak

LGTM

src/relay/backend/aot_executor_codegen.cc

tests/python/relay/aot/test_crt_aot.py

giuseros · 2021-05-26T10:22:48Z

Hi @areusch , @manupa-arm , @mehrdadh ,
Quick update on this. We found an issue in the storage rewrite pass. The problem is due to the fact that the packed functions accept TVMValues and not bare pointers. This means that in this TIR:

  let sid_5_value_1: handle = @tir.tvm_stack_alloca("array", 1, dtype=handle)
  @tir.tvm_struct_set(sid_5_value_1, 0, 1, sid_5, dtype=handle)
  let sid_4_value: handle = @tir.tvm_stack_alloca("array", 1, dtype=handle)
  @tir.tvm_struct_set(sid_4_value, 0, 1, sid_4, dtype=handle)
  {
    @tir.tvm_call_cpacked("fused_transpose", sid_5_value_1, sid_4_value, dtype=int32)
  }

The rewriter is able to identify sid_5_value_1 and sid_4_value as both live, so they are not overwritten. But it fails to see that those variables refer to sid_5 and sid_4 (respectively).

Early on, in the calls:

@tir.tvm_struct_set(sid_5_value_1, 0, 1, sid_5, dtype=handle)
@tir.tvm_struct_set(sid_4_value, 0, 1, sid_4, dtype=handle)

The variables sid_5 and sid_4 are treated as separate variables (that are NOT live at the same time) and hence get overwritten. In the last commit I explicitly create a translation table between the TVMValues and the allocations. This seems to fix the issue. I have also added a further test case, test_transpose (disabling op fusion in Relay), to test this scenario.
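The aliasing problem described above can be illustrated with a toy liveness model. This is only a sketch under stated assumptions: the statement tuples, `live_ranges`, and `can_share` are hypothetical names invented for illustration and are not the StorageRewrite implementation. It shows that when a use of the TVMValue struct is not counted as a use of the buffer stored in it, sid_5 and sid_4 look dead before the call and become candidates for sharing.

```python
from typing import Dict, List, Set, Tuple

# Toy statements (NOT real TIR): (opcode, operands).
Stmt = Tuple[str, Tuple[str, ...]]

PROGRAM: List[Stmt] = [
    ("struct_set", ("sid_5_value_1", "sid_5")),  # struct <- buffer pointer
    ("struct_set", ("sid_4_value", "sid_4")),
    ("call_cpacked", ("sid_5_value_1", "sid_4_value")),
]
BUFFERS: Set[str] = {"sid_5", "sid_4"}


def live_ranges(program: List[Stmt], buffers: Set[str],
                track_structs: bool) -> Dict[str, Tuple[int, int]]:
    """Return (first_use, last_use) statement indices for each buffer."""
    alias: Dict[str, str] = {}  # struct name -> buffer it wraps
    ranges: Dict[str, Tuple[int, int]] = {}
    for idx, (op, args) in enumerate(program):
        if op == "struct_set":
            struct, buf = args
            if track_structs:
                alias[struct] = buf
            touched = {buf}
        else:
            # With tracking, a use of the struct counts as a use of the buffer.
            touched = {alias.get(arg, arg) for arg in args}
        for name in touched & buffers:
            first, _ = ranges.get(name, (idx, idx))
            ranges[name] = (first, idx)
    return ranges


def can_share(ranges: Dict[str, Tuple[int, int]], a: str, b: str) -> bool:
    """Two buffers may share storage only if their live ranges are disjoint."""
    (a0, a1), (b0, b1) = ranges[a], ranges[b]
    return a1 < b0 or b1 < a0


naive = live_ranges(PROGRAM, BUFFERS, track_structs=False)
tracked = live_ranges(PROGRAM, BUFFERS, track_structs=True)
```

In this toy model, `can_share(naive, "sid_5", "sid_4")` is true (the bug), while `can_share(tracked, "sid_5", "sid_4")` is false, which is the effect the translation table is meant to achieve.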

giuseros · 2021-06-03T17:42:51Z

Hi @areusch , @manupa-arm ,
I was wondering if there was anything else you may want me to address on this PR.

Thanks!

PhilippvK · 2021-06-07T13:34:27Z

> > thanks @giuseros for working on this! I tested it and it solves the issue that I had. So I am closing the issue, assuming this will merge sometime soon.
>
> I think we should close issues when things merge not before that actually happens.

I agree with this. I have experienced the same problems as described in #8062 and thought they should be already fixed. I’ve also tried out the proposed changes and they work for me at least for basic use cases.

mehrdadh · 2021-06-07T16:53:11Z

@giuseros there is a lint issue due to black update. You can run "docker/lint.sh python_format" to see the error locally.

giuseros · 2021-06-07T17:19:06Z

Done! Thanks @mehrdadh

manupak

Thanks @giuseros! I don't have further comments.

areusch

sorry for the delay @giuseros, here's a round of comments. this seems overall fine, but i'm curious about the special handling we are adding to StorageRewritePass for the AOT top-level func. i'm also curious whether we intend to eventually abstract this behind the interface from @manupa-arm's RFC?

src/relay/backend/aot_executor_codegen.cc

src/tir/transforms/storage_rewrite.cc

src/relay/backend/aot_executor_codegen.cc

src/tir/transforms/storage_rewrite.cc

areusch · 2021-06-07T18:30:36Z

src/tir/transforms/storage_rewrite.cc

+          this->VisitExpr(arg);
+        }
+      }
+    } else if (op->op.same_as(builtin::tvm_struct_set())) {


areusch:
are these the only two such builtins we need to care about? seems like any access to the data would be affected, no?

giuseros:
So, for packed calls, we don't care about accessing the data (tvm_struct_get), but only about setting the data of the TVMValue. But in general it is true that other patterns might be problematic if used.

I am not sure about tvm_struct_get in particular, because you would use the (real) data you extracted, while with tvm_struct_set you use the real data to set the structure and then you pass the structure in the call. This change is only to annotate that the struct is a mere container, and that the rewriter needs to focus on the real data.

I would say that this PR is not trying to fix the rewriter on all the possible corner cases. This particular corner case is affecting AOT, and it needs to be fixed. If there are other corner cases, I would fix them in a separate PR.

tests/python/relay/aot/test_crt_aot.py

giuseros · 2021-06-07T21:35:20Z

Hi @areusch ,
I applied your changes.

About the storage rewrite changes, I tried to explain more in the code. It's about the fact that in order to call a function, you need to set a struct. The rewriter focuses on the structs (marking them conflicting) but the real allocations are lost (and get reused even when they don't need to be). If you have more questions, feel free to ask.
The point about testing is that it's very useful for us to test a quantized mobilenet, and coming up with the edge situation that that network is testing is non-trivial. So I saw having that network test as a two-bird-one-stone situation.
About the unified static memory planner, yes. This change will be incorporated by the planner that will replace the storage rewrite pass.

giuseros · 2021-06-10T15:41:43Z

Hi @areusch ,
Any more thoughts on this?

Thanks,
Giuseppe

areusch

hi @giuseros @manupa-arm ,

i chatted with @tqchen about this change and i think one concern we share is that there isn't a good way to delineate between the reference-counting functionality being added to StorageRewrite, which is fairly small, and a full-fledged implementation. @tqchen pointed out that given the lack of a generalized ref-counting implementation and rules for what TIR authors can do with addresses, likely the correct thing to make StorageRewrite robust against this problem is to add logic to make StorageRewrite a no-op if tvm_struct_set is used to assign a buffer pointer. However, that means that StorageRewrite becomes useless for this use case.

I also think that on further reading, this may not actually solve the original problem from #8062. there, the problem was that the declared size of the output tensor was actually too small because GraphPlanMemory enlarged the storage_id it belonged to. Because StorageRewrite has authority to modify the parameters to the top-level AOT function, it can indeed correct the inaccurate size. However, that's not what we want here, right? We want it to be the case that the output tensor need only be as large as the graph output (e.g. no reuse of that tensor should happen). Apologies for not catching this earlier, but I'm not sure I see that StorageRewrite actually prevents that (or let me know if you see differently).

given this, it seems like this approach might be a bit fragile to keep logic in StorageRewrite that is fairly tailored to AOTExecutorCodegen. at minimum I think we'd need a unit test suite to exercise the AOTExecutorCodegen TIR patterns in StorageRewrite, but I'm concerned that cross-talk between that and other StorageRewrite use cases may lead to difficulties down the road (e.g. multiplying test cases and StorageRewrite becoming quite complex). @tqchen suggested perhaps a better approach is to apply the results of memory planning to AOTExecutorCodegen and handle buffer reuse manually there--this way any address assignments are purposeful.

what are your thoughts on this?

i added a couple other comments from an earlier reading of this PR too, though they may be moot given the above.

src/relay/backend/aot_executor_codegen.cc

areusch · 2021-06-10T17:24:42Z

src/tir/transforms/storage_rewrite.cc

+          this->VisitExpr(arg);
+        }
+      }
+    } else if (op->op.same_as(builtin::tvm_struct_set())) {


i guess another way to view this though is that this PR is adding reference tracking, but reference tracking only works with one particular TIR pattern. i agree with you that adding full reference tracking is a lot to ask particularly considering the goals of this PR. However, there's no documentation (by which I mean no unit test) as to what specifically needs to be tracked. can you add a unit test for the changes you've made to StorageRewrite?

tests/python/relay/aot/test_crt_aot.py

giuseros · 2021-06-11T11:37:42Z

Hi @areusch,
Thanks for your reply. I think what you are saying can be summarized as:
a) The StorageRewrite pass does not solve our problem because it can still touch the output buffers (which are parameters of the TIR PrimFunc).
b) The solution that we have put in place in StorageRewrite does not address reference counting in a general way.

About a), I disagree. StorageRewrite only rewrites tir.Allocate nodes, and the output is not allocated in our main runner function. This is why if you run mobilenet quantized with the StorageRewrite pass, it works (and it doesn't work with Graph Memory Planning). So StorageRewrite is a perfectly good candidate to solve the issue we have in #8062 . The problem with StorageRewrite is that it doesn't handle tvm_struct_set correctly, and this brings us to point b)

About b), I agree. The solution we are proposing is not general, but any general solution can be built on top of it. As you said, we are trying to solve a specific case, but when you tackle the general case, the code I wrote is still valid. Especially in a TIR based world (which is the aim of TensorIR, afaiu) I think we will want (at some point) to extend this solution instead of accepting that StorageRewrite does not work with some TIR built-ins.

In other words, our solution is not tailored to AOT at all. It is a partial solution to a generic problem that (only) AOT is hitting (for now). It cannot break anything in the future: if the user is doing what AOT is doing, it will work; for other edge cases, StorageRewrite won't work, just as it didn't work before.

Bear also in mind that until we have a static memory planner, without this solution AOT is not usable. If you want, we can try to write a test that stresses the problem we are trying to solve with the change in StorageRewrite.

What do you think?
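Point a) above can be pictured with a toy planner that only ever considers buffers allocated inside the function body, so that function parameters such as the output tensor never become reuse candidates. This is a minimal sketch under stated assumptions: `ToyPrimFunc`, `plan_storage`, and the greedy pooling strategy are hypothetical and are not the StorageRewrite algorithm.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ToyPrimFunc:
    """Hypothetical stand-in for a TIR PrimFunc (illustration only)."""
    params: List[str]            # buffers owned by the caller
    allocations: Dict[str, int]  # buffers allocated in the body: name -> bytes


def plan_storage(func: ToyPrimFunc,
                 live: Dict[str, Tuple[int, int]]) -> List[List[str]]:
    """Greedily group allocated buffers whose live ranges are disjoint.

    Only names in func.allocations are visited, so parameters are
    never placed in a shared pool.
    """
    pools: List[List[str]] = []
    for name in func.allocations:
        start, end = live[name]
        for pool in pools:
            # Join this pool only if disjoint from every current member.
            if all(end < live[m][0] or live[m][1] < start for m in pool):
                pool.append(name)
                break
        else:
            pools.append([name])
    return pools


func = ToyPrimFunc(
    params=["input", "output"],
    allocations={"sid_1": 64, "sid_2": 128, "sid_3": 32},
)
live = {"input": (0, 0), "sid_1": (0, 1), "sid_2": (1, 2),
        "sid_3": (2, 3), "output": (3, 3)}
pools = plan_storage(func, live)
```

Here sid_1 and sid_3 end up in one pool and sid_2 in another, while "input" and "output" appear in no pool at all.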

areusch · 2021-06-11T21:27:07Z

hi @giuseros,

you're right about a)--my analysis of StorageRewrite was incorrect, and it shouldn't modify the parameters in a way that matters to the memory map. thanks for correcting me on that.

on b), my concern is:

- do other uses of StorageRewrite exist that may be affected by the narrow approach taken here?
- how do we document what the narrow approach needs to accomplish using unit tests so that, if others run into problems with StorageRewrite elsewhere, they don't change it and break us?

On the latter point, my main critique is that the unit tests are pretty coarse (e.g. they are integration tests). there are enough moving pieces that i'm not sure it's obvious from the tests how StorageRewrite should work (it probably is today, but things could change over time that would make it less so). possible to add a finer-grained unit test e.g. in tests/python/unittest/test_tir_transform_storage_rewrite.py?

@tqchen also raised a concern on whether it's feasible to push StorageRewrite in the direction of having pointer analysis. The problem posed was: we currently pass buffers to operator implementation as pointers, and the pointer lifetime of those buffers is clear enough from convention (when the operator impl returns, they are invalid; so impl cannot save those pointers). if we start down the path of adding a more general reference counting impl to StorageRewrite, that implies we may be passing user data structs that contain pointers to operator implementations. however, it isn't really clear how we should explain the lifecycle of those pointers.

i think this is sort of a minor point given the current impl, but I also agree that if we are arguing that StorageRewrite's struct_set analysis is in fact a limited application of a general pattern, and implementing the general pattern is also fine, we do need to then elaborate where we are going with it. i do think keeping the impl in AOTExecutorCodegen doesn't carry this problem.

another possible way to alleviate both concerns would be to define an attr on the tir.tvm_stack_alloca node that explicitly instructs StorageRewrite to link that tvm_stack_alloca with a given AllocateNode. what do you think about that?

giuseros · 2021-06-14T14:08:52Z

Hi @areusch ,
After some thinking we came up with a different solution. The way we are doing things now is the following:
pass-a) Compose the main_func with tvm_struct_set, i.e., codegen ready
pass-b) Storage rewrite, modified to take care of the structs
pass-c) tvm::build

A possible alternative can be the following:
pass-a2) Compose the main_func without tvm_struct_set. This means that the packed calls will receive raw pointers, which in turn means that the main_func in TIR is not ready to be code generated
pass-b2) Storage rewrite without any change. This is possible now since we are not using tvm_struct_set yet
pass-c2) Transform the inputs of the packed calls by using tvm_struct_set. We can actually avoid this pass if we are using unpacked signatures.
pass-d2) tvm::build

With pipeline 2 we can leave StorageRewrite unchanged and still address the issues that tvm_struct_set gives us. What do you think?
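The ordering in the second pipeline can be sketched with toy statements. This is a hedged illustration only: the statement tuples, `legalize_packed_calls`, and `ops_touching` are hypothetical and do not reproduce the TIR pass. The point is that before legalization (pass-c2) the raw buffer pointers appear directly in the call, so an unmodified planner sees them there; after legalization they only appear in the struct-setting statement.

```python
from typing import List, Tuple

# Toy statements (NOT real TIR): (opcode, operands).
Stmt = Tuple[str, Tuple[str, ...]]


def legalize_packed_calls(program: List[Stmt]) -> List[Stmt]:
    """Wrap every raw pointer argument of a packed call in a stack struct."""
    out: List[Stmt] = []
    counter = 0
    for op, args in program:
        if op != "call_packed":
            out.append((op, args))
            continue
        func_name, *pointers = args
        wrapped = []
        for pointer in pointers:
            value = f"{pointer}_value_{counter}"
            counter += 1
            out.append(("stack_alloca", (value,)))        # allocate the struct
            out.append(("struct_set", (value, pointer)))  # store the pointer in it
            wrapped.append(value)
        out.append(("call_packed", (func_name, *wrapped)))
    return out


def ops_touching(program: List[Stmt], name: str) -> List[str]:
    """Opcodes of the statements that mention `name` directly."""
    return [op for op, args in program if name in args]


# pass-a2: the call receives raw pointers.
raw = [("call_packed", ("fused_transpose", "sid_5", "sid_4"))]
# pass-b2 (storage rewrite) would run here, on `raw`, unchanged.
# pass-c2: introduce the structs only afterwards.
legalized = legalize_packed_calls(raw)
```

In `raw`, sid_5 is touched by the call itself; in `legalized`, it is touched only by the struct-setting statement, which is why planning has to happen before this step if the planner is left unchanged.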

manupak · 2021-06-17T16:17:01Z

Hi @areusch ,

a friendly ping!
It'd be good to hear your thoughts on the second alternative mentioned here, so that we can proceed with that approach.

As @giuseros mentioned here, we figured out adding aliasing (partial / incremental) support for StorageRewrite might not be required for this fix. In the pass-a2, if we did not create the struct early on, StorageRewrite on its own will work on raw pointers (not aliased via structs).

Also, post-StorageRewrite TIR, we could essentially create the structs if packed API was desired, in a subsequent pass (pass-c2).

areusch · 2021-06-17T17:12:07Z

@manupa-arm @giuseros apologies for the delay!

yeah the alternative approach makes sense to me. i think it's better to avoid adding special-case logic to StorageRewrite when we don't have a clear plan to implement the general case.

one question on the overall plan: i imagine after this, to implement USMP, there would be an additional pass to replace the allocate calls with indexing into workspace buffers. would USMP then run StorageRewrite before memory planning, or would that be left up to the memory planning implementation?

manupak · 2021-06-17T17:31:33Z

Yes. In the overall plan, we'd rather not run StorageRewrite before the USMP passes, because it will perform non-optimal local sharing inside the PrimFunc. The plan is to move the StorageRewrite pass down after the USMP passes, in case there is anything left that StorageRewrite could help with.

giuseros · 2021-06-21T21:48:10Z

Hi @manupa-arm , @areusch ,
Here is a second attempt to resolve this. I basically applied what we discussed. I also cleaned up the produced TIR a bit by removing all the tir::Evaluate(0) statements. Please let me know what you think!

giuseros · 2021-06-22T22:17:59Z

Hi @manupa-arm , @areusch ,
Quick update on this. I added a unit test and slightly changed the pass to address some issues. This should be good enough to be reviewed.

Thanks,
Giuseppe

giuseros · 2021-06-24T14:58:23Z

A friendly ping @manupa-arm , @jroesch , @areusch !

areusch

thanks @giuseros, i think the new approach makes sense. i would like @manupa-arm and perhaps @jroesch to explicitly approve before we merge this.

areusch · 2021-06-26T00:20:17Z

src/tir/transforms/legalize_packed_calls.cc

+
+        // Allocate the TVMValues on the stack and define the variables
+        for (auto v : tvm_values) {
+          call_stmt = LetStmt(v, StackAlloca("array", 1), call_stmt);


areusch:
just wondering if we may want to eventually create a separate memory pool for these, and whether you'd be open to (in a future PR) converting into tir.allocate optionally

giuseros:
Yes, I think it is a very good idea, and I would personally like a similar direction. @manupa-arm, what do you think?

manupak

LGTM!

manupak · 2021-06-28T16:34:35Z

src/tir/transforms/legalize_packed_calls.cc

+
+        // Allocate the TVMValues on the stack and define the variables
+        for (auto v : tvm_values) {
+          call_stmt = LetStmt(v, StackAlloca("array", 1), call_stmt);


yes, that sounds good! (not sure we want a separate pool but I can see them pooled to 'a' workspace buffer)


giuseros · 2021-06-28T17:38:08Z

Hi @manupa-arm , @jroesch , @areusch ,
I just rebased and had to change StorageInfo to TempStorageInfo because of a conflict. Please, let me know if this is OK for you!

Thanks,
Giuseppe

areusch · 2021-06-28T17:41:19Z

hey @giuseros, i think @jroesch was hoping with #8297 that you could re-use that common StorageInfo struct for your purposes here. is it possible to do that? sorry if this was unclear from that PR.

giuseros · 2021-06-28T19:37:48Z

Hi @areusch, I can reuse that in theory, but TempStorageInfo is only a local structure used to have on-demand memory allocation, i.e., I don't need to pass the TempStorageInfo around. This was the reason you suggested not using a struct of arrays here: #8096 (comment). I personally agreed with your comment and would prefer the way it is (i.e., array of structs), because using a struct of arrays makes the code less readable. However, if you guys feel strongly about reusing the StorageInfo struct of arrays, I can surely do it.

jroesch · 2021-06-28T22:52:51Z

@giuseros I would prefer to also use an array of structs, but my goal is to just get us to use the same data structure everywhere since there are multiple places where we are storing different versions of the same data without a shared data structure. We could move the Array outside and simplify the struct but it would be good to move memory planning to also use the same structure.

I guess one solution could be to merge as is; I can then rewrite the MemoryPlanning to use the array of structs, and you can update to use the same structure.
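For readers unfamiliar with the two layouts being discussed, here is a minimal sketch of a struct of arrays (SoA) versus an array of structs (AoS). The class and field names (`StorageInfoSoA`, `StorageEntry`, `storage_ids`, `device_types`, `sizes_in_bytes`) are assumptions made for illustration and are not taken from the TVM sources.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StorageInfoSoA:
    """Struct of arrays: one object holding parallel lists (hypothetical fields)."""
    storage_ids: List[int]
    device_types: List[int]
    sizes_in_bytes: List[int]


@dataclass
class StorageEntry:
    """One element of the array-of-structs layout (hypothetical fields)."""
    storage_id: int
    device_type: int
    size_in_bytes: int


def to_array_of_structs(info: StorageInfoSoA) -> List[StorageEntry]:
    """Zip the parallel lists into one record per storage."""
    return [
        StorageEntry(sid, dev, size)
        for sid, dev, size in zip(
            info.storage_ids, info.device_types, info.sizes_in_bytes
        )
    ]


soa = StorageInfoSoA(storage_ids=[0, 1], device_types=[1, 1],
                     sizes_in_bytes=[64, 128])
aos = to_array_of_structs(soa)
```

With the AoS form, everything about one storage travels together (`aos[1].size_in_bytes`), whereas the SoA form requires indexing each parallel list with the same position, which is the readability concern raised above.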


giuseros · 2021-06-29T11:26:36Z

Hi @jroesch,
The solution I picked is to use the common StorageInfo so that later on we can have a PR to move from SoA to AoS for all the users of that structure (i.e., all the memory planning algorithms).

Thanks,
Giuseppe

areusch · 2021-06-29T23:17:24Z

thanks for working through this one with us @giuseros, the PR is now merged! we will now temporarily prioritize #7518 over further compiler changes. we aim to land that this week, so feel free to send additional PRs based on top of that if you'd like.


giuseros mentioned this pull request May 20, 2021: AOT C Codegen Type Issue #8062 (Closed)

manupak requested changes May 21, 2021

giuseros force-pushed the aot-remove-gpm branch 2 times, most recently from 443e7a8 to 836648e, May 21, 2021 14:24

manupak reviewed May 21, 2021

manupak approved these changes May 21, 2021

areusch requested changes May 24, 2021

giuseros force-pushed the aot-remove-gpm branch from cfa991a to 7354547, May 26, 2021 09:50

giuseros force-pushed the aot-remove-gpm branch from 7354547 to e15fcc8, June 7, 2021 15:13

giuseros force-pushed the aot-remove-gpm branch from e15fcc8 to 6075b68, June 7, 2021 17:18

manupak approved these changes Jun 7, 2021

areusch reviewed Jun 7, 2021

areusch reviewed Jun 10, 2021

giuseros force-pushed the aot-remove-gpm branch from f816b93 to d062e71, June 21, 2021 21:46

giuseros force-pushed the aot-remove-gpm branch 3 times, most recently from fc7eb40 to 6b010fd, June 22, 2021 15:37

giuseros mentioned this pull request Jun 22, 2021: [AOT] Name mangling in AOT #8014 (Merged)

areusch approved these changes Jun 28, 2021

manupak approved these changes Jun 28, 2021

Giuseppe Rossini added 6 commits June 28, 2021 18:13

a9e0b72 Fix an issue with storage-rewrite pass and packed functions (Change-Id: I13888471d4b8927a4012d6a8e749fb7a8935dd77)
52a5109 Rebasing (Change-Id: I7aa12e0217b8a2e1ff2a97a7c5fdda6b7597ae64)
75167f3 Addressing comments (Change-Id: If9f1ee190690f9a810fe41eb1933d736f1eb4ec3)
a1c3455 Add a pass to legalize packed calls (Change-Id: I8aa43d3a1b837b03a5cf3c6b32fc760bd78d3436)
dee4a8c Add a unit test for the legalization pass (Change-Id: I5b0d75380ff660dd5a0acf5b14fa84bb992fbec4)
bc7ba50 rebasing (Change-Id: I52ceab5cf6e9b54390cb36c18dbb8e22505d8e18)

giuseros force-pushed the aot-remove-gpm branch from a77cf3c to bc7ba50, June 28, 2021 17:36

62fe73c Use common StorageInfo (Change-Id: Ia8b7de1373f167ca7d0d69a99846d417405bbe48)

areusch merged commit b803bab into apache:main Jun 29, 2021

junrushao mentioned this pull request Nov 1, 2021: Apache TVM v0.8 Release Note Candidate #9416 (Closed)
