-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Decoupling AOT from graph memory planner #8096
Conversation
thanks @giuseros for working on this! I tested it and it solves the issue that I had. So I close the issue assuming this will merge sometimes soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the quick fix @giuseros !
My comments are mostly about documentation and clarification questions.
Also I think you wanted to mention : @mbaret and not Matthew Bentham (@MatthewARM ) :)
I think we should close issues when things merge not before that actually happens. |
443e7a8
to
836648e
Compare
@manupa-arm I applied your comments. Sorry, this was a draft that I quickly turned into a PR, hence the in-elegance of the code. Please have another look and let me know. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Broadly looking good!.
Just one more question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Hi @areusch , @manupa-arm , @mehrdadh ,
The rewriter is able to identify Early on, in the calls:
The variables |
Hi @areusch , @manupa-arm , Thanks! |
I agree with this. I have experienced the same problems as described in #8062 and thought they should be already fixed. I’ve also tried out the proposed changes and they work for me at least for basic use cases. |
@giuseros there is a lint issue due to black update. You can run "docker/lint.sh python_format" to see the error locally. |
Done! Thanks @mehrdadh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @giuseros! I dont have further comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry for the delay @giuseros , here's a round of comments. this seems overall fine, but i'm curious about the special handling we are adding to StorageRewritePass for the AOT top-level func. i'm also curious whether we intend to eventually abstract this behind the interface from @manupa-arm RFC?
this->VisitExpr(arg); | ||
} | ||
} | ||
} else if (op->op.same_as(builtin::tvm_struct_set())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these the only two such builtins we need to care about? seems like any access to the data would be affected, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, for called_packed, we don't care about accessing the data (tvm_struct_get
), but only setting the data of the TVMValue. But in general it is true that other patterns might be problematic if used.
I am not sure about tvm_struct_get
in particular, because you would use the (real) data you extracted, while with tvm_struct_set
you use the real data to set the structure and then you pass the structure in the call. This change is only to annotate that the struct is a mere container, and that the rewriter needs to focus on the real data.
I would say that this PR is not trying to fix the rewriter on all the possible corner cases. This particular corner case is affecting AOT, and it needs to be fixed. If there are other corner cases, I would fix them in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guess another way to view this though is that this PR is adding reference tracking, but reference tracking only works with one particular TIR pattern. i agree with you that adding full reference tracking is a lot to ask particularly considering the goals of this PR. However, there's no documentation (by which I mean no unit test) as to what specifically needs to be tracked. can you add a unit test for the changes you've made to StorageRewrite?
Hi @areusch ,
|
Hi @areusch , Thanks, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @giuseros @manupa-arm ,
i chatted with @tqchen about this change and i think one concern we share is that there isn't a good way to delineate between the reference-counting functionality being added to StorageRewrite, which is fairly small, and a full-fledged implementation. @tqchen pointed out that given the lack of a generalized ref-counting implementation and rules for what TIR authors can do with addresses, likely the correct thing to make StorageRewrite robust agains this problem is to add logic to make StorageRewrite a no-op if tvm_struct_set is used to assign a buffer pointer. However, that means that StorageRewrite becomes useless for this use case.
I also think that on further reading, this may not actually solve the original problem from #8062. there, the problem was that the declared size of the output tensor was actually too small because GraphPlanMemory enlarged the storage_id it belonged to. Because StorageRewrite has authority to modify the parameters to the top-level AOT function, it can indeed correct the inaccurate size. However, that's not what we want here, right? We want it to be the case that the output tensor need only be as large as the graph output (e.g. no reuse of that tensor should happen). Apologies for not catching this earlier, but I'm not sure I see that StorageRewrite actually prevents that (or let me know if you see differently).
given this, it seems like this approach might be a bit fragile to keep logic in StorageRewrite that is fairly tailored to AOTExecutorCodegen. at minimum I think we'd need a unit test suite to exercise the AOTExecutorCodegen TIR patterns in StorageRewrite, but I'm concerned that cross-talk between that and other StorageRewrite use cases may lead to difficulties down the road (e.g. multiplying test cases and StorageRewrite becoming quite complex). @tqchen suggested perhaps a better approach is to apply the results of memory planning to AOTExecutorCodegen and handle buffer reuse manually there--this way any address assignments are purposeful.
what are your thoughts on this?
i added a couple other comments from an earlier reading of this PR too, though they may be moot given the above.
this->VisitExpr(arg); | ||
} | ||
} | ||
} else if (op->op.same_as(builtin::tvm_struct_set())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guess another way to view this though is that this PR is adding reference tracking, but reference tracking only works with one particular TIR pattern. i agree with you that adding full reference tracking is a lot to ask particularly considering the goals of this PR. However, there's no documentation (by which I mean no unit test) as to what specifically needs to be tracked. can you add a unit test for the changes you've made to StorageRewrite?
Hi @areusch, About a), I disagree. About b), I agree. The solution we are proposing is not general, but any general solution can be built on top of it. As you said, we are trying to solve a specific case, but when you tackle the general case, the code I wrote is still valid. Especially in a TIR based world (which is the aim of TensorIR, afaiu) I think we will want (at some point) to extend this solution instead of accepting that In other words, our solution is not tailored to AOT at all. It is a partial solution to a generic problem that (only) AOT is hitting (for now). It cannot break anything in the future, because either the user is doing what AOT is doing, and it will work. For more edge cases, Bear also in mind that until we have a static memory planner, without this solution AOT is not usable. If you want, we can try to write a test that stresses the problem we are trying to solve with the change in What do you think? |
hi @giuseros, you're right about a)--my analysis was incorrect of StorageRewrite and it shouldn't modify the parameters appreciably to the memory map. thanks for correcting me on that. on b), my concern is:
On the latter point, my main critique is that the unit tests are pretty coarse (e.g. they are integration tests). there are enough moving pieces that i'm not sure it's obvious from the tests how StorageRewrite should work (it probably is today, but things could change over time that would make it less so). possible to add a finer-grained unit test e.g. in @tqchen also raised a concern on whether it's feasible to push StorageRewrite in the direction of having pointer analysis. The problem posed was: we currently pass buffers to operator implementation as pointers, and the pointer lifetime of those buffers is clear enough from convention (when the operator impl returns, they are invalid; so impl cannot save those pointers). if we start down the path of adding a more general reference counting impl to StorageRewrite, that implies we may be passing user data structs that contain pointers to operator implementations. however, it isn't really clear how we should explain the lifecycle of those pointers. sort of I think this is a minor point given the current impl, but I also agree that if we are arguing that StorageRewrite's struct_set analysis is in fact a limited application of a general pattern, and implementing the general pattern is also fine, we do need to then elaborate where we are going with it. i do think keeping the impl in AOTExecutorCodegen doesn't carry this problem. another possible way to alleviate both concerns would be to define an attr on the |
Hi @areusch , A possible alternative can be the following: With pipilene-2 we can leave |
Hi @areusch , a friendly ping! As @giuseros mentioned here, we figured out adding aliasing (partial / incremental) support for StorageRewrite might not be required for this fix. In the pass-a2, if we did not create the struct early on, StorageRewrite on its own will work on raw pointers (not aliased via structs). Also, post-StorageRewrite TIR, we could essentially create the structs if packed API was desired, in a subsequent pass (pass-c2). |
@manupa-arm @giuseros apologies for the delay! yeah the alternative approach makes sense to me. i think it's better to avoid adding special-case logic to StorageRewrite when we don't have a clear plan to implement the general case. one question on the overall plan: i imagine after this, to implement USMP, there would be an additional pass to replace the allocate calls with indexing into workspace buffers. would USMP then run StorageRewrite before memory planning, or would that be left up to the memory planning implementation? |
Yes, the overall plan, we'd not like to run StorageRewrite before the USMP passes are run because it will perform non-optimal local sharing inside the primfunc. The plan is to move StorageRewrite pass down after USMP passes if there is anything left out StorageRewrite could help with. |
Hi @manupa-arm , @areusch , |
fc7eb40
to
6b010fd
Compare
Hi @manupa-arm , @areusch , Thanks, |
A friendly ping @manupa-arm , @jroesch , @areusch ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @giuseros, i think the new approach makes sense. i would like @manupa-arm and perhaps @jroesch to explicit-approve before we merge this
|
||
// Allocate the TVMValues on the stack and define the variables | ||
for (auto v : tvm_values) { | ||
call_stmt = LetStmt(v, StackAlloca("array", 1), call_stmt); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just wondering if we may want to eventually create a separate memory pool for these, and whether you'd be open to (in a future PR) converting into tir.allocate optionally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think it is a very good idea, and I would personally like a similar direction. @manupa-arm, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that sounds good! (not sure we want a seperate pool but I can see them pooled to 'a' workspace buffer)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
|
||
// Allocate the TVMValues on the stack and define the variables | ||
for (auto v : tvm_values) { | ||
call_stmt = LetStmt(v, StackAlloca("array", 1), call_stmt); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that sounds good! (not sure we want a seperate pool but I can see them pooled to 'a' workspace buffer)
Change-Id: I13888471d4b8927a4012d6a8e749fb7a8935dd77
Change-Id: If9f1ee190690f9a810fe41eb1933d736f1eb4ec3
Change-Id: I8aa43d3a1b837b03a5cf3c6b32fc760bd78d3436
Change-Id: I5b0d75380ff660dd5a0acf5b14fa84bb992fbec4
Hi @manupa-arm , @jroesch , @areusch , Thanks, |
Hi @areusch , I can reuse that in theory, but |
@giuseros I would prefer to also use an array of structs, but my goal is to just get us to use the same data structure everywhere since there are multiple places where we are storing different versions of the same data without a shared data structure. We could move the Array outside and simplify the struct but it would be good to move memory planning to also use the same structure. I guess one solution could be merge as is, I can rewrite the MemoryPlanning to use the Array of struct and then you can update to use the same structure. |
Change-Id: Ia8b7de1373f167ca7d0d69a99846d417405bbe48
Hi @jroesch, Thanks, |
* Fix an issue with storage-rewrite pass and packed functions Change-Id: I13888471d4b8927a4012d6a8e749fb7a8935dd77 * Rebasing Change-Id: I7aa12e0217b8a2e1ff2a97a7c5fdda6b7597ae64 * Addressing comments Change-Id: If9f1ee190690f9a810fe41eb1933d736f1eb4ec3 * Add a pass to legalize packed calls Change-Id: I8aa43d3a1b837b03a5cf3c6b32fc760bd78d3436 * Add a unit test for the legalization pass Change-Id: I5b0d75380ff660dd5a0acf5b14fa84bb992fbec4 * rebasing Change-Id: I52ceab5cf6e9b54390cb36c18dbb8e22505d8e18 * Use common StorageInfo Change-Id: Ia8b7de1373f167ca7d0d69a99846d417405bbe48
* Fix an issue with storage-rewrite pass and packed functions Change-Id: I13888471d4b8927a4012d6a8e749fb7a8935dd77 * Rebasing Change-Id: I7aa12e0217b8a2e1ff2a97a7c5fdda6b7597ae64 * Addressing comments Change-Id: If9f1ee190690f9a810fe41eb1933d736f1eb4ec3 * Add a pass to legalize packed calls Change-Id: I8aa43d3a1b837b03a5cf3c6b32fc760bd78d3436 * Add a unit test for the legalization pass Change-Id: I5b0d75380ff660dd5a0acf5b14fa84bb992fbec4 * rebasing Change-Id: I52ceab5cf6e9b54390cb36c18dbb8e22505d8e18 * Use common StorageInfo Change-Id: Ia8b7de1373f167ca7d0d69a99846d417405bbe48
In this PR we are decoupling AOT from the Graph Memory Planner. Since AOT has the runner expressed in TIR we can get rid of the GMP in relay and use the Storage Rewrite Pass to do memory planning on the runner function. This also sorts out the issue mentioned in #8062