-
Notifications
You must be signed in to change notification settings - Fork 81
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RELAY][AST] Add virtual device as a first class field to Relay expressions #45
[RELAY][AST] Add virtual device as a first class field to Relay expressions #45
Conversation
Realized both dead_code.cc and fold_constant.cc would happily move values into a different lexical virtual device context since device_planner.cc was being 'clever' and eliding on_devices for let-bound values when there's no change. Fixed so that every let-bound value has an on_device. Will be much better after apache/tvm-rfcs#45
Realized both dead_code.cc and fold_constant.cc would happily move values into a different lexical virtual device context since device_planner.cc was being 'clever' and eliding on_devices for let-bound values when there's no change. Fixed so that every let-bound value has an on_device. Will be much better after apache/tvm-rfcs#45
Here's my proposed implementation: https://github.com/apache/tvm/pull/9641/files |
Thanks Lily, LGTM. Let me apologize for getting us into the current on_device + function attributes mess your proposal gets us out of. It seemed like a workable approach at the time which would not require both an AST change and careful treatment of AST node copies. But in retrospect the on_device + function attributes approach has caused at least four bugs due to the fragility of maintaining its annotation invariant. I'm hoping the experience we gain from adding virtual_device_ will inform improvements to checked_type_. In particular getting us out of the need to rerun InferType after every xform (and sometimes in strict succession). But one step at a time. |
- Allow device_copy into and out of PrimFuncs - Realized both dead_code.cc and fold_constant.cc would happily move values into a different lexical virtual device context since device_planner.cc was being 'clever' and eliding on_devices for let-bound values when there's no change. Fixed so that every let-bound value has an on_device. Will be much better after apache/tvm-rfcs#45 is implemented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I don't have other comments at this moment. Gentle ping @jroesch @zhiics @ZihengJiang @junrushao1994 for review, or I'll merge this RFC tomorrow.
@comaniac Thanks for reviewing! Looking forward to it going in :D |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks @electriclilies and all reviewers. This RFC is merged. |
- Allow device_copy into and out of PrimFuncs - Realized both dead_code.cc and fold_constant.cc would happily move values into a different lexical virtual device context since device_planner.cc was being 'clever' and eliding on_devices for let-bound values when there's no change. Fixed so that every let-bound value has an on_device. Will be much better after apache/tvm-rfcs#45 is implemented.
…pe constraints. - Memory scope constraints can flow both out of and in to PrimFuncs introduced by LowerTE. In TIR memory scopes are represented by 'storage scopes' on the PointerType type annotations on TIR Buffer data variables. - It is straightforward to extract memory scopes from PrimFuncs by looking at the PrimFunc's buffer_map. We do this is 'phase 1' of PlanDevices, which collects all the device constraints implied by - However, pushing memory constraints in to PrimFuncs is more challenging due to buffer aliasing. This aspect is still experimental. - Allow device_copies to be inserted for both arguments and results of PrimFunc calls, on the assumption PlanDevices has already established a consistent device assignment prior to lowering and any new mismatch is required to match up memory scopes. - Realized both dead_code.cc and fold_constant.cc would happily move values into a different lexical virtual device context since device_planner.cc was being 'clever' and eliding on_devices for let-bound values when there's no change. Fixed so that every let-bound value has an on_device. Will be much better after apache/tvm-rfcs#45 is implemented. - Other misc fixups.
Thanks cody! |
…straints. This PR: 1) Makes PlanDevices consider lowered calls when solving device domain constraints. 2) Connects the storage scopes on PrimFunc parameters (encoded in their Buffer data Var type annotation PointerTypes storage_scope fields) to the memory_scope fields of the SEScopes which PlanDevices unifies over. 3) Allows new device_copies to be inserted on the arguments and results of lowered calls so as to acount for any memory scope mismatches which are now apparent. [device_planner.cc has main changes, rest is secondary.] In the short term we'd like to use this machinery to flow memory scope choices made during lowering back out into the overall Relay program. In the longer term we'd also like to be able to use memory scopes to influence the lowering of yet-to-be-lowered functions (or lowered functions which have yet to been scheduled, a distinction now possible with TensorIR). - Memory scope constraints can flow both out of and in to PrimFuncs introduced by LowerTE. In TIR memory scopes are represented by 'storage scopes' on the PointerType type annotations on TIR Buffer data variables. - It is straightforward to extract memory scopes from PrimFuncs by looking at the PrimFunc's buffer_map. We do this is 'phase 1' of PlanDevices, which collects all the device constraints implied by - However, pushing memory constraints in to PrimFuncs is more challenging due to buffer aliasing. This aspect is still experimental. - Allow device_copies to be inserted for both arguments and results of PrimFunc calls, on the assumption PlanDevices has already established a consistent device assignment prior to lowering and any new mismatch is required to match up memory scopes. We use the new 'free' on_device annotations to implement this. Coming along for the ride: - To make unit tests of mixed Relay/TIR functions possible needed to be able to supply a checked_type to GlobalVar since that's currently the only way to give a Relay type to PrimFuncs. - Use GenSym to get unique var names in ANF & partial eval so easier to diff debug output between passes and connect program fragments back into the overall program. Relying on pretty-printing to automagically unique-ify var names is certainly cute but until we have better span support is very hard to work with. - Realized both dead_code.cc and fold_constant.cc would happily move values into a different lexical virtual device context since device_planner.cc was being 'clever' and eliding on_devices for let-bound values when there's no change. Fixed so that every let-bound value has an on_device. Will be much better after apache/tvm-rfcs#45 is implemented. - Make build -Werror clean for clang-12 (mostly move fixups). - Address post-submit comments from apache#9693.
…straints. (#9613) * [Relay] Re-run PlanDevices after LowerTE to flow new memory scope constraints. This PR: 1) Makes PlanDevices consider lowered calls when solving device domain constraints. 2) Connects the storage scopes on PrimFunc parameters (encoded in their Buffer data Var type annotation PointerTypes storage_scope fields) to the memory_scope fields of the SEScopes which PlanDevices unifies over. 3) Allows new device_copies to be inserted on the arguments and results of lowered calls so as to acount for any memory scope mismatches which are now apparent. [device_planner.cc has main changes, rest is secondary.] In the short term we'd like to use this machinery to flow memory scope choices made during lowering back out into the overall Relay program. In the longer term we'd also like to be able to use memory scopes to influence the lowering of yet-to-be-lowered functions (or lowered functions which have yet to been scheduled, a distinction now possible with TensorIR). - Memory scope constraints can flow both out of and in to PrimFuncs introduced by LowerTE. In TIR memory scopes are represented by 'storage scopes' on the PointerType type annotations on TIR Buffer data variables. - It is straightforward to extract memory scopes from PrimFuncs by looking at the PrimFunc's buffer_map. We do this is 'phase 1' of PlanDevices, which collects all the device constraints implied by - However, pushing memory constraints in to PrimFuncs is more challenging due to buffer aliasing. This aspect is still experimental. - Allow device_copies to be inserted for both arguments and results of PrimFunc calls, on the assumption PlanDevices has already established a consistent device assignment prior to lowering and any new mismatch is required to match up memory scopes. We use the new 'free' on_device annotations to implement this. Coming along for the ride: - To make unit tests of mixed Relay/TIR functions possible needed to be able to supply a checked_type to GlobalVar since that's currently the only way to give a Relay type to PrimFuncs. - Use GenSym to get unique var names in ANF & partial eval so easier to diff debug output between passes and connect program fragments back into the overall program. Relying on pretty-printing to automagically unique-ify var names is certainly cute but until we have better span support is very hard to work with. - Realized both dead_code.cc and fold_constant.cc would happily move values into a different lexical virtual device context since device_planner.cc was being 'clever' and eliding on_devices for let-bound values when there's no change. Fixed so that every let-bound value has an on_device. Will be much better after apache/tvm-rfcs#45 is implemented. - Make build -Werror clean for clang-12 (mostly move fixups). - Address post-submit comments from #9693. * [checkpoint] thread safe GenSym
…straints. (apache#9613) * [Relay] Re-run PlanDevices after LowerTE to flow new memory scope constraints. This PR: 1) Makes PlanDevices consider lowered calls when solving device domain constraints. 2) Connects the storage scopes on PrimFunc parameters (encoded in their Buffer data Var type annotation PointerTypes storage_scope fields) to the memory_scope fields of the SEScopes which PlanDevices unifies over. 3) Allows new device_copies to be inserted on the arguments and results of lowered calls so as to acount for any memory scope mismatches which are now apparent. [device_planner.cc has main changes, rest is secondary.] In the short term we'd like to use this machinery to flow memory scope choices made during lowering back out into the overall Relay program. In the longer term we'd also like to be able to use memory scopes to influence the lowering of yet-to-be-lowered functions (or lowered functions which have yet to been scheduled, a distinction now possible with TensorIR). - Memory scope constraints can flow both out of and in to PrimFuncs introduced by LowerTE. In TIR memory scopes are represented by 'storage scopes' on the PointerType type annotations on TIR Buffer data variables. - It is straightforward to extract memory scopes from PrimFuncs by looking at the PrimFunc's buffer_map. We do this is 'phase 1' of PlanDevices, which collects all the device constraints implied by - However, pushing memory constraints in to PrimFuncs is more challenging due to buffer aliasing. This aspect is still experimental. - Allow device_copies to be inserted for both arguments and results of PrimFunc calls, on the assumption PlanDevices has already established a consistent device assignment prior to lowering and any new mismatch is required to match up memory scopes. We use the new 'free' on_device annotations to implement this. Coming along for the ride: - To make unit tests of mixed Relay/TIR functions possible needed to be able to supply a checked_type to GlobalVar since that's currently the only way to give a Relay type to PrimFuncs. - Use GenSym to get unique var names in ANF & partial eval so easier to diff debug output between passes and connect program fragments back into the overall program. Relying on pretty-printing to automagically unique-ify var names is certainly cute but until we have better span support is very hard to work with. - Realized both dead_code.cc and fold_constant.cc would happily move values into a different lexical virtual device context since device_planner.cc was being 'clever' and eliding on_devices for let-bound values when there's no change. Fixed so that every let-bound value has an on_device. Will be much better after apache/tvm-rfcs#45 is implemented. - Make build -Werror clean for clang-12 (mostly move fixups). - Address post-submit comments from apache#9693. * [checkpoint] thread safe GenSym
…straints. (apache#9613) * [Relay] Re-run PlanDevices after LowerTE to flow new memory scope constraints. This PR: 1) Makes PlanDevices consider lowered calls when solving device domain constraints. 2) Connects the storage scopes on PrimFunc parameters (encoded in their Buffer data Var type annotation PointerTypes storage_scope fields) to the memory_scope fields of the SEScopes which PlanDevices unifies over. 3) Allows new device_copies to be inserted on the arguments and results of lowered calls so as to acount for any memory scope mismatches which are now apparent. [device_planner.cc has main changes, rest is secondary.] In the short term we'd like to use this machinery to flow memory scope choices made during lowering back out into the overall Relay program. In the longer term we'd also like to be able to use memory scopes to influence the lowering of yet-to-be-lowered functions (or lowered functions which have yet to been scheduled, a distinction now possible with TensorIR). - Memory scope constraints can flow both out of and in to PrimFuncs introduced by LowerTE. In TIR memory scopes are represented by 'storage scopes' on the PointerType type annotations on TIR Buffer data variables. - It is straightforward to extract memory scopes from PrimFuncs by looking at the PrimFunc's buffer_map. We do this is 'phase 1' of PlanDevices, which collects all the device constraints implied by - However, pushing memory constraints in to PrimFuncs is more challenging due to buffer aliasing. This aspect is still experimental. - Allow device_copies to be inserted for both arguments and results of PrimFunc calls, on the assumption PlanDevices has already established a consistent device assignment prior to lowering and any new mismatch is required to match up memory scopes. We use the new 'free' on_device annotations to implement this. Coming along for the ride: - To make unit tests of mixed Relay/TIR functions possible needed to be able to supply a checked_type to GlobalVar since that's currently the only way to give a Relay type to PrimFuncs. - Use GenSym to get unique var names in ANF & partial eval so easier to diff debug output between passes and connect program fragments back into the overall program. Relying on pretty-printing to automagically unique-ify var names is certainly cute but until we have better span support is very hard to work with. - Realized both dead_code.cc and fold_constant.cc would happily move values into a different lexical virtual device context since device_planner.cc was being 'clever' and eliding on_devices for let-bound values when there's no change. Fixed so that every let-bound value has an on_device. Will be much better after apache/tvm-rfcs#45 is implemented. - Make build -Werror clean for clang-12 (mostly move fixups). - Address post-submit comments from apache#9693. * [checkpoint] thread safe GenSym
…straints. (apache#9613) * [Relay] Re-run PlanDevices after LowerTE to flow new memory scope constraints. This PR: 1) Makes PlanDevices consider lowered calls when solving device domain constraints. 2) Connects the storage scopes on PrimFunc parameters (encoded in their Buffer data Var type annotation PointerTypes storage_scope fields) to the memory_scope fields of the SEScopes which PlanDevices unifies over. 3) Allows new device_copies to be inserted on the arguments and results of lowered calls so as to acount for any memory scope mismatches which are now apparent. [device_planner.cc has main changes, rest is secondary.] In the short term we'd like to use this machinery to flow memory scope choices made during lowering back out into the overall Relay program. In the longer term we'd also like to be able to use memory scopes to influence the lowering of yet-to-be-lowered functions (or lowered functions which have yet to been scheduled, a distinction now possible with TensorIR). - Memory scope constraints can flow both out of and in to PrimFuncs introduced by LowerTE. In TIR memory scopes are represented by 'storage scopes' on the PointerType type annotations on TIR Buffer data variables. - It is straightforward to extract memory scopes from PrimFuncs by looking at the PrimFunc's buffer_map. We do this is 'phase 1' of PlanDevices, which collects all the device constraints implied by - However, pushing memory constraints in to PrimFuncs is more challenging due to buffer aliasing. This aspect is still experimental. - Allow device_copies to be inserted for both arguments and results of PrimFunc calls, on the assumption PlanDevices has already established a consistent device assignment prior to lowering and any new mismatch is required to match up memory scopes. We use the new 'free' on_device annotations to implement this. Coming along for the ride: - To make unit tests of mixed Relay/TIR functions possible needed to be able to supply a checked_type to GlobalVar since that's currently the only way to give a Relay type to PrimFuncs. - Use GenSym to get unique var names in ANF & partial eval so easier to diff debug output between passes and connect program fragments back into the overall program. Relying on pretty-printing to automagically unique-ify var names is certainly cute but until we have better span support is very hard to work with. - Realized both dead_code.cc and fold_constant.cc would happily move values into a different lexical virtual device context since device_planner.cc was being 'clever' and eliding on_devices for let-bound values when there's no change. Fixed so that every let-bound value has an on_device. Will be much better after apache/tvm-rfcs#45 is implemented. - Make build -Werror clean for clang-12 (mostly move fixups). - Address post-submit comments from apache#9693. * [checkpoint] thread safe GenSym
…straints. (apache#9613) * [Relay] Re-run PlanDevices after LowerTE to flow new memory scope constraints. This PR: 1) Makes PlanDevices consider lowered calls when solving device domain constraints. 2) Connects the storage scopes on PrimFunc parameters (encoded in their Buffer data Var type annotation PointerTypes storage_scope fields) to the memory_scope fields of the SEScopes which PlanDevices unifies over. 3) Allows new device_copies to be inserted on the arguments and results of lowered calls so as to acount for any memory scope mismatches which are now apparent. [device_planner.cc has main changes, rest is secondary.] In the short term we'd like to use this machinery to flow memory scope choices made during lowering back out into the overall Relay program. In the longer term we'd also like to be able to use memory scopes to influence the lowering of yet-to-be-lowered functions (or lowered functions which have yet to been scheduled, a distinction now possible with TensorIR). - Memory scope constraints can flow both out of and in to PrimFuncs introduced by LowerTE. In TIR memory scopes are represented by 'storage scopes' on the PointerType type annotations on TIR Buffer data variables. - It is straightforward to extract memory scopes from PrimFuncs by looking at the PrimFunc's buffer_map. We do this is 'phase 1' of PlanDevices, which collects all the device constraints implied by - However, pushing memory constraints in to PrimFuncs is more challenging due to buffer aliasing. This aspect is still experimental. - Allow device_copies to be inserted for both arguments and results of PrimFunc calls, on the assumption PlanDevices has already established a consistent device assignment prior to lowering and any new mismatch is required to match up memory scopes. We use the new 'free' on_device annotations to implement this. Coming along for the ride: - To make unit tests of mixed Relay/TIR functions possible needed to be able to supply a checked_type to GlobalVar since that's currently the only way to give a Relay type to PrimFuncs. - Use GenSym to get unique var names in ANF & partial eval so easier to diff debug output between passes and connect program fragments back into the overall program. Relying on pretty-printing to automagically unique-ify var names is certainly cute but until we have better span support is very hard to work with. - Realized both dead_code.cc and fold_constant.cc would happily move values into a different lexical virtual device context since device_planner.cc was being 'clever' and eliding on_devices for let-bound values when there's no change. Fixed so that every let-bound value has an on_device. Will be much better after apache/tvm-rfcs#45 is implemented. - Make build -Werror clean for clang-12 (mostly move fixups). - Address post-submit comments from apache#9693. * [checkpoint] thread safe GenSym
…straints. (apache#9613) * [Relay] Re-run PlanDevices after LowerTE to flow new memory scope constraints. This PR: 1) Makes PlanDevices consider lowered calls when solving device domain constraints. 2) Connects the storage scopes on PrimFunc parameters (encoded in their Buffer data Var type annotation PointerTypes storage_scope fields) to the memory_scope fields of the SEScopes which PlanDevices unifies over. 3) Allows new device_copies to be inserted on the arguments and results of lowered calls so as to acount for any memory scope mismatches which are now apparent. [device_planner.cc has main changes, rest is secondary.] In the short term we'd like to use this machinery to flow memory scope choices made during lowering back out into the overall Relay program. In the longer term we'd also like to be able to use memory scopes to influence the lowering of yet-to-be-lowered functions (or lowered functions which have yet to been scheduled, a distinction now possible with TensorIR). - Memory scope constraints can flow both out of and in to PrimFuncs introduced by LowerTE. In TIR memory scopes are represented by 'storage scopes' on the PointerType type annotations on TIR Buffer data variables. - It is straightforward to extract memory scopes from PrimFuncs by looking at the PrimFunc's buffer_map. We do this is 'phase 1' of PlanDevices, which collects all the device constraints implied by - However, pushing memory constraints in to PrimFuncs is more challenging due to buffer aliasing. This aspect is still experimental. - Allow device_copies to be inserted for both arguments and results of PrimFunc calls, on the assumption PlanDevices has already established a consistent device assignment prior to lowering and any new mismatch is required to match up memory scopes. We use the new 'free' on_device annotations to implement this. Coming along for the ride: - To make unit tests of mixed Relay/TIR functions possible needed to be able to supply a checked_type to GlobalVar since that's currently the only way to give a Relay type to PrimFuncs. - Use GenSym to get unique var names in ANF & partial eval so easier to diff debug output between passes and connect program fragments back into the overall program. Relying on pretty-printing to automagically unique-ify var names is certainly cute but until we have better span support is very hard to work with. - Realized both dead_code.cc and fold_constant.cc would happily move values into a different lexical virtual device context since device_planner.cc was being 'clever' and eliding on_devices for let-bound values when there's no change. Fixed so that every let-bound value has an on_device. Will be much better after apache/tvm-rfcs#45 is implemented. - Make build -Werror clean for clang-12 (mostly move fixups). - Address post-submit comments from apache#9693. * [checkpoint] thread safe GenSym
This is a proposal to add virtual devices as a first class field to Relay expressions.
cc @mbs-octoml @mikepapadim @jroesch