Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RELAY][AST] Add virtual device as a first class field to Relay expressions #45

Merged
merged 3 commits into from
Dec 7, 2021

Conversation

electriclilies
Copy link
Contributor

This is a proposal to add virtual devices as a first class field to Relay expressions.

cc @mbs-octoml @mikepapadim @jroesch

@electriclilies electriclilies changed the title Add virtual device as a first class field to Relay expressions [RELAY] Add virtual device as a first class field to Relay expressions Dec 1, 2021
@tqchen tqchen changed the title [RELAY] Add virtual device as a first class field to Relay expressions [RELAY][AST] Add virtual device as a first class field to Relay expressions Dec 2, 2021
@tqchen tqchen added the status: need review RFC needs review label Dec 2, 2021
mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Dec 2, 2021
Realized both dead_code.cc and fold_constant.cc would
happily move values into a different lexical virtual
device context since device_planner.cc was being
'clever' and eliding on_devices for let-bound values
when there's no change. Fixed so that every let-bound
value has an on_device.

Will be much better after apache/tvm-rfcs#45
@tqchen
Copy link
Member

tqchen commented Dec 2, 2021

cc @comaniac @zhiics @ZihengJiang

mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Dec 2, 2021
Realized both dead_code.cc and fold_constant.cc would
happily move values into a different lexical virtual
device context since device_planner.cc was being
'clever' and eliding on_devices for let-bound values
when there's no change. Fixed so that every let-bound
value has an on_device.

Will be much better after apache/tvm-rfcs#45
@electriclilies
Copy link
Contributor Author

Here's my proposed implementation: https://github.com/apache/tvm/pull/9641/files

@mbs-octoml
Copy link
Contributor

Thanks Lily, LGTM.

Let me apologize for getting us into the current on_device + function attributes mess your proposal gets us out of. It seemed like a workable approach at the time which would not require both an AST change and careful treatment of AST node copies. But in retrospect the on_device + function attributes approach has caused at least four bugs due to the fragility of maintaining its annotation invariant.

I'm hoping the experience we gain from adding virtual_device_ will inform improvements to checked_type_. In particular getting us out of the need to rerun InferType after every xform (and sometimes in strict succession). But one step at a time.

rfcs/1111-first-class-virtual-device.md Outdated Show resolved Hide resolved
rfcs/1111-first-class-virtual-device.md Outdated Show resolved Hide resolved
rfcs/1111-first-class-virtual-device.md Outdated Show resolved Hide resolved
mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Dec 4, 2021
 - Allow device_copy into and out of PrimFuncs
 - Realized both dead_code.cc and fold_constant.cc would
   happily move values into a different lexical virtual
   device context since device_planner.cc was being
   'clever' and eliding on_devices for let-bound values
   when there's no change. Fixed so that every let-bound
   value has an on_device. Will be much better after
   apache/tvm-rfcs#45 is implemented.
Copy link
Contributor

@comaniac comaniac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I don't have other comments at this moment. Gentle ping @jroesch @zhiics @ZihengJiang @junrushao1994 for review, or I'll merge this RFC tomorrow.

@electriclilies
Copy link
Contributor Author

@comaniac Thanks for reviewing! Looking forward to it going in :D

Copy link
Contributor

@comaniac comaniac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@comaniac comaniac merged commit 67c39d2 into apache:main Dec 7, 2021
@comaniac
Copy link
Contributor

comaniac commented Dec 7, 2021

Thanks @electriclilies and all reviewers. This RFC is merged.

mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Dec 7, 2021
 - Allow device_copy into and out of PrimFuncs
 - Realized both dead_code.cc and fold_constant.cc would
   happily move values into a different lexical virtual
   device context since device_planner.cc was being
   'clever' and eliding on_devices for let-bound values
   when there's no change. Fixed so that every let-bound
   value has an on_device. Will be much better after
   apache/tvm-rfcs#45 is implemented.
mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Dec 7, 2021
…pe constraints.

 - Memory scope constraints can flow both out of and in to PrimFuncs
   introduced by LowerTE. In TIR memory scopes are represented by
   'storage scopes' on the PointerType type annotations on TIR Buffer data
   variables.
    - It is straightforward to extract memory scopes from PrimFuncs by
      looking at the PrimFunc's buffer_map. We do this is 'phase 1' of
      PlanDevices, which collects all the device constraints implied by
    - However, pushing memory constraints in to PrimFuncs is more challenging
      due to buffer aliasing. This aspect is still experimental.

 - Allow device_copies to be inserted for both arguments and
   results of PrimFunc calls, on the assumption PlanDevices has
   already established a consistent device assignment prior to
   lowering and any new mismatch is required to match up memory scopes.

 - Realized both dead_code.cc and fold_constant.cc would
   happily move values into a different lexical virtual
   device context since device_planner.cc was being
   'clever' and eliding on_devices for let-bound values
   when there's no change. Fixed so that every let-bound
   value has an on_device. Will be much better after
   apache/tvm-rfcs#45 is implemented.

 - Other misc fixups.
@electriclilies
Copy link
Contributor Author

Thanks cody!

mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Dec 13, 2021
…straints.

This PR:
 1) Makes PlanDevices consider lowered calls when solving device domain constraints.
 2) Connects the storage scopes on PrimFunc parameters (encoded in their Buffer data
    Var type annotation PointerTypes storage_scope fields) to the memory_scope
    fields of the SEScopes which PlanDevices unifies over.
 3) Allows new device_copies to be inserted on the arguments and results of lowered
    calls so as to acount for any memory scope mismatches which are now apparent.

[device_planner.cc has main changes, rest is secondary.]

In the short term we'd like to use this machinery to flow memory scope choices made
during lowering back out into the overall Relay program. In the longer term we'd
also like to be able to use memory scopes to influence the lowering of
yet-to-be-lowered functions (or lowered functions which have yet to been scheduled,
a distinction now possible with TensorIR).

 - Memory scope constraints can flow both out of and in to PrimFuncs
   introduced by LowerTE. In TIR memory scopes are represented by
   'storage scopes' on the PointerType type annotations on TIR Buffer data
   variables.
    - It is straightforward to extract memory scopes from PrimFuncs by
      looking at the PrimFunc's buffer_map. We do this is 'phase 1' of
      PlanDevices, which collects all the device constraints implied by
    - However, pushing memory constraints in to PrimFuncs is more challenging
      due to buffer aliasing. This aspect is still experimental.

 - Allow device_copies to be inserted for both arguments and
   results of PrimFunc calls, on the assumption PlanDevices has
   already established a consistent device assignment prior to
   lowering and any new mismatch is required to match up memory scopes.
   We use the new 'free' on_device annotations to implement this.

Coming along for the ride:

 - To make unit tests of mixed Relay/TIR functions possible needed
   to be able to supply a checked_type to GlobalVar since that's currently
   the only way to give a Relay type to PrimFuncs.

 - Use GenSym to get unique var names in ANF & partial eval so easier
   to diff debug output between passes and connect program fragments
   back into the overall program. Relying on pretty-printing to
   automagically unique-ify var names is certainly cute but until we
   have better span support is very hard to work with.

 - Realized both dead_code.cc and fold_constant.cc would
   happily move values into a different lexical virtual
   device context since device_planner.cc was being
   'clever' and eliding on_devices for let-bound values
   when there's no change. Fixed so that every let-bound
   value has an on_device. Will be much better after
   apache/tvm-rfcs#45 is implemented.

 - Make build -Werror clean for clang-12 (mostly move fixups).

 - Address post-submit comments from apache#9693.
mbrookhart pushed a commit to apache/tvm that referenced this pull request Dec 14, 2021
…straints. (#9613)

* [Relay] Re-run PlanDevices after LowerTE to flow new memory scope constraints.

This PR:
 1) Makes PlanDevices consider lowered calls when solving device domain constraints.
 2) Connects the storage scopes on PrimFunc parameters (encoded in their Buffer data
    Var type annotation PointerTypes storage_scope fields) to the memory_scope
    fields of the SEScopes which PlanDevices unifies over.
 3) Allows new device_copies to be inserted on the arguments and results of lowered
    calls so as to acount for any memory scope mismatches which are now apparent.

[device_planner.cc has main changes, rest is secondary.]

In the short term we'd like to use this machinery to flow memory scope choices made
during lowering back out into the overall Relay program. In the longer term we'd
also like to be able to use memory scopes to influence the lowering of
yet-to-be-lowered functions (or lowered functions which have yet to been scheduled,
a distinction now possible with TensorIR).

 - Memory scope constraints can flow both out of and in to PrimFuncs
   introduced by LowerTE. In TIR memory scopes are represented by
   'storage scopes' on the PointerType type annotations on TIR Buffer data
   variables.
    - It is straightforward to extract memory scopes from PrimFuncs by
      looking at the PrimFunc's buffer_map. We do this is 'phase 1' of
      PlanDevices, which collects all the device constraints implied by
    - However, pushing memory constraints in to PrimFuncs is more challenging
      due to buffer aliasing. This aspect is still experimental.

 - Allow device_copies to be inserted for both arguments and
   results of PrimFunc calls, on the assumption PlanDevices has
   already established a consistent device assignment prior to
   lowering and any new mismatch is required to match up memory scopes.
   We use the new 'free' on_device annotations to implement this.

Coming along for the ride:

 - To make unit tests of mixed Relay/TIR functions possible needed
   to be able to supply a checked_type to GlobalVar since that's currently
   the only way to give a Relay type to PrimFuncs.

 - Use GenSym to get unique var names in ANF & partial eval so easier
   to diff debug output between passes and connect program fragments
   back into the overall program. Relying on pretty-printing to
   automagically unique-ify var names is certainly cute but until we
   have better span support is very hard to work with.

 - Realized both dead_code.cc and fold_constant.cc would
   happily move values into a different lexical virtual
   device context since device_planner.cc was being
   'clever' and eliding on_devices for let-bound values
   when there's no change. Fixed so that every let-bound
   value has an on_device. Will be much better after
   apache/tvm-rfcs#45 is implemented.

 - Make build -Werror clean for clang-12 (mostly move fixups).

 - Address post-submit comments from #9693.

* [checkpoint] thread safe GenSym
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 7, 2022
…straints. (apache#9613)

* [Relay] Re-run PlanDevices after LowerTE to flow new memory scope constraints.

This PR:
 1) Makes PlanDevices consider lowered calls when solving device domain constraints.
 2) Connects the storage scopes on PrimFunc parameters (encoded in their Buffer data
    Var type annotation PointerTypes storage_scope fields) to the memory_scope
    fields of the SEScopes which PlanDevices unifies over.
 3) Allows new device_copies to be inserted on the arguments and results of lowered
    calls so as to acount for any memory scope mismatches which are now apparent.

[device_planner.cc has main changes, rest is secondary.]

In the short term we'd like to use this machinery to flow memory scope choices made
during lowering back out into the overall Relay program. In the longer term we'd
also like to be able to use memory scopes to influence the lowering of
yet-to-be-lowered functions (or lowered functions which have yet to been scheduled,
a distinction now possible with TensorIR).

 - Memory scope constraints can flow both out of and in to PrimFuncs
   introduced by LowerTE. In TIR memory scopes are represented by
   'storage scopes' on the PointerType type annotations on TIR Buffer data
   variables.
    - It is straightforward to extract memory scopes from PrimFuncs by
      looking at the PrimFunc's buffer_map. We do this is 'phase 1' of
      PlanDevices, which collects all the device constraints implied by
    - However, pushing memory constraints in to PrimFuncs is more challenging
      due to buffer aliasing. This aspect is still experimental.

 - Allow device_copies to be inserted for both arguments and
   results of PrimFunc calls, on the assumption PlanDevices has
   already established a consistent device assignment prior to
   lowering and any new mismatch is required to match up memory scopes.
   We use the new 'free' on_device annotations to implement this.

Coming along for the ride:

 - To make unit tests of mixed Relay/TIR functions possible needed
   to be able to supply a checked_type to GlobalVar since that's currently
   the only way to give a Relay type to PrimFuncs.

 - Use GenSym to get unique var names in ANF & partial eval so easier
   to diff debug output between passes and connect program fragments
   back into the overall program. Relying on pretty-printing to
   automagically unique-ify var names is certainly cute but until we
   have better span support is very hard to work with.

 - Realized both dead_code.cc and fold_constant.cc would
   happily move values into a different lexical virtual
   device context since device_planner.cc was being
   'clever' and eliding on_devices for let-bound values
   when there's no change. Fixed so that every let-bound
   value has an on_device. Will be much better after
   apache/tvm-rfcs#45 is implemented.

 - Make build -Werror clean for clang-12 (mostly move fixups).

 - Address post-submit comments from apache#9693.

* [checkpoint] thread safe GenSym
yangulei pushed a commit to yangulei/tvm that referenced this pull request Jan 11, 2022
…straints. (apache#9613)

* [Relay] Re-run PlanDevices after LowerTE to flow new memory scope constraints.

This PR:
 1) Makes PlanDevices consider lowered calls when solving device domain constraints.
 2) Connects the storage scopes on PrimFunc parameters (encoded in their Buffer data
    Var type annotation PointerTypes storage_scope fields) to the memory_scope
    fields of the SEScopes which PlanDevices unifies over.
 3) Allows new device_copies to be inserted on the arguments and results of lowered
    calls so as to acount for any memory scope mismatches which are now apparent.

[device_planner.cc has main changes, rest is secondary.]

In the short term we'd like to use this machinery to flow memory scope choices made
during lowering back out into the overall Relay program. In the longer term we'd
also like to be able to use memory scopes to influence the lowering of
yet-to-be-lowered functions (or lowered functions which have yet to been scheduled,
a distinction now possible with TensorIR).

 - Memory scope constraints can flow both out of and in to PrimFuncs
   introduced by LowerTE. In TIR memory scopes are represented by
   'storage scopes' on the PointerType type annotations on TIR Buffer data
   variables.
    - It is straightforward to extract memory scopes from PrimFuncs by
      looking at the PrimFunc's buffer_map. We do this is 'phase 1' of
      PlanDevices, which collects all the device constraints implied by
    - However, pushing memory constraints in to PrimFuncs is more challenging
      due to buffer aliasing. This aspect is still experimental.

 - Allow device_copies to be inserted for both arguments and
   results of PrimFunc calls, on the assumption PlanDevices has
   already established a consistent device assignment prior to
   lowering and any new mismatch is required to match up memory scopes.
   We use the new 'free' on_device annotations to implement this.

Coming along for the ride:

 - To make unit tests of mixed Relay/TIR functions possible needed
   to be able to supply a checked_type to GlobalVar since that's currently
   the only way to give a Relay type to PrimFuncs.

 - Use GenSym to get unique var names in ANF & partial eval so easier
   to diff debug output between passes and connect program fragments
   back into the overall program. Relying on pretty-printing to
   automagically unique-ify var names is certainly cute but until we
   have better span support is very hard to work with.

 - Realized both dead_code.cc and fold_constant.cc would
   happily move values into a different lexical virtual
   device context since device_planner.cc was being
   'clever' and eliding on_devices for let-bound values
   when there's no change. Fixed so that every let-bound
   value has an on_device. Will be much better after
   apache/tvm-rfcs#45 is implemented.

 - Make build -Werror clean for clang-12 (mostly move fixups).

 - Address post-submit comments from apache#9693.

* [checkpoint] thread safe GenSym
yangulei pushed a commit to yangulei/tvm that referenced this pull request Jan 12, 2022
…straints. (apache#9613)

* [Relay] Re-run PlanDevices after LowerTE to flow new memory scope constraints.

This PR:
 1) Makes PlanDevices consider lowered calls when solving device domain constraints.
 2) Connects the storage scopes on PrimFunc parameters (encoded in their Buffer data
    Var type annotation PointerTypes storage_scope fields) to the memory_scope
    fields of the SEScopes which PlanDevices unifies over.
 3) Allows new device_copies to be inserted on the arguments and results of lowered
    calls so as to acount for any memory scope mismatches which are now apparent.

[device_planner.cc has main changes, rest is secondary.]

In the short term we'd like to use this machinery to flow memory scope choices made
during lowering back out into the overall Relay program. In the longer term we'd
also like to be able to use memory scopes to influence the lowering of
yet-to-be-lowered functions (or lowered functions which have yet to been scheduled,
a distinction now possible with TensorIR).

 - Memory scope constraints can flow both out of and in to PrimFuncs
   introduced by LowerTE. In TIR memory scopes are represented by
   'storage scopes' on the PointerType type annotations on TIR Buffer data
   variables.
    - It is straightforward to extract memory scopes from PrimFuncs by
      looking at the PrimFunc's buffer_map. We do this is 'phase 1' of
      PlanDevices, which collects all the device constraints implied by
    - However, pushing memory constraints in to PrimFuncs is more challenging
      due to buffer aliasing. This aspect is still experimental.

 - Allow device_copies to be inserted for both arguments and
   results of PrimFunc calls, on the assumption PlanDevices has
   already established a consistent device assignment prior to
   lowering and any new mismatch is required to match up memory scopes.
   We use the new 'free' on_device annotations to implement this.

Coming along for the ride:

 - To make unit tests of mixed Relay/TIR functions possible needed
   to be able to supply a checked_type to GlobalVar since that's currently
   the only way to give a Relay type to PrimFuncs.

 - Use GenSym to get unique var names in ANF & partial eval so easier
   to diff debug output between passes and connect program fragments
   back into the overall program. Relying on pretty-printing to
   automagically unique-ify var names is certainly cute but until we
   have better span support is very hard to work with.

 - Realized both dead_code.cc and fold_constant.cc would
   happily move values into a different lexical virtual
   device context since device_planner.cc was being
   'clever' and eliding on_devices for let-bound values
   when there's no change. Fixed so that every let-bound
   value has an on_device. Will be much better after
   apache/tvm-rfcs#45 is implemented.

 - Make build -Werror clean for clang-12 (mostly move fixups).

 - Address post-submit comments from apache#9693.

* [checkpoint] thread safe GenSym
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
…straints. (apache#9613)

* [Relay] Re-run PlanDevices after LowerTE to flow new memory scope constraints.

This PR:
 1) Makes PlanDevices consider lowered calls when solving device domain constraints.
 2) Connects the storage scopes on PrimFunc parameters (encoded in their Buffer data
    Var type annotation PointerTypes storage_scope fields) to the memory_scope
    fields of the SEScopes which PlanDevices unifies over.
 3) Allows new device_copies to be inserted on the arguments and results of lowered
    calls so as to acount for any memory scope mismatches which are now apparent.

[device_planner.cc has main changes, rest is secondary.]

In the short term we'd like to use this machinery to flow memory scope choices made
during lowering back out into the overall Relay program. In the longer term we'd
also like to be able to use memory scopes to influence the lowering of
yet-to-be-lowered functions (or lowered functions which have yet to been scheduled,
a distinction now possible with TensorIR).

 - Memory scope constraints can flow both out of and in to PrimFuncs
   introduced by LowerTE. In TIR memory scopes are represented by
   'storage scopes' on the PointerType type annotations on TIR Buffer data
   variables.
    - It is straightforward to extract memory scopes from PrimFuncs by
      looking at the PrimFunc's buffer_map. We do this is 'phase 1' of
      PlanDevices, which collects all the device constraints implied by
    - However, pushing memory constraints in to PrimFuncs is more challenging
      due to buffer aliasing. This aspect is still experimental.

 - Allow device_copies to be inserted for both arguments and
   results of PrimFunc calls, on the assumption PlanDevices has
   already established a consistent device assignment prior to
   lowering and any new mismatch is required to match up memory scopes.
   We use the new 'free' on_device annotations to implement this.

Coming along for the ride:

 - To make unit tests of mixed Relay/TIR functions possible needed
   to be able to supply a checked_type to GlobalVar since that's currently
   the only way to give a Relay type to PrimFuncs.

 - Use GenSym to get unique var names in ANF & partial eval so easier
   to diff debug output between passes and connect program fragments
   back into the overall program. Relying on pretty-printing to
   automagically unique-ify var names is certainly cute but until we
   have better span support is very hard to work with.

 - Realized both dead_code.cc and fold_constant.cc would
   happily move values into a different lexical virtual
   device context since device_planner.cc was being
   'clever' and eliding on_devices for let-bound values
   when there's no change. Fixed so that every let-bound
   value has an on_device. Will be much better after
   apache/tvm-rfcs#45 is implemented.

 - Make build -Werror clean for clang-12 (mostly move fixups).

 - Address post-submit comments from apache#9693.

* [checkpoint] thread safe GenSym
qsqqsqqsq-intellif pushed a commit to qsqqsqqsq-intellif/tvm that referenced this pull request Apr 29, 2022
…straints. (apache#9613)

* [Relay] Re-run PlanDevices after LowerTE to flow new memory scope constraints.

This PR:
 1) Makes PlanDevices consider lowered calls when solving device domain constraints.
 2) Connects the storage scopes on PrimFunc parameters (encoded in their Buffer data
    Var type annotation PointerTypes storage_scope fields) to the memory_scope
    fields of the SEScopes which PlanDevices unifies over.
 3) Allows new device_copies to be inserted on the arguments and results of lowered
    calls so as to acount for any memory scope mismatches which are now apparent.

[device_planner.cc has main changes, rest is secondary.]

In the short term we'd like to use this machinery to flow memory scope choices made
during lowering back out into the overall Relay program. In the longer term we'd
also like to be able to use memory scopes to influence the lowering of
yet-to-be-lowered functions (or lowered functions which have yet to been scheduled,
a distinction now possible with TensorIR).

 - Memory scope constraints can flow both out of and in to PrimFuncs
   introduced by LowerTE. In TIR memory scopes are represented by
   'storage scopes' on the PointerType type annotations on TIR Buffer data
   variables.
    - It is straightforward to extract memory scopes from PrimFuncs by
      looking at the PrimFunc's buffer_map. We do this is 'phase 1' of
      PlanDevices, which collects all the device constraints implied by
    - However, pushing memory constraints in to PrimFuncs is more challenging
      due to buffer aliasing. This aspect is still experimental.

 - Allow device_copies to be inserted for both arguments and
   results of PrimFunc calls, on the assumption PlanDevices has
   already established a consistent device assignment prior to
   lowering and any new mismatch is required to match up memory scopes.
   We use the new 'free' on_device annotations to implement this.

Coming along for the ride:

 - To make unit tests of mixed Relay/TIR functions possible needed
   to be able to supply a checked_type to GlobalVar since that's currently
   the only way to give a Relay type to PrimFuncs.

 - Use GenSym to get unique var names in ANF & partial eval so easier
   to diff debug output between passes and connect program fragments
   back into the overall program. Relying on pretty-printing to
   automagically unique-ify var names is certainly cute but until we
   have better span support is very hard to work with.

 - Realized both dead_code.cc and fold_constant.cc would
   happily move values into a different lexical virtual
   device context since device_planner.cc was being
   'clever' and eliding on_devices for let-bound values
   when there's no change. Fixed so that every let-bound
   value has an on_device. Will be much better after
   apache/tvm-rfcs#45 is implemented.

 - Make build -Werror clean for clang-12 (mostly move fixups).

 - Address post-submit comments from apache#9693.

* [checkpoint] thread safe GenSym
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: need review RFC needs review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants