[Relay] PlanDevices supports 'free' on_device annotations #9693

Merged
merged 2 commits into from
Dec 10, 2021

@mbs-octoml mbs-octoml commented Dec 9, 2021

This is in support of #9613, which allows PlanDevices to be run
after lowering so as to flow memory constraints in and
out of PrimFuncs. That requires a way to insert device_copies
when the memory scopes chosen during separate lowering of fused
primitive functions clash, but otherwise to avoid device_copies when
scopes can be chosen so as to avoid them.

We support that by generalizing the "on_device" annotation to
allow the device constraint to be independently controlled for
its 'body' and 'result'.

  • Standard user annotation: body is constrained to S
on_device(body, S)
  • Used by PlanDevices to 'fix' expression to S
    (was is_fixed=True)
on_device(body, S, constrain_result=True)
  • Used by PlanDevices to indicate a device_copy can be
    inserted if necessary.
on_device(body, S, constrain_body=False)
  • Supported, but currently has no use.
on_device(body, S, constrain_result=True, constrain_body=False)
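
As a rough illustration of the four combinations listed above, here is a minimal, self-contained Python sketch. It is a toy model and not TVM code: the `OnDeviceNote` class, its `kind()` method and the labels it returns are hypothetical, and the defaults (constrain_result=False, constrain_body=True) are inferred from the listing rather than taken from the implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OnDeviceNote:
    """Toy stand-in for an on_device annotation (hypothetical, not the TVM class)."""

    body: str
    scope: str
    constrain_result: bool = False  # assumed default, inferred from the listing
    constrain_body: bool = True     # assumed default, inferred from the listing

    def kind(self) -> str:
        """Classify the annotation by which sides are pinned to `scope`."""
        if self.constrain_body and self.constrain_result:
            return "fixed"        # body and result both pinned (was is_fixed=True)
        if self.constrain_body:
            return "user"         # standard user annotation: only the body is pinned
        if self.constrain_result:
            return "result-only"  # supported, but currently has no use
        return "free"             # neither side pinned: a device_copy may be inserted


if __name__ == "__main__":
    for note in (
        OnDeviceNote("body", "S"),
        OnDeviceNote("body", "S", constrain_result=True),
        OnDeviceNote("body", "S", constrain_body=False),
        OnDeviceNote("body", "S", constrain_result=True, constrain_body=False),
    ):
        print(note.kind(), note)
```

The point of modelling the two flags separately is that the 'free' case falls out naturally as "both flags off", which matches the reading that PlanDevices may then place a device_copy on either side if the scopes turn out to differ.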

A few extra odds 'n ends collected along the way:

  • Some CallLowered cleanup which I found useful.
  • The usual extra debugging output needed as I debugged.
    In return I removed some particularly verbose logging I'd
    added while tracking down unexpected object copies.
  • Clean up warnings from clang-12 as I touch files.


@electriclilies electriclilies left a comment

Overall, LGTM! I left some suggestions to improve comments and some clarifying questions for my own edification.

include/tvm/relay/attrs/on_device.h
src/printer/text_printer.cc
src/relay/backend/vm/compiler.cc
src/relay/op/call/call.cc
src/relay/op/memory/on_device.cc
<< "Cannot constrain intermediate result of nested on_device calls to different SEScopes";
}
// We can now ignore the intermediate constraints, if any.
return OnDevice(props.body, (constrain_inner || constrain_outer) ? outer : inner,
Contributor:

If the inner is constrained, we return outer?

Contributor Author:

Yeah that doesn't look right does it? Fixed.

Contributor Author:

Actually it is correct -- added a comment to explain.

src/relay/op/memory/on_device.h
src/relay/transforms/device_planner.cc
@jroesch jroesch merged commit e3379a6 into apache:main Dec 10, 2021
@mbs-octoml mbs-octoml deleted the mbs-devplanning branch December 13, 2021 16:59
@mbs-octoml

Thanks @electriclilies and @jroesch, I merged your post-submit comments into the sequel.

mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Dec 13, 2021
…straints.

This PR:
 1) Makes PlanDevices consider lowered calls when solving device domain constraints.
 2) Connects the storage scopes on PrimFunc parameters (encoded in their Buffer data
    Var type annotation PointerTypes storage_scope fields) to the memory_scope
    fields of the SEScopes which PlanDevices unifies over.
 3) Allows new device_copies to be inserted on the arguments and results of lowered
    calls so as to account for any memory scope mismatches which are now apparent.

[device_planner.cc has main changes, rest is secondary.]

In the short term we'd like to use this machinery to flow memory scope choices made
during lowering back out into the overall Relay program. In the longer term we'd
also like to be able to use memory scopes to influence the lowering of
yet-to-be-lowered functions (or lowered functions which have yet to be scheduled,
a distinction now possible with TensorIR).

 - Memory scope constraints can flow both out of and in to PrimFuncs
   introduced by LowerTE. In TIR memory scopes are represented by
   'storage scopes' on the PointerType type annotations on TIR Buffer data
   variables.
    - It is straightforward to extract memory scopes from PrimFuncs by
      looking at the PrimFunc's buffer_map. We do this in 'phase 1' of
      PlanDevices, which collects all the device constraints implied by
    - However, pushing memory constraints in to PrimFuncs is more challenging
      due to buffer aliasing. This aspect is still experimental.

 - Allow device_copies to be inserted for both arguments and
   results of PrimFunc calls, on the assumption PlanDevices has
   already established a consistent device assignment prior to
   lowering and any new mismatch is required to match up memory scopes.
   We use the new 'free' on_device annotations to implement this.
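
The device_copy behavior described above can be pictured with a small Python sketch. This is only a toy model under stated assumptions, not the device_planner.cc logic: `Arg`, `plan_call_args`, the use of `None` for "no memory scope chosen yet", and the scope strings are all hypothetical. It shows the intended outcome: an argument whose scope is still free adopts the PrimFunc parameter's scope, a matching scope passes through, and only a genuine clash gets a device_copy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Arg:
    """Hypothetical call argument with the memory scope chosen for it so far."""

    name: str
    memory_scope: Optional[str]  # None means no scope has been chosen yet (assumption)


def plan_call_args(args: List[Arg], param_scopes: List[str]) -> List[str]:
    """Return one expression string per argument of a lowered call.

    An argument is wrapped in a device_copy only when its already-chosen
    scope differs from the scope the PrimFunc parameter requires.
    """
    planned = []
    for arg, wanted in zip(args, param_scopes):
        if arg.memory_scope is None or arg.memory_scope == wanted:
            # Free or already matching: no copy is needed.
            planned.append(arg.name)
        else:
            # Genuine clash between caller and callee scopes: insert a copy.
            planned.append(
                f"device_copy({arg.name}, src={arg.memory_scope!r}, dst={wanted!r})"
            )
    return planned


if __name__ == "__main__":
    call_args = [Arg("x", "global"), Arg("y", None), Arg("z", "texture")]
    for expr in plan_call_args(call_args, ["global", "texture", "global"]):
        print(expr)
```

The same check would apply to the result of the call, with the roles of source and destination swapped.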

Coming along for the ride:

 - To make unit tests of mixed Relay/TIR functions possible, we needed
   to be able to supply a checked_type to GlobalVar since that's currently
   the only way to give a Relay type to PrimFuncs.

 - Use GenSym to get unique var names in ANF & partial eval so it is easier
   to diff debug output between passes and connect program fragments
   back into the overall program. Relying on pretty-printing to
   automagically unique-ify var names is certainly cute, but until we
   have better span support it is very hard to work with.

 - Realized both dead_code.cc and fold_constant.cc would
   happily move values into a different lexical virtual
   device context since device_planner.cc was being
   'clever' and eliding on_devices for let-bound values
   when there's no change. Fixed so that every let-bound
   value has an on_device. Will be much better after
   apache/tvm-rfcs#45 is implemented.

 - Make build -Werror clean for clang-12 (mostly move fixups).

 - Address post-submit comments from apache#9693.
mbrookhart pushed a commit that referenced this pull request Dec 14, 2021
…straints. (#9613)

* [Relay] Re-run PlanDevices after LowerTE to flow new memory scope constraints.

* [checkpoint] thread safe GenSym
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 7, 2022
* [Relay] PlanDevices supports 'free' on_device annotations

* [checkpoint] unused var
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 7, 2022
…straints. (apache#9613)

* [Relay] Re-run PlanDevices after LowerTE to flow new memory scope constraints.

* [checkpoint] thread safe GenSym
yangulei pushed a commit to yangulei/tvm that referenced this pull request Jan 11, 2022
* [Relay] PlanDevices supports 'free' on_device annotations

* [checkpoint] unused var
yangulei pushed a commit to yangulei/tvm that referenced this pull request Jan 11, 2022
…straints. (apache#9613)

* [Relay] Re-run PlanDevices after LowerTE to flow new memory scope constraints.

* [checkpoint] thread safe GenSym
yangulei pushed a commit to yangulei/tvm that referenced this pull request Jan 12, 2022
* [Relay] PlanDevices supports 'free' on_device annotations

* [checkpoint] unused var
yangulei pushed a commit to yangulei/tvm that referenced this pull request Jan 12, 2022
…straints. (apache#9613)

* [Relay] Re-run PlanDevices after LowerTE to flow new memory scope constraints.

* [checkpoint] thread safe GenSym
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
* [Relay] PlanDevices supports 'free' on_device annotations

* [checkpoint] unused var
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
…straints. (apache#9613)

* [Relay] Re-run PlanDevices after LowerTE to flow new memory scope constraints.

* [checkpoint] thread safe GenSym
qsqqsqqsq-intellif pushed a commit to qsqqsqqsq-intellif/tvm that referenced this pull request Apr 29, 2022
* [Relay] PlanDevices supports 'free' on_device annotations

* [checkpoint] unused var
qsqqsqqsq-intellif pushed a commit to qsqqsqqsq-intellif/tvm that referenced this pull request Apr 29, 2022
…straints. (apache#9613)

* [Relay] Re-run PlanDevices after LowerTE to flow new memory scope constraints.

* [checkpoint] thread safe GenSym