[RFC] Unified device/target/memory scope planning #38

Closed

Conversation

mbs-octoml
Contributor

@mbs-octoml mbs-octoml commented Sep 29, 2021

@manupak
Contributor

manupak commented Oct 1, 2021

Hi @mbs-octoml,

I may have put a related comment here: apache/tvm#8892 (comment)
However, partitioning for devices of the same kind is a step further than unifying the BYOC and device annotations.

Is this RFC intended to cover all of these?

@mbs-octoml
Contributor Author

Thanks @manupa-arm for the reminder; there were some good comments on #8892. I see a work stream:

  1. get the multi-target handling under control, and stop relying on the limiting device type -> target mapping
  2. allow device planning to be re-run with additional memory scope constraints to further partition; those constraints may originate in already lowered PrimFuncs.
  3. allow labeling of sets of target/devices a la BYOC target labeling (but probably continue to just pick the 'first')
  4. bring BYOC target labeling / partitioning and device planning together
  5. replace naive partitioning with something subject to optimization

Here I want to just focus on 1 -- everything beyond that really needs face-to-face discussion.

Everything from 2 onward obviously overlaps with your USMP work. My vague thought was that we can work from opposite ends and reconcile at 5. I.e., 2-4 set us up to work in a combined Relay+TIR world, then 5 is where everything we've learned from USMP could perhaps be replayed. Anyway, that's just a vague thought, so I'd love to talk more about it.
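
To make item 1 of the work stream concrete, here is a minimal sketch of why a device type -> target mapping is limiting. It does not use TVM; the function name `build_device_type_map` and the target strings are assumptions chosen for illustration. When two targets share a device type, a map keyed by device type alone can hold only one of them.

```python
from typing import Dict, List, Tuple

# Illustrative (device_type, target description) pairs. These are plain
# strings chosen for the example, not objects from any real library.
targets: List[Tuple[str, str]] = [
    ("cpu", "llvm -mcpu=skylake"),
    ("cpu", "c -mcpu=cortex-m4"),
    ("gpu", "cuda"),
]


def build_device_type_map(pairs: List[Tuple[str, str]]) -> Dict[str, str]:
    """Key targets by device type alone; later entries overwrite earlier ones."""
    mapping: Dict[str, str] = {}
    for device_type, target in pairs:
        mapping[device_type] = target
    return mapping


target_map = build_device_type_map(targets)

# Any target that no longer appears in the map was silently dropped.
lost = [target for _, target in targets if target not in target_map.values()]
print(target_map)
print(lost)
```

The first "cpu" target ends up in `lost`, which is the kind of collision that planning in terms of full targets rather than device types is meant to avoid.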

@mbs-octoml
Contributor Author

mbs-octoml commented Oct 1, 2021

Note to self: The With convention should probably also be removed by this work, but I've not audited the code to see how pervasive it is.

Target already has a GetAttr<String>("device") convention, but I think the map from name to DLDeviceType is only on the py side, and the attribute is not consistently set.

Contributor

@areusch areusch left a comment

@mbs-octoml thanks for the draft RFC! added some thoughts

rfcs/00xx-improved-multi-target-handling.md (outdated)
```
(We could also use a `Device` and accept the redundant `DLDeviceType` specification.) It is trivial
to go from an "on_device" label to a `TargetDevice` and back using the global `Target` registry.
5. Remove all uses of `TargetMap`. For example, in `LowerTEPass` we simply use the `TargetDevice` associated with
```
Contributor

do you propose any replacement in case we do need a map-like struct? Map<target_label, Target>?

Contributor Author

I've removed this from the RFC. In #9313 you'll see I kept TargetMap but introduced a helper class to hide it. Over time I think we can replace TargetMap with just Array, but I feel it's not worth getting specific about that in an RFC and is more of a cleanup task. It may well come out of @Mousius's work on tvmc target specification cleanup.

@mbs-octoml
Contributor Author

More notes to self:

  • You can get the device_type for a Target from target_kind->device_type. Somehow I missed that very obvious fact.
  • Eric has a very nice write-up explaining devices vs targets vs device api at docs/dev/device_target_interactions.rst

@tqchen
Member

tqchen commented Oct 8, 2021

cc @zxybazh since you authored the Target system, cc @zhiics @comaniac as it is related to BYOC

@tqchen tqchen added the status: need review RFC needs review label Oct 8, 2021
mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Oct 19, 2021
…g in 'device' planning

This is the first step in apache/tvm-rfcs#38 to bring devices
and targets together when doing device planning. I've gone ahead and also included a
memory scope in this object since we will also need to propagate memory scopes across
Relay expressions once this basic preparation is in place. In the meantime that field will be
left as "".

Once device planning works in units of SEScopes it will be possible to directly read off
the device and target for any Relay sub-expression without the need for TargetMaps or
the construction of default Targets.

SEScopes also support 'Join' and 'Default' operations needed when constraint solving in
the device planner. You can see those in use in my scratchpad branch:
  https://github.com/mbs-octoml/mbs-tvm/tree/mbs-scopes

This PR also brings some duplicated and ad-hoc 'default target' handling logic
together into a CompilationConfig class. (Again, see the scratchpad branch for how that
will end up being used). I've placed that next to SEScope since its main purpose is to
  a) establish the default SEScope for primitive ops
  b) establish the SEScope for the 'host'
  c) feed a definitive vector of Targets into device planning so it can resolve all
     "on_device" and "device_copy" device references to their full SEScope form.
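
The commit message above mentions 'Join' and 'Default' operations without spelling them out. The following is a rough Python sketch of what such operations could look like on a partially constrained scope. It is not the TVM implementation: the `Scope` class, the `join` and `default` functions, and the convention that `None` or `""` means "unconstrained" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scope:
    """Hypothetical stand-in for an SEScope. None or "" means unconstrained."""
    device_type: Optional[str] = None
    target: Optional[str] = None
    memory_scope: str = ""


# Field name paired with the value that marks it as unconstrained (assumed).
_FIELDS = (("device_type", None), ("target", None), ("memory_scope", ""))


def join(lhs: Scope, rhs: Scope) -> Optional[Scope]:
    """Merge the constraints of both scopes, or return None if they conflict."""
    merged = {}
    for name, unconstrained in _FIELDS:
        a, b = getattr(lhs, name), getattr(rhs, name)
        if a == unconstrained:
            merged[name] = b
        elif b == unconstrained or a == b:
            merged[name] = a
        else:
            return None  # both sides are constrained, but to different values
    return Scope(**merged)


def default(scope: Scope, fallback: Scope) -> Scope:
    """Fill every unconstrained field of scope from fallback."""
    filled = {}
    for name, unconstrained in _FIELDS:
        value = getattr(scope, name)
        filled[name] = getattr(fallback, name) if value == unconstrained else value
    return Scope(**filled)


joined = join(Scope(device_type="cuda"), Scope(target="cuda -arch=sm_80"))
conflict = join(Scope(device_type="cpu"), Scope(device_type="cuda"))
defaulted = default(Scope(memory_scope="global"), Scope("cpu", "llvm", ""))
```

In this sketch a solver would call `join` whenever two sub-expressions must share a scope, and `default` at the end to resolve whatever is still unconstrained.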
mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Oct 20, 2021
…g in 'device' planning

@mbs-octoml mbs-octoml changed the title from "[RFC] Improved multi-target handling" to "[RFC] Unified device/target/memory scope planning" Oct 21, 2021
@mbs-octoml
Contributor Author

mbs-octoml commented Oct 21, 2021

Thanks for the comments @areusch and @manupa-arm. Now that I've started working on this (with an emphasis on handling memory scopes) I've decided to shift focus a bit. In particular, Manupa, I'm now much more motivated to tackle the BYOC/device planning overlap aspect, which I think you're particularly interested in. PTAL.

mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Oct 21, 2021
…g in 'device' planning

mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Oct 22, 2021
…g in 'device' planning

mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Oct 27, 2021
…g in 'device' planning

@comaniac
Contributor

For the proposed BYOC flow (i.e., MergeAndAnnotate/AnnotateSEScopes/PlanDevices/PartitionBySEScope), it isn't clear to me whether the developer programming model will change. Specifically, could we still use the current approaches (i.e., op-based annotation and pattern-based annotation)? Also, how would the approach that uses a custom Relay pass to annotate compiler_begin/compiler_end change? Thanks.

mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Nov 2, 2021
…g in 'device' planning

tqchen pushed a commit to apache/tvm that referenced this pull request Nov 3, 2021
…g in 'device' planning. (#9313)

[Target] Adds SEScope (Storage/Execution Scope) for use as new unit of planning in 'device' planning

* Reworked to avoid global SEScopeCache.

Realized while working through unit tests in the sequel that it's reasonable
for folks to call build multiple times with distinct Target objects, in which
case the global cache would grow without bound.

So instead placed the cache in the CompilationConfig class. Since that class
now has everything the device planner needs to do its job, promoted it to
be an FFI-able Object, which is now in compilation_config.{h,cc}.

I think we can do much better with CompilationConfig, but for now keeping it
to the minimum I needed to prepare for device planning from all the executor
compilation codepaths.
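
The reasoning above (a global cache grows without bound across builds, whereas a cache owned by the configuration object is released along with it) can be sketched as follows. This is an illustrative Python model, not TVM code: `Config`, `Scope`, `canonical_scope`, and `cache_size` are hypothetical names standing in for CompilationConfig and its SEScope cache.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Scope:
    """Hypothetical stand-in for an SEScope."""
    device_type: str
    target: str
    memory_scope: str = ""


class Config:
    """Hypothetical stand-in for a per-build configuration that owns its cache."""

    def __init__(self) -> None:
        # The cache lives on the instance, so it is garbage collected together
        # with the config instead of accumulating in a process-wide table.
        self._scopes: Dict[Tuple[str, str, str], Scope] = {}

    def canonical_scope(self, device_type: str, target: str,
                        memory_scope: str = "") -> Scope:
        """Return the one shared Scope object for this combination of fields."""
        key = (device_type, target, memory_scope)
        if key not in self._scopes:
            self._scopes[key] = Scope(*key)
        return self._scopes[key]

    def cache_size(self) -> int:
        return len(self._scopes)


# Two independent builds, each with its own config and its own cache.
first = Config()
a = first.canonical_scope("cpu", "llvm")
b = first.canonical_scope("cpu", "llvm")

second = Config()
c = second.canonical_scope("cpu", "llvm -mcpu=skylake")
```

Within one config, repeated requests return the same object (`a is b`), so downstream uses match up by identity, while each build's cache stays small and is dropped when its config goes away.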
mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Nov 4, 2021
[checkpoint] rebase

[checkpoint] fix merge

[checkpoint] lint

[checkpoint] rebase

[checkpoint] Fixed stray use of kDLCPU in vm/profiler/vm.cc

[checkpoint] lint trivia

[checkpoint] fix unit tests

[checkpoint] device planner unit tests passing again

[checkpoint] Switch over to new CompilerOptions

[checkpoint] include

[checkpoint] Almost working again

Need to move the SEScopeCache into CompilationConfig
and pass that into DeviceDomains instead of just the
Vector<Target>. Then the host_se_scope can be memoized
so that direct uses of that scope downstream will match
up with se_scopes already established by PlanDevices.

Sigh.

[checkpoint] Use cache in device domains.

[checkpoint] more moves

[checkpoint] lints

[checkpoint] Fix merge with VM profiling changes.

[checkpoint] trivial

[checkpoint] rebase fix

[checkpoint] More unit tests.

Getting ready to fork out SEScope changes alone.

[checkpoint] lints

[checkpoint] All plan devices unit tests pass

[checkpoint] First unit test passes

[checkpoint] Another go at target management

This at least centralizes all the hackery. Compiles.

[commit] Start to rollback resolving to target in planner.

Better is to do it as stand alone pass I think.
Besides it doesn't work with the structural test for expected output.

[checkpoint] Almost have first unit test going.

About to merge Michalis' changes.
target_host is still a mess.
Starting to eliminate target_map.

[checkpoint] Cleanup VM device matching

[checkpoint] Compiles

[checkpoint] First sweep replacing DLDeviceType with SEScope

VM still not done.

[checkpoint] Expose CompilationConfig ctor in py

[checkpoint] CompilationConfig is nullable for default ctor

[checkpoint] Don't use target:: namespace

[checkpoint] Promote CompilationConfig to be FFI-friendly Object

Also rework to never mix the host_target into the 'primitive' targets.

[checkpoint] ResolveSEScope on CompilationConfig

[checkpoint] hash_reduce using target's data ptr

[checkpoint] Share FullyUnconstrained

[checkpoint] Backtrack on using global memoization for SEScope

Realized while working through unit tests in the sequel that it's reasonable
for folks to call build multiple times with distinct Target objects, in which
case the global cache would grow without bound.

I'll instead tackle memoization of SEScopes directly in device_domains.cc.

[checkpoint] Improve back compat for homogeneous case

If no host target is given but we have a unique target of
kDLCPU device type then also use that for the host.
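
The back-compat rule in the last checkpoint can be sketched as below. This is a hedged Python illustration rather than the actual logic: `pick_host` and the `(device_type, description)` tuples are hypothetical, and returning `None` when there is no unique CPU target is an assumption, since the commit message does not say what happens in that case.

```python
from typing import List, Optional, Tuple

# Hypothetical representation of a target: (device_type, description).
Target = Tuple[str, str]


def pick_host(targets: List[Target], host: Optional[Target] = None) -> Optional[Target]:
    """Choose the host target, falling back to a unique CPU target if none is given."""
    if host is not None:
        return host  # an explicit host always wins
    cpu_targets = [t for t in targets if t[0] == "cpu"]
    if len(cpu_targets) == 1:
        return cpu_targets[0]  # homogeneous case: reuse the only CPU target
    return None  # assumption: no fallback when zero or several CPU targets exist


unique = pick_host([("cpu", "llvm"), ("gpu", "cuda")])
ambiguous = pick_host([("cpu", "llvm"), ("cpu", "c")])
explicit = pick_host([("gpu", "cuda")], host=("cpu", "llvm"))
```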

mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Nov 5, 2021
[checkpoint] pretty printing fixes

[checkpoint] Don't dup devices in executable, more unit tests

[checkpoint] woops, left target str debug in

Added Target::ToDebugString() so I can see the hosts since
they were giving me a lot of trouble.

[checkpoint] more pretty printing hackery, interpreter respects host devices

Also try harder to integrate the existing target->host mechanism into
CompilationConfig.

[checkpoint] Almost working again

 - Unit test setup distinguishes CPU for prims from CPU for host.
 - Get pretty printing to use the SEScopeNode ReprPrinter.
 - Allow host and primitive to have same device types.

test_dynamic_input failing

[checkpoint] rebase

[checkpoint] fix merge

[checkpoint] lint

[checkpoint] rebase

[checkpoint] Fixed stray use of kDLCPU in vm/profiler/vm.cc

[checkpoint] lint trivia

[checkpoint] fix unit tests

[checkpoint] device planner unit tests passing again

[checkpoint] Switch over to new CompilerOptions

[checkpoint] include

[checkpoint] Almost working again

Need to move the SEScopeCache into CompilationConfig
and pass that into DeviceDomains instead of just the
Vector<Target>. Then the host_se_scope can be memoized
so that direct uses of that scope downstream will match
up with se_scopes already established by PlanDevices.

Sigh.

[checkpoint] Use cache in device domains.

[checkpoint] more moves

[checkpoint] lints

[checkpoint] Fix merge with VM profiling changes.

[checkpoint] trivial

[checkpoint] rebase fix

[checkpoint] More unit tests.

Getting ready to fork out SEScope changes alone.

[checkpoint] lints

[checkpoint] All plan devices unit tests pass

[checkpoint] First unit test passes

[checkpoint] Another go at target management

This at least centralizes all the hackery. Compiles.

[commit] Start to rollback resolving to target in planner.

Better is to do it as stand alone pass I think.
Besides it doesn't work with the structural test for expected output.

[checkpoint] Almost have first unit test going.

About to merge Michalis' changes.
target_host is still a mess.
Starting to eliminate target_map.

[checkpoint] Cleanup VM device matching

[checkpoint] Compiles

[checkpoint] First sweep replacing DLDeviceType with SEScope

VM still not done.

[checkpoint] Expose CompilationConfig ctor in py

[checkpoint] CompilationConfig is nullable for default ctor

[checkpoint] Don't use target:: namespace

[checkpoint] Promote CompilationConfig to be FFI-friendly Object

Also rework to never mix the host_target into the 'primitive' targets.

[checkpoint] ResolveSEScope on CompilationConfig

[checkpoint] hash_reduce using target's data ptr

[checkpoint] Share FullyUnconstrained

[checkpoint] Backtrack on using global memoization for SEScope

Realized while working through unit tests in the sequel that it's reasonable
for folks to call build multiple times with distinct Target objects, in which
case the global cache would grow without bound.

I'll instead tackle memoization of SEScopes directly in device_domains.cc.

[checkpoint] Improve back compat for homogeneous case

If no host target is given but we have a unique target of
kDLCPU device type then also use that for the host.

Reworked to avoid global SEScopeCache.

Realized while working through unit tests in the sequel that it's reasonable
for folks to call build multiple times with distinct Target objects, in which
case the global cache would grow without bound.

So instead placed the cache in the CompilationConfig class. Since that class
now has everything the device planner needs to do its job, promoted it to
be an FFI-able Object, which is now in compilation_config.{h,cc}.

I think we can do much better with CompilationConfig, but for now keeping it
to the minimum I needed to prepare for device planning from all the executor
compilation codepaths.
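
The cache placement described above can be sketched as follows. This is only an illustration under stated assumptions, not the code in compilation_config.{h,cc}: the `SEScope` fields, the `canonical_se_scope` method name, and the use of a plain string in place of a `Target` are all hypothetical. The point is that the memoization table lives and dies with one `CompilationConfig`, so repeated builds with distinct targets cannot grow a process-wide cache.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class SEScope:
    """Hypothetical stand-in for the SEScope object discussed in this PR."""
    device_type: int
    virtual_device_id: int
    target: Optional[str]  # a Target object in TVM; a plain string here
    memory_scope: str


class CompilationConfig:
    """Owns the SEScope cache, so the cache is released with the config."""

    def __init__(self) -> None:
        self._se_scope_cache: Dict[Tuple[int, int, Optional[str], str], SEScope] = {}

    def canonical_se_scope(self, device_type: int, virtual_device_id: int = 0,
                           target: Optional[str] = None,
                           memory_scope: str = "") -> SEScope:
        # Return the one shared instance for this combination of fields,
        # creating it on first use.
        key = (device_type, virtual_device_id, target, memory_scope)
        if key not in self._se_scope_cache:
            self._se_scope_cache[key] = SEScope(*key)
        return self._se_scope_cache[key]


cfg_a = CompilationConfig()
cfg_b = CompilationConfig()
first = cfg_a.canonical_se_scope(1, 0, "llvm")
again = cfg_a.canonical_se_scope(1, 0, "llvm")
other = cfg_b.canonical_se_scope(1, 0, "llvm")
```

Within one config the two lookups return the same object, while a second config builds its own equal but separate instance.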

Adds SEScope (Storage/Execution Scope) for use as new unit of planning in 'device' planning

This is the first step in apache/tvm-rfcs#38 to bring devices
and targets together when doing device planning. I've gone ahead and also included a
memory scope in this object since we will also need to propagate memory scopes across
Relay expressions once this basic preparation is in place. In the meantime that field will be
left as "".

Once device planning works in units of SEScopes it will be possible to directly read off
the device and target for any Relay sub-expression without the need for TargetMaps or
the construction of default Targets.

SEScopes also support 'Join' and 'Default' operations needed when constraint solving in
the device planner. You can see those in use in my scratchpad branch:
  https://github.com/mbs-octoml/mbs-tvm/tree/mbs-scopes
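
For readers without the scratchpad branch to hand, here is a rough sketch of what 'Join' and 'Default' could look like. The semantics are an assumption made for illustration: each field has an 'unconstrained' value (-1, None or ""), Join merges two scopes field by field and fails on a conflict, and Default fills unconstrained fields of the first scope from the second. The sentinel values and the string standing in for a `Target` are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional

K_DL_CPU = 1   # DLDeviceType value for kDLCPU
K_DL_CUDA = 2  # DLDeviceType value for kDLCUDA


@dataclass(frozen=True)
class SEScope:
    """Hypothetical stand-in; every default below means 'unconstrained'."""
    device_type: int = -1
    virtual_device_id: int = -1
    target: Optional[str] = None  # a Target object in TVM; a string here
    memory_scope: str = ""


FULLY_UNCONSTRAINED = SEScope()


def join(lhs: SEScope, rhs: SEScope) -> Optional[SEScope]:
    """Merge two scopes; return None if any constrained field disagrees."""
    merged = {}
    for f in fields(SEScope):
        a, b, free = getattr(lhs, f.name), getattr(rhs, f.name), f.default
        if a == free:
            merged[f.name] = b
        elif b == free or a == b:
            merged[f.name] = a
        else:
            return None  # both sides constrained, and they conflict
    return SEScope(**merged)


def default(lhs: SEScope, rhs: SEScope) -> SEScope:
    """Keep lhs fields, taking rhs only where lhs is unconstrained."""
    merged = {}
    for f in fields(SEScope):
        a = getattr(lhs, f.name)
        merged[f.name] = getattr(rhs, f.name) if a == f.default else a
    return SEScope(**merged)


joined = join(SEScope(device_type=K_DL_CPU), SEScope(target="llvm"))
conflict = join(SEScope(device_type=K_DL_CPU), SEScope(device_type=K_DL_CUDA))
filled = default(SEScope(device_type=K_DL_CPU),
                 SEScope(K_DL_CPU, 0, "llvm", "global"))
```

Join is what a constraint solver would call when two sub-expressions must share a scope; Default is what it would call at the end to resolve whatever is still unconstrained.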

This PR also brings the duplicated, ad-hoc 'default target' handling logic
together into a CompilationConfig class. (Again, see the scratchpad branch for how that
will end up being used). I've placed that next to SEScope since its main purpose is to
  a) establish the default SEScope for primitive ops
  b) establish the SEScope for the 'host'
  c) feed a definitive vector of Targets into device planning so it can resolve all
     "on_device" and "device_copy" device references to their full SEScope form.
mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Nov 5, 2021
mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Nov 6, 2021
Contributor

@areusch areusch left a comment

@mbs-octoml some comments/clarifications, can you also remove/address the boilerplate?

# Summary
[summary]: #summary

TVM supports 'heterogeneous' execution, whereby primitive operators may be (sequentially) evaluated

nit: sequentially is a bit misleading--maybe suggest

Suggested change
TVM supports 'heterogeneous' execution, whereby primitive operators may be (sequentially) evaluated
TVM supports 'heterogeneous' execution, whereby primitive operators may be evaluated (in topological order)

should reside on a device with a given `DLDeviceType` (`kDLCPU`, `kDLCUDA`, etc).
2. The `PlanDevices` pass uses those annotations to decide the unique device for every Relay
sub-expression, including every primitive operator call. Sub-expressions which are unconstrained
are assigned to the 'default' device. The pass then inserts `device_copy` operators whenever data

"default" also is called "fallback," right?

sub-expression, including every primitive operator call. Sub-expressions which are unconstrained
are assigned to the 'default' device. The pass then inserts `device_copy` operators whenever data
needs to cross device boundaries.
3. The user must also supply a list of `Target` objects. The compiler uses that list to build

would be good to clarify as they are also required at runtime to the executor ctor

Suggested change
3. The user must also supply a list of `Target` objects. The compiler uses that list to build
3. The user must also supply a list of `Target` objects to `tvm.relay.build`. The compiler uses that list to build


TVM supports 'heterogeneous' execution, whereby primitive operators may be (sequentially) evaluated
on more than one device (GPU, CPU, accelerator, etc). For the non-BYOC flow this works as follows:
1. Relay programs may contain `on_device` annotations which specify that a sub-expression's result

so is this constraining only the output of a particular subgraph (e.g. the subgraph can be actually implemented on a different device so long as a memory copy is done?)

in-tree.)
3. The `AnnotateTarget` pass looks for the annotations from (1) and (2) to decide the unique
toolchain name for every Relay sub-expression which should go via a BYOC path. The transitions in
to and out of those sub-expressions are marked with `compiler_begin` and `compiler_end`

just curious, because i've seen compiler_begin and compiler_end before but not many examples in complex programs: are these essentially a source-level annotation e.g. marking all Relay expressions between the two annotations as offloaded to a particular compiler? why shouldn't these be hierarchical e.g. CompilerBlock which contains the subgraph as a tree?

```
class SEScope {
DLDeviceType device_type;
int virtual_device_id;

i think this should be a String name which makes sense to the user. Doing this is helpful for a couple other reasons besides the compilation UI:

  • In generated source code, it's possible to refer to the device by name. In particular, the embedded C API would like to have this for the conglomerate tvm_device_t struct.
  • In systems with multiple e.g. CPUs, using an index here then implies some ordering (e.g. littlest CPU to biggest). It's better to make the assignment of ID to CPU capability more explicit

Finally, using a name would simplify the heterogeneous Target.

However, this is a bit of a lift. I do feel strongly we should get to this world. If it's not something that makes sense to do now, we could also revisit after or concurrent with USMP.

`PlanDevices`. In particular, any `SEScope` encountered during device planning is 'canonicalized' to fill
in a `Target` by the same lookup as we do today. This means we continue to support the easy shorthand of
referring to devices by the `DLDeviceType` alone. However, advanced users can supply a `SEScope` to these
operators which contains the exact `Target` to use.

what would be roughly the deprecation plan here? eventually we ban all the inputs to the compiler which could refer to SEScope in terms of DLDeviceType and then tighten the typing requirements here? this would be a backwards-incompatible Relay change. cc @jroesch

tir::PrimFunc.buffer_map -> tir::Buffer.data -> tir::Var.type_annotation -> PointerType.storage_scope -> String
```

to discover the memory scope for each Relay argument. That scope will enter `SEScope`s and flow through the

what do you mean by "enter SEScopes"?

6. We rework `PartitionGraph` to `PartitionBySEScope` to work on `SEScope` annotations instead of
`compiler_begin` and `compiler_end` annotations. Algorithmically it's not a big change -- maximal
sub-expressions which share the same `SEScope` (or a projection thereof, eg just the `target`) are hoisted
into global `Function`s. The function's `"result_se_scope"` attribute describes both the scope holding the

so then here, this sort of implements the "grouping adjacent expressions onto the same device" as a side-effect?
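
The grouping asked about here can be pictured with a toy example. This is a deliberately simplified sketch, not the proposed `PartitionBySEScope`: it assumes the program is a linear sequence of operator calls, each already labelled with a scope (a plain string), whereas the real pass would work over a Relay dataflow graph. The function name and the operator names are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple


def partition_by_scope(ops: List[Tuple[str, str]]) -> List[Tuple[str, List[str]]]:
    """Collect maximal runs of adjacent ops that share the same scope.

    `ops` is a list of (op_name, scope) pairs in execution order. Each
    returned (scope, op_names) pair corresponds to one sub-expression that
    would be hoisted into its own global function.
    """
    return [
        (scope, [name for name, _ in run])
        for scope, run in groupby(ops, key=lambda op: op[1])
    ]


program = [("conv2d", "gpu"), ("relu", "gpu"), ("topk", "cpu"), ("add", "gpu")]
partitions = partition_by_scope(program)
```

In this picture adjacent operators with the same scope end up in one group simply because the run is maximal, which is the side-effect the question describes.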

7. We allow `MergeComposite` to be used to insert `on_device` annotations, call it `MergeAndAnnotate`.

8. (?) We rework `AnnotateTarget` to just look for `FTVMAnnotateTarget` operator attributes, call it
`AnnotateSEScopes`. When the function fires an `on_device` annotation is inserted. However since

clarifying my understanding:

Suggested change
`AnnotateSEScopes`. When the function fires an `on_device` annotation is inserted. However since
`AnnotateSEScopes`. When `FTVMAnnotateSEScopes` returns true, an `on_device` annotation is inserted. However since

mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Nov 9, 2021
CAUTION: Breaking VM executable serialization change. I needed a new 'virtual devices' array in the executable so that instructions can continue to refer to devices by a simple index yet the VM can respect both the device type and id for runtime devices.

Continuing from apache#9313, and as part of apache/tvm-rfcs#38, we switch PlanDevices to plan with respect to SEScopes instead of just DLDeviceTypes. Our ultimate goal is to be able to flow memory scopes between PrimFuncs by re-running PlanDevices after the LowerTE pass. This PR at least gets us to being able to flow the memory scopes, but the actual changes to PlanDevices to look inside PrimFuncs are still two PRs in the future.

However, we get two nice side effects right away:
 - Since SEScopes contain Targets we can isolate all the device-to-target resolution machinery within PlanDevices (with the help of CompilationConfig). After PlanDevices has run we can retrieve the Target for any sub-expression directly from that sub-expression's SEScope. For now we retain the one-Target-per-DLDeviceType constraint since it is baked into the public 'TargetMap' API, but the path to breaking that constraint is clearer.
 - Device ids are now respected all the way from annotation to executor. Previously, although we had a bit of plumbing using Devices, the device_id therein was ignored or defaulted to zero.

 The Python "on_device" annotation helpers still work w.r.t. devices. Thus, although they now respect device ids, they do not allow the user to specify a Target or memory scope as supported by the underlying SEScope.
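
The 'virtual devices' array mentioned in the CAUTION note can be pictured roughly as below. This is a sketch under assumptions, not the VM executable format: the class and method names are hypothetical, and devices are modelled as plain `(device_type, device_id)` tuples. Instructions carry only a small index, and the table maps that index back to a concrete runtime device.

```python
from typing import Dict, List, Tuple

Device = Tuple[int, int]  # (device_type, device_id), e.g. (2, 1)


class VirtualDeviceTable:
    """Hypothetical per-executable table of the devices instructions refer to."""

    def __init__(self) -> None:
        self._devices: List[Device] = []
        self._index: Dict[Device, int] = {}

    def index_for(self, device_type: int, device_id: int) -> int:
        # Compile time: hand out one index per distinct (type, id) pair,
        # without duplicating entries.
        key = (device_type, device_id)
        if key not in self._index:
            self._index[key] = len(self._devices)
            self._devices.append(key)
        return self._index[key]

    def resolve(self, index: int, runtime_devices: List[Device]) -> Device:
        # Run time: match on both device type and device id.
        wanted = self._devices[index]
        for device in runtime_devices:
            if device == wanted:
                return device
        raise LookupError(f"no runtime device matches {wanted}")


table = VirtualDeviceTable()
cpu_index = table.index_for(1, 0)   # kDLCPU, id 0
gpu_index = table.index_for(2, 1)   # kDLCUDA, id 1
resolved = table.resolve(gpu_index, [(1, 0), (2, 0), (2, 1)])
```

Matching on the id as well as the type is what distinguishes the second GPU from the first in the lookup above.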
mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Nov 10, 2021
mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Nov 11, 2021
mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Nov 12, 2021
mbs-octoml added a commit to mbs-octoml/mbs-tvm that referenced this pull request Nov 12, 2021
junrushao pushed a commit to apache/tvm that referenced this pull request Nov 12, 2021
…s. (#9326)

* Switch PlanDevices pass to be w.r.t. SEScopes instead of DLDeviceTypes.

CAUTION: Breaking VM executable serialization change. I needed a new 'virtual devices' array in the executable so that instructions can continue to refer to devices by a simple index yet the VM can respect both the device type and id for runtime devices.

Continuing from #9313, and as part of apache/tvm-rfcs#38, we switch PlanDevices to plan with respect to SEScopes instead of just DLDeviceTypes. Our ultimate goal is to be able to flow memory scopes between PrimFuncs by re-running PlanDevices after the LowerTE pass. This PR at least gets us to being able to flow the memory scopes, but the actual changes to PlanDevices to look inside PrimFuncs are still two PRs in the future.

However, we get two nice side effects right away:
 - Since SEScopes contain Targets we can isolate all the device-to-target resolution machinery within PlanDevices (with the help of CompilationConfig). After PlanDevices has run we can retrieve the Target for any sub-expression directly from that sub-expression's SEScope. For now we retain the one-Target-per-DLDeviceType constraint since it is baked into the public 'TargetMap' API, but the path to breaking that constraint is clearer.
 - Device ids are now respected all the way from annotation to executor. Previously, although we had a bit of plumbing using Devices, the device_id therein was ignored or defaulted to zero.

 The Python "on_device" annotation helpers still work w.r.t. devices. Thus, although they now respect device ids, they do not allow the user to specify a Target or memory scope as supported by the underlying SEScope.

* [checkpoint] Revert emitter.py, must have run 'black .' by mistake.

* [checkpoint] Address PR comments

Also add back SplitArgs pass in build_module.cc which somehow got lost in the shuffle.

(try again -- flaky test_crt.py test_autotune?)

* [checkpoint] Fix after rebase on CallLowered.
mehrdadh pushed a commit to mehrdadh/tvm that referenced this pull request Dec 1, 2021
mehrdadh pushed a commit to mehrdadh/tvm that referenced this pull request Dec 1, 2021
…g in 'device' planning. (apache#9313)

[Target] Adds SEScope (Storage/Execution Scope) for use as new unit of planning in 'device' planning

This is the first step in apache/tvm-rfcs#38 to bring devices
and targets together when doing device planning. I've gone ahead and also included a
memory scope in this object since we will also need to propagate memory scopes across
Relay expressions once this basic preparation is in place. In the meantime that field will be
left as "".

Once device planning works in units of SEScopes it will be possible to directly read off
the device and target for any Relay sub-expression without the need for TargetMaps or
the construction of default Targets.

SEScopes also support 'Join' and 'Default' operations needed when constraint solving in
the device planner. You can see those in use in my scratchpad branch:
  https://github.com/mbs-octoml/mbs-tvm/tree/mbs-scopes
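
To make the 'Join' and 'Default' operations more concrete, here is a minimal sketch of how they might behave on a scope with four fields. It is an illustration under assumptions, not the code from the PR or the scratchpad branch: `ToyScope`, the sentinel values for "unconstrained" fields, and the function names `join` and `default` are all hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class ToyScope:
    """Hypothetical scope; each field defaults to an assumed 'unconstrained' sentinel."""
    device_type: int = 0             # 0 is used here as "unconstrained" (assumption)
    device_id: int = -1              # -1 is used here as "unconstrained" (assumption)
    target: Optional[str] = None     # None means no target chosen yet
    memory_scope: str = ""           # "" means no memory scope, as in the text above


_UNCONSTRAINED = ToyScope()


def join(lhs: ToyScope, rhs: ToyScope) -> Optional[ToyScope]:
    """Merge two scopes field by field; return None if any field conflicts."""
    merged = {}
    for f in fields(ToyScope):
        a, b = getattr(lhs, f.name), getattr(rhs, f.name)
        free = getattr(_UNCONSTRAINED, f.name)
        if a == free:
            merged[f.name] = b
        elif b == free or a == b:
            merged[f.name] = a
        else:
            return None  # both sides are constrained and they disagree
    return ToyScope(**merged)


def default(lhs: ToyScope, rhs: ToyScope) -> ToyScope:
    """Keep lhs's constrained fields and fill the rest from rhs."""
    filled = {}
    for f in fields(ToyScope):
        a = getattr(lhs, f.name)
        free = getattr(_UNCONSTRAINED, f.name)
        filled[f.name] = getattr(rhs, f.name) if a == free else a
    return ToyScope(**filled)


joined = join(ToyScope(device_type=2), ToyScope(device_id=0, target="cuda"))
conflict = join(ToyScope(device_type=1), ToyScope(device_type=2))
defaulted = default(ToyScope(device_id=3), ToyScope(device_type=1, device_id=0, target="llvm"))
```

In this sketch a constraint solver would use `join` when two sub-expressions must agree on a scope, and `default` at the end to fill in whatever was left unconstrained.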

This PR also brings some duplicated and ad-hoc 'default target' handling logic
together into a CompilationConfig class. (Again, see the scratchpad branch for how that
will end up being used). I've placed that next to SEScope since its main purpose is to
  a) establish the default SEScope for primitive ops
  b) establish the SEScope for the 'host'
  c) feed a definitive vector of Targets into device planning so it can resolve all
     "on_device" and "device_copy" device references to their full SEScope form.
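
The three roles listed above can be sketched with a small stand-in class. Everything here is hypothetical: `ToyCompilationConfig`, its attributes, and the choices that the first target supplies the default scope for primitive ops and that the host lives on CPU device 0 are assumptions made only for illustration. The one-Target-per-device-type check mirrors the constraint mentioned in the follow-up PR.

```python
from typing import Dict, List, Tuple

# A scope is modelled as a plain (device_type, device_id, target) tuple in this sketch.
Scope = Tuple[int, int, str]


class ToyCompilationConfig:
    """Hypothetical holder for the targets and default scopes used by device planning."""

    def __init__(self, host_target: str, targets: List[Tuple[int, str]]) -> None:
        self._target_for_type: Dict[int, str] = {}
        for device_type, target in targets:
            if device_type in self._target_for_type:
                raise ValueError(f"more than one target for device type {device_type}")
            self._target_for_type[device_type] = target
        # (a) default scope for primitive ops: assumed to come from the first target
        first_type, first_target = targets[0]
        self.default_primitive_scope: Scope = (first_type, 0, first_target)
        # (b) scope for the host: assumed to be CPU (DLPack type 1), device 0
        self.host_scope: Scope = (1, 0, host_target)

    def resolve(self, device_type: int, device_id: int = 0) -> Scope:
        """(c) expand a bare device reference into its full scope form."""
        return (device_type, device_id, self._target_for_type[device_type])


config = ToyCompilationConfig("llvm", [(2, "cuda"), (1, "llvm")])
gpu_scope = config.resolve(2, device_id=1)
```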

* Reworked to avoid global SEScopeCache.

Realized while working through unit tests in the sequel that it's reasonable
for folks to call build multiple times with distinct Target objects, in which
case the global cache would grow without bound.

So instead placed the cache in the CompilationConfig class. Since that class
now has everything the device planner needs to do its job, promoted it to
be an FFI-able Object, which is now in compilation_config.{h,cc}.
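
The motivation for moving the cache can be sketched as follows. This is a toy model rather than the actual implementation: `ToyScopeObj`, `ToyConfig` and `canonical_scope` are hypothetical names, and the cache is simply a dictionary owned by each config instance so that its lifetime is tied to one build.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ToyScopeObj:
    """Hypothetical scope object that we want exactly one copy of per config."""
    device_type: int
    device_id: int
    target: str


class ToyConfig:
    """Hypothetical config that owns its scope cache instead of using a global one."""

    def __init__(self) -> None:
        self._scope_cache: Dict[Tuple[int, int, str], ToyScopeObj] = {}

    def canonical_scope(self, device_type: int, device_id: int, target: str) -> ToyScopeObj:
        """Return the unique cached scope for these fields, creating it on first use."""
        key = (device_type, device_id, target)
        if key not in self._scope_cache:
            self._scope_cache[key] = ToyScopeObj(device_type, device_id, target)
        return self._scope_cache[key]


first_build = ToyConfig()
second_build = ToyConfig()

a = first_build.canonical_scope(2, 0, "cuda")
b = first_build.canonical_scope(2, 0, "cuda")
c = second_build.canonical_scope(2, 0, "cuda -arch=sm_80")
```

Within one config the same fields always yield the same object, and once a config is dropped its cache can be reclaimed with it, so repeated builds with distinct targets do not accumulate entries in a process-wide table.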

I think we can do much better with CompilationConfig, but for now keeping it
to the minimum I needed to prepare for device planning from all the executor
compilation codepaths.
ylc pushed a commit to ylc/tvm that referenced this pull request Jan 7, 2022
…g in 'device' planning. (apache#9313)

ylc pushed a commit to ylc/tvm that referenced this pull request Jan 7, 2022
…s. (apache#9326)

yangulei pushed a commit to yangulei/tvm that referenced this pull request Jan 11, 2022
…s. (apache#9326)

ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
…g in 'device' planning. (apache#9313)

ylc pushed a commit to ylc/tvm that referenced this pull request Jan 13, 2022
…s. (apache#9326)

@mbs-octoml
Contributor Author

Closing as obsolete, since most of this is either already done or has been subsumed by the Collage proposal.

@mbs-octoml mbs-octoml closed this Mar 10, 2022
Labels
status: need review RFC needs review
5 participants