** Switch device planning etc to use SEScope **
[checkpoint] bad rebase

[checkpoint] pretty printing fixes

[checkpoint] Don't dup devices in executable, more unit tests

[checkpoint] woops, left target str debug in

Added Target::ToDebugString() so I can see the hosts since
they were giving me a lot of trouble.

[checkpoint] more pretty printing hackery, interpreter respects host devices

Also try harder to integrate the existing target->host mechanism into
CompilationConfig.

[checkpoint] Almost working again

 - Unit test setup distinguishes CPU for prims from CPU for host.
 - Get pretty printing to use the SEScopeNode ReprPrinter.
 - Allow host and primitive to have same device types.

test_dynamic_input failing

[checkpoint] rebase

[checkpoint] fix merge

[checkpoint] lint

[checkpoint] rebase

[checkpoint] Fixed stray use of kDLCPU in vm/profiler/vm.cc

[checkpoint] lint trivia

[checkpoint] fix unit tests

[checkpoint] device planner unit tests passing again

[checkpoint] Switch over to new CompilerOptions

[checkpoint] include

[checkpoint] Almost working again

Need to move the SEScopeCache into CompilationConfig
and pass that into DeviceDomains instead of just the
Vector<Target>. Then the host_se_scope can be memoized
so that direct uses of that scope downstream will match
up with se_scopes already established by PlanDevices.

Sigh.

[checkpoint] Use cache in device domains.

[checkpoint] more moves

[checkpoint] lints

[checkpoint] Fix merge with VM profiling changes.

[checkpoint] trivial

[checkpoint] rebase fix

[checkpoint] More unit tests.

Getting ready to fork out SEScope changes alone.

[checkpoint] lints

[checkpoint] All plan devices unit tests pass

[checkpoint] First unit test passes

[checkpoint] Another go at target management

This at least centralizes all the hackery. Compiles.

[commit] Start to rollback resolving to target in planner.

Better is to do it as stand alone pass I think.
Besides it doesn't work with the structural test for expected output.

[checkpoint] Almost have first unit test going.

About to merge Michalis' changes.
target_host is still a mess.
Starting to eliminate target_map.

[checkpoint] Cleanup VM device matching

[checkpoint] Compiles

[checkpoint] First sweep replacing DLDeviceType with SEScope

VM still not done.

[checkpoint] Expose CompilationConfig ctor in py

[checkpoint] CompilationConfig is nullable for default ctor

[checkpoint] Don't use target:: namespace

[checkpoint] Promote CompilationConfig to be FFI-friendly Object

Also rework to never mix the host_target into the 'primitive' targets.

[checkpoint] ResolveSEScope on CompilationConfig

[checkpoint] hash_reduce using target's data ptr

[checkpoint] Share FullyUnconstrained

[checkpoint] Backtrack on using global memoization for SEScope

Realized while working through unit tests in the sequel that it's reasonable
for folks to call build multiple times with distinct Target objects, in which
case the global cache would grow without bound.

I'll instead tackle memoization of SEScopes directly in device_domains.cc.

[checkpoint] Improve back compat for homogeneous case

If no host target is given but we have a unique target of
kDLCPU device type then also use that for the host.
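The fallback above can be sketched roughly as follows. This is only an illustration: `ToyTarget`, `kToyCPU` and `ChooseHost` are hypothetical names, not TVM APIs, and "unique" is assumed to mean "exactly one of the given targets has the CPU device type".

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for tvm::Target; only the fields this sketch needs.
struct ToyTarget {
  std::string name;
  int device_type;  // DLPack numbering: 1 = kDLCPU, 2 = kDLCUDA
};

const int kToyCPU = 1;

// Returns the target to use for the host, or nullptr if none can be chosen.
// An explicit host always wins. Otherwise fall back to a CPU target, but only
// when exactly one of the given targets has the CPU device type (assumption).
const ToyTarget* ChooseHost(const ToyTarget* explicit_host,
                            const std::vector<ToyTarget>& targets) {
  if (explicit_host != nullptr) return explicit_host;
  const ToyTarget* candidate = nullptr;
  for (const ToyTarget& target : targets) {
    if (target.device_type != kToyCPU) continue;
    if (candidate != nullptr) return nullptr;  // second CPU target: ambiguous
    candidate = &target;
  }
  return candidate;
}
```

With targets `cuda` and `llvm` and no explicit host, the sketch picks `llvm`; once a second CPU target is present it declines to guess.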

Reworked to avoid global SEScopeCache.

Realized while working through unit tests in the sequel that it's reasonable
for folks to call build multiple times with distinct Target objects, in which
case the global cache would grow without bound.

So instead placed the cache in the CompilationConfig class. Since that class
now has everything the device planner needs to do its job, promoted it to
be an FFI-able Object, which is now in compilation_config.{h,cc}.

I think we can do much better with CompilationConfig, but for now keeping it
to the minimum I needed to prepare for device planning from all the executor
compilation codepaths.
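To make the lifetime argument concrete, here is a minimal sketch of a memoization table owned by a config object rather than by a global. `ToyScope` and `ToyConfig` are hypothetical simplifications (the field list is an assumption), not the classes in compilation_config.{h,cc}.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <tuple>

// Hypothetical, simplified scope; the real SEScope is a TVM object.
struct ToyScope {
  int device_type;
  int device_id;
  std::string target;
  std::string memory_scope;
};

// Owns the memoization table, so the table is freed together with the config
// instead of growing for the lifetime of the process.
class ToyConfig {
 public:
  // Returns one shared instance for all scopes with equal fields.
  std::shared_ptr<const ToyScope> Canonical(const ToyScope& scope) {
    Key key(scope.device_type, scope.device_id, scope.target, scope.memory_scope);
    auto it = cache_.find(key);
    if (it != cache_.end()) return it->second;
    std::shared_ptr<const ToyScope> canonical = std::make_shared<const ToyScope>(scope);
    cache_[key] = canonical;
    return canonical;
  }

  std::size_t size() const { return cache_.size(); }

 private:
  typedef std::tuple<int, int, std::string, std::string> Key;
  std::map<Key, std::shared_ptr<const ToyScope> > cache_;
};
```

Two builds with two configs each get their own table, so repeated builds with distinct targets do not accumulate entries anywhere global.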

Adds SEScope (Storage/Execution Scope) for use as new unit of planning in 'device' planning

This is the first step in apache/tvm-rfcs#38 to bring devices
and targets together when doing device planning. I've gone ahead and also included a
memory scope in this object since we will also need to propagate memory scopes across
Relay expressions once this basic preparation is in place. In the meantime that field will be
left as "".

Once device planning works in units of SEScopes it will be possible to directly read off
the device and target for any Relay sub-expression without the need for TargetMaps or
the construction of default Targets.

SEScopes also support 'Join' and 'Default' operations needed when constraint solving in
the device planner. You can see those in use in my scratchpad branch:
  https://github.com/mbs-octoml/mbs-tvm/tree/mbs-scopes
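As a rough illustration of what 'Join' and 'Default' could look like, here is a sketch over a toy scope with three fields. The struct, the "unconstrained" markers and the function signatures are assumptions made for this example; see the scratchpad branch for the real thing.

```cpp
#include <cassert>
#include <string>

const int kAnyDevice = -1;  // hypothetical marker for "device type unconstrained"

struct ToyScope {
  int device_type = kAnyDevice;
  std::string target;        // empty means unconstrained
  std::string memory_scope;  // empty means unconstrained
};

static bool JoinField(int lhs, int rhs, int* out) {
  if (lhs == kAnyDevice) { *out = rhs; return true; }
  if (rhs == kAnyDevice || rhs == lhs) { *out = lhs; return true; }
  return false;  // both constrained, and they disagree
}

static bool JoinField(const std::string& lhs, const std::string& rhs, std::string* out) {
  if (lhs.empty()) { *out = rhs; return true; }
  if (rhs.empty() || rhs == lhs) { *out = lhs; return true; }
  return false;  // both constrained, and they disagree
}

// Join: combine the constraints of both scopes. Returns false and leaves *out
// untouched if any field is constrained to different values on the two sides.
bool Join(const ToyScope& lhs, const ToyScope& rhs, ToyScope* out) {
  assert(out != nullptr);
  ToyScope joined;
  if (!JoinField(lhs.device_type, rhs.device_type, &joined.device_type)) return false;
  if (!JoinField(lhs.target, rhs.target, &joined.target)) return false;
  if (!JoinField(lhs.memory_scope, rhs.memory_scope, &joined.memory_scope)) return false;
  *out = joined;
  return true;
}

// Default: keep every constraint of lhs and fill the remaining fields from rhs.
ToyScope Default(const ToyScope& lhs, const ToyScope& rhs) {
  ToyScope result = lhs;
  if (result.device_type == kAnyDevice) result.device_type = rhs.device_type;
  if (result.target.empty()) result.target = rhs.target;
  if (result.memory_scope.empty()) result.memory_scope = rhs.memory_scope;
  return result;
}
```

In this sketch Join can fail while Default cannot: Join is what a solver would use to unify two constraints, Default is what it would use to fill in whatever is still open at the end.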

This PR also brings some duplicated, ad-hoc 'default target' handling logic
together into a CompilationConfig class. (Again, see the scratchpad branch for how that
will end up being used). I've placed that next to SEScope since its main purpose is to
  a) establish the default SEScope for primitive ops
  b) establish the SEScope for the 'host'
  c) feed a definitive vector of Targets into device planning so it can resolve all
     "on_device" and "device_copy" device references to their full SEScope form.
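The three roles (a) to (c) can be sketched as below. Everything here is hypothetical: `ToyCompilationConfig` is not the class added by this PR, and picking the first primitive target as the default is an assumption made only to keep the example short.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ToyTarget {
  std::string name;
  int device_type;  // DLPack numbering: 1 = kDLCPU, 2 = kDLCUDA
};

struct ToyScope {
  int device_type;
  std::string target;  // empty until resolved against the config
};

class ToyCompilationConfig {
 public:
  // The host target is kept apart from the primitive targets.
  ToyCompilationConfig(const std::vector<ToyTarget>& primitive_targets,
                       const ToyTarget& host_target)
      : primitive_targets_(primitive_targets), host_target_(host_target) {
    assert(!primitive_targets_.empty());
  }

  // (a) Default scope for primitive ops (assumption: the first primitive target).
  ToyScope default_primitive_scope() const { return ScopeFor(primitive_targets_.front()); }

  // (b) Scope for the host.
  ToyScope host_scope() const { return ScopeFor(host_target_); }

  // (c) Resolve a scope that only names a device type to its full form by
  // looking the device type up in the definitive list of primitive targets.
  bool Resolve(const ToyScope& partial, ToyScope* out) const {
    for (const ToyTarget& target : primitive_targets_) {
      if (target.device_type == partial.device_type) {
        *out = ScopeFor(target);
        return true;
      }
    }
    return false;  // no target is known for this device type
  }

 private:
  static ToyScope ScopeFor(const ToyTarget& target) {
    ToyScope scope = {target.device_type, target.name};
    return scope;
  }

  std::vector<ToyTarget> primitive_targets_;
  ToyTarget host_target_;
};
```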
mbs-octoml committed Nov 5, 2021
1 parent 63f1375 commit dfbb253
Showing 58 changed files with 2,423 additions and 1,931 deletions.
12 changes: 6 additions & 6 deletions include/tvm/ir/function.h
@@ -191,24 +191,24 @@ constexpr const char* kTarget = "target";
constexpr const char* kGlobalSymbol = "global_symbol";

/*!
* \brief The device type which will hold each of the functions parameters.
* \brief The SEScope which will hold each of the functions parameters.
*
* Only supported on Relay \p Functions. Generally added by the \p PlanDevices pass, but
* may be included as an annotation on user programs.
*
* Type: Array<Integer> (but interpreted as Array<DLDeviceType>)
* Type: Array<SEScope>
*/
constexpr const char* kParamDeviceTypes = "param_device_types";
constexpr const char* kParamSEScopes = "param_se_scopes";

/*!
* \brief The device type which will hold the function result.
* \brief The SEScope which will hold the function result.
*
* Only supported on Relay \p Functions. Generally added by the \p PlanDevices pass, but
* may be included as an annotation on user programs.
*
* Type: Integer (but interpreted as DLDeviceType)
* Type: SEScope
*/
constexpr const char* kResultDeviceType = "result_device_type";
constexpr const char* kResultSEScope = "result_se_scope";

} // namespace attr
} // namespace tvm
65 changes: 3 additions & 62 deletions include/tvm/relay/attrs/annotation.h
@@ -25,74 +25,13 @@
#define TVM_RELAY_ATTRS_ANNOTATION_H_

#include <tvm/ir/attrs.h>
#include <tvm/target/se_scope.h>

#include <string>

namespace tvm {
namespace relay {

/*!
* \brief Attributes for the "on_device" special operator.
*
* The Relay call (aka 'annotation'):
* \code
* on_device(sub_expr, device_type=2)
* \endcode
* constrains \p sub_expr to execute and store its result on a device with \p DLDeviceType \p 2
* (i.e. a \p kDLCuda device). However the annotation itself may appear in an expression to be
* executed and stored on a different device. If so the compiler will automatically insert a
* "device_copy" call to mediate the transition between devices.
*
* E.g.: Assuming %x and %y reside on the GPU and %z on the CPU then:
* \code
* multiply(on_device(add(%x, %y), device_type=2), %z)
* \endcode
* indicates the \p add should execute on the GPU but the \p multiply should execute on the CPU.
* The compiler will rewrite this to:
* \code
* multiply(device_copy(add(%x, %y), src_dev_type=2, dst_dev_type=1), %z)
* \endcode
*
* The Relay call
* \code
* on_device(sub_expr, device_type=2, is_fixed=True)
* \endcode
* is similar to the above, however the annotation itself must appear in an expression on the
* same device. The compiler will check the devices are consistent, and will not insert any
* "device_copy" call. This form of annotation shouldn't be necessary in user programs. However
* it is needed by the \p PlanDevices pass to fully specify the results of device planning so that
* the pass is idempotent.
*
* E.g.: The following program is equivalent to the above:
* \code
* let %a = on_device(add(%x, %y), device_type=2, is_fixed=True)
* multiply(device_copy(%a, src_dev_type=2, dst_dev_type=1), %z)
* \endcode
* The "on_device" annotation with \p is_fixed=True indicates unambiguously that \p %a is stored
* on the GPU.
*/
struct OnDeviceAttrs : public tvm::AttrsNode<OnDeviceAttrs> {
// TODO(mbs): Replace device types with TargetDevice.
/*! \brief Device type on which argument expression should be evaluated. */
int device_type = kInvalidDeviceType;
/*!
* \brief If true, the result device must also be \p device_type and device planning should
* not insert any "device_copy" calls to respect this annotation.
*
* This is used by the device planning pass itself when annotating the planned program.
*/
bool is_fixed = false;

TVM_DECLARE_ATTRS(OnDeviceAttrs, "relay.attrs.OnDeviceAttrs") {
TVM_ATTR_FIELD(device_type)
.describe("The type of the virtual device which should hold the expression result.")
.set_default(0);
TVM_ATTR_FIELD(is_fixed)
.describe("If true, do not insert a \"device_copy\" call to respect this annotation.")
.set_default(false);
}
};

/*!
* \brief Annotate an expression to be cast into specific data type.
*/
@@ -118,6 +57,8 @@ struct CompilerAttrs : public tvm::AttrsNode<CompilerAttrs> {

/*!
* \brief Metadata for calls to TIR functions, useful for program analysis crossing Relay and TIR.
*
* TODO(mbs): Replace with typed fields once attributes have stabilized.
*/
struct TIRCallAttrs : public tvm::AttrsNode<TIRCallAttrs> {
/*! \brief The metadata attached to the call node. */
16 changes: 7 additions & 9 deletions include/tvm/relay/attrs/device_copy.h
@@ -25,6 +25,7 @@
#define TVM_RELAY_ATTRS_DEVICE_COPY_H_

#include <tvm/ir/attrs.h>
#include <tvm/target/se_scope.h>

#include <string>

@@ -35,17 +36,14 @@ namespace relay {
* \brief Options for the device copy operators.
*/
struct DeviceCopyAttrs : public tvm::AttrsNode<DeviceCopyAttrs> {
// TODO(mbs): Should be TargetDevice.
int dst_dev_type;
int src_dev_type;
SEScope src_se_scope = SEScope::FullyUnconstrained();
SEScope dst_se_scope = SEScope::FullyUnconstrained();

TVM_DECLARE_ATTRS(DeviceCopyAttrs, "relay.attrs.DeviceCopyAttrs") {
TVM_ATTR_FIELD(src_dev_type)
.describe("The virtual device/context type where the op copies data from.")
.set_default(0);
TVM_ATTR_FIELD(dst_dev_type)
.describe("The virtual device/context type where the op copies data to.")
.set_default(0);
TVM_ATTR_FIELD(src_se_scope)
.describe("The (virtual) device and scope where the op copies data from.");
TVM_ATTR_FIELD(dst_se_scope)
.describe("The (virtual) device and scope where the op copies data to.");
}
};

7 changes: 3 additions & 4 deletions include/tvm/relay/attrs/memory.h
@@ -26,6 +26,7 @@

#include <tvm/ir/attrs.h>
#include <tvm/relay/expr.h>
#include <tvm/target/se_scope.h>

#include <string>
#include <vector>
@@ -42,15 +43,13 @@ Expr ToTupleType(const Type& t, const std::vector<Expr>& exprs);
*/
struct AllocStorageAttrs : public tvm::AttrsNode<AllocStorageAttrs> {
DataType dtype;
int device_id;
int device_type;
SEScope se_scope = SEScope::FullyUnconstrained();

TVM_DECLARE_ATTRS(AllocStorageAttrs, "relay.attrs.AllocStorageAttrs") {
TVM_ATTR_FIELD(dtype)
.describe("The dtype of the tensor to allocate.")
.set_default(DataType::Float(32, 1));
TVM_ATTR_FIELD(device_id).describe("The device id on which to allocate memory.");
TVM_ATTR_FIELD(device_type).describe("The device type on which to allocate memory.");
TVM_ATTR_FIELD(se_scope).describe("The SEScope on which to allocate memory.");
}
};

101 changes: 101 additions & 0 deletions include/tvm/relay/attrs/on_device.h
@@ -0,0 +1,101 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

/*!
* \file tvm/relay/attrs/on_device.h
* \brief Attribute for the on device annotation.
*/
#ifndef TVM_RELAY_ATTRS_ON_DEVICE_H_
#define TVM_RELAY_ATTRS_ON_DEVICE_H_

#include <tvm/ir/attrs.h>
#include <tvm/target/se_scope.h>

#include <string>

namespace tvm {
namespace relay {

/*!
* \brief Attributes for the "on_device" special operator.
*
* The Relay call (aka 'annotation'):
* \code
* on_device(sub_expr, se_scope=S)
* \endcode
* constrains \p sub_expr to execute and store its result on the \p SEScope \p S.
* However the annotation itself may appear in an expression to be executed and stored on a
* different \p SEScope. If so the compiler will automatically insert a "device_copy" call to
* mediate the transition between \p SEScopes.
*
* E.g.: Assuming %x and %y reside on the GPU and %z on the CPU then:
* \code
* multiply(on_device(add(%x, %y), se_scope=GPU), %z)
* \endcode
* indicates the \p add should execute on the GPU but the \p multiply should execute on the CPU.
* The compiler will rewrite this to:
* \code
* multiply(device_copy(add(%x, %y), src_se_scope=GPU, dst_se_scope=CPU), %z)
* \endcode
*
* The Relay call
* \code
* on_device(sub_expr, se_scope=S, is_fixed=True)
* \endcode
* is similar to the above, however the annotation itself must appear in an expression on the
* same \p SEScope \p S. The compiler will check the \p SEScopes are consistent, and will not
* insert any "device_copy" call. This form of annotation shouldn't be necessary in user programs.
* However it is needed by the \p PlanDevices pass to fully specify the results of device planning
* so that the pass is idempotent.
*
* E.g.: The following program is equivalent to the above:
* \code
* let %a = on_device(add(%x, %y), se_scope=GPU, is_fixed=True)
* multiply(device_copy(%a, src_se_scope=GPU, dst_se_scope=CPU), %z)
* \endcode
* The "on_device" annotation with \p is_fixed=True indicates unambiguously that \p %a is stored
* on the GPU.
*/
struct OnDeviceAttrs : public tvm::AttrsNode<OnDeviceAttrs> {
/*!
* \brief (Virtual) \p SEScope on which the result of the argument expression should be stored.
*/
SEScope se_scope = SEScope::FullyUnconstrained();
/*!
* \brief If true, the result \p SEScope must also be \p se_scope, and device planning should
* not insert any "device_copy" calls to respect this annotation.
*
* This is used by the device planning pass itself when annotating the planned program.
*/
bool is_fixed = false;

TVM_DECLARE_ATTRS(OnDeviceAttrs, "relay.attrs.OnDeviceAttrs") {
TVM_ATTR_FIELD(se_scope)
.describe("The (virtual) device and scope holding the expression result.")
.set_default(SEScope::FullyUnconstrained());
TVM_ATTR_FIELD(is_fixed)
.describe("If true, do not insert a \"device_copy\" call to respect this annotation.")
.set_default(false);
}
};

} // namespace relay
} // namespace tvm

#endif // TVM_RELAY_ATTRS_ON_DEVICE_H_
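The difference between the plain and the \p is_fixed=True form of "on_device" documented above can be summarised by the following sketch. It is deliberately local and simplified: scopes are modelled as plain strings, and `Action` and `PlanOnDevice` are hypothetical names. The actual \p PlanDevices pass solves constraints over the whole program rather than inspecting one annotation at a time.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

enum class Action { kNone, kInsertDeviceCopy };

// Decide what to do for on_device(sub_expr, se_scope=annotated_scope, is_fixed=...)
// when the annotation appears in an expression that lives on context_scope.
Action PlanOnDevice(const std::string& annotated_scope, bool is_fixed,
                    const std::string& context_scope) {
  // Same scope inside and outside: nothing to mediate.
  if (annotated_scope == context_scope) return Action::kNone;
  // A fixed annotation must agree with its context; it is only checked.
  if (is_fixed) {
    throw std::invalid_argument("fixed on_device annotation disagrees with its context");
  }
  // Otherwise a device_copy from annotated_scope to context_scope is needed.
  return Action::kInsertDeviceCopy;
}
```

For the example in the header comment, `PlanOnDevice("GPU", false, "CPU")` asks for a copy, which corresponds to the `device_copy(..., src_se_scope=GPU, dst_se_scope=CPU)` rewrite.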
24 changes: 15 additions & 9 deletions include/tvm/relay/transform.h
@@ -30,6 +30,8 @@
#include <tvm/relay/function.h>
#include <tvm/relay/op.h>
#include <tvm/relay/op_attr_types.h>
#include <tvm/target/compilation_config.h>
#include <tvm/target/se_scope.h>
#include <tvm/target/target.h>

#include <string>
@@ -437,23 +439,27 @@ TVM_DLL Pass RelayToTIRTargetHook();
* \brief A pass for manifesting explicit memory allocations and rewriting
* specific dialects.
*
* \param target_host The target used by the host for compilation.
* \param targets The device type and target pairs for compilation.
* \param cpu_se_scope SEScope for computations and data which must reside on a CPU, such as
* shapes and shape functions.
*
* \return The pass.
*/
TVM_DLL Pass ManifestAlloc(Target target_host, Map<tvm::Integer, tvm::Target> targets);
TVM_DLL Pass ManifestAlloc(SEScope cpu_se_scope);

/*!
* \brief Uses existing "on_device" and "device_copy" CallNodes to infer the device on which
* every Relay sub-expression should run (and the result stored). Captures the result of that
* analysis using new "on_device" and "device_copy" CallNodes. See
* tvm::relay::transform::{LexicalOnDeviceMixin,DeviceAwareExprVisitor,DeviceAwareExprMutator}
* \brief Uses existing "on_device" and "device_copy" CallNodes to infer the \p SEScope on which
* every Relay sub-expression should run and the result stored. Captures the result of that
* analysis using new "on_device" and "device_copy" CallNodes.
*
* See tvm::relay::transform::{LexicalOnDeviceMixin,DeviceAwareExprVisitor,DeviceAwareExprMutator}
* for help recovering the device for an arbitrary sub-expression in downstream transformations.
*
* \param default_device_type DLDeviceType for default device.
* \param config Describes the targets and default \p SEScope for all primitive operators and
* host sub-expressions.
*
* \return The pass.
*/
TVM_DLL Pass PlanDevices(DLDeviceType default_device_type);
TVM_DLL Pass PlanDevices(CompilationConfig config);

} // namespace transform

26 changes: 14 additions & 12 deletions include/tvm/runtime/vm/bytecode.h
@@ -176,6 +176,7 @@ struct Instruction {
RegName object;
} get_tag;
struct /* AllocADT Operands */ {
// TODO(mbs): Needs a DeviceAndScope.
/*! \brief The datatype's constructor tag. */
Index constructor_tag;
/*! \brief The number of fields to store in the datatype. */
@@ -184,6 +185,7 @@ struct Instruction {
RegName* datatype_fields;
};
struct /* AllocClosure Operands */ {
// TODO(mbs): Needs a DeviceAndScope.
/*! \brief The index into the function table. */
Index clo_index;
/*! \brief The number of free variables to capture. */
@@ -198,8 +200,8 @@ struct Instruction {
Index alignment;
/*! \brief The hint of the dtype. */
DLDataType dtype_hint;
/*! \brief The device type of the allocation. */
Index device_type;
/*! \brief The index of the device on which the allocation will be made. */
Index device_index;
} alloc_storage;
struct /* ShapeOf Operands */ {
RegName tensor;
@@ -210,11 +212,11 @@ struct Instruction {
} reshape_tensor;
struct /* DeviceCopy Operands */ {
RegName src;
/*! \brief The source device type. */
Index src_device_type;
/*! \brief The destination device type. */
Index dst_device_type;
};
/*! \brief The index of the source device to copy from. */
Index src_device_index;
/*! \brief The index of the destination device to copy to. */
Index dst_device_index;
} device_copy;
};

/*!
@@ -352,12 +354,12 @@ struct Instruction {
* \param size The size of the allocation.
* \param alignment The allocation's alignment.
* \param dtype_hint The data type hint for the allocator.
* \param device_type The device type for the allocator.
* \param device_index The index of the device to allocate on.
* \param dst The destination to place the storage.
* \return The alloc storage instruction.
*/
static Instruction AllocStorage(RegName size, Index alignment, DLDataType dtype_hint,
Index device_type, RegName dst);
Index device_index, RegName dst);
/*!
* \brief Get the shape of an input tensor.
* \param tensor The input tensor.
@@ -376,12 +378,12 @@ struct Instruction {
/*!
* \brief Copy tensor cross different devices.
* \param src The source register.
* \param src_device_type The device type of the tensor for the source register.
* \param dst_device_type The device type of the tensor ofr the destination register.
* \param src_device_index The index of the device holding the tensor in the source register.
* \param dst_device_index The index of the device to hold the tensor in the destination register.
* \param dst The destination register to store the copied tensor.
* \return The device copy instruction.
*/
static Instruction DeviceCopy(RegName src, Index src_device_type, Index dst_device_type,
static Instruction DeviceCopy(RegName src, Index src_device_index, Index dst_device_index,
RegName dst);

Instruction();