Switch PlanDevices pass to be w.r.t. SEScopes instead of DLDeviceTypes.
CAUTION: Breaking VM executable serialization change. I needed a new 'virtual devices' array in the executable so that instructions can continue to refer to devices by a simple index yet the VM can respect both the device type and id for runtime devices.

Continuing from apache#9313, and as part of apache/tvm-rfcs#38, we switch PlanDevices to plan with respect to SEScopes instead of just DLDeviceTypes. Our ultimate goal is to be able to flow memory scopes between PrimFuncs by re-running PlanDevices after the LowerTE pass. This PR at least gets us to being able to flow the memory scopes, but the actual changes to PlanDevices to look inside PrimFuncs are still two PRs away.

However, we get two nice side effects right away:
 - Since SEScopes contain Targets we can isolate all the device-to-target resolution machinery within PlanDevices (with the help of CompilationConfig). After PlanDevices has run we can retrieve the Target for any sub-expression directly from that sub-expression's SEScope. For now we retain the one-Target-per-DLDeviceType constraint since it is baked into the public 'TargetMap' API, but the path to breaking that constraint is clearer.
 - Device ids are now respected all the way from annotation to executor. Previously, although some of the plumbing already used Devices, the device_id therein was ignored or defaulted to zero.

 The Python "on_device" annotation helpers still work w.r.t. devices. Thus, although they now respect device ids, they do not yet allow the user to specify a Target or memory scope as supported by the underlying SEScope.
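
To make the 'virtual devices' array from the CAUTION note concrete, here is a minimal sketch of the idea: instructions keep a plain index, and the executable carries a table that maps each index to a (device type, device id) pair. The names `VirtualDevice` and `VirtualDeviceTable` are hypothetical and written only for illustration. They are not the types used in this commit, and the actual serialization layout is not shown here.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a runtime device: a device type plus a device id.
// The integer device types follow DLPack's DLDeviceType (1 = CPU, 2 = CUDA).
struct VirtualDevice {
  int device_type;
  int device_id;
};

// Hypothetical table stored alongside the bytecode. Instructions refer to
// devices by their position in this table rather than by device type.
class VirtualDeviceTable {
 public:
  // Returns the index of `dev`, appending it if it has not been seen before.
  std::int64_t IndexFor(const VirtualDevice& dev) {
    for (std::size_t i = 0; i < devices_.size(); ++i) {
      if (devices_[i].device_type == dev.device_type &&
          devices_[i].device_id == dev.device_id) {
        return static_cast<std::int64_t>(i);
      }
    }
    devices_.push_back(dev);
    return static_cast<std::int64_t>(devices_.size() - 1);
  }

  // Resolves an instruction's device index back to a concrete device.
  const VirtualDevice& Lookup(std::int64_t index) const {
    if (index < 0 || static_cast<std::size_t>(index) >= devices_.size()) {
      throw std::out_of_range("virtual device index out of range");
    }
    return devices_[static_cast<std::size_t>(index)];
  }

 private:
  std::vector<VirtualDevice> devices_;
};
```

In this sketch two CUDA devices with different ids receive different indexes, which is what lets an index-based instruction distinguish them. An index keyed on device type alone could not.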
mbs-octoml committed Nov 11, 2021
1 parent dc56eea commit c452a47
Showing 58 changed files with 2,445 additions and 1,954 deletions.
12 changes: 6 additions & 6 deletions include/tvm/ir/function.h
@@ -191,24 +191,24 @@ constexpr const char* kTarget = "target";
constexpr const char* kGlobalSymbol = "global_symbol";

/*!
* \brief The device type which will hold each of the functions parameters.
* \brief The SEScope which will hold each of the functions parameters.
*
* Only supported on Relay \p Functions. Generally added by the \p PlanDevices pass, but
* may be included as an annotation on user programs.
*
* Type: Array<Integer> (but interpreted as Array<DLDeviceType>)
* Type: Array<SEScope>
*/
constexpr const char* kParamDeviceTypes = "param_device_types";
constexpr const char* kParamSEScopes = "param_se_scopes";

/*!
* \brief The device type which will hold the function result.
* \brief The SEScope which will hold the function result.
*
* Only supported on Relay \p Functions. Generally added by the \p PlanDevices pass, but
* may be included as an annotation on user programs.
*
* Type: Integer (but interpreted as DLDeviceType)
* Type: SEScope
*/
constexpr const char* kResultDeviceType = "result_device_type";
constexpr const char* kResultSEScope = "result_se_scope";

} // namespace attr
} // namespace tvm
62 changes: 0 additions & 62 deletions include/tvm/relay/attrs/annotation.h
@@ -31,68 +31,6 @@
namespace tvm {
namespace relay {

/*!
* \brief Attributes for the "on_device" special operator.
*
* The Relay call (aka 'annotation'):
* \code
* on_device(sub_expr, device_type=2)
* \endcode
* constrains \p sub_expr to execute and store its result on a device with \p DLDeviceType \p 2
* (i.e. a \p kDLCuda device). However the annotation itself may appear in an expression to be
* executed and stored on a different device. If so the compiler will automatically insert a
* "device_copy" call to mediate the transition between devices.
*
* E.g.: Assuming %x and %y reside on the GPU and %z on the CPU then:
* \code
* multiply(on_device(add(%x, %y), device_type=2), %z)
* \endcode
* indicates the \p add should execute on the GPU but the \p multiply should execute on the CPU.
* The compiler will rewrite this to:
* \code
* multiply(device_copy(add(%x, %y), src_dev_type=2, dst_dev_type=1), %z)
* \endcode
*
* The Relay call
* \code
* on_device(sub_expr, device_type=2, is_fixed=True)
* \endcode
* is similar to the above, however the annotation itself must appear in an expression on the
* same device. The compiler will check the devices are consistent, and will not insert any
* "device_copy" call. This form of annotation shouldn't be necessary in user programs. However
* it is needed by the \p PlanDevices pass to fully specify the results of device planning so that
* the pass is idempotent.
*
* E.g.: The following program is equivalent to the above:
* \code
* let %a = on_device(add(%x, %y), device_type=2, is_fixed=True)
* multiply(device_copy(%a, src_dev_type=2, dst_dev_type=1), %z)
* \endcode
* The "on_device" annotation with \p is_fixed=True indicates unambiguously that \p %a is stored
* on the GPU.
*/
struct OnDeviceAttrs : public tvm::AttrsNode<OnDeviceAttrs> {
// TODO(mbs): Replace device types with TargetDevice.
/*! \brief Device type on which argument expression should be evaluated. */
int device_type = kInvalidDeviceType;
/*!
* \brief If true, the result device must also be \p device_type and device planning should
* not insert any "device_copy" calls to respect this annotation.
*
* This is used by the device planning pass itself when annotating the planned program.
*/
bool is_fixed = false;

TVM_DECLARE_ATTRS(OnDeviceAttrs, "relay.attrs.OnDeviceAttrs") {
TVM_ATTR_FIELD(device_type)
.describe("The type of the virtual device which should hold the expression result.")
.set_default(0);
TVM_ATTR_FIELD(is_fixed)
.describe("If true, do not insert a \"device_copy\" call to respect this annotation.")
.set_default(false);
}
};

/*!
* \brief Annotate an expression to be cast into specific data type.
*/
16 changes: 7 additions & 9 deletions include/tvm/relay/attrs/device_copy.h
@@ -25,6 +25,7 @@
#define TVM_RELAY_ATTRS_DEVICE_COPY_H_

#include <tvm/ir/attrs.h>
#include <tvm/target/se_scope.h>

#include <string>

@@ -35,17 +36,14 @@ namespace relay {
* \brief Options for the device copy operators.
*/
struct DeviceCopyAttrs : public tvm::AttrsNode<DeviceCopyAttrs> {
// TODO(mbs): Should be TargetDevice.
int dst_dev_type;
int src_dev_type;
SEScope src_se_scope = SEScope::FullyUnconstrained();
SEScope dst_se_scope = SEScope::FullyUnconstrained();

TVM_DECLARE_ATTRS(DeviceCopyAttrs, "relay.attrs.DeviceCopyAttrs") {
TVM_ATTR_FIELD(src_dev_type)
.describe("The virtual device/context type where the op copies data from.")
.set_default(0);
TVM_ATTR_FIELD(dst_dev_type)
.describe("The virtual device/context type where the op copies data to.")
.set_default(0);
TVM_ATTR_FIELD(src_se_scope)
.describe("The (virtual) device and scope where the op copies data from.");
TVM_ATTR_FIELD(dst_se_scope)
.describe("The (virtual) device and scope where the op copies data to.");
}
};

7 changes: 3 additions & 4 deletions include/tvm/relay/attrs/memory.h
@@ -26,6 +26,7 @@

#include <tvm/ir/attrs.h>
#include <tvm/relay/expr.h>
#include <tvm/target/se_scope.h>

#include <string>
#include <vector>
@@ -42,15 +43,13 @@ Expr ToTupleType(const Type& t, const std::vector<Expr>& exprs);
*/
struct AllocStorageAttrs : public tvm::AttrsNode<AllocStorageAttrs> {
DataType dtype;
int device_id;
int device_type;
SEScope se_scope = SEScope::FullyUnconstrained();

TVM_DECLARE_ATTRS(AllocStorageAttrs, "relay.attrs.AllocStorageAttrs") {
TVM_ATTR_FIELD(dtype)
.describe("The dtype of the tensor to allocate.")
.set_default(DataType::Float(32, 1));
TVM_ATTR_FIELD(device_id).describe("The device id on which to allocate memory.");
TVM_ATTR_FIELD(device_type).describe("The device type on which to allocate memory.");
TVM_ATTR_FIELD(se_scope).describe("The SEScope on which to allocate memory.");
}
};

101 changes: 101 additions & 0 deletions include/tvm/relay/attrs/on_device.h
@@ -0,0 +1,101 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

/*!
* \file tvm/relay/attrs/on_device.h
* \brief Attribute for the on device annotation.
*/
#ifndef TVM_RELAY_ATTRS_ON_DEVICE_H_
#define TVM_RELAY_ATTRS_ON_DEVICE_H_

#include <tvm/ir/attrs.h>
#include <tvm/target/se_scope.h>

#include <string>

namespace tvm {
namespace relay {

/*!
* \brief Attributes for the "on_device" special operator.
*
* The Relay call (aka 'annotation'):
* \code
* on_device(sub_expr, se_scope=S)
* \endcode
* constrains \p sub_expr to execute and store its result on the \p SEScope \p S.
* However the annotation itself may appear in an expression to be executed and stored on a
* different \p SEScope. If so the compiler will automatically insert a "device_copy" call to
* mediate the transition between \p SEScopes.
*
* E.g.: Assuming %x and %y reside on the GPU and %z on the CPU then:
* \code
* multiply(on_device(add(%x, %y), se_scope=GPU), %z)
* \endcode
* indicates the \p add should execute on the GPU but the \p multiply should execute on the CPU.
* The compiler will rewrite this to:
* \code
* multiply(device_copy(add(%x, %y), src_se_scope=GPU, dst_se_scope=CPU), %z)
* \endcode
*
* The Relay call
* \code
* on_device(sub_expr, se_scope=S, is_fixed=True)
* \endcode
* is similar to the above, however the annotation itself must appear in an expression on the
* same \p SEScope \p S. The compiler will check the \p SEScopes are consistent, and will not
* insert any "device_copy" call. This form of annotation shouldn't be necessary in user programs.
* However it is needed by the \p PlanDevices pass to fully specify the results of device planning
* so that the pass is idempotent.
*
* E.g.: The following program is equivalent to the above:
* \code
* let %a = on_device(add(%x, %y), se_scope=GPU, is_fixed=True)
* multiply(device_copy(%a, src_se_scope=GPU, dst_se_scope=CPU), %z)
* \endcode
* The "on_device" annotation with \p is_fixed=True indicates unambiguously that \p %a is stored
* on the GPU.
*/
struct OnDeviceAttrs : public tvm::AttrsNode<OnDeviceAttrs> {
/*!
* \brief (Virtual) \p SEScope on which the result of the argument expression should be stored.
*/
SEScope se_scope = SEScope::FullyUnconstrained();
/*!
* \brief If true, the result \p SEScope must also be \p se_scope, and device planning should
* not insert any "device_copy" calls to respect this annotation.
*
* This is used by the device planning pass itself when annotating the planned program.
*/
bool is_fixed = false;

TVM_DECLARE_ATTRS(OnDeviceAttrs, "relay.attrs.OnDeviceAttrs") {
TVM_ATTR_FIELD(se_scope)
.describe("The (virtual) device and scope holding the expression result.")
.set_default(SEScope::FullyUnconstrained());
TVM_ATTR_FIELD(is_fixed)
.describe("If true, do not insert a \"device_copy\" call to respect this annotation.")
.set_default(false);
}
};

} // namespace relay
} // namespace tvm

#endif // TVM_RELAY_ATTRS_ON_DEVICE_H_
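
The two forms of the "on_device" annotation described in the header comment above can be summarised with a small decision function. This is a simplified sketch under stated assumptions: scopes are compared as plain strings, and `OnDeviceAction` and `ResolveOnDevice` are hypothetical names introduced only for illustration. The real PlanDevices pass infers scopes across the whole program rather than comparing two known values.

```cpp
#include <string>

// Hypothetical outcome of processing a single "on_device" annotation.
enum class OnDeviceAction {
  kNoCopy,            // scopes already agree, nothing needs to be inserted
  kInsertDeviceCopy,  // scopes differ and the annotation is not fixed
  kInconsistent       // scopes differ but is_fixed=true forbids a copy
};

// `annotated` is the scope named by the annotation's se_scope attribute.
// `enclosing` is the scope of the expression in which the annotation appears.
OnDeviceAction ResolveOnDevice(const std::string& annotated,
                               const std::string& enclosing, bool is_fixed) {
  if (annotated == enclosing) {
    return OnDeviceAction::kNoCopy;
  }
  // With is_fixed=true the compiler only checks consistency and never
  // inserts a "device_copy", so a mismatch is reported as an inconsistency.
  return is_fixed ? OnDeviceAction::kInconsistent
                  : OnDeviceAction::kInsertDeviceCopy;
}
```

For the `multiply(on_device(add(%x, %y), se_scope=GPU), %z)` example, the annotated scope is GPU and the enclosing scope is CPU with `is_fixed` false, so this sketch selects the "device_copy" insertion, matching the rewrite shown in the comment.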
24 changes: 15 additions & 9 deletions include/tvm/relay/transform.h
@@ -30,6 +30,8 @@
#include <tvm/relay/function.h>
#include <tvm/relay/op.h>
#include <tvm/relay/op_attr_types.h>
#include <tvm/target/compilation_config.h>
#include <tvm/target/se_scope.h>
#include <tvm/target/target.h>

#include <string>
@@ -437,23 +439,27 @@ TVM_DLL Pass RelayToTIRTargetHook();
* \brief A pass for manifesting explicit memory allocations and rewriting
* specific dialects.
*
* \param target_host The target used by the host for compilation.
* \param targets The device type and target pairs for compilation.
* \param cpu_se_scope SEScope for computations and data which must reside on a CPU, such as
* shapes and shape functions.
*
* \return The pass.
*/
TVM_DLL Pass ManifestAlloc(Target target_host, Map<tvm::Integer, tvm::Target> targets);
TVM_DLL Pass ManifestAlloc(SEScope cpu_se_scope);

/*!
* \brief Uses existing "on_device" and "device_copy" CallNodes to infer the device on which
* every Relay sub-expression should run (and the result stored). Captures the result of that
* analysis using new "on_device" and "device_copy" CallNodes. See
* tvm::relay::transform::{LexicalOnDeviceMixin,DeviceAwareExprVisitor,DeviceAwareExprMutator}
* \brief Uses existing "on_device" and "device_copy" CallNodes to infer the \p SEScope on which
* every Relay sub-expression should run and the result stored. Captures the result of that
* analysis using new "on_device" and "device_copy" CallNodes.
*
* See tvm::relay::transform::{LexicalOnDeviceMixin,DeviceAwareExprVisitor,DeviceAwareExprMutator}
* for help recovering the device for an arbitrary sub-expression in downstream transformations.
*
* \param default_device_type DLDeviceType for default device.
* \param config Describes the targets and default \p SEScope for all primitive operators and
* host sub-expressions.
*
* \return The pass.
*/
TVM_DLL Pass PlanDevices(DLDeviceType default_device_type);
TVM_DLL Pass PlanDevices(CompilationConfig config);

} // namespace transform

26 changes: 14 additions & 12 deletions include/tvm/runtime/vm/bytecode.h
@@ -176,6 +176,7 @@ struct Instruction {
RegName object;
} get_tag;
struct /* AllocADT Operands */ {
// TODO(mbs): Needs a DeviceAndScope.
/*! \brief The datatype's constructor tag. */
Index constructor_tag;
/*! \brief The number of fields to store in the datatype. */
@@ -184,6 +185,7 @@ struct Instruction {
RegName* datatype_fields;
};
struct /* AllocClosure Operands */ {
// TODO(mbs): Needs a DeviceAndScope.
/*! \brief The index into the function table. */
Index clo_index;
/*! \brief The number of free variables to capture. */
@@ -198,8 +200,8 @@ struct Instruction {
Index alignment;
/*! \brief The hint of the dtype. */
DLDataType dtype_hint;
/*! \brief The device type of the allocation. */
Index device_type;
/*! \brief The index of the device on which the allocation will be made. */
Index device_index;
} alloc_storage;
struct /* ShapeOf Operands */ {
RegName tensor;
@@ -210,11 +212,11 @@ struct Instruction {
} reshape_tensor;
struct /* DeviceCopy Operands */ {
RegName src;
/*! \brief The source device type. */
Index src_device_type;
/*! \brief The destination device type. */
Index dst_device_type;
};
/*! \brief The index of the source device to copy from. */
Index src_device_index;
/*! \brief The index of the destination device to copy to. */
Index dst_device_index;
} device_copy;
};

/*!
@@ -352,12 +354,12 @@ struct Instruction {
* \param size The size of the allocation.
* \param alignment The allocation's alignment.
* \param dtype_hint The data type hint for the allocator.
* \param device_type The device type for the allocator.
* \param device_index The index of the device to allocate on.
* \param dst The destination to place the storage.
* \return The alloc storage instruction.
*/
static Instruction AllocStorage(RegName size, Index alignment, DLDataType dtype_hint,
Index device_type, RegName dst);
Index device_index, RegName dst);
/*!
* \brief Get the shape of an input tensor.
* \param tensor The input tensor.
@@ -376,12 +378,12 @@ struct Instruction {
/*!
* \brief Copy tensor cross different devices.
* \param src The source register.
* \param src_device_type The device type of the tensor for the source register.
* \param dst_device_type The device type of the tensor ofr the destination register.
* \param src_device_index The index of the device holding the tensor in the source register.
* \param dst_device_index The index of the device to hold the tensor in the destination register.
* \param dst The destination register to store the copied tensor.
* \return The device copy instruction.
*/
static Instruction DeviceCopy(RegName src, Index src_device_type, Index dst_device_type,
static Instruction DeviceCopy(RegName src, Index src_device_index, Index dst_device_index,
RegName dst);

Instruction();