This repository was archived by the owner on Nov 17, 2023. It is now read-only.
A new deferred computation (DC) argument to the imperative MXNet APIs is
proposed. If enabled, memory allocation and computation are deferred as long as
possible. Users can export the computational graph recorded during deferred
computation, which enables hybridization support.
Arrays for which DC is enabled are called lazy. Other arrays are called normal. In-place operations on lazy arrays are unsupported.
Storage allocation and computation for lazy arrays is deferred until their
results are required by conversion to numpy or use as input to an operator
creating a normal array. Accessing attributes such as shape can also trigger
computation if the attribute can't be inferred.
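The lazy/normal distinction can be illustrated with a small Python sketch. Names such as `LazyArray` and `asnumpy` here are purely illustrative stand-ins for the proposed behavior, not MXNet code: a lazy array records the operation that produces it and only computes when the value is actually needed.

```python
import numpy as np

class LazyArray:
    """Illustrative sketch: records a producing op instead of computing immediately."""
    def __init__(self, op=None, inputs=(), value=None):
        self.op = op          # callable producing the value, or None if computed
        self.inputs = inputs  # input LazyArrays the op depends on
        self.value = value    # np.ndarray once computed

    @property
    def is_deferred(self):
        return self.value is None

    def asnumpy(self):
        # Conversion to numpy triggers (recursive) computation.
        if self.value is None:
            args = [a.asnumpy() for a in self.inputs]
            self.value = self.op(*args)
            self.op, self.inputs = None, ()  # release references once computed
        return self.value

def add(a, b):
    # Operators on lazy arrays create new lazy arrays; nothing runs yet.
    return LazyArray(op=np.add, inputs=(a, b))

x = LazyArray(value=np.ones((2,)))
y = LazyArray(value=np.ones((2,)))
z = add(add(x, y), y)   # graph recorded, no computation performed
assert z.is_deferred
out = z.asnumpy()       # conversion to numpy triggers the whole chain
```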
C API
Deferred Compute (DC) Mode
MXImperativeDeferredInvokeEx, an “alias” to MXImperativeInvokeEx, is
introduced; it creates lazy arrays based on (normal or lazy) input arrays and
the operator.
```c++
/*!
 * \brief invoke a nnvm op and imperative function creating lazy ndarray
 * \param creator the op
 * \param num_inputs number of input NDArrays
 * \param inputs input NDArrays
 * \param num_outputs number of output NDArrays
 * \param outputs output NDArrays
 * \param num_params number of keyword parameters
 * \param param_keys keys for keyword parameters
 * \param param_vals values for keyword parameters
 * \param out_stypes output ndarrays' stypes
 * \return 0 when success, -1 when failure happens
 */
MXNET_DLL int MXImperativeDeferredInvokeEx(AtomicSymbolCreator creator,
                                           int num_inputs,
                                           NDArrayHandle *inputs,
                                           int *num_outputs,
                                           NDArrayHandle **outputs,
                                           int num_params,
                                           const char **param_keys,
                                           const char **param_vals,
                                           const int **out_stypes);
```
Checks and explicit trigger
```c++
/*!
 * \brief Check if array's computation is deferred.
 * \param handles ndarray handles to be checked
 * \param num_handles number of ndarray handles to be checked
 * \param status pointer to array of num_handles integers to hold the result.
 */
MXNET_DLL int MXNDArrayGetIsDeferredCompute(NDArrayHandle *handles,
                                            int num_handles,
                                            int *status);
```
```c++
/*!
 * \brief Trigger deferred computation.
 * \param handles ndarray handles to trigger computation of
 * \param num_handles number of ndarray handles
 *
 * Deferred computation of input arrays for specified handles is triggered if
 * required. Arrays that are already computed are ignored.
 */
MXNET_DLL int MXNDArrayTriggerDeferredCompute(NDArrayHandle *handles,
                                              int num_handles);
```
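The semantics of these two calls can be sketched in Python. The function names below are hypothetical analogues of the C entry points, not real MXNet frontend API: one returns a per-handle status flag, the other forces computation and skips arrays that are already computed.

```python
# Hypothetical Python analogues of the proposed C calls (illustrative only).

class Lazy:
    def __init__(self, compute=None, value=None):
        self._compute = compute  # thunk producing the value, or None
        self._value = value      # concrete value once computed

def get_is_deferred(arrays):
    """Analogue of MXNDArrayGetIsDeferredCompute: one status per handle."""
    return [a._value is None for a in arrays]

def trigger_deferred_compute(arrays):
    """Analogue of MXNDArrayTriggerDeferredCompute: compute if required,
    ignore arrays that are already computed."""
    for a in arrays:
        if a._value is None:
            a._value = a._compute()

a = Lazy(compute=lambda: 41 + 1)   # deferred
b = Lazy(value=7)                  # already computed; the trigger ignores it
statuses = get_is_deferred([a, b])
trigger_deferred_compute([a, b])
```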
Exporting to symbol
The computational graph recorded in deferred computation mode can be exported to
symbol. Users must specify all inputs and outputs, to define the part of the
graph they are interested in exporting.
It is an error if any output depends on an array that is not among, and cannot
be computed from, the specified inputs. Equally, providing an input that is not
connected to any output is an error.
```c++
/*!
 * \brief Extract the graph constructed during deferred computation mode as a
 *   Symbol.
 * \param input_handles ndarray handles of inputs
 * \param output_handles ndarray handles of outputs
 * \param input_names names associated with the inputs of the returned Symbol
 * \param output_names names associated with the outputs of the returned Symbol
 * \param out grouped output symbol handle
 *
 * Construct a Symbol for the subgraph of the deferred computation graph
 * spanning from the input_handles to the output_handles. Requires that
 * input_handles and output_handles are connected in the tracked computational
 * graph. The input_handles are required to have been used as arguments to an
 * operator that is part of the tracked subgraph. All inputs of the
 * computational graph must be specified.
 */
MXNET_DLL int MXNDArrayGetDeferredComputeSymbol(NDArrayHandle *input_handles,
                                                NDArrayHandle *output_handles,
                                                const char **input_names,
                                                const char **output_names,
                                                int num_inputs,
                                                int num_outputs,
                                                SymbolHandle *out);
```
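The validity checks behind the export can be sketched as a small reachability walk over a recorded graph. This is an illustrative Python model, not MXNet internals: the graph maps each node to its parents, the walk stops at declared inputs, and the two error conditions above are raised explicitly.

```python
# Sketch of the export-to-symbol validity checks (illustrative model).
# graph: dict mapping node -> list of parent nodes; declared inputs cut the walk.

def extract_subgraph(graph, inputs, outputs):
    inputs = set(inputs)
    reachable = set()

    def visit(node):
        if node in reachable:
            return
        reachable.add(node)
        if node in inputs:
            return  # stop at declared inputs
        parents = graph.get(node)
        if parents is None:
            # output depends on something not among / computable from the inputs
            raise ValueError(f"output depends on {node!r}, which is not "
                             "among and cannot be computed from the inputs")
        for p in parents:
            visit(p)

    for out in outputs:
        visit(out)
    unused = inputs - reachable
    if unused:
        raise ValueError(f"inputs {sorted(unused)} are not connected to any output")
    return reachable

# y = op(h, b), h = op(x, w): exporting y requires specifying x, w and b.
graph = {"h": ["x", "w"], "y": ["h", "b"]}
nodes = extract_subgraph(graph, inputs=["x", "w", "b"], outputs=["y"])
```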
Implementation (C++)
NDArray

```c++
class NDArray {
 public:
  [...]
  /*!
   * \brief constructs a new dynamic NDArray
   * \param shape the shape of array
   * \param ctx context of NDArray
   * \param delay_alloc whether delay the allocation (True for DC mode)
   * \param dtype data type of this ndarray
   */
  NDArray(const mxnet::TShape &shape, Context ctx,
          bool delay_alloc = false, int dtype = mshadow::default_type_flag)
      : ptr_(std::make_shared<Chunk>(shape, ctx, delay_alloc, dtype)),
        shape_(shape),
        dtype_(dtype),
        storage_type_(kDefaultStorage),
        autograd_entry_(nullptr) {
  }
  [...]
  /*!
   * \brief Block until all the pending write operations with respect
   *   to current NDArray are finished, and read can be performed.
   *
   * If this is an array with deferred computation, computation is triggered.
   */
  inline void WaitToRead() const;
  /*!
   * \brief Block until all the pending read/write operations with respect
   *   to current NDArray are finished, and write can be performed.
   *
   * If this is an array with deferred computation, computation is triggered.
   */
  inline void WaitToWrite() const;
  [...]
 private:
  [...]
  /*! \brief node entry for autograd */
  nnvm::NodeEntry autograd_entry_;  // renamed from entry_
  /*! \brief node entry for deferred computation tracking */
  nnvm::NodeEntry deferredcompute_entry_;
  /*!
   * \brief Perform deferred computation.
   *
   * Applicable if current array is associated with deferredcompute_entry_ and
   * DCInfo. If so, compute this and all dependent NDArrays.
   *
   * Triggered automatically if needed by WaitToRead.
   */
  void DeferredCompute() const;
  [...]
};
```
DCInfo
```c++
/*!
 * \brief DCInfo stores the NDArrays required to perform the deferred
 *   computation of its owning NDArray.
 *
 * Once deferred computation is completed, DCInfo::Clear should be executed
 * to release references to input data.
 */
class DCInfo {
 public:
  explicit DCInfo(std::vector<NDArray> inputs) : inputs_(std::move(inputs)) {}

  static DCInfo& Get(const nnvm::NodePtr& node) {
    return dmlc::get<DCInfo>(node->info);
  }

  static void Clear(const nnvm::NodePtr& node) {
    if (node == nullptr || node->info.empty()) return;
    node->info.clear();
  }

  static DCInfo& Create(const nnvm::NodePtr& node) {
    node->info.construct<DCInfo>();
    return Get(node);
  }

 private:
  std::vector<NDArray> inputs_;
};
```
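The DCInfo lifecycle — keep inputs alive only until the deferred computation has run, then release them — can be sketched in Python. `Node` and its `info` dict are illustrative stand-ins for the nnvm node and its attached DCInfo, not MXNet code.

```python
# Sketch of the DCInfo lifecycle (illustrative): a node keeps references to its
# inputs only until its deferred computation has been performed.

class Node:
    def __init__(self, op, inputs):
        self.info = {"op": op, "inputs": list(inputs)}  # plays the role of DCInfo
        self.value = None

    def compute(self):
        if self.value is None:
            args = [i.compute() if isinstance(i, Node) else i
                    for i in self.info["inputs"]]
            self.value = self.info["op"](*args)
            self.info = None  # analogue of DCInfo::Clear: release input references
        return self.value

n = Node(lambda a, b: a * b, [6, 7])
assert n.info is not None  # inputs still referenced before computation
result = n.compute()       # computing clears the info and its references
```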
Execution
Trigger execution from within NDArray
NDArray::WaitToRead and NDArray::WaitToWrite are extended to trigger
execution, calling NDArray::TriggerDeferredCompute. TriggerDeferredCompute
is a no-op if no DCInfo is associated with the current array, i.e. if it is
already computed.
Explicit C API to trigger execution
Users can also manually trigger the computation of specified arrays.
Implementation
Operations on the graph are pushed to the engine for asynchronous execution via RunGraph.
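A synchronous sketch of running such a recorded graph follows; the real RunGraph pushes each operation to the engine for asynchronous execution, whereas this illustrative Python model executes inline and skips nodes that are already computed.

```python
import operator

# Sketch of executing a recorded graph in topological (recorded) order.
# graph: list of (output_name, op, input_names) tuples in recording order.

def run_graph(graph, values):
    executed = []
    for out_name, op, in_names in graph:
        if out_name in values:
            continue  # already computed; nothing to do
        values[out_name] = op(*(values[n] for n in in_names))
        executed.append(out_name)
    return executed

# h = x + y, z = h * y
graph = [("h", operator.add, ("x", "y")),
         ("z", operator.mul, ("h", "y"))]
values = {"x": 1, "y": 2}
order = run_graph(graph, values)
```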
FAQ
How about Autograd, NDArray.autograd_entry_ and AGInfo?
Autograd inside deferred computation (DC) mode can be supported.
Relation of Autograd and DC: While autograd’s RecordOp provides a similar
recording functionality to the deferred computation, the autograd graph is not
the same as a computational graph: NDArray::Detach() serves to detach a node
from the autograd graph by deleting NDArray.entry_, though the NodeEntry is
still required for reconstructing the computational history of how this NDArray
came to be.
Are reqs like kInPlace supported?
No. For now only kWriteTo is supported in DC mode.
The plan is to replace inplace operations with kWriteTo operations, writing to
a new (lazy) array. The framework should be smart enough to decide when to reuse
memory and when not. It shouldn’t be required for users to specify that they
want an inplace operation.
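The intended rewrite — an "in-place" update becomes a kWriteTo-style write into a fresh lazy array — can be sketched like this (illustrative Python, not MXNet code; the framework would transparently rebind the user-visible name to the new array):

```python
import numpy as np

# Sketch: "a += b" on lazy arrays is rewritten as a kWriteTo-style operation
# producing a NEW lazy array instead of mutating a in place.

class Lazy:
    def __init__(self, compute):
        self._compute = compute
    def asnumpy(self):
        return self._compute()

def iadd(a, b):
    """In-place add rewritten: returns a new lazy array, leaves `a` untouched."""
    return Lazy(lambda: a.asnumpy() + b.asnumpy())

a = Lazy(lambda: np.zeros(3))
b = Lazy(lambda: np.ones(3))
a2 = iadd(a, b)  # the framework would rebind the name `a` to a2
```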
How is context attribute handled, specifically context changes?
Cross-device copies must be represented as an operator (CrossDeviceCopyOp),
which requires special handling in the graph executor.
How is incomplete shape information handled?
The shape property triggers computation if shape is accessed and can't be inferred completely.
Users can access static_shape if they want to avoid triggering computation.
Python (Gluon)
Based on DC, hybridization in Gluon is simplified:
Instead of implementing def hybrid_forward(self, F, x, ...) in HybridBlock,
users can opt to implement def forward(self, x, ...) in HybridBlock.
Hybridization based on DC works by the HybridBlock performing the following
steps (if it is not called by a parent block being hybridized):
1. keeping a reference to the input arrays and a reference to the parameter
   arrays to pass them to MXNDArrayGetDeferredComputeSymbol;
2. enabling deferred compute mode;
3. running forward;
4. exporting to symbol and creating a CachedOp; running the CachedOp.
A (internal) global context variable tracks if hybridization is ongoing. If set
to False and a Block is called that is to be hybridized, the global context
variable is set to True and the Block goes through all 4 steps outlined above;
finally the context variable is set back to False after the export to Symbol
step is finished.
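The control flow above can be sketched in Python. The class, the `_hybridizing` flag, and the cached-op handling are illustrative stand-ins for Gluon internals; in particular, creating a real CachedOp and enabling deferred compute mode are elided.

```python
# Sketch of the hybridization control flow (names illustrative, not Gluon API).

_hybridizing = False  # internal global flag tracking ongoing hybridization

class FakeHybridBlock:
    def __init__(self):
        self._cached_op = None

    def forward(self, x):
        return x * 2

    def __call__(self, x):
        global _hybridizing
        if self._cached_op is not None:
            return self._cached_op(x)    # subsequent calls use the cached op
        if _hybridizing:
            return self.forward(x)       # called by a parent being hybridized
        _hybridizing = True
        try:
            # 1. keep references to inputs/parameters (elided in this sketch)
            # 2. enable deferred compute mode (elided in this sketch)
            out = self.forward(x)        # 3. run forward, recording the graph
            self._cached_op = self.forward  # 4. export to symbol / create CachedOp
        finally:
            _hybridizing = False         # reset once the export step is done
        return out

blk = FakeHybridBlock()
first = blk(3)    # goes through all 4 steps
second = blk(5)   # served by the cached op
```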
DC could be used to support hybridizing Block if all logic can be traced. A
separate effort may add logic to detect these cases and add hybridization
support based on DC. For now we rely on users to signal hybridization support
by subclassing HybridBlock.
Parameter Shape Inference
For HybridBlock making use of DC for hybridization, we request users to
implement HybridBlock.infer_shape to infer the parameters' shapes given the
inputs.
Currently, if HybridBlock.infer_shape is not implemented, backward shape
inference is used to infer the shape of parameters. However, backward shape
inference is not supported in all cases (cf. #14253, #14983 (comment))
and relying on it for parameter shape inference is brittle. Thus, for
consistency and simplicity, we require an infer_shape method implementation
when using hybridization based on DC.
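What such an infer_shape amounts to can be sketched for a Dense-like block. `TinyDense` is a hypothetical standalone class, not a Gluon block; the point is that parameter shapes are derived forward from the input shape alone, with no backward inference.

```python
# Sketch of an infer_shape implementation for a Dense-like block (illustrative).

class TinyDense:
    def __init__(self, units):
        self.units = units
        self.weight_shape = None  # unknown until shapes are inferred
        self.bias_shape = None

    def infer_shape(self, x_shape):
        """Derive parameter shapes from the input shape alone."""
        batch, in_units = x_shape
        self.weight_shape = (self.units, in_units)
        self.bias_shape = (self.units,)

layer = TinyDense(units=16)
layer.infer_shape((32, 8))  # batch of 32, 8 input features
```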