This repository was archived by the owner on Nov 17, 2023. It is now read-only.
A new deferred computation (DC) argument to the imperative MXNet APIs is
proposed. If enabled, memory allocation and computation are deferred as long as
possible. Users can export the computational graph recorded during deferred
computation, which enables hybridization support.
Arrays for which DC is enabled are called lazy. Other arrays are called normal. In-place operations on lazy arrays are unsupported.
Storage allocation and computation for lazy arrays is deferred until their
results are required by conversion to numpy or use as input to an operator
creating a normal array. Accessing attributes such as shape can also trigger
computation if the attribute can't be inferred.
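The lazy/normal distinction can be illustrated with a small Python sketch. Names such as `LazyArray` and `asnumpy` here are purely illustrative stand-ins for the proposed behavior, not MXNet code: a lazy array records the operation that produces it and only computes when the value is actually needed.

```python
import numpy as np

class LazyArray:
    """Illustrative sketch: records a producing op instead of computing immediately."""
    def __init__(self, op=None, inputs=(), value=None):
        self.op = op          # callable producing the value, or None if computed
        self.inputs = inputs  # input LazyArrays the op depends on
        self.value = value    # np.ndarray once computed

    @property
    def is_deferred(self):
        return self.value is None

    def asnumpy(self):
        # Conversion to numpy triggers (recursive) computation.
        if self.value is None:
            args = [a.asnumpy() for a in self.inputs]
            self.value = self.op(*args)
            self.op, self.inputs = None, ()  # release references once computed
        return self.value

def add(a, b):
    # Operators on lazy arrays create new lazy arrays; nothing runs yet.
    return LazyArray(op=np.add, inputs=(a, b))

x = LazyArray(value=np.ones((2,)))
y = LazyArray(value=np.ones((2,)))
z = add(add(x, y), y)   # graph recorded, no computation performed
assert z.is_deferred
out = z.asnumpy()       # conversion to numpy triggers the whole chain
```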
C API
Deferred Compute (DC) Mode
MXImperativeDeferredInvokeEx, an “alias” to MXImperativeInvokeEx, is
introduced; it creates lazy arrays based on (normal or lazy) input arrays and
the operator.
```c++
/*!
 * \brief invoke a nnvm op and imperative function creating lazy ndarray
 * \param creator the op
 * \param num_inputs number of input NDArrays
 * \param inputs input NDArrays
 * \param num_outputs number of output NDArrays
 * \param outputs output NDArrays
 * \param num_params number of keyword parameters
 * \param param_keys keys for keyword parameters
 * \param param_vals values for keyword parameters
 * \param out_stypes output ndarrays' stypes
 * \return 0 when success, -1 when failure happens
 */
MXNET_DLL int MXImperativeDeferredInvokeEx(AtomicSymbolCreator creator,
                                           int num_inputs,
                                           NDArrayHandle *inputs,
                                           int *num_outputs,
                                           NDArrayHandle **outputs,
                                           int num_params,
                                           const char **param_keys,
                                           const char **param_vals,
                                           const int **out_stypes);
```
Checks and explicit trigger
```c++
/*!
 * \brief Check if array's computation is deferred.
 * \param handles ndarray handles to be checked
 * \param num_handles number of ndarray handles to be checked
 * \param status pointer to array of num_handles integers to hold the result.
 */
MXNET_DLL int MXNDArrayGetIsDeferredCompute(NDArrayHandle *handles,
                                            int num_handles,
                                            int *status);
```
```c++
/*!
 * \brief Trigger deferred computation.
 * \param handles ndarray handles to trigger computation of
 * \param num_handles number of ndarray handles
 *
 * Deferred computation of input arrays for specified handles is triggered if
 * required. Arrays that are already computed are ignored.
 */
MXNET_DLL int MXNDArrayTriggerDeferredCompute(NDArrayHandle *handles,
                                              int num_handles);
```
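The semantics of these two calls can be sketched in Python. The function names below are hypothetical analogues of the C entry points, not real MXNet frontend API: one returns a per-handle status flag, the other forces computation and skips arrays that are already computed.

```python
# Hypothetical Python analogues of the proposed C calls (illustrative only).

class Lazy:
    def __init__(self, compute=None, value=None):
        self._compute = compute  # thunk producing the value, or None
        self._value = value      # concrete value once computed

def get_is_deferred(arrays):
    """Analogue of MXNDArrayGetIsDeferredCompute: one status per handle."""
    return [a._value is None for a in arrays]

def trigger_deferred_compute(arrays):
    """Analogue of MXNDArrayTriggerDeferredCompute: compute if required,
    ignore arrays that are already computed."""
    for a in arrays:
        if a._value is None:
            a._value = a._compute()

a = Lazy(compute=lambda: 41 + 1)   # deferred
b = Lazy(value=7)                  # already computed; the trigger ignores it
statuses = get_is_deferred([a, b])
trigger_deferred_compute([a, b])
```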
Exporting to symbol
The computational graph recorded in deferred computation mode can be exported to
symbol. Users must specify all inputs and outputs, to define the part of the
graph they are interested in exporting.
It is an error if any output depends on an array that is not among, and cannot
be computed from, the specified inputs. Equally, providing an input that is not
connected to any output is an error.
```c++
/*!
 * \brief Extract the graph constructed during deferred computation mode as a
 *   Symbol.
 * \param input_handles ndarray handles of inputs
 * \param output_handles ndarray handles of outputs
 * \param input_names names associated with the inputs of the returned Symbol
 * \param output_names names associated with the outputs of the returned Symbol
 * \param out grouped output symbol handle
 *
 * Construct a Symbol for the subgraph of the deferred computation graph
 * spanning from the input_handles to the output_handles. Requires that
 * input_handles and output_handles are connected in the tracked computational
 * graph. The input_handles are required to have been used as arguments to an
 * operator that is part of the tracked subgraph. All inputs of the
 * computational graph must be specified.
 */
MXNET_DLL int MXNDArrayGetDeferredComputeSymbol(NDArrayHandle *input_handles,
                                                NDArrayHandle *output_handles,
                                                const char **input_names,
                                                const char **output_names,
                                                int num_inputs,
                                                int num_outputs,
                                                SymbolHandle *out);
```
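The validity checks behind the export can be sketched as a small reachability walk over a recorded graph. This is an illustrative Python model, not MXNet internals: the graph maps each node to its parents, the walk stops at declared inputs, and the two error conditions above are raised explicitly.

```python
# Sketch of the export-to-symbol validity checks (illustrative model).
# graph: dict mapping node -> list of parent nodes; declared inputs cut the walk.

def extract_subgraph(graph, inputs, outputs):
    inputs = set(inputs)
    reachable = set()

    def visit(node):
        if node in reachable:
            return
        reachable.add(node)
        if node in inputs:
            return  # stop at declared inputs
        parents = graph.get(node)
        if parents is None:
            # output depends on something not among / computable from the inputs
            raise ValueError(f"output depends on {node!r}, which is not "
                             "among and cannot be computed from the inputs")
        for p in parents:
            visit(p)

    for out in outputs:
        visit(out)
    unused = inputs - reachable
    if unused:
        raise ValueError(f"inputs {sorted(unused)} are not connected to any output")
    return reachable

# y = op(h, b), h = op(x, w): exporting y requires specifying x, w and b.
graph = {"h": ["x", "w"], "y": ["h", "b"]}
nodes = extract_subgraph(graph, inputs=["x", "w", "b"], outputs=["y"])
```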
Implementation (C++)
NDArray

```c++
class NDArray {
 public:
  [...]
  /*!
   * \brief constructs a new dynamic NDArray
   * \param shape the shape of array
   * \param ctx context of NDArray
   * \param delay_alloc whether delay the allocation (True for DC mode)
   * \param dtype data type of this ndarray
   */
  NDArray(const mxnet::TShape &shape, Context ctx,
          bool delay_alloc = false, int dtype = mshadow::default_type_flag)
      : ptr_(std::make_shared<Chunk>(shape, ctx, delay_alloc, dtype)),
        shape_(shape),
        dtype_(dtype),
        storage_type_(kDefaultStorage),
        autograd_entry_(nullptr) {
  }
  [...]
  /*!
   * \brief Block until all the pending write operations with respect
   *   to current NDArray are finished, and read can be performed.
   *
   * If this is an array with deferred computation, computation is triggered.
   */
  inline void WaitToRead() const;
  /*!
   * \brief Block until all the pending read/write operations with respect
   *   to current NDArray are finished, and write can be performed.
   *
   * If this is an array with deferred computation, computation is triggered.
   */
  inline void WaitToWrite() const;
  [...]
 private:
  [...]
  /*! \brief node entry for autograd */
  nnvm::NodeEntry autograd_entry_;  // renamed from entry_
  /*! \brief node entry for deferred computation tracking */
  nnvm::NodeEntry deferredcompute_entry_;
  /*!
   * \brief Perform deferred computation.
   *
   * Applicable if current array is associated with deferredcompute_entry_ and
   * DCInfo. If so, compute this and all dependent NDArrays.
   *
   * Triggered automatically if needed by WaitToRead.
   */
  void DeferredCompute() const;
  [...]
};
```
DCInfo
```c++
/*!
 * \brief DCInfo stores the NDArrays required to perform the deferred
 *   computation of its owning NDArray.
 *
 * Once deferred computation is completed, DCInfo::Clear should be executed
 * to release references to input data.
 */
class DCInfo {
 public:
  explicit DCInfo(std::vector<NDArray> inputs) : inputs_(std::move(inputs)) {}

  static DCInfo& Get(const nnvm::NodePtr& node) {
    return dmlc::get<DCInfo>(node->info);
  }

  static void Clear(const nnvm::NodePtr& node) {
    if (node == nullptr || node->info.empty()) return;
    node->info.clear();
  }

  static DCInfo& Create(const nnvm::NodePtr& node) {
    node->info.construct<DCInfo>();
    return Get(node);
  }

 private:
  std::vector<NDArray> inputs_;
};
```
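The DCInfo lifecycle — keep inputs alive only until the deferred computation has run, then release them — can be sketched in Python. `Node` and its `info` dict are illustrative stand-ins for the nnvm node and its attached DCInfo, not MXNet code.

```python
# Sketch of the DCInfo lifecycle (illustrative): a node keeps references to its
# inputs only until its deferred computation has been performed.

class Node:
    def __init__(self, op, inputs):
        self.info = {"op": op, "inputs": list(inputs)}  # plays the role of DCInfo
        self.value = None

    def compute(self):
        if self.value is None:
            args = [i.compute() if isinstance(i, Node) else i
                    for i in self.info["inputs"]]
            self.value = self.info["op"](*args)
            self.info = None  # analogue of DCInfo::Clear: release input references
        return self.value

n = Node(lambda a, b: a * b, [6, 7])
assert n.info is not None  # inputs still referenced before computation
result = n.compute()       # computing clears the info and its references
```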
Execution
Trigger execution from within NDArray
NDArray::WaitToRead and NDArray::WaitToWrite are extended to trigger
execution, calling NDArray::TriggerDeferredCompute. TriggerDeferredCompute
is a no-op if no DCInfo is associated with the current array, i.e. if it is
already computed.
Explicit C API to trigger execution
Users can also manually trigger the computation of specified arrays.
Implementation
Operations on the graph are pushed to the engine for asynchronous execution via RunGraph.
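A synchronous sketch of running such a recorded graph follows; the real RunGraph pushes each operation to the engine for asynchronous execution, whereas this illustrative Python model executes inline and skips nodes that are already computed.

```python
import operator

# Sketch of executing a recorded graph in topological (recorded) order.
# graph: list of (output_name, op, input_names) tuples in recording order.

def run_graph(graph, values):
    executed = []
    for out_name, op, in_names in graph:
        if out_name in values:
            continue  # already computed; nothing to do
        values[out_name] = op(*(values[n] for n in in_names))
        executed.append(out_name)
    return executed

# h = x + y, z = h * y
graph = [("h", operator.add, ("x", "y")),
         ("z", operator.mul, ("h", "y"))]
values = {"x": 1, "y": 2}
order = run_graph(graph, values)
```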
FAQ
How about Autograd, NDArray.autograd_entry_ and AGInfo?
Autograd inside deferred computation (DC) mode can be supported.
Relation of Autograd and DC: While autograd’s RecordOp provides a similar
recording functionality to the deferred computation, the autograd graph is not
the same as a computational graph: NDArray::Detach() serves to detach a node
from the autograd graph by deleting NDArray.entry_, though the NodeEntry is
still required for reconstructing the computational history of how this NDArray
came to be.
Are reqs like kInPlace supported?
No. For now only kWriteTo is supported in DC mode.
The plan is to replace inplace operations with kWriteTo operations, writing to
a new (lazy) array. The framework should be smart enough to decide when to reuse
memory and when not. It shouldn’t be required for users to specify that they
want an inplace operation.
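The intended rewrite — an "in-place" update becomes a kWriteTo-style write into a fresh lazy array — can be sketched like this (illustrative Python, not MXNet code; the framework would transparently rebind the user-visible name to the new array):

```python
import numpy as np

# Sketch: "a += b" on lazy arrays is rewritten as a kWriteTo-style operation
# producing a NEW lazy array instead of mutating a in place.

class Lazy:
    def __init__(self, compute):
        self._compute = compute
    def asnumpy(self):
        return self._compute()

def iadd(a, b):
    """In-place add rewritten: returns a new lazy array, leaves `a` untouched."""
    return Lazy(lambda: a.asnumpy() + b.asnumpy())

a = Lazy(lambda: np.zeros(3))
b = Lazy(lambda: np.ones(3))
a2 = iadd(a, b)  # the framework would rebind the name `a` to a2
```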
How is context attribute handled, specifically context changes?
Cross-device copies must be represented as an operator (CrossDeviceCopyOp),
which requires special handling in the graph executor.
How is incomplete shape information handled?
The shape property triggers computation if shape is accessed and can't be inferred completely.
Users can access static_shape if they want to avoid triggering computation.
Python (Gluon)
Based on DC, hybridization in Gluon is simplified:
Instead of implementing def hybrid_forward(self, F, x, ...) in HybridBlock,
users can opt to implement def forward(self, x, ...) in HybridBlock.
Hybridization based on DC works by the HybridBlock performing the following
steps (if it is not called by a parent block being hybridized):
1. keeping a reference to the input arrays and a reference to the parameter
   arrays to pass them to MXNDArrayGetDeferredComputeSymbol;
2. enabling deferred compute mode;
3. running forward;
4. exporting to symbol and creating a CachedOp; running the CachedOp.
A (internal) global context variable tracks if hybridization is ongoing. If set
to False and a Block is called that is to be hybridized, the global context
variable is set to True and the Block goes through all 4 steps outlined above;
finally the context variable is set back to False after the export to Symbol
step is finished.
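The control flow above can be sketched in Python. The class, the `_hybridizing` flag, and the cached-op handling are illustrative stand-ins for Gluon internals; in particular, creating a real CachedOp and enabling deferred compute mode are elided.

```python
# Sketch of the hybridization control flow (names illustrative, not Gluon API).

_hybridizing = False  # internal global flag tracking ongoing hybridization

class FakeHybridBlock:
    def __init__(self):
        self._cached_op = None

    def forward(self, x):
        return x * 2

    def __call__(self, x):
        global _hybridizing
        if self._cached_op is not None:
            return self._cached_op(x)    # subsequent calls use the cached op
        if _hybridizing:
            return self.forward(x)       # called by a parent being hybridized
        _hybridizing = True
        try:
            # 1. keep references to inputs/parameters (elided in this sketch)
            # 2. enable deferred compute mode (elided in this sketch)
            out = self.forward(x)        # 3. run forward, recording the graph
            self._cached_op = self.forward  # 4. export to symbol / create CachedOp
        finally:
            _hybridizing = False         # reset once the export step is done
        return out

blk = FakeHybridBlock()
first = blk(3)    # goes through all 4 steps
second = blk(5)   # served by the cached op
```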
DC could be used to support hybridizing Block if all logic can be traced. A
separate effort may add logic to detect these cases and add hybridization
support based on DC. For now we rely on users to signal hybridization support
by subclassing HybridBlock.
Parameter Shape Inference
For HybridBlock making use of DC for hybridization, we request users to
implement HybridBlock.infer_shape to infer the parameters' shapes given the
inputs.
Currently, if HybridBlock.infer_shape is not implemented, backward shape
inference is used to infer the shape of parameters. However, backward shape
inference is not supported in all cases (cf. #14253, #14983 (comment))
and relying on it for parameter shape inference is brittle. Thus, for
consistency and simplicity, we require an infer_shape method implementation
when using hybridization based on DC.
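What such an infer_shape amounts to can be sketched for a Dense-like block. `TinyDense` is a hypothetical standalone class, not a Gluon block; the point is that parameter shapes are derived forward from the input shape alone, with no backward inference.

```python
# Sketch of an infer_shape implementation for a Dense-like block (illustrative).

class TinyDense:
    def __init__(self, units):
        self.units = units
        self.weight_shape = None  # unknown until shapes are inferred
        self.bias_shape = None

    def infer_shape(self, x_shape):
        """Derive parameter shapes from the input shape alone."""
        batch, in_units = x_shape
        self.weight_shape = (self.units, in_units)
        self.bias_shape = (self.units,)

layer = TinyDense(units=16)
layer.infer_shape((32, 8))  # batch of 32, 8 input features
```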