From 5efec2132111a29418468871f5ece1ad71b07b77 Mon Sep 17 00:00:00 2001
From: "q.yao" <yaoqian@sensetime.com>
Date: Thu, 1 Apr 2021 15:27:43 +0800
Subject: [PATCH 01/10] update tensorrt plugin document

---
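The `align_corners` row in the grid sampler table describes two ways of mapping normalized grid values in [-1, 1] to pixel coordinates. The sketch below spells out that mapping. It assumes the plugin follows the same convention as PyTorch's `grid_sample`, whose documentation the table wording mirrors; `unnormalize` is an illustrative name, not a plugin function.

```python
def unnormalize(coord, size, align_corners):
    """Map a normalized coordinate in [-1, 1] to a pixel coordinate.

    Assumption: same convention as PyTorch's grid_sample.
    """
    if align_corners:
        # -1 and 1 refer to the centers of the first and last pixels.
        return (coord + 1.0) / 2.0 * (size - 1)
    # -1 and 1 refer to the outer edges of the first and last pixels.
    return ((coord + 1.0) * size - 1.0) / 2.0
```

For a width of 4, `align_corners=1` maps -1 and 1 to 0.0 and 3.0 (pixel centers), while `align_corners=0` maps them to -0.5 and 3.5 (outer pixel edges).
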
 docs/tensorrt_ops.md    | 202 ++++++++++++++++++++++++++++++++++++++++
 docs/tensorrt_plugin.md |   8 +-
 2 files changed, 207 insertions(+), 3 deletions(-)
 create mode 100644 docs/tensorrt_ops.md
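
The ScatterND pseudocode added in `docs/tensorrt_ops.md` writes `output[indices[idx]] = updates[idx]`. With NumPy arrays and `indices.shape[-1] > 1`, that expression indexes several entries along the first axis rather than one location, so a NumPy version would need `tuple(indices[idx])`. Below is a minimal pure-Python sketch of the same update rule on nested lists. `scatter_nd` and its helpers are illustrative names, not part of the plugin, and the sketch assumes non-empty, rectangular inputs without duplicate indices.

```python
import copy
import itertools


def _shape(x):
    """Shape of a non-empty, rectangular nested list."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return dims


def _get(x, idx):
    """Follow a sequence of integer indices into a nested list."""
    for i in idx:
        x = x[i]
    return x


def scatter_nd(data, indices, updates):
    output = copy.deepcopy(data)        # never modify the input in place
    batch_shape = _shape(indices)[:-1]  # same role as indices.shape[:-1]
    for idx in itertools.product(*(range(n) for n in batch_shape)):
        target = _get(indices, idx)     # k integers addressing `data`
        parent = _get(output, target[:-1])
        # Write a scalar (k == r) or a whole slice (k < r).
        parent[target[-1]] = copy.deepcopy(_get(updates, idx))
    return output
```

For example, `scatter_nd([1, 2, 3, 4, 5, 6, 7, 8], [[4], [3], [1], [7]], [9, 10, 11, 12])` returns `[1, 11, 3, 10, 9, 6, 7, 12]`.
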
diff --git a/docs/tensorrt_ops.md b/docs/tensorrt_ops.md
new file mode 100644
index 0000000000..a8b2607aa6
--- /dev/null
+++ b/docs/tensorrt_ops.md
@@ -0,0 +1,202 @@
+# TensorRT Ops
+
+<!-- TOC -->
+
+- [TensorRT OPS](#tensort-ops)
+  - [MMCVRoiAlign](#mmcvroialign)
+  - [ScatterND](#scatternd)
+  - [NonMaxSuppression](#nonmaxsuppression)
+  - [MMCVDeformConv2d](#mmcvdeformconv2d)
+  - [grid_sampler](#grid-sampler)
+
+<!-- TOC -->
+
+## MMCVRoiAlign
+
+### Description
+
+Perform ROIAlign on output feature, used in bbox_head of most two stage detectors.
+
+### Parameters
+
+| Type | Parameter | Description |
+| --- | --- | --- |
+| `int` | `output_height` | height of output roi |
+| `int` | `output_width` | width of output roi |
+| `float` | `spatial_scale` | scale the input boxes by this |
+| `int` | `sampling_ratio` | number of inputs samples to take for each output sample. 0 to take samples densely for current models. |
+| `str` | `mode` | pooling mode in each bin. `avg` or `max` |
+| `int` | `aligned` | If `aligned=0`, use the legacy implementation in MMDetection. Else, align the results more perfectly. |
+
+### Inputs
+
+<dl>
+<dt><tt>inputs[0]</tt>: T</dt>
+<dd>Input feature map; 4D tensor of shape (N, C, H, W), where N is the batch size, C is the numbers of channels, H and W are the height and width of the data.</dd>
+<dt><tt>inputs[1]</tt>: T</dt>
+<dd>RoIs (Regions of Interest) to pool over; 2-D tensor of shape (num_rois, 5) given as [[batch_index, x1, y1, x2, y2], ...]. The RoIs' coordinates are the coordinate system of inputs[0].</dd>
+</dl>
+
+### Outputs
+
+<dl>
+<dt><tt>outputs[0]</tt>: T</dt>
+<dd>RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element output[0][r-1] is a pooled feature map corresponding to the r-th RoI inputs[1][r-1].<dd>
+</dl>
+
+### Type Constraints
+
+- T:tensor(float32, Linear)
+
+## ScatterND
+
+### Description
+
+ScatterND takes three inputs `data` tensor of rank r >= 1, `indices` tensor of rank q >= 1, and `updates` tensor of rank q + r - indices.shape[-1] - 1. The output of the operation is produced by creating a copy of the input `data`, and then updating its value to values specified by updates at specific index positions specified by `indices`. Its output shape is the same as the shape of `data`. Note that `indices` should not have duplicate entries. That is, two or more updates for the same index-location is not supported.
+
+The `output` is calculated via the following equation:
+```python
+  output = np.copy(data)
+  update_indices = indices.shape[:-1]
+  for idx in np.ndindex(update_indices):
+      output[indices[idx]] = updates[idx]
+```
+
+### Parameters
+
+None
+
+### Inputs
+
+<dl>
+<dt><tt>inputs[0]</tt>: T</dt>
+<dd>Tensor of rank r>=1.</dd>
+
+<dt><tt>inputs[1]</tt>: tensor(int32, Linear)</dt>
+<dd>Tensor of rank q>=1.</dd>
+
+<dt><tt>inputs[2]</tt>: T</dt>
+<dd>Tensor of rank q + r - indices_shape[-1] - 1.</dd>
+</dl>
+
+### Outputs
+
+<dl>
+<dt><tt>outputs[0]</tt>: T</dt>
+<dd>Tensor of rank r >= 1.</dd>
+</dl>
+
+### Type Constaints
+
+- T:tensor(float32, Linear), tensor(int32, Linear)
+
+## NonMaxSuppression
+
+### Description
+
+Filter out boxes has high IOU overlap with previously selected boxes or low score. Output the indices of valid boxes. Indices of invalid boxes will be filled with -1.
+
+### Parameters
+
+| Type | Parameter | Description |
+| --- | --- | --- |
+| `int` | `center_point_box` | 0 - the box data is supplied as [y1, x1, y2, x2], 1-the box data is supplied as [x_center, y_center, width, height]. |
+| `int` | `max_output_boxes_per_class` | The maximum number of boxes to be selected per batch per class. Default to 0, number of output boxes equal to number of input boxes. |
+| `float` | `iou_threshold` | The threshold for deciding whether boxes overlap too much with respect to IOU. Value range [0, 1]. Default to 0. |
+| `float` | `score_threshold` | The threshold for deciding when to remove boxes based on score. |
+| `int` | `offset` | 0 or 1, boxes' width or height is (x2 - x1 + offset). |
+
+### Inputs
+
+<dl>
+<dt><tt>inputs[0]</tt>: T</dt>
+<dd>Input boxes. 3-D tensor of shape (num_batches, spatial_dimension, 4).</dd>
+<dt><tt>inputs[1]</tt>: T</dt>
+<dd>Input scores. 3-D tensor of shape (num_batches, num_classes, spatial_dimension).</dd>
+</dl>
+
+### Outputs
+
+<dl>
+<dt><tt>outputs[0]</tt>: tensor(int32, Linear)</dt>
+<dd>Selected indices. 2-D tensor of shape (num_selected_indices, 3) as [[batch_index, class_index, box_index], ...].</dd>
+<dd>num_selected_indices=num_batches * num_classes * min(max_output_boxes_per_class, spatial_dimension).</dd>
+<dd>All invalid indices will be filled with -1.</dd>
+</dl>
+
+### Type Constaints
+
+- T:tensor(float32, Linear)
+
+## MMCVDeformConv2d
+
+### Description
+
+Perform Deformable Convolution on input feature, read [Deformable Convolutional Network](https://arxiv.org/abs/1703.06211) for detail.
+
+### Parameters
+
+| Type | Parameter | Description |
+| --- | --- | --- |
+| `list of ints` | `stride` | The stride of the convolving kernel. (sH, sW) |
+| `list of ints` | `padding` | Paddings on both sides of the input. (padH, padW) |
+| `list of ints` | `dilation` | The spacing between kernel elements. (dH, dW) |
+| `int` | `deformable_group` | Groups of deformable offset. |
+| `int` | `group` | Split input into groups. `input_channel` should be divisible by the number of groups. |
+| `int` | `im2col_step` | DeformableConv2d use im2col to compute convolution. im2col_step is used to split input and offset, reduce memory usage of column. |
+
+### Inputs
+
+<dl>
+<dt><tt>inputs[0]</tt>: T</dt>
+<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the numbers of channels, inH and inW are the height and width of the data.</dd>
+<dt><tt>inputs[1]</tt>: T</dt>
+<dd>Input offset; 4-D tensor of shape (N, deformable_group * 2 * kH * kW, outH, outW), where kH and kW is the height and width of weight, outH and outW is the height and width of offset and output.</dd>
+<dt><tt>inputs[2]</tt>: T</dt>
+<dd>Input weight; 4-D tensor of shape (output_channel, input_channel, kH, kW).</dd>
+</dl>
+
+### Outputs
+
+<dl>
+<dt><tt>outputs[0]</tt>: T</dt>
+<dd>Output feature; 4-D tensor of shape (N, output_channel, outH, outW).</dd>
+</dl>
+
+### Type Constaints
+
+- T:tensor(float32, Linear)
+
+## grid sampler
+
+### Description
+
+Perform sample from `input` with pixel locations from `grid`.
+
+### Parameters
+
+| Type | Parameter | Description |
+| --- | --- | --- |
+| `int` | `interpolation_mode` | Interpolation mode to calculate output values. (0: `bilinear` , 1: `nearest`) |
+| `int` | `padding_mode` | Padding mode for outside grid values. (0: `zeros`, 1: `border`, 2: `reflection`) |
+| `int` | `align_corners` | If `align_corners=1`, the extrema (`-1` and `1`) are considered as referring to the center points of the input's corner pixels. If `align_corners=0`, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic. |
+
+### Inputs
+
+<dl>
+<dt><tt>inputs[0]</tt>: T</dt>
+<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the numbers of channels, inH and inW are the height and width of the data.</dd>
+<dt><tt>inputs[1]</tt>: T</dt>
+<dd>Input offset; 4-D tensor of shape (N, outH, outW, 2), where outH and outW is the height and width of offset and output. </dd>
+</dl>
+
+### Outputs
+
+<dl>
+<dt><tt>outputs[0]</tt>: T</dt>
+<dd>Output feature; 4-D tensor of shape (N, C, outH, outW).</dd>
+</dl>
+
+### Type Constaints
+
+- T:tensor(float32, Linear)
diff --git a/docs/tensorrt_plugin.md b/docs/tensorrt_plugin.md
index 69a20fd9da..0ba1094e7b 100644
--- a/docs/tensorrt_plugin.md
+++ b/docs/tensorrt_plugin.md
@@ -26,9 +26,11 @@ To ease the deployment of trained models with custom operators from `mmcv.ops` u
 
 |   ONNX Operator   |    TensorRT Plugin    | Note  |
 | :---------------: | :-------------------: | :---: |
-|     RoiAlign      |     MMCVRoiAlign      |   Y   |
-|     ScatterND     |       ScatterND       |   Y   |
-| NonMaxSuppression | MMCVNonMaxSuppression |  WIP  |
+| MMCVRoiAlign | [MMCVRoiAlign](./tensorrt_ops.md#mmcvroialign) | Y |
+| ScatterND | [ScatterND](./tensorrt_ops.md#scatternd) | Y |
+| NonMaxSuppression | [NonMaxSuppression](./tensorrt_ops.md#nonmaxsuppression) | Y |
+| MMCVDeformConv2d | [MMCVDeformConv2d](./tensorrt_ops.md#mmcvdeformconv2d) | Y |
+| grid_sampler | [grid_sampler](./tensorrt_ops.md#grid-sampler) | Y |
 
 Notes
 

From 9be413d1689caefe66bab63ac4357e0d3460710e Mon Sep 17 00:00:00 2001
From: "q.yao" <yaoqian@sensetime.com>
Date: Thu, 1 Apr 2021 19:21:02 +0800
Subject: [PATCH 02/10] add onnxruntime custom ops document

---
 docs/onnxruntime_custom_ops.md                | 118 ++++++++++++++++++
 docs/onnxruntime_op.md                        |   5 +-
 ...tensorrt_ops.md => tensorrt_custom_ops.md} |   4 +-
 docs/tensorrt_plugin.md                       |  10 +-
 4 files changed, 128 insertions(+), 9 deletions(-)
 create mode 100644 docs/onnxruntime_custom_ops.md
 rename docs/{tensorrt_ops.md => tensorrt_custom_ops.md} (99%)
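
The SoftNMS section added in `docs/onnxruntime_custom_ops.md` lists three rescoring methods (0: `naive`, 1: `linear`, 2: `gaussian`) without showing how they differ. The sketch below follows the decay rules from the Soft-NMS paper linked in that section. It is a plain-Python illustration, not the MMCV kernel: the function names, the default values, the strict `>` comparison against `iou_threshold`, and the choice to return float detections with integer indices are all assumptions.

```python
import math


def iou(a, b, offset=0):
    """IoU of two [x1, y1, x2, y2] boxes; width is x2 - x1 + offset."""
    iw = min(a[2], b[2]) - max(a[0], b[0]) + offset
    ih = min(a[3], b[3]) - max(a[1], b[1]) + offset
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0] + offset) * (a[3] - a[1] + offset)
    area_b = (b[2] - b[0] + offset) * (b[3] - b[1] + offset)
    return inter / (area_a + area_b - inter)


def soft_nms(boxes, scores, iou_threshold=0.3, sigma=0.5,
             min_score=1e-3, method=1, offset=0):
    cand = [(list(b), float(s), i)
            for i, (b, s) in enumerate(zip(boxes, scores))]
    dets, indices = [], []
    while cand:
        # Pick the remaining box with the highest (possibly decayed) score.
        best = max(range(len(cand)), key=lambda k: cand[k][1])
        box, score, idx = cand.pop(best)
        dets.append(box + [score])
        indices.append(idx)
        survivors = []
        for other_box, other_score, other_idx in cand:
            ov = iou(box, other_box, offset)
            if method == 1:    # linear decay above the threshold
                weight = 1.0 - ov if ov > iou_threshold else 1.0
            elif method == 2:  # gaussian decay, no hard threshold
                weight = math.exp(-(ov * ov) / sigma)
            else:              # naive: classic hard suppression
                weight = 0.0 if ov > iou_threshold else 1.0
            new_score = other_score * weight
            if new_score >= min_score:  # drop boxes that decayed too far
                survivors.append((other_box, new_score, other_idx))
        cand = survivors
    return dets, indices
```

With `method=0` an overlapping box is removed outright, whereas `method=1` keeps it with a reduced score, so it is returned later in the ordering instead of disappearing.
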

diff --git a/docs/onnxruntime_custom_ops.md b/docs/onnxruntime_custom_ops.md
new file mode 100644
index 0000000000..c2f63cf9b3
--- /dev/null
+++ b/docs/onnxruntime_custom_ops.md
@@ -0,0 +1,118 @@
+# Onnxruntime Custom Ops
+
+<!-- TOC -->
+
+- [Onnxruntime Custom Ops](#onnxruntime-custom-ops)
+  - [SoftNMS](#softnms)
+  - [RoiAlign](#roialign)
+  - [NMS](#nms)
+
+<!-- TOC -->
+
+## SoftNMS
+
+### Description
+
+Perform soft NMS on `boxes` with `scores`. Read [Soft-NMS -- Improving Object Detection With One Line of Code](https://arxiv.org/abs/1704.04503) for detail.
+
+### Parameters
+
+| Type | Parameter | Description |
+| --- | --- | --- |
+| `float` | `iou_threshold` | IoU threshold for NMS |
+| `float` | `sigma` | hyperparameter for gaussian method |
+| `float` | `min_score` | score filter threshold |
+| `int` | `method` | method to do the nms, (0: `naive`, 1: `linear`, 2: `gaussian`) |
+| `int` | `offset` | `boxes` width or height is (x2 - x1 + offset). (0 or 1) |
+
+### Inputs
+
+<dl>
+<dt><tt>boxes</tt>: T</dt>
+<dd>Input boxes. 2-D tensor of shape (N, 4). N is the batch size.</dd>
+<dt><tt>scores</tt>: T</dt>
+<dd>Input scores. 1-D tensor of shape (N, ).</dd>
+</dl>
+
+### Outputs
+
+<dl>
+<dt><tt>dets</tt>: tensor(int64)</dt>
+<dd>Output boxes and scores. 2-D tensor of shape (num_valid_boxes, 5), [[x1, y1, x2, y2, score], ...]. num_valid_boxes is the number of valid boxes.</dd>
+<dt><tt>indices</tt>: T</dt>
+<dd>Output indices. 1-D tensor of shape (num_valid_boxes, ).</dd>
+</dl>
+
+### Type Constraints
+
+- T:tensor(float32)
+
+## RoiAlign
+
+### Description
+
+Perform ROIAlign on output feature, used in bbox_head of most two stage detectors.
+
+### Parameters
+
+| Type | Parameter | Description |
+| --- | --- | --- |
+| `int` | `output_height` | height of output roi |
+| `int` | `output_width` | width of output roi |
+| `float` | `spatial_scale` | scale the input boxes by this |
+| `int` | `sampling_ratio` | number of inputs samples to take for each output sample. 0 to take samples densely for current models. |
+| `str` | `mode` | pooling mode in each bin. `avg` or `max` |
+| `int` | `aligned` | If `aligned=0`, use the legacy implementation in MMDetection. Else, align the results more perfectly. |
+
+### Inputs
+
+<dl>
+<dt><tt>input</tt>: T</dt>
+<dd>Input feature map; 4D tensor of shape (N, C, H, W), where N is the batch size, C is the numbers of channels, H and W are the height and width of the data.</dd>
+<dt><tt>rois</tt>: T</dt>
+<dd>RoIs (Regions of Interest) to pool over; 2-D tensor of shape (num_rois, 5) given as [[batch_index, x1, y1, x2, y2], ...]. The RoIs' coordinates are the coordinate system of input.</dd>
+</dl>
+
+### Outputs
+
+<dl>
+<dt><tt>feat</tt>: T</dt>
+<dd>RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element feat[r-1] is a pooled feature map corresponding to the r-th RoI RoIs[r-1].<dd>
+</dl>
+
+### Type Constraints
+
+- T:tensor(float32)
+
+## NMS
+
+### Description
+
+Filter out boxes has high IOU overlap with previously selected boxes.
+
+### Parameters
+
+| Type | Parameter | Description |
+| --- | --- | --- |
+| `float` | `iou_threshold` | The threshold for deciding whether boxes overlap too much with respect to IOU. Value range [0, 1]. Default to 0. |
+| `int` | `offset` | 0 or 1, boxes' width or height is (x2 - x1 + offset). |
+
+### Inputs
+
+<dl>
+<dt><tt>bboxes</tt>: T</dt>
+<dd>Input boxes. 2-D tensor of shape (num_boxes, 4). num_boxes is the number of input boxes.</dd>
+<dt><tt>scores</tt>: T</dt>
+<dd>Input scores. 1-D tensor of shape (num_boxes, ).</dd>
+</dl>
+
+### Outputs
+
+<dl>
+<dt><tt>indices</tt>: tensor(int32, Linear)</dt>
+<dd>Selected indices. 1-D tensor of shape (num_valid_boxes, ). num_valid_boxes is the number of valid boxes.</dd>
+</dl>
+
+### Type Constraints
+
+- T:tensor(float32)
diff --git a/docs/onnxruntime_op.md b/docs/onnxruntime_op.md
index 9090656bc2..7dcf81ba6c 100644
--- a/docs/onnxruntime_op.md
+++ b/docs/onnxruntime_op.md
@@ -17,8 +17,9 @@
 
 | Operator |  CPU  |  GPU  |                                                Note                                                 |
 | :------: | :---: | :---: | :-------------------------------------------------------------------------------------------------: |
-| SoftNMS  |   Y   |   N   | commit [94810f](https://github.com/open-mmlab/mmcv/commit/94810f2297871d0ea3ca650dcb2e842f5374d998) |
-| RoiAlign |   Y   |   N   |                                                None                                                 |
+|  [SoftNMS](onnxruntime_custom_ops.md#softnms)  |   Y   |   N   | commit [94810f](https://github.com/open-mmlab/mmcv/commit/94810f2297871d0ea3ca650dcb2e842f5374d998) |
+| [RoiAlign](onnxruntime_custom_ops.md#roialign) |   Y   |   N   |                                                None                                                 |
+|      [NMS](onnxruntime_custom_ops.md#nms)      |   Y   |   N   |                                                None                                                 |
 
 ## How to build custom operators for ONNX Runtime
 
diff --git a/docs/tensorrt_ops.md b/docs/tensorrt_custom_ops.md
similarity index 99%
rename from docs/tensorrt_ops.md
rename to docs/tensorrt_custom_ops.md
index a8b2607aa6..204ee032c2 100644
--- a/docs/tensorrt_ops.md
+++ b/docs/tensorrt_custom_ops.md
@@ -1,8 +1,8 @@
-# TensorRT Ops
+# TensorRT Custom Ops
 
 <!-- TOC -->
 
-- [TensorRT OPS](#tensort-ops)
+- [TensorRT Custom Ops](#tensorrt-custom-ops)
   - [MMCVRoiAlign](#mmcvroialign)
   - [ScatterND](#scatternd)
   - [NonMaxSuppression](#nonmaxsuppression)
diff --git a/docs/tensorrt_plugin.md b/docs/tensorrt_plugin.md
index 0ba1094e7b..d44029ca57 100644
--- a/docs/tensorrt_plugin.md
+++ b/docs/tensorrt_plugin.md
@@ -26,11 +26,11 @@ To ease the deployment of trained models with custom operators from `mmcv.ops` u
 
 |   ONNX Operator   |    TensorRT Plugin    | Note  |
 | :---------------: | :-------------------: | :---: |
-| MMCVRoiAlign | [MMCVRoiAlign](./tensorrt_ops.md#mmcvroialign) | Y |
-| ScatterND | [ScatterND](./tensorrt_ops.md#scatternd) | Y |
-| NonMaxSuppression | [NonMaxSuppression](./tensorrt_ops.md#nonmaxsuppression) | Y |
-| MMCVDeformConv2d | [MMCVDeformConv2d](./tensorrt_ops.md#mmcvdeformconv2d) | Y |
-| grid_sampler | [grid_sampler](./tensorrt_ops.md#grid-sampler) | Y |
+| MMCVRoiAlign | [MMCVRoiAlign](./tensorrt_custom_ops.md#mmcvroialign) | Y |
+| ScatterND | [ScatterND](./tensorrt_custom_ops.md#scatternd) | Y |
+| NonMaxSuppression | [NonMaxSuppression](./tensorrt_custom_ops.md#nonmaxsuppression) | Y |
+| MMCVDeformConv2d | [MMCVDeformConv2d](./tensorrt_custom_ops.md#mmcvdeformconv2d) | Y |
+| grid_sampler | [grid_sampler](./tensorrt_custom_ops.md#grid-sampler) | Y |
 
 Notes
 

From b74dbb54e838a595806df2930c9e736aa9e26b15 Mon Sep 17 00:00:00 2001
From: "q.yao" <yaoqian@sensetime.com>
Date: Fri, 2 Apr 2021 11:40:00 +0800
Subject: [PATCH 03/10] format document

---
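The MMCVDeformConv2d tables reformatted here tie the offset and output shapes to `stride`, `padding`, `dilation` and `deformable_group`, but do not state how `outH` and `outW` are obtained. The sketch below assumes the usual convolution arithmetic; `deform_conv2d_shapes` is a hypothetical helper for checking shapes before building an engine, not an MMCV function.

```python
def deform_conv2d_shapes(in_shape, weight_shape, stride, padding,
                         dilation, deformable_group):
    """Return (offset_shape, output_shape) for the documented inputs.

    Assumption: outH/outW follow standard convolution arithmetic.
    """
    n, _, in_h, in_w = in_shape
    out_c, _, k_h, k_w = weight_shape
    out_h = (in_h + 2 * padding[0] - dilation[0] * (k_h - 1) - 1) // stride[0] + 1
    out_w = (in_w + 2 * padding[1] - dilation[1] * (k_w - 1) - 1) // stride[1] + 1
    # Two offsets (one per spatial axis) per kernel tap per deformable group.
    offset_shape = (n, deformable_group * 2 * k_h * k_w, out_h, out_w)
    output_shape = (n, out_c, out_h, out_w)
    return offset_shape, output_shape
```

For a (1, 64, 32, 32) input and a (128, 64, 3, 3) weight with stride 1, padding 1, dilation 1 and one deformable group, this gives an offset of shape (1, 18, 32, 32) and an output of shape (1, 128, 32, 32).
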
 docs/onnxruntime_custom_ops.md |  68 +++++++++----------
 docs/onnxruntime_op.md         |   4 +-
 docs/tensorrt_custom_ops.md    | 116 +++++++++++++++++----------------
 docs/tensorrt_plugin.md        |  14 ++--
 4 files changed, 102 insertions(+), 100 deletions(-)
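
The NonMaxSuppression output documented in `docs/tensorrt_custom_ops.md` has a fixed number of rows, with unused rows filled with -1. The sketch below shows how a consumer might compute that row count and pad a shorter result. Both helper names are hypothetical, and padding a whole row with -1 is an assumption about what "invalid indices" means. The table also says a `max_output_boxes_per_class` of 0 means no cap, which the formula does not cover; the sketch takes the formula literally and leaves that case to the caller.

```python
def nms_output_rows(num_batches, num_classes, spatial_dimension,
                    max_output_boxes_per_class):
    """Row count of the fixed-size output, per the documented formula."""
    per_class = min(max_output_boxes_per_class, spatial_dimension)
    return num_batches * num_classes * per_class


def pad_selected(selected, num_rows):
    """Pad [batch_index, class_index, box_index] rows with -1 up to num_rows."""
    padded = [list(row) for row in selected[:num_rows]]
    padded += [[-1, -1, -1] for _ in range(num_rows - len(padded))]
    return padded
```

Downstream code can then discard the padding by dropping every row whose `box_index` is -1.
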

diff --git a/docs/onnxruntime_custom_ops.md b/docs/onnxruntime_custom_ops.md
index c2f63cf9b3..01a08a8b7d 100644
--- a/docs/onnxruntime_custom_ops.md
+++ b/docs/onnxruntime_custom_ops.md
@@ -11,21 +11,21 @@
 
 ## SoftNMS
 
-### Description
+<h3>Descriptions</h3>
 
 Perform soft NMS on `boxes` with `scores`. Read [Soft-NMS -- Improving Object Detection With One Line of Code](https://arxiv.org/abs/1704.04503) for detail.
 
-### Parameters
+<h3>Parameters</h3>
 
-| Type | Parameter | Description |
-| --- | --- | --- |
-| `float` | `iou_threshold` | IoU threshold for NMS |
-| `float` | `sigma` | hyperparameter for gaussian method |
-| `float` | `min_score` | score filter threshold |
-| `int` | `method` | method to do the nms, (0: `naive`, 1: `linear`, 2: `gaussian`) |
-| `int` | `offset` | `boxes` width or height is (x2 - x1 + offset). (0 or 1) |
+| Type    | Parameter       | Description                                                    |
+| ------- | --------------- | -------------------------------------------------------------- |
+| `float` | `iou_threshold` | IoU threshold for NMS                                          |
+| `float` | `sigma`         | hyperparameter for gaussian method                             |
+| `float` | `min_score`     | score filter threshold                                         |
+| `int`   | `method`        | method to do the nms, (0: `naive`, 1: `linear`, 2: `gaussian`) |
+| `int`   | `offset`        | `boxes` width or height is (x2 - x1 + offset). (0 or 1)        |
 
-### Inputs
+<h3>Inputs</h3>
 
 <dl>
 <dt><tt>boxes</tt>: T</dt>
@@ -34,7 +34,7 @@ Perform soft NMS on `boxes` with `scores`. Read [Soft-NMS -- Improving Object De
 <dd>Input scores. 1-D tensor of shape (N, ).</dd>
 </dl>
 
-### Outputs
+<h3>Outputs</h3>
 
 <dl>
 <dt><tt>dets</tt>: tensor(int64)</dt>
@@ -43,28 +43,28 @@ Perform soft NMS on `boxes` with `scores`. Read [Soft-NMS -- Improving Object De
 <dd>Output indices. 1-D tensor of shape (num_valid_boxes, ).</dd>
 </dl>
 
-### Type Constraints
+<h3>Type Constraints</h3>
 
 - T:tensor(float32)
 
 ## RoiAlign
 
-### Description
+<h3>Descriptions</h3>
 
-Perform ROIAlign on output feature, used in bbox_head of most two stage detectors.
+Perform ROIAlign on output feature, used in bbox_head of most two-stage detectors.
 
-### Parameters
+<h3>Parameters</h3>
 
-| Type | Parameter | Description |
-| --- | --- | --- |
-| `int` | `output_height` | height of output roi |
-| `int` | `output_width` | width of output roi |
-| `float` | `spatial_scale` | scale the input boxes by this |
-| `int` | `sampling_ratio` | number of inputs samples to take for each output sample. 0 to take samples densely for current models. |
-| `str` | `mode` | pooling mode in each bin. `avg` or `max` |
-| `int` | `aligned` | If `aligned=0`, use the legacy implementation in MMDetection. Else, align the results more perfectly. |
+| Type    | Parameter        | Description                                                                                            |
+| ------- | ---------------- | ------------------------------------------------------------------------------------------------------ |
+| `int`   | `output_height`  | height of output roi                                                                                   |
+| `int`   | `output_width`   | width of output roi                                                                                    |
+| `float` | `spatial_scale`  | scale the input boxes by this                                                                          |
+| `int`   | `sampling_ratio` | number of input samples to take for each output sample. 0 to take samples densely for current models.  |
+| `str`   | `mode`           | pooling mode in each bin. `avg` or `max`                                                               |
+| `int`   | `aligned`        | If `aligned=0`, use the legacy implementation in MMDetection. Else, align the results more perfectly.  |
 
-### Inputs
+<h3>Inputs</h3>
 
 <dl>
 <dt><tt>input</tt>: T</dt>
@@ -73,31 +73,31 @@ Perform ROIAlign on output feature, used in bbox_head of most two stage detector
 <dd>RoIs (Regions of Interest) to pool over; 2-D tensor of shape (num_rois, 5) given as [[batch_index, x1, y1, x2, y2], ...]. The RoIs' coordinates are the coordinate system of input.</dd>
 </dl>
 
-### Outputs
+<h3>Outputs</h3>
 
 <dl>
 <dt><tt>feat</tt>: T</dt>
 <dd>RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element feat[r-1] is a pooled feature map corresponding to the r-th RoI RoIs[r-1].<dd>
 </dl>
 
-### Type Constraints
+<h3>Type Constraints</h3>
 
 - T:tensor(float32)
 
 ## NMS
 
-### Description
+<h3>Descriptions</h3>
 
 Filter out boxes has high IOU overlap with previously selected boxes.
 
-### Parameters
+<h3>Parameters</h3>
 
-| Type | Parameter | Description |
-| --- | --- | --- |
+| Type    | Parameter       | Description                                                                                                      |
+| ------- | --------------- | ---------------------------------------------------------------------------------------------------------------- |
 | `float` | `iou_threshold` | The threshold for deciding whether boxes overlap too much with respect to IOU. Value range [0, 1]. Default to 0. |
-| `int` | `offset` | 0 or 1, boxes' width or height is (x2 - x1 + offset). |
+| `int`   | `offset`        | 0 or 1, boxes' width or height is (x2 - x1 + offset).                                                            |
 
-### Inputs
+<h3>Inputs</h3>
 
 <dl>
 <dt><tt>bboxes</tt>: T</dt>
@@ -106,13 +106,13 @@ Filter out boxes has high IOU overlap with previously selected boxes.
 <dd>Input scores. 1-D tensor of shape (num_boxes, ).</dd>
 </dl>
 
-### Outputs
+<h3>Outputs</h3>
 
 <dl>
 <dt><tt>indices</tt>: tensor(int32, Linear)</dt>
 <dd>Selected indices. 1-D tensor of shape (num_valid_boxes, ). num_valid_boxes is the number of valid boxes.</dd>
 </dl>
 
-### Type Constraints
+<h3>Type Constraints</h3>
 
 - T:tensor(float32)
diff --git a/docs/onnxruntime_op.md b/docs/onnxruntime_op.md
index 7dcf81ba6c..a31a0d6971 100644
--- a/docs/onnxruntime_op.md
+++ b/docs/onnxruntime_op.md
@@ -15,8 +15,8 @@
 
 ## List of operators for ONNX Runtime supported in MMCV
 
-| Operator |  CPU  |  GPU  |                                                Note                                                 |
-| :------: | :---: | :---: | :-------------------------------------------------------------------------------------------------: |
+|                    Operator                    |  CPU  |  GPU  |                                                Note                                                 |
+| :--------------------------------------------: | :---: | :---: | :-------------------------------------------------------------------------------------------------: |
 |  [SoftNMS](onnxruntime_custom_ops.md#softnms)  |   Y   |   N   | commit [94810f](https://github.com/open-mmlab/mmcv/commit/94810f2297871d0ea3ca650dcb2e842f5374d998) |
 | [RoiAlign](onnxruntime_custom_ops.md#roialign) |   Y   |   N   |                                                None                                                 |
 |      [NMS](onnxruntime_custom_ops.md#nms)      |   Y   |   N   |                                                None                                                 |
diff --git a/docs/tensorrt_custom_ops.md b/docs/tensorrt_custom_ops.md
index 204ee032c2..2267a25ec5 100644
--- a/docs/tensorrt_custom_ops.md
+++ b/docs/tensorrt_custom_ops.md
@@ -7,28 +7,29 @@
   - [ScatterND](#scatternd)
   - [NonMaxSuppression](#nonmaxsuppression)
   - [MMCVDeformConv2d](#mmcvdeformconv2d)
-  - [grid_sampler](#grid-sampler)
+  - [grid sampler](#grid-sampler)
 
 <!-- TOC -->
 
 ## MMCVRoiAlign
 
-### Description
+<h3>Description</h3>
 
-Perform ROIAlign on output feature, used in bbox_head of most two stage detectors.
+Perform ROIAlign on output feature, used in bbox_head of most two-stage
+detectors.
 
-### Parameters
+<h3>Parameters</h3>
 
-| Type | Parameter | Description |
-| --- | --- | --- |
-| `int` | `output_height` | height of output roi |
-| `int` | `output_width` | width of output roi |
-| `float` | `spatial_scale` | scale the input boxes by this |
-| `int` | `sampling_ratio` | number of inputs samples to take for each output sample. 0 to take samples densely for current models. |
-| `str` | `mode` | pooling mode in each bin. `avg` or `max` |
-| `int` | `aligned` | If `aligned=0`, use the legacy implementation in MMDetection. Else, align the results more perfectly. |
+| Type    | Parameter        | Description                                                                                            |
+| ------- | ---------------- | ------------------------------------------------------------------------------------------------------ |
+| `int`   | `output_height`  | height of output roi                                                                                   |
+| `int`   | `output_width`   | width of output roi                                                                                    |
+| `float` | `spatial_scale`  | scale the input boxes by this                                                                          |
+| `int`   | `sampling_ratio` | number of input samples to take for each output sample. 0 to take samples densely for current models.  |
+| `str`   | `mode`           | pooling mode in each bin. `avg` or `max`                                                               |
+| `int`   | `aligned`        | If `aligned=0`, use the legacy implementation in MMDetection. Else, align the results more perfectly.  |
 
-### Inputs
+<h3>Inputs</h3>
 
 <dl>
 <dt><tt>inputs[0]</tt>: T</dt>
@@ -37,24 +38,25 @@ Perform ROIAlign on output feature, used in bbox_head of most two stage detector
 <dd>RoIs (Regions of Interest) to pool over; 2-D tensor of shape (num_rois, 5) given as [[batch_index, x1, y1, x2, y2], ...]. The RoIs' coordinates are the coordinate system of inputs[0].</dd>
 </dl>
 
-### Outputs
+<h3>Outputs</h3>
 
 <dl>
 <dt><tt>outputs[0]</tt>: T</dt>
 <dd>RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element output[0][r-1] is a pooled feature map corresponding to the r-th RoI inputs[1][r-1].<dd>
 </dl>
 
-### Type Constraints
+<h3>Type Constraints</h3>
 
 - T:tensor(float32, Linear)
 
 ## ScatterND
 
-### Description
+<h3>Description</h3>
 
 ScatterND takes three inputs `data` tensor of rank r >= 1, `indices` tensor of rank q >= 1, and `updates` tensor of rank q + r - indices.shape[-1] - 1. The output of the operation is produced by creating a copy of the input `data`, and then updating its value to values specified by updates at specific index positions specified by `indices`. Its output shape is the same as the shape of `data`. Note that `indices` should not have duplicate entries. That is, two or more updates for the same index-location is not supported.
 
 The `output` is calculated via the following equation:
+
 ```python
   output = np.copy(data)
   update_indices = indices.shape[:-1]
@@ -62,11 +64,11 @@ The `output` is calculated via the following equation:
       output[indices[idx]] = updates[idx]
 ```
 
-### Parameters
+<h3>Parameters</h3>
 
 None
 
-### Inputs
+<h3>Inputs</h3>
 
 <dl>
 <dt><tt>inputs[0]</tt>: T</dt>
@@ -79,34 +81,34 @@ None
 <dd>Tensor of rank q + r - indices_shape[-1] - 1.</dd>
 </dl>
 
-### Outputs
+<h3>Outputs</h3>
 
 <dl>
 <dt><tt>outputs[0]</tt>: T</dt>
 <dd>Tensor of rank r >= 1.</dd>
 </dl>
 
-### Type Constaints
+<h3>Type Constraints</h3>
 
 - T:tensor(float32, Linear), tensor(int32, Linear)
 
 ## NonMaxSuppression
 
-### Description
+<h3>Description</h3>
 
 Filter out boxes has high IOU overlap with previously selected boxes or low score. Output the indices of valid boxes. Indices of invalid boxes will be filled with -1.
 
-### Parameters
+<h3>Parameters</h3>
 
-| Type | Parameter | Description |
-| --- | --- | --- |
-| `int` | `center_point_box` | 0 - the box data is supplied as [y1, x1, y2, x2], 1-the box data is supplied as [x_center, y_center, width, height]. |
-| `int` | `max_output_boxes_per_class` | The maximum number of boxes to be selected per batch per class. Default to 0, number of output boxes equal to number of input boxes. |
-| `float` | `iou_threshold` | The threshold for deciding whether boxes overlap too much with respect to IOU. Value range [0, 1]. Default to 0. |
-| `float` | `score_threshold` | The threshold for deciding when to remove boxes based on score. |
-| `int` | `offset` | 0 or 1, boxes' width or height is (x2 - x1 + offset). |
+| Type    | Parameter                    | Description                                                                                                                          |
+| ------- | ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------ |
+| `int`   | `center_point_box`           | 0: the box data is supplied as [y1, x1, y2, x2]; 1: the box data is supplied as [x_center, y_center, width, height].                 |
+| `int`   | `max_output_boxes_per_class` | The maximum number of boxes to be selected per batch per class. Default: 0, i.e. as many output boxes as input boxes.                |
+| `float` | `iou_threshold`              | The threshold for deciding whether boxes overlap too much with respect to IOU. Value range [0, 1]. Default to 0.                     |
+| `float` | `score_threshold`            | The threshold for deciding when to remove boxes based on score.                                                                      |
+| `int`   | `offset`                     | 0 or 1, boxes' width or height is (x2 - x1 + offset).                                                                                |
 
-### Inputs
+<h3>Inputs</h3>
 
 <dl>
 <dt><tt>inputs[0]</tt>: T</dt>
@@ -115,73 +117,73 @@ Filter out boxes has high IOU overlap with previously selected boxes or low scor
 <dd>Input scores. 3-D tensor of shape (num_batches, num_classes, spatial_dimension).</dd>
 </dl>
 
-### Outputs
+<h3>Outputs</h3>
 
 <dl>
 <dt><tt>outputs[0]</tt>: tensor(int32, Linear)</dt>
 <dd>Selected indices. 2-D tensor of shape (num_selected_indices, 3) as [[batch_index, class_index, box_index], ...].</dd>
-<dd>num_selected_indices=num_batches * num_classes * min(max_output_boxes_per_class, spatial_dimension).</dd>
+<dd>num_selected_indices = num_batches * num_classes * min(max_output_boxes_per_class, spatial_dimension).</dd>
 <dd>All invalid indices will be filled with -1.</dd>
 </dl>
 
-### Type Constaints
+<h3>Type Constraints</h3>
 
 - T:tensor(float32, Linear)
 
 ## MMCVDeformConv2d
 
-### Description
+<h3>Description</h3>
 
 Perform Deformable Convolution on input feature, read [Deformable Convolutional Network](https://arxiv.org/abs/1703.06211) for detail.
 
-### Parameters
+<h3>Parameters</h3>
 
-| Type | Parameter | Description |
-| --- | --- | --- |
-| `list of ints` | `stride` | The stride of the convolving kernel. (sH, sW) |
-| `list of ints` | `padding` | Paddings on both sides of the input. (padH, padW) |
-| `list of ints` | `dilation` | The spacing between kernel elements. (dH, dW) |
-| `int` | `deformable_group` | Groups of deformable offset. |
-| `int` | `group` | Split input into groups. `input_channel` should be divisible by the number of groups. |
-| `int` | `im2col_step` | DeformableConv2d use im2col to compute convolution. im2col_step is used to split input and offset, reduce memory usage of column. |
+| Type           | Parameter          | Description                                                                                                                       |
+| -------------- | ------------------ | --------------------------------------------------------------------------------------------------------------------------------- |
+| `list of ints` | `stride`           | The stride of the convolving kernel. (sH, sW)                                                                                     |
+| `list of ints` | `padding`          | Paddings on both sides of the input. (padH, padW)                                                                                 |
+| `list of ints` | `dilation`         | The spacing between kernel elements. (dH, dW)                                                                                     |
+| `int`          | `deformable_group` | Groups of deformable offset.                                                                                                      |
+| `int`          | `group`            | Split input into groups. `input_channel` should be divisible by the number of groups.                                             |
+| `int`          | `im2col_step`      | DeformableConv2d use im2col to compute convolution. im2col_step is used to split input and offset, reduce memory usage of column. |
 
-### Inputs
+<h3>Inputs</h3>
 
 <dl>
 <dt><tt>inputs[0]</tt>: T</dt>
 <dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the numbers of channels, inH and inW are the height and width of the data.</dd>
 <dt><tt>inputs[1]</tt>: T</dt>
-<dd>Input offset; 4-D tensor of shape (N, deformable_group * 2 * kH * kW, outH, outW), where kH and kW is the height and width of weight, outH and outW is the height and width of offset and output.</dd>
+<dd>Input offset; 4-D tensor of shape (N, deformable_group * 2 * kH * kW, outH, outW), where kH and kW are the height and width of the weight, and outH and outW are the height and width of the offset and output.</dd>
 <dt><tt>inputs[2]</tt>: T</dt>
 <dd>Input weight; 4-D tensor of shape (output_channel, input_channel, kH, kW).</dd>
 </dl>
 
-### Outputs
+<h3>Outputs</h3>
 
 <dl>
 <dt><tt>outputs[0]</tt>: T</dt>
 <dd>Output feature; 4-D tensor of shape (N, output_channel, outH, outW).</dd>
 </dl>
 
-### Type Constaints
+<h3>Type Constraints</h3>
 
 - T:tensor(float32, Linear)
 
 ## grid sampler
 
-### Description
+<h3>Description</h3>
 
 Perform sample from `input` with pixel locations from `grid`.
 
-### Parameters
+<h3>Parameters</h3>
 
-| Type | Parameter | Description |
-| --- | --- | --- |
-| `int` | `interpolation_mode` | Interpolation mode to calculate output values. (0: `bilinear` , 1: `nearest`) |
-| `int` | `padding_mode` | Padding mode for outside grid values. (0: `zeros`, 1: `border`, 2: `reflection`) |
-| `int` | `align_corners` | If `align_corners=1`, the extrema (`-1` and `1`) are considered as referring to the center points of the input's corner pixels. If `align_corners=0`, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic. |
+| Type  | Parameter            | Description                                                                                                                                                                                                                                                                                     |
+| ----- | -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `int` | `interpolation_mode` | Interpolation mode to calculate output values. (0: `bilinear` , 1: `nearest`)                                                                                                                                                                                                                   |
+| `int` | `padding_mode`       | Padding mode for outside grid values. (0: `zeros`, 1: `border`, 2: `reflection`)                                                                                                                                                                                                                |
+| `int` | `align_corners`      | If `align_corners=1`, the extrema (`-1` and `1`) are considered as referring to the center points of the input's corner pixels. If `align_corners=0`, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic. |
 
-### Inputs
+<h3>Inputs</h3>
 
 <dl>
 <dt><tt>inputs[0]</tt>: T</dt>
@@ -190,13 +192,13 @@ Perform sample from `input` with pixel locations from `grid`.
 <dd>Input offset; 4-D tensor of shape (N, outH, outW, 2), where outH and outW is the height and width of offset and output. </dd>
 </dl>
 
-### Outputs
+<h3>Outputs</h3>
 
 <dl>
 <dt><tt>outputs[0]</tt>: T</dt>
 <dd>Output feature; 4-D tensor of shape (N, C, outH, outW).</dd>
 </dl>
 
-### Type Constaints
+<h3>Type Constraints</h3>
 
 - T:tensor(float32, Linear)
diff --git a/docs/tensorrt_plugin.md b/docs/tensorrt_plugin.md
index d44029ca57..fbfc58dcf5 100644
--- a/docs/tensorrt_plugin.md
+++ b/docs/tensorrt_plugin.md
@@ -24,13 +24,13 @@ To ease the deployment of trained models with custom operators from `mmcv.ops` u
 
 ## List of TensorRT plugins supported in MMCV
 
-|   ONNX Operator   |    TensorRT Plugin    | Note  |
-| :---------------: | :-------------------: | :---: |
-| MMCVRoiAlign | [MMCVRoiAlign](./tensorrt_custom_ops.md#mmcvroialign) | Y |
-| ScatterND | [ScatterND](./tensorrt_custom_ops.md#scatternd) | Y |
-| NonMaxSuppression | [NonMaxSuppression](./tensorrt_custom_ops.md#nonmaxsuppression) | Y |
-| MMCVDeformConv2d | [MMCVDeformConv2d](./tensorrt_custom_ops.md#mmcvdeformconv2d) | Y |
-| grid_sampler | [grid_sampler](./tensorrt_custom_ops.md#grid-sampler) | Y |
+|   ONNX Operator   |                         TensorRT Plugin                         | Note  |
+| :---------------: | :-------------------------------------------------------------: | :---: |
+|   MMCVRoiAlign    |      [MMCVRoiAlign](./tensorrt_custom_ops.md#mmcvroialign)      |   Y   |
+|     ScatterND     |         [ScatterND](./tensorrt_custom_ops.md#scatternd)         |   Y   |
+| NonMaxSuppression | [NonMaxSuppression](./tensorrt_custom_ops.md#nonmaxsuppression) |   Y   |
+| MMCVDeformConv2d  |  [MMCVDeformConv2d](./tensorrt_custom_ops.md#mmcvdeformconv2d)  |   Y   |
+|   grid_sampler    |      [grid_sampler](./tensorrt_custom_ops.md#grid-sampler)      |   Y   |
 
 Notes
 

From 7ead67440b43f147209b98784b42334e5c79fac1 Mon Sep 17 00:00:00 2001
From: "q.yao" <yaoqian@sensetime.com>
Date: Wed, 7 Apr 2021 09:45:55 +0800
Subject: [PATCH 04/10] add release note to onnxruntime_op and tensorrt_plugin

---
 docs/onnxruntime_custom_ops.md |  4 ++--
 docs/onnxruntime_op.md         | 10 +++++-----
 docs/tensorrt_plugin.md        | 14 +++++++-------
 3 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/docs/onnxruntime_custom_ops.md b/docs/onnxruntime_custom_ops.md
index 01a08a8b7d..5ef2ad0664 100644
--- a/docs/onnxruntime_custom_ops.md
+++ b/docs/onnxruntime_custom_ops.md
@@ -4,7 +4,7 @@
 
 - [Onnxruntime Custom Ops](#onnxruntime-custom-ops)
   - [SoftNMS](#softnms)
-  - [RoiAlign](#roialign)
+  - [RoIAlign](#roialign)
   - [NMS](#nms)
 
 <!-- TOC -->
@@ -47,7 +47,7 @@ Perform soft NMS on `boxes` with `scores`. Read [Soft-NMS -- Improving Object De
 
 - T:tensor(float32)
 
-## RoiAlign
+## RoIAlign
 
 <h3>Descriptions</h3>
 
diff --git a/docs/onnxruntime_op.md b/docs/onnxruntime_op.md
index a31a0d6971..75b0551f93 100644
--- a/docs/onnxruntime_op.md
+++ b/docs/onnxruntime_op.md
@@ -15,11 +15,11 @@
 
 ## List of operators for ONNX Runtime supported in MMCV
 
-|                    Operator                    |  CPU  |  GPU  |                                                Note                                                 |
-| :--------------------------------------------: | :---: | :---: | :-------------------------------------------------------------------------------------------------: |
-|  [SoftNMS](onnxruntime_custom_ops.md#softnms)  |   Y   |   N   | commit [94810f](https://github.com/open-mmlab/mmcv/commit/94810f2297871d0ea3ca650dcb2e842f5374d998) |
-| [RoiAlign](onnxruntime_custom_ops.md#roialign) |   Y   |   N   |                                                None                                                 |
-|      [NMS](onnxruntime_custom_ops.md#nms)      |   Y   |   N   |                                                None                                                 |
+|                    Operator                    |  CPU  |  GPU  | MMCV Releases |
+| :--------------------------------------------: | :---: | :---: | :-----------: |
+|  [SoftNMS](onnxruntime_custom_ops.md#softnms)  |   Y   |   N   |     1.2.3     |
+| [RoiAlign](onnxruntime_custom_ops.md#roialign) |   Y   |   N   |     1.2.5     |
+|      [NMS](onnxruntime_custom_ops.md#nms)      |   Y   |   N   |     1.2.7     |
 
 ## How to build custom operators for ONNX Runtime
 
diff --git a/docs/tensorrt_plugin.md b/docs/tensorrt_plugin.md
index fbfc58dcf5..5ed62d1ba3 100644
--- a/docs/tensorrt_plugin.md
+++ b/docs/tensorrt_plugin.md
@@ -24,13 +24,13 @@ To ease the deployment of trained models with custom operators from `mmcv.ops` u
 
 ## List of TensorRT plugins supported in MMCV
 
-|   ONNX Operator   |                         TensorRT Plugin                         | Note  |
-| :---------------: | :-------------------------------------------------------------: | :---: |
-|   MMCVRoiAlign    |      [MMCVRoiAlign](./tensorrt_custom_ops.md#mmcvroialign)      |   Y   |
-|     ScatterND     |         [ScatterND](./tensorrt_custom_ops.md#scatternd)         |   Y   |
-| NonMaxSuppression | [NonMaxSuppression](./tensorrt_custom_ops.md#nonmaxsuppression) |   Y   |
-| MMCVDeformConv2d  |  [MMCVDeformConv2d](./tensorrt_custom_ops.md#mmcvdeformconv2d)  |   Y   |
-|   grid_sampler    |      [grid_sampler](./tensorrt_custom_ops.md#grid-sampler)      |   Y   |
+|   ONNX Operator   |                         TensorRT Plugin                         | MMCV Releases |
+| :---------------: | :-------------------------------------------------------------: | :-----------: |
+|   MMCVRoiAlign    |      [MMCVRoiAlign](./tensorrt_custom_ops.md#mmcvroialign)      |     1.2.6     |
+|     ScatterND     |         [ScatterND](./tensorrt_custom_ops.md#scatternd)         |     1.2.6     |
+| NonMaxSuppression | [NonMaxSuppression](./tensorrt_custom_ops.md#nonmaxsuppression) |     1.3.0     |
+| MMCVDeformConv2d  |  [MMCVDeformConv2d](./tensorrt_custom_ops.md#mmcvdeformconv2d)  |     1.3.0     |
+|   grid_sampler    |      [grid_sampler](./tensorrt_custom_ops.md#grid-sampler)      |    master     |
 
 Notes
 

From 70b6e3200f0dbe3e7811486e3810ee6d6dd5b835 Mon Sep 17 00:00:00 2001
From: "q.yao" <yaoqian@sensetime.com>
Date: Wed, 7 Apr 2021 16:17:48 +0800
Subject: [PATCH 05/10] update document

---
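Reviewer note: the `offset` parameter documented in this patch changes how box
width and height are measured, (x2 - x1 + offset), and therefore the IoU that is
compared against `iou_threshold`. The sketch below is only an illustration under
stated assumptions: boxes are taken to be [x1, y1, x2, y2], and `box_iou` is a
hypothetical helper written for this note, not the MMCV kernel.

```python
def box_iou(a, b, offset=0):
    """IoU of two [x1, y1, x2, y2] boxes (assumed layout).

    Each side length is measured as (x2 - x1 + offset), as in the docs.
    """
    area_a = (a[2] - a[0] + offset) * (a[3] - a[1] + offset)
    area_b = (b[2] - b[0] + offset) * (b[3] - b[1] + offset)
    # Overlap extent along each axis, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]) + offset)
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]) + offset)
    inter = inter_w * inter_h
    union = area_a + area_b - inter
    # Degenerate boxes (zero area with offset=0) yield an IoU of 0 here.
    return inter / union if union > 0 else 0.0
```

With `offset=0` a box whose corners coincide has zero area; with `offset=1` the
same box covers one pixel, which is the usual reason for choosing 1 when the
coordinates are inclusive pixel indices.
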
 docs/onnxruntime_custom_ops.md | 24 ++++++++++++------------
 docs/tensorrt_custom_ops.md    | 22 +++++++++++-----------
 2 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/docs/onnxruntime_custom_ops.md b/docs/onnxruntime_custom_ops.md
index 5ef2ad0664..519a265715 100644
--- a/docs/onnxruntime_custom_ops.md
+++ b/docs/onnxruntime_custom_ops.md
@@ -29,7 +29,7 @@ Perform soft NMS on `boxes` with `scores`. Read [Soft-NMS -- Improving Object De
 
 <dl>
 <dt><tt>boxes</tt>: T</dt>
-<dd>Input boxes. 2-D tensor of shape (N, 4). N is the batch size.</dd>
+<dd>Input boxes. 2-D tensor of shape (N, 4). N is the number of boxes.</dd>
 <dt><tt>scores</tt>: T</dt>
 <dd>Input scores. 1-D tensor of shape (N, ).</dd>
 </dl>
@@ -51,18 +51,18 @@ Perform soft NMS on `boxes` with `scores`. Read [Soft-NMS -- Improving Object De
 
 <h3>Descriptions</h3>
 
-Perform ROIAlign on output feature, used in bbox_head of most two-stage detectors.
+Perform RoIAlign on output feature, used in bbox_head of most two-stage detectors.
 
 <h3>Parameters</h3>
 
-| Type    | Parameter        | Description                                                                                            |
-| ------- | ---------------- | ------------------------------------------------------------------------------------------------------ |
-| `int`   | `output_height`  | height of output roi                                                                                   |
-| `int`   | `output_width`   | width of output roi                                                                                    |
-| `float` | `spatial_scale`  | scale the input boxes by this                                                                          |
-| `int`   | `sampling_ratio` | number of inputs samples to take for each output sample. 0 to take samples densely for current models. |
-| `str`   | `mode`           | pooling mode in each bin. `avg` or `max`                                                               |
-| `int`   | `aligned`        | If `aligned=0`, use the legacy implementation in MMDetection. Else, align the results more perfectly.  |
+| Type    | Parameter        | Description                                                                                                   |
+| ------- | ---------------- | ------------------------------------------------------------------------------------------------------------- |
+| `int`   | `output_height`  | height of output roi                                                                                          |
+| `int`   | `output_width`   | width of output roi                                                                                           |
+| `float` | `spatial_scale`  | used to scale the input boxes                                                                                 |
+| `int`   | `sampling_ratio` | number of input samples to take for each output sample. `0` means to take samples densely for current models. |
+| `str`   | `mode`           | pooling mode in each bin. `avg` or `max`                                                                      |
+| `int`   | `aligned`        | If `aligned=0`, use the legacy implementation in MMDetection. Else, align the results more perfectly.         |
 
 <h3>Inputs</h3>
 
@@ -88,13 +88,13 @@ Perform ROIAlign on output feature, used in bbox_head of most two-stage detector
 
 <h3>Descriptions</h3>
 
-Filter out boxes has high IOU overlap with previously selected boxes.
+Filter out boxes has high IoU overlap with previously selected boxes.
 
 <h3>Parameters</h3>
 
 | Type    | Parameter       | Description                                                                                                      |
 | ------- | --------------- | ---------------------------------------------------------------------------------------------------------------- |
-| `float` | `iou_threshold` | The threshold for deciding whether boxes overlap too much with respect to IOU. Value range [0, 1]. Default to 0. |
+| `float` | `iou_threshold` | The threshold for deciding whether boxes overlap too much with respect to IoU. Value range [0, 1]. Default to 0. |
 | `int`   | `offset`        | 0 or 1, boxes' width or height is (x2 - x1 + offset).                                                            |
 
 <h3>Inputs</h3>
diff --git a/docs/tensorrt_custom_ops.md b/docs/tensorrt_custom_ops.md
index 2267a25ec5..24833c27bf 100644
--- a/docs/tensorrt_custom_ops.md
+++ b/docs/tensorrt_custom_ops.md
@@ -15,19 +15,19 @@
 
 <h3>Description</h3>
 
-Perform ROIAlign on output feature, used in bbox_head of most two stage
+Perform RoIAlign on output feature, used in bbox_head of most two-stage
 detectors.
 
 <h3>Parameters</h3>
 
-| Type    | Parameter        | Description                                                                                            |
-| ------- | ---------------- | ------------------------------------------------------------------------------------------------------ |
-| `int`   | `output_height`  | height of output roi                                                                                   |
-| `int`   | `output_width`   | width of output roi                                                                                    |
-| `float` | `spatial_scale`  | scale the input boxes by this                                                                          |
-| `int`   | `sampling_ratio` | number of inputs samples to take for each output sample. 0 to take samples densely for current models. |
-| `str`   | `mode`           | pooling mode in each bin. `avg` or `max`                                                               |
-| `int`   | `aligned`        | If `aligned=0`, use the legacy implementation in MMDetection. Else, align the results more perfectly.  |
+| Type    | Parameter        | Description                                                                                                   |
+| ------- | ---------------- | ------------------------------------------------------------------------------------------------------------- |
+| `int`   | `output_height`  | height of output roi                                                                                          |
+| `int`   | `output_width`   | width of output roi                                                                                           |
+| `float` | `spatial_scale`  | used to scale the input boxes                                                                                 |
+| `int`   | `sampling_ratio` | number of input samples to take for each output sample. `0` means to take samples densely for current models. |
+| `str`   | `mode`           | pooling mode in each bin. `avg` or `max`                                                                      |
+| `int`   | `aligned`        | If `aligned=0`, use the legacy implementation in MMDetection. Else, align the results more perfectly.         |
 
 <h3>Inputs</h3>
 
@@ -96,7 +96,7 @@ None
 
 <h3>Description</h3>
 
-Filter out boxes has high IOU overlap with previously selected boxes or low score. Output the indices of valid boxes. Indices of invalid boxes will be filled with -1.
+Filter out boxes that have high IoU overlap with previously selected boxes or a low score. Output the indices of valid boxes. Indices of invalid boxes will be filled with -1.
 
 <h3>Parameters</h3>
 
@@ -104,7 +104,7 @@ Filter out boxes has high IOU overlap with previously selected boxes or low scor
 | ------- | ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------ |
 | `int`   | `center_point_box`           | 0 - the box data is supplied as [y1, x1, y2, x2], 1-the box data is supplied as [x_center, y_center, width, height].                 |
 | `int`   | `max_output_boxes_per_class` | The maximum number of boxes to be selected per batch per class. Default to 0, number of output boxes equal to number of input boxes. |
-| `float` | `iou_threshold`              | The threshold for deciding whether boxes overlap too much with respect to IOU. Value range [0, 1]. Default to 0.                     |
+| `float` | `iou_threshold`              | The threshold for deciding whether boxes overlap too much with respect to IoU. Value range [0, 1]. Default to 0.                     |
 | `float` | `score_threshold`            | The threshold for deciding when to remove boxes based on score.                                                                      |
 | `int`   | `offset`                     | 0 or 1, boxes' width or height is (x2 - x1 + offset).                                                                                |
 

From 32827dc3a528683679336ad7139f0f6f6ee91fd8 Mon Sep 17 00:00:00 2001
From: maningsheng <maningsheng@sensetime.com>
Date: Thu, 8 Apr 2021 14:47:57 +0800
Subject: [PATCH 06/10] add deployment.rst

---
 docs/deployment.rst         | 11 +++++++++++
 docs/index.rst              |  1 +
 docs/tensorrt_custom_ops.md |  8 ++++----
 3 files changed, 16 insertions(+), 4 deletions(-)
 create mode 100644 docs/deployment.rst

diff --git a/docs/deployment.rst b/docs/deployment.rst
new file mode 100644
index 0000000000..68f81f9520
--- /dev/null
+++ b/docs/deployment.rst
@@ -0,0 +1,11 @@
+Deployment
+==========
+
+.. toctree::
+    :maxdepth: 2
+
+    onnx.md
+    onnxruntime_op.md
+    onnxruntime_custom_ops.md
+    tensorrt_plugin.md
+    tensorrt_custom_ops.md
diff --git a/docs/index.rst b/docs/index.rst
index 996b200ca1..444ba1f2ca 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -17,6 +17,7 @@ Contents
    cnn.md
    ops.md
    build.md
+   deployment.rst
    trouble_shooting.md
    api.rst
 
diff --git a/docs/tensorrt_custom_ops.md b/docs/tensorrt_custom_ops.md
index 24833c27bf..ba140fbf1d 100644
--- a/docs/tensorrt_custom_ops.md
+++ b/docs/tensorrt_custom_ops.md
@@ -3,15 +3,15 @@
 <!-- TOC -->
 
 - [TensorRT Custom Ops](#tensorrt-custom-ops)
-  - [MMCVRoiAlign](#mmcvroialign)
+  - [MMCVRoIAlign](#mmcvroialign)
   - [ScatterND](#scatternd)
   - [NonMaxSuppression](#nonmaxsuppression)
   - [MMCVDeformConv2d](#mmcvdeformconv2d)
-  - [grid sampler](#grid-sampler)
+  - [grid_sampler](#grid_sampler)
 
 <!-- TOC -->
 
-## MMCVRoiAlign
+## MMCVRoIAlign
 
 <h3>Description</h3>
 
@@ -169,7 +169,7 @@ Perform Deformable Convolution on input feature, read [Deformable Convolutional
 
 - T:tensor(float32, Linear)
 
-## grid sampler
+## grid_sampler
 
 <h3>Description</h3>
 

From b6c69e3c446bed39af86de747129775c1c8381b3 Mon Sep 17 00:00:00 2001
From: "q.yao" <yaoqian@sensetime.com>
Date: Fri, 9 Apr 2021 11:08:13 +0800
Subject: [PATCH 07/10] add grid_sampler onnxruntime document

---
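Reviewer note: the `align_corners` row added in this patch is easier to check
with numbers. The sketch below maps a normalized grid coordinate in [-1, 1] to a
pixel coordinate. It assumes the operator follows the same convention as
PyTorch's `grid_sample`, which the parameter description appears to mirror;
`unnormalize` is a hypothetical helper written for this note, not part of MMCV.

```python
def unnormalize(coord, size, align_corners):
    """Map a normalized coordinate in [-1, 1] to a pixel coordinate.

    `size` is the number of pixels along the axis (inW or inH).
    """
    if align_corners:
        # -1 and 1 refer to the centers of the first and last pixels.
        return (coord + 1) / 2 * (size - 1)
    # -1 and 1 refer to the outer edges of the first and last pixels.
    return ((coord + 1) * size - 1) / 2
```

For an axis of 4 pixels, `align_corners=1` maps -1 and 1 to pixel centers 0 and
3, while `align_corners=0` maps them to -0.5 and 3.5, the outer edges of the
corner pixels. Both settings map 0 to the same midpoint, 1.5.
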
 docs/onnxruntime_custom_ops.md | 35 ++++++++++++++++++++++++++++++++++
 docs/onnxruntime_op.md         | 11 ++++++-----
 2 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/docs/onnxruntime_custom_ops.md b/docs/onnxruntime_custom_ops.md
index 519a265715..cf4a1b41ec 100644
--- a/docs/onnxruntime_custom_ops.md
+++ b/docs/onnxruntime_custom_ops.md
@@ -6,6 +6,7 @@
   - [SoftNMS](#softnms)
   - [RoIAlign](#roialign)
   - [NMS](#nms)
+  - [grid_sampler](#grid_sampler)
 
 <!-- TOC -->
 
@@ -116,3 +117,37 @@ Filter out boxes has high IoU overlap with previously selected boxes.
 <h3>Type Constraints</h3>
 
 - T:tensor(float32)
+
+## grid_sampler
+
+<h3>Description</h3>
+
+Perform sample from `input` with pixel locations from `grid`.
+
+<h3>Parameters</h3>
+
+| Type  | Parameter            | Description                                                                                                                                                                                                                                                                                     |
+| ----- | -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `int` | `interpolation_mode` | Interpolation mode to calculate output values. (0: `bilinear` , 1: `nearest`)                                                                                                                                                                                                                   |
+| `int` | `padding_mode`       | Padding mode for outside grid values. (0: `zeros`, 1: `border`, 2: `reflection`)                                                                                                                                                                                                                |
+| `int` | `align_corners`      | If `align_corners=1`, the extrema (`-1` and `1`) are considered as referring to the center points of the input's corner pixels. If `align_corners=0`, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic. |
+
+<h3>Inputs</h3>
+
+<dl>
+<dt><tt>input</tt>: T</dt>
+<dd>Input feature; 4-D tensor of shape (N, C, inH, inW), where N is the batch size, C is the number of channels, inH and inW are the height and width of the data.</dd>
+<dt><tt>grid</tt>: T</dt>
+<dd>Input grid of pixel locations; 4-D tensor of shape (N, outH, outW, 2), where outH and outW are the height and width of the grid and output.</dd>
+</dl>
+
+<h3>Outputs</h3>
+
+<dl>
+<dt><tt>output</tt>: T</dt>
+<dd>Output feature; 4-D tensor of shape (N, C, outH, outW).</dd>
+</dl>
+
+<h3>Type Constraints</h3>
+
+- T:tensor(float32, Linear)
\ No newline at end of file
diff --git a/docs/onnxruntime_op.md b/docs/onnxruntime_op.md
index 75b0551f93..9324524e39 100644
--- a/docs/onnxruntime_op.md
+++ b/docs/onnxruntime_op.md
@@ -15,11 +15,12 @@
 
 ## List of operators for ONNX Runtime supported in MMCV
 
-|                    Operator                    |  CPU  |  GPU  | MMCV Releases |
-| :--------------------------------------------: | :---: | :---: | :-----------: |
-|  [SoftNMS](onnxruntime_custom_ops.md#softnms)  |   Y   |   N   |     1.2.3     |
-| [RoiAlign](onnxruntime_custom_ops.md#roialign) |   Y   |   N   |     1.2.5     |
-|      [NMS](onnxruntime_custom_ops.md#nms)      |   Y   |   N   |     1.2.7     |
+|                        Operator                        |  CPU  |  GPU  | MMCV Releases |
+| :----------------------------------------------------: | :---: | :---: | :-----------: |
+|      [SoftNMS](onnxruntime_custom_ops.md#softnms)      |   Y   |   N   |     1.2.3     |
+|     [RoIAlign](onnxruntime_custom_ops.md#roialign)     |   Y   |   N   |     1.2.5     |
+|          [NMS](onnxruntime_custom_ops.md#nms)          |   Y   |   N   |     1.2.7     |
+| [grid_sampler](onnxruntime_custom_ops.md#grid_sampler) |   Y   |   N   |    master     |
 
 ## How to build custom operators for ONNX Runtime
 

From c83bf8b4f51b671e4040e6b26e810dc0d9ce1264 Mon Sep 17 00:00:00 2001
From: "q.yao" <yaoqian@sensetime.com>
Date: Fri, 9 Apr 2021 12:36:49 +0800
Subject: [PATCH 08/10] fix lint

---
 docs/onnxruntime_custom_ops.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/onnxruntime_custom_ops.md b/docs/onnxruntime_custom_ops.md
index cf4a1b41ec..c37757929c 100644
--- a/docs/onnxruntime_custom_ops.md
+++ b/docs/onnxruntime_custom_ops.md
@@ -150,4 +150,4 @@ Perform sample from `input` with pixel locations from `grid`.
 
 <h3>Type Constraints</h3>
 
-- T:tensor(float32, Linear)
\ No newline at end of file
+- T:tensor(float32, Linear)

From 1bcb7375898518a40f15ef6e53c488d2d7034449 Mon Sep 17 00:00:00 2001
From: grimoire <streetyao@live.com>
Date: Sun, 11 Apr 2021 16:28:56 +0800
Subject: [PATCH 09/10] add allow_different_nesting tag

---
 docs/onnxruntime_custom_ops.md | 60 ++++++++++++++++++---------
 docs/tensorrt_custom_ops.md    | 75 ++++++++++++++++++++++------------
 2 files changed, 90 insertions(+), 45 deletions(-)

diff --git a/docs/onnxruntime_custom_ops.md b/docs/onnxruntime_custom_ops.md
index c37757929c..e42032d23d 100644
--- a/docs/onnxruntime_custom_ops.md
+++ b/docs/onnxruntime_custom_ops.md
@@ -4,19 +4,39 @@
 
 - [Onnxruntime Custom Ops](#onnxruntime-custom-ops)
   - [SoftNMS](#softnms)
+    - [Description](#description)
+    - [Parameters](#parameters)
+    - [Inputs](#inputs)
+    - [Outputs](#outputs)
+    - [Type Constraints](#type-constraints)
   - [RoIAlign](#roialign)
+    - [Description](#description-1)
+    - [Parameters](#parameters-1)
+    - [Inputs](#inputs-1)
+    - [Outputs](#outputs-1)
+    - [Type Constraints](#type-constraints-1)
   - [NMS](#nms)
+    - [Description](#description-2)
+    - [Parameters](#parameters-2)
+    - [Inputs](#inputs-2)
+    - [Outputs](#outputs-2)
+    - [Type Constraints](#type-constraints-2)
   - [grid_sampler](#grid_sampler)
+    - [Description](#description-3)
+    - [Parameters](#parameters-3)
+    - [Inputs](#inputs-3)
+    - [Outputs](#outputs-3)
+    - [Type Constraints](#type-constraints-3)
 
 <!-- TOC -->
 
 ## SoftNMS
 
-<h3>Descriptions</h3>
+### Description
 
 Perform soft NMS on `boxes` with `scores`. Read [Soft-NMS -- Improving Object Detection With One Line of Code](https://arxiv.org/abs/1704.04503) for detail.
 
-<h3>Parameters</h3>
+### Parameters
 
 | Type    | Parameter       | Description                                                    |
 | ------- | --------------- | -------------------------------------------------------------- |
@@ -26,7 +46,7 @@ Perform soft NMS on `boxes` with `scores`. Read [Soft-NMS -- Improving Object De
 | `int`   | `method`        | method to do the nms, (0: `naive`, 1: `linear`, 2: `gaussian`) |
 | `int`   | `offset`        | `boxes` width or height is (x2 - x1 + offset). (0 or 1)        |
 
-<h3>Inputs</h3>
+### Inputs
 
 <dl>
 <dt><tt>boxes</tt>: T</dt>
@@ -35,7 +55,7 @@ Perform soft NMS on `boxes` with `scores`. Read [Soft-NMS -- Improving Object De
 <dd>Input scores. 1-D tensor of shape (N, ).</dd>
 </dl>
 
-<h3>Outputs</h3>
+### Outputs
 
 <dl>
 <dt><tt>dets</tt>: tensor(int64)</dt>
@@ -44,17 +64,17 @@ Perform soft NMS on `boxes` with `scores`. Read [Soft-NMS -- Improving Object De
 <dd>Output indices. 1-D tensor of shape (num_valid_boxes, ).</dd>
 </dl>
 
-<h3>Type Constraints</h3>
+### Type Constraints
 
 - T:tensor(float32)
 
 ## RoIAlign
 
-<h3>Descriptions</h3>
+### Description
 
 Perform RoIAlign on output feature, used in bbox_head of most two-stage detectors.
 
-<h3>Parameters</h3>
+### Parameters
 
 | Type    | Parameter        | Description                                                                                                   |
 | ------- | ---------------- | ------------------------------------------------------------------------------------------------------------- |
@@ -65,7 +85,7 @@ Perform RoIAlign on output feature, used in bbox_head of most two-stage detector
 | `str`   | `mode`           | pooling mode in each bin. `avg` or `max`                                                                      |
 | `int`   | `aligned`        | If `aligned=0`, use the legacy implementation in MMDetection. Else, align the results more perfectly.         |
 
-<h3>Inputs</h3>
+### Inputs
 
 <dl>
 <dt><tt>input</tt>: T</dt>
@@ -74,31 +94,31 @@ Perform RoIAlign on output feature, used in bbox_head of most two-stage detector
 <dd>RoIs (Regions of Interest) to pool over; 2-D tensor of shape (num_rois, 5) given as [[batch_index, x1, y1, x2, y2], ...]. The RoIs' coordinates are the coordinate system of input.</dd>
 </dl>
 
-<h3>Outputs</h3>
+### Outputs
 
 <dl>
 <dt><tt>feat</tt>: T</dt>
 <dd>RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element feat[r-1] is a pooled feature map corresponding to the r-th RoI RoIs[r-1].<dd>
 </dl>
 
-<h3>Type Constraints</h3>
+### Type Constraints
 
 - T:tensor(float32)
 
 ## NMS
 
-<h3>Descriptions</h3>
+### Description
 
 Filter out boxes that have high IoU overlap with previously selected boxes.
 
-<h3>Parameters</h3>
+### Parameters
 
 | Type    | Parameter       | Description                                                                                                      |
 | ------- | --------------- | ---------------------------------------------------------------------------------------------------------------- |
 | `float` | `iou_threshold` | The threshold for deciding whether boxes overlap too much with respect to IoU. Value range [0, 1]. Default to 0. |
 | `int`   | `offset`        | 0 or 1, boxes' width or height is (x2 - x1 + offset).                                                            |
 
-<h3>Inputs</h3>
+### Inputs
 
 <dl>
 <dt><tt>bboxes</tt>: T</dt>
@@ -107,24 +127,24 @@ Filter out boxes has high IoU overlap with previously selected boxes.
 <dd>Input scores. 1-D tensor of shape (num_boxes, ).</dd>
 </dl>
 
-<h3>Outputs</h3>
+### Outputs
 
 <dl>
 <dt><tt>indices</tt>: tensor(int32, Linear)</dt>
 <dd>Selected indices. 1-D tensor of shape (num_valid_boxes, ). num_valid_boxes is the number of valid boxes.</dd>
 </dl>
 
-<h3>Type Constraints</h3>
+### Type Constraints
 
 - T:tensor(float32)
 
 ## grid_sampler
 
-<h3>Description</h3>
+### Description
 
 Sample from `input` at the pixel locations given by `grid`.
 
-<h3>Parameters</h3>
+### Parameters
 
 | Type  | Parameter            | Description                                                                                                                                                                                                                                                                                     |
 | ----- | -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
@@ -132,7 +152,7 @@ Perform sample from `input` with pixel locations from `grid`.
 | `int` | `padding_mode`       | Padding mode for outside grid values. (0: `zeros`, 1: `border`, 2: `reflection`)                                                                                                                                                                                                                |
 | `int` | `align_corners`      | If `align_corners=1`, the extrema (`-1` and `1`) are considered as referring to the center points of the input's corner pixels. If `align_corners=0`, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic. |
 
-<h3>Inputs</h3>
+### Inputs
 
 <dl>
 <dt><tt>input</tt>: T</dt>
@@ -141,13 +161,13 @@ Perform sample from `input` with pixel locations from `grid`.
 <dd>Input offset; 4-D tensor of shape (N, outH, outW, 2), where outH and outW are the height and width of offset and output. </dd>
 </dl>
 
-<h3>Outputs</h3>
+### Outputs
 
 <dl>
 <dt><tt>output</tt>: T</dt>
 <dd>Output feature; 4-D tensor of shape (N, C, outH, outW).</dd>
 </dl>
 
-<h3>Type Constraints</h3>
+### Type Constraints
 
 - T:tensor(float32, Linear)
diff --git a/docs/tensorrt_custom_ops.md b/docs/tensorrt_custom_ops.md
index ba140fbf1d..da696f03e9 100644
--- a/docs/tensorrt_custom_ops.md
+++ b/docs/tensorrt_custom_ops.md
@@ -4,21 +4,46 @@
 
 - [TensorRT Custom Ops](#tensorrt-custom-ops)
   - [MMCVRoIAlign](#mmcvroialign)
+    - [Description](#description)
+    - [Parameters](#parameters)
+    - [Inputs](#inputs)
+    - [Outputs](#outputs)
+    - [Type Constraints](#type-constraints)
   - [ScatterND](#scatternd)
+    - [Description](#description-1)
+    - [Parameters](#parameters-1)
+    - [Inputs](#inputs-1)
+    - [Outputs](#outputs-1)
+    - [Type Constraints](#type-constraints-1)
   - [NonMaxSuppression](#nonmaxsuppression)
+    - [Description](#description-2)
+    - [Parameters](#parameters-2)
+    - [Inputs](#inputs-2)
+    - [Outputs](#outputs-2)
+    - [Type Constraints](#type-constraints-2)
   - [MMCVDeformConv2d](#mmcvdeformconv2d)
+    - [Description](#description-3)
+    - [Parameters](#parameters-3)
+    - [Inputs](#inputs-3)
+    - [Outputs](#outputs-3)
+    - [Type Constraints](#type-constraints-3)
   - [grid_sampler](#grid_sampler)
+    - [Description](#description-4)
+    - [Parameters](#parameters-4)
+    - [Inputs](#inputs-4)
+    - [Outputs](#outputs-4)
+    - [Type Constraints](#type-constraints-4)
 
 <!-- TOC -->
 
 ## MMCVRoIAlign
 
-<h3>Description</h3>
+### Description
 
 Perform RoIAlign on output feature, used in bbox_head of most two-stage
 detectors.
 
-<h3>Parameters</h3>
+### Parameters
 
 | Type    | Parameter        | Description                                                                                                   |
 | ------- | ---------------- | ------------------------------------------------------------------------------------------------------------- |
@@ -29,7 +54,7 @@ detectors.
 | `str`   | `mode`           | pooling mode in each bin. `avg` or `max`                                                                      |
 | `int`   | `aligned`        | If `aligned=0`, use the legacy implementation in MMDetection. Else, align the results more perfectly.         |
 
-<h3>Inputs</h3>
+### Inputs
 
 <dl>
 <dt><tt>inputs[0]</tt>: T</dt>
@@ -38,20 +63,20 @@ detectors.
 <dd>RoIs (Regions of Interest) to pool over; 2-D tensor of shape (num_rois, 5) given as [[batch_index, x1, y1, x2, y2], ...]. The RoIs' coordinates are the coordinate system of inputs[0].</dd>
 </dl>
 
-<h3>Outputs</h3>
+### Outputs
 
 <dl>
 <dt><tt>outputs[0]</tt>: T</dt>
 <dd>RoI pooled output, 4-D tensor of shape (num_rois, C, output_height, output_width). The r-th batch element outputs[0][r-1] is a pooled feature map corresponding to the r-th RoI inputs[1][r-1].</dd>
 </dl>
 
-<h3>Type Constraints</h3>
+### Type Constraints
 
 - T:tensor(float32, Linear)
 
 ## ScatterND
 
-<h3>Description</h3>
+### Description
 
 ScatterND takes three inputs: a `data` tensor of rank r >= 1, an `indices` tensor of rank q >= 1, and an `updates` tensor of rank q + r - indices.shape[-1] - 1. The output is produced by creating a copy of the input `data` and then overwriting the values at the index positions specified by `indices` with the values in `updates`. The output shape is the same as the shape of `data`. Note that `indices` should not have duplicate entries; that is, two or more updates to the same index location are not supported.
 
@@ -64,11 +89,11 @@ The `output` is calculated via the following equation:
       output[indices[idx]] = updates[idx]
 ```
 
-<h3>Parameters</h3>
+### Parameters
 
 None
 
-<h3>Inputs</h3>
+### Inputs
 
 <dl>
 <dt><tt>inputs[0]</tt>: T</dt>
@@ -81,24 +106,24 @@ None
 <dd>Tensor of rank q + r - indices_shape[-1] - 1.</dd>
 </dl>
 
-<h3>Outputs</h3>
+### Outputs
 
 <dl>
 <dt><tt>outputs[0]</tt>: T</dt>
 <dd>Tensor of rank r >= 1.</dd>
 </dl>
 
-<h3>Type Constraints</h3>
+### Type Constraints
 
 - T:tensor(float32, Linear), tensor(int32, Linear)
 
 ## NonMaxSuppression
 
-<h3>Description</h3>
+### Description
 
 Filter out boxes that have high IoU overlap with previously selected boxes or a low score. Output the indices of valid boxes. Indices of invalid boxes will be filled with -1.
 
-<h3>Parameters</h3>
+### Parameters
 
 | Type    | Parameter                    | Description                                                                                                                          |
 | ------- | ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------ |
@@ -108,7 +133,7 @@ Filter out boxes has high IoU overlap with previously selected boxes or low scor
 | `float` | `score_threshold`            | The threshold for deciding when to remove boxes based on score.                                                                      |
 | `int`   | `offset`                     | 0 or 1, boxes' width or height is (x2 - x1 + offset).                                                                                |
 
-<h3>Inputs</h3>
+### Inputs
 
 <dl>
 <dt><tt>inputs[0]</tt>: T</dt>
@@ -117,7 +142,7 @@ Filter out boxes has high IoU overlap with previously selected boxes or low scor
 <dd>Input scores. 3-D tensor of shape (num_batches, num_classes, spatial_dimension).</dd>
 </dl>
 
-<h3>Outputs</h3>
+### Outputs
 
 <dl>
 <dt><tt>outputs[0]</tt>: tensor(int32, Linear)</dt>
@@ -126,17 +151,17 @@ Filter out boxes has high IoU overlap with previously selected boxes or low scor
 <dd>All invalid indices will be filled with -1.</dd>
 </dl>
 
-<h3>Type Constraints</h3>
+### Type Constraints
 
 - T:tensor(float32, Linear)
 
 ## MMCVDeformConv2d
 
-<h3>Description</h3>
+### Description
 
 Perform deformable convolution on the input feature. See [Deformable Convolutional Network](https://arxiv.org/abs/1703.06211) for details.
 
-<h3>Parameters</h3>
+### Parameters
 
 | Type           | Parameter          | Description                                                                                                                       |
 | -------------- | ------------------ | --------------------------------------------------------------------------------------------------------------------------------- |
@@ -147,7 +172,7 @@ Perform Deformable Convolution on input feature, read [Deformable Convolutional
 | `int`          | `group`            | Split input into groups. `input_channel` should be divisible by the number of groups.                                             |
 | `int`          | `im2col_step`      | DeformableConv2d use im2col to compute convolution. im2col_step is used to split input and offset, reduce memory usage of column. |
 
-<h3>Inputs</h3>
+### Inputs
 
 <dl>
 <dt><tt>inputs[0]</tt>: T</dt>
@@ -158,24 +183,24 @@ Perform Deformable Convolution on input feature, read [Deformable Convolutional
 <dd>Input weight; 4-D tensor of shape (output_channel, input_channel, kH, kW).</dd>
 </dl>
 
-<h3>Outputs</h3>
+### Outputs
 
 <dl>
 <dt><tt>outputs[0]</tt>: T</dt>
 <dd>Output feature; 4-D tensor of shape (N, output_channel, outH, outW).</dd>
 </dl>
 
-<h3>Type Constraints</h3>
+### Type Constraints
 
 - T:tensor(float32, Linear)
 
 ## grid_sampler
 
-<h3>Description</h3>
+### Description
 
 Sample from `input` at the pixel locations given by `grid`.
 
-<h3>Parameters</h3>
+### Parameters
 
 | Type  | Parameter            | Description                                                                                                                                                                                                                                                                                     |
 | ----- | -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
@@ -183,7 +208,7 @@ Perform sample from `input` with pixel locations from `grid`.
 | `int` | `padding_mode`       | Padding mode for outside grid values. (0: `zeros`, 1: `border`, 2: `reflection`)                                                                                                                                                                                                                |
 | `int` | `align_corners`      | If `align_corners=1`, the extrema (`-1` and `1`) are considered as referring to the center points of the input's corner pixels. If `align_corners=0`, they are instead considered as referring to the corner points of the input's corner pixels, making the sampling more resolution agnostic. |
 
-<h3>Inputs</h3>
+### Inputs
 
 <dl>
 <dt><tt>inputs[0]</tt>: T</dt>
@@ -192,13 +217,13 @@ Perform sample from `input` with pixel locations from `grid`.
 <dd>Input offset; 4-D tensor of shape (N, outH, outW, 2), where outH and outW are the height and width of offset and output. </dd>
 </dl>
 
-<h3>Outputs</h3>
+### Outputs
 
 <dl>
 <dt><tt>outputs[0]</tt>: T</dt>
 <dd>Output feature; 4-D tensor of shape (N, C, outH, outW).</dd>
 </dl>
 
-<h3>Type Constraints</h3>
+### Type Constraints
 
 - T:tensor(float32, Linear)
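
Reviewer note (commentary only, not part of the diff above): the `iou_threshold` and `offset` parameters documented for the NMS op can be illustrated with a small pure-Python sketch. This is not the plugin's implementation; the function names `box_area`, `iou`, and `nms` are hypothetical, and suppressing a box only when its IoU is strictly greater than `iou_threshold` is an assumption made for the example.

```python
# Illustrative greedy NMS; boxes are [x1, y1, x2, y2].
# Assumption: a box is suppressed when IoU > iou_threshold (strict).

def box_area(box, offset=0):
    """Area with width/height computed as (x2 - x1 + offset)."""
    x1, y1, x2, y2 = box
    return (x2 - x1 + offset) * (y2 - y1 + offset)


def iou(a, b, offset=0):
    """Intersection over union of two boxes, using the same offset rule."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0]) + offset
    inter_h = min(a[3], b[3]) - max(a[1], b[1]) + offset
    inter = max(inter_w, 0) * max(inter_h, 0)
    union = box_area(a, offset) + box_area(b, offset) - inter
    return inter / union if union > 0 else 0.0


def nms(bboxes, scores, iou_threshold=0.0, offset=0):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap too much with any kept box.
        if all(iou(bboxes[i], bboxes[j], offset) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores, iou_threshold=0.5, offset=0)
print(kept)
```

With `offset=1` a degenerate box such as `[0, 0, 0, 0]` has area 1 instead of 0, which is the pixel-inclusive convention the parameter table describes.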

From 3bbb7bca35503db9469e694078687821cfbf4460 Mon Sep 17 00:00:00 2001
From: grimoire <streetyao@live.com>
Date: Sun, 11 Apr 2021 16:29:12 +0800
Subject: [PATCH 10/10] add allow_different_nesting tag

---
 .pre-commit-config.yaml | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index 056c046592..efa84b8cfa 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -33,7 +33,8 @@ repos:
     rev: 2.1.4
     hooks:
       - id: markdownlint
-        args: ["-r", "~MD002,~MD013,~MD029,~MD033,~MD034"]
+        args: ["-r", "~MD002,~MD013,~MD029,~MD033,~MD034",
+              "-t", "allow_different_nesting"]
   - repo: https://github.com/myint/docformatter
     rev: v1.3.1
     hooks: