diff --git a/docs/supported_tasks/index.rst b/docs/supported_tasks/index.rst
index 318439fe1e..7b30c59d17 100644
--- a/docs/supported_tasks/index.rst
+++ b/docs/supported_tasks/index.rst
@@ -2,4 +2,5 @@
    :maxdepth: 2
 
    lidar_det3d.md
+   vision_det3d.md
    lidar_sem_seg3d.md
diff --git a/docs/supported_tasks/vision_det3d.md b/docs/supported_tasks/vision_det3d.md
new file mode 100644
index 0000000000..3b9a249b16
--- /dev/null
+++ b/docs/supported_tasks/vision_det3d.md
@@ -0,0 +1,133 @@
+# Vision-Based 3D Detection
+
+Vision-based 3D detection refers to 3D detection solutions based on vision-only input, such as monocular, binocular, and multi-view image-based 3D detection.
+Currently, we only support monocular and multi-view 3D detection methods. Other approaches should also be compatible with our framework and will be supported in the future.
+
+The task expects the given model to take any number of images as input and predict the 3D bounding boxes and category labels for each object of interest.
+Taking FCOS3D on the nuScenes dataset as an example, we will show how to prepare the data, train and test a model on a standard 3D detection benchmark, and visualize and validate the results.
+
+## Data Preparation
+
+To begin with, we need to download the raw data and reorganize it in the standard way presented in the [doc for data preparation](https://mmdetection3d.readthedocs.io/en/latest/data_preparation.html).
+
+Because different datasets organize their raw data in different ways, we typically need to collect the useful data information into a .pkl or .json file.
+So after getting all the raw data ready, we need to run the `create_data.py` script with the corresponding dataset option to generate the data infos.
+For example, for nuScenes we need to run:
+
+```
+python tools/create_data.py nuscenes --root-path ./data/nuscenes --out-dir ./data/nuscenes --extra-tag nuscenes
+```
+
+Afterwards, the related folder structure should be as follows:
+
+```
+mmdetection3d
+├── mmdet3d
+├── tools
+├── configs
+├── data
+│   ├── nuscenes
+│   │   ├── maps
+│   │   ├── samples
+│   │   ├── sweeps
+│   │   ├── v1.0-test
+│   │   ├── v1.0-trainval
+│   │   ├── nuscenes_database
+│   │   ├── nuscenes_infos_train.pkl
+│   │   ├── nuscenes_infos_trainval.pkl
+│   │   ├── nuscenes_infos_val.pkl
+│   │   ├── nuscenes_infos_test.pkl
+│   │   ├── nuscenes_dbinfos_train.pkl
+│   │   ├── nuscenes_infos_train_mono3d.coco.json
+│   │   ├── nuscenes_infos_trainval_mono3d.coco.json
+│   │   ├── nuscenes_infos_val_mono3d.coco.json
+│   │   ├── nuscenes_infos_test_mono3d.coco.json
+```
+
+Note that the .pkl files here are mainly used by methods based on LiDAR data, while the .json files are used for 2D detection and vision-only 3D detection.
+Before v0.13.0, which added support for monocular 3D detection, the .json files only contained infos for 2D detection, so if you need the latest infos, please check out a branch after v0.13.0.
+
+## Training
+
+Then let us train a model with the provided configs for FCOS3D. The basic script is the same as for other models.
+You can follow the examples provided in this [tutorial](https://mmdetection3d.readthedocs.io/en/latest/1_exist_data_model.html#inference-with-existing-models) when training with different GPU settings.
+Suppose we use 8 GPUs on a single machine with distributed training:
+
+```
+./tools/dist_train.sh configs/fcos3d/fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_nus-mono3d.py 8
+```
+
+Note that `2x8` in the config name means the training is conducted with 8 GPUs and 2 data samples on each GPU.
+If your customized setting is different from this, you may need to adjust the learning rate accordingly.
+A basic rule of thumb, the linear scaling rule, is described [here](https://arxiv.org/abs/1706.02677).
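+
+For example, if you train with 4 GPUs and 2 samples per GPU (half the total batch size of the provided `2x8` setting), the linear scaling rule suggests halving the learning rate. The following is a minimal sketch of a derived config; the file name is hypothetical and the base learning rate (assumed to be 0.002 here) should be verified against the actual value in the base config:
+
+```
+# Hypothetical derived config, e.g. fcos3d_r101_caffe_fpn_gn-head_dcn_2x4_1x_nus-mono3d.py,
+# placed next to the provided FCOS3D configs.
+_base_ = './fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_nus-mono3d.py'
+
+# Half the total batch size -> half the learning rate (linear scaling rule).
+# Only `lr` is overridden; the remaining optimizer settings are inherited from the base config.
+optimizer = dict(lr=0.001)
+```
+
+You would then launch training with this config and 4 GPUs, e.g., `./tools/dist_train.sh <path-to-this-config> 4`.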
+
+We can also achieve better performance by finetuning FCOS3D after training a baseline model with the previous script:
+
+```
+./tools/dist_train.sh configs/fcos3d/fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_nus-mono3d_finetune.py 8
+```
+
+Please remember to modify the path [here](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/fcos3d/fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_nus-mono3d_finetune.py#L8) accordingly so that it points to the baseline checkpoint you just trained.
+
+## Quantitative Evaluation
+
+During training, the model checkpoints will be evaluated regularly according to the setting of `evaluation = dict(interval=xxx)` in the config.
+
+We support the official evaluation protocols for different datasets.
+Since the output format is the same as that of 3D detection based on other modalities, the evaluation methods are also the same.
+
+For nuScenes, the model will be evaluated with the distance-based mean AP (mAP) and the nuScenes Detection Score (NDS) over 10 categories.
+The evaluation results will be printed in the command line like:
+
+```
+mAP: 0.3197
+mATE: 0.7595
+mASE: 0.2700
+mAOE: 0.4918
+mAVE: 1.3307
+mAAE: 0.1724
+NDS: 0.3905
+Eval time: 170.8s
+
+Per-class results:
+Object Class          AP     ATE    ASE    AOE    AVE    AAE
+car                   0.503  0.577  0.152  0.111  2.096  0.136
+truck                 0.223  0.857  0.224  0.220  1.389  0.179
+bus                   0.294  0.855  0.204  0.190  2.689  0.283
+trailer               0.081  1.094  0.243  0.553  0.742  0.167
+construction_vehicle  0.058  1.017  0.450  1.019  0.137  0.341
+pedestrian            0.392  0.687  0.284  0.694  0.876  0.158
+motorcycle            0.317  0.737  0.265  0.580  2.033  0.104
+bicycle               0.308  0.704  0.299  0.892  0.683  0.010
+traffic_cone          0.555  0.486  0.309  nan    nan    nan
+barrier               0.466  0.581  0.269  0.169  nan    nan
+```
+
+In addition, you can also evaluate a specific model checkpoint after training is finished. Simply run scripts like the following:
+
+```
+./tools/dist_test.sh configs/fcos3d/fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_nus-mono3d.py \
+    work_dirs/fcos3d/latest.pth --eval mAP
+```
+
+## Testing and Making a Submission
+
+If you would like to only conduct inference or test the model performance on the online benchmark,
+you just need to replace `--eval mAP` with `--format-only` in the previous evaluation script and specify the `jsonfile_prefix` if necessary,
+e.g., by adding the option `--eval-options jsonfile_prefix=work_dirs/fcos3d/test_submission`.
+Please make sure that the [info for testing](https://github.com/open-mmlab/mmdetection3d/blob/master/configs/_base_/datasets/nus-mono3d.py#L93) in the config corresponds to the test set instead of the validation set.
+
+After generating the results, you can compress the folder and upload it to the EvalAI evaluation server for the nuScenes 3D detection challenge.
+
+## Qualitative Validation
+
+MMDetection3D also provides versatile visualization tools, so that we can get an intuitive feel for the detection results predicted by our trained models.
+You can either set the `--eval-options 'show=True' 'out_dir=${SHOW_DIR}'` option to visualize the detection results online during evaluation,
+or use `tools/misc/visualize_results.py` for offline visualization.
+
+Besides, we also provide the script `tools/misc/browse_dataset.py` to visualize the dataset without running inference.
+Please refer to the [doc for visualization](https://mmdetection3d.readthedocs.io/en/latest/useful_tools.html#visualization) for more details.
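+
+As a quick per-image sanity check, you can also run monocular 3D inference and save the visualization from Python. The snippet below is a minimal sketch that assumes a recent MMDetection3D version exposing the high-level APIs `init_model`, `inference_mono_3d_detector`, and `show_result_meshlab` (the monocular demo script in the repo, if present in your version, follows the same pattern); the checkpoint, image, and annotation paths are placeholders to be replaced with your own files:
+
+```
+from mmdet3d.apis import (inference_mono_3d_detector, init_model,
+                          show_result_meshlab)
+
+config = 'configs/fcos3d/fcos3d_r101_caffe_fpn_gn-head_dcn_2x8_1x_nus-mono3d.py'
+checkpoint = 'work_dirs/fcos3d/latest.pth'  # placeholder: your trained checkpoint
+
+# Build the model and load the trained weights.
+model = init_model(config, checkpoint, device='cuda:0')
+
+# Run inference on a single image; the mono3D .coco.json info file supplies the
+# camera intrinsics for that image.
+image = 'demo/data/nuscenes/sample_cam_back.jpg'              # placeholder image
+ann = 'demo/data/nuscenes/sample_cam_back_mono3d.coco.json'   # placeholder info file
+result, data = inference_mono_3d_detector(model, image, ann)
+
+# Project the predicted 3D boxes onto the image and save the result under ./outputs.
+show_result_meshlab(data, result, 'outputs', score_thr=0.15, task='mono-det')
+```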
+
+Note that currently we only support visualization on images for vision-only methods.
+Visualization in the perspective view and bird's-eye view (BEV) will be integrated in the future.